Analyzing Professional Sports Team Colors with R, Part 2

[This article was first published on r on Tony ElHabr, and kindly contributed to R-bloggers.]

NOTE: This write-up picks up where the previous one left off. All of the session data is carried over.

Color Similarity

Now, I’d like to evaluate color similarity more closely. To help verify any quantitative deductions with some intuition, I’ll consider only a single league for this–the NBA, the league that I know the best.

Because I’ll end up plotting team names at some point and some of the full names are relatively lengthy, I want to get the official abbreviations for each team. Unfortunately, these don’t come with the teamcolors package, but I can use Alex Bresler’s nbastatR package to get them.

# Assign `df_dict_nba_teams` to Global environment.
nbastatR::assign_nba_teams()
nms_nba <-
  teamcolors::teamcolors %>% 
  filter(league == "nba") %>% 
  inner_join(
    df_dict_nba_teams %>%
      setNames(snakecase::to_snake_case(names(.))) %>%
      filter(!is_non_nba_team) %>% 
      select(name = name_team, slug = slug_team),
    by = c("name")
  )

colors_tidy_ord2_nba <-
  nms_nba %>% 
  select(name, league, slug) %>% 
  inner_join(colors_tidy_ord2, by = c("name", "league"))

To give the unfamiliar reader a better understanding of what exactly this subset of the teamcolors data incorporates, here’s a visualization of the primary and secondary colors of all NBA teams.

After grabbing the abbreviations (or slugs), I can move on to breaking up the hex values into their RGB components. [1] I’ll be looking at only the primary and secondary colors again.

colors_ord2_nba_rgb_tidy <-
  colors_tidy_ord2_nba %>%
  add_rgb_cols() %>% 
  select(-hex) %>%
  tidyr::gather(rgb, value, red, green, blue)

colors_ord2_nba_rgb_tidy %>% 
  create_kable()
name               league  slug  ord        rgb  value
Atlanta Hawks      nba     ATL   primary    red  225
Atlanta Hawks      nba     ATL   secondary  red  196
Boston Celtics     nba     BOS   primary    red  0
Boston Celtics     nba     BOS   secondary  red  187
Brooklyn Nets      nba     BKN   primary    red  6
Brooklyn Nets      nba     BKN   secondary  red  6
Charlotte Hornets  nba     CHA   primary    red  29
Charlotte Hornets  nba     CHA   secondary  red  0
Chicago Bulls      nba     CHI   primary    red  206
Chicago Bulls      nba     CHI   secondary  red  6
# of total rows: 180 (first 10 shown)
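The add_rgb_cols() helper comes from the previous post and isn't reproduced here. As a rough sketch of the idea (not necessarily how that helper is written), base R's grDevices::col2rgb() can split hex strings into their components. The function name hex_to_rgb_df and the hex values below are made up for illustration and are not taken from the teamcolors data.

```r
# Hypothetical helper: split hex colour strings into red/green/blue columns.
# Relies only on grDevices::col2rgb(), which ships with base R.
hex_to_rgb_df <- function(hex) {
  m <- grDevices::col2rgb(hex)  # 3 x n integer matrix; rows: red, green, blue
  data.frame(
    hex = hex,
    red = m["red", ],
    green = m["green", ],
    blue = m["blue", ],
    stringsAsFactors = FALSE,
    row.names = NULL
  )
}

# Illustrative hex values only.
hex_to_rgb_df(c("#FF0000", "#0000FF", "#808080"))
```

Each row of the result holds one colour, with its components on the usual 0 to 255 scale; gathering the three columns (as in the code above) then gives one row per colour component.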

With the RGB values extracted, I can use the widyr::pairwise_dist() function to compute the relative distance among teams in terms of RGB values for each color ordinality. I think the default method–“Euclidean” distance–is reasonable.

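# Compute pairwise distances between teams' RGB vectors, separately for each
# color ordinality (primary, secondary).
do_pairwise_dist <- function(data, method) {
  data %>% 
    group_by(ord) %>% 
    widyr::pairwise_dist(name, rgb, value, upper = TRUE, method = method) %>% 
    rename(name1 = item1, name2 = item2) %>% 
    # Rename the last column (the computed distance) to `value`.
    select(everything(), value = ncol(.)) %>% 
    arrange(value, .by_group = TRUE) %>% 
    ungroup()
}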
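To make the distance calculation concrete: for each pair of teams, the quantity of interest is the distance between their (red, green, blue) vectors. As a minimal sketch, the same kind of number can be reproduced with base R's stats::dist(), swapped in here for widyr::pairwise_dist() so that the example has no package dependencies. The two "teams" below are illustrative placeholders, not real NBA colors.

```r
# Two illustrative colours as rows of an RGB matrix (pure red and pure blue).
rgb_mat <- rbind(
  red_team  = c(red = 255, green = 0, blue = 0),
  blue_team = c(red = 0, green = 0, blue = 255)
)

# Euclidean distance between the two rows: sqrt(255^2 + 0^2 + 255^2).
d_euclidean <- as.numeric(dist(rgb_mat, method = "euclidean"))
round(d_euclidean, 2)  # 360.62

# as.matrix() gives the full symmetric matrix, so each pair appears in both
# directions, much like the `upper = TRUE` output above.
as.matrix(dist(rgb_mat, method = "euclidean"))
```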

As one might expect, there’s not much difference between the two distance methods, Euclidean and Manhattan, which are compared in more detail later on (if correlation is deemed a valid metric for quantifying similarity).

How exactly do all of the individual distances compare?

I think that the above plot does a good job of highlighting the average distance values (in terms of RGB) of each team. Additionally, by sorting the teams by value, it illustrates exactly which teams are the most “generic” (i.e. most similar to all other teams) and the most “unique” (i.e. least similar to all other teams).
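The "generic" versus "unique" ordering boils down to averaging each team's distance to every other team and sorting. Here is a small base-R sketch of that step, assuming Euclidean distance and a made-up four-colour palette (the row names a to d are hypothetical stand-ins for teams).

```r
# Illustrative palette (hypothetical "teams", not the real NBA data).
rgb_mat <- rbind(
  a = c(255, 0, 0),
  b = c(250, 10, 10),
  c = c(0, 0, 255),
  d = c(128, 128, 128)
)

# Full symmetric matrix of pairwise Euclidean distances.
d <- as.matrix(dist(rgb_mat))

# Average distance from each row to all *other* rows. The diagonal is 0,
# so divide the row sum by n - 1 rather than n.
avg_dist <- rowSums(d) / (nrow(d) - 1)

# Smallest average distance first: most "generic" to most "unique".
sort(avg_dist)
```

In this toy palette the blue colour ends up last, since nothing else in the set is close to it.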

I can also use a heat map to visualize the same data. (Who doesn’t like a good heat map?)

As with the previous plot, I order the teams on each axis by total distance from all other teams–teams with the highest cumulative similarity to all other teams appear towards the bottom and left, while teams that contrast most with all others appear towards the top and right. And, to add some nuance, I emphasize the individual pairs that have the highest and lowest similarity with different colors.

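Exactly which teams match most and least closely with one another (in terms of color similarity)? Here’s a list of the top and bottom matches for each team. For each team (name1), the first row is its closest match and the second row is its most distant one.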

rank_overall  name1                   name2                   dist
1             Sacramento Kings        Memphis Grizzlies       173
1             Sacramento Kings        Indiana Pacers          399
2             Memphis Grizzlies       Sacramento Kings        173
2             Memphis Grizzlies       Indiana Pacers          463
3             Boston Celtics          Utah Jazz               174
3             Boston Celtics          Indiana Pacers          483
4             Portland Trail Blazers  Houston Rockets         63
4             Portland Trail Blazers  Brooklyn Nets           521
5             Charlotte Hornets       Minnesota Timberwolves  171
5             Charlotte Hornets       Atlanta Hawks           472
6             Cleveland Cavaliers     Miami Heat              55
6             Cleveland Cavaliers     San Antonio Spurs       544
7             Houston Rockets         Portland Trail Blazers  63
7             Houston Rockets         San Antonio Spurs       541
8             Miami Heat              Cleveland Cavaliers     55
8             Miami Heat              San Antonio Spurs       529
9             Detroit Pistons         Los Angeles Clippers    0
9             Detroit Pistons         Oklahoma City Thunder   559
10            Los Angeles Clippers    Detroit Pistons         0
10            Los Angeles Clippers    Oklahoma City Thunder   559
11            Philadelphia 76ers      Detroit Pistons         0
11            Philadelphia 76ers      Oklahoma City Thunder   559
12            Utah Jazz               New Orleans Pelicans    141
12            Utah Jazz               Golden State Warriors   593
13            Minnesota Timberwolves  Charlotte Hornets       171
13            Minnesota Timberwolves  Houston Rockets         464
14            Chicago Bulls           Toronto Raptors         0
14            Chicago Bulls           Orlando Magic           584
15            Toronto Raptors         Chicago Bulls           0
15            Toronto Raptors         Orlando Magic           584
16            Phoenix Suns            Indiana Pacers          143
16            Phoenix Suns            Orlando Magic           561
17            New Orleans Pelicans    Washington Wizards      0
17            New Orleans Pelicans    Golden State Warriors   568
18            Washington Wizards      New Orleans Pelicans    0
18            Washington Wizards      Golden State Warriors   568
19            Atlanta Hawks           Miami Heat              175
19            Atlanta Hawks           Brooklyn Nets           493
20            New York Knicks         Oklahoma City Thunder   75
20            New York Knicks         Golden State Warriors   586
21            Oklahoma City Thunder   New York Knicks         75
21            Oklahoma City Thunder   Golden State Warriors   578
22            Denver Nuggets          New York Knicks         142
22            Denver Nuggets          Golden State Warriors   546
23            Brooklyn Nets           Chicago Bulls           203
23            Brooklyn Nets           Portland Trail Blazers  521
24            Los Angeles Lakers      Indiana Pacers          111
24            Los Angeles Lakers      Milwaukee Bucks         542
25            Dallas Mavericks        Orlando Magic           0
25            Dallas Mavericks        Indiana Pacers          586
26            Orlando Magic           Dallas Mavericks        0
26            Orlando Magic           Indiana Pacers          586
27            Golden State Warriors   Los Angeles Lakers      122
27            Golden State Warriors   Utah Jazz               593
28            Milwaukee Bucks         Dallas Mavericks        231
28            Milwaukee Bucks         San Antonio Spurs       644
29            Indiana Pacers          Los Angeles Lakers      111
29            Indiana Pacers          Milwaukee Bucks         617
30            San Antonio Spurs       Chicago Bulls           225
30            San Antonio Spurs       Milwaukee Bucks         644

These results don’t really agree with what I–and maybe other NBA fans–would have guessed. The Sacramento Kings (SAC) have purple as their primary color, which is relatively unusual. I would think that they would be in the lower half of these rankings. What’s going on? …

Color Theory

When doing this color-based analysis, several questions came to mind:

  1. Is the RGB model really the best framework to use for comparing colors? What about the HSL (Hue, Saturation, Lightness) model? Additionally, a quick Google search for “What is the best method for identifying similarity between colors?” indicates that the YUV representation–a model I hadn’t heard of before–is best (if human perception is the main concern).

  2. Is Euclidean distance the best “distance” method to use? Because I’m curious, I’ll look at how different the results would be if the “Manhattan” distance is used instead.

  3. Is “distance” even the best method for determining color similarity? Why not a “similarity” metric (such as cosine similarity)?

Since I’m not an expert in color models, and because there is no definitive/conclusive research with which I can cross-check my findings for color similarity among NBA teams, I think it’s worthwhile to explore these questions in more detail. First, I’ll need to create HSL and YUV variations of the color data that I can compare to the RGB version that I’ve used up to this point. (This will help me answer the first question.) [2] Then, with each of these data sets in hand, I’ll tackle the latter two questions directly. In the end, by comparing the different models with different methods, I hope to come to some stronger conclusions and/or justifications of my findings about NBA team colors.
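The conversion code isn't shown in this post, so the following is only a sketch of what the HSL and YUV variations could look like, not the code from the .Rmd document. It assumes the BT.601 constants for YUV (other YUV variants use different weights) and the standard HSL formulas; the function names rgb_to_yuv and rgb_to_hsl are hypothetical, and both work on a single color at a time.

```r
# Hypothetical helpers; inputs are red/green/blue on the 0-255 scale.

# RGB -> YUV using the BT.601 luma weights (an assumption; other YUV
# variants use different constants).
rgb_to_yuv <- function(r, g, b) {
  y <- 0.299 * r + 0.587 * g + 0.114 * b
  u <- 0.492 * (b - y)
  v <- 0.877 * (r - y)
  c(y = y, u = u, v = v)
}

# RGB -> HSL with hue in degrees [0, 360) and saturation/lightness in [0, 1].
rgb_to_hsl <- function(r, g, b) {
  x <- c(r, g, b) / 255
  mx <- max(x)
  mn <- min(x)
  delta <- mx - mn
  l <- (mx + mn) / 2
  if (delta == 0) {
    return(c(h = 0, s = 0, l = l))  # grey: hue is undefined, use 0
  }
  s <- delta / (1 - abs(2 * l - 1))
  h <- if (mx == x[1]) {
    ((x[2] - x[3]) / delta) %% 6
  } else if (mx == x[2]) {
    (x[3] - x[1]) / delta + 2
  } else {
    (x[1] - x[2]) / delta + 4
  }
  c(h = 60 * h, s = s, l = l)
}

rgb_to_yuv(255, 255, 255)  # white: y = 255, u and v approximately 0
rgb_to_hsl(255, 0, 0)      # pure red: h = 0, s = 1, l = 0.5
```

One thing this makes visible is that the three components of HSL live on different scales (degrees for hue, 0 to 1 for the others), so any distance computed on raw HSL values is dominated by hue unless the components are rescaled first.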


Euclidean Distance vs. Manhattan Distance

I’ll look at two distance methods–Euclidean and Manhattan–to justify my earlier choice of Euclidean distance. To do this, I want to verify that the similarity determined by the two methods is nearly identical. (I would be surprised if it isn’t.)

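rowname        rgb_euclidean  rgb_manhattan
rgb_euclidean  NA             97.16
rgb_manhattan  97.16          NA

rowname        hsl_euclidean  hsl_manhattan
hsl_euclidean  NA             97.62
hsl_manhattan  97.62          NA

rowname        yuv_euclidean  yuv_manhattan
yuv_euclidean  NA             96.26
yuv_manhattan  96.26          NA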

Indeed, the Euclidean and Manhattan distances are highly correlated when the hex color values are broken down into color components, regardless of whether the RGB, HSL, or YUV representation is used.
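For readers who want to see the mechanics of this comparison, here is a minimal base-R sketch: compute both sets of pairwise distances over the same colors, then correlate them. It uses stats::dist() and cor() rather than the functions behind the tables above, and the five RGB rows are illustrative values, not the NBA data.

```r
# Illustrative RGB values for five hypothetical colours (not the NBA data).
rgb_mat <- rbind(
  c(255, 0, 0),
  c(0, 0, 255),
  c(0, 128, 0),
  c(255, 215, 0),
  c(128, 128, 128)
)

# dist() returns the lower triangle, i.e. one value per unordered pair.
d_euclidean <- as.numeric(dist(rgb_mat, method = "euclidean"))
d_manhattan <- as.numeric(dist(rgb_mat, method = "manhattan"))

# Pearson correlation between the two sets of pairwise distances.
cor(d_euclidean, d_manhattan)
```

The same pattern extends to the HSL and YUV versions by swapping in a matrix of those components.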

Now, when keeping the distance method constant (Euclidean), how do the color models compare?

rowname   rgb_dist  hsl_dist  yuv_dist
rgb_dist  NA        62.53     97.88
hsl_dist  62.53     NA        68.12
yuv_dist  97.88     68.12     NA

The numbers indicate positive correlation across the board: very strong between the RGB and YUV color schemas (97.88), and more moderate between HSL and either of the other two (62.53 and 68.12). This suggests that the conclusions that I came to regarding the most similar and dissimilar NBA team colors would be nearly unchanged under the YUV model, and only somewhat different under the HSL model.

Distance vs. Similarity

To compare distance (Euclidean) with cosine similarity, I can create and use a similar set of functions to those used for comparing distance methods. To visualize the results in an interpretable manner, I can use the network_plot() function from Dr. Simon Jackson’s corrr package. This function is cool for visualizing correlation data in a way other than with a traditional correlation matrix. [3]

It’s clear that the RGB and YUV schemas are fairly similar “within” both metrics–Euclidean distance and cosine similarity–and both are relatively dissimilar to HSL. However, all three color models show negative correlations “within” themselves when comparing the two metrics against one another (i.e. the RGB schema has a negative correlation when comparing its distance values to its similarity values, and likewise for the HSL and YUV models). Some negative correlation is to be expected, since a larger distance generally goes with a lower similarity.

So, which color model and which metric should be used? In my opinion, the RGB model seems like a good choice, both because it is relatively similar to at least one other method (YUV) and because it is (probably) the most relatable scheme to people who don’t know much about color theory. As for the metric, I think that the choice of Euclidean distance is valid. My Google search (which makes the case for YUV) makes the assumption that Euclidean distance is being used. Additionally, a separate Google search for “euclidean distance vs. cosine similarity” turns up an easy-to-follow technical write-up that implies that cosine similarity is probably not really appropriate for this kind of color analysis.
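One way to see why cosine similarity might be a poor fit here (my own illustration, not taken from the write-up mentioned above) is that it only measures the angle between two vectors and ignores their length. In RGB terms, a bright and a dark version of the same hue point in the same direction. The helper name cosine_similarity and the two colors are assumptions chosen for the example.

```r
# Cosine similarity between two numeric vectors.
# Note: returns NaN if either vector is all zeros (e.g. pure black).
cosine_similarity <- function(a, b) {
  sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

# Illustrative colours: a bright red and a much darker red.
bright_red <- c(255, 0, 0)
dark_red   <- c(64, 0, 0)

cosine_similarity(bright_red, dark_red)  # 1: same direction in RGB space
sqrt(sum((bright_red - dark_red)^2))     # 191: far apart by Euclidean distance
```

So two colors that look clearly different can still score as perfectly "similar" under the cosine metric, whereas Euclidean distance keeps them apart.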

Conclusion

That’s all I got for this topic. I hope that the techniques shown here are general enough that they can be applied to any set of colors to extract some fun (and meaningful) insight.



  1. I use the same method as the one I used before.
  2. I don’t show the code for this, so check out the .Rmd document for details.
  3. I could have actually used this same function to visualize the various distance methods all in one plot.
