Analyzing Professional Sports Team Colors with R, Part 2

[This article was first published on r on Tony ElHabr, and kindly contributed to R-bloggers.]

NOTE: This write-up picks up where the previous post left
off. All of the session data is carried over.

Color Similarity

Now, I’d like to evaluate color similarity more closely. To help verify
any quantitative deductions with some intuition, I’ll consider only a
single league for this–the NBA, the league that I know the best.

Because I’ll end up plotting team names at some point and some of the
full names are relatively lengthy, I want to get the official
abbreviations for each team. Unfortunately, these don’t come with the
teamcolors package, but I
can use Alex Bresler’s nbastatR
to get them.

# Assign `df_dict_nba_teams` to Global environment.
nbastatR::assign_nba_teams()

nms_nba <-
  teamcolors::teamcolors %>% 
  filter(league == "nba") %>% 
  # Attach the official abbreviation (slug) for each team by full name.
  left_join(
    df_dict_nba_teams %>%
      setNames(snakecase::to_snake_case(names(.))) %>%
      filter(!is_non_nba_team) %>% 
      select(name = name_team, slug = slug_team),
    by = c("name")
  )

colors_tidy_ord2_nba <-
  nms_nba %>% 
  select(name, league, slug) %>% 
  inner_join(colors_tidy_ord2, by = c("name", "league"))

To give the unfamiliar reader a better understanding of what exactly
this subset of the teamcolors data incorporates, here’s a visualization
of the primary and secondary colors of all NBA teams.

After grabbing the abbreviations (or slugs), I can move on to breaking
up the hex values into their RGB components.[1] I’ll be looking at only
the primary and secondary colors.

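colors_ord2_nba_rgb_tidy <-
  colors_tidy_ord2_nba %>%
  # Split each hex code into red, green, and blue columns
  # (helper defined in the previous post).
  add_rgb_cols() %>% 
  select(-hex) %>%
  # Reshape to long format: one row per team, ordinality, and RGB component.
  tidyr::gather(rgb, value, red, green, blue)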

colors_ord2_nba_rgb_tidy %>% 
  head(10)

name               league  slug  ord        rgb  value
Atlanta Hawks      nba     ATL   primary    red  225
Atlanta Hawks      nba     ATL   secondary  red  196
Boston Celtics     nba     BOS   primary    red  0
Boston Celtics     nba     BOS   secondary  red  187
Brooklyn Nets      nba     BKN   primary    red  6
Brooklyn Nets      nba     BKN   secondary  red  6
Charlotte Hornets  nba     CHA   primary    red  29
Charlotte Hornets  nba     CHA   secondary  red  0
Chicago Bulls      nba     CHI   primary    red  206
Chicago Bulls      nba     CHI   secondary  red  6
# of total rows: 180
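The `add_rgb_cols()` helper comes from the previous post and isn’t repeated here. For readers starting fresh, here is a minimal base-R sketch of one way to do the same two steps (hex to RGB, then wide to long). It assumes only an input with `name`, `ord`, and `hex` columns; the team names and hex codes are made-up toy values, not teamcolors data.

```r
# Toy input: hypothetical teams and hex codes (not real teamcolors data).
toy <- data.frame(
  name = c("Team A", "Team A", "Team B", "Team B"),
  ord  = c("primary", "secondary", "primary", "secondary"),
  hex  = c("#FF0000", "#000000", "#0000FF", "#FFFFFF"),
  stringsAsFactors = FALSE
)

# col2rgb() returns a 3 x n integer matrix with rows "red", "green", "blue".
rgb_mat <- grDevices::col2rgb(toy$hex)
toy_rgb <- data.frame(
  toy[c("name", "ord")],
  red = rgb_mat["red", ], green = rgb_mat["green", ], blue = rgb_mat["blue", ],
  stringsAsFactors = FALSE
)

# Wide to long: all red values first, then green, then blue
# (the same row order that tidyr::gather() produces).
toy_long <- data.frame(
  name  = rep(toy_rgb$name, times = 3),
  ord   = rep(toy_rgb$ord, times = 3),
  rgb   = rep(c("red", "green", "blue"), each = nrow(toy_rgb)),
  value = c(toy_rgb$red, toy_rgb$green, toy_rgb$blue),
  stringsAsFactors = FALSE
)
toy_long
```

With 4 toy rows the long table has 12 rows; the NBA version has 30 teams × 2 ordinalities × 3 components = 180.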

With the RGB values extracted, I can use the widyr::pairwise_dist()
function to compute the relative distance among teams in terms of RGB
values for each color ordinality. I think the default method, “Euclidean”,
is a sensible choice, though I’ll also try the “Manhattan” method.

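do_pairwise_dist <- function(data, method) {
  data %>% 
    group_by(ord) %>% 
    # Distance between every pair of teams, computed within each ordinality.
    widyr::pairwise_dist(name, rgb, value, upper = TRUE, method = method) %>% 
    rename(name1 = item1, name2 = item2) %>% 
    # Rename the last (distance) column to `value`.
    select(everything(), value = ncol(.)) %>% 
    arrange(value, .by_group = TRUE) %>% 
    ungroup()
}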
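Conceptually, `widyr::pairwise_dist()` casts the long data to a wide matrix (one row per team, one column per RGB component) and computes distances between rows, in the same spirit as `stats::dist()`. A minimal base-R sketch for a single ordinality, using three made-up teams; the `to_long()` helper is hypothetical and written only for this example:

```r
# Hypothetical primary colors for three made-up teams (rows) in RGB (columns).
primary_rgb <- rbind(
  team_a = c(red = 255, green = 0, blue = 0),
  team_b = c(red = 200, green = 0, blue = 0),
  team_c = c(red = 0,   green = 0, blue = 255)
)

# All pairwise distances between rows, as full symmetric matrices.
d_euc <- as.matrix(dist(primary_rgb, method = "euclidean"))
d_man <- as.matrix(dist(primary_rgb, method = "manhattan"))

# Hypothetical helper: long format with both (name1, name2) and (name2, name1),
# mirroring upper = TRUE, and dropping each team's zero distance to itself.
to_long <- function(m) {
  out <- data.frame(
    name1 = rownames(m)[row(m)],
    name2 = colnames(m)[col(m)],
    value = as.vector(m),
    stringsAsFactors = FALSE
  )
  out <- out[out$name1 != out$name2, ]
  out[order(out$value), ]
}

to_long(d_euc)
to_long(d_man)
```

Here `team_a` and `team_b` differ only in red, so both methods give them a distance of 55; the methods diverge once more than one component differs.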

As one might expect, there’s not much difference between these two
distance methods (if correlation is deemed a valid metric for
quantifying similarity).

How exactly do all of the individual distances compare?

I think that the above plot does a good job of highlighting the average
distance values (in terms of RGB) of each team. Additionally, by sorting
the teams by value, it illustrates exactly which teams are the most
“generic” (i.e. most similar to all other teams) and the most “unique”
(i.e. least similar to all other teams).
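Each team’s average distance is a row-wise mean over the pairwise distance matrix, leaving out the zero distance from a team to itself. A minimal base-R sketch for a single ordinality, using four made-up teams (how the primary and secondary distances are combined in the actual analysis isn’t shown, so that step is left out):

```r
# Hypothetical RGB values for four made-up teams (one ordinality only).
team_rgb <- rbind(
  team_a = c(255, 0, 0),
  team_b = c(200, 0, 0),
  team_c = c(0, 0, 255),
  team_d = c(0, 128, 64)
)
d <- as.matrix(dist(team_rgb, method = "euclidean"))

# Average distance to every *other* team; the diagonal is zero, so divide by n - 1.
avg_dist <- rowSums(d) / (nrow(d) - 1)

# Smallest average = most "generic"; largest average = most "unique".
sort(avg_dist)
```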

I can also use a heat map to visualize the same data. (Who doesn’t like a
good heat map?)

Like with the previous plot, I order the teams on each axis by total
distance from all other teams–teams with the highest cumulative
similarity to all other teams appear towards the bottom and left, while
teams that contrast most with all others appear towards the top and
right. And, to add some nuance, I emphasize the individual pairs that
have the highest and lowest similarity with different colors.
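One way to get that axis ordering, sketched in base R with made-up data: order the teams by total distance, store the team names as factors with those levels, and flag each team’s closest and farthest partner. With ggplot2’s `geom_tile()`, the first factor level is drawn at the left of the x axis and the bottom of the y axis, which matches the layout described above. The `flag` column is a hypothetical stand-in for however the highlighted pairs are actually chosen.

```r
# Hypothetical RGB values for four made-up teams (one ordinality only).
team_rgb <- rbind(
  team_a = c(255, 0, 0),
  team_b = c(200, 0, 0),
  team_c = c(0, 0, 255),
  team_d = c(0, 128, 64)
)
d <- as.matrix(dist(team_rgb))

# Order teams by total distance to all others: smallest total (most "generic") first.
team_levels <- names(sort(rowSums(d)))

# Long format for a tile-based heat map; factor levels fix the axis order.
heat_df <- data.frame(
  name1 = factor(rownames(d)[row(d)], levels = team_levels),
  name2 = factor(colnames(d)[col(d)], levels = team_levels),
  value = as.vector(d)
)
heat_df <- heat_df[heat_df$name1 != heat_df$name2, ]

# Flag each team's closest and farthest partner for highlighting.
row_min <- ave(heat_df$value, heat_df$name1, FUN = min)
row_max <- ave(heat_df$value, heat_df$name1, FUN = max)
heat_df$flag <- ifelse(heat_df$value == row_min, "closest",
                ifelse(heat_df$value == row_max, "farthest", "other"))
heat_df
```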

Exactly which teams match most and least closely with one another (in
terms of color similarity)? Here’s a list of the top and bottom matches
for each team.

rank_overall  name1                   name2                   dist
1             Sacramento Kings        Memphis Grizzlies       173
1             Sacramento Kings        Indiana Pacers          399
2             Memphis Grizzlies       Sacramento Kings        173
2             Memphis Grizzlies       Indiana Pacers          463
3             Boston Celtics          Utah Jazz               174
3             Boston Celtics          Indiana Pacers          483
4             Portland Trail Blazers  Houston Rockets         63
4             Portland Trail Blazers  Brooklyn Nets           521
5             Charlotte Hornets       Minnesota Timberwolves  171
5             Charlotte Hornets       Atlanta Hawks           472
6             Cleveland Cavaliers     Miami Heat              55
6             Cleveland Cavaliers     San Antonio Spurs       544
7             Houston Rockets         Portland Trail Blazers  63
7             Houston Rockets         San Antonio Spurs       541
8             Miami Heat              Cleveland Cavaliers     55
8             Miami Heat              San Antonio Spurs       529
9             Detroit Pistons         Los Angeles Clippers    0
9             Detroit Pistons         Oklahoma City Thunder   559
10            Los Angeles Clippers    Detroit Pistons         0
10            Los Angeles Clippers    Oklahoma City Thunder   559
11            Philadelphia 76ers      Detroit Pistons         0
11            Philadelphia 76ers      Oklahoma City Thunder   559
12            Utah Jazz               New Orleans Pelicans    141
12            Utah Jazz               Golden State Warriors   593
13            Minnesota Timberwolves  Charlotte Hornets       171
13            Minnesota Timberwolves  Houston Rockets         464
14            Chicago Bulls           Toronto Raptors         0
14            Chicago Bulls           Orlando Magic           584
15            Toronto Raptors         Chicago Bulls           0
15            Toronto Raptors         Orlando Magic           584
16            Phoenix Suns            Indiana Pacers          143
16            Phoenix Suns            Orlando Magic           561
17            New Orleans Pelicans    Washington Wizards      0
17            New Orleans Pelicans    Golden State Warriors   568
18            Washington Wizards      New Orleans Pelicans    0
18            Washington Wizards      Golden State Warriors   568
19            Atlanta Hawks           Miami Heat              175
19            Atlanta Hawks           Brooklyn Nets           493
20            New York Knicks         Oklahoma City Thunder   75
20            New York Knicks         Golden State Warriors   586
21            Oklahoma City Thunder   New York Knicks         75
21            Oklahoma City Thunder   Golden State Warriors   578
22            Denver Nuggets          New York Knicks         142
22            Denver Nuggets          Golden State Warriors   546
23            Brooklyn Nets           Chicago Bulls           203
23            Brooklyn Nets           Portland Trail Blazers  521
24            Los Angeles Lakers      Indiana Pacers          111
24            Los Angeles Lakers      Milwaukee Bucks         542
25            Dallas Mavericks        Orlando Magic           0
25            Dallas Mavericks        Indiana Pacers          586
26            Orlando Magic           Dallas Mavericks        0
26            Orlando Magic           Indiana Pacers          586
27            Golden State Warriors   Los Angeles Lakers      122
27            Golden State Warriors   Utah Jazz               593
28            Milwaukee Bucks         Dallas Mavericks        231
28            Milwaukee Bucks         San Antonio Spurs       644
29            Indiana Pacers          Los Angeles Lakers      111
29            Indiana Pacers          Milwaukee Bucks         617
30            San Antonio Spurs       Chicago Bulls           225
30            San Antonio Spurs       Milwaukee Bucks         644
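A table like this can be derived from a distance matrix by blanking out the diagonal and taking each row’s minimum and maximum. A minimal base-R sketch with four made-up teams, where `rank_overall` is assumed to order teams by average distance to all others (the column names mirror the table; the toy data do not):

```r
# Hypothetical RGB values for four made-up teams (one ordinality only).
team_rgb <- rbind(
  team_a = c(255, 0, 0),
  team_b = c(200, 0, 0),
  team_c = c(0, 0, 255),
  team_d = c(0, 128, 64)
)
d <- as.matrix(dist(team_rgb))
diag(d) <- NA  # ignore each team's zero distance to itself

matches <- data.frame(
  name1         = rownames(d),
  closest       = colnames(d)[apply(d, 1, which.min)],
  dist_closest  = apply(d, 1, min, na.rm = TRUE),
  farthest      = colnames(d)[apply(d, 1, which.max)],
  dist_farthest = apply(d, 1, max, na.rm = TRUE),
  stringsAsFactors = FALSE
)

# rank_overall: 1 = smallest average distance to all others (most "generic").
matches <- matches[order(rowMeans(d, na.rm = TRUE)), ]
matches$rank_overall <- seq_len(nrow(matches))
matches
```

Identical color pairs (the zero distances above, such as Detroit and the Clippers) simply show up as a closest match at distance 0.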

These results don’t really agree with what I (and maybe other NBA
fans) would have guessed. The Sacramento Kings (SAC) have purple as
their primary color, which is relatively unusual. I would think that
they would be in the lower half of these rankings. What’s going on? …

Color Theory

When doing this color-based analysis, several questions came to mind:

  1. Is the RGB model really the best framework to use for comparing
    colors? What about the HSL (Hue, Saturation, Lightness) model?
    Additionally, a quick Google search on the best method for
    identifying similarity between colors indicates that the YUV model
    (one I hadn’t heard of before) is best if human perception is the
    main concern.

  2. Is Euclidean distance the best “distance” method to use? Because
    I’m curious, I’ll look at how different the results would be if
    the “Manhattan” distance is used instead.

  3. Is “distance” even the best method for determining color
    similarity? Why not a “similarity” metric (such as cosine
    similarity)?

Since I’m not an expert in color models, and because there is no
definitive research with which I can cross-check my findings for color
similarity among NBA teams, I think it’s worthwhile to explore these
questions in more detail. First, I’ll need to create HSL and YUV
variations of the color data that I can compare to the RGB version that
I’ve used up to this point. (This will help me answer the first
question.)[2] Then, with each of these data sets in hand, I’ll tackle
the latter two questions directly. In the end, by comparing the
different models with different methods, I hope to come to some
stronger conclusions and/or justifications of my findings about NBA
team colors.
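The HSL and YUV conversion code isn’t shown in this post. As a stand-in, here is a minimal base-R sketch of the standard formulas, assuming RGB input on a 0 to 255 scale and the analog BT.601 definition of YUV (Y = 0.299R + 0.587G + 0.114B, U = 0.492(B − Y), V = 0.877(R − Y)). Both function names are hypothetical, and the original analysis may have used a different YUV variant. Base R’s `grDevices::rgb2hsv()` covers HSV rather than HSL, which is why HSL is computed by hand here.

```r
# Hypothetical helpers; inputs are red, green, blue on a 0 to 255 scale.

# YUV via the analog BT.601 definition (assumed; other variants scale U and V differently).
rgb_to_yuv <- function(r, g, b) {
  y <- 0.299 * r + 0.587 * g + 0.114 * b
  data.frame(y = y, u = 0.492 * (b - y), v = 0.877 * (r - y))
}

# HSL: hue in degrees [0, 360), saturation and lightness in [0, 1].
rgb_to_hsl <- function(r, g, b) {
  r <- r / 255; g <- g / 255; b <- b / 255
  mx <- pmax(r, g, b)
  mn <- pmin(r, g, b)
  delta <- mx - mn
  l <- (mx + mn) / 2
  # Grays (delta == 0) have undefined hue and zero saturation; 0 is used for hue.
  s <- ifelse(delta == 0, 0, delta / (1 - abs(2 * l - 1)))
  h <- ifelse(delta == 0, 0,
       ifelse(mx == r, 60 * (((g - b) / delta) %% 6),
       ifelse(mx == g, 60 * ((b - r) / delta + 2),
                       60 * ((r - g) / delta + 4))))
  data.frame(h = h, s = s, l = l)
}

rgb_to_hsl(255, 0, 0)  # h = 0, s = 1, l = 0.5
rgb_to_yuv(255, 0, 0)
```

Both functions are vectorized, so they can be applied directly to the `red`, `green`, and `blue` columns of a wide data frame.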


Euclidean Distance vs. Manhattan Distance

I’ll compare two distance methods, Euclidean and Manhattan, to justify my
earlier choice of Euclidean distance. To do this, I want to verify that
the similarities determined by the two methods are nearly identical. (I
would be surprised if they aren’t.)

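Correlations between Euclidean and Manhattan distances (multiplied by 100):

rowname        rgb_euclidean  rgb_manhattan
rgb_euclidean  NA             97.16
rgb_manhattan  97.16          NA

rowname        hsl_euclidean  hsl_manhattan
hsl_euclidean  NA             97.62
hsl_manhattan  97.62          NA

rowname        yuv_euclidean  yuv_manhattan
yuv_euclidean  NA             96.26
yuv_manhattan  96.26          NA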

Indeed, the Euclidean and Manhattan distances are highly correlated
when the hex color values are broken down into color components,
regardless of whether the RGB, HSL, or YUV representation is used.
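The comparison boils down to computing both distances over the same set of team pairs and correlating the two vectors. A minimal base-R sketch with 30 randomly generated, made-up colors, reporting the correlation multiplied by 100 to match the scale of the tables above:

```r
set.seed(42)
# 30 made-up colors (one row per hypothetical team), RGB values from 0 to 255.
rgb_toy <- matrix(sample(0:255, 30 * 3, replace = TRUE), ncol = 3,
                  dimnames = list(paste0("team_", 1:30), c("red", "green", "blue")))

# dist() stores each pair once in the same order, so these vectors line up pair by pair.
d_euclidean <- as.vector(dist(rgb_toy, method = "euclidean"))
d_manhattan <- as.vector(dist(rgb_toy, method = "manhattan"))

# Correlation multiplied by 100.
round(100 * cor(d_euclidean, d_manhattan), 2)
```

In three dimensions the Manhattan distance always lies between the Euclidean distance and √3 times it, so the two methods can only rank pairs so differently; a high correlation is the expected outcome.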

Now, when keeping the distance method constant (Euclidean), how do the
color models compare?

rowname   rgb_dist  hsl_dist  yuv_dist
rgb_dist  NA        62.53     97.88
hsl_dist  62.53     NA        68.12
yuv_dist  97.88     68.12     NA

The numbers show a strong positive correlation between the RGB and YUV
color schemas (97.88), and a weaker, though still positive, correlation
between HSL and each of the other two (62.53 and 68.12). This indicates
that the conclusions I came to regarding the most similar and dissimilar
NBA team colors would not be much different if I used the YUV model
instead of the RGB model; switching to HSL would likely change them more.
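One plausible reason RGB and YUV agree so closely: YUV is a linear transformation of RGB, whereas HSL is nonlinear (hue, for instance, wraps around a circle). A minimal base-R sketch, assuming the analog BT.601 YUV coefficients and 30 made-up colors, that applies YUV as a 3 × 3 matrix and correlates the resulting distances with the RGB ones:

```r
set.seed(42)
# 30 made-up colors, RGB values from 0 to 255.
rgb_toy <- matrix(sample(0:255, 30 * 3, replace = TRUE), ncol = 3)

# Analog BT.601 YUV written as a 3 x 3 linear map (rows: Y, U, V; columns: R, G, B).
luma <- c(0.299, 0.587, 0.114)
yuv_map <- rbind(
  y = luma,
  u = 0.492 * (c(0, 0, 1) - luma),  # U = 0.492 * (B - Y)
  v = 0.877 * (c(1, 0, 0) - luma)   # V = 0.877 * (R - Y)
)
yuv_toy <- rgb_toy %*% t(yuv_map)

# Euclidean distances over the same pairs, then their correlation (multiplied by 100).
d_rgb <- as.vector(dist(rgb_toy))
d_yuv <- as.vector(dist(yuv_toy))
round(100 * cor(d_rgb, d_yuv), 2)
```

Because the map is linear and invertible, it stretches distances by direction-dependent factors but never scrambles them the way a circular hue dimension can.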

Distance vs. Similarity

To compare distance (Euclidean) with cosine similarity, I can create and
use a set of functions similar to those used for comparing distance
methods. To visualize the results in an interpretable manner, I can use
the network_plot() function from Dr. Simon Jackson’s corrr package. This
function is a nice way to visualize correlation data without resorting
to a traditional correlation matrix.[3]

It’s clear that the RGB and YUV schemas are fairly similar “within” both
metrics (Euclidean distance and cosine similarity), and both are
relatively dissimilar to HSL. However, all three color models show
negative correlations “within” themselves when comparing the two metrics
against one another. (That is, the RGB schema has a negative correlation
when comparing its distance values to its similarity values, and
likewise for the HSL and YUV models.) This is what one would expect: a
larger distance means a smaller similarity, so the two metrics move in
opposite directions.
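Cosine similarity treats each color as a vector (R, G, B) and measures the cosine of the angle between two such vectors, so larger values mean more similar, whereas larger distances mean less similar. A minimal base-R sketch with 30 made-up colors that computes both metrics over the same pairs and correlates them:

```r
set.seed(42)
# 30 made-up colors; 1:255 avoids an all-zero (black) row, whose cosine is undefined.
rgb_toy <- matrix(sample(1:255, 30 * 3, replace = TRUE), ncol = 3)

# Cosine similarity matrix: scale each row to unit length, then take dot products.
unit_rows <- rgb_toy / sqrt(rowSums(rgb_toy^2))
cos_sim <- unit_rows %*% t(unit_rows)

# Lower triangle (each pair once), in the same column-wise order dist() uses.
sim_vec  <- cos_sim[lower.tri(cos_sim)]
dist_vec <- as.vector(dist(rgb_toy, method = "euclidean"))

# Correlation between the two metrics (multiplied by 100); typically negative.
round(100 * cor(dist_vec, sim_vec), 2)
```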

So, which color model and which metric should be used? In my
opinion, the RGB model seems like a good choice, both because it is
relatively similar to at least one other model (YUV) and because it is
(probably) the most relatable scheme for people who don’t know much
about color theory. As for the metric, I think that the choice of
Euclidean distance is valid. The Google search result that makes the
case for YUV assumes that Euclidean distance is being used.
Additionally, a separate Google search for “euclidean distance vs.
cosine similarity” turns up an easy-to-follow technical explanation
that implies that cosine similarity is probably not appropriate for
this kind of color analysis.
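One likely reason cosine similarity is a poor fit here: it ignores the length of the (R, G, B) vector, which largely tracks brightness. Any two grays point in the same direction, so cosine similarity rates a near-black gray and pure white as identical even though their Euclidean distance is large. A minimal sketch; `cosine_sim()` and `euclidean()` are hypothetical helpers written for this example:

```r
cosine_sim <- function(a, b) sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
euclidean  <- function(a, b) sqrt(sum((a - b)^2))

dark_gray <- c(red = 30,  green = 30,  blue = 30)
white     <- c(red = 255, green = 255, blue = 255)

cosine_sim(dark_gray, white)  # 1, up to floating-point rounding
euclidean(dark_gray, white)   # 225 * sqrt(3), roughly 389.7
```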


That’s all I’ve got for this topic. I hope that the techniques shown
here are general enough that they can be applied to any set of colors
to extract some fun (and meaningful) insights.

  1. I use the same method as the one I used before.
  2. I don’t show the code for this, so check out the .Rmd document for details.
  3. I could have actually used this same function to visualize the various distance methods all in one plot.
