(This article was first published on **Stat Of Mind**, and kindly contributed to R-bloggers)

Two years ago, the NBA went into a lockout, mostly for financial reasons. However, one central point of contention was also competitive balance: ensuring that large- and small-market teams alike would have a chance to compete for a championship. This brings us to the obvious question: “Is there competitive balance in the NBA?” If we define competitive balance by the variety of teams that win a championship, then the blunt answer is a definite “no”. Under true competitive balance, and assuming 30 teams per season, a fair league would give each team roughly a 1/30 chance of winning the championship in any given season. If we look at the actual distribution of championships across teams from 1980 to now, we can see that this is clearly not the case:

We can use the properties of the multinomial distribution to find the probability of observing exactly this distribution under the scenario of a fair league (a 1/30 chance of winning for each team), which happens to be p = 3.812135e-27…
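For reference, the multinomial probability in question can be written out as follows. The symbols are introduced here for illustration only: $n$ is the number of seasons considered, $x_i$ is the number of championships won by team $i$, and $k = 30$ is the assumed number of teams, each with winning probability $p_i = 1/30$.

$$
P(x_1, \dots, x_k) = \frac{n!}{x_1! \, x_2! \cdots x_k!} \prod_{i=1}^{k} p_i^{x_i}
= \frac{n!}{x_1! \, x_2! \cdots x_k!} \left(\frac{1}{30}\right)^{n},
\qquad \sum_{i=1}^{k} x_i = n
$$

Teams that never won contribute $0! = 1$ to the denominator, so they leave the value unchanged.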

```r
# NBA_finals_data.txt contains a list of all NBA champions from 1947 to present
dat <- read.table(file='NBA_finals_data.txt', sep='\t', header=TRUE)
head(dat)
#   Year  Lg           Champion             Runner.Up
# 1 2014 NBA  San Antonio Spurs            Miami Heat
# 2 2013 NBA         Miami Heat     San Antonio Spurs
# 3 2012 NBA         Miami Heat Oklahoma City Thunder
# 4 2011 NBA   Dallas Mavericks            Miami Heat
# 5 2010 NBA Los Angeles Lakers        Boston Celtics
# 6 2009 NBA Los Angeles Lakers         Orlando Magic

# restrict analysis from 1980 to present
champions <- as.vector(dat[which(dat$Year >= 1980), 'Champion'])
champ.freq <- table(champions)

# create vector of number of championships won by each team
# we assume that there were ~30 active teams per year
obs <- c(champ.freq, rep(0, (30-length(champ.freq))))

# compute probability of observing the list of champions we've had from 1980 to present
# http://en.wikipedia.org/wiki/Multinomial_distribution
nom <- factorial(sum(champ.freq))
denom <- prod(sapply(obs, factorial))
prob <- (nom / denom) * prod(sapply(obs, function(x) (1/30)^x))
prob
# [1] 3.812135e-27
```
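As a cross-check, base R's `dmultinom()` computes the same multinomial probability directly. The sketch below uses a small made-up count vector (a hypothetical 3-team league over 4 seasons) rather than the real championship data, purely to show that the manual formula and the built-in agree. The names `toy_counts`, `k` and `fair_prob` are illustrative.

```r
# Toy example: made-up title counts for a hypothetical 3-team league over 4 seasons
toy_counts <- c(2, 1, 1)
k <- length(toy_counts)      # number of teams in the toy league
fair_prob <- rep(1 / k, k)   # every team equally likely to win in a fair league

# Manual formula, mirroring the calculation above
manual <- factorial(sum(toy_counts)) / prod(factorial(toy_counts)) *
  prod(fair_prob ^ toy_counts)

# Built-in multinomial probability mass function from the stats package
builtin <- dmultinom(toy_counts, prob = fair_prob)

print(c(manual = manual, builtin = builtin))
```

With the real data, something like `dmultinom(obs, prob = rep(1/30, 30))` should give the same value as `prob`, assuming `obs` is built as in the code above.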

While we’ve established that the competitive balance in the NBA is skewed towards a subset of teams, we can also attempt to define competitive balance as the separation between individual teams in the league. One way would be to look at playoff appearances over the years, but I focused instead on the closeness of games (mostly because it was more appealing). I have written a Python script to scrape all game scores from 1946 to 2014, including home team information. The data was scraped from the landofbasketball website and dumped into a SQL database. Both the scripts and the data file can be obtained from my GitHub account.

If we define competitive advantage as the point differential between two teams that play each other, then a smaller overall point differential indicates that teams are of a similar level and, by extension, that the league as a whole is competitive. To investigate this, we can first look at the total number of points scored per game from 1980 to now.
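The two quantities discussed here, total points per game and point differential per game, can be derived from a table of game scores roughly as follows. This is a minimal sketch on made-up games: the column names (`year`, `home_pts`, `away_pts`) are assumptions rather than the actual schema of the scraped database, and averaging within each season is one plausible way to summarise the data.

```r
# Made-up games for illustration; column names are assumptions, not the scraped schema
games <- data.frame(
  year     = c(1980, 1980, 1981, 1981),
  home_pts = c(110, 98, 105, 120),
  away_pts = c(102, 101, 99, 100)
)

# Total points scored in each game and the absolute margin of victory
games$total_pts <- games$home_pts + games$away_pts
games$abs_diff  <- abs(games$home_pts - games$away_pts)

# Average both quantities within each season
by_year <- aggregate(cbind(total_pts, abs_diff) ~ year, data = games, FUN = mean)
print(by_year)
```

Using the absolute difference means a 10-point home win and a 10-point away win count as equally lopsided games.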

While the 1980s had high-scoring games, scoring gradually decreased from 1987 onwards, reaching its lowest point in the late 1990s. The turn of the century saw an increase in the number of points scored per game. Interestingly, the trend above is not mirrored in the point differential per game, which stays fairly constant throughout the period 1980-2014.

This indicates that, despite changes in scoring trends over the past three decades, the overall competitiveness of NBA games has remained fairly stable. Next, we can break down the data to the monthly level. In the plot below, we show the total number of points scored during each month of the NBA season from 1980 to 2014. Here, each line represents a month, and each block of lines represents a year (omitting July-September, when no games are played). As you can see, the total number of points scored tends to be stable over the course of a season, although there is a noticeable drop in the last two months (May-June), which correspond to the playoffs.

Again, we can compare the pattern above to the point differential of each game, assessed on a monthly basis. There is a little fluctuation over time, although we notice that playoff games (played in the last two months of each season) tend to be a lot closer. Again, this shows that the volume of points scored during different months and years does not impact the overall competitiveness of the league, presumably because teams adapt as a whole to the pace and style of play that prevail during any given period of time.
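A monthly breakdown of this kind can be obtained by extracting the year and month from each game's date before aggregating. The sketch below uses made-up games with arbitrary scores, and it assumes a `date` column alongside `home_pts` and `away_pts`. These names are hypothetical, and per-game averages are used here as one possible summary.

```r
# Made-up games with arbitrary scores; column names are assumptions
games <- data.frame(
  date     = as.Date(c("1999-11-05", "1999-11-20", "2000-05-10", "2000-05-12")),
  home_pts = c(101, 95, 88, 92),
  away_pts = c(90, 99, 86, 91)
)

# Derive calendar year and month from the game date
games$year  <- as.integer(format(games$date, "%Y"))
games$month <- as.integer(format(games$date, "%m"))

# Per-game total points and absolute point differential
games$total_pts <- games$home_pts + games$away_pts
games$abs_diff  <- abs(games$home_pts - games$away_pts)

# Average within each (year, month) cell, then sort chronologically
monthly <- aggregate(cbind(total_pts, abs_diff) ~ year + month,
                     data = games, FUN = mean)
monthly <- monthly[order(monthly$year, monthly$month), ]
print(monthly)
```

Note that an NBA season spans two calendar years, so grouping by season rather than calendar year would need an extra step, for example assigning games from October onwards to the following year's season.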

The notion of competitive advantage can also be applied to games played at home or away. I often hear players, coaches and experts talk about the benefits of home-court advantage, and how the fans can really inspire the home team to victory. Here, we can visualize the average point differential for teams when they play at home or away.

**Average point differential for teams playing at home**

**Average point differential for teams playing away from home**
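The home and away averages described above could be computed along these lines. This is a sketch on made-up games with placeholder team names (`A`, `B`, `C`) and assumed column names. It treats the point differential as signed, so a positive value means the team outscored its opponent on average.

```r
# Made-up games; team names and column names are placeholders
games <- data.frame(
  home_team = c("A", "B", "A", "C"),
  away_team = c("B", "A", "C", "A"),
  home_pts  = c(100, 95, 110, 90),
  away_pts  = c(90, 99, 100, 96),
  stringsAsFactors = FALSE
)

# Margin from the home team's point of view;
# the away team's margin in the same game is simply its negative
games$home_margin <- games$home_pts - games$away_pts

# Average signed margin per team, separately for home and away games
home_diff <- tapply(games$home_margin, games$home_team, mean)
away_diff <- tapply(-games$home_margin, games$away_team, mean)

print(home_diff)
print(away_diff)
```

Comparing `home_diff` and `away_diff` team by team then gives a rough measure of how much each team benefits from playing at home.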

Finally, we can digress a little and visualize the average number of points scored by each team from 1980 to 2014.
