(This article was first published on **Data Science Riot!**, and kindly contributed to R-bloggers)

Traditionally, statistics like wOBA (weighted on-base average) have been calculated using league averages. While building the `baseballDBR` package, I thought it would be interesting to group by league, American and National, when making wOBA calculations. In theory, there should be parity across the two leagues, but that is not always the case.

To calculate wOBA values for each league, the `baseballDBR` package uses a ported version of Tom Tango’s SQL incantation to calculate wOBA using the Baseball Databank. While Tango admits this calculation is not perfect, it normally has a plus/minus of less than one one-thousandth of a percent compared to Fangraphs’ values.
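For readers who have not seen the stat written out, here is a minimal sketch of the commonly published wOBA formula in base R. It is not the code `baseballDBR` runs internally. The function name `woba_sketch` and the weights in the example call are illustrative placeholders, not real season values; actual weights change every year and, in this post, by league.

```r
# Sketch of the commonly published wOBA formula for a single batting line.
# `w` is a named numeric vector of linear weights (hypothetical in this example).
woba_sketch <- function(AB, H, X2B, X3B, HR, BB, IBB, HBP, SF, w) {
  X1B <- H - X2B - X3B - HR   # singles are hits minus extra-base hits
  uBB <- BB - IBB             # unintentional walks
  numerator <- w[["BB"]] * uBB + w[["HBP"]] * HBP + w[["X1B"]] * X1B +
    w[["X2B"]] * X2B + w[["X3B"]] * X3B + w[["HR"]] * HR
  denominator <- AB + BB - IBB + SF + HBP
  numerator / denominator
}

# Placeholder weights and an invented batting line, for illustration only
example_weights <- c(BB = 0.7, HBP = 0.7, X1B = 0.9, X2B = 1.25, X3B = 1.6, HR = 2.0)
example_woba <- woba_sketch(AB = 500, H = 150, X2B = 30, X3B = 5, HR = 20,
                            BB = 60, IBB = 5, HBP = 5, SF = 5,
                            w = example_weights)
```

Calculating weights separately for each league simply means passing a different `w` depending on the league a batting line belongs to.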

## Gathering wOBA Modifiers by League

## Plot League wOBA Values by Year

The plot shows the modern parity we expected. It also shows the effects of the “dead ball era” prior to 1920. What is interesting, however, is the increase in `league wOBA` between 1920 and 1930 in the American League. Note that the stat “league wOBA” here is the average OBP, or on-base percentage, for each league.
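As a rough illustration of what a per-league OBP looks like in code, the sketch below pools counting stats by year and league and applies the standard OBP formula, (H + BB + HBP) / (AB + BB + HBP + SF). The column names follow the Baseball Databank `Batting` table, but every number in the toy data frame is invented, and pooling totals is only one reasonable reading of "average OBP".

```r
# Toy batting rows; all values are made up for illustration.
batting <- data.frame(
  yearID = c(1925, 1925, 1925, 1925),
  lgID   = c("AL", "AL", "NL", "NL"),
  AB     = c(500, 400, 450, 550),
  H      = c(160, 100, 120, 150),
  BB     = c(80, 30, 40, 50),
  HBP    = c(5, 2, 3, 4),
  SF     = c(0, 0, 0, 0)
)

# Sum the counting stats within each year/league combination
league_totals <- aggregate(cbind(AB, H, BB, HBP, SF) ~ yearID + lgID,
                           data = batting, FUN = sum)

# Apply the OBP formula to the pooled totals
league_totals$lg_OBP <- with(league_totals,
                             (H + BB + HBP) / (AB + BB + HBP + SF))
```

With real data, older seasons may have missing values in columns such as `SF`, which would need handling (for example, being treated as zero) before summing.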

## Using OBP to Find Outliers

Clearly, some American League players between 1920 and 1930 were blowing the curve, performing well above average. We can drill down deeper to find out exactly who those players were. Note that since `league wOBA` represents a league-average `OBP`, we will use OBP instead of wOBA to find our outliers.
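One way to sketch that drill-down in base R is to compute each player's OBP, join it to the league figure for the same year, and sort by the difference. The player IDs and stat lines below are hypothetical placeholders rather than real records, and this is not necessarily how the original analysis was coded.

```r
# Hypothetical player-seasons; IDs and numbers are invented.
players <- data.frame(
  playerID = c("player_a", "player_b", "player_c", "player_d"),
  yearID   = 1925,
  lgID     = c("AL", "AL", "NL", "NL"),
  AB       = c(500, 400, 450, 550),
  H        = c(160, 100, 120, 150),
  BB       = c(80, 30, 40, 50),
  HBP      = c(5, 2, 3, 4),
  SF       = c(0, 0, 0, 0)
)

# Individual OBP for every player-season
players$OBP <- with(players, (H + BB + HBP) / (AB + BB + HBP + SF))

# League OBP from pooled totals for each year/league
lg <- aggregate(cbind(AB, H, BB, HBP, SF) ~ yearID + lgID,
                data = players, FUN = sum)
lg$lg_OBP <- with(lg, (H + BB + HBP) / (AB + BB + HBP + SF))

# Attach the league figure to each player and rank by the gap
ranked <- merge(players, lg[, c("yearID", "lgID", "lg_OBP")],
                by = c("yearID", "lgID"))
ranked$OBP_diff <- ranked$OBP - ranked$lg_OBP
ranked <- ranked[order(-ranked$OBP_diff), ]

# The player furthest above their league's OBP
top_outlier <- ranked$playerID[1]
```

Run against the full Baseball Databank, the same ranking restricted to the American League between 1920 and 1930 would surface the players pulling the league figure upward.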

There you have it, George Herman Ruth, blowing the curve once again!
