American vs. National League wOBA Vales.

[This article was first published on Data Science Riot!, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Traditionally, statistics like wOBA (weighted on-base average) have been calculated using league averages. While building the baseballDBR package, I thought it would be interesting to group the American and National when making wOBA calculations. In theory, there should be parity across the two leagues, but that is not always the case.

In order to calculate wOBA values for each league, the baseballDBR package uses a ported version of Tom Tango’s SQL incantation to calculate wOBA using the Baseball Databank. While Tango admits, this calculation is not perfect, it normally has a plus/minus of less than one one-thousandth of a percent compared to Fangraphs’ values.

Gathering wOBA Modifiers by League

install.packages("baseballDBR")
library(baseballDBR)
# Load data from Baseball Databank
get_bbdb(table = c("Batting", "Pitching", "Fielding"))

# Run wOBA values for seperate leagues.
# Subset for 1900, when the AL was created. Exclude the Federal League.
w_vals <- wOBA_values(BattingTable = Batting, FieldingTable = Fielding, PitchingTable = Pitching, Sep.Leagues = TRUE) %>%
    subset(yearID >= 1900 & lgID != "FL", select = c("yearID", "lgID", "lg_woba"))
head(w_vals)
## # A tibble: 6 x 3
##   yearID  lgID   lg_woba
##    <int> <chr>     <dbl>
## 1   1900    NL 0.3383157
## 2   1901    AL 0.3281682
## 3   1901    NL 0.3203771
## 4   1902    AL 0.3279358
## 5   1902    NL 0.3085920
## 6   1903    AL 0.3002770

Plot Leauge wOBA Vales by Year

library(ggplot2)
ggplot(w_vals, aes(x=yearID, y=lg_woba, color=lgID)) + geom_point(shape=1) +
    scale_colour_hue(l=50) + geom_smooth() 

plot of chunk unnamed-chunk-4

The plot shows the modern parity we expected. It also shows the effects of the “dead ball era” prior to 1920. However, what is interesting is the increase in. league wOBA between 1920 and 1930 in the American League. It should be mentioned, the stat “league wOBA” is the average on-base percentage (OBP) for each league.

Using OBP to Find Outliers

There were obviously players in the American League between 1920 and 1930 that were blowing the curve, performing above average. We can drill down deeper to find out exactly who those players were. Note, since the league wOBA represents a league average OBP, we will use OBP instead of wOBA to find our outliers.

# Set NA to zero because sacrifice fly was not a tracked stat prior to 1954, but were still counted as at-bats and in the OBP calculation.
Batting$SF[is.na(Batting$SF)] <- 0

# Add OBP to data frame.
Batting$OBP <- OBP(Batting)

# Find the outliers. Subset for more than 200 at-bats to exclude part-time players.
al_outliers <- subset(Batting, OBP >= .5 & lgID == "AL" & yearID >= 1920 & yearID <= 1930 & AB >= 200)
al_outliers
##       playerID yearID stint teamID lgID   G  AB   R   H X2B X3B HR RBI SB CS  BB SO IBB HBP SH SF GIDP   OBP
## 18520 ruthba01   1920     1    NYA   AL 142 457 158 172  36   9 54 137 14 14 150 80  NA   3  5  0   NA 0.533
## 19040 ruthba01   1921     1    NYA   AL 152 540 177 204  44  16 59 171 17 13 145 81  NA   4  4  0   NA 0.512
## 20097 ruthba01   1923     1    NYA   AL 152 522 151 205  45  13 41 131 17 21 170 93  NA   4  3  0   NA 0.545
## 20637 ruthba01   1924     1    NYA   AL 153 529 143 200  39   7 46 121  9 13 142 81  NA   4  6  0   NA 0.513
## 21719 ruthba01   1926     1    NYA   AL 152 495 139 184  30   5 47 150 11  9 144 76  NA   3 10  0   NA 0.516

There you have it, George Herman Ruth, blowing the curve once again!

To leave a comment for the author, please follow the link and comment on their blog: Data Science Riot!.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)