Hacking The New Lahman Package 4.0-1 with RStudio

September 30, 2015

(This article was first published on R Tricks – Data Science Riot!, and kindly contributed to R-bloggers)

The developers of the Lahman package for R have recently updated it to include the 2014 MLB stats! For those not familiar, this package brings Sean Lahman's Baseball Database into R as a set of quick and handy data frames.

I've written about the Lahman package before, and even suggested adding a few advanced statistics to the battingStats() function, as well as adding a new pitchingStats() function. If you're still reading, I'm going to assume you already have an instance of the Lahman database running on a relational database. So why should you care?

  • Speed: The R package loads the tables into R data frames quickly, without the need for a database connection.
  • Ease of Use: Remember, a good programmer is a lazy programmer!
  • Reproducibility: You can easily modify the package's functions, add your own, and have them ready to go in your R environment.

Install from CRAN

install.packages("Lahman")
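After installing, a quick sanity check confirms the updated tables are in place. This is a minimal sketch: `latest_season()` is a hypothetical helper name, not part of the Lahman package. It reports the installed version and the most recent `yearID` in the `Batting` table, which should be 2014 for version 4.0-1 (later releases will report later seasons). It returns `NA` if the package is not installed.

```r
# Hypothetical helper (not part of Lahman): report the installed Lahman
# version and the latest season in the Batting table.
latest_season <- function() {
  if (!requireNamespace("Lahman", quietly = TRUE)) {
    return(NA_integer_)  # package not installed yet
  }
  message("Lahman version: ", as.character(utils::packageVersion("Lahman")))
  # Batting is an ordinary data frame, so no database connection is needed
  max(Lahman::Batting$yearID)
}

latest_season()
```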

Pump up the battingStats function
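If you're brave, you can edit some of the package's functions once you've got it installed and loaded with library(Lahman). Note that edit() works on a copy: assigning the result back to battingStats creates a modified version in your workspace that masks the package's own function, while the installed package itself is left unchanged.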

battingStats <- edit(battingStats)

The new version offers more batting stats than last year's, including OBP, SLG (SlugPct) and OPS. In the script below, I added ISO (isolated power) to the mix!

function (data = Lahman::Batting, idvars = c("playerID", "yearID",
    "stint", "teamID", "lgID"), cbind = TRUE)
{
    # Replace NAs in a count column with zeros
    NA2zero <- function(x) {
        x[is.na(x)] <- 0
        x
    }
    # Dummy bindings; in the package source these quiet R CMD check notes
    AB <- R <- H <- X2B <- X3B <- HR <- RBI <- SH <- BB <- HBP <- SF <- TB <- PA <- OBP <- SlugPct <- SO <- ISO <- NA
    vars <- c("AB", "R", "H", "X2B", "X3B", "HR", "RBI", "SB",
        "CS", "BB", "SO", "IBB", "HBP", "SH", "SF", "GIDP")
    d2 <- apply(data[, vars], 2, NA2zero)
    # A single row comes back from apply() as a vector rather than a matrix
    d2 <- if (is.vector(d2)) {
        as.data.frame(as.list(d2))
    }
    else {
        as.data.frame(d2)
    }
    # plyr::mutate() evaluates in order, so later columns can reuse PA and TB
    d2 <- plyr::mutate(d2,
        BA = ifelse(AB > 0, round(H/AB, 3), NA),
        PA = AB + BB + HBP + SH + SF,
        TB = H + X2B + 2 * X3B + 3 * HR,
        SlugPct = ifelse(AB > 0, round(TB/AB, 3), NA),
        OBP = ifelse(PA > 0, round((H + BB + HBP)/(PA - SH), 3), NA),
        OPS = round(OBP + SlugPct, 3),
        BABIP = ifelse(AB > 0, round((H - HR)/(AB - SO - HR + SF), 3), NA),
        # Isolated power: extra bases per at-bat (SLG minus BA)
        ISO = ifelse(AB > 0, round((X2B + 2 * X3B + 3 * HR)/AB, 3), NA)
    )
    # Keep only the newly computed columns
    d2 <- d2[, (length(vars) + 1):ncol(d2)]
    if (cbind)
        data.frame(data, d2)
    else data.frame(data[, idvars], d2)
}
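If you would rather not edit the package function at all, ISO can also be added to the output of battingStats() afterwards. Isolated power is extra bases per at-bat, ISO = (X2B + 2 × X3B + 3 × HR) / AB, which works out to SLG minus BA. The sketch below uses a hypothetical helper, `add_iso()`, and made-up season lines; it only assumes the Lahman column names `AB`, `X2B`, `X3B` and `HR`.

```r
# Hypothetical helper (not part of Lahman): add isolated power (ISO) to any
# data frame with the Lahman batting columns AB, X2B, X3B and HR.
add_iso <- function(df) {
  # Treat missing counts as zero, as battingStats() does
  zero_na <- function(x) ifelse(is.na(x), 0, x)
  extra_bases <- zero_na(df$X2B) + 2 * zero_na(df$X3B) + 3 * zero_na(df$HR)
  ab <- zero_na(df$AB)
  # Leave ISO undefined (NA) for players with no at-bats
  df$ISO <- ifelse(ab > 0, round(extra_bases / ab, 3), NA)
  df
}

# Made-up season lines, for illustration only
toy <- data.frame(playerID = c("a", "b", "c"),
                  AB  = c(500, 400, 0),
                  H   = c(150, 100, 0),
                  X2B = c(30, 20, 0),
                  X3B = c(5, 2, 0),
                  HR  = c(25, 10, 0))
add_iso(toy)$ISO
```

For player "a", the extra bases are 30 + 2 × 5 + 3 × 25 = 115, so ISO = 115 / 500 = 0.230, the same as SLG (265 / 500 = 0.530) minus BA (150 / 500 = 0.300). With the package loaded, the same helper can be applied as add_iso(battingStats()).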

Add pitchingStats function
For some reason, the package still doesn't include advanced pitching stats. Luckily, R users can go right ahead and define their own functions, like the one below.

pitchingStats <- function(data = Lahman::Pitching,
                          idvars = c("playerID", "yearID", "stint", "teamID", "lgID"),
                          cbind = TRUE) {
  # plyr supplies mutate(), which lets later columns reuse IP
  if (!requireNamespace("plyr", quietly = TRUE)) stop("Please install the plyr package")
  NA2zero <- function(x) {
    # Takes a column vector and replaces NAs with zeros
    x[is.na(x)] <- 0
    x
  }
  # Dummy bindings for column names used inside mutate()
  W <- L <- G <- GS <- CG <- SHO <- SV <- IPouts <- H <- ER <- HR <- BB <- SO <- BAOpp <- IBB <- WP <- HBP <- BK <- BFP <- GF <- R <- SH <- SF <- GIDP <- IP <- WHIP <- BABIP <- K_9 <- BB_9 <- Kpct <- BBpct <- NA
  # Set needed variables for calculations
  vars <- c('IPouts', 'BB', 'H', 'HR', 'BFP', 'SO')
  d3 <- apply(data[, vars], 2, NA2zero)
  d3 <- if (is.vector(d3)) as.data.frame(as.list(d3)) else as.data.frame(d3)
  d3 <- plyr::mutate(d3,
    IP    = IPouts / 3,
    WHIP  = ifelse(IP > 0, round((BB + H) / IP, 3), NA),
    # Guard each ratio on its own denominator to avoid Inf/NaN
    BABIP = ifelse(BFP - SO - BB - HR > 0, round((H - HR) / (BFP - SO - BB - HR), 3), NA),
    K_9   = ifelse(IP > 0, round((SO * 9) / IP, 3), NA),
    BB_9  = ifelse(IP > 0, round((BB * 9) / IP, 3), NA),
    Kpct  = ifelse(BFP > 0, round(SO / BFP, 3), NA),
    BBpct = ifelse(BFP > 0, round(BB / BFP, 3), NA)
  )
  # Keep only the newly computed columns
  d3 <- d3[, (length(vars) + 1):ncol(d3)]
  if (cbind) data.frame(data, d3) else data.frame(data[, idvars], d3)
}
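To see what those formulas compute, here is a plyr-free sketch of the same rate calculations applied to a single made-up pitcher line. `pitching_rates()` is a hypothetical name; the formulas follow pitchingStats(), including its simplified BABIP denominator (BFP - SO - BB - HR), but each ratio here is guarded on its own denominator.

```r
# Hypothetical helper (not part of Lahman): the pitchingStats() rate
# formulas in base R. Inputs are Lahman Pitching counts; NAs become zero.
pitching_rates <- function(IPouts, H, BB, HR, SO, BFP) {
  z <- function(x) ifelse(is.na(x), 0, x)
  IPouts <- z(IPouts); H <- z(H); BB <- z(BB)
  HR <- z(HR); SO <- z(SO); BFP <- z(BFP)
  IP  <- IPouts / 3          # innings pitched from outs recorded
  bip <- BFP - SO - BB - HR  # simplified balls-in-play denominator
  data.frame(
    IP    = IP,
    WHIP  = ifelse(IP > 0, round((BB + H) / IP, 3), NA),
    BABIP = ifelse(bip > 0, round((H - HR) / bip, 3), NA),
    K_9   = ifelse(IP > 0, round(SO * 9 / IP, 3), NA),
    BB_9  = ifelse(IP > 0, round(BB * 9 / IP, 3), NA),
    Kpct  = ifelse(BFP > 0, round(SO / BFP, 3), NA),
    BBpct = ifelse(BFP > 0, round(BB / BFP, 3), NA)
  )
}

# One made-up season: 200 innings (600 outs), 820 batters faced
pitching_rates(IPouts = 600, H = 180, BB = 50, HR = 20, SO = 200, BFP = 820)
```

With these numbers, WHIP = (50 + 180) / 200 = 1.15, K/9 = 200 × 9 / 200 = 9 and BABIP = (180 - 20) / (820 - 200 - 50 - 20) = 160 / 550, or about 0.291.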

That's all for today, kids! Happy hacking, and a job well done by the team behind the Lahman package!
