Left-handed catchers

[This article was first published on Bayes Ball, and kindly contributed to R-bloggers.]


Benny Distefano – 1985 Donruss #166
(source: baseball-almanac.com)
We are approaching the twenty-fifth anniversary of the last time a left-handed throwing catcher appeared behind the plate in a Major League Baseball game; on August 18, 1989, Benny Distefano made his third and final appearance as a catcher for the Pirates. Distefano’s accomplishment was celebrated five years ago, in Alan Schwarz’s “Left-Handed and Left Out” (New York Times, 2009-08-15).

Jack Moore, writing on the site Sports on Earth in 2013 (“Why no left-handed catchers?”), points out that the lack of left-handed catchers goes back a long way. One interesting piece of evidence is a 1948 Ripley’s “Believe It Or Not” item featuring a left-handed catcher, Dick Bernard (you can read more about Bernard’s signing in the July 1, 1948 edition of the Tuscaloosa News). Bernard didn’t make the majors, and doesn’t appear in any of the minor league records that are available online either.


Dick Bernard in Ripley’s “Believe It or Not”, 1948-12-30
(source: sportsonearth.com)


There are a variety of hypotheses as to why there are no left-handed catchers, all of which are summarized in John Walsh’s “Top 10 Left-Handed Catchers for 2006” (a tongue-in-cheek title if ever there was one) at The Hardball Times. A compelling explanation, and one supported by both Bill James and J.C. Bradbury (in his book The Baseball Economist), is natural selection: a left-handed Little League player who can throw well will be groomed as a pitcher.

Throwing hand by fielding position as an example of a categorical variable



I was looking for some examples of categorical variables to display visually, and the lack of left-handed throwing catchers, compared to other positions, came to mind. The following uses R, and the Lahman database package.

The analysis requires merging the Master and Fielding tables in the Lahman database – the Master table gives the player’s name and his throwing hand, and Fielding tells us how many games he played at each position. For the purpose of this analysis, we’ll look at the seasons 1954 (the first year in the Lahman database that has the outfield positions split into left, centre, and right) through 2012.

You may note that for the merging of the two tables, I used the new dplyr package. I used system.time to compare base R’s “merge” with dplyr’s “inner_join” for combining the two tables. The latter is substantially faster: my aging computer ran “merge” in about 5.5 seconds, compared to 0.17 seconds with dplyr.
# load the required packages
require(Lahman)
require(dplyr)
#
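For readers who want to try the timing comparison themselves, here is a minimal sketch. It uses made-up tables (`players` and `fielding` are hypothetical stand-ins of my own, not the Lahman tables), so the timings will differ from the ones quoted above and will depend on your machine.

```r
library(dplyr)

# hypothetical stand-ins for the Master and Fielding tables
set.seed(1)
n_players <- 5000
players <- data.frame(
  playerID = sprintf("p%05d", seq_len(n_players)),
  throws   = sample(c("L", "R"), n_players, replace = TRUE),
  stringsAsFactors = FALSE
)
fielding <- data.frame(
  playerID = sample(players$playerID, 50000, replace = TRUE),
  G        = sample(1:162, 50000, replace = TRUE),
  stringsAsFactors = FALSE
)

# time the base R merge and the dplyr join on the same inputs
time_merge <- system.time(m1 <- merge(fielding, players, by = "playerID"))
time_join  <- system.time(m2 <- inner_join(fielding, players, by = "playerID"))

time_merge["elapsed"]
time_join["elapsed"]

# both approaches should return the same number of rows
nrow(m1) == nrow(m2)
```

On tables this small the gap may be modest; the difference tends to show up more clearly as the tables grow.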

The first step is to create a new data table that merges the Fielding and Master tables, based on the common variable “playerID”. This new table has one row for each player, by position and season; we use the dim function to show the dimensions of the table.

Then, select only those seasons since 1954 and omit the records that are Designated Hitter (DH) and the summary of outfield positions (OF) (i.e. leave the RF, CF, and LF).
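# join the fielding records to the player details on playerID
# (recent releases of the Lahman package name the Master table "People")
MasterFielding <- inner_join(Fielding, Master, by="playerID")
dim(MasterFielding)
## [1] 164903     52
#
# keep the seasons from 1954 on; drop the DH and the combined OF records
# (yearID is numeric, so it is compared with a number, not a string)
MasterFielding <- filter(MasterFielding, POS != "OF" & POS != "DH" & yearID > 1953)
dim(MasterFielding)
## [1] 91214    52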
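As a quick check of the filtering logic, here is a small sketch on a toy table (the rows are invented for illustration). It writes the position test with %in%, and compares yearID as a number; when a number is compared with a string, R falls back on character comparison, which can give surprising answers.

```r
library(dplyr)

# invented rows, for illustration only
toy <- data.frame(
  playerID = c("a", "a", "b", "c", "d", "e"),
  yearID   = c(1950L, 1960L, 1970L, 1953L, 2000L, 1954L),
  POS      = c("C", "OF", "DH", "1B", "SS", "P"),
  stringsAsFactors = FALSE
)

# drop the OF and DH records, keep the seasons after 1953
kept <- filter(toy, !(POS %in% c("OF", "DH")), yearID > 1953)
kept

# why the numeric comparison matters: with a string on one side,
# R compares the two values as text
999 > 1953     # FALSE
999 > "1953"   # TRUE, because "999" sorts after "1953" as text
```

With four-digit years the string comparison happens to work, but the numeric form does not rely on that coincidence.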

This table needs to be summarized one step further – a single row for each player, counting how many games played at each position.
# total games for each player at each position, most games first
Player_games <- MasterFielding %>%
  group_by(playerID, nameFirst, nameLast, POS, throws) %>%
  summarise(gamecount = sum(G)) %>%
  arrange(desc(gamecount))
dim(Player_games)
## [1] 19501     6
head(Player_games)
## Source: local data frame [6 x 6]
## Groups: playerID, nameFirst, nameLast, POS
## 
##    playerID nameFirst nameLast POS throws gamecount
## 1 robinbr01    Brooks Robinson  3B      R      2870
## 2 bondsba01     Barry    Bonds  LF      L      2715
## 3 vizquom01      Omar  Vizquel  SS      R      2709
## 4  mayswi01    Willie     Mays  CF      R      2677
## 5 aparilu01      Luis Aparicio  SS      R      2583
## 6 jeterde01     Derek    Jeter  SS      R      2531

This table shows the career records for the most games played at the positions (for 1954-2012). We see that Brooks Robinson leads the way with 2,870 games played at third base, and the fact that Derek Jeter, at the end of the 2012 season, was closing in on Omar Vizquel’s career record for games played as a shortstop.
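To see what the group_by and summarise step is doing, here is a minimal sketch on an invented table of season records (`seasons` and `career` are names of my own). The `.groups = "drop"` argument assumes dplyr 1.0.0 or later; it returns an ungrouped table, so arrange sorts across all rows.

```r
library(dplyr)

# invented season records: one row per player, position and season
seasons <- data.frame(
  playerID = c("x", "x", "x", "y", "y"),
  POS      = c("SS", "SS", "2B", "C", "C"),
  G        = c(100L, 120L, 10L, 50L, 60L),
  stringsAsFactors = FALSE
)

# collapse the seasons to one row per player and position
career <- seasons %>%
  group_by(playerID, POS) %>%
  summarise(gamecount = sum(G), .groups = "drop") %>%
  arrange(desc(gamecount))
career
```

Player "x" ends up with two rows, one for each position played, which is the same shape as the Player_games table.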


Cross-tab Tables


The next step is to prepare a simple cross-tab table (also known as contingency or pivot tables) showing the number of players cross-tabulated by position (POS) and throwing hand (throws).

Here, I’ll demonstrate two ways to do this: first with dplyr’s “group_by” and “summarise” (with a bit of help from reshape2), and then with the “table” function in base R. After that, the “CrossTable” function in gmodels adds percentages and a test statistic.
# first method - dplyr
Player_POS <- Player_games %>%
  group_by(POS, throws) %>%
  summarise(playercount = n())
Player_POS
## Source: local data frame [17 x 3]
## Groups: POS
## 
##    POS throws playercount
## 1   1B      L         411
## 2   1B      R        1515
## 3   2B      L           4
## 4   2B      R        1560
## 5   3B      L           4
## 6   3B      R        1889
## 7    C      L           4
## 8    C      R         980
## 9   CF      L         393
## 10  CF      R        1252
## 11  LF      L         544
## 12  LF      R        2161
## 13   P      L        1452
## 14   P      R        3623
## 15  RF      L         520
## 16  RF      R        1893
## 17  SS      R        1296

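To transform this long-form table into a traditional cross-tab shape we can use the “dcast” function in reshape2. The left-handed shortstop cell shows NA because that combination has no row in the long table; adding fill = 0 to the call would display a zero instead.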
require(reshape2)
## Loading required package: reshape2
dcast(Player_POS, POS ~ throws, value.var = "playercount")
##   POS    L    R
## 1  1B  411 1515
## 2  2B    4 1560
## 3  3B    4 1889
## 4   C    4  980
## 5  CF  393 1252
## 6  LF  544 2161
## 7   P 1452 3623
## 8  RF  520 1893
## 9  SS   NA 1296
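If you would rather stay in base R, xtabs can do the same reshaping. This sketch re-enters the counts from the long table above by hand so that it runs on its own (the object name `wide` is mine); xtabs fills the missing left-handed shortstop cell with zero rather than NA.

```r
# the long-form counts, copied from the Player_POS output above
Player_POS <- data.frame(
  POS = c("1B", "1B", "2B", "2B", "3B", "3B", "C", "C", "CF", "CF",
          "LF", "LF", "P", "P", "RF", "RF", "SS"),
  throws = c("L", "R", "L", "R", "L", "R", "L", "R", "L", "R",
             "L", "R", "L", "R", "L", "R", "R"),
  playercount = c(411, 1515, 4, 1560, 4, 1889, 4, 980, 393, 1252,
                  544, 2161, 1452, 3623, 520, 1893, 1296),
  stringsAsFactors = FALSE
)

# sum playercount within each POS-by-throws cell; empty cells become 0
wide <- xtabs(playercount ~ POS + throws, data = Player_POS)
wide
```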

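A second method to get the same result is to use the “table” function, which is part of base R. The gmodels package is loaded here because it provides the CrossTable function used further down.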
require(gmodels)
## Loading required package: gmodels
throwPOS <- with(Player_games, table(POS, throws))
throwPOS
##     throws
## POS     L    R
##   1B  411 1515
##   2B    4 1560
##   3B    4 1889
##   C     4  980
##   CF  393 1252
##   LF  544 2161
##   P  1452 3623
##   RF  520 1893
##   SS    0 1296

A more elaborate table can be created using the gmodels package. In this case, we’ll use the CrossTable function to generate a table with row percentages. You’ll note that the format is set to SPSS, so the table output resembles that software’s display style.
CrossTable(Player_games$POS, Player_games$throws, 
           digits=2, format="SPSS",
           prop.r=TRUE, prop.c=FALSE, prop.t=FALSE, prop.chisq=FALSE,  # keeping the row proportions
           chisq=TRUE)                                                 # adding the ChiSquare statistic
## 
##    Cell Contents
## |-------------------------|
## |                   Count |
## |             Row Percent |
## |-------------------------|
## 
## Total Observations in Table:  19501 
## 
##                  | Player_games$throws 
## Player_games$POS |        L  |        R  | Row Total | 
## -----------------|-----------|-----------|-----------|
##               1B |      411  |     1515  |     1926  | 
##                  |    21.34% |    78.66% |     9.88% | 
## -----------------|-----------|-----------|-----------|
##               2B |        4  |     1560  |     1564  | 
##                  |     0.26% |    99.74% |     8.02% | 
## -----------------|-----------|-----------|-----------|
##               3B |        4  |     1889  |     1893  | 
##                  |     0.21% |    99.79% |     9.71% | 
## -----------------|-----------|-----------|-----------|
##                C |        4  |      980  |      984  | 
##                  |     0.41% |    99.59% |     5.05% | 
## -----------------|-----------|-----------|-----------|
##               CF |      393  |     1252  |     1645  | 
##                  |    23.89% |    76.11% |     8.44% | 
## -----------------|-----------|-----------|-----------|
##               LF |      544  |     2161  |     2705  | 
##                  |    20.11% |    79.89% |    13.87% | 
## -----------------|-----------|-----------|-----------|
##                P |     1452  |     3623  |     5075  | 
##                  |    28.61% |    71.39% |    26.02% | 
## -----------------|-----------|-----------|-----------|
##               RF |      520  |     1893  |     2413  | 
##                  |    21.55% |    78.45% |    12.37% | 
## -----------------|-----------|-----------|-----------|
##               SS |        0  |     1296  |     1296  | 
##                  |     0.00% |   100.00% |     6.65% | 
## -----------------|-----------|-----------|-----------|
##     Column Total |     3332  |    16169  |    19501  | 
## -----------------|-----------|-----------|-----------|
## 
##  
## Statistics for All Table Factors
## 
## 
## Pearson's Chi-squared test 
## ------------------------------------------------------------
## Chi^2 =  1759     d.f. =  8     p =  0 
## 
## 
##  
##        Minimum expected frequency: 168.1
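The row percentages and the chi-square statistic can also be cross-checked with base R. The sketch below re-enters the counts from the table by hand (so it runs without the Lahman data) and uses prop.table and chisq.test; the object names `counts`, `row_pct` and `chi` are my own.

```r
# counts copied from the cross-tab above: rows are positions, columns are L and R
counts <- matrix(
  c( 411, 1515,
       4, 1560,
       4, 1889,
       4,  980,
     393, 1252,
     544, 2161,
    1452, 3623,
     520, 1893,
       0, 1296),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    POS    = c("1B", "2B", "3B", "C", "CF", "LF", "P", "RF", "SS"),
    throws = c("L", "R")
  )
)

# row percentages, comparable to the "Row Percent" lines in the CrossTable output
row_pct <- round(100 * prop.table(counts, margin = 1), 2)
row_pct

# Pearson chi-square test of independence between position and throwing hand
chi <- chisq.test(counts)
chi$statistic       # the test statistic
chi$parameter       # degrees of freedom
min(chi$expected)   # smallest expected cell count
chi$p.value
```

With a statistic this large on 8 degrees of freedom the p-value is vanishingly small, which is why it is displayed as 0 in the output above.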

Mosaic Plot


A mosaic plot is an effective way to graphically represent the contents of the summary tables. Each bar spans the full width of the plot (left to right), so the split within a bar compares the proportions of left- and right-handed throwers, while the height of the bar (top to bottom) varies with the number of players at that position. The mosaic function is in the vcd package.
require(vcd)
## Loading required package: vcd
## Loading required package: grid
mosaic(throwPOS, highlighting = "throws", highlighting_fill=c("darkgrey", "white"))
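If the vcd package is not available, base R's mosaicplot gives a similar picture. This is a sketch with the counts re-entered by hand so that it runs on its own. Note that mosaicplot begins with a vertical split by default, so here the positions run left to right with widths reflecting the number of players, and throwing hand is stacked within each column.

```r
# counts copied from the cross-tab tables above
throwPOS <- as.table(matrix(
  c( 411, 1515,
       4, 1560,
       4, 1889,
       4,  980,
     393, 1252,
     544, 2161,
    1452, 3623,
     520, 1893,
       0, 1296),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    POS    = c("1B", "2B", "3B", "C", "CF", "LF", "P", "RF", "SS"),
    throws = c("L", "R")
  )
))

# base graphics mosaic plot; the two colours distinguish L and R
mosaicplot(throwPOS,
           color = c("darkgrey", "white"),
           main = "Throwing hand by fielding position")
```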


Conclusion

The clear result is that it’s not just catchers who are overwhelmingly right-handed throwers; so are infielders (other than first basemen). There have been very few southpaws playing second and third base – and there have been absolutely no left-handed throwing shortstops in this period.

As J.G. Preston puts it in the blog post “Left-handed throwing second basemen, shortstops and third basemen”:
“While right-handed throwers can be found at any of the nine positions on a baseball field, left-handers are, in practice, restricted to five of them.”

So who are these left-handed oddities? Using the filter function, it’s easy to find out:
# catchers
filter(Player_games, POS == "C", throws == "L")
## Source: local data frame [4 x 6]
## Groups: playerID, nameFirst, nameLast, POS
## 
##    playerID nameFirst  nameLast POS throws gamecount
## 1 distebe01     Benny Distefano   C      L         3
## 2  longda02      Dale      Long   C      L         2
## 3 squirmi01      Mike   Squires   C      L         2
## 4 shortch02     Chris     Short   C      L         1
# second base
filter(Player_games, POS == "2B", throws == "L")
## Source: local data frame [4 x 6]
## Groups: playerID, nameFirst, nameLast, POS
## 
##    playerID nameFirst  nameLast POS throws gamecount
## 1 marqugo01   Gonzalo   Marquez  2B      L         2
## 2 crowege01    George     Crowe  2B      L         1
## 3 mattido01       Don Mattingly  2B      L         1
## 4 mcdowsa01       Sam  McDowell  2B      L         1
# third base
filter(Player_games, POS == "3B", throws == "L")
## Source: local data frame [4 x 6]
## Groups: playerID, nameFirst, nameLast, POS
## 
##    playerID nameFirst  nameLast POS throws gamecount
## 1 squirmi01      Mike   Squires  3B      L        14
## 2 mattido01       Don Mattingly  3B      L         3
## 3 francte01     Terry  Francona  3B      L         1
## 4 valdema02     Mario    Valdez  3B      L         1
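The three lookups can also be rolled into a single call with %in%. Here is a sketch on a handful of rows copied from the outputs above (the object names `sample_rows` and `oddities` are mine, and the table is only a small excerpt, not the full Player_games table).

```r
library(dplyr)

# a few rows copied from the outputs above, for illustration
sample_rows <- data.frame(
  playerID  = c("distebe01", "mattido01", "squirmi01",
                "robinbr01", "vizquom01", "bondsba01"),
  nameFirst = c("Benny", "Don", "Mike", "Brooks", "Omar", "Barry"),
  nameLast  = c("Distefano", "Mattingly", "Squires",
                "Robinson", "Vizquel", "Bonds"),
  POS       = c("C", "2B", "3B", "3B", "SS", "LF"),
  throws    = c("L", "L", "L", "R", "R", "L"),
  gamecount = c(3L, 1L, 14L, 2870L, 2709L, 2715L),
  stringsAsFactors = FALSE
)

# left-handed throwers at catcher, second base or third base
oddities <- sample_rows %>%
  filter(POS %in% c("C", "2B", "3B"), throws == "L") %>%
  arrange(POS, desc(gamecount))
oddities
```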

My github file for this entry in Markdown is here: [https://github.com/MonkmanMH/Bayesball/blob/master/LeftHandedCatchers.md]

-30-
