Pivoting on Text in R (vs. Excel)

[This article was first published on George J. Mount, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

This post follows a previous tutorial on pivoting on text in Excel. In this post I will reproduce the exercise in R. This way you begin to see the similarities and differences of the program and begin to diversify your data skill base.

 

Related: 5 Things Excel Users Should Know About R — Free Mini-Course

Similar to the previous lesson, we will be using the 2017 .csv version of the Lahman baseball dataset. We will again be using the People.csv and AwardsPlayers.csv files.

1. Set up our environment

In addition to reading in the two .csv files, we need to call two libraries for data manipulation: plyr and dplyr.

library(plyr)
library(dplyr)
players <- read.csv("C:/RFiles/people.csv")
awards <- read.csv("C:/RFiles/awardsplayers.csv")

2. Filter and merge tables

Using the filter and select functions from dplyr, we create a table containing all awards information since 2010 by player ID, along with that player’s first and last name.

I include the nrow function to verify that indeed the resulting table in R has the same number of rows as the Excel table: 502.

awards <- filter(awards, yearID >= 2010)
players <- select(players, playerID, nameFirst, nameLast)
tbl <- merge(awards, players, all.x = TRUE)
nrow(tbl)

3. Derive “First Name Last Name” column

Using the paste function with a space delimiter, we create a new column containing each player’s first and last name.

tbl$fullname <- paste(tbl$nameFirst, tbl$nameLast, sep = " ")

4. Pivot and view results

Now we use the ddply function from plyr to pivot on text. Similar to creating a measure in the Excel data model (Step 5 of the Excel post), we create a table summarized by awardID and yearID with the player name’s concatenated with a space delimiter (the paste function in R).

The View function then launches a spreadsheet-like viewer of the data frame.

x <- ddply(tbl, .(awardID, yearID), 
      summarize, names = paste(fullname, collapse = ", "))

View(x)

Below I show the results in R and Excel — they are the same. Excel, however, does include a “subtotal” measure where R does not, but the base information is the same.

Complete code below:

#read in files and call libraries
library(plyr)
library(dplyr)
players <- read.csv("C:/RFiles/people.csv")
awards <- read.csv("C:/RFiles/awardsplayers.csv")

#filter and merge data frames
awards <- filter(awards, yearID >= 2010)
players <- select(players, playerID, nameFirst, nameLast)
tbl <- merge(awards, players, all.x = TRUE)
nrow(tbl)

#create full-name column
tbl$fullname <- paste(tbl$nameFirst, tbl$nameLast, sep = " ")

#pivot on full-name text
x <- ddply(tbl, .(awardID, yearID), 
      summarize, names = paste(fullname, collapse = ", "))

#launch spreadsheet-like viewer of data frame
View(x)

Related: 5 Things Excel Users Should Know About R — Free Mini-Course

 

To leave a comment for the author, please follow the link and comment on their blog: George J. Mount.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)