
How to Convert Sweave LaTeX to knitr R Markdown: Winter Olympic Medals Example

[This article was first published on Jeromy Anglim's Blog: Psychology and Statistics, and kindly contributed to R-bloggers.]

The following post shows how to manually convert a Sweave LaTeX document into a knitr R Markdown document. The post (1) reviews many of the required changes; (2) provides an example of a document converted to R Markdown format based on an analysis of Winter Olympic Medal data up to and including 2006; and (3) discusses the pros and cons of LaTeX and Markdown for performing analyses.

Overview

The following analyses of Winter Olympic Medals data have gone through several iterations:

  1. R Script: I originally performed similar analyses in February 2010. It was a simple set of commands where you could see the console output and view the plots.
  2. LaTeX Sweave: In February 2011 I adapted the example to make it a Sweave LaTeX document. The source of this is available on GitHub. With Sweave, I was able to create a document that weaved text, commands, console input, console output, and figures.
  3. R Markdown: Now in June 2012 I'm using the example to review the process of converting a document from Sweave-LaTeX to R Markdown. The source code is available here on GitHub (see the *.rmd file).

Converting from Sweave to R Markdown

The following changes were required in order to convert my LaTeX Sweave document into an R Markdown document suitable for processing with knitr and RStudio. Many of these changes are fairly obvious if you understand LaTeX and Markdown, but a few are less so. Other documents may well require additional changes.

R code chunks
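
As a minimal sketch of the most mechanical part of this step, the helper below rewrites standard Sweave chunk delimiters (`<<options>>=` ... `@`) as R Markdown chunk delimiters and turns `\Sexpr{}` into inline R code. The function name `sweave_chunks_to_rmd` is hypothetical and not part of the original post, and chunk options themselves may still need to be renamed by hand.

```r
# Hypothetical helper (not from the original post): rewrite Sweave chunk
# delimiters as R Markdown chunk delimiters, line by line.
sweave_chunks_to_rmd <- function(lines) {
  fence <- strrep("`", 3)  # three backticks, built here to keep this listing simple

  # Sweave chunk header, e.g. <<label, echo=TRUE>>=
  is_open <- grepl("^<<.*>>=[[:space:]]*$", lines)
  # Sweave chunk end: a line containing only @
  is_close <- grepl("^@[[:space:]]*$", lines)

  # Carry the chunk label and options across unchanged
  opts <- sub("^<<(.*)>>=[[:space:]]*$", "\\1", lines[is_open])
  lines[is_open] <- ifelse(nzchar(opts),
                           paste0(fence, "{r ", opts, "}"),
                           paste0(fence, "{r}"))
  lines[is_close] <- fence

  # Inline R: \Sexpr{expr} becomes `r expr` (simple, non-nested cases only)
  gsub("\\\\Sexpr\\{([^}]*)\\}", "`r \\1`", lines)
}

# Made-up input for illustration
example <- c("<<import, echo=TRUE>>=", "x <- 1", "@", "Total: \\Sexpr{x}")
converted <- sweave_chunks_to_rmd(example)
print(converted)
```

This only handles delimiters; headings, tables, figure environments and other LaTeX markup still have to be converted separately.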

Figures and Tables

Basic formatting

LaTeX things

R Markdown Analysis of Winter Olympic Medal Data

The following shows the output of the actual analysis after running the rmd source through Knit HTML in RStudio. If you're curious, you may wish to view the rmd source code on GitHub side by side with the output at this point.

Import Dataset

library(xtable)
options(stringsAsFactors = FALSE)
medals <- read.csv("data/medals.csv")
medals$Year <- as.numeric(medals$Year)
medals <- medals[!is.na(medals$Year), ]

The Olympic Medals data frame includes 2311 medals from 1924 to 2006. The data was sourced from The Guardian Data Blog.

Total Medals by Year

# http://www.math.mcmaster.ca/~bolker/emdbook/chap3A.pdf
x <- aggregate(medals$Year, list(Year = medals$Year), length)
names(x) <- c("year", "medals")
x$pos <- seq(x$year)
fit <- nls(medals ~ a * pos^b + c, x, start = list(a = 10, b = 1, 
    c = 50))

In general over the years the number of Winter Olympic medals awarded has increased. In order to model this relationship, year was converted to ordinal position. A three parameter power function seemed plausible, \( y = ax^b + c \), where \( y \) is total medals awarded and \( x \) is the ordinal position of the Olympics starting at one. The best fitting parameters by least-squares were

\[ y = 0.202 x^{2.297} + 50.987. \]

The figure displays the data and the line of best fit for the model. The model predicts that 2010, 2014, and 2018 would have 271, 295, and 322 medals respectively.
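
The predictions can be reproduced approximately by hand. The sketch below plugs the rounded coefficients reported above into \( y = ax^b + c \); the names `predict_medals` and `future` are illustrative and not from the original analysis. Because the coefficients are rounded, the results can differ from the quoted figures by about one medal.

```r
# Rounded coefficients as reported in the text (assumption: the fitted
# object `fit` holds more decimal places than shown here).
a <- 0.202
b <- 2.297
c0 <- 50.987

# Three parameter power function: y = a * x^b + c
predict_medals <- function(pos) a * pos^b + c0

# The data contain 20 Winter Olympics (1924 to 2006), so 2010, 2014 and
# 2018 correspond to ordinal positions 21, 22 and 23.
future <- c("2010" = 21, "2014" = 22, "2018" = 23)
predicted <- predict_medals(future)
print(round(predicted))
```

With the fitted model object available, `predict(fit, newdata = data.frame(pos = 21:23))` would give the same predictions from the unrounded coefficients.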

# las was originally supplied twice, which R rejects as a duplicated argument
plot(medals ~ pos, x, las = 1,
     ylab = "Total Medals Awarded",
     xlab = "Ordinal Position of Olympics",
     main = "Total medals awarded\nby ordinal position of Olympics with\npredicted three parameter power function fit displayed.",
     bty = "l")
lines(x$pos, predict(fit))

Gender Ratio by Year

medalsByYearByGender <- aggregate(medals$Year, list(Year = medals$Year, 
    Event.gender = medals$Event.gender), length)
medalsByYearByGender <- medalsByYearByGender[medalsByYearByGender$Event.gender != 
    "X", ]
propf <- list()
propf$prop <- medalsByYearByGender[medalsByYearByGender$Event.gender == 
    "W", "x"]/(medalsByYearByGender[medalsByYearByGender$Event.gender == "W", 
    "x"] + medalsByYearByGender[medalsByYearByGender$Event.gender == "M", "x"])
propf$year <- medalsByYearByGender[medalsByYearByGender$Event.gender == 
    "W", "Year"]
propf$propF <- format(round(propf$prop, 2))

propf$table <- with(propf, cbind(year, propF))
colnames(propf$table) <- c("Year", "Prop. Female")

The figure shows the number of medals won by males and females by year. The table shows the proportion of medals awarded to females by year. The figure shows a generally similar pattern for males and females: medals increase gradually until around the late 1980s, after which the rate of increase accelerates. However, females started from a much smaller base. Thus, both the absolute difference and the percentage difference have decreased over time, to the point where in 2006 46% of medals were won by females.
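
The proportion of medals awarded to females can also be computed from a cross-tabulation. The following is a minimal sketch on a small made-up data frame (`toy` is not the real medals data); it assumes the same `Year` and `Event.gender` columns and the same exclusion of mixed events coded "X".

```r
# Made-up data for illustration only
toy <- data.frame(
  Year = c(1924, 1924, 1924, 1924, 2006, 2006, 2006, 2006),
  Event.gender = c("M", "M", "M", "W", "M", "W", "W", "X"),
  stringsAsFactors = FALSE
)

# Drop mixed events, as in the analysis above
toy <- toy[toy$Event.gender != "X", ]

# Rows are years, columns are genders
counts <- table(toy$Year, toy$Event.gender)

# Share of each year's medals that went to women
prop_female <- counts[, "W"] / rowSums(counts)
print(round(prop_female, 2))
```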

plot(x ~ Year, medalsByYearByGender[medalsByYearByGender$Event.gender == 
    "M", ], ylim = c(0, max(x)), pch = "m", col = "blue", las = 1, ylab = "Total Medals Awarded", 
    bty = "l", main = "Total Medals Won by Gender and Year")
points(medalsByYearByGender[medalsByYearByGender$Event.gender == 
    "W", "Year"], medalsByYearByGender[medalsByYearByGender$Event.gender == 
    "W", "x"], col = "red", pch = "f")

print(xtable(propf$table,
             caption="Proportion of Medals that were awarded to Females by Year"), 
      type="html", 
      caption.placement="top",
      html.table.attributes='align="center"')
Proportion of Medals that were awarded to Females by Year
      Year   Prop. Female
 1    1924   0.07
 2    1928   0.08
 3    1932   0.08
 4    1936   0.12
 5    1948   0.18
 6    1952   0.23
 7    1956   0.26
 8    1960   0.38
 9    1964   0.37
10    1968   0.37
11    1972   0.36
12    1976   0.35
13    1980   0.34
14    1984   0.36
15    1988   0.37
16    1992   0.43
17    1994   0.43
18    1998   0.44
19    2002   0.45
20    2006   0.46

Countries with the Most Medals

cmm <- list()
cmm$medals <- sort(table(medals$NOC), decreasing = TRUE)
cmm$country <- names(cmm$medals)
cmm$prop <- cmm$medals/sum(cmm$medals)
cmm$propF <- paste(round(cmm$prop * 100, 2), "%", sep = "")

cmm$row1 <- c("Rank", "Country", "Total", "%")
cmm$rank <- seq(cmm$medals)
cmm$include <- 1:10

cmm$table <- with(cmm, rbind(cbind(rank[include], country[include], 
    medals[include], propF[include])))
colnames(cmm$table) <- cmm$row1

Norway has won the most medals with 280 (12.12%). The table shows the top 10. Russia, USSR, and EUN (Unified Team in 1992 Olympics) have a combined total of 293. Germany, GDR, and FRG have a combined medal total of 309.

print(xtable(cmm$table, caption="Rankings of Medals Won by Country"), 
      "html", include.rownames=FALSE, caption.placement='top',
      html.table.attributes='align="center"')
Rankings of Medals Won by Country
Rank   Country   Total   %
 1     NOR       280     12.12%
 2     USA       216     9.35%
 3     URS       194     8.39%
 4     AUT       185     8.01%
 5     GER       158     6.84%
 6     FIN       151     6.53%
 7     CAN       119     5.15%
 8     SUI       118     5.11%
 9     SWE       118     5.11%
10     GDR       110     4.76%
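
The percentages and combined totals can be checked with a few lines of R. This sketch uses only the counts shown in the top 10 table and the overall total of 2311 medals quoted earlier; the helper `combined_total` is hypothetical, and countries outside the top 10 (such as FRG) are simply skipped because their counts are not shown.

```r
# Counts copied from the top 10 table in this post
top_counts <- c(NOR = 280, USA = 216, URS = 194, AUT = 185, GER = 158,
                FIN = 151, CAN = 119, SUI = 118, SWE = 118, GDR = 110)
total_medals <- 2311  # total medals in the data frame, as stated earlier

# Percentage of all medals won by each country
pct <- round(100 * top_counts / total_medals, 2)

# Hypothetical helper: sum the counts for several country codes,
# ignoring codes that are not present in `counts`.
combined_total <- function(counts, codes) {
  sum(counts[intersect(codes, names(counts))])
}

germany_partial <- combined_total(top_counts, c("GER", "GDR", "FRG"))
print(pct)
print(germany_partial)
```

Here `germany_partial` covers GER and GDR only; the difference from the combined total of 309 quoted in the text would be the FRG medals, which fall outside the top 10.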

Proportion of Gold Medals by Country

Looking only at countries that have won more than 50 medals in the dataset, the figure shows the proportion of medals won that were gold, silver, or bronze.

NOC50Plus <- names(table(medals$NOC)[table(medals$NOC) > 50])
medalsSubset <- medals[medals$NOC %in% NOC50Plus, ]
medalsByMedalByNOC <- prop.table(table(medalsSubset$NOC, medalsSubset$Medal), 
                                 margin = 1)
medalsByMedalByNOC <- medalsByMedalByNOC[order(medalsByMedalByNOC[, "Gold"], 
         decreasing = TRUE), c("Gold", "Silver", "Bronze")]
barplot(round(t(medalsByMedalByNOC), 2), horiz = TRUE, las = 1, 
        col=c("gold", "grey71", "chocolate4"), 
        xlab = "Proportion of Medals",
        main="Proportion of medals won that were gold, silver or bronze.")
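
The key step above is `prop.table()` with `margin = 1`, which converts each row of counts into proportions that sum to one. A minimal sketch on made-up data (the codes "AAA" and "BBB" are placeholders, not real NOC codes):

```r
# Made-up data for illustration only
toy <- data.frame(
  NOC = c("AAA", "AAA", "AAA", "AAA", "BBB", "BBB"),
  Medal = c("Gold", "Gold", "Silver", "Bronze", "Silver", "Bronze"),
  stringsAsFactors = FALSE
)

# Counts of each medal type per country
counts <- table(toy$NOC, toy$Medal)

# margin = 1 divides each row by its row total
props <- prop.table(counts, margin = 1)

# Order countries by share of gold and fix the column order
props <- props[order(props[, "Gold"], decreasing = TRUE),
               c("Gold", "Silver", "Bronze")]
print(props)
```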

How many different countries have won medals by year?

listOfYears <- unique(medals$Year)
names(listOfYears) <- unique(medals$Year)
totalNocByYear <- sapply(listOfYears, function(X) length(table(medals[medals$Year == 
    X, "NOC"])))

The figure shows the total number of countries winning medals by year.

plot(x = as.numeric(names(totalNocByYear)), y = totalNocByYear, ylim = c(0, max(totalNocByYear)), 
    las = 1, xlab = "Year", main = "Total Number of Countries Winning Medals By Year", 
    ylab = "Total Number of Countries", bty = "l")
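
An alternative way to count distinct medal-winning countries per year is `tapply()`. This is a sketch on made-up data, assuming the same `Year` and `NOC` column names as the medals data frame:

```r
# Made-up data for illustration only
toy <- data.frame(
  Year = c(1924, 1924, 1924, 1928, 1928, 1928),
  NOC = c("NOR", "NOR", "FIN", "NOR", "USA", "SWE"),
  stringsAsFactors = FALSE
)

# For each year, count how many different countries appear
n_countries <- tapply(toy$NOC, toy$Year, function(noc) length(unique(noc)))
print(n_countries)
```

The result is named by year, so it can be plotted in the same way as `totalNocByYear`.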

Australia at the Winter Olympics

ausmedals <- list()
ausmedals$data <- medals[medals$NOC == "AUS", ]
ausmedals$data <- ausmedals$data[, c("Year", "City", "Discipline", 
    "Event", "Medal")]
ausmedals$table <- ausmedals$data

Given that I am an Australian I decided to have a look at the Australian medal count. Australia does not get a lot of snow. Up to and including 2006, Australia has won 6 medals. It won its first medal in 1994. Of the 6 medals, 3 were bronze, 0 were silver, and 3 were gold. The table lists each of these medals.

print(xtable(ausmedals$table, 
             caption='List of Australian Medals',
             digits=0),
      type='html', 
      caption.placement='top', 
      include.rownames=FALSE,
      html.table.attributes='align="center"') 
List of Australian Medals
Year   City             Discipline       Event         Medal
1994   Lillehammer      Short Track S.   5000m relay   Bronze
1998   Nagano           Alpine Skiing    slalom        Bronze
2002   Salt Lake City   Short Track S.   1000m         Gold
2002   Salt Lake City   Freestyle Ski.   aerials       Gold
2006   Turin            Freestyle Ski.   aerials       Bronze
2006   Turin            Freestyle Ski.   moguls        Gold
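
The medal counts quoted in the text can be tallied from the rows of this table. The sketch below types in the Medal column by hand; using a factor with all three levels keeps the zero count for silver, which a plain `table()` on the character vector would drop.

```r
# Medal column copied from the table of Australian medals in this post
aus_medal <- c("Bronze", "Bronze", "Gold", "Gold", "Bronze", "Gold")

# Fixing the levels keeps medal types with zero counts
aus_counts <- table(factor(aus_medal, levels = c("Gold", "Silver", "Bronze")))
print(aus_counts)
```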

Ice Hockey

icehockey <- medals[medals$Sport == "Ice Hockey" & medals$Event.gender == 
    "M" & medals$Medal == "Gold", ]
icehockeyf <- medals[medals$Sport == "Ice Hockey" & medals$Event.gender == 
    "W" & medals$Medal == "Gold", ]

# names(table(icehockey$NOC)[table(icehockey$NOC) > 1])

The following are some statistics about Winter Olympics Ice Hockey up to and including the 2006 Winter Olympics.

icehockeygs <- medals[medals$Sport == "Ice Hockey" & 
    medals$Event.gender == "M" &
    medals$Medal %in% c("Silver", "Gold"),  c("Year", "Medal", "NOC")]
icetab <- list()
icetab$data <- reshape(icehockeygs, idvar="Year", timevar="Medal",
    direction="wide")
names(icetab$data) <- c("Year", "Gold", "Silver")

print(xtable(icetab$data, 
             caption ="Country Winning Gold and Silver Medals by Year in Men's Ice Hockey", 
             digits=0), 
      type="html",     
      include.rownames=FALSE,
      caption.placement="top",
      html.table.attributes='align="center"')
Country Winning Gold and Silver Medals by Year in Men's Ice Hockey
Year   Gold   Silver
1924   CAN    USA
1928   CAN    SWE
1932   CAN    USA
1936   GBR    CAN
1948   CAN    TCH
1952   CAN    USA
1956   URS    USA
1960   USA    CAN
1964   URS    SWE
1968   URS    TCH
1972   URS    USA
1976   URS    TCH
1980   USA    URS
1984   URS    TCH
1988   URS    FIN
1992   EUN    CAN
1994   SWE    CAN
1998   CZE    RUS
2002   CAN    USA
2006   SWE    FIN
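
As a small worked example, the number of gold medals per country in men's ice hockey can be tallied from the Gold column of this table. The vector below is typed in from the table; the last line mirrors the commented-out expression in the code above, which lists countries with more than one gold.

```r
# Gold column copied from the table in this post (1924 to 2006)
gold <- c("CAN", "CAN", "CAN", "GBR", "CAN", "CAN", "URS", "USA", "URS", "URS",
          "URS", "URS", "USA", "URS", "URS", "EUN", "SWE", "CZE", "CAN", "SWE")

# Gold medals per country, most successful first
gold_counts <- sort(table(gold), decreasing = TRUE)

# Countries that have won gold more than once
repeat_winners <- names(gold_counts)[gold_counts > 1]
print(gold_counts)
print(repeat_winners)
```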

Reflections on the Conversion Process

Additional Resources

