Analysis of Winter Olympic Medal Data Using R

[This article was first published on Jeromy Anglim's Blog: Psychology and Statistics, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

The Winter Olympics are on. The Guardian’s DataBlog has graciously compiled a database on Winter Olympic Medals. Thus, I thought I’d run a few quick analyses on the data in R. In this post I was hoping to show how one could quickly churn out some basic analyses (and answer some interesting questions) using R.

First, a disclaimer: I ran these analyses in about 45 minutes. Thus, I make no claims of perfect accuracy or in the source data provided by the Guardian. The data also does not include 2010 medals.

Below you will see:

  1. The R Console input and output
  2. The plots
  3. The source code on its own
CONSOLE INPUT AND OUTPUT

> # tips on reading a Google Spreadsheet:<br />> # http://blog.revolution-computing.com/2009/09/how-to-use-a-google-spreadsheet-as-data-in-r.html<br />> # Data taken from:"https://spreadsheets.google.com/ccc?key=0AgdO92JOXxAOdDVlaUpkNlB2WERtV3l1ZVFYbzllQWc"<br />> # http://www.guardian.co.uk/news/datablog/2010/feb/11/winter-olympics-medals-by-country<br />> <br />> googleLink <- "http://spreadsheets.google.com/pub?key=tsddww6vOYePkhPSxRpDeYw&single=true&gid=1&output=csv"<br />> medals <- read.csv(googleLink, stringsAsFactors = FALSE)<br />> savePlot <- TRUE # optional variable used to save or not save plots in code<br />> <br />> # remove rows that do not contain data<br />> medals$Year <- as.numeric(medals$Year)<br />Warning message:<br />NAs introduced by coercion <br />> medals <- medals[!is.na(medals$Year), ]<br />> <br />> <br />> # Quick look at data<br />> head(medals)<br />  Year     City      Sport     Discipline NOC           Event Event.gender<br />1 1924 Chamonix    Skating Figure skating AUT      individual            M<br />2 1924 Chamonix    Skating Figure skating AUT      individual            W<br />3 1924 Chamonix    Skating Figure skating AUT           pairs            X<br />4 1924 Chamonix  Bobsleigh      Bobsleigh BEL        four-man            M<br />5 1924 Chamonix Ice Hockey     Ice Hockey CAN      ice hockey            M<br />6 1924 Chamonix   Biathlon       Biathlon FIN military patrol            M<br />   Medal<br />1 Silver<br />2   Gold<br />3   Gold<br />4 Bronze<br />5   Gold<br />6 Silver<br />> sapply(medals, function(x) cbind(sort(table(x), decreasing = TRUE)))<br />$Year<br />     [,1]<br />2006  252<br />2002  234<br />1998  205<br />1994  183<br />1992  171<br />1988  138<br />1984  117<br />1980  115<br />1976  111<br />1968  106<br />1972  105<br />1964  103<br />1960   81<br />1956   72<br />1948   68<br />1952   67<br />1936   51<br />1924   49<br />1932   42<br />1928   41<br /><br />$City<br />                       [,1]<br />Turin                   252<br />Salt Lake City          234<br />Innsbruck               214<br />Nagano                  205<br />Lillehammer             183<br />Albertville             171<br />Lake Placid             157<br />Calgary                 138<br />Sarajevo                117<br />St. Moritz              109<br />Grenoble                106<br />Sapporo                 105<br />Squaw Valley             81<br />Cortina d'Ampezzo        72<br />Oslo                     67<br />Garmisch-Partenkirchen   51<br />Chamonix                 49<br /><br />$Sport<br />           [,1]<br />Skiing     1060<br />Skating     758<br />Biathlon    162<br />Bobsleigh   133<br />Luge        108<br />Ice Hockey   69<br />Curling      21<br /><br />$Discipline<br />                [,1]<br />Speed skating    455<br />Cross Country S  399<br />Alpine Skiing    367<br />Figure skating   207<br />Biathlon         162<br />Bobsleigh        115<br />Ski Jumping      114<br />Luge             108<br />Short Track S.    96<br />Nordic Combined   84<br />Ice Hockey        69<br />Freestyle Ski.    54<br />Snowboard         42<br />Curling           21<br />Skeleton          18<br /><br />$NOC<br />    [,1]<br />NOR  280<br />USA  216<br />URS  194<br />AUT  185<br />GER  158<br />FIN  151<br />CAN  119<br />SUI  118<br />SWE  118<br />GDR  110<br />ITA  101<br />FRA   83<br />NED   78<br />RUS   76<br />FRG   41<br />CHN   33<br />JPN   32<br />KOR   31<br />TCH   25<br />EUN   23<br />GBR   21<br />EUA   19<br />CZE   10<br />LIE    9<br />POL    8<br />CRO    7<br />AUS    6<br />BLR    6<br />BUL    6<br />EST    6<br />HUN    6<br />BEL    5<br />KAZ    5<br />UKR    5<br />SLO    4<br />YUG    4<br />ESP    2<br />LUX    2<br />PRK    2<br />DEN    1<br />LAT    1<br />NZL    1<br />ROU    1<br />SVK    1<br />UZB    1<br /><br />$Event<br />                                [,1]<br />individual                       195<br />500m                             133<br />1500m                            111<br />downhill                          97<br />slalom                            96<br />1000m                             94<br />giant slalom                      90<br />5000m                             78<br />singles                           72<br />ice hockey                        69<br />10km                              60<br />50km                              60<br />K90 individual (70m)              60<br />pairs                             60<br />10000m                            57<br />two-man                           57<br />four-man                          55<br />4x10km relay                      51<br />15km                              48<br />20km                              45<br />4x7.5km relay                     42<br />alpine combined                   42<br />3000m                             39<br />30km mass start                   39<br />doubles                           36<br />K120 individual (90m)             36<br />super-G                           36<br />5km                               30<br />moguls                            30<br />4x5km relay                       27<br />ice dancing                       27<br />aerials                           24<br />curling                           21<br />10km pursuit                      18<br />18km                              18<br />Half-pipe                         18<br />K120 team (90m)                   18<br />Team                              18<br />15km mass start                   15<br />3000m relay                       15<br />30km                              15<br />3x5km relay                       15<br />5000m relay                       15<br />7.5km                             15<br />Giant parallel slalom             12<br />Combined 10km + 15km pursuit       9<br />Combined 5km + 10km pursuit        9<br />12.5km pursuit                     6<br />Alpine combined                    6<br />giant-slalom                       6<br />Snowboard Cross                    6<br />Sprint 1,5km                       6<br />sprint 1.5km                       6<br />Team pursuit                       6<br />Team sprint                        6<br />12,5km mass start                  3<br />3x7.5km relay                      3<br />4x6km relay                        3<br />5km pursuit                        3<br />combined (4 events)                3<br />Combined 15 + 15km mass start      3<br />Combined 7.5 + 7.5km mass start    3<br />five-man                           3<br />Individual                         3<br />Individual sprint                  3<br />military patrol                    3<br />sprint                             3<br /><br />$Event.gender<br />  [,1]<br />M 1386<br />W  802<br />X  123<br /><br />$Medal<br />       [,1]<br />Gold    774<br />Silver  773<br />Bronze  764<br /><br />> <br />> <br />> # How many medals have been awarded in each Olympics?<br />> medalsByYear <- aggregate(medals$Year, list(Year = medals$Year), length)<br />> if (savePlot == TRUE)  png("fig1.png")<br />> plot(x ~ Year, medalsByYear, ylim = c(0,max(x)), <br />+     ylab = "Total Medals Awarded", bty="l",<br />+     main = "Total Medals Awarded in Winter Olympics by Year")<br />> if (savePlot == TRUE) dev.off()<br />windows <br />      2 <br />> <br />> # How has the amount of medals awarded to males and females changed over the years?<br />> # Get data.<br />> medalsByYearByGender <- aggregate(medals$Year, <br />+     list(Year = medals$Year, Event.gender = medals$Event.gender), length)<br />> medalsByYearByGender <- medalsByYearByGender[medalsByYearByGender$Event.gender != "X", ]<br />> <br />> # Plot results.<br />> if (savePlot == TRUE)  png("fig2.png")<br />> plot(x ~ Year, medalsByYearByGender[medalsByYearByGender$Event.gender == "M", ], <br />+     ylim = c(0,max(x)), pch = "m", col = "blue", <br />+     ylab = "Total Medals Awarded", bty="l",<br />+     main = "Total Medals Awarded in Winter Olympicsn by Gender and by Year")<br />> points(medalsByYearByGender[medalsByYearByGender$Event.gender == "W", "Year"],<br />+     medalsByYearByGender[medalsByYearByGender$Event.gender == "W", "x"],<br />+     col = "red", pch = "f")<br />> if (savePlot == TRUE) dev.off()<br />windows <br />      2 <br />> <br />> # Table of proportion female<br />> propFemalePerYear <- medalsByYearByGender[medalsByYearByGender$Event.gender == "W", "x"] / (<br />+       medalsByYearByGender[medalsByYearByGender$Event.gender == "W", "x"] +<br />+       medalsByYearByGender[medalsByYearByGender$Event.gender == "M", "x"])<br />> propFemalePerYear <- round(propFemalePerYear, 2)<br />> cbind(Year = medalsByYearByGender[medalsByYearByGender$Event.gender == "W", "Year"],<br />+     PropFemale = propFemalePerYear)<br />      Year PropFemale<br /> [1,] 1924       0.07<br /> [2,] 1928       0.08<br /> [3,] 1932       0.08<br /> [4,] 1936       0.12<br /> [5,] 1948       0.18<br /> [6,] 1952       0.23<br /> [7,] 1956       0.26<br /> [8,] 1960       0.38<br /> [9,] 1964       0.37<br />[10,] 1968       0.37<br />[11,] 1972       0.36<br />[12,] 1976       0.35<br />[13,] 1980       0.34<br />[14,] 1984       0.36<br />[15,] 1988       0.37<br />[16,] 1992       0.43<br />[17,] 1994       0.43<br />[18,] 1998       0.44<br />[19,] 2002       0.45<br />[20,] 2006       0.46<br />>         <br />> <br />> # Which countries have won the most medals?<br />> sort(table(medals$NOC), dec = TRUE)<br /><br />NOR USA URS AUT GER FIN CAN SUI SWE GDR ITA FRA NED RUS FRG CHN JPN KOR TCH EUN <br />280 216 194 185 158 151 119 118 118 110 101  83  78  76  41  33  32  31  25  23 <br />GBR EUA CZE LIE POL CRO AUS BLR BUL EST HUN BEL KAZ UKR SLO YUG ESP LUX PRK DEN <br /> 21  19  10   9   8   7   6   6   6   6   6   5   5   5   4   4   2   2   2   1 <br />LAT NZL ROU SVK UZB <br />  1   1   1   1   1 <br />> <br />> <br />> # Of the countries that have won more than 50 medals,<br />> # which have the highest percentage of gold medals?<br />> NOC50Plus <- names(table(medals$NOC)[table(medals$NOC) > 50])<br />> medalsSubset <- medals[medals$NOC %in% NOC50Plus, ]<br />> medalsByMedalByNOC <- prop.table(table(medalsSubset$NOC, medalsSubset$Medal), margin = 1)<br />> medalsByMedalByNOC <- medalsByMedalByNOC[order(medalsByMedalByNOC[, "Gold"], <br />+         decreasing = TRUE), c("Gold", "Silver", "Bronze")]<br />> round(medalsByMedalByNOC, 2)<br />     <br />      Gold Silver Bronze<br />  RUS 0.43   0.32   0.25<br />  URS 0.40   0.29   0.30<br />  GER 0.37   0.37   0.26<br />  SWE 0.36   0.26   0.37<br />  USA 0.36   0.37   0.27<br />  ITA 0.36   0.31   0.34<br />  GDR 0.35   0.33   0.32<br />  NOR 0.35   0.35   0.30<br />  SUI 0.32   0.31   0.36<br />  NED 0.32   0.38   0.29<br />  CAN 0.32   0.32   0.36<br />  FRA 0.30   0.29   0.41<br />  AUT 0.28   0.35   0.38<br />  FIN 0.27   0.38   0.34<br />> <br />> <br />> # How many different countries have won medals by year?<br />> listOfYears <- unique(medals$Year)<br />> names(listOfYears) <- unique(medals$Year)<br />> totalNocByYear <- sapply(listOfYears,  function(X) <br />+       length(table(medals[medals$Year == X, "NOC"])))<br />> <br />> # Table<br />> totalNocByYear  <br />1924 1928 1932 1936 1948 1952 1956 1960 1964 1968 1972 1976 1980 1984 1988 1992 <br />  10   12   10   11   13   14   13   14   14   15   17   16   19   17   17   20 <br />1994 1998 2002 2006 <br />  22   24   24   26 <br />> <br />> # Plot<br />> if (savePlot == TRUE)  png("fig3.png")<br />> plot(x= names(totalNocByYear), totalNocByYear, <br />+     ylim = c(0, max(totalNocByYear)),<br />+     xlab = "Year",<br />+     ylab = "Total Number of Countries",<br />+     bty = "l", <br />+     main = "Total Number of Countriesn Winning Medals By Year")<br />> if (savePlot == TRUE) dev.off()<br />windows <br />      2 <br />> <br />> # Which Countries have won a medal at every Olympics? <br />> propYearsOnePlusMedals <- apply(table(medals$NOC, medals$Year) > 0, 1, mean)<br />> <br />> #Answer<br />> names(propYearsOnePlusMedals[propYearsOnePlusMedals == 1.0]) <br />[1] "AUT" "CAN" "FIN" "NOR" "SWE" "USA"<br />> <br />> # Table Sorted by Proportion of Olympics with a Medal<br />> cbind(sort(propYearsOnePlusMedals, decreasing = TRUE)) <br />    [,1]<br />AUT 1.00<br />CAN 1.00<br />FIN 1.00<br />NOR 1.00<br />SWE 1.00<br />USA 1.00<br />FRA 0.95<br />SUI 0.95<br />ITA 0.80<br />GBR 0.65<br />NED 0.65<br />TCH 0.55<br />JPN 0.50<br />GER 0.45<br />URS 0.45<br />FRG 0.35<br />GDR 0.30<br />HUN 0.30<br />CHN 0.25<br />KOR 0.25<br />POL 0.25<br />AUS 0.20<br />BEL 0.20<br />BLR 0.20<br />BUL 0.20<br />LIE 0.20<br />RUS 0.20<br />CZE 0.15<br />EUA 0.15<br />UKR 0.15<br />CRO 0.10<br />ESP 0.10<br />EST 0.10<br />KAZ 0.10<br />PRK 0.10<br />SLO 0.10<br />YUG 0.10<br />DEN 0.05<br />EUN 0.05<br />LAT 0.05<br />LUX 0.05<br />NZL 0.05<br />ROU 0.05<br />SVK 0.05<br />UZB 0.05<br /><br />

THE PLOTS

THE R SOURCE CODE

# tips on reading a Google Spreadsheet:<br /># http://blog.revolution-computing.com/2009/09/how-to-use-a-google-spreadsheet-as-data-in-r.html<br /># Data taken from:"https://spreadsheets.google.com/ccc?key=0AgdO92JOXxAOdDVlaUpkNlB2WERtV3l1ZVFYbzllQWc"<br /># http://www.guardian.co.uk/news/datablog/2010/feb/11/winter-olympics-medals-by-country<br /><br />googleLink <- "http://spreadsheets.google.com/pub?key=tsddww6vOYePkhPSxRpDeYw&single=true&gid=1&output=csv"<br />medals <- read.csv(googleLink, stringsAsFactors = FALSE)<br />savePlot <- TRUE # optional variable used to save or not save plots in code<br /><br /># remove rows that do not contain data<br />medals$Year <- as.numeric(medals$Year)<br />medals <- medals[!is.na(medals$Year), ]<br /><br /><br /># Quick look at data<br />head(medals)<br />sapply(medals, function(x) cbind(sort(table(x), decreasing = TRUE)))<br /><br /><br /># How many medals have been awarded in each Olympics?<br />medalsByYear <- aggregate(medals$Year, list(Year = medals$Year), length)<br />if (savePlot == TRUE)  png("fig1.png")<br />plot(x ~ Year, medalsByYear, ylim = c(0,max(x)), <br />    ylab = "Total Medals Awarded", bty="l",<br />    main = "Total Medals Awarded in Winter Olympics by Year")<br />if (savePlot == TRUE) dev.off()<br /><br /># How has the amount of medals awarded to males and females changed over the years?<br /># Get data.<br />medalsByYearByGender <- aggregate(medals$Year, <br />    list(Year = medals$Year, Event.gender = medals$Event.gender), length)<br />medalsByYearByGender <- medalsByYearByGender[medalsByYearByGender$Event.gender != "X", ]<br /><br /># Plot results.<br />if (savePlot == TRUE)  png("fig2.png")<br />plot(x ~ Year, medalsByYearByGender[medalsByYearByGender$Event.gender == "M", ], <br />    ylim = c(0,max(x)), pch = "m", col = "blue", <br />    ylab = "Total Medals Awarded", bty="l",<br />    main = "Total Medals Awarded in Winter Olympicsn by Gender and by Year")<br />points(medalsByYearByGender[medalsByYearByGender$Event.gender == "W", "Year"],<br />    medalsByYearByGender[medalsByYearByGender$Event.gender == "W", "x"],<br />    col = "red", pch = "f")<br />if (savePlot == TRUE) dev.off()<br /><br /># Table of proportion female<br />propFemalePerYear <- medalsByYearByGender[medalsByYearByGender$Event.gender == "W", "x"] / (<br />      medalsByYearByGender[medalsByYearByGender$Event.gender == "W", "x"] +<br />      medalsByYearByGender[medalsByYearByGender$Event.gender == "M", "x"])<br />propFemalePerYear <- round(propFemalePerYear, 2)<br />cbind(Year = medalsByYearByGender[medalsByYearByGender$Event.gender == "W", "Year"],<br />    PropFemale = propFemalePerYear)<br />        <br /><br /># Which countries have won the most medals?<br />sort(table(medals$NOC), dec = TRUE)<br /><br /><br /># Of the countries that have won more than 50 medals,<br /># which have the highest percentage of gold medals?<br />NOC50Plus <- names(table(medals$NOC)[table(medals$NOC) > 50])<br />medalsSubset <- medals[medals$NOC %in% NOC50Plus, ]<br />medalsByMedalByNOC <- prop.table(table(medalsSubset$NOC, medalsSubset$Medal), margin = 1)<br />medalsByMedalByNOC <- medalsByMedalByNOC[order(medalsByMedalByNOC[, "Gold"], <br />        decreasing = TRUE), c("Gold", "Silver", "Bronze")]<br />round(medalsByMedalByNOC, 2)<br /><br /><br /># How many different countries have won medals by year?<br />listOfYears <- unique(medals$Year)<br />names(listOfYears) <- unique(medals$Year)<br />totalNocByYear <- sapply(listOfYears,  function(X) <br />      length(table(medals[medals$Year == X, "NOC"])))<br /><br /># Table<br />totalNocByYear  <br /><br /># Plot<br />if (savePlot == TRUE)  png("fig3.png")<br />plot(x= names(totalNocByYear), totalNocByYear, <br />    ylim = c(0, max(totalNocByYear)),<br />    xlab = "Year",<br />    ylab = "Total Number of Countries",<br />    bty = "l", <br />    main = "Total Number of Countriesn Winning Medals By Year")<br />if (savePlot == TRUE) dev.off()<br /><br /># Which Countries have won a medal at every Olympics? <br />propYearsOnePlusMedals <- apply(table(medals$NOC, medals$Year) > 0, 1, mean)<br /><br />#Answer<br />names(propYearsOnePlusMedals[propYearsOnePlusMedals == 1.0]) <br /><br /># Table Sorted by Proportion of Olympics with a Medal<br />cbind(sort(propYearsOnePlusMedals, decreasing = TRUE)) <br />

To leave a comment for the author, please follow the link and comment on their blog: Jeromy Anglim's Blog: Psychology and Statistics.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)