Analysis of Winter Olympic Medal Data Using R
[This article was first published on Jeromy Anglim's Blog: Psychology and Statistics, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
The Winter Olympics are on. The Guardian’s DataBlog has graciously compiled a database on Winter Olympic Medals. Thus, I thought I’d run a few quick analyses on the data in R. In this post I was hoping to show how one could quickly churn out some basic analyses (and answer some interesting questions) using R.
First, a disclaimer: I ran these analyses in about 45 minutes. Thus, I make no claims of perfect accuracy or in the source data provided by the Guardian. The data also does not include 2010 medals.
Below you will see:
- The R Console input and output
- The plots
- The source code on its own
CONSOLE INPUT AND OUTPUT
> # tips on reading a Google Spreadsheet:<br />> # http://blog.revolution-computing.com/2009/09/how-to-use-a-google-spreadsheet-as-data-in-r.html<br />> # Data taken from:"https://spreadsheets.google.com/ccc?key=0AgdO92JOXxAOdDVlaUpkNlB2WERtV3l1ZVFYbzllQWc"<br />> # http://www.guardian.co.uk/news/datablog/2010/feb/11/winter-olympics-medals-by-country<br />> <br />> googleLink <- "http://spreadsheets.google.com/pub?key=tsddww6vOYePkhPSxRpDeYw&single=true&gid=1&output=csv"<br />> medals <- read.csv(googleLink, stringsAsFactors = FALSE)<br />> savePlot <- TRUE # optional variable used to save or not save plots in code<br />> <br />> # remove rows that do not contain data<br />> medals$Year <- as.numeric(medals$Year)<br />Warning message:<br />NAs introduced by coercion <br />> medals <- medals[!is.na(medals$Year), ]<br />> <br />> <br />> # Quick look at data<br />> head(medals)<br /> Year City Sport Discipline NOC Event Event.gender<br />1 1924 Chamonix Skating Figure skating AUT individual M<br />2 1924 Chamonix Skating Figure skating AUT individual W<br />3 1924 Chamonix Skating Figure skating AUT pairs X<br />4 1924 Chamonix Bobsleigh Bobsleigh BEL four-man M<br />5 1924 Chamonix Ice Hockey Ice Hockey CAN ice hockey M<br />6 1924 Chamonix Biathlon Biathlon FIN military patrol M<br /> Medal<br />1 Silver<br />2 Gold<br />3 Gold<br />4 Bronze<br />5 Gold<br />6 Silver<br />> sapply(medals, function(x) cbind(sort(table(x), decreasing = TRUE)))<br />$Year<br /> [,1]<br />2006 252<br />2002 234<br />1998 205<br />1994 183<br />1992 171<br />1988 138<br />1984 117<br />1980 115<br />1976 111<br />1968 106<br />1972 105<br />1964 103<br />1960 81<br />1956 72<br />1948 68<br />1952 67<br />1936 51<br />1924 49<br />1932 42<br />1928 41<br /><br />$City<br /> [,1]<br />Turin 252<br />Salt Lake City 234<br />Innsbruck 214<br />Nagano 205<br />Lillehammer 183<br />Albertville 171<br />Lake Placid 157<br />Calgary 138<br />Sarajevo 117<br />St. Moritz 109<br />Grenoble 106<br />Sapporo 105<br />Squaw Valley 81<br />Cortina d'Ampezzo 72<br />Oslo 67<br />Garmisch-Partenkirchen 51<br />Chamonix 49<br /><br />$Sport<br /> [,1]<br />Skiing 1060<br />Skating 758<br />Biathlon 162<br />Bobsleigh 133<br />Luge 108<br />Ice Hockey 69<br />Curling 21<br /><br />$Discipline<br /> [,1]<br />Speed skating 455<br />Cross Country S 399<br />Alpine Skiing 367<br />Figure skating 207<br />Biathlon 162<br />Bobsleigh 115<br />Ski Jumping 114<br />Luge 108<br />Short Track S. 96<br />Nordic Combined 84<br />Ice Hockey 69<br />Freestyle Ski. 54<br />Snowboard 42<br />Curling 21<br />Skeleton 18<br /><br />$NOC<br /> [,1]<br />NOR 280<br />USA 216<br />URS 194<br />AUT 185<br />GER 158<br />FIN 151<br />CAN 119<br />SUI 118<br />SWE 118<br />GDR 110<br />ITA 101<br />FRA 83<br />NED 78<br />RUS 76<br />FRG 41<br />CHN 33<br />JPN 32<br />KOR 31<br />TCH 25<br />EUN 23<br />GBR 21<br />EUA 19<br />CZE 10<br />LIE 9<br />POL 8<br />CRO 7<br />AUS 6<br />BLR 6<br />BUL 6<br />EST 6<br />HUN 6<br />BEL 5<br />KAZ 5<br />UKR 5<br />SLO 4<br />YUG 4<br />ESP 2<br />LUX 2<br />PRK 2<br />DEN 1<br />LAT 1<br />NZL 1<br />ROU 1<br />SVK 1<br />UZB 1<br /><br />$Event<br /> [,1]<br />individual 195<br />500m 133<br />1500m 111<br />downhill 97<br />slalom 96<br />1000m 94<br />giant slalom 90<br />5000m 78<br />singles 72<br />ice hockey 69<br />10km 60<br />50km 60<br />K90 individual (70m) 60<br />pairs 60<br />10000m 57<br />two-man 57<br />four-man 55<br />4x10km relay 51<br />15km 48<br />20km 45<br />4x7.5km relay 42<br />alpine combined 42<br />3000m 39<br />30km mass start 39<br />doubles 36<br />K120 individual (90m) 36<br />super-G 36<br />5km 30<br />moguls 30<br />4x5km relay 27<br />ice dancing 27<br />aerials 24<br />curling 21<br />10km pursuit 18<br />18km 18<br />Half-pipe 18<br />K120 team (90m) 18<br />Team 18<br />15km mass start 15<br />3000m relay 15<br />30km 15<br />3x5km relay 15<br />5000m relay 15<br />7.5km 15<br />Giant parallel slalom 12<br />Combined 10km + 15km pursuit 9<br />Combined 5km + 10km pursuit 9<br />12.5km pursuit 6<br />Alpine combined 6<br />giant-slalom 6<br />Snowboard Cross 6<br />Sprint 1,5km 6<br />sprint 1.5km 6<br />Team pursuit 6<br />Team sprint 6<br />12,5km mass start 3<br />3x7.5km relay 3<br />4x6km relay 3<br />5km pursuit 3<br />combined (4 events) 3<br />Combined 15 + 15km mass start 3<br />Combined 7.5 + 7.5km mass start 3<br />five-man 3<br />Individual 3<br />Individual sprint 3<br />military patrol 3<br />sprint 3<br /><br />$Event.gender<br /> [,1]<br />M 1386<br />W 802<br />X 123<br /><br />$Medal<br /> [,1]<br />Gold 774<br />Silver 773<br />Bronze 764<br /><br />> <br />> <br />> # How many medals have been awarded in each Olympics?<br />> medalsByYear <- aggregate(medals$Year, list(Year = medals$Year), length)<br />> if (savePlot == TRUE) png("fig1.png")<br />> plot(x ~ Year, medalsByYear, ylim = c(0,max(x)), <br />+ ylab = "Total Medals Awarded", bty="l",<br />+ main = "Total Medals Awarded in Winter Olympics by Year")<br />> if (savePlot == TRUE) dev.off()<br />windows <br /> 2 <br />> <br />> # How has the amount of medals awarded to males and females changed over the years?<br />> # Get data.<br />> medalsByYearByGender <- aggregate(medals$Year, <br />+ list(Year = medals$Year, Event.gender = medals$Event.gender), length)<br />> medalsByYearByGender <- medalsByYearByGender[medalsByYearByGender$Event.gender != "X", ]<br />> <br />> # Plot results.<br />> if (savePlot == TRUE) png("fig2.png")<br />> plot(x ~ Year, medalsByYearByGender[medalsByYearByGender$Event.gender == "M", ], <br />+ ylim = c(0,max(x)), pch = "m", col = "blue", <br />+ ylab = "Total Medals Awarded", bty="l",<br />+ main = "Total Medals Awarded in Winter Olympicsn by Gender and by Year")<br />> points(medalsByYearByGender[medalsByYearByGender$Event.gender == "W", "Year"],<br />+ medalsByYearByGender[medalsByYearByGender$Event.gender == "W", "x"],<br />+ col = "red", pch = "f")<br />> if (savePlot == TRUE) dev.off()<br />windows <br /> 2 <br />> <br />> # Table of proportion female<br />> propFemalePerYear <- medalsByYearByGender[medalsByYearByGender$Event.gender == "W", "x"] / (<br />+ medalsByYearByGender[medalsByYearByGender$Event.gender == "W", "x"] +<br />+ medalsByYearByGender[medalsByYearByGender$Event.gender == "M", "x"])<br />> propFemalePerYear <- round(propFemalePerYear, 2)<br />> cbind(Year = medalsByYearByGender[medalsByYearByGender$Event.gender == "W", "Year"],<br />+ PropFemale = propFemalePerYear)<br /> Year PropFemale<br /> [1,] 1924 0.07<br /> [2,] 1928 0.08<br /> [3,] 1932 0.08<br /> [4,] 1936 0.12<br /> [5,] 1948 0.18<br /> [6,] 1952 0.23<br /> [7,] 1956 0.26<br /> [8,] 1960 0.38<br /> [9,] 1964 0.37<br />[10,] 1968 0.37<br />[11,] 1972 0.36<br />[12,] 1976 0.35<br />[13,] 1980 0.34<br />[14,] 1984 0.36<br />[15,] 1988 0.37<br />[16,] 1992 0.43<br />[17,] 1994 0.43<br />[18,] 1998 0.44<br />[19,] 2002 0.45<br />[20,] 2006 0.46<br />> <br />> <br />> # Which countries have won the most medals?<br />> sort(table(medals$NOC), dec = TRUE)<br /><br />NOR USA URS AUT GER FIN CAN SUI SWE GDR ITA FRA NED RUS FRG CHN JPN KOR TCH EUN <br />280 216 194 185 158 151 119 118 118 110 101 83 78 76 41 33 32 31 25 23 <br />GBR EUA CZE LIE POL CRO AUS BLR BUL EST HUN BEL KAZ UKR SLO YUG ESP LUX PRK DEN <br /> 21 19 10 9 8 7 6 6 6 6 6 5 5 5 4 4 2 2 2 1 <br />LAT NZL ROU SVK UZB <br /> 1 1 1 1 1 <br />> <br />> <br />> # Of the countries that have won more than 50 medals,<br />> # which have the highest percentage of gold medals?<br />> NOC50Plus <- names(table(medals$NOC)[table(medals$NOC) > 50])<br />> medalsSubset <- medals[medals$NOC %in% NOC50Plus, ]<br />> medalsByMedalByNOC <- prop.table(table(medalsSubset$NOC, medalsSubset$Medal), margin = 1)<br />> medalsByMedalByNOC <- medalsByMedalByNOC[order(medalsByMedalByNOC[, "Gold"], <br />+ decreasing = TRUE), c("Gold", "Silver", "Bronze")]<br />> round(medalsByMedalByNOC, 2)<br /> <br /> Gold Silver Bronze<br /> RUS 0.43 0.32 0.25<br /> URS 0.40 0.29 0.30<br /> GER 0.37 0.37 0.26<br /> SWE 0.36 0.26 0.37<br /> USA 0.36 0.37 0.27<br /> ITA 0.36 0.31 0.34<br /> GDR 0.35 0.33 0.32<br /> NOR 0.35 0.35 0.30<br /> SUI 0.32 0.31 0.36<br /> NED 0.32 0.38 0.29<br /> CAN 0.32 0.32 0.36<br /> FRA 0.30 0.29 0.41<br /> AUT 0.28 0.35 0.38<br /> FIN 0.27 0.38 0.34<br />> <br />> <br />> # How many different countries have won medals by year?<br />> listOfYears <- unique(medals$Year)<br />> names(listOfYears) <- unique(medals$Year)<br />> totalNocByYear <- sapply(listOfYears, function(X) <br />+ length(table(medals[medals$Year == X, "NOC"])))<br />> <br />> # Table<br />> totalNocByYear <br />1924 1928 1932 1936 1948 1952 1956 1960 1964 1968 1972 1976 1980 1984 1988 1992 <br /> 10 12 10 11 13 14 13 14 14 15 17 16 19 17 17 20 <br />1994 1998 2002 2006 <br /> 22 24 24 26 <br />> <br />> # Plot<br />> if (savePlot == TRUE) png("fig3.png")<br />> plot(x= names(totalNocByYear), totalNocByYear, <br />+ ylim = c(0, max(totalNocByYear)),<br />+ xlab = "Year",<br />+ ylab = "Total Number of Countries",<br />+ bty = "l", <br />+ main = "Total Number of Countriesn Winning Medals By Year")<br />> if (savePlot == TRUE) dev.off()<br />windows <br /> 2 <br />> <br />> # Which Countries have won a medal at every Olympics? <br />> propYearsOnePlusMedals <- apply(table(medals$NOC, medals$Year) > 0, 1, mean)<br />> <br />> #Answer<br />> names(propYearsOnePlusMedals[propYearsOnePlusMedals == 1.0]) <br />[1] "AUT" "CAN" "FIN" "NOR" "SWE" "USA"<br />> <br />> # Table Sorted by Proportion of Olympics with a Medal<br />> cbind(sort(propYearsOnePlusMedals, decreasing = TRUE)) <br /> [,1]<br />AUT 1.00<br />CAN 1.00<br />FIN 1.00<br />NOR 1.00<br />SWE 1.00<br />USA 1.00<br />FRA 0.95<br />SUI 0.95<br />ITA 0.80<br />GBR 0.65<br />NED 0.65<br />TCH 0.55<br />JPN 0.50<br />GER 0.45<br />URS 0.45<br />FRG 0.35<br />GDR 0.30<br />HUN 0.30<br />CHN 0.25<br />KOR 0.25<br />POL 0.25<br />AUS 0.20<br />BEL 0.20<br />BLR 0.20<br />BUL 0.20<br />LIE 0.20<br />RUS 0.20<br />CZE 0.15<br />EUA 0.15<br />UKR 0.15<br />CRO 0.10<br />ESP 0.10<br />EST 0.10<br />KAZ 0.10<br />PRK 0.10<br />SLO 0.10<br />YUG 0.10<br />DEN 0.05<br />EUN 0.05<br />LAT 0.05<br />LUX 0.05<br />NZL 0.05<br />ROU 0.05<br />SVK 0.05<br />UZB 0.05<br /><br />
THE PLOTS
THE R SOURCE CODE
# tips on reading a Google Spreadsheet:<br /># http://blog.revolution-computing.com/2009/09/how-to-use-a-google-spreadsheet-as-data-in-r.html<br /># Data taken from:"https://spreadsheets.google.com/ccc?key=0AgdO92JOXxAOdDVlaUpkNlB2WERtV3l1ZVFYbzllQWc"<br /># http://www.guardian.co.uk/news/datablog/2010/feb/11/winter-olympics-medals-by-country<br /><br />googleLink <- "http://spreadsheets.google.com/pub?key=tsddww6vOYePkhPSxRpDeYw&single=true&gid=1&output=csv"<br />medals <- read.csv(googleLink, stringsAsFactors = FALSE)<br />savePlot <- TRUE # optional variable used to save or not save plots in code<br /><br /># remove rows that do not contain data<br />medals$Year <- as.numeric(medals$Year)<br />medals <- medals[!is.na(medals$Year), ]<br /><br /><br /># Quick look at data<br />head(medals)<br />sapply(medals, function(x) cbind(sort(table(x), decreasing = TRUE)))<br /><br /><br /># How many medals have been awarded in each Olympics?<br />medalsByYear <- aggregate(medals$Year, list(Year = medals$Year), length)<br />if (savePlot == TRUE) png("fig1.png")<br />plot(x ~ Year, medalsByYear, ylim = c(0,max(x)), <br /> ylab = "Total Medals Awarded", bty="l",<br /> main = "Total Medals Awarded in Winter Olympics by Year")<br />if (savePlot == TRUE) dev.off()<br /><br /># How has the amount of medals awarded to males and females changed over the years?<br /># Get data.<br />medalsByYearByGender <- aggregate(medals$Year, <br /> list(Year = medals$Year, Event.gender = medals$Event.gender), length)<br />medalsByYearByGender <- medalsByYearByGender[medalsByYearByGender$Event.gender != "X", ]<br /><br /># Plot results.<br />if (savePlot == TRUE) png("fig2.png")<br />plot(x ~ Year, medalsByYearByGender[medalsByYearByGender$Event.gender == "M", ], <br /> ylim = c(0,max(x)), pch = "m", col = "blue", <br /> ylab = "Total Medals Awarded", bty="l",<br /> main = "Total Medals Awarded in Winter Olympicsn by Gender and by Year")<br />points(medalsByYearByGender[medalsByYearByGender$Event.gender == "W", "Year"],<br /> medalsByYearByGender[medalsByYearByGender$Event.gender == "W", "x"],<br /> col = "red", pch = "f")<br />if (savePlot == TRUE) dev.off()<br /><br /># Table of proportion female<br />propFemalePerYear <- medalsByYearByGender[medalsByYearByGender$Event.gender == "W", "x"] / (<br /> medalsByYearByGender[medalsByYearByGender$Event.gender == "W", "x"] +<br /> medalsByYearByGender[medalsByYearByGender$Event.gender == "M", "x"])<br />propFemalePerYear <- round(propFemalePerYear, 2)<br />cbind(Year = medalsByYearByGender[medalsByYearByGender$Event.gender == "W", "Year"],<br /> PropFemale = propFemalePerYear)<br /> <br /><br /># Which countries have won the most medals?<br />sort(table(medals$NOC), dec = TRUE)<br /><br /><br /># Of the countries that have won more than 50 medals,<br /># which have the highest percentage of gold medals?<br />NOC50Plus <- names(table(medals$NOC)[table(medals$NOC) > 50])<br />medalsSubset <- medals[medals$NOC %in% NOC50Plus, ]<br />medalsByMedalByNOC <- prop.table(table(medalsSubset$NOC, medalsSubset$Medal), margin = 1)<br />medalsByMedalByNOC <- medalsByMedalByNOC[order(medalsByMedalByNOC[, "Gold"], <br /> decreasing = TRUE), c("Gold", "Silver", "Bronze")]<br />round(medalsByMedalByNOC, 2)<br /><br /><br /># How many different countries have won medals by year?<br />listOfYears <- unique(medals$Year)<br />names(listOfYears) <- unique(medals$Year)<br />totalNocByYear <- sapply(listOfYears, function(X) <br /> length(table(medals[medals$Year == X, "NOC"])))<br /><br /># Table<br />totalNocByYear <br /><br /># Plot<br />if (savePlot == TRUE) png("fig3.png")<br />plot(x= names(totalNocByYear), totalNocByYear, <br /> ylim = c(0, max(totalNocByYear)),<br /> xlab = "Year",<br /> ylab = "Total Number of Countries",<br /> bty = "l", <br /> main = "Total Number of Countriesn Winning Medals By Year")<br />if (savePlot == TRUE) dev.off()<br /><br /># Which Countries have won a medal at every Olympics? <br />propYearsOnePlusMedals <- apply(table(medals$NOC, medals$Year) > 0, 1, mean)<br /><br />#Answer<br />names(propYearsOnePlusMedals[propYearsOnePlusMedals == 1.0]) <br /><br /># Table Sorted by Proportion of Olympics with a Medal<br />cbind(sort(propYearsOnePlusMedals, decreasing = TRUE)) <br />
To leave a comment for the author, please follow the link and comment on their blog: Jeromy Anglim's Blog: Psychology and Statistics.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.