useR2013 data analysis contest: data exploration
[This article was first published on Fellgernon Bit - rstats, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Description
The useR2013 conference is organizing a data analysis contest; check the rules here.
They have a package called useR2013DAC with two data sets: one from La Liga and the other from Formula 1. Once you download and install the package (available here), you can quickly explore the data using the following R commands:
Data exploration
## Load the package
library(useR2013DAC)
## Explore laliga data
data(laliga)
head(laliga)
##    Season Week               HomeTeam                    AwayTeam
## 1 2008/09    1   Athletic Club Bilbao     Union Deportiva Almeria
## 2 2008/09    1        Atlético Madrid                   Málaga CF
## 3 2008/09    1          Betis Sevilla Real Club Recreativo Huelva
## 4 2008/09    1             CA Osasuna               Villarreal CF
## 5 2008/09    1            CD Numancia                FC Barcelona
## 6 2008/09    1 Deportivo de La Coruña              Real Madrid CF
##   HomeGoals AwayGoals
## 1         1         3
## 2         4         0
## 3         0         1
## 4         1         1
## 5         1         0
## 6         2         1
summary(laliga)
##     Season               Week        HomeTeam           AwayTeam
##  Length:1900        Min.   : 1.0   Length:1900        Length:1900
##  Class :character   1st Qu.:10.0   Class :character   Class :character
##  Mode  :character   Median :19.5   Mode  :character   Mode  :character
##                     Mean   :19.5
##                     3rd Qu.:29.0
##                     Max.   :38.0
##
##    HomeGoals      AwayGoals
##  Min.   :0.00   Min.   :0.00
##  1st Qu.:1.00   1st Qu.:0.00
##  Median :1.00   Median :1.00
##  Mean   :1.65   Mean   :1.14
##  3rd Qu.:2.00   3rd Qu.:2.00
##  Max.   :8.00   Max.   :8.00
##  NA's   :50     NA's   :50
lapply(laliga, class)
## $Season
## [1] "character"
##
## $Week
## [1] "integer"
##
## $HomeTeam
## [1] "character"
##
## $AwayTeam
## [1] "character"
##
## $HomeGoals
## [1] "integer"
##
## $AwayGoals
## [1] "integer"
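The summary above reports 50 NA values in both HomeGoals and AwayGoals, presumably matches without a recorded score. Here is a minimal sketch of one way to set those rows aside and classify the remaining matches. Since it should run without the package installed, it uses a toy data frame (`laliga_toy`, a hypothetical name) holding the six rows printed by head(laliga) plus one made-up row with missing goals; the `Result` column and its labels are likewise assumptions for illustration.

```r
## Toy data: the six rows shown by head(laliga), plus one hypothetical
## unplayed match (NA goals) that is NOT taken from the real data set
laliga_toy <- data.frame(
  Season = "2008/09",
  Week = c(1L, 1L, 1L, 1L, 1L, 1L, 38L),
  HomeTeam = c("Athletic Club Bilbao", "Atlético Madrid", "Betis Sevilla",
    "CA Osasuna", "CD Numancia", "Deportivo de La Coruña",
    "Hypothetical Home"),
  AwayTeam = c("Union Deportiva Almeria", "Málaga CF",
    "Real Club Recreativo Huelva", "Villarreal CF", "FC Barcelona",
    "Real Madrid CF", "Hypothetical Away"),
  HomeGoals = c(1L, 4L, 0L, 1L, 1L, 2L, NA),
  AwayGoals = c(3L, 0L, 1L, 1L, 0L, 1L, NA),
  stringsAsFactors = FALSE
)

## Keep only the matches where both scores are recorded
goal_cols <- c("HomeGoals", "AwayGoals")
played <- laliga_toy[complete.cases(laliga_toy[, goal_cols]), ]

## Classify each match from the home team's point of view
played$Result <- with(played,
  ifelse(HomeGoals > AwayGoals, "HomeWin",
    ifelse(HomeGoals < AwayGoals, "AwayWin", "Draw")))

## Count outcomes: AwayWin 2, Draw 1, HomeWin 3 on the toy data
table(played$Result)

## Average goals per match: HomeGoals 1.5, AwayGoals 1 on the toy data
avg_goals <- colMeans(played[, goal_cols])
avg_goals
```

With the package installed, replacing `laliga_toy` by `laliga` should apply the same steps to the full data.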
## Explore formula1 data
data(formula1)
head(formula1)
##   Pos No             Driver             Team Laps        Time Grid Pts
## 1   1  8    Fernando Alonso          Ferrari   49 1:39:20.396    3  25
## 2   2  7       Felipe Massa          Ferrari   49  +16.0 secs    2  18
## 3   3  2     Lewis Hamilton McLaren-Mercedes   49  +23.1 secs    4  15
## 4   4  5   Sebastian Vettel      RBR-Renault   49  +38.7 secs    1  12
## 5   5  4       Nico Rosberg      Mercedes GP   49  +40.2 secs    5  10
## 6   6  3 Michael Schumacher      Mercedes GP   49  +44.1 secs    7   8
##                                         Race Season
## 1 2010 FORMULA 1 GULF AIR BAHRAIN GRAND PRIX   2010
## 2 2010 FORMULA 1 GULF AIR BAHRAIN GRAND PRIX   2010
## 3 2010 FORMULA 1 GULF AIR BAHRAIN GRAND PRIX   2010
## 4 2010 FORMULA 1 GULF AIR BAHRAIN GRAND PRIX   2010
## 5 2010 FORMULA 1 GULF AIR BAHRAIN GRAND PRIX   2010
## 6 2010 FORMULA 1 GULF AIR BAHRAIN GRAND PRIX   2010
summary(formula1)
##       Pos            No                     Driver
##  Ret    :254   1      :  58   Felipe Massa     :  58
##  1      : 58   10     :  58   Fernando Alonso  :  58
##  10     : 58   11     :  58   Heikki Kovalainen:  58
##  11     : 58   12     :  58   Jenson Button    :  58
##  12     : 58   14     :  58   Kamui Kobayashi  :  58
##  13     : 58   15     :  58   Lewis Hamilton   :  58
##  (Other):848   (Other):1044   (Other)          :1044
##                    Team          Laps             Time          Grid
##  Ferrari             :116   55     :125   +1 Lap    :268   1      :  58
##  Force India-Mercedes:116   56     :121   +2 Laps   :102   10     :  58
##  HRT-Cosworth        :116   53     : 92   Accident  : 93   11     :  58
##  McLaren-Mercedes    :116   57     : 80   +3 Laps   : 41   12     :  58
##  STR-Ferrari         :116   70     : 75   Hydraulics: 26   13     :  58
##  Lotus-Renault       : 78   52     : 69   Gearbox   : 24   14     :  58
##  (Other)             :734   (Other):830   (Other)   :838   (Other):1044
##       Pts          Race               Season
##         :812   Length:1392        Min.   :2010
##  1      : 58   Class :character   1st Qu.:2010
##  10     : 58   Mode  :character   Median :2011
##  12     : 58                      Mean   :2011
##  15     : 58                      3rd Qu.:2012
##  18     : 58                      Max.   :2012
##  (Other):290
lapply(formula1, class)
## $Pos
## [1] "factor"
##
## $No
## [1] "factor"
##
## $Driver
## [1] "factor"
##
## $Team
## [1] "factor"
##
## $Laps
## [1] "factor"
##
## $Time
## [1] "factor"
##
## $Grid
## [1] "factor"
##
## $Pts
## [1] "factor"
##
## $Race
## [1] "character"
##
## $Season
## [1] "numeric"
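Every formula1 column except Race and Season is stored as a factor, including ones that look numeric, such as Pos, Laps, Grid and Pts. That matters if you want to do arithmetic on them. The sketch below shows the usual pitfall and one common workaround on a toy factor (`pos`, a hypothetical stand-in for formula1$Pos, not the real column), so it runs without the package installed.

```r
## Toy factor imitating formula1$Pos: numeric-looking labels plus "Ret",
## a level that appears in the summary(formula1) output
pos <- factor(c("1", "2", "10", "Ret"))

## Pitfall: as.numeric() on a factor returns the internal level codes,
## not the labels (levels sort as text: "1", "10", "2", "Ret")
codes <- as.numeric(pos)
codes
## [1] 1 3 2 4

## Convert via character instead; labels that are not numbers become NA
## (suppressWarnings() hides the "NAs introduced by coercion" warning)
pos_num <- suppressWarnings(as.numeric(as.character(pos)))
pos_num
## [1]  1  2 10 NA
```

On the real data the same conversion could be applied column by column, for example to Grid or Pts, though which columns should become numeric is a judgment call that depends on the analysis.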
I don’t see a specific question that they want you to answer with this data, but if you find one related to data analysis or visualization then join the competition!
Note that you must be attending the conference in order to be eligible to compete.
Reproducibility
sessionInfo()
## R version 3.0.0 (2013-04-03)
## Platform: x86_64-apple-darwin10.8.0 (64-bit)
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base
##
## other attached packages:
## [1] useR2013DAC_0.1-1 knitr_1.2
##
## loaded via a namespace (and not attached):
## [1] digest_0.6.3   evaluate_0.4.3 formatR_0.7    stringr_0.6.2
## [5] tools_3.0.0
To leave a comment for the author, please follow the link and comment on their blog: Fellgernon Bit - rstats.