PISA 2015 – how to read/process/plot the data with R

December 7, 2016

(This article was first published on SmarterPoland.pl » English, and kindly contributed to R-bloggers)

Yesterday OECD has published results and data from PISA 2015 study (Programme for International Student Assessment). It’s a very cool study – over 500 000 pupils (15-years old) are examined every 3 years. Raw data is publicly available and one can easily access detailed information about pupil’s academic performance and detailed data from surveys for studetns, parents and school officials (~2 000 variables). Lots of stories to be found.

You can download the dataset in the SPSS format from this webpage. Then use the foreign package to read sav files and intsvy package to calculate aggregates/averages/tables/regression models (for 2015 data you shall use the GitHub version of the package).

Below you will find a short example, how to read the data, calculate weighted averages for genders/countries and plot these results with ggplot2. Here you will find other use cases for the intsvy package.



stud2015 <- read.spss("CY6_MS_CMB_STU_QQQ.sav", use.value.labels = TRUE, to.data.frame = TRUE)
genderMath <- pisa2015.mean.pv(pvlabel = "MATH", by = c("CNT", "ST004D01T"), data = stud2015)

genderMath <- genderMath[,c(1,2,4,5)]
genderMath %>%
  select(CNT, ST004D01T, Mean) %>%
  spread(ST004D01T, Mean) -> genderMathWide

genderMathSelected <-
  genderMathWide %>%
  filter(CNT %in% c("Austria", "Japan", "Switzerland",  "Poland", "Singapore", "Finland", "Singapore", "Korea", "United States"))

pl <- ggplot(genderMathWide, aes(Female, Male)) +
  geom_point() +
  geom_point(data=genderMathSelected, color="red") +
  geom_text(data=genderMathSelected, aes(label=CNT), color="grey20") +
  geom_abline(slope=1, intercept = 0) + 
  geom_abline(slope=1, intercept = 20, linetype = 2, color="grey") + 
  geom_abline(slope=1, intercept = -20, linetype = 2, color="grey") +
  geom_text(x=425, y=460, label="Boys +20 points", angle=45, color="grey", size=8) + 
  geom_text(x=460, y=425, label="Girls +20 points", angle=45, color="grey", size=8) + 
  coord_fixed(xlim = c(400,565), ylim = c(400,565)) +
  theme_bw() + ggtitle("PISA 2015 in Math - Gender Gap") +
  xlab("PISA 2015 Math score for girls") +
  ylab("PISA 2015 Math score for boys")

To leave a comment for the author, please follow the link and comment on their blog: SmarterPoland.pl » English.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...

If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.


Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)