Proficiency levels @ PISA and visualisation challenge @ useR!2014

June 13, 2014

(This article was first published on SmarterPoland » PISA in english, and kindly contributed to R-bloggers)

There are 16 days left to submit an entry to the DataVis contest at useR!2014 (see the contest webpage).
The contest focuses on PISA data and students’ skills. The main variables that reflect pupils’ skills in math, reading and science are the plausible values, e.g. the columns PV1MATH, PV1READ and PV1SCIE in the dataset.
However, these values are normalized to have mean 500 and sd 100, so it is not easy to say what a skill level of 600 means, or whether a difference of 12 points between averages is large. To overcome this, PISA has introduced seven proficiency levels (from 0 to 6) that are based on plausible values, with cutoffs at 358, 420, 482, 545, 607 and 669.
It is assumed that, for example, at level 6 “students can conceptualize, generalize, and utilize information based on their investigations and modeling of complex problem situations, and can use their knowledge in relatively non-standard contexts”.
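
To make the cutoffs concrete, here is a minimal sketch of how a score can be mapped to a proficiency level with base R's cut(). The scores are made up for illustration, and the labels "level 0" to "level 6" are my own choice to match the numbering in the text; this is not the official PISA procedure, which works with all plausible values.

```r
# Made-up scores, one for each proficiency level (illustration only)
scores  <- c(300, 358.5, 450, 500, 560, 620, 700)
cutoffs <- c(358, 420, 482, 545, 607, 669)

# Intervals are (lower, upper], so a score equal to a cutoff
# falls into the lower of the two levels
lev <- cut(scores,
           breaks = c(-Inf, cutoffs, Inf),
           labels = paste("level", 0:6))

print(data.frame(score = scores, level = lev))
```

With this encoding a score of 600 lands in level 4, which is easier to interpret than the raw number.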

So, instead of looking at means, we can now look at the fraction of students at each proficiency level. To have some fun, we use the sp, rworldmap and RColorBrewer packages to draw country shapes instead of bars, filled with dots that represent the pupils who took part in the study. The downside is that the area of a band does not correspond to its height, which might be confusing, so we add horizontal lines to expose the height.
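
The fractions in question are weighted: each student carries a final weight (W_FSTUWT), and the share of a level is the sum of weights at that level divided by the total. A small sketch on a made-up data frame follows; the data frame toy and its values are hypothetical stand-ins for student2012, and replacing NA with 0 for empty levels is my addition.

```r
# Hypothetical stand-in for student2012 (values made up)
toy <- data.frame(
  CNT      = c("A", "A", "A", "A", "B", "B"),
  PV1MATH  = c(350, 450, 450, 650, 500, 700),
  W_FSTUWT = c(10, 20, 30, 40, 5, 5)
)

breaks    <- c(0, 358, 420, 482, 545, 607, 669, 1000)
toy$level <- cut(toy$PV1MATH, breaks, labels = paste("level", 0:6))

# Sum of student weights per level, for country "A" only
inA <- toy$CNT == "A"
w   <- tapply(toy$W_FSTUWT[inA], toy$level[inA], sum)
w[is.na(w)] <- 0          # tapply() returns NA for levels with no students

props <- prop.table(w)    # weighted fraction of students at each level
print(round(props, 2))
```

In the full data most countries have students at every level, so the NA step rarely matters, but without it a single empty level would turn every fraction into NA.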

And here is the R code:

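library(ggplot2)       # map_data()
library(sp)            # point.in.polygon()
library(RColorBrewer)  # brewer.pal()
# country outlines; the name map.world is a reconstruction, the original name was lost
map.world <- map_data(map = "world")
cols <- brewer.pal(n=7, "PiYG")
# read students data from PISA 2012
# directly from URL (the address is missing from this copy of the post)
con <- url("")
# assumption: the file is an .rda that defines the student2012 data frame
load(con)
prof.scores <- c(0, 358, 420, 482, 545, 607, 669, 1000)
prof.levels <- cut(student2012$PV1MATH, prof.scores, paste("level", 1:7))
plotCountry <- function(cntname = "Poland", cntname2 = cntname) {
  # weighted share of students at each proficiency level
  props <- prop.table(tapply(student2012$W_FSTUWT[student2012$CNT == cntname],
         prof.levels[student2012$CNT == cntname],
         sum))
  # about 5000 dots in total, ordered from the lowest to the highest level
  cntlevels <- rep(1:7, times=round(props*5000))
  cntcontour <- map.world[map.world$region == cntname2,]
  # keep the polygon with the most vertices (usually the mainland)
  cntcontour <- cntcontour[cntcontour$group == names(which.max(table(cntcontour$group))), ]
  wspx <- range(cntcontour[,1])
  wspy <- range(cntcontour[,2])
  N <- length(cntlevels)
  px <- runif(N) * diff(wspx) + wspx[1]
  # sorted latitudes, so that low levels end up at the bottom
  py <- sort(runif(N) * diff(wspy) + wspy[1])
  # keep only the points that fall strictly inside the country outline
  sel <- which(point.in.polygon(px, py, cntcontour[,1], cntcontour[,2], mode.checked=FALSE) == 1)
  df <- data.frame(long = px[sel], lat = py[sel], level=cntlevels[sel])
  par(pty="s", mar=c(0,0,4,0))
  plot(df$long, df$lat, col=cols[df$level], pch=19, cex=3,
       bty="n", xaxt="n", yaxt="n", xlab="", ylab="")
}
# PISA and World maps are using different country names,
# thus in some cases we need to give two names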
plotCountry(cntname = "Korea", cntname2 = "South Korea")
plotCountry(cntname = "Japan", cntname2 = "Japan")
plotCountry(cntname = "Finland")
plotCountry(cntname = "Poland")
plotCountry(cntname = "France", cntname2 = "France")
plotCountry(cntname = "Italy", cntname2 = "Italy")
plotCountry(cntname = "United States of America", cntname2 = "USA")
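
Because the dots are assigned to levels in order of their sorted latitude before the country outline is applied, the height of each band is proportional to the fraction of students at that level, even though its area is not. The sketch below checks this on made-up numbers and computes where horizontal lines could be drawn. The fractions in props and the latitude range in wspy are hypothetical, and the variable cuts is my own name, not part of the original code.

```r
# Hypothetical level fractions and latitude range (illustration only)
props <- c(0.05, 0.10, 0.20, 0.25, 0.20, 0.15, 0.05)
wspy  <- c(49, 55)

set.seed(1)
cntlevels <- rep(1:7, times = round(props * 5000))
N  <- length(cntlevels)
py <- sort(runif(N) * diff(wspy) + wspy[1])

# Expected band boundaries: cumulative fractions rescaled to the latitude range
cuts <- wspy[1] + cumsum(props)[-length(props)] * diff(wspy)

# Observed boundaries: the highest dot within each of the six lower levels
observed <- as.numeric(tapply(py, cntlevels, max)[-7])

print(round(cbind(expected = cuts, observed = observed), 2))
```

Inside plotCountry, a call such as abline(h = cuts, lty = 2) placed after plot() would be one way to draw these lines, assuming cuts is computed from that country's props and wspy.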
