(This article was first published on **SmarterPoland » PISA in english**, and kindly contributed to R-bloggers)

There are 16 days left to submit entries to the DataVis contest at useR!2014 (see the contest webpage).

The contest focuses on PISA data and students' skills. The main variables that reflect pupils' skills in math, reading and science are the plausible values, e.g. the columns PV1MATH, PV1READ and PV1SCIE in the dataset.
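As a minimal sketch of how a plausible value is usually summarised per country, the snippet below computes a weighted mean of PV1MATH using the student weight W_FSTUWT (the same weight the plotting code in this post uses). It runs on a small made-up data frame, `toy`, which is hypothetical and only mimics the column names of `student2012`, so nothing has to be downloaded. Like the post, it uses only the first plausible value; PISA's own methodology combines all plausible values (PV1MATH to PV5MATH in PISA 2012).

```r
# Hypothetical stand-in for student2012: all values are made up
toy <- data.frame(
  CNT      = c("A", "A", "A", "B", "B"),
  PV1MATH  = c(480, 520, 610, 450, 500),
  W_FSTUWT = c(10, 20, 10, 5, 15)
)

# Weighted mean of the first plausible value within each country,
# weighting every student by the final student weight W_FSTUWT
weighted_means <- sapply(split(toy, toy$CNT), function(d)
  weighted.mean(d$PV1MATH, d$W_FSTUWT))
weighted_means
```

For this toy data the result is 532.5 for country A and 487.5 for country B.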

However, these values are scaled to have mean 500 and standard deviation 100, so it is not easy to understand what a score of 600 means, or whether a 12-point difference in averages is large. To overcome this, PISA has introduced seven proficiency levels (from 0 to 6, see http://nces.ed.gov/pubs2014/2014024_tables.pdf) that are based on the plausible values; for mathematics the cutoffs are 358, 420, 482, 545, 607 and 669.
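Mapping scores to these levels is a single call to base R's `cut()`. The sketch below uses the mathematics cutoffs quoted above; the outer bounds 0 and 1000 are assumptions chosen to be wide enough for every score, and the example scores are made up. With the default right-closed intervals, a score exactly on a cutoff (e.g. 420) falls into the lower level.

```r
# Mathematics cutoffs from the text, plus assumed outer bounds 0 and 1000
prof.scores <- c(0, 358, 420, 482, 545, 607, 669, 1000)
level.labels <- paste("level", 0:6)

# Made-up example scores
example.scores <- c(300, 450, 600, 700)

# cut() builds intervals (0, 358], (358, 420], ..., (669, 1000]
example.levels <- cut(example.scores, breaks = prof.scores,
                      labels = level.labels)
as.character(example.levels)
# "level 0" "level 2" "level 4" "level 6"
```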

For example, it is assumed that at level 6 "students can conceptualize, generalize, and utilize information based on their investigations and modeling of complex problem situations, and can use their knowledge in relatively non-standard contexts".

So, instead of looking at means, we can look at the fraction of students at each proficiency level. To have some fun, we use the **sp**, **rworldmap** and **RColorBrewer** packages to draw country shapes instead of bars, filled with dots that represent the pupils who took part in the study. Each level occupies a horizontal band whose height is proportional to the fraction of students at that level. The downside is that the area of a band inside the country outline does not reflect that fraction, which might be confusing. We add horizontal lines to expose the band heights.
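The fractions and band heights described above can be sketched on their own. This is a minimal example with made-up student weights and level indices for a single hypothetical country, plus an assumed latitude range standing in for the country's bounding box. Weights are summed within each level (empty levels come back from `tapply()` as `NA` and are set to 0), normalised with `prop.table()`, and the cumulative fractions give the latitudes where horizontal lines separate the bands.

```r
# Made-up weights and proficiency level indices (1 = level 0, ..., 7 = level 6)
student.w  <- c(10, 10, 20, 30, 20, 10)
levels.idx <- factor(c(1, 2, 3, 3, 4, 5), levels = 1:7)

# Weighted number of students per level; levels with no students give NA
w <- tapply(student.w, levels.idx, sum)
w[is.na(w)] <- 0
props <- prop.table(w)

# If dots are spread uniformly over the bounding box and sorted by latitude,
# level k fills a horizontal band whose height is proportional to props[k]
lat.range  <- c(49, 55)   # assumed bounding box, degrees of latitude
band.edges <- lat.range[1] + cumsum(props) * diff(lat.range)

# The inner edges are where the horizontal guide lines go
head(band.edges, -1)
```

Here the weighted fractions are 0.1, 0.1, 0.5, 0.2, 0.1, 0 and 0, so the third level fills half of the bounding box height even though its share of the country's area depends on the outline.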

And here is the R code:

```r
library(ggplot2)      # map_data(); requires the 'maps' package to be installed
library(sp)           # point.in.polygon()
library(rworldmap)
library(RColorBrewer) # brewer.pal()

map.world <- map_data(map = "world")
cols <- brewer.pal(n = 7, "PiYG")

# read students data from PISA 2012
# directly from URL
con <- url("http://beta.icm.edu.pl/PISAcontest/data/student2012.rda")
load(con)

# mathematics cutoffs; 0 and 1000 close the outer intervals
prof.scores <- c(0, 358, 420, 482, 545, 607, 669, 1000)
prof.levels <- cut(student2012$PV1MATH, prof.scores, paste("level", 0:6))

plotCountry <- function(cntname = "Poland", cntname2 = cntname) {
  # weighted fraction of students at each proficiency level
  inCnt <- student2012$CNT == cntname
  w <- tapply(student2012$W_FSTUWT[inCnt], prof.levels[inCnt], sum)
  w[is.na(w)] <- 0   # levels with no students in this country
  props <- prop.table(w)
  # about 5000 dots, ordered from the lowest to the highest level
  cntlevels <- rep(1:7, times = round(props * 5000))

  # keep only the largest polygon of the country (e.g. the mainland)
  cntcontour <- map.world[map.world$region == cntname2, ]
  cntcontour <- cntcontour[cntcontour$group ==
                             names(which.max(table(cntcontour$group))), ]
  wspx <- range(cntcontour$long)
  wspy <- range(cntcontour$lat)

  # spread dots uniformly over the bounding box; sorting the latitudes
  # stacks the levels in horizontal bands whose heights follow props
  N <- length(cntlevels)
  px <- runif(N) * diff(wspx) + wspx[1]
  py <- sort(runif(N) * diff(wspy) + wspy[1])
  # keep only the dots strictly inside the country outline
  sel <- which(point.in.polygon(px, py, cntcontour$long, cntcontour$lat,
                                mode.checked = FALSE) == 1)
  df <- data.frame(long = px[sel], lat = py[sel], level = cntlevels[sel])

  par(pty = "s", mar = c(0, 0, 4, 0))
  plot(df$long, df$lat, col = cols[df$level], pch = 19, cex = 3,
       bty = "n", xaxt = "n", yaxt = "n", xlab = "", ylab = "",
       main = cntname)
  # horizontal lines at the band boundaries expose the band heights
  abline(h = head(wspy[1] + cumsum(props) * diff(wspy), -1), lty = 2)
}

par(mfrow = c(1, 7))
# PISA and world maps use different country names,
# so in some cases we need to give two names
plotCountry(cntname = "Korea", cntname2 = "South Korea")
plotCountry(cntname = "Japan", cntname2 = "Japan")
plotCountry(cntname = "Finland")
plotCountry(cntname = "Poland")
plotCountry(cntname = "France", cntname2 = "France")
plotCountry(cntname = "Italy", cntname2 = "Italy")
plotCountry(cntname = "United States of America", cntname2 = "USA")
```
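To try the dot-stacking idea without downloading the data or installing **sp**, here is an offline sketch in base R. It swaps in a hypothetical helper, `inside_polygon()`, that implements the standard even-odd (ray casting) rule as a stand-in for `sp::point.in.polygon()`; the triangular "country" and the level fractions are made up. The structure otherwise follows `plotCountry()`: repeat level indices, scatter points over the bounding box, sort the latitudes, and keep the points inside the outline.

```r
# Hypothetical stand-in for sp::point.in.polygon(): even-odd ray casting.
# Returns TRUE for points inside the polygon with vertices (vx, vy).
inside_polygon <- function(px, py, vx, vy) {
  n <- length(vx)
  inside <- rep(FALSE, length(px))
  j <- n
  for (i in seq_len(n)) {
    # does the horizontal ray to the right of each point cross edge j -> i?
    crosses <- ((vy[i] > py) != (vy[j] > py)) &
      (px < (vx[j] - vx[i]) * (py - vy[i]) / (vy[j] - vy[i]) + vx[i])
    inside <- xor(inside, crosses)   # each crossing flips inside/outside
    j <- i
  }
  inside
}

set.seed(1)
# Made-up "country" outline (a triangle) and level fractions
vx <- c(0, 10, 5)
vy <- c(0, 0, 8)
props <- c(0.1, 0.2, 0.3, 0.2, 0.1, 0.05, 0.05)

cntlevels <- rep(1:7, times = round(props * 2000))
N <- length(cntlevels)
px <- runif(N, min(vx), max(vx))
py <- sort(runif(N, min(vy), max(vy)))   # sorting stacks levels bottom to top
sel <- inside_polygon(px, py, vx, vy)
dots <- data.frame(long = px[sel], lat = py[sel], level = cntlevels[sel])
```

The resulting `dots` can be drawn exactly as in `plotCountry()`, e.g. `plot(dots$long, dots$lat, col = cols[dots$level], pch = 19)`. In practice the tested `sp::point.in.polygon()` is the better choice; the helper only shows what that step does.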
