
analyze the program for international student assessment (pisa) with r and monetdb

[This article was first published on asdfree by anthony damico, and kindly contributed to R-bloggers.]
the authoritative source for evaluating educational achievement across nations, the program(me) for international student assessment ranks the math, science, and reading skills of 15-year-olds in more than sixty countries.  coordinated by the organisation for economic co-operation and development (oecd) and released every three years, this data set gives finland reason to gloat and anti-poverty advocates in the united states reason to fight.  participating countries must sample at least 5,000 teenagers, though some governments survey many more in order to provide education researchers with enough of a sample to perform within-country comparisons.  in the world of cross-border standardized testing, this is the big momma.

to understand what’s possible with pisa, either visit the international products page or – if you only care about one country – start on the participating economies page and click through to the country-specific website (so here’s america’s).

instead of processing the pisa microdata line-by-line, the r language stoically attempts to read everything into memory at once.  to avoid the unpleasantness of a seized-up computer, dr. lumley wrote the entire sqlsurvey package (to deal with this monster), and i tweaked, pruned, and manicured that code to work on multiply-imputed big survey data.  if you’re already familiar with syntax used for the survey package, be patient and read my sqlsurvey examples carefully when something doesn’t behave as you expect it to: some sqlsurvey commands require a different structure (e.g. svyby gets called through svymean) and others might not exist anytime soon (like svyolr).  gimme some good news: sqlsurvey uses ultra-fast monetdb (click here for speed tests), so follow the monetdb installation instructions before running my code.  monetdb imports, writes, and recodes data slowly, but reads it hyper-fast.  a magnificent trade-off: data exploration typically requires you to think, send an analysis command, think some more, send another query, repeat.  importation scripts (especially the ones i’ve already written for you) can be left running overnight sans hand-holding.
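to make the memory point concrete, here is a minimal base-r sketch of the line-by-line alternative: stream a file in small chunks and keep only running totals.  it is not how sqlsurvey or monetdb work internally, and the toy file, the `score` column, and the `chunked_mean` helper are all made up for illustration.

```r
# write a tiny toy file so the sketch is self-contained (hypothetical data, not pisa)
toy_file <- tempfile(fileext = ".csv")
writeLines(c("id,score", paste(1:10, seq(10, 100, by = 10), sep = ",")), toy_file)

# hypothetical helper: read a csv a few lines at a time instead of all at once
chunked_mean <- function(path, chunk_size = 3) {
  con <- file(path, open = "r")
  on.exit(close(con))

  # the first line holds the column names
  header <- strsplit(readLines(con, n = 1), ",")[[1]]

  total <- 0
  n <- 0
  repeat {
    lines <- readLines(con, n = chunk_size)
    if (length(lines) == 0) break  # nothing left to read

    # parse only this chunk; earlier chunks are already discarded
    chunk <- read.csv(text = lines, header = FALSE, col.names = header)
    total <- total + sum(chunk$score)
    n <- n + nrow(chunk)
  }

  total / n
}

chunked_mean(toy_file)
```

only one chunk lives in memory at a time, which is the trade that a database-backed design makes for you automatically.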

pisa is a pita to analyze, because it’s both multiply-imputed (like the survey of consumer finances) and big data (like the american community survey).  to help researchers deal with that complexity, the twentieth-century-dwelling statisticians at oecd wrote sas macros and spss functions as part of their analysis manual.  well guess what?  those languages are prohibitively expensive, so i’ve done gone and translated everything over to the r language, precisely reproducing their published results, then automating the download and importation into everybody’s favorite monetdb.  say buh-bye to buying proprietary statistical software.  this new github repository contains four scripts:


download import and design.R

analysis examples.R

variable recode example.R

replicate oecd publications.R



click here to view these four scripts
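as for the multiply-imputed part mentioned above: the usual approach is to run the same analysis once per imputation and then pool the results with rubin's combining rules.  here is a minimal base-r sketch of that pooling step, assuming you already have one estimate and one sampling variance per imputation.  the `combine_mi` function and the numbers fed into it are hypothetical, not taken from the scripts or from any pisa publication.

```r
# hypothetical helper: pool m per-imputation estimates with rubin's rules
combine_mi <- function(estimates, variances) {
  stopifnot(length(estimates) == length(variances), length(estimates) > 1)
  m <- length(estimates)

  point   <- mean(estimates)   # pooled point estimate
  within  <- mean(variances)   # average sampling variance
  between <- var(estimates)    # variance across imputations (divides by m - 1)

  # total variance = within + (1 + 1/m) * between
  total <- within + (1 + 1 / m) * between

  list(estimate = point, se = sqrt(total))
}

# made-up means and sampling variances for five imputations
pv_means <- c(500, 502, 498, 501, 499)
pv_vars  <- c(4, 4, 4, 4, 4)

res <- combine_mi(pv_means, pv_vars)
res$estimate
res$se
```

in real work you would not hand-roll this; the mitools package's `MIcombine` applies the same rules to fitted objects.  the sketch only shows why the pooled standard error comes out larger than any single imputation's.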



for more detail about the program for international student assessment (pisa), visit:

if you’re just looking for a couple of data points, you ought to give the australian council for educational research’s interactive data selection tools a spin.  it’s a menu-driven table creator, so it’s easy to use but inflexible.

you wouldn’t be analyzing the program for international student assessment right now without the work of not one but two dr. thomas lumleys (or, in latin, lumlii).  if you decide to hand-write a thank-you letter for all of their hard work using jefferson’s polygraph, you won’t even need to switch out the paper to fill in specific names.  just another example of the unparalleled efficiencies you’ll find when working in the r language with monetdb.

confidential to sas, spss, stata, and sudaan users: you are kissing the wrong frogs.  time to transition to r.  😀
