Gender gap and visualisation challenge @ useR!2014

[This article was first published on SmarterPoland » PISA in english, and kindly contributed to R-bloggers].

Seven days remain to submit to the DataVis contest at useR!2014 (see the contest webpage).
Note that the contest is open to all R users, not only conference participants.
Submit your solution soon!

The PISA dataset lets us challenge some "common opinions", such as whether boys or girls are better at math or reading. But how should we compare the differences between genders? Averages are not fancy enough ;-) Let's use weighted quantiles calculated with the wtd.quantile() function from the Hmisc package.
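To see why the weights matter, here is a minimal base-R sketch of a weighted quantile. It is a simplification for illustration only: the helper weightedQuantile() and the toy data are hypothetical, and the rule used (the smallest score whose cumulative weight share reaches p) is not the interpolating estimator that wtd.quantile() uses by default.

```r
# Hypothetical toy data: five scores with survey weights (not PISA values).
scores  <- c(400, 450, 500, 550, 600)
weights <- c(1, 1, 1, 1, 6)

# Simplified weighted quantile: the smallest score whose cumulative
# weight share reaches p (the inverse of the weighted ECDF).
weightedQuantile <- function(x, w, probs) {
  ord <- order(x)
  x <- x[ord]
  share <- cumsum(w[ord]) / sum(w)
  sapply(probs, function(p) x[which(share >= p)[1]])
}

unweighted_median <- median(scores)                         # ignores weights
weighted_median   <- weightedQuantile(scores, weights, 0.5) # uses weights
```

Because the student scoring 600 stands for six students in the population, the weighted median moves to 600 while the unweighted median stays at 500.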

Below we present quantile-quantile plots for results in math and reading. Each point represents a single centile: the X coordinate is that centile of the boys' scores and the Y coordinate is the same centile of the girls' scores. The black line is the diagonal; points on this line correspond to centiles that are equal in both distributions.
Points below the black line correspond to centiles at which boys do better than girls. Points above the black line correspond to centiles at which girls do better. For reading the story is simple: girls do better at every centile, so the whole distribution is shifted. For math, however, the story differs from country to country.

[UK – the distribution of math scores for boys is shifted relative to that for girls]

[Finland – the distribution of math scores for boys is wider: weak boys are weaker than girls, but strong ones are stronger than girls at the corresponding centile]

[USA – somewhere between Finland and UK]
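The two patterns named in the captions, a shifted distribution and a wider one, can be told apart by looking at the sign of the gap at each centile. The sketch below uses made-up normal distributions (means, standard deviations and variable names are assumptions for illustration, not PISA estimates).

```r
# Centiles 1..99 (0% and 100% are skipped because qnorm() is infinite there).
probs <- seq(0.01, 0.99, by = 0.01)

# Hypothetical score distributions, not PISA estimates.
boys         <- qnorm(probs, mean = 500, sd = 100)
girlsShifted <- boys + 30                          # same shape, shifted up
girlsNarrow  <- qnorm(probs, mean = 500, sd = 80)  # same centre, narrower

# Positive gap = point above the diagonal (girls ahead at that centile).
gapShifted <- girlsShifted - boys
gapNarrow  <- girlsNarrow - boys

all(gapShifted > 0)         # TRUE: every point lies above the diagonal
sign(gapNarrow[c(10, 90)])  # 1 -1: girls ahead at the 10th centile, boys at the 90th
```

A pure shift keeps every point on one side of the diagonal, while a difference in spread makes the points cross it.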

What is the story for your country?

And the R code.

library(Hmisc)    # wtd.quantile()
library(ggplot2)  # plotting

# read students data from PISA 2012
# directly from URL
con <- url("http://beta.icm.edu.pl/PISAcontest/data/student2012.rda")
load(con)
 
# plot male vs. female weighted centiles of math and reading scores for a country
plotCountry <- function(cnt) {
  cutoffs <- seq(0, 1, 0.01)
  selected <- student2012[student2012$CNT == cnt, ]
  # weighted centiles of one score column for one gender
  # (which() drops students whose gender is missing)
  getQuants <- function(score, gender) {
    selectedG <- selected[which(selected$ST04Q01 == gender), ]
    wtd.quantile(selectedG[[score]], weights = selectedG$W_FSTUWT, probs = cutoffs)
  }
  # one row per centile, with male and female scores side by side
  getSubject <- function(score, subject) {
    data.frame(cutoffs,
               Male = getQuants(score, "Male"),
               Female = getQuants(score, "Female"),
               subject = subject)
  }
  df <- rbind(getSubject("PV1MATH", "MATH"), getSubject("PV1READ", "READ"))
  # points outside the 350-650 range are not drawn
  ggplot(df, aes(x = Male, y = Female, col = subject, shape = subject)) +
    geom_point() +
    xlim(350, 650) + ylim(350, 650) +
    geom_abline(intercept = 0, slope = 1) + ggtitle(cnt)
}
 
# plot results for selected countries and save each plot as a PNG file
for (cnt in c("Canada", "United Kingdom", "United States of America", "Finland"))
  ggsave(filename = paste0("Quant", cnt, ".png"), plot = plotCountry(cnt),
         width = 600/72, height = 600/72)

