I’m on spring break, and yesterday I took some time to check off some items on my to-do list, namely:
- Start getting acquainted with all the new features of ggplot2 [PDF].
- Get a handle on dealing with geographic data in R.
I’m not sure what I expected to see, which certainly weakens any conclusions I’d like to draw, but I am surprised at how little geographic patterning there is. I’m also almost certain that there are some data reporting problems. For example, that huge dark blue dot in the Northeast is Northeast High School, which reports that of their 652 graduates, 0 went on to any postsecondary education. I just don’t think that can be true, and not because I’m an idealist. Northeast is right down the street from where I grew up, and while it’s not a fancy prep school by any means, it has both a Magnet program and an International Baccalaureate program.
There’s no way that zero students from Northeast went on to postsecondary education, a category which includes non-degree granting programs and specialized training programs. It’s a lot more likely that they either didn’t report the numbers, or the Pennsylvania Department of Education lost them, and then didn’t distinguish between missing data and 0. Unfortunately, that calls all schools with reports of 0% postsecondary education into question, even though some schools probably did have 0 students go on to further education.
Looking at the distribution of the proportion of graduates going on to postsecondary education, the numbers are hugely bimodal (at least for the public schools).
Even after excluding the schools which reported 0 students going on to postsecondary education, there are still 3 schools with basically 0 students getting further education out of high school: Frankford (1/341), West Philly (1/208) and University City (2/205).
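To make those fractions concrete, here is a minimal base R sketch that uses only the four counts quoted in this post. The data frame and its column names (`graduates`, `postsec`, `prop`) are made up for illustration and are not taken from the actual analysis code.

```r
# Counts quoted in this post; column names are hypothetical.
schools <- data.frame(
  school    = c("Northeast", "Frankford", "West Philly", "University City"),
  graduates = c(652, 341, 208, 205),
  postsec   = c(0, 1, 1, 2)
)

# Proportion of graduates going on to postsecondary education
schools$prop <- schools$postsec / schools$graduates

# Drop the exact zeros, which may really be missing data
nonzero <- subset(schools, prop > 0)

# A stricter cutoff: keep only schools at or above 1%
kept <- subset(schools, prop >= 0.01)

# Percentages for the three non-zero schools
print(round(100 * nonzero$prop, 2))  # 0.29 0.48 0.98

# Number of these four schools that survive the 1% cutoff
print(nrow(kept))  # 0
```

All three non-zero schools come in under 1%, so a 1% cutoff removes them along with the exact zeros.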
Excluding the schools which reported less than 1% of students going on to further education (assuming either that they have faulty data, or have acute problems of other sorts), I replotted the map (note that the colors now run from 50% to 100%).
Still no huge geographic patterns.
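As a rough illustration of restricting the color range to 50% through 100% (not the code used for the maps here), a ggplot2 sketch might look like the following. The function name `plot_school_map` and the columns `lon`, `lat`, `prop`, and `graduates` are assumptions, as is the choice to map point size to the number of graduates.

```r
library(ggplot2)

# Hypothetical helper: expects one row per school with columns
# lon, lat (coordinates), prop (0 to 1), and graduates (count).
plot_school_map <- function(schools) {
  ggplot(schools, aes(x = lon, y = lat, colour = prop, size = graduates)) +
    geom_point() +
    # Restrict the colour gradient to the 50%-100% range
    scale_colour_gradient(limits = c(0.5, 1)) +
    # Approximate map projection so distances are not distorted
    coord_quickmap()
}
```

With `limits` set this way, any school whose proportion falls outside the range is drawn in the scale's missing-value color (grey by default) rather than being placed on the gradient, so it is worth filtering the data first.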
Here’s the R code that I used (including links to the data).