I’m on spring break, and yesterday I took some time to check off some items on my to-do list, namely:
- Start getting acquainted with all the new features of ggplot2 [PDF].
- Get a handle on dealing with geographic data in R.
I’ve done some tentative geographic analysis using R before [PDF], but the code behind it was very hacky. There is a whole field of geospatial data analysis out there that I was really ignorant of, and mostly still am, but I’ve made a little bit of progress.
I mostly followed the tutorial laid out here for making maps in ggplot2. The most difficult part was getting the rgdal package installed. It’s one of those packages that relies on other, non-R libraries being installed. I managed to get GDAL and Proj.4 installed (even though I honestly don’t know what they do), and then got rgdal installed (I had to work around an apparently non-standard installation location for Proj.4).
Now, it’s all about getting some good data, and fortunately, I stumbled across opendataphilly.org yesterday as well! I found a shapefile of all schools in Philadelphia, and a separate data set about how many public and charter high school graduates in 2010 went on to postsecondary education of various sorts. Unfortunately, there weren’t any shared IDs of any sort between the two data sets, so to join them I had to hack it by hand, mostly.
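Here is a minimal sketch of what a name-based join like this can look like in base R. It is not the code I actually used: the data frames, column names, and the `normalize_name` helper are all hypothetical stand-ins, and only the Frankford and West Philly counts come from the numbers quoted in this post. The idea is to strip the spelling differences that can be fixed mechanically, then patch the leftovers by hand.

```r
# Hypothetical stand-in for the attribute table of the schools shapefile.
schools <- data.frame(
  SCHOOL_NAME = c("NORTHEAST HS", "Frankford High School",
                  "CENTRAL HIGH SCHOOL", "WEST PHILADELPHIA HS"),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for the graduates data set.
# Frankford and West Philly counts are from the post; the rest are made up.
grads <- data.frame(
  school        = c("Northeast High School", "Frankford HS",
                    "Central HS", "West Philly High School"),
  graduates     = c(700, 341, 500, 208),
  postsecondary = c(0, 1, 450, 1),
  stringsAsFactors = FALSE
)

# Reduce a school name to a comparable key: upper-case, drop punctuation,
# drop the "HIGH SCHOOL" / "HS" suffix, and collapse whitespace.
normalize_name <- function(x) {
  x <- toupper(x)
  x <- gsub("[[:punct:]]", " ", x, perl = TRUE)
  x <- gsub("\\b(HIGH SCHOOL|HS)\\b", "", x, perl = TRUE)
  x <- gsub("\\s+", " ", x, perl = TRUE)
  trimws(x)
}

schools$key <- normalize_name(schools$SCHOOL_NAME)
grads$key   <- normalize_name(grads$school)

# Keys that still fail to match need to be fixed by hand.
unmatched_before <- setdiff(schools$key, grads$key)

# Hand-written overrides: graduates-data key -> shapefile key.
manual <- c("WEST PHILLY" = "WEST PHILADELPHIA")
hit <- grads$key %in% names(manual)
grads$key[hit] <- unname(manual[grads$key[hit]])

# Left join, so schools without a match stay in the table with NA counts.
joined <- merge(schools, grads, by = "key", all.x = TRUE)
```

Checking `unmatched_before` (and any rows of `joined` with `NA` counts) before trusting the result is the important part, since a join on cleaned-up names can fail silently.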
So, here is the result.
There’s no way that zero students from Northeast went on to postsecondary education, a category which includes non-degree granting programs and specialized training programs. It’s a lot more likely that either the school didn’t report the numbers or the Pennsylvania Department of Education lost them, and that the data set then didn’t distinguish between missing data and 0. Unfortunately, that calls all schools with reports of 0% postsecondary education into question, even though some schools probably did have 0 students go on to further education.
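One way to keep suspicious zeros from leaking into summaries is to flag them and treat the proportion as missing. The sketch below uses made-up schools and counts, and it assumes every reported 0 is suspect, which (as noted above) is probably not true for all of them.

```r
# Made-up counts for illustration only.
grads <- data.frame(
  school        = c("School A", "School B", "School C"),
  graduates     = c(300, 250, 120),
  postsecondary = c(0, 180, 0),
  stringsAsFactors = FALSE
)

# Flag reported zeros instead of silently dropping the rows.
grads$zero_reported <- grads$postsecondary == 0

# Proportion going on to postsecondary education; NA where the 0 is suspect.
grads$prop <- ifelse(grads$zero_reported,
                     NA,
                     grads$postsecondary / grads$graduates)

# Summaries then have to say explicitly how to handle the missing values.
mean_prop <- mean(grads$prop, na.rm = TRUE)
```

Keeping the `zero_reported` flag around means the zeros can still be counted or plotted separately later.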
Looking at the distribution of the proportion of graduates going on to postsecondary education, the numbers are hugely bimodal (at least for the public schools).
Even after excluding the schools which reported 0 students going on to postsecondary education, there are still 3 schools with basically 0 students getting further education out of high school: Frankford (1/341), West Philly (1/208) and University City (2/205).
Excluding the schools which reported fewer than 1% of students going on to further education (assuming either that they have faulty data, or have acute problems of other sorts), I replotted the map (note that the colors now run from 50% to 100%).
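As a rough sketch of that cutoff, here are the three near-zero schools from above run through a 1% filter. The counts for those three are the ones quoted in this post; the fourth row and all the column names are hypothetical, and this is not necessarily how my actual code does it.

```r
grads <- data.frame(
  school        = c("Frankford", "West Philly", "University City",
                    "Hypothetical HS"),   # last row is made up
  graduates     = c(341, 208, 205, 200),
  postsecondary = c(1, 1, 2, 150),
  stringsAsFactors = FALSE
)

# Proportion of graduates going on to postsecondary education.
grads$prop <- grads$postsecondary / grads$graduates

# As percentages: all three real schools fall below 1%.
pct <- round(100 * grads$prop, 2)

# Keep only schools at or above the 1% cutoff.
kept <- grads[grads$prop >= 0.01, ]
```

In ggplot2, the range of a continuous colour scale can also be pinned explicitly through the scale's `limits` argument (for example `limits = c(0.5, 1)`), rather than letting it follow the data.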
Still no huge geographic patterns.
Here’s the R code that I used (including links to the data).