Philadelphia Schools

[This article was first published on Val Systems, and kindly contributed to R-bloggers.]

I’m on spring break, and yesterday I took some time to check off some items on my to-do list, namely:
  1. Start getting acquainted with all the new features of ggplot2 [PDF].
  2. Get a handle on dealing with geographic data in R.
I’ve done some furtive geographic analysis using R [pdf], but the code behind it was very hacky. There is a whole field of geospatial data analysis out there that I was really ignorant of, and still am, but I’ve made a little bit of progress.

I mostly followed the tutorial laid out here for making maps in ggplot2. The most difficult part was getting the rgdal package installed. It’s one of those packages that relies on other, non-R libraries being installed. I managed to get GDAL and Proj.4 installed (even though I honestly don’t know what they do), and got rgdal installed (I had to work around an apparently non-standard installation location for Proj.4).

Now, it’s all about getting some good data, and fortunately, I stumbled across some yesterday as well! I found a shapefile of all schools in Philadelphia, and a separate data set about how many public and charter high school graduates in 2010 went on to postsecondary education of various sorts. Unfortunately, there weren’t any shared IDs of any sort between the two data sets, so to join them I mostly had to hack it by hand.
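A join like this can be partly automated before falling back to hand edits. The snippet below is a minimal sketch in base R, not the code I actually used: the data frames, the column names (`shape_name`, `shape_id`, `report_name`) and the `normalize_name()` helper are all made up for illustration. The idea is to build a crude shared key from the school names, merge on it, and then list whatever failed to match so it can be fixed by hand.

```r
# Hypothetical stand-ins for the two data sets; the real column names differ.
schools <- data.frame(
  shape_name = c("NORTHEAST HS", "Frankford High School", "University City H.S."),
  shape_id   = 1:3,  # placeholder for the geometry in the shapefile
  stringsAsFactors = FALSE
)
outcomes <- data.frame(
  report_name = c("Northeast High", "Frankford HS", "West Philadelphia High School"),
  graduates   = c(652, 341, 208),
  stringsAsFactors = FALSE
)

# Build a crude join key: upper-case, strip punctuation and "high school" variants.
normalize_name <- function(x) {
  x <- toupper(x)
  x <- gsub("[[:punct:]]", "", x)
  x <- gsub("\\b(HIGH SCHOOL|HS|HIGH)\\b", "", x, perl = TRUE)
  x <- gsub("\\s+", " ", x)
  trimws(x)
}

schools$key  <- normalize_name(schools$shape_name)
outcomes$key <- normalize_name(outcomes$report_name)

# Full outer join, so rows that fail to match are kept rather than dropped.
merged <- merge(schools, outcomes, by = "key", all = TRUE)

# Rows missing either side are the ones left to fix by hand.
unmatched <- merged[is.na(merged$shape_id) | is.na(merged$graduates), ]
print(unmatched)
```

Keeping the unmatched rows visible (`all = TRUE`) matters here, because a plain inner join would silently drop every school whose name is spelled differently in the two files.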

So, here is the result.
I’m not sure what I expected to see, which certainly weakens any conclusions I’d like to draw, but I am surprised at how little geographic patterning there is. I’m also almost certain that there are some data reporting problems. For example, that huge dark blue dot in the Northeast is Northeast High School, which reports that of their 652 graduates, 0 went on to any postsecondary education. I just don’t think that can be true, and not because I’m an idealist. Northeast is right down the street from where I grew up, and while it’s not a fancy prep school by any means, it has both a Magnet program and an International Baccalaureate program.

There’s no way that zero students from Northeast went on to postsecondary education, a category which includes non-degree granting programs and specialized training programs. It’s a lot more likely that they either didn’t report the numbers, or the Pennsylvania Department of Education lost them, and then didn’t distinguish between missing data and 0. Unfortunately, that calls all schools with reports of 0% postsecondary education into question, even though some schools probably did have 0 students go on to further education.
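One way to keep that ambiguity from leaking into summaries is to recode the suspicious zeros as missing. This is only a sketch under the assumption that a reported 0 is more likely unreported than real; the data frame and its column names are hypothetical, and only the two counts (0 of 652 for Northeast, 1 of 341 for Frankford) come from the data.

```r
# Hypothetical layout; only the counts themselves come from the data.
reports <- data.frame(
  school        = c("Northeast", "Frankford"),
  graduates     = c(652, 341),
  postsecondary = c(0, 1)
)

# Assumption: a reported 0 is treated as "not reported", not as a true zero.
reports$postsecondary_clean <- ifelse(reports$postsecondary == 0,
                                      NA, reports$postsecondary)
reports$prop <- reports$postsecondary_clean / reports$graduates

# Summaries then have to opt in to skipping the missing values.
mean(reports$prop, na.rm = TRUE)
```

The cost of this choice is the one described above: any school that really did send 0 graduates on gets thrown out along with the bad reports.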

Looking at the distribution of the proportion of graduates going on to postsecondary education, the numbers are hugely bimodal (at least for the public schools).

Even after excluding the schools which reported 0 students going on to postsecondary education, there are still 3 schools with basically 0 students getting further education out of high school: Frankford (1/341), West Philly (1/208) and University City (2/205).

Excluding the schools which reported fewer than 1% of students going on to further education (assuming either that they have faulty data, or have acute problems of other sorts), I replotted the map (note that the colors now run from 50% to 100%).
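As a quick check that a 1% cutoff really does catch the three schools named above, here is a small sketch of the filter in base R. The data frame layout is hypothetical, and the "Example High" row is invented purely so that something survives the filter; the other three rows use the counts quoted earlier.

```r
grads <- data.frame(
  school        = c("Frankford", "West Philly", "University City", "Example High"),
  postsecondary = c(1, 1, 2, 150),   # last value is made up for illustration
  graduates     = c(341, 208, 205, 200)
)
grads$prop <- grads$postsecondary / grads$graduates

# 1/341, 1/208 and 2/205 all come out below 0.01.
round(100 * grads$prop, 2)

# Keep only schools at or above the 1% cutoff.
kept <- grads[grads$prop >= 0.01, ]
print(kept)
```

University City is the closest call at just under 1%, so the result is somewhat sensitive to where the cutoff is placed.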

Still no huge geographic patterns.

Here’s the R code that I used (including links to the data).
