Wednesday, March 7, 2012

Philadelphia Schools

I'm on spring break, and yesterday I took some time to check off some items on my to-do list, namely:
  1. Start getting acquainted with all the new features of ggplot2 [PDF].
  2. Get a handle on dealing with geographic data in R.
I've done some furtive geographic analysis using R [pdf], but the code behind it was very hacky. There is a whole field of geospatial data analysis out there that I was really ignorant of, and still am, but I've made a little bit of progress.

I mostly followed the tutorial laid out here for making maps in ggplot2. The most difficult part was getting the rgdal package installed. It's one of those packages that rely on other, non-R libraries being installed. I managed to get GDAL and Proj.4 installed (even though I honestly don't know what they do), and got rgdal installed (I had to work around an apparently non-standard installation location for Proj.4).

Now, it's all about getting some good data, and fortunately, I stumbled across some yesterday as well! I found a shapefile of all schools in Philadelphia, and a separate data set about how many public and charter high school graduates in 2010 went on to postsecondary education of various sorts. Unfortunately, there weren't any shared IDs of any sort between the two data sets, so to join them I mostly had to hack it by hand.
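
For what it's worth, a join like this can be partly automated by normalizing the school names and matching on edit distance. What follows is only a minimal sketch in base R, not the approach used for the map: the name spellings, the column names, the `normalize_name` helper, and the review threshold of 2 are all made up for illustration. Only the graduate counts come from the numbers quoted in this post.

```r
# Hypothetical name spellings; the real files are not reproduced here.
shapes <- data.frame(
  shape_name = c("NORTHEAST HS", "FRANKFORD HS", "UNIV. CITY HS"),
  stringsAsFactors = FALSE
)
grads <- data.frame(
  grad_name = c("Frankford High School", "Northeast High School",
                "University City High School"),
  graduates = c(341, 652, 205),
  stringsAsFactors = FALSE
)

# Hypothetical helper: lower-case, abbreviate "high school",
# drop punctuation, and collapse repeated spaces.
normalize_name <- function(x) {
  x <- tolower(x)
  x <- gsub("high school", "hs", x, fixed = TRUE)
  x <- gsub("[^a-z0-9 ]", "", x)
  x <- gsub(" +", " ", x)
  trimws(x)
}

# Edit distance between every pair of normalized names;
# for each shapefile row, take the closest name in the graduates table.
d <- adist(normalize_name(shapes$shape_name), normalize_name(grads$grad_name))
best <- apply(d, 1, which.min)

matches <- data.frame(
  shape_name = shapes$shape_name,
  grad_name  = grads$grad_name[best],
  distance   = d[cbind(seq_len(nrow(d)), best)],
  stringsAsFactors = FALSE
)

# Arbitrary threshold: anything that is not a near-exact match gets checked by hand.
matches$needs_review <- matches$distance > 2

joined <- merge(matches, grads, by = "grad_name")
print(joined)
```

The `needs_review` column is the important part: fuzzy matching only narrows down the hand work, and the closest name is not guaranteed to be the right school.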

So, here is the result.
I'm not sure what I expected to see, which certainly weakens any conclusions I'd like to draw, but I am surprised at how little geographic patterning there is. I'm also almost certain that there are some data reporting problems. For example, that huge dark blue dot in the Northeast is Northeast High School, which reports that of their 652 graduates, 0 went on to any postsecondary education. I just don't think that can be true, and not because I'm an idealist. Northeast is right down the street from where I grew up, and while it's not a fancy prep school by any means, it has both a Magnet program and an International Baccalaureate program.

There's no way that zero students from Northeast went on to postsecondary education, a category which includes non-degree granting programs and specialized training programs. It's a lot more likely that they either didn't report the numbers, or the Pennsylvania Department of Education lost them, and then didn't distinguish between missing data and 0. Unfortunately, that calls all schools with reports of 0% postsecondary education into question, even though some schools probably did have 0 students go on to further education.
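
One way to keep the ambiguous zeros from being read as real values is to recode them as missing before computing proportions. This is a minimal sketch under that assumption, not necessarily what the linked code does. The data frame and its column names are hypothetical, and the counts are the ones quoted in this post.

```r
# Counts quoted in this post; the column names are made up for illustration.
schools <- data.frame(
  school        = c("Northeast", "Frankford", "West Philly", "University City"),
  graduates     = c(652, 341, 208, 205),
  postsecondary = c(0, 1, 1, 2),
  stringsAsFactors = FALSE
)

# Assumption: a reported 0 cannot be told apart from a missing report,
# so treat it as NA rather than as a true zero.
schools$postsecondary[schools$postsecondary == 0] <- NA

# Proportions for schools with NA counts stay NA instead of becoming 0.
schools$prop <- schools$postsecondary / schools$graduates

# Summaries then have to opt in to skipping the missing schools.
mean(schools$prop, na.rm = TRUE)
```

The cost is the one described above: any school that really did have zero students continue is dropped along with the ones whose numbers were lost.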

Looking at the distribution of the proportion of graduates going on to postsecondary education, the numbers are hugely bimodal (at least for the public schools).

Even after excluding the schools which reported 0 students going on to postsecondary education, there are still 3 schools with basically 0 students getting further education out of high school: Frankford (1/341), West Philly (1/208) and University City (2/205).

Excluding the schools which reported fewer than 1% of students going on to further education (assuming either that they have faulty data, or have acute problems of other sorts), I replotted the map (note that the colors now run from 50% to 100%).
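
The 1% cutoff amounts to a one-line filter on the proportion. Here is a minimal sketch, again with hypothetical column names; "School A" and "School B" are invented rows added only so that something survives the filter.

```r
# The first three rows use counts quoted in this post;
# "School A" and "School B" are invented for illustration.
schools <- data.frame(
  school        = c("Frankford", "West Philly", "University City",
                    "School A", "School B"),
  graduates     = c(341, 208, 205, 200, 120),
  postsecondary = c(1, 1, 2, 150, 66),
  stringsAsFactors = FALSE
)
schools$prop <- schools$postsecondary / schools$graduates

# Keep only schools reporting at least 1% of graduates continuing.
kept <- subset(schools, prop >= 0.01)

# The range of what remains is what sets the ends of the color scale.
range(kept$prop)
```

Note that University City's 2/205 is just under 1%, so it is dropped along with Frankford and West Philly.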

Still no huge geographic patterns.

Here's the R code that I used (including links to the data).


  1. How did you do the workaround to get rgdal installed? I am having the same issue and can't figure it out. GDAL and Proj.4 are installed properly. Thanks

  2. Do you get the following error?

    Error: libproj.a not found.
    If the PROJ.4 library is installed in a non-standard location,
    use --configure-args='--with-proj-lib=/opt/local/lib' for example,
    replacing /opt/local/* with appropriate values for your installation.
    If PROJ.4 is not installed, install it.

    As it turns out, I thought I solved it, but I didn't. For some reason, I only successfully installed rgdal for the 64-bit R, and I cannot figure out how to install it for the 32-bit one. I'm on a Mac, and if you are too, launching and doing the regular install from source works. I tried my damnedest to install it for 32-bit R by specifying --configure-args='--with-proj-lib=/opt/local/lib', but it just would not work.

