Wednesday, February 29, 2012

Update on Inquirer Data

Well, I just got word that the Inquirer has decided to make its dataset on homicides in Philadelphia publicly available. Apparently they haven't settled on a general data policy, but this dataset is now accessible. You can find data on every reported homicide in Philadelphia between January 1, 1988 and December 31, 2011 here:

Monday, February 20, 2012

Inquirer, Inquirer, let down your data!

So, I discovered last night that the Philadelphia Inquirer has put together a Google Fusion table containing a record for every homicide in Philadelphia County since 1988. I've used homicide data compiled by the Inquirer before to estimate the risk of homicide faced by typical Philadelphia residents compared with UPenn affiliates. With 23 years of data, the possibilities for finding patterns are enormous. Homicide rates could be compared with economic indices, public policies, or even climate, and with this much time depth we could get some reliable results!

But the ability to export the data had been turned off by the owner of the Fusion table, by accident, I assumed. I wrote to them about it, and apparently it is the Inquirer's policy not to let anyone access the data! They're concerned that someone might alter the data and attribute it back to the Inquirer. Here's the message I sent them when I heard about this:

I am a student at Penn, which is part of why I'm interested in data generally. But my interest in this particular data has nothing to do with my academic pursuits. I'm merely a concerned and interested Philadelphian who also has some quantitative know-how.

I appreciate the sensitivity of the subject. In my own research, we spend a lot of time anonymizing interviews, and of course it was a big issue that some of the WikiLeaks data distributed by the NYT wasn't anonymized enough. However, is there precedent for altered data being hung around the neck of the original compiler? If there were a case or two, your unease would make more sense to me. As it is, though, since you are already maintaining the original data in a (relatively) publicly accessible way, it would be trivial for you, or anyone else, to demonstrate alteration or falsification of data attributed to the Inquirer.

The fact that you are only redistributing information that is already publicly available from the PPD makes allowing public access to your compiled version even less risky. Anyone could then turn to two sources to verify the accuracy of data that someone attributes to the Inquirer.

My interest in this data stems mostly from the fact that I'm a concerned Philadelphian with the skills needed to analyze a dataset like this. The Inquirer has done a great public service by compiling this data from the various PPD reports into a useful format. But so far it has gone only halfway, because the data is of little use for analysis when we can only look at the tables and cannot export them. I'm also strongly influenced by the open data movement within the research world. The best way to show confidence in your own research and analyses is to make the data openly available so that anyone can reproduce your results. Researchers who keep their data private are increasingly viewed with suspicion, and rightly so. The same goes for data journalism.

Moreover, there is a huge opportunity here for the Inquirer too. I am not the only person in Philadelphia who cares about data like this and knows how to analyze it. You have a forum in which to curate and display analyses and mashups contributed by your readers. The Guardian does something like this with its Data Blog, but frankly, the datasets it distributes are thin and uninteresting compared with what you could make available.

I hope you reconsider your data policy.

I'm frankly not too hopeful about a change of heart regarding making the data available. There are sure to be many more cases like this, of news organizations jumping onto the data journalism train without really understanding how it's supposed to work.