It’s been a busy week, with the Institutional Web Management Workshop last Monday to Wednesday in Sheffield and WordCamp UK happening in Manchester this weekend, but on Friday I took a day off to pop down to a hack day in Liverpool.
The event was hosted by LJMU’s Open Labs at the Art and Design Academy in partnership with ScraperWiki and Trinity Mirror Merseyside (think Liverpool Daily Post and Echo, Ormskirk Advertiser, Southport Visiter etc.). The idea was that hacks (journalists) meet hackers (coders, not to be confused with crackers who break into systems!) and spend the day working on datasets, with something to show for it by the end.
The basic format was to split into teams, each made up of a few hacks and a few hackers with an interest in a particular subject, put each team into a “booth” for 6 hours and see what happened. The group I was in focused on Liverpool datasets – think doctors’ surgeries, educational statistics etc.
I’d come across ScraperWiki before at Liver and Mash, and while there was no requirement to use their system, it has recently added support for PHP alongside Python, so I thought I’d give it a try. We found some data to scrape on the NHS website and set about building a scraper.
The chaps at ScraperWiki would be the first to admit that their support for PHP is still very much beta and so it was a little harder than I expected. Eventually I got it scraping a set of data and used Yahoo Pipes to add location data to allow it to be mapped. Here’s what it looks like on Google Earth alongside school and transport datasets:
Okay, so not terribly exciting but it was useful to have a go at ScraperWiki and get an idea of some of the things that can be done with it. You can find my scraper on the ScraperWiki site; the Pipe is also available.
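If you’re wondering what the scrape-then-map steps above boil down to, here’s a minimal sketch in Python (the other language ScraperWiki supports). To be clear, this is not my actual scraper, which was written in PHP on ScraperWiki itself: the sample HTML, the surgery names, the postcodes and the coordinates are all made up, and the `FAKE_GEOCODER` lookup is just a stand-in for the location step I did with Yahoo Pipes. It only uses the Python standard library and writes out KML, which is the format Google Earth reads.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape


class TableScraper(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows contain only <th>, so they end up empty
                self.rows.append(self._row)
            self._row = None


def scrape_rows(html):
    """Return a list of rows, each a list of cell strings."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    return parser.rows


def to_kml(places):
    """Build a KML document from (name, lat, lon) tuples."""
    placemarks = []
    for name, lat, lon in places:
        placemarks.append(
            "<Placemark><name>%s</name>"
            "<Point><coordinates>%s,%s</coordinates></Point></Placemark>"
            % (escape(name), lon, lat)  # KML wants longitude first, then latitude
        )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<kml xmlns="http://www.opengis.net/kml/2.2"><Document>'
        + "".join(placemarks)
        + "</Document></kml>"
    )


# Made-up sample page, for illustration only.
SAMPLE_HTML = """
<table>
  <tr><th>Surgery</th><th>Postcode</th></tr>
  <tr><td>Example Surgery A</td><td>L1 1AA</td></tr>
  <tr><td>Example Surgery B</td><td>L2 2BB</td></tr>
</table>
"""

# Hypothetical postcode lookup standing in for the geocoding step.
FAKE_GEOCODER = {"L1 1AA": (53.40, -2.98), "L2 2BB": (53.41, -2.99)}

rows = scrape_rows(SAMPLE_HTML)
places = [(name, *FAKE_GEOCODER[postcode]) for name, postcode in rows]
kml = to_kml(places)
print(kml)
```

Save the printed output as a `.kml` file and it should open in Google Earth as a couple of pins. A real scraper would fetch the page over HTTP and cope with much messier markup, which is where most of the six hours went!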
I think it was also very interesting to get journalists to meet coders. A few weeks ago I heard someone (possibly Alison Gow) say that you can’t get a job at the Guardian without talking about data, and that data is becoming an increasingly important part of journalism. No longer is it enough to simply report the news or spout opinion – being open about where your data comes from can be just as important. So it was really good to see Trinity Mirror taking this so seriously.
When I raised the idea of using DBpedia (and hence Wikipedia) data, someone in my team asked how reliable it was and whether it could be trusted. My response was to point out that most Wikipedia articles cite their sources, and to ask how many news stories do the same!
I’m getting off topic now so I’ll leave it there! ScraperWiki are running a series of Hack Days across the UK (and beyond) so if you’re interested, make sure you sign up!