Tag Archives: scraperwiki

Hacks meet Hackers

Francis Irving

It’s been a busy week, with the Institutional Web Management Workshop last Monday to Wednesday in Sheffield and WordCamp UK happening in Manchester this weekend, but on Friday I took a day off to pop down to a hack day in Liverpool.

The event was hosted by LJMU’s Open Labs at the Art and Design Academy in partnership with ScraperWiki and Trinity Mirror Merseyside (think Liverpool Daily Post and Echo, Ormskirk Advertiser, Southport Visiter etc etc!). The idea was that hacks (journalists) meet hackers (coders, not to be confused with crackers who break into systems!) for a day working on datasets to produce something at the end of the day.

The basic format was splitting into teams comprising a few hacks and a few hackers with an interest in a particular subject, being put into a “booth” for 6 hours and seeing what happened. The group I was in was focused around Liverpool datasets – think doctors’ surgeries, educational statistics etc.

I’ve come across ScraperWiki before at Liver and Mash and while there was no requirement to use their system, since it’s recently added support for PHP alongside Python I thought I’d give it a try. We found some data to scrape on the NHS website and set about building a scraper.

The chaps at ScraperWiki would be the first to admit that their support for PHP is still very much beta and so it was a little harder than I expected. Eventually I got it scraping a set of data and used Yahoo Pipes to add location data to allow it to be mapped. Here’s what it looks like on Google Earth alongside school and transport datasets:

Google Earth Three Layers

Okay, so not terribly exciting but it was useful to have a go at ScraperWiki and get an idea of some of the things that can be done with it. You can find my scraper on the ScraperWiki site; the Pipe is also available.
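For anyone curious about the last step, getting located records into Google Earth, here’s a minimal sketch in Python. It isn’t the scraper or the Pipe linked above: it assumes each record already has a latitude and longitude attached (the job Yahoo Pipes did on the day), and the function name, field names and sample records are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace used by KML 2.2, the format Google Earth reads.
KML_NS = "http://www.opengis.net/kml/2.2"


def records_to_kml(records):
    """Build a KML document with one Placemark per located record."""
    ET.register_namespace("", KML_NS)
    kml = ET.Element(f"{{{KML_NS}}}kml")
    document = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    for record in records:
        placemark = ET.SubElement(document, f"{{{KML_NS}}}Placemark")
        ET.SubElement(placemark, f"{{{KML_NS}}}name").text = record["name"]
        point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
        # KML expects longitude first, then latitude.
        ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = (
            f"{record['lon']},{record['lat']}"
        )
    return ET.tostring(kml, encoding="unicode")


# Made-up records standing in for scraped data with locations added.
SAMPLE_RECORDS = [
    {"name": "Example Surgery", "lat": 53.4, "lon": -2.98},
    {"name": "Another Practice", "lat": 53.41, "lon": -2.95},
]

kml_text = records_to_kml(SAMPLE_RECORDS)
```

Saving `kml_text` to a `.kml` file should give you something Google Earth can open as a layer, which is roughly how several datasets end up side by side.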

I think it was also very interesting to get journalists to meet coders. A few weeks ago I heard someone (possibly Alison Gow) say that you can’t get a job at the Guardian without talking about data, and it’s becoming an increasingly important part of journalism. No longer is it enough to simply report the news or spout opinion – being open about where your data comes from can be just as important. So it was really good to see Trinity Mirror taking this so seriously.

When I raised the idea of using DBpedia (and hence Wikipedia) data, someone in my team asked how reliable it was and whether it could be trusted. My response was to point out that most Wikipedia articles cite their sources, and to ask how many news stories do the same!

I’m getting off topic now so I’ll leave it there! ScraperWiki are running a series of Hack Days across the UK (and beyond) so if you’re interested, make sure you sign up!

Liver and Mash

I’ve already blogged about my own Mashed Library Liverpool talk but I promised to say something about the rest of the event, so here goes!

Mandy Phillips and Owen Stephens

Mandy Phillips kicks off Liver and Mash

The day kicked off with welcome and introductions from Mandy and Owen. I’d heard bits about Mashed Library events before and I know the basics of mashups but I didn’t really know who would be there and what to expect. There was a good mix of attendees and speakers presenting “lightning talks”, “Pecha Kucha 20:20” talks and workshops. The thing that persuaded me to agree to speak and convinced me that it wouldn’t just be a bunch of librarians (!!) was the scattering of local speakers…

Alison Gow

Alison is Executive Editor (Digital) for Trinity Mirror Merseyside, publishers of the Liverpool Daily Post and Echo. Despite “knowing” her through the Twitter, Friday’s Mashed Libraries event was the first time I’d met her IRL! The slides of her talk “Open Curation of Data” are online covering some of the things journalists and the newspaper industry have had to deal with since the superinterweb came along.

Aidan McGuire and Julian Todd

Julian Todd and Aidan McGuire on ScraperWiki

Aidan and Julian demonstrated ScraperWiki, a project supported by 4iP that aims to free data from inaccessible sources and make it available for those who wish to use it in new and innovative ways, for example mashups. “Screen scraping” isn’t a new idea but typically it’s done by individuals, embedded into their own systems. If the scraped website changes then the feed breaks and there’s no way for others to build on the work done.
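If you’ve not seen screen scraping before, here’s a minimal sketch of the idea in Python using only the standard library. It is not ScraperWiki’s API or any scraper shown on the day: the `TableScraper` class, the `scrape_table` helper and the sample page are all invented for illustration, and a real scraper would fetch the page over HTTP rather than use a hard-coded string.

```python
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect the text of every table cell, one list per table row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # completed rows
        self._row = None    # cells of the row being read, or None
        self._cell = None   # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def scrape_table(html):
    """Return the rows of any tables found in an HTML string."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    return parser.rows


# Made-up sample page, standing in for a real website.
SAMPLE_HTML = """
<table>
  <tr><th>Name</th><th>Postcode</th></tr>
  <tr><td>Example Surgery</td><td>L1 1AA</td></tr>
  <tr><td>Another Practice</td><td>L2 2BB</td></tr>
</table>
"""

rows = scrape_table(SAMPLE_HTML)
header, records = rows[0], rows[1:]
# Turn each row into a dictionary keyed by the column headings.
surgeries = [dict(zip(header, record)) for record in records]
```

The fragility is easy to see: the code depends entirely on the page being laid out as a table with those headings, so if the site is redesigned the scraper quietly stops producing useful rows.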

ScraperWiki aims to change that by providing a community-driven home for storing scrapers. It’s like Wikipedia for code, allowing you to take a scraper I’ve written and modify it for your own purposes.

There are already dozens of scraped data sources and more are being added every day. It currently supports Python but my language of choice – PHP – will be added soon so I’ll be giving it a go then.

John McKerrell

John McKerrell on Mapping

John’s talk about mapping attracted the most interest, so he presented it to all attendees, briefly covering mapping APIs, OpenStreetMap and tracking your location with mapme.at.

Phil Bradley

The first Pecha Kucha 20:20 talk was about social media search tools. I wasn’t writing down the links, so check Phil’s Slideshare page for the presentation when it appears. I will say that Google’s support for Twitter is now much better than he seemed to suggest – for example, it allows you to drill into tweets for a particular time. It can also be more reliable than search.twitter.com when using shared IP addresses at a conference.

Gary Green

Gary Green 20/20 talk

Gary mentioned that this was his first presentation so I’m not sure a 20:20 talk was the best idea but he handled it pretty well!

Tony Hirst

Tony Hirst talking about Yahoo! Pipes

The afternoon was dedicated to one of three workshops – Arduino with Adrian McEwen, Mapping with John McKerrell or Mashups with Tony Hirst. I’ve done a bit of each before so I sat at the back of Tony’s talk to try to soak up some new tips.

After a final cake break there was the prize giving for the mashup suggestions competition.

@briankelly, @m8nd1 and @ostephens presenting prizes

So all in all a really interesting day! Congratulations to Mandy Phillips and all the organising team for an excellent event.