Hacks meet Hackers

Francis Irving

It’s been a busy week, with the Institutional Web Management Workshop last Monday to Wednesday in Sheffield and WordCamp UK happening in Manchester this weekend, but on Friday I took a day off to pop down to a hack day in Liverpool.

The event was hosted by LJMU’s Open Labs at the Art and Design Academy in partnership with ScraperWiki and Trinity Mirror Merseyside (think Liverpool Daily Post and Echo, Ormskirk Advertiser, Southport Visiter etc etc!). The idea was that hacks (journalists) meet hackers (coders, not to be confused with crackers who break into systems!) for a day working on datasets to produce something at the end of the day.

The basic format was splitting into teams comprising a few hacks and a few hackers with an interest in a particular subject, being put into a “booth” for 6 hours and seeing what happened. The group I was in was focused on Liverpool datasets – think doctors’ surgeries, educational statistics etc.

I’ve come across ScraperWiki before at Liver and Mash and while there was no requirement to use their system, since it’s recently added support for PHP alongside Python I thought I’d give it a try. We found some data to scrape on the NHS website and set about building a scraper.

The chaps at ScraperWiki would be the first to admit that their support for PHP is still very much beta and so it was a little harder than I expected. Eventually I got it scraping a set of data and used Yahoo Pipes to add location data to allow it to be mapped. Here’s what it looks like on Google Earth alongside school and transport datasets:

Google Earth Three Layers

Okay, so not terribly exciting but it was useful to have a go at ScraperWiki and get an idea of some of the things that can be done with it. You can find my scraper on the ScraperWiki site; the Pipe is also available.
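For anyone curious about the mapping step, here is a minimal sketch of turning located records into a KML file that Google Earth can open as a layer. It is not the scraper or the Pipe from the day: it assumes the records already carry latitude and longitude (the job Yahoo Pipes did above), and the function name `to_kml`, the field names and the sample coordinates are all made up for illustration.

```python
from xml.sax.saxutils import escape


def to_kml(records, layer_name="Scraped data"):
    """Build a KML document with one placemark per record.

    Each record is assumed (hypothetically) to be a dict with
    'name', 'lat' and 'lon' keys.
    """
    placemarks = []
    for record in records:
        # KML coordinates are written longitude first, then latitude.
        coords = f"{record['lon']},{record['lat']}"
        placemarks.append(
            "    <Placemark>\n"
            f"      <name>{escape(record['name'])}</name>\n"
            f"      <Point><coordinates>{coords}</coordinates></Point>\n"
            "    </Placemark>"
        )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<kml xmlns="http://www.opengis.net/kml/2.2">\n'
        "  <Document>\n"
        f"    <name>{escape(layer_name)}</name>\n"
        + "\n".join(placemarks)
        + "\n  </Document>\n</kml>\n"
    )


# Invented sample data, roughly in the Liverpool area.
SAMPLE_RECORDS = [
    {"name": "Example Surgery", "lat": 53.4, "lon": -2.98},
    {"name": "Smith & Jones Practice", "lat": 53.41, "lon": -2.97},
]

if __name__ == "__main__":
    with open("surgeries.kml", "w", encoding="utf-8") as handle:
        handle.write(to_kml(SAMPLE_RECORDS, "Doctors' surgeries"))
```

Saving one file per dataset (surgeries, schools, transport) and opening them together is one way to get separate layers that can be toggled on and off.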

I think it was also very interesting to get journalists to meet coders. A few weeks ago I heard someone (possibly Alison Gow) say that you can’t get a job at the Guardian without talking about data, and it’s becoming an increasingly important part of journalism. No longer is it enough to simply report the news or spout opinion – being open about where your data comes from can be just as important. So it was really good to see Trinity Mirror taking this so seriously.

When I raised the idea of using DBpedia (and hence Wikipedia) data, someone in my team asked how reliable it was and whether it could be trusted. My response was to point out that most Wikipedia articles cite their sources, and to ask how many news stories do the same!

I’m getting off topic now so I’ll leave it there! ScraperWiki are running a series of Hack Days across the UK (and beyond) so if you’re interested, make sure you sign up!

Liver and Mash

I’ve already blogged about my own Mashed Library Liverpool talk but I promised to say something about the rest of the event, so here goes!

Mandy Phillips and Owen Stephens

Mandy Phillips kicks off Liver and Mash

The day kicked off with a welcome and introductions from Mandy and Owen. I’d heard bits about Mashed Library events before and I know the basics of mashups, but I didn’t really know who would be there and what to expect. There was a good mix of attendees and speakers presenting “lightning talks”, “Pecha Kucha 20:20” talks and workshops. The thing that persuaded me to agree to speak and convinced me that it wouldn’t just be a bunch of librarians (!!) was the scattering of local speakers…

Alison Gow

Alison Gow

Alison is Executive Editor (Digital) for Trinity Mirror Merseyside, publishers of the Liverpool Daily Post and Echo. Despite “knowing” her through the Twitter, Friday’s Mashed Libraries event was the first time I’d met her IRL! The slides of her talk “Open Curation of Data” are online covering some of the things journalists and the newspaper industry have had to deal with since the superinterweb came along.

Aidan McGuire and Julian Todd

Julian Todd and Aidan McGuire on ScraperWiki

Aidan and Julian demonstrated ScraperWiki, a project supported by 4iP that aims to free data from inaccessible sources and make it available to those who wish to use it in new and innovative ways, for example mashups. “Screen scraping” isn’t a new idea, but typically it’s done by individuals and embedded into their own systems. If the scraped website changes then the feed breaks, and there’s no way for others to build on the work done.
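To give a flavour of what a screen scraper actually does, here is a minimal sketch in Python using only the standard library. It is not ScraperWiki’s API and not one of the scrapers shown on the day: the HTML table, the class name `TableScraper` and the function `scrape_table` are all invented for illustration. It also shows why scrapers are fragile, since any change to the page’s table layout would alter or break the output.

```python
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # completed rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def scrape_table(html):
    """Return one dict per data row, keyed by the header row's cells."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


# A made-up page fragment standing in for a real website.
SAMPLE_HTML = """
<table>
  <tr><th>Name</th><th>Postcode</th></tr>
  <tr><td>Example Surgery</td><td>L1 1AA</td></tr>
  <tr><td>Another Practice</td><td>L2 2BB</td></tr>
</table>
"""

if __name__ == "__main__":
    for record in scrape_table(SAMPLE_HTML):
        print(record)
```

A real scraper would fetch the page over HTTP first and would usually need more forgiving parsing than this, but the shape is the same: pull structured records out of markup that was only ever meant for display.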

ScraperWiki aims to change that by providing a community-driven place to store scrapers. It’s like Wikipedia for code, allowing you to take a scraper I’ve written and modify it for your own purposes.

There are already dozens of scraped data sources and more are being added every day. It currently supports Python but my language of choice – PHP – will be added soon so I’ll be giving it a go then.

John McKerrell

John McKerrell on Mapping

John’s talk about mapping had the most interest so he presented it to all attendees briefly covering mapping APIs, OpenStreetMap and tracking your location with mapme.at.

Phil Bradley

The first Pecha Kucha 20:20 talk was about social media search tools. I wasn’t writing down the links so check on Phil’s Slideshare page for the presentation coming out. I will say that Google’s support for Twitter is now much better than he seemed to suggest – for example allowing you to drill into tweets for a particular time. It can also be more reliable than search.twitter.com when using shared IP addresses at a conference.

Gary Green

Gary Green 20/20 talk

Gary mentioned that this was his first presentation so I’m not sure a 20:20 talk was the best idea but he handled it pretty well!

Tony Hirst

Tony Hirst talking about Yahoo! Pipes

The afternoon was dedicated to one of three workshops – Arduino with Adrian McEwen, Mapping with John McKerrell or Mashups with Tony Hirst. I’ve done a bit of each before so I sat at the back of Tony’s talk to try to soak up some new tips.

After a final cake break there was the prize giving for the mashup suggestions competition.

@briankelly, @m8nd1 and @ostephens presenting prizes

So all in all a really interesting day! Congratulations to Mandy Phillips and all the organising team for an excellent event.

Breaking News

That picture of the plane in the Hudson river was for many people the first time that Twitter had brought them images of breaking news.  It was quickly retweeted around the internet and even made it onto traditional broadcast news.  Not surprising that major stories are broken this way, but over the last couple of days I’ve become more aware of this happening at a local level.

On Saturday afternoon, I was enjoying a quiet drink in the Peacock on Seel Street when we spotted smoke coming out of a building across the road. I resisted tweeting about it until the flames started coming out of the roof!

I wasn’t the only person to spot the fire and Twitter Search’s “nearby” function pointed me to several other people who’d seen it, telling me it was a Greek restaurant on Parr Street that was on fire. A few minutes later I got a message from a journalist at the Liverpool Daily Post and Echo asking if they could use one of my photos. A few minutes after that and it was on their website.

Sunday, and I learnt of the monkey escape at Chester Zoo from a friend’s tweet. Very nearly the “oh my God, I just saw a monkey run down the street” moment I’ve been waiting for!

Finally, just an hour ago I heard about a collapsed crane in Liverpool which, according to Alison Gow, will be the Echo’s first ever front cover Twitter byline.

It’s a mad world.

What does your “net identity” look like?

I have recently been asked to comment on an article for the Liverpool Daily Post which attempts to look at how employers use the web to locate ‘additional’ information about prospective employees. The article entitled “Should you worry about your net worth?” has already been published online and it makes interesting reading.

According to the article “one in five employers now finds information about candidates from the internet” and “over half of who say it will influence their final decision”. It references sites such as MySpace as easy targets for employers to search for and drill down on the more ‘informal’ information about prospective employees.

Whilst I am not surprised that companies are using the web in this way, it does leave me thinking about the advice we should be giving to our staff and students. I do encourage the use of social networking sites such as MySpace and Facebook, and I don’t think we can or should be prescriptive about how people use them. But it is worth highlighting this trend among employers and reminding people that, should they wish to keep a distinct divide between their professional and private personas, they should ensure this is reflected online.

As Craig Sweeney comments in the article: “The thing to remember is what might make your friends giggle today could come back and haunt you a couple of years down the line when you’re trying to land your dream job.”
