I don’t like to predict the future – usually because I’m wrong – but I’m going to put my neck out on one point for the coming year. 2010 will be the year that data becomes important.
So let’s look at what’s happened over the last year.
The Ordnance Survey released Code-Point® Open, data containing the location of every postcode in the country. With this, people have been able to build some cool services, like a wrapper API that gives you XML/CSV/JSON/RDF as well as a hackable URL: http://www.uk-postcodes.com/postcode/L394QP (that’s Edge Hill, by the way).
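The nice thing about a hackable URL is that you can construct it yourself from a postcode. A minimal sketch of that idea, assuming the common convention of a format suffix for machine-readable output (the `.json` suffix is my assumption, not documented behaviour of uk-postcodes.com):

```python
def postcode_url(postcode, fmt=None):
    """Build a uk-postcodes.com-style hackable URL from a raw postcode.

    The optional format suffix (e.g. "json") is an assumed convention
    for requesting machine-readable output, not confirmed API behaviour.
    """
    slug = postcode.replace(" ", "").upper()  # "L39 4QP" -> "L394QP"
    url = f"http://www.uk-postcodes.com/postcode/{slug}"
    return f"{url}.{fmt}" if fmt else url

print(postcode_url("L39 4QP"))          # http://www.uk-postcodes.com/postcode/L394QP
print(postcode_url("l39 4qp", "json"))  # http://www.uk-postcodes.com/postcode/L394QP.json
```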
The OS also released a bunch of other data, from road atlases in raster format through to vector contour data. Of particular interest is OS VectorMap™, available in vector and raster formats at the same scale as their paper Landranger maps; while it doesn’t carry quite as much detail, the maps are beautifully rendered and suitable for many uses – though sadly not for walking.
Manchester has taken a very positive step in releasing transport data (their site is down as I type) – is it too much to hope that Merseytravel will follow suit?
data.gov.uk now has over 4600 datasets. Some of them are probably useful.
In May I gave a talk at Liver and Mash expanding on some ideas about data.ac.uk. Since then lots of other people have been discussing the idea in far more detail than I have, including the prolific Tony Hirst from the Open University, which has become (I believe) the first data.foo.ac.uk with the release of data.open.ac.uk.
It took me a while to think of a subject to talk about but eventually I started considering the role higher education institutions play in mashups and in particular what we can bring to the party.
This actually builds on some ideas I’ve been thinking about for a while. On Christmas Eve last year, as part of our 25 days series, I posted an entry about 2010 being the year of open data. Edge Hill was closed by that point, so I’m not surprised few people read it, but I said:
I believe there will be an increasing call for Higher Education to open up its data. Whether that’s information about courses using the XCRI format, or getting information out of the institutional VLE in a format that suits the user not the developer, there is lots that can be done. I’m not pretending this is an easy task but surely if it can be done it should because it’s the right thing to do.
So my presentation expanded on some of these ideas. Firstly, we need to accept that what we do online isn’t going to suit everyone. HEI websites are huge, unwieldy beasts: a Google search for site:edgehill.ac.uk produces over 8,000 results; warwick.ac.uk has 236,000 pages! Combine that with Sturgeon’s Law and we’re in trouble:
Sturgeon’s Law: 90% of everything is crud.
[Before anyone says it… yes that means 90% of what I say is crap!]
If we accept that our websites aren’t going to deliver everything to everyone, we have two options. Firstly, we could throw resources at the problem to add more and more content – but we know from experience how that ends up.
Alternatively we can strip down to our core audience and find other ways to satisfy the so-called “long tail”. To me that means providing data in an open, accessible form that users can take and use in ways that suit them. Let’s do that.
Typically the feeds we already offer contain easily available information: news stories are recycled press releases, and forthcoming events are often available as an RSS or Atom feed, or even as an iCal feed that can be subscribed to in Google Calendar or Outlook.
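Once events are published as iCal, they stop being a webpage and start being data anyone can process. A rough sketch of pulling event titles and dates out of a feed – the feed text here is an inline stand-in for what you’d fetch from a real .ics URL, and this naive line-by-line parser ignores the full complexities of the iCalendar format:

```python
# Stand-in for a fetched .ics feed; the events are made up for illustration.
ICS = """BEGIN:VCALENDAR
BEGIN:VEVENT
SUMMARY:Open Day
DTSTART:20100619T100000
END:VEVENT
BEGIN:VEVENT
SUMMARY:Graduation
DTSTART:20100716T140000
END:VEVENT
END:VCALENDAR"""

def parse_events(ics_text):
    """Collect each VEVENT's properties into a dict (naive, no line folding)."""
    events, current = [], None
    for line in ics_text.splitlines():
        if line == "BEGIN:VEVENT":
            current = {}
        elif line == "END:VEVENT":
            events.append(current)
            current = None
        elif current is not None and ":" in line:
            key, value = line.split(":", 1)
            current[key] = value
    return events

for event in parse_events(ICS):
    print(event["SUMMARY"], event["DTSTART"])
```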
Universities also run courses and there’s a standard format for publishing them – XCRI-CAP.
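The point of a standard format like XCRI-CAP is that a course catalogue becomes machine-readable. As a very rough illustration – the XML below is a simplified stand-in of my own devising, not the real XCRI-CAP schema, which uses namespaced elements (Dublin Core and friends) and carries far more detail:

```python
import xml.etree.ElementTree as ET

# Simplified, XCRI-CAP-inspired catalogue; element names and courses are
# illustrative only, not the actual schema.
CATALOG = """<catalog>
  <provider>
    <title>Edge Hill University</title>
    <course><title>BSc Computing</title></course>
    <course><title>BA History</title></course>
  </provider>
</catalog>"""

root = ET.fromstring(CATALOG)
courses = [c.findtext("title") for c in root.iter("course")]
print(courses)  # ['BSc Computing', 'BA History']
```

Anyone aggregating courses across the sector could consume feeds like this without screen-scraping each prospectus.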
So far, so general, but what other information are people looking for? Freedom of Information legislation came into force on 1st January 2005, applying to all public bodies, including universities. WhatDoTheyKnow from mySociety allows anyone to submit and track FOI requests – simultaneously the most awesome and scary thing for anyone working in a public sector organisation!
Most HEI websites also contain FOI pages or a publication scheme, but often the information is locked up in difficult-to-access documents – PDFs and unstructured webpages are typically the formats of choice. We can make this information more open by publishing it in more accessible formats. Uploading it to Google Docs (which allows export as CSV or access through an API) would be an easy place to start.
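The difference CSV makes is that the data becomes usable in a few lines rather than hours of copy-and-paste from a PDF. A sketch, with made-up figures standing in for whatever a CSV export link would return:

```python
import csv
import io

# Stand-in for the body of a CSV export (e.g. a Google Docs export link);
# the column name and figures are invented for illustration.
CSV_DATA = """year,foi_requests
2007,41
2008,58
2009,73
"""

rows = list(csv.DictReader(io.StringIO(CSV_DATA)))
total = sum(int(row["foi_requests"]) for row in rows)
print(total)  # 172
```

Try doing that with a table buried in a PDF.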
We also have systems containing interesting data – HR, the student record system, the VLE, the library catalogue – but getting information out of them can be difficult. When procuring new systems we need to ask the right questions about vendors’ approaches to open data and APIs, and build this into the requirements specification.
So my challenge is for us to create data.ac.uk. It would be great to do something sector-wide along the lines of data.gov.uk (but, y’know, better!), though an easier model to get started with is something like the Guardian’s Data Store. Let’s start in the areas we have control over:
Let’s create a webpage and publish some links to existing data. If we have spreadsheets, upload them to Google Docs and post the link. If we have systems with a rubbish API, let’s knock up a wrapper layer to expose something more useful. data.ac.uk isn’t going to happen overnight, but each of us can do our bit to build a more open sector.
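The wrapper-layer idea in miniature: take whatever awkward output a legacy system gives us and re-expose it in a sane format. Everything here is hypothetical – the pipe-delimited dump, the field names, the records – it just shows how little code the translation step can be:

```python
import json

# Imagined fixed-field dump from a legacy system; format and data are
# hypothetical, purely to illustrate the wrapper-layer idea.
LEGACY = "EHU001|Smith, John|Computing\nEHU002|Jones, Mary|History"

def to_json(raw):
    """Translate a pipe-delimited legacy dump into a JSON document."""
    records = []
    for line in raw.splitlines():
        code, name, dept = line.split("|")
        records.append({"id": code, "name": name, "department": dept})
    return json.dumps(records, indent=2)

print(to_json(LEGACY))
```

Drop something like this behind a web framework’s route handler and the rubbish API suddenly has a usable one sitting in front of it.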
In the pub following the event the discussion continued with Brian Kelly from UKOLN making an interesting point:
If that’s not a good enough reason to take open data seriously, I don’t know what is.
One final point about the presentation. I noticed a tweet from Alison Gow of the Liverpool Daily Post and Echo:
Not sure about bravery – I didn’t actually recognise Julian until after the presentation was over which is probably a very good thing!