Monthly Archives: January 2008

Bad URLs Part 2: The Beauty of URLs

Last time I gave some examples of awful URLs, but not everyone gets it wrong. Let me give you some examples of truly beautiful URL structures and explain their benefits!

Ask Auntie

If you ask almost any UK-based web developer for a list of the best-produced websites, the Beeb will be pretty high up. They do a lot of things very well, and you’d expect so with their budget! URLs are just one example. Think of a major TV programme on the BBC and add the name after bbc.co.uk, and 95% of the time that’s the address of the website. Try it out…

http://www.bbc.co.uk/newsnight

Considering the size of the BBC site, they seem to have a very well organised structure. Not too many levels deep – usually only one or two – and URLs stay around for a very long time. Check out Politics 97, or the Diana Remembered website. See how even when names change the content follows – bbc.co.uk/childrens now takes you to an index page linking to CBeebies and CBBC.

A new development from the BBC, still in beta, is even more impressive. Their new BBC Programmes site is an index of every TV and radio programme shown on BBC stations. For each series it lists episodes and scheduling information. Great, but didn’t the channel listings do this already? No – those only showed the next week and didn’t contain an archive; the new site gives every series, episode and showing a unique, permanent URL.

Programmes are represented by a short alphanumeric identifier rather than their full name:

http://www.bbc.co.uk/programmes/b006mk25

This has the advantage of being short, but it’s hard to predict. In one of the comments on their introduction to the programmes site (and some other cool stuff they do with URLs), Michael Smethurst explains the reasoning behind their chosen structure:

We thought long and hard about the best way to make programmes addressable and, as ever, there’s no perfect solution. So…

…no channel cos not only do episodes get broadcast on multiple channels they can also change “channel ownership” over time.
[…]
and no brand > series > episode cos so many programmes don’t fit this model.
[…]
We’d love to have made human readable/hackable AND persistent urls (and have on the aggregation pages) but it just wasn’t possible
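Out of curiosity, here’s how an identifier scheme like that might be minted. This is a minimal Python sketch of the general idea – my own illustration, not the BBC’s actual code – generate a short random string once, store it against the programme forever, and never change it:

    import secrets
    import string

    # Lowercase letters and digits, much like the IDs seen above.
    ALPHABET = string.ascii_lowercase + string.digits

    def mint_programme_id(length=8):
        """Mint a short, unpredictable identifier (illustrative only).

        In practice you'd also check the result against IDs already
        issued, then store the mapping permanently - that permanence
        is what keeps the URL stable.
        """
        return "".join(secrets.choice(ALPHABET) for _ in range(length))

    print(mint_programme_id())  # something like 'b006mk25' in shape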

There’s another cool feature of BBC Programmes mentioned in that post:

We’re also working on additional views so that in the near future adding .json, .mobile, .rss, .atom, .iCal or .yaml to the end of the URL will give you that resource in that format.

You might not know (or care!) what each of those formats is, but what it means for every user is that they’re free to take the information that the BBC provide and use it within their own system. Already there is microformatted information embedded into every page.
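If you’re wondering how those extension-based views might be wired up, here’s a rough Python sketch. The renderer table and function are my assumptions about the general technique, not the BBC’s code:

    # Map a URL's format suffix to a MIME type; fall back to HTML.
    # The mappings here are illustrative assumptions, not the BBC's code.
    RENDERERS = {
        "json": "application/json",
        "rss": "application/rss+xml",
        "atom": "application/atom+xml",
        "ical": "text/calendar",
        "yaml": "application/x-yaml",
    }

    def negotiate(path):
        """Split '/programmes/b006mk25.json' into resource and format."""
        base, dot, ext = path.rpartition(".")
        if dot and ext.lower() in RENDERERS:
            return base, RENDERERS[ext.lower()]
        return path, "text/html"  # no recognised suffix: default view

    print(negotiate("/programmes/b006mk25.json"))
    # ('/programmes/b006mk25', 'application/json')

The nice thing about this approach is that each resource keeps one canonical URL, with the format expressed as a simple suffix rather than a completely separate address.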

Train Times done right

Another fantastic example of beautiful URL structure is traintimes.org.uk. This site is an alternative to the awful official site which provides rail information. They offer a fully accessible interface to train times and fares in a format much easier to browse and navigate than National Rail. But alongside the forms letting you search is some URL magic. Say you want to travel from Liverpool to London – simply tag it onto the end of the URL:

http://traintimes.org.uk/liverpool/london

Not leaving right now? Okay…

http://traintimes.org.uk/liverpool/london/20:30

Not leaving today? That’s fine too:

http://traintimes.org.uk/liverpool/london/08:00/wednesday

Want the price?

http://traintimes.org.uk/liverpool/london/08:00/wednesday/fares

The Train Times site has so much flexibility – you can use station codes instead of the full name and it will recognise a variety of date formats. National Rail could learn a lot!
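Here’s my guess, as a Python sketch, at how a URL scheme like that might be parsed on the server. The segment rules are assumptions about the technique, not traintimes.org.uk’s real code:

    import re

    def parse_journey(path):
        """Turn '/liverpool/london/08:00/wednesday/fares' into a query.

        Illustrative only: origin and destination come first, then any
        mix of time, date and the 'fares' flag.
        """
        segments = [s for s in path.strip("/").split("/") if s]
        journey = {"origin": segments[0], "destination": segments[1]}
        for seg in segments[2:]:
            if re.fullmatch(r"\d{1,2}:\d{2}", seg):
                journey["time"] = seg
            elif seg == "fares":
                journey["fares"] = True
            else:
                journey["date"] = seg  # e.g. 'wednesday' or '30-01-2008'
        return journey

    print(parse_journey("/liverpool/london/08:00/wednesday/fares"))
    # {'origin': 'liverpool', 'destination': 'london', 'time': '08:00',
    #  'date': 'wednesday', 'fares': True}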

That’s enough examples for now, but there will be more later on. Next time I’ll be looking at Edge Hill’s URLs and seeing what we’re doing right, but more importantly where we can improve.

Bad URLs Part 1: When URLs go bad

The humble URL is one of the most unloved things on the internet, yet without it there wouldn’t be a World Wide Web.

For the less techie out there, URLs are web addresses such as http://www.edgehill.ac.uk/. They identify every website, page, image and video on the internet and, on the whole, they’ve done a pretty good job over the years.

In the beginning things were simple. You put a bunch of web pages in some directories on your server and there they were on the interweb. When you uploaded a page it would likely stay there forever. As the web grew, content moved from being static to dynamically generated and this is where it all started to go wrong.

Developers created ways of generating pages using scripts to pull information out of databases or from user input. As developers have a habit of doing, they got caught up in the technology and lost sight of the user.

Have you ever looked at a web address and thought it was a foreign language? PHP, ASP, JSP, .do at the end of file names – these all indicate the scripting language used to create the website. I might find this interesting, but I bet 99% of people don’t!

Then there’s the query string – that’s the bit after the question mark in a URL. It tells the script extra information that it might need to know about the page you want. Very important, and certainly not bad in itself, but too often useless extra information is passed in, which means URLs end up too long and several subtly different URLs might actually return the same result.
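The usual fix is canonicalisation: pick one form for each page and make every variant collapse to it. A minimal Python sketch of the idea, using a made-up example URL:

    from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

    def canonicalise(url):
        """Drop empty parameters and sort the rest, so subtly
        different URLs collapse into one canonical form."""
        parts = urlsplit(url)
        params = [(k, v)
                  for k, v in parse_qsl(parts.query, keep_blank_values=True)
                  if v]
        return urlunsplit(
            (parts.scheme, parts.netloc, parts.path, urlencode(sorted(params)), "")
        )

    print(canonicalise("http://example.com/cars?model=KA&make=FORD&max_mileage="))
    # http://example.com/cars?make=FORD&model=KA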

Ugly, long and overly complex URLs are something that’s bothered me for quite a while. In the past I’ve created sites with some truly awful URL structures, and it’s not big or clever – now I’m committed to doing things right. This is a topic that’s been discussed for a very long time – TBL‘s Cool URIs don’t change is a decade old; more recently, Search Engine Optimisation, rather than the idealistic goal of a pure site structure, has been the main driver for clean URLs.

Let me give a few examples of Bad URLs. First up is Auto Trader:

http://search.autotrader.co.uk/es-uk/www/cars/FORD+KA/Ne-2-4-5-6-7-8-27-44-49-53-61-64-67-103-133-146,N-19-29-4294966844-4294967194/advert.action?R=200804302411772&distance=24&postcode=L39+4QP&channel=CARS&make=FORD&model=KA&min_pr=500&max_pr=5000&max_mileage=

You won’t be able to see the full link, but it contains loads of pointless extra information when all I want is to see the details of a car.

Often Content Management Systems – which are designed to make the creation of websites easier – are one of the main culprits in creating bad URLs. Brent Simmons has it pinned with this insightful comment:

Brent’s Law of CMS URLs: the more expensive the CMS, the crappier the URLs.

The example given is StoryServer by Vignette which produces the bizarre looking:

http://news.sky.com/skynews/article/0,,30200-1303092,00.html

I’m fairly sure they don’t have 302,001,303,092 stories on Sky News!

That’s all for now – next time I’ll be looking at some things being done right and the benefits it brings. If you have any examples of really bad URLs post them in a comment (that’s not an invitation to spammers!) and see who can find an example with the most bad features.

eXchanging Course Related Information

Three weeks without a noise from the Web Services blog! How have you coped, dear reader?! We’ve got lots going on with some exciting developments you’ll hear about over the coming few weeks, but I’m going to talk about something that’s probably not quite as exciting to most people!

Before Christmas we submitted a proposal for JISC funding for a mini-project looking into implementing and testing the XCRI format. XCRI is an application of XML which is designed for exchanging course information between organisations. For example universities could provide a feed of courses to websites which aggregate course information, reducing the need to retype information.

I’m happy to say that we heard just before the holiday that our proposal was accepted! So now the work begins on integrating XCRI into our systems. This isn’t as hard as it might have been – part of the work we’re doing redeveloping the corporate website is on the eProspectus, and we’re ensuring from the start that all the information required to output valid XCRI feeds is available.
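For a flavour of what producing a feed involves, here’s a minimal Python sketch that serialises course records as XML. The element names follow the broad catalog > provider > course shape of XCRI-CAP, but this is an illustration only and won’t necessarily validate against the real schema:

    import xml.etree.ElementTree as ET

    def build_feed(provider_name, courses):
        # Element names are illustrative of XCRI-CAP's shape, not
        # guaranteed to match the actual schema.
        catalog = ET.Element("catalog")
        provider = ET.SubElement(catalog, "provider")
        ET.SubElement(provider, "title").text = provider_name
        for c in courses:
            course = ET.SubElement(provider, "course")
            ET.SubElement(course, "title").text = c["title"]
            ET.SubElement(course, "description").text = c["description"]
        return ET.tostring(catalog, encoding="unicode")

    print(build_feed("Edge Hill University", [
        {"title": "BSc Computing", "description": "A three-year degree."},
    ]))

The point of the format is exactly this kind of machine-readability: once the prospectus data is structured, aggregators can consume the feed rather than retyping course details.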

About a week ago I attended the JISC CETIS Joint Portfolio SIG and Enterprise SIG Meeting at Manchester Met. I didn’t really know what to expect but there was a session outlining the XCRI project and developments from last year so I thought it would be useful.

The first morning session was from Peter Rees Jones about ePortfolios and how HE can integrate better with companies. More acronyms than you can shake a stick at, but many interesting thoughts.

Same for John Harrison’s session on “Personal Information Brokerage”. Some obvious comparisons with OpenID, but it offers more than that. Edentity clearly think that Education (and delivery companies!) have the capacity to act as a hub for implementing some of the systems they propose. Personally, I suspect that the commercial sector will do more than they give it credit for. Looking at the criteria for selection:

  1. Need for further data sharing
  2. Clear organisational boundaries
  3. Capacity for collective action
  4. Demographics

John marked them down on 3 and 4 but I disagree. If that doesn’t describe Google, Amazon, Yahoo and a bunch of other online companies (including most that get a “Web 2.0” label), I don’t know what does. Okay, standards may be slow to establish at times, but when there’s the will it can happen!

So on to XCRI. There were a few presentations from people explaining the XCRI standard and how it’s been implemented in institutions. Mark Stubbs gave a good overview of the standard, where it’s come from and where it’s going. I’ve been using a useful diagram handout showing the proposed XCRI-CAP 1.1 schema for the last week to check that what we’re developing for the eProspectus is heading along the right lines.

A few of the last round of XCRI mini-projects displayed their work – the University of Bolton probably most closely matching the work we’re doing at Edge Hill. They’ve not yet launched their new site but I’m keeping an eye out for it!

Some of the slides (including those from Selwyn Lloyd of Phosphorix – developers behind CPD Noticeboard) are on the website, so check it out if you’re interested.