Bad URLs Part 1: When URLs go bad

The humble URL is one of the most unloved things on the internet, yet without it there wouldn’t be a World Wide Web.

For the less techie out there, URLs are web addresses such as http://www.edgehill.ac.uk/. They identify every website, page, image and video on the internet, and on the whole they’ve done a pretty good job over the last couple of decades.

In the beginning things were simple. You put a bunch of web pages in some directories on your server and there they were on the interweb. When you uploaded a page it would likely stay there forever. As the web grew, content moved from being static to dynamically generated and this is where it all started to go wrong.

Developers created ways of generating pages using scripts to pull information out of databases or from user input. As developers have a habit of doing, they got caught up in the technology and lost sight of the user.

Have you ever looked at a web address and thought it was written in a foreign language? .php, .asp, .jsp or .do at the end of file names – these all indicate the scripting language used to create the website. I might find this interesting, but I bet 99% of people don’t!

Then there’s the query string – that’s the bit after the question mark in a URL. It passes the script extra information it might need about the page you want. Very important, and certainly not bad in itself, but too often useless extra information is passed in, which means URLs end up too long and several subtly different URLs can return exactly the same result.
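To make that concrete, here’s a minimal sketch in Python (the URL is a made-up example, not a real site) showing how a script sees the query string:

    from urllib.parse import urlsplit, parse_qs

    # Everything after the "?" is the query string.
    url = "http://www.example.ac.uk/search?query=computing&page=2"

    query_string = urlsplit(url).query
    print(query_string)            # query=computing&page=2
    print(parse_qs(query_string))  # {'query': ['computing'], 'page': ['2']}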

Ugly, long and overly complex URLs are something that’s bothered me for quite a while. In the past I’ve created sites with some truly awful URL structures and it’s not big or clever – now I’m committed to doing things right. This is a topic that’s been discussed for a very long time – Tim Berners-Lee’s Cool URIs don’t change is a decade old; more recently, Search Engine Optimisation rather than the idealistic goal of a pure site structure has been the main driver for clean URLs.

Let me give a few examples of Bad URLs. First up is Auto Trader:

http://search.autotrader.co.uk/es-uk/www/cars/FORD+KA/Ne-2-4-5-6-7-8-27-44-49-53-61-64-67-103-133-146,N-19-29-4294966844-4294967194/advert.action?R=200804302411772&distance=24&postcode=L39+4QP&channel=CARS&make=FORD&model=KA&min_pr=500&max_pr=5000&max_mileage=

You won’t be able to see the full link, but it contains loads of pointless extra information when all I want is to see the details of a car.
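To show how little of that is actually needed, here’s a rough sketch in Python – certainly not Auto Trader’s real code – that drops empty query parameters and sorts the rest, so that several subtly different URLs for the same advert collapse into one canonical form:

    from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

    def canonicalise(url):
        """Drop empty query parameters and sort the rest, so subtly
        different URLs for the same page compare equal."""
        parts = urlsplit(url)
        # parse_qsl discards blank values (like "max_mileage=") by default
        params = sorted(parse_qsl(parts.query))
        return urlunsplit((parts.scheme, parts.netloc, parts.path,
                           urlencode(params), ""))

    print(canonicalise("http://search.autotrader.co.uk/advert.action"
                       "?distance=24&postcode=L39+4QP&max_mileage="))
    # http://search.autotrader.co.uk/advert.action?distance=24&postcode=L39+4QP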

Often Content Management Systems – which are designed to make the creation of websites easier – are among the main culprits in creating bad URLs. Brent Simmons has it nailed with this insightful comment:

Brent’s Law of CMS URLs: the more expensive the CMS, the crappier the URLs.

The example given is StoryServer by Vignette, which produces the bizarre-looking:

http://news.sky.com/skynews/article/0,,30200-1303092,00.html

I’m fairly sure they don’t have 302,001,303,092 stories on Sky News!

That’s all for now – next time I’ll be looking at some things being done right and the benefits that brings. If you have any examples of really bad URLs, post them in a comment (that’s not an invitation to spammers!) and we’ll see who can find the example with the most bad features.

3 thoughts on “Bad URLs Part 1: When URLs go bad”

  1. I’m not sure I agree that the “extra stuff” is pointless. Confusing to some? Maybe. But if it contains details required to generate a particular result – particularly session information relating to the user who generated the page – then it isn’t useless. Just because two subtly different URLs generate the same result doesn’t mean what you subtracted from the URL is redundant. If I say “Hi” and “Hello” to the same person and their response is “Oh, hello!”, does that make either of the two greetings unnecessary? They both yielded the same result.

    I do agree with one thing: forward compatibility is something that people might want to take into account (regarding file extensions) – but it’s not too important. I hear a lot about usability issues, and most people’s fix is to dumb something down, which limits people with technical ability and doesn’t make much difference to the overall working of the website.

    An interesting (and probably unrelated) fact about URLs is that the domain name isn’t always required. Again, it was a system implemented to dumb down an already useful and technical system. Try typing an IP address instead of a domain name and the same result will occur.

    For example: http://72.14.207.99

    Even neater? The decimal equivalent of an IP address will resolve in just the same way (the arithmetic is sketched after these comments). Try it: http://1208930147

  2. Session information can be passed in cookies, which can help prevent man-in-the-middle attacks.

    The query strings that I don’t like are those with needless empty values, or where default values can be assumed. Some search engines are pretty bad for this – they pass in tons of extra information through hidden fields when it isn’t needed. Our new search facility will look like this:

    http://www.edgehill.ac.uk/search?query=computing

    To limit it to courses:

    http://www.edgehill.ac.uk/search/courses?query=computing
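On the decimal-IP trick from the first comment: a dotted-quad IPv4 address is just four bytes, so it can be written as a single 32-bit number. A quick sketch to check the arithmetic:

    def ip_to_decimal(ip):
        """Pack the four octets of an IPv4 address into one 32-bit integer."""
        a, b, c, d = (int(octet) for octet in ip.split("."))
        return (a << 24) | (b << 16) | (c << 8) | d

    print(ip_to_decimal("72.14.207.99"))  # 1208930147, i.e. http://1208930147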
