Designing for information

Lately I’ve been thinking about how we design information. We spend a lot of time on the design of the homepage, top-level areas and Faculty and Department sites, but when it comes to content, all too often it’s a copy-and-paste job from a Word document.

The move to WordPress for many parts of the site will mean that the people who know the information best have more control over how it’s presented on the web, but there is still a risk that it will remain dry blocks of text, because that’s the easiest way to get it online.

I want us to take a new approach to much of the content we have. We need to address each area of the site and look at what is important and the best way to display it.

One site I believe does this really well is the new beta Government website www.gov.uk. They’ve taken what can be quite detailed information and extracted the important points, structured it and presented it clearly.

Take, for example, the dates the clocks change. The current Government website, Directgov, has the information on a page titled Bank holidays and British Summer Time, but the date the clocks go forward is only shown towards the bottom and requires the user to stop and think.

Compare this to the new GOV.UK site, where clock changes are shown on a page of their own, titled When do the clocks change?

Front and centre on the page is the information that probably 99% of people want – the next date that clocks change.

This is basic information, but making it easy to find and consume is invaluable. It’s also the type of information that universities have in bucketloads!

Pages like these get thousands of views per month, and that’s just the start. Unstructured content can be redesigned too, by breaking it down into stages and identifying the goal of publishing it online.

Changing how we design information is a big task and will involve us working closely with content owners, but the benefits to our users, and to the university, are enormous. Work starts now.

Bad URLs Part 4: Choose your future

So we’ve looked at examples of bad and good URLs, and at the current situation at Edge Hill, but what are we doing about it?

Lots, you’ll be pleased to know! As part of the development work we’ve been doing for the new website design, I’ve been taking a long, hard look at how our website is structured and plan to make some changes. There are two parts to this – ensuring our new developments are done properly, and changing existing areas of the site to fit in with the new structure.

First, the new developments. We’re currently working on three systems – news, events and the prospectus. News was one of the examples I gave last time where we could make some improvements, so let’s look at how things might change.

To start with, all our new developments are being brought onto the main Edge Hill domain – www.edgehill.ac.uk – with each “site” given a top-level address:

http://info.edgehill.ac.uk/EHU_news/
becomes:
http://www.edgehill.ac.uk/news

News articles will drop references to the technology used and the database IDs:

http://info.edgehill.ac.uk/EHU_news/story.asp?storyid=765
becomes:
http://www.edgehill.ac.uk/news/2008/01/the-performance-that-is-more-canal-than-chanel

In this example the new URL is actually longer than the old one, but I can live with that because it’s more search-engine friendly and the structure is human-readable. For example, we can guess that the monthly archive page will be:

http://www.edgehill.ac.uk/news/2008/01

This idea is nothing new – for the first few years of the web most sites had a pretty logical structure – but it’s something that has been lost in the move to Content Management Systems.

The online prospectus is getting similar treatment: where courses are currently referenced by an ID number, the URL will contain the course title:

http://info.edgehill.ac.uk/EHU_eprospectus/details/BS0041.asp
becomes:
http://www.edgehill.ac.uk/study/courses/computing

As part of our JISC funded mini-project, we’ll be outputting XCRI feeds from the online prospectus. The URLs for these will be really simple – just add /xcri to the end of the address:

http://www.edgehill.ac.uk/study/courses/xcri
http://www.edgehill.ac.uk/study/courses/computing/2009/xcri

In the news site, feeds of articles, tags, comments and much more will be available simply by adding /feed to the URL. The same will apply to search results.
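So, for example, the main news feed should end up at an address along the lines of:

http://www.edgehill.ac.uk/news/feed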

All this is great for the new developments, but we do have a lot of static pages that won’t be replaced. Instead, these pages will move to a flatter URL structure. For example, the Faculty of Education site will be available directly through what is currently the vanity URL, meaning that most subpages will also get a nice URL:

http://www.edgehill.ac.uk/Faculties/Education/Research/index.htm
becomes:
http://www.edgehill.ac.uk/education/research

Areas of the site which were previously hidden away three or four levels deep will be made more accessible through shorter URLs.

How are we doing this? The core of the new site is a brand new symfony-based application. This allows us to embed our dynamic applications – news, events and prospectus – more closely into the site than has previously been possible. symfony allows you to define routing rules which, while they can look complex on the backend because of the way they mix up pages, produce a uniform structure across the site.
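To give a flavour of what that looks like, here’s a rough sketch of the kind of rules that go in symfony’s routing.yml – the route, module and action names below are illustrative rather than copied from our actual configuration:

  # routing.yml sketch – route, module and action names are made up

  # matches /news/2008/01/the-performance-that-is-more-canal-than-chanel
  news_article:
    url:   /news/:year/:month/:slug
    param: { module: news, action: show }

  # matches /news/2008/01 – the monthly archive page guessed at above
  news_archive:
    url:   /news/:year/:month
    param: { module: news, action: archive }

  # matches /study/courses/computing
  course_detail:
    url:   /study/courses/:slug
    param: { module: prospectus, action: show }

Each rule maps a clean URL pattern onto whichever module and action actually build the page, which is how pages generated by quite different bits of code can all share the same tidy structure.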

For our existing content we’re using a combination of lookup tables in our symfony application and some Apache mod_rewrite rules to detect requests for old addresses. All the existing pages will be redirected to their new locations, so any bookmarks will continue to work and search engines will quickly find the new versions of pages.
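The mod_rewrite side of it is nothing exotic. Here’s a sketch of the sort of rule involved, using the Faculty of Education example from above (in practice most of the redirects will come from the lookup tables rather than hand-written rules like this):

  # .htaccess-style sketch – permanent redirect from an old static page to its new home
  RewriteEngine On
  RewriteRule ^Faculties/Education/Research/index\.htm$ /education/research [R=301,L,NC]

The R=301 is the important bit: it tells browsers and search engines that the page has moved permanently, so existing links keep working and rankings follow the content to its new address.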

That’s all for this little series of posts about URLs. Hopefully it has helped explain some of my thinking behind the changes. If you’ve got any questions then drop me an email or post a comment.

Bad URLs Part 3: Confessions time

Over the last couple of posts I’ve been looking at URLs, good and bad. Now it’s time to examine what we do at Edge Hill and see how we fare!

Most of our website is currently made up of static pages, so a typical address looks something like this:

http://www.edgehill.ac.uk/Faculties/Education/index.html

Other areas of the site aren’t quite so great:

http://www.edgehill.ac.uk/Faculties/FAS/English/History/index.html

Not terribly bad, but it’s a little bit long – I wouldn’t like to read it out over the phone – and because the URL is structured to mirror the departmental hierarchy, when names change the URL could change as well.

The site structure is quite deep, which has led to some quite strange locations for pages. For example, the copyright page linked to from every page on the site sits within the Web Services area:

http://www.edgehill.ac.uk/Sites/ITServices/WebServ/Copyright.htm

For use in publications, there’s a whole bunch of “vanity URLs” like this:

http://www.edgehill.ac.uk/education

And that will redirect you to the page you’re looking for. These are great – easy to read over the phone or type in – but since most of them force a redirect to the actual page, most people don’t know about them: if you copy and paste the address into a document you’re producing, you’ll get the long URL. They’re also not universal – short URLs exist for some departments and services but not others, and sometimes it can be hard to pick a good vanity URL.

When we look at some of our dynamic content, however, things get considerably worse.

http://info.edgehill.ac.uk/EHU_news/article.asp?id=4786

What’s wrong with this?

  • The mystery “info” server splits our page rankings on Google
  • It tells the user what scripting language we’re using for the page (ASP)
  • The ID number is meaningless
  • “EHU_news” – why not just “news”?
  • There’s nothing for search engines to pick up on as keywords

The first site I worked on for Edge Hill – Education Partnership – is a bit of a mixed bag:

https://go.edgehill.ac.uk/ep/static/primary-mentors

It’s fairly readable, with words describing the pages rather than ID numbers, but it’s hosted as part of the GO website despite using the main website template. There’s also the “static” in there, which is a by-product of the implementation rather than anything relevant to the URL. I’ve learnt my lesson and it won’t happen again.

Overall I think we score 5/10 – no nightmare URLs, but lots of scope for improvement. Next time I’ll be looking at some of the plans to change how our sites are structured, and maybe getting a little technical about how we’re implementing it!

Bad URLs Part 1: When URLs go bad

The humble URL is one of the most unloved things on the internet, yet without it there wouldn’t be a World Wide Web.

For the less techie out there, URLs are web addresses such as http://www.edgehill.ac.uk/. They identify every website, page, image and video on the internet, and on the whole they’ve done a pretty good job over the years.

In the beginning things were simple. You put a bunch of web pages in some directories on your server and there they were on the interweb. When you uploaded a page it would likely stay there forever. As the web grew, content moved from being static to dynamically generated and this is where it all started to go wrong.

Developers created ways of generating pages using scripts to pull information out of databases or from user input, but as developers have a habit of doing, they got caught up in the technology and lost sight of the user.

Have you ever looked at a web address and thought it was written in a foreign language? .php, .asp, .jsp or .do at the end of file names – these all indicate the scripting language used to create the website. I might find this interesting, but I bet 99% of people don’t!

Then there’s the query string – that’s the bit after the question mark in a URL. It tells the script extra information that it might need to know about the page you want. Very important, and certainly not bad in itself, but too often useless extra information is passed in, which means URLs end up too long and several subtly different URLs might actually return the same result.
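To take a made-up example (the site and parameters here are invented purely to illustrate the point), these two addresses differ only in parameter order, an empty value and a session ID, yet most scripts would happily return exactly the same page for both:

http://www.example.com/products/list.asp?category=5&sort=&session=8F3A21
http://www.example.com/products/list.asp?session=9C07BD&category=5&sort=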

Ugly, long and overly complex URLs are something that’s bothered me for quite a while. In the past I’ve created sites with some truly awful URL structures, and it’s not big or clever – now I’m committed to doing things right. This is a topic that’s been discussed for a very long time – TBL‘s Cool URIs don’t change is a decade old; more recently, Search Engine Optimisation rather than the idealistic goal of a pure site structure has been the main driver for clean URLs.

Let me give a few examples of Bad URLs. First up is Auto Trader:

http://search.autotrader.co.uk/es-uk/www/cars/FORD+KA/Ne-2-4-5-6-7-8-27-44-49-53-61-64-67-103-133-146,N-19-29-4294966844-4294967194/advert.action?R=200804302411772&distance=24&postcode=L39+4QP&channel=CARS&make=FORD&model=KA&min_pr=500&max_pr=5000&max_mileage=

You won’t be able to take all of that in at a glance, but it contains loads of pointless extra information when all I want is to see the details of a car.

Often Content Management Systems – which are designed to make the creation of websites easier – are one of the main culprits in creating bad URLs. Brent Simmons has it pinned with this insightful comment:

Brent’s Law of CMS URLs: the more expensive the CMS, the crappier the URLs.

The example given is Vignette’s StoryServer, which produces bizarre-looking URLs like this:

http://news.sky.com/skynews/article/0,,30200-1303092,00.html

I’m fairly sure they don’t have 302,001,303,092 stories on Sky News!

That’s all for now – next time I’ll be looking at some things being done right and the benefits they bring. If you have any examples of really bad URLs, post them in a comment (that’s not an invitation to spammers!) and we’ll see who can find an example with the most bad features.
