Create a better search engine than Google

The findings and opinions contained within this post are entirely mine and do not necessarily reflect those of Edge Hill University. The research and write up was done in my own time and is only posted here because it may be of interest to the HE community.

At BarCamp Liverpool I gave a talk about site search. It covered the same sort of topics as Martin Belam’s Euro IA Summit presentation Taking the ‘Ooh’ out of Google and I recommend that anyone interested in site search read his series of blog posts. Go read it now – it’s way better than I’m going to write here! While Martin reviewed news websites from across Europe, I’ve turned my attention to university websites and look at what institutions are doing now and how it can be improved.

Do we really need site search?

Before we get too tied up in what to do, it’s perfectly valid to question whether we even need a site search engine any more. Every internet user knows how to search – specifically, everyone knows how “to Google”. The average query length has almost doubled over the last few years and there is some evidence to show people are using site names to restrict their search – for example “english pgce edge hill”.

Google have also introduced “search this site” boxes into search results which mean that when you search for “Edge Hill” you can filter down your results by making an extra query. Putting “english pgce” into this search box gives results for the query “english pgce”. This feature is already present for most HEIs:

Google Site Search for Edge Hill University

So how many people are using this feature? Our Google Analytics reports show that in the last six months, just 150 visitors to our site came from searches including “”. A much higher number of people are including “Edge Hill” in queries. Excluding searches which are looking directly for us – for example “Edge Hill University”, “”, or worst of all “edgehill uni” – around 10% of Google referrals come with some form of restriction to search our site.

How does this compare to our own site search? We’ve had 270,000 searches in the last six months, with 7% of visitors using it. Additionally, the point where people want to search is usually after they’ve left Google and come to our site. Someone looking for courses, for example, isn’t going to go back to Google to search – they need search within the context of the pages they’re looking at.

State of the Union Universities

One weekend in November I spent a few hours going through HEI websites and testing their search engines. On each site I searched for “computing” – most universities offer some form of computing course and it would typically find their IT Services department. I noted down which system they used for search, and whether they provided the ability to search just courses, news and events. All quite basic but it has still offered up some interesting findings.

Google Search Appliance by Adriaan Bloem

Over the last few years most universities have looked to Google to provide search. 63% of HEIs now use some form of Google-powered search engine. Most of those have bought a Google Search Appliance (or Google Mini) – a server you install locally which provides your own miniature Google Search engine. The interface and results are familiar for users – it’s clean and quick.

Others are using either Google Custom Search Engine or what I’ve recorded as “Google Syndicated Search” – both similar services where Google will spider your site remotely with no requirements to run your own servers. CSE allows you to embed the results within your own page while Syndicated Search allows basic branding to be applied to results displayed on a Google domain.

In a distant second place is ht://Dig – a system that’s been around for years. When I first came across it many years ago I was amazed that I could run my own search engine. It was one of the first widely deployed site search systems and so a number of universities were early adopters. It hasn’t moved with the times – the latest release of ht://Dig was in June 2004.

The short tail includes Ultraseek, Egothor and Novell’s QuickFinder – the system that powered Edge Hill’s site search until April this year. A few sites use search engines built into their CMS or custom built but it’s hard to find out much about these.

This is all very good but in many ways, the engine doesn’t matter – it’s what you do with it that counts!

The ugly, the bad and the good

Google has led the way with clean, uncluttered results pages but some search engines haven’t learnt that less is more and seem compelled to pack in every feature they can. Star ratings or other indicators of the quality of the page add little and can be distracting. Users know to expect that if a page is higher up the list then it’s a closer match to their query so it may not be necessary. On our own search we have a percentage relevance but I’m not sure it adds much value to the visitor so it might be getting cut soon.

The above example from Keele also shows another problem, with some search engines blindly indexing the full content of pages. We know exactly which parts of a page are relevant and which are navigation bars, headers or footers. Including these in the index can distort results while adding little value. A search for “Freedom of Information” might return 10,000 pages simply because it’s linked from the footer of every page.

Worse than indexing these areas is displaying the text as part of the result summary. Here the first line of every result is the navigation.
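As a rough illustration of keeping navigation and footers out of the index, here is a minimal Python sketch. It assumes the boilerplate sits in `nav`, `header` and `footer` elements, which is my assumption rather than how any of the sites above are built; a real template might use `div`s with particular ids instead, and the set of tags to skip would need changing to suit.

```python
from html.parser import HTMLParser

# Elements assumed (hypothetically) to hold boilerplate rather than content.
SKIP_TAGS = {"nav", "header", "footer", "script", "style"}


class ContentExtractor(HTMLParser):
    """Collects text that sits outside any of the SKIP_TAGS elements."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # how many skipped elements we are currently inside
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if self.skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def indexable_text(html):
    """Return the text of a page with navigation, header and footer removed."""
    parser = ContentExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

Feeding only the output of `indexable_text` to the indexer, and building result summaries from it, would avoid both problems: the footer link no longer matches on every page, and the navigation never appears as the first line of a summary.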

Sussex University search

Returning an unfamiliar layout for the results page can lead to confusion. The University of Sussex have a combined results page showing “top matches”, news, events, and “full text” stacked in one page.

Advanced search

Many site search engines offer some form of “advanced” search and this can be a very useful feature for power users to track down more precisely what they’re looking for. This extra functionality inevitably comes with the risks of extra complexity. With Google Search Appliances the advanced search page is similar to the one on the main Google site and provides some options that might not be necessary. If the entire site is on a single domain or the only language is English, why do you need the option to restrict searches by either of these? (Maybe this is a question Google should be answering, because they can tell from the index if there’s multiple domains or languages.)

Topic specific searches provide potential for a whole extra level of complexity. Nottingham Trent University’s course finder requires you to enter level, mode and year of study, subject area and keywords. Why not just a simple box?

Advanced search can be very powerful and easy to use. The University of Warwick have just a few additional options, specifying dates using a drop down rather than a datepicker, and allowing searches to be limited to certain types of document or areas of the site. Edge Hill’s course search has just one “advanced” option – course type.

Tabbed Search

This is something Martin Belam urges caution over. The user may not understand the structure behind this scoping of search and thus limit the results more than they intended. Edge Hill uses tabs to provide scoped search for courses, news and events. Warwick also has a tabbed interface to pull in search results from their blogs and people finder. Care must be taken to ensure that scoping isn’t confused with site navigation and it makes sense to show visually where there are more results. Warwick use some nifty Ajax to load up the number of search results for each tab – an idea I liked so much I implemented it on the Edge Hill site too!

More neat ideas

News search on the University of Oxford is integrated right into their news system – results are presented in the same format as the news homepage showing the latest stories making the results pages familiar. Each summary is accompanied by a thumbnail and icons flag up news stories with audio or video attached.

Everyone has a custom 404 page these days to help direct users to what they’re looking for (you do, don’t you?!) so why are most search engines unhelpful when there’s no results? At best they’ll correct your spelling and provide some hints on rewording your query – at worst they’ll just dump “No results found” to screen. This is turning visitors away when they most need our help. Most sites have an A to Z list or sitemap; prospective students can be helped by linking to the full list of courses.

Auto suggest for popular queries can be a good way to encourage users to search more accurately. I’ve not seen anything similar on HEI sites yet but Facebook’s search does a neat job of integrating specific people, groups and organisations with wider search – selecting these takes you direct to that page – a really handy quick links feature.

Northumbria’s search tag cloud is a little out of the way but it would be possible to add in a top searches list somewhere a bit more visible. Storing a user’s search history would be very easy to do (even client side) and with a little logging enabled it would also be possible to implement “people who searched for X also searched for Y” functionality.
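To give an idea of how little is needed for “people who searched for X also searched for Y”, here is a minimal Python sketch. It assumes the search log can be read as (session id, query) pairs; that log format and the function names are made up for illustration, not taken from any particular search engine.

```python
from collections import Counter, defaultdict


def build_related(log):
    """Count how often two queries appear in the same session.

    `log` is an iterable of (session_id, query) pairs (an assumed format).
    """
    sessions = defaultdict(set)
    for session_id, query in log:
        sessions[session_id].add(query.strip().lower())

    related = defaultdict(Counter)
    for queries in sessions.values():
        for query in queries:
            for other in queries:
                if other != query:
                    related[query][other] += 1
    return related


def also_searched(related, query, limit=3):
    """Return up to `limit` queries most often seen alongside `query`."""
    counts = related.get(query.strip().lower(), Counter())
    return [other for other, _ in counts.most_common(limit)]
```

With a log where most sessions that search for “pgce” also search for “open days”, `also_searched(related, "pgce")` would list “open days” first. A real version would probably want a minimum count before showing a suggestion, so that one odd session doesn’t end up on the results page.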

The final neat feature I came across was a few sites making use of Google’s KeyMatch feature. This is similar to Google AdWords but allows the site owner to specify what should show up for specified keywords. During my presentation someone suggested that users would blank these results because they see them as adverts. I’m not so sure – I think people will click the links if it’s clear where it’s going to. KeyMatch is a good way of making sure that important pages are at the top of the results even when the search algorithm doesn’t rank them.
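If your search engine has nothing like KeyMatch, the same idea can be approximated in the results template. The Python sketch below is not Google’s feature or API, just an illustration of pinning chosen pages above the normal results; the keywords and the example.ac.uk URLs are invented.

```python
# Hypothetical keyword -> (title, URL) table maintained by the web team.
KEYMATCHES = {
    "open day": ("Open days", "https://www.example.ac.uk/opendays"),
    "pgce": ("PGCE courses", "https://www.example.ac.uk/courses/pgce"),
}


def with_promoted(query, organic_results):
    """Put promoted links for matching keywords ahead of the organic results.

    `organic_results` is a list of (title, url) tuples from the search engine.
    """
    normalised = " ".join(query.lower().split())
    promoted = [
        link for keyword, link in KEYMATCHES.items() if keyword in normalised
    ]
    # Avoid listing a page twice if the engine also found it.
    promoted_urls = {url for _, url in promoted}
    rest = [result for result in organic_results if result[1] not in promoted_urls]
    return promoted + rest
```

Matching here is a simple substring test, so “open days” triggers the “open day” entry; anything smarter (stemming, exact phrase only) is a design choice left open.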

All talk?

By now I’m sure you’re fed up of my holier-than-thou ranting about site search and I want to stress that what we’ve done at Edge Hill isn’t perfect by any means! The algorithm used for indexing and searching is quite crude in places but I think we’ve been able to improve the user experience by adding a few neat features and trying to keep in mind what the user is likely to search for, not just what a spider can find.

The search system we use is built on Zend Lucene and has had around a week of development hours over the last year but not everyone will be able to do that, so what can be done with existing resources? In most systems, changing templates to remove unnecessary features or alter the way results are formatted is quite trivial. Salford University have just launched a new site search powered by a Google Search Appliance and their results pages show what can be done with some clever XSLT – gone is the advanced search and look out for the nice file type icons using image replacement.

Finally, one last plug for Martin Belam’s Taking the ‘Ooh’ out of Google – he shows loads of simple ways to improve your results and it’s essential reading for anyone interested in site search.

BathCamp – a BarCamp in Bath

September has been ridiculously busy both at work and in Real Life™ and now it’s time to play catchup and tell you, dear readers, what I’ve been up to.

First of all, skip back a few weeks and I went down to Bath for another BarCamp.

BarCamp is an ad-hoc gathering born from the desire for people to share and learn in an open environment. It is an intense event with discussions, demos and interaction from participants.

I’d already been to BarCamps in Leeds and Newcastle (BarCamp North East) but BathCamp seemed set to be a little different.  The idea of running a BarCamp in Bath was originally thought up by a bunch of Museums type people but after the initial idea, the event diversified a lot to include the wider geek community.  There were still a lot of museums and Higher Education people there, which was great – at other BarCamps I’ve been pretty much the only person from HE – but not so many to disturb the balance.

An early start on Saturday morning to drive down to Bath got me to Invention Studios before most people had arrived.  There was the usual scramble to get a slot – I bravely went for the after lunch session.

A brief summary of some of the sessions (thanks to Tim Beadle via Jack Martin Leith for the names):

  • Giles Turnbull: Why Web Advertising Sucks.  Great talk about how web advertising is different to print and a lot of people haven’t worked out how it works yet.
  • Brian Kelly: Time to start thinking.  Slides with a black background and no UKOLN logo – what’s going on here!
  • Tim Beadle: Techies are from Mars, Marketers are from Venus.  Interesting to hear a bunch of geeks talk about how marketing don’t understand but quite a lot of willingness to engage, which is good.
  • Laura Dewis: Disrupting the University: Social Learning.  Interesting to see what’s going on at the OU.  I’m not always sure the cool stuff they’re doing makes it to the front line all that quickly (based on what I’ve heard a few students saying) but they really push the boundaries.
  • Mike Nolan: Learn to juggle/drop balls.  Not sure the people downstairs appreciated the constant thuds of people dropping balls but I enjoyed leading the session.  Some people picked it up really well, others could do with practicing a little longer!
  • Rick Hurst: The Gurgitator.  Crazy way of scaffolding a web application using any framework from a standardised configuration.  Not sure I’d use it but interesting to see!
  • Ian Ibbotson: Opening up academic publishing
  • Drew Jones / Stuart Lowes: sfImageTransform: an image-manipulation plugin for Symfony.  sfThumbnailPlugin on steroids.  A bit like WideImage but more extendible and more symfony-like.  Release is due soon so worth keeping an eye out for.
  • Mark Ng: Delicious Pecha Kucha. 20 seconds to talk about each of your last 10 delicious bookmarks.

In the evening there was a pub tub quiz before decamping to a bar in a former railway station.  Sunday had a few more sessions.  Overall there was a really friendly atmosphere at BathCamp making it my favourite BarCamp so far.  Many thanks to the organisers and sponsors for making a great weekend.

Ian Forrester on Backstage

Last night I went down to my second GeekUp meet at 3345 Parr Street in Liverpool. If you don’t know what GeekUp is, here’s a quote from the website:

GeekUp is a community of web designers, web developers, and other tech-minded folk from the North West. Our socials take place once a month in Leeds, Liverpool, Manchester, Preston and Sheffield they are always a lively place to share ideas and spread a little knowledge.

Ian Forrester.  Creative Commons licenced by Gavin Bell

There’s usually a couple of talks before moving to the bar for chat and beer and this month’s talk was by Ian Forrester, the man behind Backstage.

Backstage is a community built around data made available by the BBC. It encourages the public to make use of the data for cool stuff and highlights what the Beeb is offering. I’m not going to go into all the prototypes which have come out of backstage or list the feeds and APIs they advertise – you can find that out from the website – but there’s other interesting things going on as well!

Since backstage started, its focus has been on feeds and APIs but that seems to be changing now. They’ll soon be starting a fortnightly online show featuring interviews with the tech community, introducing the work people are doing and explaining the web in a bit more detail than BBC Webwise. This will be done on a shoestring, but with help from other areas of the BBC (such as Click) they hope to maintain high production standards.

At a slightly larger scale, backstage are joining up with IT Conversations to record speakers at UK based conferences. Traditionally there’s been a notable US-bias towards this kind of material so it will be great to see a bit more variety to the speakers.

The final thing (that I’ll talk about) is the support backstage and Ian himself are giving for tech events. While living in London, Ian organised BarCamps, GeekDinners and supported dozens of events. With his move to Manchester, he’ll be shifting some of his attention to what the North can offer. There was discussion of starting GeekDinners in Manchester (not as a direct competitor to GeekUp, it should be noted) and other web/tech events in the North are getting backstage support and sponsorship.

So a very interesting and informative GeekUp Liverpool this month, and very different to the last one I attended. There’s a great community of web developers and designers in and around Liverpool and GeekUp can play an important part in bringing people together so I’d encourage anyone with an interest in the web professionally (or even just a strong interest in technology) to pop along to see what it’s all about.

10ish five-minute ways to improve your website

There’s some speakers who do the conference circuit and recycle the same old material each time they present, and if I’m not careful I could turn into one of them! At this year’s IWMW, they held a “BarCamp” session. If you’re already familiar with BarCamps then don’t get too excited as it wasn’t a proper one, but it stole elements of the unconference concept to provide a forum for anyone attending the workshop to get up and talk about something that interests them. The organisers converted one of the 45 minute discussion group sessions into two 20 minute slots and provided nine rooms of various sizes to use.

Since I suggested it, I figured I should support it and put myself down for a session. I was busy preparing for my main parallel session so I didn’t have time to think of anything new, so I recycled my BarCamp North East session and delivered that. In Newcastle I only had a few people turn up so I was very pleased to see the room packed with about 30 people this time (although that included three from Edge Hill, apparently there to give me “moral support”).

I came up with the idea for the presentation after realising there were some really easy things that I’ve added to the site that not many other Universities seem to do. [I should add that I’m not saying we were first or unique with any of the suggestions, just that they’re not all “obvious”]. They include things like adding a link tag to your homepage so that the RSS feeds you provide can be easily discovered and wrapping your page footer in an hCard microformat.

It’s pleasing to note that the feed autodiscovery suggestion has got quite a lot of attention. A couple of weeks ago Brian Kelly (UK Web Focus, UKOLN) highlighted that few Scottish universities were doing this and having already delivered my session at BarCamp North East I wasn’t too surprised, but one of the innovation competition entries showed autodiscovery is quite rare across UK HEIs. Tony Hirst explains the system on then check out the full name-and-shame list.

Edge Hill comes out fine for the feeds we offer on the homepage with news, events and job vacancies listed. There’s a few HEIs who offer other feeds – open days could be useful (and we have a feed available for it through a tag on the events system) – but the one that caught my eye was the University of Warwick’s recent changes feed which allows you to subscribe to find out when the homepage changes. Better still, they have this for every page in their CMS. Where this falls down is when feed readers like Google Reader just take the first feed in the page from those available through autodiscovery thus subscribing you to the recent changes feed instead of the more useful news feed.
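Checking a homepage for autodiscoverable feeds is easy to script. The Python sketch below is my own illustration, not the system used for the name-and-shame list: it looks for `link` tags with `rel="alternate"` and an RSS or Atom type, and returns them in page order, which matters because a reader that takes only the first one will subscribe to whichever feed is listed first.

```python
from html.parser import HTMLParser

# MIME types that feed readers commonly look for during autodiscovery.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkFinder(HTMLParser):
    """Collects (title, href) for every autodiscovery link, in page order."""

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = {name: (value or "") for name, value in attrs}
        rels = attr.get("rel", "").lower().split()
        if "alternate" in rels and attr.get("type", "").lower() in FEED_TYPES:
            self.feeds.append((attr.get("title", ""), attr.get("href", "")))


def discover_feeds(html):
    """Return the feeds a page advertises; an empty list means none found."""
    finder = FeedLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.feeds
```

Because the list keeps page order, `discover_feeds(page)[0]` is the feed a first-feed-only reader would pick, so it is worth putting the news feed ahead of something like a recent changes feed.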

You can see the ideas towards the end of my parallel workshop session slides (where I also went through the list) – skip to slide 41 unless you want to read about some of the “stuff what we’re doing at Edge Hill University“!

The other BarCamp session I went to was about Microsoft’s hosted student email solution, live@edu. A few institutions in the UK are in the final stages of deployment – Aberdeen already have some accounts live. Some aspects of Microsoft’s solution seem a bit less slick than Google’s, while I was impressed with its potential for integrating with other Microsoft systems.

I really enjoyed the experience of presenting and attending the BarCamp sessions and I’d love to see them extended. My personal view would be to scrap the discussion groups, merge them into a solid block – say 2 hours in the afternoon of day two – and make the types of session clearer, whether they’re technical vs marketing or presentation vs discussion.

Other people talking about the BarCamp sessions:

  • Jeremy Speller: “I like the BarCamp idea – quite a lot of pressure to pack interesting stuff in in 15-20 minutes – but I think the format worked well.”

BarCampNorthEast: Day 2

Recharged by all I could eat from Newcastle’s finest only Mongolian-Chinese restaurant (it was very good) I headed back to the Art Works Galleries for day two of BarCampNorthEast.

I caught the second half of Tom Morris’ session on Citizendium. It’s like Wikipedia but with stricter editing rules and obligatory disclosure of editors’ real names. Articles are reviewed by experts in the appropriate field, written with different audiences in mind and attempt overall to have a higher quality.

While this sounds like a noble cause, there’s something about it which makes me feel uneasy. In my notes for the session I put “seems a bit like Starship Troopers“:

Service guarantees citizenship

Next up, Gareth Rushgrove led a discussion session “do you need to move to London to further your career?”

I found it hard to draw any clear conclusions – a lot of the people there who had lived or worked in London said it wasn’t all it’s cracked up to be, but that it’s also possible to fill your week with tech meet-ups.

Mark Ng has moved out of London to Bournemouth and now won’t be taking any more work in central London.

What are my views?! Clearly I haven’t (yet?!) found the draw to London to be too great to resist. Working in the Higher Ed sector is probably different to commercial work, but the tech community in the North is vibrant and growing. BarCamps in Leeds, Manchester and Newcastle show there is the demand for and ability to organise good events. GeekUp meetings are now happening monthly in Liverpool, Preston, Leeds, Manchester and Sheffield. The thing the North doesn’t have is any big digital agencies, but do we or the industry need them?

Lightning talks up next, where several people talk for a short time. The first part overlapped with the moving to London discussion so I caught a few towards the end. Rather than attempt to write up anything that makes sense, here’s a brain dump:

  • Calais
  • Tagaroo
  • FeedShaver by Mark Ng and symfony powered!
  • BBC tagging service – an internal system that they use. Should it be “radio1” or “radio-one” etc. Provides a vocabulary of tags.

Jure Cuhalev, Head of User experience and Community Manager at Zemanta gave a brief demonstration of their system. It takes your blog posts and suggests relevant links, photos, articles and tags. Quite cool stuff and worth watching. Jure’s also a bit of a professional conference-goer – check out his blog for reviews of BarCampLondon4, Thinking Digital, @media and that’s just in the last couple of weeks!

BarCampNorthEast co-organiser Alistair MacDonald gave an introduction to Geocaching. If you’ve not come across Geocaching before, it’s a treasure hunt game played all over the world using GPS to find caches. There are hundreds of thousands all over the world including a couple left by my Explorers 🙂

Last session for the day was Emma Persky (blog) talking about Hand Gesture Recognition. Using a webcam she demonstrated how hands (or anything else!) can be tracked around the screen. The movement can then be used to control systems such as TVs.

Tara Hunt's Photo CC

So that was all for day two – after a quick tidy up of the venue most people went across town (maybe – I had little concept of where I was!) to the famous Belle and Herbs where I had a heart-attack inducing Breakfast Club:

The Breakfast Club is big – really big – you just won’t believe how vastly, hugely appetite-quenchingly big it is.

Now you tell me!

Overall thoughts? Great event. Different to BarCampLeeds, but not in a bad way. The smaller number of participants led to a more intimate feel.

I did mention that I’d comment on how HE conferences can learn from BarCamps but I’m going to leave that for another time. Hopefully I’ll be posting soon about issues like Twitter, live blogging, use of Laptops at conferences and a whole bunch of other topics, but if I forget, give me a gentle reminder!

BarCampNorthEast: Day 1

Last weekend I took a drive to Newcastle for BarCampNorthEast – 350+ mile round trip for the region’s first BarCamp. If you don’t know what a BarCamp is, the wiki describes it as:

BarCamp is an ad-hoc gathering born from the desire for people to share and learn in an open environment. It is an intense event with discussions, demos and interaction from participants.

It’s a form of Unconference which are growing in popularity and now happen in cities all over the world. The by-the-people, for-the-people ethos is what makes BarCamps different to virtually all mainstream conferences.

The organisers of BarCampNorthEast had booked out the Art Works Galleries for the two days to provide plenty of space for informal discussions as well as four areas for sessions.




This weekend I took a trip up the M62 to Leeds. No, not for the SSWG shopping trip, I went for a day of technology! That’s right, I experienced my first BarCamp!

BarCamp is an ad-hoc gathering born from the desire for people to share and learn in an open environment. It is an intense event with discussions, demos, and interaction from participants.

BarCamp is a network of unconferences organised all over the world. They’ve been going for about two years and are growing in popularity. Yesterday was BarCampLeeds, held at Old Broadcasting House, the former BBC studios now owned by Leeds Met University.

Since BarCamps are presented by and for the participants, and this being my first BarCamp, I “had to” do a session. After racking my brains all week I ended up finishing a presentation at 1:30am on Saturday morning. The subject was basically an introduction to symfony and why you should use it, combined with some case studies of the work we’ve done here at Edge Hill (with some slides borrowed from Alison’s IWMW talk!)

With nine timeslots and up to four rooms in use at any one time there was a wide variety of subjects ranging from entrepreneurs imparting their experiences to Live Coding demonstrations of Ruby. I tried to mix some business talks with stuff about web technologies and found it all pretty enjoyable.

The highlight for me was Tom Smith talking about “Stuff I Know”. The slides are online but they’re unlikely to make much sense. The line that sums it up was “pair programming is a bit like bran… you know you should eat it but you don’t really know why”. Apparently if you don’t have a real person to pair with, talking to the teddy bear is a good substitute.

I’m glad I went and I’d recommend it to anyone with an interest in technology (not necessarily a web focus). Thanks to the enormous generosity of the sponsors the whole weekend was pretty cheap for me (although the Etap Hotel proved that you really do get what you pay for… not impressed) so the journey was well worth it.