Tag Archives: search

Google Site Search

We’ve just launched a new beta version of the corporate website site search engine which will in the coming weeks replace the existing site search.

The new system is powered by Google Site Search – a version of the search engine that restricts results to a given set of pages. Currently it is available for site-wide searches before we extend it to power our main course search engine.

Try it out now and give us some feedback on how you find it.

Google search results preview

Google Search Results for "edge hill university"

Yesterday I noticed Google’s latest incremental feature on their search results page. It’s not long since they launched full results search as you type and now highlighting a search result pops up a thumbnail preview of the page.

This means people will be able to judge the contents of a site before they even get there making it even more important that we make an impression, and now in a small image where text can’t be easily read.

It does however provide an opportunity to show that there is extra content if we were to push “below the fold”, that is make our homepage longer so that not all the content appeared without scrolling.

Technically it’s quite an interesting implementation. They’re using data URLs to put the image content into a single request and long pages are broken into several sections each up to 405 pixels high.
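To make the data URL idea a bit more concrete, here's a minimal Python sketch. It is only an illustration of the general technique, not how Google builds its previews: the function names and the placeholder bytes are my own assumptions, and the only detail taken from the observation above is the 405 pixel section height.

```python
import base64


def to_data_url(image_bytes: bytes, mime_type: str = "image/jpeg") -> str:
    """Encode raw image bytes as a data: URL that can be embedded inline."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def section_heights(page_height: int, max_height: int = 405) -> list:
    """Split a page height into sections of at most max_height pixels."""
    full, remainder = divmod(page_height, max_height)
    return [max_height] * full + ([remainder] if remainder else [])


# Placeholder bytes stand in for a real thumbnail image.
print(to_data_url(b"hello", "image/png"))  # data:image/png;base64,aGVsbG8=
print(section_heights(1000))               # [405, 405, 190]
```

Because the image travels inside the URL itself, the browser doesn't need a separate request for each thumbnail.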

Learning Something New Every Day

Today, I learnt about canonical linking. Canonical linking is a way of letting search engines know that your content is accessible through multiple URLs, by publicly specifying the preferred URL of page content. This prevents Google penalising your site for having duplicate content.

I first came across the possibility of search engines penalising sites for duplicate content when following WPDesigner’s excellent tutorial on creating WordPress Themes. Here, he recommends ways to change the content of pages which might be viewed as duplication by Google – Prevention not cure.

It wasn’t until today that Mike recommended that I “canonically link” Rose Theatre event pages because they are almost identical to the same page in the events section of the site. To do so, within the <head> tags of each Rose Theatre event page, add a link like the one below:

<link rel="canonical" href="http://www.edgehill.ac.uk/events/2010/03/09/stand-up-comedy" />

The link tag is an empty tag, and the use of the “rel” attribute defines the canonical nature of the tag. In our case we generate the link URL using symfony routing rules, URL parameters and the page slug so we don’t need to add the link for every page.
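Our real implementation lives in symfony, but the idea of generating the tag from a date and a slug can be sketched in a few lines of Python. Treat this as a rough illustration under my own assumptions: the function name and the URL pattern are inferred from the single example link above, not copied from our routing rules.

```python
from datetime import date
from html import escape

BASE_URL = "http://www.edgehill.ac.uk"  # taken from the example link above


def canonical_link(event_date: date, slug: str) -> str:
    """Build a canonical <link> tag pointing at the main events page."""
    href = f"{BASE_URL}/events/{event_date:%Y/%m/%d}/{slug}"
    # Escape the URL so it is safe inside an HTML attribute.
    return f'<link rel="canonical" href="{escape(href, quote=True)}" />'


print(canonical_link(date(2010, 3, 9), "stand-up-comedy"))
```

Every page that shows the same event, whichever section it sits in, would emit the same tag.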

So now, when you visit Stand Up Comedy check out the source code and you’ll see the link, just like Google does.

Top 10 of 2009

It’s that time of year where you have to post an annual wrap-up of the previous year’s posts.

10. Google Apps Mail – POP/IMAP/iPhone

Just sneaking into the top ten is Steve’s introduction to some of the ways to access email for users of the new Google powered email.

9. New Departmental Sites

Sam introduces a summer’s worth of hard work (not so much from me – I was driving across America!)

8. Create a better search engine than Google

A post from late 2008 writing up a presentation I gave at BarCamp Liverpool and repeated at IWMW 2009.

7. Twouble with Twitters

An attempt to balance out #2 in the list by taking a sideswipe at those who are maybe a little too addicted 🙂

6. Argleton goes national!

A write up of some of the early coverage of the Argleton meme.

5. Rise of the Mega Menu

Coming soon to a website or portal near you – still a few things to iron out but we hope you’ll like what we do.

4. Roy Bayfield at the TV advert filming

Live on the set of the forthcoming TV advert and testing out the new Flip Camera we’ve got.

3. Browser stats

Everyone loves web stats, okay maybe it’s just me! Six months on and Internet Explorer has dropped to 76.9%, Firefox down a little to 13.5%, Opera has held steady while Webkit-based browsers, Safari and Chrome, have jumped to 5.6% and 3.6% respectively.  Breaking down IE shows IE6 use continues to fall (down to under 11%) while IE8 usage has trebled.  There’s hope for a standards-based-browser future yet!

2. What should @edgehill do on Twitter?

Little did I know when I wrote this post that it would unleash such a debate!  Ironically we’ve just had the 2’ of snow that benefitted Bath’s uptake so we’ll see whether usage grows!

1. Google Renames Village

And in at #1 is a little post I fired out about a typo on a map 🙂

Tags

As well as individual posts, a number of tag pages rank pretty highly, including “symfony”, “argleton”, “google maps”, “twiterdeck” and “facebook”.

Create a better search engine than Google

The findings and opinions contained within this post are entirely mine and do not necessarily reflect those of Edge Hill University. The research and write up was done in my own time and is only posted here because it may be of interest to the HE community.

At BarCamp Liverpool I gave a talk about site search. It covered the same sort of topics as Martin Belam’s Euro IA Summit presentation Taking the ‘Ooh’ out of Google and I recommend that anyone interested in site search read his series of blog posts. Go read it now – it’s way better than I’m going to write here! While Martin reviewed news websites from across Europe, I’ve turned my attention to university websites and look at what institutions are doing now and how it can be improved.

Do we really need site search?

Before we get too tied up in what to do, it’s perfectly valid to question whether we even need a site search engine any more. Every internet user knows how to search – specifically, everyone knows how “to Google”. The average query length has almost doubled over the last few years and there is some evidence to show people are using site names to restrict their search – for example “english pgce edge hill”.

Google have also introduced “search this site” boxes into search results which mean that when you search for “Edge Hill” you can filter down your results by making an extra query. Putting “english pgce” into this search box gives results for the query “english pgce site:edgehill.ac.uk”. This feature is already present for most HEIs:

Google Site Search for Edge Hill University

So how many people are using this feature? Our Google Analytics reports show that in the last six months, just 150 visitors to our site came from searches including “site:edgehill.ac.uk”. A much higher number of people are including “Edge Hill” in queries. Excluding searches which are looking directly for us – for example “Edge Hill University”, “www.edgehill.ac.uk”, or worst of all “edgehill uni” – around 10% of Google referrals come with some form of restriction to search our site.

How does this compare to our own site search? We’ve had 270,000 searches in the last six months, with 7% of visitors using it. Additionally, the point where people are wanting to search is usually after they’ve left Google and come to our site. Someone looking for courses for example isn’t going to go back to Google to search – they need search within the context of the pages they’re looking at.

State of the Union Universities

One weekend in November I spent a few hours going through HEI websites and testing their search engines. On each site I searched for “computing” – most universities offer some form of computing course and it would typically find their IT Services department. I noted down which system they used for search, and whether they provided the ability to search just courses, news and events. All quite basic but it has still offered up some interesting findings.

Google Search Appliance by Adriaan Bloem

Over the last few years most universities have looked to Google to provide search. 63% of HEIs now use some form of Google-powered search engine. Most of those have bought a Google Search Appliance (or Google Mini) – a server you install locally which provides your own miniature Google Search engine. The interface and results are familiar for users – it’s clean and quick.

Others are using either Google Custom Search Engine or what I’ve recorded as “Google Syndicated Search” – both similar services where Google will spider your site remotely with no requirements to run your own servers. CSE allows you to embed the results within your own page while Syndicated Search allows basic branding to be applied to results displayed on a Google domain.

In a distant second place is ht://Dig – a system that’s been around for years. When I first came across it many years ago I was amazed that I could run my own search engine. It was one of the first widely deployed site search systems and so a number of universities were early adopters. It hasn’t moved with the times – the latest release of ht://Dig was in June 2004.

The short tail includes Ultraseek, Egothor and Novell’s QuickFinder – the system that powered Edge Hill’s site search until April this year. A few sites use search engines built into their CMS or custom built but it’s hard to find out much about these.

This is all very good but in many ways, the engine doesn’t matter – it’s what you do with it that counts!

The ugly, the bad and the good

Google has led the way with clean, uncluttered results pages but some search engines haven’t learnt that less is more and seem compelled to pack in every feature they can. Star ratings or other indicators of the quality of the page add little and can be distracting. Users know to expect that if a page is higher up the list then it’s a closer match to their query so it may not be necessary. On our own search we have a percentage relevance but I’m not sure it adds much value to the visitor so it might be getting cut soon.

The above example from Keele also shows another problem: some search engines blindly index the full content of pages. We know exactly which parts of a page are relevant and which are navigation bars, headers or footers. Including these in the index can distort results while adding little value. A search for “Freedom of Information” might return 10,000 pages simply because it’s linked from the footer of every page.

Worse than indexing these areas is displaying the text as part of the result summary. Here the first line of every result is the navigation.

Sussex University search

Returning an unfamiliar layout for the results page can lead to confusion. The University of Sussex have a combined results page showing “top matches”, news, events, and “full text” stacked in one page.

Advanced search

Many site search engines offer some form of “advanced” search and this can be a very useful feature for power users to track down more precisely what they’re looking for. This extra functionality inevitably comes with the risk of extra complexity. With Google Search Appliances the advanced search page is similar to the one on the main Google site and provides some options that might not be necessary. If the entire site is on a single domain or the only language is English, why do you need the option to restrict searches by either of these? (Maybe this is a question Google should be answering, because they can tell from the index if there are multiple domains or languages.)

Topic specific searches provide potential for a whole extra level of complexity. Nottingham Trent University’s course finder requires you to enter level, mode and year of study, subject area and keywords. Why not just a simple box?

Advanced search can be very powerful and easy to use. The University of Warwick have just a few additional options, specifying dates using a drop down rather than a datepicker, and allowing searches to be limited to certain types of document or areas of the site. Edge Hill’s course search has just one “advanced” option – course type.

Tabbed Search

This is something Martin Belam urges caution over. The user may not understand the structure behind this scoping of search and thus limit the results more than they intended. Edge Hill uses tabs to provide scoped search for courses, news and events. Warwick also has a tabbed interface to pull in search results from their blogs and people finder. Care must be taken to ensure that scoping isn’t confused with site navigation and it makes sense to show visually where there are more results. Warwick use some nifty Ajax to load up the number of search results for each tab – an idea I liked so much I implemented it on the Edge Hill site too!

More neat ideas

News search on the University of Oxford is integrated right into their news system – results are presented in the same format as the news homepage showing the latest stories making the results pages familiar. Each summary is accompanied by a thumbnail and icons flag up news stories with audio or video attached.

Everyone has a custom 404 page these days to help direct users to what they’re looking for (you do, don’t you?!) so why are most search engines unhelpful when there’s no results? At best they’ll correct your spelling and provide some hints on rewording your query – at worst they’ll just dump “No results found” to screen. This is turning visitors away when they most need our help. Most sites have an A to Z list or sitemap; prospective students can be helped by linking to the full list of courses.

Auto suggest for popular queries can be a good way to encourage users to search more accurately. I’ve not seen anything similar on HEI sites yet but Facebook’s search does a neat job of integrating specific people, groups and organisations with wider search – selecting these takes you direct to that page – a really handy quick links feature.

Northumbria’s search tag cloud is a little out of the way but it would be possible to add in a top searches list somewhere a bit more visible. Storing a user’s search history would be very easy to do (even client side) and with a little logging enabled it would also be possible to implement “people who searched for X also searched for Y” functionality.
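As a rough idea of how little is needed for the “also searched for” feature, here's a minimal Python sketch. It assumes the search log has already been grouped into one list of queries per visitor session; that grouping, and the function names, are my own assumptions rather than anything we've built.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_related(sessions):
    """Count how often two different queries appear in the same session."""
    related = defaultdict(Counter)
    for queries in sessions:
        # Normalise and de-duplicate the queries within one session.
        unique = {q.strip().lower() for q in queries if q.strip()}
        for a, b in permutations(unique, 2):
            related[a][b] += 1
    return related


def also_searched(related, query, limit=3):
    """Return up to `limit` queries most often seen alongside `query`."""
    counts = related.get(query.strip().lower(), Counter())
    return [q for q, _ in counts.most_common(limit)]


sessions = [
    ["pgce", "english"],
    ["PGCE", "english", "fees"],
    ["pgce", "open day"],
]
related = build_related(sessions)
print(also_searched(related, "pgce", limit=1))  # ['english']
```

Simple co-occurrence counts like this get noisy with little traffic, so a minimum count before showing a suggestion would probably be sensible.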

The final neat feature I came across was a few sites making use of Google’s KeyMatch feature. This is similar to Google AdWords but allows the site owner to specify what should show up for specified keywords. During my presentation someone suggested that users would blank these results because they see them as adverts. I’m not so sure – I think people will click the links if it’s clear where it’s going to. KeyMatch is a good way of making sure that important pages are at the top of the results even when the search algorithm doesn’t rank them.

All talk?

By now I’m sure you’re fed up of my holier-than-thou ranting about site search and I want to stress that what we’ve done at Edge Hill isn’t perfect by any means! The algorithm used for indexing and searching is quite crude in places but I think we’ve been able to improve the user experience by adding a few neat features and trying to keep in mind what the user is likely to search for, not just what a spider can find.

The search system we use is built on Zend Lucene and has had around a week of development hours over the last year but not everyone will be able to do that, so what can be done with existing resources? In most systems, changing templates to remove unnecessary features or alter the way results are formatted is quite trivial. Salford University have just launched a new site search powered by a Google Search Appliance and their results pages show what can be done with some clever XSLT – gone is the advanced search and look out for the nice file type icons using image replacement.

Finally, one last plug for Martin Belam’s Taking the ‘Ooh’ out of Google – he shows loads of simple ways to improve your results and it’s essential reading for anyone interested in site search.

Twitter Part 3: Into the real world

After a short break from blogging while we finished off the redesigned website I’m back with the third and probably final part of my guide to Twitter.

It’s a fast moving world and since the last post, Twitter have stopped delivery of SMS to UK mobiles:

Let’s start with the bad news. Beginning today, Twitter is no longer delivering outbound SMS over our UK number. If you have been receiving SMS updates from Twitter via +44 762 480 1423, you’ll notice that they’ve stopped and you may want to explore some of the alternatives we’re suggesting.

Despite the title of the post, there is no good news for UK users! You can still send updates by SMS, which is quite useful for those “oh my God, I just saw a monkey run down the street” moments, but no longer can you make it seem like you’ve got friends by activating a stream of messages to your phone.

They suggest some alternatives which all rely on having data on your phone that’s not over priced – maybe it’s time to look into an iPhone after all! There’s also been a flurry of announcements from third parties who are readying to launch services to deliver tweets by SMS. These services appear to be around the 7p per message mark which IMHO is too expensive – I know first hand how much texts cost in bulk and this is a significant markup!

There’s a variety of other issues around this – these services will probably require you hand over your username and password which should be a practice that’s discouraged and Twitter don’t have any way of grouping or categorising your contacts. If I was going to pay to receive notifications I’d want more control over how many messages I receive from which people including the ability to differentiate between direct messages, @-messages and “noise”.

Anyway, back to the point of this post – how Twitter can impact on the real world. I’m going to cover a few examples of how Twitter has gone beyond virtual interactions.

Engaging with your community

One of the first things that brought home to me that services like Twitter have real uses was unrequested support from my ISP, PlusNet. I tweeted about some trouble I was having with my connection and within a few hours someone responded saying they were following up my problem. And I’m not the only one who’s found this:

Many other companies actively search for references to their products and services on Twitter as well as more generally online. Done well, it can be very good PR as well as improving the experience of users – everyone can see that you’re actively trying to solve problems.

Asking for help

Once you’ve built up a bit of a following, it’s time to start using them! Asking questions or inviting feedback about ideas can give you very quick results. It can also be a good way to expand your network – followers of followers will see the @-replies and maybe if it’s interesting will follow up the original question.

Conferences

One of the best uses of twitter I’ve found is acting as a supplementary back-channel for conferences. Either live blogging the event or just making contact with other participants, Twitter can connect people online in a physical location. At IWMW a significant number of people were Twittering – you can see the full list of posts referencing #iwmw2008 or @iwmw2008 through search.twitter.com.

I’d planned to write a bit more on the topic but I’ve broken the golden rule of blogging:

Never leave a post in draft for more than 48 hours

One final thing I will add is a note about sustainability. There are a lot of questions about the reliability of Twitter (the feared “fail whale”), users outside US/Canada/India have complained about the switch off of SMS, and it’s a relatively closed system. So many people suggest alternatives – Jaiku (now owned by Google), identi.ca, which claims to be an open, decentralised system, and a variety of “lifestreaming” services which build on the simple microblogging offered by Twitter – but all have one key thing missing: people. No other service can match the range of contacts that can be found on Twitter and that’s what makes it so appealing.

Twitter Part 2: Bringing order to chaos

Last time I covered getting started with Twitter, building your network of contacts and interacting with others. This time I’m going to discuss some ways to manage your Twitter subscriptions and discover tweets about topics you’re interested in.

The easiest way to use Twitter is to login to the website to read and post messages. The web interface provides a way to see replies, search for people and send and receive direct messages. This works fine for general use but you have to remember to check for new messages on a regular basis. It would be better if messages came to you, which is exactly what you can do with an SMS gateway.

This free service, operated by Twitter themselves, lets you link a mobile phone number to your account and have messages sent directly to your phone by text message. You can choose to have only selected users’ messages sent by SMS, restrict the hours of the day messages are delivered and you can even send messages to Twitter by SMS. There’s a limit of 250 messages per week so if you follow more than a handful of people you’ll want to limit which users you’re subscribed to.

SMS isn’t a perfect solution though – it can be quite intrusive and is best reserved for people who you’re really interested in. Twitter used to allow you to connect your account to an Instant Messaging system such as Live Messenger or an XMPP-compatible service (which include Google Talk and our own go.talk). Unfortunately in the struggle to cope with growing numbers, Instant Messaging gateways have been turned off. Fear not, because there’s an even better way to work with Twitter!

Twhirl is one of many applications designed specifically for managing your Twitter accounts. There are many such programs for different operating systems and even some more advanced mobile phones. Generally though, they plug into the Twitter API and offer access to most of the features available through the Twitter website and often many more.

The Twhirl window is a bit like a combination of the friend list and message windows from a normal IM program. New messages appear at the top and you can post messages. Usernames, hashtags and messages are hyperlinked to give you more information and offer access to functions without over cluttering the interface.

I mentioned hashtags, so what are they? Hashtags are keywords put into messages starting with a hash (#) and used to identify a topic for that message. The major drive behind the adoption of them was the Hashtags.org website which required you follow the hashtags user in order for your tweets to be shown on the website. It’s still worth doing this but there’s a better way of tracking hashtags which isn’t reliant on opting in.

Usage of the hashtag syntax is very common but certainly not universal. It’s useful for keeping track of certain topics and allowing your followers to pick out at an instant what it relates to. One of the most common uses is in conferences where the hashtag creates a way of finding other people twittering. At the Institutional Web Management Workshop the tag #iwmw2008 was used and in some ways this was more useful than the official live blog service. I’m going to come back to conferences next time as the use of Twitter in the Real World deserves more attention.
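If you wanted to pick hashtags out of messages yourself, a simple pattern match gets you most of the way. This is a minimal Python sketch under my own assumptions about what counts as a hashtag (a hash followed by letters, digits or underscores, not glued to the end of another word); it isn't how Twitter or Hashtags.org do it.

```python
import re

# A hash that doesn't follow a word character, then the tag itself.
HASHTAG_PATTERN = re.compile(r"(?<!\w)#(\w+)")


def extract_hashtags(message: str) -> list:
    """Return the hashtags in a message, lowercased, in order of appearance."""
    return [tag.lower() for tag in HASHTAG_PATTERN.findall(message)]


print(extract_hashtags("Great talk at #iwmw2008 about search #UX"))
# ['iwmw2008', 'ux']
```

Lowercasing the tags means #IWMW2008 and #iwmw2008 end up grouped together.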

For someone new to Twitter, the idea of hashtags might seem a little odd – why wouldn’t you just search for the topic you’re interested in rather than relying on an obscure opt-in service? The search box at the top of the Twitter site would (mis-) lead you to think you could bang in some keywords and get back useful results! No? Of course not – the search system on the main site is next to useless!

Fortunately the clever people at Summize had the solution and have developed a real-time search engine for Twitter messages. This is really neat work (far more impressive than Twitter itself IMHO) – so neat in fact that last month Summize was bought by Twitter and integrated to become search.twitter.com. Twitter Search is fantastically easy to use yet very powerful.

At its most simple, put keywords in and it’ll give you results back but you can also use it to search for replies, hashtags, limited by date and much more. The service is really quick and it even has some Ajax goodness which tells you when there’s new results matching your search without having to keep reloading the page. Best of all, if you’re a feed-nut, you can subscribe to any query as an RSS feed so you’ll not miss a tweet!

Twitter Search is a great way of finding people or topics of interest and next time I’ll cover some real world ways to use it!

Choice Part 6: Lucene in the sky with diamonds

Search is one of the key ways that visitors find what they’re looking for on our websites. A good search engine can quickly and accurately direct the user to the right place and make for a more efficient and productive experience.

In the past we’ve used Novell’s QuickFinder search service to spider the site, supplemented by a couple of custom search systems for things like courses. I’ve never been entirely happy with the results that QuickFinder provides.

Recently in Higher Education and beyond, there has been a trend towards Google’s search appliance and their hosted solutions. Both of these are excellent in terms of raw power – they will happily index every page on a site and searches are quick and mostly relevant. But there’s more to a good search engine than the size of the index – they must provide the results you’re looking for and present them in an easy to understand way. Here’s a fairly typical example of the top search result for a search for “Computing” (I’ve removed identifying names!):

The University of Somewhere

For Edge Hill it’s important that prospective students are able to find what they’re looking for. So in the above example it’s good that it has picked a page about the academic department rather than what at Edge Hill would be IT Services, but it’s actually the Faculty page giving the briefest of details. The summary doesn’t help at all – the spider has picked up details from the page header including the alternative text from the logo and the breadcrumb trail.

What we want are relevant results which allow the visitor to quickly identify what pages have been found with information that’s relevant to the results, not just scraped text. Some search engines are starting to do this – when Google finds videos it will show a thumbnail and allow you to play the video inline – so we can use some of these ideas when creating our own search system. Now let’s get a bit more technical!

Our website can be split into two types of information – structured and unstructured. When I say unstructured, I don’t mean that it’s hundreds of pages put online without any consideration – I’m talking about web pages of content that aren’t stored in a database. Structured information is pulled out from one of our databases – things like news, events or courses. Structured content is what most search engines find difficult because they don’t “know” what a page is all about, but we do, so we can tell our search engine what information is important and how we should represent it.

For our new website, we’ve introduced a new search system based on Zend Lucene. Lucene isn’t a full blown search engine, but it’s a library you can build on to provide full text indexing of almost anything you want. We’re using a symfony plugin which packages a lot of search functionality to allow us to index news, events, courses and other information directly from the database. We have control over what information is indexed for each type and the weightings applied to them. For example we give courses a slightly higher weighting than news.
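To show what a per-type weighting does to the ordering, here's a small Python sketch. It is not our Zend Lucene code: the boost values, the tuple layout and the function name are all made up for illustration, and the only thing taken from above is that courses get slightly more weight than news.

```python
# Hypothetical boosts: courses weigh slightly more than everything else.
TYPE_BOOST = {"course": 1.2, "news": 1.0, "event": 1.0, "page": 1.0}


def rank(results):
    """Sort (title, doc_type, raw_score) tuples by boosted score, best first."""
    return sorted(
        results,
        key=lambda r: r[2] * TYPE_BOOST.get(r[1], 1.0),
        reverse=True,
    )


results = [
    ("Computing department wins award", "news", 0.80),
    ("BSc (Hons) Computing", "course", 0.75),
]
for title, doc_type, score in rank(results):
    print(doc_type, title)
```

With these numbers the course comes out on top even though its raw score is lower than the news story's.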

For static content we have a custom spider which trawls all the other pages on the site and adds them to the index. This works like any other search engine, following links and determining which text is relevant. We try to exclude the header, footer and navigation from the index as this contains text which is common to many pages and adds little to the value of the page.
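The exclusion step can be sketched in Python using only the standard library. This is an illustration rather than our spider: it assumes the page marks up those regions with header, footer and nav elements, whereas a real site might need to match particular ids or classes instead, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser

# Elements whose text should stay out of the index (an assumption).
SKIPPED_TAGS = {"header", "footer", "nav", "script", "style"}


class ContentExtractor(HTMLParser):
    """Collect page text, ignoring anything inside a skipped element."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def indexable_text(html):
    """Return the text of a page that is worth adding to the index."""
    parser = ContentExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


page = (
    "<html><body><nav><a href='/'>Home</a> Freedom of Information</nav>"
    "<div><h1>Computing</h1><p>Study with us.</p></div>"
    "<footer>Freedom of Information</footer></body></html>"
)
print(indexable_text(page))  # Computing Study with us.
```

The footer link text never reaches the index, so a search for it wouldn't match every page on the site.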

Edge Hill’s computing search result

We can also do a lot with the search interface itself. Firstly, different types of result show different information. For example a course result shows the UCAS code, qualification, which campuses it runs at and allows the course to be added to the My Courses basket for comparison. News and events show similar custom results while static pages show the usual snippet of text from the page, but without irrelevant text from outside the content area creeping in.

Overall the new search seems to be working quite well – we’re able to embed it into the rest of the site more than we’ve done in the past and provide custom search boxes for courses and news. There’s still work to do on it though to improve the accuracy of results, so if you’ve tried the search and not found what you were looking for easily, please let us know.