Archive for the 'Infrastructure' Category

Live Chat with Learning Services

Over the last couple of months we’ve been working with colleagues in Learning Services to roll out a “live chat” service allowing students to ask helpdesk staff questions online.

Here’s what it looks like for the student:

Live Chat web client

And a sneak peek of what the staff interface looks like:

Live Chat agent client

It makes use of an XMPP/Jabber server called Openfire with a plugin called Fastpath that talks to the Spark client installed on staff machines. Fastpath provides a web interface that can be embedded into the Learning Services website.

The service is currently running as a trial from 11am to 3pm, so go along and take a look.

ERROR 2013 (HY000): Lost connection to MySQL server during query

If you’ve just set up a new mysqld service and you’re getting the above error when trying to connect over TCP (even when tunnelling in over SSH), don’t fret. The answer lies in /etc/hosts.allow. You need to add a line similar to this:

mysqld: 192.168.0.123

This foxed me a while ago and I didn’t get round to fixing it as it wasn’t yet mission critical; however, it also stopped us dead in our tracks yesterday. Thankfully, Neil in Core Services had a moment of clarity and we’re now back in business.

UPDATE:

Now that I know the problem, some simpler Google querying led me to a 2004 post on MySQL’s access denied troubleshooting page, which mentions that the cause is mysqld being compiled with tcp-wrapper support.
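For anyone wondering why one line makes the difference, here is a much-simplified Python sketch of the decision TCP wrappers make for each incoming connection: a match in /etc/hosts.allow grants access, otherwise a match in /etc/hosts.deny refuses it, otherwise access is granted. It assumes, purely for illustration, a restrictive hosts.deny containing `ALL: ALL`, and it only understands exact addresses, trailing-dot prefixes such as `192.168.0.` and the `ALL` wildcard; the real library also handles netmasks, hostnames, `EXCEPT` clauses and extra options.

```python
def parse_rules(text):
    """Parse simple 'daemon_list : client_list' lines, skipping comments and blanks."""
    rules = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        daemons, _, clients = line.partition(":")
        rules.append((daemons.replace(",", " ").split(),
                      clients.replace(",", " ").split()))
    return rules


def pattern_matches(pattern, value):
    if pattern == "ALL":
        return True
    if pattern.endswith("."):  # "192.168.0." matches any address with that prefix
        return value.startswith(pattern)
    return pattern == value


def rules_match(rules, daemon, client_ip):
    return any(
        any(pattern_matches(d, daemon) for d in daemons)
        and any(pattern_matches(c, client_ip) for c in clients)
        for daemons, clients in rules
    )


def access_granted(hosts_allow, hosts_deny, daemon, client_ip):
    # Simplified order: allow file first, then deny file, otherwise grant.
    if rules_match(parse_rules(hosts_allow), daemon, client_ip):
        return True
    if rules_match(parse_rules(hosts_deny), daemon, client_ip):
        return False
    return True


deny = "ALL: ALL\n"  # assumed restrictive hosts.deny
print(access_granted("", deny, "mysqld", "192.168.0.123"))
print(access_granted("mysqld: 192.168.0.123\n", deny, "mysqld", "192.168.0.123"))
```

With the assumed deny-everything file, the first call prints `False` and the second prints `True`, which mirrors the symptom and the fix described above.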

Your Wiki!


In an earlier post about our wiki I mentioned that we use Confluence from Atlassian, a company with a good, steady release cycle. Currently we’re a couple of versions behind the latest, but hopefully that’ll soon be addressed by a move to a new, much faster server!

Some of the fixes and new features that we’ll get when we upgrade are:

  • The Widget Connector – a quick and easy way to embed content from YouTube, Flickr, etc.
  • Various fixes and upgrades to the Rich Text Editor (RTE)
  • The Macro Browser – an awesome new part of the RTE that lets you browse for useful macros and add features to your Confluence pages quickly and easily
  • Various performance issues have been solved
  • The Confluence PDF export has been drastically improved, especially now that you can provide a CSS stylesheet for PDF output
  • Office 2007 files are now fully supported, searchable and embeddable in the latest version of Confluence, which is just great!

Is the wiki being used?

We’ve had over 20,000 page views in the past month and we still haven’t fully launched the system to the university as a whole! That said, the Faculty of Health, which I work for, has been using it extensively for at least the past year.

A quick breakdown:


Currently we’ve got about four and a half thousand pages in the wiki, with about sixteen thousand versions of these pages. That averages out to about three and a half versions of each page. There are also almost ten thousand documents attached to pages on the wiki. Good work on the editing, people!

A load of “Gobbledygook” from an online form

You have Spam!

Web content editors, designers and developers have all worked hard to make their website interesting, attractive and functional.

A lot of time and money is spent promoting the website. People find the website, talk about it and link to it from their own websites, blogs, wikis, bulletin boards, etc.

Search engines then trawl through the internet looking for links and keywords (among other things). The more links the search engine finds, the more interesting the target website must be (all these people are linking to it, so it must be good).

The search engine goes away and tots up all the scores. The site with the most incoming links is the winner and will be at the top of the search engine rankings for that month. (It’s not really that simple, but I think this is basically what the spammers tell their clients.)

The gobbledygook you receive from the form submission always contains links to websites. The spammer is not trying to get you to click the link; they want a link published on a website by filling in a form that updates a blog, wiki, etc. The spammer is trying to get as many links as possible pointing at the client’s website, increasing the site’s search engine ranking. This can lead to the site being listed ahead of other sites for certain searches, increasing the number of potential visitors and paying customers.

How is it done?

A computer programme searches for publicly accessible forms. Once a form is found, it adds content to all the text fields, a non-existent email address to the email field and HTML containing a link to the text area (usually a comment, content or message field).

All websites that accept content via a form are at risk of receiving spam via their forms.

Solutions

Disallowing multiple consecutive submissions
Spammers often reply to their own comments. Checking that a user’s IP address is not replying to a comment posted from the same IP address would help reduce the spam flooding our inboxes.
However, this proves problematic when multiple users behind the same proxy wish to submit the same form, which is quite often the case here.
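A minimal sketch of the same-IP check, assuming comments are kept as a simple list per thread and a reply is checked against the most recent comment. The `Comment` type and the `shared_ips` allow-list of known proxy addresses are hypothetical; the allow-list is one way of living with the proxy problem, at the cost of not checking those users at all.

```python
from dataclasses import dataclass


@dataclass
class Comment:
    author_ip: str
    body: str


def is_self_reply(thread: list[Comment], new_ip: str,
                  shared_ips: frozenset[str] = frozenset()) -> bool:
    """Return True if a new reply comes from the same IP as the comment it replies to.

    Addresses in shared_ips (hypothetical list of proxies used by many people)
    are never flagged, because the check cannot tell those users apart.
    """
    if not thread or new_ip in shared_ips:
        return False
    return thread[-1].author_ip == new_ip
```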

Blocking by keyword
Spammers have to use relevant and readable keywords so the search engines can index them effectively.
Spam could be reduced by blocking the keywords they use, for example by simply banning the names of casino games, popular pharmaceuticals and certain body enhancements.

Drawback: the list could be quite extensive and would have to be maintained.
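A minimal sketch of keyword blocking using a regular expression built from a word list. The three keywords are placeholders; a real list would be far longer and need regular maintenance, as noted above.

```python
import re

# Placeholder list; a real one would be much longer and kept up to date.
BLOCKED_KEYWORDS = ["casino", "poker", "viagra"]


def build_pattern(keywords):
    # \b word boundaries avoid false positives inside longer, innocent words.
    alternation = "|".join(re.escape(k) for k in keywords)
    return re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)


BLOCKED = build_pattern(BLOCKED_KEYWORDS)


def contains_blocked_keyword(text):
    return BLOCKED.search(text) is not None
```

The word boundaries are a trade-off: they stop innocent words being caught, but they also mean a plural such as "casinos" slips through unless it is added to the list too, which is part of the maintenance burden.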

CAPTCHA
CAPTCHA is a method of displaying an automatically generated image of a combination of numbers and letters. The user then enters the characters into a text field to validate the form.
A computer programme cannot easily read the image, so the form will not validate.

Drawbacks: the image is sometimes difficult to read, and the form may need to be refreshed or submitted several times before you get a readable image.
This system can prove difficult or impossible for visually impaired users who rely on screen readers. Providing an audio version of the characters can resolve this.
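The server-side half of a CAPTCHA can be sketched without any image library: generate a random code, remember it in the user's session and compare the answer once. This assumes a dict-like `session` object; rendering the code as an image (or an audio clip) would be a separate step that is not shown here.

```python
import hmac
import secrets
import string

# Leave out characters that are easily confused once rendered (0/O, 1/I/L).
ALPHABET = "".join(c for c in string.ascii_uppercase + string.digits
                   if c not in "0O1IL")


def new_challenge(session, length=6):
    code = "".join(secrets.choice(ALPHABET) for _ in range(length))
    session["captcha"] = code
    return code  # to be drawn into an image and/or read out as audio


def check_answer(session, answer):
    # Single use: after any attempt the user needs a fresh challenge.
    expected = session.pop("captcha", None)
    if expected is None:
        return False
    return hmac.compare_digest(expected.encode(), answer.strip().upper().encode())
```

Making each code single-use stops a script from guessing repeatedly against the same image, and comparing case-insensitively removes one source of frustrating rejections.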

CSS
Use CSS to hide a text field. A programme will find the field and enter data; our validation checks the field and, if it contains data, the submission fails.

Drawback: if a screen reader is used, it will find the form field and ask for data, and the form will then fail validation.
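A minimal sketch of the hidden-field ("honeypot") check, assuming the submitted form arrives as a dict of field names to values. The field name `website_url` and the markup in the comment are hypothetical; the "leave this blank" label is one way of reducing the screen reader drawback described above.

```python
# Hypothetical name for the field hidden with CSS, e.g.:
#   <div class="hp"><label for="website_url">Leave this blank</label>
#   <input id="website_url" name="website_url"></div>
#   .hp { display: none; }
HONEYPOT_FIELD = "website_url"


def passes_honeypot(form: dict) -> bool:
    """A human never sees the field, so any content in it suggests a bot."""
    return not form.get(HONEYPOT_FIELD, "").strip()
```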

Distributed Solutions
These services were originally developed for use on blogs, but now most form data can be submitted to one of them.
When a user submits a form, the content is sent to one of the services and filtered. The service looks for links and keywords, and also compares the content against a database of known spam already submitted. The content is then given a score, which is sent back to your server. The server then accepts, flags or rejects the content based on the values you set.
Akismet, Defensio and Mollom are some of the web-based distributed services.

Drawback: valid users can be blocked. If a user is wrongly flagged as a spammer, it can be difficult for that user to post data to websites using the same service.
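The last step, where your server accepts, flags or rejects content based on the values you set, can be sketched as a simple threshold function. Each service reports its verdict in its own format, so the score here is a hypothetical value normalised to the range 0 to 1, and the default thresholds are illustrative only.

```python
from enum import Enum


class Decision(Enum):
    ACCEPT = "accept"
    FLAG = "flag"      # held for a human to review
    REJECT = "reject"


def decide(spam_score: float, flag_at: float = 0.5, reject_at: float = 0.9) -> Decision:
    """Map a normalised spam score (hypothetical 0-1 scale) to an action."""
    if not 0.0 <= flag_at <= reject_at <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= flag_at <= reject_at <= 1")
    if spam_score >= reject_at:
        return Decision.REJECT
    if spam_score >= flag_at:
        return Decision.FLAG
    return Decision.ACCEPT
```

Keeping a middle "flag" band softens the drawback above: borderline content goes to a moderator rather than silently blocking a genuine user.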

WordPress updated

We’ve just rolled out an upgrade to our blogging platform, powered by WordPress.  There’s a video demo of some of the new features here (although not all apply as we’re on the Multi User version of WordPress):

If you notice any problems, please let us know.

Video killed the radio star

If you’ve seen the homepage of the Edge Hill website since the new design went live, you’ll have noticed something we’ve not done before – embedded video featured prominently on the site. Of course we’ve had video on the site for ages – we’ve been linking to a Windows Media streaming server for several years and more recently we’ve been converting video to Flash so it can be embedded in pages (on the Careers website, for example). The user experience has been mixed – broadband wasn’t universally available, plugin support was often sketchy and the process of getting video from tape to web was complex.

That’s all changing though. The BBC iPlayer has brought online video to the masses. No longer is video a novelty; it’s expected as part of the whole website package, and our job is to meet those expectations. So we’ve invested in new systems to create and manage video throughout the process, from capture to encoding and streaming. The Media Development team have acquired a Tricaster box, currently located in the control room in the Faculty of Health, which allows them to do live mixing and a whole load of other things. IT Services (or should that be Steve Daniels?) have installed an eStream system to encode and store video.

The first time you might have seen these used in anger was at the Graduation ceremonies last month. There we (Media Development and Web Services) successfully mixed the ceremonies and streamed live video across the campus and onto the internet. We peaked at around 70 simultaneous connections, with many more in total over the three ceremonies.

The eStream box allowed us to broadcast live video in Windows Media and Flash Video formats to ensure maximum compatibility with different systems. Since then the archive video has been available for viewing, again in both WMV and FLV formats.

As part of the new website design we wanted to allow video to be more widely available throughout the site. Corporate Marketing have been generating video specifically for the website and we needed a way of embedding this. There are a couple of aspects of the eStream system that I wasn’t particularly happy with and these were addressed specifically for the website.

Firstly we maintain our own database of videos that are used on the corporate site. Here we can store extra, website-specific information, tag videos correctly, and abstract the complexities of the eStream system – we don’t care whether video is hosted on the eStream box or elsewhere.

Secondly, we use a different Flash video player. The one provided by the eStream box isn’t very flexible and, frankly, looks a bit ugly. We’re instead using the open source FLV-Player, which gives us more flexibility in how it looks and what features it offers.

The video functionality for the website isn’t just on the backend – we’re adding features on the website too. Each video has its own page, which is linked to across the site and which we encourage others to link to through social bookmarking systems. For an example, check out one of the TV Advert pages. On this page you’ll also see that we provide code allowing you to embed the video in your own page. Here’s what the same video looks like embedded right here:

We’re particularly happy with how this looks, especially compared to some other video sites. Here’s the same video embedded from YouTube:

It’s worth noting however that YouTube have started providing higher quality versions of some videos to view within their own site, but (currently?) only low quality versions are available for embedding.

This is just the beginning of video on our websites. Over the coming months we’ll be creating much more content – everything from students talking about courses through to the next round of inaugural lectures – and making them available even more deeply within the site.

If you think all this video really does mean the death of the radio star, fear not – we’re looking at podcasts too.

Migrating WordPress

We’re moving some websites to a new server. Hi has already been done except for the databases and this post is actually a test to see if the blogs have moved over successfully! Let us know if you see any odd behaviour!

The move will bring several sites onto newer hardware which may make them a little faster and will get rid of the blips of downtime we’ve had over the last month. We’ll also be upgrading to a newer version of WordPress MU in the very near future so if you blog with us look out for a fresh new control panel.

100% uptime in May!

Well, kind of! Pingdom’s monitoring shows that the main Edge Hill website had 100% uptime in May. A couple of our other sites had some downtime, but availability of our web services has generally been very good. So congratulations to everyone involved!

P.S. Happy Regular Expression Day for yesterday!

Choice Part 6: Lucene in the sky with diamonds

Search is one of the key ways that visitors find what they’re looking for on our websites. A good search engine can quickly and accurately direct the user to the right place, making for a more efficient and productive experience.

In the past we’ve used Novell’s QuickFinder search service to spider the site, supplemented by a couple of custom search systems for things like courses. I’ve never been entirely happy with the results that QuickFinder provides.

Recently in Higher Education and beyond, there has been a trend towards Google’s search appliance and their hosted solutions. Both of these are excellent in terms of raw power – they will happily index every page on a site and searches are quick and mostly relevant. But there’s more to a good search engine than the size of the index – they must provide the results you’re looking for and present them in an easy to understand way. Here’s a fairly typical example of the top search result for a search for “Computing” (I’ve removed identifying names!):

The University of Somewhere

For Edge Hill it’s important that prospective students are able to find what they’re looking for. So in the above example it’s good that it has picked a page about the academic department rather than what at Edge Hill would be IT Services, but it’s actually the Faculty page giving the briefest of details. The summary doesn’t help at all – the spider has picked up details from the page header including the alternative text from the logo and the breadcrumb trail.

What we want are relevant results which allow the visitor to quickly identify what pages have been found with information that’s relevant to the results, not just scraped text. Some search engines are starting to do this – when Google finds videos it will show a thumbnail and allow you to play the video inline – so we can use some of these ideas when creating our own search system. Now let’s get a bit more technical!

Our website can be split into two types of information – structured and unstructured. When I say unstructured, I don’t mean that it’s hundreds of pages put online without any consideration – I’m talking about web pages of content that aren’t stored in a database. Structured information is pulled out from one of our databases – things like news, events or courses. Structured content is what most search engines find difficult because they don’t “know” what a page is all about, but we do, so we can tell our search engine what information is important and how we should represent it.

For our new website, we’ve introduced a new search system based on Zend Lucene. Lucene isn’t a full-blown search engine; it’s a library you can build on to provide full text indexing of almost anything you want. We’re using a symfony plugin which packages a lot of search functionality to allow us to index news, events, courses and other information directly from the database. We have control over what information is indexed for each type and the weightings applied to them. For example, we give courses a slightly higher weighting than news.
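To show what per-type weighting does to the ranking, here is a toy Python sketch. It does not use Zend Lucene or the symfony plugin, and its relevance measure (counting query terms, with title matches counted double) is a deliberately crude stand-in for Lucene's own scoring; the boost values are hypothetical, chosen only so that courses edge ahead of news.

```python
from dataclasses import dataclass

# Hypothetical boosts: courses weighted slightly above news, as described above.
TYPE_BOOST = {"course": 1.2, "event": 1.0, "news": 1.0, "page": 1.0}


@dataclass
class Doc:
    title: str
    body: str
    kind: str


def score(doc: Doc, query: str) -> float:
    terms = query.lower().split()
    title_words = doc.title.lower().split()
    body_words = doc.body.lower().split()
    # Toy relevance: a title match counts double (a field boost), a body match once.
    raw = sum(2 * title_words.count(t) + body_words.count(t) for t in terms)
    return raw * TYPE_BOOST.get(doc.kind, 1.0)


def search(docs: list[Doc], query: str) -> list[Doc]:
    scored = [(score(d, query), d) for d in docs]
    hits = [pair for pair in scored if pair[0] > 0]
    hits.sort(key=lambda pair: pair[0], reverse=True)  # stable, so ties keep order
    return [d for _, d in hits]
```

A news story and a course page that mention "computing" equally often get the same raw score, and the type boost is what lifts the course above the news item.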

For static content we have a custom spider which trawls all the other pages on the site and adds them to the index. This works like any other search engine, following links and determining which text is relevant. We try to exclude the header, footer and navigation from the index, as these contain text which is common to many pages and adds little to the value of each page.
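The header/footer/navigation exclusion can be sketched with Python's standard `html.parser`. This assumes the page template marks those regions with `<header>`, `<footer>` and `<nav>` elements; a template that uses `<div id="...">` wrappers instead would need matching on attributes. Links are still collected everywhere, so the spider can follow navigation links even though their text is not indexed.

```python
from html.parser import HTMLParser

# Regions whose text is not indexed (assumed template markup).
SKIP_TAGS = {"header", "footer", "nav", "script", "style"}


class ContentExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # > 0 while inside a skipped region
        self.chunks = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)  # follow links even from skipped regions

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth and data.strip():
            self.chunks.append(data.strip())


def extract(html):
    """Return (indexable text, links to follow) for one page."""
    parser = ContentExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks), parser.links
```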

Edge Hill’s computing search result

We can also do a lot with the search interface itself. Firstly, different types of result show different information. For example, a course result shows the UCAS code, qualification and which campuses it runs at, and allows the course to be added to the My Courses basket for comparison. News and events show similar custom results, while static pages show the usual snippet of text from the page, but without irrelevant text from outside the content area creeping in.
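One way to give each result type its own presentation is a small dispatch table of renderers. This is a plain-text sketch, not the site's actual templates; the field names (`ucas_code`, `campuses`, `date`, `snippet`) and the sample values in the tests are hypothetical.

```python
# Hypothetical field names; the real index stores whatever the site chooses.
def render_course(r: dict) -> str:
    campuses = ", ".join(r["campuses"])
    return (f"{r['title']} | {r['qualification']} | UCAS code {r['ucas_code']} | "
            f"{campuses} [Add to My Courses]")


def render_dated(r: dict) -> str:
    # News items and events both lead with their date.
    return f"{r['date']}: {r['title']}"


def render_page(r: dict) -> str:
    # Static pages: title plus a snippet taken only from the content area.
    return f"{r['title']} - {r['snippet']}"


RENDERERS = {"course": render_course, "news": render_dated, "event": render_dated}


def render_result(r: dict) -> str:
    # Any type without a custom renderer falls back to the static-page layout.
    return RENDERERS.get(r["kind"], render_page)(r)
```

Adding a new structured type then only means registering one more renderer, while everything else keeps the generic snippet layout.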

Overall the new search seems to be working quite well – we’re able to embed it into the rest of the site more than we’ve done in the past and provide custom search boxes for courses and news. There’s still work to do on it though to improve the accuracy of results, so if you’ve tried the search and not found what you were looking for easily, please let us know.

Choice Part 2: A new platform

For years, most of the corporate website has been produced as static web pages using Dreamweaver. This has worked well – we’re able to ensure the design is tight, content correct and the site doesn’t grow to an unmanageable size.

To help manage content on the site, Web Services have produced a few key applications – news, eProspectus, job vacancies for example – and while they’ve worked great for each area, integration with the corporate site hasn’t gone much further than matching the template and manually linking between the static content and the dynamic applications.

With the Big Brief we’ve had the chance to build dynamic web applications into the core of the website. Instead of being an add on, our main website is dynamic and existing content is linked in. This allows pages on the site to fully embed content – for example we can have news and events on the homepage, or Faculty sites can list the courses they offer.

Symfony news roundup

For the corporate website we’ve extended our use of the symfony web framework. We’ve been using symfony for about 18 months, first for Education Partnership, then the Hi applicant website and the GO portal. I’ve posted before about some of the advantages it gives us, but it’s developed significantly over the last year and we’re starting to really see some of the benefits in terms of consistent coding standards, making use of plugins so that we’re not reinventing the wheel and allowing us to rapidly build new systems that integrate with the rest of the website.

Introducing symfony to the core of the corporate website is just the first step in making it more dynamic. We’re working on allowing visitors to the site to log in to gain access to more personalised information, not just for staff and students, but for applicants, partners and other users of the site. The applications we’ve developed will allow dynamic content to be spread around the site – for example courses for a department, news feeds or relevant events.

In the next couple of posts we’ll be talking about some of the applications you’ll see on the site and then maybe, just maybe, I’ll talk some more about future plans!