Interaction Design: Serving Corporate Needs

Yesterday I got back from the HCI 2011 conference in Newcastle. Met some very nice people and had a good time.

If anyone’s interested in the paper I submitted, it’s copied below.


This paper looks at the ways in which professional interaction designers, despite the all-too-common rhetoric about serving humanity, end up uncritically serving corporate needs. It covers the conflict between the priorities of business and the goal of design; the influence of universities in setting students up to serve business interests; and how designers can resist by pursuing their own goals as radical professionals.

Posted in General | Comments closed

Content Extraction at

Full-Text RSS 2.7 from is now available. I thought I’d write about one area of improvement in this release: content extraction.

Automatic Extraction

Up to now we’ve relied mainly on PHP Readability to automatically identify and extract articles from web pages, and this is still how the majority of articles are extracted. It works extremely well for most pages, but there are still occasions when it fails – e.g. it picks out the wrong HTML element, or doesn’t find anything at all. Improving PHP Readability will be one area of focus for future releases.

In 2.7 we still use PHP Readability, but we now recognise and prioritise hNews microformatting – if detected, we extract the first element marked entry-title and all elements marked entry-content. This is a standard that will hopefully be used more widely on the web. (For those who’ve asked, Twitter updates are now extracted properly because of hNews support.)
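To make the hNews rule concrete, here is a minimal Python sketch of "first entry-title, all entry-content". It is an illustration only, not the code Full-Text RSS uses: the function names are made up for this example, and it assumes the markup is well-formed XML so that the standard library parser can read it. Real pages would need a more forgiving HTML parser.

```python
import xml.etree.ElementTree as ET


def has_class(element, name):
    """Return True if `name` is one of the element's space-separated classes."""
    return name in element.get("class", "").split()


def extract_hnews(xhtml):
    """Return (title, [content blocks]) or None if either part is missing.

    Assumes `xhtml` is well-formed XML (an assumption for this sketch).
    """
    root = ET.fromstring(xhtml)
    title = None
    bodies = []
    for element in root.iter():  # document order
        if title is None and has_class(element, "entry-title"):
            # Only the first entry-title is used.
            title = "".join(element.itertext()).strip()
        if has_class(element, "entry-content"):
            # Every entry-content element is collected.
            bodies.append("".join(element.itertext()).strip())
    if title is None or not bodies:
        return None
    return title, bodies


SAMPLE = """<div class="hentry">
  <h1 class="entry-title">Hello</h1>
  <h2 class="entry-title">Ignored</h2>
  <div class="entry-content"><p>First part.</p></div>
  <div class="entry-content"><p>Second part.</p></div>
</div>"""

result = extract_hnews(SAMPLE)
print(result)
```

Returning None when either the title or the content is missing lets a caller fall back to another extraction method.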

Site Patterns

Recognising that auto-detection does sometimes fail, in version 2.5 we introduced custom extraction patterns: a way for users to override auto-detection and tell Full-Text RSS (using CSS selectors) which element it should extract as the content block.

The biggest change in 2.7 is the introduction of site patterns. Site patterns sit in between custom extraction and auto-detection. They allow fine-grained control over extraction on a per-site basis. A site, identified by its domain name, can now have its own config file detailing extraction rules. Each time a URL is processed, we check to see if a corresponding site config exists, and if it does, we refer to it for instructions. Users can specify XPath expressions to match title and body elements and define rules to strip superfluous elements.
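The per-site lookup can be pictured roughly as follows. This Python sketch is hypothetical: it keeps configs in an in-memory dictionary keyed by host name rather than reading files, and the fallback that drops a leading "www." is an assumption made for the example, not documented behaviour.

```python
from urllib.parse import urlparse


def find_site_config(url, configs):
    """Return the config registered for the URL's host, or None.

    `configs` is a hypothetical dict mapping host names to rule sets.
    """
    host = (urlparse(url).hostname or "").lower()
    if host in configs:
        return configs[host]
    # Assumption for this sketch: treat "www.example.org" like "example.org".
    if host.startswith("www."):
        return configs.get(host[4:])
    return None


configs = {"example.org": {"body": ["//div[@id = 'content']"]}}

found = find_site_config("http://www.example.org/some/article", configs)
missing = find_site_config("http://example.com/", configs)
print(found, missing)
```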

Rather than create our own configuration format for site patterns, we chose to adopt the same format used by Instapaper. Here’s what the entry for looks like:

body: //div[@id = 'content']
strip_id_or_class: editsection
strip_id_or_class: toc
prune: no

Instapaper users will find these patterns by visiting (login required).

One big advantage for us in using the same config format is that we can make use of all the existing rules listed on Instapaper. Marco, Instapaper’s creator, has opened up the database to allow for public contributions. So, included in Full-Text RSS 2.7 are over 100 site configuration files which will be applied automatically (look inside the site_config/standard/ directory). Most of these are borrowed from Instapaper, but we’ll soon be adding our own, which we’ll be sharing with everyone.

Users can also create their own site config files and drop them in the site_config/custom/ directory. Each site config is simply a text file named after the site. For example, if I wanted a special rule for extracting content from this site, I would create a file with the appropriate rules inside.
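The format in the example entry shown earlier is simple enough to read line by line. The Python sketch below is one way to do that, not the parser Full-Text RSS ships with. It assumes each line is a "key: value" pair, that a key may repeat (as strip_id_or_class does), and that only the first colon separates key from value, since XPath expressions can contain colons themselves.

```python
from collections import defaultdict


def parse_site_config(text):
    """Parse 'key: value' lines into a dict of key -> list of values."""
    rules = defaultdict(list)
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or ":" not in line:
            continue  # skip blank or malformed lines
        # Split on the first colon only; the value may contain more.
        key, value = line.split(":", 1)
        rules[key.strip()].append(value.strip())
    return dict(rules)


EXAMPLE = """\
body: //div[@id = 'content']
strip_id_or_class: editsection
strip_id_or_class: toc
prune: no
"""

rules = parse_site_config(EXAMPLE)
print(rules)
```

Keeping every value in a list means repeated directives are preserved in the order they appear in the file.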

Extraction Process Overview

To summarise, Full-Text RSS 2.7 attempts to extract in the following order:

  1. Custom Extraction Pattern
  2. Site Patterns
  3. hNews
  4. PHP Readability

If at any stage we find we’ve got a successful title and body match, we do not proceed further. If, however, there is no match, we move down the list until there is (the only exception here is with custom extraction patterns – if the supplied CSS selector does not match, no further attempt is made).
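The order described above amounts to a fallback chain with one special case. Here is a minimal Python sketch of that control flow. The extractor functions are hypothetical stand-ins that return a (title, body) pair or None; they do not do any real extraction.

```python
from typing import Callable, Optional, Sequence, Tuple

Match = Tuple[str, str]  # (title, body)
Extractor = Callable[[str], Optional[Match]]


def extract(html: str,
            fallbacks: Sequence[Extractor],
            custom: Optional[Extractor] = None) -> Optional[Match]:
    """Try extractors in order and stop at the first title-and-body match."""
    if custom is not None:
        # A user-supplied custom pattern is final: no fallback if it fails.
        return custom(html)
    for extractor in fallbacks:
        match = extractor(html)
        if match is not None:
            return match
    return None


# Hypothetical stand-ins for the real extraction steps.
def site_pattern(html):
    return None  # pretend no site config exists for this page


def hnews(html):
    return ("Title", "Body from hNews") if "entry-content" in html else None


def readability(html):
    return ("Title", "Body from Readability")


def failing_custom(html):
    return None  # the supplied selector matched nothing


chain = [site_pattern, hnews, readability]
page = '<div class="entry-content">...</div>'

first = extract(page, chain)
second = extract("<p>plain</p>", chain)
third = extract(page, chain, custom=failing_custom)
print(first, second, third)
```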

Sound useful?

Full-Text RSS 2.7 is licensed under the AGPL and available to try or buy at

Workers of the World Relax

Source: (via Medialens)

Processing JS plugin update

I’ve just updated the Processing JS WordPress plugin to use Processing.js 1.0. Thanks to digitalawakening for posting update instructions.

I was happy to learn last month that the plugin is being used on Golan Levin’s course Interactive Art & Computational Design at Carnegie Mellon University. See Processing Embedding.

Posted in Code, General | Comments closed

The War You Don’t See

The War You Don’t See – an excellent documentary by John Pilger. Buy a DVD copy here.

Watch the rest: part 2, 3, 4, 5, 6
