Term Extraction in PHP

The new version of the term extraction tool on fivefilters.org is now in PHP.

Read the blog post explaining what’s new.

For anyone looking for a simple way to carry out term extraction on English text using PHP, here’s a snippet using the PHP port of Topia’s Term Extractor:

<?php
// Load the TermExtractor class (path is relative to this script)
require 'TermExtractor/TermExtractor.php';

$text = 'Politics is the shadow cast on society by big business';

$extractor = new TermExtractor();
$terms = $extractor->extract($text);

// We're outputting results in plain text...
header('Content-Type: text/plain; charset=UTF-8');

// Loop through extracted terms and print each term on a new line
foreach ($terms as $term_info) {
  // Each $term_info is an array:
  // index 0: the term itself
  // index 1: number of occurrences in the text
  // index 2: number of words in the term
  list($term, $occurrence, $word_count) = $term_info;
  echo "$term\n";
}
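
Since each result is a simple array of term, occurrences and word count, it's easy to post-process the list yourself. Here's a rough sketch that keeps only terms occurring at least a given number of times and sorts them with the most frequent first. Note that rank_terms() is a hypothetical helper written for this example, not part of the TermExtractor port, and the sample array is made-up data in the same shape rather than real extractor output. It assumes PHP 7 or later (for the <=> operator).

```php
<?php
// Made-up sample data in the [term, occurrences, word count] shape
// described above. These are NOT real results from TermExtractor.
$terms = [
    ['big business', 1, 2],
    ['politics', 3, 1],
    ['society', 2, 1],
];

/**
 * Hypothetical helper (not part of TermExtractor): keep terms that occur
 * at least $min_occurrence times, sorted by occurrence count, highest first.
 */
function rank_terms(array $terms, int $min_occurrence = 1): array
{
    // Drop terms that occur fewer than $min_occurrence times
    $kept = array_filter($terms, function ($term_info) use ($min_occurrence) {
        return $term_info[1] >= $min_occurrence;
    });

    // Sort by occurrence count (index 1), descending; usort also reindexes
    usort($kept, function ($a, $b) {
        return $b[1] <=> $a[1];
    });

    return $kept;
}

// Print each remaining term with its occurrence count
foreach (rank_terms($terms, 2) as list($term, $occurrence, $word_count)) {
    echo "$term ($occurrence)\n";
}
```

With the sample data above this prints "politics (3)" and then "society (2)". In practice you would pass in the $terms array returned by $extractor->extract($text) instead of the made-up one.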