JavaScript-like innerHTML access in PHP

As part of an update to the Five Filters Full-Text RSS service, I’ve been porting some JavaScript code (Arc90’s current version of Readability) to PHP. It contains a lot of DOM manipulation which translates very easily – thanks to PHP5’s DOM support. But one thing I wasn’t able to do was manipulate the DOM tree through the innerHTML property.

In JavaScript, it’s very easy to do. The Mozilla Developer Network’s page on innerHTML gives the following example:

var content = element.innerHTML;  
// Returns a string containing the HTML syntax describing all 
// of the element's descendants
element.innerHTML = content;  
// Removes all of element's descendants, parses the content 
// string and assigns the resulting nodes as descendants of 
// the element.

Using PHP’s magic getter and setter methods, it’s possible to extend DOMElement to achieve this type of access and manipulation. My attempt at doing it is JSLikeHTMLElement. Here’s an example of how to use it:

require_once 'JSLikeHTMLElement.php';
$doc = new DOMDocument();
$doc->registerNodeClass('DOMElement', 'JSLikeHTMLElement');
$doc->loadHTML('<div><p>Para 1</p><p>Para 2</p></div>');
$elem = $doc->getElementsByTagName('div')->item(0);

// print innerHTML
echo $elem->innerHTML; // prints '<p>Para 1</p><p>Para 2</p>'

// set innerHTML
$elem->innerHTML = 'FF';

// print document (with our changes)
echo $doc->saveXML();
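
The class itself isn’t reproduced in this post, so as a rough illustration of the idea, here is a minimal sketch of how magic __get and __set methods on a DOMElement subclass could provide an innerHTML property. It is not the actual JSLikeHTMLElement source: the class name InnerHtmlElement, the wrapper-div parsing trick and the error handling are all assumptions made for the example, and character-encoding handling is left out entirely.

```php
<?php
// Hypothetical sketch only; NOT the real JSLikeHTMLElement implementation.
class InnerHtmlElement extends DOMElement
{
    // Magic getter: runs when an undefined property such as ->innerHTML is read.
    public function __get($name)
    {
        if ($name !== 'innerHTML') {
            trigger_error('Undefined property: ' . $name, E_USER_NOTICE);
            return null;
        }
        // Serialise each child node and concatenate the results.
        $html = '';
        foreach ($this->childNodes as $child) {
            $html .= $this->ownerDocument->saveHTML($child);
        }
        return $html;
    }

    // Magic setter: runs when an undefined property such as ->innerHTML is assigned.
    public function __set($name, $value)
    {
        if ($name !== 'innerHTML') {
            trigger_error('Undefined property: ' . $name, E_USER_NOTICE);
            return;
        }
        // Remove all existing descendants.
        while ($this->firstChild !== null) {
            $this->removeChild($this->firstChild);
        }
        // Parse the new markup in a scratch document, wrapped in a <div>
        // so the parsed nodes are easy to find again (an assumption of this sketch).
        $scratch = new DOMDocument();
        $previous = libxml_use_internal_errors(true); // silence parser warnings
        $scratch->loadHTML('<div>' . (string) $value . '</div>');
        libxml_clear_errors();
        libxml_use_internal_errors($previous);
        $wrapper = $scratch->getElementsByTagName('div')->item(0);
        // Copy the parsed nodes into this element's own document.
        foreach ($wrapper->childNodes as $child) {
            $this->appendChild($this->ownerDocument->importNode($child, true));
        }
    }
}

// Usage mirrors the example above, with the sketch class registered instead.
$doc = new DOMDocument();
$doc->registerNodeClass('DOMElement', 'InnerHtmlElement');
$doc->loadHTML('<div><p>Para 1</p><p>Para 2</p></div>');
$elem = $doc->getElementsByTagName('div')->item(0);
echo $elem->innerHTML; // prints '<p>Para 1</p><p>Para 2</p>'
$elem->innerHTML = '<em>FF</em>';
```

Parsing in a separate document and importing the result keeps the setter short, but a string that closes the wrapper div early would confuse it, so treat this as a starting point rather than a replacement for the download below.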

Download: JSLikeHTMLElement.php. Feedback appreciated.

12 Comments

  1. Chris Dary says:

    Hey Keyvan,

    I’d be really interested to see your progress on porting the changes of Readability to PHP – I took a look at five filters’ source control (here: http://bazaar.launchpad.net/~keyvan/fivefilters/content-only/files ) but that source looks pretty out of date. Do you have any plans on putting your work up anytime soon? Great work so far!

    Chris Dary – Tech Lead on Readability at Arc90

  2. Keyvan says:

    Thanks Chris! Readability 1.6.2 has been ported over and will be available soon (probably over the weekend). It’s being used on http://fivefilters.org/content-only/ right now. I’ll email you when it’s up.

    Thanks again for the great work on Readability. I’ve just seen your changes to 1.7 and I’m tempted to squeeze a few of those changes in before I release. 🙂

    Keyvan

  3. Kevin says:

    Hi Keyvan,

    This is exactly what I was looking for! Wasn’t looking forward to writing my own implementation! I’ve integrated it into a project I’m working on and will give all due credits.

    Great work!

  4. Keyvan says:

    Kevin: that’s good to hear – glad it helped! 🙂

  5. Benny Born says:

    Oh great, exactly what I needed – I would have gone crazy if I hadn’t found this 🙂

  6. Keyvan says:

    Benny: glad it helped. 🙂

  7. You just saved my life… 🙂 Thanks!

  8. Kyle Robinson Young says:

    Genius! Works like a charm. Thanks!

  9. Fabien says:

    Thanks a lot, your function saved my life 😉
    Works very well for me 😀

  10. Terry Lin says:

    Hi, Thanks for your post. I cannot download the script you provided. When I clicked the link it redirected me to the homepage. Can I have the script? Thank you.

  11. Keyvan says:

    Hi Terry, sorry about that. BitBucket removed domain mapping from their offering, which broke a bunch of our URLs. I’ve updated the links to point to the correct location.

  12. Erik says:

    Very nice! Much easier than other methods cited on the net.