Clean up HTML on paste in CKEditor

We use CKEditor at FiveFilters.org for our PastePad service. The idea is to allow users to paste content that’s not currently publicly available on the web for processing with one of our web tools. This can be content that’s in a Word document, an email, or behind a paywall.

CKEditor can automatically clean up HTML it identifies as coming from MS Word, but there’s no way to force cleanup on all pasted content. By default, HTML cleanup occurs in the following two cases:

  1. User clicks the ‘paste from word’ toolbar icon
  2. User pastes content copied from MS Word itself

In the second case, CKEditor looks for signs of MS Word formatting. It does this by testing whatever you paste against the following regular expression:

/(class=\"?Mso|style=\"[^\"]*\bmso\-|w:WordDocument)/

If there’s a match, the pasted content is cleaned up. Otherwise it is pasted unchanged.
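
To see the detection step in isolation, here is a minimal sketch that runs the regular expression above against two made-up strings. The sample HTML and the `looksLikeWord` helper are my own illustrations, not CKEditor code, and real Word output is far messier than this.

```javascript
// The detection pattern quoted above, copied verbatim.
const wordPattern = /(class=\"?Mso|style=\"[^\"]*\bmso\-|w:WordDocument)/;

// Hypothetical sample inputs, invented for illustration.
const fromWord = '<p class="MsoNormal" style="mso-margin-top-alt:auto">Hello</p>';
const fromBrowser = '<p style="color: red">Hello</p>';

// Hypothetical helper: true when the HTML carries a Word marker.
function looksLikeWord(html) {
  return wordPattern.test(html);
}

console.log(looksLikeWord(fromWord));    // true
console.log(looksLikeWord(fromBrowser)); // false
```

Only the first string contains one of the three markers, so only that one would be sent through the cleanup.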

I want to avoid editing core files, so my solution is simply to ensure that this regular expression always matches pasted content. Here’s what I’ve come up with:

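CKEDITOR.on('instanceReady', function(ev) {
    // Priority 9 runs this listener before the Paste from Word plugin (priority 10).
    ev.editor.on('paste', function(evt) {
        // Prepend a marker that the Word-detection regular expression will match.
        evt.data.html = '<!--class="Mso"-->' + evt.data.html;
    }, null, null, 9);
});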

I haven’t tested extensively, but this appears to work as expected (CKEditor 3.6.2). You can try it out.

The code registers a new listener for the paste event, just as the Paste from Word plugin does. When it receives the pasted HTML, it prepends an HTML comment containing one of the strings the Paste from Word plugin looks for. The listener has a priority of 9 so that it runs before the plugin’s own listener (default priority of 10), which triggers the actual cleaning.
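
The ordering matters, so here is a rough sketch of why priority 9 gets in ahead of priority 10. The `createEmitter` function is a toy stand-in I wrote for illustration, not CKEditor’s event system, and the second listener only imitates the plugin’s detection step. The sketch assumes, as CKEditor does, that lower priority numbers run first.

```javascript
// Toy prioritized event system; a hypothetical stand-in, NOT CKEditor's code.
function createEmitter() {
  const listeners = [];
  return {
    on(handler, priority = 10) {
      listeners.push({ handler, priority });
      // Lower numbers first; sort is stable, so ties keep registration order.
      listeners.sort((a, b) => a.priority - b.priority);
    },
    fire(data) {
      for (const { handler } of listeners) handler(data);
      return data;
    },
  };
}

// Same detection pattern as quoted earlier in the post.
const msoPattern = /(class=\"?Mso|style=\"[^\"]*\bmso\-|w:WordDocument)/;
const paste = createEmitter();

// Imitation of the plugin: registered first, default priority 10.
paste.on(function (data) {
  data.cleaned = msoPattern.test(data.html);
});

// Our listener: registered later, but priority 9 puts it ahead of the plugin.
paste.on(function (data) {
  data.html = '<!--class="Mso"-->' + data.html;
}, 9);

const result = paste.fire({ html: '<p>plain paragraph</p>' });
console.log(result.cleaned); // true
```

Even though the marker listener is registered second, it runs first, so the detection step always sees the `class="Mso"` comment.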

Note: I posted this solution on StackOverflow as an alternative to another solution, titled “CKEditor – use pastefromword filtering on all pasted content.” StackOverflow recently deleted some of my answers (and hid them from me) so I’m moving the rest of my meagre contributions over to my own blog.


10 Comments

  1. Expanism says:

    Thanks for this post, nice and short code 🙂 should I paste this in config.js and if so where precisely in the .js file?

  2. Keyvan says:

    Hi Expanism, if you look at the link to our PastePad page, you’ll see we do this on the page itself, in a script element in the HTML header:

    $(document).ready(function() {
      CKEDITOR.on('instanceReady', function( ev ) {
        ev.editor.on( 'paste', function( evt ) {    
          evt.data['html'] = '<!-- class="Mso" -->'+evt.data['html'];
        }, null, null, 9);
      });
    });

    You could also probably put this in config.js, but I didn’t try.

  3. Vyache says:

    Is it possible to force paste from word without the prompt to tell the users to okay or cancel? I just want it to clean up for every case.

  4. Keyvan says:

    Yes, that’s what this bit of code does – applies the cleanup to everything without prompt.

  5. Vyache says:

    I mean, say I want to do something like this:

    evt.editor.execCommand('RemoveFormat', evt.data.html);

  6. Jordan says:

    This is one of those rare so-simple-it-works solutions, and also one that required coming at the problem from a very clever route.

    Thank you! I have honestly been searching for solutions and testing off and on for over a year now.

  7. Luciano says:

    This will remove garbage and some other things. You can add this to your ckeditor config file.

    CKEDITOR.on('instanceReady', function(ev) {
      ev.editor.on('paste', function(evt) {
        evt.data.dataValue = evt.data.dataValue.replace(//g, '' );
        evt.data.dataValue = evt.data.dataValue.replace(/ /g,'');
        evt.data.dataValue = evt.data.dataValue.replace(//g,'');
        console.log(evt.data.dataValue);
      }, null, null, 9);
    });

  8. Ed Bedell says:

    Very useful code snippet, but unfortunately I have found a feature/buglet, I believe. When using the “paste as plain text” icon in CKEditor, one gets a “cleansing” popup dialog to Ctrl-V into. With the above code in place, pasting copied text into this dialog inserts “undefined” into the original text instead of the pasted text.

    Thoughts on why this might be happening?

    Thanks,

    Ed

  9. Robert says:

    This doesn’t seem to work in CKEditor 4+

  10. For CKEditor 4+:

    CKEDITOR.on('instanceReady', function(ev) {
      ev.editor.on('paste', function(evt) {
        evt.data.dataValue = '<!--class="Mso"-->' + evt.data.dataValue;
      }, null, null, 2);
    });