This Drupal module allows HTML Purifier to be used as a filter in an input format. HTML Purifier removes malicious HTML code and ensures that output is standards compliant.
Most WYSIWYG editors require either the "full HTML" input format or a format with a large set of allowed tags, including some dangerous ones. Because the default HTML filter has only limited knowledge of HTML, such formats can leave a site open to XSS attacks.
HTML Purifier has two main advantages:
This module has been tested with the following other modules:
More modules need to be tested, especially WYSIWYG editors (HTMLArea, FCKeditor, TinyMCE). If you have done so, please tell me about your experiences.
You can download the current development version from the SVN repository. This code does not include the HTML Purifier code; you will need to download that separately. See the included INSTALL.txt for details.
The latest version of this module can be downloaded from the HTMLPurifier project page.
GPL
Comments
Anonymous
Sat, 01/27/2007 - 12:33
As I told you by email, I tried your module with the DRUPAL-4-7 branch with tinymce. I haven't tested much, but I sometimes get this kind of PHP error:
DOMDocument::loadHTML() [function.loadHTML]: Opening and ending tag mismatch: div and b in Entity, line: 1 in
/www/modules/htmlpurifier/library/HTMLPurifier/Lexer/DOMLex.php on line 57.
Another thing is that this filter is only for new nodes and comments. What about the search box, contact field, ...? I'll think about enabling htmlpurifier globally.
Bart
Sat, 01/27/2007 - 17:48
As I told you by email, I tried your module with the DRUPAL-4-7 branch with tinymce. I haven't tested much, but I sometimes get this kind of PHP error:
DOMDocument::loadHTML() [function.loadHTML]: Opening and ending tag mismatch: div and b in Entity, line: 1 in
/www/modules/htmlpurifier/library/HTMLPurifier/Lexer/DOMLex.php on line 57.
This is a warning message from PHP. HTML Purifier tries to suppress it but there is a bug in drupal 4.7 which causes it to show the message anyway.
I think you can fix it by editing the error_handler function in includes/common.inc. This change should fix it: http://cvs.drupal.org/viewcvs/drupal/drupal/includes/common.inc?r1=1.548&r2=1.549
This fix will also be included in the 4.7.6 release.
Once the page is cached, html purifier isn't invoked so the error doesn't occur.
Another thing is that this filter is only for new nodes and comments. What about the search box, contact field, ...? I'll think about enabling htmlpurifier globally.
You can use it for old nodes as well: once you add htmlpurifier to an input format, all existing nodes with that input format will use htmlpurifier. Input formats are applied when a node is displayed, not when it is saved.
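To make the display-time behaviour concrete, here is a minimal sketch in Python. It is not Drupal code: the `nodes` and `formats` dictionaries, the `render` function, and the toy `strip_scripts` filter are all hypothetical stand-ins, and the regex is nowhere near a real sanitizer. It only shows that the stored body never changes, while the output does as soon as a filter is added to the format.

```python
import re

def strip_scripts(html):
    """Toy stand-in for a real sanitizer such as HTML Purifier. NOT safe for real use."""
    return re.sub(r"<script\b.*?</script>", "", html, flags=re.IGNORECASE | re.DOTALL)

# Hypothetical storage: node bodies are kept exactly as they were submitted.
nodes = {1: {"format": "full_html", "body": "<p>Hi</p><script>alert(1)</script>"}}

# Hypothetical input formats: each name maps to an ordered list of filters.
formats = {"full_html": []}

def render(node_id):
    """Apply the filters of the node's input format at display time."""
    node = nodes[node_id]
    text = node["body"]
    for apply_filter in formats[node["format"]]:
        text = apply_filter(text)
    return text

before = render(1)                          # no filters yet: raw body is shown
formats["full_html"].append(strip_scripts)  # add the filter to the format
after = render(1)                           # the old node is now filtered too
```

The node was saved before the filter existed, yet `after` no longer contains the script tag, because filtering happens on every render rather than on save.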
Search and contact only allow plain text; they convert all HTML to plain text, so there is no need for html purifier there.
The module currently covers most of the user contributed content, except for the places where no input formats are used. There are indeed modules that don't use input formats but implement their own filtering mechanism. The aggregator module comes to mind as one such module.
I don't know if there is a way to pass the output from these modules through html purifier as well, aside from changing the theme functions, but that's just ugly.
DrupalDude
Fri, 08/10/2007 - 19:29
Please add it in the Module section on drupal.org
I didn't find your module in the module list on drupal.org:
http://drupal.org/project/Modules
could you please add it there?
Bart
Sat, 08/11/2007 - 16:54
Will do once it's really usable.
Currently this filter acts on output, but HTML Purifier is way too slow to run that way on any decent-sized site.
It works fine for blogs like this where pretty much all posts and comments are in the filter cache, but if you have a larger site that is constantly being indexed by crawlers, this will kill your server. Believe me, I tried ;)
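A rough sketch of why the filter cache matters so much, in Python. Everything here is an assumption made for illustration: `slow_purify` is a toy stand-in for the expensive sanitizer, `render_filtered` and `filter_cache` are hypothetical names, and the cache key (format name plus a hash of the text) is just one plausible choice. The counter shows how often the expensive step actually runs.

```python
import hashlib
import re

calls = {"count": 0}

def slow_purify(html):
    """Stand-in for an expensive sanitizer; counts how often it runs."""
    calls["count"] += 1
    return re.sub(r"<script\b.*?</script>", "", html, flags=re.IGNORECASE | re.DOTALL)

filter_cache = {}  # hypothetical cache: key -> filtered output

def render_filtered(text, fmt):
    """Filter on output, but reuse a cached result when the same text was seen before."""
    key = fmt + ":" + hashlib.sha256(text.encode("utf-8")).hexdigest()
    if key not in filter_cache:
        filter_cache[key] = slow_purify(text)
    return filter_cache[key]

pages = ["<p>post %d</p>" % i for i in range(100)]

# Three crawler passes over the same 100 pages: only the first pass does real work.
for _ in range(3):
    for page in pages:
        render_filtered(page, "full_html")
warm_calls = calls["count"]

# After a cache flush, every page has to be purified again.
filter_cache.clear()
for page in pages:
    render_filtered(page, "full_html")
cold_calls = calls["count"] - warm_calls
```

With a warm cache, 300 page views cost 100 purifier runs. Right after a flush, every view is a miss, which is the situation a crawler creates on a large site with many rarely visited pages.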
I'm still looking for the best/easiest way to do on-input filtering. I guess I can use form validation, but I'm not sure how I can detect which fields need to be filtered and which don't. Filtering node->body is a start, but with modules like CCK, that probably isn't enough. I'll ask the drupal gods once I have some more time to work on this. Suggestions are welcome of course.
Joe
Fri, 10/12/2007 - 04:37
Hi Bart, I don't know if you're still reading comments on this post or not. If you are, and are still working on this project, you might want to take a look at http://drupal.org/project/safehtml and see if you can use this module as a model of how to get html purifier working on content before it's put into the database.
If not, well I may give it a go. I'd like to try Drupal for a project I'm working on, and have already decided to use the YUI Editor module for it. My next step would be to get html purifier working to filter input before it's put into the database.
Bart
Sat, 10/13/2007 - 20:33
already done
Thanks for the suggestion. I actually implemented the feature in the exact same way a few weeks ago (code that is currently in SVN). So you may want to check that out. Basically what it does is create an input filter just like one would do with output filters. When a new node is submitted, the module checks if the input format contains the html purifier filter and if so, filters the input before it is inserted into the database.
I had some difficulties implementing the same for comments (the comment hook isn't as powerful as nodeapi), so currently this only works for the body and teaser fields of nodes. I plan to add support for CCK textareas in the future as well.
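The on-input approach described above can be sketched roughly like this, in Python rather than as the actual module code. The names are hypothetical: `FORMATS` maps an input format to the set of filters it contains, `save_node` plays the role of the submit step, `database` is a plain list, and `purify` is a toy regex, not HTML Purifier.

```python
import re

def purify(html):
    """Toy stand-in for HTML Purifier. NOT a real sanitizer."""
    return re.sub(r"<script\b.*?</script>", "", html, flags=re.IGNORECASE | re.DOTALL)

# Hypothetical input formats and the filters each one contains.
FORMATS = {"wysiwyg": {"htmlpurifier"}, "plain": set()}

database = []  # hypothetical storage

def save_node(node):
    """Purify body and teaser before storing, but only if the format asks for it."""
    node = dict(node)  # work on a copy so the caller's data is left alone
    if "htmlpurifier" in FORMATS.get(node["format"], set()):
        for field in ("body", "teaser"):
            node[field] = purify(node[field])
    database.append(node)
    return node

saved = save_node({
    "format": "wysiwyg",
    "body": "<p>ok</p><script>x()</script>",
    "teaser": "<script>x()</script>ok",
})
untouched = save_node({"format": "plain", "body": "<script>x()</script>", "teaser": ""})
```

The expensive step now runs once per submission instead of on every uncached page view. The trade-off is that content stored before the filter was added to the format stays unfiltered in the database.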
SwoopNStash
Thu, 11/08/2007 - 08:46
I'm interested in this module because of the input filter difficulties I've been having getting various editors to work. However, I am not a programmer. Is this module's current state such that someone such as myself would be able to set up and use this effectively or is it still in the testing stages? If it wouldn't be appropriate for me, what suggestions might you have in terms of other modules?
Bart
Sun, 11/11/2007 - 18:28
I'd say testing
It works for plain old nodes, but not for comments or any other content types. So for now I'd say it's still in testing.
Progress is slow because the project for which I needed a module like this was canceled, so there isn't that much motivation anymore.
Joe
Tue, 11/13/2007 - 17:04
I'll see what I can do
Mind you I got a 9 week old baby and have several huge projects going on at work, so I don't have a lot of time. However, I'm working on my family website and want to use drupal and htmlpurifier for everything. So, I'll grab what you got in svn and see if I can get it to work for comments and such as well. I'm trying to use either YUI or ExtJS RTE's for blogs, comments, and other features and really want to have html purifier in place for it. Since it's a personal project, I'll have the drive. I have a few other things I want to get done first on the site though, so I'll add it to my list.
arium
Wed, 12/19/2007 - 12:27
Error message when using with Drupal 5.5
I installed it on my machine for Drupal, enabled the module, added it to an input format and tried it out by writing a new story, but this is the error message I got.
Any idea as to what's wrong?
Bart
Thu, 01/03/2008 - 13:24
Drupal.org issue
FYI, others looking for a solution to this issue should have a look at http://drupal.org/node/203642
Bart
Fri, 12/28/2007 - 09:08
issue
Which code did you use? The one from drupal.org or the one from my SVN repository?
You should use the one from drupal.org; it has the latest updates, some of which I hadn't committed to SVN yet. If the problem still occurs with that code, create an issue on drupal.org. Edward has a far better knowledge of the htmlpurifier internals than I do (he's the one who wrote it after all).
liquidIce
Sat, 01/19/2008 - 02:30
A module using htmLawed?
Is anyone developing a module using htmLawed instead of HTMLPurifier? Someone should! htmLawed is only one file and is just a tenth in size and memory consumption.
video
Fri, 01/25/2008 - 04:36
The word you're looking for is 'sycophant.'
Someone should! htmLawed is only one file and is just a tenth in size and memory consumption.
astewart
Mon, 05/05/2008 - 01:37
htmLawed drupal mod
drupal htmlawed module -- uses nodetype-specific values. great!
Matt
Sat, 08/16/2008 - 06:42
Does it work with
I was wondering if the module you developed works with the Geshi filter module. I use this on my Visual Basic Source site in order to display the VB6 code for my samples and tutorials. Just curious if anyone has tried this or not.
Also I was wondering why this doesn't show up on the drupal.org page? Have you just not got around to it or is it not officially supported?
Bart
Sat, 08/16/2008 - 17:58
drupal.org
There are official releases on the project page, but the module is now maintained by ezyang instead of me.
I never tried it in combination with Geshi, but I think it should work; all it does is insert some style tags. It's typically modules that inject javascript code that cause problems.
David (of Melod...
Sat, 04/25/2009 - 16:35
HTMLPurifier or htmLawed: which one I should use?
Is there any comparison of HTMLPurifier and htmLawed?
They both look like good and useful modules, but I don't know which one I should use.
I'm setting up my new Drupal-based site now and thinking about safety-related add-ons, but I don't have nearly enough information to decide.
Best regards,
Dave
Bart
Sat, 04/25/2009 - 17:10
Depends
It really depends on what you need. HTMLPurifier does a whole lot more: it not only filters out malicious code but also ensures standards compliance.
There is a comparison (albeit pretty biased) here: http://htmlpurifier.org/comparison.html
The main problem with HTMLPurifier is that all the validation it does comes at a pretty high performance cost. It relies heavily on caching, which is why I'm not using it on a large site. I'm afraid the server will die when the cache gets flushed :)