An active fork of http://htmlcleaner.sourceforge.net
amplafi/htmlcleaner
NOTE: This fork of htmlcleaner has been merged back into the http://htmlcleaner.sourceforge.net/ project as of version 2.4.
2.4 is officially released! This fork is kept only to help with patch submission to the official version.
==========================================================================
* omitHtmlEnvelope behavior change:
    * outputs all the html contained in the body, not just the first TagNode's contents ( useful for cleaning html fragments; creates a new blank TagNode to hold the nodes to be output )
    * omitHtmlEnvelope also triggers omitDoctype
* TagNodes can be reopened after their parent is closed ( e.g. <b><i></b> would result in <b><i></i></b><i> ). If the reopened tag ( <i> in this example ) is immediately closed, the reopened tag is pruned. This is accomplished by checking the autoGenerated boolean on TagNode.
* refactored template methods from Utils to TagTransformer.
* CleanerTransformations changes:
    * Utils.updateTagTransformations is now a member function.
    * handles the transformation work so that multiple TagTransformations can be applied to a given tag ( sets up for regular expression matching )
    * now owns responsibility for determining the transformed tag name.
* concept of global AttributeTransformations, used for example to strip all attributes that start with "on" ( e.g. "onclick", "onblur" )
    * plus added regular expression matching on attribute names and values
* XmlSerializer/HtmlCleaner: IOException is no longer thrown when reading from strings.
* work on spotting "tricky" encoding: unencode normal ascii characters.
* get the default output charset from CleanerProperties
* handle badly encoded numbers better; for example &x0fx , &0A; were parsed badly before
* added a bunch of html special entities
* convert ' in html context to '
* added regex attribute/value matching
* random spelling corrections
* additional documentation
* added greek and math symbols
* cleanup change: if a tag was closed due to an improperly placed child, it will be reopened after the child. See ClosedTagReopenTest.java for examples.
* added audit code: it is now possible to hook in code that will be notified about changes that htmlcleaner makes. See CleanerProperties#addHtmlModificationListener.
* added unit tests for the escapeXml function from Utils
* JDom generation updated so that it does not fail on attributes starting with 'xml'.
* unit test TODOs added
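
The global AttributeTransformations item above describes stripping every attribute whose name starts with "on". The following is a minimal, standalone sketch of that idea using only java.util.regex. It is not htmlcleaner's implementation and does not call htmlcleaner's API: the class GlobalAttributeFilter, the method strip, and the constant EVENT_HANDLER are hypothetical names chosen for illustration, and a tag's attributes are assumed to be available as a name-to-value map.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of htmlcleaner's API.
class GlobalAttributeFilter {

    // Matches attribute names that begin with "on" (onclick, onblur, ...), ignoring case.
    static final Pattern EVENT_HANDLER = Pattern.compile("^on.*", Pattern.CASE_INSENSITIVE);

    // Returns a copy of the attribute map without the entries whose NAME matches
    // the pattern. The input map is left unmodified and insertion order is kept.
    static Map<String, String> strip(Map<String, String> attributes, Pattern namePattern) {
        Map<String, String> kept = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : attributes.entrySet()) {
            if (!namePattern.matcher(entry.getKey()).matches()) {
                kept.put(entry.getKey(), entry.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, String> attrs = new LinkedHashMap<>();
        attrs.put("href", "/home");
        attrs.put("onclick", "track()");
        attrs.put("onBlur", "leave()");
        attrs.put("class", "nav");
        // prints {href=/home, class=nav}
        System.out.println(strip(attrs, EVENT_HANDLER));
    }
}
```

Applying the same pattern to every tag, rather than configuring it per tag name, is what makes the rule "global" in this sketch; matching on attribute values would follow the same shape with entry.getValue() in place of entry.getKey().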