All changes to the Ox gem are documented here. Releases follow semantic versioning.
- Code cleanup in sax.c to close issue #363.
- Updated the dump options documentation for the :with_xml option to resolve #352.
- Updated the sax tests to pass to resolve #335.
- Element#replace_text on nil @nodes no longer fails. Closes #364.
- UTF8 element names now load correctly thanks to @Uelb.
- The sax parser in html mode now allows unquoted attribute values with complaints.
- Windows issue with strndup fixed thanks to alexanderfast.
- Fixed free on moved pointer.
- Changed free to xfree on Ruby-allocated memory.
- Fixed the intern cache to handle symbol memory changes.
- Updated to support Ruby 3.2.
- Missing attribute value no longer crashes with the SAX parser.
- Writing strings over 16K to a file with builder no longer causes a crash.
- Fixed the \r replacement with \n with the SAX parser according to https://www.w3.org/TR/2008/REC-xml-20081126/#sec-line-ends.
- Renamed internal functions to avoid linking issues where Oj and Ox function names collided.
- All classes and symbols are now registered to avoid issues with GC compaction movement.
- Parsing of any size processing instruction is now allowed. There is no 1024 limit.
- Fixed the \r replacement with \n according to https://www.w3.org/TR/2008/REC-xml-20081126/#sec-line-ends.
- Symbol and string caching changed but should have no impact on use other than being slightly faster and handling large numbers of cached items more efficiently.
- Closing tags in builder are now escaped correctly thanks to ezekg.
- Fixed RDoc for Ox::Builder.
- Really fixed code issue around HAVE_RB_ENC_ASSOCIATE.
- Code issue around HAVE_RB_ENC_ASSOCIATE fixed.
- Attribute keys for setting attributes no longer create seeming duplicates if symbol and string keys are mixed.
- In Ruby 3.0 Range objects are frozen. This version allows Ranges to be created on load.
- The :with_cdata option was added for the hash_load() function.
- Fixed one crash that occurred when a corrupted object encoded string was provided.
- mkmf have macros are now used instead of ad-hoc determinations.
Skip and missed sequence
- Add the &apos; sequence.
- :skip_off no longer misses spaces between elements.
HTML Sequences
- All HTML 4 sequences are now supported.
HTML Escape Sequences
- All HTML 4 escape sequences are now parsed.
Ruby 2.7.0
- Updated for Ruby 2.7.0. More strict type checking. Function signature changes, and Object#taint deprecated.
- Add no_empty option to not allow self-closing empty elements and use separate start and end tags instead.
- Ox::SyntaxError replaces SyntaxError where such an exception would have previously been raised.
- File offsets when using the SAX parser now use off_t. Setting -D_FILE_OFFSET_BITS=64 in the Makefile may allow 32 bit systems to access files larger than 2^32 in size. This has not been tested.
- Remove extra space from doctype dump.
- :element_key_mod and :attr_key_mod options were added to allow keys to be modified when loading.
- Fixed issue with malformed object mode input.
- Handle \0 in dumped strings better.
- No \n added on dump if indent is less than zero.
- locate fixed to cover a missing condition with named child thanks to mberlanda.
- locate supports attribute exists searches thanks to mberlanda.
- prepend_child added by mberlanda.
- New builder methods for building HTML.
- Examples added.
- Commented out debug statement.
- Attribute values now escape < and > on dump.
- Fixed bug with SAX parser that caused a crash with a very long invalid instruction element.
- Fixed SAX parse error with double
- Avoid crash with invalid XML passed to Ox.parse_obj().
- Added :skip_off mode to make sax callback on every non-empty string even if there are no other non-whitespace characters present.
- Two new load modes added, :hash and :hash_no_attrs. Both load an XML document to create a Hash populated with core Ruby objects.
- Worked around Ruby API change for RSTRUCT_LEN so Ruby 2.4.2 does not crash.
- The Element#each() method was added to allow iteration over Element nodes conditionally.
- Element#locate() now supports a [@attr=value] specification.
- An underscore character used in the easy API is now treated as a wild card for valid XML characters that are not valid for Ruby method names.
- Added a :nest_ok option to SAX hints that will ignore the nested check on a tag to accommodate non-compliant HTML.
- Set the default for skip to be to skip white space.
- Corrected Builder special character handling.
- Fixed position in builder when encoding special characters.
- Fixed SAX parser bug regarding upper case hints not matching.
- Dump is now smarter about which characters to replace with &xxx; alternatives.
- Added a SAX hint that allows comments to be treated like other elements.
- Tolerant mode now allows case-insensitive matches on elements during parsing. Smart mode in the SAX parser is also case insensitive.
- After encountering a <> the SAX parser will continue parsing after reporting an error.
- Added margin option to dump.
- Thanks to GUI for fixing an infinite loop in Ox::Builder.
- Builder element attributes with special characters are now encoded correctly.
- A newline at end of an XML string is now controlled by the indent value. A value of -1 indicates no terminating newline character and an indentation of zero.
- Fixed compiler warnings and errors.
- Updated for Ruby 2.4.0.
- Added methods to Ox::Builder to provide output position information.
- Added overlay feature to give control over which elements generate callbacks with the SAX parser.
- Element.locate now includes self if the path is relative and starts with a wildcard.
- Made SAX a little smarter, or rather let it handle an unquoted string with a / at the end.
- Fixed bug with reporting errors of element names that are too long.
- Added Ox::Builder that constructs an XML string or writes XML to a stream using builder methods.
- Added Ox::Element.replace_text() method.
- An invalid_replace option has been added. It will replace invalid XML characters with a provided string. Strict effort now raises an exception if an invalid character is encountered on dump or load.
- Ox.load and Ox.parse now allow for a callback block to handle multiple top level entities in the input.
- The Ox SAX parser now supports strings as input directly without an IO wrapper.
- Ox::Element nodes variable is now always initialized to an empty Array.
- Ox::Element attributes variable is now always initialized to an empty Hash.
- Changed the code to allow compilation on older compilers. No change in functionality otherwise.
- The convert_special option now applies to attributes as well as elements in the SAX parser.
- The convert_special option now applies to the regular parser as well as the SAX parser.
- Updated to work correctly with Ruby 2.3.0.
- Fixed problem with detecting invalid special character sequences.
- Fixed bug that caused a crash when an <> was encountered with the SAX parser.
- Added support to handle script elements in html.
- Added support for position from start for the sax parser.
- Added the SAX convert_special option to the default options.
- Added the SAX smart option to the default options.
- Other SAX options are now taken from the defaults if not specified.
- Fixed a bug that caused all input to be read before parsing with the sax parser and an IO.pipe.
- Empty elements such as are now called back with empty text.
- Fixed GC problem that occurs with the new GC in Ruby 2.2 that garbage collects Symbols.
- Update licenses. No other changes.
- Fixed symbol intern problem with Ruby 2.2.0. Symbols are now dynamic unless created with rb_intern(). There does not seem to be a way to force symbols created with encoding to be pinned.
- Fixed bug where the parser always started at the first position in a stringio instead of the current position.
- Added check for @attributes being nil. Reported, with a proposed fix, by Elana.
- Added skip option to parsing. This allows white space to be collapsed in two different ways.
- Added respond_to? method for easy access method checking.
- Worked around a module reset and clear that occurs on some Rubies.
- Thanks to jfontan Ox now includes support for XMLRPC.
- Fixed problem compiling with latest version of Rubinius.
- Added support for BigDecimals in :object mode.
- Small fix to not create an empty element from a closed element when using locate().
- Fixed to keep objects from being garbage collected in Ruby 2.x.
- Fixed bug that did not allow ISO-8859-1 characters and caused a crash.
- Allow single quoted strings in all modes.
- Fixed DOCTYPE parsing to handle nested '>' characters.
- Fixed bug in special character decoding that chopped off text.
- Limit depth on dump to 1000 to avoid core dump on circular references if the user does not specify circular.
- Handles dumping non-string values for attributes correctly by converting the value to a string.
- Better support for special character encoding with 1.8.7.
February 8, 2013
- Fixed SAX parser handling of &#nnnn; encoded characters.
- Fixed excessive memory allocation issue for very large file parsing (half a gig).
- Fixed buffer sliding window off by 1 error in the SAX parser.
- Added an attrs_done callback to the sax parser that will be called when all attributes for an element have been read.
- Fixed bug in SAX parser where raising an exception in the handler routines would not cleanup. The test put together by griffinmyers was a huge help.
- Reduced stack use in several places to improve fiber support.
- Changed exception handling to ensure proper cleanup with new stack minimizing.
- The SAX parser went through a significant re-write. The options have changed. It is now 15% faster on large files and much better at recovering from errors. So much so that the tolerant option was removed and is now the default and only behavior. A smart option was added however. The smart option recognizes a file as an HTML file and will apply a simple set of validation rules that allow the HTML to be parsed more reasonably. Errors will cause callbacks but the parsing continues with the best guess as to how to recover. Rubymaniac has helped with testing and prompted the rewrite to support parsing HTML pages.
- HTML is now supported with the SAX parser. The parser knows some tags like <br> or <img> do not have to be closed. Other hints as to how to parse and when to raise errors are also included. The parser does its best to continue parsing even after errors.
- Added symbolize option to the sax parser. This option, if set to false, will use strings instead of symbols for element and attribute names.
- A contrib directory was added for people to submit useful bits of code that can be used with Ox. The first contributor is Notezen with a nice way of building XML.
- SAX tolerant mode handles multiple elements in a document better.
- mcarpenter fixed a compile problem with Cygwin.
- Now more tolerant when the :effort is set to :tolerant. Ox will let all sorts of errors typical in HTML documents pass. The result may not be perfect but at least parsed results are returned.
- Attribute values need not be quoted or they can be quoted with single quotes or there can be no =value at all.
- Elements not terminated will be terminated by the next element termination. This effect goes up until a match is found on the element name.
- SAX parser also given a :tolerant option with the same tolerance as the string parser.
- Fixed bug in the sax element name check that caused a memory write error.
- Fixed the line numbers to be the start of the elements in the sax parser.
- Added a new feature to Ox::Element.locate() that allows filtering by node Class.
- Added feature to the Sax parser. If @line is defined in the handler it is set to the line number of the xml file before making callbacks. The same goes for @column but it is updated with the column.
- Fixed bug in element start and end name checking.
- Fixed bug in check for open and close element names matching.
- Added a correct check for element open and close names.
- Changed raised Exceptions to custom classes that inherit from StandardError.
- Fixed a few minor bugs.
- Removed broken check for matching start and end element names in SAX mode. The names are still included in the handler callbacks so the user can perform the check if desired.
- Added encoding support for JRuby where possible when in 1.9 mode.
- Applied patch by mcarpenter to fix Solaris issues with build and remaining undefined @nodes.
- Sax parser now honors encoding specification in the xml prolog correctly.
- Ox::Element.locate no longer raises an exception if there are no child nodes.
- Dumping an XML document no longer puts a carriage return after processing instructions.
- Fixed bug that caused a crash when an invalid xml with two elements and no was parsed. (issue #28)
- Modified the SAX parser to not strip white space from the start of string content.
- Added more complete support for processing instructions in both the generic parser and in the sax parser. This change includes an additional sax handler callback for the end of the instruction processing.
- Pulled in sharpyfox's changes to make Ox work with Windows. (issue #24)
- Fixed bug that ignored white space only text elements. (issue #26)
- Added support for BOM in the SAX parser.
- Added support for BOM. They are honored and handled correctly for UTF-8. Others cause encoding issues with Ruby or raise an error as others are not ASCII compatible.
- Changed extconf.rb to use RUBY_PLATFORM.
- Now uses the encoding of the input XML as the default encoding for the parsed output if the default options encoding is not set and the encoding is not set in the XML file prolog.
- Special character handling now supports UCS-2 and UCS-4 Unicode characters as well as UTF-8 characters.
- Special character handling has been improved. Both hex and base 10 numeric values are allowed up to a 64 bit number for really long UTF-8 characters.
- Fixed compatibility issues with Linux (Ubuntu) mostly related to pointer sizes.
- Added check for Solaris and Linux builds to not use the timezone member of time struct (struct tm).
- Added check for Solaris builds to not use the timezone member of time struct (struct tm).