encoding of special characters is not consistent across ruby versions #41

kashook · 2014-06-18T16:08:28Z

Consider the following Ruby script:

```ruby
require 'gyoku'
puts "Ruby Version: #{RUBY_VERSION}"
puts Gyoku.xml(:special_chars => %q{&"<>'})
```

When executed on Ruby 1.9.3, it produces this output:

```
Ruby Version: 1.9.3
<specialChars>&amp;&quot;&lt;&gt;'</specialChars>
```

When executed on Ruby 2.0.0, it produces this output:

```
Ruby Version: 2.0.0
<specialChars>&amp;&quot;&lt;&gt;&#39;</specialChars>
```

The apostrophe is not encoded at all on Ruby 1.9.3, but on Ruby 2.0 it is converted to `&#39;`. The cause appears to be gyoku's call to `CGI.escapeHTML`, whose behavior seems to have changed between Ruby 1.9.3 and Ruby 2.0. Here is an example.

This script:

```ruby
require 'cgi'
string = %q{<>&"'}
puts "Ruby Version: #{RUBY_VERSION}"
puts "Original: #{string}"
puts "Encoded: #{CGI.escapeHTML(string)}"
```

produces this output on Ruby 1.9.3:

```
Ruby Version: 1.9.3
Original: <>&"'
Encoded: &lt;&gt;&amp;&quot;'
```

but this output on Ruby 2.x (the run below was on 2.1.1):

```
Ruby Version: 2.1.1
Original: <>&"'
Encoded: &lt;&gt;&amp;&quot;&#39;
```

It seems like the output should not depend on the Ruby version. In addition, if an apostrophe is going to be encoded, it would arguably be more correct to encode it as `&apos;` instead of `&#39;`. In fact, encoding it as `&#39;` is directly causing me a problem here. :)
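For comparison, here is a minimal sketch of an escaper that uses XML's five predefined entities, so the apostrophe becomes `&apos;`. The names `XML_PREDEFINED_ENTITIES` and `xml_escape_with_apos` are hypothetical (not part of gyoku or the standard library). It relies only on `String#gsub` with a Hash argument, which behaves the same on Ruby 1.9.3 and 2.x, so the result should not depend on the Ruby version.

```ruby
# Hypothetical helper (not part of gyoku or the standard library): escapes
# text with XML's five predefined entities, so ' becomes &apos; not &#39;.
XML_PREDEFINED_ENTITIES = {
  "&" => "&amp;",
  "<" => "&lt;",
  ">" => "&gt;",
  '"' => "&quot;",
  "'" => "&apos;"
}.freeze

def xml_escape_with_apos(text)
  # One gsub pass with a Hash lookup: each matched character is replaced
  # exactly once, so the "&" in an inserted entity is never re-escaped.
  text.to_s.gsub(/[&<>"']/, XML_PREDEFINED_ENTITIES)
end

puts xml_escape_with_apos(%q{&"<>'})
# prints &amp;&quot;&lt;&gt;&apos;
```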


tjarratt · 2014-06-18T20:23:26Z

Ouch, that's a painful bug. Thank you so much for investigating this issue so thoroughly and explaining it so succinctly, @keiths-osc!

I'd rather not drop support for Ruby 1.9.3 yet (it's not close to End of Life yet), so is there a better workaround than switching on `RUBY_VERSION` in the code here? I haven't looked exhaustively, but I couldn't find anything other than `CGI.escape_html` in the standard library.
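One possible workaround that avoids a `RUBY_VERSION` check, sketched under the assumption that the Ruby 1.9.3 output is the behavior to keep: call `CGI.escapeHTML` as before, then turn `&#39;` back into a plain apostrophe. The method name `escape_like_ruby_193` is hypothetical; this is not gyoku's actual code.

```ruby
begin
  require 'cgi/escape' # CGI.escapeHTML without the rest of cgi (newer Rubies)
rescue LoadError
  require 'cgi'        # older Rubies such as 1.9.3 and 2.0
end

# Hypothetical workaround (not gyoku's actual code): keep the Ruby 1.9.3
# output on every Ruby version without checking RUBY_VERSION.
def escape_like_ruby_193(text)
  # Ruby 2.0+ escapes ' as &#39;; 1.9.3 leaves it alone, so on 1.9.3
  # this gsub finds nothing to replace. A literal "&#39;" in the input
  # is safe: CGI.escapeHTML has already turned its "&" into "&amp;".
  CGI.escapeHTML(text.to_s).gsub("&#39;", "'")
end

puts escape_like_ruby_193(%q{&"<>'})
# prints &amp;&quot;&lt;&gt;'
```

The same trick run in the other direction (always escaping the apostrophe) would pin the Ruby 2.0 behavior instead.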

kashook · 2014-06-19T21:34:05Z

@tjarrat, I've been giving your question (and this issue in general) some more thought, and I noticed something else: when gyoku produces XML with attributes, special characters in attribute values are handled consistently across Ruby versions:

This script:

```ruby
require 'gyoku'
puts "Ruby Version: #{RUBY_VERSION}"
puts Gyoku.xml(:special_chars => %q{&"<>'})
puts Gyoku.xml({ :test => 'something', :attributes! => { :test => { :chars => %q{&"<>'} } } })
```

outputs this on 1.9.3:

```
Ruby Version: 1.9.3
<specialChars>&amp;&quot;&lt;&gt;'</specialChars>
<test chars="&amp;&quot;&lt;&gt;'">something</test>
```

and this on 2.0:

```
Ruby Version: 2.0.0
<specialChars>&amp;&quot;&lt;&gt;&#39;</specialChars>
<test chars="&amp;&quot;&lt;&gt;'">something</test>
```

In the `chars` attribute, the apostrophe is left as a plain apostrophe. It appears that the Builder gem, rather than any code in gyoku, handles the encoding of attribute values: the encoding seems to be done by one Builder method that ends up calling another, and the latter uses Builder's `XChar` code. I believe Builder encodes element values with the same escape method. When Builder encodes element values, the result looks like this:

```ruby
require 'builder'
puts "Ruby Version: #{RUBY_VERSION}"
xmlbuilder = Builder::XmlMarkup.new
puts xmlbuilder.test(%q{"<>&'})
```

I get this on 1.9.3:

```
Ruby Version: 1.9.3
<test>"&lt;&gt;&amp;'</test>
```

and this on 2.0:

```
Ruby Version: 2.0.0
<test>"&lt;&gt;&amp;'</test>
```

I wonder if the best bet would be for gyoku to encode the special characters in element values exactly the way Builder does (perhaps by using Builder's `XChar` code directly) instead of using `CGI.escapeHTML`.

The results are consistent across Ruby versions, and my gut feeling is that Builder's handling of XML encoding may be more correct than what `CGI.escapeHTML` does. Of course, Builder doesn't encode the double quote character in element values (gyoku currently does), but if I understand XML correctly, it technically doesn't need to be encoded there.
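To make the proposal concrete, here is a rough sketch that reproduces the Builder output shown above using only the standard library. It does not call Builder, the constant and method names are hypothetical, and it ignores whatever `XChar` does with non-ASCII or invalid characters; it only mirrors the observed handling of `&`, `<`, `>`, `"` and `'`.

```ruby
# Hypothetical approximation of the Builder output shown in this thread.
# It does not call Builder, and it skips whatever XChar does for
# non-ASCII or invalid characters.
TEXT_ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;" }.freeze
ATTR_ESCAPES = TEXT_ESCAPES.merge('"' => "&quot;").freeze

# Element content: only &, < and > are replaced; " and ' are left alone.
def escape_text_builder_style(text)
  text.to_s.gsub(/[&<>]/, TEXT_ESCAPES)
end

# Attribute values are written inside double quotes, so " is escaped too;
# the apostrophe is still left alone.
def escape_attr_builder_style(text)
  text.to_s.gsub(/[&<>"]/, ATTR_ESCAPES)
end

element = escape_text_builder_style(%q{"<>&'})
attr    = escape_attr_builder_style(%q{&"<>'})
puts "<test>#{element}</test>"
# prints <test>"&lt;&gt;&amp;'</test>
puts "<test chars=\"#{attr}\">something</test>"
# prints <test chars="&amp;&quot;&lt;&gt;'">something</test>
```

Because neither method consults `CGI`, the output stays the same on Ruby 1.9.3 and 2.x; the trade-off is that element text would no longer escape `"`, which is the behavior change discussed above.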

tjarratt · 2014-06-26T05:58:55Z

Sorry I've been so quiet lately. I'm going to need some time to think about this and explore the ramifications of that change. I think we'd need stronger integration tests further upstream in Savon to verify that this change doesn't break existing use cases.

With Ruby 1.9.3 reaching EOL in a few months, I wonder how that affects this issue. We could keep the 1.9.3 behavior, but anyone relying on the new behavior would be very surprised (to say the least).
