Improve BaseParser#unnormalize
#194
Thanks for this pull request.
It would be nice if the byte size were included in the `raise` message, so that one can at least adjust `@@entity_expansion_text_limit` accordingly.
"entity expansion has grown too large: size: XY exceeded @@entity_expansion_text_limit"
It'll be helpful, but let's work on it in a separate PR.
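For reference, the message format suggested above could look roughly like the sketch below. The helper `check_text_limit!` and its `limit` parameter are hypothetical and only stand in for the size check inside `#unnormalize`; the wording is an assumption, not REXML's actual error text.

```ruby
# Hypothetical helper, not part of REXML. It mirrors the size check in
# #unnormalize but reports the observed size and the limit in the message.
def check_text_limit!(text, limit)
  size = text.bytesize
  return if size <= limit

  raise "entity expansion has grown too large: " \
        "size: #{size} exceeded entity_expansion_text_limit (#{limit})"
end

begin
  check_text_limit!("a" * 20, 10)
rescue RuntimeError => e
  puts e.message
end
```

With the size in the message, a user who hits the limit can see how far over it the document went before deciding whether to raise it.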
This fixes the @naitoh The current Could you use the following for
diff --git a/test/test_sax.rb b/test/test_sax.rb
index 5e3ad75..aeef268 100644
--- a/test/test_sax.rb
+++ b/test/test_sax.rb
@@ -145,17 +145,19 @@ module REXMLTests
 </member>
           XML

+          REXML::Security.entity_expansion_limit = 100000
           sax = REXML::Parsers::SAX2Parser.new(source)
-          assert_raise(RuntimeError.new("number of entity expansions exceeded, processing aborted.")) do
-            sax.parse
-          end
+          sax.parse
+          assert_equal(11111, sax.entity_expansion_count)

-          REXML::Security.entity_expansion_limit = 100
+          REXML::Security.entity_expansion_limit = @default_entity_expansion_limit
           sax = REXML::Parsers::SAX2Parser.new(source)
           assert_raise(RuntimeError.new("number of entity expansions exceeded, processing aborted.")) do
             sax.parse
           end
-          assert_equal(101, sax.entity_expansion_count)
+          assert do
+            sax.entity_expansion_count > @default_entity_expansion_limit
+          end
         end
         def test_with_default_entity
I think that we need another approach something like the following:
diff --git a/lib/rexml/parsers/baseparser.rb b/lib/rexml/parsers/baseparser.rb
index 28810bf..699ed91 100644
--- a/lib/rexml/parsers/baseparser.rb
+++ b/lib/rexml/parsers/baseparser.rb
@@ -547,22 +547,31 @@ module REXML
           [Integer(m)].pack('U*')
         }
         matches.collect!{|x|x[0]}.compact!
+        if filter
+          matches.reject! do |entity_reference|
+            filter.include?(entity_reference)
+          end
+        end
         if matches.size > 0
           sum = 0
-          matches.each do |entity_reference|
-            unless filter and filter.include?(entity_reference)
-              entity_value = entity( entity_reference, entities )
-              if entity_value
-                re = Private::DEFAULT_ENTITIES_PATTERNS[entity_reference] || /&#{entity_reference};/
-                rv.gsub!( re, entity_value )
-                sum += rv.bytesize
-                if sum > Security.entity_expansion_text_limit
-                  raise "entity expansion has grown too large"
-                end
-              else
-                er = DEFAULT_ENTITIES[entity_reference]
-                rv.gsub!( er[0], er[2] ) if er
+          matches.tally.each do |entity_reference, n|
+            entity_expansion_count_before = @entity_expansion_count
+            entity_value = entity( entity_reference, entities )
+            entity_expansion_count_delta =
+              @entity_expansion_count - entity_expansion_count_before
+            if n > 1
+              record_entity_expansion(entity_expansion_count_delta * (n - 1))
+            end
+            if entity_value
+              re = Private::DEFAULT_ENTITIES_PATTERNS[entity_reference] || /&#{entity_reference};/
+              rv.gsub!( re, entity_value )
+              sum += rv.bytesize
+              if sum > Security.entity_expansion_text_limit
+                raise "entity expansion has grown too large"
               end
+            else
+              er = DEFAULT_ENTITIES[entity_reference]
+              rv.gsub!( er[0], er[2] ) if er
             end
           end
           rv.gsub!( Private::DEFAULT_ENTITIES_PATTERNS['amp'], '&' )
@@ -572,8 +581,8 @@ module REXML

       private

-      def record_entity_expansion
-        @entity_expansion_count += 1
+      def record_entity_expansion(delta=1)
+        @entity_expansion_count += delta
         if @entity_expansion_count > Security.entity_expansion_limit
           raise "number of entity expansions exceeded, processing aborted."
         end
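The accounting in the diff above can be sketched outside REXML. In this sketch `expand`, `unnormalize_sketch`, the `entities` hash, the `counter` hash, and `limit` are all hypothetical stand-ins and not REXML's API. Each distinct reference is resolved and substituted once, and the cost of resolving it is charged again for the remaining `n - 1` occurrences, so the expansion count stays the same as if every occurrence had been resolved separately.

```ruby
# Hypothetical sketch, not REXML's API. Requires Ruby 2.7+ for Enumerable#tally.

# Resolve one entity recursively; counter[:count] records every expansion.
def expand(name, entities, counter, limit)
  counter[:count] += 1
  raise "number of entity expansions exceeded" if counter[:count] > limit

  entities.fetch(name).gsub(/&(\w+);/) do
    expand(Regexp.last_match(1), entities, counter, limit)
  end
end

def unnormalize_sketch(text, entities, counter, limit)
  result = text.dup
  references = result.scan(/&(\w+);/).flatten
  references.tally.each do |name, n|
    before = counter[:count]
    value = expand(name, entities, counter, limit)
    delta = counter[:count] - before
    # Charge the remaining n - 1 occurrences without resolving them again.
    counter[:count] += delta * (n - 1)
    raise "number of entity expansions exceeded" if counter[:count] > limit

    result.gsub!(/&#{name};/) { value } # one gsub! covers every occurrence
  end
  result
end

entities = { "a" => "&b;&b;", "b" => "x" }
counter = { count: 0 }
puts unnormalize_sketch("&a; and &a; and &b;", entities, counter, 50)
# prints "xx and xx and x"
puts counter[:count]
# prints 7
```

Resolving `&a;` costs three expansions (one for `a`, two for the nested `b`), so two occurrences are charged six, plus one for the single `&b;`.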
Could you use this?
diff --git a/test/test_sax.rb b/test/test_sax.rb
index 5e3ad75..d2bc231 100644
--- a/test/test_sax.rb
+++ b/test/test_sax.rb
@@ -102,10 +102,12 @@ module REXMLTests
     class EntityExpansionLimitTest < Test::Unit::TestCase
       def setup
         @default_entity_expansion_limit = REXML::Security.entity_expansion_limit
+        @default_entity_expansion_text_limit = REXML::Security.entity_expansion_text_limit
       end

       def teardown
         REXML::Security.entity_expansion_limit = @default_entity_expansion_limit
+        REXML::Security.entity_expansion_text_limit = @default_entity_expansion_text_limit
       end

       class GeneralEntityTest < self
@@ -124,6 +126,17 @@ module REXMLTests
 </member>
           XML

+          REXML::Security.entity_expansion_limit = 100_000
+          REXML::Security.entity_expansion_text_limit = 1_000_000_000
+          sax = REXML::Parsers::SAX2Parser.new(source)
+          text_size = nil
+          sax.listen(:characters, ["member"]) do |text|
+            text_size = text.size
+          end
+          sax.parse
+          assert_equal(300002, text_size)
+
+          REXML::Security.entity_expansion_text_limit = @default_entity_expansion_text_limit
           sax = REXML::Parsers::SAX2Parser.new(source)
           assert_raise(RuntimeError.new("entity expansion has grown too large")) do
             sax.parse
Could you do similar one for
* Reject filtered matches earlier in the loop
* Improve `#unnormalize` by removing redundant calls to `rv.gsub!`
* Improve `entity_expansion_limit` tests

Co-Authored-By: Sutou Kouhei <kou@clear-code.com>
4fd8b6b to 83be597
Ah, I see, you are right!
I've added your suggested changes to 83be597, but I moved the calculation of
Since issue #193 was handled by #195, I'll edit the PR description to match the current situation.
Ah,
diff --git a/lib/rexml/parsers/baseparser.rb b/lib/rexml/parsers/baseparser.rb
index 342f948..0ac243a 100644
--- a/lib/rexml/parsers/baseparser.rb
+++ b/lib/rexml/parsers/baseparser.rb
@@ -8,6 +8,22 @@ require "strscan"

 module REXML
   module Parsers
+    unless [].respond_to?(:tally)
+      module EnumerableTally
+        refine Enumerable do
+          def tally
+            counts = {}
+            each do |item|
+              counts[item] ||= 0
+              counts[item] += 1
+            end
+            counts
+          end
+        end
+      end
+      using EnumerableTally
+    end
+
     if StringScanner::Version < "3.0.8"
       module StringScannerCaptures
         refine StringScanner do
`#tally` doesn't exist in Ruby 2.5 and 2.6
* Refine `Enumerable` to support `#tally` in `REXML::Parsers`

Co-Authored-By: Sutou Kouhei <kou@clear-code.com>
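A standalone sketch of the same refinement technique, for trying it outside REXML. The module names `TallyBackport` and `TallyDemo` are hypothetical, and unlike the patch this sketch activates the refinement unconditionally so that it behaves the same on every Ruby version; the refinement is only visible inside the module body that calls `using`.

```ruby
# Hypothetical names; the refined method body follows the patch above.
module TallyBackport
  refine Enumerable do
    def tally
      counts = {}
      each do |item|
        counts[item] ||= 0
        counts[item] += 1
      end
      counts
    end
  end
end

module TallyDemo
  # The refinement is active only from here to the end of this module body.
  using TallyBackport

  def self.count(items)
    items.tally
  end
end

counts = TallyDemo.count(%w[amp lt amp])
puts counts["amp"] # prints 2
puts counts["lt"]  # prints 1
```

Scoping the backport with a refinement keeps `Enumerable` untouched for code outside the module, which is presumably why the patch refines rather than monkey-patches.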
Thanks for the patch, added in b0949d8
@naitoh Could you review this before we merge this?
I have checked this PR.
Thanks.
The current implementation of `#unnormalize` iterates over matched entity references that have already been substituted. With these changes we will reduce the number of redundant calls to `rv.gsub!`.

* Improve `#unnormalize` by removing redundant calls to `rv.gsub!`
* Improve `entity_expansion_limit` tests

Example:
Before this PR, the example above would require 100 iterations. After this PR, 1 iteration.
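The example from the PR description is not preserved here. As a rough illustration only, the sketch below assumes a text that repeats one hypothetical reference `&a;` 100 times and a fixed replacement value `"x"`; it counts loop iterations for the per-match approach and for the `tally` approach (Ruby 2.7+).

```ruby
# Assumed input: the same hypothetical reference repeated 100 times.
text = "&a;" * 100
matches = text.scan(/&(\w+);/).flatten

# Before: one pass per matched reference, although the first gsub!
# has already replaced every occurrence.
rv_before = text.dup
before_iterations = 0
matches.each do |name|
  rv_before.gsub!(/&#{name};/, "x")
  before_iterations += 1
end

# After: one pass per distinct reference.
rv_after = text.dup
after_iterations = 0
matches.tally.each do |name, _n|
  rv_after.gsub!(/&#{name};/, "x")
  after_iterations += 1
end

puts before_iterations # prints 100
puts after_iterations  # prints 1
```

Both loops produce the same text; only the number of `gsub!` calls differs.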