-
Notifications
You must be signed in to change notification settings - Fork 44
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve performance of parsing #126
Conversation
5ca5cdd
to
b3c32c1
Compare
parsers suite: before: Node.js v10.11.0 - Intel(R) Core(TM) i5-2520M CPU @ 2.50GHz |
@mogsie this looks good but the benchmark does not report a significant increase in performance. Do you have data that suggests otherwise? |
Hmm, that's interesting. It obviously only has an effect on the test if
Then again, I did my initial tests on node 6 and node 8, nodejs might have optimized their way out of the problem, but if the document you're testing is I tried replacing the document with the example atom document from Wikipedia: before: after:
Still more than three times as fast as node-expat. |
I made a simpler change to show how master, unchanged, on my machine (i7-2670QM @ 2.20GHz) w/node 9.11.2
If I change the benchmark as follows, just lengthening the attribute and text value, the characteristics change:
All tests slow down, some more significantly than others. ltx from 538k to 283k. This PR causes the slowdown to go from 538k to 489k
Do you want me to add the longer texts to the benchmark in this PR? |
Yes please. 👍 good job! |
This is to ensure that the benchmarking isn't skewed towards small attribute values, and documents without significant portions of text, which it was before xmppjs#126
Done! |
Looping through strings in Javascript land is a lot slower than having the JS engine do it natively. String's indexOf does this faster than looping through each character. This change deals with parsed data (i.e. the raw text between the tags), attribute values (what's in the quotes) and inside XML comments. These three types of data account for a very large portion of characters in any XML document, leaving behind mainly names of tags and of attributes. It might be worth it to rewrite the switch statement, or move the optimisations into the switch statement itself.
This is to ensure that the benchmarking isn't skewed towards small attribute values, and documents without significant portions of text, which it was before xmppjs#126
01da15d
to
dc90ab2
Compare
Rebased off master too. |
Second half of #120:
Looping through strings in Javascript land is a lot slower than having
the JS engine do it natively. String's indexOf does this faster than
looping through each character.
This change deals with parsed data (i.e. the raw text between the
tags), attribute values (what's in the quotes) and inside XML
comments. These three types of data account for a very large portion
of characters in any XML document, leaving behind mainly names of tags
and of attributes.
It might be worth it to rewrite the switch statement, or move the
optimisations into the switch statement itself, but I focused on what
gave most bang for the buck within reason.