-
Notifications
You must be signed in to change notification settings - Fork 30k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
buffer: speed up Buffer.isEncoding() method #5256
Conversation
Following is benchmark result: $ ./node benchmark/compare.js ./node ~/.tnvm/versions/node/v5.5.0/bin/node -- buffers buffer-is-encoding.js
running ./node
buffers/buffer-is-encoding.js
running /Users/jacksontian/.tnvm/versions/node/v5.5.0/bin/node
buffers/buffer-is-encoding.js
buffers/buffer-is-encoding.js encoding=hex: ./node: 37865000 /Users/jacksontian/.tnvm/versions/node/v5.5.0/bin/node: 53626000 ......... -29.39%
buffers/buffer-is-encoding.js encoding=utf8: ./node: 32338000 /Users/jacksontian/.tnvm/versions/node/v5.5.0/bin/node: 46584000 ........ -30.58%
buffers/buffer-is-encoding.js encoding=utf-8: ./node: 29117000 /Users/jacksontian/.tnvm/versions/node/v5.5.0/bin/node: 29546000 ........ -1.45%
buffers/buffer-is-encoding.js encoding=ascii: ./node: 32724000 /Users/jacksontian/.tnvm/versions/node/v5.5.0/bin/node: 26424000 ........ 23.84%
buffers/buffer-is-encoding.js encoding=binary: ./node: 26486000 /Users/jacksontian/.tnvm/versions/node/v5.5.0/bin/node: 22335000 ....... 18.59%
buffers/buffer-is-encoding.js encoding=base64: ./node: 29564000 /Users/jacksontian/.tnvm/versions/node/v5.5.0/bin/node: 17563000 ....... 68.33%
buffers/buffer-is-encoding.js encoding=ucs2: ./node: 36896000 /Users/jacksontian/.tnvm/versions/node/v5.5.0/bin/node: 17559000 ........ 110.12%
buffers/buffer-is-encoding.js encoding=ucs-2: ./node: 31430000 /Users/jacksontian/.tnvm/versions/node/v5.5.0/bin/node: 15970000 ........ 96.81%
buffers/buffer-is-encoding.js encoding=utf16le: ./node: 25419000 /Users/jacksontian/.tnvm/versions/node/v5.5.0/bin/node: 14396000 ...... 76.57%
buffers/buffer-is-encoding.js encoding=utf-16le: ./node: 25545000 /Users/jacksontian/.tnvm/versions/node/v5.5.0/bin/node: 12160000 .... 110.07%
buffers/buffer-is-encoding.js encoding=HEX: ./node: 38260000 /Users/jacksontian/.tnvm/versions/node/v5.5.0/bin/node: 6280700 .......... 509.18%
buffers/buffer-is-encoding.js encoding=UTF8: ./node: 33291000 /Users/jacksontian/.tnvm/versions/node/v5.5.0/bin/node: 6278300 ......... 430.25%
buffers/buffer-is-encoding.js encoding=UTF-8: ./node: 28115000 /Users/jacksontian/.tnvm/versions/node/v5.5.0/bin/node: 5482000 ........ 412.87%
buffers/buffer-is-encoding.js encoding=ASCII: ./node: 33288000 /Users/jacksontian/.tnvm/versions/node/v5.5.0/bin/node: 4861600 ........ 584.71%
buffers/buffer-is-encoding.js encoding=BINARY: ./node: 25741000 /Users/jacksontian/.tnvm/versions/node/v5.5.0/bin/node: 5067200 ....... 408.00%
buffers/buffer-is-encoding.js encoding=BASE64: ./node: 32574000 /Users/jacksontian/.tnvm/versions/node/v5.5.0/bin/node: 5095100 ....... 539.33%
buffers/buffer-is-encoding.js encoding=UCS2: ./node: 32299000 /Users/jacksontian/.tnvm/versions/node/v5.5.0/bin/node: 5052700 ......... 539.24%
buffers/buffer-is-encoding.js encoding=UCS-2: ./node: 32000000 /Users/jacksontian/.tnvm/versions/node/v5.5.0/bin/node: 4389800 ........ 628.97%
buffers/buffer-is-encoding.js encoding=UTF16LE: ./node: 23653000 /Users/jacksontian/.tnvm/versions/node/v5.5.0/bin/node: 4427300 ...... 434.26%
buffers/buffer-is-encoding.js encoding=UTF-16LE: ./node: 25788000 /Users/jacksontian/.tnvm/versions/node/v5.5.0/bin/node: 4756400 ..... 442.17%
buffers/buffer-is-encoding.js encoding=utf9: ./node: 33145000 /Users/jacksontian/.tnvm/versions/node/v5.5.0/bin/node: 4541200 ......... 629.87%
buffers/buffer-is-encoding.js encoding=utf-7: ./node: 31347000 /Users/jacksontian/.tnvm/versions/node/v5.5.0/bin/node: 4464600 ........ 602.13%
buffers/buffer-is-encoding.js encoding=utf17le: ./node: 31211000 /Users/jacksontian/.tnvm/versions/node/v5.5.0/bin/node: 4309600 ...... 624.23%
buffers/buffer-is-encoding.js encoding=utf-17le: ./node: 21892000 /Users/jacksontian/.tnvm/versions/node/v5.5.0/bin/node: 5136700 ..... 326.19%
buffers/buffer-is-encoding.js encoding=Unicode-FTW: ./node: 61616000 /Users/jacksontian/.tnvm/versions/node/v5.5.0/bin/node: 4847600 . 1171.06%
buffers/buffer-is-encoding.js encoding=new gnu gun: ./node: 68657000 /Users/jacksontian/.tnvm/versions/node/v5.5.0/bin/node: 4702100 . 1360.13% |
Use automata to avoid toLowerCase(), faster, but more dirty.
b9c4f27
to
0c4bb94
Compare
Ah, ignore my previous comment, I misread your results =). I removed it. |
Is this not getting out of hand? I appreciate all the performance work being done, and being quite the performance geek myself, I really enjoy this. But how is this maintainable? |
: ) |
If I may make a suggestion that would keep maintainability at a higher level. Why not generate a function that generates the very code you wrote here (as a string), based on an array of strings you feed into it, then feed that into |
I'm not exactly sure that this is worth it.
Also, I doubt that it's the bottleneck in some actual code. |
So we have to trade performance off against maintainable. Feel free to discuss it. |
I think I'd be more in favor of what @ronkorving is suggesting. It shouldn't be that hard to do either. Also the generated and generator function could be re-used in other places where case-insensitive (static) string matching is done. |
My results, alternative approach (this is compared with your PR, not with the current master):
Code: Buffer.isEncoding = function(encoding) {
if (!encoding.length)
encoding = '' + encoding;
var loweredCase = false;
for (;;) {
switch (encoding.length) {
case 3:
switch (encoding) {
case 'hex':
return true;
}
break;
case 4:
switch (encoding) {
case 'utf8':
case 'ucs2':
return true;
}
break;
case 5:
switch (encoding) {
case 'utf-8':
case 'ascii':
case 'ucs-2':
return true;
}
break;
case 6:
switch (encoding) {
case 'binary':
case 'base64':
return true;
}
break;
case 7:
switch (encoding) {
case 'utf16le':
return true;
}
break;
case 8:
switch (encoding) {
case 'utf-16le':
return true;
}
break;
}
if (loweredCase)
return false;
encoding = ('' + encoding).toLowerCase();
loweredCase = true;
}
}; Note that all the actively used encodings are actually faster this way, and that the code is a bit cleaner. Results compared with
I still doubt that this is worth it, because For reference, your PR against master on my PC:
Edit: code and results updated. |
Also note that the results should probably be rechecked on v8 4.9, this looks like something v8 should optimize on it's own. |
A generator function for this kind of parsing is a very good idea. I have deep concerns over making these kinds of micro-optimizations as it does make the code much more opaque and difficult to manage. If we can put this behind a generator function that optimizes either at compile time or module-load time, then we can balance those needs quite well. |
I'm not -1 for performance improvements, but we need to recognize what we're actually optimizing. Look in function normalizeEncoding(encoding) {
// place all the black magic wizardy you want here
} Then use it like so: Buffer.isEncoding = function isEncoding(encoding) {
return normalizeEncoding(encoding) !== null;
}; Which would also be useful for cases like switch(normalizeEncoding(encoding)) {
case 'hex':
return this.hexWrite(string, offset, length);
... Make sense? |
There are a few spots in core where encoding names are accepted and some where they aren't (e.g. 'UTF-8' (turned into buffer), so this benefits in more than one way. |
I had a little extra time today so I wrote a case-insensitive string comparison function generator and it seems like you pretty much need to keep the existing I've also added a pre-optimize step in the benchmark in this PR before calling FWIW here are the results I get with my changes:
|
@mscdex Have you tried to split the existing |
@ChALkeR I just tested that additional change and it does improve things:
EDIT: updated results, there was a typo previously which affected things a bit |
@mscdex Can I take a look at the code? ;-) |
I am confuse why my first version faster than master, but look like slower after force optimization with following code:
|
Here is an another implementation. It use template to generate |
@JacksonTian Did you check the output when adding at least |
|
+1 to what @trevnorris said. |
Is there something to say for the Node community to play a bigger role in V8 development directly? It seems to me that this type of optimization should be completely unnecessary if V8 was able to do it itself (in fact, it should be even much faster I expect). Now I'm no V8 expert in the slightest, so don't see myself lead an effort like this, but the Node community has a lot of working groups. Does it not make sense to have a group dedicated to the JS engine that Node so deeply depends upon? |
#3741 - which, admittedly, I dropped the ball on. I wasn't (and still am not) quite sure how to flesh it out properly. It doesn't help that I currently don't have much time to dedicate. |
Ah, I didn't know that was already ongoing. Great to see so much interest. I hope that process can resume at some point. |
@JacksonTian ... ping ... what's the status on this one? |
7da4fd4
to
c7066fb
Compare
It this going to live in the light of #7207? |
I'm going to go ahead and close this — after #7207 the PR must be entirely redone if it is even still relevant, and the discussion over a generator function should probably occur upon code landing in a new PR. Thanks! |
Use automata to avoid toLowerCase(), faster, but more dirty.