string_decoder: make write after end to reset #16594
Fixes: nodejs#16564
When StringDecoder's `end` is called, it is no longer supposed to wait for more data. If a `write` call is made after `end`, the decoder has to be flushed and the call treated as a brand new write request. This patch also introduces a new `StringDecoder#reset` method, which simply resets all the internal data.
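For illustration, the behavior described above can be exercised through the public `write` and `end` calls alone. This is a minimal sketch that assumes a Node.js version in which a write after `end` starts from a clean state; the variable names are made up for the example, and the `reset` method proposed in this patch is not called.

```javascript
'use strict';
const { StringDecoder } = require('string_decoder');

const decoder = new StringDecoder('utf8');

// The euro sign (U+20AC) is E2 82 AC in UTF-8; split it across two writes.
const first = decoder.write(Buffer.from([0xE2, 0x82])); // incomplete, buffered
const second = decoder.write(Buffer.from([0xAC]));      // completes the character

// Leave a dangling lead byte behind, then end the stream.
decoder.write(Buffer.from([0xE2]));
const flushed = decoder.end();

// A write after end() should behave as if the decoder were new.
const afterEnd = decoder.write(Buffer.from('abc'));

console.log({ first, second, flushed, afterEnd });
```

If the decoder kept its old state after `end`, the last write would be interpreted as the continuation of the dangling `0xE2` byte rather than as fresh input.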
doc/api/string_decoder.md
Outdated
@@ -82,3 +82,13 @@ Returns a decoded string, ensuring that any incomplete multibyte characters at
the end of the `Buffer` are omitted from the returned string and stored in an
internal buffer for the next call to `stringDecoder.write()` or
`stringDecoder.end()`.

### stringDecoder.reset([encoding])
A nit: it seems this should go before `stringDecoder.write()`, ABC-wise.
Done!
lib/string_decoder.js
Outdated
StringDecoder.prototype.write = function(buf) {
  if (this._closed === true)
`=== true` is unnecessary.
If we compare strictly against the value, the performance would be better, right?
Not with TurboFan, no. I asked about this in another PR.
For reference, I think this is the previous discussion in question: #16397 (comment)
What if we do something simpler, like overwrite at least |
@mscdex In that case, users would have to create a new |
@thefourtheye The basic concern still stands though, |
@@ -66,6 +66,16 @@ substitution characters appropriate for the character encoding.
If the `buffer` argument is provided, one final call to `stringDecoder.write()`
is performed before returning the remaining input.

### stringDecoder.reset([encoding])
I believe this is an implementation detail users don't have to know about.
As it is, there is no way for the users to reset the string decoder, right? They have to create a new instance if needed. This will enable reusability.
As far as I read the issue, a new way to reset the state was not requested, so I would also prefer not to expose the `reset` function. If I am correct, the following should work:
const { StringDecoder } = require('string_decoder');
const decoder = new StringDecoder('utf8');
decoder.write(Buffer.from([0xE2, 0x82])); // => ''
decoder.end(); // => '�'
decoder.write(Buffer.of(0x61)); // => 'a'
lib/string_decoder.js
Outdated
StringDecoder.prototype.write = function(buf) {
  if (this._closed === true)
    this.reset();
Can this logic be in `StringDecoder.prototype.end` instead? Make the per-encoding `this.end` switching in the constructor set an internal symbol-named property instead, and use a shared `StringDecoder.prototype.end` that calls that internal encoding-specific function and cleans up after itself. This would probably help with write performance too if StringDecoders of different encodings (e.g. UTF-8, which does not have an own property `end`, and UTF-16LE, which does) are used simultaneously.
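One way to read this suggestion is sketched below. Every name here (`kEncodingEnd`, `SketchDecoder`) is hypothetical, and the two finisher bodies are simplified stand-ins rather than the logic in `lib/string_decoder.js`; the point is only where the per-encoding function is stored and where the cleanup happens.

```javascript
'use strict';

// Hypothetical symbol key for the per-encoding finisher.
const kEncodingEnd = Symbol('encodingEnd');

// Stand-in finishers, invoked with `this` bound to the decoder instance.
function simpleEnd(buf) {
  return buf ? buf.toString(this.encoding) : '';
}

function utf16End(buf) {
  const text = simpleEnd.call(this, buf);
  // Stand-in behaviour: report a dangling byte as U+FFFD.
  return this.lastNeed > 0 ? text + '\ufffd' : text;
}

class SketchDecoder {
  constructor(encoding = 'utf8') {
    this.encoding = encoding;
    this.lastNeed = 0;
    this.lastTotal = 0;
    // Every instance gets the same own properties, whatever the encoding.
    this[kEncodingEnd] = encoding === 'utf16le' ? utf16End : simpleEnd;
  }

  // Shared by all instances: delegate to the finisher, then clean up.
  end(buf) {
    const result = this[kEncodingEnd](buf);
    this.lastNeed = 0;
    this.lastTotal = 0;
    return result;
  }
}

const utf8Decoder = new SketchDecoder('utf8');
const utf16Decoder = new SketchDecoder('utf16le');
utf16Decoder.lastNeed = 1; // pretend one byte of a code unit is pending
const utf16Flushed = utf16Decoder.end();
const utf8Flushed = utf8Decoder.end(Buffer.from('ok'));
```

In this arrangement no instance carries its own `end` property, so instances of different encodings share the prototype method.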
This all seems very complicated. Why isn't `StringDecoder#end` just doing:
this.lastNeed = 0;
this.lastTotal = 0;
The buffer is already "unsafe" and access to it is guarded, so we don't need to sanitize it. And reusing a constructor as a call seems fraught.
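To make the "guarded access" argument concrete, here is a toy UTF-8 decoder, written for this note and not taken from Node's source. `ToyUtf8Decoder` and `incompleteTail` are hypothetical names, and the sketch assumes well-formed input. Its `end` zeroes only the two counters and leaves stale bytes in the scratch buffer, which later writes never read.

```javascript
'use strict';

// Returns how many trailing bytes of `data` belong to an unfinished
// UTF-8 character, and that character's total byte length.
function incompleteTail(data) {
  for (let i = 1; i <= Math.min(3, data.length); i++) {
    const byte = data[data.length - i];
    if ((byte & 0xC0) === 0x80) continue;   // continuation byte, keep looking
    if (byte < 0xC0) break;                 // ASCII: nothing is pending
    const total = byte >= 0xF0 ? 4 : byte >= 0xE0 ? 3 : 2;
    return total > i ? { have: i, total } : { have: 0, total: 0 };
  }
  return { have: 0, total: 0 };
}

class ToyUtf8Decoder {
  constructor() {
    this.lastChar = Buffer.allocUnsafe(4); // never read beyond the counters
    this.lastNeed = 0;   // bytes still missing for the buffered character
    this.lastTotal = 0;  // total byte length of the buffered character
  }

  write(buf) {
    let data = buf;
    if (this.lastNeed > 0) {
      // Only the bytes accounted for by the counters are ever read.
      const have = this.lastTotal - this.lastNeed;
      data = Buffer.concat([this.lastChar.subarray(0, have), buf]);
    }
    const tail = incompleteTail(data);
    const complete = data.length - tail.have;
    data.copy(this.lastChar, 0, complete);  // stash the unfinished bytes
    this.lastTotal = tail.total;
    this.lastNeed = tail.total - tail.have;
    return data.toString('utf8', 0, complete);
  }

  end() {
    const result = this.lastNeed > 0 ? '\ufffd' : '';
    // Reset the counters only; lastChar keeps its stale contents.
    this.lastNeed = 0;
    this.lastTotal = 0;
    return result;
  }
}

const toy = new ToyUtf8Decoder();
const partial = toy.write(Buffer.from([0xE2, 0x82]));
const ended = toy.end();
const fresh = toy.write(Buffer.from([0x61]));
const euro = toy.write(Buffer.from([0xE2])) + toy.write(Buffer.from([0x82, 0xAC]));
```

Because `write` consults `lastNeed` before touching `lastChar`, resetting the counters is enough for the next write to behave as if the decoder were new.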
One reason is backwards compatibility. |
Ping @thefourtheye |
I have updated the PR to make sure that |
lib/string_decoder.js
Outdated
function end(buf) {
  let result = '';

  if (this.encoding === 'utf16le')
Might consider using a `switch` here.
Ack.
lib/string_decoder.js
Outdated
  let result = '';

  if (this.encoding === 'utf16le')
    result = utf16End.call(this, buf);
If we're the only ones calling these methods directly, then we should be able to avoid the overhead of `.call()` and just pass an instance as another argument.
Ack.
lib/string_decoder.js
Outdated
  else
    result = simpleEnd.call(this, buf);

  this.reset();
Would it be better to use `this.reset(this.encoding)` to avoid the `switch` in the constructor?
We still cannot get rid of the `switch` in the constructor, right?
Correct, but `this.reset(this.encoding)` would only hit that first `if` in the constructor, which might be better.
Force-pushed from fe260cc to 7c2286f
]);

function end(buf) {
  const result = (endMappings.get(this.encoding) || simpleEnd)(this, buf);
This looks "elegant" but using a switch case for the encoding and calling the function directly is probably faster.
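For comparison, a `switch`-based dispatch might look roughly like the sketch below. The finisher bodies are placeholders invented for this example and do not reproduce Node's decoding logic; only the dispatch shape is the point. The decoder instance is passed as an ordinary first argument (named `self` here), so no `.call()` is involved.

```javascript
'use strict';

// Placeholder finishers: each receives the decoder as its first argument.
function simpleEnd(self, buf) {
  return buf ? buf.toString(self.encoding) : '';
}

function utf16End(self, buf) {
  const text = simpleEnd(self, buf);
  // Placeholder behaviour: report a dangling byte as U+FFFD.
  return self.lastNeed > 0 ? text + '\ufffd' : text;
}

function end(self, buf) {
  let result;
  // Direct calls in each branch instead of a Map lookup.
  switch (self.encoding) {
    case 'utf16le':
      result = utf16End(self, buf);
      break;
    default:
      result = simpleEnd(self, buf);
  }
  // Leave the decoder ready for reuse.
  self.lastNeed = 0;
  self.lastTotal = 0;
  return result;
}

const utf16State = { encoding: 'utf16le', lastNeed: 1, lastTotal: 2 };
const utf16Result = end(utf16State, Buffer.from('hi', 'utf16le'));
const utf8Result = end({ encoding: 'utf8', lastNeed: 0, lastTotal: 0 },
                       Buffer.from('ok'));
```

Whether this is measurably faster than the `Map` lookup is not established here; it would need a benchmark on the target V8 version.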
ping @thefourtheye |
Ping again @thefourtheye |
This resets the StringDecoder's state after calling `#end`. Further writes to the decoder will act as if it were a brand new instance, allowing simple reuse. Refs: nodejs#16594 Fixes: nodejs#16564
This resets the StringDecoder's state after calling `#end`. Further writes to the decoder will act as if it were a brand new instance, allowing simple reuse.
PR-URL: #18494
Fixes: #16564
Refs: #16594
Reviewed-By: Tiancheng "Timothy" Gu <timothygu99@gmail.com>
Reviewed-By: Sakthipriyan Vairamani <thechargingvolcano@gmail.com>
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Checklist
`make -j4 test` (UNIX), or `vcbuild test` (Windows) passes
Affected core subsystem(s)