
string_decoder: RangeError introduced by 6.2.1 #7308

Closed
gagern opened this issue Jun 15, 2016 · 4 comments · Fixed by #7310
Labels
string_decoder Issues and PRs related to the string_decoder subsystem.

Comments

gagern (Contributor) commented Jun 15, 2016

  • Version: 6.2.1
  • Platform: Linux 4.6.1-gentoo x86_64 AuthenticAMD GNU/Linux
  • Subsystem: string_decoder

The update from 6.2.0 to 6.2.1 broke tests (on Travis and elsewhere) for CindyJS:

_stream_readable.js:65
  // not happen before the first write call.

RangeError: out of range index
    at RangeError (native)
    at StringDecoder.fillLast (string_decoder.js:94:9)
    at StringDecoder.write (string_decoder.js:73:14)
    at readableAddChunk (_stream_readable.js:160:31)
    at ReadStream.Readable.push (_stream_readable.js:130:10)
    at onread (fs.js:1774:12)
    at FSReqWrap.wrapper [as oncomplete] (fs.js:675:17)

The fact that the quoted line is a comment is highly confusing; it makes it hard to form a sound idea of what's actually going on here. I bisected the problem to commit 79ad172 from PR #6777.

So far I haven't managed to come up with a small reproducing example for this. The issue is perfectly reproducible in CindyJS/CindyJS@c685ce4, though. Just run node make forbidden in the top-level directory of a git clone (not a mere tarball download):

nvm install v6.2.1
git clone https://github.com/CindyJS/CindyJS.git
cd CindyJS
git checkout c685ce4
node make forbidden

It will check groups of files against forbidden regular expressions. The high-level task specification is in make/build.js, with the actual command implementation in make/commands.js. If I only operate on files matching globs, I don't see this error message, so the code running git ls-files is probably involved somehow. But even when I extracted those parts into a smaller separate script, I couldn't reproduce the issue, so I'm not sure what other components play a role here.

Note that there will be an error message about an attempt to re-evaluate the fs module, which is due to the outdated version of graceful-fs used by unzip. That is unrelated; commenting out the require("unzip") line makes it go away while still reproducing the issue. I'll probably switch to something like yauzl soon.

mscdex added the string_decoder label Jun 15, 2016
gagern (Contributor, Author) commented Jun 15, 2016

I've got a reproducing example now:

'use strict';
const StringDecoder = require('string_decoder').StringDecoder;
const url = 'https://raw.githubusercontent.com/CindyJS/CindyJS/0d04d273c3/ref/img/RostS5.png';
let chunks = [];
require('https').get(url, res => {
  if (res.statusCode !== 200) throw Error(res.statusMessage);
  res.on('data', d => chunks.push(d));
  res.on('end', haveImg);
});
function haveImg() {
  const buf = Buffer.concat(chunks);
  if (buf.length !== 181909) throw Error("Unexpected size: " + buf.length);
  const decoder = new StringDecoder('utf8');
  decoder.write(buf.slice(0, 65536));
  decoder.write(buf.slice(65536, 131072));
  decoder.write(buf.slice(131072, 181909));
  decoder.end();
}

Yes, I'm feeding a PNG file to a UTF-8 decoder, so I don't expect the result to make much sense: mostly replacement characters. I would not expect a throw, though. The chunking appears to be relevant here; feeding the whole file to the decoder in one piece does not throw.

gagern (Contributor, Author) commented Jun 15, 2016

And a smaller reproducing example:

'use strict';
const decoder = new (require('string_decoder').StringDecoder)('utf8');
decoder.write(Buffer.from('c9b5a9', 'hex'));
decoder.write(Buffer.from('41', 'hex'));

mscdex (Contributor) commented Jun 15, 2016

Yes, I see it now. I will have a PR to fix this soon.

I am not sure why the exception message is pointing to the wrong text though (and even the wrong source file).

gagern (Contributor, Author) commented Jun 15, 2016

@mscdex My first guess was that Q.longStackSupport=true might be involved somehow, but that wasn't the case. And the reproducing example didn't reproduce that aspect of the problem either. Feels like another bug, probably unrelated to what we have here. Tracking the source of that location information through v8 internal code is no fun, though…

mscdex added a commit to mscdex/io.js that referenced this issue Jun 15, 2016
This commit fixes an issue when extra utf8 continuation bytes appear
at the end of a chunk of data, causing miscalculations to be made
when checking how many bytes are needed to decode a complete
character.

Fixes: nodejs#7308
gagern added a commit to gagern/node that referenced this issue Jun 16, 2016
There have been problems with utf8 decoding in cases where the input
was invalid.  Some cases would give different results depending on
chunking, while others even led to exceptions.  This commit simplifies
the code in several ways, reducing the risk of breakage.

Most importantly, the `text` method is not only used for bulk
conversion of a single chunk, but also for the conversion of the mini
buffer `lastChar` while handling characters spanning chunks.  That
should make most of the problems due to incorrect handling of special
cases disappear.

Secondly, that `text` method is now independent of encoding.  The
encoding-dependent method `complete` now has a single well-defined
task: determine the buffer position up to which the input consists of
complete characters.  The actual conversion is handled in a central
location, leading to a cleaner and leaner internal interface.

Thirdly, we no longer try to remember just how many bytes we'll need
for the next complete character.  We simply try to fill up the
`nextChar` buffer and perform a conversion on that.  This reduces the
number of internal counter variables from two to one, namely `partial`
which indicates the number of bytes currently stored in `nextChar`.
A possible drawback of this approach is that there is a chance of
degraded performance if input arrives one byte at a time and comes
from a script using long utf8 sequences.  As a benefit, though, this
allows us to emit a U+FFFD replacement character sooner in cases where
the first byte of a utf8 sequence is not followed by the expected
number of continuation bytes.

Fixes: nodejs#7308
mscdex added a commit to mscdex/io.js that referenced this issue Jun 24, 2016
Fixes: nodejs#7308
PR-URL: nodejs#7310
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Fedor Indutny <fedor.indutny@gmail.com>
Fishrock123 pushed a commit that referenced this issue Jun 27, 2016
Fishrock123 pushed a commit that referenced this issue Jul 5, 2016