Invalid JSON (Invalid UTF-8 character at position 0 in state STRING1) #41
Comments
Okay, I've found the problematic character by logging the buffer before the error is thrown. It's a right double quote, and the error only happens when that character is at position 0 of a given chunk. Any idea why this character would be misinterpreted? Is it an issue on our side or a bug here?
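For anyone wondering why a right double quote is special here: it encodes to three bytes in UTF-8 (e2 80 9d), so a chunk boundary can fall inside it. The sketch below is only an illustration using Node's built-in `Buffer` and `StringDecoder`, not jsonparse's own code, and the split point is an assumption chosen to show the effect.

```javascript
const { StringDecoder } = require('string_decoder');

// U+201D (right double quote) is three bytes in UTF-8: e2 80 9d.
const quote = Buffer.from('\u201d', 'utf8');

// Assumed split: the chunk boundary falls after the first byte.
const chunkA = quote.subarray(0, 1);
const chunkB = quote.subarray(1);

// Decoding each chunk on its own loses the character, because neither
// chunk holds a complete sequence.
const naive = chunkA.toString('utf8') + chunkB.toString('utf8');

// StringDecoder keeps the incomplete bytes until the rest arrives.
const decoder = new StringDecoder('utf8');
const streamed = decoder.write(chunkA) + decoder.write(chunkB) + decoder.end();

console.log(quote.toString('hex'), naive === '\u201d', streamed === '\u201d');
```

Any parser that decodes bytes chunk by chunk has to carry that partial state across chunk boundaries in a similar way.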
It looks like this library (and JSONStream) is no longer actively maintained, so for anybody else who is unfortunate enough to run into this issue, I ended up switching to stream-json, as shown in the After snippet.
Before:
import _ from 'highland';
import JSONStream from 'JSONStream';
// { data: [{}, {}, {}] }
_(readableStream)
.through(JSONStream.parse('data.*'))
.toArray((result) => console.log('DONE', result))
After:
import _ from 'highland';
import { parser } from 'stream-json';
import { pick } from 'stream-json/filters/Pick';
import { streamArray } from 'stream-json/streamers/StreamArray';
// { data: [{}, {}, {}] }
_(readableStream)
.through(parser())
.through(pick({ filter: 'data' }))
.through(streamArray())
.map(({ value }) => value)
.toArray((result) => console.log('DONE', result))
For others who stumble across this but still want to use this library, I think the fix is to change lines 166 to 171 in b2d8bc6 so that they only emit the new character if the buffer contains at least as many bytes as are remaining in the sequence:
// Copy only as many bytes as this chunk can actually supply.
var toConsume = Math.min(this.bytes_remaining, buffer.length);
for (var j = 0; j < toConsume; j++) {
  this.temp_buffs[this.bytes_in_sequence][this.bytes_in_sequence - this.bytes_remaining + j] = buffer[j];
}
this.bytes_remaining -= toConsume;
// Emit the character only once the whole sequence has arrived.
if (this.bytes_remaining === 0) {
  this.appendStringBuf(this.temp_buffs[this.bytes_in_sequence]);
  this.bytes_in_sequence = 0;
}
My fork is pretty far removed from this one, otherwise I'd publish this in a more useful format. Still, hope it helps someone!
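To make the idea easier to try in isolation, here is a minimal standalone sketch of the same technique. `SequenceCollector` and its fields are hypothetical names made up for this example and are not part of jsonparse. It assumes well-formed UTF-8 input and does no validation.

```javascript
// Hypothetical helper (not part of jsonparse): collects multi-byte UTF-8
// sequences that may be spread over any number of chunks.
class SequenceCollector {
  constructor() {
    this.bytesInSequence = 0; // total length of the sequence being read
    this.bytesRemaining = 0;  // bytes of that sequence still missing
    this.temp = null;         // holds the partially received sequence
    this.out = '';            // decoded text so far
  }

  write(chunk) {
    let i = 0;
    while (i < chunk.length) {
      if (this.bytesRemaining > 0) {
        // Take only what this chunk can supply, never more.
        const toConsume = Math.min(this.bytesRemaining, chunk.length - i);
        chunk.copy(this.temp, this.bytesInSequence - this.bytesRemaining, i, i + toConsume);
        this.bytesRemaining -= toConsume;
        i += toConsume;
        // Emit only once the whole sequence has arrived.
        if (this.bytesRemaining === 0) {
          this.out += this.temp.toString('utf8');
          this.bytesInSequence = 0;
        }
        continue;
      }
      const b = chunk[i];
      i += 1;
      if (b < 0x80) {
        this.out += String.fromCharCode(b); // plain ASCII byte
        continue;
      }
      // Lead byte: work out the sequence length (assumes valid UTF-8).
      this.bytesInSequence = b >= 0xf0 ? 4 : b >= 0xe0 ? 3 : 2;
      this.temp = Buffer.alloc(this.bytesInSequence);
      this.temp[0] = b;
      this.bytesRemaining = this.bytesInSequence - 1;
    }
    return this;
  }
}

// Worst case: every byte arrives in its own chunk.
const input = Buffer.from('x\u201dy', 'utf8');
const collector = new SequenceCollector();
for (let k = 0; k < input.length; k++) {
  collector.write(input.subarray(k, k + 1));
}
console.log(collector.out === 'x\u201dy');
```

The point of `Math.min` is that a chunk shorter than the remaining sequence no longer triggers an early emit; the collector just waits for the next chunk.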
We're indirectly using jsonparse via JSONStream to stream in JSON data stored in Google Cloud Storage, and we're intermittently seeing the error in the issue title.
99% of the time the data is parsed successfully, so I'm guessing it's related to where the chunks of data are split over HTTP. I believe it could be related to emoji or Japanese characters, as both exist in our JSON, but I'm struggling to pinpoint exactly where it's failing.
Is there perhaps a way to log more information re: the string value it failed on?
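One possible way to get more context, sketched with Node's built-in `stream.Transform`: put a pass-through stage in front of the parser that remembers the most recent chunk, then dump its leading bytes when the parser reports an error. `chunkRecorder` and `lastChunk` are hypothetical names for this example. If anything buffers between the recorder and the parser, the recorded chunk may not be the one that failed.

```javascript
const { Transform } = require('stream');

// Hypothetical helper: passes data through unchanged and keeps a
// reference to the most recent chunk for later inspection.
function chunkRecorder() {
  const recorder = new Transform({
    transform(chunk, encoding, callback) {
      recorder.lastChunk = chunk;
      callback(null, chunk);
    },
  });
  recorder.lastChunk = null;
  return recorder;
}

// Usage sketch (commented out because it needs a real source stream
// and the JSONStream package):
//
// const recorder = chunkRecorder();
// readableStream
//   .pipe(recorder)
//   .pipe(JSONStream.parse('data.*'))
//   .on('error', (err) => {
//     const head = recorder.lastChunk.subarray(0, 16).toString('hex');
//     console.error(err.message, 'first bytes of last chunk:', head);
//   });
```

Logging the bytes as hex rather than as a string avoids the log itself mangling a partial UTF-8 sequence.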