
Invalid JSON (Invalid UTF-8 character at position 0 in state STRING1) #41

Open
richardscarrott opened this issue Jul 6, 2020 · 3 comments


@richardscarrott

We're indirectly using jsonparse via JSONStream to stream in JSON data stored in Google Cloud Storage and we're intermittently seeing the following error:

Invalid JSON (Invalid UTF-8 character at position 0 in state STRING1)

99% of the time the data is parsed successfully, so I'm guessing it's related to where the chunks of data are split over HTTP. I believe it could be related to emoji or Japanese characters, as both exist in our JSON, but I'm struggling to pinpoint exactly where it's failing.
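The chunk-splitting hypothesis can be checked in isolation. The sketch below (plain Node.js; the payload and split point are invented for illustration) encodes a Japanese character, splits the bytes in the middle of that character, and compares decoding each chunk on its own with Node's built-in `StringDecoder`, which holds back an incomplete trailing sequence until the next chunk supplies the rest.

```javascript
const { StringDecoder } = require('string_decoder');

// U+65E5 (日) is three bytes in UTF-8: e6 97 a5.
const payload = Buffer.from('{"q":"\u65e5"}', 'utf8');

// Invented chunk boundary right after the lead byte e6, so the first
// chunk ends mid-character and the second starts with 97 a5.
const splitAt = payload.indexOf(0xe6) + 1;
const chunks = [payload.subarray(0, splitAt), payload.subarray(splitAt)];

// Naive: decode every chunk on its own. Neither half is valid UTF-8 by
// itself, so the character turns into U+FFFD replacement characters.
const naive = chunks.map((c) => c.toString('utf8')).join('');

// StringDecoder keeps the incomplete sequence and finishes it when the
// next chunk arrives.
const decoder = new StringDecoder('utf8');
const buffered = chunks.map((c) => decoder.write(c)).join('') + decoder.end();

console.log(chunks.map((c) => c.toString('hex')));
console.log(naive);    // contains replacement characters
console.log(buffered); // {"q":"日"}
```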

Is there perhaps a way to log more information re: the string value it failed on?
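One way to get more context than a position inside a single chunk is to log the chunk boundaries before they reach the parser. The sketch below is a pass-through `stream.Transform` that logs each chunk's length plus its first and last few bytes in hex; `describeChunkEdges` and `chunkEdgeLogger` are hypothetical helper names, not part of jsonparse or JSONStream.

```javascript
const { Transform } = require('stream');

// Hypothetical helper: hex-dump the first and last `n` bytes of a chunk.
function describeChunkEdges(chunk, n = 4) {
  return {
    length: chunk.length,
    head: chunk.subarray(0, n).toString('hex'),
    tail: chunk.subarray(Math.max(0, chunk.length - n)).toString('hex'),
  };
}

// Pass-through stream: logs every chunk's edges, then forwards the chunk
// unchanged so it can sit between the source stream and the JSON parser.
function chunkEdgeLogger(log = console.log) {
  let index = 0;
  return new Transform({
    transform(chunk, _encoding, callback) {
      log(`chunk ${index++}`, describeChunkEdges(chunk));
      callback(null, chunk);
    },
  });
}
```

It could be piped in front of the parser, e.g. `readableStream.pipe(chunkEdgeLogger()).pipe(JSONStream.parse('data.*'))`. A chunk whose `head` starts with a byte in the range 80 to bf begins with a UTF-8 continuation byte, meaning the previous chunk ended in the middle of a character.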

@richardscarrott
Author

richardscarrott commented Jul 6, 2020

Okay, I've found the problematic character by logging the buffer before the error is thrown: it's a right double quotation mark (”, U+201D), and the error only happens when it appears at position 0 of a given chunk:

[Screenshot 2020-07-06 at 19.16.57]

Any idea why this character would be misinterpreted? Is it an issue on our side or a bug here?
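The byte-level view shows why position 0 is where this surfaces. A right double quotation mark (U+201D) is encoded in UTF-8 as e2 80 9d; if a chunk boundary falls inside it and the carried-over bytes are not reassembled, the next chunk starts with 80 or 9d. Those are continuation bytes, which can never begin a character, so a parser expecting a new character there has to reject them. The classifier below is an illustrative sketch using the RFC 3629 lead-byte ranges (it ignores the extra second-byte limits after e0, ed, f0 and f4) and is not jsonparse's own check.

```javascript
// Classify a single byte by its role in UTF-8 (RFC 3629 lead-byte ranges).
function classifyUtf8Byte(b) {
  if (b < 0x80) return 'ascii';
  if (b <= 0xbf) return 'continuation'; // 80..bf never start a character
  if (b >= 0xc2 && b <= 0xdf) return 'lead (2-byte sequence)';
  if (b >= 0xe0 && b <= 0xef) return 'lead (3-byte sequence)';
  if (b >= 0xf0 && b <= 0xf4) return 'lead (4-byte sequence)';
  return 'invalid'; // c0, c1 and f5..ff never appear in valid UTF-8
}

const quote = Buffer.from('\u201d', 'utf8'); // right double quotation mark
for (const b of quote) {
  console.log(b.toString(16), classifyUtf8Byte(b));
}
// e2 lead (3-byte sequence)
// 80 continuation
// 9d continuation
```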

@richardscarrott
Author

It looks like this library (and JSONStream) is no longer actively maintained, so for anybody else unfortunate enough to run into this issue: I ended up switching to stream-json, which hasn't presented the same problem. For example:

Before:

import _ from 'highland';
import JSONStream from 'JSONStream';

// { data: [{}, {}, {}] }
_(readableStream)
    .through(JSONStream.parse('data.*'))
    .toArray((result) => console.log('DONE', result))

After:

import _ from 'highland';
import { parser } from 'stream-json';
import { pick } from 'stream-json/filters/Pick';
import { streamArray } from 'stream-json/streamers/StreamArray';

// { data: [{}, {}, {}] }
_(readableStream)
    .through(parser())
    .through(pick({ filter: 'data' }))
    .through(streamArray())
    .map(({ value }) => value)
    .toArray((result) => console.log('DONE', result))

@cldellow

cldellow commented Apr 27, 2022

For others who stumble across this but still want to use this library, I think changing the following lines (jsonparse/jsonparse.js, lines 166 to 171 at commit b2d8bc6) can fix this:

        for (var j = 0; j < this.bytes_remaining; j++) {
          this.temp_buffs[this.bytes_in_sequence][this.bytes_in_sequence - this.bytes_remaining + j] = buffer[j];
        }
        this.appendStringBuf(this.temp_buffs[this.bytes_in_sequence]);
        this.bytes_in_sequence = this.bytes_remaining = 0;

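This loop assumes the current chunk contains every byte still missing from a multi-byte character that was split across chunks. Instead, copy only as many bytes as the chunk actually has, and only emit the character once all of its bytes have arrived: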

        var toConsume = Math.min(this.bytes_remaining, buffer.length);
        for (var j = 0; j < toConsume; j++) {
          this.temp_buffs[this.bytes_in_sequence][this.bytes_in_sequence - this.bytes_remaining + j] = buffer[j];
        }
        this.bytes_remaining -= toConsume;

        if (this.bytes_remaining === 0) {
          this.appendStringBuf(this.temp_buffs[this.bytes_in_sequence]);
          this.bytes_in_sequence = 0;
        }
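A small model makes the difference between the two loops concrete. The sketch below is a hypothetical `Utf8Carry` class (not jsonparse's code) that decodes UTF-8 chunk by chunk using the same carry-over fields as the snippets above. With `buggy: true` it copies `bytesRemaining` bytes regardless of chunk length, as the original loop does; by default it consumes `Math.min(bytesRemaining, buffer.length)`, as in the fix. It assumes Node.js `Buffer` and the RFC 3629 lead-byte ranges, and skips JSON entirely.

```javascript
// Hypothetical model of the carry-over logic; decodes UTF-8 only (no JSON).
class Utf8Carry {
  constructor({ buggy = false } = {}) {
    this.buggy = buggy;        // true = mimic the original loop
    this.out = '';             // decoded text so far
    this.temp = null;          // bytes of a character split across chunks
    this.bytesInSequence = 0;  // full length of that character
    this.bytesRemaining = 0;   // bytes of it still missing
  }

  write(buffer) {
    let i = 0;
    if (this.bytesRemaining > 0) {
      // Fixed: copy only the bytes this chunk actually has.
      // Buggy: copy bytesRemaining bytes, reading past the end of a short
      // chunk (the missing bytes become 0 in the Buffer).
      const toConsume = this.buggy
        ? this.bytesRemaining
        : Math.min(this.bytesRemaining, buffer.length);
      for (let j = 0; j < toConsume; j++) {
        this.temp[this.bytesInSequence - this.bytesRemaining + j] = buffer[j];
      }
      this.bytesRemaining -= toConsume;
      i = toConsume;
      if (this.bytesRemaining > 0) return; // chunk exhausted, keep waiting
      this.out += this.temp.toString('utf8'); // emit the completed character
      this.bytesInSequence = 0;
    }
    while (i < buffer.length) {
      const n = buffer[i];
      let len = 1;
      if (n >= 0xc2 && n <= 0xdf) len = 2;
      else if (n >= 0xe0 && n <= 0xef) len = 3;
      else if (n >= 0xf0 && n <= 0xf4) len = 4;
      else if (n >= 0x80) {
        // Continuation (or never-valid) byte where a character must start.
        throw new Error(`Invalid UTF-8 character at position ${i}`);
      }
      if (i + len > buffer.length) {
        // Character continues in the next chunk: stash the bytes we have.
        this.temp = Buffer.alloc(len);
        buffer.copy(this.temp, 0, i);
        this.bytesInSequence = len;
        this.bytesRemaining = i + len - buffer.length;
        return;
      }
      this.out += buffer.toString('utf8', i, i + len);
      i += len;
    }
  }
}

// Feed one byte per chunk: the worst case for split characters.
function feedOneByteAtATime(carry, buf) {
  for (const b of buf) carry.write(Buffer.from([b]));
  return carry.out;
}

const sample = Buffer.from('a\u201db', 'utf8'); // 61 e2 80 9d 62
console.log(feedOneByteAtATime(new Utf8Carry(), sample)); // a”b
try {
  feedOneByteAtATime(new Utf8Carry({ buggy: true }), sample);
} catch (err) {
  console.log(err.message); // Invalid UTF-8 character at position 0
}
```

Feeding a right double quotation mark one byte per chunk reproduces the reported "position 0" failure with the original logic and decodes cleanly with the fix.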

My fork is pretty far removed from this one, otherwise I'd publish this in a more useful format. Still, hope it helps someone!
