Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Erroneously rounds long integer numbers #26

Open
mpnovikova opened this issue Oct 5, 2015 · 13 comments
Open

Erroneously rounds long integer numbers #26

mpnovikova opened this issue Oct 5, 2015 · 13 comments

Comments

@mpnovikova
Copy link

SO that
{"organizerKey": 7161093205057351174} becomes {"organizerKey": 7161093205057352000}

@mpnovikova mpnovikova changed the title Erroneously rounds long integers numbers Erroneously rounds long integer numbers Oct 5, 2015
@creationix
Copy link
Owner

I'm afraid this is a limitation of JavaScript itself. The number type only had 53 bits of precision total.

Try to simply type the number in a node repl or the browser console and you'll see the echoed back value is the same thing jsonparse gives you.

I would recommend storing the key as a string (maybe hex encoded to save space) if you wish to keep all the digits exactly.

@mpnovikova
Copy link
Author

I am aware of those limitations in JavaScript. The problem is I am reading this data from the 3-rd party API Citrix to be precise. And this is how they give it to users. Would you be able to track the scenario when a sequence on numbers is longer than JavaScript can correctly parse as a Int and parse that sequence as a string?

@creationix
Copy link
Owner

It is possible to check the number after parsing and if it's greater than the highest possible 53-bit integer, I could return a string or something, but that would break for people expecting a number. There are many cases where an imprecise number is better than a precise string.

Do you feel confident enough to add the check to the number parser? We could maybe add it as a flag so that you could opt-in to the new behavior? The full list of characters would need to be recorded like is done for strings and then released if the number is used, or used if the number is dropped.

@creationix
Copy link
Owner

Actually, it's probably a good idea to buffer the number and then use parseFloat(str). The memory overhead of the buffered number is minimal, and surely parseFloat is much faster than my manual implementation. It would shrink the code a bit too.

Once the new number parser is in, it would be trivial to store the string in case the parsed version is above the 53-bit threshold.

@mpnovikova
Copy link
Author

We use JSONStream library that relies on this parser. There are many cases where it just uses a number. However in this particular case these long Int IDs are critical for the business functionality. The workaround I had was to omit the JSONStream all together, collect the data in the buffer, then use JSONbig.parse() to parse long int's; but then I run into performance issues and timeouts because instead of streaming the data I am waiting for it.

To answer you question, for the purposes of our project I could fork this repo and add your workaround. I would appreciate if you could point where and how this fix should be done.

Thank you for being so response, highly appreciate it

@creationix
Copy link
Owner

So the basic idea is to accumulate the characters in the number as a string. You can re-use the same property string parsing uses https://github.com/creationix/jsonparse/blob/master/jsonparse.js#L176

You don't need to worry about utf-8 decoding in the number code, all the digits are in the ascii 7-bit range.

Then at all the lines that emit NUMBER tokens (look like this.onToken(NUMBER...), you'll need to check if you should store the string or the number (probably factor this out into a common helper function).

@creationix
Copy link
Owner

The parseFloat idea is separate, but does depend on the string being buffered. If you add parseFloat in the emitting helper, you can remove all the lines that calculate the value piecemeal. The many number states are still required to know when the number has ended however.

@mpnovikova
Copy link
Author

Thank you, I will give it a try

@mpnovikova
Copy link
Author

This fragment https://github.com/creationix/jsonparse/blob/master/jsonparse.js#L183-L307 can be replaced by this

 } else if (this.tState === NUMBER1 || this.tState == NUMBER2 || this.tState === NUMBER3) {
        n = buffer[i];
        switch (n) {
          case 0x30: // 0
          case 0x31: // 1
          case 0x32: // 2
          case 0x33: // 3
          case 0x34: // 4
          case 0x35: // 5
          case 0x36: // 6
          case 0x37: // 7
          case 0x38: // 8
          case 0x39: // 9
          case 0x2e: // .
          case 0x65: // e
          case 0x45: // E
          case 0x2b: // +
          case 0x2d: // -
            this.magnatude += String.fromCharCode(n);
            this.tState = NUMBER3;
            break;
          default:
            this.tState = START;
            if (this.negative) {
              this.magnatude = '-' + this.magnatude;
            }
            this.onToken(NUMBER, parseFloat(this.magnatude));
            this.magnatude = undefined;
            i--;
            break;
        }
}

without loosing any existing functionality. Is that what you had in mind when you talked about parseFloat?

@creationix
Copy link
Owner

This is close, great work. Just a couple suggestions:

  • By merging the 3 states into one, you will be also allowing illegal patterns like --EE31e. It would be best I think to keep the separate states so as to keep the parser a little stricter. If parseFloat weren't so liberal, we could just depend on that to catch illegal patterns, but it will happily parse strings like "123this is not a number".
  • You can store the initial - in the string instead of setting the this.negative, then you don't need to prepend a "-" at the end.
  • I was initially confused by this.magnatude being used to store the string version of the number, I'd rather re-use the this.string property that string parsing uses. The parser is never parsing strings and numbers at the same time so there is no conflict and it makes sense semantically as well since we are storing a string.

You should send a pull request with the proposed changes and my suggested fixes. That way I can discuss inline and once it's good I can merge the PR and you'll get credit for the change and be listed as a collaborator in the repo.

@mpnovikova
Copy link
Author

On point #1
"123this is not a number" won't go through 0-9eE+- mesh all parseFloat has to do after that is to give you NaN on cases like --EE31e; In case of 123Ee it will give you 123 both results are fine with me. Here is the branch fee free to comment inline https://github.com/mpnovikova/jsonparse/blob/bignum/jsonparse.js
Some unit tests are failing not but I am working on them

@jeromew
Copy link

jeromew commented Jul 25, 2016

I understand that this issue can be closed after 044b268

@mmikhalko
Copy link

mmikhalko commented Sep 13, 2019

The problem is still presenting if use a negative big int number like -9223372036854775808.
It had better to add in the RegExp '-' symbol in the line
https://github.com/creationix/jsonparse/blob/master/jsonparse.js#L270

There is a pull request for it case #38

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants