bignum/bigreal number representation needed #218

Closed
jmblog opened this issue Nov 23, 2013 · 19 comments

Comments

@jmblog

jmblog commented Nov 23, 2013

There are cases where jq converts large numbers to scientific (exponent) notation.

$ cat test.json
{"income":10000000,"total":11111111}

$ cat test.json | jq '.'
{
  "total": 11111111,
  "income": 1e+07
}
@nicowilliams
Contributor

I've started on implementing delayed parsing of numbers so as to preserve their original form wherever possible (i.e., whenever the actual number isn't needed as a double in the jq program). It turns out that this will require a lot of work :( Even then we'll need a bignum library to support bignum math in jq.

@stedolan
Contributor

stedolan commented Dec 4, 2013

I'm not entirely sure it's wrong to do so. What format would you prefer?

@tischwa

tischwa commented Dec 4, 2013

In terms of math it is not wrong to write 1e+07 instead of 10000000.

But in terms of software it makes a big difference. Consider a Unix pipe (my use case) like

% some program | jq 'a cool filter' | some other program B

Now jq's output is fed to program B, which can deal with int64 or even int128, but not with floats, because there are no floats in the original data.

When jq makes a conversion like above, program B bails out.
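
To illustrate the failure mode described above, here is a minimal Python sketch of a downstream consumer. It is only a stand-in for "program B", not any particular tool: it shows that a JSON parser yields an integer for `10000000` but a float for `1e+07`, and that a strict integer parser rejects the exponent form outright.

```python
import json

# The same quantity written two ways, as in the jq output above.
for literal in ("10000000", "1e+07"):
    parsed = json.loads(literal)
    # Python's json module returns int for the first literal and float for the second.
    print(literal, "->", type(parsed).__name__, parsed)

# A consumer that only accepts integer syntax fails on the exponent form.
try:
    int("1e+07")
except ValueError as exc:
    print("integer-only consumer rejects it:", exc)
```

Both literals denote the same value, so the difference is purely one of representation, which is exactly why it matters for a pipeline.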

See: #143 (comment)

To answer your question: I would prefer that jq not change the representation of a number if that number is just passed from jq input to jq output.

(Like "sort -g", which does interpret the input numbers but still outputs the original lines.)

@jmblog
Author

jmblog commented Dec 5, 2013

@tischwa +1

@nicowilliams
Contributor

It'd be a lot easier to provide options for how numbers are formatted on output than to preserve input form (when not touched by arithmetic).
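
As a rough illustration of what such an output-formatting option could look like, here is a Python sketch. The function name `format_number` and the `integers_plain` flag are hypothetical and do not correspond to any jq option; the sketch only shows the idea of printing integral doubles without an exponent.

```python
def format_number(value: float, integers_plain: bool = True) -> str:
    """Render a double for output (hypothetical formatting option)."""
    # Integral values up to 2**53 are exactly representable as doubles,
    # so they can safely be printed as plain integers.
    if integers_plain and value.is_integer() and abs(value) <= 2**53:
        return str(int(value))
    # Otherwise fall back to Python's shortest round-trip representation.
    return repr(value)

print(format_number(1e7))   # 10000000
print(format_number(2.5))   # 2.5
print(format_number(1e22))  # 1e+22
```

Note that a formatting option of this kind cannot recover digits that were already lost when a large integer was parsed into a double; it only changes how the stored value is printed.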

@dfkoh

dfkoh commented Dec 5, 2013

I've started on implementing delayed parsing of numbers so as to preserve their original form wherever possible (i.e., whenever the actual number isn't needed as a double in the jq program). It turns out that this will require a lot of work :( Even then we'll need a bignum library to support bignum math in jq.

@nicowilliams I already did this, in my fork: https://github.com/airfrog/jq
I can send you a pull request if you want to incorporate it into the main branch.

@tischwa

tischwa commented Dec 5, 2013

Wouldn't it be possible to keep for each parsed number not only the numeric value, but also the input string?
If nothing is assigned to that numeric field during filtering, the input string could be output as is.
If a number is created in an arithmetic expression, the string is empty and the number is output according to some formatting option.
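
The idea can be sketched in a few lines of Python. This is only an illustration of the proposal, not jq's implementation; the names `Num`, `parse_num`, and `render` are made up for the example.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Num:
    value: float                 # numeric value, used for arithmetic
    text: Optional[str] = None   # original input literal; None for computed numbers

    def __add__(self, other: "Num") -> "Num":
        # A number produced by arithmetic has no original literal.
        return Num(self.value + other.value)

    def render(self) -> str:
        # Untouched numbers are output exactly as they were read.
        return self.text if self.text is not None else repr(self.value)

def parse_num(literal: str) -> Num:
    return Num(float(literal), literal)

a = parse_num("111111111111111111")
print(a.render())                     # 111111111111111111
print((a + parse_num("1")).render())  # 1.111111111111111e+17
```

The second line of output shows the fallback: once arithmetic is involved, the value is a double and is formatted as one.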

I remember this #143 (comment) where airfrog seemed to have something similar working. (Ahh, I just saw he sent a pull request.)

I think that, in terms of universal usability, jq would gain a lot if it followed the typical philosophy of classical Unix filter programs, which modify the input only if they have to. Examples:

% cat num.txt
111111111111111111
222222222222222222

% jq '.' num.txt
111111111111111100
222222222222222200

% awk '{print $1, 1*$1}' num.txt
111111111111111111 111111111111111104
222222222222222222 222222222222222208

% sort -n num.txt
111111111111111111
222222222222222222

% sort -g num.txt
111111111111111111
222222222222222222

So usually the input is passed through unchanged. Only when awk has to compute 1*$1 does it switch internally to a numeric representation; the plain $1 is printed exactly as given in the input. Likewise, sort -n/-g has to interpret the lines numerically but still outputs the original input.
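
The rounding seen in the awk output can be reproduced in Python, whose `float` is an IEEE 754 double. This is just a sketch of the arithmetic and says nothing about how jq or awk format numbers internally.

```python
# Integers above 2**53 are not all representable as IEEE 754 doubles.
literals = ["111111111111111111", "222222222222222222"]

for literal in literals:
    as_double = float(literal)  # nearest representable double
    # Print the literal, the exact integer the double holds, and its shortest repr.
    print(literal, int(as_double), repr(as_double))

# 111111111111111111 -> 111111111111111104
# 222222222222222222 -> 222222222222222208

# The rounded decimal form printed by jq maps to the same double as the input:
print(float("111111111111111100") == float("111111111111111111"))  # True
```

Near 1.1e17 adjacent doubles are 16 apart, and near 2.2e17 they are 32 apart, which is why the trailing digits cannot survive a round trip through a double.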

@mericano1

+1

@nicowilliams nicowilliams added this to the 1.5 release milestone Jun 6, 2014
@nicowilliams nicowilliams changed the title from "Extra large numbers with scientific (exponent) notation" to "bignum/bigreal number representation needed" Jun 17, 2014
@nicowilliams
Contributor

jq does have David M. Gay's bigint code in jv_dtoa.c. Perhaps it should use more of it. It's thread-safe, and the jv_dtoa_context stuff is really for caching reusable things -- an optimization we could remove if it made things easier. This is clearly more complete than libtomfloat for some things, namely: parsing and formatting numbers, as well as big2double and double2big conversions (which will be needed for API backwards compatibility reasons, and to be able to use libm functions). But it's also less complete for other things: fewer arithmetic operations are implemented (e.g., there's no divide, just a ratio() that returns a double). Either there's a lot of work to do on either codebase, or we find another, more complete library. Ideas?

OTOH, jq maybe doesn't need bignum operations, just a bignum representation falling back to doubles for arithmetic (and comparison?). But I'd prefer to fall back to doubles only for libm functions for which we find no better alternative.

@pkoppstein
Contributor

@nicowilliams wrote:

OTOH, jq maybe doesn't need bignum operations, just a bignum representation ....

bignum operations for the medium or long term; bignum representation for the short term (or tomorrow :-)

@nicowilliams
Contributor

Well, it's early days and there's still research to be done.

http://www.eskimo.com/~eresrch/float/ looks promising, though I've no idea what the license on it would be (I sent the author email about this). It's very complete, but a) it's fixed-precision (probably easy to change to be dynamic) and b) it doesn't handle normal string representation of numbers (probably also easy to fix). I haven't looked but I suspect it also doesn't do double2big and big2double conversion.

@nicowilliams
Contributor

The author of Big Float (http://www.eskimo.com/~eresrch/float/) has agreed to let us use it under friendly terms. I'll take a look at it and see how suitable it is.

@nicowilliams
Contributor

@kutzi

kutzi commented May 30, 2017

It would be really, really nice to have some progress here. I've just been bitten by this bug, and it's really hard to catch because the numbers jq spits out look totally legit, i.e. they are within the same magnitude.

@nicowilliams
Contributor

nicowilliams commented May 30, 2017

@kutzi (and anyone else interested in bignum support in jq) We have a PR (that I need to find time to finish) for 64-bit integer support (in addition to IEEE754 doubles). We're not likely to add any kind of bignums unless someone submits a PR. If you or anyone else wants to work on bignum support for jq, you'll need to be aware of a couple of things:

  • no GPL, no LGPL welcomed
  • jq's source code does not always jv_free() values known to be numbers, and valgrind can't check for that anyways given that numbers today are never allocated

I'd start by adding a compile-time option to use allocated numbers so that numeric jvs point to a malloc()ed double, that way failures to jv_free() numbers can be caught by valgrind and fixed.

Next I'd look for a suitable bignum library. There are quite a number of them, but they'd have to be a) C-coded or otherwise have a C API, b) licensed in a way that's friendly to jq's license and jq's users.

Lastly, I'd integrate such a bignum library much like Oniguruma: as a [git] submodule that is used if ./configure can't find it installed or if the user wants the submodule used.

@gcsfred2

Any updates?

@mjustin

mjustin commented May 19, 2023

Would this be the right issue to ask about having jq preserve zeros after the decimal point as well, e.g. not converting 5.0 to 5? JSON is agnostic about the semantics of numbers, but the programs using the JSON may very well care to differentiate between 500, 500.0, and 5e2 (it might be doing precision-based calculations, for instance).

As a concrete example, Java's BigDecimal class keeps track of both the unscaled value and the scale. So these could reasonably be differentiated as different values when read by a Java program:

        System.out.println(new BigDecimal("500"));
        System.out.println(new BigDecimal("500.0"));
        System.out.println(new BigDecimal("5e2"));
        System.out.println();
        System.out.println(new BigDecimal(BigInteger.valueOf(500), 0));
        System.out.println(new BigDecimal(BigInteger.valueOf(5000), 1));
        System.out.println(new BigDecimal(BigInteger.valueOf(5), -2));

=>

500
500.0
5E+2

500
500.0
5E+2

$ echo '[500, 500.0, 5e2]' | jq -c
[500,500,500]
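
The same distinction can be observed outside Java. As a sketch, Python's standard `json` module can be asked to parse number literals into `decimal.Decimal`, which, like BigDecimal, keeps the scale of the literal; with the default settings the three literals instead collapse to an int and two equal floats.

```python
import json
from decimal import Decimal

doc = "[500, 500.0, 5e2]"

# Default parsing: int for 500, float for the other two literals.
print(json.loads(doc))  # [500, 500.0, 500.0]

# Parsing every number literal as Decimal preserves its written form.
values = json.loads(doc, parse_int=Decimal, parse_float=Decimal)
print([str(v) for v in values])  # ['500', '500.0', '5E+2']
```

A consumer of this kind can tell the three inputs apart, so normalizing them on the way through a filter loses information it may rely on.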

@itchyny itchyny removed this from the 2.0 release milestone Jun 25, 2023
@emanuele6
Member

jq 1.7 released with support for literal large numbers. closing

@tischwa

tischwa commented Sep 9, 2023

Awesome, thank you!
