jv: Add some support for 64 bit ints in a very conservative way (ALTERNATIVE) #1327

nicowilliams · 2017-01-29T01:13:22Z

This is a variant of #1246 that doesn't change the size of a jv.

@dequis What do you think?

dequis · 2017-01-29T05:26:49Z

I'm not a fan of the JQ_OMIT_INTS ifdefs. They smell like premature optimization; it's probably worth benchmarking to see if they are justified.

You should probably change the title of this PR because it's not very conservative, heh. In particular, messing with the union was something I explicitly wanted to avoid - my implementation only affected display and didn't have to deal with other parts of the code. But everything seems to go through jv_number_value() and jv_number(), so if you think that's fine, I think that's fine.

Having signed 64 bit ints too is neat.

Only skimmed, haven't reviewed much. Also haven't tested.

nicowilliams · 2017-01-29T06:26:12Z

Arithmetic (and math) continues to go through jv_number() and jv_number_value(). Only parsing and printing check the integer members of the union, and the utility jv functions for C applications that want them. Not changing the size of jv, and not having multiple internal representations of any given number at once, seems conservative to me :) and anyways, I kept the title of your PR.

nicowilliams · 2017-01-29T06:30:20Z

As to the JQ_OMIT_INTS... I am copying the SQLite3 pattern. It's fairly cheap, but the problem is that it probably won't get much testing. It is not a premature optimization so much as... me not having measured the impact of this new feature.

pkoppstein · 2017-01-29T06:45:28Z

@nicowilliams - No doubt I'm missing something, but using @dequis's test case, it seems this change entails a step backwards:

github/nicowilliams/jq$ ./jq --version
jq-1.2-889-gb28c886

github/nicowilliams/jq$ git log -1
commit b28c886edbb45c2128f7f5eb79da169190473cff
Author: Nicolas Williams <nico@cryptonector.com>
Date:   Sat Jan 28 18:56:55 2017 -0600

    jv int64 and uint64 support

github/nicowilliams/jq$ echo 111111111111111111 | ./jq -c '[., 1*.]'
echo 111111111111111111 | ./jq -c '[., 1*.]'
[111111111111111104,111111111111111100]

By contrast:

$ echo 111111111111111111 | jq1.5 -c '[., 1*.]'
[111111111111111100,111111111111111100]
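
The two outputs above are two renderings of the same double. A minimal C sketch of where the trailing digits come from, assuming IEEE 754 binary64 doubles and a correctly rounding strtod; the helper name is illustrative and is not a jq function:

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper (not part of jq): parse a decimal literal the way a
 * double-only parser would, then report the integer the double really holds. */
static int64_t nearest_double_as_int64(const char *literal) {
  double d = strtod(literal, NULL); /* rounds to the nearest binary64 value */
  return (int64_t)d;                /* safe only while |d| is below 2^63 */
}
```

For the input 111111111111111111 this returns 111111111111111104: doubles between 2^56 and 2^57 are spaced 16 apart, and 111111111111111104 is the nearest multiple of 16. Printing that same double with 17 significant digits gives the 111111111111111100 that jq 1.5 shows.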

nicowilliams · 2017-01-29T07:01:54Z

@pkoppstein Yeah, and that's not the only thing that's wrong. Try: echo 2 63^p|dc|jq . (which feeds 2^63 to jq). We need more test cases, that's for sure.

nicowilliams · 2017-02-03T00:34:24Z

Tests still needed.

nicowilliams · 2017-02-26T01:31:54Z

@wtlangford Can you review this? I think it's done.

Also, I'm thinking of taking this approach a bit further. I'm thinking of adding jv sub-types for:

binary data (sub-type of array, appearing to be an array of numbers in the range 0..255 -- i.e., bytes)
array slices when the offset/size does not fit in a jv (EDIT: see "surprising bug if n > 65535 -- array slice offset/size overflow", #1108)

nicowilliams · 2017-02-28T22:11:20Z

Ah, so, jv_identical() needs to be updated too.

EDIT: No, it's done already.

wtlangford

Nothing here seems too concerning, other than the overflow checking.

We appear to continue to do double math with this change. Do we want to consider being subkind-aware for addition and subtraction (and maybe multiplication)? Division should, of course, use double behavior, since we don't want integer division semantics in jq.

I'm envisioning this problem (which is a floating-point precision error):
9223372036854774784 + 1 # 9223372036854774784
Of course, jq still prints that as 9223372036854775000 because of a bug somewhere in jv_dtoa...
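
The precision loss in that example can be checked directly. A small sketch, assuming IEEE 754 binary64 doubles with the default round-to-nearest mode; the function name is made up for illustration:

```c
#include <stdbool.h>

/* Hypothetical check: true when x + 1 rounds back to x, i.e. the increment
 * is smaller than half the spacing between adjacent doubles at x. */
static bool add_one_is_lost(double x) {
  volatile double sum = x + 1.0; /* volatile forces rounding to double */
  return sum == x;
}
```

9223372036854774784 is 2^63 - 1024, the largest double below 2^63. Doubles in that range are 1024 apart, so adding 1 rounds straight back to the same value, while small values such as 1.0 are unaffected.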

Are inconsistent integer semantics worse than surprising double-precision errors?

Also, we ought to use in our constant folding.
Also, tests. Lots of them.

wtlangford · 2017-03-02T00:23:01Z

src/builtin.c

+      return jv_int64(i);
+  } else if (d < UINT64_MAX) {
+    uint64_t u = d;
+    if (d == (double)u)


Also this one.

Ditto. Here we know that d is positive, and if it's also less than UINT64_MAX then we can represent it as a uint64_t.

wtlangford · 2017-03-02T00:23:23Z

src/builtin.c

+  double d = jv_number_value(input);
+  if (d < 0 && d >= INT64_MIN) {
+    int64_t i = d;
+    if (i < 0)


What's happening here? Is this a guard against the cast turning -0.5 into 0 instead of -1?

No, it's just for deciding whether to return a signed 64-bit integer representation or unsigned 64-bit integer representation.

But if the check is false, the next code that runs is return jv_number(nearbyint(d));, which is a double representation.

These checks should use the nextafter stuff from below for INT64_MIN and UINT64_MAX.

Eh? No, there's an else if at 377.

Perhaps I misunderstand.
Line 373 is checking to see if d is negative AND d fits into an int64_t
Line 374 does the cast
What does line 375 check?

Oh yes, I'm the one who's confused here. I need to s/else // here :)

wtlangford · 2017-03-02T00:33:09Z

src/jv.c

+#ifndef JQ_OMIT_INTS
+  jv j = {JV_KIND_NUMBER, JV_SUBKIND_INT64, 0, 0, {.int64 = x}};
+#else
+  jv j = {JV_KIND_NUMBER, JV_SUBKIND_INT64, 0, 0, {.number = x}};


This uses .number, but has a JV_SUBKIND_INT64?

Oops! (It would have no effect, given that other code that would check it would be omitted. But it's still a bug.)
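
A sketch of what a consistent pair of branches could look like. The struct, enum values, and macro name below are stand-ins invented for illustration; they do not reproduce jq's private jv layout or the PR's fix.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the sub-kind tags. */
enum { SUBKIND_DOUBLE = 0, SUBKIND_INT64 = 1 };

/* Hypothetical, simplified tagged number (not jq's jv). */
typedef struct {
  unsigned char subkind;
  union { double number; int64_t int64; } u;
} tagged_num;

static tagged_num tagged_from_int64(int64_t x) {
#ifndef SKETCH_OMIT_INTS
  tagged_num j = {SUBKIND_INT64, {.int64 = x}};
#else
  /* Integer support compiled out: store a double and tag it as a double,
   * so the tag always matches the union member that was written. */
  tagged_num j = {SUBKIND_DOUBLE, {.number = (double)x}};
#endif
  return j;
}

/* Arithmetic-facing accessor: always yields a double, whatever the tag. */
static double tagged_value(tagged_num j) {
  return j.subkind == SUBKIND_INT64 ? (double)j.u.int64 : j.u.number;
}
```

Keeping the tag and the written union member in agreement in both branches means readers of the value never depend on which build option was used.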

wtlangford · 2017-03-02T00:52:17Z

src/jv.c

+      return (int64_t)j.u.uint64;
+    return INT64_MAX;
+  }
+  if (j.u.number > 0 && (int64_t)j.u.number < 0)


I'm concerned this might be processor-specific behavior and prone to breaking.
How about this instead:

if (j.u.number > nextafter(INT64_MAX, 0)) return INT64_MAX;
if (j.u.number < nextafter(INT64_MIN, 0)) return INT64_MIN;

Basically, we get the next double in the direction of zero from whatever we get after converting INT64_MAX to double.
If we're farther from 0 than that value, then we had to be at least INT64_MAX.
The same logic holds for INT64_MIN.

I like this! Thanks!
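
For reference, the nextafter idea as a standalone saturating conversion. This is a sketch rather than the code that landed: the function name and the NaN-to-zero choice are assumptions, and it assumes IEEE 754 doubles where converting INT64_MAX to double rounds up to 2^63.

```c
#include <math.h>
#include <stdint.h>

/* Hypothetical helper: convert a double to int64_t, clamping out-of-range
 * values instead of relying on an out-of-range cast (undefined behavior). */
static int64_t double_to_int64_saturating(double x) {
  if (isnan(x))
    return 0; /* assumption: NaN maps to 0 */
  /* (double)INT64_MAX is 2^63; the next double toward zero is 2^63 - 1024.
   * Anything above that is at least 2^63 and cannot fit. */
  if (x > nextafter((double)INT64_MAX, 0.0))
    return INT64_MAX;
  /* (double)INT64_MIN is exactly -2^63; values at or below it clamp. */
  if (x < nextafter((double)INT64_MIN, 0.0))
    return INT64_MIN;
  return (int64_t)x; /* in range, so the truncating cast is well defined */
}
```

The comparisons are done entirely in floating point, so no cast is performed until the value is known to fit.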

wtlangford · 2017-03-02T15:35:00Z

src/jv.c

+  }
+  if (j.u.number < 0)
+    return 0;
+  if (j.u.number > (double)UINT64_MAX)


Again, perhaps something like this:

if (j.u.number > nextafter(UINT64_MAX, 0)) return UINT64_MAX;
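
The unsigned counterpart, sketched under the same assumptions (IEEE 754 doubles, UINT64_MAX converting to 2^64); the function name and the handling of NaN and negative input are illustrative choices:

```c
#include <math.h>
#include <stdint.h>

/* Hypothetical helper: convert a double to uint64_t with clamping. */
static uint64_t double_to_uint64_saturating(double x) {
  if (isnan(x) || x < 0)
    return 0; /* assumption: NaN and negative values clamp to 0 */
  /* (double)UINT64_MAX is 2^64; the next double toward zero is 2^64 - 2048.
   * Anything above that is at least 2^64 and cannot fit. */
  if (x > nextafter((double)UINT64_MAX, 0.0))
    return UINT64_MAX;
  return (uint64_t)x; /* in range, so the cast is well defined */
}
```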

wtlangford · 2017-03-02T15:51:34Z

src/jv.c

  double x = jv_number_value(j);
+  /* XXX Check against actual double min/max integers */


I seem to repeat myself. 😜
Assuming that x != x is for NaN detection...

if (isnan(j.u.number) || j.u.number > nextafter(INT_MAX, 0) || j.u.number < nextafter(INT_MIN, 0)) return 0;

Fair enough.
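
One detail that may be worth checking for the int-sized case: when int is 32 bits, INT_MAX and INT_MIN are exactly representable as doubles, so nextafter toward zero lands strictly inside the range and the two endpoints themselves would be rejected. If the intent is to accept the whole int range, a plain comparison may be enough. A sketch under that assumption (the function name is hypothetical):

```c
#include <limits.h>
#include <math.h>
#include <stdbool.h>

/* Hypothetical helper: true when x is not NaN and lies within [INT_MIN,
 * INT_MAX]. Assumes int is at most 53 bits wide, so both bounds convert
 * to double exactly. */
static bool fits_in_int(double x) {
  return !isnan(x) && x >= (double)INT_MIN && x <= (double)INT_MAX;
}
```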

wtlangford · 2017-03-02T16:28:32Z

src/jv.h

 struct jv_refcnt;

 /* All of the fields of this struct are private.
   Really. Do not play with them. */
 typedef struct {
  unsigned char kind_flags;
-  unsigned char pad_;
+  unsigned char subkind_flags;


And now our padding is gone. 😢

Well, yes, we lose the padding. I suppose I could encode sub-kind into kind_flags. We could always do that later if ever we want to use the padding for something else.
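
Encoding the sub-kind into kind_flags could be done with a simple bit split. The layout below (low four bits for the kind, high four for the sub-kind) is purely an assumption for illustration and is not jq's actual encoding:

```c
/* Hypothetical layout: bits 0-3 hold the kind, bits 4-7 hold the sub-kind. */
enum { KIND_BITS = 4, KIND_MASK = 0x0F };

static unsigned char pack_kind(unsigned kind, unsigned subkind) {
  return (unsigned char)((kind & KIND_MASK) | ((subkind & KIND_MASK) << KIND_BITS));
}

static unsigned unpack_kind(unsigned char kind_flags) {
  return kind_flags & KIND_MASK;
}

static unsigned unpack_subkind(unsigned char kind_flags) {
  return kind_flags >> KIND_BITS;
}
```

With a split like this the second byte stays free as padding, at the cost of a mask or shift on every kind check.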

dequis · 2017-03-02T19:53:27Z

"We appear to continue to do double math with this change"

In my version of this, that was explicitly intended behavior. The idea was to fix the common case of using jq as a pretty-printer and/or filter, so as long as the numbers aren't touched they keep their old values.

wtlangford · 2017-03-02T19:57:40Z

Ah, then I have no complaints about the double-precision math. Additionally, maintaining existing behavior is easier than changing it and then deciding to change it back later; so we can always revisit in the future.

nicowilliams · 2017-03-02T20:31:08Z

Eventually we might add branches to basic arithmetic operators to avoid double-precision math when possible. But I'm a bit loath to do that right now. The goal here really is to allow jq . to pass integers through unmodified. Of course, they'll still get parsed and formatted, but there won't be any scientific notation on output, which appears to be the most commonly desired thing here.

We could also fix the formatter to never use scientific notation for doubles whose fractional part is zero. But we do also get calls for larger integer ranges than -2^53..2^53, which I think is fair enough.

chrishmorris · 2024-07-17T08:52:40Z

Thank you all for JQ.

It would be great if you could address this issue.

nicowilliams force-pushed the ints branch from 8787e2e to b28c886 Compare January 29, 2017 01:14

nicowilliams mentioned this pull request Jan 29, 2017

jv: Add some support for 64 bit ints in a very conservative way #1246

Open

nicowilliams force-pushed the ints branch 2 times, most recently from cdf23ea to cd3569d Compare February 3, 2017 00:33

nicowilliams added 3 commits February 4, 2017 00:24

jv int64 and uint64 support

323c632

fixup parser bugs

d066c61

Add tointeger and isinteger

eb77bae

nicowilliams force-pushed the ints branch from cd3569d to eb77bae Compare February 4, 2017 06:24

nicowilliams requested a review from wtlangford February 26, 2017 01:30

nicowilliams mentioned this pull request Feb 27, 2017

Max long is rounded up #1357

Closed

wtlangford requested changes Mar 2, 2017

View reviewed changes

nicowilliams mentioned this pull request Mar 10, 2017

Control formatting of numbers and escape characters #1363

Open

wtlangford mentioned this pull request Apr 11, 2017

jq unable to deal 64 bit number #1387

Closed

wtlangford mentioned this pull request Oct 19, 2018

raw content was changed by jq '.' #1741

Closed

liquidaty mentioned this pull request Mar 7, 2023

organizing to release JQ 1.6.2 and JQ 1.7: Your suggestions/comments would be appreciated #2550

Closed

itchyny added the ieee754 label Jun 2, 2023
