Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

jv: Add some support for 64 bit ints in a very conservative way (ALTERNATIVE) #1327

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

nicowilliams
Copy link
Contributor

@nicowilliams nicowilliams commented Jan 29, 2017

This is a variant of #1246 that doesn't change the size of a jv.

@dequis What do you think?

@dequis
Copy link

dequis commented Jan 29, 2017

I'm not a fan of the JQ_OMIT_INTS ifdefs, they smell like premature optimizaition, probably worth benchmarking to see if they are justified.

You should probably change the title of this PR because it's not very conservative, heh. In particular, messing with the union was something i explicitly wanted to avoid - my implementation only affected display and didn't have to deal with other parts of the code. But everything seems to go through jv_number_value() and jv_number() so if you think that's fine I think that's fine.

Having signed 64 bit ints too is neat.

Only skimmed, haven't reviewed much. Also haven't tested.

@nicowilliams
Copy link
Contributor Author

Arithmetic (and math) continues to go through jv_number() and jv_number_value(). Only parsing and printing check the integer members of the union, and the utility jv functions for C applications that want them. Not changing the size of jv, and not having multiple internal representations of any given number at once, seems conservative to me :) and anyways, I kept the title of your PR.

@nicowilliams
Copy link
Contributor Author

As to the JQ_OMIT_INTS... I am copying the SQLite3 pattern. It's fairly cheap, but the problem is that it probably won't get much testing. It is not a premature optimization so much as... me not having measured the impact of this new feature.

@pkoppstein
Copy link
Contributor

@nicowilliams - No doubt I'm missing something, but using @dequis's test case, it seems this change entails a step backwards:

github/nicowilliams/jq$ ./jq --version
jq-1.2-889-gb28c886

github/nicowilliams/jq$ git log -1
commit b28c886edbb45c2128f7f5eb79da169190473cff
Author: Nicolas Williams <nico@cryptonector.com>
Date:   Sat Jan 28 18:56:55 2017 -0600

    jv int64 and uint64 support

github/nicowilliams/jq$ echo 111111111111111111 | ./jq -c '[., 1*.]'
echo 111111111111111111 | ./jq -c '[., 1*.]'
[111111111111111104,111111111111111100]

By contrast:

$ echo 111111111111111111 | jq1.5 -c '[., 1*.]'
[111111111111111100,111111111111111100]

@nicowilliams
Copy link
Contributor Author

@pkoppstein Yeah, and that's not the only thing that's wrong. Try echo 2 63^p|dc|jq .. We need more test cases, that's for sure.

@nicowilliams
Copy link
Contributor Author

Tests still needed.

@nicowilliams
Copy link
Contributor Author

nicowilliams commented Feb 26, 2017

@wtlangford Can you review this? I think it's done.

Also, I'm thinking of taking this approach a bit further. I'm thinking of adding jv sub-types for:

@nicowilliams
Copy link
Contributor Author

nicowilliams commented Feb 28, 2017

Ah, so, jv_identical() needs to be updated too.

EDIT: No, it's done already.

Copy link
Contributor

@wtlangford wtlangford left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing here seems too concerning, other than the overflow checking.

We appear to continue to do double math with this change. Do we want to consider being subkind-aware for adding and subtracting (and maybe multiplication?). Division should, of course, use double behavior, since we don't want integer division semantics in jq.

I'm envisioning this problem (which is a floating-point precision error):
9223372036854774784 + 1 #9223372036854774784
Of course, jq still prints that as 9223372036854775000 because of a bug somewhere in jv_dtoa...

Is inconsistent integer semantics worse than surprising double precision errors?

Also, we ought to use in our constant folding.
Also, tests. Lots of them.

return jv_int64(i);
} else if (d < UINT64_MAX) {
uint64_t u = d;
if (d == (double)u)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also this one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto. Here we know that d is positive, and if it's also less than UINT64_MAX then we can represent it as a uint64_t.

double d = jv_number_value(input);
if (d < 0 && d >= INT64_MIN) {
int64_t i = d;
if (i < 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's happening here? Is this a guard against the cast turning -0.5 into 0 instead of -1?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it's just for deciding whether to return a signed 64-bit integer representation or unsigned 64-bit integer representation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But if the check is false, the next code that runs is return jv_number(nearbyint(d));, which is a double representation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These checks should use the nextafter stuff from below for INT64_MIN and UINT64_MAX

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eh? No, there's an else if at 377.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps I misunderstand.
Line 373 is checking to see if d is negative AND d fits into an int64_t
Line 374 does the cast
What does line 375 check?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yes, i'm the one who's confused here. I need to s/else // here :)

#ifndef JQ_OMIT_INTS
jv j = {JV_KIND_NUMBER, JV_SUBKIND_INT64, 0, 0, {.int64 = x}};
#else
jv j = {JV_KIND_NUMBER, JV_SUBKIND_INT64, 0, 0, {.number = x}};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This uses .number, but has a JV_SUBKIND_INT64?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops! (It would have no effect, given that other code that would check it would be omitted. But it's still a bug.)

return (int64_t)j.u.uint64;
return INT64_MAX;
}
if (j.u.number > 0 && (int64_t)j.u.number < 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm concerned this might be processor-specific behavior and prone to breaking.
How about this instead:

if (j.u.number > nextafter(INT64_MAX, 0))
  return INT64_MAX;
if (j.u.number < nextafter(INT64_MIN, 0))
  return INT64_MIN;

Basically, we get the next double in the direction of zero from whatever we get after converting INT64_MAX to double.
If we're farther from 0 than that value, then we had to be at least INT64_MAX.
The same logic holds for INT64_MIN.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this! Thanks!

}
if (j.u.number < 0)
return 0;
if (j.u.number > (double)UINT64_MAX)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, perhaps something like this:

if (j.u.number > nextafter(UINT64_MAX, 0))
  return UINT64_MAX;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes!

double x = jv_number_value(j);
/* XXX Check against actual double min/max integers */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I seem to repeat myself. 😜
Assuming that x != x is for NaN detection...

if (isnan(j.u.number) || j.u.number > nextafter(INT_MAX, 0) || j.u.number < nextafter(INT_MIN, 0))
  return 0;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough.

struct jv_refcnt;

/* All of the fields of this struct are private.
Really. Do not play with them. */
typedef struct {
unsigned char kind_flags;
unsigned char pad_;
unsigned char subkind_flags;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And now our padding is gone. 😢

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, yes, we lose the padding. I suppose I could encode sub-kind into kind_flags. We could always do that later if ever we want to use the padding for something else.

@dequis
Copy link

dequis commented Mar 2, 2017

We appear to continue to do double math with this change

In my version of this, that was explicitly intended behavior. The idea was to fix the common case of using jq as a pretty-printer and/or filter, so as long as the numbers aren't touched they keep their old values.

@wtlangford
Copy link
Contributor

Ah, then I have no complaints about the double-precision math. Additionally, maintaining existing behavior is easier than changing it and then deciding to change it back later; so we can always revisit in the future.

@nicowilliams
Copy link
Contributor Author

Eventually we might add branches to basic arithmetic operators to avoid double-precision math when possible. But I'm a bit loathe to do that right now. The goal here really is to allow jq . to pass integers through unmodified. Of course, they'll still get parsed and formatted, but there won't be any scientific notation on output, which appears to be the most commonly desired thing here.

We could also fix the formatter to never use scientific notation for doubles with no non-zero decimal part. But we do also get calls for larger integer ranges than -2^52..2^52, which I think is fair enough.

@chrishmorris
Copy link

Thank you all for JQ.

It would be great if you could address this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants