Save literal value of the parsed number to preserve it for the output #1743
Any math operations on the numbers will truncate them to decimal precision.
I've just discovered that mantests on my machine PASS because they get aborted
Not yet sure how to fix this but will try |
@wtlangford @nicowilliams what's the reason behind saving on a I'm thinking maybe we should utilise the |
I don't think there was a strong reason, just more that since we knew it wasn't necessary, it didn't always get done.
The call to |
Why do |
Two reasons, in no particular order:
jv s = jv_string("some string");
const char *p = jv_string_value(s);
// something with p

If jv_string_value were to free the underlying storage here, we'd be sad, and forced to do this:

jv s = jv_string("some string");
const char *p = jv_string_value(jv_copy(s));
// something with p
jv_free(s); // once we're done

and that's not better. |
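The trade-off described above can be sketched with a toy reference-counted string. This is only an illustration: `toy_str` and every `toy_str_*` function are hypothetical names invented here, not part of the jv API, and the real jv implementation differs in many details.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy type, NOT the real jv: a heap string with a refcount. */
typedef struct {
    int refcnt;
    char *data;
} toy_str;

static int toy_live = 0; /* number of toy_str objects currently allocated */

toy_str *toy_str_new(const char *text) {
    size_t n = strlen(text) + 1;
    toy_str *s = malloc(sizeof *s);
    if (!s) abort();
    s->data = malloc(n);
    if (!s->data) abort();
    memcpy(s->data, text, n);
    s->refcnt = 1;
    toy_live++;
    return s;
}

/* Take an extra reference (plays the role jv_copy has in the comment above). */
toy_str *toy_str_copy(toy_str *s) { s->refcnt++; return s; }

/* Drop one reference; release the storage when the last one goes away. */
void toy_str_free(toy_str *s) {
    if (--s->refcnt == 0) {
        free(s->data);
        free(s);
        toy_live--;
    }
}

/* Non-consuming accessor: borrows s, so the returned pointer stays valid
 * until the caller drops its own reference. */
const char *toy_str_value(toy_str *s) { return s->data; }

/* Consuming function: takes over the caller's reference and releases it. */
size_t toy_str_length(toy_str *s) {
    size_t n = strlen(s->data);
    toy_str_free(s);
    return n;
}
```

With this split, a caller that wants to keep using `s` after a consuming call passes `toy_str_copy(s)`, while the borrowing accessor needs no copy at all.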
Well, the In any case, I suspected that there would be some reason, in that case let's maybe introduce some naming convention to distinguish between consuming and non-consuming methods? Because So, just maybe something like |
@pkoppstein I have moved to this PR because the topic is getting verbose, and it makes more sense to discuss this particular aspect here than in issue #1741.

Here I would like to refer to the strange behaviour of my current branch that you pointed out. I have found the reason for the behaviour, and it's not trivial. This is happening due to the constant_fold optimisation at the parser stage. The issue is that this optimisation doesn't call any of the built-ins, which would take care of calling jv_equal under the hood; instead, it works on the doubles directly, which bypasses any logic I have implemented.

I am thinking that the solution would be to actually call the builtin implementations of the mentioned functions instead of making the calculations directly.

UPDATE I have fixed the issue by using the jv_equal result for EQ, NEQ, LESSEQ and GREATEREQ operations, see the commit. |
…ization This is just a tip of an iceberg when talking about overloaded number arithmetic. For now that fixes the bug, however should we introduce more sophisticated overload of semantic number treatment, other operations will have to be changed.
@pkoppstein I believe I've gotten to some more or less final form of the PR. TODOs
|
Using jq-1.6rc1-25-g56f6124-dirty I was surprised to see:
I trust that is a bug :-) |
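@leonid-s-usov - Here's a test case which highlights a discrepancy between the current implementation of jq (jq-1.6rc1-25-g56f6124-dirty) and the principles I'm advocating:
./jq -n '123456789123456789123456789 == 123456789123456790000000000'
true
By the principle:
If m and n are JSON numeric literals, then `m==n` iff Sem(m) == Sem(n)
the result should of course be false. |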
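A minimal C sketch of why the two literals compare equal once they are converted to IEEE 754 doubles. The helper names are hypothetical and this is not jq's code path; it only assumes a correctly rounding `strtod`.

```c
#include <stdlib.h>
#include <string.h>

/* Compare two numeric literals the way IEEE 754 doubles see them. */
int same_as_double(const char *a, const char *b) {
    return strtod(a, NULL) == strtod(b, NULL);
}

/* Compare the literals purely as text, with no canonicalisation. */
int same_as_text(const char *a, const char *b) {
    return strcmp(a, b) == 0;
}
```

The two literals differ by 876543211, which is far below the spacing between adjacent doubles near 1.2e26 (2^34, about 1.7e10), so both round to the same double. Note that the textual comparison is stricter than the Sem() principle: it would also tell `10` and `10.0` apart.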
oh yeah, Microsoft is now killing GitHub. Skype is already dead. Anyhow, I seem to have your posts in my inbox, I can send those to you.
Surprised? That is where it has started! That's the essence of #1652 and #1741.
The equality above is currently kept compatible (i.e. faulty). Fixing it while maintaining full backward compatibility is the tricky part touching your tocanonical approach.

I would like to believe we have a chance to iterate towards that, with a first step that fixes an apparent bug: number representations being changed for no reason. This step also provides a workaround for cases when literal comparison is required:
The above could read
|
…iteral` filter vs `tostring` and `tojson`
@pkoppstein @wtlangford When calling a function which doesn't consume the jv object, it can be thought as a semantic equivalent to a properly consuming variant of it called with an explicit so, the non-conforming
I would like to suggest that all functions starting with So, the only fishy case will be the There is another guideline for the actual placement of the Following this suggestion, the refactoring will change the following method names (non-exhaustive example):
|
Yep. A note: I'll be very interested to benchmark the compilation stage prior to merging the PR. Compilation is known to be very slow right now (O(m×n)), so I'll be interested to know if using jv_equal has a measurable performance cost.
And now you see why we'd kicked this can so far for so long. 😉
I just run a small centos vm for this purpose. I do my dev locally, and run the tests in the vm when I'm ready.
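I like this for now. I want to keep this PR smaller in scope/implication, and may advocate for removing toliteral/tocanonical per my stance in #1741 that this is an underlying implementation change.
I'd rather we didn't do this here (and possibly at all, though that's up for discussion) for a number of reasons.
This PR is starting to grow in scope.
We expose a lot (most? all? I'd have to check) of those jv_* functions as part of our public C API. Changing their names is a backwards-incompatible change, and I'm a bit hesitant to do so. (The public C API is contained in the following two headers: src/jq.h and src/jv.h)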
|
Force-pushed from 1475482 to 5336369.
@pkoppstein does not speak for the project or the maintainers. @pkoppstein only speaks for @pkoppstein. Today jq uses IEEE754. That has impact on semantics that @pkoppstein and others do not like. @stedolan, myself, and @wtlangford have never much minded. This is a limitation that we accept. There are only two things we might do about this:
and/or
Keep in mind: jq may not have any dependencies on GPL'ed libraries. |
I think too that any attempt at preserving untouched numbers will have negative performance effect in several ways:
We do need to start caring more about performance. We've stopped accepting new builtins because we need to fix linking performance first, and @wtlangford and I have run out of energy for that for a while. |
Clear. In any case the current implementation has the IEEE double in place and simply keeps the original string around. It should be noted that the string is only associated with a single instance of the parsed number literal, and gets passed around until the number is consumed by some filter or operator, or until it's dumped. Memory wise... I can totally see how storing the original number string for every parsed number can increase the memory needed, but that is for the ability to return it back unchanged. And, is it such a high price? I mean, loading a million chars into memory is not even a consideration in today's computing, so how big should that potential number array be? And wouldn't it be impractical to work with this data in memory for other reasons but the additional few megabytes for original number literal data? |
Microsoft is taking over the GitHub, let’s see if email responses work
This PR provides a way to work around the limitations you have given as examples. It was actually started to get over those ;)
So, `.` will no longer change the literal.
`==` will still fail the same way (for backward compatibility), but there is a way to overcome it with a new comparison:
(123456789123456789123456789 | toliteral) == "123456789123456789123456789"
(123456789123456789123456789 | tostring) == "123456789123456789123456789"
(123456789123456789123456789 | tojson) == "123456789123456789123456789"
All of these will return `true`.
… On 22 Oct 2018, at 5:47, pkoppstein ***@***.***> wrote:
@leonid-s-usov - Here's a test case which highlights a discrepancy between the current implementation of jq (jq-1.6rc1-25-g56f6124-dirty) and the principles I'm advocating:
./jq -n '123456789123456789123456789 == 123456789123456790000000000'
true
By the principle:
If m and n are JSON numeric literals, then `m==n` iff Sem(m) == Sem(n)
the result should of course be false.
|
Oh, public API? _facepalm_
That’s bad. But listen, if anyone is using this public API then how come _they_ know whether a given method consumes jv or not, without having access to the source code?
Well on the other hand I’d rather agree that the refactor is out of scope of this PR
Re. the compilation: I've removed that code. Following conversations with @pkoppstein it was decided that we can't afford that kind of change, so equality is working as it used to.
Anyhow, once we introduce more sophisticated comparison mechanisms we’d have to get back there and plug in all the relevant jv_ methods instead of the adhoc arithmetic, not just equality check.
… On 22 Oct 2018, at 16:41, William Langford ***@***.***> wrote:
This is happening due to constant_fold optimisation at the parser stage. The issue is that that optimisation doesn't call any of the built-ins, which would take care of calling jv_equal under the hood; instead, these work on the doubles directly, which bypasses any logic I have implemented.
Yep. A note: I'll be very interested to benchmark the compilation stage prior to merging the PR. Compilation is known to be very slow right now (O(m×n)), so I'll be interested to know if using jv_equal has a measurable performance cost.
I have fixed the issue by using the jv_equal result for EQ, NEQ, LESSEQ and GREATEREQ operations, see the commit.
That actually fixed another number leak since the block_const results weren't ever free'd. Kind of opens another pool 1k lines to look for unfreed numbers...
And now you see why we'd kicked this can so far for so long. 😉
clean up the leaks due to JV_KIND_NUMBER not being jv_freed all over the place. I may need some help here. Is there a way to run valgrind on macOs? maybe via docker?
I just run a small centos vm for this purpose. I do my dev locally, and run the tests in the vm when I'm ready.
The equality above is currently kept compatible (i.e. faulty). Fixing it while maintaining full backward compatibility is the tricky part touching your tocanonical approach.
I like this for now. I want to keep this PR smaller in scope/implication, and may advocate for removing toliteral/tocanonical per my stance in #1741 that this is an underlying implementation change.
I have a suggestion regarding the naming convention of the jv_* functions concerning the memory management.
I'd rather we didn't do this here (and possibly at all, though that's up for discussion) for a number of reasons.
This PR is starting to grow in scope.
We expose a lot (most? all? I'd have to check) of those jv_* functions as part of our public C API. Changing their names is a backwards-incompatible change, and I'm a bit hesitant to do so. (The public C API is contained in the following two headers: src/jq.h and src/jv.h)
|
I didn't say it was great as-is, either. 😉 Just that breaking that API is a concern and should be handled with care.
I'm not sure what you mean here? |
OK. Re. API, I am preparing a separate PR with those changes. You will have to consider the pros and cons, assess the number of API users, and potentially schedule this change for the next breaking version.
This was a response to your comment about "benchmark[ing] the compilation stage prior to merging the PR". I just said that in this PR I have reverted the code change around that equality operator, since it was having nasty side effects breaking backward compatibility. Thanks! |
Guys, I need some help please. I am trying to hunt down a leak and I suspect that it's happening somewhere deep in the manually generated bytecode for things like

I have spotted this particular test case which generates a number leak (i.e. a number isn't getting deallocated).

This command argues that a number is leaked

While this one does not leak

And this one also does not leak

What I was able to deduce out of the above is that most probably a number is leaked due to a premature exit from a foreach loop by the means of a
Now as much as I tried to get into the bytecode details I can't spot the problematic place. More than that, I believe it could be that the issue is not with the Of course, I have scanned the code multiple times for some obvious places where a So actually my guess now is that it's an unconditional leak which revealed itself only now because the resource leaked is always a number. Which, in turn, means that it could be the input parameters for things like Any ideas? |
@execute.c:1161
int jq_compile_args(jq_state *jq, const char* str, jv args) {
  jv_nomem_handler(jq->nomem_handler, jq->nomem_handler_data);
The call to jv_nomem_handler was using uninitialised fields of the jq structure
This allows usage of scripting to find offending tests, something like this (pseudocode):

    skip = 0
    while (true) do
        valgrind ./jq --run-tests my.tests --take 1 --skip $skip
        check_exit_code_and_break_conditionally($?, $skip)
        skip=($skip+1)
    done

The command will return status 2 if the skip parameter was greater than the number of tests in the file. This allows for an additional breaking condition.
@wtlangford @nicowilliams @pkoppstein As you can see I have successfully removed all apparent leaks. Cleaned up a bit further than just the new number functionality ;) Actually in order to do so I had to make two additional things
Now the ball is on your side, and I can get back to sleeping, working, eating and all those other things people do. Cheers ;) PS. I think you should seriously consider that API naming convention I have suggested above. I believe that most of the issues in the code I had to fix (apart from those where numbers weren't treated as the other kids, cause they're different) were partly due to the existing ambiguity of the method naming |
As a side note, I think that we could have a MUCH less time-consuming memory checker by simply keeping track of jv objects which have a non-zero refcount by the time the program finishes. It could be tricky to get the stack trace for those, of course, but on the other hand it could be better integrated into every test report and save tons of time when finding the bad place.

Also, as you probably know, the stack trace from valgrind is often not helpful at all: due to the long life of our objects most of the leaks are originating at the

Additionally, such an internal jv-aware memory tracker could do what valgrind can't: dump the contents of the leaked object. And that, gentlemen, is sometimes a jackpot.

Last but not least, we could enhance the internal memory tracker with the origin of the allocation by utilising macro overrides of the jv object initialisers. |
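A rough sketch of the tracker idea, assuming a toy object type rather than jv. Every name here (`tracked`, `tracked_new`, `tracked_free`, `tracked_report`, `TRACKED_NEW`) is hypothetical; a real version would have to hook the jv allocation and release paths.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical tracked object; it stands in for a refcounted jv value. */
typedef struct tracked {
    struct tracked *prev, *next; /* links in the list of live objects */
    const char *origin;          /* allocation site, e.g. "file.c:42" */
    char text[32];               /* printable payload so a leak can be dumped */
} tracked;

static tracked *live_head = NULL;

tracked *tracked_new(const char *text, const char *origin) {
    tracked *t = calloc(1, sizeof *t);
    if (!t) abort();
    snprintf(t->text, sizeof t->text, "%s", text);
    t->origin = origin;
    t->next = live_head;
    if (live_head) live_head->prev = t;
    live_head = t;
    return t;
}

void tracked_free(tracked *t) {
    if (t->prev) t->prev->next = t->next; else live_head = t->next;
    if (t->next) t->next->prev = t->prev;
    free(t);
}

/* Print every object that is still alive and return how many there are. */
int tracked_report(FILE *out) {
    int n = 0;
    for (tracked *t = live_head; t; t = t->next, n++)
        fprintf(out, "leaked \"%s\" allocated at %s\n", t->text, t->origin);
    return n;
}

/* Macro override that records the allocation site of each object. */
#define TRACKED_STR2(x) #x
#define TRACKED_STR(x) TRACKED_STR2(x)
#define TRACKED_NEW(text) tracked_new((text), __FILE__ ":" TRACKED_STR(__LINE__))
```

Calling `tracked_report(stderr)` at program exit would list the contents and origin of anything never released, which is the information valgrind cannot provide on its own.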
Does anyone know why the Apple builds fail on travis? |
That ruby requirement has been a painful and unreliable part of the Travis
builds. I'm not too sure what exactly causes it to break suddenly like
this. I'll look into it. (Relatedly, ripping out this ruby dependency is on
my repo/maintenance quality-of-life short list).
…On Wed, Oct 24, 2018, 02:31 Leonid S. Usov ***@***.***> wrote:
Does anyone know why the Apple builds fail on travis?
I mean, it's obviously due to failed brew installation of ruby, but why
would it start failing suddenly?
|
@leonid-s-usov What's left of this PR after #1752 was merged? i see |
@wader all of this was superseded by the decimals PR. The appropriate improvements and fixes have been applied there, and |
@leonid-s-usov Great! 👍 |
This is a POC implementation of a concept which may resolve numerous cases of truncated integer identifiers due to the double limitation reported by #1652
The logic is pretty simple: for every parsed number we store the actual read literal which was used to generate the double representation. Then, we use this literal for comparisons and printing.
Any math operation on the literal number will use the truncated double precision representation and generate a new `jv_number` with no literal associated.

The implementation is trying to make minimal impact on the performance by falling back to the original number representation whenever literal information is not available.
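The logic described above could look roughly like the following toy model. It is a sketch under stated assumptions: `toy_num` and its functions are hypothetical names, not the PR's actual `jv_number` changes, and the `%.17g` fallback format is chosen here only for illustration.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy number, NOT the real jv_number. */
typedef struct {
    double value;  /* IEEE 754 value used for arithmetic */
    char *literal; /* text as parsed, or NULL when the number was computed */
} toy_num;

/* Parse a literal: keep both the double and a copy of the original text. */
toy_num toy_num_parse(const char *text) {
    toy_num n;
    size_t len = strlen(text) + 1;
    n.value = strtod(text, NULL);
    n.literal = malloc(len);
    if (!n.literal) abort();
    memcpy(n.literal, text, len);
    return n;
}

/* Arithmetic works on the doubles; the result carries no literal. */
toy_num toy_num_add(toy_num a, toy_num b) {
    toy_num n = { a.value + b.value, NULL };
    return n;
}

/* Output the original literal when there is one, else format the double. */
void toy_num_print(toy_num n, char *buf, size_t size) {
    if (n.literal)
        snprintf(buf, size, "%s", n.literal);
    else
        snprintf(buf, size, "%.17g", n.value);
}

void toy_num_free(toy_num n) { free(n.literal); }
```

A number that is parsed and printed untouched round-trips exactly, while the result of `toy_num_add` falls back to the double and therefore shows the truncation.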
This implementation has a side effect of non-equivalence of `10` and `10.0`, which could be taken as a neat feature, or as a bug. Anyhow, it will definitely be considered a non backward compatible change :/. This is clearly seen in the need to update a test on line 539.

However, this particular aspect can be adjusted, as the equivalence operation may be considered "math" in terms that the double representation of the number will be taken.
This will make it backward compatible, but I believe that this literal equivalence must be preserved. If we take the use case of some external database ID then it would be great if jq could also match those ids, not only preserve in the output unchanged.
A fun fact about this branch is that the tests to verify the new functionality depend on the functionality itself. As it turned out, the test suite is utilising the same parsing routines to compare the actual output to the expected output, so the truncation test cases actually succeed on the master branch where the testing code is unable to see the difference between those large numbers.