Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Alternative operator // applied on a stream #2189

Closed
luciole75w opened this issue Oct 6, 2020 · 18 comments
Closed

Alternative operator // applied on a stream #2189

luciole75w opened this issue Oct 6, 2020 · 18 comments

Comments

@luciole75w
Copy link

Hi,

I'm getting a result which doesn't seem logical to me, considering the documented behavior of the alternative operator //.

Excerpt from the manual:

A filter of the form a // b produces the same results as a, if a produces results other than false and null.

Here are the test commands.

$ jq -n 'false, 1, null, 2 | . // 0'
0
1
0
2

$ jq -n '(false, 1, null, 2) // 0'
1
2

The first result is ok. For the second command, I would expect either the same result as the first one, or the complete stream false, 1, null, 2 as is (if the jq grammar defined the operator // to apply to a stream as a whole). Instead of one of these two options, here the output is an altered form of the left-hand side. Tested with jq 1.5, 1.6 and master (on Linux Mint 19.3).

If it's not a bug then wouldn't it be worth mentionning it in the manual?

@wtlangford
Copy link
Contributor

My initial reaction is that this is a bug, but the behavior that's actually happening here is that you're getting all values from that stream that pass the alternation operator. If none of the values in that stream pass the alternation operator, then you get the alternative value:

$ jq -n '(false, false, false) // 0'
0

I'm not sure if it's a bug now- I would need to dig into the parser code to see why it generates the bytecode it does for that program. @nicowilliams do you have thoughts on if this is a bug or a "feature" that should be documented?

@luciole75w
Copy link
Author

@wtlangford Thanks for your feedback. I agree that the actual behavior definitely makes sense, simply not the most intuitive one to me. That said, even if it's not the behavior intended initially, changing it now may cause too many regressions for users, so acknowledging this as a "feature" in the manual sounds indeed a sensible option :)

@pkoppstein
Copy link
Contributor

pkoppstein commented Oct 7, 2020

@luciole75w - Please note that the jq FAQ summarizes the behavior of A // B as follows:

"A // B" either produces the truthy elements of A if there are any, or else the entire stream B.

Thus the behavior you describe is both longstanding and documented, though as you observe, the documentation in the manual is at best confusing. (Perhaps it was written on the assumption that a is an expression that produces at most one value.)

The conflation of null and false here is the source of endless trouble, but otherwise, the specification is defensible.

@wtlangford
Copy link
Contributor

Ah, good catch- I always forget to check the FAQ. I think this is a documentation bug, then. We should update the manual to be more clear about how // works.

The most surprising part to me is the implication of the binding precedence of //. Since it binds tighter than |, astream | . // "b" is the same as astream | (. // "b"), which should obviously give the observed behavior, but is maybe not what you'd expect with | . (and potentially surprising for something more complex than .). It also proves the exception to our common advice of "anywhere you have a | ., you can likely just remove it".

tl;dr- This is a documentation bug, I think. Let's fix this by improving the explanation in the manual.

@luciole75w
Copy link
Author

@pkoppstein Good point indeed, thanks. I confess to have never checked the wiki so far, my bad :/

So if you plan to update the manual, ideally with an additional example to illustrate, then it's fine. Cheers.

@itchyny itchyny added the docs label Jun 3, 2023
pkoppstein added a commit to pkoppstein/jq that referenced this issue Jul 29, 2023
pkoppstein added a commit to pkoppstein/jq that referenced this issue Jul 30, 2023
See jqlang#2189

Tweak explanation and add examples.
@pkoppstein
Copy link
Contributor

pkoppstein commented Jul 30, 2023

@wtlangford wrote:

This is a documentation bug, I think. Let's fix this by improving the explanation in the manual.

Currently, manual.yml explains // as follows:

          A filter of the form `a // b` produces the same
          results as `a`, if `a` produces results other than `false`
          and `null`. Otherwise, `a // b` produces the same results as `b`.

I propose to change this explanatory paragraph so that it reads:

          A filter of the form `a // b` produces the same
          results as `a | select(.)`, if `a` produces results other than `false`
          and `null`. Otherwise, `a // b` produces the same results as `b`.

And here is one of the new examples I propose be added to the manual, in the
form generated for man.test:

[(false, 1, null, 2) // 42]
null
[1,2]

How would that be?

cc: @itchyny

@nicowilliams
Copy link
Contributor

nicowilliams commented Jul 30, 2023

It's been like this forever. I'm not sure what @stedolan's intent was. Essentially // is like a function we could define like this:

# The bytecode for // is generated by `gen_definedor()` in `src/compile.c` in a close analog of this:
def definedor(a; b): foreach a as $a ([false]; if $a then [true,$a] else [false] end; if .[0
] then .[1] else empty end);  dedfinedor(false, false, 1; 0)

which... is a bit strange. I'm not sure what the intent was, and I'm not sure how important this is for backwards compatibility.

Because I don't know what @stedolan's intent was I'm not yet inclined to update docs, nor, for that matter, the implementation, at least not for 1.7. It's been thus for a decade, it can stay thus till 1.8. I.e., this one isn't urgent. I would like us to have a change freeze until 1.7 is done. We're very close.

(@pkoppstein, just a few days ago you were begging us to release now, and now you're insisting we fix this that and the other issues -- they have to be severe for us to fix them when we're this close. Please keep this in mind.)

@nicowilliams
Copy link
Contributor

While this:

          A filter of the form `a // b` produces the same
          results as `a | select(.)`, if `a` produces results other than `false`
          and `null`. Otherwise, `a // b` produces the same results as `b`.

seems accurate, it is not a very satisfying description [perhaps only not yet], perhaps because // is a so surprising.

When I say this needs research, I say so because: a) I would like to hear from @stedolan, b) I'm thinking it might be useful to have // compile into a call to a definedor builtin that can then be overridden by users (and so that eventually we could have different behaviors using versioned builtin modules).

And again, this goes back to at least November 25, 2012 if not September 4, 2012, thus I feel no sense of urgency here -- not as to the 1.7 release. I'll consider this over the next few days as we get closer to a 1.7 release though -- I need a few days to think about this one.

@nicowilliams
Copy link
Contributor

nicowilliams commented Jul 30, 2023

Essentially // is like a function we could define like this: ...

gen_definedor() is this:

 890 block gen_definedor(block a, block b) {
 891   // var found := false
 892   block found_var = gen_op_var_fresh(STOREV, "found");
 893   block init = BLOCK(gen_op_simple(DUP), gen_const(jv_false()), found_var);
 894
 895   // if found, backtrack. Otherwise execute b
 896   block backtrack = gen_op_simple(BACKTRACK);
 897   block tail = BLOCK(gen_op_simple(DUP),
 898                      gen_op_bound(LOADV, found_var),
 899                      gen_op_target(JUMP_F, backtrack),
 900                      backtrack,
 901                      gen_op_simple(POP),
 902                      b);
 903
 904   // try again
 905   block if_notfound = gen_op_simple(BACKTRACK);
 906
 907   // found := true, produce result
 908   block if_found = BLOCK(gen_op_simple(DUP),
 909                          gen_const(jv_true()),
 910                          gen_op_bound(STOREV, found_var),
 911                          gen_op_target(JUMP, tail));
 912
 913   return BLOCK(init,
 914                gen_op_target(FORK, if_notfound),
 915                a,
 916                gen_op_target(JUMP_F, if_found),
 917                if_found,
 918                if_notfound,
 919                tail);
 920 }
  • found_var is a [hidden, internal] variable used to reduce a basically init initializes that to false and if_found sets it to true if the a program ever produces a truthy value
  • if_notfound backtracks -which is just like executing empty- for values from a that are falsish
  • lastly, if found_var is still false at the end, then b is executed

The net effect is that:

  • falsish values from a are not output by //
  • truthy values from a are output by //
  • b is only evaluated if a never produced a truthy value

thus my earlier description of // using foreach. Before foreach this could not be jq-coded. Now that it can be we could make // become a call to a builtin definedor/2 function that users can override, but... what with? Well, here's a version that might be less surprising:

def definedor($a; b): if $a then $a else b end;

Except now if b is a generator then for each falsish value of a we'll get all values produced by b.

I think I'm starting to understand what @stedolan wanted here: to avoid multiple evaluation of b!

@pkoppstein
Copy link
Contributor

@nicowilliams - please bear in mind that, in this thread at least, there was agreement nearly three years ago that the documentation should be fixed, and that in the interim, both gojq and jaq have adopted the same semantics (based on a|select(.)). Also, there’s a lot of code out there that essentially depends on $falsey//$x emitting $x, so I would think it would be extremely difficult to come up with an alternative semantics that affords compatibility with the de facto behavior for the basic usage.

Incidentally, I would be fine with some weasel words to indicate the paragraph was descriptive and not to be interpreted as prescriptive.

As for my hopes, please note that I had suggested 1.6.1 precisely to avoid anyone feeling undue pressure. Since you had time for binary strings (great stuff, by the way!), I thought it would be worthwhile if the rest of us tried to attend to the important long-standing documentation issues.

@nicowilliams
Copy link
Contributor

nicowilliams commented Jul 30, 2023

please bear in mind that, in this thread at least, there was agreement nearly three years ago that the documentation should be fixed

@wtlangford asked me for my input and I was AWOL. Now I'm not. It's a stretch to say that there was agreement in 2020 when William and I were the only maintainers.

@nicowilliams
Copy link
Contributor

As for my hopes, please note that I had suggested 1.6.1 precisely to avoid anyone feeling undue pressure. Since you had time for binary strings (great stuff, by the way!), I thought it would be worthwhile if the rest of us tried to attend to the important long-standing documentation issues.

We're not going to do micro updates. We do not have the energy for that unless it becomes absolutely necessary (e.g., for a security vulnerability). I suspect that will continue to be the case for a long time. Remember, we're volunteers, and none of us have jq maintainership as a full-time job.

@nicowilliams
Copy link
Contributor

Incidentally, I would be fine with some weasel words to indicate the paragraph was descriptive and not to be interpreted as prescriptive.

As I said, I need(ed) time to digest this issue. I do believe that I now understand what @stedolan intended to do, and now I also understand why it's surprising. But are you surprised that I've never run into this? Had you ever run into this? I suspect I never ran into this because a) I mostly never use // because I really with // would run b only when a is empty, b) I definitely never use (<generator>) // <alternative>.

Now consider the try/catch bug: lots of users opened duplicate issues for it. But this issue doesn't have any duplicates, does it? I think it's hard to be surprised by this, which is another reason I am being a bit conservative here.

Though first and foremost I needed to understand the background, which meant divining @stedolan's intent, and now that I think I understand that, maybe I will be more amenable to a docs update.

@nicowilliams
Copy link
Contributor

@pkoppstein since you went looking, what do the alternative implementations of jq say about // in their docs? Do they leave all language docs to jq?

@nicowilliams
Copy link
Contributor

Incidentally, I would be fine with some weasel words to indicate the paragraph was descriptive and not to be interpreted as prescriptive.

I would also be fine with not saying anything about this in 1.7 after more than ten years of not saying anything about this. I really don't see why you think this is so pressing when we clearly have interesting work lined up for a 1.8 to follow soon after a 1.7.

@pkoppstein
Copy link
Contributor

pkoppstein commented Jul 30, 2023

@nicowilliams wrote:

what do the alternative implementations of jq say about // in their docs? Do they leave all language docs to jq?

I mainly pay attention to gojq and jaq.

jaq's documentation is notable for the paper "Denotational Semantics and a Fast Interpreter for jq", which, however, does not bother with // at all.

Other than this paper, both gojq and jaq generally take the combination of jq's behavior and documentation as the basis for their own documentation, which is basically written as a "delta" against both.

However, the documentation for gojq and jaq is silent on //, perhaps because the manual.yml entry for // is so patently wrong that the authors took as the baseline jq's actual behavior and/or the section on // in the jq FAQ:
https://github.com/jqlang/jq/wiki/FAQ#or-versus-

(A question for @itchyny - Did this jq FAQ entry played a role at all in your decision not to document gojq's // in the gojq README?)

@pkoppstein
Copy link
Contributor

@nicowilliams wrote:

Had you ever run into this?

I wrote the "or" versus "//" section in the jq FAQ in January, 2017, but I'm afraid I don't recall the exact circumstances, other than that the manual.yml entry at the time was so glaringly wrong (on the assumption that it was not only intended for handling empty // b). Ever since writing the entry, I've simply pointed people to it whenever the issue came up.

@nicowilliams
Copy link
Contributor

Worry not. I'll submit a PR tomorrow that clarifies //.

nicowilliams added a commit to nicowilliams/jq that referenced this issue Jul 30, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants