Make last(empty) empty #1869

muhmuhten · 2019-03-22T05:03:59Z

This is a kind of unsatisfying patchset, in which I set out to make last/1 faster, then discovered that it was buggy all along and fixing it made it even slower than it was in the first place.

I noticed reduce's update semantic s(it throws away all but the last value from the update), and noticed it basically made for a faster last/1. Passed tests. Discovered last(empty) == null. Realized that it was always like that. I don't know what the right thing to do even is and am leaving this here to foist this headache on someone else.

last/1 and first/1 are inconsistent, in that first(empty) is empty but currently last(empty) == null. The former is fairly intentional and gives rise to a useful pattern used by the current definition of isempty. The latter is an accident resulting from reduce semantics.

Unfortunately, it is surprisingly hard to implement a consistent last! The attempt given here unnecessarily wraps every single output of the generator in an array, then throws it out unless it's the last one. I think it's possible to bytecode without doing that nonsense, but would abusing variable guts and I don't want to write that and nobody else will understand it.

In any case, the existing documented semantics of nth is inconsistent with both and is also total nonsense, but I just complained about that elsewhere.

Probably the fundamental issue is the reduce shouldn't throw away all except the last update. On the other hand, its current behaviour is consistent with foreach, where it kind of does make sense, in that all the updates get extracted on but only the last one becomes the next state. Maybe that's nonsense too.

Also fixed a bug (mea culpa) and an unnecessary call in limit/2 (mea culpa).

It's actually empty now. (Makes last/1 even slower than before, though.)

coveralls · 2019-03-22T05:20:48Z

Coverage decreased (-0.01%) to 84.867% when pulling 964caad on muhmuhten:misc_fixes into 3ea0199 on stedolan:master.

nicowilliams · 2019-03-22T05:28:43Z

Here's a reduce-based last/1 with correct semantics:

$ jq -n 'def last(e): reduce e as $e ({empty:true,last:null}; (.empty = false) | (.last=$e))|if .empty then empty else .last end; last(empty)
'
$ jq -n 'def last(e): reduce e as $e ({empty:true,last:null}; (.empty = false) | (.last=$e))|if .empty then empty else .last end; last(1,2)'
2
$

EDIT: Ah, similar to yours. Yours uses an array instead of an object, which is faster. There have been times when I've wondered about making a special jv encoding for arrays of one element which do not need a jvp_array to be allocated. I'm not sure the additional branches would hurt, but certainly fewer allocations would help.

nicowilliams · 2019-03-22T05:33:09Z

Probably the fundamental issue is the reduce shouldn't throw away all except the last update. ...

Actually, that behavior is critical. It's very important that during the execution of the reduction update body, jq not hold on to any additional references to the reduction state, otherwise the memory consumption of things like assignments blow up.

nicowilliams · 2019-03-22T05:34:39Z

I agree it should be possible to bytecode a last/1 that does not need to wrap outputs in arrays. I'm also not sure it's worth it.

nicowilliams · 2019-03-22T05:38:46Z

Another one like this I noticed today is repeat(empty), which loops forever. That is very unsatisfying. I think repeat(empty) should be the same as empty, and I intend to make it so as it turns out to be very useful in dealing with co-routines.

nicowilliams · 2019-03-22T05:47:25Z

Here's an efficient (I think) last/1:

def last(e):
    [0] as $path
  | reduce e as $e ([]; setpath($path;$e))
  | if length==0 then empty else .[0] end;

This is efficient because, though it allocates an array at run-time, it only allocates one array, and it mutates it over and over.

muhmuhten · 2019-03-22T06:03:28Z

I agree it should be possible to bytecode a last/1 that does not need to wrap outputs in arrays. I'm also not sure it's worth it.

Probably any case where last/1 performance matter is also one that would be better served by just not computing out values just to throw them away. Unlike first, it doesn't even actually avoid the extra work...

Here's an efficient (I think) last/1:

A few variants I tried:

% time ./jq -cn 'def last(g): [0] as $p | reduce g as $_ ([]; setpath($p; $_)) | .[]; last(empty), last(range(2e6))'
1999999
./jq -cn   1.47s user 0.00s system 99% cpu 1.473 total
% time ./jq -cn 'def last(g): reduce g as $_ ([]; setpath([0]; $_)) | .[]; last(empty), last(range(2e6))'
1999999
./jq -cn   1.40s user 0.00s system 99% cpu 1.400 total
% time ./jq -cn 'def last(g): reduce g as $_ ([]; [$_]) | .[]; last(empty), last(range(2e6))'
1999999
./jq -cn   1.12s user 0.00s system 99% cpu 1.123 total
% time ./jq -cn 'def last(g): reduce . as $_ (.; g|[.]) | .[]?; last(empty), last(range(2e6))'
1999999
./jq -cn   0.89s user 0.00s system 99% cpu 0.890 total

Sometimes performance can be p unintuitive...

muhmuhten · 2019-03-22T06:16:34Z

(I think what we're seeing is, in order: keeping [0] in a variable isn't actually an improvement over a constant array; array collection is still turns out to be quicker than three function calls; and skipping elements in the VM is quicker than doing the reduction rigmarole.)

There have been times when I've wondered about making a special jv encoding for arrays of one element which do not need a jvp_array to be allocated. I'm not sure the additional branches would hurt, but certainly fewer allocations would help.

If that does in fact help, it seems like it'd be helpful to foreach ((g|[.]), null) &c. code as well.

fadado · 2019-03-22T07:16:38Z

Another one like this I noticed today is repeat(empty), which loops forever. That is very unsatisfying. I think repeat(empty) should be the same as empty, and I intend to make it so as it turns out to be very useful in dealing with co-routines.

def repeat(g): #:: a|(a->*b) => *b
    def r: g , r;
    select(isempty(g)|not) | r
;

muhmuhten · 2019-03-22T07:33:32Z

Evaluates g too many times

nicowilliams · 2019-03-22T14:24:35Z

Yeah, evaluating g only once is essential, otherwise repeat (or whatever function) can't be used with side-effect-having expressions. The I/O and co-routine stuff necessarily adds side-effects...

nicowilliams · 2019-03-22T14:25:49Z

And input, inputs, debug, and stderr all already have side effects.

nicowilliams · 2019-03-22T15:04:52Z

Here's an almost correct repeat/1:

  def repeat(exp):
    label $out |
    def _repeat:
      (exp // break $out), _repeat;
    _repeat;

I say "almost" because the // operator will execute the RHS when the LHS produces null.

nicowilliams · 2019-03-22T15:08:33Z

@muhmuhten looking at the disassembly helps a bit:

$ ./jq --debug-dump-disasm -cn 'def last(g): [0] as $p | reduce g as $_ ([]; setpath($p; $_)) | .[]; last(empty), last(range(2e3))'
0000 TOP
0001 FORK 0011
0003 CALL_JQ last:3 @lambda:4
0009 JUMP 0017
0011 CALL_JQ last:3 @lambda:5
0017 RET
...
last:3:
  [params: g]
  0000 DUP
  0001 PUSHK_UNDER [0]
  0003 POP
  0004 STOREV $p:0
  0007 DUP
  0008 LOADK []
  0010 STOREV $reduce:1
  0013 FORK 0043
  0015 DUPN
  0016 CALL_JQ g:0
  0020 STOREV $_:2
  0023 LOADVN $reduce:1
  0026 SUBEXP_BEGIN
  0027 LOADV $_:2
  0030 SUBEXP_END
  0031 SUBEXP_BEGIN
  0032 LOADV $p:0
  0035 SUBEXP_END
  0036 CALL_BUILTIN setpath
  0039 STOREV $reduce:1
  0042 BACKTRACK
  0043 LOADVN $reduce:1
  0046 EACH
  0047 RET
...
$

vs

$ ./jq --debug-dump-disasm -cn 'def last(g): reduce . as $_ (.; g|[.]) | .[]?; last(empty), last(range(2e3))'
...
last:3:
  [params: g]
  0000 DUP
  0001 STOREV $reduce:0
  0004 FORK 0036
  0006 DUPN
  0007 STOREV $_:1
  0010 LOADVN $reduce:0
  0013 CALL_JQ g:0
  0017 DUP
  0018 LOADK []
  0020 STOREV $collect:2
  0023 FORK 0029
  0025 APPEND $collect:2
  0028 BACKTRACK
  0029 LOADVN $collect:2
  0032 STOREV $reduce:0
  0035 BACKTRACK
  0036 LOADVN $reduce:0
  0039 EACH_OPT
  0040 RET
...

Stripping out most of the instruction addresses and diff'ing:

 last:3:
   [params: g]
        DUP
-       PUSHK_UNDER [0]
-       POP
-       STOREV $p:0
-       DUP
-       LOADK []
-       STOREV $reduce:1
-       FORK 0043
+       STOREV $reduce:0
+       FORK 0036
        DUPN
+       STOREV $_:1
+       LOADVN $reduce:0
        CALL_JQ g:0
-       STOREV $_:2
-       LOADVN $reduce:1
-       SUBEXP_BEGIN
-       LOADV $_:2
-       SUBEXP_END
-       SUBEXP_BEGIN
-       LOADV $p:0
-       SUBEXP_END
-       CALL_BUILTIN setpath
-       STOREV $reduce:1
+       DUP
+       LOADK []
+       STOREV $collect:2
+       FORK 0029
+       APPEND $collect:2
+       BACKTRACK
+  0029 LOADVN $collect:2
+       STOREV $reduce:0
        BACKTRACK
-  0043 LOADVN $reduce:1
-       EACH
+  0036 LOADVN $reduce:0
+       EACH_OPT
        RET

pkoppstein · 2019-03-22T15:24:37Z

@nicowilliams wrote:

Here's an almost correct repeat/1

For the record, the relevant tweak would be:

def repeat(exp):
    label $out
    | def r: ([exp] | if length == 0 then break $out else .[0] end), r;
    r;

Personally, I'm more concerned about repeat(input) than repeat(empty). One might even say that repeatedly asking for nothing should indeed take an indefinite amount of time :-)

nicowilliams · 2019-03-22T15:33:24Z

@pkoppstein repeat/1 currently produces all the outputs of its argument expression...

pkoppstein · 2019-03-22T15:41:28Z

@nicowilliams wrote:

repeat/1 currently produces all the outputs of its argument expression...

Consider:

echo $'1\n2\n3\n' | jq -n 'repeat(input)'
1
2
3
jq: error (at <stdin>:4): break

Notice that an error was raised. Similarly, repeat(input?) does not terminate. Yes, I know, given the current def, that is all as expected, and I'm not asking for any change. As I indicated, to me, the fact that repeat(empty) does not terminate is a feature, or if you like, the acceptable consequence of the simplicity of its current def.

nicowilliams · 2019-03-22T16:01:14Z

@pkoppstein

$ jq -cn '[limit(10; repeat(1,2,3))]'
[1,2,3,1,2,3,1,2,3,1]
$

nicowilliams · 2019-03-22T16:10:09Z

Taking a page from @muhmuhten's last/1:

$ ./jq -cn '
  def repeat(e):
    label $out |
    def _r:
      foreach . as $junk ([]; (e|[.]) // break $out; .[0]), _r;
    _r;
  [limit(10; repeat(empty))]'
[]
$ ./jq -cn '
  def repeat(e):
    label $out |
    def _r:
      foreach . as $junk ([]; (e|[.]) // break $out; .[0]), _r;
    _r;
  [limit(10; repeat(1,2,3))]'
[1,2,3,1,2,3,1,2,3,1]
$ ./jq -cn '
  def repeat(e):
    label $out |
    def _r:
      foreach . as $junk ([]; (e|[.]) // break $out; .[0]), _r;
    _r;
  [limit(10; repeat(1,null,3))]'
[1,null,3,1,null,3,1,null,3,1]
$

muhmuhten · 2019-03-22T21:24:05Z

Offhand, but this seems to assume that if an evaluation of the argument produces no results, it'll never produce any more results. Is this really a reasonable assumption though? e.g. `repeat(input|numbers)`

…

On 3/22/19, Nico Williams ***@***.***> wrote: Taking a page from @muhmuhten's `last/1`: ``` $ ./jq -cn ' def repeat(e): label $out | def _r: foreach . as $junk ([]; (e|[.]) // break $out; .[0]), _r; _r; [limit(10; repeat(empty))]' [] $ ./jq -cn ' def repeat(e): label $out | def _r: foreach . as $junk ([]; (e|[.]) // break $out; .[0]), _r; _r; [limit(10; repeat(1,2,3))]' [1,2,3,1,2,3,1,2,3,1] $ ./jq -cn ' def repeat(e): label $out | def _r: foreach . as $junk ([]; (e|[.]) // break $out; .[0]), _r; _r; [limit(10; repeat(1,null,3))]' [1,null,3,1,null,3,1,null,3,1] $ ``` -- You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub: #1869 (comment)

nicowilliams · 2019-03-22T22:23:39Z

Currently input outputs an error on EOF. I think instead it should empty on EOF, and repeat/1 should stop on empty. The only expressions that can empty and when re-evaluated not empty are I/O-like functions, and those should come with utilities for checking EOF, doing clearerr(3), and I think it's fair for repeat/1 to stop on empty because you probably can't read again unless you take some action, and looping forever is clearly wrong.

nicowilliams · 2019-03-23T02:05:36Z

A slightly different way to think repeat/1:

repeat(pure_expression_that_is_not_empty) -> makes sense, like range(1e12), maybe this is what you want, an endless stream of something
repeat(IO_writing_expression) -> as long as the sink will take it, keep it up -- and if the sink goes away, well, we can raise an error and get out
repeat(empty) -> not useful at all, but what about:
repeat(some_pure_expression_that_is_same_as_empty) -> certainly not useful to loop tightly doing nothing forever
repeat(IO_reading_expression) -> read until EOF, then... then certainly not loop tightly doing nothing forever

jq -n 'repeat(input) | ...' -> when I hit ^D, I expect to get my shell prompt back.

Because to date we haven't had any impure expressions that end up being empty, and because repeat(empty) is decidedly non-useful, I think we are free to make this change repeat(empty) to backtrack.

BTW, we still have to do something about C-coded jq functions returning jv_invalid() to trigger backtracking. Right now my co-routine and I/O stuff depends on that behavior, and I think we should change input to do the same -- raising an error ("break") works, but it feels very unidiomatic. I don't think returning jv_invalid() to backtrack is a a problem, but I forgot why @muhmuhten thought it was, and I'm open to it being a problem -- if it is a problem we can just add a jv_backtrack() that returns a flavor of invalid jv that causes the VM to backtrack, and a jv_is_backtrack() to go with it.

muhmuhten · 2019-03-23T02:30:53Z

The issue is with repeat(IO expression which can possibly be empty without being finished).

% seq 20 | jq -cn 'def repeat(exp): def _r: exp, _r; _r; [limit(10; repeat(input|select(. % 10 != 5)))]'
[1,2,3,4,6,7,8,9,10,11]
% seq 20 | jq -cn 'def repeat(e): label $out | def _r: foreach . as $junk ([]; (e|[.]) // break $out; .[0]), _r; _r; [limit(10; repeat(input|select(. % 10 != 5)))]'
[1,2,3,4]

I don't think jv_invalid() being a backtrack was a problem, just that null|error should probably be some kind of error rather than a backtrack.

Possibly it would be better to have this repeat as a builtin, since it has useful properties and the implementation is nonobvious, but those properties need to be documented since they are a bit surprising.

Also, it the init expression of foreach needs to be . so that e gets the right input:

% jq -cn 'def repeat(exp): def _r: exp, _r; _r; [3|limit(10; repeat(.))]'
[3,3,3,3,3,3,3,3,3,3]
% jq -cn 'def repeat(e): label $out | def _r: foreach . as $junk ([]; (e|[.]) // break $out; .[0]), _r; _r; [3|limit(10; repeat(.))]'
[[],[],[],[],[],[],[],[],[],[]]

nicowilliams · 2019-03-23T02:56:06Z

I agree, but why not write that sort of expression like so: repeat(input_expression)|select(...) instead of having the select() inside the repeat expression?

Ah, but the point is this is a breaking change then. OK, fair enough :(

nicowilliams · 2019-03-29T13:36:11Z

Another thing like this is that the // operator unfortunately alternates two conditions: the emptiness/non-emptiness of the LHS, and its truthiness:

$ jq './/"yo"'
true
true
false
"yo"
null
"yo"
1
1
"a"
"a"
$

But sometimes you want the RHS only if the LHS was empty. isempty/1 doesn't help if you want walt all the outputs of the LHS.

The way to jq-code that is remarkably like your last/1, @muhmuhten:

$ jq -n </dev/null 'foreach . as $i (null; [input?]; .) | if length==0 then "yo" else .[0] end'
"yo"
$ jq -n 'foreach . as $i (null; [input?]; .) | if length==0 then "yo" else .[0] end'
false
false
$ jq -n 'foreach . as $i (null; [input?]; .) | if length==0 then "yo" else .[0] end'
null
null
$

As a function:

$ jq  -n '
  def definedor(stream; alt):
      foreach . as $i (null; [stream]; .)
    | if length==0 then alt else .[0] end;
  definedor(input?; "yo")'
...

("definedor" is the internal name for the // operator in jq.)

The bytecode for this is nowhere near as tight as for // in spite of not caring what the values are that are produced by the LHS, so we might need a proper new operator. Right now the only ideas I've got as to what that should look like is ///, but that's not very satisfying.

muhmuhten · 2019-03-29T23:07:30Z

It seems like "alternative for empty", "alternative for null", and "alternative for falsey things" should be distinct. It's just the first one that's particularly awkward/nonintuitive to implement in jq.

I think a bit more work is needed to get definedor right:

% jq -n '0|range(.) // -1'
-1
% jq -n 'def definedor(stream; alt): foreach . as $i (null; [stream]; .) | if length==0 then alt else .[0] end; 0|definedor(range(.); -1)'
jq: error (at <unknown>): Range bounds must be numeric
% time jq -n 'first(range(1e20)//-1)|empty'
jq -n 'first(range(1e20)//-1)'  0.01s user 0.01s system 92% cpu 0.014 total
% jq -n 'def definedor(stream; alt): foreach . as $i (null; [stream]; .) | if length==0 then alt else .[0] end; first(definedor(range(1e20); -1))|empty'
# this shouldn't succeed

muhmuhten · 2019-03-29T23:10:41Z

Actually, wouldn't def definedor(s; a): (s|[.]) // (a|[.]) | .[]; do the job? Needless box-unboxing aside.

nicowilliams · 2019-03-29T23:42:19Z

Boxing happens anyways in my version.

nicowilliams · 2019-03-29T23:43:26Z

Oops, my definedor boxes the whole stream, not each of its outputs. Ay.

nicowilliams · 2019-03-30T03:03:20Z

Using // for a this is better than using foreach because a) it's more idiomatic, b) it avoids the SUBEXP_BEGIN/SUBEXP_END brackets around conditionals, but in the end it's only three instructions fewer. Thanks for showing me that variant.

last(f) output null for empty (is a bug jqlang/jq#1869) nth($n; f) error on negative $n and pick correct value for short f

muhmuhten added 5 commits March 21, 2019 21:04

Discard the full program body if there is no TOP

aeeb8af

Slightly faster last/1

ce3e802

add tests for foreach/reduce with multi-value init/update/extract

cb0ee84

Fix last(empty)

0daa5a7

It's actually empty now. (Makes last/1 even slower than before, though.)

Faster limit/2 by saving an empty call in the loop

964caad

liquidaty mentioned this pull request Mar 7, 2023

organizing to release JQ 1.6.2 and JQ 1.7: Your suggestions/comments would be appreciated #2550

Closed

itchyny added the bug label Jun 3, 2023

itchyny mentioned this pull request Sep 16, 2024

[Docs] Clarify the behvior of first, last, and nth on empty generators #3177

Open

wader added a commit to wader/jqjq that referenced this pull request Sep 17, 2024

Fix behaviour of last(f) and nth($n; f)

ef8b48d

last(f) output null for empty (is a bug jqlang/jq#1869) nth($n; f) error on negative $n and pick correct value for short f

itchyny mentioned this pull request Sep 18, 2024

fix: make last(empty) yield no output values #3179

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make last(empty) empty #1869

Make last(empty) empty #1869

muhmuhten commented Mar 22, 2019

coveralls commented Mar 22, 2019

nicowilliams commented Mar 22, 2019 •

edited

Loading

nicowilliams commented Mar 22, 2019

nicowilliams commented Mar 22, 2019

nicowilliams commented Mar 22, 2019 •

edited

Loading

nicowilliams commented Mar 22, 2019 •

edited

Loading

muhmuhten commented Mar 22, 2019

muhmuhten commented Mar 22, 2019

fadado commented Mar 22, 2019

muhmuhten commented Mar 22, 2019

nicowilliams commented Mar 22, 2019

nicowilliams commented Mar 22, 2019

nicowilliams commented Mar 22, 2019

nicowilliams commented Mar 22, 2019 •

edited

Loading

pkoppstein commented Mar 22, 2019 •

edited

Loading

nicowilliams commented Mar 22, 2019

pkoppstein commented Mar 22, 2019 •

edited

Loading

nicowilliams commented Mar 22, 2019

nicowilliams commented Mar 22, 2019

muhmuhten commented Mar 22, 2019 via email

nicowilliams commented Mar 22, 2019

nicowilliams commented Mar 23, 2019

muhmuhten commented Mar 23, 2019

nicowilliams commented Mar 23, 2019

nicowilliams commented Mar 29, 2019

muhmuhten commented Mar 29, 2019

muhmuhten commented Mar 29, 2019

nicowilliams commented Mar 29, 2019

nicowilliams commented Mar 29, 2019

nicowilliams commented Mar 30, 2019

Make last(empty) empty #1869

Are you sure you want to change the base?

Make last(empty) empty #1869

Conversation

muhmuhten commented Mar 22, 2019

coveralls commented Mar 22, 2019

nicowilliams commented Mar 22, 2019 • edited Loading

nicowilliams commented Mar 22, 2019

nicowilliams commented Mar 22, 2019

nicowilliams commented Mar 22, 2019 • edited Loading

nicowilliams commented Mar 22, 2019 • edited Loading

muhmuhten commented Mar 22, 2019

muhmuhten commented Mar 22, 2019

fadado commented Mar 22, 2019

muhmuhten commented Mar 22, 2019

nicowilliams commented Mar 22, 2019

nicowilliams commented Mar 22, 2019

nicowilliams commented Mar 22, 2019

nicowilliams commented Mar 22, 2019 • edited Loading

pkoppstein commented Mar 22, 2019 • edited Loading

nicowilliams commented Mar 22, 2019

pkoppstein commented Mar 22, 2019 • edited Loading

nicowilliams commented Mar 22, 2019

nicowilliams commented Mar 22, 2019

muhmuhten commented Mar 22, 2019 via email

nicowilliams commented Mar 22, 2019

nicowilliams commented Mar 23, 2019

muhmuhten commented Mar 23, 2019

nicowilliams commented Mar 23, 2019

nicowilliams commented Mar 29, 2019

muhmuhten commented Mar 29, 2019

muhmuhten commented Mar 29, 2019

nicowilliams commented Mar 29, 2019

nicowilliams commented Mar 29, 2019

nicowilliams commented Mar 30, 2019

nicowilliams commented Mar 22, 2019 •

edited

Loading

nicowilliams commented Mar 22, 2019 •

edited

Loading

nicowilliams commented Mar 22, 2019 •

edited

Loading

nicowilliams commented Mar 22, 2019 •

edited

Loading

pkoppstein commented Mar 22, 2019 •

edited

Loading

pkoppstein commented Mar 22, 2019 •

edited

Loading