faster/simpler version of transpose, with stream-oriented min and max #2679

Closed
wants to merge 2 commits into from
16 changes: 13 additions & 3 deletions docs/content/manual/manual.yml
@@ -1426,10 +1426,15 @@ sections:
input: '[{"foo":1, "bar":10}, {"foo":3, "bar":100}, {"foo":1, "bar":1}]'
output: ['[[{"foo":1, "bar":10}, {"foo":1, "bar":1}], [{"foo":3, "bar":100}]]']

- - title: "`min`, `max`, `min_by(path_exp)`, `max_by(path_exp)`"
+ - title: "`min`, `max`, `min(stream)`, `max(stream)`, `min_by(path_exp)`, `max_by(path_exp)`"
body: |

- Find the minimum or maximum element of the input array.
+ `min` finds the minimum element of the input array, and
+ `min(stream)` finds the minimum item in the stream.
+
+ `max` and `max(stream)` similarly find the maximum element.
+
+ `min(empty)` and `max(empty)` both emit nothing.

The `min_by(path_exp)` and `max_by(path_exp)` functions allow
you to specify a particular field or property to examine, e.g.
@@ -1439,10 +1444,15 @@ sections:
- program: 'min'
input: '[5,4,2,7]'
output: ['2']

+ - program: 'min(1,2,3,0.1)'
+ input: null
+ output: ['0.1']

- program: 'max_by(.foo)'
input: '[{"foo":1, "bar":14}, {"foo":2, "bar":3}]'
output: ['{"foo":2, "bar":3}']

- title: "`unique`, `unique_by(path_exp)`"
body: |

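The semantics described in the manual.yml change above can be tried without a patched jq. The following is a minimal sketch, not the merged implementation: it restates the PR's `min/1` at user level (a program can define `min/1` alongside the built-in `min/0`) and spells out `else .` so that it does not depend on the optional-else syntax. It is meant to be run with `jq -n`.

```jq
# Sketch of the proposed min/1, restated from the builtin.jq diff in this PR.
# The explicit `else .` is an adjustment made here for older jq versions.
def min(s):
  reduce (s | [.]) as $x (null;
    if . == null then $x      # first item seen: keep it (boxed)
    elif $x < . then $x       # smaller item: replace the state
    else .                    # otherwise keep the current minimum
    end)
  | select(.)[0];             # empty stream: state is still null, emit nothing

{
  array_form:   ([5,4,2,7] | min),   # 2, the built-in min/0
  stream_form:  min(5,4,2,7),        # 2, the stream-oriented min/1
  empty_array:  ([] | min),          # null
  empty_stream: [min(empty)]         # [], because min(empty) emits nothing
}
```

The last two entries show the difference in how the two forms treat "no items": the array form yields `null`, while the stream form yields no output at all.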
25 changes: 16 additions & 9 deletions src/builtin.jq
@@ -1,3 +1,4 @@
+ def first(g): label $out | g | ., break $out;
def halt_error: halt_error(5);
def error(msg): msg|error;
def map(f): [.[] | f];
@@ -6,6 +7,19 @@ def sort_by(f): _sort_by_impl(map([f]));
def group_by(f): _group_by_impl(map([f]));
def unique: group_by(.) | map(.[0]);
def unique_by(f): group_by(f) | map(.[0]);
+ # max(s) and min(s) use boxing technique for the sake of `input`:
+ def max(s):
+   reduce (s|[.]) as $x (null;
Contributor

The version in my earlier comment didn't allocate an array for every $x.

Contributor Author

Yes, but setpath can slow it down, as it did in my tests. Fixing input would be so much nicer!

Contributor

What do you mean "fixing input would be so much nicer"?

As for setpath... we're using it on a zero-or-one element array, so it should not allocate at all after the first time... I'll check that out later.

+     if . == null then $x
+     else if $x > . then $x end # for speed
Contributor

Omitting the else clause is a new feature, so maybe we shouldn't use it in builtins? I'm OK with this either way.

+     end )
+   | select(.)[0];
+ def min(s):
+   reduce (s|[.]) as $x (null;
+     if . == null then $x
+     else if $x < . then $x end # for speed
+     end )
+   | select(.)[0];
def max_by(f): _max_by_impl(map([f]));
def min_by(f): _min_by_impl(map([f]));
def add: reduce .[] as $x (null; . + $x);
@@ -154,7 +168,6 @@ def range($init; $upto; $by):
if $by > 0 then $init|while(. < $upto; . + $by)
elif $by < 0 then $init|while(. > $upto; . + $by)
else empty end;
- def first(g): label $out | g | ., break $out;
def isempty(g): first((g|false), true);
def all(generator; condition): isempty(generator|condition and empty);
def any(generator; condition): isempty(generator|condition or empty)|not;
@@ -181,14 +194,8 @@ def combinations(n):
| combinations;
# transpose a possibly jagged matrix, quickly;
# rows are padded with nulls so the result is always rectangular.
- def transpose:
-   if . == [] then []
-   else . as $in
-   | (map(length) | max) as $max
-   | length as $length
-   | reduce range(0; $max) as $j
-       ([]; . + [reduce range(0;$length) as $i ([]; . + [ $in[$i][$j] ] )] )
-   end;
+ # Using map(length) turns out to be faster than using max/1
+ def transpose: [range(0; map(length)|max // 0) as $i | [.[][$i]]];
def in(xs): . as $x | xs | has($x);
def inside(xs): . as $x | xs | contains($x);
def repeat(exp):
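The one-line `transpose` from the hunk above can be exercised on a jagged matrix. This is a sketch under a hypothetical name, `transpose_sketch`, so that it does not shadow the builtin. The `// 0` guard is an assumption added here for the empty-matrix case: on `[]`, `map(length) | max` is `null`, and `range` is expected to reject a non-numeric bound. Run with `jq -n`.

```jq
# Hypothetical name; the body follows the one-liner in the diff,
# plus a `// 0` guard (added here) so that `[]` yields `[]`.
def transpose_sketch:
  [range(0; map(length) | max // 0) as $i   # one pass per column index
   | [.[][$i]]];                            # short rows contribute null

([[1,2,3],[4],[5,6]] | transpose_sketch),   # [[1,4,5],[2,null,6],[3,null,null]]
([] | transpose_sketch)                     # []
```

Indexing past the end of a row yields `null`, which is what pads the result to a rectangle without any explicit padding step.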
9 changes: 9 additions & 0 deletions tests/jq.test
@@ -1352,6 +1352,15 @@ unique
[]
[null,null,null,null]

+ [min(1,2,0.1), max(1,2,0.1)]
+ null
+ [0.1,2]
+
+
+ [min(empty),max(empty)]
+ null
+ []
+
Contributor

Add tests with max, and also with empty as its argument.

Contributor Author

Done.

.foo[.baz]
{"foo":{"bar":4},"baz":"bar"}
4
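The `[min(empty),max(empty)]` test covers the empty stream. A related case is a stream whose items include `null`, which is where boxing each item as `[.]` matters when the reducer starts from `null`. The sketch below contrasts two hypothetical helpers, `naive_min` (no boxing) and `boxed_min` (same idea as the PR's `min/1`, written with an explicit `else .`). Neither name comes from the PR. Run with `jq -n`.

```jq
# Hypothetical helper: reduce without boxing. A null state is ambiguous:
# it can mean "nothing seen yet" or "the minimum so far is null".
def naive_min(s):
  reduce s as $x (null; if . == null or $x < . then $x else . end);

# Hypothetical helper: box each item, so a null state can only mean "nothing seen yet".
def boxed_min(s):
  reduce (s | [.]) as $x (null; if . == null or $x < . then $x else . end)
  | select(.)[0];

{
  naive_empty: [naive_min(empty)],    # [null]: an output appears for an empty stream
  boxed_empty: [boxed_min(empty)],    # []: nothing is emitted
  boxed_null:  [boxed_min(null)],     # [null]: a real null item is reported
  naive_mixed: naive_min(null, 1),    # 1: the null item is mistaken for "no state"
  boxed_mixed: boxed_min(null, 1)     # null: null sorts below numbers in jq
}
```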