Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

performance bug (introduced at some time after Jan 19, 2016) #1543

Closed
pkoppstein opened this issue Nov 29, 2017 · 11 comments
Closed

performance bug (introduced at some time after Jan 19, 2016) #1543

pkoppstein opened this issue Nov 29, 2017 · 11 comments
Assignees
Milestone

Comments

@pkoppstein
Copy link
Contributor

The following program performs very nicely with jq 1.5 and jq-1.5rc2-150-g1740fd0 of Jan 19, 2016, but fails to terminate in a reasonable time using the current master (jq-1.5rc2-251-g476b367):

def zip($headers):
  [$headers, .] | transpose | map({(.[0]): .[1]}) | add ;

def testzip(n):
  [range(0;n)] as $row
  | $row | zip( $row|map(tostring) ) ;

testzip(1000000) | length

E.g. using jq 1.5:

$ time jq -n -f objectify.jq
1000000

real	0m7.241s
user	0m6.434s
sys	0m0.428s

(My attempts to further bisect have run into some other difficulties....)

@pkoppstein pkoppstein changed the title performance bug (since Jan 19, 2016) performance bug (introduced at some time after Jan 19, 2016) Nov 29, 2017
@pkoppstein
Copy link
Contributor Author

The main reason I'm having trouble bisecting is that my attempts to remake older versions almost always result in executables that can report their version number correctly, but otherwise trip an assertion, e.g.:

~/github/jq$ ./jq --version
jq-1.5rc2-177-ge6fa039
~/github/jq$ ./jq .
Assertion failed: (min < start), function jvp_utf8_backtrack, file src/jv_unicode.c, line 12.
Abort trap: 6

In case there's something obviously wrong with the procedure I'm following, here it is in brief:

  1. make clean
  2. checkout the desired version
  3. autoreconf -i ; ./configure ; make

@nicowilliams, @wtlangford - Any suggestions would be appreciated. Thanks.

@wtlangford
Copy link
Contributor

I wouldn't expect any issues with your method. I usually run autoreconf -fi, but that shouldn't have the observed effect.

@pkoppstein
Copy link
Contributor Author

pkoppstein commented Dec 5, 2017

@wtlangford - You're right: -fi seems to make no difference, but I have been able
to establish that the performance problem is exhibited by:

jq-1.5rc2-238-g10d6605-dirty (May 20, 2017)

@nicowilliams
Copy link
Contributor

@pkoppstein The way to bisect in the face of that is to write a script like this:

#!/bin/bash

if ! make -j4; then # This will automatically autoreconf and ./configure
    exit 125
fi

if ! ./jq . < /dev/null; then
    exit 125
fi

jq -f ... <your test here>
exit $?

@nicowilliams
Copy link
Contributor

@pkoppstein and, of course, you'll want to force a timeout on your test.

@nicowilliams
Copy link
Contributor

I used this bisect script:

#!/bin/bash

set -euo pipefail

make -j8 > /dev/null 2>&1 || exit 125
echo true | ./jq . > /dev/null || exit 125

if timeout -k 10 8 ./jq -n -f objectify.jq; then
  true
else
  stat=$?
  ((stat==124)) && exit 1
fi
exit 0

to find a likely commit that introduced this:

$ git checkout origin/master
$ git bisect start
$ git bisect bad HEAD
$ git bisect good 1740fd036dc6ab464ef0937a0f3fc81618886837
$ git bisect run ./bisect
running ./bisect
Bisecting: 24 revisions left to test after this (roughly 5 steps)
[dc679081fa770c260ca9a569a8a4fdbb10bcdc20] fix truncate_stream to match docs
running ./bisect
Bisecting: 12 revisions left to test after this (roughly 4 steps)
[fd4ae8304e23007672af9a37855c7a76de7c78cf] Parse no deeper than MAX_PARSING_DEPTH
running ./bisect
100000
Bisecting: 6 revisions left to test after this (roughly 3 steps)
[7bdf3a39ade47c0571bda45c639a8fc0f11d0ca6] Add SQL-style operators (#1322)
running ./bisect
Bisecting: 2 revisions left to test after this (roughly 2 steps)
[e82147bbb859413744ec9591065bf87d2b281ce4] Solaris needs XPG6 for c99
running ./bisect
100000
Bisecting: 0 revisions left to test after this (roughly 1 step)
[396543191cec32f69712b4364d26a0ebaa61b801] Make first(g) more efficient: extract only 1 value
running ./bisect
Bisecting: 0 revisions left to test after this (roughly 0 steps)
[e24af3c78e78a3aab05a2800d825d56f1d842b1b] reduce: handle empty updates (fix #1313)
running ./bisect
e24af3c78e78a3aab05a2800d825d56f1d842b1b is the first bad commit
commit e24af3c78e78a3aab05a2800d825d56f1d842b1b
Author: Nicolas Williams <nico@cryptonector.com>
Date:   Mon Jan 23 17:10:51 2017 -0600

    reduce: handle empty updates (fix #1313)

:040000 040000 a922a39aef9d34a2018df1542026ccc5d8a2aefc aa20ba56a4f318500ae6fbc9317f69953aa5d179 M      docs
:100644 100644 7ca3cba3f157ded32f50fc5c3b25ada99d012a45 995b40e9abd948befd8d40cb8ce6b4c3d39c4aa9 M      jq.1.prebuilt
:040000 040000 2fff53eb9e223797bdbf69b670caab0b524e555d 5e46b104ff20a638da38181ccd5fa82b0a734213 M      src
:040000 040000 9a4990eb1c81d8121e80fa3047c1d61345001ec0 f321e399e5dd15c9135802364ea4113f874044bb M      tests
bisect run success

So it looks like e24af3c, which is just this (ignoring docs and test changes):

diff --git a/src/compile.c b/src/compile.c
index 4276d3b..a5c75c9 100644
--- a/src/compile.c
+++ b/src/compile.c
@@ -704,12 +704,27 @@ static block bind_matcher(block matcher, block body) {

 block gen_reduce(block source, block matcher, block init, block body) {
   block res_var = gen_op_var_fresh(STOREV, "reduce");
+  block update_var = gen_op_bound(STOREV, res_var);
+  block jmp = gen_op_target(JUMP, body);
   block loop = BLOCK(gen_op_simple(DUPN),
                      source,
                      bind_matcher(matcher,
                                   BLOCK(gen_op_bound(LOADVN, res_var),
+                                        /*
+                                         * We fork to the body, jump to
+                                         * the STOREV.  This means that
+                                         * if body produces no results
+                                         * (i.e., it just does empty)
+                                         * then we keep the current
+                                         * reduction state as-is.
+                                         *
+                                         * To break out of a
+                                         * reduction... use break.
+                                         */
+                                        gen_op_target(FORK, jmp),
+                                        jmp,
                                         body,
-                                        gen_op_bound(STOREV, res_var))),
+                                        update_var)),
                      gen_op_simple(BACKTRACK));
   return BLOCK(gen_op_simple(DUP),
                init,

I'm guess there's a an implied dup in this. And... there is. It's in the stack_save() call from the FORK opcode.

Oy. We may have to revisit that one patch entirely. It was intended to allow empty RHSes for assignments, which would be very convenient, but maybe there's just no way to get that behavior and not kill performance.

@nicowilliams nicowilliams added this to the 1.6 release milestone Dec 11, 2017
@nicowilliams
Copy link
Contributor

@wtlangford We should back this out before releasing 1.6.

@nicowilliams nicowilliams self-assigned this Dec 11, 2017
@nicowilliams
Copy link
Contributor

Revert in-progress.

@nicowilliams
Copy link
Contributor

@pkoppstein Thanks for the report! This is fixed in master now.

@nicowilliams
Copy link
Contributor

@pkoppstein You might want to check that walk works as expected still. We don't have enough tests for it.

@pkoppstein
Copy link
Contributor Author

@nicowilliams - I added the following lines to my private copy of jq.test and ran jqtest successfully. Please feel free to add to master. (I'd offer a PR but I'm all tangled up in git knots at the moment.) Speaking of PRs, I don't know whether you've had a chance to review Erik Brinkman's recent fine-tuning of his random functions PR, but I think it deserves your review, hopefully so that it can be included in 1.6. Thanks.

walk(if type == "object" and .b == null then del(.b) else . end)
[{"a":1, "b":null}, {"c":3}]
[{"a":1},{"c":3}]

walk(if type == "object" then with_entries(select(.value != null)) else . end)
[{"a":1, "b":null}, {"c":3}]
[{"a":1},{"c":3}]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants