-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
revamp sub/3 to resolve most issues with gsub (and sub with "g") #2624
Conversation
The primary purpose of this commit is to rectify most problems with gsub (and also sub with "g"), and in particular #1425 (\b), #2354 (lookahead), and #2532 (regex == "^(?!cd ).*$|^cd "; "")) The revision also partly resolves #2148 and #1206 in that gsub no longer loops infinitely, but because the new gsub depends on match(_;"g"), the behavior when regex == "" is sometimes non-standard. Since the new sub/3 relies on uniq/1, that has been added as well; this should be a non-issue as the line count of the revised builtin.jq has been significantly reduced overall. Also, _nwise/1 has been tweaked to take advantage of TCO.
space-efficiency enhancement
The prior attempt to enhance _nwise was botched.
sub/3: .gap => $gap
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for my late response, but I reply in short.
- This PR changes the behavior of
"a" | sub("a"; "b","c")
and"aaa" | gsub("a"; "b","c")
. We don't want to change these behaviors just to fix the empty matching issue. - Why do you need to
uniq
the matches? I confirmed thatmake check
passed without it, but could you give any examples that requireuniq
here? - Having
unique/0
anduniq/1
is confusing. Also maybe we will needunique_by/2
too? If you add a public (oops, poor nwise!) filter, needs tests for itself, and manual updates. - I want to fix space inconsistency in the test case queries but this is just a nitpick.
@itchyny - Thanks for your review. Most of the issues you raise can be easily dealt with
The root issue here is that '"qux" | match("(?=u)"; "g")' emits two identical JSON objects
That is, uniq/1 is stream-oriented (like *ix) whereas unique/0 is array-oriented (like its
The reason is that we cannot simply use
I'll include a possible workaround in my next post, |
@itchyny - This is an addendum to the previous post, and is in response to your remark:
In my (as yet still private) jq repository, I've made changes to the manual including additional examples, and noticed that when I regenerate tests/man.test, the result is quite different from the old man.test, as though that file has not been regenerated recently. This leads to the question: In general, when changes to manual.yml are made, do you want the man.test file regenerated as well? Also, there was one line in the regenerated man.test file that was wrong: one occurrence of:
was mistranslated as |
@itchyny - I seem to have resolved the main issue - the streaming semantics when the second argument of sub or gsub is a stream. I'll do a bit more testing and then will probably replace this PR with another one. Two questions: (2) Using gojq, the query |
I'm closing this PR in favor of #2641 @itchyny - This new PR is intended to address all the issues that have been raised, but please note that it follows gojq in the example above, so e.g. "p" | sub("q"; "a", "b") only produces one result. However, it would only requite a small one-line tweak to get it to produce one result for each value in position 2. |
… uniq(stream) The primary purpose of this commit (which supercedes PR jqlang#2624) is to rectify most problems with `gsub` (and also `sub` with the "g" option), in particular jqlang#1425 ('\b'), jqlang#2354 (lookahead), and jqlang#2532 (regex == "^(?!cd ).*$|^cd ";"")). This commit also partly resolves jqlang#2148 and jqlang#1206 in that `gsub` no longer loops infinitely; however, because the new `gsub` depends critically on match(_;"g"), the behavior when regex == "" is sometimes non-standard. [*1] Since the new sub/3 relies on uniq/1, that has been added as well [*2]. The documentation has been updated to reflect the fact that `sub` and `gsub` are intended to be regular in the second argument. [*3] Also, _nwise/1 has been tweaked to take advantage of TCO. Footnotes: [*1] Using the new gsub, '"a" | gsub( ""; "a")' emits "aa" rather than "aaa" as would be standard. This is nevertheless better than the infinite loop behavior of jq 1.6 in this case. With one exception (as explained in [*2]), the new gsub is implemented as though match/2 behavior is correct. That is, bugs in `gsub` behavior will most likely have their origin in `match/2`. [*2] `uniq/1` adopts the Unix/Linux name and semantics; it is needed for the following test case: gsub("(?=u)"; "u") "qux" "quux" Without this functionality: Test jqlang#23: 'gsub("(?=u)"; "u")' at line number 100 *** Expected "quux", but got "quuux" for test at line number 102: gsub("(?=u)"; "u") The root of the problem here is `match`: if `match` is fixed, then gsub would not need `untie`. The addition of `uniq` as a top-level function should be a non-issue relative to general concern about builtins.jq bloat: the line count of the new builtin.jq is significantly reduced overall, and the number of defs is actually reduced by 1 (from 111 (ignoring a redundant def) to 110). [*3] See e.g. jqlang#513 (comment)
The primary purpose of this commit is to rectify most problems with gsub (and also sub with "g"), and in particular
#1425 (\b), #2354 (lookahead), and #2532 (regex == "^(?!cd ).*$|^cd "; ""))
The revision also partly resolves #2148 and #1206 in that gsub no longer loops infinitely, but because the new gsub depends on match(_;"g"), the behavior when regex == "" is sometimes non-standard.
Since the new sub/3 relies on uniq/1, that has been added as well; this should be a non-issue as the line count of the revised builtin.jq has been significantly reduced overall.
Also, _nwise/1 has been tweaked to take advantage of TCO.