gsub/2 is very slow, even when the first argument contains no regex metacharacters.
For example, splitting unixdict.txt (*) (read in as a single string) into separate lines takes about 23 minutes with gsub/2, compared with about 0.4 seconds with jq 1.4's split/1.
That is, the ratio of running times is roughly 3,400:1!
If this can't be fixed, then split/1 should not be implemented directly using gsub.
Using Ruby's gsub in the same way on the same file gives times very close to jq 1.4's split/1, so I wonder whether the choice of PCRE mode for Oniguruma is the problem. Just wondering.
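A rough way to see the gap between literal and regex-driven splitting outside jq is a small Python micro-benchmark. This is only a sketch: Python's `re` module is not Oniguruma, neither function reproduces jq's gsub-based split/1, and the word list is a hypothetical synthetic stand-in for unixdict.txt, so the printed timings say nothing definitive about jq.

```python
import re
import timeit

# Hypothetical stand-in for unixdict.txt: synthetic words joined by newlines.
words = [f"word{i}" for i in range(25_000)]
text = "\n".join(words)

def literal_split(s: str) -> list[str]:
    # Plain substring split: no regex engine involved.
    return s.split("\n")

NEWLINE = re.compile("\n")

def regex_split(s: str) -> list[str]:
    # The same split, but driven by the regex engine.
    return NEWLINE.split(s)

if __name__ == "__main__":
    # Both approaches must agree on the result before timing means anything.
    assert literal_split(text) == regex_split(text) == words
    for fn in (literal_split, regex_split):
        secs = timeit.timeit(lambda: fn(text), number=20)
        print(f"{fn.__name__}: {secs:.3f}s for 20 runs")
```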
@wtlangford - could you please look into this? Thanks.
This is definitely a function of using the regex engine.
There is also a compatibility problem: split was originally implemented using jv_string_split, which treats the separator as a literal string.
Originally, splitting "ABCD.EFGH" on "." returned ["ABCD","EFGH"]. Now it returns ["","","","","","","","","",""], because the regex engine treats "." as "any character".
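The same discrepancy is easy to reproduce with Python's `re` module, used here only as a stand-in for Oniguruma: an unescaped `.` in a pattern matches any character, so every character of the nine-character input becomes a separator.

```python
import re

s = "ABCD.EFGH"

# Literal split: "." is just a character to look for.
print(s.split("."))                 # ['ABCD', 'EFGH']

# Regex split: "." matches any character, so all nine characters are
# separators and the ten pieces around them are all empty.
print(re.split(".", s))             # ['', '', '', '', '', '', '', '', '', '']

# Escaping the metacharacter restores the literal behaviour.
print(re.split(re.escape("."), s))  # ['ABCD', 'EFGH']
```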
I think the default behavior here should be non-regex splitting with the option to use regex to split.
Also worth noting: PCRE mode isn't the problem so much as the fact that splitting with a regex is necessarily slower than splitting with simple equality comparisons.
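A minimal sketch of what splitting by "simple equality comparisons" means, written in Python and independent of jq's C code (it is not jv_string_split): search for the separator as plain text and cut the string at each occurrence. The name `split_literal` and the choice to reject an empty separator are assumptions for illustration.

```python
def split_literal(s: str, sep: str) -> list[str]:
    """Split s on every occurrence of sep, treating sep as plain text."""
    if not sep:
        # Assumption: an empty separator is rejected rather than given special meaning.
        raise ValueError("separator must be non-empty")
    parts = []
    start = 0
    while True:
        # str.find matches the separator's characters directly:
        # no pattern compilation and no backtracking.
        i = s.find(sep, start)
        if i == -1:
            parts.append(s[start:])
            return parts
        parts.append(s[start:i])
        start = i + len(sep)
```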
> I think the default behavior here should be non-regex splitting with the option to use regex to split.
Yes. Given that Oniguruma-based splitting is unlikely to become acceptably fast anytime soon, I'd suggest taking the easy path: revert split/1 to its original non-regex behavior and use splits/{1,2} for regex-based splitting. This doesn't preclude something bolder in the future, and it adequately resolves all the issues (performance of split/1, backward compatibility, stream-vs-array output) for now.
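The proposed division of labour can be sketched in Python. Both functions are hypothetical analogues, not jq's definitions: `split` keeps a literal separator and returns a list (like jq 1.4's split/1), while `splits` takes a regex and yields pieces one at a time, mirroring the stream-vs-array distinction. The `flags` parameter takes Python `re` flags, not jq's flag strings.

```python
import re
from typing import Iterator

def split(s: str, sep: str) -> list[str]:
    # Hypothetical analogue of split/1: literal separator, array (list) result.
    return s.split(sep)

def splits(s: str, pattern: str, flags: int = 0) -> Iterator[str]:
    # Hypothetical analogue of splits/{1,2}: regex separator, results
    # streamed one at a time instead of collected into a list.
    start = 0
    for m in re.finditer(pattern, s, flags):
        yield s[start:m.start()]
        start = m.end()
    yield s[start:]
```

With literal splitting as the default, a separator such as "." behaves as it did in jq 1.4, and regex semantics become an explicit opt-in.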