-
Notifications
You must be signed in to change notification settings - Fork 29.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
querystring: fix state machine in querystring.parse #11171
Conversation
- posIdx should save the current position instead of the last position. Fixed cases: - `a&&b` => `{ 'a': '', 'b': '' }` - `a=a&&b=b` => `{ 'a': 'a', 'b': 'b' }` Fixes nodejs#10454
Some more bad cases, even after the patch (sorry...): > querystring.parse('&a')
{ '': '', a: '' }
// { a: '' }
> querystring.parse('&=')
{ '': [ '', '' ] }
// { '': '' }
> querystring.parse('a&a&')
{ a: '' }
// { a: [ '', '' ] }
> querystring.parse('a&a&a&')
{ a: '' }
// { a: [ '', '', '' ] }
> querystring.parse('a&a&a&a&')
{ a: '' }
// { a: [ '', '', '', '' ] } |
@TimothyGu That's a great catch! Thanks<3 I've added a commit to supporting the cases. |
This is why I wanted to take a look at the original PR before landing as the On a more general note, I'm not particularly keen on changing |
I think the fastest way we can fix this in cc @jasnell |
Also if we do that, when we start to redo the parsing/serialization bit of |
It does seem like creating a separate spec-complaint parser for URL queries might be a good path. The algorithm doesn't seem that hard to just implement from scratch: https://url.spec.whatwg.org/#concept-urlencoded-parser (Although, the spec's pipeline of JS string --UTF-8 encode--> bytes --percent decode--> bytes --UTF-8 decode without BOM--> strings seems a bit inefficient, and hopefully you can the same results by just staying in string-land.) |
yeah, I'll look at creating a separate parser this next week. |
@jasnell, you might find my C++ implementation to be interesting, since the spec is fairly declarative on what needs to be done. Though performance-wise I think a JS implementation might be faster. |
I've benchmarked the combined changes in #10967 and this PR against master prior to either of these commits, and on at least one of the benchmark combinations (multicharsep) there is ~5.5% performance regression with a high "confidence" rating. I am working on and benchmarking an alternative solution now which should hopefully avoid the performance regression(s). |
Another useful test case: > require("querystring").parse("a=&a=value&a=")
{ a: [ '', '' ] }
// { a: [ '', 'value', '' ] } … and already solved by this patch. |
Alright, I've tested my much simpler changes which should cover the needs of the original PR and this PR, including the tests they introduce (and @stevenvachon's test), and after adjusting the benchmarks to allow the function "priming" to occur more than once to ensure any re-optimizing ("definitely") stays outside of the timed loop, there does not seem to be any high "confidence" negative regressions anymore. Also just judging by this new code change alone, I'm confident that the code will not negatively affect the other code paths. I will submit a PR as soon as the more thorough benchmarking that I currently have running finishes and agrees with my preliminary results. |
@mscdex That sounds great! I have a keen interest in it, let's see what will happen :) |
My proposed solution has now been pushed: #11234 |
I think so, and that patch was great! I will close this to jump to #11234. Thanks 😃 |
@jasnell Yes |
From this comment: #10454 (comment). The
posIdx
should save the current position instead of the last position, and I've added the test cases.Fixed cases:
a&&b
=>{ 'a': '', 'b': '' }
a=a&&b=b
=>{ 'a': 'a', 'b': 'b' }
Fixes #10454
Checklist
make -j4 test
(UNIX), orvcbuild test
(Windows) passesAffected core subsystem(s)
querystring