querystring: fix state machine in querystring.parse #11171

watilde · 2017-02-04T21:56:19Z

From this comment: #10454 (comment). The posIdx should save the current position instead of the last position, and I've added the test cases.

Fixed cases:

a&&b => { 'a': '', 'b': '' }
a=a&&b=b => { 'a': 'a', 'b': 'b' }
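
The two fixed cases can be checked with a short script. This is a minimal sketch that assumes a Node.js release in which `querystring.parse` skips empty pairs; older releases may print something different. The `plain` helper is hypothetical and only copies the result into an ordinary object so it is easy to compare.

```javascript
const querystring = require('querystring');

// Hypothetical helper: copy the parsed result into a plain object so it
// can be compared against object literals.
function plain(parsed) {
  return Object.assign({}, parsed);
}

// An empty pair between two separators should not create an empty key.
console.log(plain(querystring.parse('a&&b')));
console.log(plain(querystring.parse('a=a&&b=b')));
```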

Fixes #10454

Checklist

make -j4 test (UNIX), or vcbuild test (Windows) passes
tests and/or benchmarks are included
commit message follows commit guidelines

Affected core subsystem(s)

querystring

TimothyGu · 2017-02-04T22:11:10Z

Some more bad cases, even after the patch (sorry...):

> querystring.parse('&a')
{ '': '', a: '' }
// { a: '' }
> querystring.parse('&=')
{ '': [ '', '' ] }
// { '': '' }
> querystring.parse('a&a&')
{ a: '' }
// { a: [ '', '' ] }
> querystring.parse('a&a&a&')
{ a: '' }
// { a: [ '', '', '' ] }
> querystring.parse('a&a&a&a&')
{ a: '' }
// { a: [ '', '', '', '' ] }
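
For comparison, the same inputs can be run through `URLSearchParams`, which follows the WHATWG urlencoded parser. This sketch assumes the expected values in the comments above are meant to match that parser; `groupParams` is a hypothetical helper that folds repeated keys into arrays in the shape `querystring.parse` reports.

```javascript
// Hypothetical helper: collect URLSearchParams entries into an object,
// turning repeated keys into arrays.
function groupParams(input) {
  const out = {};
  for (const [key, value] of new URLSearchParams(input)) {
    if (!Object.prototype.hasOwnProperty.call(out, key)) {
      out[key] = value;
    } else if (Array.isArray(out[key])) {
      out[key].push(value);
    } else {
      out[key] = [out[key], value];
    }
  }
  return out;
}

for (const input of ['&a', '&=', 'a&a&', 'a&a&a&', 'a&a&a&a&']) {
  console.log(JSON.stringify(input), groupParams(input));
}
```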

watilde · 2017-02-04T22:55:21Z

@TimothyGu That's a great catch! Thanks <3 I've added a commit to support these cases.
The problem was that the logic didn't add the value to obj when curValue is '', because of the else if (curValue) check. It should be a plain else. I also noticed that it is no longer necessary to check i directly or to delete obj[key], thanks to the logic updated in the first commit of this PR :)
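
The `else if (curValue)` problem is a truthiness check: an empty string is falsy, so the guarded branch is skipped. The sketch below is not the code from `lib/querystring.js`. It assumes `curValue` holds the value already stored for a repeated key, and `addValue` with its `guarded` flag is a hypothetical helper that isolates the difference between the two forms.

```javascript
// Hypothetical helper modelling the repeated-key branch. `guarded` selects
// the `else if (curValue)` form; otherwise a plain `else` is used.
function addValue(obj, key, value, guarded) {
  if (!(key in obj)) {
    obj[key] = value;
    return obj;
  }
  const curValue = obj[key];
  if (Array.isArray(curValue)) {
    curValue.push(value);
  } else if (!guarded || curValue) {
    // With the guard, an existing '' is falsy and this branch is skipped.
    obj[key] = [curValue, value];
  }
  return obj;
}

const guarded = addValue(addValue({}, 'a', '', true), 'a', '', true);
const fixed = addValue(addValue({}, 'a', '', false), 'a', '', false);
console.log(guarded); // { a: '' }
console.log(fixed);   // { a: [ '', '' ] }
```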

mscdex · 2017-02-05T17:47:26Z

This is why I wanted to take a look at the original PR before landing as the delete should never have been necessary in the first place.

On a more general note, I'm not particularly keen on changing querystring.parse() (and slowing it down) just to satisfy URLSearchParams. URLSearchParams should just have its own implementation if it requires different behavior.

joyeecheung · 2017-02-06T04:34:43Z

I think the fastest way to fix this in URLSearchParams without affecting querystring is to simply copy the files over and then make changes in that copy. We can completely reimplement the querystring part later, but right now we are focusing on spec compliance, so this is the fastest way to catch up on the spec (not necessarily the best way, for sure).

cc @jasnell

joyeecheung · 2017-02-06T04:46:45Z

Also, if we do that, then when we start to redo the parsing/serialization part of URLSearchParams we will be refactoring a spec-compliant fork of querystring instead of reimplementing it from scratch.

domenic · 2017-02-06T04:48:44Z

It does seem like creating a separate spec-compliant parser for URL queries might be a good path. The algorithm doesn't seem that hard to just implement from scratch: https://url.spec.whatwg.org/#concept-urlencoded-parser

(Although, the spec's pipeline of JS string --UTF-8 encode--> bytes --percent decode--> bytes --UTF-8 decode without BOM--> strings seems a bit inefficient, and hopefully you can get the same results by just staying in string-land.)
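
A string-only version of the linked algorithm might look like the sketch below. It is an approximation, not a spec-exact parser: `parseUrlencoded` and `decodeComponent` are hypothetical names, and the sketch leans on `decodeURIComponent`, which throws on malformed percent sequences. The fallback here simply keeps the raw text in that case, whereas the spec's byte-level decoding would behave differently.

```javascript
// Hypothetical helper: turn '+' into a space, then percent-decode.
function decodeComponent(text) {
  const spaced = text.replace(/\+/g, ' ');
  try {
    return decodeURIComponent(spaced);
  } catch {
    // Assumption: keep malformed input as-is instead of throwing.
    return spaced;
  }
}

// Hypothetical sketch: parse an application/x-www-form-urlencoded string
// into a list of [name, value] pairs without going through bytes.
function parseUrlencoded(input) {
  const pairs = [];
  for (const sequence of input.split('&')) {
    if (sequence === '') continue; // skip empty sequences such as "a&&b"
    const eq = sequence.indexOf('=');
    const name = eq === -1 ? sequence : sequence.slice(0, eq);
    const value = eq === -1 ? '' : sequence.slice(eq + 1);
    pairs.push([decodeComponent(name), decodeComponent(value)]);
  }
  return pairs;
}

console.log(parseUrlencoded('a&&b=1+2&c=%E2%9C%93'));
```

Returning a list of pairs rather than an object keeps repeated names and their order, which is what the URLSearchParams use case needs.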

jasnell · 2017-02-06T05:37:01Z

yeah, I'll look at creating a separate parser this next week.

TimothyGu · 2017-02-06T06:23:44Z

@jasnell, you might find my C++ implementation to be interesting, since the spec is fairly declarative on what needs to be done. Though performance-wise I think a JS implementation might be faster.

mscdex · 2017-02-06T11:25:47Z

I've benchmarked the combined changes in #10967 and this PR against master prior to either of these commits, and on at least one of the benchmark combinations (multicharsep) there is a ~5.5% performance regression with a high "confidence" rating.

I am working on and benchmarking an alternative solution now which should hopefully avoid the performance regression(s).

stevenvachon · 2017-02-07T01:22:32Z

Another useful test case:

> require("querystring").parse("a=&a=value&a=")
{ a: [ '', '' ] }
// { a: [ '', 'value', '' ] }

… and already solved by this patch.

mscdex · 2017-02-07T02:37:50Z

Alright, I've tested my much simpler changes, which should cover the needs of the original PR and this PR, including the tests they introduce (and @stevenvachon's test). After adjusting the benchmarks to allow the function "priming" to occur more than once, so that any re-optimizing ("definitely") stays outside of the timed loop, there no longer seem to be any high-"confidence" regressions. Also, judging by this new code change alone, I'm confident that it will not negatively affect the other code paths.
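
The "priming" idea can be sketched as follows: call the function several times before the timed loop so that optimization or re-optimization is more likely to happen outside the measurement. This is not the Node.js benchmark harness; `primeAndTime` and its parameters are hypothetical, and the warm-up counts are arbitrary.

```javascript
const querystring = require('querystring');

// Hypothetical helper: run `fn` in several untimed warm-up rounds, then
// time `n` calls and return the elapsed nanoseconds as a BigInt.
function primeAndTime(fn, input, n, primingRounds = 3) {
  for (let round = 0; round < primingRounds; round++) {
    for (let i = 0; i < 1000; i++) fn(input); // untimed warm-up
  }
  const start = process.hrtime.bigint();
  for (let i = 0; i < n; i++) fn(input);
  return process.hrtime.bigint() - start;
}

const elapsed = primeAndTime(querystring.parse, 'a=a&&b=b', 1e5);
console.log(`${elapsed} ns for 1e5 calls`);
```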

I will submit a PR as soon as the more thorough benchmarking that I currently have running finishes and agrees with my preliminary results.

watilde · 2017-02-07T08:58:11Z

@mscdex That sounds great! I have a keen interest in it, let's see what will happen :)
Probably, this PR will be closed depending on your patch.

mscdex · 2017-02-08T08:24:58Z

My proposed solution has now been pushed: #11234

jasnell · 2017-02-08T16:43:22Z

@mscdex ... to be clear, #11234 is an alternative to this one, yes?

watilde · 2017-02-08T18:16:17Z

I think so, and that patch was great! I will close this to jump to #11234. Thanks 😃

mscdex · 2017-02-08T19:43:31Z

@jasnell Yes

querystring: fix state machine in querystring.parse

a6d3530

- posIdx should save the current position instead of the last position.

Fixed cases:
- `a&&b` => `{ 'a': '', 'b': '' }`
- `a=a&&b=b` => `{ 'a': 'a', 'b': 'b' }`

Fixes nodejs#10454

nodejs-github-bot added the querystring Issues and PRs related to the built-in querystring module. label Feb 4, 2017

querystring: update logic in parse

b837c13

mscdex mentioned this pull request Feb 5, 2017

url: add urlSearchParams.sort() #11098

Closed

4 tasks

mscdex added dont-land-on-v4.x labels Feb 6, 2017

mscdex mentioned this pull request Feb 7, 2017

URLSearchParams regression #11208

Closed

jasnell approved these changes Feb 7, 2017

mscdex mentioned this pull request Feb 8, 2017

querystring: fix empty pairs handling #11234

Closed

3 tasks

watilde closed this Feb 8, 2017

watilde deleted the feature/query branch February 8, 2017 18:16
