Skip to content
This repository has been archived by the owner on Apr 22, 2023. It is now read-only.

Strange behavior in querystring parsing #60

Closed
WizKid opened this issue Feb 6, 2010 · 18 comments
Closed

Strange behavior in querystring parsing #60

WizKid opened this issue Feb 6, 2010 · 18 comments

Comments

@WizKid
Copy link

WizKid commented Feb 6, 2010

I'm playing around with node.js and trying to make a simple bittorrent tracker.

The problem I've ran into is that the querystring parser don't like the requests a bittorrent client does.

One part of the url is info_hash=%92%5e%1b%94%98yx%f3%97%ea%db%fa%10d%be%fe%0a5%8c%0a

The first problem I have is that decodeURIComponent in the querystring lib don't like %92 and some other escaped characters. encodeURIComponent(unescape("%92")) gives %C2%92 so maybe it is that bittorrent clients don't encode stuff properly. I tried to look around to see if %92 is correct or wrong but I couldn't find any good sources. But this problem I can work around.

My second problem is val = val.replace(trimmerPattern, ''); in the querystring lib. Why trim querystring parameter keys and values? If I send in a string with a whitespace in the beginning or end I probably want that. Or I think there at least should be a way to say that you want the raw string and no conversion should be done (no whitespace cleanup and no conversion to integers).

If you think it would be a good idea to add a raw option I could add it.

@rictic
Copy link

rictic commented Feb 7, 2010

decodeURIComponent is an ECMAScript method that tries to decode the %xx hex values into UTF-8 characters. Not every byte is a valid UTF-8 character, so that's probably the wrong method for your purposes.

@isaacs
Copy link

isaacs commented Feb 7, 2010

WIzKid:

Hey, QS author here.

You can already change the escape/unescape functions if necessary. Just do this:

var querystring = require("querystring");
querystring.unescape = function (s) { return s };
querystring.escape = function (s) { return s };

For most use cases, (en|de)codeURIComponent is what's needed, so it's a pretty sensible default.

As to trimming, I think you're right, that's probably unnecessary. I don't recall the case that prompted that, but I think it was backward compatibility with some other broken awful server-side language (php or ruby or perl or python, probably). I'm ok with taking it out. For your case, though, if you just replace the escape/unescape functions, you'll get %20 or + chars instead of whitespace.

@isaacs
Copy link

isaacs commented Feb 7, 2010

Ok, what definitely IS a bug, though, is that decodeURIComponent can throw, and it's not obvious that it needs a try/catch around any URL parse. Crashing your server due to an ugly request URL is very much Doing It Wrong.

Adding that to my todo list.

@WizKid
Copy link
Author

WizKid commented Feb 7, 2010

isaacs: I've added a raw-argument to the parse-function to make it not trimming and converting stuff to integers. If you want to use it you can find it at: http://github.com/WizKid/node/commit/26262f205900f5ab122a5365486139bf9cd40333

@isaacs
Copy link

isaacs commented Feb 7, 2010

I'm not entirely convinced that converting to Number is a bad thing in any case, since it toString()s just fine if you treat it like a string. The trimming can be taken out completely, IMO, but I'll run it by the list to see if anyone is depending on that behavior for some reason.

If you just swap out the encode/decode functions in your module you'll get what you want, and I'd prefer to do that rather than add a "raw" param to the parser. (Three args is already a lot, and this is an edge case.) I definitely also need to fix the throwing problem, because that's going to unexpectedly bite folks.

I'd really like to have it parse to a ByteString, but node doesn't have one of those yet.

@WizKid
Copy link
Author

WizKid commented Feb 7, 2010

I just couldn't find anything in the manual that parse did convert stuff to integers. So as a user of the API I would be pretty surprised if I parsed the querystring containing username=1234 and then wanted to check if the length of the username was valid by using params['username'].length which would return undefined. I just feels like everyone would need to use toString on everything all the time to make sure it is a string. I don't say that you should change it. I just wanted to tell you that it feels strange for me.

@isaacs
Copy link

isaacs commented Feb 7, 2010

Ok, that's a valid point. I'm convinced. Numericizing numbers will be removed.

So, to close this issue, we've got:

  1. Don't crash by default on bad stuff.
  2. Don't numberify number values in the qs
  3. Don't trim.

Anything else?

@WizKid
Copy link
Author

WizKid commented Feb 7, 2010

Sounds perfect for me.

Then I could just locally overload the escape/unescape metod to use escape/unescape instead of using escapeURIComponent/unescapeURIComponent.

@isaacs
Copy link

isaacs commented Feb 8, 2010

Fixed in isaacs/node-v0.x-archive@d27d47f. Not the ideal fix, since it'd be better to use a binary string, but hey, at least it's not gonna crash the server.

@ry
Copy link

ry commented Feb 8, 2010

isaacs, LGTM but could you add a test case?

@psvensson
Copy link

I've had a similar problem where I'm using (fab) and it breaks in the beginning of the url routing due to the above issues. But since I'm using latest version from git, shouldn't the breaking issue be solved by now?

Also, since I'm a n00b to node, can I require in and override the encode/decode stuff anywhere, or do I have to put it inside the module that explicitly use it?

Cheers,
PS

@ry
Copy link

ry commented Mar 12, 2010

This issue is fixed, afaik

@psvensson
Copy link

Thanks!

@WizKid
Copy link
Author

WizKid commented Mar 12, 2010

From what I can see it hasn't been put in the main node repo. Just in Isaacs private one. At least http://github.com/ry/node/blob/master/lib/querystring.js haven't got the change in it.

@domine
Copy link

domine commented Apr 2, 2010

Are there any disadvantages to using 'unescape' instead of 'decodeURIComponent' in the querystring module? That seems to fix this problem

@psvensson
Copy link

No, you're probably right. I was just showing my ignorance :) But as stated, everything now works fine at lower levels and the issue doesn't come up anymore.

This issue was closed.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants