The values of the MIME type parameters aren't parsed as ASCII strings #141

Closed
andreubotella opened this issue Apr 22, 2021 · 3 comments · Fixed by #142

Comments

@andreubotella
Member

If the "parse a MIME type" algorithm is called on a string like "multipart/form-data; boundary=áèîøü", the parsing succeeds, and the resulting MIME type record has a "boundary" parameter of "áèîøü", even though the MIME type definition specifies that parameter values are ASCII strings.

This happens because the parsing algorithm requires the essence and parameter names to contain only HTTP token code points, which are a subset of ASCII, whereas parameter values only need to contain HTTP quoted-string token code points, which also cover U+0080 to U+00FF and so are not limited to ASCII.
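To make the mismatch concrete, here is a minimal sketch of the two code point classes as I understand them from the MIME Sniffing Standard. The function names are hypothetical and this is not the full "parse a MIME type" algorithm, only the per-code-point checks it relies on.

```python
import string

# HTTP token code points: ASCII alphanumerics plus a fixed set of punctuation.
HTTP_TOKEN_CODE_POINTS = frozenset(
    string.ascii_letters + string.digits + "!#$%&'*+-.^_`|~"
)


def only_http_token_code_points(s: str) -> bool:
    """Hypothetical helper: the check applied to the essence and parameter names."""
    return all(c in HTTP_TOKEN_CODE_POINTS for c in s)


def only_http_quoted_string_token_code_points(s: str) -> bool:
    """Hypothetical helper: the check applied to parameter values."""
    # U+0009 TAB, U+0020 to U+007E, or U+0080 to U+00FF.
    return all(
        ord(c) == 0x09 or 0x20 <= ord(c) <= 0x7E or 0x80 <= ord(c) <= 0xFF
        for c in s
    )


# "áèîøü" written with escapes so the code points are unambiguous.
boundary = "\u00e1\u00e8\u00ee\u00f8\u00fc"

print(only_http_token_code_points(boundary))                # False
print(only_http_quoted_string_token_code_points(boundary))  # True
print(boundary.isascii())                                   # False
```

The value passes the only check the parser applies to parameter values, yet it is not an ASCII string, which is the inconsistency described above.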

I found this as part of working on a multipart/form-data parser in https://github.com/andreubotella/multipart-form-data, since I noticed that some browsers accept a boundary string with code points between U+0080 and U+00FF while others don't, and after going down the rabbit hole of fetch algorithms, this seems to be the cause of that incompatibility.

@annevk
Member

annevk commented Apr 23, 2021

Wow, nice find! And also somewhat surprising this was overlooked for so long. I guess this is a use case for "isomorphic string" as we shouldn't subset HTTP.

@andreubotella
Member Author

The "parse a MIME type" algorithm now works on strings rather than on byte sequences as it used to, so no isomorphic decoding is needed here. This means that a caller could pass code points greater than U+00FF, but parsing them as part of a parameter value would already fail because they aren't HTTP quoted-string token code points. Should the definition of parameter value then refer to HTTP quoted-string token code points?

@annevk
Member

annevk commented Apr 23, 2021

With "isomorphic string" I meant a string whose code points are in the range U+0000 to U+00FF, inclusive (aka latin1, but we avoid latin1 as a term on the web as it also means windows-1252 there). But yeah, we could also define the parameter value as a string whose code points are HTTP quoted-string token code points.
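The two candidate definitions are close but not identical. The sketch below compares them on a few sample values; both function names are hypothetical, and the code point ranges are my reading of the Infra and MIME Sniffing standards rather than text from this thread.

```python
def is_isomorphic_string(s: str) -> bool:
    """Every code point is in the range U+0000 to U+00FF, inclusive."""
    return all(ord(c) <= 0xFF for c in s)


def is_quoted_string_token_string(s: str) -> bool:
    """Every code point is an HTTP quoted-string token code point:
    U+0009 TAB, U+0020 to U+007E, or U+0080 to U+00FF."""
    return all(
        ord(c) == 0x09 or 0x20 <= ord(c) <= 0x7E or 0x80 <= ord(c) <= 0xFF
        for c in s
    )


samples = {
    "non-ASCII boundary": "\u00e1\u00e8\u00ee\u00f8\u00fc",  # "áèîøü"
    "DELETE control": "\x7f",                                # U+007F
    "above U+00FF": "\u0100",                                # "Ā"
}

for label, value in samples.items():
    print(label, is_isomorphic_string(value), is_quoted_string_token_string(value))
# non-ASCII boundary True True
# DELETE control True False
# above U+00FF False False
```

Under these assumptions, the isomorphic-string definition is the looser one: it admits control characters such as U+007F that the parser would never accept in a parameter value, while the quoted-string-based definition matches exactly what parsing can produce. Both reject anything above U+00FF.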
