Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle raw UTF-8 bytes in redirect headers #317

Merged
merged 2 commits into from
Apr 6, 2021

Conversation

sagebind
Copy link
Owner

@sagebind sagebind commented Apr 1, 2021

Try to parse the Location header as UTF-8 bytes as a fallback if the header value is not valid US-ASCII. This is technically against the URI spec which requires all literal characters in the URI to be US-ASCII (see RFC 3986, Section 4.1).

This is also more or less against the HTTP spec, which historically allowed for ISO-8859-1 text in header values but since was restricted to US-ASCII plus opaque bytes. Never has UTF-8 been encouraged or allowed as-such. See RFC 7230, Section 3.2.4 for more info.

However, some bad or misconfigured web servers will do this anyway, and most web browsers recover from this by allowing and interpreting UTF-8 characters as themselves even though they should have been percent-encoded. The third-party URI parsers that we use have no such leniency, so we percent-encode such bytes (if legal UTF-8) ahead of time before handing them off to the URI parser.

This is in the spirit of being generous with what we accept (within reason) while being strict in what we produce. Since real websites exhibit this out-of-spec behavior it is worth handling it.

Note that the underlying tiny_http library that our HTTP test mocking is based on does not allow UTF-8 header values right now, so we can't really test this efficiently. We already have a couple tests out there doing some raw TCP munging for one reason or another, so in the future we need to make sure to rewrite testserver to allow such headers and then enable the test. For now I've manually verified that this works.

Fixes #315.

Try to parse the `Location` header as UTF-8 bytes as a fallback if the header value is not valid US-ASCII. This is technically against the URI spec which requires all literal characters in the URI to be US-ASCII (see [RFC 3986, Section 4.1](https://tools.ietf.org/html/rfc3986#section-4.1)).

This is also more or less against the HTTP spec, which historically allowed for ISO-8859-1 text in header values but since was restricted to US-ASCII plus opaque bytes. Never has UTF-8 been encouraged or allowed as-such. See [RFC 7230, Section 3.2.4](https://tools.ietf.org/html/rfc7230#section-3.2.4) for more info.

However, some bad or misconfigured web servers will do this anyway, and most web browsers recover from this by allowing and interpreting UTF-8 characters as themselves even though they _should_ have been percent-encoded. The third-party URI parsers that we use have no such leniency, so we percent-encode such bytes (if legal UTF-8) ahead of time before handing them off to the URI parser.

This is in the spirit of being generous with what we accept (within reason) while being strict in what we produce. Since real websites exhibit this out-of-spec behavior it is worth handling it.

Note that the underlying `tiny_http` library that our HTTP test mocking is based on does not allow UTF-8 header values right now, so we can't really test this efficiently. We already have a couple tests out there doing some raw TCP munging for one reason or another, so in the future we need to make sure to rewrite `testserver` to allow such headers and then enable the test. For now I've manually verified that this works.

Fixes #315.
@codecov
Copy link

codecov bot commented Apr 1, 2021

Codecov Report

Merging #317 (bb6d283) into master (3522593) will decrease coverage by 0.31%.
The diff coverage is 55.10%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #317      +/-   ##
==========================================
- Coverage   75.25%   74.94%   -0.32%     
==========================================
  Files          57       57              
  Lines        2983     2993      +10     
==========================================
- Hits         2245     2243       -2     
- Misses        738      750      +12     
Impacted Files Coverage Δ
src/cookies/psl/mod.rs 82.92% <ø> (ø)
src/error.rs 44.66% <0.00%> (-0.67%) ⬇️
tests/redirects.rs 100.00% <ø> (ø)
src/redirect.rs 71.08% <33.33%> (-6.70%) ⬇️
src/agent/mod.rs 72.68% <62.50%> (-0.05%) ⬇️
src/agent/selector.rs 74.41% <64.28%> (-0.59%) ⬇️
src/agent/timer.rs 100.00% <100.00%> (ø)
src/handler.rs 69.72% <0.00%> (-1.38%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3522593...bb6d283. Read the comment docs.

@sagebind
Copy link
Owner Author

sagebind commented Apr 1, 2021

@teto-bot rustfmt

@sagebind sagebind merged commit 70100a4 into master Apr 6, 2021
@sagebind sagebind deleted the 315-raw-utf8-redirect-header branch April 6, 2021 16:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Redirects aren't followed if Location header contains non-ASCII symbols
2 participants