Remove Trailing Slashes from Hostname #1254
Merged
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Hostnames can be passed in with trailing slashes. Since we blindly concat a path with a leading slash, paths like http://hostname//$rpc/path (note the double slash before $rpc) are possible. The duplicated slash can cause issues down the line, especially if the path is separated from the host in requests.
Browsers/servers generally allow double slashes, and while they are valid syntactically under the RFC spec, there is no specified semantic meaning. So a server could interpret a path like http://hostname//$rpc/path as equivalent to http://hostname/$rpc/path (most common), as a separate path with an empty path component before /$rpc, or as something else. The problem is several interpretations are equally valid under the spec and it's implementation-dependent which is used.
Additionally, the path can be separate from the host in a request depending on whether it's in origin form versus absolute form (see https://datatracker.ietf.org/doc/html/rfc7230#section-5.3). In absolute form (GET http://hostname//$rpc/path), the path and host are clear. In origin form (GET //$rpc/path, Host: hostname), it is still clear to us looking at it, but may not be clear to a server. For example the server may think the //$rpc/path is in absolute form due to the double slash, and thus parse it as host: $rpc, path: /path. This is especially problematic as https://datatracker.ietf.org/doc/html/rfc7230#section-5.4 says:
"When a proxy receives a request with an absolute-form of request-target, the proxy MUST ignore the received Host header field (if any) and instead replace it with the host information of the request-target. A proxy that forwards such a request MUST generate a new Host field-value based on the received request-target rather than forward the received Host field-value."
Which means a proxy would corrupt the request when in this format by overwriting the old host value with $rpc
This parsing concern is not idle speculation either. Java's main URI class (https://docs.oracle.com/javase/8/docs/api/java/net/URI.html), parses a URI of the form //$rpc/path as having host/authority $rpc and path /path. Real bugs are observed from usage of this in commonly used libraries such as Netty, and various proxies.
Basically, this doesn't cause issues in the most common cases, but is a bug waiting to happen in many other edge cases, and affects widely used infrastructure.