Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use Forwarded, X-Forwarded-Scheme and X-Forwarded-Host for better scheme and host resolution #1881

Merged
merged 26 commits into from
May 22, 2017

Conversation

evertlammerts
Copy link
Contributor

What do these changes do?

They enhance the resolution mechanism for scheme and host in aiohttp.web_request.Baserequest.

The scheme is resolved based on:

  • the currently used transport
  • the secure_proxy_ssl_header argument of BaseRequest.__init__
  • the Forwarded header
  • the X-Forwarded-Proto header

in that order.

The host is resolved based on:

  • the Forwarded header
  • the X-Forwarded-Host header
  • the Host header

in that order.

Are there changes in behavior for the user?

This PR does not include changes in the contract.

Related issue number

#1134 Support X-Forwarded-* and Forwarded implicitly, deprecate secure_proxy_ssl_header.

Checklist

  • I think the code is well written
  • Unit tests for the changes exist
  • Documentation reflects the changes
  • If you provide code modification, please add yourself to CONTRIBUTORS.txt
    • The format is <Name> <Surname>.
    • Please keep alphabetical order, the file is sorted by names.
  • Add a new entry to CHANGES.rst
    • Choose any open position to avoid merge conflicts with other PRs.
    • Add a link to the issue you are fixing (if any) using #issue_number format at the end of changelog message. Use Pull Request number if there are no issues for PR or PR covers the issue only partially.

CHANGES.rst Outdated
@@ -41,6 +41,8 @@ Changes

- Do not unquote `+` in match_info values #1816

- Use Forwarded, X-Forwarded-Scheme and X-Forwarded-Host for better scheme and host resolution. #1134
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PEP8 here. Please split this to 2 lines or more of up to 79 characters per line. For more information on PEP8 see the www.python.org page on PEP8.

@asvetlov
Copy link
Member

Implementation looks good on fist glaze.
Let me reread RFC before merging.
I'll add required deprecation records after merging if @evertlammerts has troubles with it.

@asvetlov asvetlov self-assigned this May 14, 2017
@evertlammerts
Copy link
Contributor Author

does c2848ed cover the deprecation of secure_proxy_ssl_header?

Copy link
Contributor

@AraHaan AraHaan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems alright for now.

@evertlammerts
Copy link
Contributor Author

Ping @asvetlov

@asvetlov
Copy link
Member

Sorry, I'm on US PyCon right now and quite busy.

@asvetlov
Copy link
Member

asvetlov commented May 20, 2017

https://tools.ietf.org/html/rfc7239 says that Forwarded header could be present several times but the very first occurrence points to host which initiates the request.
@evertlammerts could you add an explicit test for it?

forwarded = self._message.headers.get(hdrs.FORWARDED)
if forwarded is not None:
parsed = re.findall(
r'^by=([^;]*); +for=([^;]*); +host=([^;]*); +proto=(https?)$',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect the parser should be more complicated.

The header field value can be
   defined in ABNF syntax as:

       Forwarded   = 1#forwarded-element

       forwarded-element =
           [ forwarded-pair ] *( ";" [ forwarded-pair ] )

       forwarded-pair = token "=" value
       value          = token / quoted-string

       token = <Defined in [RFC7230], Section 3.2.6>
       quoted-string = <Defined in [RFC7230], Section 3.2.6>

It means the forwarded-pair elements could be in random order: by=a; host=b.com and host=b.com, by=a are both valid combinations.
Also value could be optionally quoted, aiohttp shoud unquote it automatically.
pair-name is case-insensitive: all 'Host', 'host', and 'HOst` are valid names:

       Forwarded: for="_gazonk"
       Forwarded: For="[2001:db8:cafe::17]:4711"
       Forwarded: for=192.0.2.60;proto=http;by=203.0.113.43
       Forwarded: for=192.0.2.43, for=198.51.100.17

   Note that as ":" and "[]" are not valid characters in "token", IPv6
   addresses are written as "quoted-string".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please note: the lase example contains two for pairs, it's not supported by your regexp now.

@evertlammerts
Copy link
Contributor Author

evertlammerts commented May 20, 2017 via email

@evertlammerts
Copy link
Contributor Author

Alright, this actually adds a public forwarded property to BaseRequest. It makes an effort to parse Forwarded headers as specified by the RFC:

  • It adds all parameters (by, for, host, proto) in the order it finds them; starting at the topmost / first added 'Forwarded' header, at the leftmost / first-added parameter.
  • It checks that the value has valid syntax in general as specified in section 4: either a 'token' or a 'quoted-string'.
  • It un-escapes found escape sequences.
  • It does NOT validate 'by' and 'for' contents as specified in section 6.
  • It does NOT validate 'host' contents (Host ABNF).
  • It does NOT validate 'proto' contents for valid URI scheme names.

Don't you think the place for parsing the Forwarded header is rather a middleware? I'm in doubt myself now.

Otherwise, what's a good place for the regex constants? Would be nice to compile the regex only once, but module level doesn't do much for esthetics.

Copy link
Member

@asvetlov asvetlov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I love .req.forwarded property.

No, we don't need a middleware for it.

Global variables for regexps are pretty fine. If you don't like this approach you could put them ad class variables into BaseRequest

if hdrs.FORWARDED in self._message.headers:
for forwarded_elm in self._message.headers.getall(hdrs.FORWARDED):
if len(forwarded_elm):
forwarded_pairs = tuple(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't create a tuple but make a generator comprehension. It saves both time and memory.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please support Forwarded: by=first, by=second form, it's equal to

Forwarded: by=first
Forwarded: by=second

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is supported by the regexes and I've included tests for it

params = {'by': [], 'for': [], 'host': [], 'proto': []}
if hdrs.FORWARDED in self._message.headers:
for forwarded_elm in self._message.headers.getall(hdrs.FORWARDED):
if len(forwarded_elm):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Drop this check, iteration over empty forwarded_elm is pretty fine.

proto=tuple(...), )
"""
params = {'by': [], 'for': [], 'host': [], 'proto': []}
if hdrs.FORWARDED in self._message.headers:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Drop the check, in is not free.
Use for forwarded_elm in self._message.headers.getall(hdrs.FORWARDED, []) instead

r'[bB][yY]|[fF][oO][rR]|[hH][oO][sS][tT]|[pP][rR][oO][tT][oO]')

_FORWARDED_PAIR = (
r'^ *({forwarded_params})=({token}|{quoted_string}) *$'.format(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC spaces around = are allowed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure but as far as I can see that's not the case. Section 4 only says forwarded-pair = token "=" value and section 7 recalls Note that an HTTP list allows white spaces to occur between the identifiers, and the list may be split over multiple header fields.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO it's rare case but still allowed

continue
param = forwarded_pair[0][0].lower()
value = forwarded_pair[0][1]
if len(value) and value[0] == '"':
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a check for value[-1] == '"'

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If empty value allowed? I mean len(value) == 0.
Also if value: gives the same result as if len(value) but works a bit faster.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no need to check for a closing quote: the regex validates quoted-string so any value starting with " must also have a closing ".

A token may be empty (but note again that I don't check the syntax in section 6)

Returns a dict(by=tuple(...), for=tuple(...), host=tuple(...),
proto=tuple(...), )
"""
params = {'by': [], 'for': [], 'host': [], 'proto': []}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the property's type should be MultyDict instead of dict of lists.
It preserves all information about proxies in right order but original host still could be checked as req.forwarded.get('host')

value = _QUOTED_PAIR_REPLACE_RE.sub(
r'\1', value[1:-1])
params[param].append(value)
return params
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return MultiDictProxy(params) -- it's an immutable view.

Returns a dict(by=tuple(...), for=tuple(...), host=tuple(...),
proto=tuple(...), )
"""
params = {'by': [], 'for': [], 'host': [], 'proto': []}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use MultiDict here instead of dict of pairs. It preserves all information about proxies.

"""
params = {'by': [], 'for': [], 'host': [], 'proto': []}
if hdrs.FORWARDED in self._message.headers:
for forwarded_elm in self._message.headers.getall(hdrs.FORWARDED):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Drop previous line, in is not free but getall(hdrs.FORWARDED, ()) does the check using single call to data structure.

@evertlammerts
Copy link
Contributor Author

evertlammerts commented May 21, 2017

I forgot Python < 3.5 doesn't like an arbitrary number of unpackings (*iterable). Just to be certain, is Python >= 3.4.2 still the requirement?

@asvetlov
Copy link
Member

Yes, we sill support Python 3.4.2+
Will drop it soon but not right now.
It will be trivial but huge change -- replace all yield from with await :)

@evertlammerts
Copy link
Contributor Author

I seem to have missed a couple of things:

  • field values are separated by comma: my implementation rather assumes that each header has a single field value in which each parameter may occur multiple times (i.e. I allow host=abc; host=def; which is explicitly not allowed and should be host=abc, host=def)
  • MultiDict maintains order between values, so I can just use md.add()

Regarding OWS around =, I can't find examples where that is allowed. Actually, I think it isn't allowed around ; either, or at least the ABNF doesn't include OWS or RWS. (Another bug in my regexes.) The only place where it seems to be allowed is the field-value separator (,).

I looked for examples of whitespace around = in other places as well. I can only think of the Content-* (charset=...) and Accept(q=...) headers as ones that also accept key-value pairs and RFC 7231 explicitly doesn't allow whitespace. Why do you think it is allowed? Maybe something to do with clients not being compliant?

I'll do my best to add some more commits later this week. When are you planning the next release, @asvetlov?

@asvetlov
Copy link
Member

Ok, I'm convinced regarding to spaces.
But commas and proper multidict usage (md.add(k, v)) should be fixed.
There is no scheduled date for the release.
I love to have this issue in next version and could wait for you for a while if @fafhrd91 agrees with me.

@fafhrd91
Copy link
Member

@asvetlov it is your call. I am busy with different project in any case

@evertlammerts
Copy link
Contributor Author

d1ba86b fixes the last issues, as far as I can see.

  • It is now possible to provide multiple field-values (comma separated list) in a single Forwarded header. See the tests / code in d1ba86b for the supported semantics.
  • I've fixed the unpacking issue. The patch should be compatible with Python 3.4 now.
  • I've fixed an issue with the quoted-pairs regex, for quoted-strings (ie host="something \"escaped\"" now works correctly)

As far as the return type goes, I've changed that to a tuple containing MappingProxies. The semantics of the Forwarded header are so that every proxy may add a Forwarding field-value; by, for, host and proto in any combination, but not multiple times per field-value. When a proxy adds a new Forwarding field-value, it either appends it to an already existing Forwarding header (after a comma) or it adds a new Forwarding header at the bottom. I think it's useful to maintain the information about the order in which proxies added something. So now the result will look something like:

req.forward = (
    MappingProxy({'by': ..., 'for': ...}),  # added by proxy 1
    MappingProxy({'host': '...', 'for': ..., 'proto': ...}),  # added by proxy 2
    # ... etc
)

If a field-value added by a proxy cannot be parsed, the code will add a MappingProxy(dict()) so that users can still see that something is there, and that it was added by a proxy at that point in the chain.

Let me know what you think, @asvetlov.

@asvetlov asvetlov merged commit 5a43ce0 into aio-libs:master May 22, 2017
@asvetlov
Copy link
Member

Awesome work @evertlammerts
Very thank you!

@adamcharnock
Copy link

For anyone else chasing their tail trying to figure out where this feature has gone, it has been removed and moved into middleware. See here:

#2171 (comment)

@lock
Copy link

lock bot commented Oct 28, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a [new issue] for related bugs.
If you feel like there's important points made in this discussion, please include those exceprts into that [new issue].
[new issue]: https://github.com/aio-libs/aiohttp/issues/new

@lock lock bot added the outdated label Oct 28, 2019
@lock lock bot locked as resolved and limited conversation to collaborators Oct 28, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants