Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Strictly require ASCII headers names and values #137

Merged
merged 1 commit into from
Aug 30, 2014
Merged

Strictly require ASCII headers names and values #137

merged 1 commit into from
Aug 30, 2014

Conversation

kxepal
Copy link
Member

@kxepal kxepal commented Aug 13, 2014

Historically, HTTP has allowed field content with text in the
ISO-8859-1 charset [ISO-8859-1], supporting other charsets only
through use of [RFC2047] encoding.  In practice, most HTTP header
field values use only a subset of the US-ASCII charset [USASCII].
Newly defined header fields SHOULD limit their field values to
US-ASCII octets.  A recipient SHOULD treat other octets in field
content (obs-text) as opaque data.

-- http://tools.ietf.org/html/rfc7230#section-3.2.4

Allowing non-ASCII headers names and values may eventually cause some
nasty errors which may be hard to debug.

Also, this change allows to pass headers name and values as bytes in
case when they fulfils the requirements. Very useful when header have
to contain some base64 encoded value which is in bytes by default.

@asvetlov
Copy link
Member

-1
We should support non-ascii headers.
If, say, some server requires non-ascii header aiohttp should has a way to pass it in.

@kxepal
Copy link
Member Author

kxepal commented Aug 13, 2014

@asvetlov I wonder why if the main specification tell the other? If some server doesn't follows the RFC there are too many places where his support may be broken.

@asvetlov
Copy link
Member

I afraid the spec describes desired situation, not the status quo.

Sure, non-ascii headers are evil but personally I would like to have support for it if possible.

@kxepal
Copy link
Member Author

kxepal commented Aug 13, 2014

Well, my motivation for this change was dictated on two aspects:

  1. I spent quite a lot of time trying to figure out why 400 Bad Request in response when everything looks and feels correct, even tests. Actually, that was a typo in c char that looks exactly as cyrillic с. Stupid situation, but it's impossible to hit it with requests or stdlib http.client (which is used by it) - unicode decode error at header = header.encode('ascii') line oblivious tells what's wrong and where to look for it. And actually I hadn't found around any webserver that supported non-ascii headers - each was happy to fail.

  2. I wanted to allow headers to have bytes values in case when they fulfils the RFC requirements. Saves a few more oblivious actions.

@asvetlov
Copy link
Member

@fafhrd91 @popravich your thoughts?

@popravich
Copy link
Member

RFCs are good but I agree with @asvetlov -- we should support non-ascii headers.
tested non-ascii headers on several sites -- all worked well.

$ telnet httpbin.org 80
Trying 107.20.143.59...
Connected to httpbin.org.
Escape character is '^]'.
GET /cookies HTTP/1.1
Host: httpbin.org
Cookie: key1=тест

HTTP/1.1 200 OK
Access-Control-Allow-Credentials: true
Access-Control-Allow-Origin: *
Content-Type: application/json
Date: Wed, 13 Aug 2014 15:08:42 GMT
Server: gunicorn/18.0
Content-Length: 61
Connection: keep-alive

{
  "cookies": {
    "key1": "\u0442\u0435\u0441\u0442"
  }
}

same with aiohttp:

import asyncio, aiohttp
loop = asyncio.get_event_loop()
req = aiohttp.request('get', 'http://httpbin.org/cookies', cookies={'key': 'тест'})
resp = loop.run_until_complete(req)
loop.run_until_complete(resp.json())
# {'cookies': {'key': 'тест'}}

so I don't see a big problem in non-ascii headers.

@kxepal
Copy link
Member Author

kxepal commented Aug 13, 2014

Here is the problem:

$ telnet httpbin.org 80
Trying 54.225.141.194...
Connected to httpbin.org.
Escape character is '^]'.
GET /cookies HTTP/1.1
Host: httpbin.org
Cookie: key1=тест
Лет-ит: краш

HTTP/1.1 400 BAD_REQUEST
Content-Length: 0
Connection: Close

Connection closed by foreign host.

@fafhrd91
Copy link
Member

-1 for header values
+1 for header names

@popravich
Copy link
Member

agree to @fafhrd91

@kxepal
Copy link
Member Author

kxepal commented Aug 15, 2014

Ok, what about handling header names/values as bytes if only they are printable ascii?

@fafhrd91
Copy link
Member

printable ascii is too complicated. lets enforce ascii for header names for now.

@fafhrd91
Copy link
Member

@kxepal could you change your PR, I'd like to make another release during weekend

@kxepal
Copy link
Member Author

kxepal commented Aug 16, 2014

Yes, I will, just waited for the feedback on ASCII bytes. I'm ok with, however, this wouldn't change things a lot: according RFC only printable ASCII are allowed for headers name/values and they are strongly recommended to be lesser than 0x80 char, so it's only matter of single decode call. Will update PR in a moment.

@kxepal
Copy link
Member Author

kxepal commented Aug 16, 2014

Done with new commit message.

@kxepal
Copy link
Member Author

kxepal commented Aug 28, 2014

Everyone happy with proposed changes? Can I merge it?

@fafhrd91
Copy link
Member

Sorry for delay.

I don't like isinstance check.

@asvetlov
Copy link
Member

@fafhrd91 isinstance for name or value?

@fafhrd91
Copy link
Member

For both. I am happy with asserts

On Thursday, August 28, 2014, Andrew Svetlov notifications@github.com
wrote:

@fafhrd91 https://github.com/fafhrd91 isinstance for name or value?


Reply to this email directly or view it on GitHub
#137 (comment).

@kxepal
Copy link
Member Author

kxepal commented Aug 28, 2014

@fafhrd91 assert isn't the right way to check the things since 1) it can be stripped from compiled bytecode (and nothings it will guard after that) 2) it raises semantically wrong error type for the our constraint 3) you still have to provide meaningful error message and for assertion long descriptions leads to ugly code

@fafhrd91
Copy link
Member

if you create application that can send wrong header in production mode then you doomed :)
i do not want to pay for this checks at run time.

@kxepal
Copy link
Member Author

kxepal commented Aug 29, 2014

@fafhrd91 fair enough, thought still don't feel that's right. I'll update PR.

Header value still can have unicode values. Also raise proper exception
for invalid type or value with a friendly error message.

See http://tools.ietf.org/html/rfc7230#section-3.2.4 for more info.
@kxepal
Copy link
Member Author

kxepal commented Aug 29, 2014

Done. Thought, it still wouldn't protect you completely since different servers reacts differently on punctuation characters in header name.

@fafhrd91
Copy link
Member

Everything is tradeoff. If you are that bad and put punctuation into header
name, that should not affect real users of the library

On Friday, August 29, 2014, Alexander Shorin notifications@github.com
wrote:

Done. Thought, it still wouldn't protect you completely since different
servers reacts differently on punctuation characters in header name.


Reply to this email directly or view it on GitHub
#137 (comment).

fafhrd91 added a commit that referenced this pull request Aug 30, 2014
Strictly require ASCII headers names and values
@fafhrd91 fafhrd91 merged commit 90f0589 into aio-libs:master Aug 30, 2014
@lock
Copy link

lock bot commented Oct 30, 2019

This thread has been automatically locked since there has not been
any recent activity after it was closed. Please open a new issue for
related bugs.

If you feel like there's important points made in this discussion,
please include those exceprts into that new issue.

@lock lock bot added the outdated label Oct 30, 2019
@lock lock bot locked as resolved and limited conversation to collaborators Oct 30, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants