Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ISO-8859-1 encoded http headers #1102

Merged
merged 1 commit into from
Aug 31, 2015
Merged

Conversation

ephes
Copy link
Contributor

@ephes ephes commented Aug 22, 2015

Hi,

gunicorn uses utf8 encoding for http response headers. I don't know
much about http standards, but this is probably not correct:

http://stackoverflow.com/questions/4400678/http-header-should-use-what-character-encoding

best regards,
Jochen

@benoitc
Copy link
Owner

benoitc commented Aug 22, 2015

The tests don't pass. unitests.mock don't pass. Can you fix it?

Also reading the new RFC 7230:

Historically, HTTP has allowed field content with text in the
ISO-8859-1 charset [ISO-8859-1], supporting other charsets only
through use of [RFC2047] encoding. In practice, most HTTP header
field values use only a subset of the US-ASCII charset [USASCII].
Newly defined header fields SHOULD limit their field values to
US-ASCII octets. A recipient SHOULD treat other octets in field
content (obs-text) as opaque data.

Not sure how to handle opaque data there. Also including such change we should also test if it works with some value given as unicode (which happen sometime in some countries...). Maybe we should have more tests there. Ideally we shouldn't transform anything there.

Note: we are converting here due to the silly way bytes, native strings have been managed between py2 and py3.

@jamadden
Copy link
Collaborator

Also including such change we should also test if it works with some value given as unicode (which happen sometime in some countries...).

Of course, according to the WSGI spec, that's not supposed to happen in Python 2. Headers are specified to be given as the "native string type", so they should already be bytes and applications that send unicode values are in non-compliance with the spec (I've seen middleware break due to a buggy application that had a unicode header value). Likewise under Python 3 (where the native string type is unicode) including non-latin-1-encodable data is also out of compliance with the spec, the HTTP spec this time, as well as the WSGI spec:

Do not be confused however: even if Python's str type is actually Unicode "under the hood", the content of native strings must still be translatable to bytes via the Latin-1 encoding!

So either case will enter implementation-defined behaviour and not be interoperable.

@tilgovi
Copy link
Collaborator

tilgovi commented Aug 24, 2015

👍 to this change

return value
if not isinstance(value, text_type):
raise TypeError('%r is not a string' % value)
return value.encode("latin1")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

latin1 -> latin-1. latin1 is an alias of latin-1.

@berkerpeksag
Copy link
Collaborator

Good catch, thanks! Could you please squash the commits?

@benoitc benoitc modified the milestone: R19.4 Aug 25, 2015
@ephes
Copy link
Contributor Author

ephes commented Aug 29, 2015

Ok, squashed the commits :).

@@ -57,7 +57,7 @@ Commonly Used Arguments
Check the :ref:`faq` for ideas on tuning this parameter.
* ``-k WORKERCLASS, --worker-class=WORKERCLASS`` - The type of worker process
to run. You'll definitely want to read the production page for the
implications of this parameter. You can set this to ``$(NAME)``
implications of this parameter. You can set this to ``egg:gunicorn#$(NAME)``
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change shouldn't be here :) See the original commit: 8de5eb9

In general, the patch LGTM except this, but I can take care of it if you don't have time.

Thanks!

@ephes
Copy link
Contributor Author

ephes commented Aug 29, 2015

Yup, this line was a leftover from an unintentional merge :/. Thanks for pointing it out - it's now removed.

berkerpeksag added a commit that referenced this pull request Aug 31, 2015
ISO-8859-1 encoded http headers
@berkerpeksag berkerpeksag merged commit 9c1d442 into benoitc:master Aug 31, 2015
@berkerpeksag
Copy link
Collaborator

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants