ISO-8859-1 encoded http headers #1102

ephes · 2015-08-22T15:50:54Z

Hi,

gunicorn uses utf8 encoding for http response headers. I don't know
much about http standards, but this is probably not correct:

http://stackoverflow.com/questions/4400678/http-header-should-use-what-character-encoding

best regards,
Jochen

benoitc · 2015-08-22T17:33:44Z

The tests don't pass. unittest.mock doesn't pass. Can you fix it?

Also reading the new RFC 7230:

Historically, HTTP has allowed field content with text in the
ISO-8859-1 charset [ISO-8859-1], supporting other charsets only
through use of [RFC2047] encoding. In practice, most HTTP header
field values use only a subset of the US-ASCII charset [USASCII].
Newly defined header fields SHOULD limit their field values to
US-ASCII octets. A recipient SHOULD treat other octets in field
content (obs-text) as opaque data.

Not sure how to handle opaque data there. Also, if we include such a change, we should test whether it works with values given as unicode (which happens sometimes in some countries...). Maybe we should have more tests there. Ideally we shouldn't transform anything there.

Note: we are converting here due to the silly way bytes and native strings have been managed between py2 and py3.
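
For illustration, here is a minimal sketch of one way to carry such octets through untouched, assuming Python 3: Latin-1 maps every byte value to exactly one code point, so decoding and re-encoding with it is lossless. The function names `as_native_str` and `as_wire_bytes` are hypothetical and not part of gunicorn.

```python
def as_native_str(raw_value: bytes) -> str:
    """Expose raw header octets as a native string without interpreting them."""
    return raw_value.decode("latin-1")


def as_wire_bytes(native_value: str) -> bytes:
    """Turn a native string back into the octets that go on the wire."""
    return native_value.encode("latin-1")


# Octets that happen to be UTF-8 (here the two bytes of U+010D) are not
# interpreted; they come back byte-for-byte after the round trip.
utf8_octets = "\u010d".encode("utf-8")
header = b"attachment; filename=" + utf8_octets
assert as_wire_bytes(as_native_str(header)) == header
```

The round trip only holds in this direction: a native string containing code points above U+00FF cannot be encoded with Latin-1 at all.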

jamadden · 2015-08-22T18:04:36Z

Also, if we include such a change, we should test whether it works with values given as unicode (which happens sometimes in some countries...).

Of course, according to the WSGI spec, that's not supposed to happen in Python 2. Headers are specified to be given as the "native string type", so they should already be bytes and applications that send unicode values are in non-compliance with the spec (I've seen middleware break due to a buggy application that had a unicode header value). Likewise under Python 3 (where the native string type is unicode) including non-latin-1-encodable data is also out of compliance with the spec, the HTTP spec this time, as well as the WSGI spec:

Do not be confused however: even if Python's str type is actually Unicode "under the hood", the content of native strings must still be translatable to bytes via the Latin-1 encoding!

So either case will enter implementation-defined behaviour and not be interoperable.
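
To make the Python 3 case concrete, here is a small sketch that checks whether a native string can be translated to bytes via Latin-1. The helper name `is_latin1_encodable` is an assumption made for this example; it is not a gunicorn or WSGI API.

```python
def is_latin1_encodable(value: str) -> bool:
    """Return True if every character of value fits in a single Latin-1 byte."""
    try:
        value.encode("latin-1")
    except UnicodeEncodeError:
        return False
    return True


# U+00E9 (e with acute accent) lies inside the Latin-1 range.
print(is_latin1_encodable("caf\u00e9"))
# U+010D (c with caron) lies outside it, so encoding raises UnicodeEncodeError.
print(is_latin1_encodable("\u010desky"))
```

An application could run such a check before handing a header value to the server, instead of relying on implementation-defined behaviour.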

tilgovi · 2015-08-24T19:05:32Z

👍 to this change

berkerpeksag · 2015-08-25T00:07:43Z

gunicorn/util.py

+        return value
+    if not isinstance(value, text_type):
+        raise TypeError('%r is not a string' % value)
+    return value.encode("latin1")


latin1 -> latin-1. latin1 is an alias of latin-1.
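
A sketch of how a helper like the one in this hunk might look as a whole, assuming Python 3 (so `text_type` is `str`). The hunk does not show the condition before the first `return value`; passing bytes through unchanged is an assumption here, and the name `encode_header_value` is hypothetical.

```python
import codecs


def encode_header_value(value):
    """Return a header value as bytes, encoding text with Latin-1."""
    if isinstance(value, bytes):
        # Assumed behaviour: bytes are already wire-ready and pass through.
        return value
    if not isinstance(value, str):
        raise TypeError('%r is not a string' % value)
    return value.encode("latin-1")


# Both spellings resolve to the same codec, so the rename is cosmetic.
assert codecs.lookup("latin1").name == codecs.lookup("latin-1").name
```

With this sketch, `encode_header_value("text/html")` yields bytes, while a value such as an integer raises `TypeError`.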

berkerpeksag · 2015-08-25T00:11:15Z

Good catch, thanks! Could you please squash the commits?

ephes · 2015-08-29T08:29:31Z

Ok, squashed the commits :).

berkerpeksag · 2015-08-29T08:40:23Z

docs/source/run.rst

@@ -57,7 +57,7 @@ Commonly Used Arguments
  Check the :ref:`faq` for ideas on tuning this parameter.
 * ``-k WORKERCLASS, --worker-class=WORKERCLASS`` - The type of worker process
  to run. You'll definitely want to read the production page for the
-  implications of this parameter. You can set this to ``$(NAME)``
+  implications of this parameter. You can set this to ``egg:gunicorn#$(NAME)``


This change shouldn't be here :) See the original commit: 8de5eb9

In general, the patch LGTM except this, but I can take care of it if you don't have time.

Thanks!

ephes · 2015-08-29T09:21:45Z

Yup, this line was a leftover from an unintentional merge :/. Thanks for pointing it out - it's now removed.

berkerpeksag · 2015-08-31T03:55:11Z

Thanks!

berkerpeksag reviewed Aug 25, 2015

benoitc modified the milestone: R19.4 Aug 25, 2015

ephes force-pushed the latin1_headers branch from 98346bf to 71eeb75 Compare August 29, 2015 08:26

berkerpeksag reviewed Aug 29, 2015

encode http headers as latin1 RFC 2616

338721a

ephes force-pushed the latin1_headers branch from 71eeb75 to 338721a Compare August 29, 2015 09:17

berkerpeksag added a commit that referenced this pull request Aug 31, 2015

Merge pull request #1102 from ephes/latin1_headers

9c1d442

berkerpeksag merged commit 9c1d442 into benoitc:master Aug 31, 2015

jamadden mentioned this pull request Sep 4, 2015

UnicodeEncodeError: 'latin-1' codec can't encode character u'\u010d' in position 19: ordinal not in range(256) gevent/gevent#614

Closed

benoitc mentioned this pull request Nov 23, 2015

UnicodeEncodeError on python3 #1151

Closed

jamadden mentioned this pull request Sep 21, 2016

wsgi.py > send_headers: encoding problem. #1353

Closed

mjjbell pushed a commit to mjjbell/gunicorn that referenced this pull request Mar 16, 2018

Merge pull request benoitc#1102 from ephes/latin1_headers

4557266
