-
Notifications
You must be signed in to change notification settings - Fork 62
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Preserve header casing #103
Preserve header casing #103
Conversation
Codecov Report
@@ Coverage Diff @@
## master #103 +/- ##
==========================================
+ Coverage 99.14% 99.15% +0.01%
==========================================
Files 21 21
Lines 937 950 +13
Branches 173 173
==========================================
+ Hits 929 942 +13
Misses 7 7
Partials 1 1
Continue to review full report at Codecov.
|
h11/_headers.py
Outdated
def __eq__(self, other): | ||
return list(self) == other | ||
|
||
def raw(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious about this. It seems it would return a list of the triplet where elsewhere raw
implies the name that came over the wire. This should be clearer about what it is returning
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also wonder about the wisdom of having a headers sequence without structured single-header data
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also wonder about the wisdom of having a headers sequence without structured single-header data
I'm not quite sure what you mean here. Are you saying "If you're returning a three-tuple from this interface then let's have it use a named-tuple" or something else?
Perhaps a marginally different interface for us to expose here would not be .raw() -> (<lowercase name>, <raw name>, <value>)
, but instead expose just .raw_items() -> (<raw name>, <value>)
Perhaps that'd address the naming/intent slightly better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking of having a class that encapsulates a Header
and then having the collection of Headers
use that. A namedtuple could work fine as well. All that said, I get that tuples may be a smidge faster and that these are internal implementation details. Speaking from having worked on header collection objects in urllib3 in the past, these tuples can drive maintainers to pull out their hair (as well as future folks trying to update/extend the behaviour).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, for something that remains compatible with the existing API I think the options here are...
- A custom
Headers
sequence, that exposes the extra information in a.raw_items()
interface or similar, that returns a two-tuple of(case-sensitive-name, value)
for usages that require the raw casing info. - A custom
Headers
sequence, the returnsHeader
instances, that can iterate as two-tuples, but also expose.name
,.case_sensitive_name
and.value
attributes, which are available for usages that require the raw casing info.
Or some variation on those. (Eg. this PR which currently has .raw()
returning the three-tuple of info.)
Personally I'm fairly agnostic, as both the above options seem reasonable enough. The Header
case has the most extra overhead, since it creates and accesses a per-header instance rather than the plain tuple, while I wouldn't expect the .raw_items()
approach to introduce anything really noticeable, but I could run through some timings on each of the options to help better inform our options.
Just as a comment or maybe for additional inspiration: |
Okay, based on the feedback here's a slight retake on this, which I think exposes a neater interface and has a pretty minimal change footprint. After this pull request...
Internally...
|
What do you get from |
Ah thanks I'd just been looking into that, but I'd not seen the benchmark. Just as a rough guideline, I've taken the following to get ideas of comparative performances of the different approaches here... import h11
import timeit
def send_request():
conn = h11.Connection(our_role=h11.CLIENT)
headers = [
(b'Accept', b'*/*'),
(b'Accept-Encoding', b'gzip, deflate'),
(b'Connection', b'keep-alive'),
(b'Host', b'www.example.org'),
(b'User-Agent', b'HTTPie/2.2.0'),
]
request = h11.Request(method="GET", target="/", headers=headers)
bytes_to_send = conn.send(request)
print(timeit.timeit(send_request, number=100000)) Which comes out with...
I'll take a look at the proper benchmarks now... |
Existing...
|
The benchmark I cited isn't terribly clever, but it does exercise a full request/response cycle with some realistic headers. |
Closing this off in favour of #104 |
A proposal for preserving header casing information in both directions, while still exposing lower-case only to users by default, and keeping careful type separation.
Essentially a take on @Lukasa's comment here #31 (comment) and @njsmith's follow up...
Here's how it looks to the end-user...
h11
now preserves header casing on sending headers..headers
becomes aSequence
type, rather than a raw list. Iterating over the sequence continues to return(<lowercased-name>, <value>)
pairs..headers.raw()
is available for usages such as console or debug output that require original casing information, and returns a list of(<lowercased-name>, <raw-name>, <value>)
three-tuples.I've addressed this commit-by-commit, which should help make the approach I've taken here clear...
Sequence
type, rather than a raw mutable list, but continues to store exactly the same information.Headers
type, but don't use it anywhere.get_comma_header
,set_comma_header
. Both functions continue to be case insensitive in their effects, but it will matter in theset_comma_header
case because it'll give us nice header casing on the over-the-wire bytes. Switching both over within the codebase for consistency.Strictly speaking there is an API change here, in that
.headers
on events are now sequences, rather than plain lists. Any user code that is doing grungy stuff by mutating that data-structure in-place, wouldn't function after this. But that's a bit of a hacky broken thing to be doing anyway, so a version bump that tightened up the API spec into ".headers
is an immutable sequence" seems reasonable enough right?Anyway's putting this out there, so we've got something to discuss. 🤔
Thanks so much for maintaining such a fantastically careful & thoroughly designed library. It's a joy to work with. ✨