Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Makes request headers lowercase in English locale. #29419

Closed
wants to merge 2 commits into from
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -434,10 +434,10 @@ private void setHeaders(HttpRequest httpRequest, Header[] requestHeaders) {
for (Header requestHeader : requestHeaders) {
Objects.requireNonNull(requestHeader, "request header must not be null");
httpRequest.addHeader(requestHeader);
requestNames.add(requestHeader.getName());
requestNames.add(requestHeader.getName().toLowerCase(Locale.ENGLISH));
}
for (Header defaultHeader : defaultHeaders) {
if (requestNames.contains(defaultHeader.getName()) == false) {
if (requestNames.contains(defaultHeader.getName().toLowerCase(Locale.ENGLISH)) == false) {
httpRequest.addHeader(defaultHeader);
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adityasrini
Rather than doing the two String.toLowerCase(Locale.ENGLISH) which requires 2 changes, you should replace the new HashMap() with a new TreeMap(String.CASE_INSENSITIVE_ORDER).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't necessarily agree on this. We don't need a sorted map here, we just need to make sure that keys are case insensitive, which is one of the properties of treemap when used in this way, but we don't need all of its other properties which also affect that data structure that's internally used to store entries. Makes sense?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@javanna

Sorry my suggestion should have been to replace the new HashSet() with new TreeSet(String.CASE_INSENSITIVE_ORDER) at line 433 , and remove the 2x toLowerCase(Locale.ENGLISH) additions, obviously mentioning TreeMap was nonsense.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't necessarily agree with this either, we would be introducing a sorted data structure where we don't need ordering, but only case-insensitive lookups. Not sure it is a good trade-off.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@javanna

we would be introducing a sorted data structure where we don't need ordering, but only case-insensitive lookups.

The internal structure of treeset doesnt matter, any more than the internals of hashset matter.

Hashsets also sort/arrange their keys, it might not be alphabetically, but the entire buckets thing also has its own system sorting system when it allocates keys into a chain of buckets, but who cares.

All that matters is the *set allows us to determine if some key already exists in it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe in this case it won't make a huge difference, but HashSet is backed by a HashMap while TreeSet is back by a TreeMap. TreeMap is implemented using a red-black tree, which is very different compared to HashMap, the reason being that the former will have to re-balance the tree to make it possible to iterate through entries in their natural ordering. We are using a set here though just to quickly check if it already contains something, and we never iterate through it. Conceptually, I still like more the two calls to lowercase than using a red-black tree just because treeset allows to make strings case-insensitive. Hopefully I explained what I meant. It would be interesting to measure which of the two solutions is faster, not sure it's worth it in this case given that this is probably not a hotspot.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TreeMap is implemented using a red-black tree, which is very different compared to HashMap,

Today TreeMap is a red black tree, tomorrow it might change, it doesnt matter, all we care about is that it keeps the contract that allows case insensitive look ups of keys. We can be sure they are close enough to the same speed, the jdk collection guys know this important they arent going to make one 10x slower than the other.

A HashMap turns into a tree when it gets really full anyway, which basicaly means the jdk guys dont care why should you ?

https://stackoverflow.com/questions/30164087/how-does-java-8s-hashmap-degenerate-to-balanced-trees-when-many-keys-have-the-s

The implementation notes comment in HashMap is a better description of HashMap's operation than I could write myself. The relevant parts for understanding the tree nodes and their ordering are:

This map usually acts as a binned (bucketed) hash table, but when bins get too large, they are transformed into bins of TreeNodes, each structured similarly to those in java.util.TreeMap. [...] Bins of TreeNodes may be traversed and used like any others, but additionally support faster lookup when overpopulated. [...]

which is very different compared to HashMap, the reason being that the former will have to re-balance the tree to make it possible to iterate through entries in their natural ordering

And a hashmap also has to rebalance when it hits some threshold. To solve that we simply call the ctor that takes initial size so in both case the rebalance never happens.

We are using a set here though just to quickly check if it already contains something, and we never iterate through it.

Thats right so why talk about something that never happens.

It would be interesting to measure which of the two solutions is faster, not sure it's worth it in this case given that this is probably not a hotspot

Except that enough of this is, because HM and TM are used everywhere and will be compiled by hotspot. If one is compiled so will be the other and vice versa.

Why ? This path takes 100s/100s of cycles vs billions for a complete request, it doesnt matter. Even if one is 2x or 10x sloewr the total request time will be basically the same.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to keeping a hash set. Moving to a tree set would make operations perform in O(log(size)) rather than constant time. Even if this set would usually be small so that it wouldn't matter, we have seen in the past that users are sometimes very creative when it comes to pushing the system to its boundaries.

}
Expand Down