Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New HttpClient API design #22505

Closed
galvesribeiro opened this issue Jun 26, 2017 · 42 comments
Closed

New HttpClient API design #22505

galvesribeiro opened this issue Jun 26, 2017 · 42 comments
Assignees
Labels
area-System.Net.Http design-discussion Ongoing discussion about design without consensus enhancement Product code improvement that does NOT require public API changes/additions needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration
Milestone

Comments

@galvesribeiro
Copy link
Member

Following the discussion here https://github.com/dotnet/designs/issues/9 there are several issues with current HttpClient implementation. The idea with this issue is to discuss the design of a proposed new implementation and API surface for HttpClient as part of the original issue.

Some points discussed on that issue:

  1. There are 2 different implementations of HttpClient. One on Windows (based on WinHttp) and other on *NIX world (based on libcurl)
  2. It does not release all resources right away as most of people expect by a class implementing IDisposable (the TCP socket still leaking for a while by default for 240s due to TIME_WAIT state which makes you run out of available socket connections under high load)
  3. There could be a internal pool similar to what exist in SqlConnection. However, it would probably hit worse issues with DNS cache, since it is managed by the OS networking stack and this implementation try to be 100% managed on top of the new Foundation layer proposed on the previous issue.
@stephentoub
Copy link
Member

stephentoub commented Jun 26, 2017

cc: @davidsh, @CIPop, @geoffkizer

@Drawaes
Copy link
Contributor

Drawaes commented Jun 26, 2017

My opinion for what it's worth is. Leave httpclient alone aside from the fully managed awesomeness that is already in progress (mostly solves 1.)

However there should be a second class that shares the guts (sockets, http parsing etc) with http client but is designed for out of the box service to service or high throughput.

Http client behaves like a client-side lob out of the box (connections per server at low numbers, holding sockets etc) and by trying to make a single class to rule them all I feel the API gets restricted.

FYI I think this is the same issue that SSLStream struggles with (clients have to have long term backwards compatibility, lower resource use etc)

@galvesribeiro
Copy link
Member Author

galvesribeiro commented Jun 26, 2017

Good idea... Something like HttpChannel, which essentially use external HTTP protocol implementation (to enable things like HTTP/2, WebSocket, WebRTC), and is 1 per socket. In other hand, HttpClient use it, and can keep the pool of channels and automagically manages it.

However, there still the DNS issue which is something out of managed world... How to deal with that?

@Drawaes
Copy link
Contributor

Drawaes commented Jun 26, 2017

That is exactly what I am thinking, because in a server environment it would be much better to be able to get lower and control the connections (this could lead to a "server focused" pool). The DNS problem, is tough it's also a security consideration, if you look at the motions Java has gone through there are issues like DNS poisoning etc that can cause reliability and DOS attacks.

@Tratcher
Copy link
Member

The proposals to create a pool of HttpClient instances are non-nonsensical, HttpClient already implements pooling internally, you'd get the safe effect with a single shared instance.

It does sound like there are DNS issues to address, but frequently closing connections (i.e. always using new instances of HttpClient) is over aggressive. We need a way to periodically refresh DNS and limit the lifetime of connections after we detect DNS changes for that host.

@galvesribeiro
Copy link
Member Author

galvesribeiro commented Jun 26, 2017

frequently closing connections (i.e. always using new instances of HttpClient) is over aggressive

Why? If you are in a server-to-server connection, like @Drawaes alluded, it would be interesting to give the service developer the ability to manage the connection lifecycle.

@NickCraver
Copy link
Member

We need a way to periodically refresh DNS and limit the lifetime of connections after we detect DNS changes for that host.

Almost like some sort of TTL... 😁

Seriously though, if we can cache internally the DNS records for their actual TTLs we'd get the expected behavior, at least from my point of view. I haven't reviewed the managed plans that deeply, is this even a possibility or are we only getting the result (and no TTL) in those instances?

@Drawaes
Copy link
Contributor

Drawaes commented Jun 27, 2017

@Tratcher I am not really seeing the downside to exposing the individual connections that make up a pool. I hadn't thought of it from that direction but instead providing a second pooling provider that is more "server" focused, but actually now I think about it exposing the individual connections allows for more scenarios to be covered by "power" users.

@davidfowl I can't find it right now, but this is not too dissimilar to an idea you were commenting on a while back about a "kestrel client" similar to other frameworks provide

@Tratcher
Copy link
Member

@galvesribeiro connection management is one of the hardest problems in the client stack, offloading that to the user isn't doing them many favors. server-to-server can use a somewhat simplified model, but even then you're not going to want to restrict it to a single connection bottleneck so you still need to track and assign connections. Doing this poorly can severely compromise the performance of your application.

There's also quite a bit of state that gets shared across connections in the client pool. E.g. cookies, cache, dns, auth, tls sessions, etc.. Managing that sharing would also get cumbersome.

@Drawaes
Copy link
Contributor

Drawaes commented Jun 27, 2017

Agreed on the state. But the problem for service 2 service often is the state. In a "microservice" environment you often have services calling others as driven from one incoming request.

I agree it is tough to write this by yourself so then I think a "shared primitive connection" with the socket and parsing code that is used in the httpclient and another "service" based http client would be ideal.

@Tratcher
Copy link
Member

using (var client = new HttpClient()) { ... } is one of the most frequent and impactful mistakes I see made. Developers have been drilled into doing this for all IDisposable's, but they fail to appreciate the complexity of the resources being managed by that client and the consequences for churning them. The docs have helped explain the issue after the fact, but the same mistakes continue to be made.

Proposal: Make a central static HttpMessageHandler and change HttpClient's default constructor to use it rather than newing one up. After all, it's the handler that owns the pool, not the HttpClient instance. This handler would only have settings and defaults appropriate to the shared scenario, and would no-op in its Dispose method. This static handler could internally be implemented using any of the existing handler stacks, or the new managed one.

Developers could customize all of the settings on their HttpClient instance without impacting other parts of the app (e.g. timeouts, default headers, base address). If you need to insert middleware then you wrap it around the static handler rather than a new instance. If you need to customize handler settings for a specific section of your app then you new up your own handler and client as you would today.

@Drawaes
Copy link
Contributor

Drawaes commented Jun 27, 2017

I agree with the above approach. I think maybe what might be useful here is to provide use cases. Then they can be assessed as to their commonality as well as the ability to modify the current API, or other solutions for them rather than talking in the abstract. So @Tratcher 's comment above is perfect in that respect.

I have the use case of a front end service (servicing a JS UI) that calls a number of backend services. This is a common pattern I have seen, but I am not saying this is how everyone does it. We would like to limit the outbound connections calling backend services, say to 500 connections, but there are 5 different services we call. With the current HttpClient we only have the option of making the connection per server limit 100, and not being able to service 500 calls if all the calls are to one of the services, or 500 and risk hitting 2500 connections.

@PureKrome
Copy link
Contributor

using (var client = new HttpClient()) { ... } is one of the most frequent and impactful mistakes I see made. Developers have been drilled into doing this for all IDisposable's, but they fail to appreciate the complexity of the resources being managed by that client and the consequences for churning them.

I think these two sentences are so smack bang on the mark. I don't have any suggested solutions to all of this but I can fully, 100% attest to what @Tratcher summed up.

When I read that (now famous) DotNet Monsters article, my 'WTF' could probably be heard across the entire country :( That said, I felt way better when every person I asked, they to didn't know about this and we all thought we were 'smart' cause we were putting this in a using statement and being nice-and-good little girls and boys.

I see commentary along the lines of "docs, more docs, better docs, etc" to help educate and facilitate this issue, but I just don't feel like that's ... gonna really help, etc.

So yeah -> totally 100% super-strongly agree with @Tratcher but I just don't have a suggestion to help make it better. 🤔 But this issue resonates really strongly with me and some other peers.

@galvesribeiro
Copy link
Member Author

We all know the side effectos od the using() on HttpClient, so, why keep it implementing IDisposable in the first place?

Also, statics... I've see terrible scenarios that were always driven by having a shared thing statically... We had an huge refactory on Orleans recently just to avoid a single static client class... I would rather avoid that... Other scenarios will eventually come, and the evolution of HttpClient would be easier if we use proper abstractions.

State sharing is always a pain, I agree. But making it static will only make the false sensation of safety.

I would stick with @Drawaes original idea and have a lower level type for server-to-server comms, and a more user friendly client type. Having both in the same type will always limit the possibilities.

@poke
Copy link
Contributor

poke commented Jun 27, 2017

We all know the side effects of the using() on HttpClient, so, why keep it implementing IDisposable in the first place?

Because IDisposable isn’t about an object supposedly being short-lived at all. It’s about whether an object has unmanaged resources that should be disposed. And if HttpClient has that (as it looks like), it makes perfect sense to implement IDisposable.

It’s just that IDisposable often implies that one should only keep the object open for a short time, although that’s not actually stated by the contract.

@galvesribeiro
Copy link
Member Author

@poke and that is the point... Short-lived or not, the user assume that they should call Dispose() once they are done with that object. using() is just a syntax-sugar to define scope and call it automatically.

Anyway more a philosophical comment. :)

@olviko
Copy link

olviko commented Jun 27, 2017

@stephentoub

That could of course be done. But that alone is an incomplete solution... how does someone release the resources in the pool explicitly? How does someone express isolation of pools? Etc.

SqlConnection has a limited (clumsy?) way to do that through connection string flags. ServicePoint is just plain sucks.

Just let us implement/instantiate DNS and TCP connection managers and then pass them into HTTP client or HTTP request through an overloaded ctor. Default to global singletons otherwise.

@olviko
Copy link

olviko commented Jun 27, 2017

using (var client = new HttpClient()) { ... } is one of the most frequent and impactful mistakes I see made. Developers have been drilled into doing this for all IDisposable's, but they fail to appreciate the complexity of the resources being managed by that client and the consequences for churning them. The docs have helped explain the issue after the fact, but the same mistakes continue to be made.

The problem with HttpClient is that it is doing way too many things internally instead of delegating them to other classes. It should not be responsible for managing DNS cache or TCP connection pools.

@darrelmiller
Copy link

@olviko HttpClient is an extremely lightweight class. It does some management of cancellation tokens, some header setting and buffer draining, but other than that it passes everything on to HttpClientHandler or WebRequestHandler. In turn HttpClientHandler allows ServicePointManager to do all the connection management and HttpWebRequest does the actual request sending.

The IDisposable thing is unfortunate but it's there to clean up cancellation tokens and request/response bodies that exist during the lifetime of sending a request or returning a response. When not in the process of making a request, the HttpClient doesn't hold onto any unmanaged resources.

@darrelmiller
Copy link

@Tratcher We tried to convince the HttpClient team years ago to make the HttpClientHandler somehow globally changeable but we were unsuccessful at the time. I think it would open up a whole range of useful scenarios.

@darrelmiller
Copy link

@davidfowl HttpClient was designed to long living for numerous reasons unrelated to connection management. HttpClient has a BaseAddress property, a DefaultRequestHeaders collection, and a pipeline of message handlers. Having to set these up on every request would be a waste of time and would push people to create factory classes, defeating the purpose of having HttpClient as a class.

Setting up a single HttpClient instance for each target API that you want to call is ideal because it allows setting up all the common behavior once and ensures that a connection group is managed just for that HttpClient.

The benefits of this design are more obvious when building desktop and mobile apps, but I do believe there is still some value to the model in a server environment.

@Drawaes
Copy link
Contributor

Drawaes commented Jul 7, 2017

But doesn't service point apply to every http handler? Eg max server connections is 1 property for every http handler, so in the server world, if I wanted to say "max connections to server for api x should be 5" and "max connections to server for api y should be 20" can I do that today? (It's possible I am missing something)

@mconnew
Copy link
Member

mconnew commented Jul 14, 2017

There is a scenario I ran into with an internal customer of WCF using an HTTP binding which would be impossible to provide a fix for if we had a global singleton HttpClient instance. The scenario was a service was running on multiple hosts behind a load balancer. One of the hosts ended up with a bad deployment of their service code and was returning 500 responses for every request. A mid-tier server had a pool of connections to the load balancer and one of those connections was established to the failed server. The failed server always returned a valid HTTP response so it wouldn't get dropped by the connection pooling as technically the response was a valid response at the protocol level. On top of that, as the response was very fast, that connection would get used a lot so most requests would fail. The solution was to provide a way to abandon the connection pool (this was using HttpWebRequest so I made a mechanism where you could instruct WCF to use a different ConnectionGroupName so effectively abandoning the old pool and starting a new one). The bad server had been detected and removed from the load balancer, but the load balancer doesn't close existing connections to a server removed from the pool. Switching to a new connection pool allowed moving away from the bad server. If there is a single instance of HttpClient / HttpClientHandler, then you severely restrict your ability to control your connections.

@Drawaes
Copy link
Contributor

Drawaes commented Jul 14, 2017 via email

@Drawaes
Copy link
Contributor

Drawaes commented Jul 14, 2017

I do wonder if there is some cross over between wanting an "http parser" and control the connections on the client side with the discussion over at
Serverside Primatives

Because at the end of the day, a low level exposed http parser and some connection primitives (sockets etc) seem to be the requirement on both sides.

@mconnew
Copy link
Member

mconnew commented Jul 17, 2017

@Drawaes, using HttpClient directly, the 500's from behind a load balancer are easy to deal with. Dispose the old HttpClient and instantiate a new one. There is an overload of the HttpClient ctor which takes an HttpClientHandler and a bool to say whether the HttpClientHandler should be disposed. With this, you can have a static instance of your HttpClientHandler and create and Dispose of HttpClient without upsetting your connection pool.

@Drawaes
Copy link
Contributor

Drawaes commented Jul 17, 2017

does that kill the single connection in the connection pool?

@mconnew
Copy link
Member

mconnew commented Jul 17, 2017

If you specify false on the constructor, it won't kill the connection pool in HttpClientHandler.

@mattjohnsonpint
Copy link
Contributor

mattjohnsonpint commented Aug 3, 2017

Another thing that would be worth addressing in any redesign, is that if one wants to authenticate with a client certificate, it must be done in the HttpClientHandler, and thus cannot be changed on a per-request basis.

Most of the time this is fine, as a client cert is intended to represent the client. However there are some cases where one may need to use different client certs for different requests - even to the same destination. Currently this requires at least one HttpClient instance per certificate.

To figure out which instance to use, one may need a Dictionary or ConcurrentDictionary to manage their instances. 😕

It would be easier if one could attach a certificate directly to a HttpRequestMessage.

@aL3891
Copy link

aL3891 commented Aug 11, 2017

Id love to see httpclient work on a model similar to the new transport/protocols model on the server side (https://github.com/aspnet/KestrelHttpServer/issues/1980)

The way I see it, Httpclient should only deal with things like remembering and passing along cookies, http parsing should be done by a layer below that, and the transport/connection management should be done by a layer below that

@bryanjhogan
Copy link

@darrelmiller

Setting up a single HttpClient instance for each target API that you want to call is ideal because it allows setting up all the common behavior once and ensures that a connection group is managed just for that HttpClient.

I have hit this problem a few times when developing Web API apps that called other web service.

When using Ninject and Castlewindsor it was possible to create multiple HttpClients at startup, configure them and pass the appropriate HttpClient into the right controllers.

But with the inbuilt DI in .NET Core it doesn't seem possible to do this.

I think this is a common use case.

@John0King
Copy link

Current HttpClient might bad for concurrent , it use lock before sending the request , and the reason is that it's instance behaviour .BaseAddress and .DefaultRequestHeader is not thread safe for static behaviour . I don't think an instance act like static is good .

and for the solution : use an static instance . the problem is there no way to share the instance across various package , ( imagine each package use one HttpClient , 200 packages will use 200 HttpClient and all of them not disposable and not for sure about the memory usage )

So , I think the new HttpClient should be static and provide a non-static disposable class over those static method for allowing us use .BaseAddress and .DefaultRequestHeader and do not need thread safe

@bklooste
Copy link

bklooste commented Dec 13, 2017

I don't think the lock is an issue these are extremely short lived. Static public Httpclient would be evil ( and you would need many more locks) ( a static internal tcp connection pool used by httpclient is a different matter) . I agree with @Drawaes above leave it and create a new high performance client that is harder to use with a construct which accepts a tcppool etc.

Worth noting quite a few people with Rust hate the new high performance tokio as its much harder to use.

@darrelmiller
Copy link

@bryanjhogan Why can't you inject in HttpClient instances in ASP.NET DI ?

@John0King I don't see why the OperationStarted flag that is used to prevent you changing BaseAddress and DefaultRequestHeaders after making a request is bad for concurrent access? It doesn't affect the request/response processing. Changing BaseAddress/DefaultRequestHeaders per request defeats the point of the properties. If you don't want to use them, you don't have to.

And I don't think anyone was suggesting that using HttpClient as a static instance is a good idea. Having a shared instance per API is possible without using statics.

@John0King
Copy link

@darrelmiller

Having a shared instance per API is possible without using statics.

how ? How u handle the cookie / Credential / default headers ?

and for concurrent performance , I'm not very sure, I only see the souce code and it using lock (I think without lock may be better for concurrent ) . At some time ,I miss the old HttpWebRequest, it don't need to share the instance so it also do not need to lock.

@darrelmiller
Copy link

@John0King I don't see a lock here https://github.com/dotnet/corefx/blob/master/src/System.Net.Http/src/System/Net/Http/HttpClient.cs Can you show me where you are seeing a lock in the source?

DefaultHeaders can be stored in the HttpClient instance and Cookies and Certificates can be added to the HttpClientHandler.

@John0King
Copy link

@darrelmiller 😆 I must remember it incorrectly, I can't find it.

@msftgits msftgits transferred this issue from dotnet/corefx Jan 31, 2020
@msftgits msftgits added this to the Future milestone Jan 31, 2020
@dgioulakis
Copy link

dgioulakis commented May 12, 2020

Please consider exposing the backing HttpMessageHandler through the HttpClient API. If only in a limited capacity - perhaps through a slimmed down facade or interface. I understand the handler's settings are mostly immutable by point-after-creation, but things like modifying the CookieContainer's items within the handler are not. With IHttpClientFactory and .AddHttpClient() now being touted as best practice for managing clients, there is no clear way to manage cookies except per-request via headers. Sometimes you may want to add default cookies - but don't have them at point of creation. e.g. maybe you cache them somewhere

This can obviously be done during the factory delegate, but it's really hard for me to imagine pulling all of this customized code into the composition root of Programs.cs. There are plenty of reasons to want to encapsulate this logic in the application component using the HttpClient

At least, perhaps just exposing the CookieContainer just solely a getter.

Copy link
Contributor

Due to lack of recent activity, this issue has been marked as a candidate for backlog cleanup. It will be closed if no further activity occurs within 14 more days. Any new comment (by anyone, not necessarily the author) will undo this process.

This process is part of our issue cleanup automation.

@dotnet-policy-service dotnet-policy-service bot added backlog-cleanup-candidate An inactive issue that has been marked for automated closure. no-recent-activity labels Nov 11, 2024
@CarnaViire
Copy link
Member

  • big part of the issue and discussion seems quite outdated since 2017 (e.g. "There are 2 different implementations of HttpClient. One on Windows (based on WinHttp) and other on *NIX world (based on libcurl)" is not true anymore for a long time)
  • some other parts are covered by other existing issues (e.g. DNS resolution, per-session cookies, LLHTTP etc)

This really is a cleanup candidate, but we need to verify nothing important is left out (and if needed, create a separate issue scoped to the specific ask, or expand one of the existing issues)

I'm assigning it to myself temporarily for the investigation/triage purposes

@dotnet-policy-service dotnet-policy-service bot removed no-recent-activity backlog-cleanup-candidate An inactive issue that has been marked for automated closure. labels Nov 11, 2024
@CarnaViire CarnaViire self-assigned this Nov 11, 2024
@CarnaViire CarnaViire added the needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration label Nov 11, 2024
@galvesribeiro
Copy link
Member Author

galvesribeiro commented Nov 11, 2024

@CarnaViire hey!

Yes, I agree. This is from 2017 when things were different. With most of the things now on the managed world, and with the introduction of HttpClientFactory, I think we have most of the requests on this issue implemented.

The only one that I still haven't seen workarounds is the DNS cache management. That may still bite us in some scenarios where intentionally we don't want to rely on caches.

@CarnaViire
Copy link
Member

Solved issues:

  • Unify implementation -> SocketsHttpHandler, and HttpClientHandler uses SocketsHttpHandler underneath (where supported)
  • Connection pooling / lifetime management / reacting to DNS changes -> HttpClientFactory or SocketsHttpHandler with PooledConnectionLifetime + docs
  • Use custom DNS resolution -> ConnectCallback in SocketsHttpHandler
  • Resilience strategies (circuit breaker, rate limiter, hedging, etc) -> Microsoft.Extensions.Http.Resilience (integration with Polly)

Tracked issues:


Triage: I believe the list covers all the topics discussed in the issue. Closing this "mega-issue" in favor of the issues linked above.

@CarnaViire CarnaViire closed this as not planned Won't fix, can't repro, duplicate, stale Nov 13, 2024
@github-actions github-actions bot locked and limited conversation to collaborators Dec 13, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.Net.Http design-discussion Ongoing discussion about design without consensus enhancement Product code improvement that does NOT require public API changes/additions needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration
Projects
None yet
Development

No branches or pull requests