-
Notifications
You must be signed in to change notification settings - Fork 3.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
.Net: HttpClients are hold as captive dependencies / singletons #4400
Comments
Hi @gingters ! It looks like it's still recommended to use singleton Please let me know if I'm missing something. Thanks a lot! |
I ran into that issue with the Chat Copilot app, which uses the default way of registration, in docker, when my notebook switched networks (from wired to wifi, and back). Seems from the point of view of the message handler the tcp connection still seems active (for whatever reason), but the outgoing requests didn't made it through the other network. Also not after 2 minutes. It always ended up in timeouts. Just to confirm it wasn't an issue with my docker/container environment I shelled in the container and started a new instance of the api on another port, and that other instance could connect to the Open AI api without a problem. When developing and debugging locally, that a) usually seldom happens and b) isn't a big deal there. But I thought a connection hiccup when running SK in production (i.e. some changes to a docker network where the backend service runs) etc. and the connections not recovering at all might become a problem, so I reported this. |
@gingters Thank you for reporting this issue! Maybe it's worth to report similar issue in Chat Copilot repo to see, if something can be improved to handle such scenario, for example change a way how Tagging @stephentoub to double-check if there is something that can be improved for Thanks again! |
Hi @gingters, today we can't use IHttpClientFactory because the library SK uses to access both OpenAI and AzureOpenAI services does not support it. So, for now, SK consumer code has a few options:
These options should be sufficient for now until either the library supports IHttpClientFactory or .NET fixes HttpClient so that "it just works" regardless of how long you keep its instance. For more details, see the Consider updating HttpClientFactory... issue. |
I’m closing the issue since there’s not much we can do at the moment except those options listed above. |
This is a follow-up to #790
Describe the bug
Semantic Kernel captures instances of the
HttpClient
class in singletons.See https://github.com/microsoft/semantic-kernel/blob/main/dotnet/src/Connectors/Connectors.OpenAI/OpenAIServiceCollectionExtensions.cs#L55-L59
This will become a massive problem when used in long-running services.
The reason is, that the
HttpClient
holds an internal instance of aHttpMessageHandler
. TheHttpMessageHandler
is not designed to be used for a prolonged amount of time. In .NET in theHttpClientFactory
instances ofHttpMessageHandles
are pooled and re-used for transient(!) usages ofHttpClient
instances - with a default maximum lifetime of only 2 minutes for the inner handler.The reason is, that a TCP connection to a server under some circumstances be held open forever. Even if the HTTP transport is not working anymore.
If, for some reason, the DNS information for the configured endpoint changes, or the connection goes bad (i.e. terminates at a proxy and something changes behind the proxy), the
HttpMessageHandler
instance is not able to recover from that.As such, the instance of the
HttpClient
that uses this inner handler is rendered unusable for the rest of its lifetime. As a singleton this is for the lifetime of the process.There is no way to recover from such a problem other than restarting the process.
To Reproduce
Use a LoadBalancer in your network (i.e. Big5). "Hide" at least two Azure Open AI services (or the Open AI API) behind this loadbalancer. Make one of these instance active and the other inactive.
Set up any project with Semantic Kernel and an Open AI connector and point it to the loadbalancer/proxy instead of the direct endpoint.
While the project runs, switch the LB to use the configured second instance and switch the first one to inactive.
All existing connections to the LB will still be routed to the now inactive first instance and these TCP connections are now broken until a new TCP connection is established (which will be balanced to the second instance).
The SK project will not be able to recover from that and cannot execute calls to the LLM anymore.
You will have to stop and restart the process hosting the Kernel to recover from that.
Expected behavior
For each call, a new instance of an
HttpClient
should be retrieved from anIHttpClientFactory
instead of capturing anHttpClient
in a singleton.In the scenario above, this will be able to automatically recover from such an error at max after 2 minutes when the old broken inner
HttpMessageHandler
instances are discarded and new Handlers are created.Platform
The text was updated successfully, but these errors were encountered: