.Net: HttpClients are hold as captive dependencies / singletons #4400

gingters · 2023-12-22T10:33:51Z

This is a follow-up to #790

Describe the bug

Semantic Kernel captures instances of the HttpClient class in singletons.

See https://github.com/microsoft/semantic-kernel/blob/main/dotnet/src/Connectors/Connectors.OpenAI/OpenAIServiceCollectionExtensions.cs#L55-L59

This will become a massive problem when used in long-running services.
The reason is, that the HttpClient holds an internal instance of a HttpMessageHandler. The HttpMessageHandler is not designed to be used for a prolonged amount of time. In .NET in the HttpClientFactory instances of HttpMessageHandles are pooled and re-used for transient(!) usages of HttpClient instances - with a default maximum lifetime of only 2 minutes for the inner handler.

The reason is, that a TCP connection to a server under some circumstances be held open forever. Even if the HTTP transport is not working anymore.

If, for some reason, the DNS information for the configured endpoint changes, or the connection goes bad (i.e. terminates at a proxy and something changes behind the proxy), the HttpMessageHandler instance is not able to recover from that.
As such, the instance of the HttpClient that uses this inner handler is rendered unusable for the rest of its lifetime. As a singleton this is for the lifetime of the process.

There is no way to recover from such a problem other than restarting the process.

To Reproduce
Use a LoadBalancer in your network (i.e. Big5). "Hide" at least two Azure Open AI services (or the Open AI API) behind this loadbalancer. Make one of these instance active and the other inactive.

Set up any project with Semantic Kernel and an Open AI connector and point it to the loadbalancer/proxy instead of the direct endpoint.

While the project runs, switch the LB to use the configured second instance and switch the first one to inactive.
All existing connections to the LB will still be routed to the now inactive first instance and these TCP connections are now broken until a new TCP connection is established (which will be balanced to the second instance).

The SK project will not be able to recover from that and cannot execute calls to the LLM anymore.
You will have to stop and restart the process hosting the Kernel to recover from that.

Expected behavior
For each call, a new instance of an HttpClient should be retrieved from an IHttpClientFactory instead of capturing an HttpClient in a singleton.
In the scenario above, this will be able to automatically recover from such an error at max after 2 minutes when the old broken inner HttpMessageHandler instances are discarded and new Handlers are created.

Platform

OS: any
Language: C#
Source: one example: https://github.com/microsoft/semantic-kernel/blob/main/dotnet/src/Connectors/Connectors.OpenAI/OpenAIServiceCollectionExtensions.cs#L55-L59

The text was updated successfully, but these errors were encountered:

dmytrostruk · 2023-12-22T11:46:01Z

Hi @gingters !
Is it possible to reproduce an issue when you setup PooledConnectionLifetime property?
https://learn.microsoft.com/en-us/dotnet/fundamentals/networking/http/httpclient-guidelines#dns-behavior

It looks like it's still recommended to use singleton HttpClient instance with PooledConnectionLifetime property, which should solve both the port exhaustion and DNS changes problems:
https://learn.microsoft.com/en-us/dotnet/fundamentals/networking/http/httpclient-guidelines#recommended-use

Please let me know if I'm missing something. Thanks a lot!

gingters · 2023-12-22T12:51:03Z

I ran into that issue with the Chat Copilot app, which uses the default way of registration, in docker, when my notebook switched networks (from wired to wifi, and back). Seems from the point of view of the message handler the tcp connection still seems active (for whatever reason), but the outgoing requests didn't made it through the other network. Also not after 2 minutes. It always ended up in timeouts.

Just to confirm it wasn't an issue with my docker/container environment I shelled in the container and started a new instance of the api on another port, and that other instance could connect to the Open AI api without a problem.

When developing and debugging locally, that a) usually seldom happens and b) isn't a big deal there.

But I thought a connection hiccup when running SK in production (i.e. some changes to a docker network where the backend service runs) etc. and the connections not recovering at all might become a problem, so I reported this.

dmytrostruk · 2023-12-22T13:15:04Z

@gingters Thank you for reporting this issue!

Maybe it's worth to report similar issue in Chat Copilot repo to see, if something can be improved to handle such scenario, for example change a way how HttpClient is registered.

Tagging @stephentoub to double-check if there is something that can be improved for HttpClient usage on Semantic Kernel side.

Thanks again!

SergeyMenshykh · 2024-01-09T21:29:00Z

Hi @gingters, today we can't use IHttpClientFactory because the library SK uses to access both OpenAI and AzureOpenAI services does not support it. So, for now, SK consumer code has a few options:

Use a singleton HttpClient instance with the PooledConnectionLifetime set to the desired interval. This option should work in .NET Core and .NET 5+. For more details, see the Recommended Use section.

For .NET Framework, HTTP clients can be configured with a custom HttpMessageHandler that would use another handler internally with the required lifetime obtained through IHttpMessageHandlerFactory. See the example below:

sealed class FactoryHttpMessageHandler : HttpMessageHandler
{
    public static FactoryHttpMessageHandler Instance { get; } = new();

    private readonly IHttpMessageHandlerFactory _handlerFactory =
        new ServiceCollection()
        .ConfigureHttpClientDefaults(c => c.SetHandlerLifetime(TimeSpan.FromSeconds(4)))
        .BuildServiceProvider()
        .GetRequiredService<IHttpMessageHandlerFactory>();

    private HttpMessageHandler GetHandler() => _handlerFactory.CreateHandler("");

    protected override Task<HttpResponseMessage> SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
    {
        return new HttpMessageInvoker(GetHandler()).SendAsync(request, cancellationToken);
    }
}

Registering SK as Transient or Scoped instead of Singleton may work as well for certain scenarios, but it won't be as efficient as the options above due to unnecessary allocations.

These options should be sufficient for now until either the library supports IHttpClientFactory or .NET fixes HttpClient so that "it just works" regardless of how long you keep its instance. For more details, see the Consider updating HttpClientFactory... issue.

SergeyMenshykh · 2024-01-11T11:04:36Z

I’m closing the issue since there’s not much we can do at the moment except those options listed above.

shawncal added .NET Issue or Pull requests regarding .NET code triage labels Dec 22, 2023

github-actions bot changed the title ~~HttpClients are hold as captive dependencies / singletons~~ .Net: HttpClients are hold as captive dependencies / singletons Dec 22, 2023

markwallace-microsoft added vnext and removed triage labels Jan 2, 2024

markwallace-microsoft added this to Semantic Kernel Jan 2, 2024

markwallace-microsoft assigned SergeyMenshykh Jan 8, 2024

matthewbolanos moved this to Backlog in Semantic Kernel Jan 10, 2024

SergeyMenshykh closed this as completed Jan 11, 2024

github-project-automation bot moved this from Backlog to Sprint: Done in Semantic Kernel Jan 11, 2024

dmytrostruk mentioned this issue Apr 19, 2024

.Net: Replace HttpClient with IHttpClientFactory in OpenAIFunctionExecutionParameters #5927

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

.Net: HttpClients are hold as captive dependencies / singletons #4400

.Net: HttpClients are hold as captive dependencies / singletons #4400

gingters commented Dec 22, 2023

dmytrostruk commented Dec 22, 2023

gingters commented Dec 22, 2023

dmytrostruk commented Dec 22, 2023

SergeyMenshykh commented Jan 9, 2024

SergeyMenshykh commented Jan 11, 2024

.Net: HttpClients are hold as captive dependencies / singletons #4400

.Net: HttpClients are hold as captive dependencies / singletons #4400

Comments

gingters commented Dec 22, 2023

dmytrostruk commented Dec 22, 2023

gingters commented Dec 22, 2023

dmytrostruk commented Dec 22, 2023

SergeyMenshykh commented Jan 9, 2024

SergeyMenshykh commented Jan 11, 2024