HttpClientFactory holds onto exceptions indefinitely, despite a two-minute timer #80605

amittleider · 2023-01-13T09:28:40Z

Description

Summary

I have opened a PR here: #80604 to demonstrate this issue. The PR is not meant to be merged, it's merely a demonstration of the issue that I'm reporting. The test code runs and demonstrates the issue well. The production code is incomplete, but shows a potential path to fixing the issue.

The problem is that DefaultHttpClientFactory doesn't handle the case where an factory throws an exception well.

Typically, an HttpClientHandler will be recycled after two minutes (https://github.com/dotnet/runtime/blob/main/src/libraries/Microsoft.Extensions.Http/src/DependencyInjection/HttpClientBuilderExtensions.cs#L499), but, in the case where the building of the handler throws an exception, the exception will be cached indefinitely.

In the definition of a Lazy object:

Exception caching When you use factory methods, exceptions are cached. That is, if the factory method throws an exception the first time a thread tries to access the [Value](https://learn.microsoft.com/en-us/dotnet/api/system.lazy-1.value?view=net-7.0) property of the [Lazy<T>](https://learn.microsoft.com/en-us/dotnet/api/system.lazy-1?view=net-7.0) object, the same exception is thrown on every subsequent attempt.

This Lazy object is accessed in the CreateHandler method https://github.com/dotnet/runtime/blob/main/src/libraries/Microsoft.Extensions.Http/src/DefaultHttpClientFactory.cs#L118 , and immediately after it is accessed, the code will start a timer with the StartHandlerEntry timer. So, if line 118 throws an exception, a timer will never be set, and there will never be an attempt to re-initialize the object and the handler will be indefinitely in a bad state and irrecoverable.

Example scenario

As an example of how this might occur, consider a factory which takes runtime configuration as input. The runtime configuration changes to a bad state, causing an exception in the handler. The runtime configuration is then updated to a good state, but the factory will never be called again, so the application must be restarted to clear the Lazy object from memory.

Proposed solutions

One solution is to ensure that the HandlerTimer is set before the factory is called. This will ensure that even if exceptions occur, we will always try to re-initialize it after the two-minute default. In the current implementation, the timer depends on the ActiveHandlerTrackingEntry, so this would require a bit of a re-write in the way the timers are handled. (See the code in the PR https://github.com/dotnet/runtime/compare/main...amittleider:runtime:HttpClientFactory_ExceptionsPersist?expand=1#diff-940dba6ffc5ae548fca7105241869c1a78442c04b968f129ce645407022e83cbR118 . This code doesn't compile, but it helps to illustrate the proposal).
Another solution would be to make a new type of Lazy object that has a timer pre-built in. Like this, there is no need to handle timers at all within the DefaultHttpClientFactory.
Last solution may be to modify the thread safety mode to use LazyThreadSafetyMode.PublicationOnly, which will not cache exceptions.

Labels

@dotnet/area-extensions-dependencyinjection
@dotnet/ncl

Regression?

No response

Known Workarounds

Never throw exceptions in HttpClientHandlerFactories.

Configuration

No response

Other information

No response

The text was updated successfully, but these errors were encountered:

dotnet-issue-labeler · 2023-01-13T09:28:43Z

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

amittleider · 2023-01-13T09:32:16Z

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

@dotnet/area-extensions-dependencyinjection
@dotnet/ncl

ghost · 2023-01-13T12:03:53Z

Tagging subscribers to this area: @dotnet/ncl
See info in area-owners.md if you want to be subscribed.

Issue Details

Description

Summary

I have opened a PR here: #80604 to demonstrate this issue. The PR is not meant to be merged, it's merely a demonstration of the issue that I'm reporting. The test code runs and demonstrates the issue well. The production code is incomplete, but shows a potential path to fixing the issue.

The problem is that DefaultHttpClientFactory doesn't handle the case where an factory throws an exception well.

Typically, an HttpClientHandler will be recycled after two minutes (https://github.com/dotnet/runtime/blob/main/src/libraries/Microsoft.Extensions.Http/src/DependencyInjection/HttpClientBuilderExtensions.cs#L499), but, in the case where the building of the handler throws an exception, the exception will be cached indefinitely.

In the definition of a Lazy object:

Exception caching When you use factory methods, exceptions are cached. That is, if the factory method throws an exception the first time a thread tries to access the [Value](https://learn.microsoft.com/en-us/dotnet/api/system.lazy-1.value?view=net-7.0) property of the [Lazy<T>](https://learn.microsoft.com/en-us/dotnet/api/system.lazy-1?view=net-7.0) object, the same exception is thrown on every subsequent attempt.

This Lazy object is accessed in the CreateHandler method https://github.com/dotnet/runtime/blob/main/src/libraries/Microsoft.Extensions.Http/src/DefaultHttpClientFactory.cs#L118 , and immediately after it is accessed, the code will start a timer with the StartHandlerEntry timer. So, if line 118 throws an exception, a timer will never be set, and there will never be an attempt to re-initialize the object and the handler will be indefinitely in a bad state and irrecoverable.

Example scenario

As an example of how this might occur, consider a factory which takes runtime configuration as input. The runtime configuration changes to a bad state, causing an exception in the handler. The runtime configuration is then updated to a good state, but the factory will never be called again, so the application must be restarted to clear the Lazy object from memory.

Proposed solutions

One solution is to ensure that the HandlerTimer is set before the factory is called. This will ensure that even if exceptions occur, we will always try to re-initialize it after the two-minute default. In the current implementation, the timer depends on the ActiveHandlerTrackingEntry, so this would require a bit of a re-write in the way the timers are handled. (See the code in the PR https://github.com/dotnet/runtime/compare/main...amittleider:runtime:HttpClientFactory_ExceptionsPersist?expand=1#diff-940dba6ffc5ae548fca7105241869c1a78442c04b968f129ce645407022e83cbR118 . This code doesn't compile, but it helps to illustrate the proposal).
Another solution would be to make a new type of Lazy object that has a timer pre-built in. Like this, there is no need to handle timers at all within the DefaultHttpClientFactory.

Labels

@dotnet/area-extensions-dependencyinjection
@dotnet/ncl

Reproduction Steps

See the test code in the PR: https://github.com/dotnet/runtime/pull/80604/files#diff-7ee446a98cb0ad2039642e909ac0732b37e7f764b534651cd88a3ac910c5b382R20

Expected behavior

See the test code in the PR: https://github.com/dotnet/runtime/pull/80604/files#diff-7ee446a98cb0ad2039642e909ac0732b37e7f764b534651cd88a3ac910c5b382R45

Actual behavior

See the test code in the PR: https://github.com/dotnet/runtime/pull/80604/files#diff-7ee446a98cb0ad2039642e909ac0732b37e7f764b534651cd88a3ac910c5b382R53

Regression?

No response

Known Workarounds

Never throw exceptions in HttpClientHandlerFactories.

Configuration

No response

Other information

No response

Author:	amittleider
Assignees:	-
Labels:	`untriaged`, `area-Extensions-HttpClientFactory`
Milestone:	-

CarnaViire · 2023-01-18T21:04:10Z

Thanks for the report and for detailed analysis @amittleider!

It seems to me that it was a design decision to expect configuration callbacks to be no-throw. It is another question on whether it was a good design decision, so we can open that conversation. In my opinion, it is an acceptable behavior. When an exception happens, it is delivered to the user. The factory doesn't go into an "undefined" state after an exception, when you don't know whether you would get an exception or not, it goes into a "failed" state. And there is a workaround to always catch the exceptions inside the user callback. I will not argue that it could be better, though. For example, the callback might be called again on next attempt to create a client, which would solve the problem that you had to restart an application to reset the "failed" state. But it seems to be the first time we had this reported -- it seems more of a corner-case scenario and nice-to-have. So I would put it to Future for now.

On the topic of your proposed fix, the Timer bit is an implementation detail, so I believe it is not related to the issue at hand and should not be part of the behavior design. It can change any time e.g. when we would switch to SocketsHttpHandler in #35987, and the primary handler would be only created once, while PooledConnectionLifetime would be used instead of timers. Also, if you are interested, it was a conscious decision to start the timer after the handler is created, to avoid race conditions if case of very short timers, see this comment.

amittleider · 2023-01-24T12:34:18Z

Thanks for the report and for detailed analysis @amittleider!

It seems to me that it was a design decision to expect configuration callbacks to be no-throw. It is another question on whether it was a good design decision, so we can open that conversation. In my opinion, it is an acceptable behavior. When an exception happens, it is delivered to the user. The factory doesn't go into an "undefined" state after an exception, when you don't know whether you would get an exception or not, it goes into a "failed" state. And there is a workaround to always catch the exceptions inside the user callback. I will not argue that it could be better, though. For example, the callback might be called again on next attempt to create a client, which would solve the problem that you had to restart an application to reset the "failed" state. But it seems to be the first time we had this reported -- it seems more of a corner-case scenario and nice-to-have. So I would put it to Future for now.

On the topic of your proposed fix, the Timer bit is an implementation detail, so I believe it is not related to the issue at hand and should not be part of the behavior design. It can change any time e.g. when we would switch to SocketsHttpHandler in #35987, and the primary handler would be only created once, while PooledConnectionLifetime would be used instead of timers. Also, if you are interested, it was a conscious decision to start the timer after the handler is created, to avoid race conditions if case of very short timers, see this comment.

Thanks for taking a look, @CarnaViire . Thanks for pointing out the race condition. I agree that the first suggestion would not be the right solution.

What do you think about the third suggestion? We could use a different method of lazy initialization, one that will not hold onto exceptions if it is failed to initialize (LazyThreadSafetyMode.PublicationOnly). In this case, it's potentially a one-line fix here: https://github.com/dotnet/runtime/blob/main/src/libraries/Microsoft.Extensions.Http/src/DefaultHttpClientFactory.cs#L88 .

CarnaViire · 2023-01-31T20:33:18Z

In case of LazyThreadSafetyMode.PublicationOnly, the initialization method can be executed concurrently multiple times per name, which is not acceptable perf-wise, as it would lead to unnecessary allocations (several message handlers being created instead of one). Handler configuration code, which is called within the initialization, can also rely on it being executed in a thread-safe manner, which would break.

If we were to modify the behavior here, we would probably need to let go of Lazy and implement synchronization manually by some other means.

ghost added the untriaged New issue has not been triaged by the area owner label Jan 13, 2023

amittleider mentioned this issue Jan 13, 2023

HttpClientFactory should not hold onto exceptions indefinitely #80604

Closed

CarnaViire added the area-Extensions-HttpClientFactory label Jan 13, 2023

CarnaViire added this to the Future milestone Jan 18, 2023

ghost removed the untriaged New issue has not been triaged by the area owner label Jan 18, 2023

CarnaViire mentioned this issue Sep 11, 2024

HttpClient can never recover from exception in custom IHttpMessageHandlerBuilderFilter #107689

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

HttpClientFactory holds onto exceptions indefinitely, despite a two-minute timer #80605

HttpClientFactory holds onto exceptions indefinitely, despite a two-minute timer #80605

amittleider commented Jan 13, 2023 •

edited

Loading

dotnet-issue-labeler bot commented Jan 13, 2023

amittleider commented Jan 13, 2023

ghost commented Jan 13, 2023

Description

Summary

Example scenario

Proposed solutions

Labels

Reproduction Steps

Expected behavior

Actual behavior

Regression?

Known Workarounds

Configuration

Other information

CarnaViire commented Jan 18, 2023

amittleider commented Jan 24, 2023 •

edited

Loading

CarnaViire commented Jan 31, 2023

HttpClientFactory holds onto exceptions indefinitely, despite a two-minute timer #80605

HttpClientFactory holds onto exceptions indefinitely, despite a two-minute timer #80605

Comments

amittleider commented Jan 13, 2023 • edited Loading

Description

Summary

Example scenario

Proposed solutions

Labels

Reproduction Steps

Expected behavior

Actual behavior

Regression?

Known Workarounds

Configuration

Other information

dotnet-issue-labeler bot commented Jan 13, 2023

amittleider commented Jan 13, 2023

ghost commented Jan 13, 2023

Description

Summary

Example scenario

Proposed solutions

Labels

Reproduction Steps

Expected behavior

Actual behavior

Regression?

Known Workarounds

Configuration

Other information

CarnaViire commented Jan 18, 2023

amittleider commented Jan 24, 2023 • edited Loading

CarnaViire commented Jan 31, 2023

amittleider commented Jan 13, 2023 •

edited

Loading

amittleider commented Jan 24, 2023 •

edited

Loading