Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HttpClientFactory holds onto exceptions indefinitely, despite a two-minute timer #80605

Open
amittleider opened this issue Jan 13, 2023 · 6 comments

Comments

@amittleider
Copy link

amittleider commented Jan 13, 2023

Description

Summary

I have opened a PR here: #80604 to demonstrate this issue. The PR is not meant to be merged, it's merely a demonstration of the issue that I'm reporting. The test code runs and demonstrates the issue well. The production code is incomplete, but shows a potential path to fixing the issue.

The problem is that DefaultHttpClientFactory doesn't handle the case where an factory throws an exception well.

Typically, an HttpClientHandler will be recycled after two minutes (https://github.com/dotnet/runtime/blob/main/src/libraries/Microsoft.Extensions.Http/src/DependencyInjection/HttpClientBuilderExtensions.cs#L499), but, in the case where the building of the handler throws an exception, the exception will be cached indefinitely.

In the definition of a Lazy object:

Exception caching When you use factory methods, exceptions are cached. That is, if the factory method throws an exception the first time a thread tries to access the [Value](https://learn.microsoft.com/en-us/dotnet/api/system.lazy-1.value?view=net-7.0) property of the [Lazy<T>](https://learn.microsoft.com/en-us/dotnet/api/system.lazy-1?view=net-7.0) object, the same exception is thrown on every subsequent attempt.

This Lazy object is accessed in the CreateHandler method https://github.com/dotnet/runtime/blob/main/src/libraries/Microsoft.Extensions.Http/src/DefaultHttpClientFactory.cs#L118 , and immediately after it is accessed, the code will start a timer with the StartHandlerEntry timer. So, if line 118 throws an exception, a timer will never be set, and there will never be an attempt to re-initialize the object and the handler will be indefinitely in a bad state and irrecoverable.

Example scenario

As an example of how this might occur, consider a factory which takes runtime configuration as input. The runtime configuration changes to a bad state, causing an exception in the handler. The runtime configuration is then updated to a good state, but the factory will never be called again, so the application must be restarted to clear the Lazy object from memory.

Proposed solutions

  1. One solution is to ensure that the HandlerTimer is set before the factory is called. This will ensure that even if exceptions occur, we will always try to re-initialize it after the two-minute default. In the current implementation, the timer depends on the ActiveHandlerTrackingEntry, so this would require a bit of a re-write in the way the timers are handled. (See the code in the PR https://github.com/dotnet/runtime/compare/main...amittleider:runtime:HttpClientFactory_ExceptionsPersist?expand=1#diff-940dba6ffc5ae548fca7105241869c1a78442c04b968f129ce645407022e83cbR118 . This code doesn't compile, but it helps to illustrate the proposal).
  2. Another solution would be to make a new type of Lazy object that has a timer pre-built in. Like this, there is no need to handle timers at all within the DefaultHttpClientFactory.
  3. Last solution may be to modify the thread safety mode to use LazyThreadSafetyMode.PublicationOnly, which will not cache exceptions.

Labels

@dotnet/area-extensions-dependencyinjection
@dotnet/ncl

Reproduction Steps

See the test code in the PR: https://github.com/dotnet/runtime/pull/80604/files#diff-7ee446a98cb0ad2039642e909ac0732b37e7f764b534651cd88a3ac910c5b382R20

Expected behavior

See the test code in the PR: https://github.com/dotnet/runtime/pull/80604/files#diff-7ee446a98cb0ad2039642e909ac0732b37e7f764b534651cd88a3ac910c5b382R45

Actual behavior

See the test code in the PR: https://github.com/dotnet/runtime/pull/80604/files#diff-7ee446a98cb0ad2039642e909ac0732b37e7f764b534651cd88a3ac910c5b382R53

Regression?

No response

Known Workarounds

Never throw exceptions in HttpClientHandlerFactories.

Configuration

No response

Other information

No response

@dotnet-issue-labeler
Copy link

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

@ghost ghost added the untriaged New issue has not been triaged by the area owner label Jan 13, 2023
@amittleider
Copy link
Author

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

@dotnet/area-extensions-dependencyinjection
@dotnet/ncl

@ghost
Copy link

ghost commented Jan 13, 2023

Tagging subscribers to this area: @dotnet/ncl
See info in area-owners.md if you want to be subscribed.

Issue Details

Description

Summary

I have opened a PR here: #80604 to demonstrate this issue. The PR is not meant to be merged, it's merely a demonstration of the issue that I'm reporting. The test code runs and demonstrates the issue well. The production code is incomplete, but shows a potential path to fixing the issue.

The problem is that DefaultHttpClientFactory doesn't handle the case where an factory throws an exception well.

Typically, an HttpClientHandler will be recycled after two minutes (https://github.com/dotnet/runtime/blob/main/src/libraries/Microsoft.Extensions.Http/src/DependencyInjection/HttpClientBuilderExtensions.cs#L499), but, in the case where the building of the handler throws an exception, the exception will be cached indefinitely.

In the definition of a Lazy object:

Exception caching When you use factory methods, exceptions are cached. That is, if the factory method throws an exception the first time a thread tries to access the [Value](https://learn.microsoft.com/en-us/dotnet/api/system.lazy-1.value?view=net-7.0) property of the [Lazy<T>](https://learn.microsoft.com/en-us/dotnet/api/system.lazy-1?view=net-7.0) object, the same exception is thrown on every subsequent attempt.

This Lazy object is accessed in the CreateHandler method https://github.com/dotnet/runtime/blob/main/src/libraries/Microsoft.Extensions.Http/src/DefaultHttpClientFactory.cs#L118 , and immediately after it is accessed, the code will start a timer with the StartHandlerEntry timer. So, if line 118 throws an exception, a timer will never be set, and there will never be an attempt to re-initialize the object and the handler will be indefinitely in a bad state and irrecoverable.

Example scenario

As an example of how this might occur, consider a factory which takes runtime configuration as input. The runtime configuration changes to a bad state, causing an exception in the handler. The runtime configuration is then updated to a good state, but the factory will never be called again, so the application must be restarted to clear the Lazy object from memory.

Proposed solutions

  1. One solution is to ensure that the HandlerTimer is set before the factory is called. This will ensure that even if exceptions occur, we will always try to re-initialize it after the two-minute default. In the current implementation, the timer depends on the ActiveHandlerTrackingEntry, so this would require a bit of a re-write in the way the timers are handled. (See the code in the PR https://github.com/dotnet/runtime/compare/main...amittleider:runtime:HttpClientFactory_ExceptionsPersist?expand=1#diff-940dba6ffc5ae548fca7105241869c1a78442c04b968f129ce645407022e83cbR118 . This code doesn't compile, but it helps to illustrate the proposal).
  2. Another solution would be to make a new type of Lazy object that has a timer pre-built in. Like this, there is no need to handle timers at all within the DefaultHttpClientFactory.

Labels

@dotnet/area-extensions-dependencyinjection
@dotnet/ncl

Reproduction Steps

See the test code in the PR: https://github.com/dotnet/runtime/pull/80604/files#diff-7ee446a98cb0ad2039642e909ac0732b37e7f764b534651cd88a3ac910c5b382R20

Expected behavior

See the test code in the PR: https://github.com/dotnet/runtime/pull/80604/files#diff-7ee446a98cb0ad2039642e909ac0732b37e7f764b534651cd88a3ac910c5b382R45

Actual behavior

See the test code in the PR: https://github.com/dotnet/runtime/pull/80604/files#diff-7ee446a98cb0ad2039642e909ac0732b37e7f764b534651cd88a3ac910c5b382R53

Regression?

No response

Known Workarounds

Never throw exceptions in HttpClientHandlerFactories.

Configuration

No response

Other information

No response

Author: amittleider
Assignees: -
Labels:

untriaged, area-Extensions-HttpClientFactory

Milestone: -

@CarnaViire
Copy link
Member

Thanks for the report and for detailed analysis @amittleider!

It seems to me that it was a design decision to expect configuration callbacks to be no-throw. It is another question on whether it was a good design decision, so we can open that conversation. In my opinion, it is an acceptable behavior. When an exception happens, it is delivered to the user. The factory doesn't go into an "undefined" state after an exception, when you don't know whether you would get an exception or not, it goes into a "failed" state. And there is a workaround to always catch the exceptions inside the user callback. I will not argue that it could be better, though. For example, the callback might be called again on next attempt to create a client, which would solve the problem that you had to restart an application to reset the "failed" state. But it seems to be the first time we had this reported -- it seems more of a corner-case scenario and nice-to-have. So I would put it to Future for now.

On the topic of your proposed fix, the Timer bit is an implementation detail, so I believe it is not related to the issue at hand and should not be part of the behavior design. It can change any time e.g. when we would switch to SocketsHttpHandler in #35987, and the primary handler would be only created once, while PooledConnectionLifetime would be used instead of timers. Also, if you are interested, it was a conscious decision to start the timer after the handler is created, to avoid race conditions if case of very short timers, see this comment.

@CarnaViire CarnaViire added this to the Future milestone Jan 18, 2023
@ghost ghost removed the untriaged New issue has not been triaged by the area owner label Jan 18, 2023
@amittleider
Copy link
Author

amittleider commented Jan 24, 2023

Thanks for the report and for detailed analysis @amittleider!

It seems to me that it was a design decision to expect configuration callbacks to be no-throw. It is another question on whether it was a good design decision, so we can open that conversation. In my opinion, it is an acceptable behavior. When an exception happens, it is delivered to the user. The factory doesn't go into an "undefined" state after an exception, when you don't know whether you would get an exception or not, it goes into a "failed" state. And there is a workaround to always catch the exceptions inside the user callback. I will not argue that it could be better, though. For example, the callback might be called again on next attempt to create a client, which would solve the problem that you had to restart an application to reset the "failed" state. But it seems to be the first time we had this reported -- it seems more of a corner-case scenario and nice-to-have. So I would put it to Future for now.

On the topic of your proposed fix, the Timer bit is an implementation detail, so I believe it is not related to the issue at hand and should not be part of the behavior design. It can change any time e.g. when we would switch to SocketsHttpHandler in #35987, and the primary handler would be only created once, while PooledConnectionLifetime would be used instead of timers. Also, if you are interested, it was a conscious decision to start the timer after the handler is created, to avoid race conditions if case of very short timers, see this comment.

Thanks for taking a look, @CarnaViire . Thanks for pointing out the race condition. I agree that the first suggestion would not be the right solution.

What do you think about the third suggestion? We could use a different method of lazy initialization, one that will not hold onto exceptions if it is failed to initialize (LazyThreadSafetyMode.PublicationOnly). In this case, it's potentially a one-line fix here: https://github.com/dotnet/runtime/blob/main/src/libraries/Microsoft.Extensions.Http/src/DefaultHttpClientFactory.cs#L88 .

@CarnaViire
Copy link
Member

In case of LazyThreadSafetyMode.PublicationOnly, the initialization method can be executed concurrently multiple times per name, which is not acceptable perf-wise, as it would lead to unnecessary allocations (several message handlers being created instead of one). Handler configuration code, which is called within the initialization, can also rely on it being executed in a thread-safe manner, which would break.

If we were to modify the behavior here, we would probably need to let go of Lazy and implement synchronization manually by some other means.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants