Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add additional trace to HttpConnectionPool #66605

Merged
merged 3 commits into from
Mar 17, 2022

Conversation

CarnaViire
Copy link
Member

The additional trace should help troubleshoot hanged "pending" connection issues (i.e. the reasons why new connections were not created) in HttpConnectionPool.

The issue manifests as growing number of timeouted requests. It is not easy to link a timeouted request (esp. with a small timeout) to a connection that was created 3 minutes ago but still is not connected. There are no traces of the state of HttpConnectionPool, e.g. number of pending connections, so it is unclear from existing traces why the pool decided not to create a connection.

Related to #66297

@ghost
Copy link

ghost commented Mar 14, 2022

Tagging subscribers to this area: @dotnet/ncl
See info in area-owners.md if you want to be subscribed.

Issue Details

The additional trace should help troubleshoot hanged "pending" connection issues (i.e. the reasons why new connections were not created) in HttpConnectionPool.

The issue manifests as growing number of timeouted requests. It is not easy to link a timeouted request (esp. with a small timeout) to a connection that was created 3 minutes ago but still is not connected. There are no traces of the state of HttpConnectionPool, e.g. number of pending connections, so it is unclear from existing traces why the pool decided not to create a connection.

Related to #66297

Author: CarnaViire
Assignees: CarnaViire
Labels:

area-System.Net.Http

Milestone: -

@CarnaViire CarnaViire requested a review from a team March 14, 2022 18:57
@@ -478,6 +478,13 @@ private void CheckForHttp11ConnectionInjection()
return;
}

if (NetEventSource.Log.IsEnabled())
Copy link
Member

@ManickaP ManickaP Mar 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to add some telemetry counters? So that you wouldn't need to turn on the whole debugging logs to diagnose it.
cc @MihaZupan

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What kind of counters did you have in mind?
We could expose many things, but they may not be too useful in diagnosing an issue like this since they are all aggregated for the process.

I see System.Net.Sockets doesn't have a "current" connection attempt counter - I don't see a reason not to add that. We have "current" counters on all the other sources.
Something like PendingHttp11ConnectionCount on Http would then be sort of redundant.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I was trying to point out is that in order to get this diagnostics log, you need to turn on the internal logging. It's very verbose and chaotic and not designed to be used in production. I see it mostly as our internal logging helping us diagnose issues. In general, we don't recommend customers to turn this on. And there were even attempts to get completely rid of this.
So the question is whether this trace should help us or the customer diagnose the issue? If it's for customers, shouldn't it be part of the official HTTP telemetry?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EventSource does have the LogLevel for a reason, so we could certainly expose a lot more info under debug/trace.

The "official" telemetry surface is an API we effectively promise not to break.
As such, these logs shouldn't be unique events with a bunch of parameters since they are likely to change/break in the future.

We can, however, have something like

ConnectionPoolDebug(long poolId, string message);

Where you can still easily collect and aggregate all the information around the pool behavior, it just wouldn't be ideal for automatic processing.

Alternatively, we can start using the Keywords feature of EventSource (filtering by topic). Or the connection pool can have its own EventSource System.Net.Http.ConnectionPool.

Copy link
Member

@MihaZupan MihaZupan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree this will be useful for future debugging, thanks.

All of this info is sort of available if you have access to all the events since the start of the connection pool. But if someone just observes moment-in-time behavior (turning on tracing when there is an issue happening), they would be lacking this info, making the trace much less useful.

Something that may also be helpful is including the exception in the trace on connection failure here (same for H2, H3):

if (NetEventSource.Log.IsEnabled()) Trace("HTTP/1.1 connection failed");

@CarnaViire CarnaViire merged commit bd02b32 into dotnet:main Mar 17, 2022
@CarnaViire
Copy link
Member Author

/backport to release/6.0

@github-actions
Copy link
Contributor

Started backporting to release/6.0: https://github.com/dotnet/runtime/actions/runs/2000094673

@github-actions
Copy link
Contributor

@CarnaViire backporting to release/6.0 failed, the patch most likely resulted in conflicts:

$ git am --3way --ignore-whitespace --keep-non-patch changes.patch

Applying: Add additional trace to HttpConnectionPool
Applying: Add exception to trace in case of failed connect
error: sha1 information is lacking or useless (src/libraries/System.Net.Http/src/System/Net/Http/SocketsHttpHandler/HttpConnectionPool.cs).
error: could not build fake ancestor
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0002 Add exception to trace in case of failed connect
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".
Error: The process '/usr/bin/git' failed with exit code 128

Please backport manually!

radekdoulik pushed a commit to radekdoulik/runtime that referenced this pull request Mar 30, 2022
The additional trace should help troubleshoot hanged "pending" connection issues (i.e. the reasons why new connections were not created) in HttpConnectionPool.

The issue manifests as growing number of timeouted requests. It is not easy to link a timeouted request (esp. with a small timeout) to a connection that was created 3 minutes ago but still is not connected. There are no traces of the state of HttpConnectionPool, e.g. number of pending connections, so it is unclear from existing traces why the pool decided not to create a connection.

Related to dotnet#66297
@karelz karelz added this to the 7.0.0 milestone Apr 8, 2022
@ghost ghost locked as resolved and limited conversation to collaborators May 8, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants