Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[release/5.0] Http telemetry changes backport #41610

Merged
merged 3 commits into from
Aug 31, 2020

Conversation

MihaZupan
Copy link
Member

@MihaZupan MihaZupan commented Aug 31, 2020

Joint changes of 3 related PRs: #41527, #41022, #41590 (they depend on each other, therefore they are merged in the port)

Customer Impact

Http telemetry: Request Start/Stop events don't properly correlate, which makes the end-to-end tracing impossible to use.
#41022 fixed that issue by moving the relevant telemetry instrumentation into HttpClient & HttpMessageInvoker.

The PR also adds more events that can provide more fine-grained insight into the lifetime of requests -- result of end-to-end validation and comparison with other stack:

  • RequestHeadersStart/Stop
  • RequestContentStart/Stop
  • ResponseHeadersStart/Stop
  • ResponseContentStart/Stop

RequestAborted event and counter were renamed to RequestFailed to align with naming of other *Failed events and to reflect their real purpose -- i.e. RequestFailed includes also non-successful HTTP response status codes (not quite an abort of request).

The PR further merges several event patterns together into parameterized HTTP version to avoid duplication (motivated by end-to-end validation feedback), for example:
Http11ConnectionEstablished and Http20ConnectionEstablished are merged into ConnectionEstablished(version).

Testing

  • Added a significant amount of testing to ensure the proper events and counters are emitted as part of this change.
  • There are a lot of existing tests covering HttpClient behavior in CI.
  • End-to-end manual validation in PerfView and using in-proc EventListener.

Risk

  • Small - There is minimal impact on non-Telemetry code path, because the change just shifts around code in HttpClient, but the changes are mostly copy/paste indentation changes (scope changes like including code in try-catch block, etc.).
  • Telemetry code paths are brand new - added during 5.0 (end of June), not yet consumed by any customers (beyond potentially early experimentation)

MihaZupan and others added 3 commits August 31, 2020 18:49
* Change HandshakeStop SslProtocols parameter to enum

* Parameterize Http11/Http20 events

* Add dummy request to ensure Http2 settings are received
* Moved HTTP request telemetry to HttpClient.SendAsync

* Added ResponseContent and helper methods events.

* Rework helper method activity nesting

* Expand Telemetry tests

* Also log RequestStart/Stop in HttpMessageInvoker

* Update RequestStart signature

* RequestAborted => RequestFailed rename

* ResponseContent Begin => Start/Stop

* Fix HttpMessageInvoker implementation

* Add Synchronous request Telemetry tests

* Check telemetryStarted before ResponseContentStart

Co-authored-by: MihaZupan <mihazupan.zupan1@gmail.com>
@MihaZupan MihaZupan added this to the 5.0.0 milestone Aug 31, 2020
@MihaZupan MihaZupan requested review from scalablecory, karelz, ManickaP, alnikola and a team August 31, 2020 17:26
@ghost
Copy link

ghost commented Aug 31, 2020

Tagging subscribers to this area: @dotnet/ncl
See info in area-owners.md if you want to be subscribed.

@MihaZupan
Copy link
Member Author

/azp run runtime-libraries outerloop

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

{
if (!_concurrentStreams.TryRequestCreditNoWait(1))
if (_pool.EnableMultipleHttp2Connections)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was swapping these 2 if checks intentional?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. We only want to log RequestLeftQueue if we actually have to wait, so we have to perform the TryRequestCreditNoWait check first.
We already did so in the EnableMultipleHttp2Connections == true, so swapping the order around avoided duplicating a check.

@karelz
Copy link
Member

karelz commented Aug 31, 2020

@ericstj this is the Telemetry change I talked about -- we're aiming to have it ready for merge in a few hours. Let us know if you approve. Thanks!

@ericstj
Copy link
Member

ericstj commented Aug 31, 2020

Approved since this is a blocking bug in a 5.0 feature.

Comment on lines +56 to +57
(byte)request.Version.Major,
(byte)request.Version.Minor,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we okay truncating these if user sets them to something crazy? Seems like something they'd want to see in diagnostics.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assumed we throw for anything unknown, do we allow it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really. We allow it and we just connect over what we have (depending on VersionPolicy).
Minor is not checked almost at all (except for 1.0 and chunked content ASAIK).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What was the motivation to change to byte here?

@@ -1425,6 +1436,9 @@ private async ValueTask<Http2Stream> SendHeadersAsync(HttpRequestMessage request

return s.mustFlush || s.endStream;
}, cancellationToken).ConfigureAwait(false);

if (HttpTelemetry.Log.IsEnabled()) HttpTelemetry.Log.RequestHeadersStop();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it intended that you'll get a broken Stop without a Start when the switch is flipped?

@MihaZupan
Copy link
Member Author

Outerloop CI Failures: #40798, #41606, #41531, #41530

Copy link
Member

@ManickaP ManickaP left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Except for open questions LGTM.

Copy link
Contributor

@scalablecory scalablecory left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, just the question about the version truncation

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants