
[otelhttp] transport metrics #3769

Open · wants to merge 26 commits into main

Conversation

@RangelReale (Contributor) commented May 3, 2023

Add HTTP client metrics following the semantic conventions.

As users of Transport are not expected to have a handler in which they can customize the attributes using a Labeler, callback function options were added to derive attributes from the request and response objects.
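For illustration, a minimal sketch of how the callback options proposed here could be used. WithRequestAttributeGetter and WithResponseAttributeGetter come from this PR (and were later removed during review, see below), so treat the names as provisional; the attribute keys are examples only.

package main

import (
	"net/http"

	"go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"
	"go.opentelemetry.io/otel/attribute"
)

func newInstrumentedClient() *http.Client {
	// Wrap the base transport so spans and metrics are recorded for every
	// outgoing request made through this client.
	transport := otelhttp.NewTransport(
		http.DefaultTransport,
		// Proposed in this PR: derive extra attributes from the outgoing
		// request and the received response, since Transport users have no
		// handler in which to use a Labeler.
		otelhttp.WithRequestAttributeGetter(func(req *http.Request) []attribute.KeyValue {
			return []attribute.KeyValue{attribute.String("peer.hostname", req.URL.Hostname())}
		}),
		otelhttp.WithResponseAttributeGetter(func(res *http.Response) []attribute.KeyValue {
			return []attribute.KeyValue{attribute.Int("http.response_status", res.StatusCode)}
		}),
	)
	return &http.Client{Transport: transport}
}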

Fixes #3134
Fixes #1336

@RangelReale RangelReale requested a review from a team May 3, 2023 13:00
codecov bot commented May 4, 2023

Codecov Report

Merging #3769 (45ceb1d) into main (7d9626d) will increase coverage by 0.0%.
Report is 1 commit behind head on main.
The diff coverage is 94.4%.

❗ Current head 45ceb1d differs from the pull request's most recent head e28520c. Consider uploading reports for commit e28520c to get more accurate results.

Additional details and impacted files


@@          Coverage Diff          @@
##            main   #3769   +/-   ##
=====================================
  Coverage   79.4%   79.4%           
=====================================
  Files        166     166           
  Lines      10360   10401   +41     
=====================================
+ Hits        8230    8266   +36     
- Misses      1996    1999    +3     
- Partials     134     136    +2     
Files Changed                                      Coverage Δ
instrumentation/net/http/otelhttp/common.go        100.0% <ø> (ø)
instrumentation/net/http/otelhttp/transport.go      94.1% <93.4%> (-0.7%) ⬇️
instrumentation/net/http/otelhttp/config.go         84.6% <100.0%> (+1.7%) ⬆️
instrumentation/net/http/otelhttp/handler.go        86.9% <100.0%> (ø)
instrumentation/net/http/otelhttp/wrap.go           92.5% <100.0%> (+7.4%) ⬆️

... and 3 files with indirect coverage changes

@dmathieu (Member) commented May 4, 2023

This would need a changelog, and tests.

@RangelReale (Contributor, Author)

This would need a changelog, and tests.

Added tests and changelog.

@RangelReale (Contributor, Author)

Hmm, CI shows a data race message that I can't reproduce locally.

@pellared (Member) commented May 8, 2023

Hmm, CI shows a data race message that I can't reproduce locally.

For reference: https://github.com/open-telemetry/opentelemetry-go-contrib/actions/runs/4883405160/jobs/8714784942

Have you tried running it multiple times, e.g. by calling go test -race -count=1000?

EDIT: At first glance it does not look related to your changes. If I am correct, then it would be better to create a separate issue for it.

EDIT 2: The data race is in your code: #3769 (comment). PS: I managed to reproduce the data race on my machine, but the probability of hitting it is extremely low; I had to run go test -race -count=10000 multiple times in the instrumentation/net/http/otelhttp/test directory.

@forestsword left a comment

Thanks so much for this. I was looking for this feature. Until this gets merged, I decided to follow your lead and did something similar in what I'm writing. I'm not a maintainer, but my review covers a few things I found.

@@ -37,6 +37,14 @@ const (
ServerLatency = "http.server.duration" // Incoming end to end duration, microseconds
)

// Client HTTP metrics.
const (
ClientRequestCount = "http.client.request_count" // Outgoing request count total


This is no longer defined in the semantic conventions, and it doesn't look to be actually used in the code.

@RangelReale (Contributor, Author)

Yes, it is not used; we are supposed to get this information from the count of the duration histogram. I left it there because the one for the server was also there.

@RangelReale (Contributor, Author)

Looks like the server one is still there, so I'm keeping this one also.

Three review threads on instrumentation/net/http/otelhttp/transport.go were marked as resolved (outdated).
@pellared (Member)

@RangelReale are you still willing to work on this?

@RangelReale (Contributor, Author)

Yes, I'm available to follow up, or if someone wants to take over, that's also fine.

@pellared (Member)

Yes, I'm available to follow up, or if someone wants to take over, that's also fine.

It is fine if you follow up. I will do my best to take a look at the beginning of next week.

@pellared (Member) left a comment

@RangelReale RangelReale requested a review from pellared August 8, 2023 16:04
@pellared (Member)

@Aneurysm9, can you please review this PR (as module codeowner)?

@pellared (Member) left a comment

I did my best to carefully review this PR. Nice work. I left a few comments that need to be addressed; some should lead to a simplification of the PR.

A review thread on instrumentation/net/http/otelhttp/common.go was marked as resolved (outdated).
Comment on lines 213 to 225
// WithRequestAttributeGetter extracts additional attributes from the request.
func WithRequestAttributeGetter(fn func(req *http.Request) []attribute.KeyValue) Option {
return optionFunc(func(o *config) {
o.GetRequestAttributes = fn
})
}

// WithResponseAttributeGetter extracts additional attributes from the response.
func WithResponseAttributeGetter(fn func(req *http.Response) []attribute.KeyValue) Option {
return optionFunc(func(o *config) {
o.GetResponseAttributes = fn
})
}
@pellared (Member)

People may NOT want to always add the same attributes to spans and metrics. While for spans it is not a problem, for metrics it leads to high cardinality.

I strongly suggest removing this functionality from this PR. We can create a different solution in a separate PR.

@RangelReale (Contributor, Author)

removed.

}
metricdatatest.AssertEqual(t, want, sm.Metrics[1], metricdatatest.IgnoreTimestamp())

// Duration value is not predictable.
@pellared (Member)

  1. We can check that it is "non-zero" 😉
  2. Would adding an IgnoreValue option to metricdatatest help?

@RangelReale (Contributor, Author)

I created a PR in the main repo to add this "IgnoreValue" option.
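For reference, a hedged sketch of what such an assertion helper might look like once metricdatatest gains that IgnoreValue option; the helper name is illustrative and not from this PR.

package otelhttptest

import (
	"testing"

	"go.opentelemetry.io/otel/sdk/metric/metricdata"
	"go.opentelemetry.io/otel/sdk/metric/metricdata/metricdatatest"
)

// assertDurationMetric compares a recorded duration metric against the
// expected structure while ignoring timestamps and the unpredictable
// histogram values.
func assertDurationMetric(t *testing.T, want, got metricdata.Metrics) {
	t.Helper()
	metricdatatest.AssertEqual(t, want, got,
		metricdatatest.IgnoreTimestamp(),
		metricdatatest.IgnoreValue(), // proposed option; skips value comparison
	)
}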

Comment on lines 137 to 138
reader := metric.NewManualReader()
meterProvider := metric.NewMeterProvider(metric.WithReader(reader))
@pellared (Member)

Are the changes needed in this test?

@RangelReale (Contributor, Author)

no, will remove.
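For context on the snippet above, a minimal sketch of the usual test pattern with a ManualReader, which lets a test pull recorded metrics on demand instead of relying on a periodic exporter; assumes the current SDK API, and the helper name is illustrative.

package otelhttptest

import (
	"context"
	"testing"

	sdkmetric "go.opentelemetry.io/otel/sdk/metric"
	"go.opentelemetry.io/otel/sdk/metric/metricdata"
)

// collectMetrics drains everything recorded through the given reader so the
// test can assert on it.
func collectMetrics(t *testing.T, reader *sdkmetric.ManualReader) metricdata.ResourceMetrics {
	t.Helper()
	var rm metricdata.ResourceMetrics
	if err := reader.Collect(context.Background(), &rm); err != nil {
		t.Fatalf("collecting metrics: %v", err)
	}
	return rm
}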

"go.opentelemetry.io/otel/trace"
)

func assertClientScopeMetrics(t *testing.T, sm metricdata.ScopeMetrics, attrs attribute.Set) {
@pellared (Member)

nit: I think we usually add helper functions at the bottom of the file (below tests).

@RangelReale (Contributor, Author)

done.

want = metricdata.Metrics{
Name: "http.client.response_content_length",
Data: metricdata.Sum[int64]{
DataPoints: []metricdata.DataPoint[int64]{{Attributes: attrs, Value: 13}},
@pellared (Member)

This value is tightly coupled to the test where the function is used. Maybe it would be better to inline the function into the test, as right now it is not really reusable?

@RangelReale (Contributor, Author)

This file is a copy of the handler test with changes; maybe it would be better to keep both the same to ease future changes?

Comment on lines +48 to +49
counters map[string]metric.Int64Counter
valueRecorders map[string]metric.Float64Histogram
@pellared (Member)

Why are we using maps here? How about simply having multiple instrument fields (several metric.Int64Counter and metric.Float64Histogram fields) instead of these two maps?

@RangelReale (Contributor, Author)

The same thing is done by the server handler; this pattern seems to be used in a lot of instrumentations.
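For comparison, a hedged sketch of the alternative pellared suggests: one struct field per instrument instead of map lookups. The type, field names, and error handling here are illustrative, not code from this PR.

package otelhttpsketch

import (
	"go.opentelemetry.io/otel/metric"
)

// clientInstruments holds one field per instrument instead of
// map[string]metric.Int64Counter / map[string]metric.Float64Histogram,
// so every lookup is checked at compile time.
type clientInstruments struct {
	requestContentLength  metric.Int64Counter
	responseContentLength metric.Int64Counter
	duration              metric.Float64Histogram
}

func newClientInstruments(meter metric.Meter) (ci clientInstruments, err error) {
	if ci.requestContentLength, err = meter.Int64Counter("http.client.request_content_length"); err != nil {
		return ci, err
	}
	if ci.responseContentLength, err = meter.Int64Counter("http.client.response_content_length"); err != nil {
		return ci, err
	}
	if ci.duration, err = meter.Float64Histogram("http.client.duration"); err != nil {
		return ci, err
	}
	return ci, nil
}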

Co-authored-by: Robert Pająk <pellared@hotmail.com>
@jlordiales

Curious if this is currently blocked by anything? It would be awesome to have this merged and released when possible.

}

// Add metrics
attributes := semconvutil.HTTPClientRequest(r)
@jlordiales commented Sep 12, 2023

This probably needs something similar to

func HTTPServerRequestMetrics(server string, req *http.Request) []attribute.KeyValue

which avoids any high-cardinality attributes for metrics.
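A hedged sketch of what a client-side counterpart might look like, keeping only bounded, low-cardinality attributes; the helper name is hypothetical and not part of semconvutil, and the attribute keys are examples only.

package semconvsketch

import (
	"net/http"

	"go.opentelemetry.io/otel/attribute"
)

// httpClientRequestMetrics is a hypothetical analogue of
// semconvutil.HTTPServerRequestMetrics: it returns only low-cardinality
// request attributes, deliberately omitting values such as the full URL or
// user agent that would explode metric cardinality.
func httpClientRequestMetrics(req *http.Request) []attribute.KeyValue {
	return []attribute.KeyValue{
		attribute.String("http.method", req.Method),
		attribute.String("http.scheme", req.URL.Scheme),
		attribute.String("net.peer.name", req.URL.Hostname()),
	}
}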

o := metric.WithAttributes(attributes...)
t.counters[ClientRequestContentLength].Add(ctx, bw.read.Load(), o)
if err == nil {
t.counters[ClientResponseContentLength].Add(ctx, res.ContentLength, o)
@jdef commented Sep 21, 2023

ContentLength may not be what we want here; it can be negative (-1 when the length is unknown). Since bodyWrapper already tracks the count of bytes read from the response, it's probably better to refer to that count here.

[EDIT] I see that newWrappedBody is used instead of bodyWrapper, so maybe it's not a drop-in replacement. Nonetheless, we need a more accurate byte count than http.Response.ContentLength is guaranteed to provide.
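A hedged sketch of the alternative being suggested: count the bytes actually read from the response body instead of trusting http.Response.ContentLength. The type name is illustrative, not the PR's actual wrapper.

package otelhttpsketch

import (
	"io"
	"sync/atomic"
)

// countingBody wraps a response body and reports how many bytes the caller
// actually read once the body is closed. This stays accurate even when
// ContentLength is -1 or the body is shorter than advertised.
type countingBody struct {
	io.ReadCloser
	read   atomic.Int64
	record func(n int64) // called with the final byte count on Close
}

func (b *countingBody) Read(p []byte) (int, error) {
	n, err := b.ReadCloser.Read(p)
	b.read.Add(int64(n))
	return n, err
}

func (b *countingBody) Close() error {
	err := b.ReadCloser.Close()
	b.record(b.read.Load())
	return err
}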

@@ -109,21 +140,64 @@ func (t *Transport) RoundTrip(r *http.Request) (*http.Response, error) {
ctx = httptrace.WithClientTrace(ctx, t.clientTrace(ctx))
}

readRecordFunc := func(int64) {}
if t.readEvent {

readEvent and readRecordFunc map to "read" in the span, but it's the client's Request object, so it's actually a "write" event, right? Because it's being written to the server?

"read" events should be recorded from the bytes consumed by the http.Response.Body, right?

@Sovietaced (Contributor)

@RangelReale What is the latest on this? Do you need any help?

@RangelReale (Contributor, Author)

@RangelReale What is the latest on this? Do you need any help?

@Sovietaced I haven't been following the latest changes too closely; if you want to help or take over this fork, that's fine with me.

@Sovietaced (Contributor)

@Sovietaced I haven't been following the latest changes too closely; if you want to help or take over this fork, that's fine with me.

Thanks! I have made a fork and will be going through the code tomorrow. I will open up another pull request once I deeply understand the code and have addressed any outstanding comments here.

@Sovietaced (Contributor)

I have an initial draft of this work continued here: #4707

Would appreciate any reviews

Successfully merging this pull request may close these issues.

Add metrics for transport layer in instrumentation net/http
Add Instrumentation (metric) for net/http