Add HTTP handler for metrics querying #3095
Signed-off-by: albertteoh <albert.teoh@logz.io>
Codecov Report
@@            Coverage Diff             @@
##           master    #3095      +/-   ##
==========================================
+ Coverage   95.86%   95.96%   +0.09%
==========================================
  Files         235      236       +1
  Lines       10143    10252     +109
==========================================
+ Hits         9724     9838     +114
+ Misses        349      344       -5
  Partials       70       70
cmd/query/app/http_handler.go
Outdated
@@ -115,6 +127,10 @@ func (aH *APIHandler) RegisterRoutes(router *mux.Router) {
	// TODO - remove this when UI catches up
	aH.handleFunc(router, aH.getOperationsLegacy, "/services/{%s}/operations", serviceParam).Methods(http.MethodGet)
	aH.handleFunc(router, aH.dependencies, "/dependencies").Methods(http.MethodGet)
	aH.handleFunc(router, aH.latencies, "/metrics/latencies/{%s}", servicesParam).Methods(http.MethodGet)
Are you particularly set on passing service in the URL? We tried to move away from that (see L126)
Nope, I'm not particularly attached to passing service as a URL path segment. Just thought that having services in the URL path is an affordance to suggest it is a mandatory parameter.
What was the reason behind moving away from it? I only found this comment, but couldn't find the discussion on why there was final agreement on moving in this direction: #52 (comment)
> What was the reason behind moving away from it?
- part of my holy war on high cardinality REST URLs (when params are encoded in the URL we can't use URL as the span name)
- when the service name is part of the URL people sometimes tend to think it can go there unencoded, which breaks when the service name contains a slash
in the end though - not very strong reasons
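To illustrate the second point, here is a minimal sketch of the two URL styles for a service name that contains a slash. The route shapes and the helper names `pathStyleURL` and `queryStyleURL` are assumptions made for illustration, not the handlers in this PR.

```go
package main

import (
	"fmt"
	"net/url"
)

// pathStyleURL (hypothetical helper) puts the service name in the URL path.
// The caller has to escape it, otherwise "abc/trifle" becomes two segments.
func pathStyleURL(service string) string {
	return "/services/" + url.PathEscape(service) + "/operations"
}

// queryStyleURL (hypothetical helper) passes the service name as a query
// parameter, so the path stays constant and low-cardinality.
func queryStyleURL(service string) string {
	q := url.Values{}
	q.Set("service", service)
	return "/operations?" + q.Encode()
}

func main() {
	fmt.Println(pathStyleURL("abc/trifle"))
	fmt.Println(queryStyleURL("abc/trifle"))
}
```

With the query style, the path alone can serve as a span name because it no longer varies per service.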
 service := r.FormValue(serviceParam)
 operation := r.FormValue(operationParam)

-startTime, err := p.parseTime(startTimeParam, r)
+startTime, err := p.parseTime(r, startTimeParam, time.Microsecond)
The search endpoint defines times in microseconds since epoch (why?), but other endpoints such as dependencies and metrics use milliseconds since epoch, since those are the time units that JS (i.e. Jaeger UI) uses, e.g. Date.now(), so we need to pass the units to the parseTime function to support both cases.
We should add this to comments in the code. The search operates on span latencies, which are expressed as microseconds in the data model, so it makes sense to support high accuracy. The search UI itself does not insist on exact units because it supports strings like 1ms. We had a debate over whether units should be handled by the UI instead of the backend service, but here we are, since Go makes parsing 1ms very easy.
The dependencies API does not operate on the latency space, instead its timestamps are just time range selections, and the typical backend granularity of those is on the order of 15min or more. So microseconds aren't really that useful in this domain, although I certainly would've preferred having a consistent time representation.
Metrics is a new domain, you need to decide which representation makes more sense. I think it's closer to dependencies, where you're asking about wall clock time, not the latency domain.
> The search operates on span latencies, which are expressed as microseconds in the data model
Out of curiosity, why micros and not posix nanos which I think also fit into an int64? Were there some limitations in storage or clients to handle posix nanos?
> Metrics is a new domain, you need to decide which representation makes more sense. I think it's closer to dependencies, where you're asking about wall clock time, not the latency domain.
I think milliseconds are most appropriate (I've documented these reasons in comments within queryParser):
- The main client in mind is Jaeger UI. Being a react.js app, its built-in time precision is milliseconds so it simplifies usage for Jaeger UI.
- The min step size is 1ms for the Prometheus-compliant storage backends I've tested.
- It also follows that data point timestamps are reported with a millisecond precision, at least from my tests so far.
> Out of curiosity, why micros and not posix nanos which I think also fit into an int64? Were there some limitations in storage or clients to handle posix nanos?
Legacy from Zipkin origins, which still only supports micros, whereas our domain model uses Go's time.Time, which has nanos. But our Thrift API uses micros.
cmd/query/app/query_parser.go
Outdated
// If no units are supplied, assume parsing of duration strings like 5ms.
case units == nil:
	if d, err = time.ParseDuration(formValue); err == nil {
The search endpoint defines durations as duration strings, presumably to support human-friendly clients like curl, etc. so this maintains support for such duration parameter values.
I've tidied a few things up:
- use encapsulation for duration parsing
- move query param parsing logic for each endpoint into queryParser
- fix up comments
- unexport unnecessarily exported var
- wrap errors
PTAL.
-// parse takes a request and constructs a model of parameters
+// parseTraceQueryParams takes a request and constructs a model of parameters.
+//
+// Why start/end parameters are expressed in microseconds:
@yurishkuro I've added your comments here for posterity. Please advise if they would be better placed elsewhere.
// parseMetricsQueryParams takes a request and constructs a model of metrics query parameters.
//
// Why the API is designed using an end time (endTs) and lookback:
Also added reasons for various parameter decisions here. Again, if you can think of a better place to put these, I can move them.
-spanstore.OperationQueryParameters{ServiceName: "abc/trifle"},
+spanstore.OperationQueryParameters{
+	ServiceName: "abc/trifle",
+	SpanKind:    "server",
here's the assertion to catch my earlier breaking API change.
Force-pushed from b04a2f7 to ba4234d
cmd/query/app/query_parser.go
Outdated
@@ -76,15 +121,16 @@ type traceQueryParameters struct {
 // key := strValue
 // keyValue := strValue ':' strValue
 // tags :== 'tags=' jsonMap
-func (p *queryParser) parse(r *http.Request) (*traceQueryParameters, error) {
+func (p *queryParser) parseTraceQueryParams(r *http.Request) (*traceQueryParameters, error) {
+	dp := durationStringParser{}
declared too far from usage
cmd/query/app/query_parser.go
Outdated
	lookback time.Duration
}

durationParser interface {
embrace functional programming - this is just a function, and the two parsers can be easily implemented to return functions, without using any structs
// Valid input span kinds are the string representations from the OpenTelemetry model/proto/metrics/otelspankind.proto.
// For example:
// - "SPAN_KIND_SERVER"
// - "SPAN_KIND_CLIENT"
is this consistent with the strings we used in the Operations API? I thought we used strings client/server, as defined in OpenTracing.
Good pickup, I couldn't find formal docs in comments but found evidence where "server" and "client" are used in the operations API unit tests.
Added a mapping from jaeger/opentracing span kinds to OTEL span kinds.
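A minimal sketch of such a mapping, assuming the OpenTracing `span.kind` values on the input side and the OpenTelemetry enum names on the output side. The names `otelSpanKinds` and `toOTELSpanKind` are hypothetical; the mapping actually added in this PR may be shaped differently.

```go
package main

import (
	"fmt"
)

// otelSpanKinds maps OpenTracing-style span kind strings to the string names
// of the OpenTelemetry SpanKind enum. Hypothetical table for illustration.
var otelSpanKinds = map[string]string{
	"server":   "SPAN_KIND_SERVER",
	"client":   "SPAN_KIND_CLIENT",
	"producer": "SPAN_KIND_PRODUCER",
	"consumer": "SPAN_KIND_CONSUMER",
}

// toOTELSpanKind (hypothetical helper) translates a span kind and rejects
// anything it does not recognise instead of passing it through silently.
func toOTELSpanKind(kind string) (string, error) {
	otelKind, ok := otelSpanKinds[kind]
	if !ok {
		return "", fmt.Errorf("unsupported span kind %q", kind)
	}
	return otelKind, nil
}

func main() {
	for _, kind := range []string{"server", "client", "bogus"} {
		otelKind, err := toOTELSpanKind(kind)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(kind, "->", otelKind)
	}
}
```

Keeping the public parameter values as "server" and "client" stays consistent with the Operations API while the storage layer receives OTEL names.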
	"github.com/gogo/protobuf/proto"
)

type jsonMarshaler = func(writer io.Writer, response interface{}) error
I decided to follow your advice and embrace functional programming here as well; it looks much more concise :)
	lookback time.Duration
}

durationParser = func(s string) (time.Duration, error)
converted the duration parsers to functions. I did initially try to adopt functional programming but was thrown by the fact that the unit duration parser needed to maintain state (the duration units), but hadn't thought that we can get around this via closures.
Thanks for pointing this out, @yurishkuro, it looks a lot neater, and I've learnt something new! :)
Co-authored-by: Yuri Shkuro <yurishkuro@users.noreply.github.com> Signed-off-by: albertteoh <albert.teoh@logz.io>
Force-pushed from 679fd00 to d9f5740
Thanks for the review, Yuri!
Signed-off-by: albertteoh albert.teoh@logz.io
Which problem is this PR solving?
Short description of the changes