Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve observability of enqueuing requests to query-scheduler #5818

Merged
merged 6 commits into from
Aug 24, 2023

Conversation

charleskorn
Copy link
Contributor

@charleskorn charleskorn commented Aug 23, 2023

What this PR does

This PR improves the observability of query-frontend interactions with query-schedulers.

In particular:

  • tracing events are emitted when attempts are made to enqueue a request, when enqueuing fails and when it succeeds
  • tracing events are emitted when a query response is received
  • a span is emitted for each enqueue request to a scheduler, including the target scheduler's address
  • the error returned by the scheduler, rather than nil, is now logged when the scheduler returns an error (here)

Example trace looks like this now:

Screenshot 2023-08-23 at 4 06 44 pm

Note to reviewers: I'd suggest reviewing each commit separately.

Which issue(s) this PR fixes or relates to

(none)

Checklist

  • [n/a] Tests updated
  • [n/a] Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@@ -222,9 +226,12 @@ func (f *Frontend) RoundTripGRPC(ctx context.Context, req *httpgrpc.HTTPRequest)
retries := f.cfg.WorkerConcurrency + 1 // To make sure we hit at least two different schedulers.

enqueueAgain:
level.Debug(spanLogger).Log("msg", "enqueuing request")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to reviewers: I've deliberately added this request started event and the "request succeeded" event below despite adding the span in enqueueRequest, as it's possible there's a delay between a request being added to f.requestsCh and it being picked up and enqueued by a worker.

@charleskorn charleskorn marked this pull request as ready for review August 23, 2023 06:18
@charleskorn charleskorn requested a review from a team as a code owner August 23, 2023 06:18
Copy link
Member

@pstibrany pstibrany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, thank you!

pkg/frontend/v2/frontend_scheduler_worker.go Outdated Show resolved Hide resolved
req.response <- &frontendv2pb.QueryResultRequest{
HttpResponse: &httpgrpc.HTTPResponse{
Code: http.StatusInternalServerError,
Body: []byte(resp.Error),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch!

Copy link
Contributor

@dimitarvdimitrov dimitarvdimitrov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Co-authored-by: Peter Štibraný <pstibrany@gmail.com>
@charleskorn charleskorn enabled auto-merge (squash) August 24, 2023 01:11
@charleskorn charleskorn merged commit 4df2f07 into main Aug 24, 2023
27 checks passed
@charleskorn charleskorn deleted the charleskorn/scheduler-observability branch August 24, 2023 01:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants