
Query Frontend Refactor: Metrics Query Range #3584

Merged
merged 19 commits into grafana:main on Apr 19, 2024

Conversation

joe-elliott
Member

@joe-elliott joe-elliott commented Apr 17, 2024

What this PR does:
Moves the final frontend endpoint (metrics query range) over to the new streaming pipeline. Note that the streaming is done in a naive fashion: we simply return whatever is available every 500ms. After this is integrated into a frontend, we will revisit and determine what a good streaming pattern will be.
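
For illustration only, here is a minimal sketch of that naive streaming loop, assuming a combiner-like type that exposes whatever partial result it has accumulated so far. The `partialResults` interface, `streamEvery` function, and their method names are hypothetical stand-ins, not Tempo's actual types.

```go
package frontend

import (
	"context"
	"time"
)

// partialResults is a hypothetical stand-in for the frontend combiner:
// Current() returns whatever has been combined so far and Done() closes
// once every job has reported back.
type partialResults[T any] interface {
	Current() T
	Done() <-chan struct{}
}

// streamEvery flushes the partial result on a fixed interval (500ms in this
// PR) and sends one final result when the combiner finishes.
func streamEvery[T any](ctx context.Context, interval time.Duration, c partialResults[T], send func(T) error) error {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	for {
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-c.Done():
			return send(c.Current()) // final flush after all jobs complete
		case <-ticker.C:
			if err := send(c.Current()); err != nil {
				return err
			}
		}
	}
}
```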

Open Questions

  • Should we add a new SLO config and SLO metric explicitly for metrics? Currently it piggybacks on search.
  • Should we reorg the code layout to be around pipeline? i.e. should we bring the combiner/sharder/handlers all together in modules? They are starting to have dependencies that would be easier to see if they were co-located.

Other changes:

  • Refactors the pipeline to return an interface instead of *http.Response. The interface allows additional data to accompany the returned result (see the sketch after this list).
    • Adds the ability for a request to add a value into the context. This value is then returned with its associated response. This is used to return a per-job sampling rate to the combiner.
  • Drops prom compatibility support
  • Adds support in the CLI for the new endpoints
  • Updates the local docker-compose example to support metrics queries
  • Adds guard code in the frontend to prevent invalid config options
  • Removes a number of unused files such as the no-op combiner and the old multitenant sharder
  • Adds total jobs and completed jobs to the metrics query range response
  • Excludes goconst linter from _test.go files.
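
As a rough illustration of the interface change in the first bullet above: each pipeline stage returns something richer than a bare *http.Response, so request-scoped data such as a per-job sampling rate can travel back to the combiner alongside its response. All names below are hypothetical, not the actual types in modules/frontend, and a struct field stands in for the context value mentioned above.

```go
package frontend

import "net/http"

// pipelineResponse is what a stage returns instead of a bare *http.Response:
// the HTTP result plus whatever value the originating request attached.
type pipelineResponse interface {
	HTTPResponse() *http.Response
	RequestData() any // e.g. the per-job sampling rate the combiner needs
}

// pipelineRequest carries a value alongside the request; the pipeline echoes
// it back with the corresponding response.
type pipelineRequest struct {
	req  *http.Request
	data any
}

func (r *pipelineRequest) withData(d any) *pipelineRequest {
	r.data = d
	return r
}

// response pairs the HTTP response with the request's attached data,
// satisfying pipelineResponse.
type response struct {
	resp *http.Response
	data any
}

func (r response) HTTPResponse() *http.Response { return r.resp }
func (r response) RequestData() any             { return r.data }
```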

Which issue(s) this PR fixes:
Fixes #3413

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

Signed-off-by: Joe Elliott <number101010@gmail.com>
pkg/api/http.go
if samplingRate != nil {
	fRate := samplingRate.(float64)

	if fRate != 0.0 {
Contributor

nit: can skip this when rate == 1.0. Maybe switch to fRate < 1.0 ?

Member Author

The 0.0 check is just a sanity check to prevent panics. Maybe fRate < 1.0 && fRate > 0.0? Would there ever be a time we deflate metrics with the rate?

Contributor

No, the sampling rate will always be between 0 and 1 due to the check in the sharder: https://github.com/grafana/tempo/pull/3584/files#diff-85974350e62b77bee6b5d88d537e3946839ccb1bba5f6dc89450a1c5489b5d4fR364

Member Author

good catch. updated to the suggested condition
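
For context, a hypothetical sketch of the agreed-upon condition. The helper name and the scaling by 1/rate are illustrative assumptions about what a combiner might do with the per-job sampling rate; they are not taken verbatim from the diff.

```go
package frontend

// applySamplingRate rescales sampled values only when the job was actually
// sampled (0 < rate < 1). rate == 1.0 means no sampling was applied, and
// rate == 0.0 is guarded against so we never divide by zero.
// Scaling by 1/rate is an assumption for illustration.
func applySamplingRate(samplingRate any, values []float64) {
	if samplingRate == nil {
		return
	}

	fRate, ok := samplingRate.(float64)
	if !ok {
		return
	}

	if fRate > 0.0 && fRate < 1.0 {
		for i := range values {
			values[i] *= 1.0 / fRate
		}
	}
}
```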

modules/frontend/frontend.go
Contributor
@mdisibio mdisibio left a comment

LGTM and nice work. I look forward to glorious streaming metrics panels!

@mdisibio
Contributor

> Should we add a new SLO config and SLO metric explicitly for metrics? Currently it piggybacks on search.

I think in the future yes, but not required for this PR. Even after we make the next round of changes that bring the metrics read path much closer to the search read path, I still expect the minimum throughput and latency to be different.

> Should we reorg the code layout to be around pipeline? i.e. should we bring the combiner/sharder/handlers all together in modules? They are starting to have dependencies that would be easier to see if they were co-located.

Yeah open to another reshuffle if it makes sense.

@joe-elliott joe-elliott merged commit c59bb01 into grafana:main Apr 19, 2024
15 checks passed
electron0zero added a commit that referenced this pull request May 9, 2024
In #3584 we decided to count metrics queries as part of the search SLO.

Split the metrics SLO out of search because metrics are still experimental and should not share the SLO with search.