Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update vendored prometheus #8295

Merged
merged 5 commits into from
Jun 7, 2024
Merged

Update vendored prometheus #8295

merged 5 commits into from
Jun 7, 2024

Conversation

pracucci
Copy link
Collaborator

@pracucci pracucci commented Jun 6, 2024

What this PR does

In this PR I'm updating mimir-prometheus to get the changes synced in grafana/mimir-prometheus#639.

The main change affecting Mimir is the introduction of query_offset in rule group config, which deprecates evaluation_delay. In this PR I propose to do the safest change in Mimir, which is mapping query_offset and evaluation_delay independently in the rule group config we store in the object storage, so that when a tenant downloads a previously loaded config they get the same exact YAML. This is a property I would like to preserve for now, in case any customer runs a reconcile loop comparing the local YAML config with the one stored in Mimir (read via API): I prefer to make sure the two YAMLs always match (which wouldn't be true if we manipulate the YAML config to set query_offset to the value of evaluation_delay when it's not set).

Manual tests

I've run manual tests with mimirtool downloaded from Mimir 2.12.0 release and mimirtool built on top of this PR.

Test results against Mimir running from this branch:

  • Upload rule config
    • mimirtool 2.12.0
      • Supports evaluation_delay
      • Fails parsing if config contains query_offset (expected)
    • mimirtool this branch
      • Supports evaluation_delay and query_offset
  • Get rule config
    • mimirtool 2.12.0
      • Supports evaluation_delay
      • Does not display query_offset if remote config contains it (expected), but doesn't fail
    • mimirtool this branch
      • Supports evaluation_delay and query_offset

Test results against Mimir running from main:

  • Upload rule config
    • mimirtool this branch
      • Supports evaluation_delay
      • Upload succeed if config contains query_offset but server ignores it (a subsequent rules get doesn't return it)
  • Get rule config
    • mimirtool this branch
      • Supports evaluation_delay

Which issue(s) this PR fixes or relates to

N/A

Checklist

  • Tests updated.
  • Documentation added.
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX].
  • about-versioning.md updated with experimental features.

…roup config and deprecating evaluation_delay

Signed-off-by: Marco Pracucci <marco@pracucci.com>
Signed-off-by: Marco Pracucci <marco@pracucci.com>
Signed-off-by: Marco Pracucci <marco@pracucci.com>
@@ -82,6 +83,7 @@
* [BUGFIX] Store-gateway: Allow long-running index scans to be interrupted. #8154
* [BUGFIX] Query-frontend: fix splitting of queries using `@ start()` and `@end()` modifiers on a subquery. Previously the `start()` and `end()` would be evaluated using the start end end of the split query instead of the original query. #8162
* [BUGFIX] Distributor: Don't discard time series with invalid exemplars, just drop affected exemplars. #8224
* [BUGFIX] Ingester: fixed in-memory series count when replaying a corrupted WAL. #8295
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to reviewers: this is related to prometheus/prometheus#14079.

@@ -211,3 +211,8 @@ The following features or configuration parameters are currently deprecated and
- `-ingester.client.report-grpc-codes-in-instrumentation-label-enabled`
- Mimirtool
- the flag `--rule-files`

The following features or configuration parameters are currently deprecated and will be **removed in a future release (to be announced)**:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to reviewers: I want to keep evaluation_delay support for longer.

@@ -1071,7 +1124,7 @@ rules:

router.ServeHTTP(w, req)
require.Equal(t, 200, w.Code)
require.YAMLEq(t, tt.output, w.Body.String())
require.YAMLEq(t, tt.input, w.Body.String())
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to reviewers: we can simplify the test. We expect to receive the same YAML we uploaded.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree, although isn't that because we don't test 0 value here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. If we would test the 0 value the output would be different

Copy link
Contributor

@krajorama krajorama left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, small typo nit

pracucci and others added 2 commits June 6, 2024 18:51
Signed-off-by: Marco Pracucci <marco@pracucci.com>
Co-authored-by: George Krajcsovits <krajorama@users.noreply.github.com>
@pracucci pracucci marked this pull request as ready for review June 6, 2024 16:51
@pracucci pracucci requested review from grafanabot, a team and jdbaldry as code owners June 6, 2024 16:51
@pracucci pracucci enabled auto-merge (squash) June 6, 2024 16:52
Copy link
Contributor

@gotjosh gotjosh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, one comment, non-blocking, I don't need to look at this again.

Thanks again for bringing these changes in.

@@ -330,7 +330,7 @@ func DefaultTenantManagerFactory(
ForGracePeriod: cfg.ForGracePeriod,
ResendDelay: cfg.ResendDelay,
AlwaysRestoreAlertState: true,
DefaultEvaluationDelay: func() time.Duration {
DefaultRuleQueryOffset: func() time.Duration {
// Delay the evaluation of all rules by a set interval to give a buffer
// to metric that haven't been forwarded to Mimir yet.
return overrides.EvaluationDelay(userID)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit [non-blocking]: At some point, we probably want to create a new limit so that it matches the description a bit more accurately. Happy to create an issue for it if you think it's worth it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree we should but looks very low priority. I'm fine creating an issue for now.

@pracucci pracucci merged commit 1fcecee into main Jun 7, 2024
29 checks passed
@pracucci pracucci deleted the update-mimir-prometheus branch June 7, 2024 06:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants