Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Making the number of CPUs used for WAL replay configurable #4445

Merged
merged 6 commits into from
Mar 10, 2023

Conversation

duricanikolic
Copy link
Contributor

@duricanikolic duricanikolic commented Mar 9, 2023

What this PR does

The version of prometheus is upgraded (via mimir-prometheus) to the one corresponding to this commit, which introduced WALReplayConcurrency as an option on tsdb Options and HeadOptions. This option represents the maximal number of CPUs used during WAL replay on TSDB opening. If not set to a positive number, all available CPUs are used.

WALReplayConcurrency has been added as a tsdb option on mimir side as well, and it is configurable via the newly added advanced configuration parameter -blocks-storage.tsdb.wal-replay-concurrency, whose default value is 0. The latter is supposed to override the -blocks-storage.tsdb.max-tsdb-opening-concurrency-on-startup, which is now deprecated, and will be removed from Mimir 2.10. In order to guarantee backwards compatibility, the following logic is used:

  • if -blocks-storage.tsdb.wal-replay-concurrency is set to a positive number, the deprecated -blocks-storage.tsdb.max-tsdb-opening-concurrency-on-startup is ignored.
  • otherwise, -blocks-storage.tsdb.max-tsdb-opening-concurrency-on-startup is used and the behavior is identical to the one before this PR.

Setting -blocks-storage.tsdb.wal-replay-concurrency to a positive value imposes the maximal number of concurrent processes to be used on ingesters startup. The heuristic used is the following:

  • if there are more than 10 tenants, opening of TSDBs is done by -blocks-storage.tsdb.wal-replay-concurrency concurrent processes, each of them using only 1 process for WAL replay (i.e., prometheus option WALReplayConcurrency is set to 1).
  • otherwise, opening of TSDBs id done by only 1 process, which uses -blocks-storage.tsdb.wal-replay-concurrency concurrent processes for WAL replay ((i.e., prometheus option WALReplayConcurrency is set to -blocks-storage.tsdb.wal-replay-concurrency).

Which issue(s) this PR fixes or relates to

Part of https://github.com/grafana/mimir-squad/issues/1040

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@duricanikolic duricanikolic requested review from a team as code owners March 9, 2023 17:09
Copy link
Contributor

@treid314 treid314 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style suggestions mostly, I think what's here makes sense. When Prometheus got updated it looks like they moved the tsdb.GenerateTestHistogram function to tsdbutil. which is causing the unit test and linting issues. Not sure if we have a mainline PR open to update Prometheus yet but I don't think we should be doing that here.

cmd/mimir/config-descriptor.json Outdated Show resolved Hide resolved
pkg/ingester/ingester.go Outdated Show resolved Hide resolved
pkg/ingester/ingester.go Outdated Show resolved Hide resolved
pkg/ingester/shipper.go Outdated Show resolved Hide resolved
pkg/ingester/ingester.go Outdated Show resolved Hide resolved
pkg/ingester/ingester.go Outdated Show resolved Hide resolved
@56quarters
Copy link
Contributor

The referenced pull request #1040 seems unrelated to this change?

CHANGELOG.md Outdated Show resolved Hide resolved
@pracucci pracucci self-requested a review March 10, 2023 11:35
Copy link
Collaborator

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some notes:

  • List the deprecated config option under docs/sources/mimir/operators-guide/configure/about-versioning.md
  • In the vendored Prometheus they've added lookback_delta parameter to the instant and range query API. To be 100% Prometheus compatible we should support it too. This is not expected to be done in this PR, but please open an issue in the Mimir repo about it and kick off the discussion whether we should add support for it (I think so)
  • I checked other vendored changes and LGTM

pkg/mimir/mimir_test.go Outdated Show resolved Hide resolved
pkg/storage/tsdb/config.go Outdated Show resolved Hide resolved
pkg/storage/tsdb/config.go Outdated Show resolved Hide resolved
pkg/ingester/ingester.go Outdated Show resolved Hide resolved
pkg/ingester/ingester.go Outdated Show resolved Hide resolved
pkg/ingester/ingester.go Outdated Show resolved Hide resolved
pkg/ingester/ingester.go Outdated Show resolved Hide resolved
pkg/ingester/ingester.go Outdated Show resolved Hide resolved
pkg/ingester/ingester.go Outdated Show resolved Hide resolved
Copy link
Collaborator

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, very good. Thanks for addressing my previous comments! I left few last nits and then I will be more than happy to merge it 😉

pkg/storage/tsdb/config.go Outdated Show resolved Hide resolved
pkg/ingester/ingester.go Outdated Show resolved Hide resolved
pkg/ingester/ingester.go Outdated Show resolved Hide resolved
pkg/ingester/ingester.go Outdated Show resolved Hide resolved
Copy link
Collaborator

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, modulo a last nit :)

pkg/ingester/ingester_test.go Outdated Show resolved Hide resolved
@pracucci pracucci enabled auto-merge (squash) March 10, 2023 14:55
@pracucci pracucci merged commit 7c906d6 into main Mar 10, 2023
@pracucci pracucci deleted the yuri/wal-replay-optimization branch March 10, 2023 14:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants