refactor(bloom planner): Compute gaps and build tasks from metas and TSDBs #12994

Merged: salvacorts merged 11 commits into main from salvacorts/bloom-refactor/build-tasks on May 21, 2024

@salvacorts salvacorts commented May 20, 2024

What this PR does / why we need it:
This PR copies logic from the bloom compactor to the new bloom planner component. This is what it does:

  1. Every planning_interval:
  2. Load all TSDB tables from today() - MaxTableOffset (default 2) to today() - MinTableOffset (default 1). For each table:
  3. Parse the tenants in the table. For each tenant:
  4. Check whether building blooms is enabled for the tenant
  5. Split the fingerprint (FP) keyspace into a per-tenant configurable set of bounds
  6. Fetch all the metas for the tenant and find gaps by comparing the TSDB against the metas
  7. Create a task for each set of gaps within each FP bound
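
To make step 6 concrete, here is a minimal sketch of finding the uncovered parts of an owned fingerprint range. It is an illustration only and not the code from this PR: the `Bounds` type and the `findGaps` signature are hypothetical simplifications, and the real `FindGapsInFingerprintBounds` works on Loki's own bounds type.

```go
package main

import (
	"fmt"
	"sort"
)

// Bounds is a hypothetical inclusive fingerprint range [Min, Max].
type Bounds struct {
	Min, Max uint64
}

// String renders the range for logging.
func (b Bounds) String() string {
	return fmt.Sprintf("[%d, %d]", b.Min, b.Max)
}

// findGaps returns the parts of owned that no range in covered overlaps.
// Ranges in covered may overlap each other and may extend past owned.
func findGaps(owned Bounds, covered []Bounds) []Bounds {
	// Sort a copy so the caller's slice is left untouched.
	sorted := append([]Bounds(nil), covered...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Min < sorted[j].Min })

	var gaps []Bounds
	next := owned.Min // lowest fingerprint not yet known to be covered
	for _, c := range sorted {
		if c.Max < next {
			continue // entirely below the cursor
		}
		if c.Min > owned.Max {
			break // entirely above the owned range
		}
		if c.Min > next {
			gaps = append(gaps, Bounds{Min: next, Max: c.Min - 1})
		}
		if c.Max >= owned.Max {
			return gaps // covered through the end of the owned range
		}
		next = c.Max + 1
	}
	return append(gaps, Bounds{Min: next, Max: owned.Max})
}

func main() {
	metas := []Bounds{{Min: 50, Max: 60}, {Min: 10, Max: 20}, {Min: 15, Max: 30}}
	for _, gap := range findGaps(Bounds{Min: 0, Max: 100}, metas) {
		fmt.Println(gap)
	}
}
```

Returning early once a covered range reaches `owned.Max` avoids computing `c.Max + 1`, which would wrap around when `c.Max` is the largest `uint64`.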

This is the definition of a task:

  • Table name
  • Tenant ID
  • FP Ownership bounds
  • TSDB file ID
  • List of gaps along with block refs overlapping the gaps
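
As a rough illustration, the five fields above could map to a Go struct like the following. All type and field names here are assumptions chosen for readability; the PR's actual task type may differ, and block refs and TSDB identifiers are reduced to plain strings.

```go
package main

import "fmt"

// FingerprintBounds is a hypothetical inclusive fingerprint range.
type FingerprintBounds struct {
	Min, Max uint64
}

// GapWithBlocks pairs one gap with the refs of existing blocks that overlap it.
type GapWithBlocks struct {
	Bounds FingerprintBounds
	Blocks []string // hypothetical block refs, simplified to strings
}

// Task mirrors the task definition in the PR description.
type Task struct {
	Table           string            // table name
	Tenant          string            // tenant ID
	OwnershipBounds FingerprintBounds // FP ownership bounds
	TSDB            string            // TSDB file ID
	Gaps            []GapWithBlocks   // gaps plus overlapping block refs
}

// String gives a short summary that is handy in logs.
func (t Task) String() string {
	return fmt.Sprintf("%s/%s tsdb=%s gaps=%d", t.Table, t.Tenant, t.TSDB, len(t.Gaps))
}

func main() {
	task := Task{
		Table:           "index_19863",
		Tenant:          "tenant-a",
		OwnershipBounds: FingerprintBounds{Min: 0, Max: 255},
		TSDB:            "tsdb-1",
		Gaps: []GapWithBlocks{
			{Bounds: FingerprintBounds{Min: 0, Max: 9}, Blocks: []string{"block-1"}},
		},
	}
	fmt.Println(task)
}
```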

Special notes for your reviewer:

  • Most of the logic and tests are copied directly from the bloom compactor implementation.
  • This is how I'd review this PR:
    • Skim through the copied files (no changes were made):
      • tsdb.go and tsdb_test.go
      • In util.go, I moved the findGaps function from the compactor controller and renamed it to FindGapsInFingerprintBounds. I also copied its test to util_test.go
      • In tableIterator.go, I extracted the dayRangeIterator. The code is the same as in the compactor.
    • Focus on planner.go.
      • Note that the following functions and their tests were directly copied from the compactor:
        • planner.tables
        • gapsBetweenTSDBsAndMetas
        • blockPlansForGaps
      • So I'd focus on:
        • planner.runOne
        • planner.loadWork
    • Also look at the config and docs generated.

Checklist

  • Reviewed the CONTRIBUTING.md guide (required)
  • Documentation added
  • Tests updated
  • Title matches the required conventional commits format, see here
    • Note that Promtail is considered to be feature complete, and future development for logs collection will be in Grafana Alloy. As such, feat PRs are unlikely to be accepted unless a case can be made for the feature actually being a bug fix to existing behavior.
  • Changes that require user attention or interaction to upgrade are documented in docs/sources/setup/upgrade/_index.md
  • For Helm chart changes bump the Helm chart version in production/helm/loki/Chart.yaml and update production/helm/loki/CHANGELOG.md and production/helm/loki/README.md. Example PR
  • If the change is deprecating or removing a configuration option, update the deprecated-config.yaml and deleted-config.yaml files respectively in the tools/deprecated-config-checker directory. Example PR

@github-actions github-actions bot added the type/docs Issues related to technical documentation; the Docs Squad uses this label across many repositories label May 20, 2024
@salvacorts salvacorts marked this pull request as ready for review May 20, 2024 15:05
@salvacorts salvacorts requested a review from a team as a code owner May 20, 2024 15:05
-func (cfg *Config) RegisterFlagsWithPrefix(_ string, _ *flag.FlagSet) {
-	// TODO: Register flags with flagsPrefix
+func (cfg *Config) RegisterFlagsWithPrefix(prefix string, f *flag.FlagSet) {
+	f.DurationVar(&cfg.PlanningInterval, prefix+".interval", 10*time.Minute, "Interval at which to re-run the bloom creation planning.")
Contributor

nit: IMO, 10m is too frequent, it's more frequent than the TSDB index is compacted.

Contributor Author

I changed it to 8 hours, so it runs three times a day. What do you think?


case <-ticker.C:
if err := p.runOne(ctx); err != nil {
return err
Contributor

Any error would stop the planner service. Is this intentional?

Contributor

I would expect an error to be logged and an error counter to be increased, but not the service to be shut down.

Contributor Author

That looked a bit weird to me as well when I copied it from the bloom compactor:

if err := c.runOne(ctx); err != nil {
return err
}

I'll log the error instead.

error counter to be increased

That's already done inside the runOne function:

status = statusFailure
)
defer func() {
p.metrics.buildCompleted.WithLabelValues(status).Inc()

func (p *Planner) tables(ts time.Time) *dayRangeIterator {
// adjust the minimum by one to make it inclusive, which is more intuitive
// for a configuration variable
adjustedMin := min(p.cfg.MinTableOffset - 1)
Contributor

What is min() used for here with a single argument?

Contributor Author

I copied over that function. I think that min is probably a leftover. Removed.

@@ -205,6 +205,9 @@ type Limits struct {
BloomCompactorMaxBlockSize flagext.ByteSize `yaml:"bloom_compactor_max_block_size" json:"bloom_compactor_max_block_size" category:"experimental"`
BloomCompactorMaxBloomSize flagext.ByteSize `yaml:"bloom_compactor_max_bloom_size" json:"bloom_compactor_max_bloom_size" category:"experimental"`

BloomCreationEnabled bool `yaml:"bloom_creation_enabled" json:"bloom_creation_enabled" category:"experimental"`
BloomSplitSeriesKeyspaceByFactor int `yaml:"bloom_split_series_keyspace_by_factor" json:"bloom_split_series_keyspace_by_factor" category:"experimental"`
Contributor

nit: Personally I find the name factor not quite right, because it implies that it is used to multiply with something.
I would name it something with series shard size.

Contributor Author

I wouldn't use shard since it's not sharding at all. What about bloom_split_series_keyspace_by: 256? I think it reads well enough.

Note that we'll likely replace this keyspace split with something smarter using TSDB stats soon, so I wouldn't worry too much about naming here.

@salvacorts salvacorts requested a review from chaudum May 21, 2024 10:18
@salvacorts salvacorts enabled auto-merge (squash) May 21, 2024 11:08
@salvacorts salvacorts merged commit 3195036 into main May 21, 2024
60 checks passed
@salvacorts salvacorts deleted the salvacorts/bloom-refactor/build-tasks branch May 21, 2024 11:12
jotak pushed a commit to jotak/loki that referenced this pull request May 21, 2024
…TSDBs (grafana#12994)

Signed-off-by: Joel Takvorian <jtakvori@redhat.com>
trevorwhitney added a commit that referenced this pull request May 24, 2024
Labels
size/XXL type/docs Issues related to technical documentation; the Docs Squad uses this label across many repositories
2 participants