chore(blooms)!: Introduce a new block schema (V3) #14038

Merged
merged 10 commits into from
Sep 5, 2024

chaudum
Contributor

@chaudum chaudum commented Sep 4, 2024

What this PR does / why we need it:

This PR introduces a new schema version, V3, for bloom blocks. It is incompatible with older schemas.
It simplifies some parts of the code and makes the schema more extensible, laying the groundwork for indexing structured metadata fields into bloom filters.

⚠️ This PR breaks the bloom filter functionality! Until the read path is implemented, bloom blocks created with the new schema cannot be queried in the bloom gateways.

Checklist

  • Reviewed the CONTRIBUTING.md guide (required)
  • Documentation added
  • Tests updated
  • Title matches the required conventional commits format, see here
    • Note that Promtail is considered to be feature complete, and future development for logs collection will be in Grafana Alloy. As such, feat PRs are unlikely to be accepted unless a case can be made for the feature actually being a bug fix to existing behavior.
  • Changes that require user attention or interaction to upgrade are documented in docs/sources/setup/upgrade/_index.md
  • For Helm chart changes bump the Helm chart version in production/helm/loki/Chart.yaml and update production/helm/loki/CHANGELOG.md and production/helm/loki/README.md. Example PR
  • If the change is deprecating or removing a configuration option, update the deprecated-config.yaml and deleted-config.yaml files respectively in the tools/deprecated-config-checker directory. Example PR

Member

@rfratto rfratto left a comment

Nice, I was able to follow along here fairly well. I didn't spot any major issues, so my comments are mainly questions rather than anything blocking.

@@ -145,6 +145,11 @@ func (p *processor) processBlock(_ context.Context, bq *bloomshipper.CloseableBl
return err
}

// We require V3 schema
if !schema.IsCurrentSchema() {
return v1.ErrUnsupportedSchemaVersion
Member

I noticed that processor.processTasks aborts after the first task group where p.processTasksForDay returns an error. Is that something we want to keep?

Especially with this most recent change, I believe that means if any of the existing blocks aren't the current schema version, then we don't process any of newer ones, even if they're valid.

Contributor Author

Yeah, good point. We could ignore incompatible blocks, rather than fail.
But for the sake of simplicity, I would cancel a request with the first incompatible block.

Member

Sounds good to me. If this turns out to be an issue in the future we can replace it with multierror instead of returning immediately.

pkg/storage/bloom/v1/builder_test.go (resolved)
pkg/bloomgateway/processor.go (outdated, resolved)
pkg/storage/bloom/v1/index.go (resolved)
pkg/storage/bloom/v1/util.go (outdated, resolved)
@chaudum chaudum marked this pull request as ready for review September 5, 2024 07:27
@chaudum chaudum requested a review from a team as a code owner September 5, 2024 07:27
@chaudum chaudum changed the title from "chore(blooms): Introduce a new block schema (V3)" to "chore(blooms)!: Introduce a new block schema (V3)" Sep 5, 2024
Member

@rfratto rfratto left a comment

Looks good from my side, Salva might see something I don't

Contributor

@salvacorts salvacorts left a comment

LGTM 👏

@@ -46,6 +46,8 @@ func (b *BloomBlockBuilder) Append(bloom *Bloom) (BloomOffset, error) {
}
}

// version := b.opts.Schema.version
Contributor

Can we remove this?

Contributor Author

yes

@@ -66,9 +66,21 @@ func (b BlockOptions) Encode(enc *encoding.Encbuf) {
enc.PutBE64(b.BlockSize)
}

// func NewDefaultBlockOptions(maxBlockSizeBytes, maxBloomSizeBytes uint64) BlockOptions {
Contributor

remove?

Contributor Author

yes

// V2 supports single series blooms encoded over multiple pages
// to accommodate larger single series
V2
// V2 indicated schema for indexed structured metadata
Contributor

Suggested change
- // V2 indicated schema for indexed structured metadata
+ // V3 indicated schema for indexed structured metadata

type Schema struct {
version Version
encoding chunkenc.Encoding
nGramLength, nGramSkip uint64
Contributor

IIUC, we are going to remove this once we remove the ngrams, right? But we will keep V3

Contributor Author

Yeah, we will remove them later on
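
For readers following along, here is a rough sketch of how a versioned schema and the `IsCurrentSchema` check seen earlier might fit together. It is an assumption-laden illustration, not the code from this PR: the numeric values of `V1` to `V3`, the `CurrentSchemaVersion` constant, and the method body are guesses, and the `encoding` field is omitted to keep the example self-contained. Only the field names `version`, `nGramLength`, and `nGramSkip` come from the excerpt above.

```go
package main

import "fmt"

// Version identifies a bloom block schema version.
type Version byte

// The numeric values below are assumed for illustration only.
const (
	V1 Version = 1
	V2 Version = 2
	V3 Version = 3

	// CurrentSchemaVersion is a hypothetical constant naming the only
	// version that is accepted.
	CurrentSchemaVersion = V3
)

// Schema mirrors the fields shown in the excerpt, minus the encoding.
type Schema struct {
	version                Version
	nGramLength, nGramSkip uint64 // expected to be removed along with n-grams
}

// IsCurrentSchema reports whether the schema uses the current version.
// The real implementation may differ.
func (s Schema) IsCurrentSchema() bool {
	return s.version == CurrentSchemaVersion
}

func main() {
	for _, s := range []Schema{{version: V2}, {version: V3}} {
		fmt.Printf("version=%d current=%t\n", s.version, s.IsCurrentSchema())
	}
}
```

Under this sketch, removing the n-gram fields later would not require another version bump unless the encoded layout changes, which matches the expectation that V3 stays.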

opts: opts,
writer: writer,
index: NewIndexBuilder(opts, index),
blooms: NewBloomBlockBuilder(opts, blooms),
}, nil
}

- func (b *V2Builder) BuildFrom(itr iter.Iterator[SeriesWithBlooms]) (uint32, error) {
+ func (b *V3Builder) BuildFrom(itr iter.Iterator[SeriesWithBlooms]) (uint32, error) {
Contributor

We have two builders: the merge builder and the regular builder. But IIUC, we only use the BuildFrom from the merge builder, right? If so, we should remove this method (in a separate PR probably).

Contributor Author

BuildFrom is actually only used in tests for building "literal blooms".


type V2Builder struct {
Contributor

Will we support building older versions of blocks? If not, does it make sense to keep V3Builder in case we add a V4Builder in the future? Or should we name this Builder right away and keep changing it as we add versions?

Contributor Author

At some point when the feature is GA, yes. For now we can make all changes to the V3Builder.

Got rid of backwards compatibility shenanigans
and introduced a new version that includes the indexed fields.

Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
@chaudum chaudum merged commit 5395daf into main Sep 5, 2024
60 checks passed
@chaudum chaudum deleted the chaudum/block-schema branch September 5, 2024 16:04
3 participants