Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

db: add external sstable merging iterator #1529

Merged
merged 1 commit into from
Feb 24, 2022

Conversation

jbowens
Copy link
Collaborator

@jbowens jbowens commented Feb 22, 2022

Add a pebble.NewExternalIter function that may be used to construct a
*pebble.Iterator that reads from a provided slice of sstables rather than
committed database state. Input sstables are required to contain all
zero-sequence number keys. Shadowing of keys is resolved by treating the files
as ordered in reverse chronological order.

This iterator is intended to replace the storage package's multiIterator.

Close #1526.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Collaborator Author

@jbowens jbowens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still need to flesh out test coverage.

Reviewable status: 0 of 10 files reviewed, all discussions resolved (waiting on @erikgrinaker, @nicktrav, and @sumeerbhola)

Copy link
Contributor

@erikgrinaker erikgrinaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! How should overlapping range keys across SSTs be handled? This is relevant for backups, as we discussed, where two SSTs may have range keys [a-d)@3 and [c-e)@3 because we're resuming the second export from e.g. c@5. In these cases, the suffixes and values for these range keys will be the same.

I suppose if we seek to c@6 then we'd hit the first range key, exposed as [a-d), and when we get to c@5 we'd expose [c-e)? Or will we somehow merge them or truncate the bounds?

Reviewed 10 of 10 files at r1, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @jbowens, @nicktrav, and @sumeerbhola)


external_iterator.go, line 26 at r1 (raw file):

	o *Options,
	iterOpts *IterOptions,
	maxTableFormat sstable.TableFormat,

Do we need this? Shouldn't Pebble just error on table formats greater than the ones it knows about, and accept the rest?

Copy link
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose if we seek to c@6 then we'd hit the first range key, exposed as [a-d), and when we get to c@5 we'd expose [c-e)? Or will we somehow merge them or truncate the bounds?

My understanding is that it will be the merged bound since we are using the defragmenting iter in this code.

Reviewed 6 of 10 files at r1.
Reviewable status: 7 of 10 files reviewed, 2 unresolved discussions (waiting on @erikgrinaker, @jbowens, @nicktrav, and @sumeerbhola)


external_iterator.go, line 20 at r1 (raw file):

// input files slice must be sorted in reverse chronological ordering. A key in
// an a file at a lower index will shadow a key with an identical user key
// contained within a file at a higher index.

btw, in the CockroachDB use case do we expect

  • anything other than trivial shadowing (e.g. same point key => value).
  • range key unsets or deletes

sstable/block.go, line 993 at r1 (raw file):

}

type rangeKeyIter struct {

This code repetition is unfortunate.
Can we share the same code if we specify the constraint on what key kinds are permitted when constructing it?

Copy link
Collaborator Author

@jbowens jbowens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the defragmenting iterator will merge the bounds. I added a test case demonstrating the merging.

Reviewable status: 7 of 10 files reviewed, 2 unresolved discussions (waiting on @erikgrinaker, @nicktrav, and @sumeerbhola)


external_iterator.go, line 20 at r1 (raw file):

Previously, sumeerbhola wrote…

btw, in the CockroachDB use case do we expect

  • anything other than trivial shadowing (e.g. same point key => value).
  • range key unsets or deletes

My understanding is that those are the only cases of shadowing we expect in CockroachDB ^


external_iterator.go, line 26 at r1 (raw file):

Previously, erikgrinaker (Erik Grinaker) wrote…

Do we need this? Shouldn't Pebble just error on table formats greater than the ones it knows about, and accept the rest?

Good call, removed!


sstable/block.go, line 993 at r1 (raw file):

Previously, sumeerbhola wrote…

This code repetition is unfortunate.
Can we share the same code if we specify the constraint on what key kinds are permitted when constructing it?

Yeah, the writing was already on the wall for this reuse of blockIter to directly implement keyspan.FragmentIterator, since we intend to disentangle keyspan.FragmentIterator and base.InternalIterator. I updated this type to be used for both range deletions and range keys, and to explicitly implement keyspan.FragmentIterator rather than relying on the blockIter embedded type to transparently satisfy it.

Copy link
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 2 of 3 files at r2, 4 of 4 files at r3.
Reviewable status: all files reviewed (commit messages unreviewed), 3 unresolved discussions (waiting on @erikgrinaker, @itsbilal, @jbowens, @nicktrav, and @sumeerbhola)


sstable/block.go, line 995 at r3 (raw file):

	// TODO(jackson): Remove remaining dependencies on the internal
	// value being propagated via positioning methods and refactor the
	// FragmentIterator to be indepenedent of base.InternalIterator.

independent

Copy link
Contributor

@nicktrav nicktrav left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! :lgtm:

Reviewed 10 of 10 files at r1, 3 of 3 files at r2, 4 of 4 files at r3, all commit messages.
Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @erikgrinaker, @itsbilal, @jbowens, and @sumeerbhola)


external_iterator.go, line 19 at r3 (raw file):

// Input sstables may contain point keys, range keys, range deletions, etc. The
// input files slice must be sorted in reverse chronological ordering. A key in
// an a file at a lower index will shadow a key with an identical user key

nit: key in a file


sstable/block.go, line 968 at r3 (raw file):

	// decode the end key from a fragment's raw internal value. Range
	// deletions store the end key directly as the value, whereas range

nit: whereas other range key kinds ...


testdata/external_iterator, line 1 at r3 (raw file):

build 1

Great test!

Add a pebble.NewExternalIter function that may be used to construct a
*pebble.Iterator that reads from a provided slice of sstables rather than
committed database state. Input sstables are required to contain all
zero-sequence number keys. Shadowing of keys is resolved by treating the files
as ordered in reverse chronological order.

This iterator is intended to replace the storage package's multiIterator.
Copy link
Collaborator Author

@jbowens jbowens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTRs!

Reviewable status: 7 of 11 files reviewed, 5 unresolved discussions (waiting on @erikgrinaker, @itsbilal, @jbowens, @nicktrav, and @sumeerbhola)

@jbowens jbowens merged commit e058941 into cockroachdb:master Feb 24, 2022
@jbowens jbowens deleted the new-external-iter branch February 24, 2022 01:36
Copy link
Contributor

@erikgrinaker erikgrinaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the defragmenting iterator will merge the bounds. I added a test case demonstrating the merging.

Great stuff, thanks for getting this in so fast!

Reviewed all commit messages.
Reviewable status: 7 of 11 files reviewed, 4 unresolved discussions


external_iterator.go, line 20 at r1 (raw file):

Previously, jbowens (Jackson Owens) wrote…

My understanding is that those are the only cases of shadowing we expect in CockroachDB ^

I believe that's accurate, yeah. But I'm not intimately familiar with the bulk IO code.

@nicktrav nicktrav mentioned this pull request Feb 26, 2022
29 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

db: add external sstable iterator interface
5 participants