Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage: add MVCC point synthesizing iterator #82013

Merged

Conversation

erikgrinaker
Copy link
Contributor

@erikgrinaker erikgrinaker commented May 28, 2022

This patch adds pointSynthesizingIter, an MVCC iterator which wraps an
arbitrary MVCC iterator and synthesizes point keys for range keys at
their start key and where they overlap point keys. It can optionally
synthesize around the SeekGE seek key too, which is useful for point
operations like MVCCGet where we may want to return a synthetic
tombstone for an MVCC range tombstone if there is no existing point key.

This will primarily be used to handle MVCC range tombstones in MVCC
scans and gets, as well as during MVCC conflict checks, which allows
much of this logic to remain unchanged and simplified (in particular,
pebbleMVCCScanner will not need any changes).

However, this patch does not make use of the iterator yet, since both it
and Pebble will need further performance optimizations for use in hot
paths. For now, correctness is sufficient, and only basic attempts at
performance optimization have been made.

Touches #70412.

Release note: None

@erikgrinaker erikgrinaker self-assigned this May 28, 2022
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@erikgrinaker erikgrinaker force-pushed the mvcc-range-tombstones-point-synthesis branch 2 times, most recently from 28c4de1 to 2a7d3f2 Compare May 29, 2022 15:59
@erikgrinaker erikgrinaker force-pushed the mvcc-range-tombstones-point-synthesis branch 4 times, most recently from fea9e1c to e37f39d Compare June 4, 2022 13:33
@erikgrinaker erikgrinaker marked this pull request as ready for review June 4, 2022 13:33
@erikgrinaker erikgrinaker requested a review from a team as a code owner June 4, 2022 13:33
@erikgrinaker erikgrinaker force-pushed the mvcc-range-tombstones-point-synthesis branch from e37f39d to 89a2c63 Compare June 4, 2022 13:47
Copy link
Collaborator

@jbowens jbowens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 5 files at r1, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aliher1911, @erikgrinaker, @itsbilal, and @nicktrav)


pkg/storage/point_synthesizing_iter.go line 137 at r1 (raw file):

		// Reuse byte slices.
		rangeKeysPos:   i.rangeKeysPos[:0],
		rangeKeysStart: i.rangeKeysStart[:0],

worth reusing the i.rangeKeys slice too?


pkg/storage/point_synthesizing_iter.go line 194 at r1 (raw file):

	} else if !i.reverse {
		i.atPoint = i.rangeKeysIdx >= len(i.rangeKeys) ||
			point.Timestamp.IsEmpty() ||

nit: match the handling of empty timestamp in the forward and reverse cases? (eg, use IsEmpty() in both, or IsSet() in both)


pkg/storage/point_synthesizing_iter.go line 256 at r1 (raw file):

	// point key at the seek key, then we also need to peek backwards for an
	// existing point key above us, which would mandate that we synthesize point
	// keys here after all.

does this requirement come from the mvccscanner's usage? can we refrain from peeking backwards if the range key contains no timestamps ≤ the seek key's timestamp? that would match the behavior of point tombstones.


pkg/storage/point_synthesizing_iter.go line 284 at r1 (raw file):

		// If we're now at a bare range key, it must either be at the start of it,
		// or in the middle with emitOnSeekGE enabled. In either case, we want to
		// move the iterator ahead to look for a point key that may be colocated

does 'colocated' here mean has the same user key?


pkg/storage/point_synthesizing_iter.go line 368 at r1 (raw file):

		if !i.atPoint {
			i.iter.Next()
		}

This (switching directions using NextKey) is prohibited by the MVCCIterator interface. Should we even attempt to support it if it's undefined behavior in the interface? At the very least, I think it could use a comment.

@erikgrinaker erikgrinaker force-pushed the mvcc-range-tombstones-point-synthesis branch from 89a2c63 to 5a2ec4f Compare June 7, 2022 17:33
Copy link
Contributor Author

@erikgrinaker erikgrinaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aliher1911, @itsbilal, @jbowens, and @nicktrav)


pkg/storage/point_synthesizing_iter.go line 137 at r1 (raw file):

Previously, jbowens (Jackson Owens) wrote…

worth reusing the i.rangeKeys slice too?

Yes, but not yet I think. We're cloning the slice elements every time, which is likely to be much more expensive, and we wouldn't want to hold onto the element references. This whole scheme needs a rethink when we come back around to optimize it, which might involve changes to the MVCCIterator.RangeKeys() data structure too. For now I've focused on correctness.


pkg/storage/point_synthesizing_iter.go line 194 at r1 (raw file):

Previously, jbowens (Jackson Owens) wrote…

nit: match the handling of empty timestamp in the forward and reverse cases? (eg, use IsEmpty() in both, or IsSet() in both)

Think so? I feel like it easier to just read what it says instead of using negatives. No real opinion though, changed it to !IsSet().


pkg/storage/point_synthesizing_iter.go line 256 at r1 (raw file):

Previously, jbowens (Jackson Owens) wrote…

does this requirement come from the mvccscanner's usage? can we refrain from peeking backwards if the range key contains no timestamps ≤ the seek key's timestamp? that would match the behavior of point tombstones.

It was mostly motivated by having predictable behavior. The existence of a synthetic tombstone shouldn't depend on the seek key, since we often optimize iteration by sometimes seeking and sometimes using next, and it seems like we'd want the results to be the same in either case.

However, in the SeekGE case we could avoid this by changing the semantics to only synthesize range tombstones above existing points. I think that might be sufficient for our needs, and would often result in far fewer synthetic tombstones during scans anyway, so it might be worthwhile. However, it wouldn't be entire consistent with the Prefix (i.e. emitOnSeekGE) case though, where we'd synthesize the full set even when there are no point keys. Wdyt?

There's certainly other optimizations we can do here, as you point out. Added a comment with your idea. For now, I've focused on correctness and null-case optimizations, we can come back to optimizations in the presence of range keys later.


pkg/storage/point_synthesizing_iter.go line 284 at r1 (raw file):

Previously, jbowens (Jackson Owens) wrote…

does 'colocated' here mean has the same user key?

Yep. We operate across a roachpb.Key span, so by "key" we typically mean the user key. But I agree that this wording was clumsy (i.e. it talks about being colocated at a different position, which is fairly paradoxical). Reworded.


pkg/storage/point_synthesizing_iter.go line 368 at r1 (raw file):

Previously, jbowens (Jackson Owens) wrote…

This (switching directions using NextKey) is prohibited by the MVCCIterator interface. Should we even attempt to support it if it's undefined behavior in the interface? At the very least, I think it could use a comment.

True. We have best-effort support for it in pebbleIterator already though, and I figured it was easy enough to implement so we might as well have it there in case we ever needed to lift the restriction. But I added a comment, and wouldn't have any qualms about tearing it out either.

Copy link
Collaborator

@jbowens jbowens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm_strong:

Reviewed 1 of 3 files at r2, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @aliher1911, @erikgrinaker, @itsbilal, and @nicktrav)


pkg/storage/point_synthesizing_iter.go line 137 at r1 (raw file):

Previously, erikgrinaker (Erik Grinaker) wrote…

Yes, but not yet I think. We're cloning the slice elements every time, which is likely to be much more expensive, and we wouldn't want to hold onto the element references. This whole scheme needs a rethink when we come back around to optimize it, which might involve changes to the MVCCIterator.RangeKeys() data structure too. For now I've focused on correctness.

makes sense


pkg/storage/point_synthesizing_iter.go line 256 at r1 (raw file):

The existence of a synthetic tombstone shouldn't depend on the seek key, since we often optimize iteration by sometimes seeking and sometimes using next, and it seems like we'd want the results to be the same in either case.

Ah, gotcha.

However, in the SeekGE case we could avoid this by changing the semantics to only synthesize range tombstones above existing points. I think that might be sufficient for our needs, and would often result in far fewer synthetic tombstones during scans anyway, so it might be worthwhile. However, it wouldn't be entire consistent with the Prefix (i.e. emitOnSeekGE) case though, where we'd synthesize the full set even when there are no point keys. Wdyt?

Yeah, that makes sense. My understanding here is hazy, but would it be feasible to eventually replace emitOnSeekGE with something akin to pebbleScanner.failOnMoreRecent? So that we can avoid synthesizing at the seek key but still perform conflict checking?


pkg/storage/point_synthesizing_iter.go line 368 at r1 (raw file):

Previously, erikgrinaker (Erik Grinaker) wrote…

True. We have best-effort support for it in pebbleIterator already though, and I figured it was easy enough to implement so we might as well have it there in case we ever needed to lift the restriction. But I added a comment, and wouldn't have any qualms about tearing it out either.

just the comment is good by me


pkg/storage/point_synthesizing_iter.go line 582 at r2 (raw file):

// assertInvariants asserts iterator invariants.
func (i *pointSynthesizingIter) assertInvariants() error {

love all these well documented & enforced invariants. does cockroachdb have any non-race build tags for including invariant checks? eg, the pebble repo has an invariants build tag which can be nice to include invariant-checking without the significant slowdown of the race checker.

Copy link
Contributor Author

@erikgrinaker erikgrinaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @aliher1911, @itsbilal, @jbowens, and @nicktrav)


pkg/storage/point_synthesizing_iter.go line 256 at r1 (raw file):

would it be feasible to eventually replace emitOnSeekGE with something akin to pebbleScanner.failOnMoreRecent? So that we can avoid synthesizing at the seek key but still perform conflict checking?

Yes, something like that is definitely possible. I've wanted to keep both the scanner and synthesizer general and loosely coupled though. I don't think the emitOnSeekGE kludge is all that bad, but would've liked it better if we could know whether the underlying iterator was a prefix iterator.

in the SeekGE case we could avoid this by changing the semantics to only synthesize range tombstones above existing points. I think that might be sufficient for our needs, and would often result in far fewer synthetic tombstones during scans anyway

I'm getting convinced that this is the way to go -- if nothing else, for the performance gains in not emitting and then processing all of these useless synthetic point tombstones. It does have some loss of generality though, if we ever wanted to synthesize non-tombstone points, or for other use-cases that might care about them. It's possible that I'll keep both modes if the code delta turns out to be trivial. Will update the PR tomorrow.


pkg/storage/point_synthesizing_iter.go line 582 at r2 (raw file):
Thanks! Yeah, I think I prefer this pattern over littering invariant checks throughout the main code.

does cockroachdb have any non-race build tags for including invariant checks?

I think race is the de facto build tag for stuff like this. We have a crdb_test tag that used in test builds though, and I've seen SQL use it a fair bit for stuff like this, but in KV land or for more expensive checks I think I've only ever seen the race tag used. I agree that it might be nice to use some other tag orthogonal to race detection though.

Copy link
Contributor Author

@erikgrinaker erikgrinaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @aliher1911, @itsbilal, @jbowens, and @nicktrav)


pkg/storage/point_synthesizing_iter.go line 256 at r1 (raw file):

Will update the PR tomorrow.

On second thought, I think I'll merge this as-is and submit a separate PR with this change later. I'd like to unblock other teams asap, and this is correct and reasonably fast in the no-range-key case.

This patch adds `pointSynthesizingIter`, an MVCC iterator which wraps an
arbitrary MVCC iterator and synthesizes point keys for range keys at
their start key and where they overlap point keys. It can optionally
synthesize around the SeekGE seek key too, which is useful for point
operations like `MVCCGet` where we may want to return a synthetic
tombstone for an MVCC range tombstone if there is no existing point key.

This will primarily be used to handle MVCC range tombstones in MVCC
scans and gets, as well as during MVCC conflict checks, which allows
much of this logic to remain unchanged and simplified (in particular,
`pebbleMVCCScanner` will not need any changes).

However, this patch does not make use of the iterator yet, since both it
and Pebble will need further performance optimizations for use in hot
paths. For now, correctness is sufficient, and only basic attempts at
performance optimization have been made.

Release note: None
@erikgrinaker erikgrinaker force-pushed the mvcc-range-tombstones-point-synthesis branch from 5a2ec4f to d07bf3a Compare June 8, 2022 21:33
@erikgrinaker
Copy link
Contributor Author

TFTR!

bors r=jbowens

@craig
Copy link
Contributor

craig bot commented Jun 9, 2022

Build succeeded:

@craig craig bot merged commit 30d4c9d into cockroachdb:master Jun 9, 2022
@erikgrinaker erikgrinaker deleted the mvcc-range-tombstones-point-synthesis branch June 9, 2022 22:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants