-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
storage: add MVCC point synthesizing iterator #82013
storage: add MVCC point synthesizing iterator #82013
Conversation
28c4de1
to
2a7d3f2
Compare
fea9e1c
to
e37f39d
Compare
e37f39d
to
89a2c63
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 5 files at r1, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aliher1911, @erikgrinaker, @itsbilal, and @nicktrav)
pkg/storage/point_synthesizing_iter.go
line 137 at r1 (raw file):
// Reuse byte slices. rangeKeysPos: i.rangeKeysPos[:0], rangeKeysStart: i.rangeKeysStart[:0],
worth reusing the i.rangeKeys
slice too?
pkg/storage/point_synthesizing_iter.go
line 194 at r1 (raw file):
} else if !i.reverse { i.atPoint = i.rangeKeysIdx >= len(i.rangeKeys) || point.Timestamp.IsEmpty() ||
nit: match the handling of empty timestamp in the forward and reverse cases? (eg, use IsEmpty()
in both, or IsSet()
in both)
pkg/storage/point_synthesizing_iter.go
line 256 at r1 (raw file):
// point key at the seek key, then we also need to peek backwards for an // existing point key above us, which would mandate that we synthesize point // keys here after all.
does this requirement come from the mvccscanner's usage? can we refrain from peeking backwards if the range key contains no timestamps ≤ the seek key's timestamp? that would match the behavior of point tombstones.
pkg/storage/point_synthesizing_iter.go
line 284 at r1 (raw file):
// If we're now at a bare range key, it must either be at the start of it, // or in the middle with emitOnSeekGE enabled. In either case, we want to // move the iterator ahead to look for a point key that may be colocated
does 'colocated' here mean has the same user key?
pkg/storage/point_synthesizing_iter.go
line 368 at r1 (raw file):
if !i.atPoint { i.iter.Next() }
This (switching directions using NextKey) is prohibited by the MVCCIterator
interface. Should we even attempt to support it if it's undefined behavior in the interface? At the very least, I think it could use a comment.
89a2c63
to
5a2ec4f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aliher1911, @itsbilal, @jbowens, and @nicktrav)
pkg/storage/point_synthesizing_iter.go
line 137 at r1 (raw file):
Previously, jbowens (Jackson Owens) wrote…
worth reusing the
i.rangeKeys
slice too?
Yes, but not yet I think. We're cloning the slice elements every time, which is likely to be much more expensive, and we wouldn't want to hold onto the element references. This whole scheme needs a rethink when we come back around to optimize it, which might involve changes to the MVCCIterator.RangeKeys()
data structure too. For now I've focused on correctness.
pkg/storage/point_synthesizing_iter.go
line 194 at r1 (raw file):
Previously, jbowens (Jackson Owens) wrote…
nit: match the handling of empty timestamp in the forward and reverse cases? (eg, use
IsEmpty()
in both, orIsSet()
in both)
Think so? I feel like it easier to just read what it says instead of using negatives. No real opinion though, changed it to !IsSet()
.
pkg/storage/point_synthesizing_iter.go
line 256 at r1 (raw file):
Previously, jbowens (Jackson Owens) wrote…
does this requirement come from the mvccscanner's usage? can we refrain from peeking backwards if the range key contains no timestamps ≤ the seek key's timestamp? that would match the behavior of point tombstones.
It was mostly motivated by having predictable behavior. The existence of a synthetic tombstone shouldn't depend on the seek key, since we often optimize iteration by sometimes seeking and sometimes using next, and it seems like we'd want the results to be the same in either case.
However, in the SeekGE case we could avoid this by changing the semantics to only synthesize range tombstones above existing points. I think that might be sufficient for our needs, and would often result in far fewer synthetic tombstones during scans anyway, so it might be worthwhile. However, it wouldn't be entire consistent with the Prefix (i.e. emitOnSeekGE) case though, where we'd synthesize the full set even when there are no point keys. Wdyt?
There's certainly other optimizations we can do here, as you point out. Added a comment with your idea. For now, I've focused on correctness and null-case optimizations, we can come back to optimizations in the presence of range keys later.
pkg/storage/point_synthesizing_iter.go
line 284 at r1 (raw file):
Previously, jbowens (Jackson Owens) wrote…
does 'colocated' here mean has the same user key?
Yep. We operate across a roachpb.Key
span, so by "key" we typically mean the user key. But I agree that this wording was clumsy (i.e. it talks about being colocated at a different position, which is fairly paradoxical). Reworded.
pkg/storage/point_synthesizing_iter.go
line 368 at r1 (raw file):
Previously, jbowens (Jackson Owens) wrote…
This (switching directions using NextKey) is prohibited by the
MVCCIterator
interface. Should we even attempt to support it if it's undefined behavior in the interface? At the very least, I think it could use a comment.
True. We have best-effort support for it in pebbleIterator
already though, and I figured it was easy enough to implement so we might as well have it there in case we ever needed to lift the restriction. But I added a comment, and wouldn't have any qualms about tearing it out either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 3 files at r2, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @aliher1911, @erikgrinaker, @itsbilal, and @nicktrav)
pkg/storage/point_synthesizing_iter.go
line 137 at r1 (raw file):
Previously, erikgrinaker (Erik Grinaker) wrote…
Yes, but not yet I think. We're cloning the slice elements every time, which is likely to be much more expensive, and we wouldn't want to hold onto the element references. This whole scheme needs a rethink when we come back around to optimize it, which might involve changes to the
MVCCIterator.RangeKeys()
data structure too. For now I've focused on correctness.
makes sense
pkg/storage/point_synthesizing_iter.go
line 256 at r1 (raw file):
The existence of a synthetic tombstone shouldn't depend on the seek key, since we often optimize iteration by sometimes seeking and sometimes using next, and it seems like we'd want the results to be the same in either case.
Ah, gotcha.
However, in the SeekGE case we could avoid this by changing the semantics to only synthesize range tombstones above existing points. I think that might be sufficient for our needs, and would often result in far fewer synthetic tombstones during scans anyway, so it might be worthwhile. However, it wouldn't be entire consistent with the Prefix (i.e. emitOnSeekGE) case though, where we'd synthesize the full set even when there are no point keys. Wdyt?
Yeah, that makes sense. My understanding here is hazy, but would it be feasible to eventually replace emitOnSeekGE
with something akin to pebbleScanner.failOnMoreRecent
? So that we can avoid synthesizing at the seek key but still perform conflict checking?
pkg/storage/point_synthesizing_iter.go
line 368 at r1 (raw file):
Previously, erikgrinaker (Erik Grinaker) wrote…
True. We have best-effort support for it in
pebbleIterator
already though, and I figured it was easy enough to implement so we might as well have it there in case we ever needed to lift the restriction. But I added a comment, and wouldn't have any qualms about tearing it out either.
just the comment is good by me
pkg/storage/point_synthesizing_iter.go
line 582 at r2 (raw file):
// assertInvariants asserts iterator invariants. func (i *pointSynthesizingIter) assertInvariants() error {
love all these well documented & enforced invariants. does cockroachdb have any non-race build tags for including invariant checks? eg, the pebble repo has an invariants
build tag which can be nice to include invariant-checking without the significant slowdown of the race checker.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @aliher1911, @itsbilal, @jbowens, and @nicktrav)
pkg/storage/point_synthesizing_iter.go
line 256 at r1 (raw file):
would it be feasible to eventually replace emitOnSeekGE with something akin to pebbleScanner.failOnMoreRecent? So that we can avoid synthesizing at the seek key but still perform conflict checking?
Yes, something like that is definitely possible. I've wanted to keep both the scanner and synthesizer general and loosely coupled though. I don't think the emitOnSeekGE kludge is all that bad, but would've liked it better if we could know whether the underlying iterator was a prefix iterator.
in the SeekGE case we could avoid this by changing the semantics to only synthesize range tombstones above existing points. I think that might be sufficient for our needs, and would often result in far fewer synthetic tombstones during scans anyway
I'm getting convinced that this is the way to go -- if nothing else, for the performance gains in not emitting and then processing all of these useless synthetic point tombstones. It does have some loss of generality though, if we ever wanted to synthesize non-tombstone points, or for other use-cases that might care about them. It's possible that I'll keep both modes if the code delta turns out to be trivial. Will update the PR tomorrow.
pkg/storage/point_synthesizing_iter.go
line 582 at r2 (raw file):
Thanks! Yeah, I think I prefer this pattern over littering invariant checks throughout the main code.
does cockroachdb have any non-race build tags for including invariant checks?
I think race
is the de facto build tag for stuff like this. We have a crdb_test
tag that used in test builds though, and I've seen SQL use it a fair bit for stuff like this, but in KV land or for more expensive checks I think I've only ever seen the race
tag used. I agree that it might be nice to use some other tag orthogonal to race detection though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @aliher1911, @itsbilal, @jbowens, and @nicktrav)
pkg/storage/point_synthesizing_iter.go
line 256 at r1 (raw file):
Will update the PR tomorrow.
On second thought, I think I'll merge this as-is and submit a separate PR with this change later. I'd like to unblock other teams asap, and this is correct and reasonably fast in the no-range-key case.
This patch adds `pointSynthesizingIter`, an MVCC iterator which wraps an arbitrary MVCC iterator and synthesizes point keys for range keys at their start key and where they overlap point keys. It can optionally synthesize around the SeekGE seek key too, which is useful for point operations like `MVCCGet` where we may want to return a synthetic tombstone for an MVCC range tombstone if there is no existing point key. This will primarily be used to handle MVCC range tombstones in MVCC scans and gets, as well as during MVCC conflict checks, which allows much of this logic to remain unchanged and simplified (in particular, `pebbleMVCCScanner` will not need any changes). However, this patch does not make use of the iterator yet, since both it and Pebble will need further performance optimizations for use in hot paths. For now, correctness is sufficient, and only basic attempts at performance optimization have been made. Release note: None
5a2ec4f
to
d07bf3a
Compare
TFTR! bors r=jbowens |
Build succeeded: |
This patch adds
pointSynthesizingIter
, an MVCC iterator which wraps anarbitrary MVCC iterator and synthesizes point keys for range keys at
their start key and where they overlap point keys. It can optionally
synthesize around the SeekGE seek key too, which is useful for point
operations like
MVCCGet
where we may want to return a synthetictombstone for an MVCC range tombstone if there is no existing point key.
This will primarily be used to handle MVCC range tombstones in MVCC
scans and gets, as well as during MVCC conflict checks, which allows
much of this logic to remain unchanged and simplified (in particular,
pebbleMVCCScanner
will not need any changes).However, this patch does not make use of the iterator yet, since both it
and Pebble will need further performance optimizations for use in hot
paths. For now, correctness is sufficient, and only basic attempts at
performance optimization have been made.
Touches #70412.
Release note: None