-
Notifications
You must be signed in to change notification settings - Fork 456
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sstable: add range key support to BlockIntervalCollector #1452
sstable: add range key support to BlockIntervalCollector #1452
Conversation
This is an alternative implementation to the one in #1426 which feels both simpler and more flexible. I made a simplifying assumption that range keys should never affect point key data block properties. Put another way, I couldn't think of a case where one would want a collector that encodes range key properties into a point key data block, given that the range keys will never appear in the point key data blocks (they have their own have a single, dedicated block). There's a minor API change that requires callers to specify a collector for both point and range keys separately. Given implementations tend to maintain two uints for upper / lower bounds, there's only a small associated cost. One benefit is that this implementation provides the flexibility to construct a collector that:
For the cases above, this implementation seems better than the alternative of constructing two separate collectors for point and range keys, which would double up on the space used required to store the collector name and encode up to two properties instead of just the one. |
11c9fa2
to
445976e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One downside of taking the union of the range and point keys for the
table level interval is an increased rate of false positives when
filtering tables based on an interval and the range key interval is
wider than the point key interval.
In CockroachDB, the practical concern here is that you may have a broad swath of old data in the span [a, z)
at timestamps <10
. Then you write a MVCCDeleteRange range key [a,z)@100
and its compacted into the same table(s) as the point keys. A time-bound iterator with a read timestamp ≤ @100
will construct a point iterator levelIter
that still needs to visit every table, even though there's zero points within the interval. [The rangekey levelIter
for the same time-bound iterator does need to visit every table in order to read the rangekey].
It's not a huge deal, because the index-level block properties will still allow us to skip over all the point blocks. But broadly if we stick with a split iterator implementation such that point and range key iterators belonging to the same pebble.Iterator
may be looking at different tables, I wonder if we should lean towards recording separate properties. [I think we should also do this with the FileMetadata.{Smallest,Largest}
]
Reviewed 1 of 5 files at r1, all commit messages.
Reviewable status: 1 of 5 files reviewed, 3 unresolved discussions (waiting on @nicktrav and @sumeerbhola)
sstable/block_property.go, line 132 at r1 (raw file):
// NewBlockIntervalCollector constructs a BlockIntervalCollector, with the // given name and data block collector.
nit: expand the comment to describe the role of the two collectors and that either is permitted to be nil?
sstable/writer.go, line 625 at r1 (raw file):
return err } }
I'm wondering whether we should even bother add range keys to the table-property collectors. None of the existing implementations will correctly extract suffixes from range-key internal values, and I think the table-property collectors interface is made obsolete by block-property collectors, which also record a table level property. In CockroachDB, we won't write range keys until a CockroachDB cluster version is finalized that knows about block-property collectors. Then after the next release, it seems like we could remove table-property collectors altogether.
sstable/testdata/block_properties, line 122 at r1 (raw file):
point: [a@5#1,1,c@15#3,1] rangedel: [#0,0,#0,0] rangekey: [d@20#4,21,z#72057594037927935,21]
This test collector is a little confusing, because in the context of CockroachDB MVCC timestamps, the suffix on the start key isn't the one that matters*. We care about the suffix(es) encoded within the values.
*The suffix on the start key eg, d@20
only matters for the purpose of specifying the start bound user key. In CockroachDB, a rangekey d@20.RANGEKEYSET.4:d@15 [(@10=foo)]
takes effect in the user key range [d@20,d@15)
but has a MVCC timestamp of @10
, which is what we should record to the interval collector. In practice, a CockroachDB MVCCDeleteRange range key with those bounds and suffix value has no real effect because there exist no keys within the range [d@20,d@15)
also at timestamps <@10
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we should lean towards recording separate properties.
This implementation would support that, via registering two interval collectors: one with a non-nil point key collector and nil range key collector; and a second with a nil point key collector and a non-nil range key collector. They would have two similar, but different names.
What's not clear to me is whether there may exist a use-case in the future where an interval collector is constructed with both a non-nil point and range key collector with the one name. This implementation is flexible enough to support that, if we ever wanted something like that.
[I think we should also do this with the FileMetadata.{Smallest,Largest}]
Agree. This is on our TODO list.
I wonder if using these bounds for point keys would be enough to filter at the table level in the case that you mention?
Another option is to move the existing DataBlockIntervalCollector
interface, BlockIntervalCollector
struct, and "interval" concept up into Cockroach, and let the caller have full control over how a block property collector works - that could be a combined collector, or it could be two separate collectors.
Reviewable status: 1 of 5 files reviewed, 3 unresolved discussions (waiting on @jbowens, @nicktrav, and @sumeerbhola)
sstable/writer.go, line 625 at r1 (raw file):
Previously, jbowens (Jackson Owens) wrote…
I'm wondering whether we should even bother add range keys to the table-property collectors. None of the existing implementations will correctly extract suffixes from range-key internal values, and I think the table-property collectors interface is made obsolete by block-property collectors, which also record a table level property. In CockroachDB, we won't write range keys until a CockroachDB cluster version is finalized that knows about block-property collectors. Then after the next release, it seems like we could remove table-property collectors altogether.
Sounds good. Removed.
sstable/testdata/block_properties, line 122 at r1 (raw file):
Previously, jbowens (Jackson Owens) wrote…
This test collector is a little confusing, because in the context of CockroachDB MVCC timestamps, the suffix on the start key isn't the one that matters*. We care about the suffix(es) encoded within the values.
*The suffix on the start key eg,
d@20
only matters for the purpose of specifying the start bound user key. In CockroachDB, a rangekeyd@20.RANGEKEYSET.4:d@15 [(@10=foo)]
takes effect in the user key range[d@20,d@15)
but has a MVCC timestamp of@10
, which is what we should record to the interval collector. In practice, a CockroachDB MVCCDeleteRange range key with those bounds and suffix value has no real effect because there exist no keys within the range[d@20,d@15)
also at timestamps<@10
.
I'll rework this to use the suffixes present in the values.
445976e
to
136b1e0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They would have two similar, but different names.
I don't think this could be used to avoid unnecessary sstable reads in the scenario described above, at least without additional changes to the block property filter interface, because the combined pebble.Iterator
over point and range keys would have one set of block property filters. It'd have to supply both filters.
I wonder if using these bounds for point keys would be enough to filter at the table level in the case that you mention?
It may be enough if the user key span over which range keys are written is much wider than the point keys', but it doesn't help in the case that they have similar user key spans but very different mvcc timestamp ranges.
Maybe the interface simplicity of this approach is worth the slightly reduced filtering effectiveness. Curious if @sumeerbhola has thoughts
Reviewable status: 1 of 5 files reviewed, 3 unresolved discussions (waiting on @jbowens, @nicktrav, and @sumeerbhola)
136b1e0
to
4c24c85
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 1 of 5 files reviewed, 3 unresolved discussions (waiting on @jbowens, @nicktrav, and @sumeerbhola)
sstable/testdata/block_properties, line 122 at r1 (raw file):
Previously, nicktrav (Nick Travers) wrote…
I'll rework this to use the suffixes present in the values.
Updated to pluck the suffix from the value in the case that it's a range key set or unset. I kept the cases for the point keys as this illustrates how the same collector can be used for point-, range-keys, or both.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- I agree that the integration with CockroachDB should have different block property collectors for point and range keys. I don't mind the generality here of the
BlockIntervalCollector
. I could also live with aBlockIntervalCollector
that can do one or the other but not both if there are strong opinions against this current approach. - I am not keen on moving
BlockIntervalCollector
to the CockroachDB code. The way we did TBI in CockroachDB seems hacky from a code cleanliness perspective, and we have a clean separation of concerns here that we can unit test thoroughly. - I didn't understand the concern about the combined
pebble.Iterator
. It would be configured withIterOptions.{PointFilters,RangeKeyFilters}
.
Reviewed 3 of 5 files at r1, 1 of 1 files at r2, 1 of 2 files at r3.
Reviewable status: 4 of 5 files reviewed, 4 unresolved discussions (waiting on @jbowens, @nicktrav, and @sumeerbhola)
sstable/block_property.go, line 108 at r2 (raw file):
name string points DataBlockIntervalCollector ranges DataBlockIntervalCollector
Consider adding a comment like:
A BlockIntervalCollector that collects over point and range keys needs to have both of these DataBlockIntervalCollectors since range keys are fed to the BlockIntervalCollector as they get fed to the Writer, which means they are interleaved with point keys.
(since someone may wonder why we can't reuse one DataBlockIntervalCollector the way we reuse it for different point data blocks)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't understand the concern about the combined pebble.Iterator. It would be configured with IterOptions.{PointFilters,RangeKeyFilters}.
That sounds good to me.
Reviewable status: 4 of 5 files reviewed, 4 unresolved discussions (waiting on @jbowens, @nicktrav, and @sumeerbhola)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Summarizing a few conversations we had internally. tldr: I think the approach in this PR is sufficient.
I agree that the integration with CockroachDB should have different block property collectors for point and range keys. I don't mind the generality here of the BlockIntervalCollector.
There's a subtle point here in that even though the implementation in this PR provides the ability to specify both a point and range interval collector with the same name, we don't have an immediate use-case for such a collector. Instead, the typical use case (at least in Cockroach), will be to instantiate two interval collectors with different names - one for point keys only, the other for range keys only. The instantiation (in Cockroach) will look like:
const (
pointKeyCollectorName = "foo"
rangeKeyCollectorName = "bar"
)
var PebbleBlockPropertyCollectors = []func() pebble.BlockPropertyCollector{
func() pebble.BlockPropertyCollector {
return sstable.NewBlockIntervalCollector(
pointKeyCollectorName,
&pebbleDataBlockMVCCTimeIntervalCollector{}, // point key collector
nil, // range key collector
)
},
func() pebble.BlockPropertyCollector {
return sstable.NewBlockIntervalCollector(
rangeKeyCollectorName,
nil, // point key collector
&rangeKeyCollector{}, // range key collector
)
},
}
I didn't understand the concern about the combined pebble.Iterator. It would be configured with IterOptions.{PointFilters,RangeKeyFilters}.
Building on from the previous point, as we'll typically have two collectors (one each for points and ranges exclusively), PointFilters
will filter only for point keys (with the same pointKeyCollectorName
name as above), and RangeKeyFilters
will filter only for range keys (with the same rangeKeyCollectorName
as above).
This will allow for the independent point and range key filtering that we require.
The filtering code is still a TODO for this PR.
I am not keen on moving BlockIntervalCollector to the CockroachDB code.
Will keep things as they are in Pebble, so as to avoid exposing too much of the internals to Cockroach.
Reviewable status: 4 of 5 files reviewed, 4 unresolved discussions (waiting on @jbowens, @nicktrav, and @sumeerbhola)
ad64f1b
to
609c9de
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added another series of test cases to demonstrate how filtering works and verify that the point / range filters will work independently.
I might punt the API change to the top-level iterator options (to take in a slice each for point and range keys) to a second PR, so as to keep this one focused on just the collector changes.
All other feedback addressed, so this should be good for another pass.
Reviewable status: 2 of 5 files reviewed, 3 unresolved discussions (waiting on @jbowens and @sumeerbhola)
sstable/block_property.go, line 132 at r1 (raw file):
Previously, jbowens (Jackson Owens) wrote…
nit: expand the comment to describe the role of the two collectors and that either is permitted to be nil?
Done.
sstable/block_property.go, line 108 at r2 (raw file):
Previously, sumeerbhola wrote…
Consider adding a comment like:
A BlockIntervalCollector that collects over point and range keys needs to have both of these DataBlockIntervalCollectors since range keys are fed to the BlockIntervalCollector as they get fed to the Writer, which means they are interleaved with point keys.(since someone may wonder why we can't reuse one DataBlockIntervalCollector the way we reuse it for different point data blocks)
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 3 files at r4.
Reviewable status: all files reviewed (commit messages unreviewed), 4 unresolved discussions (waiting on @jbowens, @nicktrav, and @sumeerbhola)
sstable/block_property.go, line 144 at r4 (raw file):
// respectively. // // The caller may pass a nil DataBlockIntervalCollector for either (or both) of
or both? what purpose would both being nil serve?
Currently, the `sstable.BlockIntervalCollector` exists as a helper struct to which an implementation of `sstable.DataBlockIntervalCollector` is passed. The latter contains the user-defined logic for converting a key and value in an associated interval that can be maintained. The `BlockIntervalCollector` struct handles the logic required to implement `sstable.BlockPropertyCollector`, encoding the interval as a property when the data block is completed, and maintaining index-block- and table-level intervals. Update `sstable.Writer` to pass through range keys to each block property collector as they are added to the writer. As range and point keys can be thought of as existing in separate keyspaces within the same LSM, range keys only contribute to table level intervals (unioned with the point key interval), rather than block- and index-level intervals. Another way to rationalize this decision is to consider that range keys are contained in a single, dedicated block in the SSTable, while the point keys can span multiple blocks and have an associated index entry into which the properties for the specific block are encoded. Block level filtering applies to point keys only, whereas table-level filtering takes the union of the point and range key intervals for the table. One downside of taking the union of the range and point keys for the table level interval is an increased rate of false positives when filtering tables based on an interval and the range key interval is wider than the point key interval. This change alters the `NewBlockIntervalCollector` function to take in two `DataBlockIntervalCollector`s, one for point and range keys. This is required to track the intervals separately within the `BlockIntervalCollector`. The caller has the flexibility of passing in `nil` for one (or both) of the collectors, in which case either point or range keys (or both) will be ignored by the collector. This can be used, for example, to construct a collectors that apply exclusively to either point or range keys.
609c9de
to
67be653
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTRs!
Dismissed @jbowens and @sumeerbhola from 4 discussions.
Reviewable status: 2 of 5 files reviewed, all discussions resolved (waiting on @sumeerbhola)
sstable/block_property.go, line 144 at r4 (raw file):
Previously, sumeerbhola wrote…
or both? what purpose would both being nil serve?
No purpose. The comment was more for completeness. Given it's a non-sensical input, we can just panic if it's misconfigured, as that's user-error. Updated and removed parenthetical.
Currently, the
sstable.BlockIntervalCollector
exists as a helperstruct to which an implementation of
sstable.DataBlockIntervalCollector
is passed. The latter contains theuser-defined logic for converting a key and value in an associated
interval that can be maintained.
The
BlockIntervalCollector
struct handles the logic required toimplement
sstable.BlockPropertyCollector
, encoding the interval as aproperty when the data block is completed, and maintaining index-block-
and table-level intervals.
Update
sstable.Writer
to pass through range keys to each blockproperty collector as they are added to the writer.
As range and point keys can be thought of as existing in separate
keyspaces within the same LSM, range keys only contribute to table level
intervals (unioned with the point key interval), rather than block- and
index-level intervals.
Another way to rationalize this decision is to consider that range keys
are contained in a single, dedicated block in the SSTable, while the
point keys can span multiple blocks and have an associated index entry
into which the properties for the specific block are encoded. Block
level filtering applies to point keys only, whereas table-level
filtering takes the union of the point and range key intervals for the
table.
One downside of taking the union of the range and point keys for the
table level interval is an increased rate of false positives when
filtering tables based on an interval and the range key interval is
wider than the point key interval.
This change alters the
NewBlockIntervalCollector
function to take intwo
DataBlockIntervalCollector
s, one for point and range keys. This isrequired to track the intervals separately within the
BlockIntervalCollector
. The caller has the flexibility of passing innil
for one (or both) of the collectors, in which case either point orrange keys (or both) will be ignored by the collector. This can be used,
for example, to construct a collectors that apply exclusively to either
point or range keys.