-
Notifications
You must be signed in to change notification settings - Fork 456
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
manifest: add methods for extending table bounds #1535
manifest: add methods for extending table bounds #1535
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 22 files reviewed, 2 unresolved discussions (waiting on @itsbilal, @jbowens, and @sumeerbhola)
internal/manifest/version.go, line 106 at r1 (raw file):
SmallestPointKey InternalKey LargestPointKey InternalKey HasPointKeys bool
Coincidentally, these HasPointKey
and HasRangeKey
fields will be useful when we make the version edit changes to persist the types of keys we have present on a table.
internal/manifest/version.go, line 155 at r1 (raw file):
// NB: calling this method should be preferred to manually setting the bounds by // manipulating the fields directly, to maintain certain invariants. func (m *FileMetadata) MaybeExtendPointKeyBounds(cmp Compare, smallest, largest InternalKey) {
Naming is hard - open to suggestions on something a little less verbose than MaybeExtend{Point,Range}KeyBounds
. Though, they do convey the "maybe" nature of the calls.
9c039d9
to
b1b77bc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the approach. Haven't read much of the PR, but sending some initial comments.
One alternative considered was to unexport the smallest / largest fields
and gate access behind methods. However, as mentioned above, for
performance reasons it beneficial to keep these fields exported.
Won't those methods be inlined? Does -gcflags=-m
show that they won't, if you make that change?
Reviewed 1 of 22 files at r1.
Reviewable status: 1 of 22 files reviewed, 3 unresolved discussions (waiting on @itsbilal, @jbowens, and @nicktrav)
internal/manifest/version.go, line 106 at r1 (raw file):
Previously, nicktrav (Nick Travers) wrote…
Coincidentally, these
HasPointKey
andHasRangeKey
fields will be useful when we make the version edit changes to persist the types of keys we have present on a table.
We may want to optimize the sizeof FileMetadata since we have one for each file. Do you know what it was and what it is now? Maybe play around with shifting some of these bools to one place to reduce the padding for alignment.
internal/manifest/version.go, line 155 at r1 (raw file):
Previously, nicktrav (Nick Travers) wrote…
Naming is hard - open to suggestions on something a little less verbose than
MaybeExtend{Point,Range}KeyBounds
. Though, they do convey the "maybe" nature of the calls.
I am bad at naming, but I'd be satisfied with simply ExtendPointKeyBounds
.
internal/manifest/version_edit.go, line 274 at r2 (raw file):
// manifest is updated to support range keys, which will most likely // leverage a bitset to infer which key types (points or ranges) are // used for the overall smallest and largest keys.
I didn't understand the bitset comment. Is that to avoid the key comparisons for each file when opening the DB?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Won't those methods be inlined? Does -gcflags=-m show that they won't, if you make that change?
We'd thought of this, but @jbowens and I had discussed direct field access as a compromise between dedicated "getters", and direct field sets and gets.
Given the "getters" would be so simple now as to be a direct field access (I moved the comparisons onto the write path in this patch), it probably makes sense to have getters and unexport the fields (provided the methods inline, and I'd be shocked if they didn't).
I'll follow up.
Reviewable status: 1 of 22 files reviewed, 3 unresolved discussions (waiting on @itsbilal, @jbowens, @nicktrav, and @sumeerbhola)
internal/manifest/version.go, line 106 at r1 (raw file):
Previously, sumeerbhola wrote…
We may want to optimize the sizeof FileMetadata since we have one for each file. Do you know what it was and what it is now? Maybe play around with shifting some of these bools to one place to reduce the padding for alignment.
Good point. I'll play around with this some more.
internal/manifest/version_edit.go, line 274 at r2 (raw file):
Previously, sumeerbhola wrote…
I didn't understand the bitset comment. Is that to avoid the key comparisons for each file when opening the DB?
It's premature musing on my part (I was going to tackle this on a separate PR that follows this).
We now need to encode whether there are range keys in the table - I was thinking of something like the following scheme:
- a new file tag (
tagNewFile5
?) that simply extendstagNewFile4
for range keys - keeping the existing smallest / largest fields for point keys
- some way of indicating whether there are range keys present in the table (a bool?)
- if there are range keys, lay down the smallest / largest range key fields
At this point we technically have enough information to hydrate the overall bounds. However, this calculation relies on the comparison function for the table, for which we only have the name, and not the actual implementation. The Compare
struct is available higher up, but we can't compute in this Decode
method.
Another approach I was thinking of - we don't actually need to perform the comparison if instead we directly encode which key type corresponds to the smallest / largest bounds. Then at hydration time, we can simply set Smallest
and Largest
directly to wither the smallest / largest point or range key.
The bitset comment was me thinking that a bool
(to specify the presence of range keys), plus another two fields for indicating how Smallest
and Largest
are set would be wasteful, and there's probably a way of packing all of that information into a single byte.
As I write this, I also wonder if we could have something like:
- if no range keys, use
tagNewFile4
, and existing logic just hydratesSmallest
/Largest
with the point keys (as there are no range keys to even consider), as we're doing here - else, there are range keys, so use
tagNewFile5
as the marker:- write existing fields
- encode the smallest range key
- encode the largest range key
- encode whether to use the point or range key for the smaller and larger fields (two bits, but we use a byte)
- write existing custom tags from
tagNewFile4
Did you have thoughts on an encoding scheme?
b1b77bc
to
773e11a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 1 of 22 files reviewed, 2 unresolved discussions (waiting on @itsbilal, @jbowens, @nicktrav, and @sumeerbhola)
internal/manifest/version.go, line 106 at r1 (raw file):
Previously, nicktrav (Nick Travers) wrote…
Good point. I'll play around with this some more.
Before this patch: 352 bytes
Currently: 376 bytes
The alignment is 8 bytes, so if we pack all the bools at the end, we can get this down to 336. Pushed a fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The inlining appears to work as expected - though I had to check the objdumps (for whatever reason -gcflags=-m5
didn't show it, perhaps because it's so trivial?).
Changing all the calls sites to use methods and unexporting the fields is something we can look at. I'm hesitant to do it here, given how large this change is already. I can follow up with a mechanical PR if / when this lands.
Reviewable status: 1 of 22 files reviewed, 2 unresolved discussions (waiting on @itsbilal, @jbowens, @nicktrav, and @sumeerbhola)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 22 files at r1.
Reviewable status: 3 of 22 files reviewed, 3 unresolved discussions (waiting on @itsbilal, @nicktrav, and @sumeerbhola)
internal/manifest/version.go, line 184 at r3 (raw file):
// NB: calling this method should be preferred to manually setting the bounds by // manipulating the fields directly, to maintain certain invariants. func (m *FileMetadata) MaybeExtendRangeKeyBounds(cmp Compare, smallest, largest InternalKey) {
Possibly a controversial take: I think it'd be a little more ergonomic in tests if we return the mutated receiver, like we do in Options.EnsureDefault()
, allowing chaining like:
m := (&fileMetadata{FileNum: 000001}).
MaybeExtendPointKeyBounds(cmp, pointSmall, pointLarge).
MaybeExtendRangeKeyBounds(cmp, rangeSmall, rangeLarge)
It's a bit unusual and potentially confusing to return the receiver, so ¯\_(ツ)_/¯
773e11a
to
2377283
Compare
Currently, `FileMetadata` exposes three pairs of keys representing bounds for a table - one each for point keys, range keys and the overall bounds for a table. The pair for the overall bounds for the table are a function of the point and range pairs and could technically be derived, but instead these overall bounds are materialized on the `FileMetadata` struct directly for performance reasons (it is faster to dereference the struct fields than making a function call to perform one or more comparisons). With the addition of two pairs of bounds in cockroachdb#1521 for the point and range bounds, there is a risk that a caller forgets to set a pair of bounds, violating the invariant that the overall point and range bounds are internally consistent with the point and range key bounds. To mitigate this risk, introduce methods on `manifest.FileMetadata` that can be used to "extend" the point or range key bounds for a table. These methods will maintain the invariant that the overall bounds for the table are computed directly from the point and range bounds. The methods, `MaybeExtend{Point,Range}KeyBounds`, are intended for "optimistic" use - a table's point, range and / or overall bounds may not necessarily be extended by a call to the respective methods if the existing bounds are smaller or larger than the provided bounds. This confines the comparisons to these method calls, as opposed to having the caller perform the checks to determine when bounds should be updated. As these fields are set once and read multiple times, this is considered a reasonable compromise. Update all existing call sites that directly set the existing `FileMetadata` smallest / largest bound fields to instead use the new methods for extending bounds. The only remaining call sites that set fields directly are limited to the `manifest` package and one call site in `ingest.go` that needs to mutate sequence numbers. These calls sites that store values in the fields directly are explicitly commented to indicate why the fields are being set directly. One alternative considered was to unexport the smallest / largest fields and gate access behind methods. However, as mentioned above, for performance reasons it beneficial to keep these fields exported.
2377283
to
fe889e9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 1 of 22 files reviewed, 2 unresolved discussions (waiting on @itsbilal, @jbowens, and @sumeerbhola)
internal/manifest/version.go, line 155 at r1 (raw file):
Previously, sumeerbhola wrote…
I am bad at naming, but I'd be satisfied with simply
ExtendPointKeyBounds
.
Done!
internal/manifest/version.go, line 184 at r3 (raw file):
Previously, jbowens (Jackson Owens) wrote…
Possibly a controversial take: I think it'd be a little more ergonomic in tests if we return the mutated receiver, like we do in
Options.EnsureDefault()
, allowing chaining like:m := (&fileMetadata{FileNum: 000001}). MaybeExtendPointKeyBounds(cmp, pointSmall, pointLarge). MaybeExtendRangeKeyBounds(cmp, rangeSmall, rangeLarge)
It's a bit unusual and potentially confusing to return the receiver, so ¯\_(ツ)_/¯
Fair point! Updated.
TFTRs! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 1 of 22 files reviewed, 2 unresolved discussions
internal/manifest/version_edit.go, line 274 at r2 (raw file):
Previously, nicktrav (Nick Travers) wrote…
It's premature musing on my part (I was going to tackle this on a separate PR that follows this).
We now need to encode whether there are range keys in the table - I was thinking of something like the following scheme:
- a new file tag (
tagNewFile5
?) that simply extendstagNewFile4
for range keys- keeping the existing smallest / largest fields for point keys
- some way of indicating whether there are range keys present in the table (a bool?)
- if there are range keys, lay down the smallest / largest range key fields
At this point we technically have enough information to hydrate the overall bounds. However, this calculation relies on the comparison function for the table, for which we only have the name, and not the actual implementation. The
Compare
struct is available higher up, but we can't compute in thisDecode
method.Another approach I was thinking of - we don't actually need to perform the comparison if instead we directly encode which key type corresponds to the smallest / largest bounds. Then at hydration time, we can simply set
Smallest
andLargest
directly to wither the smallest / largest point or range key.The bitset comment was me thinking that a
bool
(to specify the presence of range keys), plus another two fields for indicating howSmallest
andLargest
are set would be wasteful, and there's probably a way of packing all of that information into a single byte.As I write this, I also wonder if we could have something like:
- if no range keys, use
tagNewFile4
, and existing logic just hydratesSmallest
/Largest
with the point keys (as there are no range keys to even consider), as we're doing here- else, there are range keys, so use
tagNewFile5
as the marker:
- write existing fields
- encode the smallest range key
- encode the largest range key
- encode whether to use the point or range key for the smaller and larger fields (two bits, but we use a byte)
- write existing custom tags from
tagNewFile4
Did you have thoughts on an encoding scheme?
the latter scheme to pack the decisions into a byte sounds good. We will also need to handle the case of range keys and no point keys.
Note for reviewers
This is a fairly sizable patch given the many places where the smallest / largest fields were being set directly (and hence the motivation for the patch in the first place). Fortunately most of the diff comes from updating tests.
It might be beneficial to first review the updates in the
manifest
package (version.go
andversion_edit.go
, and corresponding data-driven test infile_metadata_bounds
. Then review the non-test updates to use the new methods in the following:compaction.go
db.go
flush_external.go
ingest.go
The remainder of the changes are to test files, and are fairly mechanical changes, so probably don't need as much scrutiny.
Fortunately, Reviewable seems to group the test and non-test files 👍
Currently,
FileMetadata
exposes three pairs of keys representingbounds for a table - one each for point keys, range keys and the overall
bounds for a table. The pair for the overall bounds for the table are a
function of the point and range pairs and could technically be derived,
but instead these overall bounds are materialized on the
FileMetadata
struct directly for performance reasons (it is faster to dereference the
struct fields than making a function call to perform one or more
comparisons).
With the addition of two pairs of bounds in #1521 for the point and
range bounds, there is a risk that a caller forgets to set a pair of
bounds, violating the invariant that the overall point and range bounds
are internally consistent with the point and range key bounds.
To mitigate this risk, introduce methods on
manifest.FileMetadata
thatcan be used to "extend" the point or range key bounds for a table. These
methods will maintain the invariant that the overall bounds for the
table are computed directly from the point and range bounds.
The methods,
MaybeExtend{Point,Range}KeyBounds
, are intended for"optimistic" use - a table's point, range and / or overall bounds may
not necessarily be extended by a call to the respective methods if the
existing bounds are smaller or larger than the provided bounds. This
confines the comparisons to these method calls, as opposed to having the
caller perform the checks to determine when bounds should be updated. As
these fields are set once and read multiple times, this is considered a
reasonable compromise.
Update all existing call sites that directly set the existing
FileMetadata
smallest / largest bound fields to instead use the newmethods for extending bounds. The only remaining call sites that set
fields directly are limited to the
manifest
package and one call sitein
ingest.go
that needs to mutate sequence numbers. These calls sitesthat store values in the fields directly are explicitly commented to
indicate why the fields are being set directly.
One alternative considered was to unexport the smallest / largest fields
and gate access behind methods. However, as mentioned above, for
performance reasons it beneficial to keep these fields exported.