-
Notifications
You must be signed in to change notification settings - Fork 456
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
internal/manifest: encode/decode range keys in manifest #1541
internal/manifest: encode/decode range keys in manifest #1541
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great work! I like the fact that the format lets us instantiate a FileMetadata
without a Compare
.
Reviewed 5 of 5 files at r1, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @jbowens, @nicktrav, and @sumeerbhola)
internal/manifest/version_edit.go, line 58 at r1 (raw file):
// Pebble tags. tagNewFileRangeKeys = 300
Nit: maybe we should call this tagNewFile4RangeKeys
to be clearer that it extends tagNewFile4
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @jbowens, @nicktrav, and @sumeerbhola)
internal/manifest/version.go, line 248 at r1 (raw file):
) // boundsMarker returns a marker byte whose bits encode as the following:
Note to self: after talking with @sumeerbhola, using two bits seems like a better way to go than half a byte :) It gives us 6 bits that we could potentially use in the future, plus bitmasks are more standard than half byte masks :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @nicktrav and @sumeerbhola)
internal/manifest/version.go, line 248 at r1 (raw file):
Previously, nicktrav (Nick Travers) wrote…
Note to self: after talking with @sumeerbhola, using two bits seems like a better way to go than half a byte :) It gives us 6 bits that we could potentially use in the future, plus bitmasks are more standard than half byte masks :)
Maybe it's over optimizing, but could we use one of those extra bits for the key-pairs count?
I'm thinking one byte with three used bits:
maskContainsPointKeys
— set if the file contains range keysmaskSmallest
—1
if the smallest comes from the point bound,0
if it comes from the range bound. Required to be0
ifb & maskContainsPointKeys == 0
maskLargest
—1
if the largest comes from the point bound,0
if it comes from the range bound. Required to be0
ifb & maskContainsPointKeys == 0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @nicktrav and @sumeerbhola)
internal/manifest/version.go, line 248 at r1 (raw file):
Previously, jbowens (Jackson Owens) wrote…
Maybe it's over optimizing, but could we use one of those extra bits for the key-pairs count?
I'm thinking one byte with three used bits:
maskContainsPointKeys
— set if the file contains range keysmaskSmallest
—1
if the smallest comes from the point bound,0
if it comes from the range bound. Required to be0
ifb & maskContainsPointKeys == 0
maskLargest
—1
if the largest comes from the point bound,0
if it comes from the range bound. Required to be0
ifb & maskContainsPointKeys == 0
Love it.
0808cb4
to
54f768c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 2 of 5 files reviewed, all discussions resolved (waiting on @itsbilal and @sumeerbhola)
internal/manifest/version_edit.go, line 58 at r1 (raw file):
Previously, itsbilal (Bilal Akhtar) wrote…
Nit: maybe we should call this
tagNewFile4RangeKeys
to be clearer that it extendstagNewFile4
?
Done.
Going to keep open until we make the 22.1 branch cut. |
23a2bf3
to
47b0226
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 5 files at r3.
Reviewable status: 3 of 5 files reviewed, 2 unresolved discussions (waiting on @itsbilal and @nicktrav)
internal/manifest/version.go, line 267 at r3 (raw file):
// - if the table contains point keys // - if the table's smallest key is a point key // - if the table's largest key is a point key
The PR description needs to be updated for this revised scheme.
internal/manifest/version_edit.go, line 58 at r3 (raw file):
// Pebble tags. tagNewFile4RangeKeys = 300
I can't remember which tags belong to the same "space", i.e., which ones can't collide. And are all these encoded as unsigned varints?
I am wondering since it isn't clear to me why the value 300 was chosen, instead of 104?
May be a good time to add code comments for these tags.
47b0226
to
e2f4e1c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 2 of 5 files reviewed, 2 unresolved discussions (waiting on @itsbilal, @jbowens, and @sumeerbhola)
internal/manifest/version.go, line 267 at r3 (raw file):
Previously, sumeerbhola wrote…
The PR description needs to be updated for this revised scheme.
Good catch. Update the PR description. The commit message was already updated.
internal/manifest/version_edit.go, line 58 at r3 (raw file):
And are all these encoded as unsigned varints?
I believe this to be true.
isn't clear to me why the value 300 was chosen, instead of 104?
No particular reason for 300 - we have the 1xx
values for RocksDB tags that we inherited, and 2xx
for column families. I was a little wary of using something in those ranges for fear of incompatibilities with stores being migrated from Cockroach to Pebble, but maybe that ship has sailed already. That said, I just took another look at the corresponding code in RocksDB, and their latest tag is still 103
, so I guess we could use 104
, and even revert the name of the tag identifier to tagNewFile5
, as it's literally the "fifth new file format" (in the LevelDB -> RocksDB -> Pebble lineage).
I'll make the suggested changes, but happy to revert. @jbowens - any thoughts here?
May be a good time to add code comments for these tags.
Agree. It's also time we documented the encoding. I'll make sure this gets rolled in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 2 of 5 files reviewed, 2 unresolved discussions (waiting on @itsbilal, @jbowens, and @sumeerbhola)
internal/manifest/version_edit.go, line 58 at r3 (raw file):
Cockroach to Pebble
Woops. RocksDB to Pebble.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that the 22.1 branch is cut, this can land.
Reviewable status: 2 of 5 files reviewed, 2 unresolved discussions (waiting on @itsbilal, @jbowens, and @sumeerbhola)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 5 files at r3, 1 of 1 files at r4.
Reviewable status: 4 of 5 files reviewed, 2 unresolved discussions (waiting on @itsbilal, @jbowens, and @sumeerbhola)
internal/manifest/version_edit.go, line 58 at r3 (raw file):
Previously, nicktrav (Nick Travers) wrote…
Cockroach to Pebble
Woops. RocksDB to Pebble.
I am fine if we want to be conservative wrt the tag numbering, as long as we document our reasoning.
I think a single byte unsigned varint can encode [0, 128).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 5 files at r3, 1 of 1 files at r4, all commit messages.
Reviewable status: 4 of 5 files reviewed, 5 unresolved discussions (waiting on @itsbilal, @nicktrav, and @sumeerbhola)
-- commits, line 18 at r4:
nit: s/tagNewFile4RangeKeys
/tagNewFile5
/
-- commits, line 36 at r4:
nit: this sentence about 'two additional bytes' is out of date now, right?
internal/manifest/version_edit.go, line 58 at r3 (raw file):
Previously, sumeerbhola wrote…
I am fine if we want to be conservative wrt the tag numbering, as long as we document our reasoning.
I think a single byte unsigned varint can encode [0, 128).
104
sounds good to me
internal/manifest/version_test.go, line 463 at r4 (raw file):
} fmt.Fprintf(&b, "%s\n", m.DebugString(base.DefaultFormatter, true)) fmt.Fprintf(&b, " bounds: (smallest=%s,largest=%s) (0x%02x)\n", smallest, largest, bounds)
nit: could use %08b
to format the bounds marker so that it's easier to interpret as a bit field
Currently, only point key bounds for a table are present in a manifest entry. With the addition of range keys, the manifest needs to track the point and range key bounds, in addition to providing a mechanism to infer the overall bounds for the table. Adopt the following encoding scheme to incorporate range keys into the manifest: - if the table only contains point keys, the existing encoding applies - a new file entry is written with `tagNewFile4` and the smallest and largest point keys are used as the smallest and largest bounds for the table. - else, the table contains range keys, and the new file is written with a new tag, `tagNewFile5` that extends `tagNewFile4`: - a "marker" byte is written that indicates if the table also has point keys, and how the bounds can be used to reconstruct the overall table bounds. In increasing order of bit significance: - if the table contains point keys, `1`, else `0` - if the smallest key is a point key, `1`, else `0` - if the largest key is a point key, `1`, else `0` - the smallest/largest pairs are written - first the point key bounds (if present), then the range key bounds. - any custom tags are written (e.g. table marked for compaction, etc.). This new encoding scheme uses no additional space in the manifest if there are only point keys, and remains backwards compatible with the existing manifest format. If range keys are present, each new file requires a single additional byte that encodes whether the table contains point keys in addition to how to reconstruct the overall bounds from the point and/or range keys in the table. This approach also has the advantage of encoding enough information to be able to fully reconstruct the overall table bounds at decode time, rather than waiting until a `Compare` function is available to perform the comparisons.
e2f4e1c
to
0cbb980
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 3 of 5 files reviewed, 2 unresolved discussions (waiting on @itsbilal, @jbowens, and @sumeerbhola)
Previously, jbowens (Jackson Owens) wrote…
nit: s/
tagNewFile4RangeKeys
/tagNewFile5
/
Done.
Previously, jbowens (Jackson Owens) wrote…
nit: this sentence about 'two additional bytes' is out of date now, right?
Done.
internal/manifest/version_edit.go, line 58 at r3 (raw file):
Previously, jbowens (Jackson Owens) wrote…
104
sounds good to me
👍
internal/manifest/version_test.go, line 463 at r4 (raw file):
Previously, jbowens (Jackson Owens) wrote…
nit: could use
%08b
to format the bounds marker so that it's easier to interpret as a bit field
Good call. Done.
TFTRs! |
Currently, only point key bounds for a table are present in a manifest
entry. With the addition of range keys, the manifest needs to track the
point and range key bounds, in addition to providing a mechanism to
infer the overall bounds for the table.
Adopt the following encoding scheme to incorporate range keys into the
manifest:
if the table only contains point keys, the existing encoding applies -
a new file entry is written with
tagNewFile4
and the smallest andlargest point keys are used as the smallest and largest bounds for the
table.
else, the table contains range keys, and the new file is written with
a new tag,
tagNewFile4RangeKeys
that extendstagNewFile4
:a "marker" byte is written that indicates if the table also has
point keys, and how the bounds can be used to reconstruct the
overall table bounds. In increasing order of bit significance:
1
, else0
1
, else0
1
, else0
the smallest/largest pairs are written - first the point key bounds
(if present), then the range key bounds.
any custom tags are written (e.g. table marked for compaction, etc.).
This new encoding scheme uses no additional space in the manifest if
there are only point keys, and remains backwards compatible with the
existing manifest format. If range keys are present, each new file
requires two additional bytes (a varint with value
1
or2
for thesmallest/largest keypair count, plus a byte for the bounds marker) on
top of what it takes to encode the smallest/largest point key bounds.
This approach also has the advantage of encoding enough information to
be able to fully reconstruct the overall table bounds at decode time,
rather than waiting until a
Compare
function is available to performthe comparisons.