db: introduce Pebble-specific sstable versioning scheme #1409

Closed
nicktrav opened this issue Dec 14, 2021 · 6 comments

@nicktrav
Contributor

Both cockroachdb/cockroach#73485 and cockroachdb/cockroach#73708 revealed incompatibilities between the sstable format and the Pebble binary version. Specifically, older versions of Pebble cannot parse sstables whose index entries contain block property information.

While this was temporarily fixed on the Cockroach side via cockroachdb/cockroach#73765, we need to address this on the Pebble side, both for block properties and for the upcoming range key feature (tracked in #1339).

Given that the SST format has diverged sufficiently from the RocksDB v2 format with the addition of block properties and range keys, it makes sense to consider introducing a Pebble-specific versioning scheme. Such a scheme would most likely need to change the magic number in the SST footer and make use of a version number (Pebble knows about RocksDB versions 1 and 2, for example).
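
To make the footer idea concrete, here is a minimal sketch of dispatching on the trailing magic number, assuming the magic occupies the last 8 bytes of the footer. The constant values and the names `detectFamily` and `family` are placeholders invented for illustration; they are not the real LevelDB, RocksDB, or Pebble magic numbers, nor Pebble's actual API.

```go
package main

import (
	"errors"
	"fmt"
)

// Placeholder 8-byte identifiers. These are NOT the real magic numbers used
// by LevelDB, RocksDB, or Pebble; they only stand in for them in this sketch.
const (
	magicLen     = 8
	levelDBMagic = "LEVELDB!"
	rocksDBMagic = "ROCKSDB!"
	pebbleMagic  = "PEBBLE!!"
)

// family names the versioning scheme selected by the magic number
// (hypothetical type, not part of Pebble).
type family string

// detectFamily inspects the trailing magicLen bytes of a footer and reports
// which scheme should be used to interpret the footer's version field.
func detectFamily(footer []byte) (family, error) {
	if len(footer) < magicLen {
		return "", errors.New("footer too short to contain a magic number")
	}
	switch string(footer[len(footer)-magicLen:]) {
	case levelDBMagic:
		return "leveldb", nil
	case rocksDBMagic:
		return "rocksdb", nil
	case pebbleMagic:
		return "pebble", nil
	default:
		return "", errors.New("unknown magic number")
	}
}

func main() {
	// A fake footer: some leading fields followed by the placeholder magic.
	footer := append([]byte("...footer fields..."), pebbleMagic...)
	fam, err := detectFamily(footer)
	fmt.Println(fam, err)
}
```

An older binary that only knows the RocksDB identifier would hit the `default` case and fail with a clear error instead of misparsing the table, which is the main appeal of a new magic number.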

@nicktrav
Contributor Author

cc: @sumeerbhola @jbowens @dt

@jbowens
Collaborator

jbowens commented Dec 17, 2021

Tangentially related to #97

@nicktrav
Contributor Author

Neat. Those look like interesting enhancements. I wonder if we could adopt them as part of whatever the new versioning scheme is? Doesn't seem like we'd need compatibility with rocksdb format versions 3 and 4 anymore?

@jbowens
Collaborator

jbowens commented Dec 17, 2021

It’s a minor point, but I think it’d be nice to reserve versions 3 and 4 for RocksDB versions 3 and 4. All sstable versions 1-4 can maintain bi-directional compatibility with RocksDB and use the RocksDB magic identifier. All future Pebble versions (all versions >= 5) can use the Pebble magic identifier.
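
As a rough illustration of this single-counter proposal, the sketch below picks the footer identifier from the version alone. The names `magicKind` and `magicForVersion` are hypothetical, and rejecting version 0 is an assumption, since the comment only discusses versions 1-4 and >= 5.

```go
package main

import "fmt"

// magicKind is a hypothetical label for which footer identifier to write.
type magicKind string

const (
	rocksDBMagicKind magicKind = "rocksdb"
	pebbleMagicKind  magicKind = "pebble"
)

// magicForVersion picks the identifier under one shared version counter:
// versions 1-4 stay RocksDB-compatible, versions >= 5 are Pebble-only.
// Any other version is rejected (an assumption of this sketch).
func magicForVersion(version uint32) (magicKind, error) {
	switch {
	case version >= 1 && version <= 4:
		return rocksDBMagicKind, nil
	case version >= 5:
		return pebbleMagicKind, nil
	default:
		return "", fmt.Errorf("version %d is outside the ranges discussed", version)
	}
}

func main() {
	for _, v := range []uint32{2, 4, 5} {
		kind, err := magicForVersion(v)
		fmt.Println(v, kind, err)
	}
}
```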

@nicktrav
Contributor Author

nicktrav commented Dec 17, 2021

> reserve versions 3 and 4 for RocksDB versions 3 and 4
>
> All sstable versions 1-4 can maintain bi-directional compatibility with RocksDB and use the RocksDB magic identifier.

Sounds good.

> All future Pebble versions (all versions >= 5) can use the Pebble magic identifier.

I was under the impression that we could reset the version counter back to 0 if we introduced a new magic identifier. Looking at how rocksdb handles this, the last 8 bytes of the table (the magic bytes) determine how to interpret the rest of the table, including the versioning scheme / format version.

Concretely, we could have something like:

identifier    version
----------    -------
leveldb          0
rocksdb          1 (not supported)
rocksdb          2 (our current default)
rocksdb          3 (reserved, TBA)
rocksdb          4 (reserved, TBA)
pebble           0 (perhaps equivalent to rocksdbv2?)
pebble           1 (pebble range keys)
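
A minimal sketch of how the table above could be encoded, assuming the reader has already extracted the identifier and version from the footer. All names here (`tableFormat`, `footerKey`, `lookupFormat`) are hypothetical and do not reflect Pebble's eventual implementation; unsupported and reserved rows are simply left out of the map.

```go
package main

import "fmt"

// tableFormat is a hypothetical enum covering the (identifier, version)
// tuples from the table that a reader would accept.
type tableFormat int

const (
	formatUnknown tableFormat = iota
	formatLevelDB
	formatRocksDBv2
	formatPebblev0
	formatPebblev1
)

// footerKey pairs the identifier implied by the magic number with the
// version number stored alongside it.
type footerKey struct {
	identifier string
	version    uint32
}

// supported lists only the accepted rows. rocksdb v1 (not supported) and
// rocksdb v3/v4 (reserved) are deliberately absent.
var supported = map[footerKey]tableFormat{
	{"leveldb", 0}: formatLevelDB,
	{"rocksdb", 2}: formatRocksDBv2,
	{"pebble", 0}:  formatPebblev0,
	{"pebble", 1}:  formatPebblev1,
}

// lookupFormat resolves a footer's (identifier, version) tuple, returning an
// error for anything not listed in supported.
func lookupFormat(identifier string, version uint32) (tableFormat, error) {
	if f, ok := supported[footerKey{identifier, version}]; ok {
		return f, nil
	}
	return formatUnknown, fmt.Errorf("unsupported table format (%s, v%d)", identifier, version)
}

func main() {
	f, err := lookupFormat("pebble", 1)
	fmt.Println(f == formatPebblev1, err)

	_, err = lookupFormat("rocksdb", 3)
	fmt.Println(err)
}
```

Because the identifier is part of the key, the version counter can restart at 0 for the Pebble identifier without colliding with the LevelDB or RocksDB rows.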

To also summarize our conversation internally, sstable.FormatVersion would have a new enum type for the (Pebble, vN) format tuple.

And also:

> and then we maintain a mapping of Pebble format major version to TableFormat, so that once a database is upgraded to the new format major version, we begin generating and accepting sstables in that TableFormat
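
A sketch of what such a mapping might look like. The type and constant names are hypothetical, and the pairing of each format major version with a table format is assumed purely for illustration; the thread does not specify it.

```go
package main

import "fmt"

// formatMajorVersion is a hypothetical stand-in for the database-wide
// format major version, which only ever ratchets upward.
type formatMajorVersion int

const (
	fmvBaseline formatMajorVersion = iota
	fmvPebbleTables // assumed: first version allowed to write Pebble-magic tables
	fmvRangeKeys    // assumed: version that enables range keys
)

// tableFormat is a hypothetical, ordered enum of sstable formats.
type tableFormat int

const (
	tableRocksDBv2 tableFormat = iota
	tablePebblev0
	tablePebblev1
)

// maxTableFormat returns the newest table format that a database at the
// given format major version may generate.
func maxTableFormat(v formatMajorVersion) tableFormat {
	switch {
	case v >= fmvRangeKeys:
		return tablePebblev1
	case v >= fmvPebbleTables:
		return tablePebblev0
	default:
		return tableRocksDBv2
	}
}

// canAccept reports whether a database at version v accepts an sstable
// written in format f: anything up to the maximum for v is allowed.
func canAccept(v formatMajorVersion, f tableFormat) bool {
	return f <= maxTableFormat(v)
}

func main() {
	fmt.Println(canAccept(fmvBaseline, tablePebblev1))
	fmt.Println(canAccept(fmvRangeKeys, tablePebblev1))
}
```

Gating on the format major version means a node never writes a table format until the whole database has committed to the upgrade, so an older binary is not handed tables it cannot parse.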

@nicktrav nicktrav self-assigned this Jan 11, 2022
@nicktrav
Contributor Author

nicktrav commented Feb 4, 2022

Done in #1450 and #1471.

@nicktrav nicktrav closed this as completed Feb 4, 2022