-
-
Notifications
You must be signed in to change notification settings - Fork 671
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
new sstable format #1943
new sstable format #1943
Conversation
- Keep(VInt): number of bytes to pop | ||
|
||
|
||
Note: there is no ambiguity between both representation as Add is always guarantee to be non-zero, except for the very first key of an SSTable, where Keep is guaranteed to be zero. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great comment. This also implies that we do not support redundant keys. Is it written somewhere in this document?
Note: there is no ambiguity between both representation as Add is always guarantee to be non-zero, except for the very first key of an SSTable, where Keep is guaranteed to be zero. | |
Note: there is no ambiguity between both representation as Add is always guarantee to be non-zero, except for the very first key of an SSTable, where Keep is guaranteed to be zero. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it is, this could be rephrased as
Note: there is no ambiguity between both representation as Add is always guarantee to be non-zero, except for the very first key of an SSTable, where Keep is guaranteed to be zero. | |
Note: as the SSTable does not support redundant keys, there is no ambiguity between both representation. Add is always guaranteed to be non-zero, except for the very first key of an SSTable, where Keep is guaranteed to be zero. |
to include that information
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oups, I forgot to add that before merging. I'll add it in the compression PR, or as an independent PR if it compression end up not making it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really great job. Thank you. I had a couple of comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a ord_to_term benchmark and make sure that we don't have a regression here?(Previously we were using a binary_search...)
Scratch that. That part is unchanged at this point.
Codecov Report
📣 This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more @@ Coverage Diff @@
## main #1943 +/- ##
==========================================
+ Coverage 94.46% 94.52% +0.06%
==========================================
Files 309 312 +3
Lines 56822 57381 +559
==========================================
+ Hits 53676 54241 +565
+ Misses 3146 3140 -6
... and 39 files with indirect coverage changes Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. |
this implements and document a new format for sstables. Most of the format is the same as before, except the index is now a small sstable too instead of a cbor blob.
fix #1851
groundwork for #1827 (encode dictionary type, but only for sstable, fst kept unchanged for now)
fix some of #1738
fix #1731
fix some of #1727
sstable::Dictionary::new
and usesstable::Dictionary::create
Size comparison with previous format:
src/fastfield/mod.rs
andcolumnar/src/tests.rs
show the difference for small indexessstable/src/lib.rs
went from 115 to 53 bytesThe improvement comes from three sources:
$fblocks$$slast_key_or_greater$!!jblock_addr$jbyte_range$estartcend$mfirst_ordinal
, where there is more field name than actual metadata (impact both small and large indexes)This could probably be made to support reading the old format if we want some amount of backward compatibility, at the cost of reintroducing cbor