Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

new sstable format #1943

Merged
merged 9 commits into from
Mar 21, 2023
Merged

new sstable format #1943

merged 9 commits into from
Mar 21, 2023

Conversation

trinity-1686a
Copy link
Contributor

@trinity-1686a trinity-1686a commented Mar 16, 2023

this implements and document a new format for sstables. Most of the format is the same as before, except the index is now a small sstable too instead of a cbor blob.

fix #1851

  • remove the dependency.
  • have a well-spec'ed format.
  • have proper versioning in the format
  • get something more compact... Right now a tiny dictionary can often top 100 bytes.

groundwork for #1827 (encode dictionary type, but only for sstable, fst kept unchanged for now)

fix some of #1738

  • inverse vint implementation (not sure what this is)
  • remove serde_cbor.
  • add multilevel indexing (in some way this is a 2 level index I guess?)

fix #1731

fix some of #1727

  • add version & magic number in SSTable format. At the dictionary scale, not in each block.
  • remove sstable::Dictionary::new and use sstable::Dictionary::create
  • refactor the magic number/ version logic.

Size comparison with previous format:

  • tests in src/fastfield/mod.rs and columnar/src/tests.rs show the difference for small indexes
  • test payload in sstable/src/lib.rs went from 115 to 53 bytes
  • on a split of exactly one day of gh-archive, the size of the "footer" went from 1075365 (1MB) to 86399 (84kB), ~92% smaller

The improvement comes from three sources:

  • cbor keep the name of the fields as it aims to encode the same kind of data as json, so an index for a single block might look something like this: $fblocks$$slast_key_or_greater$!!jblock_addr$jbyte_range$estartcend$mfirst_ordinal, where there is more field name than actual metadata (impact both small and large indexes)
  • using sstable allows to merge key prefixes when there are blocks ending with similar enough keys (impact only large indexes)
  • block range and first_ord are encoded as delta, start offset are implicit, derived from the length of previous blocks (impact both small and large indexes)

This could probably be made to support reading the old format if we want some amount of backward compatibility, at the cost of reintroducing cbor

sstable/README.md Outdated Show resolved Hide resolved
sstable/README.md Outdated Show resolved Hide resolved
- Keep(VInt): number of bytes to pop


Note: there is no ambiguity between both representation as Add is always guarantee to be non-zero, except for the very first key of an SSTable, where Keep is guaranteed to be zero.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great comment. This also implies that we do not support redundant keys. Is it written somewhere in this document?

Suggested change
Note: there is no ambiguity between both representation as Add is always guarantee to be non-zero, except for the very first key of an SSTable, where Keep is guaranteed to be zero.
Note: there is no ambiguity between both representation as Add is always guarantee to be non-zero, except for the very first key of an SSTable, where Keep is guaranteed to be zero.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it is, this could be rephrased as

Suggested change
Note: there is no ambiguity between both representation as Add is always guarantee to be non-zero, except for the very first key of an SSTable, where Keep is guaranteed to be zero.
Note: as the SSTable does not support redundant keys, there is no ambiguity between both representation. Add is always guaranteed to be non-zero, except for the very first key of an SSTable, where Keep is guaranteed to be zero.

to include that information

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oups, I forgot to add that before merging. I'll add it in the compression PR, or as an independent PR if it compression end up not making it

sstable/src/lib.rs Outdated Show resolved Hide resolved
sstable/src/lib.rs Outdated Show resolved Hide resolved
Copy link
Collaborator

@fulmicoton fulmicoton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really great job. Thank you. I had a couple of comments.

Copy link
Collaborator

@fulmicoton fulmicoton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a ord_to_term benchmark and make sure that we don't have a regression here?(Previously we were using a binary_search...)

Scratch that. That part is unchanged at this point.

@codecov-commenter
Copy link

Codecov Report

Merging #1943 (403ceea) into main (8459efa) will increase coverage by 0.06%.
The diff coverage is 94.21%.

📣 This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more

@@            Coverage Diff             @@
##             main    #1943      +/-   ##
==========================================
+ Coverage   94.46%   94.52%   +0.06%     
==========================================
  Files         309      312       +3     
  Lines       56822    57381     +559     
==========================================
+ Hits        53676    54241     +565     
+ Misses       3146     3140       -6     
Impacted Files Coverage Δ
sstable/src/value/mod.rs 100.00% <ø> (ø)
sstable/src/dictionary.rs 86.09% <52.94%> (-1.65%) ⬇️
sstable/src/sstable_index.rs 99.37% <96.96%> (-0.63%) ⬇️
sstable/src/value/index.rs 98.79% <98.79%> (ø)
columnar/src/tests.rs 100.00% <100.00%> (ø)
src/fastfield/mod.rs 100.00% <100.00%> (ø)
sstable/src/delta.rs 97.65% <100.00%> (+0.07%) ⬆️
sstable/src/lib.rs 99.40% <100.00%> (+0.94%) ⬆️

... and 39 files with indirect coverage changes

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Revisit the sstable format Make SSTable block size configurable
3 participants