SSTableIndex could be a multi layer SSTable #1948
Comments
Assigned to @trinity-1686a for the moment, but we don't even know if we want this.
In the flamegraph PSeitz linked here #1946 (comment),
Unassigning myself for now. While there is possibly something to gain for large segments, I believe there are probably more impactful changes to explore.
Right now we use the SSTable format to store block "checkpoints" in a Vec and do binary search on those.
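For readers unfamiliar with the current scheme, here is a minimal sketch of a binary search over in-memory block checkpoints. The `Checkpoint` struct, its fields, and `locate_block` are hypothetical names chosen for illustration and are not the project's actual types; the sketch assumes each checkpoint records the last key of its block.

```rust
/// Hypothetical checkpoint (illustration only): the last key stored in a
/// block and the byte offset at which that block starts.
#[derive(Debug, Clone, PartialEq)]
pub struct Checkpoint {
    pub last_key: Vec<u8>,
    pub block_offset: u64,
}

/// Returns the index of the first block whose last key is >= `key`,
/// which is the only block that can contain `key`.
/// Returns `None` when `key` is greater than every key in the table.
pub fn locate_block(checkpoints: &[Checkpoint], key: &[u8]) -> Option<usize> {
    // `partition_point` performs a binary search over the sorted checkpoints.
    let idx = checkpoints.partition_point(|c| c.last_key.as_slice() < key);
    if idx < checkpoints.len() {
        Some(idx)
    } else {
        None
    }
}
```

The search itself is cheap, but in this sketch the whole `Vec` has to be materialized before the first lookup, so the cost of opening grows with the number of blocks.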
An alternative could be to store several layers of SSTable:
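Thanks to the geometric series, the overhead of that stack of layers is minor: assuming each layer keeps one entry per B entries of the layer below, the upper layers together add about 1/(B - 1) of the base layer's size, i.e. roughly 7% for B = 16.
We would have transformed the binary search into a coarse-to-fine linear search with strong locality.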
The main benefit would be to make opening a term dictionary very cheap, regardless of number of blocks.
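The layered lookup described above could be sketched as follows. This is only an illustration under stated assumptions: the layout (each upper-layer entry holds the last key of a group of `FANOUT` entries in the layer below), the in-memory `Vec` representation, and the names `FANOUT`, `build_layers`, and `locate_block_layered` are all hypothetical, not the proposed on-disk format.

```rust
/// Assumed fan-out B: each upper-layer entry summarizes up to FANOUT
/// entries of the layer below.
pub const FANOUT: usize = 16;

/// Builds the stack of layers from the per-block last keys (layer 0).
/// Layer k + 1 keeps the last key of every group of FANOUT entries in layer k.
/// Layers are added until the top one has at most FANOUT entries.
pub fn build_layers(block_last_keys: Vec<Vec<u8>>) -> Vec<Vec<Vec<u8>>> {
    let mut layers = vec![block_last_keys];
    while layers.last().unwrap().len() > FANOUT {
        let next: Vec<Vec<u8>> = layers
            .last()
            .unwrap()
            .chunks(FANOUT)
            .map(|group| group.last().unwrap().clone())
            .collect();
        layers.push(next);
    }
    layers
}

/// Coarse-to-fine search: scan at most FANOUT contiguous entries in the top
/// layer, then descend into the matching group of the layer below.
/// Returns the index of the first block whose last key is >= `key`.
pub fn locate_block_layered(layers: &[Vec<Vec<u8>>], key: &[u8]) -> Option<usize> {
    let mut start = 0; // first candidate entry in the current layer
    let mut found = None;
    for layer in layers.iter().rev() {
        let end = (start + FANOUT).min(layer.len());
        // Linear scan over one small, contiguous window.
        let pos = layer[start..end]
            .iter()
            .position(|last_key| last_key.as_slice() >= key)?;
        let idx = start + pos;
        found = Some(idx);
        // Entry `idx` summarizes entries idx * FANOUT.. in the layer below.
        start = idx * FANOUT;
    }
    found
}
```

In this sketch a lookup touches one window of at most `FANOUT` entries per layer, so with an on-disk version of the same layout only the small top layer would need to be read when the dictionary is opened.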
@trinity-1686a @PSeitz let me know if this makes sense. To know whether this is useful, we need to know
how expensive it is to open an SSTable with 10 million terms today.
To accept such a change, we will also need a bench on `ord_to_term` (if it does not exist already).