Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rotate compaction output at user key boundary #375

Closed

Conversation

kezhuw
Copy link

@kezhuw kezhuw commented May 18, 2016

so that keys with same user key will not spread over multiple tables in
same level, level 1 or above.

Level compaction, except for level-0, don't pick overlapping tables. So
if we produce overlapping tables in user key level, it is possible that
file with newer version user key got picked and compacted to next level.
Now, the invariant that Version::Get depends on is broken, Version::Get
is broken.

The vulnerability of this approach is that a malicious client can retain
a snapshhot and do large number of writes to same key, which can cause
huge sorted table generated. LevelDB is a library that embedded in other
application. I think it should be the application's responsibility to
keep snapshhot from malicious clients.

Picking overlapping tables in all level compactions is another approach
to solve this problem. I think this approach may violate the algorithm
this implementation implies.

so that keys with same user key will not spread over multiple tables in
same level, level 1 or above.

Level compaction, except for level-0, don't pick overlapping tables. So
if we produce overlapping tables in user key level, it is possible that
file with newer version user key got picked and compacted to next level.
Now, the invariant that Version::Get depends on is broken, Version::Get
is broken.

The vulnerability of this approach is that a malicious client can retain
a snapshhot and do large number of writes to same key, which can cause
huge sorted table generated. LevelDB is a library that embedded in other
application. I think it should be the application's responsibility to
keep snapshhot from malicious clients.

Picking overlapping tables in all level compactions is another approach
to solve this problem. I think this approach may violate the algorithm
this implementation implies.
@kezhuw kezhuw force-pushed the rotate_compaction_at_user_key_boundary branch from 849345d to 2546ba5 Compare May 18, 2016 08:48
@kezhuw
Copy link
Author

kezhuw commented May 18, 2016

I write a gist https://gist.github.com/kezhuw/2e53639ca46b89016ea54703e61734cd to show this.

This gist contains several files: rotate.diff, rotate.before.log, rotate.after.log, rotate.filesize.log.

You can apply rotate.diff to leveldb source directory to create a test program to produce the log file attached in this gist. It contains a macro ROTATE_AT_USER_KEY to toggle this pacth.

The test program constantly puts a key to LevelDB and allocates snapshots from that db without releasing. It also logs some compaction info to stderr, which I redirect to rotate.before.log and rotate.after.log in this gist.

level 1: compacted file 62 ==> file 63, overlapping at user key: I am key![71 ==> 70]

This is Line 47 from rotate.before.log. It says that user key of file 62's largest key is same with file 63's smallest key. The key in file 62 has sequence 71, which is bigger than file 63's, which is 70. This means that key in file 62 is newer than key in file 63.

level 1 compaction: smallest snapshot: 1 file 62 [I am key!(72) ==> I am key!(71)](compacting) file 63 [I am key!(70) ==> I am key!(69)]

This is Line 74-76 from rotate.before.log. It says that file 62 is be compacting to level 2, while file 63 is not! According to Line 47, newer entry in file 62 got compacted to next level, while its older variant remains in younger level. If we try to get that key with snapshot 71, we will hit file 63, not file 62.

This pull request only rotate compaction outputs in user key boundary. Keys with same user key will never got spread over multiple tables in same level above level-0. The side effect is that it may produce file exceed file size limitation. Sadly, this pull request can't pass ci due to this side effect.

rotate.after.log is produced by program with this pacth applied. rotate.filesize.log is output from ls -lh of the db directory.

@kezhuw
Copy link
Author

kezhuw commented Mar 30, 2019

Closes this pull request and #376 due to better alternative #339(#320) which did not violate file size limitation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant