-
Notifications
You must be signed in to change notification settings - Fork 7.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Compaction causes data inconsistency when using snapshots #320
Comments
The error occurs when leveldb does a compaction while there are several records with the same key but different sequence numbers. This can occur when snapshots are in play. The compaction algorithm can choose a file such that it will push the latest record from level i to level i+1, then a following get will retrieve the older record from level i which is incorrect. I see two ways to fix: (i) always compact a file set that is closed with respect records with the same key, e.g. if files f1 and f2 have records with key k1 then compact f1 and f2 together, or (ii) when considering f1 for compaction check that no older record of any k in f1 exists at level i, it is sufficient to check the boundary elements of the file f1 I believe, and if the criteria isn't met to try to compact the previous file (I assume you sort based on key and sequence number with increasing sequence numbers being later in the sequence of files at a level). |
Closes google#320 During compaction it was possible that records from a block b1=(l1,u1) would be pushed down from level i to level i+1. If there is a block b2=(l2,u2) at level i with k1 = user_key(u1) = user_key(l2) then a subsequent search for k1 will yield the record l2 which has a smaller sequence number than u1 because the sort order for records sorts increasing by user key but decreaing by sequence number. This change add a call to a new function AddBoundaryInputs to SetupOtherInputs. AddBoundaryInputs searches for a block b2 matching the criteria above and adds it to the set of files to be compacted. Whenever AddBoundaryInputs is called it is important that the compaction fileset in level i+1 (known as c->inputs_[1] in the code) be recomputed. Each call to AddBoundaryInputs is followed by a call to GetOverlappingInputs. SetupOtherInputs is called on both manual and automated compaction passes. It is called for both level zero and for levels greater than 0. Testing Done: A test program (issue320_test) was constructed that performs mutations while snapshots are active. issue320_test fails without this bug fix after 64k writes. It passes with this bug fix. It was run with 200M writes and passed. Unit tests were written for the new function that was added to the code. Make check was run and seen to pass. Signed-off-by: Richard Cole <richcole@amazon.com>
In case it isn't obvious from the pretty graphics above included by github there is a patch for this bug here: |
Closes google#320 During compaction it was possible that records from a block b1=(l1,u1) would be pushed down from level i to level i+1. If there is a block b2=(l2,u2) at level i with k1 = user_key(u1) = user_key(l2) then a subsequent search for k1 will yield the record l2 which has a smaller sequence number than u1 because the sort order for records sorts increasing by user key but decreaing by sequence number. This change add a call to a new function AddBoundaryInputs to SetupOtherInputs. AddBoundaryInputs searches for a block b2 matching the criteria above and adds it to the set of files to be compacted. Whenever AddBoundaryInputs is called it is important that the compaction fileset in level i+1 (known as c->inputs_[1] in the code) be recomputed. Each call to AddBoundaryInputs is followed by a call to GetOverlappingInputs. SetupOtherInputs is called on both manual and automated compaction passes. It is called for both level zero and for levels greater than 0. The original change posted in google#339 has been modified to also include changed made by Chris Mumford<cmumford@google.com> in cmumford@4b72cb1 1. Releasing snapshots during test cleanup to avoid memory leak warnings. 2. Refactored test to use testutil.h to be in line with other issue tests and to create the test database in the correct temporary location. 3. Added copyright banner. Otherwise, just minor formatting and limiting character width to 80 characters. Additionally the change was rebased on top of current master and changes previously made to the Makefile were ported to the CMakeLists.txt. Testing Done: A test program (issue320_test) was constructed that performs mutations while snapshots are active. issue320_test fails without this bug fix after 64k writes. It passes with this bug fix. It was run with 200M writes and passed. Unit tests were written for the new function that was added to the code. Make test was run and seen to pass. Signed-off-by: Richard Cole <richcole@amazon.com>
Closes google#320 During compaction it was possible that records from a block b1=(l1,u1) would be pushed down from level i to level i+1. If there is a block b2=(l2,u2) at level i with k1 = user_key(u1) = user_key(l2) then a subsequent search for k1 will yield the record l2 which has a smaller sequence number than u1 because the sort order for records sorts increasing by user key but decreaing by sequence number. This change add a call to a new function AddBoundaryInputs to SetupOtherInputs. AddBoundaryInputs searches for a block b2 matching the criteria above and adds it to the set of files to be compacted. Whenever AddBoundaryInputs is called it is important that the compaction fileset in level i+1 (known as c->inputs_[1] in the code) be recomputed. Each call to AddBoundaryInputs is followed by a call to GetOverlappingInputs. SetupOtherInputs is called on both manual and automated compaction passes. It is called for both level zero and for levels greater than 0. The original change posted in google#339 has been modified to also include changed made by Chris Mumford<cmumford@google.com> in 4b72cb1 1. Releasing snapshots during test cleanup to avoid memory leak warnings. 2. Refactored test to use testutil.h to be in line with other issue tests and to create the test database in the correct temporary location. 3. Added copyright banner. Otherwise, just minor formatting and limiting character width to 80 characters. Additionally the change was rebased on top of current master and changes previously made to the Makefile were ported to the CMakeLists.txt. Testing Done: A test program (issue320_test) was constructed that performs mutations while snapshots are active. issue320_test fails without this bug fix after 64k writes. It passes with this bug fix. It was run with 200M writes and passed. Unit tests were written for the new function that was added to the code. Make test was run and seen to pass. Signed-off-by: Richard Cole <richcole@amazon.com>
Closes google#320 During compaction it was possible that records from a block b1=(l1,u1) would be pushed down from level i to level i+1. If there is a block b2=(l2,u2) at level i with k1 = user_key(u1) = user_key(l2) then a subsequent search for k1 will yield the record l2 which has a smaller sequence number than u1 because the sort order for records sorts increasing by user key but decreaing by sequence number. This change add a call to a new function AddBoundaryInputs to SetupOtherInputs. AddBoundaryInputs searches for a block b2 matching the criteria above and adds it to the set of files to be compacted. Whenever AddBoundaryInputs is called it is important that the compaction fileset in level i+1 (known as c->inputs_[1] in the code) be recomputed. Each call to AddBoundaryInputs is followed by a call to GetOverlappingInputs. SetupOtherInputs is called on both manual and automated compaction passes. It is called for both level zero and for levels greater than 0. The original change posted in google#339 has been modified to also include changed made by Chris Mumford<cmumford@google.com> in cmumford@4b72cb1 1. Releasing snapshots during test cleanup to avoid memory leak warnings. 2. Refactored test to use testutil.h to be in line with other issue tests and to create the test database in the correct temporary location. 3. Added copyright banner. Otherwise, just minor formatting and limiting character width to 80 characters. Additionally the change was rebased on top of current master and changes previously made to the Makefile were ported to the CMakeLists.txt. Testing Done: A test program (issue320_test) was constructed that performs mutations while snapshots are active. issue320_test fails without this bug fix after 64k writes. It passes with this bug fix. It was run with 200M writes and passed. Unit tests were written for the new function that was added to the code. Make test was run and seen to pass. Signed-off-by: Richard Cole <richcole@amazon.com>
Closes google#320 During compaction it was possible that records from a block b1=(l1,u1) would be pushed down from level i to level i+1. If there is a block b2=(l2,u2) at level i with k1 = user_key(u1) = user_key(l2) then a subsequent search for k1 will yield the record l2 which has a smaller sequence number than u1 because the sort order for records sorts increasing by user key but decreaing by sequence number. This change add a call to a new function AddBoundaryInputs to SetupOtherInputs. AddBoundaryInputs searches for a block b2 matching the criteria above and adds it to the set of files to be compacted. Whenever AddBoundaryInputs is called it is important that the compaction fileset in level i+1 (known as c->inputs_[1] in the code) be recomputed. Each call to AddBoundaryInputs is followed by a call to GetOverlappingInputs. SetupOtherInputs is called on both manual and automated compaction passes. It is called for both level zero and for levels greater than 0. The original change posted in google#339 has been modified to also include changed made by Chris Mumford<cmumford@google.com> in cmumford@4b72cb1 1. Releasing snapshots during test cleanup to avoid memory leak warnings. 2. Refactored test to use testutil.h to be in line with other issue tests and to create the test database in the correct temporary location. 3. Added copyright banner. Otherwise, just minor formatting and limiting character width to 80 characters. Additionally the change was rebased on top of current master and changes previously made to the Makefile were ported to the CMakeLists.txt. Testing Done: A test program (issue320_test) was constructed that performs mutations while snapshots are active. issue320_test fails without this bug fix after 64k writes. It passes with this bug fix. It was run with 200M writes and passed. Unit tests were written for the new function that was added to the code. Make test was run and seen to pass. Signed-off-by: Richard Cole <richcole@amazon.com>
Closes google#320 During compaction it was possible that records from a block b1=(l1,u1) would be pushed down from level i to level i+1. If there is a block b2=(l2,u2) at level i with k1 = user_key(u1) = user_key(l2) then a subsequent search for k1 will yield the record l2 which has a smaller sequence number than u1 because the sort order for records sorts increasing by user key but decreaing by sequence number. This change add a call to a new function AddBoundaryInputs to SetupOtherInputs. AddBoundaryInputs searches for a block b2 matching the criteria above and adds it to the set of files to be compacted. Whenever AddBoundaryInputs is called it is important that the compaction fileset in level i+1 (known as c->inputs_[1] in the code) be recomputed. Each call to AddBoundaryInputs is followed by a call to GetOverlappingInputs. SetupOtherInputs is called on both manual and automated compaction passes. It is called for both level zero and for levels greater than 0. The original change posted in google#339 has been modified to also include changed made by Chris Mumford<cmumford@google.com> in cmumford@4b72cb1 1. Releasing snapshots during test cleanup to avoid memory leak warnings. 2. Refactored test to use testutil.h to be in line with other issue tests and to create the test database in the correct temporary location. 3. Added copyright banner. Otherwise, just minor formatting and limiting character width to 80 characters. Additionally the change was rebased on top of current master and changes previously made to the Makefile were ported to the CMakeLists.txt. Testing Done: A test program (issue320_test) was constructed that performs mutations while snapshots are active. issue320_test fails without this bug fix after 64k writes. It passes with this bug fix. It was run with 200M writes and passed. Unit tests were written for the new function that was added to the code. Make test was run and seen to pass. Signed-off-by: Richard Cole <richcole@amazon.com> # Conflicts: # CMakeLists.txt
When using snapshots, background compaction causes data inconsistency. Existing records can became missing or previously existed records or their values can appear again. Test program that reproduces the issue https://gist.github.com/specialforest/fa136b660bdd672e3d4b
The text was updated successfully, but these errors were encountered: