Makes fuzzer call explicitly I/O operations #143
Allows enforcing that the database is opened as either create+write, write, or read-only.
It is only composed of two fields
Adds test-only options to control some behaviors
Makes fuzzing way more deterministic. Allows a more careful model of which states are already persisted and which are not.
    operations: impl IntoIterator<Item = &'a Operation>,
    model: &mut Model,
) {
    let mut counts = [None; 256];
Suggested change:
- let mut counts = [None; 256];
+ let mut counts = [None; u8::MAX];
or a constant.
It's 256 and not 255 (= u8::MAX) because it's the number of possible values in a u8. I might do 1 << u8::BITS but I am not sure if it's more readable.
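The 256 versus u8::MAX point can be illustrated with a small sketch. The constant name NUMBER_OF_POSSIBLE_KEYS is borrowed from later in this thread; the helper empty_counts and the u32 counter type are assumptions made for illustration, not code from the PR.

```rust
/// Number of distinct values a `u8` key can take: 0..=255, i.e. 256 values.
/// `u8::MAX` is 255 and has type `u8` (not `usize`), so it cannot be used
/// directly as the array length.
const NUMBER_OF_POSSIBLE_KEYS: usize = 1usize << u8::BITS;

/// Hypothetical helper: one optional counter per possible key.
fn empty_counts() -> [Option<u32>; NUMBER_OF_POSSIBLE_KEYS] {
    [None; NUMBER_OF_POSSIBLE_KEYS]
}
```

A named constant keeps the array length and the reason for it in one place, which may address both the readability concern and the "or a constant" suggestion.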
}

fn model_optional_content(model: &Model) -> Vec<(Vec<u8>, Vec<u8>)> {
    Self::model_required_content(model)
Please correct me if I am wrong: the difference between required and optional here is that optional includes already saved content.
The difference between required and optional exists for the ref-counted case: keys whose number of references is 0 might still be in the database (hence "optional") but are not required to still be there ("required"). The distinction is not useful when ref-counting is not used.
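A minimal sketch of the required/optional distinction for the ref-counted case, assuming one optional counter per u8 key. The names Counts, required_keys, optional_keys and is_acceptable are hypothetical, and the acceptance rule is an assumption about how such a check could look, not the PR's implementation.

```rust
const NUMBER_OF_POSSIBLE_KEYS: usize = 256;

/// Hypothetical per-key state: `None` = never present in the database,
/// `Some(0)` = no reference left (may or may not be garbage collected yet),
/// `Some(n)` with n > 0 = still referenced.
type Counts = [Option<u32>; NUMBER_OF_POSSIBLE_KEYS];

/// Keys that must be in the database: reference count strictly positive.
fn required_keys(counts: &Counts) -> Vec<u8> {
    (0..=u8::MAX)
        .filter(|k| counts[usize::from(*k)].map_or(false, |c| c > 0))
        .collect()
}

/// Keys that may be in the database: every key ever inserted, including count 0.
fn optional_keys(counts: &Counts) -> Vec<u8> {
    (0..=u8::MAX)
        .filter(|k| counts[usize::from(*k)].is_some())
        .collect()
}

/// Assumed check: the observed keys contain every required key
/// and nothing outside the optional set.
fn is_acceptable(counts: &Counts, present: &[u8]) -> bool {
    required_keys(counts).iter().all(|k| present.contains(k))
        && present.iter().all(|k| counts[usize::from(*k)].is_some())
}
```

Without ref-counting there is no Some(0) state, so both sets coincide, which matches the remark that the distinction is not useful in that case.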
Should we just keep old code here (Self::model_required_content(model))?
Thank you for the review!
for layer in model {
    if let Some(c) = layer.counts[key] {
        if !layer.is_maybe_saved && c < 0 {
            min_count += c;
The Option here is to differentiate between Some(0), aka "no reference anymore, might be garbage collected", and None, aka "has never been present in the database".
Indeed. I got mixed up here. I wrote that to handle the case where some layers were written to disk but not flushed and have been lost. But we have a proper "recovery" process for our in-memory model of the database state so it is not required anymore. I have removed the condition.
let mut count = None;
for layer in layers {
    if let Some(c) = layer.counts[key] {
        *count.get_or_insert(0) += c;
Indeed. Done.
1. Properly fails if not able to recover to a known state.
2. For ref-counted, in case of multiple eligible states, we build a state that is as unrestrictive as possible.
Changes since last review:
Operation::Set(k) => *counts[usize::from(k)].get_or_insert(0) += 1,
Operation::Dereference(k) =>
    if counts[usize::from(k)].unwrap_or(0) > 0 {
        *counts[usize::from(k)].get_or_insert(0) -= 1;
I kind of like the previous version, where the count for dereference and reference was only modified if the previous layer had a net RC > 0 (i.e. an entry in the db).
(For the set operation I have no doubt it is correct.)
A way to do it (which would work with the if condition here) would be to replace
let mut counts = model.last().map_or([None; NUMBER_OF_POSSIBLE_KEYS], |l| l.counts);
with
let mut counts = model.last().map_or([None; NUMBER_OF_POSSIBLE_KEYS], |l| {
    // reset every key whose count is Some(0) back to None
    l.counts.map(|c| if c == Some(0) { None } else { c })
});
I kind of like the previous version where the modification of count for dereference and reference did only happen if in the previous layer there was a net RC > 0 (aka an entry in db).
I believe it's already happening with if counts[usize::from(k)].unwrap_or(0) > 0 {. But maybe I'm missing something. I have tweaked the code to make each layer store the total number of references at this time, not the changes in reference count.
Changing Some(0) to None would mean assuming that the GC collects all unused key-values at this step, and I am not sure that's the case in the DB.
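As a sketch of the "each layer stores the total number of references" approach: a new layer starts from the previous layer's totals and applies the commit's operations on top. The Set and Dereference arms follow the snippet under review; the Reference variant, its guard, and the function name next_layer_counts are assumptions for illustration.

```rust
const NUMBER_OF_POSSIBLE_KEYS: usize = 256;
type Counts = [Option<usize>; NUMBER_OF_POSSIBLE_KEYS];

#[derive(Clone, Copy)]
enum Operation {
    Set(u8),
    Reference(u8), // hypothetical variant, guarded like Dereference
    Dereference(u8),
}

/// Computes the total reference counts after a commit, starting from the
/// totals of the previous layer (or from an empty state for the first layer).
fn next_layer_counts(previous: Option<&Counts>, operations: &[Operation]) -> Counts {
    let mut counts = previous.copied().unwrap_or([None; NUMBER_OF_POSSIBLE_KEYS]);
    for op in operations {
        match *op {
            // Set always creates or increments the entry.
            Operation::Set(k) => *counts[usize::from(k)].get_or_insert(0) += 1,
            // Reference and Dereference only act on keys with a positive count.
            Operation::Reference(k) => {
                if counts[usize::from(k)].unwrap_or(0) > 0 {
                    *counts[usize::from(k)].get_or_insert(0) += 1;
                }
            }
            Operation::Dereference(k) => {
                if counts[usize::from(k)].unwrap_or(0) > 0 {
                    *counts[usize::from(k)].get_or_insert(0) -= 1;
                }
            }
        }
    }
    counts
}
```

In this sketch a key whose count drops to zero stays at Some(0) in later layers instead of being reset to None, so the model does not assume the GC has already collected it.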
}

// If there are multiple candidates, we are unsure. We pick the lower count per candidate
I did not get this minimal candidate logic.
Correct me where I am wrong; what I understood:
- each layer contains the net rc of changes from a commit
- the current on-disk value should therefore be the last layer with is_written set to true
- so resetting to an earlier state should use the last candidate at the latest layer and not a merge of all lesser rc
each layer contains the net rc of changes from a commit
Sorry, it's the same misleading thing. I have simplified the implementation by storing not the "net rc of changes" but the "total rc at the time the commit is done". I should have written a code review comment about it.
the current on-disk value should therefore be the last layer with is_written set to true
I believe it is not always the case. What if the DB crashes between write and flush? We set is_written during write, not after flush.
so resetting to an earlier state should use the last candidate at the latest layer and not a merge of all lesser rc
There is also a "fun thing": the DB API returns the present key-values, not their RC count. So, if we take the following sequence:
- Commit 1: Set([0], [0])
- Commit 2: Increment([0])
If we are unsure that commit 2 has been flushed properly, there is no way to know from the presence/absence of key [0] if the reference count of key [0] is 1 or 2.
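The two-commit example above can be sketched as a merge of candidate states that keeps the lowest count per key. The function name merge_candidates and the rule that a key missing from one candidate but present in another becomes Some(0) are assumptions for illustration; the PR may handle that case differently.

```rust
const NUMBER_OF_POSSIBLE_KEYS: usize = 256;
type Counts = [Option<usize>; NUMBER_OF_POSSIBLE_KEYS];

/// Merges the states the database might be in after a crash.
/// For each key we keep the lowest count among the candidates, so the merged
/// state never requires a key that some candidate would allow to be absent.
/// Assumption: a key unknown to one candidate counts as 0 there, which makes
/// it optional (`Some(0)`) in the merged state.
fn merge_candidates(candidates: &[Counts]) -> Counts {
    let mut merged = [None; NUMBER_OF_POSSIBLE_KEYS];
    for key in 0..NUMBER_OF_POSSIBLE_KEYS {
        let seen_anywhere = candidates.iter().any(|c| c[key].is_some());
        if seen_anywhere {
            merged[key] = candidates.iter().map(|c| c[key].unwrap_or(0)).min();
        }
    }
    merged
}
```

With one candidate holding count 1 for key [0] (after commit 1) and another holding count 2 (after commit 2), the merged count is 1: a later dereference may then make the key optional, where the count of 2 would still have required it.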
Thank you @cheme for the review!
I have fixed or replied to your comments.
I have also made simple_model validation of removed keys stricter (it was unnecessarily lenient).
This sentence ("We set is_written during write, not after flush") explains a lot to me. I was expecting is_written to be set when we consider things flushed (since we run without threads, we should be able to set it at this point deterministically; this may make sense as a next step).
yes that was my initial assumption :)
I guess your example was more about a decrement, but I see what you mean.
Thank you!
Yes! I started to write something about it but I encountered some issues. I preferred to keep it as a next step.
Now that I think of it, when I put in the test with different processing targets, I removed one variant (I think data in WAL cache but not flushed to WAL file) for being awkward to implement.
@cheme And when there are also I/O errors it creates "fun" states like "flushed if it has been actually written" or "maybe flushed"...
In theory these states should be covered by a layer upward or, when restarting, by the WAL commit not being complete (or the crc failing).
Makes fuzzing way more deterministic
Allows a more careful model of which states are already persisted and which are not
Also does some refactoring of the Db test-related API to be able to call it from the fuzzer