Use new ChecksummedBlock in DataCache #572
Introduce a new `ChecksummedBlock` type which represents a bytes buffer and its matching checksum. It is a simpler version of `ChecksummedBytes` in that the checksum always matches the exposed bytes buffer, rather than potentially a larger containing buffer. The new type is better suited for use in the `DataCache` because it can be serialized and deserialized more efficiently while preserving its checksum.
This change also introduces a new `checksums` module, containing both `ChecksummedBytes` and `ChecksummedBlock`, in addition to other checksum functions and types.
Signed-off-by: Alessandro Passaro <alexpax@amazon.co.uk>
LGTM, let's get a second (third?) opinion on the extend fn.
Plus fix typo.
/// Append the given bytes to current `ChecksummedBlock`.
pub fn extend(&mut self, extend: ChecksummedBlock) {
    if self.is_empty() {
        *self = extend;
        return;
    }
    if extend.is_empty() {
        return;
    }

    let total_len = self.bytes.len() + extend.len();
    let mut bytes_mut = BytesMut::with_capacity(total_len);
    bytes_mut.extend_from_slice(&self.bytes);
    bytes_mut.extend_from_slice(&extend.bytes);
    let new_bytes = bytes_mut.freeze();
    let new_checksum = combine_checksums(self.checksum, extend.checksum, extend.len());
    *self = ChecksummedBlock {
        bytes: new_bytes,
        checksum: new_checksum,
    };
}
I think this is safe since we are taking two checksummed buffers, combining the two, and calculating the new checksum independently of the new buffer.
IMO the durability risk here is mitigated, but I'd also like a second opinion from the team.
Yeah, that sounds right. We know the expected checksum of each side (unlike in the `ChecksummedBytes` case where we only know the checksum of some larger slice of each side), and can compute the new expected checksum from those without actually looking at the bytes.
Can you add a comment here capturing that reasoning?
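To make that reasoning concrete, here is a minimal sketch of deriving the checksum of a concatenation from the two checksums alone. It deliberately swaps in Adler-32, whose combine step is a few lines of arithmetic, rather than whatever checksum `combine_checksums` actually uses in this PR. The names `adler32` and `combine_adler32` are hypothetical and only mirror the argument order of the call in `extend`.

```rust
/// Largest prime below 2^16, the Adler-32 modulus.
const MOD: u64 = 65_521;

/// Compute the Adler-32 checksum of `data` (illustrative stand-in only).
fn adler32(data: &[u8]) -> u32 {
    let (mut a, mut b) = (1u64, 0u64);
    for &byte in data {
        a = (a + u64::from(byte)) % MOD;
        b = (b + a) % MOD;
    }
    ((b << 16) | a) as u32
}

/// Derive the checksum of `left ++ right` from the two checksums and the
/// length of the right-hand side, without reading any data bytes.
fn combine_adler32(left: u32, right: u32, right_len: usize) -> u32 {
    let (a1, b1) = (u64::from(left & 0xffff), u64::from(left >> 16));
    let (a2, b2) = (u64::from(right & 0xffff), u64::from(right >> 16));
    let len = (right_len as u64) % MOD;
    // The right-hand running sums started from 1 instead of `a1`, so the
    // missing `a1 - 1` is added back once per right-hand byte.
    let a = (a1 + a2 + MOD - 1) % MOD;
    let b = (b1 + b2 + len * ((a1 + MOD - 1) % MOD)) % MOD;
    ((b << 16) | a) as u32
}
```

Calling `combine_adler32(adler32(b"Wiki"), adler32(b"pedia"), 5)` gives the same value as `adler32(b"Wikipedia")`. Because the combined value never depends on the freshly copied buffer, a corrupted copy would still fail a later validation.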
Signed-off-by: Alessandro Passaro <alexpax@amazon.co.uk>
LGTM!
/// A `ChecksummedBlock` is a bytes buffer that carries its checksum.
/// The implementation guarantees that its integrity will be validated when data is accessed.
#[derive(Debug, Clone)]
pub struct ChecksummedBlock {
I was wondering if it might be nicer to have just one implementation of this stuff, and give `ChecksummedBytes` a `shrink_to_fit`-style method to get the guarantee you're looking for. But then I guess that makes `extend` et al more complicated because you have to handle all the different combinations to decide when you can skip validating the checksums, so probably not worth it?
After all, I think `shrink_to_fit` would be a better approach and can also be used to improve `extend`. I will close this PR and open a new one with that change.
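For readers following the thread, a rough sketch of what a `shrink_to_fit`-style method could look like. Nothing here comes from the follow-up PR: the `SlicedBytes` type, its fields, and the Adler-32 `checksum` helper are all hypothetical stand-ins, with `Vec<u8>` used in place of `Bytes` to keep the example dependency-free.

```rust
use std::ops::Range;

#[derive(Debug, PartialEq)]
struct IntegrityError;

/// Toy Adler-32 checksum standing in for the real checksum type.
fn checksum(data: &[u8]) -> u32 {
    let (mut a, mut b) = (1u64, 0u64);
    for &byte in data {
        a = (a + u64::from(byte)) % 65_521;
        b = (b + a) % 65_521;
    }
    ((b << 16) | a) as u32
}

/// Hypothetical stand-in for `ChecksummedBytes`: `checksum` covers the whole
/// `buffer`, while callers only see `buffer[range]`.
struct SlicedBytes {
    buffer: Vec<u8>,
    range: Range<usize>,
    checksum: u32,
}

impl SlicedBytes {
    /// Make the checksum cover exactly the exposed bytes.
    /// Assumes `range` lies within `buffer`; slicing panics otherwise.
    fn shrink_to_fit(&mut self) -> Result<(), IntegrityError> {
        if self.range == (0..self.buffer.len()) {
            return Ok(()); // Checksum already matches the exposed bytes.
        }
        // Compute the new checksum first, then validate the old one, so a
        // corruption that happens in between is still caught.
        let exposed = self.buffer[self.range.clone()].to_vec();
        let new_checksum = checksum(&exposed);
        if checksum(&self.buffer) != self.checksum {
            return Err(IntegrityError);
        }
        self.buffer = exposed;
        self.range = 0..self.buffer.len();
        self.checksum = new_checksum;
        Ok(())
    }
}
```

Once both sides have been shrunk this way, an `extend` can combine the two checksums directly, which is the simplification the comment above is pointing at.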
if self.is_empty() {
    *self = extend;
    return;
}
if extend.is_empty() {
    return;
}
For these cases you probably need to validate the checksum of the empty side (which will be trivial to compute because they're zero-length slices), because the length might have been corrupted.
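One way the suggestion could look, as a sketch only: the empty side is validated before either shortcut is taken, so a block whose length was corrupted to zero is rejected instead of silently dropped. The `Block` type, the `Result` return type, and the Adler-32 helpers are assumptions made for this example; the `extend` under review returns `()` and uses `Bytes`.

```rust
#[derive(Debug, PartialEq)]
struct IntegrityError;

const MOD: u64 = 65_521;

/// Toy Adler-32 checksum standing in for the real one.
fn adler32(data: &[u8]) -> u32 {
    let (mut a, mut b) = (1u64, 0u64);
    for &byte in data {
        a = (a + u64::from(byte)) % MOD;
        b = (b + a) % MOD;
    }
    ((b << 16) | a) as u32
}

/// Checksum of `left ++ right`, computed from the two checksums only.
fn combine(left: u32, right: u32, right_len: usize) -> u32 {
    let (a1, b1) = (u64::from(left & 0xffff), u64::from(left >> 16));
    let (a2, b2) = (u64::from(right & 0xffff), u64::from(right >> 16));
    let len = (right_len as u64) % MOD;
    let a = (a1 + a2 + MOD - 1) % MOD;
    let b = (b1 + b2 + len * ((a1 + MOD - 1) % MOD)) % MOD;
    ((b << 16) | a) as u32
}

/// Hypothetical stand-in for `ChecksummedBlock`.
#[derive(Debug, Clone, PartialEq)]
struct Block {
    bytes: Vec<u8>,
    checksum: u32,
}

impl Block {
    fn new(bytes: &[u8]) -> Self {
        Block { bytes: bytes.to_vec(), checksum: adler32(bytes) }
    }

    fn validate(&self) -> Result<(), IntegrityError> {
        if adler32(&self.bytes) == self.checksum { Ok(()) } else { Err(IntegrityError) }
    }

    fn extend(&mut self, other: Block) -> Result<(), IntegrityError> {
        if self.bytes.is_empty() {
            // Cheap: checksums a zero-length slice. Catches a corrupted length.
            self.validate()?;
            *self = other;
            return Ok(());
        }
        if other.bytes.is_empty() {
            other.validate()?;
            return Ok(());
        }
        // Both sides non-empty: combine the checksums without reading the data.
        self.checksum = combine(self.checksum, other.checksum, other.bytes.len());
        self.bytes.extend_from_slice(&other.bytes);
        Ok(())
    }
}
```

Returning a `Result` is only one option; the same check could equally feed whatever error path the real type uses.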
/// Validate data integrity in this `ChecksummedBlock`.
///
/// Return `IntegrityError` on data corruption.
pub fn validate(&self) -> Result<(), IntegrityError> {
Do we ever use this as public API? If not, might be better to make it private, since it kinda invites time-of-check/time-of-use problems.
I'd leave this public:
- it can be useful to fail fast
- it does not return the data, so at worst it could be redundant
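A small sketch of the distinction being discussed, under assumed names: `validate` only reports corruption, while a validated accessor (here a hypothetical `into_bytes`) checks and hands out the data in a single call, leaving no gap between check and use. The `Block` type and the Adler-32 `toy_checksum` are illustrative stand-ins, not the PR's code.

```rust
#[derive(Debug, PartialEq)]
struct IntegrityError;

/// Toy Adler-32 checksum standing in for the real one.
fn toy_checksum(data: &[u8]) -> u32 {
    let (mut a, mut b) = (1u64, 0u64);
    for &byte in data {
        a = (a + u64::from(byte)) % 65_521;
        b = (b + a) % 65_521;
    }
    ((b << 16) | a) as u32
}

/// Hypothetical stand-in for `ChecksummedBlock`.
struct Block {
    bytes: Vec<u8>,
    checksum: u32,
}

impl Block {
    /// Fail-fast check: reports corruption but hands out no data, so at
    /// worst a later access repeats the work.
    fn validate(&self) -> Result<(), IntegrityError> {
        if toy_checksum(&self.bytes) == self.checksum {
            Ok(())
        } else {
            Err(IntegrityError)
        }
    }

    /// Validated access: the bytes are only released after the check passes.
    fn into_bytes(self) -> Result<Vec<u8>, IntegrityError> {
        self.validate()?;
        Ok(self.bytes)
    }
}
```

As long as every data-returning method validates on its own, a public `validate` cannot be misused to skip a check; it can only be redundant.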
self.validate().expect("should be valid");
other.validate().expect("should be valid");

true
This should be unreachable? Here we know the bytes are equal but the checksums aren't, yet they both passed validation?
if self.bytes != other.bytes {
    return false;
}

if self.checksum == other.checksum {
    return true;
}

self.validate().expect("should be valid");
other.validate().expect("should be valid");
Doesn't really matter since it's just test code, but I think you want to do it this way to be correctly bracketed:
let result = self.bytes == other.bytes;
self.validate().expect("should be valid");
other.validate().expect("should be valid");
result
Description of change
Introduce a new `ChecksummedBlock` type which represents a bytes buffer and its matching checksum. It is a simpler version of `ChecksummedBytes` in that the checksum always matches the exposed bytes buffer, rather than potentially a larger containing buffer. The new type is better suited for use in the `DataCache` because it can be serialized and deserialized more efficiently while preserving its checksum.
This change also introduces a new `checksums` module, containing both `ChecksummedBytes` and `ChecksummedBlock`, in addition to other checksum functions and types.
Relevant issues: #255
Does this change impact existing behavior?
No changes.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and I agree to the terms of the Developer Certificate of Origin (DCO).