-
Notifications
You must be signed in to change notification settings - Fork 44
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce StorageCodec which helps with encoding/decoding of (codec, value) pairs #1389
Conversation
Some open questions:
|
I've pushed an update to this PR which reflects our offline discussion @AhmedSoliman. The Making the domain object aware of its storage type requires that we co-locate both. This is currently not the case for the pp storage objects where we have the |
5ea54a8
to
cd5229e
Compare
cd5229e
to
5416f15
Compare
382d9ea
to
a8a2b31
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Absolute legend! Thanks for bearing with me on this one.
// todo add proper format version | ||
let value = bincode::serde::encode_to_vec(value, bincode::config::standard()) | ||
.map_err(|err| WriteError::Codec(err.into()))?; | ||
let mut buf = BytesMut::default(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Potential future improvement would be to follow a design similar to storage-rocksdb's codec, if types can return an estimate of the buffer size needed for serialization, the buffer can be created with enough capacity to reduce the number of allocations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, indeed. This has become a bit harder now because before we already had a Protobuf
type which easily gave us this information. Right now we only create the protobuf types for the protobuf-encoded values when actually serializing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#[derive(Debug, Clone, serde::Serialize, serde::Deserialize)] | ||
pub struct Schema { | ||
pub version: Version, | ||
pub components: HashMap<String, ComponentSchemas>, | ||
// flexbuffers only supports string-keyed maps :-( --> so we store it as vector of kv pairs | ||
#[serde_as(as = "serde_with::Seq<(_, _)>")] | ||
pub deployments: HashMap<DeploymentId, DeploymentSchemas>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would happen if one forgot to add the serde_as here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would fail at runtime with a KeyMustBeString
error if the map contained an entry.
#[serde_as(as = "serde_with::Seq<(_, _)>")] | ||
pub subscriptions: HashMap<SubscriptionId, Subscription>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
potential future improvement would be to wrap serializable maps into a new type to DRY.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this makes sense. Good idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use crate::Schema; | ||
use restate_types::flexbuffers_storage_encode_decode; | ||
|
||
flexbuffers_storage_encode_decode!(Schema); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
turning this into a proc-macro (derive) would be a nice exercise :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed. I didn't want to press my luck with macros yesterday ;-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
crates/types/src/storage.rs
Outdated
/// To support codec evolution, this trait implementation needs to be able to decode values encoded | ||
/// with any previously used codec. | ||
pub trait StorageDecode { | ||
fn decode(buf: &[u8], kind: StorageCodecKind) -> Result<Self, StorageDecodeError> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Potential future improvement might to change the input to &mut T where T: Buf
to make it easier to decode from cursors (and to advance it for incremental decoding).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True. Let me check whether I can quickly change it as part of this PR.
Making the storage-api crate aware of the storage types will allow us to implement for each API type how it will be serialized. This increases coupling but simplifies things until we truely need to have multiple storage representations.
StorageSerde encodes how a given type is encoded for storing in storage.
We move the dedup types into the storage-api crate so that we can implement StorageSerde for them based on the protobuf types defined in this crate.
This commit removes the partial storage of the VirtualObjectStatus to simplify transition to the StorageSerde. Optimizations for what to not store can be done later.
We now require that stored values implement the StorageEncode trait to encode them together with their codec. When reading a value, the value needs to implement StorageDecode. Alternatively, one can also store raw bytes into RocksDB as it is needed for state values, for example.
Flexbuffers does not support non-string keyed maps. Therefore, we have to serialize maps that have a non-string key as a vector of key-value pairs.
a8a2b31
to
6e74ed0
Compare
Thanks for all your valuable feedback @AhmedSoliman 🙏 Merging this PR once GHA gives green light. |
6e74ed0
to
d32f3ef
Compare
Attempt to generalize the serialization of versioned values by introducing the
VersionedSerializer
trait. The versioned serializer is aware of the serialization format version and always writes it first (u16
) before writing the actual value. When deserializing, the serializer first reads the version and then gives it to the deserialize method for the actual value. Thereby, the deserialization step can be conditioned on the actual version. One thing that becomes apparent is that by decoupling the domain object from the storage object, one needs to provide at the storage layer boundary information about which serializer to use for a given domain object.Happy to hear what you think about this approach @AhmedSoliman.