-
-
Notifications
You must be signed in to change notification settings - Fork 774
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Deserialize into a pre-existing struct #855
Comments
The analogous approach would be something like: pub trait Deserialize<'de>: Sized {
fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
where D: Deserializer<'de>;
fn deserialize_from<D>(&mut self, deserializer: D) -> Result<(), D::Error>
where D: Deserializer<'de>
{
*self = Deserialize::deserialize(deserializer)?;
Ok(())
}
} |
@nox mentioned being interested in working on this. |
I might look into this to further optimize webrender deserialization. |
Note the semantics we'd like in webrender is that it blindly clobbers the source struct. On failure it's in a partially overwritten (but memory-safe) state. We want this because serde errors are hard aborts for us (all sources are trusted), and we want to minimize work. Not sure if that matches what y'all want. |
Yes, partially overwritten but memory-safe state is acceptable. The important thing is that this is about reusing allocations, not merging one deserialized struct into another. So if you have: struct S {
a: Option<T>,
b: Option<T>,
} Deserializing |
Blugh, this is getting to a bit of a clumsy mess. Just to check that I'm understanding this correctly: I basically need to duplicate the entire of the deserialization subsystem with _from variants? Like every single method on Visitor will need a _from variant? Or I'll need to make a FromVisitor trait...? |
@gankro I hope not... Here is a quick prototype. |
Status: this will be useless for webrender as-is because enums represent a fundamental barrier to this optimization. Currently pushing on rust-lang/rfcs#2195 to fix this. |
Is the fundamental barrier enums in general, or consecutive messages that contain different variants of the same enum field? If the common case is that consecutive messages contain the same variant, we can certainly optimize that. |
The item we're deserializing is an enum whose variant is ~random. In theory, yes, we could do something clever if we got the same variant multiple times in a row, but this basically never happens for us. We need a way to build an enum in-place regardless of its old value. |
Proof of concept here: rust-lang/rust#45688 |
Ok been a bit sick but I've had a chance to work through enough of this to have an idea of how it looks, and what I need to do for my use-case. So the trick you showed me makes the API work nice and clean, and I should have structs deriving nice and well with that. Enums are more complicated, and I'll probably need to fork serde_derive to provide a webrender-custom implementation to meet our needs, since you won't want my solution. So my enum of interest has two nice properties: it's Copy (no dtors), and it has a None-like (C-like?) variant. The fact that it's Copy isn't something I'm going to be able to rely on, though. There are some plans to break that. In addition, I'm not super happy with relying on Copyness since that does nothing for UB like So I plan to write a fn deserialize_enum_from(&mut self, &mut deserializer) -> Result<(), ()> {
drop_in_place(self); // clear out old stuff
self.tag = #nonelike_variant; // mark all contents as invalid, by setting enum to "None"
match read_tag()? {
A => {
// deserialize fields
self.A.field1.deserialize_from(deserializer)?
self.A.field2.deserialize_from(deserializer)?
// Now that we're done, we can actually say we're A
self.tag = A;
Ok(())
... The basic idea here is the one we use for leak-amplification in Vec::Drain: we mark the enum as empty before starting our work, and then only at the end do we tell it that it has contents, for minimal overhead. For a Copy enum this is perfect, but for one with destructors this can end up leaking stuff. We could set up code to keep track of where we are in the parse and "unwind" our work, but I expect this will pessimize codegen for something that, in our application, is a show-stopping programming error. Emitting this implementation will be conditional on detecting that the enum has a |
Very promising early results having only implemented structs, tuples, and primitives (which can be upstreamed to y'all): https://gist.github.com/Gankro/63d1028a0b69e418add58c23fc828243 Need to do some actual profiling though, the assembly is a bit hard to follow since different changes lead to loop funrolling. |
One annoying thing I'm seeing for a [f32; 16] is that even in my most optimized result we're still generating what appears to be:
rather than
It should be within the contract of deserialize_from to do emit the latter, since we don't guarantee partial deserializations to have any particular form. That said I'm not sure if I can actually coax the three abstractions of serde+bincode+Read to understand this transformation. |
Possibly fixing https://bugs.llvm.org/show_bug.cgi?id=34911 would be sufficient. |
Implemented in #1094. |
Example from @BurntSushi - when iterating over rows of a CSV file, we would like to be able to reuse the same five String buffers on every row.
The use case is similar to
Clone::clone_from
.The text was updated successfully, but these errors were encountered: