omitempty tags implemented. #164
Thanks for the PR. A couple of high-level comments
And a couple of nit-picks
Branch updated from afddfba to d92a832.
Thanks for the thoughtful comments, Philip.
Agreed. I thought it served as better documentation, but it's really an implementation detail, so I made it private.
The structmap() method in both encode.go and marshal.go is where things happen, specifically... Those calls happen downstream of the gStruct() method in both cases, which means the AsTuple/tuple() invocation gets priority, and no omitempty logic ever runs in tuple mode. https://github.com/glycerine/msgp/blob/tag_omitempty/gen/encode.go#L71 Conclusion: I added a note at the bottom of the README that tuple mode overrules the tag.
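A minimal sketch of why that ordering is harmless, using hypothetical `field`, `isZero`, and `encodeStruct` names (this is not msgp's generator code): tuple mode writes a positional array, so a field cannot be dropped without shifting every later field, while map mode is keyed by name and can safely skip zero values.

```go
package main

// field is a hypothetical stand-in for one struct field as a code
// generator sees it: a wire name, a value, and the omitempty flag.
type field struct {
	name      string
	value     any
	omitEmpty bool
}

// isZero is a simplified zero check covering only the kinds used here.
func isZero(v any) bool {
	switch x := v.(type) {
	case nil:
		return true
	case int:
		return x == 0
	case string:
		return x == ""
	}
	return false
}

// encodeStruct checks tuple mode first, so the omitempty logic, which
// lives only on the map path, never runs for tuples.
func encodeStruct(fields []field, asTuple bool) any {
	if asTuple {
		// Positional array: the decoder identifies fields by index,
		// so every field must be written, zero or not.
		arr := make([]any, 0, len(fields))
		for _, f := range fields {
			arr = append(arr, f.value)
		}
		return arr
	}
	// Keyed map: a zero-valued omitempty field can simply be skipped.
	m := make(map[string]any, len(fields))
	for _, f := range fields {
		if f.omitEmpty && isZero(f.value) {
			continue
		}
		m[f.name] = f.value
	}
	return m
}
```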
LOL at "unhealthy amounts of C" :). I find this makes my code more fragile, but it's your baby. I changed %v to %s and %d and pushed an update.
To check this, I added a couple of nested structs to _generated/def.go, and I only see recursive calls. Example from the resulting _generated/generated.go:
is generated from this schema:
Nonetheless, as a defensive measure (say, in case things change in the future), I added a serial-number suffix to the empty variable. It's in the latest push.
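The serial-number suffix can be sketched as a tiny name generator. The `identGen` type and its `next` method are hypothetical, not taken from the PR; the point is only that each request returns a fresh name, so two inlined blocks can never declare the same variable even if they end up in the same scope.

```go
package main

import "fmt"

// identGen hands out variable names with a monotonically increasing
// serial suffix, so nested, inlined code never reuses a name.
// Hypothetical helper for illustration only.
type identGen struct {
	n int
}

// next returns prefix followed by the next serial number.
func (g *identGen) next(prefix string) string {
	g.n++
	return fmt.Sprintf("%s%d", prefix, g.n)
}
```

An outer struct's inlined decode block might get `empty1` and a nested one `empty2`, so neither can shadow or collide with the other.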
Branch updated from e45a84e to fc5241d.
This code would make reusing structs very risky. At the moment, decoding a struct always sets all members. With this patch, members may keep their previous values, since they might not be present in the encoded struct. Since msgp is very useful in high-performance environments, I expect many people use msgp in combination with … See also: pquerna/ffjson#195
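To make the hazard concrete, here is a minimal sketch assuming a hypothetical `Msg` type and a `decodeMerge` function that models the wire message as a Go map. Like a decoder that only assigns fields present in the input, it leaves absent fields untouched, so a reused struct keeps a stale value.

```go
package main

// Msg is a hypothetical message type; imagine Note tagged omitempty.
type Msg struct {
	ID   int
	Note string
}

// decodeMerge assigns only the fields present on the wire (modeled as a
// map) and leaves every other field exactly as it was.
func decodeMerge(dst *Msg, wire map[string]any) {
	if v, ok := wire["ID"]; ok {
		dst.ID = v.(int)
	}
	if v, ok := wire["Note"]; ok {
		dst.Note = v.(string)
	}
}
```

If the second message on the wire had an empty `Note` (and so omitted it), decoding it into the same `Msg` updates `ID` but leaves `Note` holding the value from the first message.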
Thanks for your thoughtful comment, @erikdubbelboer. If one was decoding into a bigger struct (one that had more members than were on the wire), one always had this concern: those extra fields were not altered. Hence one was never able to blindly "reuse" a struct without zeroing it first if one was updating one's data structures, since the wire data received could be from an earlier generation that lacked some fields. The best practice is simple and unchanged, and it doesn't diminish the value of the omitempty tag as a performance optimization: simply zero out any target struct before decoding into it. Easy peasy. The Go compiler optimizes zeroing, even for slices (e.g. golang/go#5373), so zeroing the entire struct in one go (pun intended) will typically be much faster than individually zeroing some fields. Conclusion: to address your concern, @erikdubbelboer, I added code to do this zeroing by default, before unmarshaling or decoding into the target struct. Measured with BenchmarkFastDecode, this made less than a 1% difference (122 ns/op versus 121 ns/op). Nonetheless, there is also an escape hatch if you know what you're doing (for example, you were already zeroing your structs when reusing them, or you know they are empty because you just created them): the -skip-decode-struct-zeroing flag to msgp turns off the automatic zeroing in the generated code. So now the default is safer all around.
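A sketch of the zero-by-default behavior described above, not the generated code itself. The `decode` function and its `skipZeroing` parameter are hypothetical; the parameter only stands in for what the -skip-decode-struct-zeroing flag would turn off.

```go
package main

// Msg is a hypothetical message type with an optional Note field.
type Msg struct {
	ID   int
	Note string
}

// decode zeroes the whole target in a single assignment, then fills in
// whatever is present on the wire (modeled as a map). skipZeroing is a
// hypothetical stand-in for the generator's opt-out flag.
func decode(dst *Msg, wire map[string]any, skipZeroing bool) {
	if !skipZeroing {
		*dst = Msg{} // one assignment clears every field
	}
	if v, ok := wire["ID"]; ok {
		dst.ID = v.(int)
	}
	if v, ok := wire["Note"]; ok {
		dst.Note = v.(string)
	}
}
```

Note that `*dst = Msg{}` also sets any pointer, slice, or map fields to nil, which is exactly the reuse cost raised in the next comment.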
@glycerine, do I understand correctly from your implementation that it only zeroes the top-level struct? If so, child structs reached through pointers wouldn't be reused, as the pointers would be set to nil. The ffjson pull request I linked actually keeps track of which members have been set and only zeroes members not present in the input.
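The "track what was set, zero the rest" idea can be sketched with a per-decode bitmask. The types, bit constants, and `decodeTracked` function are hypothetical, and the wire is modeled as a map with `Child` reduced to just its name; this is not ffjson's or msgp's code.

```go
package main

// Child is reached through a pointer so its allocation can be reused.
type Child struct {
	Name string
}

type Parent struct {
	ID    int
	Note  string
	Child *Child
}

// One bit per Parent field in the "seen" mask (hypothetical scheme).
const (
	seenID uint8 = 1 << iota
	seenNote
	seenChild
)

// decodeTracked records which fields arrived, merges them in place, and
// afterwards zeroes only the fields that were missing from the input.
func decodeTracked(dst *Parent, wire map[string]any) {
	var seen uint8
	if v, ok := wire["ID"]; ok {
		dst.ID = v.(int)
		seen |= seenID
	}
	if v, ok := wire["Note"]; ok {
		dst.Note = v.(string)
		seen |= seenNote
	}
	if v, ok := wire["Child"]; ok {
		if dst.Child == nil {
			dst.Child = new(Child) // allocate only when there is nothing to reuse
		}
		dst.Child.Name = v.(string)
		seen |= seenChild
	}
	// Zero only what the input did not mention.
	if seen&seenID == 0 {
		dst.ID = 0
	}
	if seen&seenNote == 0 {
		dst.Note = ""
	}
	if seen&seenChild == 0 {
		dst.Child = nil
	}
}
```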
Branch updated from d8d0f82 to ab01ea4.
Hi @erikdubbelboer, a fully recursive version is difficult, implementation-wise. The Insane struct in particular presented difficulties when I tried to go beyond top-level zeroing. The way the decode implementation inlines sub-elements makes things tricky. See here. I was assuming that if you are really tuning that much for performance, you would have already embedded your nested struct directly rather than using a pointer. Right? :) Nonetheless, I can see the convenience of saving and reusing sub-structs pointed to by the top-level struct. Preserving top-level pointers is not difficult, so I implemented it; the latest push now does this. But anything beyond the first level... ugh. The inlining makes it a mess. I invite you to try the implementation yourself; perhaps you'll find a way where I found the going difficult. There's something worthwhile in Rob Pike's love of implementation simplicity. I was also assuming at first that there wasn't so much inlining (the encode side is simpler), hence top-level clearing with preservation of pointers should suffice. I wonder if there is a way to experiment with less inlining on the decode side. @philhofer, is there a way to disable decode inlining? Was it a source of a large performance gain? Remember that we always have the escape hatch: if users want finer control, they can use the -skip-decode-struct-zeroing flag and prepare their struct tree exactly as they like before calling DecodeMsg/UnmarshalMsg.
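One plausible reading of "preserve top-level pointers", sketched with hypothetical types and a hypothetical `resetKeepingPointers` function: save the pointer, zero the parent, zero the child's contents in place, and put the pointer back. Whether the PR also zeroed the pointee is not stated; this sketch does, and like the PR it only goes one level deep.

```go
package main

type Child struct {
	Name  string
	Count int
}

type Parent struct {
	ID    int
	Child *Child
}

// resetKeepingPointers zeroes a Parent for reuse but keeps the existing
// Child allocation, clearing its contents instead of dropping it.
// Deeper levels (pointers inside Child) are not handled.
func resetKeepingPointers(p *Parent) {
	child := p.Child
	*p = Parent{}
	if child != nil {
		*child = Child{}
		p.Child = child
	}
}
```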
If I recall, the anonymous structs
I read Philip's comments in #154:
So it looks like in-place merge, even of slices and maps, has been a long-standing feature aimed at reducing allocation. Hence zeroing structs prior to decoding sounds like a big break with traditional expectations, and perhaps not what we want.
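The in-place merge of slices and maps can be illustrated with two small helpers. `resizeSlice` and `resetMap` are hypothetical names for a sketch in the spirit of that behavior, not msgp's generated code: a slice keeps its backing array when capacity allows, and a map is emptied rather than replaced.

```go
package main

// resizeSlice reuses the existing backing array when it is large
// enough and allocates only when n exceeds the current capacity.
func resizeSlice(s []int, n int) []int {
	if cap(s) >= n {
		return s[:n]
	}
	return make([]int, n)
}

// resetMap empties an existing map for reuse instead of allocating a
// new one; only a nil map triggers an allocation.
func resetMap(m map[string]int, sizeHint int) map[string]int {
	if m == nil {
		return make(map[string]int, sizeHint)
	}
	for k := range m {
		delete(m, k) // deleting during range is allowed in Go
	}
	return m
}
```

Zeroing the parent struct first would set both fields to nil, which defeats this reuse; that is the break with expectations described above.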
Right; zeroing structs before decoding would be a (breaking) behavioral change. It's entirely possible that there are users out there who depend upon being able to decode into a struct full of non-zero "default" values. Plus, as you say, zeroing (but not nil-ing out) sub-structures requires a whole lot more complexity in the code generator.
Yep. So I went back to the drawing board. I speculated that pseudocode like
would work. I tried this. It works for 95% of the cases. Unfortunately, anonymous types like TestType.Obj gum up the works:
e.g. here: these anonymous types fly under the radar, so there's no way to define a zeroMissingFields method for them...
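Why anonymous types are a problem can be shown in plain Go: a method needs a named receiver type, so an anonymous struct field such as `Obj` below cannot carry a `zeroMissingFields` method, and a generator would have to emit its zeroing inline. Only the method name comes from the discussion; the types, the bitmask parameter, and `zeroHolderMissing` are hypothetical.

```go
package main

// Point is a named type, so a method can be attached to it.
type Point struct {
	X, Y int
}

// zeroMissingFields zeroes fields whose bit is clear in seen
// (bit 0 = X, bit 1 = Y). The bitmask signature is an assumption.
func (p *Point) zeroMissingFields(seen uint8) {
	if seen&1 == 0 {
		p.X = 0
	}
	if seen&2 == 0 {
		p.Y = 0
	}
}

// Holder mixes a named-type field with an anonymous struct field,
// loosely in the style of TestType.Obj.
type Holder struct {
	Pos Point
	Obj struct {
		A string
		B []byte
	}
}

// zeroHolderMissing delegates for Pos, but Go cannot declare a method on
// Obj's anonymous type, so its zeroing must be written out inline.
func zeroHolderMissing(h *Holder, posSeen, objSeen uint8) {
	h.Pos.zeroMissingFields(posSeen)
	if objSeen&1 == 0 {
		h.Obj.A = ""
	}
	if objSeen&2 == 0 {
		h.Obj.B = nil
	}
}
```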
Branch updated from ab01ea4 to daf4da5.
Okay! I got something working. It keeps the existing stance towards zero allocation, and simply treats omitted empty fields (and only those) as if they were nil on the wire. As of tonight it is implemented for DecodeMsg only; I have yet to port it to UnmarshalMsg. Let me know if you like the approach before I do that work. It adds a mode to the reader in which it only returns nil. The approach is implemented in the latest push: https://github.com/glycerine/msgp/blob/tag_omitempty/gen/decode.go#L103
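A toy model of the reader's nil mode, under stated assumptions: the `reader` below holds pre-parsed values instead of msgpack bytes, and every name in it is hypothetical. The idea shown is that a missing field is handled by running that field's ordinary decode path with the reader returning nil, and each decode step treats nil as "reset in place", so nothing is allocated.

```go
package main

// reader is a toy stand-in for a msgpack reader. In nilMode every read
// reports nil without consuming input.
type reader struct {
	vals    []any // pre-parsed values standing in for wire bytes
	pos     int
	nilMode bool
}

// next returns the next wire value, or nil while in nil mode.
func (r *reader) next() any {
	if r.nilMode {
		return nil
	}
	v := r.vals[r.pos]
	r.pos++
	return v
}

type Rec struct {
	Name string
	Tags []string
}

// decodeName treats nil as "set to the empty string".
func decodeName(r *reader, dst *string) {
	if v := r.next(); v != nil {
		*dst = v.(string)
		return
	}
	*dst = ""
}

// decodeTags treats nil as "empty, but keep the backing array", and
// otherwise reuses capacity when it is large enough.
func decodeTags(r *reader, dst *[]string) {
	v := r.next()
	if v == nil {
		*dst = (*dst)[:0]
		return
	}
	src := v.([]string)
	if cap(*dst) >= len(src) {
		*dst = (*dst)[:len(src)]
	} else {
		*dst = make([]string, len(src))
	}
	copy(*dst, src)
}

// decodeMissingTags runs the normal Tags decode path in nil mode, so a
// field absent from the wire is reset without allocating.
func decodeMissingTags(r *reader, dst *Rec) {
	r.nilMode = true
	decodeTags(r, &dst.Tags)
	r.nilMode = false
}
```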
Branch updated from daf4da5 to 2ee78ce.
Both DecodeMsg and UnmarshalMsg are now implemented with the latest approach.
I wouldn't say it's ready to merge yet; it still needs tests and a performance comparison. Also, I wasn't sure how to handle decoding a missing field into an Extension.
Branch updated from 0017e71 to 7e780bd.
With the latest update, both DecodeMsg and UnmarshalMsg work and are tested. This implements the fully recursive, no-allocation merge, and so preserves backwards compatibility.
Struct fields can be tagged with `msg:",omitempty"`. Such fields will not be serialized if they contain their zero values. For structs with many optional fields, the space and time savings can be substantial. From a zero-alloc standpoint, UnmarshalMsg and DecodeMsg continue, as before, to reuse existing fields in an object rather than allocating new ones. So, if you decode into the same object repeatedly, things like slices, maps, and fields that point to other structures won't be re-allocated. Instead, slices and maps will be resized appropriately. In other words, mutable fields are simply mutated in place. Fixes tinylib#103
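What the tag means on the encode side can be sketched with reflection. This is only an illustration of the semantics: msgp generates code ahead of time rather than using reflection, and `Profile` and `toWireMap` are hypothetical.

```go
package main

import (
	"reflect"
	"strings"
)

// Profile uses msgp-style `msg` struct tags.
type Profile struct {
	Name  string   `msg:"name"`
	Email string   `msg:"email,omitempty"`
	Tags  []string `msg:"tags,omitempty"`
}

// toWireMap builds the key/value pairs that would be written, skipping
// omitempty fields that hold their zero value. Note that
// reflect.Value.IsZero treats an empty but non-nil slice as non-zero;
// a real implementation might check len() == 0 instead.
func toWireMap(v any) map[string]any {
	rv := reflect.ValueOf(v)
	rt := rv.Type()
	out := make(map[string]any, rt.NumField())
	for i := 0; i < rt.NumField(); i++ {
		name, opts, _ := strings.Cut(rt.Field(i).Tag.Get("msg"), ",")
		if name == "" {
			name = rt.Field(i).Name // fall back to the Go field name
		}
		fv := rv.Field(i)
		if opts == "omitempty" && fv.IsZero() {
			continue
		}
		out[name] = fv.Interface()
	}
	return out
}
```

For a `Profile` with only `Name` set, just one key is written, which is where the space savings for sparsely populated structs come from.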
Branch updated from 654b3a2 to 68f025a.
This is now tested and working great. I note … For my aims, I need the omitempty feature to support deprecated fields in my zebrapack experiment (prefixing messages with a schema, like gobs, but all in msgpack; https://github.com/glycerine/zebra), so I think I'll just maintain a friendly fork of msgp for that purpose, unless I hear from you, @philhofer, that you'd really like to have it in mainline msgp.
Closing this out for the reasons in my previous message. I've advanced this line of work further than the above PR, so it should no longer be considered the state of the art. Interested parties can diff against my fork https://github.com/glycerine/msgp.
FYI, my friendly fork https://github.com/glycerine/zebrapack (described at https://github.com/glycerine/zebra) now has full support for omitempty, as well as schemas, capnproto-style field numbering, and flatbuffers-style deprecation. There are some nice benchmarks available showing that with a schema (int keys instead of strings in the maps) one can go 20-30% faster. It is completely safe to use, as per @erikdubbelboer's suggestion: if structs are reused, their fields are zeroed upon encountering a missing field on the wire.
A rough draft of the omitempty feature. Not ready to merge, but posting it since I'd appreciate feedback on a couple of things specifically:
commit log:
struct fields can be tagged with
msg:",omitempty"
Such fields will not be serialized
if they contain their zero values.
There is no cost to this feature if
omitempty tags are not used. And for structs with
many optional fields, the space and
time savings can be substantial.
Fixes #103