omitempty tags implemented. #164

glycerine · 2016-10-16T05:55:34Z

A rough draft of the omitempty feature. Not ready to merge, but posting it since I'd appreciate feedback on a couple of things specifically:

where/how to add appropriate tests and benchmarks
if the Msgsize() method should also be using FieldsNotEmpty()... is their any disadvantage to over-estimating the size required?
anything else that would be helpful polish

commit log:

struct fields can be tagged with
msgp:",omitempty"

Such fields will not be serialized
if they contain their zero values.

There is no cost to this feature if
omitempty tags are not used. And for structs with
many optional fields, the space and
time savings can be substantial.

Fixes #103

This change is

philhofer · 2016-10-16T06:42:37Z

Thanks for the PR.

A couple high-level comments

I'm not sure there's any value in exporting FieldsNotEmpty; I think it would suffice as a lower-cased helper method (without the exported interface as well). This will also buy you some flexibility if you decide to change the implementation of this feature.
You should make sure this interacts reasonably with the //msgp: directives. Specifically, I'm not sure it makes sense for a struct to be a tuple if it has nullable fields. (Or maybe it does...? But in either case, we should decide on a behavior and document it.)

And a couple nit-picks

I prefer %d or %s formatting strings to %v when applicable. (Partly because static analyzers can detect when you've used a totally insane formatting directive unintentionally, and partly because I have written an unhealthy amount of C.)
I'd randomly generate the name of the array that holds the "not empty" bools. A huge portion of all the bugs ever filed against this project have been caused by unintentional collisions within a scope. (I think right now you'd get an issue if one struct with ",omitempty" fields was inlined into another, for example.)

glycerine · 2016-10-16T08:07:27Z

Thanks for the thoughtful comments, Philip.

I'm not sure there's any value in exporting FieldsNotEmpty...

agreed. I thought it was better documentation, but it's an implementation detail really. I changed to private.

You should make sure this interacts reasonably with the //msgp: directives. Specifically, I'm not sure it makes sense for a struct to be a tuple if it has nullable fields. (Or maybe it does...? But in either case, we should decide on a behavior and document it.)

The structmap() method in both encode.go and marshal.go is where things happen, specifically...
https://github.com/glycerine/msgp/blob/tag_omitempty/gen/encode.go#L106
https://github.com/glycerine/msgp/blob/tag_omitempty/gen/marshal.go#L101

and those calls happen downstream of the gStruct() method in both cases, which means that the AsTuple/tuple() invocation gets priority, and there will never be any omitempty logic run when using tuple mode.

https://github.com/glycerine/msgp/blob/tag_omitempty/gen/encode.go#L71
https://github.com/glycerine/msgp/blob/tag_omitempty/gen/marshal.go#L77

conclusion: I added a note at the bottom of the README that Tuple overrules the tag.

I prefer %d or %s formatting strings to %v when applicable. (Partly because static analyzers can detect when you've used a totally insane formatting directive unintentionally, and partly because I have written an unhealthy amount of C.)

LOL for unhealthy amounts of C : ). I find this makes my code more fragile, but it's your baby. I changed %v to %s and %d and pushed an update.

I'd randomly generate the name of the array that holds the "not empty" bools. A huge portion of all the bugs ever filed against this project have been caused by unintentional collisions within a scope. (I think right now you'd get an issue if one struct with ",omitempty" fields was inlined into another, for example.)

omitempty never applies to structs recursively; this simplifies things and means that each struct controls its own level of omitempty optimization. Mostly I avoided it because it seemed overcomplicated and perhaps expensive to walk the member struct.

To check this I added a couple of nested structs to _generated/def.go, and I only see recursive calls... example from the resulting _generated/generated.go:

// EncodeMsg implements msgp.Encodable
func (z *OmitSimple) EncodeMsg(en *msgp.Writer) (err error) {

    // honor the omitempty tags
    var empty [2]bool
    fieldsInUse := z.fieldsNotEmpty(empty[:])

    // map header
    err = en.WriteMapHeader(fieldsInUse)
    if err != nil {
        return err
    }

    // write "CountDrocula"
    err = en.Append(0xac, 0x43, 0x6f, 0x75, 0x6e, 0x74, 0x44, 0x72, 0x6f, 0x63, 0x75, 0x6c, 0x61)
    if err != nil {
        return err
    }
    err = en.WriteInt(z.CountDrocula)
    if err != nil {
        return
    }
    if !empty[1] {
        // write "Inside1"
        err = en.Append(0xa7, 0x49, 0x6e, 0x73, 0x69, 0x64, 0x65, 0x31)
        if err != nil {
            return err
        }
        err = z.Inside1.EncodeMsg(en)
        if err != nil {
            return
        }
    }

    return
}

is generated from this schema:

type OmitEmptyInside1 struct {
    CountOfMonteCrisco int
    Name               string           `msg:"name,omitempty"`
    Inside2            OmitEmptyInside2 `msg:",omitempty"`
}

type OmitEmptyInside2 struct {
    Name string `msg:",omitempty"`
}

type OmitSimple struct {
    CountDrocula int
    Inside1      OmitEmptyInside1 `msg:",omitempty"`
}

Nonetheless, as a defensive measure -- say, in case things change in the future -- I added a serial number suffix to the empty variable. It's in the latest push.

erikdubbelboer · 2016-10-16T09:03:09Z

This code would make reusing structs very risky. At the moment decoding a struct will always set all members. With this patch it's possible members will keep their previous value as they might not be present in the encoded struct.

Since msgp is very useful for high performance environments I expect many people use msgp in combination with sync.Pool. Using omitempty in this case would result in invalid structs and would force people to choose between using sync.Pool or omitempty.

See also: pquerna/ffjson#195

glycerine · 2016-10-16T11:36:16Z

At the moment decoding a struct will always set all members.

Thanks for your thoughtful comment, @erikdubbelboer.

If one was decoding into a bigger struct--one that had more members than were on the wire, one always had this concern. Those extra fields were not altered.

Hence one was never able to blindly "reuse" a struct without zeroing it first--if you are updating your data structures. The wire data received could be from an earlier generation that lacked some fields.

The best practice is simple and unchanged; and doesn't alter the value of the omitempty tag for performance optimization. Simply zero out any target struct before decoding into it. Easy peasy.

The go compiler optimizes zeroing, even for slices (e.g. golang/go#5373), so zeroing the entire struct in one go (pun intended) will typically be much faster than individually zeroing some fields.

Conclusion: To address your concern, @erikdubbelboer, I added code to do this zeroing by default, before doing unmarshal or decode into the target struct. On measurement with BenchmarkFastDecode, this made less than 1% difference (122 nsec/op versus 121 nsec/op).

Nonetheless, there is also an out if you know what you're doing (for example, you were already zeroing your structs if you were re-using them; or you know that they are empty because you just made them fresh). The -skip-decode-struct-zeroing flag to msgp will turn off the automatic zeroing in the generated code.

So now the default is safer, all around.

erikdubbelboer · 2016-10-16T12:02:11Z

@glycerine do I understand correctly from your implementation that it only zeroes the top level struct? If so this would mean child structs with pointers to them wouldn't be reused as the pointers would be set to nil. The ffjson pull request I linked actually keeps track of which members have been set an only zeroes members not present in the input.

glycerine · 2016-10-16T19:20:56Z

track which members have been set and only zeroes members not present in the input

Hi @erikdubbelboer,

Fully recursive, this is difficult, implementation wise.

The Insane struct in particular presented difficulties when I tried not doing only top-level zeroing. The way the decode implementation inlines sub-elements makes things tricky. See here
https://github.com/glycerine/msgp/blob/tag_omitempty/_generated/def.go#L176
and the _generated/generated.go code derived from it.

I was assuming that if you are really tuning that much for performance, then you would have already embedded your nested struct directly, rather than using a pointer. Right? : )

Nonetheless, for convenience, I can see the utility of saving and reusing sub-structs pointed to by the top level struct. It's not difficult to preserve top level pointers (so I implemented it; the latest push now does this). But anything after the first level... Ugh. The inlining makes this... a mess. I invite you to try the implementation yourself. Perhaps you'll find a way where I found it difficult going. There's something to Rob Pike's love of implementation simplicity that is worthwhile.

Actually I was also assuming at first that there wasn't so much inlining (the encode side is simpler). Hence top level clearing with preserving of pointers should suffice. I wonder if there is a way to experiment with less inlining on the decode side... @philhofer is there a way to disable decode inlining?/was this a source of huge performance gain?

Remember we always the have the safety hatch: If the user wants finer control, they can always use the -skip-decode-struct-zeroing flag and prep their struct tree exactly as they like, before calling DecodeMsg/UnmarshalMsg.

glycerine · 2016-10-16T19:49:24Z

If I recall, the anonymous structs
https://github.com/glycerine/msgp/blob/tag_omitempty/_generated/def.go#L43
also gave difficulty.

glycerine · 2016-10-16T20:22:29Z

I read Philip's comments here #154 :

The only special feature of UnmarshalMsg and DecodeMsg (from a zero-alloc standpoint) is that they will use pre-existing fields in an object rather than allocating new ones. So, if you decode into the same object repeatedly, things like slices and maps won't be re-allocated on each decode; instead, they will be re-sized appropriately. In other words, mutable fields are simply mutated in-place.

So it looks like in-place merge, even of slices and maps, has been a long standing feature, to aim for less allocation.

Hence having the zero-ing of structs prior to decode sounds like a big break with traditional expectations, and perhaps not what we want.

philhofer · 2016-10-16T21:55:09Z

Right; zero-ing structs before decoding would be a (breaking) behavioral change. It's entirely possible that there are users out there who depend upon being able to decode into a struct full of non-zero 'default' values.

Plus, as you say, the zeroing (but not nulling-out) sub-structures requires a whole bunch more complexity in the code generator.

glycerine · 2016-10-16T23:53:39Z

Yep.

So I went back to the drawing board.

I speculate that pseudo code like

func (z *Mystruct) DecodeMsg() {
    var found [11]bool
    for i, field in range Mystruct {
         if field is present in msgpack bytes {
              found[i] = true
              ...deserialize into field i...
         }
   }
   z.zeroMissingFields(found[:]) // set only the missing fields to zero types
}

would work. I tried this. It works for 95% of the cases. Unfortunately, the anonymous types like TestType.Obj gum up the works:

type TestType struct {
    F   *float64          `msg:"float"`
    Els map[string]string `msg:"elements"`
    Obj struct {          // test anonymous struct                                  
        ValueA string `msg:"value_a"`
        ValueB []byte `msg:"value_b"`
    } `msg:"object"`
...
}

e.g. here:
https://github.com/glycerine/msgp/blob/tag_omitempty/_generated/def.go#L43

these anonymous types fly under the radar, so there's no way to define a zeroMissingFields method for them...

glycerine · 2016-10-17T07:24:09Z

Okay! I got something working. It keeps the existing stance towards zero allocation, and just treats empty fields (and only empty fields) as if they were nil on the wire. It is implemented for DecodeMsg only as of tonight; I still have yet to do the port to Unmarshal. Let me know if you like the approach before I do that work. It adds a mode to the reader where it only returns nil.

Here's the summary of the approach, which is implemented in the latest push https://github.com/glycerine/msgp/blob/tag_omitempty/gen/decode.go#L103:

/* func (d *decodeGen) structAsMap(s *Struct):

 // missing (empty) field handling logic, in almost-not-pseudo code:
 //
 // The secret to nil handling is to keep the logic
 // the same whether the field is missing or nil on
 // the wire. To do so we use the Reader.PushAlwaysNil()
 // method to tell the Reader to pretend to supply
 // only nils until further notice (which comes
 // from dc.PopAlwaysNil() emptying the LIFO).

    var fieldOrder = []string{"Name", "BirthDay"}
    const maxFields = 2

    var needNull uint32 = maxFields
    var nextNull int32 = -1

    var found [maxFields]bool
    var field []byte
    _ = field
    var sz uint32
    var curField string

    sz, err = dc.ReadMapHeader()
    if err != nil {
        return
    }
    needNull -= sz // compute number of nil fields we'll need to set.

done_with_struct:
    for sz > 0 || needNull > 0 {
        if sz > 0 {
            sz--
            field, err = dc.ReadMapKeyPtr()
            if err != nil {
                return
            }
            curField = msgp.UnsafeString(field)
        } else {
            //missing field handling
            if nextNull < 0 {
                dc.PushAlwaysNil()
                nextNull = 0
            }
            for nextNull < maxFields && found[nextNull] {
                nextNull++
            }
            if nextNull == maxFields {
                // filled all the empty fields!
                break done_with_struct
            }
            needNull--
            curField = fieldOrder[nextNull]
        }

        switch curField {
        case "Name":
            found[0] = true
            z.Name, err = dc.ReadString()
            if err != nil {
                return
            }
        case "BirthDay":
            found[1] = true
            z.BirthDay, err = dc.ReadTime()
            if err != nil {
                return
            }

        } // end switch curField

    } // end for
    if nextNull != -1 {
        dc.PopAlwaysNil()
    }
*/

glycerine · 2016-10-20T09:54:47Z

both DecodeMsg and UnmarshalMsg are now implemented with the latest approach.

glycerine · 2016-10-20T11:14:40Z

I wouldn't say it's ready to merge yet; still needs tests and performance comparison.

Also I wasn't sure how to handle decoding of a missing field into an Extension.

glycerine · 2016-10-22T10:02:47Z

With the latest update, both DecodeMsg and UnmarshalMsg work and are tested.

This implements the fully recursive, no-allocation merge, and so preserves backwards compatibility.

struct fields can be tagged with `msg:",omitempty"` Such fields will not be serialized if they contain their zero values. For structs with many optional fields, the space and time savings can be substantial. From a zero-alloc standpoint, UnmarshalMsg and DecodeMsg continue, as before, to re-use existing fields in an object rather than allocating new ones. So, if you decode into the same object repeatedly, things like slices maps, and fields that point to other structures won't be re-allocated. Instead, maps and fields will be re-sized appropriately. In other words, mutable fields are simply mutated in-place. Fixes tinylib#103

glycerine · 2016-10-23T18:03:16Z

This is now tested and working great.

I note
a) this is pretty big change;
b) I haven't gotten any feedback on it recently; and
c) I can understand if you don't want to merge it into mainline msgp. For those that don't need the feature, it just adds complexity to the read-paths.

For my aims, I need the omitempty feature to support my zebrapack experiment (prefixing messages with a schema, like gobs, but all in msgpack) to support deprecated fields, https://github.com/glycerine/zebra, so I think I'll just maintain a friendly fork of msgp for that purpose; unless I hear from you, @philhofer, that you'd really like to have it in mainline msgp.

glycerine · 2016-10-24T00:08:18Z

Closing this out for reasons in previous message. I've advance this line further than the above PR, so it should no longer be considered state of the art. Interested parties can diff against my fork https://github.com/glycerine/msgp.

glycerine · 2016-12-26T06:02:46Z

fyi my friendly fork https://github.com/glycerine/zebrapack (described https://github.com/glycerine/zebra ) now has full support for omitempty, as well as using schema, capnproto style field numbering, and flatbuffers style deprecation. There are some nice benchmarks available showing that with schema (int keys instead of strings in the maps) one can go 20-30% faster.

It is completely safe to use, as per @erikdubbelboer's suggestion: if structs are re-used their fields are zeroed upon encountering a missing field on the wire.

glycerine mentioned this pull request Oct 16, 2016

support the omitempty field tag convention #103

Closed

glycerine force-pushed the tag_omitempty branch 2 times, most recently from afddfba to d92a832 Compare October 16, 2016 07:54

glycerine force-pushed the tag_omitempty branch 3 times, most recently from e45a84e to fc5241d Compare October 16, 2016 08:40

glycerine force-pushed the tag_omitempty branch from d8d0f82 to ab01ea4 Compare October 16, 2016 19:10

glycerine force-pushed the tag_omitempty branch from ab01ea4 to daf4da5 Compare October 17, 2016 07:17

glycerine force-pushed the tag_omitempty branch from daf4da5 to 2ee78ce Compare October 20, 2016 09:52

glycerine force-pushed the tag_omitempty branch 2 times, most recently from 0017e71 to 7e780bd Compare October 22, 2016 09:59

glycerine force-pushed the tag_omitempty branch from 654b3a2 to 68f025a Compare October 23, 2016 17:56

vendor github.com/shurcooL/go-goon

16df849

glycerine closed this Oct 24, 2016

philhofer mentioned this pull request May 19, 2017

Remove blank arrays and false boolean variables from serialised output #177

Open

erikdubbelboer mentioned this pull request Oct 10, 2017

WIP: Add omitempty annotation support #206

Closed

bradleypeabody mentioned this pull request Oct 2, 2019

omitempty support #263

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

omitempty tags implemented. #164

omitempty tags implemented. #164

glycerine commented Oct 16, 2016 •

edited by philhofer

Loading

philhofer commented Oct 16, 2016

glycerine commented Oct 16, 2016

erikdubbelboer commented Oct 16, 2016 •

edited

Loading

glycerine commented Oct 16, 2016 •

edited

Loading

erikdubbelboer commented Oct 16, 2016

glycerine commented Oct 16, 2016

glycerine commented Oct 16, 2016

glycerine commented Oct 16, 2016

philhofer commented Oct 16, 2016 •

edited

Loading

glycerine commented Oct 16, 2016

glycerine commented Oct 17, 2016 •

edited

Loading

glycerine commented Oct 20, 2016

glycerine commented Oct 20, 2016

glycerine commented Oct 22, 2016 •

edited

Loading

glycerine commented Oct 23, 2016

glycerine commented Oct 24, 2016

glycerine commented Dec 26, 2016 •

edited

Loading

omitempty tags implemented. #164

omitempty tags implemented. #164

Conversation

glycerine commented Oct 16, 2016 • edited by philhofer Loading

philhofer commented Oct 16, 2016

glycerine commented Oct 16, 2016

erikdubbelboer commented Oct 16, 2016 • edited Loading

glycerine commented Oct 16, 2016 • edited Loading

erikdubbelboer commented Oct 16, 2016

glycerine commented Oct 16, 2016

glycerine commented Oct 16, 2016

glycerine commented Oct 16, 2016

philhofer commented Oct 16, 2016 • edited Loading

glycerine commented Oct 16, 2016

glycerine commented Oct 17, 2016 • edited Loading

glycerine commented Oct 20, 2016

glycerine commented Oct 20, 2016

glycerine commented Oct 22, 2016 • edited Loading

glycerine commented Oct 23, 2016

glycerine commented Oct 24, 2016

glycerine commented Dec 26, 2016 • edited Loading

glycerine commented Oct 16, 2016 •

edited by philhofer

Loading

erikdubbelboer commented Oct 16, 2016 •

edited

Loading

glycerine commented Oct 16, 2016 •

edited

Loading

philhofer commented Oct 16, 2016 •

edited

Loading

glycerine commented Oct 17, 2016 •

edited

Loading

glycerine commented Oct 22, 2016 •

edited

Loading

glycerine commented Dec 26, 2016 •

edited

Loading