Skip to content

Commit

Permalink
Fixed bug when parsing an unknown value in a proto2 enum extension. #…
Browse files Browse the repository at this point in the history
…fuzzing

Proto2 enum parsing is the only case where we have to look at the wire value (not merely the tag) to decide whether the field is known or unknown.  If the value is unknown, we need to put the value in the Unknown Fields, but for an extension we no longer have easy access to the message, because for extensions we replace the `msg` pointer with a pointer to the extension.  The bug occurred when we were treating the fake `upb_Message*` (which was actually a pointer to an extension) as a real `upb_Message*` that can have unknown fields.

This CL fixes the problem by preserving the true message pointer in `d->unknown_msg` when we are parsing an extension.

This also required fixing a bug in MiniTable building when fasttables are enabled.  We need to set the table_mask to `-1` to disable fasttable parsing, not `0`.

For unknown reasons, this CL appears to speed up parsing somewhat significantly.  Ideally we should be tracking parsing performance better over time, as it is possible this is merely regaining performance that was lost at a different time:

```
benchy --reference=srcfs third_party/upb/benchmarks:benchmark
 10 / 10 [=================================================================================================================] 100.00% 2m32s
(Generated by http://go/benchy. Settings: --runs 5 --reference "srcfs")

name                                           old cpu/op  new cpu/op  delta
BM_ArenaOneAlloc                                 23.9ns ± 6%  23.7ns ± 4%    ~     (p=0.180 n=53+51)
BM_ArenaInitialBlockOneAlloc                     7.62ns ± 4%  7.70ns ± 5%  +0.99%  (p=0.024 n=59+60)
BM_LoadAdsDescriptor_Upb<NoLayout>               6.60ms ±10%  6.57ms ± 8%    ~     (p=0.607 n=47+54)
BM_LoadAdsDescriptor_Upb<WithLayout>             6.92ms ± 5%  6.88ms ± 8%    ~     (p=0.257 n=54+54)
BM_LoadAdsDescriptor_Proto2<NoLayout>            14.2ms ± 8%  14.0ms ± 7%  -1.38%  (p=0.025 n=58+59)
BM_LoadAdsDescriptor_Proto2<WithLayout>          14.3ms ± 8%  14.2ms ± 8%  -1.16%  (p=0.031 n=58+57)
BM_Parse_Upb_FileDesc<UseArena, Copy>            15.9µs ± 4%  14.6µs ± 4%  -7.85%  (p=0.000 n=57+59)
BM_Parse_Upb_FileDesc<UseArena, Alias>           14.5µs ± 4%  13.3µs ± 5%  -8.50%  (p=0.000 n=57+60)
BM_Parse_Upb_FileDesc<InitBlock, Copy>           15.7µs ± 4%  14.4µs ± 5%  -7.99%  (p=0.000 n=59+60)
BM_Parse_Upb_FileDesc<InitBlock, Alias>          14.2µs ± 5%  13.0µs ± 4%  -8.56%  (p=0.000 n=57+58)
BM_Parse_Proto2<FileDesc, NoArena, Copy>         26.3µs ± 4%  26.2µs ± 4%    ~     (p=0.195 n=55+53)
BM_Parse_Proto2<FileDesc, UseArena, Copy>        13.3µs ± 5%  13.2µs ± 4%    ~     (p=0.085 n=59+59)
BM_Parse_Proto2<FileDesc, InitBlock, Copy>       12.9µs ± 4%  12.8µs ± 3%  -0.66%  (p=0.023 n=60+58)
BM_Parse_Proto2<FileDescSV, InitBlock, Alias>    10.9µs ± 6%  10.9µs ± 4%    ~     (p=0.063 n=59+58)
BM_SerializeDescriptor_Proto2                    7.57µs ± 6%  7.62µs ± 6%    ~     (p=0.147 n=57+58)
BM_SerializeDescriptor_Upb                       12.8µs ± 4%  12.8µs ± 4%    ~     (p=0.163 n=59+56)

name                                           old time/op             new time/op             delta
BM_ArenaOneAlloc                                 23.9ns ± 5%             23.7ns ± 4%    ~           (p=0.172 n=53+51)
BM_ArenaInitialBlockOneAlloc                     7.62ns ± 4%             7.70ns ± 5%  +1.02%        (p=0.017 n=59+60)
BM_LoadAdsDescriptor_Upb<NoLayout>               6.60ms ±10%             6.58ms ± 8%    ~           (p=0.727 n=47+55)
BM_LoadAdsDescriptor_Upb<WithLayout>             6.92ms ± 5%             6.88ms ± 8%    ~           (p=0.260 n=54+54)
BM_LoadAdsDescriptor_Proto2<NoLayout>            14.2ms ± 7%             14.0ms ± 7%  -1.40%        (p=0.019 n=58+59)
BM_LoadAdsDescriptor_Proto2<WithLayout>          14.3ms ± 8%             14.2ms ± 8%  -1.13%        (p=0.037 n=58+57)
BM_Parse_Upb_FileDesc<UseArena, Copy>            15.9µs ± 4%             14.6µs ± 3%  -7.88%        (p=0.000 n=57+59)
BM_Parse_Upb_FileDesc<UseArena, Alias>           14.5µs ± 4%             13.3µs ± 5%  -8.46%        (p=0.000 n=57+60)
BM_Parse_Upb_FileDesc<InitBlock, Copy>           15.7µs ± 4%             14.4µs ± 5%  -7.99%        (p=0.000 n=59+60)
BM_Parse_Upb_FileDesc<InitBlock, Alias>          14.2µs ± 5%             13.0µs ± 4%  -8.56%        (p=0.000 n=57+58)
BM_Parse_Proto2<FileDesc, NoArena, Copy>         26.3µs ± 4%             26.2µs ± 4%    ~           (p=0.224 n=55+53)
BM_Parse_Proto2<FileDesc, UseArena, Copy>        13.3µs ± 5%             13.2µs ± 4%    ~           (p=0.098 n=59+59)
BM_Parse_Proto2<FileDesc, InitBlock, Copy>       12.9µs ± 4%             12.8µs ± 3%  -0.68%        (p=0.015 n=60+58)
BM_Parse_Proto2<FileDescSV, InitBlock, Alias>    10.9µs ± 6%             10.9µs ± 4%    ~           (p=0.052 n=59+58)
BM_SerializeDescriptor_Proto2                    7.56µs ± 6%             7.62µs ± 6%    ~           (p=0.111 n=58+58)
BM_SerializeDescriptor_Upb                       12.8µs ± 4%             12.8µs ± 4%    ~           (p=0.241 n=56+56)

name                                           old allocs/op           new allocs/op           delta
BM_ArenaOneAlloc                                   1.00 ± 0%               1.00 ± 0%    ~     (all samples are equal)
BM_ArenaInitialBlockOneAlloc                       0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_LoadAdsDescriptor_Upb<NoLayout>                5.98k ± 0%              5.98k ± 0%    ~     (all samples are equal)
BM_LoadAdsDescriptor_Upb<WithLayout>              5.98k ± 0%              5.98k ± 0%    ~     (all samples are equal)
BM_LoadAdsDescriptor_Proto2<NoLayout>             80.9k ± 0%              80.9k ± 0%    ~     (all samples are equal)
BM_LoadAdsDescriptor_Proto2<WithLayout>           82.1k ± 0%              82.1k ± 0%    ~     (all samples are equal)
BM_Parse_Upb_FileDesc<UseArena, Copy>              7.00 ± 0%               7.00 ± 0%    ~     (all samples are equal)
BM_Parse_Upb_FileDesc<UseArena, Alias>             7.00 ± 0%               7.00 ± 0%    ~     (all samples are equal)
BM_Parse_Upb_FileDesc<InitBlock, Copy>             0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_Parse_Upb_FileDesc<InitBlock, Alias>            0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_Parse_Proto2<FileDesc, NoArena, Copy>            765 ± 0%                765 ± 0%    ~     (all samples are equal)
BM_Parse_Proto2<FileDesc, UseArena, Copy>          9.00 ± 0%               9.00 ± 0%    ~     (all samples are equal)
BM_Parse_Proto2<FileDesc, InitBlock, Copy>         0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_Parse_Proto2<FileDescSV, InitBlock, Alias>      0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_SerializeDescriptor_Proto2                      0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_SerializeDescriptor_Upb                         0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)

name                                           old peak-mem(Bytes)/op  new peak-mem(Bytes)/op  delta
BM_ArenaOneAlloc                                    344 ± 0%                344 ± 0%    ~     (all samples are equal)
BM_ArenaInitialBlockOneAlloc                       0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_LoadAdsDescriptor_Upb<NoLayout>                9.60M ± 0%              9.60M ± 0%    ~     (all samples are equal)
BM_LoadAdsDescriptor_Upb<WithLayout>              9.68M ± 0%              9.68M ± 0%    ~     (all samples are equal)
BM_LoadAdsDescriptor_Proto2<NoLayout>             6.41M ± 0%              6.41M ± 0%    ~     (all samples are equal)
BM_LoadAdsDescriptor_Proto2<WithLayout>           6.44M ± 0%              6.44M ± 0%    ~     (all samples are equal)
BM_Parse_Upb_FileDesc<UseArena, Copy>             36.5k ± 0%              36.5k ± 0%    ~     (all samples are equal)
BM_Parse_Upb_FileDesc<UseArena, Alias>            36.5k ± 0%              36.5k ± 0%    ~     (all samples are equal)
BM_Parse_Upb_FileDesc<InitBlock, Copy>             0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_Parse_Upb_FileDesc<InitBlock, Alias>            0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_Parse_Proto2<FileDesc, NoArena, Copy>          35.8k ± 0%              35.8k ± 0%    ~     (all samples are equal)
BM_Parse_Proto2<FileDesc, UseArena, Copy>         40.7k ± 0%              40.7k ± 0%    ~     (all samples are equal)
BM_Parse_Proto2<FileDesc, InitBlock, Copy>         0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_Parse_Proto2<FileDescSV, InitBlock, Alias>      0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_SerializeDescriptor_Proto2                      0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_SerializeDescriptor_Upb                         0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)

name                                           old speed               new speed               delta
BM_LoadAdsDescriptor_Upb<NoLayout>              113MB/s ± 9%            113MB/s ± 8%    ~           (p=0.712 n=47+55)
BM_LoadAdsDescriptor_Upb<WithLayout>            107MB/s ± 8%            108MB/s ± 8%    ~           (p=0.200 n=55+54)
BM_LoadAdsDescriptor_Proto2<NoLayout>          52.5MB/s ± 8%           53.3MB/s ± 7%  +1.51%        (p=0.018 n=59+59)
BM_LoadAdsDescriptor_Proto2<WithLayout>        51.9MB/s ± 7%           52.4MB/s ± 8%  +1.01%        (p=0.050 n=58+58)
BM_Parse_Upb_FileDesc<UseArena, Copy>           473MB/s ± 4%            514MB/s ± 4%  +8.52%        (p=0.000 n=57+59)
BM_Parse_Upb_FileDesc<UseArena, Alias>          518MB/s ± 4%            566MB/s ± 5%  +9.30%        (p=0.000 n=57+60)
BM_Parse_Upb_FileDesc<InitBlock, Copy>          480MB/s ± 4%            521MB/s ± 5%  +8.69%        (p=0.000 n=59+60)
BM_Parse_Upb_FileDesc<InitBlock, Alias>         528MB/s ± 4%            578MB/s ± 4%  +9.36%        (p=0.000 n=57+58)
BM_Parse_Proto2<FileDesc, NoArena, Copy>        286MB/s ± 4%            287MB/s ± 4%    ~           (p=0.195 n=55+53)
BM_Parse_Proto2<FileDesc, UseArena, Copy>       566MB/s ± 5%            570MB/s ± 4%    ~           (p=0.085 n=59+59)
BM_Parse_Proto2<FileDesc, InitBlock, Copy>      583MB/s ± 5%            587MB/s ± 3%  +0.64%        (p=0.023 n=60+58)
BM_Parse_Proto2<FileDescSV, InitBlock, Alias>   688MB/s ± 6%            693MB/s ± 4%    ~           (p=0.063 n=59+58)
BM_SerializeDescriptor_Proto2                   995MB/s ± 6%            988MB/s ± 5%    ~           (p=0.147 n=57+58)
BM_SerializeDescriptor_Upb                      586MB/s ± 4%            589MB/s ± 4%    ~           (p=0.163 n=59+56)
```

PiperOrigin-RevId: 462022073
  • Loading branch information
haberman authored and copybara-github committed Jul 20, 2022
1 parent 48d6764 commit ececc21
Show file tree
Hide file tree
Showing 4 changed files with 21 additions and 9 deletions.
8 changes: 5 additions & 3 deletions upb/decode.c
Original file line number Diff line number Diff line change
Expand Up @@ -421,7 +421,9 @@ static bool decode_checkenum_slow(upb_Decoder* d, const char* ptr,
// For packed fields the tag could be arbitrarily far in the past, so we
// just re-encode the tag and value here.
uint32_t tag = ((uint32_t)field->number << 3) | kUpb_WireType_Varint;
upb_Decode_AddUnknownVarints(d, msg, tag, v);
upb_Message* unknown_msg =
field->mode & kUpb_LabelFlags_IsExtension ? d->unknown_msg : msg;
upb_Decode_AddUnknownVarints(d, unknown_msg, tag, v);
return false;
}

Expand Down Expand Up @@ -1008,6 +1010,7 @@ static const char* decode_known(upb_Decoder* d, const char* ptr,
upb_Message_Extension* ext =
_upb_Message_GetOrCreateExtension(msg, ext_layout, &d->arena);
if (UPB_UNLIKELY(!ext)) return decode_err(d, kUpb_DecodeStatus_OutOfMemory);
d->unknown_msg = msg;
msg = &ext->data;
subs = &ext->ext->sub;
}
Expand Down Expand Up @@ -1073,7 +1076,6 @@ static const char* decode_unknown(upb_Decoder* d, const char* ptr,
d->unknown_msg = msg;
ptr = decode_group(d, ptr, NULL, NULL, field_number);
start = d->unknown;
d->unknown_msg = NULL;
d->unknown = NULL;
}
if (!_upb_Message_AddUnknown(msg, start, ptr - start, &d->arena)) {
Expand Down Expand Up @@ -1189,7 +1191,7 @@ upb_DecodeStatus upb_Decode(const char* buf, size_t size, void* msg,

state.extreg = extreg;
state.limit_ptr = state.end;
state.unknown_msg = NULL;
state.unknown = NULL;
state.depth = depth ? depth : 64;
state.end_group = DECODE_NOGROUP;
state.options = (uint16_t)options;
Expand Down
6 changes: 3 additions & 3 deletions upb/internal/decode.h
Original file line number Diff line number Diff line change
Expand Up @@ -46,8 +46,8 @@
typedef struct upb_Decoder {
const char* end; /* Can read up to 16 bytes slop beyond this. */
const char* limit_ptr; /* = end + UPB_MIN(limit, 0) */
upb_Message* unknown_msg; /* If non-NULL, add unknown data at buffer flip. */
const char* unknown; /* Start of unknown data. */
upb_Message* unknown_msg; /* Used for preserving unknown data. */
const char* unknown; /* Start of unknown data, preserve at buffer flip. */
const upb_ExtensionRegistry*
extreg; /* For looking up extensions during the parse. */
int limit; /* Submessage limit relative to end. */
Expand Down Expand Up @@ -120,7 +120,7 @@ const char* decode_isdonefallback_inl(upb_Decoder* d, const char* ptr,
if (overrun < d->limit) {
/* Need to copy remaining data into patch buffer. */
UPB_ASSERT(overrun < 16);
if (d->unknown_msg) {
if (d->unknown) {
if (!_upb_Message_AddUnknown(d->unknown_msg, d->unknown, ptr - d->unknown,
&d->arena)) {
*status = kUpb_DecodeStatus_OutOfMemory;
Expand Down
6 changes: 3 additions & 3 deletions upb/mini_table.c
Original file line number Diff line number Diff line change
Expand Up @@ -963,7 +963,7 @@ upb_MiniTable* upb_MiniTable_BuildWithBuf(const char* data, size_t len,
decoder.table->field_count = 0;
decoder.table->ext = kUpb_ExtMode_NonExtendable;
decoder.table->dense_below = 0;
decoder.table->table_mask = 0;
decoder.table->table_mask = -1;
decoder.table->required_count = 0;

upb_MtDecoder_ParseMessage(&decoder, data, len);
Expand All @@ -986,7 +986,7 @@ upb_MiniTable* upb_MiniTable_BuildMessageSet(upb_MiniTablePlatform platform,
ret->field_count = 0;
ret->ext = kUpb_ExtMode_IsMessageSet;
ret->dense_below = 0;
ret->table_mask = 0;
ret->table_mask = -1;
ret->required_count = 0;
return ret;
}
Expand Down Expand Up @@ -1027,7 +1027,7 @@ upb_MiniTable* upb_MiniTable_BuildMapEntry(upb_FieldType key_type,
ret->field_count = 2;
ret->ext = kUpb_ExtMode_NonExtendable | kUpb_ExtMode_IsMapEntry;
ret->dense_below = 2;
ret->table_mask = 0;
ret->table_mask = -1;
ret->required_count = 0;
ret->subs = subs;
ret->fields = fields;
Expand Down
10 changes: 10 additions & 0 deletions upb/msg_test.cc
Original file line number Diff line number Diff line change
Expand Up @@ -515,4 +515,14 @@ TEST(MessageTest, MapField) {
// }
// FUZZ_TEST(FuzzTest, DecodeEncodeArbitrarySchemaAndPayload);
//
// TEST(FuzzTest, DecodeEncodeArbitrarySchemaAndPayloadRegression) {
// DecodeEncodeArbitrarySchemaAndPayload(
// {{"\256\354Rt\216\3271\234", "\243\243\267\207\336gV\366w"},
// {"z"},
// "}\212\304d\371\363\341\2329\325B\264\377?\215\223\201\201\226y\201%"
// "\321\363\255;",
// {}},
// "\010", -724543908, -591643538);
// }
//
// end:google_only

0 comments on commit ececc21

Please sign in to comment.