Filling out the functionality of defs. #426
Biggest/key changes: 1. Defs are now nested per the .proto file syntax. 2. Options are parsed and vended.
upb/def.c
Outdated
  return c >= low && c <= high;
}

static bool upb_isletter(char c) {
-  return upb_isbetween(c, 'A', 'Z') || upb_isbetween(c, 'a', 'z') || c == '_';
+  return upb_isbetween(c | 0x20, 'a', 'z') || c == '_';
I had to look up the ASCII table to realize that `| 0x20` is equivalent to lowercasing a letter. I think that `c | ('a' - 'A')` is a bit more obvious, but perhaps just a comment would be best.
Done.
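For readers who have not seen the trick discussed in this thread, here is a minimal sketch of it. The names `is_between`, `is_letter`, and `ASCII_CASE_BIT` are made up for this illustration and are not upb's identifiers; the sketch assumes an ASCII execution character set.

```c
#include <assert.h>
#include <stdbool.h>

/* In ASCII, upper- and lowercase letters differ in exactly one bit (0x20).
 * This check documents that assumption at compile time. */
static_assert(('a' - 'A') == 0x20, "ASCII character set expected");

/* Hypothetical named constant for the case bit. */
enum { ASCII_CASE_BIT = 'a' - 'A' };

/* True if low <= c <= high. */
static bool is_between(char c, char low, char high) {
  return c >= low && c <= high;
}

/* OR-ing in the case bit folds 'A'..'Z' onto 'a'..'z', so one range check
 * covers both cases. Non-letters never land inside 'a'..'z'. */
static bool is_letter(char c) {
  return is_between((char)(c | ASCII_CASE_BIT), 'a', 'z') || c == '_';
}
```

Naming the constant is one way to address the readability concern raised above; a plain comment next to `0x20` would work just as well.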
upb/def.c
Outdated
}

const upb_fielddef *upb_msgdef_nestedext(const upb_msgdef *m, int i) {
  UPB_ASSERT(i >= 0 && i < m->nested_ext_count);
Below you are consistent with `0 <= i && i < max`, so I think you should update these as well.
Done.
@@ -978,7 +1272,7 @@ const upb_fielddef *upb_symtab_lookupext2(const upb_symtab *s, const char *name,
       return unpack_def(v, UPB_DEFTYPE_FIELD);
     case UPB_DEFTYPE_MSG: {
       const upb_msgdef *m = unpack_def(v, UPB_DEFTYPE_MSG);
-      return m->message_set_ext; /* May be NULL if not in MessageeSet. */
+      return m->in_message_set ? &m->nested_exts[0] : NULL;
Isn't `&m->nested_exts[0]` equivalent to `m->nested_exts`?
Yes, but I find `&m->nested_exts[0]` more semantically clear here; we are returning the address of the first thing in an array. This mirrors the usage in e.g. `upb_msgdef_nestedext()`.
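The equivalence discussed here can be checked with a small sketch. The types `toy_ext` and `toy_msg` and both functions are hypothetical stand-ins invented for this example, not upb's real definitions; the bounds check uses the `0 <= i && i < max` ordering suggested earlier in the review.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for upb_fielddef. */
typedef struct {
  int number;
} toy_ext;

/* Hypothetical stand-in for upb_msgdef. */
typedef struct {
  toy_ext *nested_exts; /* points at the first element of an array */
  int nested_ext_count;
} toy_msg;

/* Returns the address of the i-th nested extension. */
static const toy_ext *toy_msg_nestedext(const toy_msg *m, int i) {
  assert(0 <= i && i < m->nested_ext_count);
  return &m->nested_exts[i];
}

/* Returns the address of the first nested extension, or NULL if none.
 * &m->nested_exts[0] yields the same pointer value as m->nested_exts;
 * the indexed form mirrors toy_msg_nestedext() above. */
static const toy_ext *toy_msg_firstext(const toy_msg *m) {
  return m->nested_ext_count > 0 ? &m->nested_exts[0] : NULL;
}
```

Both spellings compile to the same thing, so the choice is purely about which reads more clearly next to the indexed accessor.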
    }
  }

  const char *last_dot = strrchr(name, '.');
I always worry about string apis that take a const char* and don't have a size associated with it... feels like a buffer overflow waiting to happen
I definitely avoid APIs like `strcpy()` that have a precondition that a target buffer is big enough. But `strrchr()` is no more dangerous than `strlen()`: the only danger is that the buffer isn't NULL-terminated, which will show up quickly in tests.
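As a rough illustration of the point made in this reply, the sketch below uses `strrchr()` to split a dot-separated name. It only reads from its input and never writes into a caller-supplied buffer, so its single precondition is a terminated string. The helper names `short_name` and `prefix_len` are assumptions for this example and do not appear in the PR.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns the part of a terminated, dot-separated name after the last '.',
 * or the whole name if it contains no dot. Nothing is copied or written. */
static const char *short_name(const char *full_name) {
  assert(full_name != NULL);
  const char *last_dot = strrchr(full_name, '.');
  return last_dot ? last_dot + 1 : full_name;
}

/* Returns the length of the prefix before the last '.', or 0 if none. */
static size_t prefix_len(const char *full_name) {
  assert(full_name != NULL);
  const char *last_dot = strrchr(full_name, '.');
  return last_dot ? (size_t)(last_dot - full_name) : 0;
}
```

Because the result is a pointer into the original string plus a length, the caller can pass the prefix to size-aware APIs without any intermediate buffer.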
upb/def.c
Outdated
@@ -1480,6 +1842,9 @@ static char* makejsonname(symtab_addctx *ctx, const char* name) {
  return json_name;
}

/* Adds a symbol to the symtab. The def's pointer to upb_filedef* must be set
What def? Is this `ctx->def` or something?
Added a comment; `v` is a packed def pointer.
upb/def.c
Outdated

  // Resolve subdef by type name, if necessary.
  switch ((int)f->type_) {
    case 0: {
Does `0` not have a better named constant here?
Done.
upb/def.c
Outdated

-  if (!ctx.arena) {
+  if (!ctx.arena && !ctx.tmp_arena) {
Shouldn't this be `||` not `&&`?
Yes good catch.
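A small sketch of why the operator matters: with `&&`, the failure branch is only taken when both allocations fail, so a single failed allocation slips through. The `toy_ctx` type and function names below are hypothetical and only model the two pointers from the snippet under review.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical context holding two independently allocated arenas. */
typedef struct {
  void *arena;     /* stand-in for the long-lived arena */
  void *tmp_arena; /* stand-in for the scratch arena */
} toy_ctx;

/* Correct check: fail if EITHER allocation failed. */
static bool toy_ctx_ok(const toy_ctx *ctx) {
  if (!ctx->arena || !ctx->tmp_arena) return false;
  return true;
}

/* Buggy check: fails only if BOTH allocations failed. */
static bool toy_ctx_ok_buggy(const toy_ctx *ctx) {
  if (!ctx->arena && !ctx->tmp_arena) return false;
  return true;
}

/* Demonstrates the case where the two checks disagree. */
static void toy_ctx_check_demo(void) {
  int dummy;
  toy_ctx half = {&dummy, NULL};   /* the second allocation failed */
  assert(!toy_ctx_ok(&half));      /* correctly reported as a failure */
  assert(toy_ctx_ok_buggy(&half)); /* bug: the failure goes unnoticed */
}
```

The two versions agree when both pointers are valid or both are NULL, which is why the `&&` variant can pass casual testing.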
@@ -263,6 +263,38 @@ bool upb_strtable_resize(upb_strtable *t, size_t size_lg2, upb_arena *a);

/* Iterators ******************************************************************/

/* New-style iterators.  Much simpler, iterator state is held in size_t.
 *
 *   intptr_t iter = UPB_INTTABLE_BEGIN;
The comment says state is a `size_t` but the code uses `intptr_t`.
Also the "new" and "much simpler" don't really add much to the comment (and will become out of date as soon as these have settled for a while).
Done.
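To show the general shape of an iterator whose entire state is a single `intptr_t`, here is a toy sketch. It is not upb's table API: `toy_table`, `toy_table_next`, and `TOY_TABLE_BEGIN` are invented for this example, and the begin sentinel of -1 is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical begin sentinel: one position before the first slot. */
#define TOY_TABLE_BEGIN ((intptr_t)-1)

typedef struct {
  bool present; /* whether this slot holds a value */
  int value;
} toy_slot;

typedef struct {
  const toy_slot *slots;
  intptr_t size;
} toy_table;

/* Advances *iter to the next occupied slot and stores its value in *val.
 * Returns false, leaving *iter unchanged, when no occupied slot remains. */
static bool toy_table_next(const toy_table *t, int *val, intptr_t *iter) {
  assert(*iter >= TOY_TABLE_BEGIN);
  for (intptr_t i = *iter + 1; i < t->size; i++) {
    if (t->slots[i].present) {
      *val = t->slots[i].value;
      *iter = i;
      return true;
    }
  }
  return false;
}

/* Typical loop: the caller owns the iterator state as a plain integer. */
static int toy_table_sum(const toy_table *t) {
  intptr_t iter = TOY_TABLE_BEGIN;
  int val = 0, sum = 0;
  while (toy_table_next(t, &val, &iter)) sum += val;
  return sum;
}
```

A signed type makes the "before the first element" sentinel easy to express, which is one plausible reason to prefer `intptr_t` over `size_t` for this kind of state.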
upbc/protoc-gen-upb.cc
Outdated
  // proto (descriptor.proto) so we don't worry about it.
  const protobuf::Descriptor* max32 = nullptr;
  const protobuf::Descriptor* max64 = nullptr;
  for (auto message : this_file_messages) {
`auto*`
In general the `const` and `*` tokens are very high value for readers.
This function was introduced in #426 but it appears it was never used. I am not sure what the purpose was, but in any case it is not needed. With this function removed, we no longer need to tag pointers for the DefPool "files" table. PiperOrigin-RevId: 470565730
This function was introduced in #426 but it appears it was never used. I am not sure what the purpose was, but in any case it is not needed. With this function removed, we no longer need to tag pointers for the DefPool "files" table. PiperOrigin-RevId: 470567000
This is a large-ish PR that fills in several previously missing features from upb defs.

The major changes here are:

- Options are parsed and vended (`google_protobuf_MessageOptions` for `upb_msgdef`, etc).
- Defs are now nested per the structure of `.proto` files. upb used to store all defs in a flat array per file for simplicity. However, existing APIs like Python expose the nested structure of `.proto` files, and require (for example) that you can easily iterate over all of the messages nested inside a given message.

There are a few other more minor changes here:

- [...] `json_name` duplicates if a special flag is set on the symtab. This is very inadvisable, since it makes JSON input ambiguous, but some existing code depends on this behavior.
- [...] `upb_symtab_add()` operation if there is an error while building the defs.

This unfortunately regresses the ads descriptor loading benchmark a bit (understandable, as options support does require fundamentally more work than we were doing before). We will have to try reclaiming this later. Code size also grows significantly in `upb/def.c`: this is an expected result of the new functionality.
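To make the nested layout described in the major changes concrete, here is a toy sketch in which each message def owns an array of the messages declared inside it, instead of every def living in one flat per-file array. The struct and function names (`toy_msgdef`, `toy_msgdef_nestedmsg`, and so on) are hypothetical and only loosely modeled on the accessor style seen in this review; they are not upb's actual definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical message def that owns its nested message defs. */
typedef struct toy_msgdef toy_msgdef;
struct toy_msgdef {
  const char *name;
  const toy_msgdef *nested_msgs; /* messages declared inside this one */
  int nested_msg_count;
};

static int toy_msgdef_nestedmsgcount(const toy_msgdef *m) {
  return m->nested_msg_count;
}

/* Returns the i-th message nested directly inside m. */
static const toy_msgdef *toy_msgdef_nestedmsg(const toy_msgdef *m, int i) {
  assert(0 <= i && i < m->nested_msg_count);
  return &m->nested_msgs[i];
}

/* Counts m plus every message nested anywhere beneath it. With a flat
 * per-file array this would require matching on name prefixes instead. */
static int toy_msgdef_counttree(const toy_msgdef *m) {
  int total = 1;
  for (int i = 0; i < toy_msgdef_nestedmsgcount(m); i++) {
    total += toy_msgdef_counttree(toy_msgdef_nestedmsg(m, i));
  }
  return total;
}
```

With this shape, "iterate over all of the messages nested inside a given message" becomes a simple indexed loop, which is the access pattern that the description says language bindings such as Python need.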