sql: Add protobuf builtins and column type #47534
Wouldn't technically be my first issue, but I'd like to try this if that's ok. Will probably take the builtin + native type approach, as I've spent time in builtins before. If you know of any particular existing types off of which I might want to model my work, that'd be extremely helpful.
I'd suggest avoiding adding a new type initially, as it triggers a lot of tests to break: they assume the type can be encoded to, decoded from, dumped, printed, etc. You can follow along these PRs (#47171, #42481) to see it's a little... much. Maybe add it later as an extension (but it probably deserves an RFC, or at least a one-pager describing format, encoding, display, dumping, etc.). For the builtin approach, ...
As far as a test case, the one that made me wish we had this would be something like:
where `'\x...'` is the bytes for the (proto message) descriptor for our message type.
Actually, after refreshing my memory of how the proto descriptor stuff works, I guess we might need the whole encoded FileDescriptorSet, not just the one message descriptor, since it could embed other messages? I guess that would mean we'd need another arg as well -- the name of the message type within that set? I haven't played with this recently enough to be sure though.
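To make concrete why a descriptor argument is needed at all: the encoded bytes alone carry only field numbers and wire types, never field names or nested-message types, so a decoder can't produce meaningful JSON without one. A stdlib-only Go sketch that scans the top-level tags of a protobuf payload (the `scanTags` helper and sample message are illustrative, not anything in CockroachDB):

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// scanTags walks a protobuf-encoded payload and returns the (field
// number, wire type) pairs it sees at the top level. Without a message
// descriptor, this is all the information the raw bytes carry: no field
// names, and no way to tell a nested message from an opaque byte string.
func scanTags(b []byte) ([][2]uint64, error) {
	var out [][2]uint64
	for len(b) > 0 {
		key, n := binary.Uvarint(b)
		if n <= 0 {
			return nil, fmt.Errorf("bad tag varint")
		}
		b = b[n:]
		field, wire := key>>3, key&7
		out = append(out, [2]uint64{field, wire})
		switch wire {
		case 0: // varint
			_, m := binary.Uvarint(b)
			if m <= 0 {
				return nil, fmt.Errorf("bad varint value")
			}
			b = b[m:]
		case 1: // fixed64
			if len(b) < 8 {
				return nil, fmt.Errorf("short fixed64")
			}
			b = b[8:]
		case 2: // length-delimited: strings, bytes, nested messages
			l, m := binary.Uvarint(b)
			if m <= 0 || uint64(len(b)-m) < l {
				return nil, fmt.Errorf("bad length")
			}
			b = b[m+int(l):]
		case 5: // fixed32
			if len(b) < 4 {
				return nil, fmt.Errorf("short fixed32")
			}
			b = b[4:]
		default:
			return nil, fmt.Errorf("unsupported wire type %d", wire)
		}
	}
	return out, nil
}

func main() {
	// Encoding of a message with field 1 = varint 150 and
	// field 2 = string "hi": 08 96 01 12 02 68 69.
	msg := []byte{0x08, 0x96, 0x01, 0x12, 0x02, 0x68, 0x69}
	tags, err := scanTags(msg)
	fmt.Println(tags, err) // [[1 0] [2 2]] <nil>
}
```

All we can recover is "field 1 is a varint, field 2 is length-delimited"; mapping that back to names and types is exactly what the descriptor provides.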
Adding a PROTO type seems like a much nicer long-term solution. We're laying the groundwork for custom types in 20.2 with Enum. I suppose we should take care to make that support general.
Yeah, I agree the native type sounds much better, but the built-ins seem like a good place to start? I think the built-in wouldn't be that bad when used in an app, if that app has that proto linked in. I think it'd look something like:

```go
eventProto, _ := EventDetail{}.Descriptor()
...
res, err := db.Query(
	`SELECT ... FROM events WHERE proto_decode(details, $1)->category = ...`, eventProto,
)
```

An interactive shell would be a bit clunkier, but even then, I could still see it being better than nothing. You could also imagine making a table to hold descriptors.

But yeah, I think the ideal is a native column type that is parameterized with the file desc set and expected message name, so you can set it once and then just interact with it seamlessly.
Summarizing some discussions elsewhere: it would be good if this got us closer to being able to build an index on the descriptor type.
Sorry for the delay in updating, but I don't think I'll be able to get to this in a timely fashion. In case anyone else was looking to jump in, I don't want to block them.
52169: sql: Implement crdb_internal.pb_to_json function r=miretskiy a=miretskiy

Informs #47534. The `crdb_internal.pb_to_json` function can convert any protocol message linked into our binary to a JSONB object. The function takes 2 arguments: the fully qualified name of the protocol message, and the encoded bytes for that message. The returned JSONB object can then be queried using regular JSONB primitives.

```
SELECT crdb_internal.pb_to_json('cockroach.sql.sqlbase.Descriptor', descriptor)->'Union'->'Table'
FROM system.descriptor WHERE id = 15;
```

Release Notes: None

Co-authored-by: Yevgeniy Miretskiy <yevgeniy@cockroachlabs.com>
To make sure I understand the issue as it currently stands: PR #52169 created the builtin functions, but cockroach still is not able to natively decode and encode proto messages stored in bytes columns given a user-supplied message descriptor, i.e. there is no native proto column type yet. That would be the next step in tackling this issue, correct?
The current builtin functions are internal-use only: they only allow de/encoding bytes from/to proto message types that are directly compiled into CockroachDB itself, where we use protobuf messages defined in the CockroachDB codebase to store things in system tables, like jobs, schema descriptors, etc. I think this feature request is more about allowing a user-supplied message descriptor to be used to marshal or unmarshal messages for user-supplied message types, not just the built-in descriptors / message types. Once you have that facility, you could go further and persist a user-supplied message descriptor somewhere: either where it can then be used by name, or potentially in a table schema alongside a bytes column, or in a user-defined type, so that columns of that type get the automatic ser/de treatment. Either way, we haven't yet added functions that allow interacting with anything other than pre-defined system-internal proto message types.
@dt I've made some progress on this and have the results in a separate repository: https://github.com/cpustejovsky/filedescriptorjson. It is currently separate because I wasn't sure how to get a test working inside the cockroach repo. It should be possible to use the new google/protobuf package to do this and return a JSON message.
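For reference, a sketch of how the newer `google.golang.org/protobuf` module supports decoding arbitrary messages from a descriptor, via its `dynamicpb` support. This only loosely mirrors the approach in the linked repo; the function name is illustrative and this assumes a self-contained `FileDescriptorProto` with no imports (the `nil` resolver):

```go
package sketch

import (
	"fmt"

	"google.golang.org/protobuf/encoding/protojson"
	"google.golang.org/protobuf/proto"
	"google.golang.org/protobuf/reflect/protodesc"
	"google.golang.org/protobuf/reflect/protoreflect"
	"google.golang.org/protobuf/types/descriptorpb"
	"google.golang.org/protobuf/types/dynamicpb"
)

// jsonFromDescriptor decodes msgBytes as the named message type from the
// given FileDescriptorProto and renders it as JSON, without needing any
// compiled-in Go type for the message.
func jsonFromDescriptor(
	fdp *descriptorpb.FileDescriptorProto, msgName protoreflect.Name, msgBytes []byte,
) ([]byte, error) {
	fd, err := protodesc.NewFile(fdp, nil)
	if err != nil {
		return nil, err
	}
	md := fd.Messages().ByName(msgName)
	if md == nil {
		return nil, fmt.Errorf("message %s not found in file descriptor", msgName)
	}
	msg := dynamicpb.NewMessage(md)
	if err := proto.Unmarshal(msgBytes, msg); err != nil {
		return nil, err
	}
	return protojson.Marshal(msg)
}
```

Handling a full FileDescriptorSet with cross-file imports would additionally require registering the files in a resolver rather than passing `nil`.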
Part of cockroachdb#47534. Cockroachdb cannot currently marshal a protobuf binary into JSON based on a provided FileDescriptor. The newest version of the `google.golang.org/protobuf` package allows for this functionality; gogo/protobuf does not. Since cockroachdb uses gogo/protobuf for its better performance, I imported `google.golang.org/protobuf` under an alias. Using that library, I created `NewJSONMessageFromFileDescriptor` to take a protobuf binary and a FileDescriptor and return JSON. To handle linting errors, I created temporary wrapper functions `TODOMarshal` and `TODOUnmarshal`; these will need to be renamed if this workaround is acceptable.

Release note: none
@dt I've written a PR (#116262) to address the first step. I sorted out the problems I was having earlier around testing, but to handle linting errors, I created wrapper functions `TODOMarshal` and `TODOUnmarshal`. I marked them as TODO because I wasn't sure if this solution was acceptable and, if it is, what the names should be to properly distinguish them from the gogo/protobuf versions.
Exciting stuff, @cpustejovsky! Let me ask around internally to see if I can find someone closer to this area of the codebase to work with you on answering some of those questions.
@dt were you able to find someone?
Hi @cpustejovsky, I'm so sorry: this slipped off my queue while I was out for a couple weeks over the holidays, and I dropped the ball here. Let me ask around again and get back to you, and sorry again about the delay!
That's what I figured! No worries.
@dt It looks like there won't be updates to this issue any time soon, based on this comment:
With the built-in "protobuf" type, the protobuf data stored in the database can be rendered in a human-readable format directly. It can also be queried and indexed directly. The original issue is cockroachdb#47534. Fixes: None. Release note: None
We currently store encoded protobuf messages in `bytes` columns in many cases, as they neatly handle collecting and encoding system-specific fields. However, this currently has the downside of making that column totally opaque to SQL interactions -- in order to manually read a given field, or even harder, manipulate one, we end up copy-pasting those bytes out to `protoc` or custom-written one-off programs. This complicates debugging during incidents (when we often want to peek into system tables which contain encoded protos in ways that sometimes don't have pre-written debugging tools to expose them) and is just generally a source of friction.

Instead, it would be nice if cockroach could natively decode, and ideally also encode, proto messages stored in `bytes` columns. Obviously this requires that cockroach have the message descriptor.

One quick win here could be a pair of builtin functions, `proto_decode(msg bytes, desc bytes) -> jsonb` and `proto_encode(msg json, desc bytes) -> bytes`, which translate bytes to/from json using the supplied message descriptor.

Even nicer though would be a native `proto` column type, which is parameterized with a supplied message descriptor. With this persisted in the TableDescriptor, the column could then be rendered as JSON natively in SQL transparently, without the cruft of the builtins or constantly re-supplying the message descriptor.

Ideally we'd also extend our `ALTER .. SET TYPE` support to include updating the persisted message descriptor, and maybe even allow upgrading a `bytes` column to/from a `proto` column in place (since this would just be changing the type and desc in the metadata, as the on-disk data is the same either way).

The builtin approach is likely easier, and even complementary with the native type, as it lays the groundwork and provides a compatibility bridge for those, like us, with existing bytes columns.
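To illustrate, here is a sketch of how the proposed builtins might compose in practice. None of this exists in CockroachDB; the calls follow the hypothetical `proto_decode`/`proto_encode` signatures described above, and the table and field names are made up:

```
-- read one field out of an encoded proto, given its descriptor
SELECT proto_decode(details, $1)->>'category' FROM events;

-- manipulate a field by round-tripping through JSONB
UPDATE events SET details = proto_encode(
  jsonb_set(proto_decode(details, $1), '{category}', '"billing"'),
  $1
);
```

With the native `proto` column type, the `$1` descriptor argument would disappear entirely, since the descriptor would live in the TableDescriptor.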
Jira issue: CRDB-4401