Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: use proto-plus instead of _pb #332

Closed
wants to merge 1 commit into from

Conversation

cguardia
Copy link
Contributor

This is a proof of concept PR to fix #319. This almost works, but there are four tests failing. The failures are of two types:

  1. The message.from_json method, which is the likeliest successor to the json_format.ParseDict call used with the _pb object, does not allow us to pass in the "ignore_unknown_fields" flag to the parser, thus causing errors in tests that try to ignore fields.

  2. The json format created by the regular json library has some slight differences with the format generated by json_format, which causes tests trying to compare repr output to fail.

I don't like very much the use of the json library in this PR, and would prefer to avoid it if possible. I think perhaps making a PR to proto-plus to make the to_json and from_json methods more flexible would be a good idea, along with possibly adding to_dict and from_dict methods there, which seems to me could be generally helpful.

@cguardia cguardia requested a review from a team October 19, 2020 08:17
@google-cla google-cla bot added the cla: yes This human has signed the Contributor License Agreement. label Oct 19, 2020
@cguardia cguardia requested a review from tswast October 19, 2020 08:17
@cguardia cguardia added the do not merge Indicates a pull request not ready for merge, due to either quality or timing. label Oct 19, 2020
@tswast
Copy link
Contributor

tswast commented Oct 19, 2020

I've filed googleapis/proto-plus-python#153 and googleapis/proto-plus-python#152 which will allow us to ignore unknown fields and avoid the use of the json module.

this._proto = json_format.ParseDict(
resource, types.Model()._pb, ignore_unknown_fields=True
)
this._proto = types.Model.from_json(json.dumps(resource))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we pass resource to the Model constructor here?

googleapis/proto-plus-python#152 (comment)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll still want googleapis/proto-plus-python#153 to support ignore_unknown_fields, though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I already had tried that during my testing. It gives a KeyError on the metaclass during __init__, right here:

https://github.com/googleapis/proto-plus-python/blob/master/proto/message.py#L425

I was using this code:

this._proto = types.Model(resource)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be able to update this to types.Model(resource, ignore_unknown_fields=True) now.

@tswast
Copy link
Contributor

tswast commented Oct 20, 2020

googleapis/proto-plus-python#154 adds ignore_unknown_fields to both the Model constructor and a to_dict method. Hopefully that means this PR won't be blocked by proto-plus once they make a release.

@cguardia
Copy link
Contributor Author

They already released. Still getting some test errors here, but I think they are related to assumptions about how the pb json_format conversion worked. Hopefully will have a working PR at some point tonight.

@cguardia
Copy link
Contributor Author

@tswast Speaking about hidden json_format pb conversion behavior: somewhere along the line it converted key names to camelCase automatically, so many checks we have on resources fail due to the ids now using snake_case and the manually constructed test resources using camelCase. In some cases, the camelCase names are part of the main code, like in job and client modules. Dov said this has been a problem elsewhere. I'm still not familiar enough with the bigquery codebase to be completely sure which way to go here: my instinct is to convert everything to snake_case, what do you think?

@tswast
Copy link
Contributor

tswast commented Oct 21, 2020

json_format pb conversion behavior: somewhere along the line it converted key names to camelCase automatically, so many checks we have on resources fail due to the ids now using snake_case and the manually constructed test resources using camelCase.

@cguardia You mean it's taking the camelCase keys we set in tests and converting them to snake_case? I suspect the API will accept either in most cases, since the language guide says

Parsers accept both the lowerCamelCase name (or the one specified by the json_name option) and the original proto field name.

https://developers.google.com/protocol-buffers/docs/proto3#json

Does the same behavior happen with to_dict?

@cguardia
Copy link
Contributor Author

Not using JSON anymore. The tests are expecting CamelCase, but the calls to the constructor and to_dict are returning snake_case. I was thinking of getting rid of camelCase everywhere, but unsure if that would affect something else.

@tswast
Copy link
Contributor

tswast commented Oct 21, 2020

calls to the constructor and to_dict are returning snake_case. I was thinking of getting rid of camelCase everywhere, but unsure if that would affect something else.

This seems like a safe operation to me. We aren't ever sending that JSON representation to the API, so it shouldn't even affect that.

@product-auto-label product-auto-label bot added the api: bigquery Issues related to the googleapis/python-bigquery API. label Oct 22, 2020
@cguardia
Copy link
Contributor Author

Well, this has been quite the rabbit hole. Turns out not only the tests, but some of the code as well, seem to be counting on the automatic case conversion, so that a lot of code would need to be changed to get rid of that. The json_format call also performs some data conversion, which again is expected to happen in some of the code. I believe we have three options:

  1. Bite the bullet and go through all the code and tests until this is right. This is complicated by some API calls that want camelCase arguments, so each change has to be carefully done to avoid affecting those.
  2. Write some helper code to do the case conversion on our side and handle this as an API requirement.
  3. Use json.loads and json.dumps in combination with the Model.to_json code to get the conversion done like it was done before (mostly the current code with a few additions).

@tswast what do you think?

@tswast
Copy link
Contributor

tswast commented Oct 26, 2020

  1. and 3. sound like asking for trouble performance-wise. Those would require several passes over the data in Python code to get things in the structure we want.

Bite the bullet and go through all the code and tests until this is right.

I don't recall what code relies on this. Could you send a pointer to the line on GitHub? Are there really all that many tests for this? I recall that all of the properties we use _proto for are read-only.

@cguardia
Copy link
Contributor Author

I'm going to close this one, and open another with the current approach.

@cguardia cguardia closed this Oct 28, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: bigquery Issues related to the googleapis/python-bigquery API. cla: yes This human has signed the Contributor License Agreement. do not merge Indicates a pull request not ready for merge, due to either quality or timing.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Breaking Change: update model.py classes to use proto-plus instead of _pb
2 participants