
BigQuery: Add load_table_from_json() method to BQ client #9070

Closed
wants to merge 1 commit

Conversation

@plamut (Contributor) commented Aug 21, 2019

Closes #4553.

This PR adds a helper client method load_table_from_json().

How to test

Check that the method is similar to e.g. client.load_table_from_dataframe() and in line with the requirements.

P.S.: I will address the last issue comment separately once this is approved (replacing the _add_rows() helper in system tests is a separate refactoring task IMO).
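For context, a helper like this typically works by serializing the row dicts into a newline-delimited JSON stream and loading that stream as a file. A minimal sketch of the serialization step, using only the standard library (the function name `rows_to_ndjson` is illustrative, not from the PR):

```python
import io
import json


def rows_to_ndjson(json_rows):
    """Serialize a sequence of row dicts into a newline-delimited JSON
    byte stream, the wire format a BigQuery load job expects when its
    source format is NEWLINE_DELIMITED_JSON."""
    data = "\n".join(json.dumps(row) for row in json_rows)
    return io.BytesIO(data.encode("utf-8"))


stream = rows_to_ndjson([{"name": "Ada", "age": 36}, {"name": "Lin", "age": 7}])
print(stream.getvalue().decode("utf-8"))
```

The resulting stream can then be handed to the existing file-based load path, which is how the helper stays a thin wrapper rather than a new code path.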

@plamut plamut added the api: bigquery Issues related to the BigQuery API. label Aug 21, 2019
@plamut plamut requested a review from a team August 21, 2019 15:13
@googlebot googlebot added the cla: yes This human has signed the Contributor License Agreement. label Aug 21, 2019
@tswast (Contributor) left a comment


Yay!

A few nitpicky things.


if job_config is None:
    job_config = job.LoadJobConfig()
job_config.source_format = job.SourceFormat.NEWLINE_DELIMITED_JSON
Contributor

I'm having second thoughts about modifying the input config object. See:

if job_config is None:
    job_config = job.LoadJobConfig()
else:
    # Make a copy so that the job config isn't modified in-place.
    job_config_properties = copy.deepcopy(job_config._properties)
    job_config = job.LoadJobConfig()
    job_config._properties = job_config_properties

We might want to refactor that code into a helper function for creating or copying a LoadJobConfig.

Contributor

Also, if there isn't a schema set, we should enable schema autodetect for them.

@plamut (Contributor Author) Aug 22, 2019

Is there a reason for not simply deep-copying the entire LoadJobConfig instance instead of separately deep-copying the _properties attribute?

The only other private attribute seems to be the _job_type string, and __init__() (which a deep copy would not invoke anyway) does nothing apart from setting some attributes through setters, which more or less manipulate the same _properties dict behind the scenes.

Contributor

One potential reason is that _properties can be used to send values to the API for alpha releases that aren't yet supported in the client libraries. I'm not sure whether a deep copy of the whole LoadJobConfig would carry those over.

_properties is the real core of the object, since that's what defines what we end up actually including in the API request.
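The point generalizes: `copy.deepcopy` on a plain dict preserves arbitrary keys, including ones no public property knows about, so copying `_properties` directly is guaranteed to carry unreleased API fields over. A small standard-library demonstration (the `someAlphaFeature` key is a made-up example of such a field):

```python
import copy

# _properties may carry fields set directly for alpha API features,
# with no corresponding public property on the config class.
properties = {
    "sourceFormat": "NEWLINE_DELIMITED_JSON",
    "someAlphaFeature": {"enabled": True},  # hypothetical unreleased field
}

copied = copy.deepcopy(properties)
copied["someAlphaFeature"]["enabled"] = False

# The copy kept the unknown key, and nested values are independent.
print(properties["someAlphaFeature"]["enabled"])  # True
print("someAlphaFeature" in copied)               # True
```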

bigquery/tests/system.py (review thread resolved)
@plamut (Contributor Author) commented Aug 22, 2019

Pushed new commits, but GitHub does not detect them. Also tried rebasing the branch, still no luck. Maybe closing and re-opening will work.

Update: The same happened on the replacement PR, but then the new changes to the latter showed up after a couple of minutes. Seems like a temporary GitHub service glitch.

Development

Successfully merging this pull request may close these issues.

Add functions equivalent to create_rows and create_rows_json that create a table for you using a load job
3 participants