Add DML samples for BigQuery. #546

Merged
merged 7 commits into from
Sep 28, 2016

@tswast tswast commented Sep 28, 2016

No description provided.

This generates a random MySQL database (tables defined in
create_sample_db.sql) with random user actions. The reason for putting
these into a SQL database instead of directly into BigQuery is that it
will be used to show export from a SQL database into BigQuery.
Hopefully the limits on query sizes for BigQuery are large enough that
this works for larger databases.

Change-Id: I446c3af72dab60d9ed79a2c814a68f05801ae17b

This tests the SQL code to create the tables, as well as the Python code
that creates the rows. Uses SQLAlchemy to abstract away differences
between database engines.

Change-Id: Id9e70eef56f5e203921b6c1f21708631f3a767f7

This sample reads a SQL file (for example: one that was output from
mysqldump) and executes each line as a query. At least in my
configuration of mysqldump, each insert statement was on a single line,
so I was able to write data from MySQL to BigQuery with this sample.

Change-Id: Id14b648b0ce6bac651e436d402f480c56d80bd37

Also, adds a few explanatory comments for the docs.

Change-Id: I623bf226839ab43f8da8297938223323a04e5838

Change-Id: Ie0d12ac9aedfdd83d2f6f533ad30265f75126e4f

@googlebot added the "cla: yes" label (this human has signed the Contributor License Agreement) on Sep 28, 2016

@theacodes theacodes left a comment

I may have said this in my original review, but the populate_db.py script is a really complex sample. I'm curious as to the reasoning behind choosing such a complex sample. Is there a much simpler sample that could teach the same thing?

# See the License for the specific language governing permissions and
# limitations under the License.

"""Sample to run line-separated SQL statements in Big Query from a file.
Contributor

Awkward phrasing, maybe: Sample that runs a file containing line-separated SQL statements in Big Query.

Contributor Author

Done. (I realized this sample was a bit more complicated than it needed to be, too. I changed it to "Sample that runs a file containing INSERT SQL statements in Big Query." and modified the loop to look for lines that start with INSERT, to match the command-line sample I wrote for the docs.)
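
For readers following along, the revised loop described above (only lines that start with INSERT are run) can be sketched roughly as follows. This is not the code from the PR: `iter_insert_statements`, `run_insert_file`, and the `run_query` callback are hypothetical names, and the callback stands in for whatever call actually submits a statement to BigQuery. Matching INSERT case-insensitively is also an assumption added here for convenience.

```python
import io


def iter_insert_statements(lines):
    """Yield each stripped line that starts with INSERT; skip everything else."""
    for line in lines:
        statement = line.strip()
        # Assumption: match case-insensitively. mysqldump emits uppercase INSERT.
        if statement.upper().startswith('INSERT'):
            yield statement


def run_insert_file(sql_file, run_query):
    """Pass every INSERT line in sql_file to run_query; return how many ran.

    run_query is a hypothetical callback standing in for the real
    BigQuery call, which is deliberately left out of this sketch.
    """
    count = 0
    for statement in iter_insert_statements(sql_file):
        run_query(statement)
        count += 1
    return count


if __name__ == '__main__':
    # A tiny stand-in for a mysqldump output file.
    dump = io.StringIO(
        "-- MySQL dump\n"
        "CREATE TABLE Users (id INT);\n"
        "INSERT INTO `Users` VALUES (1);\n"
        "INSERT INTO `Users` VALUES (2);\n")
    executed = []
    print(run_insert_file(dump, executed.append), 'statements run')
```

Keeping the filter separate from the code that runs the query means the line-matching rule can be checked without any BigQuery credentials.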

from __future__ import print_function

import argparse
# [START insert_sql]
Contributor

Either include all imports, or include none of them.

Contributor Author

Done. (Included all)

        query.run()
        return
    except exceptions.GCloudError as err:

Contributor

This newline isn't needed (blank newline is recommended between expressions and statements, but not between two statements)

Contributor Author

Thanks, good to know. I deleted this function entirely based on your previous feedback. (I made some command-line samples to do the same thing, and adding retries really does complicate it a lot. It doesn't seem to help for most errors, anyway.)

parser = argparse.ArgumentParser(
    description=__doc__,
    formatter_class=argparse.RawDescriptionHelpFormatter)
parser.add_argument('--project', help='Google Cloud project name')
Contributor

Use positional command-line args for required items; use flags for items with sensible defaults.

Contributor Author

Done.
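
As an illustration of the convention requested above, here is a minimal sketch of an argparse setup where required values are positional and only values with a sensible default are flags. It is not the parser from the PR: `build_parser`, the `sql_path` argument, and the `--encoding` flag are hypothetical, chosen only to show the pattern.

```python
import argparse


def build_parser():
    """Build a parser: required items positional, defaulted items as flags."""
    parser = argparse.ArgumentParser(
        description='Run the INSERT statements found in a SQL file.',
        formatter_class=argparse.RawDescriptionHelpFormatter)
    # Required items are positional, so argparse rejects a call without them.
    parser.add_argument('project', help='Google Cloud project name')
    parser.add_argument('sql_path', help='Path to the SQL file (hypothetical)')
    # Hypothetical optional item: it has a sensible default, so it is a flag.
    parser.add_argument(
        '--encoding', default='utf-8',
        help='Text encoding of the SQL file (default: %(default)s)')
    return parser


if __name__ == '__main__':
    args = build_parser().parse_args(['my-project', 'dump.sql'])
    print(args.project, args.sql_path, args.encoding)
```

With this layout, omitting `project` produces a usage error from argparse itself, so the sample needs no manual check for a missing `--project` value.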

session = create_session(engine)

try:
    populate_db(session, total_users=100)
Contributor

Can we do like < 10 for the sake of speed?

Contributor Author

Done.

Removes unnecessary UserActions table from populate_db.py. Removes retry
logic and changes sample to only look for INSERT lines in insert_sql.py.

Change-Id: If8994c420cd95babf3c4673a3b87affbfca4f32a

@tswast tswast left a comment

I simplified the populate_db.py script. You are right that it was too complicated. The UserActions table was unnecessary for the docs I have written.

@theacodes theacodes merged commit 0f85ad0 into master Sep 28, 2016
@theacodes theacodes deleted the tswast-bq-dml-insert branch September 28, 2016 19:06
telpirion pushed a commit that referenced this pull request Jan 18, 2023
* chore(deps): update all dependencies

* 🦉 Updates from OwlBot post-processor

See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md

* revert

Co-authored-by: Owl Bot <gcf-owl-bot[bot]@users.noreply.github.com>
Co-authored-by: Anthonios Partheniou <partheniou@google.com>
dandhlee pushed a commit that referenced this pull request Feb 6, 2023
telpirion pushed a commit that referenced this pull request Mar 13, 2023