The BigQuery Migration Automation Tool is used with the BigQuery Migration Service to batch transpile all of the Teradata SQLs to BigQuery SQLs and then validate those queries with a BigQuery dry run. The BigQuery SQLs are then pushed to GitHub as well as to a transpilation logs table in BigQuery.
- BQ Migration CLI - CLI tooling built by the BigQuery Migration Service team that allows you to interact with the BigQuery Migration Service.
- Makefile - A Makefile is used to run the Python scripts, the pip installs, and all the requirements needed to run the BigQuery Migration Automation Tool.
UC4 SQL Repo
This is the repository of all of the validated SQLs generated by the BigQuery Migration
Service tool. The tool uploads the verified SQL to this repository.
Folder Structure:
|-- Teradata SQLs
|   |-- radd_master_directory
|   |   |-- my_sql.sql
|   |   ...
|-- BigQuery SQLs
|   |-- simba_directory
|   |   |-- my_sql.sql
The Teradata SQLs folder contains all of the Teradata SQLs for the UC4 Jobs. The BigQuery SQLs folder contains all of the converted BigQuery SQLs for the UC4 Jobs.
DWH Migration Tools Repo
This is the GitHub repository that contains the dwh-migration-tools required
for the transpilation of the Teradata SQL.
The first part of the Makefile runs setup.py. This script clones the required repos
into the local file system; if a given GitHub repo already exists in the local file system,
the script does a git pull instead.
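As a rough illustration, the clone-or-pull behaviour might look like the following sketch; the repository URL and local path are placeholders, not the actual values setup.py uses:

```python
# Illustrative sketch of the clone-or-pull behaviour described above.
# The URL and local path below are placeholders, not the values setup.py uses.
import subprocess
from pathlib import Path


def clone_or_pull(repo_url: str, local_path: Path) -> None:
    """Clone the repo if it is not present locally, otherwise pull it."""
    if (local_path / ".git").exists():
        # The repo is already cloned, so refresh it with a pull.
        subprocess.run(["git", "-C", str(local_path), "pull"], check=True)
    else:
        # The repo is not present yet, so clone it into the target directory.
        subprocess.run(["git", "clone", repo_url, str(local_path)], check=True)


clone_or_pull("https://github.com/example-org/example-repo.git", Path("example-repo"))
```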
The dataset mapping parses the given SQL for each job, extracts the table names referenced by each job,
and creates a mapping configuration file that maps those tables into the correct dataset for the associated business unit.
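A simplified sketch of what this step could look like is shown below; the regex-based table extraction and the JSON output format are assumptions for illustration and may not match the real script or the BQMS object name mapping format:

```python
# Simplified illustration of the dataset-mapping step. The regex and the
# output format are assumptions; the real script and the BQMS object name
# mapping format may differ.
import json
import re
from pathlib import Path

TABLE_PATTERN = re.compile(r"(?:FROM|JOIN)\s+([\w.]+)", re.IGNORECASE)


def build_dataset_mapping(sql_dir: Path, target_dataset: str) -> dict:
    """Extract table names from each job's SQL and map them to a dataset."""
    mapping = {}
    for sql_file in sql_dir.glob("*.sql"):
        for table in TABLE_PATTERN.findall(sql_file.read_text()):
            # Map each referenced table into the business unit's dataset.
            mapping[table] = f"{target_dataset}.{table.split('.')[-1]}"
    return mapping


mapping = build_dataset_mapping(Path("teradata_sql/radd_master_directory"), "radd_master")
Path("name_mapping.json").write_text(json.dumps(mapping, indent=2))
```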
The transpilation is the process that takes the Teradata SQL and transpiles it into BigQuery SQL.
This process is completed using `bqms-run`. The script sets the environment variables required by
the BQMS tool and then runs the `bqms-run` command to initiate the transpilation process.
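A rough sketch of this wrapper logic is shown below. The environment variable names are assumptions for illustration only; consult the dwh-migration-tools documentation for the exact variables `bqms-run` expects:

```python
# Rough sketch of setting the environment and invoking bqms-run.
# The environment variable names below are assumptions for illustration;
# see the dwh-migration-tools documentation for the names bqms-run expects.
import os
import subprocess

env = os.environ.copy()
env.update({
    "BQMS_PROJECT": "my-gcp-project",          # assumed variable name
    "BQMS_INPUT_PATH": "teradata_sql/",        # assumed variable name
    "BQMS_TRANSLATED_PATH": "bqms_output/",    # assumed variable name
    "BQMS_CONFIG_PATH": "config/config.yaml",  # assumed variable name
})

# Kick off the batch translation with the configured environment.
subprocess.run(["bqms-run"], env=env, check=True)
```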
We then iterate through the files in the BQMS output and submit a dry run for every query for each specific job.
If the query succeeds, the file is moved into the UC4_SQL_REPO under the bigquery_sql/
directory. If the query fails, it stays in the teradata_sql/ directory.
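A minimal sketch of this dry-run validation using the google-cloud-bigquery client might look like the following; the project name and directory paths are placeholders:

```python
# Minimal dry-run validation sketch using the google-cloud-bigquery client.
# The project name and directory paths are placeholders.
from pathlib import Path
from google.cloud import bigquery

client = bigquery.Client(project="my-gcp-project")
job_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)

Path("bigquery_sql").mkdir(exist_ok=True)
for sql_file in Path("bqms_output").rglob("*.sql"):
    query = sql_file.read_text()
    try:
        # A dry run validates the query without actually executing it.
        client.query(query, job_config=job_config)
        # Success: move the file into the bigquery_sql/ directory.
        sql_file.rename(Path("bigquery_sql") / sql_file.name)
    except Exception as error:
        # Failure: leave the file where it is and record the error.
        print(f"{sql_file.name} failed dry-run validation: {error}")
```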
At the end of the dry-run validation, whether a dry run is successful or not, the query data is uploaded to the transpilation_logs table in BigQuery, where it can be accessed for accurate logs of the dry-run successes and failures. If the dry run is successful, the record will have a status of SUCCEEDED, the time the dry run ran, and the specific query that succeeded. If the dry run fails, the record will have a status of FAILED, the time the dry run ran, the specific query that failed, and an error message explaining why the dry-run validation was not successful.
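A sketch of how a single log row could be written to the transpilation_logs table is shown below; the table path and column names are assumptions for illustration:

```python
# Sketch of writing one dry-run result to the transpilation_logs table.
# The table path and column names are assumptions for illustration.
import datetime
from google.cloud import bigquery

client = bigquery.Client(project="my-gcp-project")
row = {
    "status": "SUCCEEDED",                               # or "FAILED"
    "run_time": datetime.datetime.utcnow().isoformat(),  # when the dry run ran
    "query": "SELECT 1",                                 # the query that was dry-run
    "error_message": None,                               # populated when status is FAILED
}
errors = client.insert_rows_json("my-gcp-project.my_dataset.transpilation_logs", [row])
if errors:
    print(f"Failed to write log row: {errors}")
```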
Upon completion of the validation process, the script creates a new branch in the repository
and pushes the new branch with its changes to the repository stored in GitHub.
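A minimal sketch of this branch-and-push step using plain git commands follows; the branch naming scheme and repository path are assumptions:

```python
# Illustrative sketch of the final branch-and-push step using plain git.
# The branch naming scheme and repository path are assumptions.
import datetime
import subprocess

branch = f"transpiled-{datetime.date.today().isoformat()}"  # assumed naming scheme
repo_dir = "uc4_sql_repo"                                   # placeholder local path

subprocess.run(["git", "-C", repo_dir, "checkout", "-b", branch], check=True)
subprocess.run(["git", "-C", repo_dir, "add", "."], check=True)
subprocess.run(["git", "-C", repo_dir, "commit", "-m", "Add transpiled BigQuery SQL"], check=True)
subprocess.run(["git", "-C", repo_dir, "push", "-u", "origin", branch], check=True)
```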
In order to utilize this tool, you first need to clone the project into the directory of your choice
(`git clone https://github.com/RealistAI/bq_migration_automation_tool.git`), navigate into the newly cloned
directory (`cd bq_migration_automation_tool`), and alter the config.py to your own specification. Create
a Pipenv virtual environment using `pipenv shell`, install the required libraries with `pipenv install`,
and run the Makefile with `make run`.
The config.py contains the following settings:
- The repository URL and branch containing the dwh-migration-tools.
- The repository URL and branch containing the SQLs to transpile & validate.
- The name of the repository containing the SQLs to transpile & validate.
- The base path from which the dataset mapping grabs the SQLs that it parses through and adjusts to work for BigQuery.
- The name of the Google Cloud Platform project that will perform the bulk transpilation & validation.
- The name of the Google Cloud Platform dataset that will be used for the bulk transpilation & validation.
- A Google Cloud Storage bucket that will be used by `bqms-run` as a staging area for the translation process.
- The directory in your GitHub repository containing .sql files for translation and validation.
- A Google Cloud Storage bucket that will be used by `bqms-run` to store translated files before dumping them back into the local file system.
- The local directory that `bqms-run` will use to store the results of the run.
- The directory within the origin GitHub repository to contain the translated and validated .sql files.
- The path to the base config directory, which hosts the config.yaml file and the object name mapping file.
- The path to the dwh-migration-tools config.yaml file.
- The path to the object name mapping configuration file.
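For illustration only, a hypothetical config.py covering these settings might look like the sketch below. Every variable name and value here is a placeholder, not the repository's actual configuration:

```python
# Hypothetical config.py illustrating the settings listed above.
# Every name and value is a placeholder, not the repository's actual config.
DWH_MIGRATION_TOOLS_REPO = "https://github.com/example-org/dwh-migration-tools.git"
DWH_MIGRATION_TOOLS_BRANCH = "main"
UC4_SQL_REPO = "https://github.com/example-org/uc4_sql.git"
UC4_SQL_REPO_BRANCH = "main"
UC4_SQL_REPO_NAME = "uc4_sql"

DATASET_MAPPING_BASE_PATH = "teradata_sql/radd_master_directory"

GCP_PROJECT = "my-gcp-project"
GCP_DATASET = "my_dataset"

BQMS_STAGING_BUCKET = "gs://my-bqms-staging-bucket"
SQL_TO_TRANSLATE_DIR = "teradata_sql/"
BQMS_OUTPUT_BUCKET = "gs://my-bqms-output-bucket"
BQMS_LOCAL_OUTPUT_DIR = "bqms_output/"
TRANSLATED_SQL_DIR = "bigquery_sql/"

CONFIG_BASE_PATH = "config/"
CONFIG_YAML_PATH = "config/config.yaml"
OBJECT_NAME_MAPPING_PATH = "config/name_mapping.json"
```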