A Python package to manage Google Cloud Data Catalog Tag export scripts.
Disclaimer: This is not an officially supported Google product.
- Executing in Cloud Shell
- 1. Environment setup
- 2. Export Tags to CSV file
# Set your service account credentials; for instructions, see 1.3. Auth credentials
# This file name is just a suggestion; feel free to follow your own naming conventions
export GOOGLE_APPLICATION_CREDENTIALS=~/datacatalog-tag-exporter-sa.json
# Install datacatalog-tag-exporter
pip3 install datacatalog-tag-exporter --user
# Add to your PATH
export PATH=~/.local/bin:$PATH
# Look for available commands
datacatalog-tag-exporter --help
Using virtualenv is optional, but strongly recommended unless you use Docker.
git clone https://github.com/mesmacosta/datacatalog-tag-exporter
cd ./datacatalog-tag-exporter
All paths starting with ./ in the next steps are relative to the datacatalog-tag-exporter folder.
pip install --upgrade virtualenv
python3 -m virtualenv --python python3 env
source ./env/bin/activate
pip install --upgrade .
Docker may be used as an alternative to run the script. In this case, please disregard the Virtualenv setup instructions.
- Data Catalog Admin
This file name is just a suggestion; feel free to follow your own naming conventions.
./credentials/datacatalog-tag-exporter-sa.json
This step may be skipped if you're using Docker.
export GOOGLE_APPLICATION_CREDENTIALS=~/credentials/datacatalog-tag-exporter-sa.json
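Before running the exporter, it can help to confirm that the variable points at a readable service account key. The following is a minimal sketch, not part of the package: `check_credentials` is a hypothetical helper, and it only checks that the file exists, parses as JSON, and declares `"type": "service_account"`.

```python
import json
import os
from pathlib import Path


def check_credentials(env=os.environ):
    """Return an error message, or None if the key file looks usable.

    Hypothetical helper for illustration; it does not contact Google Cloud.
    """
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        return "GOOGLE_APPLICATION_CREDENTIALS is not set"

    key_file = Path(path).expanduser()
    if not key_file.is_file():
        return f"key file not found: {key_file}"

    try:
        info = json.loads(key_file.read_text())
    except json.JSONDecodeError:
        return f"key file is not valid JSON: {key_file}"

    # Service account key files carry a top-level "type" of "service_account".
    if not isinstance(info, dict) or info.get("type") != "service_account":
        return f"key file is not a service account key: {key_file}"
    return None


if __name__ == "__main__":
    print(check_credentials() or "credentials look usable")
```

A passing check does not prove that the account has the Data Catalog Admin role; it only catches a missing or malformed key file early.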
A summary file with statistics about each template is also created in the same directory.
The columns for the summary file are described as follows:
Column | Description |
---|---|
template_name | Resource name of the Tag Template for the Tag. |
tags_count | Number of tags found from the template. |
tagged_entries_count | Number of tagged entries with the template. |
tagged_columns_count | Number of tagged columns with the template. |
tag_string_fields_count | Number of used String fields on tags of the template. |
tag_bool_fields_count | Number of used Bool fields on tags of the template. |
tag_double_fields_count | Number of used Double fields on tags of the template. |
tag_timestamp_fields_count | Number of used Timestamp fields on tags of the template. |
tag_enum_fields_count | Number of used Enum fields on tags of the template. |
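As a rough illustration of how the summary file could be consumed, the sketch below totals the used fields per template. It assumes the file is comma-delimited with a header row matching the column names above; the sample rows are made up, and `summarize` is a hypothetical helper, not part of the package.

```python
import csv
import io

# Made-up sample rows; only the column names come from the table above.
SAMPLE_SUMMARY = """\
template_name,tags_count,tagged_entries_count,tagged_columns_count,tag_string_fields_count,tag_bool_fields_count,tag_double_fields_count,tag_timestamp_fields_count,tag_enum_fields_count
projects/my-project/locations/us-central1/tagTemplates/my-template,3,2,1,4,1,0,0,2
projects/my-project/locations/us-central1/tagTemplates/my-template-2,5,5,0,5,0,3,1,0
"""

FIELD_COUNT_COLUMNS = [
    "tag_string_fields_count",
    "tag_bool_fields_count",
    "tag_double_fields_count",
    "tag_timestamp_fields_count",
    "tag_enum_fields_count",
]


def summarize(csv_file):
    """Return {template_name: (tags_count, total_used_fields)}."""
    result = {}
    for row in csv.DictReader(csv_file):
        used_fields = sum(int(row[column]) for column in FIELD_COUNT_COLUMNS)
        result[row["template_name"]] = (int(row["tags_count"]), used_fields)
    return result


totals = summarize(io.StringIO(SAMPLE_SUMMARY))
for name, (tags, fields) in totals.items():
    print(f"{name}: {tags} tags, {fields} used fields")
```

To read a real export, replace the `io.StringIO` sample with an opened file, for example `open(path, newline="")`.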
The columns for each template file are described as follows:
Column | Description |
---|---|
relative_resource_name | Full resource name of the asset the Entry refers to. |
linked_resource | Full name of the asset the Entry refers to. |
template_name | Resource name of the Tag Template for the Tag. |
tag_name | Resource name of the Tag. |
column | Column of the Entry schema that the Tag is attached to, if any. |
field_id | ID of the Tag field. |
field_type | Type of the Tag field. |
field_value | Value of the Tag field. |
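The columns above suggest one row per Tag field, so a Tag with several fields would span several rows. Under that assumption, and assuming a comma-delimited file with a header row where every row repeats `tag_name`, the rows could be regrouped into one dictionary per Tag as sketched below. The sample values and the `group_fields_by_tag` helper are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Made-up sample rows; only the column names come from the table above.
SAMPLE_TEMPLATE_FILE = """\
relative_resource_name,linked_resource,template_name,tag_name,column,field_id,field_type,field_value
entry-1,resource-1,template-1,tag-1,,owner,STRING,alice
entry-1,resource-1,template-1,tag-1,,verified,BOOL,True
entry-1,resource-1,template-1,tag-2,email,has_pii,BOOL,True
"""


def group_fields_by_tag(csv_file):
    """Return {tag_name: {field_id: field_value}} with values kept as strings."""
    tags = defaultdict(dict)
    for row in csv.DictReader(csv_file):
        tags[row["tag_name"]][row["field_id"]] = row["field_value"]
    return dict(tags)


tags = group_fields_by_tag(io.StringIO(SAMPLE_TEMPLATE_FILE))
for tag_name, fields in tags.items():
    print(tag_name, fields)
```

Values are left as strings here; converting them according to `field_type` is a separate step that depends on how the exporter serializes each type.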
- Python + virtualenv
datacatalog-tag-exporter tags export --project-ids my-project --dir-path DIR_PATH
- Python + virtualenv
datacatalog-tag-exporter tags export --project-ids my-project \
--dir-path DIR_PATH \
--tag-templates-names projects/my-project/locations/us-central1/tagTemplates/my-template,\
projects/my-project/locations/us-central1/tagTemplates/my-template-2
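When the list of templates is long, assembling the comma-separated value by hand is error-prone. The sketch below builds the same command line from a Python list, using only the flags shown above. `build_export_command` is a hypothetical helper, and `./output` stands in for DIR_PATH.

```python
import shlex


def build_export_command(project_ids, dir_path, template_names=None):
    """Build the argument list for `datacatalog-tag-exporter tags export`.

    project_ids is passed through unchanged as a single string.
    """
    args = [
        "datacatalog-tag-exporter", "tags", "export",
        "--project-ids", project_ids,
        "--dir-path", dir_path,
    ]
    if template_names:
        # The CLI example above passes templates as one comma-separated value.
        args += ["--tag-templates-names", ",".join(template_names)]
    return args


command = build_export_command(
    "my-project",
    "./output",  # hypothetical output directory
    [
        "projects/my-project/locations/us-central1/tagTemplates/my-template",
        "projects/my-project/locations/us-central1/tagTemplates/my-template-2",
    ],
)
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)`, which avoids shell quoting issues entirely.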