From f3886f7c9bc1b3f11cc5a76818e24ec0f575686e Mon Sep 17 00:00:00 2001
From: Liran Bareket
Date: Fri, 29 Sep 2023 14:37:14 -0400
Subject: [PATCH] Ported internal document about UCX persistence schema (#345)

Generated the table persistence document in docs
---
 docs/table_persistence.md | 120 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 120 insertions(+)
 create mode 100644 docs/table_persistence.md

diff --git a/docs/table_persistence.md b/docs/table_persistence.md
new file mode 100644
index 0000000000..461f0b2ac0
--- /dev/null
+++ b/docs/table_persistence.md
@@ -0,0 +1,120 @@
+# UCX Persistence
+
+Enumeration of all UCX persistence elements
+
+## Overview
+Table Utilization:
+
+| Table              | Generate Assessment | Migrate Local Groups | Migrate External Tables | Migrate SQL Warehouses | Upgrade Jobs | Migrate managed tables |
+|--------------------|---------------------|----------------------|-------------------------|------------------------|--------------|------------------------|
+| tables             | RW                  |                      | RO                      |                        |              | RO                     |
+| grants             |                     |                      | RW                      |                        |              | RW                     |
+| mounts             | RW                  |                      | RO                      |                        | RO           | RO                     |
+| permissions        |                     | RW                   |                         |                        |              |                        |
+| jobs               | RW                  |                      |                         |                        | RO           |                        |
+| clusters           | RW                  |                      |                         |                        |              |                        |
+| external_locations | RW                  |                      | RO                      |                        |              |                        |
+| workspace          | RW                  | RO                   |                         |                        | RO           |                        |
+
+**RW** - Read/Write: the job generates the table
+**RO** - Read Only
+
+### Inventory Database
+#### _$inventory_.tables
+
+Holds an inventory of all tables in all databases, with their relevant metadata.
+
+| Column | Datatype | Description | Comments |
+|-----------|----------|-------------|----------|
+| catalog | string | Original catalog of the table. _hive_metastore_ by default |
+| database | string | Original schema of the table |
+| name | string | Name of the table |
+| object_type | string | MANAGED, EXTERNAL, or VIEW |
+| table_format | string | Table provider, such as delta, json, or parquet |
+| location | string | Location of the data for the table |
+| view_text | nullable string | If the table is a view, this column holds the definition of the view |
+
+
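As a reading aid, the columns of _$inventory_.tables can be mirrored in a small Python dataclass. This is only a sketch: the class name `TableRecord`, the `key` and `is_view` helpers, and the sample values are hypothetical and are not taken from the UCX code base. Only the field names and datatypes come from the table above, and treating `location` as optional (for views) is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TableRecord:
    """Hypothetical mirror of one row of $inventory.tables."""

    catalog: str                      # "hive_metastore" by default
    database: str                     # original schema of the table
    name: str                         # table name
    object_type: str                  # MANAGED, EXTERNAL, or VIEW
    table_format: str                 # provider, e.g. delta, json, parquet
    location: Optional[str] = None    # assumption: may be absent for views
    view_text: Optional[str] = None   # view definition, only for views

    @property
    def key(self) -> str:
        """Three-part name built from catalog, database, and table name."""
        return f"{self.catalog}.{self.database}.{self.name}"

    @property
    def is_view(self) -> bool:
        return self.object_type == "VIEW"


# Sample rows with made-up values, for illustration only.
orders = TableRecord(
    catalog="hive_metastore",
    database="sales",
    name="orders",
    object_type="MANAGED",
    table_format="delta",
    location="dbfs:/user/hive/warehouse/sales.db/orders",
)
recent = TableRecord(
    catalog="hive_metastore",
    database="sales",
    name="recent_orders",
    object_type="VIEW",
    table_format="delta",
    view_text="SELECT * FROM sales.orders",
)
```

Keeping the record frozen makes it hashable, which is convenient when rows are used as dictionary keys or collected into sets during a crawl.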
+
+#### _$inventory_.table_failures
+Holds failures that occurred while crawling HMS tables
+
+| Column | Datatype | Description | Comments |
+|-----------|----------|-------------|----------|
+|catalog|string|Original catalog of the table. hive_metastore by default|
+|database|string|Original schema of the table|
+|name|string|Name of the table|
+|failures|string|Exception message context|
+
+
+
+#### _$inventory_.grants
+Inventory of all table ACLs for tables indexed in the _$inventory_.tables table.
+
+| Column | Datatype | Description | Comments |
+|-----------|----------|-------------|----------|
+|principal|string|User name, group name, or service principal name|
+|action_type|string|Name of the GRANT action|
+|catalog|string|Original catalog of the table. hive_metastore by default|
+|database|nullable string|Original schema of the table|
+|table|nullable string|Name of the table|
+|view|nullable string|Name of the view|
+|any_file|bool|Whether the grant applies to ANY FILE|
+|anonymous_function|string|Grant for the anonymous function|
+
+
+
+#### _$inventory_.mounts
+List of DBFS mount points.
+
+| Column | Datatype | Description | Comments |
+|-----------|----------|-------------|----------|
+|name|string|Name of the mount point|
+|source|string|Location of the backing dataset|
+|instance_profile|nullable string|This mount point is accessible only with this AWS IAM instance profile|
+
+
+
+#### _$inventory_.permissions
+Workspace object level permissions
+
+| Column | Datatype | Description | Comments |
+|-----------|----------|-------------|----------|
+|object_id|string|Either: Group ID, Workspace Object ID, Redash Object ID, or Scope name|
+|supports|string|One of: AUTHORIZATION, CLUSTERS, CLUSTER_POLICIES, DIRECTORIES, EXPERIMENTS, FILES, INSTANCE_POOLS, JOBS, NOTEBOOKS, PIPELINES, REGISTERED_MODELS, REPOS, SERVING_ENDPOINTS, SQL_WAREHOUSES|
+|raw_object_permissions|JSON|JSON-serialized response of: Generic Permissions, Secret ACL, Group roles and entitlements, or Redash permissions|
+
+
+
+#### _$inventory_.jobs
+Holds a list of all jobs, annotated with potential issues.
+
+| Column | Datatype | Description | Comments |
+|-----------|----------|-------------|----------|
+|job_id|string|Job ID|
+|job_name|string|Job name|
+|job_creator|string|User ID of the job creator|
+|compatible|int|1 or 0, used for percentage reporting|
+|failures|string|List of issues identified by the assessment, in JSON format|
+
+
+#### _$inventory_.clusters
+Holds a list of all clusters, annotated with potential issues.
+
+| Column | Datatype | Description | Comments |
+|-----------|----------|-------------|----------|
+|cluster_id|string|Cluster ID|
+|cluster_name|string|Cluster name|
+|cluster_creator|string|User ID of the cluster creator|
+|compatible|int|1 or 0, used for percentage reporting|
+|failures|string|List of issues identified by the assessment, in JSON format|
+
+
+#### _$inventory_.external_locations
+Holds a list of all external locations that will be required for the migration.
+
+| Column | Datatype | Description | Comments |
+|-----------|----------|-------------|----------|
+|external_location|string|External Location URL|
+
+
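The `compatible` and `failures` columns of the jobs and clusters tables lend themselves to a short worked example. The sketch below is illustrative only: the function names and sample rows are hypothetical, and it assumes that `failures` holds a JSON array of strings, which the schema above suggests but does not state explicitly.

```python
import json
from typing import Iterable, List, Mapping


def compatibility_percent(rows: Iterable[Mapping]) -> float:
    """Share of rows whose `compatible` flag is 1, as a percentage."""
    rows = list(rows)
    if not rows:
        return 0.0
    return 100.0 * sum(int(r["compatible"]) for r in rows) / len(rows)


def parse_failures(raw: str) -> List[str]:
    """Decode the `failures` column; an empty string means no issues."""
    return json.loads(raw) if raw else []


# Made-up rows shaped like $inventory.jobs, for illustration only.
sample_rows = [
    {"job_id": "1", "job_name": "nightly", "compatible": 1, "failures": "[]"},
    {"job_id": "2", "job_name": "legacy", "compatible": 0,
     "failures": '["unsupported configuration"]'},
]

percent = compatibility_percent(sample_rows)
issues = [msg for row in sample_rows for msg in parse_failures(row["failures"])]
```

Storing `compatible` as an integer rather than a boolean means the same percentage can be computed in SQL with a plain average over the column.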