Added a column to $inventory.tables to specify if a table might have been synchronised to Unity Catalog already or not #306

Merged Oct 3, 2023 (17 commits).
122 changes: 122 additions & 0 deletions docs/table_persistence.md
# UCX Persistence

Enumeration of all UCX persistence elements

## Overview
Table Utilization:

| Table | Generate Assessment | Migrate Local Groups | Migrate External Tables | Migrate SQL Warehouses | Upgrade Jobs | Migrate managed tables |
|--------------------|---------------------|----------------------|-------------------------|------------------------|--------------|------------------------|
| tables | RW | | RO | | | RO |
| grants | | | RW | | | RW |
| mounts | RW | | RO | | RO | RO |
| permissions | | RW | | | | |
| jobs | RW | | | | RO | |
| clusters | RW |
| external_locations | RW | | RO |
| workspace | RW | RO | | | RO |

**RW** - the job reads and writes the table (it generates it)
**RO** - the job only reads the table

### Inventory Database
#### _$inventory_.tables

Holds Inventory of all tables in all databases and their relevant metadata.

| Column         | Datatype        | Description | Comments |
|----------------|-----------------|-------------|----------|
| catalog        | string          | Original catalog of the table. _hive_metastore_ by default | |
| database       | string          | Original schema of the table | |
| name           | string          | Name of the table | |
| object_type    | string          | MANAGED, EXTERNAL, or VIEW | |
| table_format   | string          | Table provider, e.g. delta, json, or parquet | |
| location       | string          | Location of the table data | |
| view_text      | nullable string | If the table is a view, this column holds the view definition | |
| upgrade_status | int             | 0 - not upgraded (default), 1 - upgraded | |
| upgrade_target | string          | Upgrade target (three-level namespace) | |

<br/>
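The `upgrade_status` and `upgrade_target` columns map onto the `Table` dataclass in `databricks.labs.ucx.hive_metastore.tables`, which this PR extends. A minimal sketch of how a row might be modelled — the field names follow the table above, while the example values and the simplified defaults are assumptions, not the exact UCX implementation:

```python
from dataclasses import dataclass

@dataclass
class Table:
    catalog: str
    database: str
    name: str
    object_type: str       # MANAGED, EXTERNAL, or VIEW
    table_format: str      # e.g. delta, json, parquet
    location: str = None
    view_text: str = None
    upgrade_status: int = 0     # 0 = not upgraded (default), 1 = upgraded
    upgrade_target: str = None  # three-level namespace, e.g. "main.sales.orders"

# A crawled HMS table that has already been synchronised to Unity Catalog
# (hypothetical names):
t = Table("hive_metastore", "sales", "orders", "MANAGED", "delta",
          upgrade_status=1, upgrade_target="main.sales.orders")
```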

#### _$inventory_.table_failures
Holds failures that occurred while crawling HMS tables.

| Column | Datatype | Description | Comments |
|-----------|----------|-------------|----------|
|catalog|string|Original catalog of the table. hive_metastore by default|
|database|string|Original schema of the table|
|name|string|Name of the table|
|failures|string|Exception message context|

<br/>

#### _$inventory_.grants
Inventory of all table ACLs for tables indexed in the `tables` table.

| Column | Datatype | Description | Comments |
|-----------|----------|-------------|----------|
|principal|string|User name, group name, or service principal name|
|action_type|string|Name of the GRANT action|
|catalog|string|Original catalog of the table. hive_metastore by default|
|database|nullable string|Original schema of the table|
|table|nullable string|Name of the table|
|view|nullable string|Name of the view|
|any_file|bool|Whether the grant applies to ANY FILE|
|anonymous_function|string|Grant on the ANONYMOUS FUNCTION|

<br/>
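During table migration a grants row has to be rendered as a Unity Catalog `GRANT` statement. A hypothetical sketch of that mapping — the helper name and the object-precedence rules are illustrative, not the UCX implementation:

```python
def uc_grant_sql(principal: str, action_type: str, catalog: str,
                 database: str = None, table: str = None, view: str = None) -> str:
    """Render one grants row as a Unity Catalog GRANT statement (sketch).

    Assumption: a row names at most one securable; we pick the most
    specific one that is set (table, then view, then schema, then catalog).
    """
    if table:
        obj = f"TABLE {catalog}.{database}.{table}"
    elif view:
        obj = f"VIEW {catalog}.{database}.{view}"
    elif database:
        obj = f"SCHEMA {catalog}.{database}"
    else:
        obj = f"CATALOG {catalog}"
    return f"GRANT {action_type} ON {obj} TO `{principal}`"
```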

#### _$inventory_.mounts
List of DBFS mount points.

| Column | Datatype | Description | Comments |
|-----------|----------|-------------|----------|
|name|string|Name of the mount point|
|source|string|Location of the backing dataset|
|instance_profile|Nullable string|This mount point is accessible only with this AWS IAM instance profile|

<br/>
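The mounts inventory matters because HMS table locations often point at `dbfs:/mnt/...` paths, which must be rewritten to the cloud URI of the backing dataset before migration. A sketch of that rewrite, assuming a simple mount-name-to-source mapping (the real UCX logic may differ):

```python
def resolve_mount(location: str, mounts: dict) -> str:
    """Rewrite a dbfs:/mnt/... location to the backing cloud URI (sketch).

    Longest mount names are tried first so nested mount points win.
    Unmatched locations are returned unchanged.
    """
    for name, source in sorted(mounts.items(), key=lambda kv: -len(kv[0])):
        prefix = f"dbfs:{name}"
        if location.startswith(prefix):
            return source + location[len(prefix):]
    return location

# Hypothetical mounts-table content:
mounts = {"/mnt/raw": "s3://corp-raw-bucket"}
```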

#### _$inventory_.permissions
Workspace object level permissions

| Column | Datatype | Description | Comments |
|-----------|----------|-------------|----------|
|object_id|string|Either:<br/>Group ID<br/>Workspace Object ID<br/>Redash Object ID<br/>Scope name|
|supports|string|One of:<br/>AUTHORIZATION<br/>CLUSTERS<br/>CLUSTER_POLICIES<br/>DIRECTORIES<br/>EXPERIMENTS<br/>FILES<br/>INSTANCE_POOLS<br/>JOBS<br/>NOTEBOOKS<br/>PIPELINES<br/>REGISTERED_MODELS<br/>REPOS<br/>SERVING_ENDPOINTS<br/>SQL_WAREHOUSES|
|raw_object_permissions|JSON|JSON-serialized response of:<br/>Generic Permissions<br/>Secret ACL<br/>Group roles and entitlements<br/>Redash permissions|

<br/>
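The `raw_object_permissions` payload is stored as a JSON string and parsed back when permissions are replayed. An illustrative example of reading one row — the row values and the ACL field names here follow the shape of the Databricks generic permissions API but are assumptions for illustration:

```python
import json

# Hypothetical permissions row:
row = {
    "object_id": "/clusters/0000-000000-example0",
    "supports": "CLUSTERS",
    "raw_object_permissions": json.dumps({
        "access_control_list": [
            {"group_name": "eng",
             "all_permissions": [{"permission_level": "CAN_MANAGE"}]},
        ]
    }),
}

# Deserialize the stored JSON back into a usable ACL:
acl = json.loads(row["raw_object_permissions"])["access_control_list"]
```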

#### _$inventory_.jobs
Holds a list of all jobs with a notation of potential issues.

| Column | Datatype | Description | Comments |
|-----------|----------|-------------|----------|
|job_id|string|Job ID|
|job_name|string|Job Name|
|job_creator|string|UserID of the Job Creator|
|compatible|int|1 or 0, used for percentage reporting|
|failures|string|List of issues identified by the assessment in JSON format|


#### _$inventory_.clusters
Holds a list of all clusters with a notation of potential issues.

| Column | Datatype | Description | Comments |
|-----------|----------|-------------|----------|
|cluster_id|string|Cluster ID|
|cluster_name|string|Cluster Name|
|cluster_creator|string|UserID of the Cluster Creator|
|compatible|int|1 or 0, used for percentage reporting|
|failures|string|List of issues identified by the assessment in JSON format|
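Both the jobs and clusters tables carry a 1/0 `compatible` flag "used for percentage reporting". A minimal sketch of the computation that flag enables (the function name is an assumption, not a UCX API):

```python
def compatibility_pct(rows: list) -> float:
    """Share of assessed objects (jobs or clusters) with no UC blockers.

    Each row is a dict with a 0/1 "compatible" field, as in the
    $inventory.jobs and $inventory.clusters tables.
    """
    if not rows:
        return 100.0
    return 100.0 * sum(r["compatible"] for r in rows) / len(rows)
```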


#### _$inventory_.external_locations
Holds a list of all external locations that will be required for the migration.

| Column | Datatype | Description | Comments |
|-----------|----------|-------------|----------|
|external_location|string|External Location URL|
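One plausible way to populate this table is to collapse crawled table locations down to their parent folders as candidate external locations. A hedged sketch of that idea — the real UCX derivation may group prefixes differently:

```python
def external_location_candidates(locations: list) -> set:
    """Candidate external locations from table locations (sketch).

    Each table location is reduced to its parent folder; duplicates
    collapse naturally via the set.
    """
    return {loc.rstrip("/").rsplit("/", 1)[0] for loc in locations if loc}
```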


2 changes: 2 additions & 0 deletions src/databricks/labs/ucx/hive_metastore/tables.py
@@ -22,6 +22,8 @@ class Table:

location: str = None
view_text: str = None
upgrade_status: int = 0
upgrade_target: str = None

@property
def is_delta(self) -> bool:
6 changes: 4 additions & 2 deletions src/databricks/labs/ucx/hive_metastore/tables.scala
@@ -7,7 +7,7 @@ import org.apache.spark.sql.DataFrame

// must follow the same structure as databricks.labs.ucx.hive_metastore.tables.Table
case class TableDetails(catalog: String, database: String, name: String, object_type: String,
table_format: String, location: String, view_text: String)
table_format: String, location: String, view_text: String, upgrade_status: Int, upgrade_target: String)

// recording error log in the database
case class TableError(catalog: String, database: String, name: String, error: String)
@@ -36,8 +36,10 @@ def metadataForAllTables(databases: Seq[String], queue: ConcurrentLinkedQueue[Ta
failures.add(TableError("hive_metastore", databaseName, tableName, "result is null"))
None
} else {
val upgrade_to = table.properties.get("upgraded_to")
Some(TableDetails("hive_metastore", databaseName, tableName, table.tableType.name, table.provider.orNull,
table.storage.locationUri.map(_.toString).orNull, table.viewText.orNull))
table.storage.locationUri.map(_.toString).orNull, table.viewText.orNull,
if (upgrade_to.isDefined) 1 else 0, upgrade_to.getOrElse("")))
}
} catch {
case err: Throwable =>