Expanded end-user documentation with detailed descriptions for workflows and commands (#999)

The Databricks Labs UCX project has been updated with several new
features to assist in upgrading to Unity Catalog. These include various
workflows and command-line utilities, such as an assessment workflow
that generates a detailed compatibility report for workspace entities
and a group migration workflow to upgrade all Databricks workspace
assets. Additionally, new utility commands have been added for managing
cross-workspace installations, and users can now view the status of
deployed workflows and repair failed ones. New end-user documentation
has also been introduced, featuring comprehensive descriptions of
workflows and commands, as well as an assessment report image. The
Assessment Report generated by UCX now includes a more detailed summary
of the assessment findings, table counts, database summaries, and
external locations. This release also includes improved documentation
for external Hive Metastore integration and a new debugging notebook.
Lastly, the workspace group migration feature has been expanded to
handle potential conflicts when migrating multiple workspaces with
locally scoped group names.
nfx authored Mar 4, 2024
1 parent 906e187 commit 06b7f5a
Showing 10 changed files with 1,054 additions and 340 deletions.
638 changes: 514 additions & 124 deletions README.md
Binary file added docs/assessment-report.png
197 changes: 186 additions & 11 deletions docs/assessment.md
Binary file added docs/debug-logs.png
Binary file added docs/debug-notebook.png
37 changes: 24 additions & 13 deletions docs/external_hms_glue.md
@@ -1,22 +1,27 @@
# External HMS and Glue Integration
External Hive Metastore Integration
===

### TL;DR
<!-- TOC -->
* [External Hive Metastore Integration](#external-hive-metastore-integration)
* [Current External HMS Integration](#current-external-hms-integration)
* [Manual Setup/Override](#manual-setupoverride)
* [Challenges and Gotchas](#challenges-and-gotchas)
<!-- TOC -->

The UCX toolkit by default relies on the internal workspace HMS as a source for tables and views.
<br/>The UCX is set up to run and introspect a single HMS.
<br/>The installer is looking for evidence of an external Metastore (Glue and Others)
<br/>If we find an external metastore we allow the user to use this configuration for UCX.
- is set up to run and introspect a single HMS.
- The installer is looking for evidence of an external Metastore (Glue and Others)
- If we find an external metastore we allow the user to use this configuration for UCX.

### Current External HMS Integration
# Current External HMS Integration

To integrate with an External Metastore we need to configure the job clusters we generate.
<br/> The setup process follows the following steps
To integrate with an External Metastore we need to configure the job clusters we generate. The setup process follows the following steps

- We are list the existing cluster policies and look for an evidence of External Metastore
-- Spark config `spark.databricks.hive.metastore.glueCatalog.enabled=true`
-- Spark config containing `spark.sql.hive.metastore`
- If we find evidence of external metastore we prompt the user with the following message:<br/>
_We have identified one or more cluster policies set up for an external metastore. <br/>
- If we find evidence of external metastore we prompt the user with the following message:
_We have identified one or more cluster policies set up for an external metastore.
Would you like to set UCX to connect to the external metastore._
- Selecting **Yes** will display a list of the matching policies and allow the user to select the proper one.
- We copy the Instance Profile and the spark configuration parameters from the cluster policy and apply these to the job
@@ -25,7 +30,9 @@ To integrate with an External Metastore we need to configure the job clusters we
Metastore, the Dashboard will fail.
- DBSQL Warehouse settings are global to the workspace and cannot be set individually on a single warehouse.

### Manual Setup/Override
[[back to top](#external-hive-metastore-integration)]

# Manual Setup/Override

If the workspace doesn't have a cluster policy that is set up for External Metastore, there are two options to set UCX
with External Metastore:
@@ -52,10 +59,14 @@ with External Metastore:
Clusters before running the workflows.
- Set up the DBSQL warehouses for the External Metastore

### Challenges and Gotchas
[[back to top](#external-hive-metastore-integration)]

# Challenges and Gotchas

- UCX is currently designed to run on a single workspace at a time.
- If you run UCX on multiple workspace leveraging the same metastore, follow the following guidelines:
-- Use a different inventory database name for each of the workspaces. Otherwise, they will override one another.
-- Migrate the table once. Running table migration (when it will become available) from multiple workspaces is
redundant.

[[back to top](#external-hive-metastore-integration)]
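The detection step described in the hunk above (look for `spark.databricks.hive.metastore.glueCatalog.enabled=true` or any setting containing `spark.sql.hive.metastore`, then copy the instance profile and Spark configuration) can be illustrated with a minimal Python sketch. It is not UCX's implementation. It assumes the input is a cluster policy definition in JSON form, where Spark settings are addressed as `spark_conf.`-prefixed keys and the instance profile as `aws_attributes.instance_profile_arn`, and it only copies rules that pin a concrete `value`. The function name `extract_external_hms` is hypothetical.

```python
import json
from typing import Optional

# Setting names taken from the detection rules above. Cluster policy
# definitions address Spark settings with a "spark_conf." prefix.
GLUE_FLAG = "spark.databricks.hive.metastore.glueCatalog.enabled"
HMS_MARKER = "spark.sql.hive.metastore"
INSTANCE_PROFILE_KEY = "aws_attributes.instance_profile_arn"
SPARK_CONF_PREFIX = "spark_conf."


def extract_external_hms(policy_definition: str) -> Optional[dict]:
    """Return settings to copy to job clusters, or None if no external metastore is found.

    Hypothetical helper for illustration only; not UCX's actual code.
    """
    definition = json.loads(policy_definition)

    # Collect Spark settings from rules that carry a concrete "value"
    # (e.g. rules of type "fixed"); other rule shapes are ignored here.
    spark_conf = {}
    for key, rule in definition.items():
        if key.startswith(SPARK_CONF_PREFIX) and isinstance(rule, dict) and "value" in rule:
            spark_conf[key[len(SPARK_CONF_PREFIX):]] = str(rule["value"])

    # Evidence 1: Glue catalog enabled. str(True).lower() == "true", so both
    # a JSON boolean and the string "true" are accepted.
    uses_glue = spark_conf.get(GLUE_FLAG, "").lower() == "true"
    # Evidence 2: any Spark setting whose name contains spark.sql.hive.metastore.
    uses_hms = any(HMS_MARKER in name for name in spark_conf)
    if not (uses_glue or uses_hms):
        return None

    result = {"spark_conf": spark_conf}
    profile_rule = definition.get(INSTANCE_PROFILE_KEY)
    if isinstance(profile_rule, dict) and "value" in profile_rule:
        result["instance_profile_arn"] = profile_rule["value"]
    return result
```

Only rules with a fixed `value` are copied; a policy that merely constrains a setting (for example with an allowlist) would need a separate decision about which value to apply, which this sketch leaves out.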
33 changes: 16 additions & 17 deletions docs/group_name_conflict.md
@@ -1,33 +1,32 @@
# Group Name Conflict Resolution
Group Name Conflict Resolution
===

See [this document](local-group-migration.md) for workspace group migration.

During the UC upgrade process we migrate all the local workspace group to account level group.
The process is detailed here: [local-group-migration.md](local-group-migration.md)
<br/>
When migrating multiple workspaces we can run into conflicts.
These conflicts occur when groups with the same name in different workspaces have different membership and different
use.

## Suggested Workflow

During the installation process we pose the following question:
<br/>
"Do you need to rename the workspace groups to match the account groups' name?"
During the installation process we pose the following question: `Do you need to rename the workspace groups to match the account groups' name?`

If the answer is "Yes" a follow-up question will be:
<br/>
"Choose How to rename the workspace groups:"

1. Apply a Prefix
2. Apply a Suffix
3. Use Regular Expression Substitution
4. User Regular Expression to extract a value from the account and the workspace
5. Map using External Group ID
```text
Choose how to map the workspace groups:
[0] Match by Name
[1] Apply a Prefix
[2] Apply a Suffix
[3] Match by External ID
[4] Regex Substitution
[5] Regex Matching
Enter a number between 0 and 5:
```

The user then input the Prefix/Suffix/Regular Expression.
The installation process will validate the regular expression.
The installation process will register the selection as regular expression in the configuration YAML file.

We introduce 3 more parameters to the configuration yaml and the group manager:
We introduce 3 more parameters to the [configuration](../README.md#open-remote-config-command) and the group manager:

- workspace_group_regex
- workspace_group_replace
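The renaming options in the hunk above can be illustrated with a short Python sketch. Following the diff, it assumes each choice is stored as a regular expression plus a replacement (the `workspace_group_regex` and `workspace_group_replace` parameters). The helper names `rename_rule`, `rename`, and `match_by_regex`, the strategy strings, the `^`/`$` encoding of prefixes and suffixes, and the two separate patterns used for regex matching are all hypothetical illustrations, not UCX's code. Matching by name and by external ID are omitted.

```python
import re


def _literal(text: str) -> str:
    """Escape backslashes so re.sub inserts the text verbatim."""
    return text.replace("\\", "\\\\")


def rename_rule(strategy: str, value: str, replace: str = "") -> tuple[str, str]:
    """Encode a rename choice as a hypothetical (regex, replace) pair."""
    if strategy == "prefix":
        return "^", _literal(value)  # empty match at the start of the name
    if strategy == "suffix":
        return "$", _literal(value)  # empty match at the end of the name
    if strategy == "substitution":
        re.compile(value)  # validation step: raises re.error for an invalid pattern
        return value, replace
    raise ValueError(f"unsupported strategy: {strategy!r}")


def rename(workspace_group: str, rule: tuple[str, str]) -> str:
    """Apply a stored (regex, replace) rule to a workspace group name."""
    regex, replace = rule
    return re.sub(regex, replace, workspace_group)


def match_by_regex(workspace_groups, account_groups, workspace_pattern, account_pattern):
    """Pair workspace and account groups whose first capture group is equal."""
    ws_re, acct_re = re.compile(workspace_pattern), re.compile(account_pattern)
    account_by_key = {}
    for name in account_groups:
        found = acct_re.search(name)
        if found:
            account_by_key[found.group(1)] = name
    pairs = {}
    for name in workspace_groups:
        found = ws_re.search(name)
        if found and found.group(1) in account_by_key:
            pairs[name] = account_by_key[found.group(1)]
    return pairs
```

Encoding a prefix or suffix as a substitution on the empty match `^` or `$` lets all three rename options share one `re.sub` code path, which is one plausible reason to store every selection as a regular expression.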
