
Added databricks labs ucx cluster-remap command to remap legacy cluster configurations to UC-compatible #994

Merged 34 commits into main on Mar 20, 2024

Conversation

prajin-29 (Contributor) commented Mar 1, 2024

This PR adds the functionality for migrating cluster configurations to UC. The ClusterAccess class has several methods:

  • list_clusters: This method lists all the clusters that are not associated with a job in the Databricks workspace.
  • _get_access_mode: This method maps data security mode values to the DataSecurityMode enum.
  • map_cluster_to_uc: This method edits a given cluster to make it compatible with UC by updating properties such as the access mode and Spark version. It also saves a backup of the original cluster configuration before editing. If an error occurs during editing, the method skips the cluster and logs a warning.
  • revert_cluster_remap: This method restores the original configuration of a cluster that was previously edited by the map_cluster_to_uc method. It loads the backup configuration from a JSON file and applies it to the cluster. If an error occurs during the restoration process, the method skips the cluster and logs a warning message.

The map_cluster_to_uc method takes care of editing the cluster configurations, while revert_cluster_remap can be used to restore the original configurations if required, making this a useful feature for migrating legacy clusters to UC.

The ClusterAccess class uses the WorkspaceClient class from the databricks.sdk module to interact with the Databricks workspace and retrieve information about the clusters. A test suite is added to check the functionality of the map_cluster_to_uc, list_clusters, and revert_cluster_remap methods, as well as error-handling scenarios. This class is the backbone of the new feature, which allows managing and migrating cluster configurations to UC.
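The backup-then-edit flow described above can be sketched with a plain in-memory stand-in. Everything here is a hypothetical illustration: FakeClusters and the backup dict stand in for the real databricks-sdk WorkspaceClient and Installation APIs, and the method bodies only mirror the described behavior, not the PR's actual code.

```python
import json
import logging

logger = logging.getLogger("ucx.sketch")


class FakeClusters:
    """In-memory stand-in for ws.clusters: maps cluster_id -> config dict."""

    def __init__(self, configs):
        self._configs = configs

    def get(self, cluster_id):
        return dict(self._configs[cluster_id])

    def edit(self, cluster_id, **changes):
        self._configs[cluster_id].update(changes)


class ClusterAccessSketch:
    """Illustrative sketch of the backup-then-edit / revert pattern."""

    def __init__(self, clusters, backup_store):
        self._clusters = clusters
        self._backup = backup_store  # dict standing in for backup/clusters/*.json

    def map_cluster_to_uc(self, cluster_id):
        try:
            details = self._clusters.get(cluster_id)
            # save the original config before editing, so it can be reverted
            self._backup[f"backup/clusters/{cluster_id}.json"] = json.dumps(details)
            self._clusters.edit(cluster_id, data_security_mode="SINGLE_USER")
        except KeyError:
            # on error: skip the cluster and log a warning, as described above
            logger.warning("skipping %s: cluster not found", cluster_id)

    def revert_cluster_remap(self, cluster_id):
        try:
            original = json.loads(self._backup[f"backup/clusters/{cluster_id}.json"])
            self._clusters.edit(cluster_id, **original)
        except KeyError:
            logger.warning("skipping %s: no backup found", cluster_id)
```

Under these assumptions, remapping a cluster flips its security mode and writes a backup, and reverting restores the saved configuration.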

codecov bot commented Mar 1, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 89.14%. Comparing base (f4d5311) to head (23a1b68).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #994      +/-   ##
==========================================
+ Coverage   89.01%   89.14%   +0.13%     
==========================================
  Files          54       55       +1     
  Lines        6652     6732      +80     
  Branches     1197     1214      +17     
==========================================
+ Hits         5921     6001      +80     
  Misses        481      481              
  Partials      250      250              


github-actions bot commented Mar 1, 2024

❌ 111/112 passed, 2 flaky, 1 failed, 19 skipped, 1h55m35s total

❌ test_running_real_assessment_job: TimeoutError: timed out after 0:20:00: (21m24.81s)
TimeoutError: timed out after 0:20:00:
19:15 INFO [databricks.labs.ucx.mixins.fixtures] Schema hive_metastore.ucx_sad1i: https://DATABRICKS_HOST/explore/data/hive_metastore/ucx_sad1i
19:15 DEBUG [databricks.labs.ucx.mixins.fixtures] added schema fixture: SchemaInfo(browse_only=None, catalog_name='hive_metastore', catalog_type=None, comment=None, created_at=None, created_by=None, effective_predictive_optimization_flag=None, enable_predictive_optimization=None, full_name='hive_metastore.ucx_sad1i', metastore_id=None, name='ucx_sad1i', owner=None, properties=None, storage_location=None, storage_root=None, updated_at=None, updated_by=None)
[gw3] linux -- Python 3.10.13 /home/runner/work/ucx/ucx/.venv/bin/python
19:15 DEBUG [databricks.labs.ucx.mixins.fixtures] added workspace user fixture: User(active=True, display_name='sdk-vamv@example.com', emails=[ComplexValue(display=None, primary=True, ref=None, type='work', value='sdk-vamv@example.com')], entitlements=[], external_id=None, groups=[], id='4293340815517162', name=Name(family_name=None, given_name='sdk-vamv@example.com'), roles=[], schemas=[<UserSchema.URN_IETF_PARAMS_SCIM_SCHEMAS_CORE_2_0_USER: 'urn:ietf:params:scim:schemas:core:2.0:User'>, <UserSchema.URN_IETF_PARAMS_SCIM_SCHEMAS_EXTENSION_WORKSPACE_2_0_USER: 'urn:ietf:params:scim:schemas:extension:workspace:2.0:User'>], user_name='sdk-vamv@example.com')
19:15 INFO [databricks.labs.ucx.mixins.fixtures] Workspace group ucx_6sUD: https://DATABRICKS_HOST#setting/accounts/groups/471845917123931
19:15 DEBUG [databricks.labs.ucx.mixins.fixtures] added workspace group fixture: Group(display_name='ucx_6sUD', entitlements=[ComplexValue(display=None, primary=None, ref=None, type=None, value='allow-cluster-create')], external_id=None, groups=[], id='471845917123931', members=[ComplexValue(display='sdk-vamv@example.com', primary=None, ref='Users/4293340815517162', type=None, value='4293340815517162')], meta=ResourceMeta(resource_type='WorkspaceGroup'), roles=[], schemas=[<GroupSchema.URN_IETF_PARAMS_SCIM_SCHEMAS_CORE_2_0_GROUP: 'urn:ietf:params:scim:schemas:core:2.0:Group'>])
19:15 INFO [databricks.labs.ucx.mixins.fixtures] Account group ucx_6sUD: https://accounts.CLOUD_ENVdatabricks.net/users/groups/89168514476100/members
19:15 DEBUG [databricks.labs.ucx.mixins.fixtures] added account group fixture: Group(display_name='ucx_6sUD', entitlements=[], external_id=None, groups=[], id='89168514476100', members=[ComplexValue(display='sdk-vamv@example.com', primary=None, ref='Users/4293340815517162', type=None, value='4293340815517162')], meta=None, roles=[], schemas=[<GroupSchema.URN_IETF_PARAMS_SCIM_SCHEMAS_CORE_2_0_GROUP: 'urn:ietf:params:scim:schemas:core:2.0:Group'>])
19:15 INFO [databricks.labs.ucx.mixins.fixtures] Cluster policy: https://DATABRICKS_HOST#setting/clusters/cluster-policies/view/0000E36F27AA1468
19:15 DEBUG [databricks.labs.ucx.mixins.fixtures] added cluster policy fixture: CreatePolicyResponse(policy_id='0000E36F27AA1468')
19:15 DEBUG [databricks.labs.ucx.mixins.fixtures] added cluster_policy permissions fixture: 0000E36F27AA1468 [group_name admins CAN_USE] -> [group_name ucx_6sUD CAN_USE]
19:15 DEBUG [databricks.labs.ucx.install] Cannot find previous installation: Path (/Users/0a330eb5-dd51-4d97-b6e4-c474356b1d5d/.pmLl/config.yml) doesn't exist.
19:15 INFO [databricks.labs.ucx.install] Please answer a couple of questions to configure Unity Catalog migration
19:15 INFO [databricks.labs.ucx.installer.hms_lineage] HMS Lineage feature creates one system table named system.hms_to_uc_migration.table_access and helps in your migration process from HMS to UC by allowing you to programmatically query HMS lineage data.
19:15 INFO [databricks.labs.ucx.install] Fetching installations...
19:15 INFO [databricks.labs.ucx.installer.policy] Creating UCX cluster policy.
19:15 INFO [databricks.labs.ucx.install] Installing UCX v0.17.1+4220240320191542
19:15 INFO [databricks.labs.ucx.install] Creating dashboards...
19:15 INFO [databricks.labs.ucx.install] Fetching warehouse_id from a config
19:15 DEBUG [databricks.labs.ucx.install] Creating jobs from tasks in main
19:15 DEBUG [databricks.labs.ucx.framework.dashboards] Reading step folder /home/runner/work/ucx/ucx/src/databricks/labs/ucx/queries/views...
19:15 DEBUG [databricks.labs.ucx.framework.dashboards] Reading step folder /home/runner/work/ucx/ucx/src/databricks/labs/ucx/queries/assessment...
19:15 DEBUG [databricks.labs.ucx.framework.dashboards] Reading dashboard folder /home/runner/work/ucx/ucx/src/databricks/labs/ucx/queries/assessment/estimates...
19:15 INFO [databricks.labs.ucx.framework.dashboards] Creating dashboard [PMLL] UCX  Assessment (Estimates)...
19:15 DEBUG [databricks.labs.ucx.framework.dashboards] Skipping query 01_0_group_migration.md because it's a text widget
19:15 DEBUG [databricks.labs.ucx.framework.dashboards] Skipping viz 01_0_group_migration.md because it's a text widget
19:15 DEBUG [databricks.labs.ucx.framework.dashboards] Skipping query 00_0_metastore_assignment.md because it's a text widget
19:15 DEBUG [databricks.labs.ucx.framework.dashboards] Skipping viz 00_0_metastore_assignment.md because it's a text widget
19:15 DEBUG [databricks.labs.ucx.framework.dashboards] Skipping query 02_0_data_modeling.md because it's a text widget
19:15 DEBUG [databricks.labs.ucx.framework.dashboards] Skipping viz 02_0_data_modeling.md because it's a text widget
19:15 DEBUG [databricks.labs.ucx.framework.dashboards] Skipping query 03_0_data_migration.md because it's a text widget
19:15 DEBUG [databricks.labs.ucx.framework.dashboards] Skipping viz 03_0_data_migration.md because it's a text widget
19:15 DEBUG [databricks.labs.ucx.framework.dashboards] Reading dashboard folder /home/runner/work/ucx/ucx/src/databricks/labs/ucx/queries/assessment/main...
19:15 INFO [databricks.labs.ucx.framework.dashboards] Creating dashboard [PMLL] UCX  Assessment (Main)...
19:16 INFO [databricks.labs.ucx.install] Fetching warehouse_id from a config
19:16 INFO [databricks.labs.ucx.install] Fetching warehouse_id from a config
19:16 INFO [databricks.labs.ucx.install] Creating new job configuration for step=assessment
19:16 INFO [databricks.labs.ucx.install] Creating new job configuration for step=remove-workspace-local-backup-groups
19:16 INFO [databricks.labs.ucx.install] Creating new job configuration for step=099-destroy-schema
19:16 INFO [databricks.labs.ucx.install] Creating new job configuration for step=validate-groups-permissions
19:16 INFO [databricks.labs.ucx.install] Creating new job configuration for step=migrate-tables
19:16 INFO [databricks.labs.ucx.install] Creating new job configuration for step=migrate-groups
19:16 DEBUG [databricks.labs.ucx.framework.dashboards] Reading dashboard folder /home/runner/work/ucx/ucx/src/databricks/labs/ucx/queries/assessment/CLOUD_ENV...
19:16 INFO [databricks.labs.ucx.framework.dashboards] Creating dashboard [PMLL] UCX  Assessment (Azure)...
19:16 INFO [databricks.labs.ucx.install] Installation completed successfully! Please refer to the https://DATABRICKS_HOST/#workspace/Users/0a330eb5-dd51-4d97-b6e4-c474356b1d5d/.pmLl/README for the next steps.
19:16 DEBUG [databricks.labs.ucx.install] starting assessment job: https://DATABRICKS_HOST#job/982808153637866
19:36 DEBUG [databricks.labs.ucx.mixins.fixtures] clearing 1 cluster_policy permissions fixtures
19:36 DEBUG [databricks.labs.ucx.mixins.fixtures] removing cluster_policy permissions fixture: 0000E36F27AA1468 [group_name admins CAN_USE] -> [group_name ucx_6sUD CAN_USE]
19:36 DEBUG [databricks.labs.ucx.mixins.fixtures] clearing 1 cluster policy fixtures
19:36 DEBUG [databricks.labs.ucx.mixins.fixtures] removing cluster policy fixture: CreatePolicyResponse(policy_id='0000E36F27AA1468')
19:36 DEBUG [databricks.labs.ucx.mixins.fixtures] clearing 1 workspace user fixtures
19:36 DEBUG [databricks.labs.ucx.mixins.fixtures] removing workspace user fixture: User(active=True, display_name='sdk-vamv@example.com', emails=[ComplexValue(display=None, primary=True, ref=None, type='work', value='sdk-vamv@example.com')], entitlements=[], external_id=None, groups=[], id='4293340815517162', name=Name(family_name=None, given_name='sdk-vamv@example.com'), roles=[], schemas=[<UserSchema.URN_IETF_PARAMS_SCIM_SCHEMAS_CORE_2_0_USER: 'urn:ietf:params:scim:schemas:core:2.0:User'>, <UserSchema.URN_IETF_PARAMS_SCIM_SCHEMAS_EXTENSION_WORKSPACE_2_0_USER: 'urn:ietf:params:scim:schemas:extension:workspace:2.0:User'>], user_name='sdk-vamv@example.com')
19:36 DEBUG [databricks.labs.ucx.mixins.fixtures] clearing 1 account group fixtures
19:36 DEBUG [databricks.labs.ucx.mixins.fixtures] removing account group fixture: Group(display_name='ucx_6sUD', entitlements=[], external_id=None, groups=[], id='89168514476100', members=[ComplexValue(display='sdk-vamv@example.com', primary=None, ref='Users/4293340815517162', type=None, value='4293340815517162')], meta=None, roles=[], schemas=[<GroupSchema.URN_IETF_PARAMS_SCIM_SCHEMAS_CORE_2_0_GROUP: 'urn:ietf:params:scim:schemas:core:2.0:Group'>])
19:36 DEBUG [databricks.labs.ucx.mixins.fixtures] clearing 1 workspace group fixtures
19:36 DEBUG [databricks.labs.ucx.mixins.fixtures] removing workspace group fixture: Group(display_name='ucx_6sUD', entitlements=[ComplexValue(display=None, primary=None, ref=None, type=None, value='allow-cluster-create')], external_id=None, groups=[], id='471845917123931', members=[ComplexValue(display='sdk-vamv@example.com', primary=None, ref='Users/4293340815517162', type=None, value='4293340815517162')], meta=ResourceMeta(resource_type='WorkspaceGroup'), roles=[], schemas=[<GroupSchema.URN_IETF_PARAMS_SCIM_SCHEMAS_CORE_2_0_GROUP: 'urn:ietf:params:scim:schemas:core:2.0:Group'>])
19:36 INFO [databricks.labs.ucx.install] Deleting UCX v0.17.1+4220240320193650 from https://DATABRICKS_HOST
19:36 INFO [databricks.labs.ucx.install] Deleting inventory database ucx_sad1i
19:36 INFO [databricks.labs.ucx.install] Deleting jobs
19:36 INFO [databricks.labs.ucx.install] Deleting assessment job_id=982808153637866.
19:36 INFO [databricks.labs.ucx.install] Deleting remove-workspace-local-backup-groups job_id=1080802747777005.
19:36 INFO [databricks.labs.ucx.install] Deleting 099-destroy-schema job_id=1110645404516419.
19:36 INFO [databricks.labs.ucx.install] Deleting validate-groups-permissions job_id=847717906522105.
19:36 INFO [databricks.labs.ucx.install] Deleting migrate-tables job_id=663406672574304.
19:36 INFO [databricks.labs.ucx.install] Deleting migrate-groups job_id=908294259772843.
19:36 INFO [databricks.labs.ucx.install] Deleting cluster policy
19:36 INFO [databricks.labs.ucx.install] Deleting secret scope
19:36 INFO [databricks.labs.ucx.install] UnInstalling UCX complete
19:36 DEBUG [databricks.labs.ucx.mixins.fixtures] clearing 1 schema fixtures
19:36 DEBUG [databricks.labs.ucx.mixins.fixtures] removing schema fixture: SchemaInfo(browse_only=None, catalog_name='hive_metastore', catalog_type=None, comment=None, created_at=None, created_by=None, effective_predictive_optimization_flag=None, enable_predictive_optimization=None, full_name='hive_metastore.ucx_sad1i', metastore_id=None, name='ucx_sad1i', owner=None, properties=None, storage_location=None, storage_root=None, updated_at=None, updated_by=None)
[gw3] linux -- Python 3.10.13 /home/runner/work/ucx/ucx/.venv/bin/python

Flaky tests:

  • 🤪 test_table_migration_job_cluster_override (7m27.563s)
  • 🤪 test_table_migration_job (6m50.066s)

Running from acceptance #1689

@nfx nfx linked an issue Mar 1, 2024 that may be closed by this pull request
nkvuong (Contributor) commented Mar 1, 2024

@prajin-29 I need to understand how this command can be used. In its current state, users have to specify an individual cluster_id to be remapped, which won't be of much use to our users.

prajin-29 (Contributor, Author) replied:

> @prajin-29 need to understand how this command can be used - in its current state, users have to specify individual cluster_id to be remapped - this won't be of much use to our users

So, in that case, should the command iterate through all the clusters and convert them to be UC-enabled, instead of requiring a cluster ID?

qziyuan (Contributor) commented Mar 1, 2024

> @prajin-29 need to understand how this command can be used - in its current state, users have to specify individual cluster_id to be remapped - this won't be of much use to our users
>
> So in that case should the command iterates through all the clusters and convert that to UC enabled? Instead of specifying the cluster id.

I'm wondering whether directly editing existing clusters is a good choice, since it may break code and queries running there. For example, AT&T had many jobs fail just because they switched on UC and we automatically changed all new job clusters to single-user mode. We should be careful here. In reality, many customers create new UC clusters and gradually migrate their code to these new clusters.
I think this CLI will be used in the following scenarios:

  1. The user has migrated and tested their code and queries, and confirms they want to convert all their clusters to UC. Then we edit clusters in place, in bulk.
  2. The user just wants a UC cluster that is a counterpart of an existing cluster, to test their code against. Then we should create one or more new UC clusters.
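Scenario 2 above can be sketched as deriving a counterpart config instead of editing in place. The function name uc_counterpart, the dict-shaped config, the key names, and the "-uc" suffix are all hypothetical illustrations, not part of the PR:

```python
def uc_counterpart(config: dict, suffix: str = "-uc") -> dict:
    """Derive a new UC-enabled cluster config from an existing one,
    leaving the original cluster untouched (scenario 2 above)."""
    counterpart = dict(config)
    counterpart["cluster_name"] = config.get("cluster_name", "") + suffix
    counterpart["data_security_mode"] = "SINGLE_USER"
    counterpart.pop("cluster_id", None)  # a newly created cluster gets its own id
    return counterpart
```

The design choice here is that the legacy cluster keeps running unchanged while users test their code against the derived UC cluster.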

@nfx nfx marked this pull request as ready for review March 5, 2024 08:55
dmoore247 (Contributor) left a review comment:

  1. Clarify the code
  2. DRY
  3. Log problems as ERROR and continue processing instead of sometimes blowing up the list processing.

In src/databricks/labs/ucx/workspace_access/clusters.py:

    continue
    access_mode = self._get_access_mode(cluster_details.data_security_mode.name)
    self._installation.save(cluster_details, filename=f'backup/clusters/{cluster_details.cluster_id}.json')
    self._ws.clusters.edit(
A reviewer commented:

Is there a way to make this resilient to cluster API changes? The cluster UI and options are constantly changing; can this code focus only on the specifics of credentials, Spark configs, and data security mode, and pass all the other configuration through untouched? Will this code break frequently? [I don't know]
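One hedged way to address the pass-through concern raised above (purely illustrative; remap_payload and the dict-shaped config are assumptions, not the PR's actual code): build the edit payload from the full existing config and override only the security-related keys, so that any field the cluster API adds later flows through unchanged.

```python
def remap_payload(original: dict, overrides: dict) -> dict:
    """Copy every existing field and override only the keys the remap
    cares about, so unknown/future cluster API fields pass through."""
    payload = dict(original)
    payload.update(overrides)
    return payload
```

Under this sketch, a field the code has never heard of survives the remap untouched, which is what makes the approach less likely to break as the API evolves.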

In src/databricks/labs/ucx/workspace_access/clusters.py:

    for cluster in cluster_list:
        try:
            cluster_details = self._installation.load(ClusterDetails, filename=f"/backup/clusters/{cluster}.json")
            if cluster_details.spark_version is None:
A reviewer commented:

The spark_version may be absent if the cluster is using a policy, so this line is not necessary.
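Following that review note, the revert side can be sketched as simply passing saved fields through rather than special-casing spark_version. Here revert_payload and the dict-shaped backup are illustrative assumptions; dropping None-valued keys just avoids sending explicit nulls in the edit request.

```python
def revert_payload(backup: dict) -> dict:
    """Build the restore payload from the saved backup as-is; a field like
    spark_version may legitimately be absent (e.g. policy-governed clusters)
    and is simply omitted rather than treated as an error."""
    return {key: value for key, value in backup.items() if value is not None}
```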

nfx (Contributor) left a review comment:

Looking good and well covered with tests; it would be easy to enhance afterwards. Make sure the integration tests pass.

@nfx nfx added the ready to merge this pull request is ready to merge label Mar 20, 2024
@nfx nfx changed the title from "Adding command to Remap the cluster to UC databricks labs ucx cluster-remap" to "Added databricks labs ucx cluster-remap command to remap legacy cluster configurations to UC-compatible" Mar 20, 2024
nfx (Contributor) commented Mar 20, 2024

The failed test run is unrelated to the code added in this PR. Merging.

@nfx nfx merged commit 102a110 into main Mar 20, 2024
6 of 7 checks passed
@nfx nfx deleted the feature/cluster_remap_command branch March 20, 2024 19:56
nfx added a commit that referenced this pull request Mar 21, 2024
* Added Legacy Table ACL grants migration ([#1054](#1054)). This commit introduces a legacy table ACL grants migration to the `migrate-tables` workflow, resolving issue [#340](#340) and paving the way for follow-up PRs [#887](#887) and [#907](#907). A new `GrantsCrawler` class is added for crawling grants, along with a `GroupManager` class to manage groups during migration. The `TablesMigrate` class is updated to accept an instance of `GrantsCrawler` and `GroupManager` in its constructor. The migration process has been thoroughly tested with unit tests, integration tests, and manual testing on a staging environment. The changes include the addition of a new Enum class `AclMigrationWhat` and updates to the `Table` dataclass, and affect the way tables are selected for migration based on rules. The logging and error handling have been improved in the `skip_schema` function.
* Added `databricks labs ucx cluster-remap` command to remap legacy cluster configurations to UC-compatible ([#994](#994)). This update adds the `databricks labs ucx cluster-remap` command, which remaps legacy cluster configurations to UC-compatible ones, along with user documentation to guide the cluster remapping process. It also expands the functionality for creating and managing UC external catalogs and schemas with the `create-catalogs-schemas` and `revert-cluster-remap` commands. This change does not modify existing commands or workflows and does not introduce new tables. The `databricks labs ucx cluster-remap` command allows users to remap clusters to Unity Catalog (UC) and revert the remapping from the CLI, ensuring compatibility and streamlining the migration process. The new command and associated functions have been manually tested for functionality.
* Added `migrate-tables` workflow ([#1051](#1051)). The `migrate-tables` workflow has been added, which allows for more fine-grained control over the resources allocated to the workspace. This workflow includes two new instance variables `min_workers` and `max_workers` in the `WorkspaceConfig` class, with default values of 1 and 10 respectively. A new `trigger` function has also been introduced, which initializes a configuration, SQL backend, and WorkspaceClient based on the provided configuration file. The `run_task` function has been added, which looks up the specified task, logs relevant information, and runs the task's function with the provided arguments. The `Task` class's `fn` attribute now includes an `Installation` object as a parameter. Additionally, a new `migrate-tables` workflow has been added for migrating tables from the Hive Metastore to the Unity Catalog, along with new classes and methods for table mapping, migration status refreshing, and migrating tables. The `migrate_dbfs_root_delta_tables` and `migrate_external_tables_sync` methods perform migrations for Delta tables located in the DBFS root and synchronize external tables, respectively. These functions use the workspace client to access the catalogs and ensure proper migration. Integration tests have also been added for these new methods to ensure their correct operation.
* Added handling for `SYNC` command failures ([#1073](#1073)). This pull request introduces changes to improve handling of `SYNC` command failures during external table migrations in the Hive metastore. Previously, the `SYNC` command's result was not checked, and failures were not logged. Now, the `_migrate_external_table` method in `table_migrate.py` fetches the result of the `SYNC` command execution, logs a warning message for failures, and returns `False` if the command fails. A new integration test has been added to simulate a failed `SYNC` command due to a non-existent catalog and schema, ensuring the migration tool handles such failures. A new test case has also been added to verify the handling of `SYNC` command failures during external table migrations, using a mock backend to simulate failures and checking for appropriate log messages. These changes enhance the reliability and robustness of the migration process, providing clearer error diagnosis and handling for potential `SYNC` command failures.
* Added initial version of `databricks labs ucx migrate-local-code` command ([#1067](#1067)). A new `databricks labs ucx migrate-local-code` command has been added to facilitate migration of local code to a Databricks environment, specifically targeting Python and SQL files. This initial version is experimental and aims to help users and administrators manage code migration, maintain consistency across workspaces, and enhance compatibility with the Unity Catalog, a component of Databricks' data and AI offerings. The command introduces a new `Files` class for applying migrations to code files, considering their language. It also updates the `.gitignore` file and the pyproject.toml file to ensure appropriate version control management. Additionally, new classes and methods have been added to support code analysis, transformation, and linting for various programming languages. These improvements will aid in streamlining the migration process and ensuring compatibility with Databricks' environment.
* Added instance pool to cluster policy ([#1078](#1078)). A new field, `instance_pool_id`, has been added to the cluster policy configuration in `policy.py`, allowing users to specify the ID of an instance pool to be applied to all workflow clusters in the policy. This ID can be manually set or automatically retrieved by the system. A new private method, `_get_instance_pool_id()`, has been added to handle the retrieval of the instance pool ID. Additionally, a new test for table migration jobs has been added to `test_installation.py` to ensure the migration job is correctly configured with the specified parallelism, minimum and maximum number of workers, and instance pool ID. A new test case for creating a cluster policy with an instance pool has also been added to `tests/unit/installer/test_policy.py` to ensure the instance pool is added to the cluster policy during creation. These changes provide users with more control over instance pools and cluster policies, and improve the overall functionality of the library.
* Fixed `ucx move` logic for `MANAGED` & `EXTERNAL` tables ([#1062](#1062)). The `ucx move` command has been updated to allow for the movement of UC tables/views after the table upgrade process, providing flexibility in managing catalog structure. The command now supports moving multiple tables simultaneously, dropping managed tables/views upon confirmation, and deep-cloning managed tables while dropping and recreating external tables. A refactoring of the `TableMove` class has improved code organization and readability, and the associated unit tests have been updated to reflect these changes. This feature is targeted towards developers and administrators seeking to adjust their catalog structure after table upgrades, with the added ability to manage exceptional conditions gracefully.
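The deep-clone-then-drop rule for managed tables can be sketched as statement generation; the exact SQL emitted by `TableMove` may differ:

```python
def managed_move_statements(src: str, dst: str, drop_source: bool) -> list[str]:
    # A managed table is moved by deep-cloning it into the target
    # catalog/schema; the source is dropped only after user confirmation.
    statements = [f"CREATE TABLE IF NOT EXISTS {dst} DEEP CLONE {src}"]
    if drop_source:
        statements.append(f"DROP TABLE {src}")
    return statements
```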
* Fixed integration testing with random product names ([#1074](#1074)). In the recent update, the `trigger` function in the `tasks.py` module of the `ucx` framework has undergone modification to incorporate a new argument, `install_folder`, within the `Installation` object. This object is now generated locally within the `trigger` function and subsequently passed to the `run_task` function. The `install_folder` is determined by obtaining the parent directory of the `config_path` variable, transforming it into a POSIX-style path, and eliminating the leading "/Workspace" prefix. This enhancement guarantees that the `run_task` function acquires the correct installation folder for the `ucx` framework, thereby improving the overall functionality and precision of the framework. Furthermore, the `Installation.current` method has been supplanted with the newly formed `Installation` object, which now encompasses the `install_folder` argument.
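The path derivation described in this entry is small enough to show directly; this is a sketch of the rule (parent directory of the config path, as POSIX, with the leading "/Workspace" prefix stripped), not the exact `ucx` helper:

```python
from pathlib import PurePosixPath

def install_folder_from_config(config_path: str) -> str:
    # Take the parent directory of the config file as a POSIX-style path
    # and strip the leading "/Workspace" prefix if present.
    parent = PurePosixPath(config_path).parent.as_posix()
    return parent.removeprefix("/Workspace")
```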
* Refactor installer to separate workflows methods from the installer class ([#1055](#1055)). In this release, the installer in the `cli.py` file has been refactored to improve modularity and maintainability. The installation and workflow functionalities have been separated by importing a new class called `WorkflowsInstallation` from `databricks.labs.ucx.installer.workflows`. The `WorkspaceInstallation` class is no longer used in various functions, and the new `WorkflowsInstallation` class is used instead. Additionally, a new mixin class called `InstallationMixin` has been introduced, which includes methods for uninstalling UCX, removing jobs, and validating installation steps. The `WorkflowsInstallation` class now inherits from this mixin class. A new file, `workflows.py`, has been added to the `databricks/labs/ucx/installer` directory, which contains methods for managing Databricks jobs. The new `WorkflowsInstallation` class is responsible for deploying workflows, uploading wheels to DBFS or WSFS, and creating debug notebooks. The refactoring also includes the addition of new methods for handling specific workflows, such as `run_workflow`, `validate_step`, and `repair_run`, which are now contained in the `WorkflowsInstallation` class. The `test_install.py` file in the `tests/unit` directory has also been updated to include new imports and test functions to accommodate these changes.
* Skip unsupported locations while migrating to external location in Azure ([#1066](#1066)). In this release, we have updated the functionality of migrating to an external location in Azure. A new private method `_filter_unsupported_location` has been added to the `locations.py` file, which checks if the location URLs are supported and removes the unsupported ones from the list. Only locations starting with "abfss://" are considered supported. Unsupported locations are logged with a warning message. Additionally, a new test `test_skip_unsupported_location` has been introduced to verify that the `location_migration` function correctly skips unsupported locations during migration to external locations in Azure. The test checks if the correct log messages are generated for skipped unsupported locations, and it mocks various scenarios such as crawled HMS external locations, storage credentials, UC external locations, and installation with permission mapping. The mock crawled HMS external locations contain two unsupported locations: `adl://` and `wasbs://`. This ensures that the function handles unsupported locations correctly, avoiding any unnecessary errors or exceptions during migration.
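The filtering rule above is simple enough to sketch; the function name is illustrative, but the behavior (keep only `abfss://` URLs, warn on the rest) matches the entry:

```python
import logging

logger = logging.getLogger(__name__)

def filter_unsupported_locations(location_urls: list[str]) -> list[str]:
    # Keep only abfss:// URLs; log a warning for and drop everything else
    # (e.g. adl:// and wasbs:// locations).
    supported = []
    for url in location_urls:
        if url.startswith("abfss://"):
            supported.append(url)
        else:
            logger.warning(f"Skipping unsupported location: {url}")
    return supported
```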
* Triggering Assessment Workflow from Installer based on User Prompt ([#1007](#1007)). A new functionality has been added to the installer that allows users to trigger an assessment workflow based on a prompt during the installation process. The `_trigger_workflow` method has been implemented, which can be initiated with a step string argument. This method retrieves the job ID for the specified step from the `_state.jobs` dictionary, generates the job URL, and triggers the job using the `run_now` method from the `jobs` class of the Workspace object. Users will be asked to confirm triggering the assessment workflow and will have the option to open the job URL in a web browser after triggering it. A new unit test, `test_triggering_assessment_wf`, has been introduced to the `test_install.py` file to verify the functionality of triggering an assessment workflow based on user prompt. This test uses existing classes and functions, such as `MockBackend`, `MockPrompts`, `WorkspaceConfig`, and `WorkspaceInstallation`, to run the `WorkspaceInstallation.run` method with a mocked `WorkspaceConfig` object and a mock installation. The test also includes a user prompt to confirm triggering the assessment job and opening the assessment job URL. The new functionality and test improve the installation process by enabling users to easily trigger the assessment workflow based on their specific needs.
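The prompt-driven trigger flow can be sketched as follows; `ws` stands in for a `databricks.sdk` WorkspaceClient and `prompts` for an interactive prompt helper, and the job URL format is an assumption for illustration:

```python
import webbrowser

def trigger_workflow(ws, prompts, state_jobs: dict, step: str) -> None:
    # Look up the job id for the requested step, confirm with the user,
    # start the run via the Jobs API, and optionally open the job page.
    job_id = state_jobs[step]
    job_url = f"{ws.config.host}#job/{job_id}"  # assumed URL shape
    if not prompts.confirm(f"Run the {step} workflow now?"):
        return
    ws.jobs.run_now(job_id)
    if prompts.confirm(f"Open the job page {job_url} in a browser?"):
        webbrowser.open(job_url)
```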
* Updated README.md for Service Principal Installation Limit ([#1076](#1076)). This release includes an update to the README.md file to clarify that installing UCX with a Service Principal is not supported. Previously, the file indicated that Databricks Workspace Administrator privileges were required for the user running the installation, but did not explicitly state that Service Principal installation is not supported. The updated text now includes this information, ensuring that users have a clear understanding of the requirements and limitations of the installation process. The rest of the file remains unchanged and continues to provide instructions for installing UCX, including required software and network access. No new methods or functionality have been added, and no existing functionality has been changed beyond the addition of this clarification. The changes in this release have been manually tested to ensure they are functioning as intended.
nfx mentioned this pull request Mar 21, 2024
nfx added a commit that referenced this pull request Mar 21, 2024
* Added Legacy Table ACL grants migration
([#1054](#1054)). This
commit introduces a legacy table ACL grants migration to the
`migrate-tables` workflow, resolving issue
[#340](#340) and paving the
way for follow-up PRs
[#887](#887) and
[#907](#907). A new
`GrantsCrawler` class is added for crawling grants, along with a
`GroupManager` class to manage groups during migration. The
`TablesMigrate` class is updated to accept an instance of
`GrantsCrawler` and `GroupManager` in its constructor. The migration
process has been thoroughly tested with unit tests, integration tests,
and manual testing on a staging environment. The changes include the
addition of a new Enum class `AclMigrationWhat` and updates to the
`Table` dataclass, and affect the way tables are selected for migration
based on rules. The logging and error handling have been improved in the
`skip_schema` function.
* Added `databricks labs ucx cluster-remap` command to remap legacy
cluster configurations to UC-compatible
([#994](#994)). In this
open-source library update, we have developed and added the `databricks
labs ucx cluster-remap` command, which facilitates the remapping of
legacy cluster configurations to UC-compatible ones. This new CLI
command comes with user documentation to guide the cluster remapping
process. Additionally, we have expanded the functionality of creating
and managing UC external catalogs and schemas with the inclusion of
`create-catalogs-schemas` and `revert-cluster-remap` commands. This
change does not modify existing commands or workflows and does not
introduce new tables. The `databricks labs ucx cluster-remap` command
allows users to re-map and revert the re-mapping of clusters from Unity
Catalog (UC) using the CLI, ensuring compatibility and streamlining the
migration process. The new command and associated functions have been
manually tested for functionality.
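The remap-with-backup flow this entry refers to can be sketched roughly as below; `ws` stands in for a `databricks.sdk` WorkspaceClient, `installation` for a helper that persists JSON, and the backup filename and the exact set of fields re-sent on edit are assumptions, not the actual `ClusterAccess` implementation:

```python
import logging

logger = logging.getLogger(__name__)

def map_cluster_to_uc(ws, installation, cluster_id: str, data_security_mode) -> None:
    # Snapshot the current cluster spec to a backup file, then edit the
    # cluster to a UC-compatible access mode.
    try:
        cluster = ws.clusters.get(cluster_id)
        installation.save(cluster, filename=f"backup/clusters/{cluster_id}.json")
        ws.clusters.edit(
            cluster_id=cluster_id,
            cluster_name=cluster.cluster_name,
            spark_version=cluster.spark_version,
            num_workers=cluster.num_workers,
            data_security_mode=data_security_mode,
        )
    except Exception as err:
        # per the entry above, clusters that cannot be edited are skipped
        logger.warning(f"Skipping cluster {cluster_id}: {err}")
```

Reverting is the mirror image: load the saved backup JSON for a cluster id and re-apply it with the same edit call.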
dmoore247 pushed a commit that referenced this pull request Mar 23, 2024
…ster configurations to UC-compatible (#994)

## Changes
Adding Cli command to Remap the cluster to UC `databricks labs ucx
cluster-remap`

### Linked issues
#928 

Resolves #..

### Functionality 

- [ ] added relevant user documentation
- [x] added new CLI command
- [ ] modified existing command: `databricks labs ucx ...`
- [ ] added a new workflow
- [ ] modified existing workflow: `...`
- [ ] added a new table
- [ ] modified existing table: `...`

### Tests
<!-- How is this tested? Please see the checklist below and also
describe any other relevant tests -->

- [ ] manually tested
- [ ] added unit tests
- [ ] added integration tests
- [ ] verified on staging environment (screenshot attached)

---------

Co-authored-by: Serge Smertin <259697+nfx@users.noreply.github.com>
dmoore247 pushed a commit that referenced this pull request Mar 23, 2024
* Added Legacy Table ACL grants migration
([#1054](#1054)). This
commit introduces a legacy table ACL grants migration to the
`migrate-tables` workflow, resolving issue
[#340](#340) and paving the
way for follow-up PRs
[#887](#887) and
[#907](#907). A new
`GrantsCrawler` class is added for crawling grants, along with a
`GroupManager` class to manage groups during migration. The
`TablesMigrate` class is updated to accept an instance of
`GrantsCrawler` and `GroupManager` in its constructor. The migration
process has been thoroughly tested with unit tests, integration tests,
and manual testing on a staging environment. The changes include the
addition of a new Enum class `AclMigrationWhat` and updates to the
`Table` dataclass, and affect the way tables are selected for migration
based on rules. The logging and error handling have been improved in the
`skip_schema` function.
* Added `databricks labs ucx cluster-remap` command to remap legacy
cluster configurations to UC-compatible
([#994](#994)). In this
open-source library update, we have developed and added the `databricks
labs ucx cluster-remap` command, which facilitates the remapping of
legacy cluster configurations to UC-compatible ones. This new CLI
command comes with user documentation to guide the cluster remapping
process. Additionally, we have expanded the functionality of creating
and managing UC external catalogs and schemas with the inclusion of
`create-catalogs-schemas` and `revert-cluster-remap` commands. This
change does not modify existing commands or workflows and does not
introduce new tables. The `databricks labs ucx cluster-remap` command
allows users to re-map and revert the re-mapping of clusters from Unity
Catalog (UC) using the CLI, ensuring compatibility and streamlining the
migration process. The new command and associated functions have been
manually tested for functionality.
* Added `migrate-tables` workflow
([#1051](#1051)). The
`migrate-tables` workflow has been added, which allows for more
fine-grained control over the resources allocated to the workspace. This
workflow includes two new instance variables `min_workers` and
`max_workers` in the `WorkspaceConfig` class, with default values of 1
and 10 respectively. A new `trigger` function has also been introduced,
which initializes a configuration, SQL backend, and WorkspaceClient
based on the provided configuration file. The `run_task` function has
been added, which looks up the specified task, logs relevant
information, and runs the task's function with the provided arguments.
The `Task` class's `fn` attribute now includes an `Installation` object
as a parameter. Additionally, a new `migrate-tables` workflow has been
added for migrating tables from the Hive Metastore to the Unity Catalog,
along with new classes and methods for table mapping, migration status
refreshing, and migrating tables. The `migrate_dbfs_root_delta_tables`
and `migrate_external_tables_sync` methods perform migrations for Delta
tables located in the DBFS root and synchronize external tables,
respectively. These functions use the workspace client to access the
catalogs and ensure proper migration. Integration tests have also been
added for these new methods to ensure their correct operation.
* Added handling for `SYNC` command failures
([#1073](#1073)). This pull
request introduces changes to improve handling of `SYNC` command
failures during external table migrations in the Hive metastore.
Previously, the `SYNC` command's result was not checked, and failures
were not logged. Now, the `_migrate_external_table` method in
`table_migrate.py` fetches the result of the `SYNC` command execution,
logs a warning message for failures, and returns `False` if the command
fails. A new integration test has been added to simulate a failed `SYNC`
command due to a non-existent catalog and schema, ensuring the migration
tool handles such failures. A new test case has also been added to
verify the handling of `SYNC` command failures during external table
migrations, using a mock backend to simulate failures and checking for
appropriate log messages. These changes enhance the reliability and
robustness of the migration process, providing clearer error diagnosis
and handling for potential `SYNC` command failures.
* Added initial version of `databricks labs ucx migrate-local-code`
command ([#1067](#1067)). A
new `databricks labs ucx migrate-local-code` command has been added to
facilitate migration of local code to a Databricks environment,
specifically targeting Python and SQL files. This initial version is
experimental and aims to help users and administrators manage code
migration, maintain consistency across workspaces, and enhance
compatibility with the Unity Catalog, a component of Databricks' data
and AI offerings. The command introduces a new `Files` class for
applying migrations to code files, considering their language. It also
updates the `.gitignore` file and the pyproject.toml file to ensure
appropriate version control management. Additionally, new classes and
methods have been added to support code analysis, transformation, and
linting for various programming languages. These improvements will aid
in streamlining the migration process and ensuring compatibility with
Databricks' environment.
* Added instance pool to cluster policy
([#1078](#1078)). A new
field, `instance_pool_id`, has been added to the cluster policy
configuration in `policy.py`, allowing users to specify the ID of an
instance pool to be applied to all workflow clusters in the policy. This
ID can be manually set or automatically retrieved by the system. A new
private method, `_get_instance_pool_id()`, has been added to handle the
retrieval of the instance pool ID. Additionally, a new test for table
migration jobs has been added to `test_installation.py` to ensure the
migration job is correctly configured with the specified parallelism,
minimum and maximum number of workers, and instance pool ID. A new test
case for creating a cluster policy with an instance pool has also been
added to `tests/unit/installer/test_policy.py` to ensure the instance
pool is added to the cluster policy during creation. These changes
provide users with more control over instance pools and cluster
policies, and improve the overall functionality of the library.
* Fixed `ucx move` logic for `MANAGED` & `EXTERNAL` tables
([#1062](#1062)). The `ucx
move` command has been updated to allow for the movement of UC
tables/views after the table upgrade process, providing flexibility in
managing catalog structure. The command now supports moving multiple
tables simultaneously, dropping managed tables/views upon confirmation,
and deep-cloning managed tables while dropping and recreating external
tables. A refactoring of the `TableMove` class has improved code
organization and readability, and the associated unit tests have been
updated to reflect these changes. This feature is targeted towards
developers and administrators seeking to adjust their catalog structure
after table upgrades, with the added ability to manage exceptional
conditions gracefully.
* Fixed integration testing with random product names
([#1074](#1074)). In the
recent update, the `trigger` function in the `tasks.py` module of the
`ucx` framework has undergone modification to incorporate a new
argument, `install_folder`, within the `Installation` object. This
object is now generated locally within the `trigger` function and
subsequently passed to the `run_task` function. The `install_folder` is
determined by obtaining the parent directory of the `config_path`
variable, transforming it into a POSIX-style path, and eliminating the
leading "/Workspace" prefix. This enhancement guarantees that the
`run_task` function acquires the correct installation folder for the
`ucx` framework, thereby improving the overall functionality and
precision of the framework. Furthermore, the `Installation.current`
method has been supplanted with the newly formed `Installation` object,
which now encompasses the `install_folder` argument.
* Refactor installer to separate workflows methods from the installer
class ([#1055](#1055)). In
this release, the installer in the `cli.py` file has been refactored to
improve modularity and maintainability. The installation and workflow
functionalities have been separated by importing a new class called
`WorkflowsInstallation` from `databricks.labs.ucx.installer.workflows`.
The `WorkspaceInstallation` class is no longer used in various
functions, and the new `WorkflowsInstallation` class is used instead.
Additionally, a new mixin class called `InstallationMixin` has been
introduced, which includes methods for uninstalling UCX, removing jobs,
and validating installation steps. The `WorkflowsInstallation` class now
inherits from this mixin class. A new file, `workflows.py`, has been
added to the `databricks/labs/ucx/installer` directory, which contains
methods for managing Databricks jobs. The new `WorkflowsInstallation`
class is responsible for deploying workflows, uploading wheels to DBFS
or WSFS, and creating debug notebooks. The refactoring also includes the
addition of new methods for handling specific workflows, such as
`run_workflow`, `validate_step`, and `repair_run`, which are now
contained in the `WorkflowsInstallation` class. The `test_install.py`
file in the `tests/unit` directory has also been updated to include new
imports and test functions to accommodate these changes.
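The class split described above can be sketched structurally as follows; only the class and method names come from the changelog, and the bodies are illustrative placeholders rather than the real `ucx` implementation:

```python
# Shared uninstall / job-removal / validation logic lives in a mixin,
# and the workflows class inherits it (placeholder bodies).
class InstallationMixin:
    def uninstall(self) -> None:
        """Remove the UCX installation (placeholder)."""

    def _remove_jobs(self) -> None:
        """Delete the deployed Databricks jobs (placeholder)."""

    def validate_step(self, step: str) -> bool:
        """Check whether a given installation step completed (placeholder)."""
        return True

class WorkflowsInstallation(InstallationMixin):
    """Owns workflow concerns: deploying, running, and repairing jobs."""

    def run_workflow(self, step: str) -> None:
        """Trigger the named workflow (placeholder)."""

    def repair_run(self, workflow: str) -> None:
        """Re-run only the failed tasks of a workflow (placeholder)."""
```

The mixin keeps uninstall and validation logic in one place while letting the job-management class stay focused on workflow deployment.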
* Skip unsupported locations while migrating to external location in
Azure ([#1066](#1066)). In
this release, we have updated the functionality of migrating to an
external location in Azure. A new private method
`_filter_unsupported_location` has been added to the `locations.py`
file, which checks if the location URLs are supported and removes the
unsupported ones from the list. Only locations starting with "abfss://"
are considered supported. Unsupported locations are logged with a
warning message. Additionally, a new test
`test_skip_unsupported_location` has been introduced to verify that the
`location_migration` function correctly skips unsupported locations
during migration to external locations in Azure. The test checks if the
correct log messages are generated for skipped unsupported locations,
and it mocks various scenarios such as crawled HMS external locations,
storage credentials, UC external locations, and installation with
permission mapping. The mock crawled HMS external locations contain two
unsupported locations: `adl://` and `wasbs://`. This ensures that the
function handles unsupported locations correctly, avoiding any
unnecessary errors or exceptions during migration.
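The filtering rule described above can be sketched like this; the function name differs from the private `_filter_unsupported_location` method for illustration, and the sample URLs are assumptions:

```python
import logging

logger = logging.getLogger(__name__)

# Keep only abfss:// URLs; log and drop anything else (adl://, wasbs://, ...).
def filter_unsupported_locations(location_urls: list[str]) -> list[str]:
    supported = []
    for url in location_urls:
        if url.startswith("abfss://"):
            supported.append(url)
        else:
            logger.warning("Skipping unsupported location: %s", url)
    return supported

urls = [
    "abfss://container@account.dfs.core.windows.net/data",
    "adl://account.azuredatalakestore.net/data",
    "wasbs://container@account.blob.core.windows.net/data",
]
print(filter_unsupported_locations(urls))
# → ['abfss://container@account.dfs.core.windows.net/data']
```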
* Triggering Assessment Workflow from Installer based on User Prompt
([#1007](#1007)). A new
functionality has been added to the installer that allows users to
trigger an assessment workflow based on a prompt during the installation
process. The `_trigger_workflow` method has been implemented, which can
be initiated with a step string argument. This method retrieves the job
ID for the specified step from the `_state.jobs` dictionary, generates
the job URL, and triggers the job via the `run_now` method of the
workspace client's `jobs` API. Users will be asked to confirm
triggering the assessment workflow and will have the option to open the
job URL in a web browser after triggering it. A new unit test,
`test_triggering_assessment_wf`, has been introduced to the
`test_install.py` file to verify the functionality of triggering an
assessment workflow based on user prompt. This test uses existing
classes and functions, such as `MockBackend`, `MockPrompts`,
`WorkspaceConfig`, and `WorkspaceInstallation`, to run the
`WorkspaceInstallation.run` method with a mocked `WorkspaceConfig`
object and a mock installation. The test also includes a user prompt to
confirm triggering the assessment job and opening the assessment job
URL. The new functionality and test improve the installation process by
enabling users to easily trigger the assessment workflow based on their
specific needs.
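The prompt-driven flow described above can be sketched as follows; `ws`, `prompts`, and `state_jobs` stand in for the installer's workspace client, prompt helper, and install-state dependencies, and all names here are assumptions rather than the exact `_trigger_workflow` signature:

```python
import webbrowser

# Look up the deployed job for a step, ask before running it, and
# optionally open the job URL in a browser.
def trigger_workflow(ws, prompts, state_jobs: dict, workspace_url: str, step: str) -> str:
    job_id = state_jobs[step]                  # job ID recorded at install time
    job_url = f"{workspace_url}#job/{job_id}"
    if prompts.confirm(f"Do you want to trigger the {step} workflow?"):
        ws.jobs.run_now(job_id)                # start the job run
        if prompts.confirm(f"Open the job URL ({job_url}) in a browser?"):
            webbrowser.open(job_url)
    return job_url
```

A unit test can drive this with fakes for `ws` and `prompts`, which is essentially what `test_triggering_assessment_wf` does with `MockPrompts` and a mocked installation.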
* Updated README.md for Service Principal Installation Limit
([#1076](#1076)). This
release includes an update to the README.md file to clarify that
installing UCX with a Service Principal is not supported. Previously,
the file indicated that Databricks Workspace Administrator privileges
were required for the user running the installation, but did not
explicitly state that Service Principal installation is not supported.
The updated text now includes this information, ensuring that users have
a clear understanding of the requirements and limitations of the
installation process. The rest of the file remains unchanged and
continues to provide instructions for installing UCX, including required
software and network access. No new methods or functionality have been
added, and no existing functionality has been changed beyond the
addition of this clarification. The changes in this release have been
manually tested to ensure they are functioning as intended.