Release v0.14.0 #1000

Merged
merged 1 commit into from
Mar 4, 2024

@nfx nfx commented Mar 4, 2024

  • Added upgraded_from_workspace_id property to migrated tables to indicate the source workspace (#987). The _migrate_external_table, _migrate_dbfs_root_table, and _migrate_view methods in table_migrate.py now include a new upgraded_from_ws parameter in the SQL commands used to alter tables, views, or managed tables. The parameter stores the source workspace ID on each migrated object in the new upgraded_from_workspace_id property, which records where the migration originated. A new utility method, sql_alter_from, has been added to the Table class in tables.py to generate the SQL command with the new parameter, and a new class-level attribute, UPGRADED_FROM_WS_PARAM, has been added to the same class to indicate the source workspace. These changes resolve issue #899 and were verified through manual testing, unit tests, and integration tests. No CLI commands, workflows, or tables have been added or modified, and there are no changes to user documentation.
  • Added a command to create account level groups if they do not exist (#763). A new command, create-account-groups, has been added to the databricks labs ucx tool. It crawls all workspaces in the account and creates an account-level group for each workspace-local group that does not yet have an account-level counterpart. The feature covers several scenarios, including groups that exist in some workspaces but not in others, and groups that share a name across workspaces but have different members. Several new methods have been added to account.py to support the feature, test_account.py has new tests for the create_account_level_groups method, and cli.py has been updated to register the new create-account-groups command. With these changes, users can keep account-level groups consistent across all workspaces in the account.
  • Added assessment for the incompatible RunSubmit API usages (#849). In clusters.py, the check_spark_conf and check_cluster_failures methods have been renamed to _check_spark_conf and _check_cluster_failures, and the _assess_clusters method has been updated to call the renamed _check_cluster_failures method when checking cluster configurations. A new SubmitRunsCrawler class has been added to the databricks.labs.ucx.assessment.jobs module, built on the CrawlerBase, JobsMixin, and CheckClusterMixin classes. This class crawls job runs submitted through the RunSubmit API and assesses them for compatibility failures. A new configuration attribute, num_days_submit_runs_history, has been introduced in the WorkspaceConfig class of the config.py module; it controls the number of days for which submission history of RunSubmit API calls is retained. Several new JSON files have been added for unit testing, covering RunSubmit API usages in scenarios such as dbt task runs, Git source-based job runs, JAR file runs, and more.
  • Added group members difference to the output of validate-groups-membership cli command (#995). The validate-groups-membership command now compares the membership of each group at the account and workspace levels. The validate_group_membership function calculates the difference and reports it in a new group_members_difference column, which makes discrepancies between the two levels easy to spot. The value is the number of members in the workspace group minus the number of members in the account group: a positive value means the workspace group has more members, and a negative value means the account group has more. The unit test file test_groups.py has a new test case that verifies the calculation, and the table template in labs.yml has been updated to include the new column. The functionality of the other commands remains unchanged.
  • Added handling for empty directory_id if managed identity encountered during the crawling of StoragePermissionMapping (#986). This PR adds a type field to the StoragePermissionMapping and Principal dataclasses to differentiate between service principals and managed identities, allowing None for the directory_id field if the principal is not a service principal. During the migration to UC storage credentials, managed identities are currently ignored. These changes improve handling of managed identities during the crawling of StoragePermissionMapping, prevent errors when creating storage credentials with managed identities, and address issue #339. The changes are tested through unit tests, manual testing, and integration tests, and only affect the StoragePermissionMapping class and related methods, without introducing new commands, workflows, or tables.
  • Added migration for Azure Service Principals with secrets stored in Databricks Secret to UC Storage Credentials (#874). In this release, we have made significant updates to migrate Azure Service Principals with their secrets stored in Databricks Secret to UC Storage Credentials, enhancing security and management of storage access. The changes include: Addition of a new migrate_credentials command in the labs.yml file to migrate credentials for storage access to UC storage credential. Modification of secrets.py to handle the case where a secret has been removed from the backend and to log warning messages for secrets with invalid Base64 bytes. Introduction of the StorageCredentialManager and ServicePrincipalMigration classes in credentials.py to manage Azure Service Principals and their associated client secrets, and to migrate them to UC Storage Credentials. Addition of a new directory_id attribute in the Principal class and its associated dataclass in resources.py to store the directory ID for creating UC storage credentials using a service principal. Creation of a new pytest fixture, make_storage_credential_spn, in fixtures.py to simplify writing tests requiring Databricks Storage Credentials with Azure Service Principal auth. Addition of a new test file for the Azure integration of the project, including new classes, methods, and test cases for testing the migration of Azure Service Principals to UC Storage Credentials. These improvements will ensure better security and management of storage access using Azure Service Principals, while providing more efficient and robust testing capabilities.
  • Added permission migration support for feature tables and the root permissions for models and feature tables (#997). This commit introduces support for migration of permissions related to feature tables and sets root permissions for models and feature tables. New functions such as feature_store_listing, feature_tables_root_page, models_root_page, and tokens_and_passwords have been added to facilitate population of a workspace access page with necessary permissions information. The factory function in manager.py has been updated to include new listings for models' root page, feature tables' root page, and the feature store for enhanced management and access control of models and feature tables. New classes and methods have been implemented to handle permissions for these resources, utilizing GenericPermissionsSupport, AccessControlRequest, and MigratedGroup classes. Additionally, new test methods have been included to verify feature tables listing functionality and root page listing functionality for feature tables and registered models. The test manager method has been updated to include feature-tables in the list of items to be checked for permissions, ensuring comprehensive testing of permission functionality related to these new feature tables.
  • Added support for serving endpoints (#990). The fixtures.py file in the databricks.labs.ucx.mixins module has been updated with new classes and functions to create and manage serving endpoints, accompanied by integration tests that verify their functionality. The assessment's permissions crawling has a new listing for serving endpoints that uses the ws.serving_endpoints.list function and the serving-endpoints category. A new integration test, test_endpoints, verifies that assessments now crawl permissions for serving endpoints and demonstrates migrating those permissions from one group to another. The test suite, including test_manager.py, has been updated to cover the new feature.
  • Expanded end-user documentation with detailed descriptions for workflows and commands (#999). The Databricks Labs UCX project has been updated with several new features to assist in upgrading to Unity Catalog, including an assessment workflow that generates a detailed compatibility report for workspace entities, a group migration workflow for upgrading all Databricks workspace assets, and utility commands for managing cross-workspace installations. The Assessment Report now includes a more detailed summary of the assessment findings, table counts, database summaries, and external locations. Additional improvements include expanded workspace group migration to handle potential conflicts with locally scoped group names, enhanced documentation for external Hive Metastore integration, a new debugging notebook, and detailed descriptions of table upgrade considerations, data access permissions, external storage, and table crawler.
  • Fixed config.yml upgrade from very old versions (#984). In this release, we've introduced enhancements to the configuration upgrading process for config.yml in our open-source library. We've replaced the previous v1_migrate class method with a new implementation that specifically handles migration from version 1. The new method retrieves the groups field, extracts the selected value, and assigns it to the include_group_names key in the configuration. The backup_group_prefix value from the groups field is assigned to the renamed_group_prefix key, and the groups field is removed, with the version number updated to 2. These changes simplify the code and improve readability, enabling users to upgrade smoothly from version 1 of the configuration. Furthermore, we've added new unit tests to the test_config.py file to ensure backward compatibility. Two new tests, test_v1_migrate_zeroconf and test_v1_migrate_some_conf, have been added, utilizing the MockInstallation class and loading the configuration using WorkspaceConfig. These tests enhance the robustness and reliability of the migration process for config.yml.
  • Renamed columns in assessment SQL queries to use actual names, not aliases (#983). Column references that relied on aliases caused errors in certain setups, so the affected references now use the actual column names. Specifically, in the 05_0_all_tables assessment query, the definition of the is_delta column now refers to the actual table_format column instead of the alias format. Column references are therefore interpreted consistently across setups. This change does not introduce any new methods; it only modifies the existing query.
  • Updated groups permissions validation to use Table ACL cluster (#979). The validate_groups_permissions task now runs on the Table ACL cluster, as indicated by the inclusion of job_cluster="tacl". This task checks that all crawled permissions have been applied to the destination groups by calling the permission_manager.apply_group_permissions method during the migration state. If you run this workflow, note that the validation step now depends on the Table ACL cluster being available.
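
The group_members_difference rule from #995 can be illustrated with a minimal sketch. The function name and the list-of-names inputs below are hypothetical and chosen for illustration only; this is not the validate_group_membership implementation in ucx, just the arithmetic the note describes (workspace member count minus account member count).

```python
def members_difference(workspace_members: list[str], account_members: list[str]) -> int:
    """Hypothetical helper: workspace member count minus account member count.

    Positive -> the workspace group has more members.
    Negative -> the account group has more members.
    """
    return len(workspace_members) - len(account_members)


# Example inputs (made up for illustration).
workspace_group = ["alice@example.com", "bob@example.com", "carol@example.com"]
account_group = ["alice@example.com", "bob@example.com"]

difference = members_difference(workspace_group, account_group)
```

A count-based difference of zero does not by itself prove that both groups contain the same members, so it is best read as a quick signal rather than a full comparison.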

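
The version 1 to version 2 config.yml upgrade from #984 can be sketched as a plain dictionary transformation. This is a minimal sketch under stated assumptions, not the v1_migrate method itself: the function name migrate_v1_config is hypothetical, the config is modelled as an already-parsed dict, and the handling of missing fields (they are simply skipped) is an assumption that the release note does not confirm.

```python
import copy


def migrate_v1_config(raw: dict) -> dict:
    """Hypothetical stand-in for the v1 -> v2 upgrade described in the release note."""
    migrated = copy.deepcopy(raw)  # leave the caller's dict untouched

    # Remove the old `groups` field; treat a missing field as empty (assumption).
    groups = migrated.pop("groups", None) or {}

    # groups.selected -> include_group_names
    selected = groups.get("selected")
    if selected:
        migrated["include_group_names"] = selected

    # groups.backup_group_prefix -> renamed_group_prefix
    prefix = groups.get("backup_group_prefix")
    if prefix:
        migrated["renamed_group_prefix"] = prefix

    migrated["version"] = 2
    return migrated


# Example v1 configuration (values made up for illustration).
old_config = {
    "version": 1,
    "inventory_database": "ucx",
    "groups": {"selected": ["analysts", "engineers"], "backup_group_prefix": "db-temp-"},
}
new_config = migrate_v1_config(old_config)
```

The two cases loosely mirror the test names mentioned in the note: a config with no groups field at all, and a config with some group settings present.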
@nfx nfx requested review from a team and stikkireddy March 4, 2024 14:28
@nfx nfx merged commit 1a60a8d into main Mar 4, 2024
4 of 5 checks passed
@nfx nfx deleted the prepare/0.14.0 branch March 4, 2024 14:29
github-actions bot commented Mar 4, 2024

❌ 108/109 passed, 2 flaky, 1 failed, 9 skipped, 1h17m8s total

❌ test_running_real_assessment_job: databricks.labs.blueprint.parallel.ManyError: Detected 9 failures: Unknown: assess_CLOUD_ENV_service_principals: run failed with error message (6m52.428s)
databricks.labs.blueprint.parallel.ManyError: Detected 9 failures: Unknown: assess_CLOUD_ENV_service_principals: run failed with error message
 Could not reach driver of cluster DATABRICKS_CLUSTER_ID., Unknown: assess_clusters: run failed with error message
 Could not reach driver of cluster DATABRICKS_CLUSTER_ID., Unknown: assess_global_init_scripts: run failed with error message
 Could not reach driver of cluster DATABRICKS_CLUSTER_ID., Unknown: assess_incompatible_submit_runs: run failed with error message
 Could not reach driver of cluster DATABRICKS_CLUSTER_ID., Unknown: assess_jobs: run failed with error message
 Could not reach driver of cluster DATABRICKS_CLUSTER_ID., Unknown: assess_pipelines: run failed with error message
 Could not reach driver of cluster DATABRICKS_CLUSTER_ID., Unknown: crawl_groups: run failed with error message
 Could not reach driver of cluster DATABRICKS_CLUSTER_ID., Unknown: crawl_mounts: run failed with error message
 Could not reach driver of cluster DATABRICKS_CLUSTER_ID., Unknown: workspace_listing: run failed with error message
 Could not reach driver of cluster DATABRICKS_CLUSTER_ID.
14:32 DEBUG [databricks.labs.ucx.framework.crawlers] [api][execute] CREATE SCHEMA hive_metastore.ucx_slrwr
14:32 INFO [databricks.labs.ucx.mixins.fixtures] Schema hive_metastore.ucx_slrwr: https://DATABRICKS_HOST/explore/data/hive_metastore/ucx_slrwr
14:32 DEBUG [databricks.labs.ucx.mixins.fixtures] added schema fixture: SchemaInfo(catalog_name='hive_metastore', catalog_type=None, comment=None, created_at=None, created_by=None, effective_predictive_optimization_flag=None, enable_predictive_optimization=None, full_name='hive_metastore.ucx_slrwr', metastore_id=None, name='ucx_slrwr', owner=None, properties=None, storage_location=None, storage_root=None, updated_at=None, updated_by=None)
[gw1] linux -- Python 3.10.13 /home/runner/work/ucx/ucx/.venv/bin/python
14:32 DEBUG [databricks.labs.ucx.framework.crawlers] [api][execute] CREATE SCHEMA hive_metastore.ucx_slrwr
14:32 INFO [databricks.labs.ucx.mixins.fixtures] Schema hive_metastore.ucx_slrwr: https://DATABRICKS_HOST/explore/data/hive_metastore/ucx_slrwr
14:32 DEBUG [databricks.labs.ucx.mixins.fixtures] added schema fixture: SchemaInfo(catalog_name='hive_metastore', catalog_type=None, comment=None, created_at=None, created_by=None, effective_predictive_optimization_flag=None, enable_predictive_optimization=None, full_name='hive_metastore.ucx_slrwr', metastore_id=None, name='ucx_slrwr', owner=None, properties=None, storage_location=None, storage_root=None, updated_at=None, updated_by=None)
14:32 DEBUG [databricks.labs.ucx.mixins.fixtures] added workspace user fixture: User(active=True, display_name='sdk-dvdj@example.com', emails=[ComplexValue(display=None, primary=True, ref=None, type='work', value='sdk-dvdj@example.com')], entitlements=[], external_id=None, groups=[], id='6578067966231233', name=Name(family_name=None, given_name='sdk-dvdj@example.com'), roles=[], schemas=[<UserSchema.URN_IETF_PARAMS_SCIM_SCHEMAS_CORE_2_0_USER: 'urn:ietf:params:scim:schemas:core:2.0:User'>, <UserSchema.URN_IETF_PARAMS_SCIM_SCHEMAS_EXTENSION_WORKSPACE_2_0_USER: 'urn:ietf:params:scim:schemas:extension:workspace:2.0:User'>], user_name='sdk-dvdj@example.com')
14:32 INFO [databricks.labs.ucx.mixins.fixtures] Workspace group ucx_IpbN: https://DATABRICKS_HOST#setting/accounts/groups/845404866548890
14:32 DEBUG [databricks.labs.ucx.mixins.fixtures] added workspace group fixture: Group(display_name='ucx_IpbN', entitlements=[ComplexValue(display=None, primary=None, ref=None, type=None, value='allow-cluster-create')], external_id=None, groups=[], id='845404866548890', members=[ComplexValue(display='sdk-dvdj@example.com', primary=None, ref='Users/6578067966231233', type=None, value='6578067966231233')], meta=ResourceMeta(resource_type='WorkspaceGroup'), roles=[], schemas=[<GroupSchema.URN_IETF_PARAMS_SCIM_SCHEMAS_CORE_2_0_GROUP: 'urn:ietf:params:scim:schemas:core:2.0:Group'>])
14:32 INFO [databricks.labs.ucx.mixins.fixtures] Account group ucx_IpbN: https://accounts.CLOUD_ENVdatabricks.net/users/groups/1064020501681437/members
14:32 DEBUG [databricks.labs.ucx.mixins.fixtures] added account group fixture: Group(display_name='ucx_IpbN', entitlements=[], external_id=None, groups=[], id='1064020501681437', members=[ComplexValue(display='sdk-dvdj@example.com', primary=None, ref='Users/6578067966231233', type=None, value='6578067966231233')], meta=None, roles=[], schemas=[<GroupSchema.URN_IETF_PARAMS_SCIM_SCHEMAS_CORE_2_0_GROUP: 'urn:ietf:params:scim:schemas:core:2.0:Group'>])
14:32 INFO [databricks.labs.ucx.mixins.fixtures] Cluster policy: https://DATABRICKS_HOST#setting/clusters/cluster-policies/view/000EBD8B04DA085B
14:32 DEBUG [databricks.labs.ucx.mixins.fixtures] added cluster policy fixture: CreatePolicyResponse(policy_id='000EBD8B04DA085B')
14:32 DEBUG [databricks.labs.ucx.mixins.fixtures] added cluster_policy permissions fixture: 000EBD8B04DA085B [group_name admins CAN_USE] -> [group_name ucx_IpbN CAN_USE]
14:32 DEBUG [databricks.labs.ucx.install] Cannot find previous installation: Path (/Users/0a330eb5-dd51-4d97-b6e4-c474356b1d5d/.ujhF/config.yml) doesn't exist.
14:32 INFO [databricks.labs.ucx.install] Please answer a couple of questions to configure Unity Catalog migration
14:32 INFO [databricks.labs.ucx.installer.hms_lineage] HMS Lineage feature creates one system table named system.hms_to_uc_migration.table_access and helps in your migration process from HMS to UC by allowing you to programmatically query HMS lineage data.
14:32 INFO [databricks.labs.ucx.install] Creating UCX cluster policy.
14:32 INFO [databricks.labs.ucx.install] Installing UCX v0.13.3+1820240304143231
14:32 INFO [databricks.labs.ucx.install] Creating dashboards...
14:32 INFO [databricks.labs.ucx.framework.crawlers] Ensuring ucx_slrwr database exists
14:32 DEBUG [databricks.labs.ucx.framework.crawlers] [api][execute] CREATE SCHEMA IF NOT EXISTS hive_metastore.ucx_slrwr
14:32 DEBUG [databricks.labs.ucx.install] Creating jobs from tasks in main
14:32 INFO [databricks.labs.ucx.install] Fetching warehouse_id from a config
14:32 DEBUG [databricks.labs.ucx.framework.dashboards] Reading step folder /home/runner/work/ucx/ucx/src/databricks/labs/ucx/queries/assessment...
14:32 DEBUG [databricks.labs.ucx.framework.dashboards] Reading dashboard folder /home/runner/work/ucx/ucx/src/databricks/labs/ucx/queries/assessment/main...
14:32 INFO [databricks.labs.ucx.framework.dashboards] Creating dashboard [UJHF] UCX  Assessment (Main)...
14:32 INFO [databricks.labs.ucx.framework.crawlers] Ensuring ucx_slrwr.CLOUD_ENV_service_principals table exists
14:32 DEBUG [databricks.labs.ucx.framework.crawlers] [api][execute] CREATE TABLE IF NOT EXISTS hive_metastore.ucx_slrwr.CLOUD_ENV_service_principals (application_id STR... (107 more bytes)
14:32 INFO [databricks.labs.ucx.framework.crawlers] Ensuring ucx_slrwr.clusters table exists
14:32 DEBUG [databricks.labs.ucx.framework.crawlers] [api][execute] CREATE TABLE IF NOT EXISTS hive_metastore.ucx_slrwr.clusters (cluster_id STRING NOT NULL, succes... (91 more bytes)
14:32 INFO [databricks.labs.ucx.framework.crawlers] Ensuring ucx_slrwr.global_init_scripts table exists
14:32 DEBUG [databricks.labs.ucx.framework.crawlers] [api][execute] CREATE TABLE IF NOT EXISTS hive_metastore.ucx_slrwr.global_init_scripts (script_id STRING NOT NU... (120 more bytes)
14:32 INFO [databricks.labs.ucx.framework.crawlers] Ensuring ucx_slrwr.jobs table exists
14:32 DEBUG [databricks.labs.ucx.framework.crawlers] [api][execute] CREATE TABLE IF NOT EXISTS hive_metastore.ucx_slrwr.jobs (job_id STRING NOT NULL, success LONG N... (79 more bytes)
14:32 INFO [databricks.labs.ucx.framework.crawlers] Ensuring ucx_slrwr.pipelines table exists
14:32 INFO [databricks.labs.ucx.framework.crawlers] Ensuring ucx_slrwr.external_locations table exists
14:32 DEBUG [databricks.labs.ucx.framework.crawlers] [api][execute] CREATE TABLE IF NOT EXISTS hive_metastore.ucx_slrwr.pipelines (pipeline_id STRING NOT NULL, succ... (99 more bytes)
14:32 DEBUG [databricks.labs.ucx.framework.crawlers] [api][execute] CREATE TABLE IF NOT EXISTS hive_metastore.ucx_slrwr.external_locations (location STRING NOT NULL... (40 more bytes)
14:32 INFO [databricks.labs.ucx.framework.crawlers] Ensuring ucx_slrwr.mounts table exists
14:32 INFO [databricks.labs.ucx.framework.crawlers] Ensuring ucx_slrwr.grants table exists
14:32 INFO [databricks.labs.ucx.framework.crawlers] Ensuring ucx_slrwr.groups table exists
14:32 DEBUG [databricks.labs.ucx.framework.crawlers] [api][execute] CREATE TABLE IF NOT EXISTS hive_metastore.ucx_slrwr.mounts (name STRING NOT NULL, source STRING ... (21 more bytes)
14:32 DEBUG [databricks.labs.ucx.framework.crawlers] [api][execute] CREATE TABLE IF NOT EXISTS hive_metastore.ucx_slrwr.grants (principal STRING NOT NULL, action_ty... (167 more bytes)
14:32 INFO [databricks.labs.ucx.framework.crawlers] Ensuring ucx_slrwr.tables table exists
14:32 INFO [databricks.labs.ucx.framework.crawlers] Ensuring ucx_slrwr.table_size table exists
14:32 DEBUG [databricks.labs.ucx.framework.crawlers] [api][execute] CREATE TABLE IF NOT EXISTS hive_metastore.ucx_slrwr.groups (id_in_workspace STRING NOT NULL, nam... (179 more bytes)
14:32 INFO [databricks.labs.ucx.framework.crawlers] Ensuring ucx_slrwr.table_failures table exists
14:32 INFO [databricks.labs.ucx.framework.crawlers] Ensuring ucx_slrwr.workspace_objects table exists
14:32 INFO [databricks.labs.ucx.framework.crawlers] Ensuring ucx_slrwr.permissions table exists
14:32 INFO [databricks.labs.ucx.framework.crawlers] Ensuring ucx_slrwr.submit_runs table exists
14:32 DEBUG [databricks.labs.ucx.framework.crawlers] [api][execute] CREATE TABLE IF NOT EXISTS hive_metastore.ucx_slrwr.tables (catalog STRING NOT NULL, database ST... (189 more bytes)
14:32 DEBUG [databricks.labs.ucx.framework.crawlers] [api][execute] CREATE TABLE IF NOT EXISTS hive_metastore.ucx_slrwr.table_size (catalog STRING NOT NULL, databas... (81 more bytes)
14:32 DEBUG [databricks.labs.ucx.framework.crawlers] [api][execute] CREATE TABLE IF NOT EXISTS hive_metastore.ucx_slrwr.table_failures (catalog STRING NOT NULL, dat... (61 more bytes)
14:32 DEBUG [databricks.labs.ucx.framework.crawlers] [api][execute] CREATE TABLE IF NOT EXISTS hive_metastore.ucx_slrwr.workspace_objects (path STRING NOT NULL, obj... (63 more bytes)
14:32 DEBUG [databricks.labs.ucx.framework.crawlers] [api][execute] CREATE TABLE IF NOT EXISTS hive_metastore.ucx_slrwr.permissions (object_id STRING NOT NULL, obje... (57 more bytes)
14:32 DEBUG [databricks.labs.ucx.framework.crawlers] [api][execute] CREATE TABLE IF NOT EXISTS hive_metastore.ucx_slrwr.submit_runs (run_ids STRING NOT NULL, hashed... (58 more bytes)
14:32 INFO [databricks.labs.ucx.install] Creating new job configuration for step=remove-workspace-local-backup-groups
14:32 INFO [databricks.labs.ucx.install] Fetching warehouse_id from a config
14:32 INFO [databricks.labs.ucx.install] Creating new job configuration for step=assessment
14:32 INFO [databricks.labs.ucx.install] Creating new job configuration for step=migrate-groups
14:32 INFO [databricks.labs.ucx.install] Creating new job configuration for step=validate-groups-permissions
14:32 INFO [databricks.labs.ucx.install] Creating new job configuration for step=099-destroy-schema
14:32 INFO [databricks.labs.ucx.framework.crawlers] Ensuring ucx_slrwr.objects view matches queries/views/objects.sql contents
14:32 DEBUG [databricks.labs.ucx.framework.crawlers] [api][execute] CREATE OR REPLACE VIEW hive_metastore.ucx_slrwr.objects AS SELECT "jobs" AS object_type, job_id ... (1639 more bytes)
14:32 INFO [databricks.labs.ucx.framework.crawlers] Ensuring ucx_slrwr.grant_detail view matches queries/views/grant_detail.sql contents
14:32 DEBUG [databricks.labs.ucx.framework.crawlers] [api][execute] CREATE OR REPLACE VIEW hive_metastore.ucx_slrwr.grant_detail AS SELECT CASE WHEN anonymous_funct... (1037 more bytes)
14:33 DEBUG [databricks.labs.ucx.framework.dashboards] Reading dashboard folder /home/runner/work/ucx/ucx/src/databricks/labs/ucx/queries/assessment/CLOUD_ENV...
14:33 INFO [databricks.labs.ucx.framework.dashboards] Creating dashboard [UJHF] UCX  Assessment (Azure)...
14:33 DEBUG [databricks.labs.ucx.framework.dashboards] Reading step folder /home/runner/work/ucx/ucx/src/databricks/labs/ucx/queries/views...
14:33 INFO [databricks.labs.ucx.install] Installation completed successfully! Please refer to the https://DATABRICKS_HOST/#workspace/Users/0a330eb5-dd51-4d97-b6e4-c474356b1d5d/.ujhF/README for the next steps.
14:33 DEBUG [databricks.labs.ucx.install] starting assessment job: https://DATABRICKS_HOST#job/419912551419614
14:39 DEBUG [databricks.labs.ucx.mixins.fixtures] clearing 1 cluster_policy permissions fixtures
14:39 DEBUG [databricks.labs.ucx.mixins.fixtures] removing cluster_policy permissions fixture: 000EBD8B04DA085B [group_name admins CAN_USE] -> [group_name ucx_IpbN CAN_USE]
14:39 DEBUG [databricks.labs.ucx.mixins.fixtures] clearing 1 cluster policy fixtures
14:39 DEBUG [databricks.labs.ucx.mixins.fixtures] removing cluster policy fixture: CreatePolicyResponse(policy_id='000EBD8B04DA085B')
14:39 DEBUG [databricks.labs.ucx.mixins.fixtures] clearing 1 workspace user fixtures
14:39 DEBUG [databricks.labs.ucx.mixins.fixtures] removing workspace user fixture: User(active=True, display_name='sdk-dvdj@example.com', emails=[ComplexValue(display=None, primary=True, ref=None, type='work', value='sdk-dvdj@example.com')], entitlements=[], external_id=None, groups=[], id='6578067966231233', name=Name(family_name=None, given_name='sdk-dvdj@example.com'), roles=[], schemas=[<UserSchema.URN_IETF_PARAMS_SCIM_SCHEMAS_CORE_2_0_USER: 'urn:ietf:params:scim:schemas:core:2.0:User'>, <UserSchema.URN_IETF_PARAMS_SCIM_SCHEMAS_EXTENSION_WORKSPACE_2_0_USER: 'urn:ietf:params:scim:schemas:extension:workspace:2.0:User'>], user_name='sdk-dvdj@example.com')
14:39 DEBUG [databricks.labs.ucx.mixins.fixtures] clearing 1 account group fixtures
14:39 DEBUG [databricks.labs.ucx.mixins.fixtures] removing account group fixture: Group(display_name='ucx_IpbN', entitlements=[], external_id=None, groups=[], id='1064020501681437', members=[ComplexValue(display='sdk-dvdj@example.com', primary=None, ref='Users/6578067966231233', type=None, value='6578067966231233')], meta=None, roles=[], schemas=[<GroupSchema.URN_IETF_PARAMS_SCIM_SCHEMAS_CORE_2_0_GROUP: 'urn:ietf:params:scim:schemas:core:2.0:Group'>])
14:39 DEBUG [databricks.labs.ucx.mixins.fixtures] clearing 1 workspace group fixtures
14:39 DEBUG [databricks.labs.ucx.mixins.fixtures] removing workspace group fixture: Group(display_name='ucx_IpbN', entitlements=[ComplexValue(display=None, primary=None, ref=None, type=None, value='allow-cluster-create')], external_id=None, groups=[], id='845404866548890', members=[ComplexValue(display='sdk-dvdj@example.com', primary=None, ref='Users/6578067966231233', type=None, value='6578067966231233')], meta=ResourceMeta(resource_type='WorkspaceGroup'), roles=[], schemas=[<GroupSchema.URN_IETF_PARAMS_SCIM_SCHEMAS_CORE_2_0_GROUP: 'urn:ietf:params:scim:schemas:core:2.0:Group'>])
14:39 INFO [databricks.labs.ucx.install] Deleting UCX v0.13.3+1820240304143912 from https://DATABRICKS_HOST
14:39 INFO [databricks.labs.ucx.install] Deleting inventory database ucx_slrwr
14:39 INFO [databricks.labs.ucx.framework.crawlers] deleting ucx_slrwr database
14:39 DEBUG [databricks.labs.ucx.framework.crawlers] [api][execute] DROP SCHEMA IF EXISTS hive_metastore.ucx_slrwr CASCADE
14:39 INFO [databricks.labs.ucx.install] Deleting jobs
14:39 INFO [databricks.labs.ucx.install] Deleting remove-workspace-local-backup-groups job_id=723051027443911.
14:39 INFO [databricks.labs.ucx.install] Deleting assessment job_id=419912551419614.
14:39 INFO [databricks.labs.ucx.install] Deleting migrate-groups job_id=28894494326076.
14:39 INFO [databricks.labs.ucx.install] Deleting validate-groups-permissions job_id=995302358817201.
14:39 INFO [databricks.labs.ucx.install] Deleting 099-destroy-schema job_id=686828127194860.
14:39 INFO [databricks.labs.ucx.install] Deleting cluster policy
14:39 INFO [databricks.labs.ucx.install] UnInstalling UCX complete
14:39 DEBUG [databricks.labs.ucx.mixins.fixtures] clearing 1 schema fixtures
14:39 DEBUG [databricks.labs.ucx.mixins.fixtures] removing schema fixture: SchemaInfo(catalog_name='hive_metastore', catalog_type=None, comment=None, created_at=None, created_by=None, effective_predictive_optimization_flag=None, enable_predictive_optimization=None, full_name='hive_metastore.ucx_slrwr', metastore_id=None, name='ucx_slrwr', owner=None, properties=None, storage_location=None, storage_root=None, updated_at=None, updated_by=None)
14:39 DEBUG [databricks.labs.ucx.framework.crawlers] [api][execute] DROP SCHEMA IF EXISTS hive_metastore.ucx_slrwr CASCADE
[gw1] linux -- Python 3.10.13 /home/runner/work/ucx/ucx/.venv/bin/python

Flaky tests:

  • 🤪 test_create_account_level_groups (1m41.504s)
  • 🤪 test_running_real_remove_backup_groups_job (4m30.494s)

Running from acceptance #1373

nkvuong pushed a commit that referenced this pull request Mar 6, 2024
* Added `upgraded_from_workspace_id` property to migrated tables to
indicate the source workspace
([#987](#987)). In this
release, updates have been made to the `_migrate_external_table`,
`_migrate_dbfs_root_table`, and `_migrate_view` methods in the
`table_migrate.py` file to include a new parameter `upgraded_from_ws` in
the SQL commands used to alter tables, views, or managed tables. This
parameter is used to store the source workspace ID in the migrated
tables, indicating the migration origin. A new utility method
`sql_alter_from` has been added to the `Table` class in `tables.py` to
generate the SQL command with the new parameter. Additionally, a new
class-level attribute `UPGRADED_FROM_WS_PARAM` has been added to the
`Table` class in `tables.py` to indicate the source workspace. A new
property `upgraded_from_workspace_id` has been added to migrated tables
to store the source workspace ID. These changes resolve issue
[#899](#899) and are tested
through manual testing, unit tests, and integration tests. No new CLI
commands, workflows, or tables have been added or modified, and there
are no changes to user documentation.
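
To illustrate the idea of recording the source workspace on a migrated table, here is a minimal sketch. It is not the UCX implementation: the helper name `sketch_alter_statement`, its signature, and the exact SQL shape are assumptions made for this example. Only the property name `upgraded_from_workspace_id` comes from the notes above, and the statement generated by `sql_alter_from` may differ.

```python
def sketch_alter_statement(target_table: str, workspace_id: int) -> str:
    """Build an ALTER TABLE statement that tags a table with its source workspace.

    Hypothetical helper for illustration only; not part of UCX.
    """
    # Backtick-quote each part of the dotted (catalog.schema.table) name.
    quoted = ".".join(f"`{part}`" for part in target_table.split("."))
    return (
        f"ALTER TABLE {quoted} "
        f"SET TBLPROPERTIES ('upgraded_from_workspace_id' = '{workspace_id}')"
    )


if __name__ == "__main__":
    print(sketch_alter_statement("main.sales.orders", 123))
```

Storing the workspace ID as a table property keeps the migration origin next to the table itself, so it can be read back later without a separate lookup table.
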
* Added a command to create account level groups if they do not exist
([#763](#763)). This commit
introduces a new feature that enables the creation of account-level
groups if they do not already exist in the account. A new command,
`create-account-groups`, has been added to the `databricks labs ucx`
tool, which crawls all workspaces in the account and creates an
account-level group for each workspace-local group that has no
account-level counterpart. The feature supports various scenarios, including creating
account-level groups that exist in some workspaces but not in others,
and creating multiple account-level groups with the same name but
different members. Several new methods have been added to the
`account.py` file to support the new feature, and the `test_account.py`
file has been updated with new tests to ensure the correct behavior of
the `create_account_level_groups` method. Additionally, the `cli.py`
file has been updated to include the new `create-account-groups`
command. With these changes, users can easily manage account-level
groups and ensure that they are consistent across all workspaces in the
account, improving the overall user experience.
* Added assessment for the incompatible `RunSubmit` API usages
([#849](#849)). In this
release, the assessment functionality for incompatible `RunSubmit` API
usages has been significantly enhanced through various changes. The
`clusters.py` file has seen improvements in clarity and consistency with
the renaming of private methods `check_spark_conf` to
`_check_spark_conf` and `check_cluster_failures` to
`_check_cluster_failures`. The `_assess_clusters` method has been
updated to call the renamed `_check_cluster_failures` method for
thorough checks of cluster configurations, resulting in better
assessment functionality. A new `SubmitRunsCrawler` class has been added
to the `databricks.labs.ucx.assessment.jobs` module, building on the
`CrawlerBase`, `JobsMixin`, and `CheckClusterMixin` classes. This class
crawls runs submitted through the `RunSubmit` API and assesses them for
compatibility and failure issues. Additionally, a new
configuration attribute, `num_days_submit_runs_history`, has been
introduced in the `WorkspaceConfig` class of the `config.py` module,
controlling the number of days for which submission history of
`RunSubmit` API calls is retained. Lastly, various new JSON files have
been added for unit testing, assessing the `RunSubmit` API usages
related to different scenarios like dbt task runs, Git source-based job
runs, JAR file runs, and more. These tests will aid in identifying and
addressing potential compatibility issues with the `RunSubmit` API.
* Added group members difference to the output of
`validate-groups-membership` cli command
([#995](#995)). The
`validate-groups-membership` command has been updated to include a
comparison of group memberships at both the account and workspace
levels. This enhancement is implemented through the
`validate_group_membership` function, which has been updated to
calculate the difference in members between the two levels and display
it in a new `group_members_difference` column. This allows a more
detailed analysis of group memberships and makes any discrepancies
between the account and workspace levels easy to identify. The
corresponding unit test file, `test_groups.py`, has been updated to
include a new test case that verifies the calculation of the
`group_members_difference` value. The functionality of the other
commands remains unchanged. The new `group_members_difference` value is
calculated as the difference in the number of members in the workspace
group and the account group, with a positive value indicating more
members in the workspace group and a negative value indicating more
members in the account group. The table template in the `labs.yml` file
has also been updated to include the new column for the group membership
difference.
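
The sign convention described above can be sketched in a few lines. This is a standalone illustration, not the code of `validate_group_membership`: the function name and the use of plain lists of member names are assumptions made for the example.

```python
def group_members_difference(
    workspace_members: list[str], account_members: list[str]
) -> int:
    """Return the member count of the workspace group minus that of the account group.

    Hypothetical helper for illustration only; not part of UCX.
    """
    return len(workspace_members) - len(account_members)


if __name__ == "__main__":
    # Positive: the workspace group has more members than the account group.
    print(group_members_difference(["alice", "bob"], ["alice"]))
    # Negative: the account group has more members than the workspace group.
    print(group_members_difference(["alice"], ["alice", "bob"]))
```

Because the value compares counts only, a result of zero does not prove that both groups contain the same members, only that they are the same size.
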
* Added handling for empty `directory_id` if managed identity
encountered during the crawling of StoragePermissionMapping
([#986](#986)). This PR adds
a `type` field to the `StoragePermissionMapping` and `Principal`
dataclasses to differentiate between service principals and managed
identities, allowing `None` for the `directory_id` field if the
principal is not a service principal. During the migration to UC storage
credentials, managed identities are currently ignored. These changes
improve handling of managed identities during the crawling of
`StoragePermissionMapping`, prevent errors when creating storage
credentials with managed identities, and address issue
[#339](#339). The changes
are tested through unit tests, manual testing, and integration tests,
and only affect the `StoragePermissionMapping` class and related
methods, without introducing new commands, workflows, or tables.
* Added migration for Azure Service Principals with secrets stored in
Databricks Secret to UC Storage Credentials
([#874](#874)). In this
release, we have made significant updates to migrate Azure Service
Principals with their secrets stored in Databricks Secret to UC Storage
Credentials, enhancing security and management of storage access. The
changes include: Addition of a new `migrate_credentials` command in the
`labs.yml` file to migrate credentials for storage access to UC storage
credentials. Modification of `secrets.py` to handle the case where a
secret has been removed from the backend and to log warning messages for
secrets with invalid Base64 bytes. Introduction of the
`StorageCredentialManager` and `ServicePrincipalMigration` classes in
`credentials.py` to manage Azure Service Principals and their associated
client secrets, and to migrate them to UC Storage Credentials. Addition
of a new `directory_id` attribute in the `Principal` class and its
associated dataclass in `resources.py` to store the directory ID for
creating UC storage credentials using a service principal. Creation of a
new pytest fixture, `make_storage_credential_spn`, in `fixtures.py` to
simplify writing tests requiring Databricks Storage Credentials with
Azure Service Principal auth. Addition of a new test file for the Azure
integration of the project, including new classes, methods, and test
cases for testing the migration of Azure Service Principals to UC
Storage Credentials. These improvements will ensure better security and
management of storage access using Azure Service Principals, while
providing more efficient and robust testing capabilities.
* Added permission migration support for feature tables and the root
permissions for models and feature tables
([#997](#997)). This commit
introduces support for migration of permissions related to feature
tables and sets root permissions for models and feature tables. New
functions such as `feature_store_listing`, `feature_tables_root_page`,
`models_root_page`, and `tokens_and_passwords` have been added to
facilitate population of a workspace access page with necessary
permissions information. The `factory` function in `manager.py` has been
updated to include new listings for models' root page, feature tables'
root page, and the feature store for enhanced management and access
control of models and feature tables. New classes and methods have been
implemented to handle permissions for these resources, utilizing
`GenericPermissionsSupport`, `AccessControlRequest`, and `MigratedGroup`
classes. Additionally, new test methods have been included to verify
feature tables listing functionality and root page listing functionality
for feature tables and registered models. The test manager method has
been updated to include `feature-tables` in the list of items to be
checked for permissions, ensuring comprehensive testing of permission
functionality related to these new feature tables.
* Added support for serving endpoints
([#990](#990)). This
release adds support for serving endpoints. The `fixtures.py` file in
the `databricks.labs.ucx.mixins` module has been updated with new
classes and functions to create and manage serving endpoints,
accompanied by integration tests to verify their functionality. The
assessment's permissions crawling has a new listing for serving
endpoints, using the `ws.serving_endpoints.list` function and the
`serving-endpoints` category. A new integration test, `test_endpoints`,
verifies that the assessment now crawls permissions for serving
endpoints and that these permissions can be migrated from one group to
another. The rest of the test suite has been updated to stay compatible
with the updated `test_manager.py` file.
* Expanded end-user documentation with detailed descriptions for
workflows and commands
([#999](#999)). The
Databricks Labs UCX project has been updated with several new features
to assist in upgrading to Unity Catalog, including an assessment
workflow that generates a detailed compatibility report for workspace
entities, a group migration workflow for upgrading all Databricks
workspace assets, and utility commands for managing cross-workspace
installations. The Assessment Report now includes a more detailed
summary of the assessment findings, table counts, database summaries,
and external locations. Additional improvements include expanded
workspace group migration to handle potential conflicts with locally
scoped group names, enhanced documentation for external Hive Metastore
integration, a new debugging notebook, and detailed descriptions of
table upgrade considerations, data access permissions, external storage,
and table crawler.
* Fixed `config.yml` upgrade from very old versions
([#984](#984)). This
release fixes the `config.yml` upgrade path for version 1
configurations. The previous `v1_migrate` class method has been replaced
with a new implementation that specifically handles migration from
version 1. The new method retrieves the `groups` field, extracts the
`selected` value, and assigns it to the `include_group_names` key in the
configuration. The `backup_group_prefix` value from the `groups` field
is assigned to the `renamed_group_prefix` key, the `groups` field is
removed, and the version number is updated to 2. Two new unit tests in
the `test_config.py` file, `test_v1_migrate_zeroconf` and
`test_v1_migrate_some_conf`, use the `MockInstallation` class and load
the configuration through `WorkspaceConfig` to check backward
compatibility.
* Renamed columns in assessment SQL queries to use actual names, not
aliases ([#983](#983)). This
update resolves an issue where column references made through aliases in
SQL queries caused errors in certain setups. In the assessment SQL
queries, the definition of the `is_delta` column now uses the actual
`table_format` name instead of the alias `format`, so the reference is
interpreted consistently across setups. The change does not introduce
any new methods; it only modifies the existing `05_0_all_tables`
assessment query to use actual column names.
* Updated groups permissions validation to use Table ACL cluster
([#979](#979)). In this
update, the `validate_groups_permissions` task has been modified to run
on the Table ACL cluster, as indicated by the inclusion of
`job_cluster="tacl"`. This task is responsible for ensuring that all
crawled permissions are accurately applied to the destination groups by
calling the `permission_manager.apply_group_permissions` method during
the migration state. If you run this workflow, be aware that permissions
validation now executes on the Table ACL cluster, and adjust your
workflows accordingly.
nkvuong added a commit that referenced this pull request Mar 6, 2024
add integration tests

fix

-

Fix integration tests on AWS (#978)

Update groups permissions validation to use Table ACL cluster (#979)

Renamed columns in assessment SQL queries to use actual names, not aliases (#983)

Aliases are usually not allowed in projections (as they are replaced
later in the query execution phases). While the DBSQL was smart enough
to handle the references via aliases, for some setups this results in an
error. Changing column references to use actual names fixes this.
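
The difference between the two styles of column reference can be sketched with a small experiment. This is only an illustration: it uses Python's built-in `sqlite3` module as a stand-in engine (not DBSQL), and the table name `tables`, the sample rows, and the `run` helper are assumptions made up for the example. Whether the alias-based query is rejected depends on the engine, so the helper reports the outcome instead of assuming one.

```python
import sqlite3

# Hypothetical in-memory table; `table_format` is the actual column name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tables (name TEXT, table_format TEXT)")
conn.executemany(
    "INSERT INTO tables VALUES (?, ?)",
    [("sales", "DELTA"), ("logs", "PARQUET")],
)

# Portable form: `is_delta` is derived from the actual column name.
SAFE_QUERY = """
    SELECT name,
           table_format AS format,
           table_format = 'DELTA' AS is_delta
    FROM tables
    ORDER BY name
"""

# Fragile form: `is_delta` is derived from an alias defined in the same projection.
FRAGILE_QUERY = """
    SELECT name,
           table_format AS format,
           format = 'DELTA' AS is_delta
    FROM tables
    ORDER BY name
"""


def run(query):
    """Return the rows, or a message if the engine rejects the query."""
    try:
        return conn.execute(query).fetchall()
    except sqlite3.OperationalError as err:
        return f"rejected: {err}"


safe_rows = run(SAFE_QUERY)
fragile_result = run(FRAGILE_QUERY)
print(safe_rows)
print(fragile_result)
```

The query that references `table_format` directly does not rely on how an engine resolves aliases, which is the property the fix depends on.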

Resolves #980

- [ ] added relevant user documentation
- [ ] added new CLI command
- [ ] modified existing command: `databricks labs ucx ...`
- [ ] added a new workflow
- [ ] modified existing workflow: `...`
- [ ] added a new table
- [ ] modified existing table: `...`

- [x] manually tested
- [ ] added unit tests
- [ ] added integration tests
- [ ] verified on staging environment (screenshot attached)

Fixed `config.yml` upgrade from very old versions (#984)
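
The release notes for this fix describe the version 1 migration as moving `groups.selected` to `include_group_names` and `groups.backup_group_prefix` to `renamed_group_prefix`, dropping `groups`, and setting the version to 2. A minimal sketch of that transformation on a plain dictionary might look like the following. The function name `migrate_v1_config`, the sample values, and the handling of missing keys are assumptions for illustration, not the project's actual implementation.

```python
import copy


def migrate_v1_config(raw: dict) -> dict:
    """Return a version-2 copy of a version-1 config dictionary."""
    migrated = copy.deepcopy(raw)
    # Assumption: a missing or empty `groups` field simply yields no group settings.
    groups = migrated.pop("groups", None) or {}
    if "selected" in groups:
        migrated["include_group_names"] = groups["selected"]
    if "backup_group_prefix" in groups:
        migrated["renamed_group_prefix"] = groups["backup_group_prefix"]
    migrated["version"] = 2
    return migrated


# Hypothetical version-1 configuration.
old_config = {
    "version": 1,
    "inventory_database": "ucx",
    "groups": {"selected": ["analysts"], "backup_group_prefix": "db-temp-"},
}
new_config = migrate_v1_config(old_config)
print(new_config)
```

Working on a deep copy keeps the original dictionary intact, which makes the before and after states easy to compare in a test.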

Add a command to create account level groups if they do not exist (#763)

Attempt to fix
- #17
- #649

Adds a command to create groups at account level by crawling all
workspaces configured in the account and in scope of the migration

This pull request adds several new methods to the `account.py` file in
the `databricks/labs/ucx` directory. The main method added is
`create_account_level_groups`, which crawls all workspaces in an account
and creates account-level groups if a workspace-local group is not
present in the account. The method `get_valid_workspaces_groups` is
added to retrieve a dictionary of all valid workspace groups, while
`has_not_same_members` checks if two groups have the same members. The
method `get_account_groups` retrieves a dictionary of all account
groups.
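
The selection logic described above can be sketched without any Databricks API calls. In this sketch, workspace groups are modelled as plain dictionaries and account groups as a set of names. The function `plan_account_groups`, the data shapes, and the choice to report same-named groups with different members as conflicts are all assumptions for illustration, not the behaviour of the actual `account.py` methods.

```python
def plan_account_groups(workspace_groups, existing_account_groups):
    """Decide which account-level groups to create.

    workspace_groups: {workspace_id: {group_name: set_of_member_names}}
    existing_account_groups: set of group names already present in the account
    Returns (to_create, conflicts), where `to_create` maps a group name to its
    members and `conflicts` lists names whose members differ between workspaces.
    """
    to_create = {}
    conflicts = set()
    for workspace_id in sorted(workspace_groups):
        for name, members in workspace_groups[workspace_id].items():
            if name in existing_account_groups:
                continue  # already present at account level, nothing to do
            if name not in to_create:
                to_create[name] = frozenset(members)
            elif to_create[name] != frozenset(members):
                conflicts.add(name)  # same name, different members
    return to_create, conflicts


# Hypothetical crawl result for two workspaces.
crawled = {
    101: {"analysts": {"ann", "bob"}, "admins": {"root"}},
    202: {"analysts": {"ann"}, "engineers": {"eve"}},
}
to_create, conflicts = plan_account_groups(crawled, {"admins"})
print(sorted(to_create), sorted(conflicts))
```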

Regarding the tests, the `test_account.py` file has been updated to
include new tests for the `create_account_level_groups` method. The test
`test_create_acc_groups_should_create_acc_group_if_no_group_found`
verifies that an account-level group is created if no group with the
same name is found. The test
`test_create_acc_groups_should_filter_groups_in_other_workspaces` checks
that the method filters groups present in other workspaces and only
creates groups that are not present in the account.

Additionally, the `cli.py` file has been updated to include a new
command, `create-account-groups`, which creates account-level groups by
crawling all workspaces in the account where ucx is installed.

Added tokei.rs lines of code badge (#988)

[![lines of
code](https://tokei.rs/b1/github/databrickslabs/ucx)](https://github.com/databrickslabs/ucx)

Adding support for serving endpoints (#990)

The assessment did not crawl permissions for serving endpoints; this PR
aims to fix that.

- [X] added integration tests

Added `upgraded_from_workspace_id` property to migrated tables to indicate the source workspace (#987)

Added the table parameter `upgraded_from_ws` to migrated tables. The
parameter contains the source workspace ID.
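
As a rough sketch of what recording the source workspace on a migrated object could look like, the helper below builds an `ALTER ... SET TBLPROPERTIES` statement. The function name `build_upgraded_from_sql`, the use of `upgraded_from_workspace_id` as the property key, and the sample identifiers are assumptions for illustration; this is not the project's `sql_alter_from` method.

```python
def build_upgraded_from_sql(full_name: str, workspace_id: int, *, is_view: bool = False) -> str:
    """Build a statement that stores the source workspace ID as a table property."""
    kind = "VIEW" if is_view else "TABLE"
    # Quote each part of the dotted name separately, e.g. catalog.schema.table.
    quoted = ".".join(f"`{part}`" for part in full_name.split("."))
    return (
        f"ALTER {kind} {quoted} "
        f"SET TBLPROPERTIES ('upgraded_from_workspace_id' = '{workspace_id}');"
    )


# Hypothetical target table and workspace ID.
statement = build_upgraded_from_sql("main.sales.orders", 123)
print(statement)
```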

Resolves #899

- [ ] added relevant user documentation
- [ ] added new CLI command
- [ ] modified existing command: `databricks labs ucx ...`
- [ ] added a new workflow
- [ ] modified existing workflow: `...`
- [ ] added a new table
- [ ] modified existing table: `...`

- [x] manually tested
- [x] added unit tests
- [x] added integration tests
- [x] verified on staging environment (screenshot attached)

Handle None directory_id if managed identity encountered during the crawling of StoragePermissionMapping  (#986)

While creating a `StoragePermissionMapping`, a principal could be a
managed identity, which does not have a `directory_id`. This PR allows
managed identities to be stored in `StoragePermissionMapping` and allows
`directory_id` to be None.

- Add a `type` field to the dataclasses `StoragePermissionMapping` and
`Principal` to indicate whether a principal is a service principal or a
managed identity.
- Allow None `directory_id` if the principal is not a service principal.
- Ignore managed identities while migrating to UC storage credentials
for now.
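
The three changes above can be sketched with a small dataclass. Only the `type` and `directory_id` fields come from the description; the other field names, the type values `"ServicePrincipal"` and `"ManagedIdentity"`, and the helper `principals_to_migrate` are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Principal:
    client_id: str                      # hypothetical field
    display_name: str                   # hypothetical field
    type: str                           # assumed values: "ServicePrincipal" or "ManagedIdentity"
    directory_id: Optional[str] = None  # None is allowed for managed identities


def principals_to_migrate(principals: List[Principal]) -> List[Principal]:
    """Keep only service principals; managed identities are skipped for now."""
    return [p for p in principals if p.type == "ServicePrincipal"]


# Hypothetical crawl result with one principal of each kind.
crawled = [
    Principal("app-1", "etl-spn", "ServicePrincipal", directory_id="tenant-a"),
    Principal("mi-1", "storage-mi", "ManagedIdentity"),
]
selected = principals_to_migrate(crawled)
print([p.display_name for p in selected])
```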

fix #339

- [ ] added relevant user documentation
- [ ] added new CLI command
- [ ] modified existing command: `databricks labs ucx ...`
- [ ] added a new workflow
- [ ] modified existing workflow: `...`
- [ ] added a new table
- [ ] modified existing table: `...`

- [ ] manually tested
- [ ] added unit tests
- [ ] added integration tests
- [ ] verified on staging environment (screenshot attached)

Added group members difference to the output of `validate-groups-membership` cli command (#995)

The `validate-groups-membership` command has been updated to include a
comparison of group memberships at both the account and workspace
levels, displaying the difference in members between the two levels in a
new column. This enhancement allows for a more detailed analysis of
group memberships, with the added functionality implemented in the
`validate_group_membership` function in the `groups.py` file located in
the `databricks/labs/ucx/workspace_access` directory. A new output
field, `group_members_difference`, has been added to represent the
difference in the number of members between a workspace group and an
associated account group. The corresponding unit test file,
`test_groups.py`, has been updated to include a new test case that
verifies the calculation of the `group_members_difference` value. This
change provides users with a more comprehensive view of their group
memberships and allows them to easily identify any discrepancies between
the account and workspace levels. The functionality of the other
commands remains unchanged.
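
A minimal sketch of the calculation follows, assuming the difference is the workspace member count minus the account member count, so that a positive value means the workspace group has more members. The function signature, the row layout, and the sample groups are assumptions for illustration, not the actual `validate_group_membership` code.

```python
def group_members_difference(workspace_members, account_members):
    """Workspace member count minus account member count."""
    return len(workspace_members) - len(account_members)


# Hypothetical groups: name -> (workspace members, account members).
groups = {
    "analysts": ({"ann", "bob", "cat"}, {"ann", "bob"}),
    "engineers": ({"eve"}, {"eve", "fay", "gil"}),
}

rows = [
    {
        "group_name": name,
        "group_members_difference": group_members_difference(ws, acc),
    }
    for name, (ws, acc) in sorted(groups.items())
]
print(rows)
```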

Added permission migration support for feature tables and the root permissions for models and feature tables  (#997)

Improved installation integration test flakiness (#998)

- improved `_infer_error_from_job_run` and `_infer_error_from_task_run`
to also catch `KeyError` and `ValueError`
- removed retries for `Unknown` errors for installation tests

Added assessment for the incompatible `RunSubmit` API usages (#849)

Expanded end-user documentation with detailed descriptions for workflows and commands (#999)

The Databricks Labs UCX project has been updated with several new
features to assist in upgrading to Unity Catalog. These include various
workflows and command-line utilities, such as an assessment workflow
that generates a detailed compatibility report for workspace entities
and a group migration workflow to upgrade all Databricks workspace
assets. Additionally, new utility commands have been added for managing
cross-workspace installations, and users can now view deployed
workflows' status and repair failed workflows. New end-user
documentation has also been introduced, featuring comprehensive
descriptions of workflows, commands, and an assessment report image. The
Assessment Report, generated from UCX tools, now includes a more
detailed summary of the assessment findings, table counts, database
summaries, and external locations. Improved documentation for external
Hive Metastore integration and a new debugging notebook are also
included in this release. Lastly, the workspace group migration feature
has been expanded to handle potential conflicts when migrating multiple
workspaces with locally scoped group names.

Release v0.14.0 (#1000)

Update databricks-labs-blueprint requirement from ~=0.2.4 to ~=0.3.0 (#1001)

Updates the requirements on
[databricks-labs-blueprint](https://github.com/databrickslabs/blueprint)
to permit the latest version.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/databrickslabs/blueprint/releases">databricks-labs-blueprint's
releases</a>.</em></p>
<blockquote>
<h2>v0.3.0</h2>
<ul>
<li>Added automated upgrade framework (<a
href="https://redirect.github.com/databrickslabs/blueprint/issues/50">#50</a>).
This update introduces an automated upgrade framework for managing and
applying upgrades to the product, with a new <code>upgrades.py</code>
file that includes a <code>ProductInfo</code> class having methods for
version handling, wheel building, and exception handling. The test code
organization has been improved, and new test cases, functions, and a
directory structure for fixtures and unit tests have been added for the
upgrades functionality. The <code>test_wheels.py</code> file now checks
the version of the Databricks SDK and handles cases where the version
marker is missing or does not contain the <code>__version__</code>
variable. Additionally, a new <code>Application State Migrations</code>
section has been added to the README, explaining the process of seamless
upgrades from version X to version Z through version Y, addressing the
need for configuration or database state migrations as the application
evolves. Users can apply these upgrades by following an idiomatic usage
pattern involving several classes and functions. Furthermore,
improvements have been made to the <code>_trim_leading_whitespace</code>
function in the <code>commands.py</code> file of the
<code>databricks.labs.blueprint</code> module, ensuring accurate and
consistent removal of leading whitespace for each line in the command
string, leading to better overall functionality and
maintainability.</li>
<li>Added brute-forcing <code>SerdeError</code> with
<code>as_dict()</code> and <code>from_dict()</code> (<a
href="https://redirect.github.com/databrickslabs/blueprint/issues/58">#58</a>).
This commit introduces a brute-forcing approach for handling
<code>SerdeError</code> using <code>as_dict()</code> and
<code>from_dict()</code> methods in an open-source library. The new
<code>SomePolicy</code> class demonstrates the usage of these methods
for manual serialization and deserialization of custom classes. The
<code>as_dict()</code> method returns a dictionary representation of the
class instance, and the <code>from_dict()</code> method, decorated with
<code>@classmethod</code>, creates a new instance from the provided
dictionary. Additionally, the GitHub Actions workflow for acceptance
tests has been updated to include the <code>ready_for_review</code>
event type, ensuring that tests run not only for opened and synchronized
pull requests but also when marked as &quot;ready for review.&quot;
These changes provide developers with more control over the
deserialization process and facilitate debugging in cases where default
deserialization fails, but should be used judiciously to avoid brittle
code.</li>
<li>Fixed nightly integration tests run as service principals (<a
href="https://redirect.github.com/databrickslabs/blueprint/issues/52">#52</a>).
In this release, we have enhanced the compatibility of our codebase with
service principals, particularly in the context of nightly integration
tests. The <code>Installation</code> class in the
<code>databricks.labs.blueprint.installation</code> module has been
refactored, deprecating the <code>current</code> method and introducing
two new methods: <code>assume_global</code> and
<code>assume_user_home</code>. These methods enable users to install and
manage <code>blueprint</code> as either a global or user-specific
installation. Additionally, the <code>existing</code> method has been
updated to work with the new <code>Installation</code> methods. In the
test suite, the <code>test_installation.py</code> file has been updated
to correctly detect global and user-specific installations when running
as a service principal. These changes improve the testability and
functionality of our software, ensuring seamless operation with service
principals during nightly integration tests.</li>
<li>Made <code>test_existing_installations_are_detected</code> more
resilient (<a
href="https://redirect.github.com/databrickslabs/blueprint/issues/51">#51</a>).
In this release, we have added a new test function
<code>test_existing_installations_are_detected</code> that checks if
existing installations are correctly detected and retries the test for
up to 15 seconds if they are not. This improves the reliability of the
test by making it more resilient to potential intermittent failures. We
have also added an import from <code>databricks.sdk.retries</code> named
<code>retried</code> which is used to retry the test function in case of
an <code>AssertionError</code>. Additionally, the test function
<code>test_existing</code> has been renamed to
<code>test_existing_installations_are_detected</code> and the
<code>xfail</code> marker has been removed. We have also renamed the
test function <code>test_dataclass</code> to
<code>test_loading_dataclass_from_installation</code> for better
clarity. This change will help ensure that the library is correctly
detecting existing installations and improve the overall quality of the
codebase.</li>
</ul>
<p>Contributors: <a
href="https://github.com/nfx"><code>@​nfx</code></a></p>
</blockquote>
</details>
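
The `as_dict()` / `from_dict()` pattern described in the release notes above can be sketched as follows. This is a minimal illustration, not the library's code: the fields `name` and `max_retries` are hypothetical, and only the class name `SomePolicy` and the two method names come from the notes.

```python
from dataclasses import dataclass


@dataclass
class SomePolicy:
    """Hypothetical policy class with hand-written (de)serialization hooks."""

    name: str
    max_retries: int = 3  # assumed field, for illustration only

    def as_dict(self) -> dict:
        # Return a plain-dict representation of this instance.
        return {"name": self.name, "max_retries": self.max_retries}

    @classmethod
    def from_dict(cls, raw: dict) -> "SomePolicy":
        # Build a new instance from a plain dict, defaulting missing keys.
        return cls(name=raw["name"], max_retries=int(raw.get("max_retries", 3)))


policy = SomePolicy(name="nightly", max_retries=5)
restored = SomePolicy.from_dict(policy.as_dict())
```

Round-tripping through `as_dict()` and `from_dict()` should give back an equal instance, which is a cheap check to run before relying on manual hooks.
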
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/databrickslabs/blueprint/blob/main/CHANGELOG.md">databricks-labs-blueprint's
changelog</a>.</em></p>
<blockquote>
<h2>0.3.0</h2>
<ul>
<li>Added automated upgrade framework (<a
href="https://redirect.github.com/databrickslabs/blueprint/issues/50">#50</a>).
This update introduces an automated upgrade framework for managing and
applying upgrades to the product, with a new <code>upgrades.py</code>
file that includes a <code>ProductInfo</code> class having methods for
version handling, wheel building, and exception handling. The test code
organization has been improved, and new test cases, functions, and a
directory structure for fixtures and unit tests have been added for the
upgrades functionality. The <code>test_wheels.py</code> file now checks
the version of the Databricks SDK and handles cases where the version
marker is missing or does not contain the <code>__version__</code>
variable. Additionally, a new <code>Application State Migrations</code>
section has been added to the README, explaining the process of seamless
upgrades from version X to version Z through version Y, addressing the
need for configuration or database state migrations as the application
evolves. Users can apply these upgrades by following an idiomatic usage
pattern involving several classes and functions. Furthermore,
improvements have been made to the <code>_trim_leading_whitespace</code>
function in the <code>commands.py</code> file of the
<code>databricks.labs.blueprint</code> module, ensuring accurate and
consistent removal of leading whitespace for each line in the command
string, leading to better overall functionality and
maintainability.</li>
<li>Added brute-forcing <code>SerdeError</code> with
<code>as_dict()</code> and <code>from_dict()</code> (<a
href="https://redirect.github.com/databrickslabs/blueprint/issues/58">#58</a>).
This commit introduces a brute-forcing approach for handling
<code>SerdeError</code> using <code>as_dict()</code> and
<code>from_dict()</code> methods in an open-source library. The new
<code>SomePolicy</code> class demonstrates the usage of these methods
for manual serialization and deserialization of custom classes. The
<code>as_dict()</code> method returns a dictionary representation of the
class instance, and the <code>from_dict()</code> method, decorated with
<code>@classmethod</code>, creates a new instance from the provided
dictionary. Additionally, the GitHub Actions workflow for acceptance
tests has been updated to include the <code>ready_for_review</code>
event type, ensuring that tests run not only for opened and synchronized
pull requests but also when marked as &quot;ready for review.&quot;
These changes provide developers with more control over the
deserialization process and facilitate debugging in cases where default
deserialization fails, but should be used judiciously to avoid brittle
code.</li>
<li>Fixed nightly integration tests run as service principals (<a
href="https://redirect.github.com/databrickslabs/blueprint/issues/52">#52</a>).
In this release, we have enhanced the compatibility of our codebase with
service principals, particularly in the context of nightly integration
tests. The <code>Installation</code> class in the
<code>databricks.labs.blueprint.installation</code> module has been
refactored, deprecating the <code>current</code> method and introducing
two new methods: <code>assume_global</code> and
<code>assume_user_home</code>. These methods enable users to install and
manage <code>blueprint</code> as either a global or user-specific
installation. Additionally, the <code>existing</code> method has been
updated to work with the new <code>Installation</code> methods. In the
test suite, the <code>test_installation.py</code> file has been updated
to correctly detect global and user-specific installations when running
as a service principal. These changes improve the testability and
functionality of our software, ensuring seamless operation with service
principals during nightly integration tests.</li>
<li>Made <code>test_existing_installations_are_detected</code> more
resilient (<a
href="https://redirect.github.com/databrickslabs/blueprint/issues/51">#51</a>).
In this release, we have added a new test function
<code>test_existing_installations_are_detected</code> that checks if
existing installations are correctly detected and retries the test for
up to 15 seconds if they are not. This improves the reliability of the
test by making it more resilient to potential intermittent failures. We
have also added an import from <code>databricks.sdk.retries</code> named
<code>retried</code> which is used to retry the test function in case of
an <code>AssertionError</code>. Additionally, the test function
<code>test_existing</code> has been renamed to
<code>test_existing_installations_are_detected</code> and the
<code>xfail</code> marker has been removed. We have also renamed the
test function <code>test_dataclass</code> to
<code>test_loading_dataclass_from_installation</code> for better
clarity. This change will help ensure that the library is correctly
detecting existing installations and improve the overall quality of the
codebase.</li>
</ul>
<h2>0.2.5</h2>
<ul>
<li>Automatically enable workspace filesystem if the feature is disabled
(<a
href="https://redirect.github.com/databrickslabs/blueprint/pull/42">#42</a>).</li>
</ul>
<h2>0.2.4</h2>
<ul>
<li>Added more integration tests for <code>Installation</code> (<a
href="https://redirect.github.com/databrickslabs/blueprint/pull/39">#39</a>).</li>
<li>Fixed <code>yaml</code> optional import error (<a
href="https://redirect.github.com/databrickslabs/blueprint/pull/38">#38</a>).</li>
</ul>
<h2>0.2.3</h2>
<ul>
<li>Added special handling for notebooks in
<code>Installation.upload(...)</code> (<a
href="https://redirect.github.com/databrickslabs/blueprint/pull/36">#36</a>).</li>
</ul>
<h2>0.2.2</h2>
<ul>
<li>Fixed issues with uploading wheels to DBFS and loading a
non-existing install state (<a
href="https://redirect.github.com/databrickslabs/blueprint/pull/34">#34</a>).</li>
</ul>
<h2>0.2.1</h2>
<ul>
<li>Aligned <code>Installation</code> framework with UCX project (<a
href="https://redirect.github.com/databrickslabs/blueprint/pull/32">#32</a>).</li>
</ul>
<h2>0.2.0</h2>
<ul>
<li>Added common install state primitives with strong typing (<a
href="https://redirect.github.com/databrickslabs/blueprint/pull/27">#27</a>).</li>
<li>Added documentation for Invoking Databricks Connect (<a
href="https://redirect.github.com/databrickslabs/blueprint/pull/28">#28</a>).</li>
<li>Added more documentation for Databricks CLI command router (<a
href="https://redirect.github.com/databrickslabs/blueprint/pull/30">#30</a>).</li>
<li>Enforced <code>pylint</code> standards (<a
href="https://redirect.github.com/databrickslabs/blueprint/pull/29">#29</a>).</li>
</ul>
<h2>0.1.0</h2>
<ul>
<li>Changed python requirement from 3.10.6 to 3.10 (<a
href="https://redirect.github.com/databrickslabs/blueprint/pull/25">#25</a>).</li>
</ul>
<h2>0.0.6</h2>
<ul>
<li>Make <code>find_project_root</code> more deterministic (<a
href="https://redirect.github.com/databrickslabs/blueprint/pull/23">#23</a>).</li>
</ul>
</blockquote>
<p>... (truncated)</p>
</details>
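
The retry behaviour described for `test_existing_installations_are_detected` (retry on `AssertionError` for up to 15 seconds) can be approximated with the standard library alone. The decorator `retry_on` below is a hypothetical stand-in written for this sketch; it is not the `retried` helper from `databricks.sdk.retries`, whose exact signature is not shown here.

```python
import functools
import time


def retry_on(exc_type, timeout_seconds=15.0, delay_seconds=0.01):
    """Hypothetical helper: re-run a function until it stops raising
    exc_type or the timeout expires."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            deadline = time.monotonic() + timeout_seconds
            while True:
                try:
                    return func(*args, **kwargs)
                except exc_type:
                    # Give up and re-raise once the deadline has passed.
                    if time.monotonic() >= deadline:
                        raise
                    time.sleep(delay_seconds)

        return wrapper

    return decorator


attempts = {"count": 0}


@retry_on(AssertionError, timeout_seconds=15.0)
def check_installations_detected():
    # Simulated flaky check: fails twice, then succeeds.
    attempts["count"] += 1
    assert attempts["count"] >= 3, "installations not visible yet"
    return attempts["count"]


result = check_installations_detected()
```

Retrying only on `AssertionError` keeps genuine errors (for example, authentication failures) visible immediately instead of hiding them behind the timeout.
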
<details>
<summary>Commits</summary>
<ul>
<li><a
href="https://github.com/databrickslabs/blueprint/commit/905e5ff5303a005d48bc98d101a613afeda15d51"><code>905e5ff</code></a>
Release v0.3.0 (<a
href="https://redirect.github.com/databrickslabs/blueprint/issues/59">#59</a>)</li>
<li><a
href="https://github.com/databrickslabs/blueprint/commit/a029f6bb1ecf807017754e298ea685326dbedf72"><code>a029f6b</code></a>
Added brute-forcing <code>SerdeError</code> with <code>as_dict()</code>
and <code>from_dict()</code> (<a
href="https://redirect.github.com/databrickslabs/blueprint/issues/58">#58</a>)</li>
<li><a
href="https://github.com/databrickslabs/blueprint/commit/c8a74f4129b4592d365aac9670eb86069f3517f7"><code>c8a74f4</code></a>
Added automated upgrade framework (<a
href="https://redirect.github.com/databrickslabs/blueprint/issues/50">#50</a>)</li>
<li><a
href="https://github.com/databrickslabs/blueprint/commit/24e62ef4f060e43e02c92a7d082d95e8bc164317"><code>24e62ef</code></a>
Don't run integration tests on draft pull requests (<a
href="https://redirect.github.com/databrickslabs/blueprint/issues/55">#55</a>)</li>
<li><a
href="https://github.com/databrickslabs/blueprint/commit/b4dd5abf4eaf8d022ae0b6ec7e659296ec3d2f37"><code>b4dd5ab</code></a>
Added tokei.rs badge (<a
href="https://redirect.github.com/databrickslabs/blueprint/issues/54">#54</a>)</li>
<li><a
href="https://github.com/databrickslabs/blueprint/commit/01d9467f425763ab08035001270593253bce11f0"><code>01d9467</code></a>
Fixed nightly integration tests run as service principals (<a
href="https://redirect.github.com/databrickslabs/blueprint/issues/52">#52</a>)</li>
<li><a
href="https://github.com/databrickslabs/blueprint/commit/aa5714179c65be8e13f54601e1d1fcd70548342d"><code>aa57141</code></a>
Made <code>test_existing_installations_are_detected</code> more
resilient (<a
href="https://redirect.github.com/databrickslabs/blueprint/issues/51">#51</a>)</li>
<li><a
href="https://github.com/databrickslabs/blueprint/commit/9cbc6f863d3ea06659f37939cf1b97115dd873bd"><code>9cbc6f8</code></a>
Bump <code>databrickslabs/sandbox/acceptance</code> to v0.1.0 (<a
href="https://redirect.github.com/databrickslabs/blueprint/issues/48">#48</a>)</li>
<li><a
href="https://github.com/databrickslabs/blueprint/commit/22fc1a8787b8e98de03048595202f88b7ddb9b94"><code>22fc1a8</code></a>
Use <code>databrickslabs/sandbox/acceptance</code> action (<a
href="https://redirect.github.com/databrickslabs/blueprint/issues/45">#45</a>)</li>
<li><a
href="https://github.com/databrickslabs/blueprint/commit/c7e47abd82b2f04e95b1d91f346cc1ea6df43961"><code>c7e47ab</code></a>
Release v0.2.5 (<a
href="https://redirect.github.com/databrickslabs/blueprint/issues/44">#44</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/databrickslabs/blueprint/compare/v0.2.4...v0.3.0">compare
view</a></li>
</ul>
</details>
<br />

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

Run integration tests only for pull requests ready for review (#1002)

Tested on https://github.com/databrickslabs/blueprint

Reducing flakiness of create account groups (#1003)

Prompt user if Terraform utilised for deploying infrastructure (#1004)

Added an `is_terraform_used` prompt and stored the answer in the
`WorkspaceInstaller` config

Resolves #393

---------

Co-authored-by: Serge Smertin <259697+nfx@users.noreply.github.com>

Update CONTRIBUTING.md (#1005)

Closes #850
nkvuong added a commit that referenced this pull request Mar 6, 2024
author Vuong <vuong.nguyen@databricks.com> 1709737244 +0000
committer Vuong <vuong.nguyen@databricks.com> 1709739422 +0000

parent c866d42
author Vuong <vuong.nguyen@databricks.com> 1709737244 +0000
committer Vuong <vuong.nguyen@databricks.com> 1709739396 +0000

add trust relationship update

Fix integration tests on AWS (#978)

Update groups permissions validation to use Table ACL cluster (#979)

Renamed columns in assessment SQL queries to use actual names, not aliases (#983)

Aliases are usually not allowed in projections (as they are replaced
later in the query execution phases). While the DBSQL was smart enough
to handle the references via aliases, for some setups this results in an
error. Changing column references to use actual names fixes this.

Resolves #980

- [ ] added relevant user documentation
- [ ] added new CLI command
- [ ] modified existing command: `databricks labs ucx ...`
- [ ] added a new workflow
- [ ] modified existing workflow: `...`
- [ ] added a new table
- [ ] modified existing table: `...`

- [x] manually tested
- [ ] added unit tests
- [ ] added integration tests
- [ ] verified on staging environment (screenshot attached)

Fixed `config.yml` upgrade from very old versions (#984)

Added `upgraded_from_workspace_id` property to migrated tables to indicate the source workspace. (#987)

Added table parameter `upgraded_from_ws` to migrated tables. The
parameter contains the source workspace ID.
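
A rough sketch of how such a statement might be generated is shown below. The function name `sql_alter_from_sketch`, its parameters, and the exact statement shape are assumptions made for illustration; only the property name `upgraded_from_workspace_id` is taken from the commit title.

```python
def sql_alter_from_sketch(target_key: str, workspace_id: int, object_type: str = "TABLE") -> str:
    """Hypothetical helper: build a statement that records the source
    workspace ID as a property on a migrated table or view."""
    return (
        f"ALTER {object_type} {target_key} "
        f"SET TBLPROPERTIES ('upgraded_from_workspace_id' = '{workspace_id}');"
    )


# Example with made-up identifiers.
sql = sql_alter_from_sketch("main.sales.orders", 1234567890)
```

Storing the workspace ID as a table property keeps the migration origin queryable from the table itself, without a separate bookkeeping table.
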

Resolves #899

- [ ] added relevant user documentation
- [ ] added new CLI command
- [ ] modified existing command: `databricks labs ucx ...`
- [ ] added a new workflow
- [ ] modified existing workflow: `...`
- [ ] added a new table
- [ ] modified existing table: `...`

- [x] manually tested
- [x] added unit tests
- [x] added integration tests
- [x] verified on staging environment (screenshot attached)

Added group members difference to the output of `validate-groups-membership` cli command (#995)

The `validate-groups-membership` command has been updated to include a
comparison of group memberships at both the account and workspace
levels, displaying the difference in members between the two levels in a
new column. This enhancement allows for a more detailed analysis of
group memberships, with the added functionality implemented in the
`validate_group_membership` function in the `groups.py` file located in
the `databricks/labs/ucx/workspace_access` directory. A new output
field, `group_members_difference`, has been added to represent the
difference in the number of members between a workspace group and an
associated account group. The corresponding unit test file,
`test_groups.py`, has been updated to include a new test case that
verifies the calculation of the `group_members_difference` value. This
change provides users with a more comprehensive view of their group
memberships and allows them to easily identify any discrepancies between
the account and workspace levels. The functionality of the other
commands remains unchanged.
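
The calculation described above can be sketched in a few lines. This is an illustration only: the function below is not the project's `validate_group_membership`, the member names are made up, and the sign convention (workspace count minus account count) is an assumption.

```python
def group_members_difference(workspace_members: set, account_members: set) -> int:
    """Hypothetical helper: difference in member counts between a
    workspace group and its account-level counterpart."""
    # Positive: the workspace group has more members.
    # Negative: the account group has more members.
    return len(workspace_members) - len(account_members)


workspace_group = {"alice", "bob", "carol"}
account_group = {"alice", "bob"}
difference = group_members_difference(workspace_group, account_group)
```

A count difference of zero does not prove the memberships are identical, so it is best read as a quick signal rather than a full comparison.
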

Improved installation integration test flakiness (#998)

- improved `_infer_error_from_job_run` and `_infer_error_from_task_run`
to also catch `KeyError` and `ValueError`
- removed retries for `Unknown` errors for installation tests

Expanded end-user documentation with detailed descriptions for workflows and commands (#999)

The Databricks Labs UCX project has been updated with several new
features to assist in upgrading to Unity Catalog. These include various
workflows and command-line utilities, such as an assessment workflow
that generates a detailed compatibility report for workspace entities
and a group migration workflow to upgrade all Databricks workspace
assets. Additionally, new utility commands have been added for managing
cross-workspace installations, and users can now view deployed
workflows' status and repair failed workflows. A new end-user
documentation has also been introduced, featuring comprehensive
descriptions of workflows, commands, and an assessment report image. The
Assessment Report, generated from UCX tools, now includes a more
detailed summary of the assessment findings, table counts, database
summaries, and external locations. Improved documentation for external
Hive Metastore integration and a new debugging notebook are also
included in this release. Lastly, the workspace group migration feature
has been expanded to handle potential conflicts when migrating multiple
workspaces with locally scoped group names.

Release v0.14.0 (#1000)

* Added `upgraded_from_workspace_id` property to migrated tables to
indicate the source workspace
([#987](#987)). In this
release, updates have been made to the `_migrate_external_table`,
`_migrate_dbfs_root_table`, and `_migrate_view` methods in the
`table_migrate.py` file to include a new parameter `upgraded_from_ws` in
the SQL commands used to alter tables, views, or managed tables. This
parameter is used to store the source workspace ID in the migrated
tables, indicating the migration origin. A new utility method
`sql_alter_from` has been added to the `Table` class in `tables.py` to
generate the SQL command with the new parameter. Additionally, a new
class-level attribute `UPGRADED_FROM_WS_PARAM` has been added to the
`Table` class in `tables.py` to indicate the source workspace. A new
property `upgraded_from_workspace_id` has been added to migrated tables
to store the source workspace ID. These changes resolve issue
[#899](#899) and are tested
through manual testing, unit tests, and integration tests. No new CLI
commands, workflows, or tables have been added or modified, and there
are no changes to user documentation.
* Added a command to create account level groups if they do not exist
([#763](#763)). This commit
introduces a new feature that enables the creation of account-level
groups if they do not already exist in the account. A new command,
`create-account-groups`, has been added to the `databricks labs ucx`
tool, which crawls all workspaces in the account and creates an
account-level group for each workspace-local group that has no
account-level counterpart. The feature supports various scenarios, including creating
account-level groups that exist in some workspaces but not in others,
and creating multiple account-level groups with the same name but
different members. Several new methods have been added to the
`account.py` file to support the new feature, and the `test_account.py`
file has been updated with new tests to ensure the correct behavior of
the `create_account_level_groups` method. Additionally, the `cli.py`
file has been updated to include the new `create-account-groups`
command. With these changes, users can easily manage account-level
groups and ensure that they are consistent across all workspaces in the
account, improving the overall user experience.
* Added assessment for the incompatible `RunSubmit` API usages
([#849](#849)). In this
release, the assessment functionality for incompatible `RunSubmit` API
usages has been significantly enhanced through various changes. The
`clusters.py` file now marks two helper methods as private, renaming
`check_spark_conf` to `_check_spark_conf` and `check_cluster_failures` to
`_check_cluster_failures`. The `_assess_clusters` method has been
updated to call the renamed `_check_cluster_failures` method for
thorough checks of cluster configurations, resulting in better
assessment functionality. A new `SubmitRunsCrawler` class has been added
to the `databricks.labs.ucx.assessment.jobs` module, implementing
`CrawlerBase`, `JobsMixin`, and `CheckClusterMixin` classes. This class
crawls and assesses job runs based on their submitted runs, ensuring
compatibility and identifying failure issues. Additionally, a new
configuration attribute, `num_days_submit_runs_history`, has been
introduced in the `WorkspaceConfig` class of the `config.py` module,
controlling the number of days for which submission history of
`RunSubmit` API calls is retained. Lastly, various new JSON files have
been added for unit testing, assessing the `RunSubmit` API usages
related to different scenarios like dbt task runs, Git source-based job
runs, JAR file runs, and more. These tests will aid in identifying and
addressing potential compatibility issues with the `RunSubmit` API.
* Added group members difference to the output of
`validate-groups-membership` cli command
([#995](#995)). The
`validate-groups-membership` command has been updated to include a
comparison of group memberships at both the account and workspace
levels. This enhancement is implemented through the
`validate_group_membership` function, which has been updated to
calculate the difference in members between the two levels and display
it in a new `group_members_difference` column. This allows for a more
detailed analysis of group memberships and makes it easy to identify any
discrepancies between the account and workspace levels. The
corresponding unit test file, `test_groups.py`, has been updated to
include a new test case that verifies the calculation of the
`group_members_difference` value. The functionality of the other
commands remains unchanged. The new `group_members_difference` value is
calculated as the difference in the number of members in the workspace
group and the account group, with a positive value indicating more
members in the workspace group and a negative value indicating more
members in the account group. The table template in the `labs.yml` file
has also been updated to include the new column for the group membership
difference.
* Added handling for empty `directory_id` if managed identity
encountered during the crawling of StoragePermissionMapping
([#986](#986)). This PR adds
a `type` field to the `StoragePermissionMapping` and `Principal`
dataclasses to differentiate between service principals and managed
identities, allowing `None` for the `directory_id` field if the
principal is not a service principal. During the migration to UC storage
credentials, managed identities are currently ignored. These changes
improve handling of managed identities during the crawling of
`StoragePermissionMapping`, prevent errors when creating storage
credentials with managed identities, and address issue
[#339](#339). The changes
are tested through unit tests, manual testing, and integration tests,
and only affect the `StoragePermissionMapping` class and related
methods, without introducing new commands, workflows, or tables.
* Added migration for Azure Service Principals with secrets stored in
Databricks Secret to UC Storage Credentials
([#874](#874)). In this
release, we have made significant updates to migrate Azure Service
Principals with their secrets stored in Databricks Secret to UC Storage
Credentials, enhancing security and management of storage access. The
changes include: Addition of a new `migrate_credentials` command in the
`labs.yml` file to migrate credentials for storage access to UC storage
credential. Modification of `secrets.py` to handle the case where a
secret has been removed from the backend and to log warning messages for
secrets with invalid Base64 bytes. Introduction of the
`StorageCredentialManager` and `ServicePrincipalMigration` classes in
`credentials.py` to manage Azure Service Principals and their associated
client secrets, and to migrate them to UC Storage Credentials. Addition
of a new `directory_id` attribute in the `Principal` class and its
associated dataclass in `resources.py` to store the directory ID for
creating UC storage credentials using a service principal. Creation of a
new pytest fixture, `make_storage_credential_spn`, in `fixtures.py` to
simplify writing tests requiring Databricks Storage Credentials with
Azure Service Principal auth. Addition of a new test file for the Azure
integration of the project, including new classes, methods, and test
cases for testing the migration of Azure Service Principals to UC
Storage Credentials. These improvements will ensure better security and
management of storage access using Azure Service Principals, while
providing more efficient and robust testing capabilities.
* Added permission migration support for feature tables and the root
permissions for models and feature tables
([#997](#997)). This commit
introduces support for migration of permissions related to feature
tables and sets root permissions for models and feature tables. New
functions such as `feature_store_listing`, `feature_tables_root_page`,
`models_root_page`, and `tokens_and_passwords` have been added to
facilitate population of a workspace access page with necessary
permissions information. The `factory` function in `manager.py` has been
updated to include new listings for models' root page, feature tables'
root page, and the feature store for enhanced management and access
control of models and feature tables. New classes and methods have been
implemented to handle permissions for these resources, utilizing
`GenericPermissionsSupport`, `AccessControlRequest`, and `MigratedGroup`
classes. Additionally, new test methods have been included to verify
feature tables listing functionality and root page listing functionality
for feature tables and registered models. The test manager method has
been updated to include `feature-tables` in the list of items to be
checked for permissions, ensuring comprehensive testing of permission
functionality related to these new feature tables.
* Added support for serving endpoints
([#990](#990)). In this
release, we have made significant enhancements to support serving
endpoints in our open-source library. The `fixtures.py` file in the
`databricks.labs.ucx.mixins` module has been updated with new classes
and functions to create and manage serving endpoints, accompanied by
integration tests to verify their functionality. We have added a new
listing for serving endpoints in the assessment's permissions crawling,
using the `ws.serving_endpoints.list` function and the
`serving-endpoints` category. A new integration test, `test_endpoints`,
has been added to verify that assessments now crawl permissions for
serving endpoints. This test demonstrates the ability to migrate
permissions from one group to another. The test suite has been updated
to ensure the proper functioning of the new feature and improve the
assessment of permissions for serving endpoints, ensuring compatibility
with the updated `test_manager.py` file.
* Expanded end-user documentation with detailed descriptions for
workflows and commands
([#999](#999)). The
Databricks Labs UCX project has been updated with several new features
to assist in upgrading to Unity Catalog, including an assessment
workflow that generates a detailed compatibility report for workspace
entities, a group migration workflow for upgrading all Databricks
workspace assets, and utility commands for managing cross-workspace
installations. The Assessment Report now includes a more detailed
summary of the assessment findings, table counts, database summaries,
and external locations. Additional improvements include expanded
workspace group migration to handle potential conflicts with locally
scoped group names, enhanced documentation for external Hive Metastore
integration, a new debugging notebook, and detailed descriptions of
table upgrade considerations, data access permissions, external storage,
and table crawler.
* Fixed `config.yml` upgrade from very old versions
([#984](#984)). In this
release, we've introduced enhancements to the configuration upgrading
process for `config.yml` in our open-source library. We've replaced the
previous `v1_migrate` class method with a new implementation that
specifically handles migration from version 1. The new method retrieves
the `groups` field, extracts the `selected` value, and assigns it to the
`include_group_names` key in the configuration. The
`backup_group_prefix` value from the `groups` field is assigned to the
`renamed_group_prefix` key, and the `groups` field is removed, with the
version number updated to 2. These changes simplify the code and improve
readability, enabling users to upgrade smoothly from version 1 of the
configuration. Furthermore, we've added new unit tests to the
`test_config.py` file to ensure backward compatibility. Two new tests,
`test_v1_migrate_zeroconf` and `test_v1_migrate_some_conf`, have been
added, utilizing the `MockInstallation` class and loading the
configuration using `WorkspaceConfig`. These tests enhance the
robustness and reliability of the migration process for `config.yml`.
* Renamed columns in assessment SQL queries to use actual names, not
aliases ([#983](#983)). In
this update, we have resolved an issue where aliases used for column
references in SQL queries caused errors in certain setups by renaming
them to use actual names. Specifically, for assessment SQL queries, we
have modified the definition of the `is_delta` column to use the actual
`table_format` name instead of the alias `format`. This change improves
compatibility and enhances the reliability of query execution, because
column references are now interpreted consistently across setups. The
change does not introduce any new methods; it only modifies the existing
SQL query for the `05_0_all_tables` assessment to use actual column
names.
* Updated groups permissions validation to use Table ACL cluster
([#979](#979)). In this
update, the `validate_groups_permissions` task has been modified to
utilize the Table ACL cluster, as indicated by the inclusion of
`job_cluster="tacl"`. This task is responsible for ensuring that all
crawled permissions are accurately applied to the destination groups by
calling the `permission_manager.apply_group_permissions` method during
the migration state. Validation now runs on the Table ACL cluster,
potentially improving performance or functionality. If your workflows
depend on this task, review them for the effect of this change.
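
The `config.yml` version 1 upgrade described in the #984 entry above can be sketched roughly as follows. This is a minimal illustration working on a plain dictionary, not the project's actual `v1_migrate` implementation: the function name `migrate_v1`, the `inventory_database` sample key, and the choice to skip missing fields are assumptions made for the example.

```python
import copy


def migrate_v1(raw: dict) -> dict:
    """Return a version-2 copy of a version-1 config dictionary (hypothetical helper)."""
    migrated = copy.deepcopy(raw)
    # Remove the legacy `groups` field; tolerate configs that never had one.
    groups = migrated.pop("groups", None) or {}
    # `selected` becomes `include_group_names` (assumption: skipped when empty).
    if groups.get("selected"):
        migrated["include_group_names"] = groups["selected"]
    # `backup_group_prefix` becomes `renamed_group_prefix`.
    if "backup_group_prefix" in groups:
        migrated["renamed_group_prefix"] = groups["backup_group_prefix"]
    migrated["version"] = 2
    return migrated


old_config = {
    "version": 1,
    "inventory_database": "ucx",  # sample value for illustration only
    "groups": {"selected": ["analysts", "engineers"], "backup_group_prefix": "backup_"},
}
new_config = migrate_v1(old_config)
print(new_config)
```

Working on a copy keeps the original dictionary intact, which makes the before and after states easy to compare in a unit test.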

Update databricks-labs-blueprint requirement from ~=0.2.4 to ~=0.3.0 (#1001)

Updates the requirements on
[databricks-labs-blueprint](https://github.com/databrickslabs/blueprint)
to permit the latest version.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/databrickslabs/blueprint/releases">databricks-labs-blueprint's
releases</a>.</em></p>
<blockquote>
<h2>v0.3.0</h2>
<ul>
<li>Added automated upgrade framework (<a
href="https://redirect.github.com/databrickslabs/blueprint/issues/50">#50</a>).
This update introduces an automated upgrade framework for managing and
applying upgrades to the product, with a new <code>upgrades.py</code>
file that includes a <code>ProductInfo</code> class having methods for
version handling, wheel building, and exception handling. The test code
organization has been improved, and new test cases, functions, and a
directory structure for fixtures and unit tests have been added for the
upgrades functionality. The <code>test_wheels.py</code> file now checks
the version of the Databricks SDK and handles cases where the version
marker is missing or does not contain the <code>__version__</code>
variable. Additionally, a new <code>Application State Migrations</code>
section has been added to the README, explaining the process of seamless
upgrades from version X to version Z through version Y, addressing the
need for configuration or database state migrations as the application
evolves. Users can apply these upgrades by following an idiomatic usage
pattern involving several classes and functions. Furthermore,
improvements have been made to the <code>_trim_leading_whitespace</code>
function in the <code>commands.py</code> file of the
<code>databricks.labs.blueprint</code> module, ensuring accurate and
consistent removal of leading whitespace for each line in the command
string, leading to better overall functionality and
maintainability.</li>
<li>Added brute-forcing <code>SerdeError</code> with
<code>as_dict()</code> and <code>from_dict()</code> (<a
href="https://redirect.github.com/databrickslabs/blueprint/issues/58">#58</a>).
This commit introduces a brute-forcing approach for handling
<code>SerdeError</code> using <code>as_dict()</code> and
<code>from_dict()</code> methods in an open-source library. The new
<code>SomePolicy</code> class demonstrates the usage of these methods
for manual serialization and deserialization of custom classes. The
<code>as_dict()</code> method returns a dictionary representation of the
class instance, and the <code>from_dict()</code> method, decorated with
<code>@classmethod</code>, creates a new instance from the provided
dictionary. Additionally, the GitHub Actions workflow for acceptance
tests has been updated to include the <code>ready_for_review</code>
event type, ensuring that tests run not only for opened and synchronized
pull requests but also when marked as &quot;ready for review.&quot;
These changes provide developers with more control over the
deserialization process and facilitate debugging in cases where default
deserialization fails, but should be used judiciously to avoid brittle
code.</li>
<li>Fixed nightly integration tests run as service principals (<a
href="https://redirect.github.com/databrickslabs/blueprint/issues/52">#52</a>).
In this release, we have enhanced the compatibility of our codebase with
service principals, particularly in the context of nightly integration
tests. The <code>Installation</code> class in the
<code>databricks.labs.blueprint.installation</code> module has been
refactored, deprecating the <code>current</code> method and introducing
two new methods: <code>assume_global</code> and
<code>assume_user_home</code>. These methods enable users to install and
manage <code>blueprint</code> as either a global or user-specific
installation. Additionally, the <code>existing</code> method has been
updated to work with the new <code>Installation</code> methods. In the
test suite, the <code>test_installation.py</code> file has been updated
to correctly detect global and user-specific installations when running
as a service principal. These changes improve the testability and
functionality of our software, ensuring seamless operation with service
principals during nightly integration tests.</li>
<li>Made <code>test_existing_installations_are_detected</code> more
resilient (<a
href="https://redirect.github.com/databrickslabs/blueprint/issues/51">#51</a>).
In this release, we have added a new test function
<code>test_existing_installations_are_detected</code> that checks if
existing installations are correctly detected and retries the test for
up to 15 seconds if they are not. This improves the reliability of the
test by making it more resilient to potential intermittent failures. We
have also added an import from <code>databricks.sdk.retries</code> named
<code>retried</code> which is used to retry the test function in case of
an <code>AssertionError</code>. Additionally, the test function
<code>test_existing</code> has been renamed to
<code>test_existing_installations_are_detected</code> and the
<code>xfail</code> marker has been removed. We have also renamed the
test function <code>test_dataclass</code> to
<code>test_loading_dataclass_from_installation</code> for better
clarity. This change will help ensure that the library is correctly
detecting existing installations and improve the overall quality of the
codebase.</li>
</ul>
<p>Contributors: <a
href="https://github.com/nfx"><code>@​nfx</code></a></p>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/databrickslabs/blueprint/blob/main/CHANGELOG.md">databricks-labs-blueprint's
changelog</a>.</em></p>
<blockquote>
<h2>0.3.0</h2>
<ul>
<li>Added automated upgrade framework (<a
href="https://redirect.github.com/databrickslabs/blueprint/issues/50">#50</a>).</li>
<li>Added brute-forcing <code>SerdeError</code> with
<code>as_dict()</code> and <code>from_dict()</code> (<a
href="https://redirect.github.com/databrickslabs/blueprint/issues/58">#58</a>).</li>
<li>Fixed nightly integration tests run as service principals (<a
href="https://redirect.github.com/databrickslabs/blueprint/issues/52">#52</a>).</li>
<li>Made <code>test_existing_installations_are_detected</code> more
resilient (<a
href="https://redirect.github.com/databrickslabs/blueprint/issues/51">#51</a>).</li>
</ul>
<h2>0.2.5</h2>
<ul>
<li>Automatically enable workspace filesystem if the feature is disabled
(<a
href="https://redirect.github.com/databrickslabs/blueprint/pull/42">#42</a>).</li>
</ul>
<h2>0.2.4</h2>
<ul>
<li>Added more integration tests for <code>Installation</code> (<a
href="https://redirect.github.com/databrickslabs/blueprint/pull/39">#39</a>).</li>
<li>Fixed <code>yaml</code> optional import error (<a
href="https://redirect.github.com/databrickslabs/blueprint/pull/38">#38</a>).</li>
</ul>
<h2>0.2.3</h2>
<ul>
<li>Added special handling for notebooks in
<code>Installation.upload(...)</code> (<a
href="https://redirect.github.com/databrickslabs/blueprint/pull/36">#36</a>).</li>
</ul>
<h2>0.2.2</h2>
<ul>
<li>Fixed issues with uploading wheels to DBFS and loading a
non-existing install state (<a
href="https://redirect.github.com/databrickslabs/blueprint/pull/34">#34</a>).</li>
</ul>
<h2>0.2.1</h2>
<ul>
<li>Aligned <code>Installation</code> framework with UCX project (<a
href="https://redirect.github.com/databrickslabs/blueprint/pull/32">#32</a>).</li>
</ul>
<h2>0.2.0</h2>
<ul>
<li>Added common install state primitives with strong typing (<a
href="https://redirect.github.com/databrickslabs/blueprint/pull/27">#27</a>).</li>
<li>Added documentation for Invoking Databricks Connect (<a
href="https://redirect.github.com/databrickslabs/blueprint/pull/28">#28</a>).</li>
<li>Added more documentation for Databricks CLI command router (<a
href="https://redirect.github.com/databrickslabs/blueprint/pull/30">#30</a>).</li>
<li>Enforced <code>pylint</code> standards (<a
href="https://redirect.github.com/databrickslabs/blueprint/pull/29">#29</a>).</li>
</ul>
<h2>0.1.0</h2>
<ul>
<li>Changed python requirement from 3.10.6 to 3.10 (<a
href="https://redirect.github.com/databrickslabs/blueprint/pull/25">#25</a>).</li>
</ul>
<h2>0.0.6</h2>
<ul>
<li>Make <code>find_project_root</code> more deterministic (<a
href="https://redirect.github.com/databrickslabs/blueprint/pull/23">#23</a>).</li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="https://github.com/databrickslabs/blueprint/commit/905e5ff5303a005d48bc98d101a613afeda15d51"><code>905e5ff</code></a>
Release v0.3.0 (<a
href="https://redirect.github.com/databrickslabs/blueprint/issues/59">#59</a>)</li>
<li><a
href="https://github.com/databrickslabs/blueprint/commit/a029f6bb1ecf807017754e298ea685326dbedf72"><code>a029f6b</code></a>
Added brute-forcing <code>SerdeError</code> with <code>as_dict()</code>
and <code>from_dict()</code> (<a
href="https://redirect.github.com/databrickslabs/blueprint/issues/58">#58</a>)</li>
<li><a
href="https://github.com/databrickslabs/blueprint/commit/c8a74f4129b4592d365aac9670eb86069f3517f7"><code>c8a74f4</code></a>
Added automated upgrade framework (<a
href="https://redirect.github.com/databrickslabs/blueprint/issues/50">#50</a>)</li>
<li><a
href="https://github.com/databrickslabs/blueprint/commit/24e62ef4f060e43e02c92a7d082d95e8bc164317"><code>24e62ef</code></a>
Don't run integration tests on draft pull requests (<a
href="https://redirect.github.com/databrickslabs/blueprint/issues/55">#55</a>)</li>
<li><a
href="https://github.com/databrickslabs/blueprint/commit/b4dd5abf4eaf8d022ae0b6ec7e659296ec3d2f37"><code>b4dd5ab</code></a>
Added tokei.rs badge (<a
href="https://redirect.github.com/databrickslabs/blueprint/issues/54">#54</a>)</li>
<li><a
href="https://github.com/databrickslabs/blueprint/commit/01d9467f425763ab08035001270593253bce11f0"><code>01d9467</code></a>
Fixed nightly integration tests run as service principals (<a
href="https://redirect.github.com/databrickslabs/blueprint/issues/52">#52</a>)</li>
<li><a
href="https://github.com/databrickslabs/blueprint/commit/aa5714179c65be8e13f54601e1d1fcd70548342d"><code>aa57141</code></a>
Made <code>test_existing_installations_are_detected</code> more
resilient (<a
href="https://redirect.github.com/databrickslabs/blueprint/issues/51">#51</a>)</li>
<li><a
href="https://github.com/databrickslabs/blueprint/commit/9cbc6f863d3ea06659f37939cf1b97115dd873bd"><code>9cbc6f8</code></a>
Bump <code>databrickslabs/sandbox/acceptance</code> to v0.1.0 (<a
href="https://redirect.github.com/databrickslabs/blueprint/issues/48">#48</a>)</li>
<li><a
href="https://github.com/databrickslabs/blueprint/commit/22fc1a8787b8e98de03048595202f88b7ddb9b94"><code>22fc1a8</code></a>
Use <code>databrickslabs/sandbox/acceptance</code> action (<a
href="https://redirect.github.com/databrickslabs/blueprint/issues/45">#45</a>)</li>
<li><a
href="https://github.com/databrickslabs/blueprint/commit/c7e47abd82b2f04e95b1d91f346cc1ea6df43961"><code>c7e47ab</code></a>
Release v0.2.5 (<a
href="https://redirect.github.com/databrickslabs/blueprint/issues/44">#44</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/databrickslabs/blueprint/compare/v0.2.4...v0.3.0">compare
view</a></li>
</ul>
</details>
<br />

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

Run integration tests only for pull requests ready for review (#1002)

Tested on https://github.com/databrickslabs/blueprint

Reducing flakiness of create account groups (#1003)

Prompt user if Terraform utilised for deploying infrastructure (#1004)

Added an `is_terraform_used` prompt and recorded the answer in the
config of `WorkspaceInstaller`.

Resolves #393

---------

Co-authored-by: Serge Smertin <259697+nfx@users.noreply.github.com>

Update CONTRIBUTING.md (#1005)

Closes #850

Added `databricks labs ucx create-uber-principal` command to create Azure Service Principal for migration (#976)

- Added the new `create-uber-principal` CLI command in `labs.yml` and
`cli.py`
- Added a separate `AzureApiClient` class to isolate Azure API calls
- Added logic in resources to create the service principal (SPN), its
secret, and the role assignment, and to update the workspace config with
the SPN `client_id`
- Added logic to create the SPN, update RBAC on all storage accounts for
that SPN, and update the UCX cluster policy with the SPN secret for each
storage account
- Added unit and integration test cases

Resolves #881

Related issues:
- #993
- #693

- [ ] added relevant user documentation
- [X] added new CLI command
- [ ] modified existing command: `databricks labs ucx ...`
- [ ] added a new workflow
- [ ] modified existing workflow: `...`
- [ ] added a new table
- [ ] modified existing table: `...`

- [X] manually tested
- [X] added unit tests
- [X] added integration tests
- [ ] verified on staging environment (screenshot attached)

Fix gitguardian warning caused by "hello world" secret used in unit test (#1010)

Replace the plain encoded string by base64.b64encode to mitigate the
gitguardian warning.
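
A minimal sketch of the idea, assuming the unit test only needs a Base64-encoded placeholder value: the encoded string is computed at run time with `base64.b64encode` instead of being hard-coded, so no secret-looking literal appears in the source. The helper name `encoded_test_secret` is hypothetical and not taken from the test suite.

```python
import base64


def encoded_test_secret(plain: str = "hello world") -> str:
    """Return the Base64 form of a harmless placeholder (hypothetical test helper)."""
    return base64.b64encode(plain.encode("utf-8")).decode("utf-8")


secret_value = encoded_test_secret()
# Round trip: decoding yields the original placeholder text.
decoded = base64.b64decode(secret_value).decode("utf-8")
print(secret_value, decoded)
```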

Resolves #..

- [ ] added relevant user documentation
- [ ] added new CLI command
- [ ] modified existing command: `databricks labs ucx ...`
- [ ] added a new workflow
- [ ] modified existing workflow: `...`
- [ ] added a new table
- [ ] modified existing table: `...`

- [ ] manually tested
- [ ] added unit tests
- [ ] added integration tests
- [ ] verified on staging environment (screenshot attached)

Create UC external locations in Azure based on migrated storage credentials (#992)

Handle widget delete on upgrade platform bug (#1011)
nkvuong added a commit that referenced this pull request Mar 7, 2024
author Vuong <vuong.nguyen@databricks.com> 1709738765 +0000
committer Vuong <vuong.nguyen@databricks.com> 1709812255 +0000

parent 7735a71

make fmt

Added `upgraded_from_workspace_id` property to migrated tables to indicate the source workspace. (#987)

Added table parameter `upgraded_from_ws` to migrated tables. The
parameter contains the source workspace ID.
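
As a rough illustration of what recording the source workspace on a migrated table could look like, the sketch below builds an `ALTER TABLE ... SET TBLPROPERTIES` statement. The function name `sql_set_source_workspace` and the sample table key are hypothetical; this is not the project's `sql_alter_from` method, and identifier escaping and the view case are deliberately left out.

```python
def sql_set_source_workspace(table_key: str, workspace_id: int) -> str:
    """Build SQL that stores the source workspace ID as a table property (hypothetical helper)."""
    return (
        f"ALTER TABLE {table_key} "
        f"SET TBLPROPERTIES ('upgraded_from_workspace_id' = '{workspace_id}');"
    )


# Sample values for illustration only.
statement = sql_set_source_workspace("main.sales.orders", 1234567890)
print(statement)
```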

Resolves #899

- [ ] added relevant user documentation
- [ ] added new CLI command
- [ ] modified existing command: `databricks labs ucx ...`
- [ ] added a new workflow
- [ ] modified existing workflow: `...`
- [ ] added a new table
- [ ] modified existing table: `...`

- [x] manually tested
- [x] added unit tests
- [x] added integration tests
- [x] verified on staging environment (screenshot attached)

Added group members difference to the output of `validate-groups-membership` cli command (#995)

The `validate-groups-membership` command has been updated to include a
comparison of group memberships at both the account and workspace
levels, displaying the difference in members between the two levels in a
new column. This enhancement allows for a more detailed analysis of
group memberships, with the added functionality implemented in the
`validate_group_membership` function in the `groups.py` file located in
the `databricks/labs/ucx/workspace_access` directory. A new output
field, `group_members_difference`, has been added to represent the
difference in the number of members between a workspace group and an
associated account group. The corresponding unit test file,
`test_groups.py`, has been updated to include a new test case that
verifies the calculation of the `group_members_difference` value. This
change provides users with a more comprehensive view of their group
memberships and allows them to easily identify any discrepancies between
the account and workspace levels. The functionality of the other
commands remains unchanged.
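
The calculation described above can be sketched as a count difference. This is an illustrative stand-in, not the code in `groups.py`: the function signature and every row field other than `group_members_difference` are assumptions. The sign convention follows the v0.14.0 release notes, where a positive value means the workspace group has more members.

```python
def group_members_difference(workspace_members: list[str], account_members: list[str]) -> int:
    """Workspace member count minus account member count."""
    return len(workspace_members) - len(account_members)


# Sample data for illustration only.
workspace_members = ["alice", "bob", "carol"]
account_members = ["alice", "bob"]

row = {
    "workspace_group_name": "analysts",  # hypothetical column name
    "group_members_difference": group_members_difference(workspace_members, account_members),
}
print(row)
```

Comparing counts alone does not show which members differ, so equal counts with different members would still report zero.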

Improved installation integration test flakiness (#998)

- improved `_infer_error_from_job_run` and `_infer_error_from_task_run`
to also catch `KeyError` and `ValueError`
- removed retries for `Unknown` errors for installation tests

Expanded end-user documentation with detailed descriptions for workflows and commands (#999)

The Databricks Labs UCX project has been updated with several new
features to assist in upgrading to Unity Catalog. These include various
workflows and command-line utilities, such as an assessment workflow
that generates a detailed compatibility report for workspace entities
and a group migration workflow to upgrade all Databricks workspace
assets. Additionally, new utility commands have been added for managing
cross-workspace installations, and users can now view deployed
workflows' status and repair failed workflows. New end-user
documentation has also been introduced, featuring comprehensive
descriptions of workflows and commands, and an assessment report image. The
Assessment Report, generated from UCX tools, now includes a more
detailed summary of the assessment findings, table counts, database
summaries, and external locations. Improved documentation for external
Hive Metastore integration and a new debugging notebook are also
included in this release. Lastly, the workspace group migration feature
has been expanded to handle potential conflicts when migrating multiple
workspaces with locally scoped group names.

Release v0.14.0 (#1000)

* Added `upgraded_from_workspace_id` property to migrated tables to
indicate the source workspace
([#987](#987)). In this
release, updates have been made to the `_migrate_external_table`,
`_migrate_dbfs_root_table`, and `_migrate_view` methods in the
`table_migrate.py` file to include a new parameter `upgraded_from_ws` in
the SQL commands used to alter tables, views, or managed tables. This
parameter is used to store the source workspace ID in the migrated
tables, indicating the migration origin. A new utility method
`sql_alter_from` has been added to the `Table` class in `tables.py` to
generate the SQL command with the new parameter. Additionally, a new
class-level attribute `UPGRADED_FROM_WS_PARAM` has been added to the
`Table` class in `tables.py` to indicate the source workspace. A new
property `upgraded_from_workspace_id` has been added to migrated tables
to store the source workspace ID. These changes resolve issue
[#899](#899) and are tested
through manual testing, unit tests, and integration tests. No new CLI
commands, workflows, or tables have been added or modified, and there
are no changes to user documentation.
* Added a command to create account level groups if they do not exist
([#763](#763)). This commit
introduces a new feature that enables the creation of account-level
groups if they do not already exist in the account. A new command,
`create-account-groups`, has been added to the `databricks labs ucx`
tool, which crawls all workspaces in the account and creates
account-level groups if a corresponding workspace-local group is not
found. The feature supports various scenarios, including creating
account-level groups that exist in some workspaces but not in others,
and creating multiple account-level groups with the same name but
different members. Several new methods have been added to the
`account.py` file to support the new feature, and the `test_account.py`
file has been updated with new tests to ensure the correct behavior of
the `create_account_level_groups` method. Additionally, the `cli.py`
file has been updated to include the new `create-account-groups`
command. With these changes, users can easily manage account-level
groups and ensure that they are consistent across all workspaces in the
account, improving the overall user experience.
* Added assessment for the incompatible `RunSubmit` API usages
([#849](#849)). In this
release, the assessment functionality for incompatible `RunSubmit` API
usages has been significantly enhanced through various changes. The
`clusters.py` file has seen improvements in clarity and consistency: the
methods `check_spark_conf` and `check_cluster_failures` are now private,
renamed to `_check_spark_conf` and
`_check_cluster_failures`. The `_assess_clusters` method has been
updated to call the renamed `_check_cluster_failures` method for
thorough checks of cluster configurations, resulting in better
assessment functionality. A new `SubmitRunsCrawler` class has been added
to the `databricks.labs.ucx.assessment.jobs` module, implementing
`CrawlerBase`, `JobsMixin`, and `CheckClusterMixin` classes. This class
crawls and assesses job runs based on their submitted runs, ensuring
compatibility and identifying failure issues. Additionally, a new
configuration attribute, `num_days_submit_runs_history`, has been
introduced in the `WorkspaceConfig` class of the `config.py` module,
controlling the number of days for which submission history of
`RunSubmit` API calls is retained. Lastly, various new JSON files have
been added for unit testing, assessing the `RunSubmit` API usages
related to different scenarios like dbt task runs, Git source-based job
runs, JAR file runs, and more. These tests will aid in identifying and
addressing potential compatibility issues with the `RunSubmit` API.
* Added group members difference to the output of
`validate-groups-membership` cli command
([#995](#995)). The
`validate-groups-membership` command has been updated to include a
comparison of group memberships at both the account and workspace
levels. This enhancement is implemented through the
`validate_group_membership` function, which has been updated to
calculate the difference in members between the two levels and display
it in a new `group_members_difference` column. This allows for a more
detailed analysis of group memberships and makes it easy to identify any
discrepancies between the account and workspace levels. The
corresponding unit test file, `test_groups.py`, has been updated to
include a new test case that verifies the calculation of the
`group_members_difference` value. The functionality of the other
commands remains unchanged. The new `group_members_difference` value is
calculated as the difference in the number of members in the workspace
group and the account group, with a positive value indicating more
members in the workspace group and a negative value indicating more
members in the account group. The table template in the `labs.yml` file
has also been updated to include the new column for the group membership
difference.
* Added handling for empty `directory_id` if managed identity
encountered during the crawling of StoragePermissionMapping
([#986](#986)). This PR adds
a `type` field to the `StoragePermissionMapping` and `Principal`
dataclasses to differentiate between service principals and managed
identities, allowing `None` for the `directory_id` field if the
principal is not a service principal. During the migration to UC storage
credentials, managed identities are currently ignored. These changes
improve handling of managed identities during the crawling of
`StoragePermissionMapping`, prevent errors when creating storage
credentials with managed identities, and address issue
[#339](#339). The changes
are tested through unit tests, manual testing, and integration tests,
and only affect the `StoragePermissionMapping` class and related
methods, without introducing new commands, workflows, or tables.
* Added migration for Azure Service Principals with secrets stored in
Databricks Secret to UC Storage Credentials
([#874](#874)). In this
release, we have made significant updates to migrate Azure Service
Principals with their secrets stored in Databricks Secret to UC Storage
Credentials, enhancing security and management of storage access. The
changes include: Addition of a new `migrate_credentials` command in the
`labs.yml` file to migrate credentials for storage access to UC storage
credential. Modification of `secrets.py` to handle the case where a
secret has been removed from the backend and to log warning messages for
secrets with invalid Base64 bytes. Introduction of the
`StorageCredentialManager` and `ServicePrincipalMigration` classes in
`credentials.py` to manage Azure Service Principals and their associated
client secrets, and to migrate them to UC Storage Credentials. Addition
of a new `directory_id` attribute in the `Principal` class and its
associated dataclass in `resources.py` to store the directory ID for
creating UC storage credentials using a service principal. Creation of a
new pytest fixture, `make_storage_credential_spn`, in `fixtures.py` to
simplify writing tests requiring Databricks Storage Credentials with
Azure Service Principal auth. Addition of a new test file for the Azure
integration of the project, including new classes, methods, and test
cases for testing the migration of Azure Service Principals to UC
Storage Credentials. These improvements will ensure better security and
management of storage access using Azure Service Principals, while
providing more efficient and robust testing capabilities.
* Added permission migration support for feature tables and the root
permissions for models and feature tables
([#997](#997)). This commit
introduces support for migration of permissions related to feature
tables and sets root permissions for models and feature tables. New
functions such as `feature_store_listing`, `feature_tables_root_page`,
`models_root_page`, and `tokens_and_passwords` have been added to
facilitate population of a workspace access page with necessary
permissions information. The `factory` function in `manager.py` has been
updated to include new listings for models' root page, feature tables'
root page, and the feature store for enhanced management and access
control of models and feature tables. New classes and methods have been
implemented to handle permissions for these resources, utilizing
`GenericPermissionsSupport`, `AccessControlRequest`, and `MigratedGroup`
classes. Additionally, new test methods have been included to verify
feature tables listing functionality and root page listing functionality
for feature tables and registered models. The test manager method has
been updated to include `feature-tables` in the list of items to be
checked for permissions, ensuring comprehensive testing of permission
functionality related to these new feature tables.
* Added support for serving endpoints
([#990](#990)). In this
release, we have made significant enhancements to support serving
endpoints in our open-source library. The `fixtures.py` file in the
`databricks.labs.ucx.mixins` module has been updated with new classes
and functions to create and manage serving endpoints, accompanied by
integration tests to verify their functionality. We have added a new
listing for serving endpoints in the assessment's permissions crawling,
using the `ws.serving_endpoints.list` function and the
`serving-endpoints` category. A new integration test, "test_endpoints,"
has been added to verify that assessments now crawl permissions for
serving endpoints. This test demonstrates the ability to migrate
permissions from one group to another. The test suite has been updated
to ensure the proper functioning of the new feature and improve the
assessment of permissions for serving endpoints, ensuring compatibility
with the updated `test_manager.py` file.
* Expanded end-user documentation with detailed descriptions for
workflows and commands
([#999](#999)). The
Databricks Labs UCX project has been updated with several new features
to assist in upgrading to Unity Catalog, including an assessment
workflow that generates a detailed compatibility report for workspace
entities, a group migration workflow for upgrading all Databricks
workspace assets, and utility commands for managing cross-workspace
installations. The Assessment Report now includes a more detailed
summary of the assessment findings, table counts, database summaries,
and external locations. Additional improvements include expanded
workspace group migration to handle potential conflicts with locally
scoped group names, enhanced documentation for external Hive Metastore
integration, a new debugging notebook, and detailed descriptions of
table upgrade considerations, data access permissions, external storage,
and table crawler.
* Fixed `config.yml` upgrade from very old versions
([#984](#984)). In this
release, we've introduced enhancements to the configuration upgrading
process for `config.yml` in our open-source library. We've replaced the
previous `v1_migrate` class method with a new implementation that
specifically handles migration from version 1. The new method retrieves
the `groups` field, extracts the `selected` value, and assigns it to the
`include_group_names` key in the configuration. The
`backup_group_prefix` value from the `groups` field is assigned to the
`renamed_group_prefix` key, and the `groups` field is removed, with the
version number updated to 2. These changes simplify the code and improve
readability, enabling users to upgrade smoothly from version 1 of the
configuration. Furthermore, we've added new unit tests to the
`test_config.py` file to ensure backward compatibility. Two new tests,
`test_v1_migrate_zeroconf` and `test_v1_migrate_some_conf`, have been
added, utilizing the `MockInstallation` class and loading the
configuration using `WorkspaceConfig`. These tests enhance the
robustness and reliability of the migration process for `config.yml`.
* Renamed columns in assessment SQL queries to use actual names, not
aliases ([#983](#983)). In
this update, we have resolved an issue where aliases used for column
references in SQL queries caused errors in certain setups by renaming
them to use actual names. Specifically, for assessment SQL queries, we
have modified the definition of the `is_delta` column to use the actual
`table_format` name instead of the alias `format`. This change improves
compatibility and enhances the reliability of query execution: column
references are interpreted consistently across setups, which avoids
errors caused by aliases. This change does
not introduce any new methods, but instead modifies existing
functionality to use actual column names, ensuring a more reliable and
consistent SQL query for the `05_0_all_tables` assessment.
* Updated groups permissions validation to use Table ACL cluster
([#979](#979)). In this
update, the `validate_groups_permissions` task has been modified to
utilize the Table ACL cluster, as indicated by the inclusion of
`job_cluster="tacl"`. This task is responsible for ensuring that all
crawled permissions are accurately applied to the destination groups by
calling the `permission_manager.apply_group_permissions` method during
the migration state. Validation of group permissions now runs on the
Table ACL cluster. If your workflows depend on this task, review how
the change affects your permissions validation process.
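
The `config.yml` version 1 upgrade described for #984 above can be sketched roughly as follows. This is a minimal illustration on a plain dictionary, not the project's actual `v1_migrate` implementation: the function name `migrate_v1_config`, the `inventory_database` key, and the sample values are assumptions made for the example.

```python
import copy


def migrate_v1_config(raw: dict) -> dict:
    """Hypothetical sketch: turn a version-1 config dict into a version-2 one."""
    migrated = copy.deepcopy(raw)  # leave the caller's dictionary untouched
    # Remove the legacy `groups` field and keep its contents for remapping.
    groups = migrated.pop("groups", None) or {}
    selected = groups.get("selected")
    if selected:
        migrated["include_group_names"] = selected
    backup_prefix = groups.get("backup_group_prefix")
    if backup_prefix:
        migrated["renamed_group_prefix"] = backup_prefix
    migrated["version"] = 2
    return migrated


# Assumed sample input; only `groups`, `selected`, and
# `backup_group_prefix` are named in the release note.
old_config = {
    "version": 1,
    "inventory_database": "ucx",
    "groups": {"selected": ["analysts", "engineers"], "backup_group_prefix": "db-temp-"},
}
new_config = migrate_v1_config(old_config)
print(new_config)
```

Working on a copy keeps the original configuration available in case the upgrade needs to be retried.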

Run integration tests only for pull requests ready for review (#1002)

Tested on https://github.com/databrickslabs/blueprint

Reducing flakiness of create account groups (#1003)

Prompt user if Terraform utilised for deploying infrastructure (#1004)

Added prompt is_terraform_used and updated the same in the config of
WorkspaceInstaller

Resolves #393

---------

Co-authored-by: Serge Smertin <259697+nfx@users.noreply.github.com>

Update CONTRIBUTING.md (#1005)

Closes #850

Fix gitguardian warning caused by "hello world" secret used in unit test (#1010)

Replace the plain encoded string by base64.b64encode to mitigate the
gitguardian warning.

Create UC external locations in Azure based on migrated storage credentials (#992)

Handle widget delete on upgrade platform bug (#1011)

Deprecate legacy installer (#1014)

Automatically upgrade existing installations to avoid breaking changes (#985)

This PR incorporates the work from
databrickslabs/blueprint#50, which enables
smoother cross-version upgrades.

Fix #471

Added missing documentation for `create-uber-principal` command (#1015)

Add `migrate-locations` command (#1016)

Add CLI command `migrate-locations` to create UC external locations.

Fix document for `migrate-locations` command (#1017)

Make code more readable by enforcing `max-nested-blocks = 3` with `pylint` (#1018)

No logic changes, just for readability and to spare code reviewers'
sanity.

Added AWS S3 support for `migrate-locations` command (#1009)

Release v0.15.0 (#1020)

* Added AWS S3 support for `migrate-locations` command
([#1009](#1009)). In this
release, the open-source library has been enhanced with AWS S3 support
for the `migrate-locations` command, enabling efficient and secure
management of S3 data. The new functionality includes the identification
of missing S3 prefixes and the creation of corresponding roles and
policies through the addition of methods `_identify_missing_paths`,
`_get_existing_credentials_dict`, and `create_external_locations`. The
library now also includes new classes `AwsIamRole`,
`ExternalLocationInfo`, and `StorageCredentialInfo` for better handling
of AWS-related functionality. Additionally, two new tests,
`test_create_external_locations` and
`test_create_external_locations_skip_existing`, have been added to
ensure the correct behavior of the new AWS-related functionality. The
new test function `test_migrate_locations_aws` checks the AWS-specific
implementation of the `migrate-locations` command, while
`test_missing_aws_cli` verifies the correct error message is displayed
when the AWS CLI is not found in the system path. These changes extend
`migrate-locations` to workspaces that keep their data in AWS S3.
* Added `databricks labs ucx create-uber-principal` command to create
Azure Service Principal for migration
([#976](#976)). The new CLI
command, `databricks labs ucx create-uber-principal`, has been
introduced to create an Azure Service Principal (SPN) and grant it
STORAGE BLOB READER access on all the storage accounts used by the
tables in the workspace. The SPN information is then stored in the UCX
cluster policy. A new class, AzureApiClient, has been added to isolate
Azure API calls, and unit and integration tests have been included to
verify the functionality. This development enhances migration
capabilities for Azure workspaces, providing a more streamlined and
automated way to create and manage Service Principals, and improves the
functionality and usability of the UCX tool. The changes are
well-documented and follow the project's coding standards.
* Added `migrate-locations` command
([#1016](#1016)). In this
release, we've added a new CLI command, `migrate-locations`, to create
Unity Catalog (UC) external locations. This command extracts candidates
for location creation from the `guess_external_locations` assessment
task and checks if corresponding UC Storage Credentials exist before
creating the locations. Currently, the command only supports Azure, with
plans to add support for AWS and GCP in the future. The
`migrate_locations` function is marked with the `ucx.command` decorator
and is available as a command-line interface (CLI) command. The pull
request also includes unit tests for this new command, which check the
environment (Azure, AWS, or GCP) before executing the migration and log
a message if the environment is AWS or GCP, indicating that the
migration is not yet supported on those platforms. No changes have been
made to existing workflows, commands, or tables.
* Added handling for widget delete on upgrade platform bug
([#1011](#1011)). In this
release, the `_install_dashboard` method in `dashboards.py` has been
updated to handle a platform bug that occurred during the deletion of
dashboard widgets during an upgrade process (issue
[#1011](#1011)). Previously,
the method attempted to delete each widget using the
`self._ws.dashboard_widgets.delete(widget.id)` command, which resulted
in a `TypeError` when attempting to delete a widget. The updated method
now includes a try/except block that catches this `TypeError` and logs a
warning message, while also tracking the issue under bug ES-1061370. The
rest of the method remains unchanged, creating a dashboard with the
given name, role, and parent folder ID if no widgets are present. This
enhancement improves the robustness of the `_install_dashboard` method
by adding error handling for the SDK API response when deleting
dashboard widgets, ensuring a smoother upgrade process.
* Create UC external locations in Azure based on migrated storage
credentials ([#992](#992)).
The `locations.py` file in the `databricks.labs.ucx.azure` package has
been updated to include a new class `ExternalLocationsMigration`, which
creates UC external locations in Azure based on migrated storage
credentials. This class takes various arguments, including
`WorkspaceClient`, `HiveMetastoreLocations`, `AzureResourcePermissions`,
and `AzureResources`. It has a `run()` method that lists any missing
external locations in UC, extracts their location URLs, and attempts to
create a UC external location with a mapped storage credential name if
the missing external location is in the mapping. The class also includes
helper methods for generating credential name mappings. Additionally,
the `resources.py` file in the same package has been modified to include
a new method `managed_identity_client_id`, which retrieves the client ID
of a managed identity associated with a given access connector. Test
functions for the `ExternalLocationsMigration` class and Azure external
locations functionality have been added in the new file
`test_locations.py`. The `test_resources.py` file has been updated to
include tests for the `managed_identity_client_id` method. A new
`mappings.json` file has also been added for tests related to Azure
external location mappings based on migrated storage credentials.
* Deprecate legacy installer
([#1014](#1014)). In this
release, we have deprecated the legacy installer for the UCX project,
which was previously implemented as a bash script. A warning message has
been added to inform users about the deprecation and direct them to the
UCX installation instructions. The functionality of the script remains
unchanged, and it still performs tasks such as installing Python
dependencies and building Python bindings. The script will eventually be
replaced with the `databricks labs install ucx` command. This change is
part of issue [#1014](#1014)
and is intended to streamline the installation process and improve the
overall user experience. We recommend that users update their
installation process to the new recommended method as soon as possible
to avoid any issues with the legacy installer in the future.
* Prompt user if Terraform utilised for deploying infrastructure
([#1004](#1004)). In this
update, the `config.py` file has been modified to include a new
attribute, `is_terraform_used`, in the `WorkspaceConfig` class. This
boolean flag indicates whether Terraform has been used for deploying
certain entities in the workspace. Issue
[#393](#393) has been
addressed with this change. The `WorkspaceInstaller` configuration has
also been updated to take advantage of this new attribute, allowing
developers to determine if Terraform was used for infrastructure
deployment, thereby increasing visibility into the deployment process.
Additionally, a new prompt has been added to the `warehouse_type`
function to ascertain if Terraform is being utilized for infrastructure
deployment, setting the `is_terraform_used` variable to `True` if it is.
* Updated CONTRIBUTING.md
([#1005](#1005)). In this
contribution to the open-source library, the CONTRIBUTING.md file has
been significantly updated with clearer instructions on how to
effectively contribute to the project. The previous command to print the
Python path has been removed, as the IDE is now advised to be configured
to use the Python interpreter from the virtual environment. A new step
has been added, recommending the use of a consistent styleguide and
formatting of the code before every commit. Moreover, it is now
encouraged to run tests before committing to minimize potential issues
during the review process. The steps on how to make a Fork from the ucx
repo and create a PR have been updated with links to official
documentation. Lastly, the commit now includes information on handling
dependency errors that may occur after `git pull`.
* Updated databricks-labs-blueprint requirement from ~=0.2.4 to ~=0.3.0
([#1001](#1001)). In this
pull request update, the requirements file, pyproject.toml, has been
modified to upgrade the databricks-labs-blueprint requirement from
~=0.2.4 to ~=0.3.0. This update integrates the latest features and bug
fixes of the package, including an automated upgrade framework, a
brute-forcing approach for handling SerdeError, and enhancements for
running nightly integration tests with service principals. These
improvements increase the testability and functionality of the software,
ensuring its stable operation with service principals during nightly
integration tests. Furthermore, the reliability of the test for
detecting existing installations has been reinforced by adding a new
test function that checks for the correct detection of existing
installations and retries the test for up to 15 seconds if they are not.
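
The widget-deletion workaround described for #1011 above follows a familiar pattern: attempt each delete, and on `TypeError` log a warning and carry on. The sketch below illustrates that pattern only. `FakeWidgetsApi` and `delete_widgets` are hypothetical stand-ins written for this example, not UCX or Databricks SDK code, and the fake API raises `TypeError` purely to mimic the reported platform bug.

```python
import logging

logger = logging.getLogger("widget_cleanup_sketch")


class FakeWidgetsApi:
    """Hypothetical stand-in for a widgets API that fails on some deletes."""

    def __init__(self, broken_ids):
        self.broken_ids = set(broken_ids)
        self.deleted = []

    def delete(self, widget_id: str) -> None:
        if widget_id in self.broken_ids:
            # Mimics the platform bug: the call surfaces as a TypeError.
            raise TypeError(f"unexpected response while deleting {widget_id}")
        self.deleted.append(widget_id)


def delete_widgets(api, widget_ids):
    """Delete widgets one by one; return the ids that could not be deleted."""
    skipped = []
    for widget_id in widget_ids:
        try:
            api.delete(widget_id)
        except TypeError as err:
            # Warn and continue instead of aborting the whole upgrade.
            logger.warning("Skipping widget %s: %s", widget_id, err)
            skipped.append(widget_id)
    return skipped


api = FakeWidgetsApi(broken_ids={"w2"})
skipped_ids = delete_widgets(api, ["w1", "w2", "w3"])
print(skipped_ids, api.deleted)
```

Catching only `TypeError` keeps other failures, such as permission errors, visible to the caller.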

Dependency updates:

* Updated databricks-labs-blueprint requirement from ~=0.2.4 to ~=0.3.0
([#1001](#1001)).
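
For readers unfamiliar with the `~=` (compatible release) operator used in this requirement: under PEP 440, `~=0.3.0` accepts any `0.3.x` release at or above `0.3.0` and rejects `0.4.0`. The sketch below illustrates that rule. `is_compatible_release` is a hypothetical helper that handles only plain dotted numeric versions; real tooling should rely on a proper version parser.

```python
def is_compatible_release(version: str, spec: str) -> bool:
    """Check `version ~= spec` for plain dotted numeric versions only."""
    candidate = [int(part) for part in version.split(".")]
    required = [int(part) for part in spec.split(".")]
    if len(required) < 2:
        raise ValueError("~= needs at least two version segments")
    # Pad the candidate with zeros so that "0.3" compares equal to "0.3.0".
    candidate += [0] * (len(required) - len(candidate))
    # All segments except the last one of the spec must match exactly.
    prefix = required[:-1]
    return candidate >= required and candidate[: len(prefix)] == prefix


print(is_compatible_release("0.3.1", "0.3.0"))
print(is_compatible_release("0.4.0", "0.3.0"))
```

In other words, moving the requirement from `~=0.2.4` to `~=0.3.0` stops accepting the `0.2.x` series and starts accepting the `0.3.x` series.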
dmoore247 pushed a commit that referenced this pull request Mar 23, 2024