Release v0.23.0 #1671

nfx · 2024-05-08T15:44:11Z

Added DBSQL queries & dashboard migration (#1532). The Databricks Labs UCX project has been updated with two new experimental commands: migrate-dbsql-dashboards and revert-dbsql-dashboards. These commands migrate Databricks SQL dashboards in the workspace and revert that migration. The migrate-dbsql-dashboards command transforms all Databricks SQL dashboards in the workspace after table migration, tagging migrated dashboards and queries with "migrated by UCX" and backing up original queries. The revert-dbsql-dashboards command returns migrated Databricks SQL dashboards to their original state before migration. Both commands accept a --dashboard-id flag for migrating or reverting a specific dashboard. Additionally, two new functions, migrate_dbsql_dashboards and revert_dbsql_dashboards, have been added to the cli.py file, and new classes have been added to interact with Redash for data visualization and querying. The make_dashboard fixture has been updated to enhance testing capabilities, and new unit tests have been added for migrating and reverting DBSQL dashboards.
Added UDFs assessment (#1610). A User Defined Function (UDF) assessment feature has been introduced, addressing issue #1610. A new method, DESCRIBE_FUNCTION, has been implemented to retrieve detailed information about UDFs, including function description, input parameters, and return types. This method has been integrated into existing test cases, enhancing the validation of UDF metadata and associated privileges, and ensuring system reliability. The UDF constructor has been updated with a new parameter, comment, initially left blank in the test function. Additionally, two new columns, success and failures, have been added to the udf table in the inventory database to store assessment data for UDFs. The UdfsCrawler class has been updated to return a list of UDF objects, and the assertions in the test have been updated accordingly. Furthermore, a new SQL file has been added to calculate the total count of UDFs in the $inventory.udfs table, with a widget displaying this information as a counter visualization named "Total UDF Count".
Added databricks labs ucx create-missing-principals command to create the missing UC roles in AWS (#1495). The databricks labs ucx tool now includes a new command, create-missing-principals, which creates missing Unity Catalog (UC) roles in AWS for S3 locations that lack a UC-compatible role. This command is implemented using IamRoleCreation from databricks.labs.ucx.aws.credentials and updates AWSRoleAction with the corresponding role_arn while adding AWSUCRoleCandidate. The new command only supports AWS and does not affect Azure. The existing migrate_credentials function has been updated to handle Azure Service Principals migration. Additionally, new classes and methods have been added, including AWSUCRoleCandidate in aws.py, and create_missing_principals and list_uc_roles methods in access.py. The create_uc_roles_cli method in access.py has been refactored and renamed to list_uc_roles. New unit tests have been implemented to test the functionality of create_missing_principals for AWS and Azure, as well as the behavior when the command is not approved.
Added baseline for workflow linter (#1613). This change introduces the WorkflowLinter class in the application.py file of the databricks.labs.ucx.source_code.jobs package. The class is used to lint workflows by checking their dependencies and ensuring they meet certain criteria, taking in arguments such as workspace_client, dependency_resolver, path_lookup, and migration_index. Several properties have been moved from dependency_resolver to the CliContext class, and the NotebookLoader class has been moved to a new location. Additionally, several classes and methods have been introduced to build a dependency graph, resolve dependencies, and manage allowed dependencies, site packages, and supported programming languages. The generic and redash modules from databricks.labs.ucx.workspace_access and the GroupManager class from databricks.labs.ucx.workspace_access.groups are used. The VerifyHasMetastore, UdfsCrawler, and TablesMigrator classes from databricks.labs.ucx.hive_metastore and the DeployedWorkflows class from databricks.labs.ucx.installer.workflows are also used. This commit is part of a larger effort to improve workflow linting and addresses several related issues and pull requests.
Added linter to check for RDD use and JVM access (#1606). A new AstHelper class has been added to provide utility functions for working with abstract syntax trees (ASTs) in Python code, including methods for extracting attribute and function call node names. Additionally, a linter has been integrated to check for RDD use and JVM access, utilizing the AstHelper class, which has been moved to a separate module. A new file, 'spark_connect.py', introduces a linter with three matchers to ensure conformance to best practices and catch potential issues early in the development process related to RDD usage and JVM access. The linter is environment-aware, accommodating shared cluster and serverless configurations, and includes new test methods to validate its functionality. These improvements enhance codebase quality, promote reusability, and ensure performance and stability in Spark cluster environments.
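As a rough illustration of the idea behind this linter (not the actual matchers in spark_connect.py), the sketch below walks a Python AST and flags attribute accesses named rdd or _jvm. The function name find_rdd_and_jvm_use and the set of flagged attribute names are assumptions made for this example only.

```python
import ast

# Hypothetical set of attribute names to flag; the real matchers may differ.
FLAGGED_ATTRIBUTES = {"rdd", "_jvm"}


def find_rdd_and_jvm_use(source: str) -> list[tuple[int, str]]:
    """Return (line number, attribute name) for each flagged attribute access."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Attribute) and node.attr in FLAGGED_ATTRIBUTES:
            findings.append((node.lineno, node.attr))
    return sorted(findings)


sample = "df = spark.table('t')\nrows = df.rdd.collect()\njvm = spark._jvm\n"
findings = find_rdd_and_jvm_use(sample)
```

A name-based check like this is deliberately coarse: it may flag unrelated attributes that happen to share a name, which is a reasonable trade-off for an early warning.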
Added non-Delta DBFS table migration (What.DBFS_ROOT_NON_DELTA) in migrate_table workflow (#1621). The migrate_tables workflow in workflows.py has been enhanced to support a new scenario, DBFS_ROOT_NON_DELTA, which covers non-delta tables stored in DBFS root from the Hive Metastore to the Unity Catalog using CTAS. Additionally, the ACL migration strategy has been updated to include the AclMigrationWhat.PRINCIPAL strategy. The migrate_external_tables_sync, migrate_dbfs_root_delta_tables, and migrate_views tasks now incorporate the new ACL migration strategy. These changes have been thoroughly tested through unit tests and integration tests, ensuring the continued functionality of the existing workflow while expanding its capabilities.
Added "seen tables" feature (#1465). The seen tables feature has been introduced, allowing for better handling of existing tables in the hive metastore and supporting their migration to UC. This enhancement includes the addition of a snapshot method that fetches and crawls table inventory, appending or overwriting records based on assessment results. The _crawl function has been updated to check for and skip existing tables in the current workspace. New methods such as '_get_tables_paths_from_assessment', '_overwrite_records', and _get_table_location have been included to facilitate these improvements. In the testing realm, a new test test_mount_listing_seen_tables has been implemented, replacing 'test_partitioned_csv_jsons'. This test checks the behavior of the TablesInMounts class when enumerating tables in mounts for a specific context, accounting for different table formats and managing external and managed tables. The diff modifies the 'locations.py' file in the databricks/labs/ucx directory, related to the hive metastore.
Added support for migrate-tables-ctas workflow in the databricks labs ucx migrate-tables CLI command (#1660). This commit adds support for the migrate-tables-ctas workflow in the databricks labs ucx migrate-tables command, which checks for external tables that cannot be synced and prompts the user to run the migrate-tables-ctas workflow. Two new methods, test_migrate_external_tables_ctas(ws) and migrate_tables(ws, prompts, ctx=ctx), have been added. The first checks that the migrate-external-tables-ctas workflow is called correctly, while the second runs the workflow after prompting the user. The method test_migrate_external_hiveserde_tables_in_place(ws) has been modified to test that the migrate-external-hiveserde-tables-in-place-experimental workflow is called correctly. Beyond these, no significant modifications to existing functionality have been made in this commit. The changes include updated unit tests and user documentation. The target audience for this feature is software engineers who adopt the project.
Added support for migrating external location permissions from interactive cluster mounts (#1487). This commit adds support for migrating external location permissions from interactive cluster mounts in Databricks Labs' UCX project, enhancing security and access control. It retrieves interactive cluster locations and user mappings from the AzureACL class, granting necessary permissions to each cluster principal for each location. The existing databricks labs ucx command is modified, with the addition of the new method create_external_locations and thorough testing through manual, unit, and integration tests. This feature is developed by vuong-nguyen and Vuong and addresses issues #1192 and #1193, ensuring a more robust and controlled user experience with interactive clusters.
Added uber principal spn details in SQL warehouse data access configuration when creating uber-SPN (#1631). In this release, we've implemented new features to enhance the security and control over data access during the migration process for the SQL warehouse data access configuration. The databricks labs ucx create-uber-principal command now creates a service principal with read-only access to all the storage used by tables in the workspace. The UCX Cluster Policy and SQL Warehouse data access configuration will be updated to use this service principal for migration workflows. A new method, _update_sql_dac_with_instance_profile, has been introduced in the access.py file to update the SQL data access configuration with the provided AWS instance profile, ensuring a more streamlined management of instance profiles within the SQL data access configuration during the creation of an uber service principal (SPN). Additionally, new methods and tests have been added to the sql module of the databricks.sdk.service package to improve Azure resource permissions, handling different scenarios related to creating a global SPN in the presence or absence of various conditions, such as storage, cluster policies, or secrets.
Addressed issue with disabled features in certain regions (#1618). In this release, we have implemented improvements to address an issue where certain features were disabled in specific regions. We have added error handling when listing serving endpoints to raise a NotFound error if a feature is disabled, preventing the code from failing silently and providing better error messages. A new method, test_serving_endpoints_not_enabled, has been added, which creates a mock WorkspaceClient and raises a NotFound error if serving endpoints are not enabled for a shard. The GenericPermissionsSupport class uses this method to get crawler tasks, and if serving endpoints are not enabled, an error message is logged. These changes increase the reliability and robustness of the codebase by providing better error handling and messaging for this particular issue. Additionally, the change includes unit tests and manual testing to ensure the proper functioning of the new features.
Aggregate UCX output across workspaces with CLI command (#1596). A new report-account-compatibility command has been added to the databricks labs ucx tool, enabling users to evaluate the compatibility of an entire Azure Databricks account with UCX. This command generates a readiness report for the account by querying aspects such as clusters, configurations, and data formats. It uses Azure CLI authentication with AAD tokens and accepts a profile as an argument. The output includes warnings for workspaces that do not have UCX installed, and provides information about unsupported cluster types, unsupported configurations, data format compatibility, and more. The command runs at the account level and aggregates UCX output across the workspaces in the account. The existing manual-workspace-info command remains unchanged. These changes help assess the readiness of an Azure Databricks account for UCX and simplify checking compatibility across an entire account.
Assert if group name is in cluster policy (#1665). In this release, we have implemented a change to ensure the presence of the display name of a specific workspace group (ws_group_a) in the cluster policy. This is to prevent a key error previously encountered. The cluster policy is now loaded as a dictionary, and the group name is checked to confirm its presence. If the group is not found, a message is raised alerting users. Additionally, the permission level for the group is verified to ensure it is set to CAN_USE. No new methods have been added, and existing functionality remains unchanged. The test file test_ext_hms.py has been updated to include the new assertion and has undergone both unit tests and manual testing to ensure proper implementation. This change is intended for software engineers who adopt the project.
Automatically retrying with auth_type=azure-cli when constructing workspace_clients on Azure (#1650). This commit introduces automatic retrying with 'auth_type=azure-cli' when constructing workspace_clients on Azure, resolving TODO items for AccountWorkspaces and adding relevant suggestions in 'troubleshooting.md'. It closes issues #1574 and #1430, and includes new methods for generating readiness reports in AccountAggregate and testing the get_accessible_workspaces method in 'test_workspaces.py'. User documentation has been updated and the changes have been manually verified in a staging environment. For macOS and Windows users, explicit auth type settings are required for command line utilities.
Changes to identify service principal with custom roles on Azure storage account for principal-prefix-access (#1576). This release introduces several enhancements to the identification of service principals with custom roles on Azure storage accounts for principal-prefix-access. New methods such as _get_permission_level, _get_custom_role_privilege, and _get_role_privilege have been added to improve the functionality of the module. Additionally, two new classes, AzureRoleAssignment and AzureRoleDetails, have been added to enable more detailed management and access control for custom roles on Azure storage accounts. The 'test_access.py' file has been updated to include tests for saving custom roles in Azure storage accounts and ensuring the correct identification of service principals with custom roles. A new unit test function, test_role_assignments_custom_storage(), has also been added to verify the behavior of custom roles in Azure storage accounts. Overall, these changes provide a more efficient and fine-grained way to manage and control custom roles on Azure storage accounts.
Clarified unsupported config in compute crawler (#1656). In this release, we have made significant changes to clarify and improve the handling of unsupported configurations in our compute crawler related to the Hive metastore. We have expanded error messages for unsupported configurations and provided detailed recommendations for remediation. Additionally, we have added relevant user documentation and manually tested the changes. The changes include updates to the configuration for external Hive metastore and passthrough security model for Unity Catalog, which are incompatible with the current configurations. We recommend removing or altering the configs while migrating existing tables and views using UCX or other compatible clusters, and mapping the passthrough security model to a security model compatible with Unity Catalog. The code modifications include the addition of new methods for checking cluster init script and Spark configurations, as well as refining the error messages for unsupported configurations. We also added a new assertion in the test_cluster_with_multiple_failures unit test to check for the presence of a specific message regarding the use of the spark.databricks.passthrough.enabled configuration. This release is not yet verified on the staging environment.
Created a unique default schema when External Hive Metastore is detected (#1579). A new default database ucx is introduced for storing inventory in the hive metastore, with a suffix consisting of the workspace's client ID to ensure uniqueness when an external hive metastore is detected. The has_ext_hms() method is added to the InstallationPolicy class to detect external HMS and thereby create a unique default schema. The _prompt_for_new_installation method's default value for the Inventory Database stored in hive_metastore prompt is updated to use the new default database name, modified to include the workspace's client ID if external HMS is detected. Additionally, a test function test_save_config_ext_hms is implemented to demonstrate the WorkspaceInstaller class's behavior with external HMS, creating a unique default schema for improved system functionality and customization. This change is part of issue #1579.
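The naming rule described in this entry can be sketched as below. This is only an illustration: the helper name default_inventory_database and the exact ucx_<id> suffix format are assumptions, not the code in the installer.

```python
def default_inventory_database(has_ext_hms: bool, workspace_id: int) -> str:
    """Pick the default inventory schema name (hypothetical helper)."""
    # With an external HMS, several workspaces may share one metastore,
    # so a per-workspace suffix keeps their inventories apart.
    if has_ext_hms:
        return f"ucx_{workspace_id}"  # suffix format is an assumption
    return "ucx"


shared_name = default_inventory_database(True, 123456)
plain_name = default_inventory_database(False, 123456)
```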
Extend service principal migration to create storage credentials for access connectors created for each storage account (#1426). This commit extends the service principal migration to create storage credentials for access connectors associated with each storage account, resolving issues #1384 and #875. The update includes modifications to the existing databricks labs ucx command for creating access connectors, adds a new CLI command for creating storage credentials, and updates the documentation. A new workflow has been added for creating credentials for access connectors and service principals, and updates have been made to existing workflows. The commit includes manual, unit, and integration tests, and no new or modified methods are specified in the diff. The focus is on the feature description and its impact on the project's functionality. The commit has been co-authored by Serge Smertin and vuong-nguyen.
Suggest users to create Access Connector(s) with Managed Identity to access Azure Storage Accounts behind firewall (#1589). In this release, we have introduced a new feature to improve access to Azure Storage Accounts that are protected by firewalls. Due to limitations with service principals in such scenarios, we have developed Access Connectors with Managed Identities for more reliable connectivity. This change includes updates to the 'credentials.py' file, which introduces new methods for managing the migration of service principals to Access Connectors using Managed Identities. Users are warned that migrating to this new feature may cause issues when transitioning to UC, and are advised to validate external locations after running the migration command. This update enhances the security and functionality of the system, providing a more dependable method for accessing Azure Storage Accounts protected by firewalls.
Fixed catalog/schema grants when tables with same source schema have different target schemas (#1581). In this release, we have implemented a fix to address an issue where catalog/schema grants were not being handled correctly when tables with the same source schema had different target schemas. This was causing problems with granting appropriate permissions to users. We have modified the prepare_test function to include an additional test case with a different target schema for the same source table. Furthermore, we have updated the test_catalog_schema_acl function to ensure that grants are being created correctly for all catalogs, schemas, and tables. We have also added an extra query to grant use schema permissions for catalog2.schema3 to user1. Additionally, we have introduced a new SchemaInfo class to store information about catalogs and schemas, and refactored the _get_database_source_target_mapping method to return a dictionary that maps source databases to a list of SchemaInfo objects instead of a single dictionary. These changes ensure that grants are being handled correctly for catalogs, schemas, and tables, even when tables with the same source schema have different target schemas. This will improve the overall functionality and reliability of the system, making it easier for users to manage their catalogs and schemas.
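To make the shape of the refactored mapping concrete, here is a minimal sketch in which one source schema maps to a list of targets. The SchemaInfo field names and the function map_source_to_targets are assumptions for illustration; the real class may carry different fields.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SchemaInfo:
    # Field names are assumed for this sketch.
    catalog: str
    schema: str


def map_source_to_targets(rules):
    """Map each source schema to every distinct target (catalog, schema).

    rules: iterable of (src_schema, dst_catalog, dst_schema) tuples.
    """
    mapping = defaultdict(list)
    for src_schema, dst_catalog, dst_schema in rules:
        info = SchemaInfo(dst_catalog, dst_schema)
        if info not in mapping[src_schema]:
            mapping[src_schema].append(info)
    return dict(mapping)


rules = [
    ("schema1", "catalog1", "schema1"),
    ("schema1", "catalog2", "schema3"),
    ("schema1", "catalog1", "schema1"),  # duplicate target, kept once
]
targets = map_source_to_targets(rules)
```

With a list per source schema, grants can be issued for every target rather than only the last one seen, which is the failure mode a single-valued dictionary would produce.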
Fixed Spark configuration parameter referencing secret (#1635). In this release, the code related to the Spark configuration parameter reference for a secret has been updated in the access.py file, specifically within the _update_cluster_policy_definition method. The change modifies the method to retrieve the OAuth client secret for a given storage account using an f-string to reference the secret, replacing the previous concatenation operator. This enhancement is aimed at improving the readability and maintainability of the code while preserving its functionality. Furthermore, the commit includes additional changes, such as new methods test_create_global_spn and "cluster_policies.edit", which may be related to this fix. These changes address the secret reference issue, ensuring secure access control and improved integration, particularly with the Spark configuration, benefiting engineers utilizing this project for handling sensitive information and managing clusters securely and effectively.
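For readers unfamiliar with the brace escaping involved, the sketch below builds a Databricks secret reference of the form {{secrets/<scope>/<key>}} with an f-string. The helper name secret_reference and the example scope and key are hypothetical; only the reference syntax itself is taken from Databricks conventions.

```python
def secret_reference(scope: str, key: str) -> str:
    """Build a Spark-config secret reference (hypothetical helper)."""
    # In an f-string, "{{" and "}}" each produce one literal brace,
    # so four braces are needed for the literal "{{" and "}}" pairs.
    return f"{{{{secrets/{scope}/{key}}}}}"


reference = secret_reference("ucx", "uber_principal_secret")
```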
Fixed migration-locations and assign-metastore definitions in labs.yml (#1627). In this release, the migration-locations command in the labs.yml file has been updated to include new flags subscription-id and aws-profile. The subscription-id flag allows users to specify the subscription to scan the storage account in, and the aws-profile flag allows for authentication using a specified AWS Profile. The assign-metastore command has also been updated with a new description: "Enable Unity Catalog features on a workspace by assigning a metastore to it." The is_account_level parameter remains unchanged, and a new optional flag, workspace-id, has been added, allowing users to specify the Workspace ID to assign a metastore to. This change gives users more options to customize their storage scanning and metastore assignment processes.
Fixed prompt for using external metastore (#1668). A fix has been implemented in the create function of the policy.py file to correctly prompt users for using an external metastore. Previously, a missing period and space in the prompt caused potential confusion. The updated prompt now includes a clarifying sentence and the _prompts.confirm method has been modified to check if the user wants to set UCX to connect to an external metastore in two scenarios: when one or more cluster policies are set up for an external metastore, and when the workspace warehouse is configured for an external metastore. If the user chooses to set up an external metastore, an informational message will be recorded in the logger. This change ensures clear and precise communication with users during the external metastore setup process.
Fixed storage account network ACLs retrieved from properties (#1620). This release includes a fix to the storage account network ACLs retrieval in the open-source library, addressing issue #1. Previously, the network ACLs were being retrieved from an incorrect location, but this commit corrects that by obtaining the network ACLs from the storage account's properties.networkAcls field. The StorageAccount class has been updated to modify the way default network action is retrieved, with a new value Unknown added to the previous values Deny and Allow. The from_raw_resource class method has also been updated to retrieve the default network action from the properties.networkAcls field instead of the networkAcls field. This change may affect any functionality that relies on network ACL information and impacts the existing command databricks labs ucx .... Relevant tests, including a new test test_azure_resource_storage_accounts_list_non_zero, have been added, and the fix has been verified through manual and unit testing.
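The lookup described here can be sketched as follows, assuming the raw resource is a plain dictionary shaped like an Azure Resource Manager response. The function name default_network_action is hypothetical and is not the StorageAccount.from_raw_resource implementation.

```python
def default_network_action(raw: dict) -> str:
    """Read properties.networkAcls.defaultAction, falling back to "Unknown"."""
    network_acls = raw.get("properties", {}).get("networkAcls", {})
    return network_acls.get("defaultAction", "Unknown")


nested = {"properties": {"networkAcls": {"defaultAction": "Deny"}}}
top_level_only = {"networkAcls": {"defaultAction": "Allow"}}  # old, incorrect location

nested_action = default_network_action(nested)
missing_action = default_network_action(top_level_only)
```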
Fully refresh table migration status in table migration workflow (#1630). This release introduces a new method, index_full_refresh(), to the table migration workflow for fully refreshing the migration status, addressing an oversight from a previous commit (#1623) and resolving issue #1628. The new method resets the _migration_status_refresher before computing the index, ensuring the latest migration status is used for determining whether view dependencies have been migrated. The index() method was previously used to refresh the migration status, but it only provided a partial refresh. With this update, index_full_refresh() is utilized for a comprehensive refresh, affecting the refresh_migration_status task in multiple workflows such as migrate_views, scan_tables_in_mounts_experimental, and others. This change ensures a more accurate migration report, presenting the updated migration status.
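The difference between a cached index and a full refresh can be illustrated with the sketch below. CachedStatusRefresher and the two module-level functions are stand-ins invented for this example; they only mirror the reset-then-index behavior described above.

```python
class CachedStatusRefresher:
    """Hypothetical stand-in for a refresher that caches its snapshot."""

    def __init__(self, fetch):
        self._fetch = fetch
        self._snapshot = None

    def snapshot(self):
        # Fetch only once until reset() is called.
        if self._snapshot is None:
            self._snapshot = self._fetch()
        return self._snapshot

    def reset(self):
        self._snapshot = None


def index(refresher):
    # May return stale data if the snapshot was cached earlier.
    return set(refresher.snapshot())


def index_full_refresh(refresher):
    # Drop the cache first so the index reflects the latest status.
    refresher.reset()
    return index(refresher)


migrated = ["db.t1"]
refresher = CachedStatusRefresher(lambda: list(migrated))
index(refresher)              # populates the cache
migrated.append("db.t2")      # a table is migrated afterwards
stale = index(refresher)
fresh = index_full_refresh(refresher)
```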
Ignore existing corrupted installations when refreshing (#1605). A recent update has enhanced the error handling during the loading of installations in the install.py file. Specifically, the installation.load function now handles certain errors, including PermissionDenied, SerdeError, ValueError, and AttributeError, by logging a warning message and skipping the corrupted installation instead of raising an error. This behavior has been incorporated into both the configure and _check_inventory_database_exists functions, allowing the installation process to continue even in the presence of issues with existing installations, while providing improved error messages. This change resolves issue #1601 and introduces a new test case for a corrupted installation configuration, as well as an updated existing test case for test_save_config that includes a mock installation.
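A minimal sketch of the skip-and-warn pattern follows. It uses only built-in exception types; the real code also handles PermissionDenied and SerdeError, which are omitted here to keep the example self-contained. The function load_valid_configs and the loader callables are hypothetical.

```python
import logging

logger = logging.getLogger(__name__)


def load_valid_configs(loaders: dict) -> list:
    """Call each loader, skipping installations that fail to load."""
    configs = []
    for name, load in loaders.items():
        try:
            configs.append(load())
        except (ValueError, AttributeError) as err:
            # Log and continue instead of aborting the whole refresh.
            logger.warning("Skipping corrupted installation %s: %s", name, err)
    return configs


def _load_ok():
    return {"inventory_database": "ucx"}


def _load_corrupted():
    raise ValueError("cannot parse config.yml")


configs = load_valid_configs({"user_a": _load_ok, "user_b": _load_corrupted})
```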
Improved exception handling (#1584). In this release, the exception handling during the upload of a wheel file to DBFS has been significantly improved. Previously, only PermissionDenied errors were caught and handled. Now, both BadRequest and PermissionDenied exceptions will be caught and logged as a warning. This change enhances the robustness of the code by handling a wider range of exceptions during the upload process. In addition, cluster overrides have been configured and DBFS write permissions have been set up. The specific changes made to the code include updating the import statement for NotFound to include BadRequest and modifying the except block in the _get_init_script_data method to catch both NotFound and BadRequest exceptions. These improvements ensure that the code can handle more types of errors, providing more helpful error messages and preventing crash scenarios, thereby enhancing the reliability and robustness of the code.
Improved exception handling for migrate_acl (#1590). In this release, the migrate_acl functionality has been enhanced to improve exception handling, addressing a flakiness issue in the test_migrate_managed_tables_with_acl test. Previously, unhandled not found exceptions during parallel test execution caused the flakiness. This release resolves this issue (#1549) by introducing error handling in the test_migrate_acls_should_produce_proper_queries test. A controlled error is now introduced to simulate a failed grant migration due to a TABLE_OR_VIEW_NOT_FOUND error. This enhancement allows for precise testing of error handling and logging mechanisms when migration fails for specific objects, ensuring a more reliable testing environment for the migrate_acl functionality.
Improved reliability of table migration status refresher (#1623). This release introduces improvements to the table migration status refresher in the open-source library, enhancing its reliability and robustness. The table_migrate function has been updated to ensure that the table migration status is always reset when requesting the latest snapshot, addressing issues #1623, #1622, and #1615. Additionally, the function now handles NotFound errors when refreshing migration status. The get_seen_tables function has been modified to convert the returned iterator to a list and raise a NotFound exception if the schema does not exist, which is then caught and logged as a warning. Furthermore, the migration status reset behavior has been improved, and the migration_status_refresher parameter type in the TableMigrate class constructor has been modified. A new private method, _index_with_reset(), has been added, and the _migrate_views() and _view_can_be_migrated() methods have been updated, to ensure a more accurate and consistent table migration process. The changes have been thoroughly tested and are ready for review.
Refresh migration status at the end of the migrate_tables workflows (#1599). In this release, updates have been made to the migration status at the end of the migrate_tables workflows, with no new or modified tables or methods introduced. The _migration_status_refresher.reset() method has been added in two locations to ensure accurate migration status updates. A new refresh_migration_status method has been included in the RuntimeContext class in the databricks.labs.ucx.hive_metastore.workflows module, which refreshes the migration status for presentation in the dashboard. The changes also include the addition of the refresh_migration_status task in migrate_views, migrate_views_with_acl, and scan_tables_in_mounts_experimental workflows, and the migration_report method is now dependent on the refresh_migration_status task. Thorough testing has been conducted, including the creation of a new integration test in the file tests/integration/hive_metastore/test_workflows.py to verify that the migration status is refreshed after the migration job is run. These changes aim to ensure that the migration status is up-to-date and accurately presented in the dashboard.
Removed DBFS library installations (#1554). In this release, the "configure.py" file has been removed, which previously contained the ConfigureClusterOverrides class with methods for validating cluster IDs, distinguishing between classic and Table Access Control (TACL) clusters, and building a prompt for users to select a valid active cluster ID. The removal of this file signifies that these functionalities are no longer available. This change is part of a larger commit that also removes DBFS library installations and updates the Estimates Dashboard to remove metastore assignment, addressing issue #1098. The commit has been tested via integration tests and manual installation and running of UCX on a no-uc environment. Please note that the create_jobs method in the install.py file has been updated to reflect these changes, ensuring a more straightforward installation experience and usage of the Estimates Dashboard.
Removed the Is Terraform used prompt (#1664). In this release, we have removed the is_terraform_used prompt from the configuration file and the installation process in the ucx package. This prompt was not being utilized and had been a source of confusion for some users. Although the variable that stored its outcome will be retained for backwards compatibility, no new methods or modifications to existing functionality have been introduced. No tests have been added or modified as part of this change. The removal of this prompt simplifies the configuration process and aligns with the project's future plans to eliminate the use of Terraform state for ucx migration. Manual testing has been conducted to ensure that the removal of the prompt does not affect the functionality of other properties in the configuration file or the installation process.
Resolve relative paths when building dependency graph (#1608). This commit introduces support for resolving relative paths when building a dependency graph in the UCX project, addressing issues 1202, 1499, and 1287. The SysPathProvider now includes a cwd attribute, and a new class, LocalNotebookLoader, has been implemented to handle local files and folders. The PathLookup class is used to resolve paths, and new methods have been added to support these changes. Unit tests have been provided to ensure the correct functioning of the new functionality. This commit replaces issue 1593 and enhances the project's ability to handle local files and folders, resulting in a more robust and reliable dependency graph.
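As a rough illustration of the idea, the sketch below resolves a relative path first against a current working directory and then against a list of `sys.path`-like entries. The `resolve` helper and its signature are assumptions made for this example and do not reproduce the actual `PathLookup` API.

```python
import tempfile
from pathlib import Path
from typing import Optional


def resolve(path: Path, cwd: Path, sys_paths: list[Path]) -> Optional[Path]:
    """Return the first existing candidate for `path`, or None (hypothetical helper)."""
    if path.is_absolute():
        return path if path.exists() else None
    # Relative paths are tried against cwd first, then against each sys.path entry.
    for base in [cwd, *sys_paths]:
        candidate = (base / path).resolve()
        if candidate.exists():
            return candidate
    return None


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve()
    (root / "project" / "notebooks").mkdir(parents=True)
    (root / "libs").mkdir()
    (root / "project" / "helpers.py").write_text("x = 1\n")
    (root / "libs" / "shared.py").write_text("y = 2\n")

    cwd = root / "project" / "notebooks"
    found_relative = resolve(Path("../helpers.py"), cwd, [root / "libs"])
    found_on_sys_path = resolve(Path("shared.py"), cwd, [root / "libs"])
    missing = resolve(Path("nope.py"), cwd, [root / "libs"])
    expected = (root / "project" / "helpers.py", root / "libs" / "shared.py")
```

Trying `cwd` before the other entries mirrors how a notebook that references `../helpers.py` would expect the file next to it to win over a same-named file elsewhere.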
Show tables migration status in migration dashboard (#1507). A migration dashboard has been added to display the status of data object migrations, addressing issue #323. This new feature includes a query to show the migration status of tables, a new CLI command, and a modification to an existing command. The migration-* workflow has been updated to include a refresh migration dashboard option. The mock_installation function has been modified with an updated state.json file. The changes were tested manually, and the new SQL query file can be found in the migrations/main directory. This migration dashboard provides users with an easier way to monitor the progress and status of their data migration tasks.
Simulate loading of local files or notebooks after manipulation of sys.path (#1633). This commit updates the PathLookup process during the construction of the dependency graph, addressing issues #1202 and #1468. It removes the DependencyGraphBuilder and instead uses the DependencyResolver directly, with resolvers and lookup passed as arguments. The changes include new methods for handling compatibility checks, but no new user-facing features or changes to command-line interfaces or existing workflows are introduced. Unit tests are included to ensure correct behavior. The modifications aim to improve the internal handling of dependency resolution and compatibility checks.
Test if create-catalogs-schemas works with tables defined as mount paths (#1578). This release includes a new unit test for the create-catalogs-schemas logic that verifies the correct creation and management of catalogs and schemas defined as mount paths. The test checks the storage location of catalogs, ensures non-existing schemas are properly created, and prevents the creation of catalogs without a storage location. It also verifies the catalog schema ACL is set correctly. Using the CatalogSchema class and various test functions, the test creates and grants permissions to catalogs and schemas. This change resolves issue #1039 without modifying any existing commands or workflows. The release contains no new CLI commands or user documentation, but includes unit tests and assertion calls to validate the behavior of the create_all_catalogs_schemas method.
Upgraded databricks-sdk to 0.27 (#1626). In this release, the databricks-sdk package has been upgraded to version 0.27, bringing updated methods for Redash objects. The _install_query method in the dashboards.py file has been updated to include a tags parameter, set to None, when calling self._ws.queries.update and self._ws.queries.create. This ensures that the updated SDK version is used and that tags are not applied during query updates and creation. Additionally, the databricks-labs-lsql and databricks-labs-blueprint packages have been updated to versions 0.4.0 and 0.4.3 respectively, and the dependency for PyYAML has been updated to a version between 6.0.0 and 7.0.0. These updates may impact the functionality of the project. The changes have been manually tested, but there is no verification on a staging environment.
Use stack of dependency resolvers (#1560). This pull request introduces a stack-based implementation of resolvers, resolving issues #1202, #1499, and #1421, and implements an initial version of SysPathProvider, while eliminating previous hacks. The new functionality includes modified existing commands, a new workflow, and the addition of unit tests. No new documentation or CLI commands have been added. The problem_collector parameter is not addressed in this PR and has been moved to a separate issue. The changes include renaming and moving a Python file, as well as modifications to the Notebook class and its related methods for handling notebook dependencies and dependency checking. The code has been tested, but manual testing and integration tests are still pending.
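A stack of resolvers can be pictured as an ordered chain in which each resolver either handles a dependency or passes it on. The following is a minimal sketch of that pattern only; `ResolverStack`, the three example resolvers, and the string results are all hypothetical and are not taken from the UCX code.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# A resolver returns a description of what it found, or None to pass the name on.
Resolver = Callable[[str], Optional[str]]


@dataclass
class ResolverStack:
    """Hypothetical chain of resolvers, consulted in order."""

    resolvers: list[Resolver]

    def resolve(self, name: str) -> Optional[str]:
        for resolver in self.resolvers:
            result = resolver(name)
            if result is not None:
                return result
        return None  # nothing in the stack recognised the dependency


def notebook_resolver(name: str) -> Optional[str]:
    return f"notebook:{name}" if name.startswith("/Workspace/") else None


def local_file_resolver(name: str) -> Optional[str]:
    return f"file:{name}" if name.endswith(".py") else None


def site_package_resolver(name: str) -> Optional[str]:
    known_packages = {"pandas", "numpy"}  # illustrative allow-list
    return f"package:{name}" if name in known_packages else None


stack = ResolverStack([notebook_resolver, local_file_resolver, site_package_resolver])
results = [stack.resolve(n) for n in ("/Workspace/etl", "utils.py", "pandas", "unknown")]
```

Because the order of the list decides which resolver wins, new dependency kinds can be supported by adding a resolver to the stack without touching the existing ones.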

* Added DBSQL queries & dashboard migration ([#1532](#1532)). The Databricks Labs Unified Command Extensions (UCX) project has been updated with two new experimental commands: `migrate-dbsql-dashboards` and `revert-dbsql-dashboards`. These commands are designed for migrating and reverting the migration of Databricks SQL dashboards in the workspace. The `migrate-dbsql-dashboards` command transforms all Databricks SQL dashboards in the workspace after table migration, tagging migrated dashboards and queries with `migrated by UCX` and backing up original queries. The `revert-dbsql-dashboards` command returns migrated Databricks SQL dashboards to their original state before migration. Both commands accept a `--dashboard-id` flag for migrating or reverting a specific dashboard. Additionally, two new functions, `migrate_dbsql_dashboards` and `revert_dbsql_dashboards`, have been added to the `cli.py` file, and new classes have been added to interact with Redash for data visualization and querying. The `make_dashboard` fixture has been updated to enhance testing capabilities, and new unit tests have been added for migrating and reverting DBSQL dashboards.
* Added UDFs assessment ([#1610](#1610)). A User Defined Function (UDF) assessment feature has been introduced, addressing issue [#1610](#1610). A new method, DESCRIBE_FUNCTION, has been implemented to retrieve detailed information about UDFs, including function description, input parameters, and return types. This method has been integrated into existing test cases, enhancing the validation of UDF metadata and associated privileges, and ensuring system reliability. The UDF constructor has been updated with a new parameter `comment`, initially left blank in the test function. Additionally, two new columns, `success` and `failures`, have been added to the udf table in the inventory database to store assessment data for UDFs.
The UdfsCrawler class has been updated to return a list of UDF objects, and the assertions in the test have been updated accordingly. Furthermore, a new SQL file has been added to calculate the total count of UDFs in the $inventory.udfs table, with a widget displaying this information as a counter visualization named "Total UDF Count".
* Added `databricks labs ucx create-missing-principals` command to create the missing UC roles in AWS ([#1495](#1495)). The `databricks labs ucx` tool now includes a new command, `create-missing-principals`, which creates missing Unity Catalog (UC) roles in AWS for S3 locations that lack a UC compatible role. This command is implemented using `IamRoleCreation` from `databricks.labs.ucx.aws.credentials` and updates `AWSRoleAction` with the corresponding `role_arn` while adding `AWSUCRoleCandidate`. The new command only supports AWS and does not affect Azure. The existing `migrate_credentials` function has been updated to handle Azure Service Principals migration. Additionally, new classes and methods have been added, including `AWSUCRoleCandidate` in `aws.py`, and `create_missing_principals` and `list_uc_roles` methods in `access.py`. The `create_uc_roles_cli` method in `access.py` has been refactored and renamed to `list_uc_roles`. New unit tests have been implemented to test the functionality of `create_missing_principals` for AWS and Azure, as well as testing the behavior when the command is not approved.
* Added baseline for workflow linter ([#1613](#1613)). This change introduces the `WorkflowLinter` class in the `application.py` file of the `databricks.labs.ucx.source_code.jobs` package. The class is used to lint workflows by checking their dependencies and ensuring they meet certain criteria, taking in arguments such as `workspace_client`, `dependency_resolver`, `path_lookup`, and `migration_index`.
Several properties have been moved from `dependency_resolver` to the `CliContext` class, and the `NotebookLoader` class has been moved to a new location. Additionally, several classes and methods have been introduced to build a dependency graph, resolve dependencies, and manage allowed dependencies, site packages, and supported programming languages. The `generic` and `redash` modules from `databricks.labs.ucx.workspace_access` and the `GroupManager` class from `databricks.labs.ucx.workspace_access.groups` are used. The `VerifyHasMetastore`, `UdfsCrawler`, and `TablesMigrator` classes from `databricks.labs.ucx.hive_metastore` and the `DeployedWorkflows` class from `databricks.labs.ucx.installer.workflows` are also used. This commit is part of a larger effort to improve workflow linting and addresses several related issues and pull requests.
* Added linter to check for RDD use and JVM access ([#1606](#1606)). A new `AstHelper` class has been added to provide utility functions for working with abstract syntax trees (ASTs) in Python code, including methods for extracting attribute and function call node names. Additionally, a linter has been integrated to check for RDD use and JVM access, utilizing the `AstHelper` class, which has been moved to a separate module. A new file, `spark_connect.py`, introduces a linter with three matchers to ensure conformance to best practices and catch potential issues early in the development process related to RDD usage and JVM access. The linter is environment-aware, accommodating shared cluster and serverless configurations, and includes new test methods to validate its functionality. These improvements enhance codebase quality, promote reusability, and ensure performance and stability in Spark cluster environments.
* Added non-Delta DBFS table migration (What.DBFS_ROOT_NON_DELTA) in migrate_table workflow ([#1621](#1621)).
The `migrate_tables` workflow in `workflows.py` has been enhanced to support a new scenario, DBFS_ROOT_NON_DELTA, which migrates non-Delta tables stored in DBFS root from the Hive Metastore to Unity Catalog using CTAS. Additionally, the ACL migration strategy has been updated to include the AclMigrationWhat.PRINCIPAL strategy. The `migrate_external_tables_sync`, `migrate_dbfs_root_delta_tables`, and `migrate_views` tasks now incorporate the new ACL migration strategy. These changes have been thoroughly tested through unit tests and integration tests, ensuring the continued functionality of the existing workflow while expanding its capabilities.
* Added "seen tables" feature ([#1465](#1465)). The `seen tables` feature has been introduced, allowing for better handling of existing tables in the hive metastore and supporting their migration to UC. This enhancement includes the addition of a `snapshot` method that fetches and crawls table inventory, appending or overwriting records based on assessment results. The `_crawl` function has been updated to check for and skip existing tables in the current workspace. New methods such as `_get_tables_paths_from_assessment`, `_overwrite_records`, and `_get_table_location` have been included to facilitate these improvements. On the testing side, a new test `test_mount_listing_seen_tables` has been implemented, replacing `test_partitioned_csv_jsons`. This test checks the behavior of the TablesInMounts class when enumerating tables in mounts for a specific context, accounting for different table formats and managing external and managed tables. The diff modifies the `locations.py` file in the databricks/labs/ucx directory, related to the hive metastore.
* Added support for `migrate-tables-ctas` workflow in the `databricks labs ucx migrate-tables` CLI command ([#1660](#1660)).
This commit adds support for the `migrate-tables-ctas` workflow in the `databricks labs ucx migrate-tables` command, which checks for external tables that cannot be synced and prompts the user to run the `migrate-tables-ctas` workflow. Two new methods, `test_migrate_external_tables_ctas(ws)` and `migrate_tables(ws, prompts, ctx=ctx)`, have been added. The first method checks if the `migrate-external-tables-ctas` workflow is called correctly, while the second method runs the workflow after prompting the user. The method `test_migrate_external_hiveserde_tables_in_place(ws)` has been modified to test if the `migrate-external-hiveserde-tables-in-place-experimental` workflow is called correctly. No new methods or significant modifications to existing functionality have been made in this commit. The changes include updated unit tests and user documentation. The target audience for this feature is software engineers who adopt the project.
* Added support for migrating external location permissions from interactive cluster mounts ([#1487](#1487)). This commit adds support for migrating external location permissions from interactive cluster mounts in Databricks Labs' UCX project, enhancing security and access control. It retrieves interactive cluster locations and user mappings from the AzureACL class, granting necessary permissions to each cluster principal for each location. The existing `databricks labs ucx` command is modified, with the addition of the new method `create_external_locations` and thorough testing through manual, unit, and integration tests. This feature is developed by vuong-nguyen and Vuong and addresses issues [#1192](#1192) and [#1193](#1193), ensuring a more robust and controlled user experience with interactive clusters.
* Added uber principal spn details in SQL warehouse data access configuration when creating uber-SPN ([#1631](#1631)).
In this release, we've implemented new features to enhance the security and control over data access during the migration process for the SQL warehouse data access configuration. The `databricks labs ucx create-uber-principal` command now creates a service principal with read-only access to all the storage used by tables in the workspace. The UCX Cluster Policy and SQL Warehouse data access configuration will be updated to use this service principal for migration workflows. A new method, `_update_sql_dac_with_instance_profile`, has been introduced in the `access.py` file to update the SQL data access configuration with the provided AWS instance profile, ensuring a more streamlined management of instance profiles within the SQL data access configuration during the creation of an uber service principal (SPN). Additionally, new methods and tests have been added to the sql module of the databricks.sdk.service package to improve Azure resource permissions, handling different scenarios related to creating a global SPN in the presence or absence of various conditions, such as storage, cluster policies, or secrets.
* Addressed issue with disabled features in certain regions ([#1618](#1618)). In this release, we have implemented improvements to address an issue where certain features were disabled in specific regions. We have added error handling when listing serving endpoints to raise a NotFound error if a feature is disabled, preventing the code from failing silently and providing better error messages. A new method, test_serving_endpoints_not_enabled, has been added, which creates a mock WorkspaceClient and raises a NotFound error if serving endpoints are not enabled for a shard. The GenericPermissionsSupport class uses this method to get crawler tasks, and if serving endpoints are not enabled, an error message is logged. These changes increase the reliability and robustness of the codebase by providing better error handling and messaging for this particular issue.
Additionally, the change includes unit tests and manual testing to ensure the proper functioning of the new features.
* Aggregate UCX output across workspaces with CLI command ([#1596](#1596)). A new `report-account-compatibility` command has been added to the `databricks labs ucx` tool, enabling users to evaluate the compatibility of an entire Azure Databricks account with UCX. This command generates a readiness report for an Azure Databricks account by querying various aspects of the account such as clusters, configurations, and data formats. It uses Azure CLI authentication with AAD tokens and accepts a profile as an argument. The output includes warnings for workspaces that do not have UCX installed, and provides information about unsupported cluster types, unsupported configurations, data format compatibility, and more. Additionally, a new feature has been added to aggregate UCX output across workspaces in an account through a new CLI command, `report-account-compatibility`, which can be run at the account level. The existing `manual-workspace-info` command remains unchanged. These changes will help assess the readiness and compatibility of an Azure Databricks account for UCX integration and simplify the process of checking compatibility across an entire account.
* Assert if group name is in cluster policy ([#1665](#1665)). In this release, we have implemented a change to ensure the presence of the display name of a specific workspace group (ws_group_a) in the cluster policy. This is to prevent a key error previously encountered. The cluster policy is now loaded as a dictionary, and the group name is checked to confirm its presence. If the group is not found, a message is raised alerting users. Additionally, the permission level for the group is verified to ensure it is set to CAN_USE. No new methods have been added, and existing functionality remains unchanged.
The test file test_ext_hms.py has been updated to include the new assertion and has undergone both unit tests and manual testing to ensure proper implementation. This change is intended for software engineers who adopt the project.
* Automatically retrying with `auth_type=azure-cli` when constructing `workspace_clients` on Azure ([#1650](#1650)). This commit introduces automatic retrying with `auth_type=azure-cli` when constructing `workspace_clients` on Azure, resolving TODO items for `AccountWorkspaces` and adding relevant suggestions in `troubleshooting.md`. It closes issues [#1574](#1574) and [#1430](#1430), and includes new methods for generating readiness reports in `AccountAggregate` and testing the `get_accessible_workspaces` method in `test_workspaces.py`. User documentation has been updated and the changes have been manually verified in a staging environment. For macOS and Windows users, explicit auth type settings are required for command line utilities.
* Changes to identify service principal with custom roles on Azure storage account for principal-prefix-access ([#1576](#1576)). This release introduces several enhancements to the identification of service principals with custom roles on Azure storage accounts for principal-prefix-access. New methods such as `_get_permission_level`, `_get_custom_role_privilege`, and `_get_role_privilege` have been added to improve the functionality of the module. Additionally, two new classes, AzureRoleAssignment and AzureRoleDetails, have been added to enable more detailed management and access control for custom roles on Azure storage accounts. The `test_access.py` file has been updated to include tests for saving custom roles in Azure storage accounts and ensuring the correct identification of service principals with custom roles. A new unit test function, test_role_assignments_custom_storage(), has also been added to verify the behavior of custom roles in Azure storage accounts.
Overall, these changes provide a more efficient and fine-grained way to manage and control custom roles on Azure storage accounts.
* Clarified unsupported config in compute crawler ([#1656](#1656)). In this release, we have made significant changes to clarify and improve the handling of unsupported configurations in our compute crawler related to the Hive metastore. We have expanded error messages for unsupported configurations and provided detailed recommendations for remediation. Additionally, we have added relevant user documentation and manually tested the changes. The changes include updates to the configuration for external Hive metastore and passthrough security model for Unity Catalog, which are incompatible with the current configurations. We recommend removing or altering the configs while migrating existing tables and views using UCX or other compatible clusters, and mapping the passthrough security model to a security model compatible with Unity Catalog. The code modifications include the addition of new methods for checking cluster init script and Spark configurations, as well as refining the error messages for unsupported configurations. We also added a new assertion in the `test_cluster_with_multiple_failures` unit test to check for the presence of a specific message regarding the use of the `spark.databricks.passthrough.enabled` configuration. This release is not yet verified on the staging environment.
* Created a unique default schema when External Hive Metastore is detected ([#1579](#1579)). A new default database `ucx` is introduced for storing inventory in the hive metastore, with a suffix consisting of the workspace's client ID to ensure uniqueness when an external hive metastore is detected. The `has_ext_hms()` method is added to the `InstallationPolicy` class to detect external HMS and thereby create a unique default schema.
The `_prompt_for_new_installation` method's default value for the `Inventory Database stored in hive_metastore` prompt is updated to use the new default database name, modified to include the workspace's client ID if external HMS is detected. Additionally, a test function `test_save_config_ext_hms` is implemented to demonstrate the `WorkspaceInstaller` class's behavior with external HMS, creating a unique default schema for improved system functionality and customization. This change is part of issue [#1579](#1579).
* Extend service principal migration to create storage credentials for access connectors created for each storage account ([#1426](#1426)). This commit extends the service principal migration to create storage credentials for access connectors associated with each storage account, resolving issues [#1384](#1384) and [#875](#875). The update includes modifications to the existing `databricks labs ucx` command for creating access connectors, adds a new CLI command for creating storage credentials, and updates the documentation. A new workflow has been added for creating credentials for access connectors and service principals, and updates have been made to existing workflows. The commit includes manual, unit, and integration tests, and no new or modified methods are specified in the diff. The focus is on the feature description and its impact on the project's functionality. The commit has been co-authored by Serge Smertin and vuong-nguyen.
* Suggest users to create Access Connector(s) with Managed Identity to access Azure Storage Accounts behind firewall ([#1589](#1589)). In this release, we have introduced a new feature to improve access to Azure Storage Accounts that are protected by firewalls. Due to limitations with service principals in such scenarios, we have developed Access Connectors with Managed Identities for more reliable connectivity.
This change includes updates to the `credentials.py` file, which introduces new methods for managing the migration of service principals to Access Connectors using Managed Identities. Users are warned that migrating to this new feature may cause issues when transitioning to UC, and are advised to validate external locations after running the migration command. This update enhances the security and functionality of the system, providing a more dependable method for accessing Azure Storage Accounts protected by firewalls.
* Fixed catalog/schema grants when tables with same source schema have different target schemas ([#1581](#1581)). In this release, we have implemented a fix to address an issue where catalog/schema grants were not being handled correctly when tables with the same source schema had different target schemas. This was causing problems with granting appropriate permissions to users. We have modified the prepare_test function to include an additional test case with a different target schema for the same source table. Furthermore, we have updated the test_catalog_schema_acl function to ensure that grants are being created correctly for all catalogs, schemas, and tables. We have also added an extra query to grant use schema permissions for catalog2.schema3 to user1. Additionally, we have introduced a new `SchemaInfo` class to store information about catalogs and schemas, and refactored the `_get_database_source_target_mapping` method to return a dictionary that maps source databases to a list of `SchemaInfo` objects instead of a single dictionary. These changes ensure that grants are being handled correctly for catalogs, schemas, and tables, even when tables with the same source schema have different target schemas. This will improve the overall functionality and reliability of the system, making it easier for users to manage their catalogs and schemas.
* Fixed Spark configuration parameter referencing secret ([#1635](#1635)).
In this release, the code related to the Spark configuration parameter reference for a secret has been updated in the `access.py` file, specifically within the `_update_cluster_policy_definition` method. The change modifies the method to retrieve the OAuth client secret for a given storage account using an f-string to reference the secret, replacing the previous concatenation operator. This enhancement is aimed at improving the readability and maintainability of the code while preserving its functionality. Furthermore, the commit includes additional changes, such as new methods `test_create_global_spn` and `cluster_policies.edit`, which may be related to this fix. These changes address the secret reference issue, ensuring secure access control and improved integration, particularly with the Spark configuration, benefiting engineers utilizing this project for handling sensitive information and managing clusters securely and effectively.
* Fixed `migration-locations` and `assign-metastore` definitions in `labs.yml` ([#1627](#1627)). In this release, the `migration-locations` command in the `labs.yml` file has been updated to include new flags `subscription-id` and `aws-profile`. The `subscription-id` flag allows users to specify the subscription to scan the storage account in, and the `aws-profile` flag allows for authentication using a specified AWS Profile. The `assign-metastore` command has also been updated with a new description: "Enable Unity Catalog features on a workspace by assigning a metastore to it." The `is_account_level` parameter remains unchanged, and the new optional flag `workspace-id` has been added, allowing users to specify the Workspace ID to assign a metastore to. This change enhances the functionality of the `migration-locations` and `assign-metastore` commands, providing more options for users to customize their storage scanning and metastore assignment processes.
The `migration-locations` and `assign-metastore` definitions in the `labs.yml` file have been fixed in this release.
* Fixed prompt for using external metastore ([#1668](#1668)). A fix has been implemented in the `create` function of the `policy.py` file to correctly prompt users for using an external metastore. Previously, a missing period and space in the prompt caused potential confusion. The updated prompt now includes a clarifying sentence and the `_prompts.confirm` method has been modified to check if the user wants to set UCX to connect to an external metastore in two scenarios: when one or more cluster policies are set up for an external metastore, and when the workspace warehouse is configured for an external metastore. If the user chooses to set up an external metastore, an informational message will be recorded in the logger. This change ensures clear and precise communication with users during the external metastore setup process.
* Fixed storage account network ACLs retrieved from properties ([#1620](#1620)). This release includes a fix to the storage account network ACLs retrieval in the open-source library, addressing issue [#1](#1). Previously, the network ACLs were being retrieved from an incorrect location, but this commit corrects that by obtaining the network ACLs from the storage account's `properties.networkAcls` field. The `StorageAccount` class has been updated to modify the way default network action is retrieved, with a new value `Unknown` added to the previous values `Deny` and `Allow`. The `from_raw_resource` class method has also been updated to retrieve the default network action from the `properties.networkAcls` field instead of the `networkAcls` field. This change may affect any functionality that relies on network ACL information and impacts the existing command `databricks labs ucx ...`.
Relevant tests, including a new test `test_azure_resource_storage_accounts_list_non_zero`, have been added, and the fix has been verified through manual and unit testing.
* Fully refresh table migration status in table migration workflow ([#1630](#1630)). This release introduces a new method, `index_full_refresh()`, to the table migration workflow for fully refreshing the migration status, addressing an oversight from a previous commit ([#1623](#1623)) and resolving issue [#1628](#1628). The new method resets the `_migration_status_refresher` before computing the index, ensuring the latest migration status is used for determining whether view dependencies have been migrated. The `index()` method was previously used to refresh the migration status, but it only provided a partial refresh. With this update, `index_full_refresh()` is utilized for a comprehensive refresh, affecting the `refresh_migration_status` task in multiple workflows such as `migrate_views`, `scan_tables_in_mounts_experimental`, and others. This change ensures a more accurate migration report, presenting the updated migration status.
* Ignore existing corrupted installations when refreshing ([#1605](#1605)). A recent update has enhanced the error handling during the loading of installations in the `install.py` file. Specifically, the `installation.load` function now handles certain errors, including `PermissionDenied`, `SerdeError`, `ValueError`, and `AttributeError`, by logging a warning message and skipping the corrupted installation instead of raising an error. This behavior has been incorporated into both the `configure` and `_check_inventory_database_exists` functions, allowing the installation process to continue even in the presence of issues with existing installations, while providing improved error messages.
This change resolves issue [#1601](#1601) and introduces a new test case for a corrupted installation configuration, as well as an updated existing test case for `test_save_config` that includes a mock installation.
* Improved exception handling ([#1584](#1584)). In this release, the exception handling during the upload of a wheel file to DBFS has been significantly improved. Previously, only PermissionDenied errors were caught and handled. Now, both BadRequest and PermissionDenied exceptions will be caught and logged as a warning. This change enhances the robustness of the code by handling a wider range of exceptions during the upload process. In addition, cluster overrides have been configured and DBFS write permissions have been set up. The specific changes made to the code include updating the import statement for NotFound to include BadRequest and modifying the except block in the _get_init_script_data method to catch both NotFound and BadRequest exceptions. These improvements ensure that the code can handle more types of errors, providing more helpful error messages and preventing crash scenarios, thereby enhancing the reliability and robustness of the code.
* Improved exception handling for `migrate_acl` ([#1590](#1590)). In this release, the `migrate_acl` functionality has been enhanced to improve exception handling, addressing a flakiness issue in the `test_migrate_managed_tables_with_acl` test. Previously, unhandled `not found` exceptions during parallel test execution caused the flakiness. This release resolves this issue ([#1549](#1549)) by introducing error handling in the `test_migrate_acls_should_produce_proper_queries` test. A controlled error is now introduced to simulate a failed grant migration due to a `TABLE_OR_VIEW_NOT_FOUND` error. This enhancement allows for precise testing of error handling and logging mechanisms when migration fails for specific objects, ensuring a more reliable testing environment for the `migrate_acl` functionality.
* Improved reliability of table migration status refresher ([#1623](#1623)). This release improves the reliability of the table migration status refresher. The `table_migrate` function has been updated to ensure that the table migration status is always reset when requesting the latest snapshot, addressing issues [#1623](#1623), [#1622](#1622), and [#1615](#1615). Additionally, the function now handles `NotFound` errors when refreshing migration status. The `get_seen_tables` function has been modified to convert the returned iterator to a list and raise a `NotFound` exception if the schema does not exist, which is then caught and logged as a warning. Furthermore, the migration status reset behavior has been improved, and the `migration_status_refresher` parameter type in the `TableMigrate` class constructor has been modified. A new private method `_index_with_reset()` has been added, and the `_migrate_views()` and `_view_can_be_migrated()` methods have been updated, to ensure a more accurate and consistent table migration process. The changes have been thoroughly tested and are ready for review.
* Refresh migration status at the end of the `migrate_tables` workflows ([#1599](#1599)). In this release, the migration status is refreshed at the end of the `migrate_tables` workflows, with no new or modified tables introduced. A call to `_migration_status_refresher.reset()` has been added in two locations to ensure accurate migration status updates. A new `refresh_migration_status` method has been included in the `RuntimeContext` class in the `databricks.labs.ucx.hive_metastore.workflows` module, which refreshes the migration status for presentation in the dashboard.
The changes also include the addition of the `refresh_migration_status` task in `migrate_views`, `migrate_views_with_acl`, and `scan_tables_in_mounts_experimental` workflows, and the `migration_report` method is now dependent on the `refresh_migration_status` task. Thorough testing has been conducted, including the creation of a new integration test in the file `tests/integration/hive_metastore/test_workflows.py` to verify that the migration status is refreshed after the migration job is run. These changes aim to ensure that the migration status is up-to-date and accurately presented in the dashboard.
* Removed DBFS library installations ([#1554](#1554)). In this release, the `configure.py` file has been removed, which previously contained the `ConfigureClusterOverrides` class with methods for validating cluster IDs, distinguishing between classic and Table Access Control (TACL) clusters, and building a prompt for users to select a valid active cluster ID. The removal of this file signifies that these functionalities are no longer available. This change is part of a larger commit that also removes DBFS library installations and updates the Estimates Dashboard to remove metastore assignment, addressing issue [#1098](#1098). The commit has been tested via integration tests and manual installation and running of UCX on a no-uc environment. Please note that the `create_jobs` method in the `install.py` file has been updated to reflect these changes, ensuring a more straightforward installation experience and usage of the Estimates Dashboard.
* Removed the `Is Terraform used` prompt ([#1664](#1664)). In this release, we have removed the `is_terraform_used` prompt from the configuration file and the installation process in the ucx package. This prompt was not being utilized and had been a source of confusion for some users.
Although the variable that stored its outcome will be retained for backwards compatibility, no new methods or modifications to existing functionality have been introduced. No tests have been added or modified as part of this change. The removal of this prompt simplifies the configuration process and aligns with the project's future plans to eliminate the use of Terraform state for ucx migration. Manual testing has been conducted to ensure that the removal of the prompt does not affect the functionality of other properties in the configuration file or the installation process.
* Resolve relative paths when building dependency graph ([#1608](#1608)). This commit introduces support for resolving relative paths when building a dependency graph in the UCX project, addressing issues [#1202](#1202), [#1499](#1499), and [#1287](#1287). The `SysPathProvider` now includes a `cwd` attribute, and a new class, `LocalNotebookLoader`, has been implemented to handle local files and folders. The `PathLookup` class is used to resolve paths, and new methods have been added to support these changes. Unit tests cover the new functionality. This commit replaces issue [#1593](#1593) and improves the project's handling of local files and folders, resulting in a more reliable dependency graph.
* Show tables migration status in migration dashboard ([#1507](#1507)). A migration dashboard has been added to display the status of data object migrations, addressing issue [#323](#323). This new feature includes a query to show the migration status of tables, a new CLI command, and a modification to an existing command. The `migration-*` workflow has been updated to include a refresh migration dashboard option. The `mock_installation` function has been modified with an updated `state.json` file. The changes were tested manually, and the new SQL query file can be found in the `migrations/main` directory.
This migration dashboard provides users with an easier way to monitor the progress and status of their data migration tasks.
* Simulate loading of local files or notebooks after manipulation of `sys.path` ([#1633](#1633)). This commit updates the `PathLookup` process during the construction of the dependency graph, addressing issues [#1202](#1202) and [#1468](#1468). It removes the `DependencyGraphBuilder` and instead uses the `DependencyResolver` directly, with resolvers and lookup passed as arguments. The changes include new methods for handling compatibility checks, but no new user-facing features or changes to command-line interfaces or existing workflows are introduced. Unit tests are included to ensure correct behavior. The modifications aim to improve the internal handling of dependency resolution and compatibility checks.
* Test if `create-catalogs-schemas` works with tables defined as mount paths ([#1578](#1578)). This release includes a new unit test for the `create-catalogs-schemas` logic that verifies the correct creation and management of catalogs and schemas defined as mount paths. The test checks the storage location of catalogs, ensures non-existing schemas are properly created, and prevents the creation of catalogs without a storage location. It also verifies the catalog schema ACL is set correctly. Using the `CatalogSchema` class and various test functions, the test creates and grants permissions to catalogs and schemas. This change resolves issue [#1039](#1039) without modifying any existing commands or workflows. The release contains no new CLI commands or user documentation, but includes unit tests and assertion calls to validate the behavior of the `create_all_catalogs_schemas` method.
* Upgraded `databricks-sdk` to 0.27 ([#1626](#1626)). In this release, the `databricks-sdk` package has been upgraded to version 0.27, bringing updated methods for Redash objects.
The `_install_query` method in the `dashboards.py` file has been updated to include a `tags` parameter, set to `None`, when calling `self._ws.queries.update` and `self._ws.queries.create`. This ensures that the updated SDK version is used and that tags are not applied during query updates and creation. Additionally, the `databricks-labs-lsql` and `databricks-labs-blueprint` packages have been updated to versions 0.4.0 and 0.4.3 respectively, and the dependency for PyYAML has been updated to a version between 6.0.0 and 7.0.0. These updates may impact the functionality of the project. The changes have been manually tested but have not been verified on a staging environment.
* Use stack of dependency resolvers ([#1560](#1560)). This pull request introduces a stack-based implementation of resolvers, resolving issues [#1202](#1202), [#1499](#1499), and [#1421](#1421), and implements an initial version of `SysPathProvider`, while eliminating previous hacks. The new functionality includes modified existing commands, a new workflow, and the addition of unit tests. No new documentation or CLI commands have been added. The `problem_collector` parameter is not addressed in this PR and has been moved to a separate issue. The changes include renaming and moving a Python file, as well as modifications to the `Notebook` class and its related methods for handling notebook dependencies and dependency checking. Unit tests have been added, but manual testing and integration tests are still pending.
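
Several entries above (#1630, #1623, #1599) come down to the same idea: reset a cached migration status before computing the index, so that the index reflects tables migrated since the last read. The following is a minimal sketch of that pattern, not the UCX implementation. The names `index()`, `index_full_refresh()`, `reset()`, and `_migration_status_refresher` are taken from the notes; the classes `CachedStatusRefresher` and `MigrationIndexer`, the `fetch` callable, and the sample table name are hypothetical and exist only for illustration.

```python
from __future__ import annotations

from typing import Callable, Iterable


class CachedStatusRefresher:
    """Hypothetical stand-in for a status refresher that caches its snapshot."""

    def __init__(self, fetch: Callable[[], Iterable[tuple[str, bool]]]) -> None:
        self._fetch = fetch  # returns (table name, is_migrated) pairs
        self._snapshot: dict[str, bool] | None = None

    def reset(self) -> None:
        # Drop the cached snapshot so the next read fetches fresh data.
        self._snapshot = None

    def snapshot(self) -> dict[str, bool]:
        if self._snapshot is None:
            self._snapshot = dict(self._fetch())
        return self._snapshot


class MigrationIndexer:
    """Hypothetical owner of the refresher, loosely mirroring the notes above."""

    def __init__(self, refresher: CachedStatusRefresher) -> None:
        self._migration_status_refresher = refresher

    def index(self) -> set[str]:
        # May reuse a cached, possibly stale, snapshot.
        snapshot = self._migration_status_refresher.snapshot()
        return {name for name, migrated in snapshot.items() if migrated}

    def index_full_refresh(self) -> set[str]:
        # Reset first, then compute the index from a fresh snapshot.
        self._migration_status_refresher.reset()
        return self.index()


# Usage: the cached index misses a table migrated after the first read.
state = {"db.orders": False}
indexer = MigrationIndexer(CachedStatusRefresher(lambda: state.items()))
stale_before = indexer.index()
state["db.orders"] = True
stale_after = indexer.index()
fresh = indexer.index_full_refresh()
```

In this sketch, `index()` keeps returning the result of the first snapshot until `reset()` is called, which is the kind of staleness a full refresh is meant to avoid when deciding whether view dependencies have been migrated.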

github-actions · 2024-05-08T16:08:52Z

❌ 164/165 passed, 1 failed, 25 skipped, 2h41m4s total

❌ test_compare_remote_local_install_versions: Failed: DID NOT RAISE (30.487s)

Failed: DID NOT RAISE <class 'RuntimeWarning'>
[gw8] linux -- Python 3.10.14 /home/runner/work/ucx/ucx/.venv/bin/python
15:59 INFO [databricks.labs.ucx.mixins.fixtures] Schema hive_metastore.ucx_savzr: https://DATABRICKS_HOST/explore/data/hive_metastore/ucx_savzr
15:59 DEBUG [databricks.labs.ucx.mixins.fixtures] added schema fixture: SchemaInfo(browse_only=None, catalog_name='hive_metastore', catalog_type=None, comment=None, created_at=None, created_by=None, effective_predictive_optimization_flag=None, enable_predictive_optimization=None, full_name='hive_metastore.ucx_savzr', metastore_id=None, name='ucx_savzr', owner=None, properties=None, schema_id=None, storage_location=None, storage_root=None, updated_at=None, updated_by=None)
15:59 DEBUG [databricks.labs.ucx.install] Cannot find previous installation: Path (/Users/0a330eb5-dd51-4d97-b6e4-c474356b1d5d/.VVOV/config.yml) doesn't exist.
15:59 INFO [databricks.labs.ucx.install] Please answer a couple of questions to configure Unity Catalog migration
15:59 INFO [databricks.labs.ucx.installer.hms_lineage] HMS Lineage feature creates one system table named system.hms_to_uc_migration.table_access and helps in your migration process from HMS to UC by allowing you to programmatically query HMS lineage data.
15:59 INFO [databricks.labs.ucx.install] Fetching installations...
15:59 INFO [databricks.labs.ucx.installer.policy] Creating UCX cluster policy.
15:59 DEBUG [tests.integration.conftest] Waiting for clusters to start...
15:59 DEBUG [tests.integration.conftest] Waiting for clusters to start...
15:59 INFO [databricks.labs.ucx.install] Installing UCX v0.22.1+6720240508155926
15:59 INFO [databricks.labs.ucx.install] Creating ucx schemas...
15:59 INFO [databricks.labs.ucx.installer.workflows] Creating new job configuration for step=migrate-tables
15:59 INFO [databricks.labs.ucx.installer.workflows] Creating new job configuration for step=migrate-tables-in-mounts-experimental
15:59 INFO [databricks.labs.ucx.installer.workflows] Creating new job configuration for step=validate-groups-permissions
15:59 INFO [databricks.labs.ucx.installer.workflows] Creating new job configuration for step=migrate-external-hiveserde-tables-in-place-experimental
15:59 INFO [databricks.labs.ucx.installer.workflows] Creating new job configuration for step=assessment
15:59 INFO [databricks.labs.ucx.installer.workflows] Creating new job configuration for step=migrate-groups
15:59 INFO [databricks.labs.ucx.installer.workflows] Creating new job configuration for step=experimental-workflow-linter
15:59 INFO [databricks.labs.ucx.installer.workflows] Creating new job configuration for step=failing
15:59 INFO [databricks.labs.ucx.installer.workflows] Creating new job configuration for step=remove-workspace-local-backup-groups
15:59 INFO [databricks.labs.ucx.installer.workflows] Creating new job configuration for step=migrate-external-tables-ctas
15:59 INFO [databricks.labs.ucx.installer.workflows] Creating new job configuration for step=migrate-groups-experimental
15:59 INFO [databricks.labs.ucx.install] Installation completed successfully! Please refer to the https://DATABRICKS_HOST/#workspace/Users/0a330eb5-dd51-4d97-b6e4-c474356b1d5d/.VVOV/README for the next steps.
15:59 WARNING [databricks.labs.blueprint.upgrades] future version: v0.23.0_add_assessment_to_udf.py
15:59 INFO [databricks.labs.ucx.mixins.fixtures] Schema hive_metastore.ucx_savzr: https://DATABRICKS_HOST/explore/data/hive_metastore/ucx_savzr
15:59 DEBUG [databricks.labs.ucx.mixins.fixtures] added schema fixture: SchemaInfo(browse_only=None, catalog_name='hive_metastore', catalog_type=None, comment=None, created_at=None, created_by=None, effective_predictive_optimization_flag=None, enable_predictive_optimization=None, full_name='hive_metastore.ucx_savzr', metastore_id=None, name='ucx_savzr', owner=None, properties=None, schema_id=None, storage_location=None, storage_root=None, updated_at=None, updated_by=None)
15:59 DEBUG [databricks.labs.ucx.install] Cannot find previous installation: Path (/Users/0a330eb5-dd51-4d97-b6e4-c474356b1d5d/.VVOV/config.yml) doesn't exist.
15:59 INFO [databricks.labs.ucx.install] Please answer a couple of questions to configure Unity Catalog migration
15:59 INFO [databricks.labs.ucx.installer.hms_lineage] HMS Lineage feature creates one system table named system.hms_to_uc_migration.table_access and helps in your migration process from HMS to UC by allowing you to programmatically query HMS lineage data.
15:59 INFO [databricks.labs.ucx.install] Fetching installations...
15:59 INFO [databricks.labs.ucx.installer.policy] Creating UCX cluster policy.
15:59 DEBUG [tests.integration.conftest] Waiting for clusters to start...
15:59 DEBUG [tests.integration.conftest] Waiting for clusters to start...
15:59 INFO [databricks.labs.ucx.install] Installing UCX v0.22.1+6720240508155926
15:59 INFO [databricks.labs.ucx.install] Creating ucx schemas...
15:59 INFO [databricks.labs.ucx.installer.workflows] Creating new job configuration for step=migrate-tables
15:59 INFO [databricks.labs.ucx.installer.workflows] Creating new job configuration for step=migrate-tables-in-mounts-experimental
15:59 INFO [databricks.labs.ucx.installer.workflows] Creating new job configuration for step=validate-groups-permissions
15:59 INFO [databricks.labs.ucx.installer.workflows] Creating new job configuration for step=migrate-external-hiveserde-tables-in-place-experimental
15:59 INFO [databricks.labs.ucx.installer.workflows] Creating new job configuration for step=assessment
15:59 INFO [databricks.labs.ucx.installer.workflows] Creating new job configuration for step=migrate-groups
15:59 INFO [databricks.labs.ucx.installer.workflows] Creating new job configuration for step=experimental-workflow-linter
15:59 INFO [databricks.labs.ucx.installer.workflows] Creating new job configuration for step=failing
15:59 INFO [databricks.labs.ucx.installer.workflows] Creating new job configuration for step=remove-workspace-local-backup-groups
15:59 INFO [databricks.labs.ucx.installer.workflows] Creating new job configuration for step=migrate-external-tables-ctas
15:59 INFO [databricks.labs.ucx.installer.workflows] Creating new job configuration for step=migrate-groups-experimental
15:59 INFO [databricks.labs.ucx.install] Installation completed successfully! Please refer to the https://DATABRICKS_HOST/#workspace/Users/0a330eb5-dd51-4d97-b6e4-c474356b1d5d/.VVOV/README for the next steps.
15:59 WARNING [databricks.labs.blueprint.upgrades] future version: v0.23.0_add_assessment_to_udf.py
15:59 INFO [databricks.labs.ucx.install] Deleting UCX v0.22.1+6720240508155942 from https://DATABRICKS_HOST
15:59 INFO [databricks.labs.ucx.install] Deleting inventory database ucx_savzr
15:59 INFO [databricks.labs.ucx.install] Deleting jobs
15:59 INFO [databricks.labs.ucx.install] Deleting migrate-tables job_id=303060391432629.
15:59 INFO [databricks.labs.ucx.install] Deleting migrate-tables-in-mounts-experimental job_id=1039162315191526.
15:59 INFO [databricks.labs.ucx.install] Deleting validate-groups-permissions job_id=890100287477725.
15:59 INFO [databricks.labs.ucx.install] Deleting migrate-external-hiveserde-tables-in-place-experimental job_id=207355530252457.
15:59 INFO [databricks.labs.ucx.install] Deleting assessment job_id=490558161480956.
15:59 INFO [databricks.labs.ucx.install] Deleting migrate-groups job_id=1112807226987307.
15:59 INFO [databricks.labs.ucx.install] Deleting experimental-workflow-linter job_id=728176174890363.
15:59 INFO [databricks.labs.ucx.install] Deleting failing job_id=1017570380665041.
15:59 INFO [databricks.labs.ucx.install] Deleting remove-workspace-local-backup-groups job_id=905311277296794.
15:59 INFO [databricks.labs.ucx.install] Deleting migrate-external-tables-ctas job_id=1068019894306102.
15:59 INFO [databricks.labs.ucx.install] Deleting migrate-groups-experimental job_id=218022735274874.
15:59 INFO [databricks.labs.ucx.install] Deleting cluster policy
15:59 INFO [databricks.labs.ucx.install] Deleting secret scope
15:59 INFO [databricks.labs.ucx.install] UnInstalling UCX complete
15:59 DEBUG [databricks.labs.ucx.mixins.fixtures] clearing 0 workspace user fixtures
15:59 DEBUG [databricks.labs.ucx.mixins.fixtures] clearing 0 account group fixtures
15:59 DEBUG [databricks.labs.ucx.mixins.fixtures] clearing 0 workspace group fixtures
15:59 DEBUG [databricks.labs.ucx.mixins.fixtures] clearing 0 table fixtures
15:59 DEBUG [databricks.labs.ucx.mixins.fixtures] clearing 0 table fixtures
15:59 DEBUG [databricks.labs.ucx.mixins.fixtures] clearing 1 schema fixtures
15:59 DEBUG [databricks.labs.ucx.mixins.fixtures] removing schema fixture: SchemaInfo(browse_only=None, catalog_name='hive_metastore', catalog_type=None, comment=None, created_at=None, created_by=None, effective_predictive_optimization_flag=None, enable_predictive_optimization=None, full_name='hive_metastore.ucx_savzr', metastore_id=None, name='ucx_savzr', owner=None, properties=None, schema_id=None, storage_location=None, storage_root=None, updated_at=None, updated_by=None)
[gw8] linux -- Python 3.10.14 /home/runner/work/ucx/ucx/.venv/bin/python

Running from acceptance #3097

nfx requested review from a team and qziyuan May 8, 2024 15:44

nfx had a problem deploying to account-admin May 8, 2024 15:44 — with GitHub Actions Failure

nfx merged commit 2f58963 into main May 8, 2024
5 of 6 checks passed

nfx deleted the prepare/0.23.0 branch May 8, 2024 15:44
