Release v0.17.0 #1060

Merged: 1 commit, Mar 15, 2024

@nfx (Collaborator) commented Mar 15, 2024

  • Added AWS IAM role support to databricks labs ucx create-uber-principal command (#993). The databricks labs ucx create-uber-principal command now supports AWS Identity and Access Management (IAM) roles for external table migration. This new feature introduces a CLI command to create an uber-IAM profile, which checks for the UCX migration cluster policy and updates or adds the migration policy to provide access to the relevant table locations. If no IAM instance profile or role is specified in the cluster policy, a new one is created and the new migration policy is added. This change includes new methods and functions to handle AWS IAM roles, instance profiles, and related trust policies. Additionally, new unit and integration tests have been added and verified on the staging environment. The implementation also identifies all S3 buckets used by the Instance Profiles configured in the workspace.
  • Added Dashboard widget to show the list of cluster policies along with DBR version (#1013). In this code revision, the assessment module of the 'databricks/labs/ucx' package has been updated to include a new PoliciesCrawler class, which fetches, assesses, and snapshots cluster policies. This class extends CrawlerBase and CheckClusterMixin and introduces the '_crawl', '_assess_policies', '_try_fetch', and snapshot methods. The PolicyInfo dataclass has been added to hold policy information, with a structure similar to the ClusterInfo dataclass. The ClusterInfo dataclass has been updated to include spark_version and policy_id attributes. A new table for policies has been added, and cluster policies along with the DBR version are loaded into this table. Relevant user documentation, tests, and a Dashboard widget have been added to support this feature. The create function in 'fixtures.py' has been updated to enable a Delta preview feature in Spark configurations, and a new SQL file has been included for querying cluster policies. Additionally, a new crawl_cluster_policies method has been added to scan and store cluster policies with matching configurations.
  • Added migration_status table to capture a snapshot of migrated tables (#1041). A migration_status table has been added to track the status of migrated tables, enabling improved management and tracking of migrations. The new MigrationStatus dataclass holds the source and destination schema, table, and updated timestamp. The TablesMigrate class now has a new _migration_status_refresher attribute, an instance of the new MigrationStatusRefresher class, which crawls the migration_status table and returns a snapshot that is used to refresh the migration status and check whether a table has been upgraded. Additionally, the _init_seen_tables method now gets the seen tables from the _migration_status_refresher instead of fetching them from the table properties. This change also adds new test functions in the test file for the Hive metastore, covering scenarios such as migrating managed tables with and without caching, migrating external tables, and reverting migrated tables.
  • Added a check for existing inventory database to avoid losing existing, inject installation objects in tests and try fetching existing installation before setting global as default (#1043). In this release, we have added a new method, _check_inventory_database_exists, to the WorkspaceInstallation class, which checks if an inventory database with a given name already exists in the Workspace. This prevents accidental overwriting of existing data and improves the robustness of handling inventory databases. The validate_and_run method has been updated to call app.current_installation(workspace_client), allowing for a more flexible handling of installations. The Installation class import has been updated to include SerdeError, and the test suite has been updated to inject installation objects and check for existing installations before setting the global installation as default. A new argument inventory_schema_suffix has been added to the factory method for customization of the inventory schema name. We have also added a new method check_inventory_database_exists to the WorkspaceInstaller class, which checks if an inventory database already exists for a given installation type and raises an AlreadyExists error if it does. The behavior of the download method in the WorkspaceClient class has been mocked, and the get_status method has been updated to return NotFound in certain tests. These changes aim to improve the robustness, flexibility, and safety of the installation process in the Workspace.
  • Added a check for external metastore in SQL warehouse configuration (#1046). In this release, we have added new functionality to the UCX installation process to check for and connect to an external Hive metastore configuration. A new method, _get_warehouse_config_with_external_hive_metastore, has been introduced to retrieve the workspace warehouse config and identify whether it is set up for an external Hive metastore. If so, and the user confirms the prompt, UCX will be configured to connect to the external metastore. Additionally, a new method _extract_external_hive_metastore_sql_conf and new tests test_cluster_policy_definition_<cloud_provider>_hms_warehouse() have been added to handle the external metastore configuration for Azure, AWS, and GCP, and to handle the case when the data_access_config is empty. These changes make it easier to install UCX with external Hive metastore configurations. The new imports EndpointConfPair and GetWorkspaceWarehouseConfigResponse from the databricks.sdk.service.sql package are used to handle the endpoint configuration of the SQL warehouse.
  • Added integration tests for AWS - create locations (#1026). In this release, we have added comprehensive integration tests for AWS resources and their management in the tests/unit/assessment/test_aws.py file. The AWSResources class has been updated with new methods (AwsIamRole, add_uc_role, add_uc_role_policy, and validate_connection) and the regular expression for matching S3 resource ARN has been modified. The create_external_locations method now allows for creating external locations without validating them, and the _identify_missing_external_locations function has been enhanced to match roles with a wildcard pattern. The new tests include validating the integration of AWS services with the system, testing the CLI's behavior when it is missing, and introducing new configuration scenarios with the addition of a Key Management Service (KMS) key during the creation of IAM roles and policies. These changes improve the robustness and reliability of AWS resource integration and handling in our system.
  • Bump Databricks SDK to v0.22.0 (#1059). In this release, we are bumping the Databricks SDK version to 0.22.0 and upgrading the databricks-labs-lsql package to ~0.2.2. The new dependencies for this release include databricks-sdk==0.22.0, databricks-labs-lsql~=0.2.2, databricks-labs-blueprint~=0.4.3, and PyYAML>=6.0.0,<7.0.0. In the fixtures.py file, we have added PermissionLevel.CAN_QUERY to the CAN_VIEW and CAN_MANAGE permissions in the _path function, allowing users to query the endpoint. Additionally, we have updated the test_endpoints function in the test_generic.py file as part of the integration tests for workspace access. This change updates the permission level for creating a serving endpoint from CAN_MANAGE to CAN_QUERY, meaning that the assigned group can now only query the endpoint. We have also included the test_feature_tables function in the commit, which tests the behavior of feature tables in the Databricks workspace. This change only affects the test_endpoints function and its assert statements, and does not impact the functionality of the test_feature_tables function.
  • Changed default UCX installation folder to /Applications/ucx from /Users/<me>/.ucx to allow multiple users to utilise the same installation (#854). In this release, we've added a new advanced feature that allows users to force the installation of UCX over an existing installation using the UCX_FORCE_INSTALL environment variable. This variable can take two values, global and user, providing more control over where UCX is installed. The default UCX installation folder has been changed to /Applications/ucx from /Users/<me>/.ucx so that multiple users can share the same installation. A table detailing the expected install location, install_folder, and mode for each combination of global and user values has been added to the README file. We've also added user prompts to confirm the installation if UCX is already installed and the UCX_FORCE_INSTALL variable is set to user. Use this feature with caution, as it can break existing installations if used incorrectly. Additionally, the implementation of the UCX installation process has changed in several places, and new tests ensure that installation works correctly in various scenarios.
  • Fix: Recover lost fix for webbrowser.open mock (#1052). This fix restores the mock for webbrowser.open in the tests test_repair_run and test_get_existing_installation_global. It prevents the webbrowser.open function from being called during these tests, which improves test stability and consistency. No new methods have been added; the tests have only been modified to include the webbrowser.open mock.
  • Improved table migrations logic (#1050). This change introduces improvements to table migrations logic by refactoring unit tests to load table mappings from JSON instead of inline structs, adding an escape_sql_identifier function where missing, and preparing for ACLs migration. The uc_grant_sql method in grants.py has been updated to accept optional object_type and object_key parameters, and the hive-to-UC mapping has been expanded to include mappings for views. Additionally, new JSON files for external source table configuration have been added, and new functions have been introduced for loading fixture data from JSON files and creating mocked WorkspaceClient and TableMapping objects for testing. The changes improve the maintainability and security of the codebase, prepare it for future migration tasks, and ensure that the code is more adaptable and robust. The changes have been manually tested and verified on the staging environment.
  • Moved SqlBackend implementation to databricks-labs-lsql dependency (#1042). In this change, the SqlBackend implementation, including classes such as StatementExecutionBackend and RuntimeBackend, has been moved to a separate library, databricks-labs-lsql, which is managed at https://github.com/databrickslabs/lsql. This refactoring simplifies the current repository, promotes code reuse, and improves modularity by leveraging an external dependency. The modification includes adding a new line in the .gitignore file to exclude *.out files from version control.
  • Prepare for a PyPI release (#1038). In preparation for a PyPI release, this change introduces a new GitHub Actions workflow that automates the package release process and ensures the integrity of the released packages by signing them with Sigstore. When a new git tag starting with v is pushed, this workflow is triggered, building wheels using hatch, drafting a new GitHub release, publishing the package distributions to PyPI, and signing the artifacts with Sigstore. The pyproject.toml file is now used for metadata, replacing setup.cfg and setup.py, and is cached to improve build performance. In addition, the pyproject.toml file has been updated with recent metadata in preparation for the release, including updates to the package's authors, development status, classifiers, and dependencies.
  • Prevent fragile mock.patch('databricks...') in the test code (#1037). This change introduces a custom pylint checker to improve code flexibility and maintainability by preventing fragile mock.patch designs in test code. The new checker discourages the use of MagicMock and encourages the use of create_autospec to ensure that mocks have the same attributes and methods as the original class. This change has been implemented in multiple test files, including test_cli.py, test_locations.py, test_mapping.py, test_table_migrate.py, test_table_move.py, test_workspace_access.py, test_redash.py, test_scim.py, and test_verification.py, to improve the robustness and maintainability of the test code. Additionally, the commit removes the verification.py file, which contained a VerificationManager class for verifying applied permissions, scope ACLs, roles, and entitlements for various objects in a Databricks workspace.
  • Removed mocker.patch("databricks...) from test_cli (#1047). In this release, we have made significant updates to the library's handling of Azure and AWS workspaces. We have added new parameters azure_resource_permissions and aws_permissions to the _execute_for_cloud function in cli.py, which are passed to the func_azure and func_aws functions respectively. The create_uber_principal and principal_prefix_access commands have also been updated to include these new parameters. Additionally, the _azure_setup_uber_principal and _aws_setup_uber_principal functions have been updated to accept the new azure_resource_permissions and aws_resource_permissions parameters. The _azure_principal_prefix_access and _aws_principal_prefix_access functions have also been updated similarly. We have also introduced a new aws_resources parameter in the migrate_credentials command, which is used to migrate Azure Service Principals in ADLS Gen2 locations to UC storage credentials. In terms of testing, we have replaced the mocker.patch calls with the creation of AzureResourcePermissions and AWSResourcePermissions objects, improving the code's readability and maintainability. Overall, these changes significantly enhance the library's functionality and maintainability in handling Azure and AWS workspaces.
  • Require Hatch v1.9.4 on build machines (#1049). In this release, we have updated the Hatch version to 1.9.4 on build machines. The changes include updating the toolchain dependencies and setup in the .codegen.json file, which simplifies the setup process and now relies on a pre-existing Hatch environment and Python 3. The acceptance workflow has also been updated to use the latest version of Hatch and the databrickslabs/sandbox/acceptance GitHub action version v0.1.4. This change affects the build process for the project but has no impact on the functionality of the project itself.
  • Set acceptance tests to timeout after 45 minutes (#1036). The acceptance tests now have a 45-minute timeout configured. This change has been implemented in the .github/workflows/acceptance.yml file by adding the timeout parameter to the step where the databrickslabs/sandbox/acceptance action is called. This ensures that the acceptance tests do not run indefinitely.
  • Updated databricks-labs-blueprint requirement from ~0.4.1 to ~0.4.3 (#1058). In this release, the version requirement for the databricks-labs-blueprint library has been updated from ~0.4.1 to ~0.4.3 in the pyproject.toml file. This change is necessary to support #1056 and #1057. The change has been manually tested.
  • Use MockPrompts.extend() functionality in test_install to supply multiple prompts (#1057). This diff introduces the MockPrompts.extend() functionality in the test_install module to enable the supplying of multiple prompts for testing purposes. A new base_prompts dictionary with default prompts has been added and is extended with additional prompts for specific test cases. This allows for the testing of various scenarios, such as when UCX is already installed on the workspace and the user is prompted to choose between global or user installation. Additionally, new force_user_environ and force_global_env dictionaries have been added to simulate different installation environments. The functionality of the WorkspaceInstaller class and mocking of webbrowser.open are also utilized in the test cases. These changes aim to ensure the proper functioning of the configuration process for different installation scenarios.
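The preference for create_autospec over MagicMock described in #1037 can be illustrated with a minimal sketch. The WorkspaceStub class and its list_clusters method are hypothetical stand-ins invented for this example; they are not part of UCX or the Databricks SDK, and the sketch does not reproduce the custom pylint checker itself.

```python
from unittest.mock import MagicMock, create_autospec


class WorkspaceStub:
    """Hypothetical client class, used only for illustration."""

    def list_clusters(self) -> list[str]:
        return ["cluster-a"]


# A bare MagicMock accepts any attribute, so a misspelled call goes unnoticed.
loose = MagicMock()
loose.list_clustres()  # misspelled on purpose; no error is raised

# create_autospec copies the interface of the real class,
# so the same misspelling fails immediately.
strict = create_autospec(WorkspaceStub, instance=True)
strict.list_clusters.return_value = ["cluster-b"]

try:
    strict.list_clustres()
    typo_detected = False
except AttributeError:
    typo_detected = True
```

Because the autospecced mock also checks call signatures, a test that passes unexpected arguments to list_clusters would fail with a TypeError instead of silently succeeding.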

* Added AWS IAM role support to `databricks labs ucx create-uber-principal` command ([#993](#993)). The `databricks labs ucx create-uber-principal` command now supports AWS Identity and Access Management (IAM) roles for external table migration. This new feature introduces a CLI command to create an `uber-IAM` profile, which checks for the UCX migration cluster policy and updates or adds the migration policy to provide access to the relevant table locations. If no IAM instance profile or role is specified in the cluster policy, a new one is created and the new migration policy is added. This change includes new methods and functions to handle AWS IAM roles, instance profiles, and related trust policies. Additionally, new unit and integration tests have been added and verified on the staging environment. The implementation also identifies all S3 buckets used by the Instance Profiles configured in the workspace.
* Added Dashboard widget to show the list of cluster policies along with DBR version ([#1013](#1013)). In this code revision, the `assessment` module of the 'databricks/labs/ucx' package has been updated to include a new `PoliciesCrawler` class, which fetches, assesses, and snapshots cluster policies. This class extends `CrawlerBase` and `CheckClusterMixin` and introduces the '_crawl', '_assess_policies', '_try_fetch', and `snapshot` methods. The `PolicyInfo` dataclass has been added to hold policy information, with a structure similar to the `ClusterInfo` dataclass. The `ClusterInfo` dataclass has been updated to include `spark_version` and `policy_id` attributes. A new table for policies has been added, and cluster policies along with the DBR version are loaded into this table. Relevant user documentation, tests, and a Dashboard widget have been added to support this feature. The `create` function in 'fixtures.py' has been updated to enable a Delta preview feature in Spark configurations, and a new SQL file has been included for querying cluster policies. Additionally, a new `crawl_cluster_policies` method has been added to scan and store cluster policies with matching configurations.
* Added `migration_status` table to capture a snapshot of migrated tables ([#1041](#1041)). A `migration_status` table has been added to track the status of migrated tables in the database, enabling improved management and tracking of migrations. The new `MigrationStatus` class, which is a dataclass that holds the source and destination schema, table, and updated timestamp, is added. The `TablesMigrate` class now has a new `_migration_status_refresher` attribute that is an instance of the new `MigrationStatusRefresher` class. This class crawls the `migration_status` table and returns a snapshot of the migration status, which is used to refresh the migration status and check if the table is upgraded. Additionally, the `_init_seen_tables` method is updated to get the seen tables from the `_migration_status_refresher` instead of fetching from the table properties. The `MigrationStatusRefresher` class fetches the migration status table and returns a snapshot of the migration status. This change also adds new test functions in the test file for the Hive metastore, which covers various scenarios such as migrating managed tables with and without caching, migrating external tables, and reverting migrated tables.
* Added a check for existing inventory database to avoid losing existing, inject installation objects in tests and try fetching existing installation before setting global as default ([#1043](#1043)). In this release, we have added a new method, `_check_inventory_database_exists`, to the `WorkspaceInstallation` class, which checks if an inventory database with a given name already exists in the Workspace. This prevents accidental overwriting of existing data and improves the robustness of handling inventory databases. The `validate_and_run` method has been updated to call `app.current_installation(workspace_client)`, allowing for a more flexible handling of installations. The `Installation` class import has been updated to include `SerdeError`, and the test suite has been updated to inject installation objects and check for existing installations before setting the global installation as default. A new argument `inventory_schema_suffix` has been added to the `factory` method for customization of the inventory schema name. We have also added a new method `check_inventory_database_exists` to the `WorkspaceInstaller` class, which checks if an inventory database already exists for a given installation type and raises an `AlreadyExists` error if it does. The behavior of the `download` method in the `WorkspaceClient` class has been mocked, and the `get_status` method has been updated to return `NotFound` in certain tests. These changes aim to improve the robustness, flexibility, and safety of the installation process in the Workspace.
* Added a check for external metastore in SQL warehouse configuration ([#1046](#1046)). In this release, we have added new functionality to the Unity Catalog (UCX) installation process to enable checking for and connecting to an external Hive metastore configuration. A new method, `_get_warehouse_config_with_external_hive_metastore`, has been introduced to retrieve the workspace warehouse config and identify if it is set up for an external Hive metastore. If so, and the user confirms the prompt, UCX will be configured to connect to the external metastore. Additionally, new methods `_extract_external_hive_metastore_sql_conf` and `test_cluster_policy_definition_<cloud_provider>_hms_warehouse()` have been added to handle the external metastore configuration for Azure, AWS, and GCP, and to handle the case when the data_access_config is empty. These changes provide more flexibility and ease of use when installing UCX with external Hive metastore configurations. The new imports `EndpointConfPair`, `GetWorkspaceWarehouseConfigResponse` from the `databricks.sdk.service.sql` package are used to handle the endpoint configuration of the SQL warehouse.
* Added integration tests for AWS - create locations ([#1026](#1026)). In this release, we have added comprehensive integration tests for AWS resources and their management in the `tests/unit/assessment/test_aws.py` file. The `AWSResources` class has been updated with new methods (AwsIamRole, add_uc_role, add_uc_role_policy, and validate_connection) and the regular expression for matching S3 resource ARN has been modified. The `create_external_locations` method now allows for creating external locations without validating them, and the `_identify_missing_external_locations` function has been enhanced to match roles with a wildcard pattern. The new tests include validating the integration of AWS services with the system, testing the CLI's behavior when it is missing, and introducing new configuration scenarios with the addition of a Key Management Service (KMS) key during the creation of IAM roles and policies. These changes improve the robustness and reliability of AWS resource integration and handling in our system.
* Bump Databricks SDK to v0.22.0 ([#1059](#1059)). In this release, we are bumping the Databricks SDK version to 0.22.0 and upgrading the `databricks-labs-lsql` package to ~0.2.2. The new dependencies for this release include `databricks-sdk==0.22.0`, `databricks-labs-lsql~=0.2.2`, `databricks-labs-blueprint~=0.4.3`, and `PyYAML>=6.0.0,<7.0.0`. In the `fixtures.py` file, we have added `PermissionLevel.CAN_QUERY` to the `CAN_VIEW` and `CAN_MANAGE` permissions in the `_path` function, allowing users to query the endpoint. Additionally, we have updated the `test_endpoints` function in the `test_generic.py` file as part of the integration tests for workspace access. This change updates the permission level for creating a serving endpoint from `CAN_MANAGE` to `CAN_QUERY`, meaning that the assigned group can now only query the endpoint. We have also included the `test_feature_tables` function in the commit, which tests the behavior of feature tables in the Databricks workspace. This change only affects the `test_endpoints` function and its assert statements, and does not impact the functionality of the `test_feature_tables` function.
* Changed default UCX installation folder to `/Applications/ucx` from `/Users/<me>/.ucx` to allow multiple users users utilising the same installation ([#854](#854)). In this release, we've added a new advanced feature that allows users to force the installation of UCX over an existing installation using the `UCX_FORCE_INSTALL` environment variable. This variable can take two values `global` and 'user', providing more control and flexibility in installing UCX. The default UCX installation folder has been changed to /Applications/ucx from /Users/<me>/.ucx to enable multiple users to utilize the same installation. A table detailing the expected install location, `install_folder`, and mode for each combination of global and user values has been added to the README file. We've also added user prompts to confirm the installation if UCX is already installed and the `UCX_FORCE_INSTALL` variable is set to 'user'. This feature is useful when users want to install UCX in a specific location or force the installation over an existing installation. However, it is recommended to use this feature with caution, as it can potentially break existing installations if not used correctly. Additionally, several changes to the implementation of the UCX installation process have been made, as well as new tests to ensure that the installation process works correctly in various scenarios.
* Fix: Recover lost fix for `webbrowser.open` mock ([#1052](#1052)). A fix has been implemented to address an issue related to the mock for `webbrowser.open` in the tests `test_repair_run` and `test_get_existing_installation_global`. This change prevents the `webbrowser.open` function from being called during these tests, which helps improve test stability and consistency. No new methods have been added, and the existing functionality of these tests has only been modified to include the `webbrowser.open` mock. This modification aims to enhance the reliability and predictability of these specific tests, ensuring accurate and consistent results.
* Improved table migrations logic ([#1050](#1050)). This change introduces improvements to table migrations logic by refactoring unit tests to load table mappings from JSON instead of inline structs, adding an `escape_sql_identifier` function where missing, and preparing for ACLs migration. The `uc_grant_sql` method in `grants.py` has been updated to accept optional `object_type` and `object_key` parameters, and the hive-to-UC mapping has been expanded to include mappings for views. Additionally, new JSON files for external source table configuration have been added, and new functions have been introduced for loading fixture data from JSON files and creating mocked `WorkspaceClient` and `TableMapping` objects for testing. The changes improve the maintainability and security of the codebase, prepare it for future migration tasks, and ensure that the code is more adaptable and robust. The changes have been manually tested and verified on the staging environment.
* Moved `SqlBackend` implementation to `databricks-labs-lsql` dependency ([#1042](#1042)). In this change, the `SqlBackend` implementation, including classes such as `StatementExecutionBackend` and `RuntimeBackend`, has been moved to a separate library, `databricks-labs-lsql`, which is managed at <https://github.com/databrickslabs/lsql>. This refactoring simplifies the current repository, promotes code reuse, and improves modularity by leveraging an external dependency. The modification includes adding a new line in the .gitignore file to exclude `*.out` files from version control.
* Prepare for a PyPI release ([#1038](#1038)). In preparation for a PyPI release, this change introduces a new GitHub Actions workflow that automates the package release process and ensures the integrity of the released packages by signing them with Sigstore. When a new git tag starting with `v` is pushed, this workflow is triggered, building wheels using hatch, drafting a new GitHub release, publishing the package distributions to PyPI, and signing the artifacts with Sigstore. The `pyproject.toml` file is now used for metadata, replacing `setup.cfg` and `setup.py`, and is cached to improve build performance. In addition, the `pyproject.toml` file has been updated with recent metadata in preparation for the release, including updates to the package's authors, development status, classifiers, and dependencies.
* Prevent fragile `mock.patch('databricks...')` in the test code ([#1037](#1037)). This change introduces a custom `pylint` checker to improve code flexibility and maintainability by preventing fragile `mock.patch` designs in test code. The new checker discourages the use of `MagicMock` and encourages the use of `create_autospec` to ensure that mocks have the same attributes and methods as the original class. This change has been implemented in multiple test files, including `test_cli.py`, `test_locations.py`, `test_mapping.py`, `test_table_migrate.py`, `test_table_move.py`, `test_workspace_access.py`, `test_redash.py`, `test_scim.py`, and `test_verification.py`, to improve the robustness and maintainability of the test code. Additionally, the commit removes the `verification.py` file, which contained a `VerificationManager` class for verifying applied permissions, scope ACLs, roles, and entitlements for various objects in a Databricks workspace.
* Removed `mocker.patch("databricks...")` from `test_cli` ([#1047](#1047)). This change reworks how `cli.py` passes cloud-specific dependencies for Azure and AWS workspaces. The `_execute_for_cloud` function gains the parameters `azure_resource_permissions` and `aws_permissions`, which are passed to the `func_azure` and `func_aws` functions respectively. The `create_uber_principal` and `principal_prefix_access` commands have been updated to include these parameters. The `_azure_setup_uber_principal` and `_aws_setup_uber_principal` functions now accept the `azure_resource_permissions` and `aws_resource_permissions` parameters, and the `_azure_principal_prefix_access` and `_aws_principal_prefix_access` functions have been updated similarly. The `migrate_credentials` command, which migrates Azure Service Principals in ADLS Gen2 locations to UC storage credentials, gains a new `aws_resources` parameter. In the tests, the `mocker.patch` calls have been replaced with `AzureResourcePermissions` and `AWSResourcePermissions` objects created directly, which improves readability and maintainability.
* Require Hatch v1.9.4 on build machines ([#1049](#1049)). Build machines now require Hatch 1.9.4. The toolchain dependencies and setup in the `.codegen.json` file have been updated; the setup is simpler and now relies on a pre-existing Hatch environment and Python 3. The acceptance workflow has also been updated to use the latest version of Hatch and version `v0.1.4` of the `databrickslabs/sandbox/acceptance` GitHub action. This change affects only the build process and has no impact on the functionality of the project itself.
* Set acceptance tests to timeout after 45 minutes ([#1036](#1036)). The acceptance tests now have a 45-minute timeout, configured in the `.github/workflows/acceptance.yml` file by adding the `timeout` parameter to the step that calls the `databrickslabs/sandbox/acceptance` action. This prevents the acceptance tests from running indefinitely.
* Updated databricks-labs-blueprint requirement from ~0.4.1 to ~0.4.3 ([#1058](#1058)). The version requirement for the `databricks-labs-blueprint` library has been updated from ~0.4.1 to ~0.4.3 in the `pyproject.toml` file. This change is needed to support [#1056](#1056) and [#1057](#1057). It has been manually tested and is ready for further testing, including unit tests, integration tests, and verification on the staging environment.
* Use `MockPrompts.extend()` functionality in test_install to supply multiple prompts ([#1057](#1057)). The `test_install` module now uses `MockPrompts.extend()` to supply multiple prompts to a test. A new `base_prompts` dictionary holds the default prompts and is extended with additional prompts for specific test cases. This makes it possible to test scenarios such as UCX already being installed on the workspace, where the user is prompted to choose between a global and a user installation. New `force_user_environ` and `force_global_env` dictionaries simulate different installation environments. The test cases also use the `WorkspaceInstaller` class and a mock of `webbrowser.open`. These changes aim to verify the configuration process across installation scenarios.
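
The preference for `create_autospec` over `MagicMock` (#1037), together with passing collaborators in as arguments instead of patching module paths (#1047), can be illustrated with a minimal sketch. The class `ResourcePermissions` and the function `setup_uber_principal` below are hypothetical stand-ins invented for this example and are not the UCX code; only `MagicMock` and `create_autospec` come from the standard library.

```python
from unittest.mock import MagicMock, create_autospec


class ResourcePermissions:
    """Hypothetical collaborator, standing in for a class such as AWSResourcePermissions."""

    def create_uber_principal(self, name: str) -> str:
        raise NotImplementedError("real code would call a cloud API here")


def setup_uber_principal(permissions: ResourcePermissions, name: str) -> str:
    """Hypothetical command body: the collaborator is injected, so tests need no patching."""
    return permissions.create_uber_principal(name)


def misspelled_call_is_rejected(mock) -> bool:
    """Return True if using a method the real class lacks raises AttributeError."""
    try:
        mock.create_uber_principle("ucx")  # deliberate typo in the method name
    except AttributeError:
        return True
    return False


# A bare MagicMock accepts any attribute, so the typo goes unnoticed.
loose = MagicMock()

# An autospec'd mock only exposes what ResourcePermissions actually defines.
strict = create_autospec(ResourcePermissions, instance=True)
strict.create_uber_principal.return_value = "role/ucx"

print(misspelled_call_is_rejected(loose))
print(misspelled_call_is_rejected(strict))
print(setup_uber_principal(strict, "ucx"))
```

Because the mock is handed to `setup_uber_principal` as an argument, the test does not depend on the import path of the class, which is what tends to make string-based `mock.patch('databricks...')` targets fragile when modules are moved.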
@nfx nfx requested review from a team and dleiva04 March 15, 2024 14:22
@nfx nfx merged commit 265ed79 into main Mar 15, 2024
4 checks passed
@nfx nfx deleted the prepare/0.17.0 branch March 15, 2024 14:24

✅ 110/110 passed, 1 flaky, 19 skipped, 1h26m1s total

Flaky tests:

  • 🤪 test_running_real_remove_backup_groups_job (5m48.985s)

Running from acceptance #1643

dmoore247 pushed a commit that referenced this pull request Mar 23, 2024