
Added Dashboard widget to show the list of cluster policies along with DBR version #1013

Merged

32 commits merged into main from feature/cluster_policy on Mar 14, 2024

Conversation

@prajin-29 (Contributor) commented Mar 6, 2024

Changes

Adds a new table called `policies` and loads cluster policies along with the DBR version into it.

Linked issues

#830

Resolves #..

Functionality

  - [ ] added relevant user documentation
  - [ ] added new CLI command
  - [ ] modified existing command: `databricks labs ucx ...`
  - [ ] added a new workflow
  - [ ] modified existing workflow: `...`
  - [x] added a new table
  - [x] modified existing table: `...`

Tests

image
  - [x] manually tested
  - [x] added unit tests
  - [x] added integration tests
  - [ ] verified on staging environment (screenshot attached)

gitguardian bot commented Mar 6, 2024

✅ There are no secrets present in this pull request anymore.

If these secrets were true positives and are still valid, we highly recommend revoking them.
Once a secret has been leaked into a git repository, you should consider it compromised, even if it was deleted immediately.
Find more information about the risks here.


🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.

Do our GitHub checks need improvement? Share your feedback!

codecov bot commented Mar 6, 2024

Codecov Report

Attention: Patch coverage is 96.42857%, with 2 lines in your changes missing coverage. Please review.

Project coverage is 88.61%. Comparing base (56a422e) to head (ec3a667).
Report is 1 commit behind head on main.

Files                                 Patch %   Lines
src/databricks/labs/ucx/runtime.py    60.00%    2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1013      +/-   ##
==========================================
+ Coverage   88.55%   88.61%   +0.06%     
==========================================
  Files          52       53       +1     
  Lines        6483     6537      +54     
  Branches     1169     1174       +5     
==========================================
+ Hits         5741     5793      +52     
- Misses        492      494       +2     
  Partials      250      250              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@nfx (Collaborator) commented Mar 6, 2024

@prajin-29 this information has to be added to a dashboard as a widget instead of the CLI command

@prajin-29 (Contributor, Author) replied:

> @prajin-29 this information has to be added to a dashboard as a widget instead of the CLI command

OK. In that case we have to add extra columns (DBR_Versions and Policy Info) to the clusters table to show the same in the dashboard.

@nfx linked an issue Mar 6, 2024 that may be closed by this pull request
@prajin-29 changed the title from "Added CLI Command databricks labs ucx get-cluster-policy to show the list of cluster policies along with DBR version" to "Adding Dashboard widget to show the list of cluster policies along with DBR version" on Mar 7, 2024
@nfx (Collaborator) left a comment

This is incorrect.

src/databricks/labs/ucx/assessment/clusters.py (outdated)
class PolicyInfo:
    policy_id: str
    cluster_id: str
    dbr_version: str
@nfx (Collaborator):

Suggested change:

- dbr_version: str
+ dbr_version: str | None = None

It might be optional.
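For illustration, a minimal sketch of the dataclass with the suggested optional field (assuming Python 3.10+ `str | None` union syntax and the standard `@dataclass` decorator, which the excerpt elides):

```python
from dataclasses import dataclass


@dataclass
class PolicyInfo:
    policy_id: str
    cluster_id: str
    # Optional: not every policy pins a DBR version, so default to None.
    dbr_version: str | None = None
```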

src/databricks/labs/ucx/assessment/clusters.py (outdated)
        self._ws = ws

    def _crawl(self) -> Iterable[PolicyInfo]:
        all_clusters = list(self._ws.clusters.list())
@nfx (Collaborator):

List all policies, not clusters!

@prajin-29 (Contributor, Author):

Updated the code to list all policies instead of clusters. Since a policy won't have a DBR version associated with it, I added the DBR version to the clusters table and am joining clusters with policies to get the required information.
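A hedged sketch of the reworked crawl, assuming the SDK's `ws.cluster_policies.list()` API and the class structure shown in the excerpts in this thread (`Iterable` from `collections.abc`):

```python
def _crawl(self) -> Iterable[PolicyInfo]:
    # Enumerate cluster policies, not clusters.
    all_policies = list(self._ws.cluster_policies.list())
    return list(self._assess_policies(all_policies))
```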

src/databricks/labs/ucx/runtime.py (outdated)
@@ -154,6 +154,19 @@ def assess_incompatible_submit_runs(cfg: WorkspaceConfig, ws: WorkspaceClient, s
    crawler.snapshot()


@task("assessment")
def assess_cluster_policies(cfg: WorkspaceConfig, ws: WorkspaceClient, sql_backend: SqlBackend):
@nfx (Collaborator):

Suggested change:

- def assess_cluster_policies(cfg: WorkspaceConfig, ws: WorkspaceClient, sql_backend: SqlBackend):
+ def crawl_cluster_policies(cfg: WorkspaceConfig, ws: WorkspaceClient, sql_backend: SqlBackend):

@prajin-29 (Contributor, Author):

Changed the naming convention.
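Combining the rename with the task pattern from the diff above, the registered task plausibly looks like this (the `PoliciesCrawler` constructor arguments are inferred from test excerpts later in the thread, so treat this as a sketch rather than the final code):

```python
@task("assessment")
def crawl_cluster_policies(cfg: WorkspaceConfig, ws: WorkspaceClient, sql_backend: SqlBackend):
    """Scans all cluster policies and stores them in the policies table."""
    crawler = PoliciesCrawler(ws, sql_backend, cfg.inventory_database)
    crawler.snapshot()
```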

@nfx added the pr/breaking-change label (this change does require data or configuration migration from previous versions) Mar 7, 2024
@nfx (Collaborator) left a comment

The implementation is incorrect.

if "spark_version" not in json.loads(policy.definition):
spark_version = None
else:
spark_version = json.loads(policy.definition)["spark_version"]
@nfx (Collaborator):

this is a bug - policy definition has a dictionary, we need to store a string.

@prajin-29 (Contributor, Author):

Storing it as a string now.
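A minimal sketch of the fix, assuming the `{"type": "fixed", "value": ...}` shape shown later in the thread; the helper name is hypothetical:

```python
import json


def _extract_spark_version(definition: str | None) -> str | None:
    # The policy definition is a JSON document whose "spark_version" entry is
    # itself a dictionary, e.g. {"type": "fixed", "value": "14.3.x-scala2.12"}.
    if not definition:
        return None
    spark_version = json.loads(definition).get("spark_version")
    # Serialize the dictionary back to a JSON string so the table column stays a string.
    return json.dumps(spark_version) if spark_version is not None else None
```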

distinct policy_name,
cluster.spark_version as cluster_dbr_version,
policy.spark_version as policy_spark_version
FROM $inventory.clusters as cluster
@nfx (Collaborator):

why do you need to select from clusters if you are required to select only from policies?

@prajin-29 (Contributor, Author):

From the clusters table we get the cluster DBR versions, and from the policies table we get the policy Spark version. We join both so that it's easy for a customer to compare them side by side and see whether a policy's Spark version can be upgraded based on the cluster DBR version.
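A hedged illustration of the side-by-side comparison the join enables, using in-memory rows instead of the real `$inventory` tables (column names taken from the SQL excerpt above):

```python
# Toy rows standing in for the clusters and policies inventory tables.
clusters = [{"policy_id": "p1", "spark_version": "13.3.x-scala2.12"}]
policies = [{"policy_id": "p1", "name": "team-policy",
             "spark_version": '{"type": "fixed", "value": "12.2.x-scala2.12"}'}]

# Join on policy_id so each row shows the cluster DBR version next to the
# policy's pinned Spark version.
joined = [
    {"policy_name": p["name"],
     "cluster_dbr_version": c["spark_version"],
     "policy_spark_version": p["spark_version"]}
    for c in clusters for p in policies
    if c["policy_id"] == p["policy_id"]
]
```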

    if policy.policy_id == created_policy.policy_id:
        results.append(policy)

assert len(results) >= 1
@nfx (Collaborator):

this test is incorrect.

@prajin-29 (Contributor, Author):

Updated the assertions. Instead of checking the length, I'm now checking the individual values.
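A hedged sketch of what value-level assertions can look like (names and fields assumed from the excerpt and the `PolicyInfo` dataclass above; `policies` stands for the crawled snapshot):

```python
# Filter to the policy created by this test, then assert on its fields
# directly instead of only asserting on the number of matches.
results = [p for p in policies if p.policy_id == created_policy.policy_id]
assert len(results) == 1
assert results[0].policy_id == created_policy.policy_id
assert results[0].dbr_version is not None  # field name assumed from PolicyInfo
```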


crawler = PoliciesCrawler(ws, MockBackend(), "ucx")
result_set = list(crawler.snapshot())
assert len(result_set) == 2
@nfx (Collaborator):

and what are the failures?

@prajin-29 (Contributor, Author):

Now checking for failures as well.
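A sketch of the added failure check, assuming failures are stored on each row as a JSON-encoded list of strings, mirroring the other assessment tables:

```python
import json

result_set = list(crawler.snapshot())
assert len(result_set) == 2
# Verify the failures column as well, not just the row count.
failures = json.loads(result_set[0].failures)
assert isinstance(failures, list)
```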

result_set = list(crawler.snapshot())

assert len(result_set) == 1
assert result_set[0].policy_id == "000"
@nfx (Collaborator):

why don't you check for failures?


def test_policy_try_fetch():
    ws = workspace_client_mock(policy_ids=['single-user-with-spn-policyid'])
    mock_backend = MagicMock()
@nfx (Collaborator):

use MockBackend() instead

@prajin-29 (Contributor, Author):

Removed MagicMock and am using MockBackend instead.

    policy_ids=['single-user-with-spn-policyid'],
)

with patch("databricks.labs.ucx.assessment.clusters.azure_sp_conf_present_check", return_value=True):
@nfx (Collaborator):

this is prohibited. see #1037

@prajin-29 (Contributor, Author):

Removed the patching method.
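Per the convention introduced in #1037, the preferred pattern is a typed double built with `create_autospec` rather than `mock.patch` on module attributes; a minimal sketch:

```python
from unittest.mock import create_autospec

from databricks.sdk import WorkspaceClient

# A typed double: attribute and method typos fail loudly, unlike with
# MagicMock or string-based patching.
ws = create_autospec(WorkspaceClient)
ws.cluster_policies.list.return_value = []
```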

@nfx (Collaborator) left a comment

Code is incorrect and broken.

{"spark_conf.spark.databricks.delta.preview.enabled": {"type": "fixed", "value": "true"}}
{
"spark_conf.spark.databricks.delta.preview.enabled": {"type": "fixed", "value": "true"},
"spark_version": {'type': 'fixed', 'value': '14.3.x-scala2.12'},
@nfx (Collaborator):

Don't add it to the fixture factory, add it to the test that needs it.

@prajin-29 (Contributor, Author):

Removed it from the fixture.

tests/integration/assessment/test_clusters.py
}
},
"name": "test_policy",
"spark_version": "13.3X,",
@nfx (Collaborator):

This looks like an invalid Spark version.

@prajin-29 (Contributor, Author):

Removed it from the JSON.

tests/unit/assessment/test_clusters.py
        return list(self._assess_policies(all_policies))

    def _assess_policies(self, all_policies):
        failures: list[str] = []
@nfx (Collaborator):

BUG: this variable is shared by all iterations of the loop. It's producing incorrect results.

@prajin-29 (Contributor, Author):

Moved the variables inside the policy iteration.
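A self-contained illustration of why the shared accumulator misbehaves and how moving it into the loop fixes it:

```python
# Buggy: one list shared by every iteration, so each report also carries
# the failures of all previous policies.
failures: list[str] = []
reports_buggy = []
for name in ["policy-a", "policy-b"]:
    failures.append(f"{name} failed a check")
    reports_buggy.append((name, list(failures)))
assert reports_buggy[1][1] == ["policy-a failed a check", "policy-b failed a check"]

# Fixed: a fresh list per iteration keeps each policy's failures separate.
reports_fixed = []
for name in ["policy-a", "policy-b"]:
    failures = [f"{name} failed a check"]
    reports_fixed.append((name, failures))
assert reports_fixed[1][1] == ["policy-b failed a check"]
```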

@nfx (Collaborator) left a comment

Make CI pass.


def upgrade(installation: Installation, ws: WorkspaceClient):
    config = installation.load(WorkspaceConfig)
    sql = f"DROP TABLE IF EXISTS hive_metastore.{config.inventory_database}.clusters"
@nfx (Collaborator):

Why are you dropping the production database table? 😰

Alter table and add columns instead.

@prajin-29 (Contributor, Author):

I have changed the code to alter the table. The reason I originally used drop-and-recreate is that, after the table is dropped and recreated, the next assessment run would populate the correct data.

# Conflicts:
#	src/databricks/labs/ucx/install.py
#	tests/unit/assessment/test_clusters.py
@nfx changed the title from "Adding Dashboard widget to show the list of cluster policies along with DBR version" to "Added Dashboard widget to show the list of cluster policies along with DBR version" on Mar 14, 2024
@nfx (Collaborator) left a comment

Few last touches required.

@@ -1173,6 +1173,8 @@ def test_runs_upgrades_on_more_recent_version(ws, any_prompt):
    sql_backend = MockBackend()
    wheels = create_autospec(WheelsV2)

    StatementExecutionBackend.execute = MockBackend().execute
@nfx (Collaborator):

Suggested change:

- StatementExecutionBackend.execute = MockBackend().execute

This is incorrect, don't do it.

@prajin-29 (Contributor, Author):

This was mocked because sql_backend was not getting passed to the upgrade script. I have changed the upgrade to execute through the workspace client's statement execution API and removed this mock.
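A hedged sketch of running the upgrade SQL through the workspace client's Statement Execution API (`ws.statement_execution.execute_statement` is the SDK call; sourcing the warehouse id from the installed config is an assumption):

```python
def upgrade(installation: Installation, ws: WorkspaceClient):
    config = installation.load(WorkspaceConfig)
    sql = (
        f"ALTER TABLE hive_metastore.{config.inventory_database}.clusters "
        "ADD COLUMNS(policy_id string, spark_version string)"
    )
    # Execute via the SQL Statement Execution API instead of constructing a
    # SqlBackend inside the upgrade script; warehouse_id is assumed to be
    # recorded in the installed WorkspaceConfig.
    ws.statement_execution.execute_statement(
        statement=sql,
        warehouse_id=config.warehouse_id,
    )
```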


def upgrade(installation: Installation, ws: WorkspaceClient):
    config = installation.load(WorkspaceConfig)
    sql = f"ALTER TABLE hive_metastore.{config.inventory_database}.clusters ADD COLUMNS(policy_id string,spark_version string)"
@nfx (Collaborator):

Are these columns nullable? Did you try executing this statement on a database of the previous version?

@prajin-29 (Contributor, Author):

The columns are nullable. I ran this with the previous version of the database.
image

@nfx marked this pull request as ready for review March 14, 2024 08:52
@nfx requested review from a team and tnguyen-db March 14, 2024 08:52
github-actions bot commented Mar 14, 2024

✅ 110/110 passed, 19 skipped, 1h29m17s total

Running from acceptance #1615

@nfx merged commit 3a43ccf into main Mar 14, 2024
7 checks passed
@nfx deleted the feature/cluster_policy branch March 14, 2024 12:37
nkvuong pushed a commit that referenced this pull request Mar 14, 2024
nfx added a commit that referenced this pull request Mar 15, 2024
* Added AWS IAM role support to `databricks labs ucx create-uber-principal` command ([#993](#993)). The `databricks labs ucx create-uber-principal` command now supports AWS Identity and Access Management (IAM) roles for external table migration. This new feature introduces a CLI command to create an `uber-IAM` profile, which checks for the UCX migration cluster policy and updates or adds the migration policy to provide access to the relevant table locations. If no IAM instance profile or role is specified in the cluster policy, a new one is created and the new migration policy is added. This change includes new methods and functions to handle AWS IAM roles, instance profiles, and related trust policies. Additionally, new unit and integration tests have been added and verified on the staging environment. The implementation also identifies all S3 buckets used by the Instance Profiles configured in the workspace.
* Added Dashboard widget to show the list of cluster policies along with DBR version ([#1013](#1013)). In this code revision, the `assessment` module of the 'databricks/labs/ucx' package has been updated to include a new `PoliciesCrawler` class, which fetches, assesses, and snapshots cluster policies. This class extends `CrawlerBase` and `CheckClusterMixin` and introduces the '_crawl', '_assess_policies', '_try_fetch', and `snapshot` methods. The `PolicyInfo` dataclass has been added to hold policy information, with a structure similar to the `ClusterInfo` dataclass. The `ClusterInfo` dataclass has been updated to include `spark_version` and `policy_id` attributes. A new table for policies has been added, and cluster policies along with the DBR version are loaded into this table. Relevant user documentation, tests, and a Dashboard widget have been added to support this feature. The `create` function in 'fixtures.py' has been updated to enable a Delta preview feature in Spark configurations, and a new SQL file has been included for querying cluster policies. Additionally, a new `crawl_cluster_policies` method has been added to scan and store cluster policies with matching configurations.
* Added `migration_status` table to capture a snapshot of migrated tables ([#1041](#1041)). A `migration_status` table has been added to track the status of migrated tables in the database, enabling improved management and tracking of migrations. The new `MigrationStatus` class, which is a dataclass that holds the source and destination schema, table, and updated timestamp, is added. The `TablesMigrate` class now has a new `_migration_status_refresher` attribute that is an instance of the new `MigrationStatusRefresher` class. This class crawls the `migration_status` table and returns a snapshot of the migration status, which is used to refresh the migration status and check if the table is upgraded. Additionally, the `_init_seen_tables` method is updated to get the seen tables from the `_migration_status_refresher` instead of fetching from the table properties. The `MigrationStatusRefresher` class fetches the migration status table and returns a snapshot of the migration status. This change also adds new test functions in the test file for the Hive metastore, which covers various scenarios such as migrating managed tables with and without caching, migrating external tables, and reverting migrated tables.
* Added a check for existing inventory database to avoid losing existing, inject installation objects in tests and try fetching existing installation before setting global as default ([#1043](#1043)). In this release, we have added a new method, `_check_inventory_database_exists`, to the `WorkspaceInstallation` class, which checks if an inventory database with a given name already exists in the Workspace. This prevents accidental overwriting of existing data and improves the robustness of handling inventory databases. The `validate_and_run` method has been updated to call `app.current_installation(workspace_client)`, allowing for a more flexible handling of installations. The `Installation` class import has been updated to include `SerdeError`, and the test suite has been updated to inject installation objects and check for existing installations before setting the global installation as default. A new argument `inventory_schema_suffix` has been added to the `factory` method for customization of the inventory schema name. We have also added a new method `check_inventory_database_exists` to the `WorkspaceInstaller` class, which checks if an inventory database already exists for a given installation type and raises an `AlreadyExists` error if it does. The behavior of the `download` method in the `WorkspaceClient` class has been mocked, and the `get_status` method has been updated to return `NotFound` in certain tests. These changes aim to improve the robustness, flexibility, and safety of the installation process in the Workspace.
* Added a check for external metastore in SQL warehouse configuration ([#1046](#1046)). In this release, we have added new functionality to the Unity Catalog (UCX) installation process to enable checking for and connecting to an external Hive metastore configuration. A new method, `_get_warehouse_config_with_external_hive_metastore`, has been introduced to retrieve the workspace warehouse config and identify if it is set up for an external Hive metastore. If so, and the user confirms the prompt, UCX will be configured to connect to the external metastore. Additionally, new methods `_extract_external_hive_metastore_sql_conf` and `test_cluster_policy_definition_<cloud_provider>_hms_warehouse()` have been added to handle the external metastore configuration for Azure, AWS, and GCP, and to handle the case when the data_access_config is empty. These changes provide more flexibility and ease of use when installing UCX with external Hive metastore configurations. The new imports `EndpointConfPair`, `GetWorkspaceWarehouseConfigResponse` from the `databricks.sdk.service.sql` package are used to handle the endpoint configuration of the SQL warehouse.
* Added integration tests for AWS - create locations ([#1026](#1026)). In this release, we have added comprehensive integration tests for AWS resources and their management in the `tests/unit/assessment/test_aws.py` file. The `AWSResources` class has been updated with new methods (AwsIamRole, add_uc_role, add_uc_role_policy, and validate_connection) and the regular expression for matching S3 resource ARN has been modified. The `create_external_locations` method now allows for creating external locations without validating them, and the `_identify_missing_external_locations` function has been enhanced to match roles with a wildcard pattern. The new tests include validating the integration of AWS services with the system, testing the CLI's behavior when it is missing, and introducing new configuration scenarios with the addition of a Key Management Service (KMS) key during the creation of IAM roles and policies. These changes improve the robustness and reliability of AWS resource integration and handling in our system.
* Bump Databricks SDK to v0.22.0 ([#1059](#1059)). In this release, we are bumping the Databricks SDK version to 0.22.0 and upgrading the `databricks-labs-lsql` package to ~0.2.2. The new dependencies for this release include `databricks-sdk==0.22.0`, `databricks-labs-lsql~=0.2.2`, `databricks-labs-blueprint~=0.4.3`, and `PyYAML>=6.0.0,<7.0.0`. In the `fixtures.py` file, we have added `PermissionLevel.CAN_QUERY` to the `CAN_VIEW` and `CAN_MANAGE` permissions in the `_path` function, allowing users to query the endpoint. Additionally, we have updated the `test_endpoints` function in the `test_generic.py` file as part of the integration tests for workspace access. This change updates the permission level for creating a serving endpoint from `CAN_MANAGE` to `CAN_QUERY`, meaning that the assigned group can now only query the endpoint. We have also included the `test_feature_tables` function in the commit, which tests the behavior of feature tables in the Databricks workspace. This change only affects the `test_endpoints` function and its assert statements, and does not impact the functionality of the `test_feature_tables` function.
* Changed default UCX installation folder to `/Applications/ucx` from `/Users/<me>/.ucx` to allow multiple users utilising the same installation ([#854](#854)). In this release, we've added a new advanced feature that allows users to force the installation of UCX over an existing installation using the `UCX_FORCE_INSTALL` environment variable. This variable can take two values `global` and 'user', providing more control and flexibility in installing UCX. The default UCX installation folder has been changed to /Applications/ucx from /Users/<me>/.ucx to enable multiple users to utilize the same installation. A table detailing the expected install location, `install_folder`, and mode for each combination of global and user values has been added to the README file. We've also added user prompts to confirm the installation if UCX is already installed and the `UCX_FORCE_INSTALL` variable is set to 'user'. This feature is useful when users want to install UCX in a specific location or force the installation over an existing installation. However, it is recommended to use this feature with caution, as it can potentially break existing installations if not used correctly. Additionally, several changes to the implementation of the UCX installation process have been made, as well as new tests to ensure that the installation process works correctly in various scenarios.
* Fix: Recover lost fix for `webbrowser.open` mock ([#1052](#1052)). A fix has been implemented to address an issue related to the mock for `webbrowser.open` in the tests `test_repair_run` and `test_get_existing_installation_global`. This change prevents the `webbrowser.open` function from being called during these tests, which helps improve test stability and consistency. No new methods have been added, and the existing functionality of these tests has only been modified to include the `webbrowser.open` mock. This modification aims to enhance the reliability and predictability of these specific tests, ensuring accurate and consistent results.
* Improved table migrations logic ([#1050](#1050)). This change introduces improvements to table migrations logic by refactoring unit tests to load table mappings from JSON instead of inline structs, adding an `escape_sql_identifier` function where missing, and preparing for ACLs migration. The `uc_grant_sql` method in `grants.py` has been updated to accept optional `object_type` and `object_key` parameters, and the hive-to-UC mapping has been expanded to include mappings for views. Additionally, new JSON files for external source table configuration have been added, and new functions have been introduced for loading fixture data from JSON files and creating mocked `WorkspaceClient` and `TableMapping` objects for testing. The changes improve the maintainability and security of the codebase, prepare it for future migration tasks, and ensure that the code is more adaptable and robust. The changes have been manually tested and verified on the staging environment.
* Moved `SqlBackend` implementation to `databricks-labs-lsql` dependency ([#1042](#1042)). In this change, the `SqlBackend` implementation, including classes such as `StatementExecutionBackend` and `RuntimeBackend`, has been moved to a separate library, `databricks-labs-lsql`, which is managed at <https://github.com/databrickslabs/lsql>. This refactoring simplifies the current repository, promotes code reuse, and improves modularity by leveraging an external dependency. The modification includes adding a new line in the .gitignore file to exclude `*.out` files from version control.
* Prepare for a PyPI release ([#1038](#1038)). In preparation for a PyPI release, this change introduces a new GitHub Actions workflow that automates the package release process and ensures the integrity of the released packages by signing them with Sigstore. When a new git tag starting with `v` is pushed, this workflow is triggered, building wheels using hatch, drafting a new GitHub release, publishing the package distributions to PyPI, and signing the artifacts with Sigstore. The `pyproject.toml` file is now used for metadata, replacing `setup.cfg` and `setup.py`, and is cached to improve build performance. In addition, the `pyproject.toml` file has been updated with recent metadata in preparation for the release, including updates to the package's authors, development status, classifiers, and dependencies.
* Prevent fragile `mock.patch('databricks...')` in the test code ([#1037](#1037)). This change introduces a custom `pylint` checker to improve code flexibility and maintainability by preventing fragile `mock.patch` designs in test code. The new checker discourages the use of `MagicMock` and encourages the use of `create_autospec` to ensure that mocks have the same attributes and methods as the original class. This change has been implemented in multiple test files, including `test_cli.py`, `test_locations.py`, `test_mapping.py`, `test_table_migrate.py`, `test_table_move.py`, `test_workspace_access.py`, `test_redash.py`, `test_scim.py`, and `test_verification.py`, to improve the robustness and maintainability of the test code. Additionally, the commit removes the `verification.py` file, which contained a `VerificationManager` class for verifying applied permissions, scope ACLs, roles, and entitlements for various objects in a Databricks workspace.
* Removed `mocker.patch("databricks...)` from `test_cli` ([#1047](#1047)). In this release, we have made significant updates to the library's handling of Azure and AWS workspaces. We have added new parameters `azure_resource_permissions` and `aws_permissions` to the `_execute_for_cloud` function in `cli.py`, which are passed to the `func_azure` and `func_aws` functions respectively. The `create_uber_principal` and `principal_prefix_access` commands have also been updated to include these new parameters. Additionally, the `_azure_setup_uber_principal` and `_aws_setup_uber_principal` functions have been updated to accept the new `azure_resource_permissions` and `aws_resource_permissions` parameters. The `_azure_principal_prefix_access` and `_aws_principal_prefix_access` functions have also been updated similarly. We have also introduced a new `aws_resources` parameter in the `migrate_credentials` command, which is used to migrate Azure Service Principals in ADLS Gen2 locations to UC storage credentials. In terms of testing, we have replaced the `mocker.patch` calls with the creation of `AzureResourcePermissions` and `AWSResourcePermissions` objects, improving the code's readability and maintainability. Overall, these changes significantly enhance the library's functionality and maintainability in handling Azure and AWS workspaces.
* Require Hatch v1.9.4 on build machines ([#1049](#1049)). In this release, we have updated the Hatch package version to 1.9.4 on build machines, addressing issue [#1049](#1049). The changes include updating the toolchain dependencies and setup in the `.codegen.json` file, which simplifies the setup process and now relies on a pre-existing Hatch environment and Python 3. The acceptance workflow has also been updated to use the latest version of Hatch and the `databrickslabs/sandbox/acceptance` GitHub action version `v0.1.4`. Hatch is a Python package manager that simplifies package development and management, and this update provides new features and bug fixes that can help improve the reliability and performance of the acceptance workflow. This change requires version 1.9.4 of the Hatch package on build machines, and it will affect the build process for the project but will not have any impact on the functionality of the project itself. As a software engineer adopting this project, it's important to note this change to ensure that the build process runs smoothly and takes advantage of any new features or improvements in Hatch 1.9.4.
* Set acceptance tests to timeout after 45 minutes ([#1036](#1036)). As part of issue [#1036](#1036), the acceptance tests in this open-source library now have a 45-minute timeout configured, improving the reliability and stability of the testing environment. This change has been implemented in the `.github/workflows/acceptance.yml` file by adding the `timeout` parameter to the step where the `databrickslabs/sandbox/acceptance` action is called. This ensures that the acceptance tests will not run indefinitely and prevents any potential issues caused by long-running tests. By adopting this project, software engineers can now benefit from a more stable and reliable testing environment, with acceptance tests that are guaranteed to complete within a maximum of 45 minutes.
* Updated databricks-labs-blueprint requirement from ~0.4.1 to ~0.4.3 ([#1058](#1058)). In this release, the version requirement for the `databricks-labs-blueprint` library has been updated from ~0.4.1 to ~0.4.3 in the pyproject.toml file. This change is necessary to support issues [#1056](#1056) and [#1057](#1057). The code has been manually tested and is ready for further testing to ensure the compatibility and smooth functioning of the software. It is essential to thoroughly test the latest version of the `databricks-labs-blueprint` library with the existing codebase before deploying it to production. This includes running a comprehensive suite of tests such as unit tests, integration tests, and verification on the staging environment. This modification allows the software to use the latest version of the library, improving its functionality and overall performance.
* Use `MockPrompts.extend()` functionality in test_install to supply multiple prompts ([#1057](#1057)). This diff introduces the `MockPrompts.extend()` functionality in the `test_install` module to enable the supplying of multiple prompts for testing purposes. A new `base_prompts` dictionary with default prompts has been added and is extended with additional prompts for specific test cases. This allows for the testing of various scenarios, such as when UCX is already installed on the workspace and the user is prompted to choose between global or user installation. Additionally, new `force_user_environ` and `force_global_env` dictionaries have been added to simulate different installation environments. The functionality of the `WorkspaceInstaller` class and mocking of `webbrowser.open` are also utilized in the test cases. These changes aim to ensure the proper functioning of the configuration process for different installation scenarios.
@nfx mentioned this pull request Mar 15, 2024
dmoore247 pushed a commit that referenced this pull request Mar 23, 2024
dmoore247 pushed a commit that referenced this pull request Mar 23, 2024
* Added AWS IAM role support to `databricks labs ucx
create-uber-principal` command
([#993](#993)). The
`databricks labs ucx create-uber-principal` command now supports AWS
Identity and Access Management (IAM) roles for external table migration.
This new feature introduces a CLI command to create an `uber-IAM`
profile, which checks for the UCX migration cluster policy and updates
or adds the migration policy to provide access to the relevant table
locations. If no IAM instance profile or role is specified in the
cluster policy, a new one is created and the new migration policy is
added. This change includes new methods and functions to handle AWS IAM
roles, instance profiles, and related trust policies. Additionally, new
unit and integration tests have been added and verified on the staging
environment. The implementation also identifies all S3 buckets used by
the Instance Profiles configured in the workspace.
* Added Dashboard widget to show the list of cluster policies along with
DBR version
([#1013](#1013)). In this
code revision, the `assessment` module of the 'databricks/labs/ucx'
package has been updated to include a new `PoliciesCrawler` class, which
fetches, assesses, and snapshots cluster policies. This class extends
`CrawlerBase` and `CheckClusterMixin` and introduces the '_crawl',
'_assess_policies', '_try_fetch', and `snapshot` methods. The
`PolicyInfo` dataclass has been added to hold policy information, with a
structure similar to the `ClusterInfo` dataclass. The `ClusterInfo`
dataclass has been updated to include `spark_version` and `policy_id`
attributes. A new table for policies has been added, and cluster
policies along with the DBR version are loaded into this table. Relevant
user documentation, tests, and a Dashboard widget have been added to
support this feature. The `create` function in 'fixtures.py' has been
updated to enable a Delta preview feature in Spark configurations, and a
new SQL file has been included for querying cluster policies.
Additionally, a new `crawl_cluster_policies` method has been added to
scan and store cluster policies with matching configurations.
* Added `migration_status` table to capture a snapshot of migrated
tables ([#1041](#1041)). A
`migration_status` table has been added to track the status of migrated
tables in the database, enabling improved management and tracking of
migrations. A new `MigrationStatus` dataclass holds the source and
destination schema, table, and updated timestamp. The `TablesMigrate`
class now has a new
`_migration_status_refresher` attribute that is an instance of the new
`MigrationStatusRefresher` class. This class crawls the
`migration_status` table and returns a snapshot of the migration status,
which is used to refresh the migration status and check if the table is
upgraded. Additionally, the `_init_seen_tables` method is updated to get
the seen tables from the `_migration_status_refresher` instead of
fetching from the table properties. This change also adds new test
functions in the test file for the Hive metastore, covering scenarios
such as migrating managed tables with and without caching, migrating
external tables, and reverting migrated tables. A sketch of the status
record follows this entry.
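Based on that description, the record could look like this minimal
sketch (field names are inferred from the prose, not copied from the
source):

```python
from dataclasses import dataclass


@dataclass
class MigrationStatus:
    src_schema: str
    src_table: str
    dst_catalog: str | None = None  # unset until the table is migrated
    dst_schema: str | None = None
    dst_table: str | None = None
    update_ts: str | None = None  # when this snapshot row was last refreshed
```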
* Added a check for an existing inventory database to avoid losing
existing data, injected installation objects in tests, and tried
fetching an existing installation before setting the global one as the
default
([#1043](#1043)). In this
release, we have added a new method, `_check_inventory_database_exists`,
to the `WorkspaceInstallation` class, which checks if an inventory
database with a given name already exists in the Workspace. This
prevents accidental overwriting of existing data and improves the
robustness of handling inventory databases. The `validate_and_run`
method has been updated to call
`app.current_installation(workspace_client)`, allowing for a more
flexible handling of installations. The `Installation` class import has
been updated to include `SerdeError`, and the test suite has been
updated to inject installation objects and check for existing
installations before setting the global installation as default. A new
argument `inventory_schema_suffix` has been added to the `factory`
method for customization of the inventory schema name. We have also
added a new method `check_inventory_database_exists` to the
`WorkspaceInstaller` class, which checks if an inventory database
already exists for a given installation type and raises an
`AlreadyExists` error if it does. The behavior of the `download` method
in the `WorkspaceClient` class has been mocked, and the `get_status`
method has been updated to return `NotFound` in certain tests. These
changes aim to improve the robustness, flexibility, and safety of the
installation process in the workspace; a hedged sketch of the existence
check follows this entry.
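A hedged sketch of such a check, assuming a Unity Catalog-enabled
workspace where `hive_metastore` schemas are listable through the SDK:

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.errors import AlreadyExists


def check_inventory_database_exists(ws: WorkspaceClient, inventory_database: str) -> None:
    """Fail fast instead of overwriting another installation's inventory schema."""
    for schema in ws.schemas.list(catalog_name="hive_metastore"):
        if schema.name == inventory_database:
            raise AlreadyExists(f"Inventory database {inventory_database} already exists")
```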
* Added a check for external metastore in SQL warehouse configuration
([#1046](#1046)). In this
release, we have added new functionality to the Unity Catalog (UCX)
installation process to enable checking for and connecting to an
external Hive metastore configuration. A new method,
`_get_warehouse_config_with_external_hive_metastore`, has been
introduced to retrieve the workspace warehouse config and identify if it
is set up for an external Hive metastore. If so, and the user confirms
the prompt, UCX will be configured to connect to the external metastore.
Additionally, new methods `_extract_external_hive_metastore_sql_conf`
and `test_cluster_policy_definition_<cloud_provider>_hms_warehouse()`
have been added to handle the external metastore configuration for
Azure, AWS, and GCP, and to handle the case when `data_access_config`
is empty. These changes provide more flexibility and ease of use when
installing UCX with external Hive metastore configurations. The new
imports `EndpointConfPair` and `GetWorkspaceWarehouseConfigResponse`
from the `databricks.sdk.service.sql` package are used to handle the
endpoint configuration of the SQL warehouse; a sketch of the extraction
step follows this entry.
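A sketch of the extraction step, using the SDK calls named above; the
key-prefix check is an assumption about how external metastores are
typically configured:

```python
from databricks.sdk import WorkspaceClient


def extract_external_hms_conf(ws: WorkspaceClient) -> dict[str, str]:
    """Collect external Hive metastore keys from the SQL warehouse configuration."""
    config = ws.warehouses.get_workspace_warehouse_config()
    conf: dict[str, str] = {}
    for pair in config.data_access_config or []:
        # External metastores are usually wired up via spark.hadoop.javax.jdo.option.* keys
        if pair.key and pair.key.startswith("spark.hadoop.javax.jdo.option"):
            conf[pair.key] = pair.value
    return conf
```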
* Added integration tests for AWS - create locations
([#1026](#1026)). In this
release, we have added comprehensive integration tests for AWS resources
and their management in the `tests/unit/assessment/test_aws.py` file.
The `AWSResources` class has been updated with new methods
(`AwsIamRole`, `add_uc_role`, `add_uc_role_policy`, and
`validate_connection`), and the regular expression for matching S3
resource ARNs has been modified. The
`create_external_locations` method now allows for creating external
locations without validating them, and the
`_identify_missing_external_locations` function has been enhanced to
match roles with a wildcard pattern. The new tests include validating
the integration of AWS services with the system, testing the CLI's
behavior when it is missing, and introducing new configuration scenarios
with the addition of a Key Management Service (KMS) key during the
creation of IAM roles and policies. These changes improve the robustness
and reliability of AWS resource integration and handling in our system;
a sketch of the ARN matching follows this entry.
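ARN matching of this kind can be exercised as below; the pattern is
illustrative, not the exact regex from the UCX AWS module:

```python
import re

# Illustrative pattern only - the real module's regex may differ
S3_ARN = re.compile(r"arn:aws:s3:::([a-zA-Z0-9.\-_]+)(/.*)?$")


def bucket_from_arn(resource_arn: str) -> str | None:
    """Extract the bucket name from an S3 resource ARN, ignoring any key prefix."""
    match = S3_ARN.match(resource_arn)
    return match.group(1) if match else None


assert bucket_from_arn("arn:aws:s3:::my-landing-bucket/raw/*") == "my-landing-bucket"
assert bucket_from_arn("arn:aws:ec2:us-east-1:123456789012:instance/i-0") is None
```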
* Bump Databricks SDK to v0.22.0
([#1059](#1059)). In this
release, we are bumping the Databricks SDK version to 0.22.0 and
upgrading the `databricks-labs-lsql` package to ~0.2.2. The new
dependencies for this release include `databricks-sdk==0.22.0`,
`databricks-labs-lsql~=0.2.2`, `databricks-labs-blueprint~=0.4.3`, and
`PyYAML>=6.0.0,<7.0.0`. In the `fixtures.py` file, we have added
`PermissionLevel.CAN_QUERY` to the `CAN_VIEW` and `CAN_MANAGE`
permissions in the `_path` function, allowing users to query the
endpoint. Additionally, we have updated the `test_endpoints` function in
the `test_generic.py` file as part of the integration tests for
workspace access. This change updates the permission level for creating
a serving endpoint from `CAN_MANAGE` to `CAN_QUERY`, meaning that the
assigned group can now only query the endpoint. We have also included
the `test_feature_tables` function in the commit, which tests the
behavior of feature tables in the Databricks workspace. This change only
affects the `test_endpoints` function and its assert statements, and
does not impact the functionality of the `test_feature_tables` function.
* Changed default UCX installation folder to `/Applications/ucx` from
`/Users/<me>/.ucx` to allow multiple users to utilize the same
installation ([#854](#854)).
In this release, we've added a new advanced feature that allows users to
force the installation of UCX over an existing installation using the
`UCX_FORCE_INSTALL` environment variable. This variable can take two
values, `global` and `user`, providing more control and flexibility in
installing UCX. The default UCX installation folder has been changed to
`/Applications/ucx` from `/Users/<me>/.ucx` to enable multiple users to
utilize the same installation. A table detailing the expected install
location, `install_folder`, and mode for each combination of global and
user values has been added to the README file. We've also added user
prompts to confirm the installation if UCX is already installed and the
`UCX_FORCE_INSTALL` variable is set to 'user'. This feature is useful
when users want to install UCX in a specific location or force the
installation over an existing installation. However, it is recommended
to use this feature with caution, as it can potentially break existing
installations if not used correctly. Additionally, several changes have
been made to the implementation of the UCX installation process, along
with new tests to ensure that the installation process works correctly
in various scenarios. A sketch of the folder resolution follows this
entry.
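The folder-resolution rule described above boils down to a few lines;
this sketch only restates the documented table in code and is not the
installer's actual implementation:

```python
import os


def install_folder(current_user: str) -> str:
    """Pick the UCX install folder based on the UCX_FORCE_INSTALL environment variable."""
    force = os.environ.get("UCX_FORCE_INSTALL")
    if force == "user":
        # per-user installation, the pre-#854 default location
        return f"/Users/{current_user}/.ucx"
    # default (and force == "global"): one shared installation for all users
    return "/Applications/ucx"
```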
* Fix: Recover lost fix for `webbrowser.open` mock
([#1052](#1052)). A fix has
been implemented to address an issue related to the mock for
`webbrowser.open` in the tests `test_repair_run` and
`test_get_existing_installation_global`. This change prevents the
`webbrowser.open` function from being called during these tests, which
helps improve test stability and consistency. No new methods have been
added; the tests were only modified to include the `webbrowser.open`
mock, making their results more reliable and predictable. A minimal
sketch of the pattern follows this entry.
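A minimal sketch of the pattern; the real test bodies do more than
this, and the test name here is only borrowed from the prose above:

```python
from unittest import mock


def test_get_existing_installation_global() -> None:
    with mock.patch("webbrowser.open"):
        # ... run the installer code path under test; any page-open call
        # now hits the mock instead of launching a real browser ...
        pass
```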
* Improved table migrations logic
([#1050](#1050)). This
change introduces improvements to table migrations logic by refactoring
unit tests to load table mappings from JSON instead of inline structs,
adding an `escape_sql_identifier` function where missing, and preparing
for ACLs migration. The `uc_grant_sql` method in `grants.py` has been
updated to accept optional `object_type` and `object_key` parameters,
and the hive-to-UC mapping has been expanded to include mappings for
views. Additionally, new JSON files for external source table
configuration have been added, and new functions have been introduced
for loading fixture data from JSON files and creating mocked
`WorkspaceClient` and `TableMapping` objects for testing. The changes
improve the maintainability and security of the codebase, prepare it for
future migration tasks, and make the code more adaptable and robust. The
changes have been manually tested and verified on the staging
environment; a sketch of the JSON-fixture loading follows this entry.
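One common way to load such JSON-backed fixtures in unit tests; the
file layout and fixture name here are hypothetical:

```python
import json
from pathlib import Path


def load_fixture(name: str) -> dict:
    """Read a table-mapping fixture stored as JSON next to the tests."""
    return json.loads(Path(__file__).parent.joinpath("fixtures", name).read_text())


def test_migrate_external_table() -> None:
    rule = load_fixture("external_src_table.json")  # hypothetical fixture file
    assert rule["src_table"]
```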
* Moved `SqlBackend` implementation to `databricks-labs-lsql` dependency
([#1042](#1042)). In this
change, the `SqlBackend` implementation, including classes such as
`StatementExecutionBackend` and `RuntimeBackend`, has been moved to a
separate library, `databricks-labs-lsql`, which is managed at
<https://github.com/databrickslabs/lsql>. This refactoring simplifies
the current repository, promotes code reuse, and improves modularity by
leveraging an external dependency. The modification includes adding a
new line in the `.gitignore` file to exclude `*.out` files from version
control. A sketch of the new import path follows this entry.
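After the move, consumers import the backends from the new package; a
minimal sketch, assuming the import path exposed by
`databricks-labs-lsql`:

```python
from databricks.labs.lsql.backends import StatementExecutionBackend
from databricks.sdk import WorkspaceClient

ws = WorkspaceClient()
backend = StatementExecutionBackend(ws, "<warehouse-id>")  # placeholder warehouse id
for row in backend.fetch("SELECT 1 AS x"):
    print(row.x)
```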
* Prepare for a PyPI release
([#1038](#1038)). In
preparation for a PyPI release, this change introduces a new GitHub
Actions workflow that automates the package release process and ensures
the integrity of the released packages by signing them with Sigstore.
When a new git tag starting with `v` is pushed, this workflow is
triggered, building wheels using hatch, drafting a new GitHub release,
publishing the package distributions to PyPI, and signing the artifacts
with Sigstore. The `pyproject.toml` file is now used for metadata,
replacing `setup.cfg` and `setup.py`, and is cached to improve build
performance. In addition, the `pyproject.toml` file has been updated
with recent metadata in preparation for the release, including updates
to the package's authors, development status, classifiers, and
dependencies.
* Prevent fragile `mock.patch('databricks...')` in the test code
([#1037](#1037)). This
change introduces a custom `pylint` checker to improve code flexibility
and maintainability by preventing fragile `mock.patch` designs in test
code. The new checker discourages the use of `MagicMock` and encourages
the use of `create_autospec` to ensure that mocks have the same
attributes and methods as the original class. This change has been
implemented in multiple test files, including `test_cli.py`,
`test_locations.py`, `test_mapping.py`, `test_table_migrate.py`,
`test_table_move.py`, `test_workspace_access.py`, `test_redash.py`,
`test_scim.py`, and `test_verification.py`, to improve the robustness
and maintainability of the test code. Additionally, the commit removes
the `verification.py` file, which contained a `VerificationManager`
class for verifying applied permissions, scope ACLs, roles, and
entitlements for various objects in a Databricks workspace. A sketch
contrasting the two mocking styles follows this entry.
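The difference between the two mocking styles, in a short sketch:

```python
from unittest.mock import MagicMock, create_autospec

from databricks.sdk import WorkspaceClient

loose = MagicMock()
loose.no_such_method()  # silently succeeds - the fragile pattern the checker flags

strict = create_autospec(WorkspaceClient)
strict.clusters.list.return_value = []  # same attribute surface as the real client
try:
    strict.no_such_method()
except AttributeError:
    pass  # the autospec'd mock rejects attributes WorkspaceClient does not have
```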
* Removed `mocker.patch("databricks...)` from `test_cli`
([#1047](#1047)). In this
release, we have made significant updates to the library's handling of
Azure and AWS workspaces. We have added new parameters
`azure_resource_permissions` and `aws_permissions` to the
`_execute_for_cloud` function in `cli.py`, which are passed to the
`func_azure` and `func_aws` functions respectively. The
`create_uber_principal` and `principal_prefix_access` commands have also
been updated to include these new parameters. Additionally, the
`_azure_setup_uber_principal` and `_aws_setup_uber_principal` functions
have been updated to accept the new `azure_resource_permissions` and
`aws_resource_permissions` parameters. The
`_azure_principal_prefix_access` and `_aws_principal_prefix_access`
functions have also been updated similarly. We have also introduced a
new `aws_resources` parameter in the `migrate_credentials` command,
which is used to migrate Azure Service Principals in ADLS Gen2 locations
to UC storage credentials. In terms of testing, we have replaced the
`mocker.patch` calls with the creation of `AzureResourcePermissions` and
`AWSResourcePermissions` objects, improving the code's readability and
maintainability. Overall, these changes significantly enhance the
library's functionality and maintainability in handling Azure and AWS
workspaces.
* Require Hatch v1.9.4 on build machines
([#1049](#1049)). In this
release, we have updated the Hatch package version to 1.9.4 on build
machines, addressing issue
[#1049](#1049). The changes
include updating the toolchain dependencies and setup in the
`.codegen.json` file, which simplifies the setup process and now relies
on a pre-existing Hatch environment and Python 3. The acceptance
workflow has also been updated to use the latest version of Hatch and
the `databrickslabs/sandbox/acceptance` GitHub action version `v0.1.4`.
Hatch is a Python package manager that simplifies package development
and management, and this update provides new features and bug fixes that
can help improve the reliability and performance of the acceptance
workflow. The change affects only the build process, not the
functionality of the project itself; anyone building the project should
ensure that Hatch 1.9.4 is available on their build machines.
* Set acceptance tests to timeout after 45 minutes
([#1036](#1036)). As part of
issue [#1036](#1036), the
acceptance tests in this open-source library now have a 45-minute
timeout configured, improving the reliability and stability of the
testing environment. This change has been implemented in the
`.github/workflows/acceptance.yml` file by adding the `timeout`
parameter to the step where the `databrickslabs/sandbox/acceptance`
action is called. This ensures that the acceptance tests will not run
indefinitely and prevents any potential issues caused by long-running
tests: any run exceeding 45 minutes is now terminated rather than left
hanging, giving a more stable and predictable testing environment.
* Updated databricks-labs-blueprint requirement from ~0.4.1 to ~0.4.3
([#1058](#1058)). In this
release, the version requirement for the `databricks-labs-blueprint`
library has been updated from ~0.4.1 to ~0.4.3 in the pyproject.toml
file. This change is necessary to support issues
[#1056](#1056) and
[#1057](#1057). The change has
been manually tested; before a production rollout, the updated
`databricks-labs-blueprint` version should still be exercised against
the existing codebase through unit tests, integration tests, and
verification on the staging environment. This modification lets the
software use the latest version of the library, picking up its fixes
and improvements.
* Use `MockPrompts.extend()` functionality in test_install to supply
multiple prompts
([#1057](#1057)). This diff
introduces the `MockPrompts.extend()` functionality in the
`test_install` module to enable the supplying of multiple prompts for
testing purposes. A new `base_prompts` dictionary with default prompts
has been added and is extended with additional prompts for specific test
cases. This allows for the testing of various scenarios, such as when
UCX is already installed on the workspace and the user is prompted to
choose between global or user installation. Additionally, new
`force_user_environ` and `force_global_env` dictionaries have been added
to simulate different installation environments. The functionality of
the `WorkspaceInstaller` class and mocking of `webbrowser.open` are also
utilized in the test cases. These changes aim to ensure the proper
functioning of the configuration process for different installation
scenarios. A sketch of the prompt-extension pattern follows this entry.
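A short sketch of the prompt-extension pattern; the question patterns
and answers here are illustrative, not the exact prompts from
`test_install`:

```python
from databricks.labs.blueprint.tui import MockPrompts

# Shared answers reused by every installer test
base_prompts = MockPrompts({
    r".*PRO or SERVERLESS SQL warehouse.*": "1",
    r"Open config file in the browser.*": "no",
})

# One scenario extends the base with the reinstall confirmation prompt
prompts = base_prompts.extend({
    r".*UCX is already installed on this workspace.*": "yes",
})
```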