-
Notifications
You must be signed in to change notification settings - Fork 80
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Migrate Azure Service Principals with secrete stored in Databricks Secret to UC Storage Credentials #874
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #874 +/- ##
==========================================
+ Coverage 87.51% 87.87% +0.36%
==========================================
Files 44 45 +1
Lines 5341 5495 +154
Branches 954 983 +29
==========================================
+ Hits 4674 4829 +155
+ Misses 452 448 -4
- Partials 215 218 +3 ☔ View full report in Codecov by Sentry. |
@@ -534,3 +534,13 @@ def _get_storage_accounts(self) -> list[str]: | |||
if storage_acct not in storage_accounts: | |||
storage_accounts.append(storage_acct) | |||
return storage_accounts | |||
|
|||
|
|||
def load_spn_permission(self, customized_csv: str) -> list[StoragePermissionMapping]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def load_spn_permission(self, customized_csv: str) -> list[StoragePermissionMapping]: | |
def load(self) -> list[StoragePermissionMapping]: |
It's already clear what this method is about
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
src/databricks/labs/ucx/cli.py
Outdated
""" | ||
logger.info("Running migrate_azure_service_principals command") | ||
prompts = Prompts() | ||
if not w.config.is_azure: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move all this logic into a method of migration class. Inject prompts to it. Unit test with MockPrompts
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done. Will add unit test later
src/databricks/labs/ucx/cli.py
Outdated
|
||
service_principal_migration = AzureServicePrincipalMigration.for_cli(w) | ||
action_plan_file = service_principal_migration.generate_migration_list() | ||
logger.info("Azure Service Principals subject for migration are checked") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Redundant 🤷🏻♂️
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
refactored the code and move them away from cli.
src/databricks/labs/ucx/cli.py
Outdated
return | ||
|
||
service_principal_migration = AzureServicePrincipalMigration.for_cli(w) | ||
action_plan_file = service_principal_migration.generate_migration_list() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we even need this action plan? Get rid of it.
Print the mapping to standard output. But not the full mapping, but only what's usable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
continue | ||
|
||
try: | ||
secret_response = self._ws.secrets.get_secret(azure_sp_info.secret_scope, azure_sp_info.secret_key) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you 1000% sure this call won't fail outside DBR? Did you test it on labs-azure? When I implemented dbutils in the sdk, it wasn't the case.
Create a new task/workflow in runtime.py and call this method of migration from there (within a job).
Then inject WorkspaceInstallation into this class and ".run_workflow()" it from here.
@nkvuong this is the context for you why you need to fix the crawler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is correct - You create secrets using the REST API or CLI, but you must use the Secrets utility (dbutils.secrets) in a notebook or job to read a secret.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@qziyuan That workspace probably has a custom setting enabled. Try it on labs-azure workspace.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nfx , just tried the labs-azure workspace, same behavior:
f"client_secret for {azure_sp_info.application_id}. Will not reuse this client_secret") | ||
continue | ||
except PermissionDenied: | ||
logger.info(f"User does not have permission to read secret value for {azure_sp_info.secret_scope}.{azure_sp_info.secret_key}. " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logger.info(f"User does not have permission to read secret value for {azure_sp_info.secret_scope}.{azure_sp_info.secret_key}. " | |
logger.error (f"User does not have permission to read secret value for {azure_sp_info.secret_scope}.{azure_sp_info.secret_key}. " |
That's silly...
continue | ||
except PermissionDenied: | ||
logger.info(f"User does not have permission to read secret value for {azure_sp_info.secret_scope}.{azure_sp_info.secret_key}. " | ||
f"Cannot fetch the service principal client_secret for {azure_sp_info.application_id}. " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, why do we log this particular case? I think we should rethrow, as UCX is run as an workspace admin
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just in case we allow non admin users to install ucx to run some cli command in the future and then suddenly executed this command. (this may be a case when we work on #392, based on my experience workspace admin don't want to touch user's code, and it's highly possible they will push it down to the code owners to execute the command we may develop for #392).
azure_sp_infos = self._azure_sp_crawler.snapshot() | ||
|
||
for azure_sp_info in azure_sp_infos: | ||
if azure_sp_info.secret_scope is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Invert the logic, this method is too long: given the spn info instance, retrieve a secret from it and return as string
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
split it into two functions now
|
||
|
||
def _save_action_plan(self, sp_list_with_secret) -> str | None: | ||
# save action plan to a file for customer to review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may need that file, but for the workflow to pickup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so we may want the workflow to execute the migration, not just using cli?
For now, I just removed the save action plan code, just print the info to console for user to confirm.
:return: | ||
""" | ||
# load sp list from azure_storage_account_info.csv | ||
loaded_sp_list = self._azure_resource_permissions.load_spn_permission() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does "loaded" mean here? 😝
Just pushed some new commits not related to above review comments. Will address above comments later. |
|
GitGuardian id | GitGuardian status | Secret | Commit | Filename | |
---|---|---|---|---|---|
9523537 | Triggered | Generic High Entropy Secret | 16de683 | tests/unit/migration/test_azure_credentials.py | View secret |
9523537 | Triggered | Generic High Entropy Secret | d856520 | tests/unit/migration/test_azure_credentials.py | View secret |
9523537 | Triggered | Generic High Entropy Secret | 1397b44 | tests/unit/azure/test_credentials.py | View secret |
9523537 | Triggered | Generic High Entropy Secret | 9f6cfc9 | tests/unit/migration/test_azure_credentials.py | View secret |
9523537 | Triggered | Generic High Entropy Secret | 0a7413c | tests/unit/azure/test_credentials.py | View secret |
9523537 | Triggered | Generic High Entropy Secret | ea73a1d | tests/unit/azure/test_azure_credentials.py | View secret |
9523537 | Triggered | Generic High Entropy Secret | d856520 | tests/unit/migration/test_azure_credentials.py | View secret |
9523537 | Triggered | Generic High Entropy Secret | 1397b44 | tests/unit/azure/test_credentials.py | View secret |
9523537 | Triggered | Generic High Entropy Secret | d856520 | tests/unit/migration/test_azure_credentials.py | View secret |
9523537 | Triggered | Generic High Entropy Secret | 1397b44 | tests/unit/azure/test_credentials.py | View secret |
9523537 | Triggered | Generic High Entropy Secret | d856520 | tests/unit/migration/test_azure_credentials.py | View secret |
9523537 | Triggered | Generic High Entropy Secret | 1397b44 | tests/unit/azure/test_credentials.py | View secret |
9523537 | Triggered | Generic High Entropy Secret | f396d22 | tests/unit/azure/test_credentials.py | View secret |
9523537 | Triggered | Generic High Entropy Secret | 1397b44 | tests/unit/azure/test_credentials.py | View secret |
9523537 | Triggered | Generic High Entropy Secret | d856520 | tests/unit/migration/test_azure_credentials.py | View secret |
9523537 | Triggered | Generic High Entropy Secret | 1fa336c | tests/unit/migration/test_azure_credentials.py | View secret |
9523537 | Triggered | Generic High Entropy Secret | 1397b44 | tests/unit/azure/test_credentials.py | View secret |
9523537 | Triggered | Generic High Entropy Secret | 16de683 | tests/unit/migration/test_azure_credentials.py | View secret |
9523537 | Triggered | Generic High Entropy Secret | f396d22 | tests/unit/azure/test_credentials.py | View secret |
9523537 | Triggered | Generic High Entropy Secret | b365488 | tests/unit/migration/test_azure_credentials.py | View secret |
9523537 | Triggered | Generic High Entropy Secret | c317767 | tests/unit/azure/test_azure_credentials.py | View secret |
9523537 | Triggered | Generic High Entropy Secret | cff18fa | tests/unit/azure/test_credentials.py | View secret |
9523538 | Triggered | Generic High Entropy Secret | 1397b44 | tests/unit/azure/test_credentials.py | View secret |
9523538 | Triggered | Generic High Entropy Secret | f396d22 | tests/unit/azure/test_credentials.py | View secret |
9523538 | Triggered | Generic High Entropy Secret | d856520 | tests/unit/migration/test_azure_credentials.py | View secret |
9523538 | Triggered | Generic High Entropy Secret | 9f6cfc9 | tests/unit/migration/test_azure_credentials.py | View secret |
9523538 | Triggered | Generic High Entropy Secret | 1397b44 | tests/unit/azure/test_credentials.py | View secret |
9523538 | Triggered | Generic High Entropy Secret | f396d22 | tests/unit/azure/test_credentials.py | View secret |
9523538 | Triggered | Generic High Entropy Secret | d856520 | tests/unit/migration/test_azure_credentials.py | View secret |
🛠 Guidelines to remediate hardcoded secrets
- Understand the implications of revoking this secret by investigating where it is used in your code.
- Replace and store your secrets safely. Learn here the best practices.
- Revoke and rotate these secrets.
- If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.
To avoid such incidents in the future consider
- following these best practices for managing and storing secrets including API keys and other credentials
- install secret detection on pre-commit to catch secret before it leaves your machine and ease remediation.
🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.
Our GitHub checks need improvements? Share your feedbacks!
9c6d392
to
bd5e0f6
Compare
2bbd2ee
to
f9bfb5c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First pass
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
many security and maintainability issues
Refactored the code and integration tests to improve the testability and readability. Will work on unit tests and other comments tomorrow. |
@qziyuan Create a dataclass without a secret field in it and save that |
…and replace it with MockInstallation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Found a bug. Last one
tests/unit/azure/test_credentials.py
Outdated
|
||
|
||
def test_validate_storage_credentials(credential_manager): | ||
service_principal = MagicMock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you need to mock if we have a dataclass?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I use mock just to save lines of code populating those dataclass fields that are not actually used. In this case, only the privilege field is needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's incorrect. You mock behavior, not data. Dataclasses are data. Mocking them is a bug, because these tests won't catch a bug anymore when someone else changes your code
…mock for dataclass in unit test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lgtm
…cret to UC Storage Credentials (#874)
* Added `upgraded_from_workspace_id` property to migrated tables to indicated the source workspace ([#987](#987)). In this release, updates have been made to the `_migrate_external_table`, `_migrate_dbfs_root_table`, and `_migrate_view` methods in the `table_migrate.py` file to include a new parameter `upgraded_from_ws` in the SQL commands used to alter tables, views, or managed tables. This parameter is used to store the source workspace ID in the migrated tables, indicating the migration origin. A new utility method `sql_alter_from` has been added to the `Table` class in `tables.py` to generate the SQL command with the new parameter. Additionally, a new class-level attribute `UPGRADED_FROM_WS_PARAM` has been added to the `Table` class in `tables.py` to indicate the source workspace. A new property `upgraded_from_workspace_id` has been added to migrated tables to store the source workspace ID. These changes resolve issue [#899](#899) and are tested through manual testing, unit tests, and integration tests. No new CLI commands, workflows, or tables have been added or modified, and there are no changes to user documentation. * Added a command to create account level groups if they do not exist ([#763](#763)). This commit introduces a new feature that enables the creation of account-level groups if they do not already exist in the account. A new command, `create-account-groups`, has been added to the `databricks labs ucx` tool, which crawls all workspaces in the account and creates account-level groups if a corresponding workspace-local group is not found. The feature supports various scenarios, including creating account-level groups that exist in some workspaces but not in others, and creating multiple account-level groups with the same name but different members. Several new methods have been added to the `account.py` file to support the new feature, and the `test_account.py` file has been updated with new tests to ensure the correct behavior of the `create_account_level_groups` method. Additionally, the `cli.py` file has been updated to include the new `create-account-groups` command. With these changes, users can easily manage account-level groups and ensure that they are consistent across all workspaces in the account, improving the overall user experience. * Added assessment for the incompatible `RunSubmit` API usages ([#849](#849)). In this release, the assessment functionality for incompatible `RunSubmit` API usages has been significantly enhanced through various changes. The 'clusters.py' file has seen improvements in clarity and consistency with the renaming of private methods `check_spark_conf` to `_check_spark_conf` and `check_cluster_failures` to `_check_cluster_failures`. The `_assess_clusters` method has been updated to call the renamed `_check_cluster_failures` method for thorough checks of cluster configurations, resulting in better assessment functionality. A new `SubmitRunsCrawler` class has been added to the `databricks.labs.ucx.assessment.jobs` module, implementing `CrawlerBase`, `JobsMixin`, and `CheckClusterMixin` classes. This class crawls and assesses job runs based on their submitted runs, ensuring compatibility and identifying failure issues. Additionally, a new configuration attribute, `num_days_submit_runs_history`, has been introduced in the `WorkspaceConfig` class of the `config.py` module, controlling the number of days for which submission history of `RunSubmit` API calls is retained. Lastly, various new JSON files have been added for unit testing, assessing the `RunSubmit` API usages related to different scenarios like dbt task runs, Git source-based job runs, JAR file runs, and more. These tests will aid in identifying and addressing potential compatibility issues with the `RunSubmit` API. * Added group members difference to the output of `validate-groups-membership` cli command ([#995](#995)). The `validate-groups-membership` command has been updated to include a comparison of group memberships at both the account and workspace levels. This enhancement is implemented through the `validate_group_membership` function, which has been updated to calculate the difference in members between the two levels and display it in a new `group_members_difference` column. This allows for a more detailed analysis of group memberships and easily identifies any discrepancies between the account and workspace levels. The corresponding unit test file, "test_groups.py," has been updated to include a new test case that verifies the calculation of the `group_members_difference` value. The functionality of the other commands remains unchanged. The new `group_members_difference` value is calculated as the difference in the number of members in the workspace group and the account group, with a positive value indicating more members in the workspace group and a negative value indicating more members in the account group. The table template in the labs.yml file has also been updated to include the new column for the group membership difference. * Added handling for empty `directory_id` if managed identity encountered during the crawling of StoragePermissionMapping ([#986](#986)). This PR adds a `type` field to the `StoragePermissionMapping` and `Principal` dataclasses to differentiate between service principals and managed identities, allowing `None` for the `directory_id` field if the principal is not a service principal. During the migration to UC storage credentials, managed identities are currently ignored. These changes improve handling of managed identities during the crawling of `StoragePermissionMapping`, prevent errors when creating storage credentials with managed identities, and address issue [#339](#339). The changes are tested through unit tests, manual testing, and integration tests, and only affect the `StoragePermissionMapping` class and related methods, without introducing new commands, workflows, or tables. * Added migration for Azure Service Principals with secrets stored in Databricks Secret to UC Storage Credentials ([#874](#874)). In this release, we have made significant updates to migrate Azure Service Principals with their secrets stored in Databricks Secret to UC Storage Credentials, enhancing security and management of storage access. The changes include: Addition of a new `migrate_credentials` command in the `labs.yml` file to migrate credentials for storage access to UC storage credential. Modification of `secrets.py` to handle the case where a secret has been removed from the backend and to log warning messages for secrets with invalid Base64 bytes. Introduction of the `StorageCredentialManager` and `ServicePrincipalMigration` classes in `credentials.py` to manage Azure Service Principals and their associated client secrets, and to migrate them to UC Storage Credentials. Addition of a new `directory_id` attribute in the `Principal` class and its associated dataclass in `resources.py` to store the directory ID for creating UC storage credentials using a service principal. Creation of a new pytest fixture, `make_storage_credential_spn`, in `fixtures.py` to simplify writing tests requiring Databricks Storage Credentials with Azure Service Principal auth. Addition of a new test file for the Azure integration of the project, including new classes, methods, and test cases for testing the migration of Azure Service Principals to UC Storage Credentials. These improvements will ensure better security and management of storage access using Azure Service Principals, while providing more efficient and robust testing capabilities. * Added permission migration support for feature tables and the root permissions for models and feature tables ([#997](#997)). This commit introduces support for migration of permissions related to feature tables and sets root permissions for models and feature tables. New functions such as `feature_store_listing`, `feature_tables_root_page`, `models_root_page`, and `tokens_and_passwords` have been added to facilitate population of a workspace access page with necessary permissions information. The `factory` function in `manager.py` has been updated to include new listings for models' root page, feature tables' root page, and the feature store for enhanced management and access control of models and feature tables. New classes and methods have been implemented to handle permissions for these resources, utilizing `GenericPermissionsSupport`, `AccessControlRequest`, and `MigratedGroup` classes. Additionally, new test methods have been included to verify feature tables listing functionality and root page listing functionality for feature tables and registered models. The test manager method has been updated to include `feature-tables` in the list of items to be checked for permissions, ensuring comprehensive testing of permission functionality related to these new feature tables. * Added support for serving endpoints ([#990](#990)). In this release, we have made significant enhancements to support serving endpoints in our open-source library. The `fixtures.py` file in the `databricks.labs.ucx.mixins` module has been updated with new classes and functions to create and manage serving endpoints, accompanied by integration tests to verify their functionality. We have added a new listing for serving endpoints in the assessment's permissions crawling, using the `ws.serving_endpoints.list` function and the `serving-endpoints` category. A new integration test, "test_endpoints," has been added to verify that assessments now crawl permissions for serving endpoints. This test demonstrates the ability to migrate permissions from one group to another. The test suite has been updated to ensure the proper functioning of the new feature and improve the assessment of permissions for serving endpoints, ensuring compatibility with the updated `test_manager.py` file. * Expanded end-user documentation with detailed descriptions for workflows and commands ([#999](#999)). The Databricks Labs UCX project has been updated with several new features to assist in upgrading to Unity Catalog, including an assessment workflow that generates a detailed compatibility report for workspace entities, a group migration workflow for upgrading all Databricks workspace assets, and utility commands for managing cross-workspace installations. The Assessment Report now includes a more detailed summary of the assessment findings, table counts, database summaries, and external locations. Additional improvements include expanded workspace group migration to handle potential conflicts with locally scoped group names, enhanced documentation for external Hive Metastore integration, a new debugging notebook, and detailed descriptions of table upgrade considerations, data access permissions, external storage, and table crawler. * Fixed `config.yml` upgrade from very old versions ([#984](#984)). In this release, we've introduced enhancements to the configuration upgrading process for `config.yml` in our open-source library. We've replaced the previous `v1_migrate` class method with a new implementation that specifically handles migration from version 1. The new method retrieves the `groups` field, extracts the `selected` value, and assigns it to the `include_group_names` key in the configuration. The `backup_group_prefix` value from the `groups` field is assigned to the `renamed_group_prefix` key, and the `groups` field is removed, with the version number updated to 2. These changes simplify the code and improve readability, enabling users to upgrade smoothly from version 1 of the configuration. Furthermore, we've added new unit tests to the `test_config.py` file to ensure backward compatibility. Two new tests, `test_v1_migrate_zeroconf` and `test_v1_migrate_some_conf`, have been added, utilizing the `MockInstallation` class and loading the configuration using `WorkspaceConfig`. These tests enhance the robustness and reliability of the migration process for `config.yml`. * Renamed columns in assessment SQL queries to use actual names, not aliases ([#983](#983)). In this update, we have resolved an issue where aliases used for column references in SQL queries caused errors in certain setups by renaming them to use actual names. Specifically, for assessment SQL queries, we have modified the definition of the `is_delta` column to use the actual `table_format` name instead of the alias `format`. This change improves compatibility and enhances the reliability of query execution. As a software engineer, you will appreciate that this modification ensures consistent interpretation of column references across various setups, thereby avoiding potential errors caused by aliases. This change does not introduce any new methods, but instead modifies existing functionality to use actual column names, ensuring a more reliable and consistent SQL query for the `05_0_all_tables` assessment. * Updated groups permissions validation to use Table ACL cluster ([#979](#979)). In this update, the `validate_groups_permissions` task has been modified to utilize the Table ACL cluster, as indicated by the inclusion of `job_cluster="tacl"`. This task is responsible for ensuring that all crawled permissions are accurately applied to the destination groups by calling the `permission_manager.apply_group_permissions` method during the migration state. This modification enhances the validation of group permissions by performing it on the Table ACL cluster, potentially improving performance or functionality. If you are implementing this project, it is crucial to comprehend the consequences of this change on your permissions validation process and adjust your workflows appropriately.
* Added `upgraded_from_workspace_id` property to migrated tables to indicated the source workspace ([#987](#987)). In this release, updates have been made to the `_migrate_external_table`, `_migrate_dbfs_root_table`, and `_migrate_view` methods in the `table_migrate.py` file to include a new parameter `upgraded_from_ws` in the SQL commands used to alter tables, views, or managed tables. This parameter is used to store the source workspace ID in the migrated tables, indicating the migration origin. A new utility method `sql_alter_from` has been added to the `Table` class in `tables.py` to generate the SQL command with the new parameter. Additionally, a new class-level attribute `UPGRADED_FROM_WS_PARAM` has been added to the `Table` class in `tables.py` to indicate the source workspace. A new property `upgraded_from_workspace_id` has been added to migrated tables to store the source workspace ID. These changes resolve issue [#899](#899) and are tested through manual testing, unit tests, and integration tests. No new CLI commands, workflows, or tables have been added or modified, and there are no changes to user documentation. * Added a command to create account level groups if they do not exist ([#763](#763)). This commit introduces a new feature that enables the creation of account-level groups if they do not already exist in the account. A new command, `create-account-groups`, has been added to the `databricks labs ucx` tool, which crawls all workspaces in the account and creates account-level groups if a corresponding workspace-local group is not found. The feature supports various scenarios, including creating account-level groups that exist in some workspaces but not in others, and creating multiple account-level groups with the same name but different members. Several new methods have been added to the `account.py` file to support the new feature, and the `test_account.py` file has been updated with new tests to ensure the correct behavior of the `create_account_level_groups` method. Additionally, the `cli.py` file has been updated to include the new `create-account-groups` command. With these changes, users can easily manage account-level groups and ensure that they are consistent across all workspaces in the account, improving the overall user experience. * Added assessment for the incompatible `RunSubmit` API usages ([#849](#849)). In this release, the assessment functionality for incompatible `RunSubmit` API usages has been significantly enhanced through various changes. The 'clusters.py' file has seen improvements in clarity and consistency with the renaming of private methods `check_spark_conf` to `_check_spark_conf` and `check_cluster_failures` to `_check_cluster_failures`. The `_assess_clusters` method has been updated to call the renamed `_check_cluster_failures` method for thorough checks of cluster configurations, resulting in better assessment functionality. A new `SubmitRunsCrawler` class has been added to the `databricks.labs.ucx.assessment.jobs` module, implementing `CrawlerBase`, `JobsMixin`, and `CheckClusterMixin` classes. This class crawls and assesses job runs based on their submitted runs, ensuring compatibility and identifying failure issues. Additionally, a new configuration attribute, `num_days_submit_runs_history`, has been introduced in the `WorkspaceConfig` class of the `config.py` module, controlling the number of days for which submission history of `RunSubmit` API calls is retained. Lastly, various new JSON files have been added for unit testing, assessing the `RunSubmit` API usages related to different scenarios like dbt task runs, Git source-based job runs, JAR file runs, and more. These tests will aid in identifying and addressing potential compatibility issues with the `RunSubmit` API. * Added group members difference to the output of `validate-groups-membership` cli command ([#995](#995)). The `validate-groups-membership` command has been updated to include a comparison of group memberships at both the account and workspace levels. This enhancement is implemented through the `validate_group_membership` function, which has been updated to calculate the difference in members between the two levels and display it in a new `group_members_difference` column. This allows for a more detailed analysis of group memberships and easily identifies any discrepancies between the account and workspace levels. The corresponding unit test file, "test_groups.py," has been updated to include a new test case that verifies the calculation of the `group_members_difference` value. The functionality of the other commands remains unchanged. The new `group_members_difference` value is calculated as the difference in the number of members in the workspace group and the account group, with a positive value indicating more members in the workspace group and a negative value indicating more members in the account group. The table template in the labs.yml file has also been updated to include the new column for the group membership difference. * Added handling for empty `directory_id` if managed identity encountered during the crawling of StoragePermissionMapping ([#986](#986)). This PR adds a `type` field to the `StoragePermissionMapping` and `Principal` dataclasses to differentiate between service principals and managed identities, allowing `None` for the `directory_id` field if the principal is not a service principal. During the migration to UC storage credentials, managed identities are currently ignored. These changes improve handling of managed identities during the crawling of `StoragePermissionMapping`, prevent errors when creating storage credentials with managed identities, and address issue [#339](#339). The changes are tested through unit tests, manual testing, and integration tests, and only affect the `StoragePermissionMapping` class and related methods, without introducing new commands, workflows, or tables. * Added migration for Azure Service Principals with secrets stored in Databricks Secret to UC Storage Credentials ([#874](#874)). In this release, we have made significant updates to migrate Azure Service Principals with their secrets stored in Databricks Secret to UC Storage Credentials, enhancing security and management of storage access. The changes include: Addition of a new `migrate_credentials` command in the `labs.yml` file to migrate credentials for storage access to UC storage credential. Modification of `secrets.py` to handle the case where a secret has been removed from the backend and to log warning messages for secrets with invalid Base64 bytes. Introduction of the `StorageCredentialManager` and `ServicePrincipalMigration` classes in `credentials.py` to manage Azure Service Principals and their associated client secrets, and to migrate them to UC Storage Credentials. Addition of a new `directory_id` attribute in the `Principal` class and its associated dataclass in `resources.py` to store the directory ID for creating UC storage credentials using a service principal. Creation of a new pytest fixture, `make_storage_credential_spn`, in `fixtures.py` to simplify writing tests requiring Databricks Storage Credentials with Azure Service Principal auth. Addition of a new test file for the Azure integration of the project, including new classes, methods, and test cases for testing the migration of Azure Service Principals to UC Storage Credentials. These improvements will ensure better security and management of storage access using Azure Service Principals, while providing more efficient and robust testing capabilities. * Added permission migration support for feature tables and the root permissions for models and feature tables ([#997](#997)). This commit introduces support for migration of permissions related to feature tables and sets root permissions for models and feature tables. New functions such as `feature_store_listing`, `feature_tables_root_page`, `models_root_page`, and `tokens_and_passwords` have been added to facilitate population of a workspace access page with necessary permissions information. The `factory` function in `manager.py` has been updated to include new listings for models' root page, feature tables' root page, and the feature store for enhanced management and access control of models and feature tables. New classes and methods have been implemented to handle permissions for these resources, utilizing `GenericPermissionsSupport`, `AccessControlRequest`, and `MigratedGroup` classes. Additionally, new test methods have been included to verify feature tables listing functionality and root page listing functionality for feature tables and registered models. The test manager method has been updated to include `feature-tables` in the list of items to be checked for permissions, ensuring comprehensive testing of permission functionality related to these new feature tables. * Added support for serving endpoints ([#990](#990)). In this release, we have made significant enhancements to support serving endpoints in our open-source library. The `fixtures.py` file in the `databricks.labs.ucx.mixins` module has been updated with new classes and functions to create and manage serving endpoints, accompanied by integration tests to verify their functionality. We have added a new listing for serving endpoints in the assessment's permissions crawling, using the `ws.serving_endpoints.list` function and the `serving-endpoints` category. A new integration test, "test_endpoints," has been added to verify that assessments now crawl permissions for serving endpoints. This test demonstrates the ability to migrate permissions from one group to another. The test suite has been updated to ensure the proper functioning of the new feature and improve the assessment of permissions for serving endpoints, ensuring compatibility with the updated `test_manager.py` file. * Expanded end-user documentation with detailed descriptions for workflows and commands ([#999](#999)). The Databricks Labs UCX project has been updated with several new features to assist in upgrading to Unity Catalog, including an assessment workflow that generates a detailed compatibility report for workspace entities, a group migration workflow for upgrading all Databricks workspace assets, and utility commands for managing cross-workspace installations. The Assessment Report now includes a more detailed summary of the assessment findings, table counts, database summaries, and external locations. Additional improvements include expanded workspace group migration to handle potential conflicts with locally scoped group names, enhanced documentation for external Hive Metastore integration, a new debugging notebook, and detailed descriptions of table upgrade considerations, data access permissions, external storage, and table crawler. * Fixed `config.yml` upgrade from very old versions ([#984](#984)). In this release, we've introduced enhancements to the configuration upgrading process for `config.yml` in our open-source library. We've replaced the previous `v1_migrate` class method with a new implementation that specifically handles migration from version 1. The new method retrieves the `groups` field, extracts the `selected` value, and assigns it to the `include_group_names` key in the configuration. The `backup_group_prefix` value from the `groups` field is assigned to the `renamed_group_prefix` key, and the `groups` field is removed, with the version number updated to 2. These changes simplify the code and improve readability, enabling users to upgrade smoothly from version 1 of the configuration. Furthermore, we've added new unit tests to the `test_config.py` file to ensure backward compatibility. Two new tests, `test_v1_migrate_zeroconf` and `test_v1_migrate_some_conf`, have been added, utilizing the `MockInstallation` class and loading the configuration using `WorkspaceConfig`. These tests enhance the robustness and reliability of the migration process for `config.yml`. * Renamed columns in assessment SQL queries to use actual names, not aliases ([#983](#983)). In this update, we have resolved an issue where aliases used for column references in SQL queries caused errors in certain setups by renaming them to use actual names. Specifically, for assessment SQL queries, we have modified the definition of the `is_delta` column to use the actual `table_format` name instead of the alias `format`. This change improves compatibility and enhances the reliability of query execution. As a software engineer, you will appreciate that this modification ensures consistent interpretation of column references across various setups, thereby avoiding potential errors caused by aliases. This change does not introduce any new methods, but instead modifies existing functionality to use actual column names, ensuring a more reliable and consistent SQL query for the `05_0_all_tables` assessment. * Updated groups permissions validation to use Table ACL cluster ([#979](#979)). In this update, the `validate_groups_permissions` task has been modified to utilize the Table ACL cluster, as indicated by the inclusion of `job_cluster="tacl"`. This task is responsible for ensuring that all crawled permissions are accurately applied to the destination groups by calling the `permission_manager.apply_group_permissions` method during the migration state. This modification enhances the validation of group permissions by performing it on the Table ACL cluster, potentially improving performance or functionality. If you are implementing this project, it is crucial to comprehend the consequences of this change on your permissions validation process and adjust your workflows appropriately.
* Added `upgraded_from_workspace_id` property to migrated tables to indicated the source workspace ([#987](#987)). In this release, updates have been made to the `_migrate_external_table`, `_migrate_dbfs_root_table`, and `_migrate_view` methods in the `table_migrate.py` file to include a new parameter `upgraded_from_ws` in the SQL commands used to alter tables, views, or managed tables. This parameter is used to store the source workspace ID in the migrated tables, indicating the migration origin. A new utility method `sql_alter_from` has been added to the `Table` class in `tables.py` to generate the SQL command with the new parameter. Additionally, a new class-level attribute `UPGRADED_FROM_WS_PARAM` has been added to the `Table` class in `tables.py` to indicate the source workspace. A new property `upgraded_from_workspace_id` has been added to migrated tables to store the source workspace ID. These changes resolve issue [#899](#899) and are tested through manual testing, unit tests, and integration tests. No new CLI commands, workflows, or tables have been added or modified, and there are no changes to user documentation. * Added a command to create account level groups if they do not exist ([#763](#763)). This commit introduces a new feature that enables the creation of account-level groups if they do not already exist in the account. A new command, `create-account-groups`, has been added to the `databricks labs ucx` tool, which crawls all workspaces in the account and creates account-level groups if a corresponding workspace-local group is not found. The feature supports various scenarios, including creating account-level groups that exist in some workspaces but not in others, and creating multiple account-level groups with the same name but different members. Several new methods have been added to the `account.py` file to support the new feature, and the `test_account.py` file has been updated with new tests to ensure the correct behavior of the `create_account_level_groups` method. Additionally, the `cli.py` file has been updated to include the new `create-account-groups` command. With these changes, users can easily manage account-level groups and ensure that they are consistent across all workspaces in the account, improving the overall user experience. * Added assessment for the incompatible `RunSubmit` API usages ([#849](#849)). In this release, the assessment functionality for incompatible `RunSubmit` API usages has been significantly enhanced through various changes. The 'clusters.py' file has seen improvements in clarity and consistency with the renaming of private methods `check_spark_conf` to `_check_spark_conf` and `check_cluster_failures` to `_check_cluster_failures`. The `_assess_clusters` method has been updated to call the renamed `_check_cluster_failures` method for thorough checks of cluster configurations, resulting in better assessment functionality. A new `SubmitRunsCrawler` class has been added to the `databricks.labs.ucx.assessment.jobs` module, implementing `CrawlerBase`, `JobsMixin`, and `CheckClusterMixin` classes. This class crawls and assesses job runs based on their submitted runs, ensuring compatibility and identifying failure issues. Additionally, a new configuration attribute, `num_days_submit_runs_history`, has been introduced in the `WorkspaceConfig` class of the `config.py` module, controlling the number of days for which submission history of `RunSubmit` API calls is retained. Lastly, various new JSON files have been added for unit testing, assessing the `RunSubmit` API usages related to different scenarios like dbt task runs, Git source-based job runs, JAR file runs, and more. These tests will aid in identifying and addressing potential compatibility issues with the `RunSubmit` API. * Added group members difference to the output of `validate-groups-membership` cli command ([#995](#995)). The `validate-groups-membership` command has been updated to include a comparison of group memberships at both the account and workspace levels. This enhancement is implemented through the `validate_group_membership` function, which has been updated to calculate the difference in members between the two levels and display it in a new `group_members_difference` column. This allows for a more detailed analysis of group memberships and easily identifies any discrepancies between the account and workspace levels. The corresponding unit test file, "test_groups.py," has been updated to include a new test case that verifies the calculation of the `group_members_difference` value. The functionality of the other commands remains unchanged. The new `group_members_difference` value is calculated as the difference in the number of members in the workspace group and the account group, with a positive value indicating more members in the workspace group and a negative value indicating more members in the account group. The table template in the labs.yml file has also been updated to include the new column for the group membership difference. * Added handling for empty `directory_id` if managed identity encountered during the crawling of StoragePermissionMapping ([#986](#986)). This PR adds a `type` field to the `StoragePermissionMapping` and `Principal` dataclasses to differentiate between service principals and managed identities, allowing `None` for the `directory_id` field if the principal is not a service principal. During the migration to UC storage credentials, managed identities are currently ignored. These changes improve handling of managed identities during the crawling of `StoragePermissionMapping`, prevent errors when creating storage credentials with managed identities, and address issue [#339](#339). The changes are tested through unit tests, manual testing, and integration tests, and only affect the `StoragePermissionMapping` class and related methods, without introducing new commands, workflows, or tables. * Added migration for Azure Service Principals with secrets stored in Databricks Secret to UC Storage Credentials ([#874](#874)). In this release, we have made significant updates to migrate Azure Service Principals with their secrets stored in Databricks Secret to UC Storage Credentials, enhancing security and management of storage access. The changes include: Addition of a new `migrate_credentials` command in the `labs.yml` file to migrate credentials for storage access to UC storage credential. Modification of `secrets.py` to handle the case where a secret has been removed from the backend and to log warning messages for secrets with invalid Base64 bytes. Introduction of the `StorageCredentialManager` and `ServicePrincipalMigration` classes in `credentials.py` to manage Azure Service Principals and their associated client secrets, and to migrate them to UC Storage Credentials. Addition of a new `directory_id` attribute in the `Principal` class and its associated dataclass in `resources.py` to store the directory ID for creating UC storage credentials using a service principal. Creation of a new pytest fixture, `make_storage_credential_spn`, in `fixtures.py` to simplify writing tests requiring Databricks Storage Credentials with Azure Service Principal auth. Addition of a new test file for the Azure integration of the project, including new classes, methods, and test cases for testing the migration of Azure Service Principals to UC Storage Credentials. These improvements will ensure better security and management of storage access using Azure Service Principals, while providing more efficient and robust testing capabilities. * Added permission migration support for feature tables and the root permissions for models and feature tables ([#997](#997)). This commit introduces support for migration of permissions related to feature tables and sets root permissions for models and feature tables. New functions such as `feature_store_listing`, `feature_tables_root_page`, `models_root_page`, and `tokens_and_passwords` have been added to facilitate population of a workspace access page with necessary permissions information. The `factory` function in `manager.py` has been updated to include new listings for models' root page, feature tables' root page, and the feature store for enhanced management and access control of models and feature tables. New classes and methods have been implemented to handle permissions for these resources, utilizing `GenericPermissionsSupport`, `AccessControlRequest`, and `MigratedGroup` classes. Additionally, new test methods have been included to verify feature tables listing functionality and root page listing functionality for feature tables and registered models. The test manager method has been updated to include `feature-tables` in the list of items to be checked for permissions, ensuring comprehensive testing of permission functionality related to these new feature tables. * Added support for serving endpoints ([#990](#990)). In this release, we have made significant enhancements to support serving endpoints in our open-source library. The `fixtures.py` file in the `databricks.labs.ucx.mixins` module has been updated with new classes and functions to create and manage serving endpoints, accompanied by integration tests to verify their functionality. We have added a new listing for serving endpoints in the assessment's permissions crawling, using the `ws.serving_endpoints.list` function and the `serving-endpoints` category. A new integration test, "test_endpoints," has been added to verify that assessments now crawl permissions for serving endpoints. This test demonstrates the ability to migrate permissions from one group to another. The test suite has been updated to ensure the proper functioning of the new feature and improve the assessment of permissions for serving endpoints, ensuring compatibility with the updated `test_manager.py` file. * Expanded end-user documentation with detailed descriptions for workflows and commands ([#999](#999)). The Databricks Labs UCX project has been updated with several new features to assist in upgrading to Unity Catalog, including an assessment workflow that generates a detailed compatibility report for workspace entities, a group migration workflow for upgrading all Databricks workspace assets, and utility commands for managing cross-workspace installations. The Assessment Report now includes a more detailed summary of the assessment findings, table counts, database summaries, and external locations. Additional improvements include expanded workspace group migration to handle potential conflicts with locally scoped group names, enhanced documentation for external Hive Metastore integration, a new debugging notebook, and detailed descriptions of table upgrade considerations, data access permissions, external storage, and table crawler. * Fixed `config.yml` upgrade from very old versions ([#984](#984)). In this release, we've introduced enhancements to the configuration upgrading process for `config.yml` in our open-source library. We've replaced the previous `v1_migrate` class method with a new implementation that specifically handles migration from version 1. The new method retrieves the `groups` field, extracts the `selected` value, and assigns it to the `include_group_names` key in the configuration. The `backup_group_prefix` value from the `groups` field is assigned to the `renamed_group_prefix` key, and the `groups` field is removed, with the version number updated to 2. These changes simplify the code and improve readability, enabling users to upgrade smoothly from version 1 of the configuration. Furthermore, we've added new unit tests to the `test_config.py` file to ensure backward compatibility. Two new tests, `test_v1_migrate_zeroconf` and `test_v1_migrate_some_conf`, have been added, utilizing the `MockInstallation` class and loading the configuration using `WorkspaceConfig`. These tests enhance the robustness and reliability of the migration process for `config.yml`. * Renamed columns in assessment SQL queries to use actual names, not aliases ([#983](#983)). In this update, we have resolved an issue where aliases used for column references in SQL queries caused errors in certain setups by renaming them to use actual names. Specifically, for assessment SQL queries, we have modified the definition of the `is_delta` column to use the actual `table_format` name instead of the alias `format`. This change improves compatibility and enhances the reliability of query execution. As a software engineer, you will appreciate that this modification ensures consistent interpretation of column references across various setups, thereby avoiding potential errors caused by aliases. This change does not introduce any new methods, but instead modifies existing functionality to use actual column names, ensuring a more reliable and consistent SQL query for the `05_0_all_tables` assessment. * Updated groups permissions validation to use Table ACL cluster ([#979](#979)). In this update, the `validate_groups_permissions` task has been modified to utilize the Table ACL cluster, as indicated by the inclusion of `job_cluster="tacl"`. This task is responsible for ensuring that all crawled permissions are accurately applied to the destination groups by calling the `permission_manager.apply_group_permissions` method during the migration state. This modification enhances the validation of group permissions by performing it on the Table ACL cluster, potentially improving performance or functionality. If you are implementing this project, it is crucial to comprehend the consequences of this change on your permissions validation process and adjust your workflows appropriately.
add integration tests fix - Fix integration tests on AWS (#978) Update groups permissions validation to use Table ACL cluster (#979) Renamed columns in assessment SQL queries to use actual names, not aliases (#983) <!-- Summary of your changes that are easy to understand. Add screenshots when necessary --> Aliases are usually not allowed in projections (as they are replaced later in the query execution phases). While the DBSQL was smart enough to handle the references via aliases, for some setups this results in an error. Changing column references to use actual names fixes this. <!-- DOC: Link issue with a keyword: close, closes, closed, fix, fixes, fixed, resolve, resolves, resolved. See https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword --> Resolves #980 - [ ] added relevant user documentation - [ ] added new CLI command - [ ] modified existing command: `databricks labs ucx ...` - [ ] added a new workflow - [ ] modified existing workflow: `...` - [ ] added a new table - [ ] modified existing table: `...` <!-- How is this tested? Please see the checklist below and also describe any other relevant tests --> - [x] manually tested - [ ] added unit tests - [ ] added integration tests - [ ] verified on staging environment (screenshot attached) Fixed `config.yml` upgrade from very old versions (#984) Add a command to create account level groups if they do not exist (#763) Attempt to fix - #17 - #649 Adds a command to create groups at account level by crawling all workspaces configured in the account and in scope of the migration This pull request adds several new methods to the `account.py` file in the `databricks/labs/ucx` directory. The main method added is `create_account_level_groups`, which crawls all workspaces in an account and creates account-level groups if a workspace-local group is not present in the account. The method `get_valid_workspaces_groups` is added to retrieve a dictionary of all valid workspace groups, while `has_not_same_members` checks if two groups have the same members. The method `get_account_groups` retrieves a dictionary of all account groups. Regarding the tests, the `test_account.py` file has been updated to include new tests for the `create_account_level_groups` method. The test `test_create_acc_groups_should_create_acc_group_if_no_group_found` verifies that an account-level group is created if no group with the same name is found. The test `test_create_acc_groups_should_filter_groups_in_other_workspaces` checks that the method filters groups present in other workspaces and only creates groups that are not present in the account. Additionally, the `cli.py` file has been updated to include a new command, `create_account_level_groups`, which uploads workspace config to all workspaces in the account where ucx is installed. Added tokei.rs lines of code badge (#988) [![lines of code](https://tokei.rs/b1/github/databrickslabs/ucx)]([https://codecov.io/github/databrickslabs/ucx](https://github.com/databrickslabs/ucx)) Adding support for serving endpoints (#990) Assessment did not crawled permissions for serving endpoints, this PR aims to fix it - [X] added integration tests Added `upgraded_from_workspace_id` property to migrated tables to indicated the source workspace. (#987) Added table parameter `upgraded_from_ws` to migrated tables. The parameters contains the sources workspace id. Resolves #899 - [ ] added relevant user documentation - [ ] added new CLI command - [ ] modified existing command: `databricks labs ucx ...` - [ ] added a new workflow - [ ] modified existing workflow: `...` - [ ] added a new table - [ ] modified existing table: `...` <!-- How is this tested? Please see the checklist below and also describe any other relevant tests --> - [x] manually tested - [x] added unit tests - [x] added integration tests - [x] verified on staging environment (screenshot attached) Handle None directory_id if managed identity encountered during the crawling of StoragePermissionMapping (#986) While creating StoragePermissionMapping, a principal could be managed identity which does not have directory_id. This PR will allow managed identity to be stored in StoragePermissionMapping, and allow None directory_id. <!-- Summary of your changes that are easy to understand. Add screenshots when necessary --> - Add `type` field to dataclass `StoragePermissionMapping` and `Principal` to indicate if a principal is service principal or managed identity. - Allow None `directory_id` if the principal is not a service principal. - Ignore the managed identity while migrating to UC storage credentials for now. <!-- DOC: Link issue with a keyword: close, closes, closed, fix, fixes, fixed, resolve, resolves, resolved. See https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword --> fix #339 - [ ] added relevant user documentation - [ ] added new CLI command - [ ] modified existing command: `databricks labs ucx ...` - [ ] added a new workflow - [ ] modified existing workflow: `...` - [ ] added a new table - [ ] modified existing table: `...` <!-- How is this tested? Please see the checklist below and also describe any other relevant tests --> - [ ] manually tested - [ ] added unit tests - [ ] added integration tests - [ ] verified on staging environment (screenshot attached) Added group members difference to the output of `validate-groups-membership` cli command (#995) The `validate-groups-membership` command has been updated to include a comparison of group memberships at both the account and workspace levels, displaying the difference in members between the two levels in a new column. This enhancement allows for a more detailed analysis of group memberships, with the added functionality implemented in the `validate_group_membership` function in the `groups.py` file located in the `databricks/labs/ucx/workspace_access` directory. A new output field, "group\_members\_difference," has been added to represent the difference in the number of members between a workspace group and an associated account group. The corresponding unit test file, "test\_groups.py," has been updated to include a new test case that verifies the calculation of the "group\_members\_difference" value. This change provides users with a more comprehensive view of their group memberships and allows them to easily identify any discrepancies between the account and workspace levels. The functionality of the other commands remains unchanged. Added permission migration support for feature tables and the root permissions for models and feature tables (#997) Improved installation integration test flakiness (#998) - improved `_infer_error_from_job_run` and `_infer_error_from_task_run` to also catch `KeyError` and `ValueError` - removed retries for `Unknown` errors for installation tests Added assessment for the incompatible `RunSubmit` API usages (#849) Expanded end-user documentation with detailed descriptions for workflows and commands (#999) The Databricks Labs UCX project has been updated with several new features to assist in upgrading to Unity Catalog. These include various workflows and command-line utilities, such as an assessment workflow that generates a detailed compatibility report for workspace entities and a group migration workflow to upgrade all Databricks workspace assets. Additionally, new utility commands have been added for managing cross-workspace installations, and users can now view deployed workflows' status and repair failed workflows. A new end-user documentation has also been introduced, featuring comprehensive descriptions of workflows, commands, and an assessment report image. The Assessment Report, generated from UCX tools, now includes a more detailed summary of the assessment findings, table counts, database summaries, and external locations. Improved documentation for external Hive Metastore integration and a new debugging notebook are also included in this release. Lastly, the workspace group migration feature has been expanded to handle potential conflicts when migrating multiple workspaces with locally scoped group names. Release v0.14.0 (#1000) * Added `upgraded_from_workspace_id` property to migrated tables to indicated the source workspace ([#987](#987)). In this release, updates have been made to the `_migrate_external_table`, `_migrate_dbfs_root_table`, and `_migrate_view` methods in the `table_migrate.py` file to include a new parameter `upgraded_from_ws` in the SQL commands used to alter tables, views, or managed tables. This parameter is used to store the source workspace ID in the migrated tables, indicating the migration origin. A new utility method `sql_alter_from` has been added to the `Table` class in `tables.py` to generate the SQL command with the new parameter. Additionally, a new class-level attribute `UPGRADED_FROM_WS_PARAM` has been added to the `Table` class in `tables.py` to indicate the source workspace. A new property `upgraded_from_workspace_id` has been added to migrated tables to store the source workspace ID. These changes resolve issue [#899](#899) and are tested through manual testing, unit tests, and integration tests. No new CLI commands, workflows, or tables have been added or modified, and there are no changes to user documentation. * Added a command to create account level groups if they do not exist ([#763](#763)). This commit introduces a new feature that enables the creation of account-level groups if they do not already exist in the account. A new command, `create-account-groups`, has been added to the `databricks labs ucx` tool, which crawls all workspaces in the account and creates account-level groups if a corresponding workspace-local group is not found. The feature supports various scenarios, including creating account-level groups that exist in some workspaces but not in others, and creating multiple account-level groups with the same name but different members. Several new methods have been added to the `account.py` file to support the new feature, and the `test_account.py` file has been updated with new tests to ensure the correct behavior of the `create_account_level_groups` method. Additionally, the `cli.py` file has been updated to include the new `create-account-groups` command. With these changes, users can easily manage account-level groups and ensure that they are consistent across all workspaces in the account, improving the overall user experience. * Added assessment for the incompatible `RunSubmit` API usages ([#849](#849)). In this release, the assessment functionality for incompatible `RunSubmit` API usages has been significantly enhanced through various changes. The 'clusters.py' file has seen improvements in clarity and consistency with the renaming of private methods `check_spark_conf` to `_check_spark_conf` and `check_cluster_failures` to `_check_cluster_failures`. The `_assess_clusters` method has been updated to call the renamed `_check_cluster_failures` method for thorough checks of cluster configurations, resulting in better assessment functionality. A new `SubmitRunsCrawler` class has been added to the `databricks.labs.ucx.assessment.jobs` module, implementing `CrawlerBase`, `JobsMixin`, and `CheckClusterMixin` classes. This class crawls and assesses job runs based on their submitted runs, ensuring compatibility and identifying failure issues. Additionally, a new configuration attribute, `num_days_submit_runs_history`, has been introduced in the `WorkspaceConfig` class of the `config.py` module, controlling the number of days for which submission history of `RunSubmit` API calls is retained. Lastly, various new JSON files have been added for unit testing, assessing the `RunSubmit` API usages related to different scenarios like dbt task runs, Git source-based job runs, JAR file runs, and more. These tests will aid in identifying and addressing potential compatibility issues with the `RunSubmit` API. * Added group members difference to the output of `validate-groups-membership` cli command ([#995](#995)). The `validate-groups-membership` command has been updated to include a comparison of group memberships at both the account and workspace levels. This enhancement is implemented through the `validate_group_membership` function, which has been updated to calculate the difference in members between the two levels and display it in a new `group_members_difference` column. This allows for a more detailed analysis of group memberships and easily identifies any discrepancies between the account and workspace levels. The corresponding unit test file, "test_groups.py," has been updated to include a new test case that verifies the calculation of the `group_members_difference` value. The functionality of the other commands remains unchanged. The new `group_members_difference` value is calculated as the difference in the number of members in the workspace group and the account group, with a positive value indicating more members in the workspace group and a negative value indicating more members in the account group. The table template in the labs.yml file has also been updated to include the new column for the group membership difference. * Added handling for empty `directory_id` if managed identity encountered during the crawling of StoragePermissionMapping ([#986](#986)). This PR adds a `type` field to the `StoragePermissionMapping` and `Principal` dataclasses to differentiate between service principals and managed identities, allowing `None` for the `directory_id` field if the principal is not a service principal. During the migration to UC storage credentials, managed identities are currently ignored. These changes improve handling of managed identities during the crawling of `StoragePermissionMapping`, prevent errors when creating storage credentials with managed identities, and address issue [#339](#339). The changes are tested through unit tests, manual testing, and integration tests, and only affect the `StoragePermissionMapping` class and related methods, without introducing new commands, workflows, or tables. * Added migration for Azure Service Principals with secrets stored in Databricks Secret to UC Storage Credentials ([#874](#874)). In this release, we have made significant updates to migrate Azure Service Principals with their secrets stored in Databricks Secret to UC Storage Credentials, enhancing security and management of storage access. The changes include: Addition of a new `migrate_credentials` command in the `labs.yml` file to migrate credentials for storage access to UC storage credential. Modification of `secrets.py` to handle the case where a secret has been removed from the backend and to log warning messages for secrets with invalid Base64 bytes. Introduction of the `StorageCredentialManager` and `ServicePrincipalMigration` classes in `credentials.py` to manage Azure Service Principals and their associated client secrets, and to migrate them to UC Storage Credentials. Addition of a new `directory_id` attribute in the `Principal` class and its associated dataclass in `resources.py` to store the directory ID for creating UC storage credentials using a service principal. Creation of a new pytest fixture, `make_storage_credential_spn`, in `fixtures.py` to simplify writing tests requiring Databricks Storage Credentials with Azure Service Principal auth. Addition of a new test file for the Azure integration of the project, including new classes, methods, and test cases for testing the migration of Azure Service Principals to UC Storage Credentials. These improvements will ensure better security and management of storage access using Azure Service Principals, while providing more efficient and robust testing capabilities. * Added permission migration support for feature tables and the root permissions for models and feature tables ([#997](#997)). This commit introduces support for migration of permissions related to feature tables and sets root permissions for models and feature tables. New functions such as `feature_store_listing`, `feature_tables_root_page`, `models_root_page`, and `tokens_and_passwords` have been added to facilitate population of a workspace access page with necessary permissions information. The `factory` function in `manager.py` has been updated to include new listings for models' root page, feature tables' root page, and the feature store for enhanced management and access control of models and feature tables. New classes and methods have been implemented to handle permissions for these resources, utilizing `GenericPermissionsSupport`, `AccessControlRequest`, and `MigratedGroup` classes. Additionally, new test methods have been included to verify feature tables listing functionality and root page listing functionality for feature tables and registered models. The test manager method has been updated to include `feature-tables` in the list of items to be checked for permissions, ensuring comprehensive testing of permission functionality related to these new feature tables. * Added support for serving endpoints ([#990](#990)). In this release, we have made significant enhancements to support serving endpoints in our open-source library. The `fixtures.py` file in the `databricks.labs.ucx.mixins` module has been updated with new classes and functions to create and manage serving endpoints, accompanied by integration tests to verify their functionality. We have added a new listing for serving endpoints in the assessment's permissions crawling, using the `ws.serving_endpoints.list` function and the `serving-endpoints` category. A new integration test, "test_endpoints," has been added to verify that assessments now crawl permissions for serving endpoints. This test demonstrates the ability to migrate permissions from one group to another. The test suite has been updated to ensure the proper functioning of the new feature and improve the assessment of permissions for serving endpoints, ensuring compatibility with the updated `test_manager.py` file. * Expanded end-user documentation with detailed descriptions for workflows and commands ([#999](#999)). The Databricks Labs UCX project has been updated with several new features to assist in upgrading to Unity Catalog, including an assessment workflow that generates a detailed compatibility report for workspace entities, a group migration workflow for upgrading all Databricks workspace assets, and utility commands for managing cross-workspace installations. The Assessment Report now includes a more detailed summary of the assessment findings, table counts, database summaries, and external locations. Additional improvements include expanded workspace group migration to handle potential conflicts with locally scoped group names, enhanced documentation for external Hive Metastore integration, a new debugging notebook, and detailed descriptions of table upgrade considerations, data access permissions, external storage, and table crawler. * Fixed `config.yml` upgrade from very old versions ([#984](#984)). In this release, we've introduced enhancements to the configuration upgrading process for `config.yml` in our open-source library. We've replaced the previous `v1_migrate` class method with a new implementation that specifically handles migration from version 1. The new method retrieves the `groups` field, extracts the `selected` value, and assigns it to the `include_group_names` key in the configuration. The `backup_group_prefix` value from the `groups` field is assigned to the `renamed_group_prefix` key, and the `groups` field is removed, with the version number updated to 2. These changes simplify the code and improve readability, enabling users to upgrade smoothly from version 1 of the configuration. Furthermore, we've added new unit tests to the `test_config.py` file to ensure backward compatibility. Two new tests, `test_v1_migrate_zeroconf` and `test_v1_migrate_some_conf`, have been added, utilizing the `MockInstallation` class and loading the configuration using `WorkspaceConfig`. These tests enhance the robustness and reliability of the migration process for `config.yml`. * Renamed columns in assessment SQL queries to use actual names, not aliases ([#983](#983)). In this update, we have resolved an issue where aliases used for column references in SQL queries caused errors in certain setups by renaming them to use actual names. Specifically, for assessment SQL queries, we have modified the definition of the `is_delta` column to use the actual `table_format` name instead of the alias `format`. This change improves compatibility and enhances the reliability of query execution. As a software engineer, you will appreciate that this modification ensures consistent interpretation of column references across various setups, thereby avoiding potential errors caused by aliases. This change does not introduce any new methods, but instead modifies existing functionality to use actual column names, ensuring a more reliable and consistent SQL query for the `05_0_all_tables` assessment. * Updated groups permissions validation to use Table ACL cluster ([#979](#979)). In this update, the `validate_groups_permissions` task has been modified to utilize the Table ACL cluster, as indicated by the inclusion of `job_cluster="tacl"`. This task is responsible for ensuring that all crawled permissions are accurately applied to the destination groups by calling the `permission_manager.apply_group_permissions` method during the migration state. This modification enhances the validation of group permissions by performing it on the Table ACL cluster, potentially improving performance or functionality. If you are implementing this project, it is crucial to comprehend the consequences of this change on your permissions validation process and adjust your workflows appropriately. Update databricks-labs-blueprint requirement from ~=0.2.4 to ~=0.3.0 (#1001) Updates the requirements on [databricks-labs-blueprint](https://github.com/databrickslabs/blueprint) to permit the latest version. <details> <summary>Release notes</summary> <p><em>Sourced from <a href="https://github.com/databrickslabs/blueprint/releases">databricks-labs-blueprint's releases</a>.</em></p> <blockquote> <h2>v0.3.0</h2> <ul> <li>Added automated upgrade framework (<a href="https://redirect.github.com/databrickslabs/blueprint/issues/50">#50</a>). This update introduces an automated upgrade framework for managing and applying upgrades to the product, with a new <code>upgrades.py</code> file that includes a <code>ProductInfo</code> class having methods for version handling, wheel building, and exception handling. The test code organization has been improved, and new test cases, functions, and a directory structure for fixtures and unit tests have been added for the upgrades functionality. The <code>test_wheels.py</code> file now checks the version of the Databricks SDK and handles cases where the version marker is missing or does not contain the <code>__version__</code> variable. Additionally, a new <code>Application State Migrations</code> section has been added to the README, explaining the process of seamless upgrades from version X to version Z through version Y, addressing the need for configuration or database state migrations as the application evolves. Users can apply these upgrades by following an idiomatic usage pattern involving several classes and functions. Furthermore, improvements have been made to the <code>_trim_leading_whitespace</code> function in the <code>commands.py</code> file of the <code>databricks.labs.blueprint</code> module, ensuring accurate and consistent removal of leading whitespace for each line in the command string, leading to better overall functionality and maintainability.</li> <li>Added brute-forcing <code>SerdeError</code> with <code>as_dict()</code> and <code>from_dict()</code> (<a href="https://redirect.github.com/databrickslabs/blueprint/issues/58">#58</a>). This commit introduces a brute-forcing approach for handling <code>SerdeError</code> using <code>as_dict()</code> and <code>from_dict()</code> methods in an open-source library. The new <code>SomePolicy</code> class demonstrates the usage of these methods for manual serialization and deserialization of custom classes. The <code>as_dict()</code> method returns a dictionary representation of the class instance, and the <code>from_dict()</code> method, decorated with <code>@classmethod</code>, creates a new instance from the provided dictionary. Additionally, the GitHub Actions workflow for acceptance tests has been updated to include the <code>ready_for_review</code> event type, ensuring that tests run not only for opened and synchronized pull requests but also when marked as "ready for review." These changes provide developers with more control over the deserialization process and facilitate debugging in cases where default deserialization fails, but should be used judiciously to avoid brittle code.</li> <li>Fixed nightly integration tests run as service principals (<a href="https://redirect.github.com/databrickslabs/blueprint/issues/52">#52</a>). In this release, we have enhanced the compatibility of our codebase with service principals, particularly in the context of nightly integration tests. The <code>Installation</code> class in the <code>databricks.labs.blueprint.installation</code> module has been refactored, deprecating the <code>current</code> method and introducing two new methods: <code>assume_global</code> and <code>assume_user_home</code>. These methods enable users to install and manage <code>blueprint</code> as either a global or user-specific installation. Additionally, the <code>existing</code> method has been updated to work with the new <code>Installation</code> methods. In the test suite, the <code>test_installation.py</code> file has been updated to correctly detect global and user-specific installations when running as a service principal. These changes improve the testability and functionality of our software, ensuring seamless operation with service principals during nightly integration tests.</li> <li>Made <code>test_existing_installations_are_detected</code> more resilient (<a href="https://redirect.github.com/databrickslabs/blueprint/issues/51">#51</a>). In this release, we have added a new test function <code>test_existing_installations_are_detected</code> that checks if existing installations are correctly detected and retries the test for up to 15 seconds if they are not. This improves the reliability of the test by making it more resilient to potential intermittent failures. We have also added an import from <code>databricks.sdk.retries</code> named <code>retried</code> which is used to retry the test function in case of an <code>AssertionError</code>. Additionally, the test function <code>test_existing</code> has been renamed to <code>test_existing_installations_are_detected</code> and the <code>xfail</code> marker has been removed. We have also renamed the test function <code>test_dataclass</code> to <code>test_loading_dataclass_from_installation</code> for better clarity. This change will help ensure that the library is correctly detecting existing installations and improve the overall quality of the codebase.</li> </ul> <p>Contributors: <a href="https://github.com/nfx"><code>@nfx</code></a></p> </blockquote> </details> <details> <summary>Changelog</summary> <p><em>Sourced from <a href="https://github.com/databrickslabs/blueprint/blob/main/CHANGELOG.md">databricks-labs-blueprint's changelog</a>.</em></p> <blockquote> <h2>0.3.0</h2> <ul> <li>Added automated upgrade framework (<a href="https://redirect.github.com/databrickslabs/blueprint/issues/50">#50</a>). This update introduces an automated upgrade framework for managing and applying upgrades to the product, with a new <code>upgrades.py</code> file that includes a <code>ProductInfo</code> class having methods for version handling, wheel building, and exception handling. The test code organization has been improved, and new test cases, functions, and a directory structure for fixtures and unit tests have been added for the upgrades functionality. The <code>test_wheels.py</code> file now checks the version of the Databricks SDK and handles cases where the version marker is missing or does not contain the <code>__version__</code> variable. Additionally, a new <code>Application State Migrations</code> section has been added to the README, explaining the process of seamless upgrades from version X to version Z through version Y, addressing the need for configuration or database state migrations as the application evolves. Users can apply these upgrades by following an idiomatic usage pattern involving several classes and functions. Furthermore, improvements have been made to the <code>_trim_leading_whitespace</code> function in the <code>commands.py</code> file of the <code>databricks.labs.blueprint</code> module, ensuring accurate and consistent removal of leading whitespace for each line in the command string, leading to better overall functionality and maintainability.</li> <li>Added brute-forcing <code>SerdeError</code> with <code>as_dict()</code> and <code>from_dict()</code> (<a href="https://redirect.github.com/databrickslabs/blueprint/issues/58">#58</a>). This commit introduces a brute-forcing approach for handling <code>SerdeError</code> using <code>as_dict()</code> and <code>from_dict()</code> methods in an open-source library. The new <code>SomePolicy</code> class demonstrates the usage of these methods for manual serialization and deserialization of custom classes. The <code>as_dict()</code> method returns a dictionary representation of the class instance, and the <code>from_dict()</code> method, decorated with <code>@classmethod</code>, creates a new instance from the provided dictionary. Additionally, the GitHub Actions workflow for acceptance tests has been updated to include the <code>ready_for_review</code> event type, ensuring that tests run not only for opened and synchronized pull requests but also when marked as "ready for review." These changes provide developers with more control over the deserialization process and facilitate debugging in cases where default deserialization fails, but should be used judiciously to avoid brittle code.</li> <li>Fixed nightly integration tests run as service principals (<a href="https://redirect.github.com/databrickslabs/blueprint/issues/52">#52</a>). In this release, we have enhanced the compatibility of our codebase with service principals, particularly in the context of nightly integration tests. The <code>Installation</code> class in the <code>databricks.labs.blueprint.installation</code> module has been refactored, deprecating the <code>current</code> method and introducing two new methods: <code>assume_global</code> and <code>assume_user_home</code>. These methods enable users to install and manage <code>blueprint</code> as either a global or user-specific installation. Additionally, the <code>existing</code> method has been updated to work with the new <code>Installation</code> methods. In the test suite, the <code>test_installation.py</code> file has been updated to correctly detect global and user-specific installations when running as a service principal. These changes improve the testability and functionality of our software, ensuring seamless operation with service principals during nightly integration tests.</li> <li>Made <code>test_existing_installations_are_detected</code> more resilient (<a href="https://redirect.github.com/databrickslabs/blueprint/issues/51">#51</a>). In this release, we have added a new test function <code>test_existing_installations_are_detected</code> that checks if existing installations are correctly detected and retries the test for up to 15 seconds if they are not. This improves the reliability of the test by making it more resilient to potential intermittent failures. We have also added an import from <code>databricks.sdk.retries</code> named <code>retried</code> which is used to retry the test function in case of an <code>AssertionError</code>. Additionally, the test function <code>test_existing</code> has been renamed to <code>test_existing_installations_are_detected</code> and the <code>xfail</code> marker has been removed. We have also renamed the test function <code>test_dataclass</code> to <code>test_loading_dataclass_from_installation</code> for better clarity. This change will help ensure that the library is correctly detecting existing installations and improve the overall quality of the codebase.</li> </ul> <h2>0.2.5</h2> <ul> <li>Automatically enable workspace filesystem if the feature is disabled (<a href="https://redirect.github.com/databrickslabs/blueprint/pull/42">#42</a>).</li> </ul> <h2>0.2.4</h2> <ul> <li>Added more integration tests for <code>Installation</code> (<a href="https://redirect.github.com/databrickslabs/blueprint/pull/39">#39</a>).</li> <li>Fixed <code>yaml</code> optional import error (<a href="https://redirect.github.com/databrickslabs/blueprint/pull/38">#38</a>).</li> </ul> <h2>0.2.3</h2> <ul> <li>Added special handling for notebooks in <code>Installation.upload(...)</code> (<a href="https://redirect.github.com/databrickslabs/blueprint/pull/36">#36</a>).</li> </ul> <h2>0.2.2</h2> <ul> <li>Fixed issues with uploading wheels to DBFS and loading a non-existing install state (<a href="https://redirect.github.com/databrickslabs/blueprint/pull/34">#34</a>).</li> </ul> <h2>0.2.1</h2> <ul> <li>Aligned <code>Installation</code> framework with UCX project (<a href="https://redirect.github.com/databrickslabs/blueprint/pull/32">#32</a>).</li> </ul> <h2>0.2.0</h2> <ul> <li>Added common install state primitives with strong typing (<a href="https://redirect.github.com/databrickslabs/blueprint/pull/27">#27</a>).</li> <li>Added documentation for Invoking Databricks Connect (<a href="https://redirect.github.com/databrickslabs/blueprint/pull/28">#28</a>).</li> <li>Added more documentation for Databricks CLI command router (<a href="https://redirect.github.com/databrickslabs/blueprint/pull/30">#30</a>).</li> <li>Enforced <code>pylint</code> standards (<a href="https://redirect.github.com/databrickslabs/blueprint/pull/29">#29</a>).</li> </ul> <h2>0.1.0</h2> <ul> <li>Changed python requirement from 3.10.6 to 3.10 (<a href="https://redirect.github.com/databrickslabs/blueprint/pull/25">#25</a>).</li> </ul> <h2>0.0.6</h2> <ul> <li>Make <code>find_project_root</code> more deterministic (<a href="https://redirect.github.com/databrickslabs/blueprint/pull/23">#23</a>).</li> </ul> <!-- raw HTML omitted --> </blockquote> <p>... (truncated)</p> </details> <details> <summary>Commits</summary> <ul> <li><a href="https://github.com/databrickslabs/blueprint/commit/905e5ff5303a005d48bc98d101a613afeda15d51"><code>905e5ff</code></a> Release v0.3.0 (<a href="https://redirect.github.com/databrickslabs/blueprint/issues/59">#59</a>)</li> <li><a href="https://github.com/databrickslabs/blueprint/commit/a029f6bb1ecf807017754e298ea685326dbedf72"><code>a029f6b</code></a> Added brute-forcing <code>SerdeError</code> with <code>as_dict()</code> and <code>from_dict()</code> (<a href="https://redirect.github.com/databrickslabs/blueprint/issues/58">#58</a>)</li> <li><a href="https://github.com/databrickslabs/blueprint/commit/c8a74f4129b4592d365aac9670eb86069f3517f7"><code>c8a74f4</code></a> Added automated upgrade framework (<a href="https://redirect.github.com/databrickslabs/blueprint/issues/50">#50</a>)</li> <li><a href="https://github.com/databrickslabs/blueprint/commit/24e62ef4f060e43e02c92a7d082d95e8bc164317"><code>24e62ef</code></a> Don't run integration tests on draft pull requests (<a href="https://redirect.github.com/databrickslabs/blueprint/issues/55">#55</a>)</li> <li><a href="https://github.com/databrickslabs/blueprint/commit/b4dd5abf4eaf8d022ae0b6ec7e659296ec3d2f37"><code>b4dd5ab</code></a> Added tokei.rs badge (<a href="https://redirect.github.com/databrickslabs/blueprint/issues/54">#54</a>)</li> <li><a href="https://github.com/databrickslabs/blueprint/commit/01d9467f425763ab08035001270593253bce11f0"><code>01d9467</code></a> Fixed nightly integration tests run as service principals (<a href="https://redirect.github.com/databrickslabs/blueprint/issues/52">#52</a>)</li> <li><a href="https://github.com/databrickslabs/blueprint/commit/aa5714179c65be8e13f54601e1d1fcd70548342d"><code>aa57141</code></a> Made <code>test_existing_installations_are_detected</code> more resilient (<a href="https://redirect.github.com/databrickslabs/blueprint/issues/51">#51</a>)</li> <li><a href="https://github.com/databrickslabs/blueprint/commit/9cbc6f863d3ea06659f37939cf1b97115dd873bd"><code>9cbc6f8</code></a> Bump <code>databrickslabs/sandbox/acceptance</code> to v0.1.0 (<a href="https://redirect.github.com/databrickslabs/blueprint/issues/48">#48</a>)</li> <li><a href="https://github.com/databrickslabs/blueprint/commit/22fc1a8787b8e98de03048595202f88b7ddb9b94"><code>22fc1a8</code></a> Use <code>databrickslabs/sandbox/acceptance</code> action (<a href="https://redirect.github.com/databrickslabs/blueprint/issues/45">#45</a>)</li> <li><a href="https://github.com/databrickslabs/blueprint/commit/c7e47abd82b2f04e95b1d91f346cc1ea6df43961"><code>c7e47ab</code></a> Release v0.2.5 (<a href="https://redirect.github.com/databrickslabs/blueprint/issues/44">#44</a>)</li> <li>Additional commits viewable in <a href="https://github.com/databrickslabs/blueprint/compare/v0.2.4...v0.3.0">compare view</a></li> </ul> </details> <br /> Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. [//]: # (dependabot-automerge-start) [//]: # (dependabot-automerge-end) --- <details> <summary>Dependabot commands and options</summary> <br /> You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot merge` will merge this PR after your CI passes on it - `@dependabot squash and merge` will squash and merge this PR after your CI passes on it - `@dependabot cancel merge` will cancel a previously requested merge and block automerging - `@dependabot reopen` will reopen this PR if it is closed - `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually - `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency - `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) </details> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Run integration tests only for pull requests ready for review (#1002) Tested on https://github.com/databrickslabs/blueprint Reducing flakiness of create account groups (#1003) Prompt user if Terraform utilised for deploying infrastructure (#1004) Added prompt is_terraform_used and updated the same in the config of WorkspaceInstaller Resolves #393 --------- Co-authored-by: Serge Smertin <259697+nfx@users.noreply.github.com> Update CONTRIBUTING.md (#1005) Closes #850
author Vuong <vuong.nguyen@databricks.com> 1709737244 +0000 committer Vuong <vuong.nguyen@databricks.com> 1709739422 +0000 parent c866d42 author Vuong <vuong.nguyen@databricks.com> 1709737244 +0000 committer Vuong <vuong.nguyen@databricks.com> 1709739396 +0000 parent c866d42 author Vuong <vuong.nguyen@databricks.com> 1709737244 +0000 committer Vuong <vuong.nguyen@databricks.com> 1709739377 +0000 parent c866d42 author Vuong <vuong.nguyen@databricks.com> 1709737244 +0000 committer Vuong <vuong.nguyen@databricks.com> 1709739250 +0000 add trust relationship update Fix integration tests on AWS (#978) Update groups permissions validation to use Table ACL cluster (#979) Renamed columns in assessment SQL queries to use actual names, not aliases (#983) <!-- Summary of your changes that are easy to understand. Add screenshots when necessary --> Aliases are usually not allowed in projections (as they are replaced later in the query execution phases). While the DBSQL was smart enough to handle the references via aliases, for some setups this results in an error. Changing column references to use actual names fixes this. <!-- DOC: Link issue with a keyword: close, closes, closed, fix, fixes, fixed, resolve, resolves, resolved. See https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword --> Resolves #980 - [ ] added relevant user documentation - [ ] added new CLI command - [ ] modified existing command: `databricks labs ucx ...` - [ ] added a new workflow - [ ] modified existing workflow: `...` - [ ] added a new table - [ ] modified existing table: `...` <!-- How is this tested? Please see the checklist below and also describe any other relevant tests --> - [x] manually tested - [ ] added unit tests - [ ] added integration tests - [ ] verified on staging environment (screenshot attached) Fixed `config.yml` upgrade from very old versions (#984) Added `upgraded_from_workspace_id` property to migrated tables to indicated the source workspace. (#987) Added table parameter `upgraded_from_ws` to migrated tables. The parameters contains the sources workspace id. Resolves #899 - [ ] added relevant user documentation - [ ] added new CLI command - [ ] modified existing command: `databricks labs ucx ...` - [ ] added a new workflow - [ ] modified existing workflow: `...` - [ ] added a new table - [ ] modified existing table: `...` <!-- How is this tested? Please see the checklist below and also describe any other relevant tests --> - [x] manually tested - [x] added unit tests - [x] added integration tests - [x] verified on staging environment (screenshot attached) Added group members difference to the output of `validate-groups-membership` cli command (#995) The `validate-groups-membership` command has been updated to include a comparison of group memberships at both the account and workspace levels, displaying the difference in members between the two levels in a new column. This enhancement allows for a more detailed analysis of group memberships, with the added functionality implemented in the `validate_group_membership` function in the `groups.py` file located in the `databricks/labs/ucx/workspace_access` directory. A new output field, "group\_members\_difference," has been added to represent the difference in the number of members between a workspace group and an associated account group. The corresponding unit test file, "test\_groups.py," has been updated to include a new test case that verifies the calculation of the "group\_members\_difference" value. This change provides users with a more comprehensive view of their group memberships and allows them to easily identify any discrepancies between the account and workspace levels. The functionality of the other commands remains unchanged. Improved installation integration test flakiness (#998) - improved `_infer_error_from_job_run` and `_infer_error_from_task_run` to also catch `KeyError` and `ValueError` - removed retries for `Unknown` errors for installation tests Expanded end-user documentation with detailed descriptions for workflows and commands (#999) The Databricks Labs UCX project has been updated with several new features to assist in upgrading to Unity Catalog. These include various workflows and command-line utilities, such as an assessment workflow that generates a detailed compatibility report for workspace entities and a group migration workflow to upgrade all Databricks workspace assets. Additionally, new utility commands have been added for managing cross-workspace installations, and users can now view deployed workflows' status and repair failed workflows. A new end-user documentation has also been introduced, featuring comprehensive descriptions of workflows, commands, and an assessment report image. The Assessment Report, generated from UCX tools, now includes a more detailed summary of the assessment findings, table counts, database summaries, and external locations. Improved documentation for external Hive Metastore integration and a new debugging notebook are also included in this release. Lastly, the workspace group migration feature has been expanded to handle potential conflicts when migrating multiple workspaces with locally scoped group names. Release v0.14.0 (#1000) * Added `upgraded_from_workspace_id` property to migrated tables to indicated the source workspace ([#987](#987)). In this release, updates have been made to the `_migrate_external_table`, `_migrate_dbfs_root_table`, and `_migrate_view` methods in the `table_migrate.py` file to include a new parameter `upgraded_from_ws` in the SQL commands used to alter tables, views, or managed tables. This parameter is used to store the source workspace ID in the migrated tables, indicating the migration origin. A new utility method `sql_alter_from` has been added to the `Table` class in `tables.py` to generate the SQL command with the new parameter. Additionally, a new class-level attribute `UPGRADED_FROM_WS_PARAM` has been added to the `Table` class in `tables.py` to indicate the source workspace. A new property `upgraded_from_workspace_id` has been added to migrated tables to store the source workspace ID. These changes resolve issue [#899](#899) and are tested through manual testing, unit tests, and integration tests. No new CLI commands, workflows, or tables have been added or modified, and there are no changes to user documentation. * Added a command to create account level groups if they do not exist ([#763](#763)). This commit introduces a new feature that enables the creation of account-level groups if they do not already exist in the account. A new command, `create-account-groups`, has been added to the `databricks labs ucx` tool, which crawls all workspaces in the account and creates account-level groups if a corresponding workspace-local group is not found. The feature supports various scenarios, including creating account-level groups that exist in some workspaces but not in others, and creating multiple account-level groups with the same name but different members. Several new methods have been added to the `account.py` file to support the new feature, and the `test_account.py` file has been updated with new tests to ensure the correct behavior of the `create_account_level_groups` method. Additionally, the `cli.py` file has been updated to include the new `create-account-groups` command. With these changes, users can easily manage account-level groups and ensure that they are consistent across all workspaces in the account, improving the overall user experience. * Added assessment for the incompatible `RunSubmit` API usages ([#849](#849)). In this release, the assessment functionality for incompatible `RunSubmit` API usages has been significantly enhanced through various changes. The 'clusters.py' file has seen improvements in clarity and consistency with the renaming of private methods `check_spark_conf` to `_check_spark_conf` and `check_cluster_failures` to `_check_cluster_failures`. The `_assess_clusters` method has been updated to call the renamed `_check_cluster_failures` method for thorough checks of cluster configurations, resulting in better assessment functionality. A new `SubmitRunsCrawler` class has been added to the `databricks.labs.ucx.assessment.jobs` module, implementing `CrawlerBase`, `JobsMixin`, and `CheckClusterMixin` classes. This class crawls and assesses job runs based on their submitted runs, ensuring compatibility and identifying failure issues. Additionally, a new configuration attribute, `num_days_submit_runs_history`, has been introduced in the `WorkspaceConfig` class of the `config.py` module, controlling the number of days for which submission history of `RunSubmit` API calls is retained. Lastly, various new JSON files have been added for unit testing, assessing the `RunSubmit` API usages related to different scenarios like dbt task runs, Git source-based job runs, JAR file runs, and more. These tests will aid in identifying and addressing potential compatibility issues with the `RunSubmit` API. * Added group members difference to the output of `validate-groups-membership` cli command ([#995](#995)). The `validate-groups-membership` command has been updated to include a comparison of group memberships at both the account and workspace levels. This enhancement is implemented through the `validate_group_membership` function, which has been updated to calculate the difference in members between the two levels and display it in a new `group_members_difference` column. This allows for a more detailed analysis of group memberships and easily identifies any discrepancies between the account and workspace levels. The corresponding unit test file, "test_groups.py," has been updated to include a new test case that verifies the calculation of the `group_members_difference` value. The functionality of the other commands remains unchanged. The new `group_members_difference` value is calculated as the difference in the number of members in the workspace group and the account group, with a positive value indicating more members in the workspace group and a negative value indicating more members in the account group. The table template in the labs.yml file has also been updated to include the new column for the group membership difference. * Added handling for empty `directory_id` if managed identity encountered during the crawling of StoragePermissionMapping ([#986](#986)). This PR adds a `type` field to the `StoragePermissionMapping` and `Principal` dataclasses to differentiate between service principals and managed identities, allowing `None` for the `directory_id` field if the principal is not a service principal. During the migration to UC storage credentials, managed identities are currently ignored. These changes improve handling of managed identities during the crawling of `StoragePermissionMapping`, prevent errors when creating storage credentials with managed identities, and address issue [#339](#339). The changes are tested through unit tests, manual testing, and integration tests, and only affect the `StoragePermissionMapping` class and related methods, without introducing new commands, workflows, or tables. * Added migration for Azure Service Principals with secrets stored in Databricks Secret to UC Storage Credentials ([#874](#874)). In this release, we have made significant updates to migrate Azure Service Principals with their secrets stored in Databricks Secret to UC Storage Credentials, enhancing security and management of storage access. The changes include: Addition of a new `migrate_credentials` command in the `labs.yml` file to migrate credentials for storage access to UC storage credential. Modification of `secrets.py` to handle the case where a secret has been removed from the backend and to log warning messages for secrets with invalid Base64 bytes. Introduction of the `StorageCredentialManager` and `ServicePrincipalMigration` classes in `credentials.py` to manage Azure Service Principals and their associated client secrets, and to migrate them to UC Storage Credentials. Addition of a new `directory_id` attribute in the `Principal` class and its associated dataclass in `resources.py` to store the directory ID for creating UC storage credentials using a service principal. Creation of a new pytest fixture, `make_storage_credential_spn`, in `fixtures.py` to simplify writing tests requiring Databricks Storage Credentials with Azure Service Principal auth. Addition of a new test file for the Azure integration of the project, including new classes, methods, and test cases for testing the migration of Azure Service Principals to UC Storage Credentials. These improvements will ensure better security and management of storage access using Azure Service Principals, while providing more efficient and robust testing capabilities. * Added permission migration support for feature tables and the root permissions for models and feature tables ([#997](#997)). This commit introduces support for migration of permissions related to feature tables and sets root permissions for models and feature tables. New functions such as `feature_store_listing`, `feature_tables_root_page`, `models_root_page`, and `tokens_and_passwords` have been added to facilitate population of a workspace access page with necessary permissions information. The `factory` function in `manager.py` has been updated to include new listings for models' root page, feature tables' root page, and the feature store for enhanced management and access control of models and feature tables. New classes and methods have been implemented to handle permissions for these resources, utilizing `GenericPermissionsSupport`, `AccessControlRequest`, and `MigratedGroup` classes. Additionally, new test methods have been included to verify feature tables listing functionality and root page listing functionality for feature tables and registered models. The test manager method has been updated to include `feature-tables` in the list of items to be checked for permissions, ensuring comprehensive testing of permission functionality related to these new feature tables. * Added support for serving endpoints ([#990](#990)). In this release, we have made significant enhancements to support serving endpoints in our open-source library. The `fixtures.py` file in the `databricks.labs.ucx.mixins` module has been updated with new classes and functions to create and manage serving endpoints, accompanied by integration tests to verify their functionality. We have added a new listing for serving endpoints in the assessment's permissions crawling, using the `ws.serving_endpoints.list` function and the `serving-endpoints` category. A new integration test, "test_endpoints," has been added to verify that assessments now crawl permissions for serving endpoints. This test demonstrates the ability to migrate permissions from one group to another. The test suite has been updated to ensure the proper functioning of the new feature and improve the assessment of permissions for serving endpoints, ensuring compatibility with the updated `test_manager.py` file. * Expanded end-user documentation with detailed descriptions for workflows and commands ([#999](#999)). The Databricks Labs UCX project has been updated with several new features to assist in upgrading to Unity Catalog, including an assessment workflow that generates a detailed compatibility report for workspace entities, a group migration workflow for upgrading all Databricks workspace assets, and utility commands for managing cross-workspace installations. The Assessment Report now includes a more detailed summary of the assessment findings, table counts, database summaries, and external locations. Additional improvements include expanded workspace group migration to handle potential conflicts with locally scoped group names, enhanced documentation for external Hive Metastore integration, a new debugging notebook, and detailed descriptions of table upgrade considerations, data access permissions, external storage, and table crawler. * Fixed `config.yml` upgrade from very old versions ([#984](#984)). In this release, we've introduced enhancements to the configuration upgrading process for `config.yml` in our open-source library. We've replaced the previous `v1_migrate` class method with a new implementation that specifically handles migration from version 1. The new method retrieves the `groups` field, extracts the `selected` value, and assigns it to the `include_group_names` key in the configuration. The `backup_group_prefix` value from the `groups` field is assigned to the `renamed_group_prefix` key, and the `groups` field is removed, with the version number updated to 2. These changes simplify the code and improve readability, enabling users to upgrade smoothly from version 1 of the configuration. Furthermore, we've added new unit tests to the `test_config.py` file to ensure backward compatibility. Two new tests, `test_v1_migrate_zeroconf` and `test_v1_migrate_some_conf`, have been added, utilizing the `MockInstallation` class and loading the configuration using `WorkspaceConfig`. These tests enhance the robustness and reliability of the migration process for `config.yml`. * Renamed columns in assessment SQL queries to use actual names, not aliases ([#983](#983)). In this update, we have resolved an issue where aliases used for column references in SQL queries caused errors in certain setups by renaming them to use actual names. Specifically, for assessment SQL queries, we have modified the definition of the `is_delta` column to use the actual `table_format` name instead of the alias `format`. This change improves compatibility and enhances the reliability of query execution. As a software engineer, you will appreciate that this modification ensures consistent interpretation of column references across various setups, thereby avoiding potential errors caused by aliases. This change does not introduce any new methods, but instead modifies existing functionality to use actual column names, ensuring a more reliable and consistent SQL query for the `05_0_all_tables` assessment. * Updated groups permissions validation to use Table ACL cluster ([#979](#979)). In this update, the `validate_groups_permissions` task has been modified to utilize the Table ACL cluster, as indicated by the inclusion of `job_cluster="tacl"`. This task is responsible for ensuring that all crawled permissions are accurately applied to the destination groups by calling the `permission_manager.apply_group_permissions` method during the migration state. This modification enhances the validation of group permissions by performing it on the Table ACL cluster, potentially improving performance or functionality. If you are implementing this project, it is crucial to comprehend the consequences of this change on your permissions validation process and adjust your workflows appropriately. Update databricks-labs-blueprint requirement from ~=0.2.4 to ~=0.3.0 (#1001) Updates the requirements on [databricks-labs-blueprint](https://github.com/databrickslabs/blueprint) to permit the latest version. <details> <summary>Release notes</summary> <p><em>Sourced from <a href="https://github.com/databrickslabs/blueprint/releases">databricks-labs-blueprint's releases</a>.</em></p> <blockquote> <h2>v0.3.0</h2> <ul> <li>Added automated upgrade framework (<a href="https://redirect.github.com/databrickslabs/blueprint/issues/50">#50</a>). This update introduces an automated upgrade framework for managing and applying upgrades to the product, with a new <code>upgrades.py</code> file that includes a <code>ProductInfo</code> class having methods for version handling, wheel building, and exception handling. The test code organization has been improved, and new test cases, functions, and a directory structure for fixtures and unit tests have been added for the upgrades functionality. The <code>test_wheels.py</code> file now checks the version of the Databricks SDK and handles cases where the version marker is missing or does not contain the <code>__version__</code> variable. Additionally, a new <code>Application State Migrations</code> section has been added to the README, explaining the process of seamless upgrades from version X to version Z through version Y, addressing the need for configuration or database state migrations as the application evolves. Users can apply these upgrades by following an idiomatic usage pattern involving several classes and functions. Furthermore, improvements have been made to the <code>_trim_leading_whitespace</code> function in the <code>commands.py</code> file of the <code>databricks.labs.blueprint</code> module, ensuring accurate and consistent removal of leading whitespace for each line in the command string, leading to better overall functionality and maintainability.</li> <li>Added brute-forcing <code>SerdeError</code> with <code>as_dict()</code> and <code>from_dict()</code> (<a href="https://redirect.github.com/databrickslabs/blueprint/issues/58">#58</a>). This commit introduces a brute-forcing approach for handling <code>SerdeError</code> using <code>as_dict()</code> and <code>from_dict()</code> methods in an open-source library. The new <code>SomePolicy</code> class demonstrates the usage of these methods for manual serialization and deserialization of custom classes. The <code>as_dict()</code> method returns a dictionary representation of the class instance, and the <code>from_dict()</code> method, decorated with <code>@classmethod</code>, creates a new instance from the provided dictionary. Additionally, the GitHub Actions workflow for acceptance tests has been updated to include the <code>ready_for_review</code> event type, ensuring that tests run not only for opened and synchronized pull requests but also when marked as "ready for review." These changes provide developers with more control over the deserialization process and facilitate debugging in cases where default deserialization fails, but should be used judiciously to avoid brittle code.</li> <li>Fixed nightly integration tests run as service principals (<a href="https://redirect.github.com/databrickslabs/blueprint/issues/52">#52</a>). In this release, we have enhanced the compatibility of our codebase with service principals, particularly in the context of nightly integration tests. The <code>Installation</code> class in the <code>databricks.labs.blueprint.installation</code> module has been refactored, deprecating the <code>current</code> method and introducing two new methods: <code>assume_global</code> and <code>assume_user_home</code>. These methods enable users to install and manage <code>blueprint</code> as either a global or user-specific installation. Additionally, the <code>existing</code> method has been updated to work with the new <code>Installation</code> methods. In the test suite, the <code>test_installation.py</code> file has been updated to correctly detect global and user-specific installations when running as a service principal. These changes improve the testability and functionality of our software, ensuring seamless operation with service principals during nightly integration tests.</li> <li>Made <code>test_existing_installations_are_detected</code> more resilient (<a href="https://redirect.github.com/databrickslabs/blueprint/issues/51">#51</a>). In this release, we have added a new test function <code>test_existing_installations_are_detected</code> that checks if existing installations are correctly detected and retries the test for up to 15 seconds if they are not. This improves the reliability of the test by making it more resilient to potential intermittent failures. We have also added an import from <code>databricks.sdk.retries</code> named <code>retried</code> which is used to retry the test function in case of an <code>AssertionError</code>. Additionally, the test function <code>test_existing</code> has been renamed to <code>test_existing_installations_are_detected</code> and the <code>xfail</code> marker has been removed. We have also renamed the test function <code>test_dataclass</code> to <code>test_loading_dataclass_from_installation</code> for better clarity. This change will help ensure that the library is correctly detecting existing installations and improve the overall quality of the codebase.</li> </ul> <p>Contributors: <a href="https://github.com/nfx"><code>@nfx</code></a></p> </blockquote> </details> <details> <summary>Changelog</summary> <p><em>Sourced from <a href="https://github.com/databrickslabs/blueprint/blob/main/CHANGELOG.md">databricks-labs-blueprint's changelog</a>.</em></p> <blockquote> <h2>0.3.0</h2> <ul> <li>Added automated upgrade framework (<a href="https://redirect.github.com/databrickslabs/blueprint/issues/50">#50</a>). This update introduces an automated upgrade framework for managing and applying upgrades to the product, with a new <code>upgrades.py</code> file that includes a <code>ProductInfo</code> class having methods for version handling, wheel building, and exception handling. The test code organization has been improved, and new test cases, functions, and a directory structure for fixtures and unit tests have been added for the upgrades functionality. The <code>test_wheels.py</code> file now checks the version of the Databricks SDK and handles cases where the version marker is missing or does not contain the <code>__version__</code> variable. Additionally, a new <code>Application State Migrations</code> section has been added to the README, explaining the process of seamless upgrades from version X to version Z through version Y, addressing the need for configuration or database state migrations as the application evolves. Users can apply these upgrades by following an idiomatic usage pattern involving several classes and functions. Furthermore, improvements have been made to the <code>_trim_leading_whitespace</code> function in the <code>commands.py</code> file of the <code>databricks.labs.blueprint</code> module, ensuring accurate and consistent removal of leading whitespace for each line in the command string, leading to better overall functionality and maintainability.</li> <li>Added brute-forcing <code>SerdeError</code> with <code>as_dict()</code> and <code>from_dict()</code> (<a href="https://redirect.github.com/databrickslabs/blueprint/issues/58">#58</a>). This commit introduces a brute-forcing approach for handling <code>SerdeError</code> using <code>as_dict()</code> and <code>from_dict()</code> methods in an open-source library. The new <code>SomePolicy</code> class demonstrates the usage of these methods for manual serialization and deserialization of custom classes. The <code>as_dict()</code> method returns a dictionary representation of the class instance, and the <code>from_dict()</code> method, decorated with <code>@classmethod</code>, creates a new instance from the provided dictionary. Additionally, the GitHub Actions workflow for acceptance tests has been updated to include the <code>ready_for_review</code> event type, ensuring that tests run not only for opened and synchronized pull requests but also when marked as "ready for review." These changes provide developers with more control over the deserialization process and facilitate debugging in cases where default deserialization fails, but should be used judiciously to avoid brittle code.</li> <li>Fixed nightly integration tests run as service principals (<a href="https://redirect.github.com/databrickslabs/blueprint/issues/52">#52</a>). In this release, we have enhanced the compatibility of our codebase with service principals, particularly in the context of nightly integration tests. The <code>Installation</code> class in the <code>databricks.labs.blueprint.installation</code> module has been refactored, deprecating the <code>current</code> method and introducing two new methods: <code>assume_global</code> and <code>assume_user_home</code>. These methods enable users to install and manage <code>blueprint</code> as either a global or user-specific installation. Additionally, the <code>existing</code> method has been updated to work with the new <code>Installation</code> methods. In the test suite, the <code>test_installation.py</code> file has been updated to correctly detect global and user-specific installations when running as a service principal. These changes improve the testability and functionality of our software, ensuring seamless operation with service principals during nightly integration tests.</li> <li>Made <code>test_existing_installations_are_detected</code> more resilient (<a href="https://redirect.github.com/databrickslabs/blueprint/issues/51">#51</a>). In this release, we have added a new test function <code>test_existing_installations_are_detected</code> that checks if existing installations are correctly detected and retries the test for up to 15 seconds if they are not. This improves the reliability of the test by making it more resilient to potential intermittent failures. We have also added an import from <code>databricks.sdk.retries</code> named <code>retried</code> which is used to retry the test function in case of an <code>AssertionError</code>. Additionally, the test function <code>test_existing</code> has been renamed to <code>test_existing_installations_are_detected</code> and the <code>xfail</code> marker has been removed. We have also renamed the test function <code>test_dataclass</code> to <code>test_loading_dataclass_from_installation</code> for better clarity. This change will help ensure that the library is correctly detecting existing installations and improve the overall quality of the codebase.</li> </ul> <h2>0.2.5</h2> <ul> <li>Automatically enable workspace filesystem if the feature is disabled (<a href="https://redirect.github.com/databrickslabs/blueprint/pull/42">#42</a>).</li> </ul> <h2>0.2.4</h2> <ul> <li>Added more integration tests for <code>Installation</code> (<a href="https://redirect.github.com/databrickslabs/blueprint/pull/39">#39</a>).</li> <li>Fixed <code>yaml</code> optional import error (<a href="https://redirect.github.com/databrickslabs/blueprint/pull/38">#38</a>).</li> </ul> <h2>0.2.3</h2> <ul> <li>Added special handling for notebooks in <code>Installation.upload(...)</code> (<a href="https://redirect.github.com/databrickslabs/blueprint/pull/36">#36</a>).</li> </ul> <h2>0.2.2</h2> <ul> <li>Fixed issues with uploading wheels to DBFS and loading a non-existing install state (<a href="https://redirect.github.com/databrickslabs/blueprint/pull/34">#34</a>).</li> </ul> <h2>0.2.1</h2> <ul> <li>Aligned <code>Installation</code> framework with UCX project (<a href="https://redirect.github.com/databrickslabs/blueprint/pull/32">#32</a>).</li> </ul> <h2>0.2.0</h2> <ul> <li>Added common install state primitives with strong typing (<a href="https://redirect.github.com/databrickslabs/blueprint/pull/27">#27</a>).</li> <li>Added documentation for Invoking Databricks Connect (<a href="https://redirect.github.com/databrickslabs/blueprint/pull/28">#28</a>).</li> <li>Added more documentation for Databricks CLI command router (<a href="https://redirect.github.com/databrickslabs/blueprint/pull/30">#30</a>).</li> <li>Enforced <code>pylint</code> standards (<a href="https://redirect.github.com/databrickslabs/blueprint/pull/29">#29</a>).</li> </ul> <h2>0.1.0</h2> <ul> <li>Changed python requirement from 3.10.6 to 3.10 (<a href="https://redirect.github.com/databrickslabs/blueprint/pull/25">#25</a>).</li> </ul> <h2>0.0.6</h2> <ul> <li>Make <code>find_project_root</code> more deterministic (<a href="https://redirect.github.com/databrickslabs/blueprint/pull/23">#23</a>).</li> </ul> <!-- raw HTML omitted --> </blockquote> <p>... (truncated)</p> </details> <details> <summary>Commits</summary> <ul> <li><a href="https://github.com/databrickslabs/blueprint/commit/905e5ff5303a005d48bc98d101a613afeda15d51"><code>905e5ff</code></a> Release v0.3.0 (<a href="https://redirect.github.com/databrickslabs/blueprint/issues/59">#59</a>)</li> <li><a href="https://github.com/databrickslabs/blueprint/commit/a029f6bb1ecf807017754e298ea685326dbedf72"><code>a029f6b</code></a> Added brute-forcing <code>SerdeError</code> with <code>as_dict()</code> and <code>from_dict()</code> (<a href="https://redirect.github.com/databrickslabs/blueprint/issues/58">#58</a>)</li> <li><a href="https://github.com/databrickslabs/blueprint/commit/c8a74f4129b4592d365aac9670eb86069f3517f7"><code>c8a74f4</code></a> Added automated upgrade framework (<a href="https://redirect.github.com/databrickslabs/blueprint/issues/50">#50</a>)</li> <li><a href="https://github.com/databrickslabs/blueprint/commit/24e62ef4f060e43e02c92a7d082d95e8bc164317"><code>24e62ef</code></a> Don't run integration tests on draft pull requests (<a href="https://redirect.github.com/databrickslabs/blueprint/issues/55">#55</a>)</li> <li><a href="https://github.com/databrickslabs/blueprint/commit/b4dd5abf4eaf8d022ae0b6ec7e659296ec3d2f37"><code>b4dd5ab</code></a> Added tokei.rs badge (<a href="https://redirect.github.com/databrickslabs/blueprint/issues/54">#54</a>)</li> <li><a href="https://github.com/databrickslabs/blueprint/commit/01d9467f425763ab08035001270593253bce11f0"><code>01d9467</code></a> Fixed nightly integration tests run as service principals (<a href="https://redirect.github.com/databrickslabs/blueprint/issues/52">#52</a>)</li> <li><a href="https://github.com/databrickslabs/blueprint/commit/aa5714179c65be8e13f54601e1d1fcd70548342d"><code>aa57141</code></a> Made <code>test_existing_installations_are_detected</code> more resilient (<a href="https://redirect.github.com/databrickslabs/blueprint/issues/51">#51</a>)</li> <li><a href="https://github.com/databrickslabs/blueprint/commit/9cbc6f863d3ea06659f37939cf1b97115dd873bd"><code>9cbc6f8</code></a> Bump <code>databrickslabs/sandbox/acceptance</code> to v0.1.0 (<a href="https://redirect.github.com/databrickslabs/blueprint/issues/48">#48</a>)</li> <li><a href="https://github.com/databrickslabs/blueprint/commit/22fc1a8787b8e98de03048595202f88b7ddb9b94"><code>22fc1a8</code></a> Use <code>databrickslabs/sandbox/acceptance</code> action (<a href="https://redirect.github.com/databrickslabs/blueprint/issues/45">#45</a>)</li> <li><a href="https://github.com/databrickslabs/blueprint/commit/c7e47abd82b2f04e95b1d91f346cc1ea6df43961"><code>c7e47ab</code></a> Release v0.2.5 (<a href="https://redirect.github.com/databrickslabs/blueprint/issues/44">#44</a>)</li> <li>Additional commits viewable in <a href="https://github.com/databrickslabs/blueprint/compare/v0.2.4...v0.3.0">compare view</a></li> </ul> </details> <br /> Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. [//]: # (dependabot-automerge-start) [//]: # (dependabot-automerge-end) --- <details> <summary>Dependabot commands and options</summary> <br /> You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot merge` will merge this PR after your CI passes on it - `@dependabot squash and merge` will squash and merge this PR after your CI passes on it - `@dependabot cancel merge` will cancel a previously requested merge and block automerging - `@dependabot reopen` will reopen this PR if it is closed - `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually - `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency - `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) </details> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Run integration tests only for pull requests ready for review (#1002) Tested on https://github.com/databrickslabs/blueprint Reducing flakiness of create account groups (#1003) Prompt user if Terraform utilised for deploying infrastructure (#1004) Added prompt is_terraform_used and updated the same in the config of WorkspaceInstaller Resolves #393 --------- Co-authored-by: Serge Smertin <259697+nfx@users.noreply.github.com> Update CONTRIBUTING.md (#1005) Closes #850 Added `databricks labs ucx create-uber-principal` command to create Azure Service Principal for migration (#976) - Added new cli cmd for create-master-principal in labs.yml, cli.py - Added separate class for AzureApiClient to separate out azure API calls - Added logic to create SPN, secret, roleassignment in resources and update workspace config with spn client_id - added logic to call create spn, update rbac of all storage account to that spn, update ucx cluster policy with spn secret for each storage account - test unit and int test cases Resolves #881 Related issues: - #993 - #693 - [ ] added relevant user documentation - [X] added new CLI command - [ ] modified existing command: `databricks labs ucx ...` - [ ] added a new workflow - [ ] modified existing workflow: `...` - [ ] added a new table - [ ] modified existing table: `...` - [X] manually tested - [X] added unit tests - [X] added integration tests - [ ] verified on staging environment (screenshot attached) Fix gitguardian warning caused by "hello world" secret used in unit test (#1010) Replace the plain encoded string by base64.b64encode to mitigate the gitguardian warning. <!-- DOC: Link issue with a keyword: close, closes, closed, fix, fixes, fixed, resolve, resolves, resolved. See https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword --> Resolves #.. - [ ] added relevant user documentation - [ ] added new CLI command - [ ] modified existing command: `databricks labs ucx ...` - [ ] added a new workflow - [ ] modified existing workflow: `...` - [ ] added a new table - [ ] modified existing table: `...` <!-- How is this tested? Please see the checklist below and also describe any other relevant tests --> - [ ] manually tested - [ ] added unit tests - [ ] added integration tests - [ ] verified on staging environment (screenshot attached) Create UC external locations in Azure based on migrated storage credentials (#992) Handle widget delete on upgrade platform bug (#1011)
author Vuong <vuong.nguyen@databricks.com> 1709738765 +0000 committer Vuong <vuong.nguyen@databricks.com> 1709812255 +0000 parent 7735a71 author Vuong <vuong.nguyen@databricks.com> 1709738765 +0000 committer Vuong <vuong.nguyen@databricks.com> 1709812237 +0000 parent 7735a71 author Vuong <vuong.nguyen@databricks.com> 1709738765 +0000 committer Vuong <vuong.nguyen@databricks.com> 1709812227 +0000 parent 7735a71 author Vuong <vuong.nguyen@databricks.com> 1709738765 +0000 committer Vuong <vuong.nguyen@databricks.com> 1709812214 +0000 parent 7735a71 author Vuong <vuong.nguyen@databricks.com> 1709738765 +0000 committer Vuong <vuong.nguyen@databricks.com> 1709812198 +0000 make fmt Added `upgraded_from_workspace_id` property to migrated tables to indicated the source workspace. (#987) Added table parameter `upgraded_from_ws` to migrated tables. The parameters contains the sources workspace id. Resolves #899 - [ ] added relevant user documentation - [ ] added new CLI command - [ ] modified existing command: `databricks labs ucx ...` - [ ] added a new workflow - [ ] modified existing workflow: `...` - [ ] added a new table - [ ] modified existing table: `...` <!-- How is this tested? Please see the checklist below and also describe any other relevant tests --> - [x] manually tested - [x] added unit tests - [x] added integration tests - [x] verified on staging environment (screenshot attached) Added group members difference to the output of `validate-groups-membership` cli command (#995) The `validate-groups-membership` command has been updated to include a comparison of group memberships at both the account and workspace levels, displaying the difference in members between the two levels in a new column. This enhancement allows for a more detailed analysis of group memberships, with the added functionality implemented in the `validate_group_membership` function in the `groups.py` file located in the `databricks/labs/ucx/workspace_access` directory. A new output field, "group\_members\_difference," has been added to represent the difference in the number of members between a workspace group and an associated account group. The corresponding unit test file, "test\_groups.py," has been updated to include a new test case that verifies the calculation of the "group\_members\_difference" value. This change provides users with a more comprehensive view of their group memberships and allows them to easily identify any discrepancies between the account and workspace levels. The functionality of the other commands remains unchanged. Improved installation integration test flakiness (#998) - improved `_infer_error_from_job_run` and `_infer_error_from_task_run` to also catch `KeyError` and `ValueError` - removed retries for `Unknown` errors for installation tests Expanded end-user documentation with detailed descriptions for workflows and commands (#999) The Databricks Labs UCX project has been updated with several new features to assist in upgrading to Unity Catalog. These include various workflows and command-line utilities, such as an assessment workflow that generates a detailed compatibility report for workspace entities and a group migration workflow to upgrade all Databricks workspace assets. Additionally, new utility commands have been added for managing cross-workspace installations, and users can now view deployed workflows' status and repair failed workflows. A new end-user documentation has also been introduced, featuring comprehensive descriptions of workflows, commands, and an assessment report image. The Assessment Report, generated from UCX tools, now includes a more detailed summary of the assessment findings, table counts, database summaries, and external locations. Improved documentation for external Hive Metastore integration and a new debugging notebook are also included in this release. Lastly, the workspace group migration feature has been expanded to handle potential conflicts when migrating multiple workspaces with locally scoped group names. Release v0.14.0 (#1000) * Added `upgraded_from_workspace_id` property to migrated tables to indicated the source workspace ([#987](#987)). In this release, updates have been made to the `_migrate_external_table`, `_migrate_dbfs_root_table`, and `_migrate_view` methods in the `table_migrate.py` file to include a new parameter `upgraded_from_ws` in the SQL commands used to alter tables, views, or managed tables. This parameter is used to store the source workspace ID in the migrated tables, indicating the migration origin. A new utility method `sql_alter_from` has been added to the `Table` class in `tables.py` to generate the SQL command with the new parameter. Additionally, a new class-level attribute `UPGRADED_FROM_WS_PARAM` has been added to the `Table` class in `tables.py` to indicate the source workspace. A new property `upgraded_from_workspace_id` has been added to migrated tables to store the source workspace ID. These changes resolve issue [#899](#899) and are tested through manual testing, unit tests, and integration tests. No new CLI commands, workflows, or tables have been added or modified, and there are no changes to user documentation. * Added a command to create account level groups if they do not exist ([#763](#763)). This commit introduces a new feature that enables the creation of account-level groups if they do not already exist in the account. A new command, `create-account-groups`, has been added to the `databricks labs ucx` tool, which crawls all workspaces in the account and creates account-level groups if a corresponding workspace-local group is not found. The feature supports various scenarios, including creating account-level groups that exist in some workspaces but not in others, and creating multiple account-level groups with the same name but different members. Several new methods have been added to the `account.py` file to support the new feature, and the `test_account.py` file has been updated with new tests to ensure the correct behavior of the `create_account_level_groups` method. Additionally, the `cli.py` file has been updated to include the new `create-account-groups` command. With these changes, users can easily manage account-level groups and ensure that they are consistent across all workspaces in the account, improving the overall user experience. * Added assessment for the incompatible `RunSubmit` API usages ([#849](#849)). In this release, the assessment functionality for incompatible `RunSubmit` API usages has been significantly enhanced through various changes. The 'clusters.py' file has seen improvements in clarity and consistency with the renaming of private methods `check_spark_conf` to `_check_spark_conf` and `check_cluster_failures` to `_check_cluster_failures`. The `_assess_clusters` method has been updated to call the renamed `_check_cluster_failures` method for thorough checks of cluster configurations, resulting in better assessment functionality. A new `SubmitRunsCrawler` class has been added to the `databricks.labs.ucx.assessment.jobs` module, implementing `CrawlerBase`, `JobsMixin`, and `CheckClusterMixin` classes. This class crawls and assesses job runs based on their submitted runs, ensuring compatibility and identifying failure issues. Additionally, a new configuration attribute, `num_days_submit_runs_history`, has been introduced in the `WorkspaceConfig` class of the `config.py` module, controlling the number of days for which submission history of `RunSubmit` API calls is retained. Lastly, various new JSON files have been added for unit testing, assessing the `RunSubmit` API usages related to different scenarios like dbt task runs, Git source-based job runs, JAR file runs, and more. These tests will aid in identifying and addressing potential compatibility issues with the `RunSubmit` API. * Added group members difference to the output of `validate-groups-membership` cli command ([#995](#995)). The `validate-groups-membership` command has been updated to include a comparison of group memberships at both the account and workspace levels. This enhancement is implemented through the `validate_group_membership` function, which has been updated to calculate the difference in members between the two levels and display it in a new `group_members_difference` column. This allows for a more detailed analysis of group memberships and easily identifies any discrepancies between the account and workspace levels. The corresponding unit test file, "test_groups.py," has been updated to include a new test case that verifies the calculation of the `group_members_difference` value. The functionality of the other commands remains unchanged. The new `group_members_difference` value is calculated as the difference in the number of members in the workspace group and the account group, with a positive value indicating more members in the workspace group and a negative value indicating more members in the account group. The table template in the labs.yml file has also been updated to include the new column for the group membership difference. * Added handling for empty `directory_id` if managed identity encountered during the crawling of StoragePermissionMapping ([#986](#986)). This PR adds a `type` field to the `StoragePermissionMapping` and `Principal` dataclasses to differentiate between service principals and managed identities, allowing `None` for the `directory_id` field if the principal is not a service principal. During the migration to UC storage credentials, managed identities are currently ignored. These changes improve handling of managed identities during the crawling of `StoragePermissionMapping`, prevent errors when creating storage credentials with managed identities, and address issue [#339](#339). The changes are tested through unit tests, manual testing, and integration tests, and only affect the `StoragePermissionMapping` class and related methods, without introducing new commands, workflows, or tables. * Added migration for Azure Service Principals with secrets stored in Databricks Secret to UC Storage Credentials ([#874](#874)). In this release, we have made significant updates to migrate Azure Service Principals with their secrets stored in Databricks Secret to UC Storage Credentials, enhancing security and management of storage access. The changes include: Addition of a new `migrate_credentials` command in the `labs.yml` file to migrate credentials for storage access to UC storage credential. Modification of `secrets.py` to handle the case where a secret has been removed from the backend and to log warning messages for secrets with invalid Base64 bytes. Introduction of the `StorageCredentialManager` and `ServicePrincipalMigration` classes in `credentials.py` to manage Azure Service Principals and their associated client secrets, and to migrate them to UC Storage Credentials. Addition of a new `directory_id` attribute in the `Principal` class and its associated dataclass in `resources.py` to store the directory ID for creating UC storage credentials using a service principal. Creation of a new pytest fixture, `make_storage_credential_spn`, in `fixtures.py` to simplify writing tests requiring Databricks Storage Credentials with Azure Service Principal auth. Addition of a new test file for the Azure integration of the project, including new classes, methods, and test cases for testing the migration of Azure Service Principals to UC Storage Credentials. These improvements will ensure better security and management of storage access using Azure Service Principals, while providing more efficient and robust testing capabilities. * Added permission migration support for feature tables and the root permissions for models and feature tables ([#997](#997)). This commit introduces support for migration of permissions related to feature tables and sets root permissions for models and feature tables. New functions such as `feature_store_listing`, `feature_tables_root_page`, `models_root_page`, and `tokens_and_passwords` have been added to facilitate population of a workspace access page with necessary permissions information. The `factory` function in `manager.py` has been updated to include new listings for models' root page, feature tables' root page, and the feature store for enhanced management and access control of models and feature tables. New classes and methods have been implemented to handle permissions for these resources, utilizing `GenericPermissionsSupport`, `AccessControlRequest`, and `MigratedGroup` classes. Additionally, new test methods have been included to verify feature tables listing functionality and root page listing functionality for feature tables and registered models. The test manager method has been updated to include `feature-tables` in the list of items to be checked for permissions, ensuring comprehensive testing of permission functionality related to these new feature tables. * Added support for serving endpoints ([#990](#990)). In this release, we have made significant enhancements to support serving endpoints in our open-source library. The `fixtures.py` file in the `databricks.labs.ucx.mixins` module has been updated with new classes and functions to create and manage serving endpoints, accompanied by integration tests to verify their functionality. We have added a new listing for serving endpoints in the assessment's permissions crawling, using the `ws.serving_endpoints.list` function and the `serving-endpoints` category. A new integration test, "test_endpoints," has been added to verify that assessments now crawl permissions for serving endpoints. This test demonstrates the ability to migrate permissions from one group to another. The test suite has been updated to ensure the proper functioning of the new feature and improve the assessment of permissions for serving endpoints, ensuring compatibility with the updated `test_manager.py` file. * Expanded end-user documentation with detailed descriptions for workflows and commands ([#999](#999)). The Databricks Labs UCX project has been updated with several new features to assist in upgrading to Unity Catalog, including an assessment workflow that generates a detailed compatibility report for workspace entities, a group migration workflow for upgrading all Databricks workspace assets, and utility commands for managing cross-workspace installations. The Assessment Report now includes a more detailed summary of the assessment findings, table counts, database summaries, and external locations. Additional improvements include expanded workspace group migration to handle potential conflicts with locally scoped group names, enhanced documentation for external Hive Metastore integration, a new debugging notebook, and detailed descriptions of table upgrade considerations, data access permissions, external storage, and table crawler. * Fixed `config.yml` upgrade from very old versions ([#984](#984)). In this release, we've introduced enhancements to the configuration upgrading process for `config.yml` in our open-source library. We've replaced the previous `v1_migrate` class method with a new implementation that specifically handles migration from version 1. The new method retrieves the `groups` field, extracts the `selected` value, and assigns it to the `include_group_names` key in the configuration. The `backup_group_prefix` value from the `groups` field is assigned to the `renamed_group_prefix` key, and the `groups` field is removed, with the version number updated to 2. These changes simplify the code and improve readability, enabling users to upgrade smoothly from version 1 of the configuration. Furthermore, we've added new unit tests to the `test_config.py` file to ensure backward compatibility. Two new tests, `test_v1_migrate_zeroconf` and `test_v1_migrate_some_conf`, have been added, utilizing the `MockInstallation` class and loading the configuration using `WorkspaceConfig`. These tests enhance the robustness and reliability of the migration process for `config.yml`. * Renamed columns in assessment SQL queries to use actual names, not aliases ([#983](#983)). In this update, we have resolved an issue where aliases used for column references in SQL queries caused errors in certain setups by renaming them to use actual names. Specifically, for assessment SQL queries, we have modified the definition of the `is_delta` column to use the actual `table_format` name instead of the alias `format`. This change improves compatibility and enhances the reliability of query execution. As a software engineer, you will appreciate that this modification ensures consistent interpretation of column references across various setups, thereby avoiding potential errors caused by aliases. This change does not introduce any new methods, but instead modifies existing functionality to use actual column names, ensuring a more reliable and consistent SQL query for the `05_0_all_tables` assessment. * Updated groups permissions validation to use Table ACL cluster ([#979](#979)). In this update, the `validate_groups_permissions` task has been modified to utilize the Table ACL cluster, as indicated by the inclusion of `job_cluster="tacl"`. This task is responsible for ensuring that all crawled permissions are accurately applied to the destination groups by calling the `permission_manager.apply_group_permissions` method during the migration state. This modification enhances the validation of group permissions by performing it on the Table ACL cluster, potentially improving performance or functionality. If you are implementing this project, it is crucial to comprehend the consequences of this change on your permissions validation process and adjust your workflows appropriately. Run integration tests only for pull requests ready for review (#1002) Tested on https://github.com/databrickslabs/blueprint Reducing flakiness of create account groups (#1003) Prompt user if Terraform utilised for deploying infrastructure (#1004) Added prompt is_terraform_used and updated the same in the config of WorkspaceInstaller Resolves #393 --------- Co-authored-by: Serge Smertin <259697+nfx@users.noreply.github.com> Update CONTRIBUTING.md (#1005) Closes #850 Fix gitguardian warning caused by "hello world" secret used in unit test (#1010) Replace the plain encoded string by base64.b64encode to mitigate the gitguardian warning. <!-- DOC: Link issue with a keyword: close, closes, closed, fix, fixes, fixed, resolve, resolves, resolved. See https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword --> Resolves #.. - [ ] added relevant user documentation - [ ] added new CLI command - [ ] modified existing command: `databricks labs ucx ...` - [ ] added a new workflow - [ ] modified existing workflow: `...` - [ ] added a new table - [ ] modified existing table: `...` <!-- How is this tested? Please see the checklist below and also describe any other relevant tests --> - [ ] manually tested - [ ] added unit tests - [ ] added integration tests - [ ] verified on staging environment (screenshot attached) Create UC external locations in Azure based on migrated storage credentials (#992) Handle widget delete on upgrade platform bug (#1011) Deprecate legacy installer (#1014) <img width="799" alt="image" src="https://github.com/databrickslabs/ucx/assets/259697/2aa5fed6-5734-44c2-87bc-39fbc214d5fa"> Automatically upgrade existing installations to avoid breaking changes (#985) This PR incorporates the work from databrickslabs/blueprint#50, which enables smoother cross-version upgrades. Fix #471 Added missing documentation for `create-uber-principal` command (#1015) Add `migrate-locations` command (#1016) Add cli command `migrate_locations` to create UC external location. <!-- DOC: Link issue with a keyword: close, closes, closed, fix, fixes, fixed, resolve, resolves, resolved. See https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword --> - [ ] added relevant user documentation - [ ] added new CLI command - [ ] modified existing command: `databricks labs ucx ...` - [ ] added a new workflow - [ ] modified existing workflow: `...` - [ ] added a new table - [ ] modified existing table: `...` <!-- How is this tested? Please see the checklist below and also describe any other relevant tests --> - [ ] manually tested - [ ] added unit tests - [ ] added integration tests - [ ] verified on staging environment (screenshot attached) Fix document for `migrate-locations` command (#1017) <!-- Summary of your changes that are easy to understand. Add screenshots when necessary --> <!-- DOC: Link issue with a keyword: close, closes, closed, fix, fixes, fixed, resolve, resolves, resolved. See https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword --> - [ ] added relevant user documentation - [ ] added new CLI command - [ ] modified existing command: `databricks labs ucx ...` - [ ] added a new workflow - [ ] modified existing workflow: `...` - [ ] added a new table - [ ] modified existing table: `...` <!-- How is this tested? Please see the checklist below and also describe any other relevant tests --> - [ ] manually tested - [ ] added unit tests - [ ] added integration tests - [ ] verified on staging environment (screenshot attached) Make code more readable by enforcing `max-nested-blocks = 3` with `pylint` (#1018) No logic changes, just for readability and to spare code reviewer's sanity. Added AWS S3 support for `migrate-locations` command (#1009) Release v0.15.0 (#1020) * Added AWS S3 support for `migrate-locations` command ([#1009](#1009)). In this release, the open-source library has been enhanced with AWS S3 support for the `migrate-locations` command, enabling efficient and secure management of S3 data. The new functionality includes the identification of missing S3 prefixes and the creation of corresponding roles and policies through the addition of methods `_identify_missing_paths`, `_get_existing_credentials_dict`, and `create_external_locations`. The library now also includes new classes `AwsIamRole`, `ExternalLocationInfo`, and `StorageCredentialInfo` for better handling of AWS-related functionality. Additionally, two new tests, `test_create_external_locations` and `test_create_external_locations_skip_existing`, have been added to ensure the correct behavior of the new AWS-related functionality. The new test function `test_migrate_locations_aws` checks the AWS-specific implementation of the `migrate-locations` command, while `test_missing_aws_cli` verifies the correct error message is displayed when the AWS CLI is not found in the system path. These changes enhance the library's capabilities, improving data security, privacy, and overall performance for users working with AWS S3. * Added `databricks labs ucx create-uber-principal` command to create Azure Service Principal for migration ([#976](#976)). The new CLI command, `databricks labs ucx create-uber-principal`, has been introduced to create an Azure Service Principal (SPN) and grant it STORAGE BLOB READER access on all the storage accounts used by the tables in the workspace. The SPN information is then stored in the UCX cluster policy. A new class, AzureApiClient, has been added to isolate Azure API calls, and unit and integration tests have been included to verify the functionality. This development enhances migration capabilities for Azure workspaces, providing a more streamlined and automated way to create and manage Service Principals, and improves the functionality and usability of the UCX tool. The changes are well-documented and follow the project's coding standards. * Added `migrate-locations` command ([#1016](#1016)). In this release, we've added a new CLI command, `migrate_locations`, to create Unity Catalog (UC) external locations. This command extracts candidates for location creation from the `guess_external_locations` assessment task and checks if corresponding UC Storage Credentials exist before creating the locations. Currently, the command only supports Azure, with plans to add support for AWS and GCP in the future. The `migrate_locations` function is marked with the `ucx.command` decorator and is available as a command-line interface (CLI) command. The pull request also includes unit tests for this new command, which check the environment (Azure, AWS, or GCP) before executing the migration and log a message if the environment is AWS or GCP, indicating that the migration is not yet supported on those platforms. No changes have been made to existing workflows, commands, or tables. * Added handling for widget delete on upgrade platform bug ([#1011](#1011)). In this release, the `_install_dashboard` method in `dashboards.py` has been updated to handle a platform bug that occurred during the deletion of dashboard widgets during an upgrade process (issue [#1011](#1011)). Previously, the method attempted to delete each widget using the `self._ws.dashboard_widgets.delete(widget.id)` command, which resulted in a `TypeError` when attempting to delete a widget. The updated method now includes a try/except block that catches this `TypeError` and logs a warning message, while also tracking the issue under bug ES-1061370. The rest of the method remains unchanged, creating a dashboard with the given name, role, and parent folder ID if no widgets are present. This enhancement improves the robustness of the `_install_dashboard` method by adding error handling for the SDK API response when deleting dashboard widgets, ensuring a smoother upgrade process. * Create UC external locations in Azure based on migrated storage credentials ([#992](#992)). The `locations.py` file in the `databricks.labs.ucx.azure` package has been updated to include a new class `ExternalLocationsMigration`, which creates UC external locations in Azure based on migrated storage credentials. This class takes various arguments, including `WorkspaceClient`, `HiveMetastoreLocations`, `AzureResourcePermissions`, and `AzureResources`. It has a `run()` method that lists any missing external locations in UC, extracts their location URLs, and attempts to create a UC external location with a mapped storage credential name if the missing external location is in the mapping. The class also includes helper methods for generating credential name mappings. Additionally, the `resources.py` file in the same package has been modified to include a new method `managed_identity_client_id`, which retrieves the client ID of a managed identity associated with a given access connector. Test functions for the `ExternalLocationsMigration` class and Azure external locations functionality have been added in the new file `test_locations.py`. The `test_resources.py` file has been updated to include tests for the `managed_identity_client_id` method. A new `mappings.json` file has also been added for tests related to Azure external location mappings based on migrated storage credentials. * Deprecate legacy installer ([#1014](#1014)). In this release, we have deprecated the legacy installer for the UCX project, which was previously implemented as a bash script. A warning message has been added to inform users about the deprecation and direct them to the UCX installation instructions. The functionality of the script remains unchanged, and it still performs tasks such as installing Python dependencies and building Python bindings. The script will eventually be replaced with the `databricks labs install ucx` command. This change is part of issue [#1014](#1014) and is intended to streamline the installation process and improve the overall user experience. We recommend that users update their installation process to the new recommended method as soon as possible to avoid any issues with the legacy installer in the future. * Prompt user if Terraform utilised for deploying infrastructure ([#1004](#1004)). In this update, the `config.py` file has been modified to include a new attribute, `is_terraform_used`, in the `WorkspaceConfig` class. This boolean flag indicates whether Terraform has been used for deploying certain entities in the workspace. Issue [#393](#393) has been addressed with this change. The `WorkspaceInstaller` configuration has also been updated to take advantage of this new attribute, allowing developers to determine if Terraform was used for infrastructure deployment, thereby increasing visibility into the deployment process. Additionally, a new prompt has been added to the `warehouse_type` function to ascertain if Terraform is being utilized for infrastructure deployment, setting the `is_terraform_used` variable to True if it is. This improvement is intended for software engineers adopting this open-source library. * Updated CONTRIBUTING.md ([#1005](#1005)). In this contribution to the open-source library, the CONTRIBUTING.md file has been significantly updated with clearer instructions on how to effectively contibute to the project. The previous command to print the Python path has been removed, as the IDE is now advised to be configured to use the Python interpreter from the virtual environment. A new step has been added, recommending the use of a consistent styleguide and formatting of the code before every commit. Moreover, it is now encouraged to run tests before committing to minimize potential issues during the review process. The steps on how to make a Fork from the ucx repo and create a PR have been updated with links to official documentation. Lastly, the commit now includes information on handling dependency errors that may occur after `git pull`. * Updated databricks-labs-blueprint requirement from ~=0.2.4 to ~=0.3.0 ([#1001](#1001)). In this pull request update, the requirements file, pyproject.toml, has been modified to upgrade the databricks-labs-blueprint package from version ~0.2.4 to ~0.3.0. This update integrates the latest features and bug fixes of the package, including an automated upgrade framework, a brute-forcing approach for handling SerdeError, and enhancements for running nightly integration tests with service principals. These improvements increase the testability and functionality of the software, ensuring its stable operation with service principals during nightly integration tests. Furthermore, the reliability of the test for detecting existing installations has been reinforced by adding a new test function that checks for the correct detection of existing installations and retries the test for up to 15 seconds if they are not. Dependency updates: * Updated databricks-labs-blueprint requirement from ~=0.2.4 to ~=0.3.0 ([#1001](#1001)).
…ls` command (#973) ## Changes <!-- Summary of your changes that are easy to understand. Add screenshots when necessary --> A few more things to be done - [x] Added `load` function to `AWSResourcePermissions` to return identified instance profiles - [x] Added `IamRoleMigration` class under `aws/credentials.py` to migrate AWS instance profiles identified ### Linked issues <!-- DOC: Link issue with a keyword: close, closes, closed, fix, fixes, fixed, resolve, resolves, resolved. See https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword --> Resolves #862 Related PR: - #874 ### Functionality - [x] added relevant user documentation - [x] added new CLI command `databricks labs ucx migrate-credentials` ### Tests <!-- How is this tested? Please see the checklist below and also describe any other relevant tests --> - [x] manually tested - [x] added unit tests - [x] added integration tests --------- Co-authored-by: qziyuan <91635877+qziyuan@users.noreply.github.com>
* Added AWS IAM roles support to `databricks labs ucx migrate-credentials` command ([#973](#973)). This commit adds AWS Identity and Access Management (IAM) roles support to the `databricks labs ucx migrate-credentials` command, resolving issue [#862](#862) and being related to pull request [#874](#874). It includes the addition of a `load` function to `AWSResourcePermissions` to return identified instance profiles and the creation of an `IamRoleMigration` class under `aws/credentials.py` to migrate identified AWS instance profiles. Additionally, user documentation and a new CLI command `databricks labs ucx migrate-credentials` have been added, and the changes have been thoroughly tested with manual, unit, and integration tests. The functionality additions include new methods such as `add_uc_role_policy` and `update_uc_trust_role`, among others, designed to facilitate the migration process for AWS IAM roles. * Added `create-catalogs-schemas` command to prepare destination catalogs and schemas before table migration ([#1028](#1028)). The Databricks Labs Unity Catalog (UCX) tool has been updated with a new `create-catalogs-schemas` command to facilitate the creation of destination catalogs and schemas prior to table migration. This command should be executed after the `create-table-mapping` command and is designed to prepare the workspace for migrating tables to UC. Additionally, a new `CatalogSchema` class has been added to the `hive_metastore` package to manage the creation of catalogs and schemas in the Hive metastore. This new functionality simplifies the process of preparing the destination Hive metastore for table migration, reducing the likelihood of user errors and ensuring that the metastore is properly configured. Unit tests have been added to the `tests/unit/hive_metastore` directory to verify the behavior of the `CatalogSchema` class and the new `create-catalogs-schemas` command. This command is intended for use in contexts where GCP is not supported. * Added automated upgrade option to set up cluster policy ([#1024](#1024)). This commit introduces an automated upgrade option for setting up a cluster policy for older versions of UCX, separating the cluster creation policy from install.py to installer.policy.py and adding an upgrade script for older UCX versions. A new class, `ClusterPolicyInstaller`, is added to the `policy.py` file in the `installer` package to manage the creation and update of a Databricks cluster policy for Unity Catalog Migration. This class handles creating a new cluster policy with specific configurations, extracting external Hive Metastore configurations, and updating job policies. Additionally, the commit includes refactoring, removal of library references, and a new script, v0.15.0_added_cluster_policy.py, which contains the upgrade function. The changes are tested through manual and automated testing with unit tests and integration tests. This feature is intended for software engineers working with the project. * Added crawling for init scripts on local files to assessment workflow ([#960](#960)). This commit introduces the ability to crawl init scripts stored on local files and S3 as part of the assessment workflow, resolving issue [#9](#9) * Added database filter for the `assessment` workflow ([#989](#989)). In this release, we have added a new configuration option, `include_databases`, to the assessment workflow which allows users to specify a list of databases to include for migration, rather than crawling all the databases in the Hive Metastore. This feature is implemented in the `TablesCrawler`, `UdfsCrawler`, `GrantsCrawler` classes and the associated functions such as `_all_databases`, `getIncludeDatabases`, `_select_databases`. These changes aim to improve efficiency and reduce unnecessary crawling, and are accompanied by modifications to existing functionality, as well as the addition of unit and integration tests. The changes have been manually tested and verified on a staging environment. * Estimate migration effort based on assessment database ([#1008](#1008)). In this release, a new functionality has been added to estimate the migration effort for each asset in the assessment database. The estimation is presented in days and is displayed on a new estimates dashboard with a summary widget for a global estimate per object type, along with assumptions and scope for each object type. A new `query` parameter has been added to the `SimpleQuery` class to support this feature. Additional changes include the update of the `_install_viz` and `_install_query` methods, the inclusion of the `data_source_id` in the query metadata, and the addition of tests to ensure the proper functioning of the new feature. A new fixture, `mock_installation_with_jobs`, has been added to support testing of the assessment estimates dashboard. * Explicitly write to `hive_metastore` from `crawl_tables` task ([#1021](#1021)). In this release, we have improved the clarity and specificity of our handling of the `hive_metastore` in the `crawl_tables` task. Previously, the `df.write.saveAsTable` method was used without explicitly specifying the `hive_metastore` database, which could result in ambiguity. To address this issue, we have updated the `saveAsTable` method to include the `hive_metastore` database, ensuring that tables are written to the correct location in the Hive metastore. These changes are confined to the `src/databricks/labs/ucx/hive_metastore/tables.scala` file and affect the `crawl_tables` task. While no new methods have been added, the existing `saveAsTable` method has been modified to enhance the accuracy and predictability of our interaction with the Hive metastore. * Improved documentation for `databricks labs ucx move` command ([#1025](#1025)). The `databricks labs ucx move` command has been updated with new improvements to its documentation, providing enhanced clarity and ease of use for developers and administrators. This command facilitates the movement of UC tables/table(s) from one schema to another, either in the same or different catalog, during the table upgrade process. A significant enhancement is the preservation of the source table's permissions when moving to a new schema or catalog, maintaining the original table's access controls, simplifying the management of table permissions, and streamlining the migration process. These improvements aim to facilitate a more efficient table migration experience, ensuring that developers and administrators can effectively manage their UC tables while maintaining the desired level of access control and security. * Updated databricks-sdk requirement from ~=0.20.0 to ~=0.21.0 ([#1030](#1030)). In this update, the `databricks-sdk` package requirement has been updated to version `~=0.21.0` from `~=0.20.0`. This new version addresses several bugs and provides enhancements, including the fix for the `get_workspace_client` method in GCP, the use of the `all-apis` scope with the external browser, and an attempt to initialize all Databricks globals. Moreover, the API's settings nesting approach has changed, which may cause compatibility issues with previous versions. Several new services and dataclasses have been added to the API, and documentation and examples have been updated accordingly. There are no updates to the `databricks-labs-blueprint` and `PyYAML` dependencies in this commit.
* Added AWS IAM roles support to `databricks labs ucx migrate-credentials` command ([#973](#973)). This commit adds AWS Identity and Access Management (IAM) roles support to the `databricks labs ucx migrate-credentials` command, resolving issue [#862](#862) and being related to pull request [#874](#874). It includes the addition of a `load` function to `AWSResourcePermissions` to return identified instance profiles and the creation of an `IamRoleMigration` class under `aws/credentials.py` to migrate identified AWS instance profiles. Additionally, user documentation and a new CLI command `databricks labs ucx migrate-credentials` have been added, and the changes have been thoroughly tested with manual, unit, and integration tests. The functionality additions include new methods such as `add_uc_role_policy` and `update_uc_trust_role`, among others, designed to facilitate the migration process for AWS IAM roles. * Added `create-catalogs-schemas` command to prepare destination catalogs and schemas before table migration ([#1028](#1028)). The Databricks Labs Unity Catalog (UCX) tool has been updated with a new `create-catalogs-schemas` command to facilitate the creation of destination catalogs and schemas prior to table migration. This command should be executed after the `create-table-mapping` command and is designed to prepare the workspace for migrating tables to UC. Additionally, a new `CatalogSchema` class has been added to the `hive_metastore` package to manage the creation of catalogs and schemas in the Hive metastore. This new functionality simplifies the process of preparing the destination Hive metastore for table migration, reducing the likelihood of user errors and ensuring that the metastore is properly configured. Unit tests have been added to the `tests/unit/hive_metastore` directory to verify the behavior of the `CatalogSchema` class and the new `create-catalogs-schemas` command. This command is intended for use in contexts where GCP is not supported. * Added automated upgrade option to set up cluster policy ([#1024](#1024)). This commit introduces an automated upgrade option for setting up a cluster policy for older versions of UCX, separating the cluster creation policy from install.py to installer.policy.py and adding an upgrade script for older UCX versions. A new class, `ClusterPolicyInstaller`, is added to the `policy.py` file in the `installer` package to manage the creation and update of a Databricks cluster policy for Unity Catalog Migration. This class handles creating a new cluster policy with specific configurations, extracting external Hive Metastore configurations, and updating job policies. Additionally, the commit includes refactoring, removal of library references, and a new script, v0.15.0_added_cluster_policy.py, which contains the upgrade function. The changes are tested through manual and automated testing with unit tests and integration tests. This feature is intended for software engineers working with the project. * Added crawling for init scripts on local files to assessment workflow ([#960](#960)). This commit introduces the ability to crawl init scripts stored on local files and S3 as part of the assessment workflow, resolving issue [#9](#9) * Added database filter for the `assessment` workflow ([#989](#989)). In this release, we have added a new configuration option, `include_databases`, to the assessment workflow which allows users to specify a list of databases to include for migration, rather than crawling all the databases in the Hive Metastore. This feature is implemented in the `TablesCrawler`, `UdfsCrawler`, `GrantsCrawler` classes and the associated functions such as `_all_databases`, `getIncludeDatabases`, `_select_databases`. These changes aim to improve efficiency and reduce unnecessary crawling, and are accompanied by modifications to existing functionality, as well as the addition of unit and integration tests. The changes have been manually tested and verified on a staging environment. * Estimate migration effort based on assessment database ([#1008](#1008)). In this release, a new functionality has been added to estimate the migration effort for each asset in the assessment database. The estimation is presented in days and is displayed on a new estimates dashboard with a summary widget for a global estimate per object type, along with assumptions and scope for each object type. A new `query` parameter has been added to the `SimpleQuery` class to support this feature. Additional changes include the update of the `_install_viz` and `_install_query` methods, the inclusion of the `data_source_id` in the query metadata, and the addition of tests to ensure the proper functioning of the new feature. A new fixture, `mock_installation_with_jobs`, has been added to support testing of the assessment estimates dashboard. * Explicitly write to `hive_metastore` from `crawl_tables` task ([#1021](#1021)). In this release, we have improved the clarity and specificity of our handling of the `hive_metastore` in the `crawl_tables` task. Previously, the `df.write.saveAsTable` method was used without explicitly specifying the `hive_metastore` database, which could result in ambiguity. To address this issue, we have updated the `saveAsTable` method to include the `hive_metastore` database, ensuring that tables are written to the correct location in the Hive metastore. These changes are confined to the `src/databricks/labs/ucx/hive_metastore/tables.scala` file and affect the `crawl_tables` task. While no new methods have been added, the existing `saveAsTable` method has been modified to enhance the accuracy and predictability of our interaction with the Hive metastore. * Improved documentation for `databricks labs ucx move` command ([#1025](#1025)). The `databricks labs ucx move` command has been updated with new improvements to its documentation, providing enhanced clarity and ease of use for developers and administrators. This command facilitates the movement of UC tables/table(s) from one schema to another, either in the same or different catalog, during the table upgrade process. A significant enhancement is the preservation of the source table's permissions when moving to a new schema or catalog, maintaining the original table's access controls, simplifying the management of table permissions, and streamlining the migration process. These improvements aim to facilitate a more efficient table migration experience, ensuring that developers and administrators can effectively manage their UC tables while maintaining the desired level of access control and security. * Updated databricks-sdk requirement from ~=0.20.0 to ~=0.21.0 ([#1030](#1030)). In this update, the `databricks-sdk` package requirement has been updated to version `~=0.21.0` from `~=0.20.0`. This new version addresses several bugs and provides enhancements, including the fix for the `get_workspace_client` method in GCP, the use of the `all-apis` scope with the external browser, and an attempt to initialize all Databricks globals. Moreover, the API's settings nesting approach has changed, which may cause compatibility issues with previous versions. Several new services and dataclasses have been added to the API, and documentation and examples have been updated accordingly. There are no updates to the `databricks-labs-blueprint` and `PyYAML` dependencies in this commit.
## Changes - Add client secret detection logic to Azure service principal crawler, this is needed for #874 ### Tests <!-- How is this tested? Please see the checklist below and also describe any other relevant tests --> - [x] manually tested - [x] added unit tests - [x] added integration tests
…cret to UC Storage Credentials (#874)
* Added `upgraded_from_workspace_id` property to migrated tables to indicated the source workspace ([#987](#987)). In this release, updates have been made to the `_migrate_external_table`, `_migrate_dbfs_root_table`, and `_migrate_view` methods in the `table_migrate.py` file to include a new parameter `upgraded_from_ws` in the SQL commands used to alter tables, views, or managed tables. This parameter is used to store the source workspace ID in the migrated tables, indicating the migration origin. A new utility method `sql_alter_from` has been added to the `Table` class in `tables.py` to generate the SQL command with the new parameter. Additionally, a new class-level attribute `UPGRADED_FROM_WS_PARAM` has been added to the `Table` class in `tables.py` to indicate the source workspace. A new property `upgraded_from_workspace_id` has been added to migrated tables to store the source workspace ID. These changes resolve issue [#899](#899) and are tested through manual testing, unit tests, and integration tests. No new CLI commands, workflows, or tables have been added or modified, and there are no changes to user documentation. * Added a command to create account level groups if they do not exist ([#763](#763)). This commit introduces a new feature that enables the creation of account-level groups if they do not already exist in the account. A new command, `create-account-groups`, has been added to the `databricks labs ucx` tool, which crawls all workspaces in the account and creates account-level groups if a corresponding workspace-local group is not found. The feature supports various scenarios, including creating account-level groups that exist in some workspaces but not in others, and creating multiple account-level groups with the same name but different members. Several new methods have been added to the `account.py` file to support the new feature, and the `test_account.py` file has been updated with new tests to ensure the correct behavior of the `create_account_level_groups` method. Additionally, the `cli.py` file has been updated to include the new `create-account-groups` command. With these changes, users can easily manage account-level groups and ensure that they are consistent across all workspaces in the account, improving the overall user experience. * Added assessment for the incompatible `RunSubmit` API usages ([#849](#849)). In this release, the assessment functionality for incompatible `RunSubmit` API usages has been significantly enhanced through various changes. The 'clusters.py' file has seen improvements in clarity and consistency with the renaming of private methods `check_spark_conf` to `_check_spark_conf` and `check_cluster_failures` to `_check_cluster_failures`. The `_assess_clusters` method has been updated to call the renamed `_check_cluster_failures` method for thorough checks of cluster configurations, resulting in better assessment functionality. A new `SubmitRunsCrawler` class has been added to the `databricks.labs.ucx.assessment.jobs` module, implementing `CrawlerBase`, `JobsMixin`, and `CheckClusterMixin` classes. This class crawls and assesses job runs based on their submitted runs, ensuring compatibility and identifying failure issues. Additionally, a new configuration attribute, `num_days_submit_runs_history`, has been introduced in the `WorkspaceConfig` class of the `config.py` module, controlling the number of days for which submission history of `RunSubmit` API calls is retained. Lastly, various new JSON files have been added for unit testing, assessing the `RunSubmit` API usages related to different scenarios like dbt task runs, Git source-based job runs, JAR file runs, and more. These tests will aid in identifying and addressing potential compatibility issues with the `RunSubmit` API. * Added group members difference to the output of `validate-groups-membership` cli command ([#995](#995)). The `validate-groups-membership` command has been updated to include a comparison of group memberships at both the account and workspace levels. This enhancement is implemented through the `validate_group_membership` function, which has been updated to calculate the difference in members between the two levels and display it in a new `group_members_difference` column. This allows for a more detailed analysis of group memberships and easily identifies any discrepancies between the account and workspace levels. The corresponding unit test file, "test_groups.py," has been updated to include a new test case that verifies the calculation of the `group_members_difference` value. The functionality of the other commands remains unchanged. The new `group_members_difference` value is calculated as the difference in the number of members in the workspace group and the account group, with a positive value indicating more members in the workspace group and a negative value indicating more members in the account group. The table template in the labs.yml file has also been updated to include the new column for the group membership difference. * Added handling for empty `directory_id` if managed identity encountered during the crawling of StoragePermissionMapping ([#986](#986)). This PR adds a `type` field to the `StoragePermissionMapping` and `Principal` dataclasses to differentiate between service principals and managed identities, allowing `None` for the `directory_id` field if the principal is not a service principal. During the migration to UC storage credentials, managed identities are currently ignored. These changes improve handling of managed identities during the crawling of `StoragePermissionMapping`, prevent errors when creating storage credentials with managed identities, and address issue [#339](#339). The changes are tested through unit tests, manual testing, and integration tests, and only affect the `StoragePermissionMapping` class and related methods, without introducing new commands, workflows, or tables. * Added migration for Azure Service Principals with secrets stored in Databricks Secret to UC Storage Credentials ([#874](#874)). In this release, we have made significant updates to migrate Azure Service Principals with their secrets stored in Databricks Secret to UC Storage Credentials, enhancing security and management of storage access. The changes include: Addition of a new `migrate_credentials` command in the `labs.yml` file to migrate credentials for storage access to UC storage credential. Modification of `secrets.py` to handle the case where a secret has been removed from the backend and to log warning messages for secrets with invalid Base64 bytes. Introduction of the `StorageCredentialManager` and `ServicePrincipalMigration` classes in `credentials.py` to manage Azure Service Principals and their associated client secrets, and to migrate them to UC Storage Credentials. Addition of a new `directory_id` attribute in the `Principal` class and its associated dataclass in `resources.py` to store the directory ID for creating UC storage credentials using a service principal. Creation of a new pytest fixture, `make_storage_credential_spn`, in `fixtures.py` to simplify writing tests requiring Databricks Storage Credentials with Azure Service Principal auth. Addition of a new test file for the Azure integration of the project, including new classes, methods, and test cases for testing the migration of Azure Service Principals to UC Storage Credentials. These improvements will ensure better security and management of storage access using Azure Service Principals, while providing more efficient and robust testing capabilities. * Added permission migration support for feature tables and the root permissions for models and feature tables ([#997](#997)). This commit introduces support for migration of permissions related to feature tables and sets root permissions for models and feature tables. New functions such as `feature_store_listing`, `feature_tables_root_page`, `models_root_page`, and `tokens_and_passwords` have been added to facilitate population of a workspace access page with necessary permissions information. The `factory` function in `manager.py` has been updated to include new listings for models' root page, feature tables' root page, and the feature store for enhanced management and access control of models and feature tables. New classes and methods have been implemented to handle permissions for these resources, utilizing `GenericPermissionsSupport`, `AccessControlRequest`, and `MigratedGroup` classes. Additionally, new test methods have been included to verify feature tables listing functionality and root page listing functionality for feature tables and registered models. The test manager method has been updated to include `feature-tables` in the list of items to be checked for permissions, ensuring comprehensive testing of permission functionality related to these new feature tables. * Added support for serving endpoints ([#990](#990)). In this release, we have made significant enhancements to support serving endpoints in our open-source library. The `fixtures.py` file in the `databricks.labs.ucx.mixins` module has been updated with new classes and functions to create and manage serving endpoints, accompanied by integration tests to verify their functionality. We have added a new listing for serving endpoints in the assessment's permissions crawling, using the `ws.serving_endpoints.list` function and the `serving-endpoints` category. A new integration test, "test_endpoints," has been added to verify that assessments now crawl permissions for serving endpoints. This test demonstrates the ability to migrate permissions from one group to another. The test suite has been updated to ensure the proper functioning of the new feature and improve the assessment of permissions for serving endpoints, ensuring compatibility with the updated `test_manager.py` file. * Expanded end-user documentation with detailed descriptions for workflows and commands ([#999](#999)). The Databricks Labs UCX project has been updated with several new features to assist in upgrading to Unity Catalog, including an assessment workflow that generates a detailed compatibility report for workspace entities, a group migration workflow for upgrading all Databricks workspace assets, and utility commands for managing cross-workspace installations. The Assessment Report now includes a more detailed summary of the assessment findings, table counts, database summaries, and external locations. Additional improvements include expanded workspace group migration to handle potential conflicts with locally scoped group names, enhanced documentation for external Hive Metastore integration, a new debugging notebook, and detailed descriptions of table upgrade considerations, data access permissions, external storage, and table crawler. * Fixed `config.yml` upgrade from very old versions ([#984](#984)). In this release, we've introduced enhancements to the configuration upgrading process for `config.yml` in our open-source library. We've replaced the previous `v1_migrate` class method with a new implementation that specifically handles migration from version 1. The new method retrieves the `groups` field, extracts the `selected` value, and assigns it to the `include_group_names` key in the configuration. The `backup_group_prefix` value from the `groups` field is assigned to the `renamed_group_prefix` key, and the `groups` field is removed, with the version number updated to 2. These changes simplify the code and improve readability, enabling users to upgrade smoothly from version 1 of the configuration. Furthermore, we've added new unit tests to the `test_config.py` file to ensure backward compatibility. Two new tests, `test_v1_migrate_zeroconf` and `test_v1_migrate_some_conf`, have been added, utilizing the `MockInstallation` class and loading the configuration using `WorkspaceConfig`. These tests enhance the robustness and reliability of the migration process for `config.yml`. * Renamed columns in assessment SQL queries to use actual names, not aliases ([#983](#983)). In this update, we have resolved an issue where aliases used for column references in SQL queries caused errors in certain setups by renaming them to use actual names. Specifically, for assessment SQL queries, we have modified the definition of the `is_delta` column to use the actual `table_format` name instead of the alias `format`. This change improves compatibility and enhances the reliability of query execution. As a software engineer, you will appreciate that this modification ensures consistent interpretation of column references across various setups, thereby avoiding potential errors caused by aliases. This change does not introduce any new methods, but instead modifies existing functionality to use actual column names, ensuring a more reliable and consistent SQL query for the `05_0_all_tables` assessment. * Updated groups permissions validation to use Table ACL cluster ([#979](#979)). In this update, the `validate_groups_permissions` task has been modified to utilize the Table ACL cluster, as indicated by the inclusion of `job_cluster="tacl"`. This task is responsible for ensuring that all crawled permissions are accurately applied to the destination groups by calling the `permission_manager.apply_group_permissions` method during the migration state. This modification enhances the validation of group permissions by performing it on the Table ACL cluster, potentially improving performance or functionality. If you are implementing this project, it is crucial to comprehend the consequences of this change on your permissions validation process and adjust your workflows appropriately.
…ls` command (#973) ## Changes <!-- Summary of your changes that are easy to understand. Add screenshots when necessary --> A few more things to be done - [x] Added `load` function to `AWSResourcePermissions` to return identified instance profiles - [x] Added `IamRoleMigration` class under `aws/credentials.py` to migrate AWS instance profiles identified ### Linked issues <!-- DOC: Link issue with a keyword: close, closes, closed, fix, fixes, fixed, resolve, resolves, resolved. See https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword --> Resolves #862 Related PR: - #874 ### Functionality - [x] added relevant user documentation - [x] added new CLI command `databricks labs ucx migrate-credentials` ### Tests <!-- How is this tested? Please see the checklist below and also describe any other relevant tests --> - [x] manually tested - [x] added unit tests - [x] added integration tests --------- Co-authored-by: qziyuan <91635877+qziyuan@users.noreply.github.com>
* Added AWS IAM roles support to `databricks labs ucx migrate-credentials` command ([#973](#973)). This commit adds AWS Identity and Access Management (IAM) roles support to the `databricks labs ucx migrate-credentials` command, resolving issue [#862](#862) and being related to pull request [#874](#874). It includes the addition of a `load` function to `AWSResourcePermissions` to return identified instance profiles and the creation of an `IamRoleMigration` class under `aws/credentials.py` to migrate identified AWS instance profiles. Additionally, user documentation and a new CLI command `databricks labs ucx migrate-credentials` have been added, and the changes have been thoroughly tested with manual, unit, and integration tests. The functionality additions include new methods such as `add_uc_role_policy` and `update_uc_trust_role`, among others, designed to facilitate the migration process for AWS IAM roles. * Added `create-catalogs-schemas` command to prepare destination catalogs and schemas before table migration ([#1028](#1028)). The Databricks Labs Unity Catalog (UCX) tool has been updated with a new `create-catalogs-schemas` command to facilitate the creation of destination catalogs and schemas prior to table migration. This command should be executed after the `create-table-mapping` command and is designed to prepare the workspace for migrating tables to UC. Additionally, a new `CatalogSchema` class has been added to the `hive_metastore` package to manage the creation of catalogs and schemas in the Hive metastore. This new functionality simplifies the process of preparing the destination Hive metastore for table migration, reducing the likelihood of user errors and ensuring that the metastore is properly configured. Unit tests have been added to the `tests/unit/hive_metastore` directory to verify the behavior of the `CatalogSchema` class and the new `create-catalogs-schemas` command. This command is intended for use in contexts where GCP is not supported. * Added automated upgrade option to set up cluster policy ([#1024](#1024)). This commit introduces an automated upgrade option for setting up a cluster policy for older versions of UCX, separating the cluster creation policy from install.py to installer.policy.py and adding an upgrade script for older UCX versions. A new class, `ClusterPolicyInstaller`, is added to the `policy.py` file in the `installer` package to manage the creation and update of a Databricks cluster policy for Unity Catalog Migration. This class handles creating a new cluster policy with specific configurations, extracting external Hive Metastore configurations, and updating job policies. Additionally, the commit includes refactoring, removal of library references, and a new script, v0.15.0_added_cluster_policy.py, which contains the upgrade function. The changes are tested through manual and automated testing with unit tests and integration tests. This feature is intended for software engineers working with the project. * Added crawling for init scripts on local files to assessment workflow ([#960](#960)). This commit introduces the ability to crawl init scripts stored on local files and S3 as part of the assessment workflow, resolving issue [#9](#9) * Added database filter for the `assessment` workflow ([#989](#989)). In this release, we have added a new configuration option, `include_databases`, to the assessment workflow which allows users to specify a list of databases to include for migration, rather than crawling all the databases in the Hive Metastore. This feature is implemented in the `TablesCrawler`, `UdfsCrawler`, `GrantsCrawler` classes and the associated functions such as `_all_databases`, `getIncludeDatabases`, `_select_databases`. These changes aim to improve efficiency and reduce unnecessary crawling, and are accompanied by modifications to existing functionality, as well as the addition of unit and integration tests. The changes have been manually tested and verified on a staging environment. * Estimate migration effort based on assessment database ([#1008](#1008)). In this release, a new functionality has been added to estimate the migration effort for each asset in the assessment database. The estimation is presented in days and is displayed on a new estimates dashboard with a summary widget for a global estimate per object type, along with assumptions and scope for each object type. A new `query` parameter has been added to the `SimpleQuery` class to support this feature. Additional changes include the update of the `_install_viz` and `_install_query` methods, the inclusion of the `data_source_id` in the query metadata, and the addition of tests to ensure the proper functioning of the new feature. A new fixture, `mock_installation_with_jobs`, has been added to support testing of the assessment estimates dashboard. * Explicitly write to `hive_metastore` from `crawl_tables` task ([#1021](#1021)). In this release, we have improved the clarity and specificity of our handling of the `hive_metastore` in the `crawl_tables` task. Previously, the `df.write.saveAsTable` method was used without explicitly specifying the `hive_metastore` database, which could result in ambiguity. To address this issue, we have updated the `saveAsTable` method to include the `hive_metastore` database, ensuring that tables are written to the correct location in the Hive metastore. These changes are confined to the `src/databricks/labs/ucx/hive_metastore/tables.scala` file and affect the `crawl_tables` task. While no new methods have been added, the existing `saveAsTable` method has been modified to enhance the accuracy and predictability of our interaction with the Hive metastore. * Improved documentation for `databricks labs ucx move` command ([#1025](#1025)). The `databricks labs ucx move` command has been updated with new improvements to its documentation, providing enhanced clarity and ease of use for developers and administrators. This command facilitates the movement of UC tables/table(s) from one schema to another, either in the same or different catalog, during the table upgrade process. A significant enhancement is the preservation of the source table's permissions when moving to a new schema or catalog, maintaining the original table's access controls, simplifying the management of table permissions, and streamlining the migration process. These improvements aim to facilitate a more efficient table migration experience, ensuring that developers and administrators can effectively manage their UC tables while maintaining the desired level of access control and security. * Updated databricks-sdk requirement from ~=0.20.0 to ~=0.21.0 ([#1030](#1030)). In this update, the `databricks-sdk` package requirement has been updated to version `~=0.21.0` from `~=0.20.0`. This new version addresses several bugs and provides enhancements, including the fix for the `get_workspace_client` method in GCP, the use of the `all-apis` scope with the external browser, and an attempt to initialize all Databricks globals. Moreover, the API's settings nesting approach has changed, which may cause compatibility issues with previous versions. Several new services and dataclasses have been added to the API, and documentation and examples have been updated accordingly. There are no updates to the `databricks-labs-blueprint` and `PyYAML` dependencies in this commit.
Changes
secrets.py
file, a method called_get_secret_if_exists
has been updated to handle cases where the secret value cannot be decoded to a utf-8 string. When aUnicodeDecodeError
is raised, the method will log a warning message and returnNone
.access.py
file, a new class calledStoragePermissionMapping
has been added. This class contains information about a storage permission mapping, including the client ID, principal, privilege, and directory ID.credentials.py
file, several new classes and functions have been added to handle the migration of Azure Service Principals to UC storage credentials. TheServicePrincipalMigrationInfo
class is used to represent a service principal with its client secret, and theStorageCredentialValidationResult
class is used to represent the result of validating a storage credential. TheStorageCredentialManager
class is used to manage storage credentials, including listing existing storage credentials, creating storage credentials with a client secret, and validating storage credentials. TheServicePrincipalMigration
class is used to migrate Azure Service Principals to UC storage credentials.resources.py
file, a newdirectory_id
field has been added to thePrincipal
class. This field is used to store the directory ID of a principal. Additionally, the_get_principal
method has been updated to set thedirectory_id
field of a principal.cli.py
file, a new command calledmigrate_azure_service_principals
has been added. This command is used to migrate Azure Service Principals to UC storage credentials.test_credentials.py
file, several new tests have been added to test the functionality of theStorageCredentialManager
andServicePrincipalMigration
classes.test_resources.py
file, thePrincipal
class has been updated to include thedirectory_id
field, and therole_assignments
methods have been updated to use thePrincipal
class with thedirectory_id
field.conftest.py
file, several new fixtures have been added to test the functionality of theStorageCredentialManager
andServicePrincipalMigration
classes. These fixtures includeStaticServicePrincipalMigration
,StaticStorageCredentialManager
,StaticServicePrincipalCrawler
, andStaticResourcePermissions
.Linked issues
Related to #339