Commit

Merge branch 'main' into add/reports

nfx authored Sep 20, 2023
2 parents 08da085 + 188eefa commit 6575e28
Showing 4 changed files with 87 additions and 81 deletions.
28 changes: 15 additions & 13 deletions CONTRIBUTING.md
@@ -155,22 +155,24 @@ make lint test

Here are the example steps to submit your first contribution:

1. `git checkout main` (or `gcm` if you're using [ohmyzsh](https://ohmyz.sh/)).
2. `git pull` (or `gl` if you're using [ohmyzsh](https://ohmyz.sh/)).
3. `git checkout -b FEATURENAME` (or `gcb FEATURENAME` if you're using [ohmyzsh](https://ohmyz.sh/)).
4. .. do the work
5. `make fmt`
6. `make lint`
7. .. fix if any
8. `make test`
1. Make a fork of the ucx repo (if you really want to contribute)
2. `git clone`
3. `git checkout main` (or `gcm` if you're using [ohmyzsh](https://ohmyz.sh/)).
4. `git pull` (or `gl` if you're using [ohmyzsh](https://ohmyz.sh/)).
5. `git checkout -b FEATURENAME` (or `gcb FEATURENAME` if you're using [ohmyzsh](https://ohmyz.sh/)).
6. .. do the work
7. `make fmt`
8. `make lint`
9. .. fix if any
10. `git commit -a`. Make sure to enter meaningful commit message title.
11. `git push origin FEATURENAME`
12. Go to GitHub UI and create PR. Alternatively, `gh pr create` (if you have [GitHub CLI](https://cli.github.com/) installed).
10. `make test`
11. .. fix if any
12. `git commit -a`. Make sure to enter a meaningful commit message title.
13. `git push origin FEATURENAME`
14. Go to GitHub UI and create PR. Alternatively, `gh pr create` (if you have [GitHub CLI](https://cli.github.com/) installed).
Use a meaningful pull request title because it'll appear in the release notes. Use `Resolves #NUMBER` in pull
request description to [automatically link it](https://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/using-keywords-in-issues-and-pull-requests#linking-a-pull-request-to-an-issue)
to an existing issue.
14. announce PR for the review
15. announce PR for the review

## Troubleshooting

3 changes: 2 additions & 1 deletion README.md
@@ -11,8 +11,9 @@ See [contributing instructions](CONTRIBUTING.md) to help improve this project.

## Installation

First clone this project repo to your local environment (better, make a fork in case you make changes to contribute back).

The `./install.sh` script will guide you through the installation process.
First, clone this repo to your local environment to get `install.sh` and the rest of the project (better yet, make a fork in case you make changes to contribute back).
Make sure you have Python 3.10 (or greater)
installed on your workstation, and you've configured authentication for
the [Databricks Workspace](https://databricks-sdk-py.readthedocs.io/en/latest/authentication.html#default-authentication-flow).
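
A quick way to verify that authentication is wired up before running `./install.sh` is a short SDK check. This is a minimal sketch, assuming `databricks-sdk` is installed and the default authentication flow (environment variables or `~/.databrickscfg`) is configured:

```python
from databricks.sdk import WorkspaceClient

# Picks up credentials via the default authentication flow
# (env vars, ~/.databrickscfg profile, etc.).
w = WorkspaceClient()

# If this prints your username, the workspace credentials work.
print(w.current_user.me().user_name)
```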
2 changes: 1 addition & 1 deletion src/databricks/labs/ucx/install.py
@@ -251,7 +251,7 @@ def _create_readme(self):
self._ws.workspace.upload(path, intro.encode("utf8"), overwrite=True)
url = self._notebook_link(path)
logger.info(f"Created README notebook with job overview: {url}")
msg = "Open job overview in README notebook in your home directory"
msg = "Open job overview in README notebook in your home directory ?"
if self._prompts and self._question(msg, default="yes") == "yes":
webbrowser.open(url)

135 changes: 69 additions & 66 deletions src/databricks/labs/ucx/runtime.py
@@ -23,28 +23,29 @@ def setup_schema(cfg: MigrationConfig):

@task("assessment", depends_on=[setup_schema], notebook="hive_metastore/tables.scala")
def crawl_tables(_: MigrationConfig):
"""During this operation, a systematic scan is conducted, encompassing every table within the Hive Metastore.
This scan extracts essential details associated with each table, including its unique identifier or name, table
format, storage location details.
"""In this procedure, we systematically scan every table stored within the Hive Metastore. This scanning process
retrieves vital information for each table, which includes its distinct identifier or name, table format, and
storage location details.
The extracted metadata is subsequently organized and cataloged within a dedicated storage entity known as
the `$inventory.tables` table. This table functions as a comprehensive inventory, providing a structured and
easily accessible reference point for users, data engineers, and administrators."""
The gathered metadata is subsequently organized and documented within a designated storage entity referred to
as the `$inventory.tables` table. This table serves as an extensive inventory, offering a well-structured and
readily accessible point of reference for users, data engineers, and administrators."""
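
As a rough illustration of what the crawl produces, a spot-check of the resulting inventory might look like the sketch below. It assumes a Databricks notebook (where `spark` is in scope) and an inventory database named `ucx`; the column names are illustrative, not taken from this diff:

```python
# Inspect the table inventory produced by the crawl; "ucx" stands in for the
# configured $inventory database, and the column names are assumptions.
tables = spark.sql(
    "SELECT database, name, object_type, table_format, location "
    "FROM hive_metastore.ucx.tables"
)
tables.show(truncate=False)
```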


@task("assessment", depends_on=[crawl_tables], job_cluster="tacl")
def crawl_grants(cfg: MigrationConfig):
"""During this operation, our process is designed to systematically scan and retrieve Legacy Table ACLs from
the Hive Metastore. This includes comprehensive details such as user and group permissions, role-based access
settings, and any custom access configurations. These ACLs are then thoughtfully organized and securely stored
within the `$inventory.grants` table. This dedicated table serves as a central repository, safeguarding the
continuity of access control data as we transition to the Databricks Unity Catalog.
By undertaking this meticulous migration of Legacy Table ACLs, we ensure that the data governance and security
framework established within our legacy Hive Metastore environment seamlessly carries over to our new Databricks
Unity Catalog setup. This approach not only safeguards data integrity and access control but also guarantees
a smooth and secure transition for our data assets, bolstering our commitment to data security and compliance
throughout the migration process and beyond."""
"""During this process, our methodology is purposefully designed to systematically scan and retrieve ACLs
(Access Control Lists) associated with Legacy Tables from the Hive Metastore. These ACLs encompass comprehensive
information, including permissions for users and groups, role-based access settings, and any custom access
configurations. These ACLs are then thoughtfully structured and securely stored within the `$inventory.grants`
table. This dedicated table serves as a central repository, ensuring the uninterrupted preservation of access
control data as we transition to the Databricks Unity Catalog.
By meticulously migrating these Legacy Table ACLs, we guarantee the seamless transfer of the data governance and
security framework established in our legacy Hive Metastore environment to our new Databricks Unity Catalog
setup. This approach not only safeguards data integrity and access control but also ensures a smooth and
secure transition for our data assets. It reinforces our commitment to data security and compliance throughout the
migration process and beyond."""
ws = WorkspaceClient(config=cfg.to_databricks_config())
tacls = TaclToolkit(
ws, inventory_catalog="hive_metastore", inventory_schema=cfg.inventory_database, databases=cfg.tacl.databases
@@ -54,32 +55,32 @@ def crawl_grants(cfg: MigrationConfig):
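
For a sense of what lands in `$inventory.grants`, a hypothetical aggregation in a notebook, under the same assumptions as above (inventory database named `ucx`; `principal` and `action_type` are illustrative column names):

```python
# Summarize the inventoried legacy ACLs per principal; the column names are
# assumptions about the $inventory.grants schema, not shown in this diff.
grants = spark.table("hive_metastore.ucx.grants")
grants.groupBy("principal", "action_type").count().orderBy("principal").show()
```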

@task("assessment", depends_on=[setup_schema])
def inventorize_mounts(cfg: MigrationConfig):
"""In this part of the assessment, we're going to scope the mount points that are going to be
migrated into Unity Catalog. Since these objects are not supported in the UC paragidm, part of the migration phase
is to migrate them into Unity Catalog External Locations.
"""In this segment of the assessment, we will define the scope of the mount points intended for migration into the
Unity Catalog. As these objects are not compatible with the Unity Catalog paradigm, a key component of the
migration process involves transferring them to Unity Catalog External Locations.
The assessment is going in the workspace to list all the Mount points that has been created, and then store them in
the `$inventory.mounts` table, which will allow you to have a snapshot of your existing Mount Point infrastructure.
"""
The assessment involves scanning the workspace to compile a list of all existing mount points and subsequently
storing this information in the `$inventory.mounts` table. This step enables you to create a snapshot of your
current Mount Point infrastructure, which is crucial for planning the migration."""
ws = WorkspaceClient(config=cfg.to_databricks_config())
mounts = Mounts(backend=RuntimeBackend(), ws=ws, inventory_database=cfg.inventory_database)
mounts.inventorize_mounts()
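
To gauge how many Unity Catalog External Locations the migration will need, one could group the inventoried mount points by storage scheme. A sketch under the same assumptions (notebook context, `ucx` inventory database, illustrative `source` column name):

```python
from pyspark.sql import functions as F

# Group mount points by cloud storage scheme (s3/abfss/gs/...); "source" is
# an assumption about the $inventory.mounts schema.
mounts = spark.table("hive_metastore.ucx.mounts")
mounts.groupBy(
    F.split(F.col("source"), "://").getItem(0).alias("scheme")
).count().show()
```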


@task("assessment", depends_on=[setup_schema])
def inventorize_permissions(cfg: MigrationConfig):
"""As we embark on the complex migration journey from Hive Metastore to the Databricks Unity Catalog, a pivotal
aspect of this transition is the comprehensive examination and preservation of permissions associated with a myriad
of Databricks Workspace objects. These objects encompass a wide spectrum of resources such as clusters, cluster
policies, jobs, models, experiments, SQL warehouses, SQL alerts, dashboards, queries, AWS IAM instance profiles,
and secret scopes. Ensuring the continuity of permissions is essential not only for maintaining data security but
also for enabling a seamless and secure migration process.
Our meticulously designed operation systematically scans and extracts permissions across these diverse Databricks
Workspace objects. This encompasses user and group access rights, role-based permissions, custom access
configurations, and any specialized policies governing access to these resources. The outcome of this thorough scan
is methodically stored within the `$inventory.permissions` table, which serves as a central repository for
preserving and managing these critical access control details."""
"""As we commence the intricate migration process from Hive Metastore to the Databricks Unity Catalog, a critical
element of this transition is the thorough examination and preservation of permissions linked to a wide array of
Databricks Workspace components. These components encompass a broad spectrum of resources, including clusters,
cluster policies, jobs, models, experiments, SQL warehouses, SQL alerts, dashboards, queries, AWS IAM instance
profiles, and secret scopes. Ensuring the uninterrupted continuity of permissions is of paramount importance,
as it not only upholds data security but also facilitates a smooth and secure migration journey.
Our carefully designed procedure systematically scans and extracts permissions associated with these diverse
Databricks Workspace objects. This process encompasses rights granted to users and groups, role-based permissions,
custom access configurations, and any specialized policies governing resource access. The results of this
meticulous scan are methodically stored within the `$inventory.permissions` table, which serves as a central
repository for preserving and managing these crucial access control details."""
toolkit = GroupMigrationToolkit(cfg)
toolkit.prepare_environment()
toolkit.cleanup_inventory_table()
@@ -88,47 +89,49 @@ def inventorize_permissions(cfg: MigrationConfig):
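
A hypothetical glance at the coverage of the permissions inventory, per workspace object type, under the same notebook assumptions (`object_type` is an illustrative column name):

```python
# Count inventoried permission records per workspace object type (clusters,
# jobs, dashboards, ...); the schema details are assumed, not shown here.
permissions = spark.table("hive_metastore.ucx.permissions")
permissions.groupBy("object_type").count().orderBy("count", ascending=False).show()
```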

@task("assessment", depends_on=[crawl_tables, crawl_grants, inventorize_permissions], dashboard="assessment")
def assessment_report(_: MigrationConfig):
"""This report is meticulously crafted to evaluate and assess the readiness of a specific workspace for
the seamless adoption of the Unity Catalog.
"""This meticulously prepared report serves the purpose of evaluating and gauging the preparedness of a specific
workspace for a smooth transition to the Unity Catalog.
Our assessment process encompasses a thorough analysis of various critical aspects, including data schemas, metadata
structures, permissions, access controls, data assets, and dependencies within the workspace. We delve deep into
the intricacies of the existing environment, taking into consideration factors such as the complexity of data
models, the intricacy of ACLs, the presence of custom scripts, and the overall data ecosystem.
Our assessment procedure involves a comprehensive examination of various critical elements, including data schemas,
metadata structures, permissions, access controls, data assets, and dependencies within the workspace. We dive deep
into the intricacies of the current environment, taking into account factors like the complexity of data models,
the intricacy of access control lists (ACLs), the existence of custom scripts, and the overall data ecosystem.
The result of this meticulous assessment is a detailed report that provides a holistic view of the workspace's
readiness for migration to the Databricks Unity Catalog. This report serves as a valuable guide, offering insights,
recommendations, and actionable steps to ensure a smooth and successful transition. It aids data engineers,
administrators, and decision-makers in making informed choices, mitigating potential challenges, and optimizing
the migration strategy.
The outcome of this thorough assessment is a comprehensive report that offers a holistic perspective on the
workspace's readiness for migration to the Databricks Unity Catalog. This report serves as a valuable resource,
offering insights, recommendations, and practical steps to ensure a seamless and successful transition.
It assists data engineers, administrators, and decision-makers in making informed decisions, addressing potential
challenges, and optimizing the migration strategy.
By creating this readiness assessment report, we demonstrate our commitment to a well-planned, risk-mitigated
migration process. It ensures that our migration to the Databricks Unity Catalog is not only efficient but also
aligns seamlessly with our data governance, security, and operational requirements, setting the stage for a new era
of data management excellence."""
Through the creation of this readiness assessment report, we demonstrate our commitment to a well-planned,
risk-mitigated migration process. It guarantees that our migration to the Databricks Unity Catalog is not only
efficient but also seamlessly aligns with our data governance, security, and operational requirements, paving the
way for a new era of excellence in data management."""


@task("migrate-groups", depends_on=[inventorize_permissions])
def migrate_permissions(cfg: MigrationConfig):
"""As we embark on the intricate migration journey from Hive Metastore to the Databricks Unity Catalog, a pivotal
phase in this transition involves the careful orchestration of permissions. This multifaceted operation includes
the application of permissions to designated backup groups, the seamless substitution of workspace groups with
account groups, and the subsequent application of permissions to these newly formed account groups.
"""As we embark on the complex journey of migrating from Hive Metastore to the Databricks Unity Catalog,
a crucial phase in this transition involves the careful management of permissions.
This intricate process entails several key steps: first, applying permissions to designated backup groups;
second, smoothly substituting workspace groups with account groups;
and finally, applying permissions to these newly established account groups.
During this meticulous process, existing permissions are thoughtfully mapped to backup groups, ensuring that data
security and access control remain robust and consistent throughout the migration. Simultaneously, workspace groups
are gracefully replaced with account groups to align with the structure and policies of the Databricks Unity
Catalog.
Throughout this meticulous process, we ensure that existing permissions are thoughtfully mapped to backup groups
to maintain robust and consistent data security and access control during the migration.
Once this transition is complete, permissions are diligently applied to the newly established account groups,
Concurrently, we gracefully replace workspace groups with account groups to align with the structure and policies
of the Databricks Unity Catalog.
Once this transition is complete, we diligently apply permissions to the newly formed account groups,
preserving the existing access control framework while facilitating the seamless integration of data assets into
the Unity Catalog environment. This meticulous orchestration of permissions guarantees the continuity of data
security, minimizes disruption to data workflows, and enables a smooth migration experience for users and
administrators alike.
the Unity Catalog environment.
By undertaking this precise operation, we ensure that the transition to the Databricks Unity Catalog not only meets
data security and governance standards but also enhances the overall efficiency and manageability of our data
ecosystem, setting the stage for a new era of data management excellence within our organization."""
This careful orchestration of permissions guarantees the continuity of data security, minimizes disruptions to data
workflows, and ensures a smooth migration experience for both users and administrators. By executing this precise
operation, we not only meet data security and governance standards but also enhance the overall efficiency and
manageability of our data ecosystem, laying the foundation for a new era of data management excellence within our
organization."""
toolkit = GroupMigrationToolkit(cfg)
toolkit.prepare_environment()
toolkit.apply_permissions_to_backup_groups()
Expand All @@ -138,7 +141,7 @@ def migrate_permissions(cfg: MigrationConfig):

@task("migrate-groups-cleanup", depends_on=[migrate_permissions])
def delete_backup_groups(cfg: MigrationConfig):
"""Removes backup groups"""
"""Removes workspace-level backup groups"""
toolkit = GroupMigrationToolkit(cfg)
toolkit.prepare_environment()
toolkit.delete_backup_groups()
