Commit

Merge branch 'main' into add/reports

nfx authored Sep 20, 2023
2 parents 08da085 + 188eefa commit 6575e28
Showing 4 changed files with 87 additions and 81 deletions.
28 changes: 15 additions & 13 deletions CONTRIBUTING.md
@@ -155,22 +155,24 @@ make lint test

Here are the example steps to submit your first contribution:

1. `git checkout main` (or `gcm` if you're using [ohmyzsh](https://ohmyz.sh/)).
2. `git pull` (or `gl` if you're using [ohmyzsh](https://ohmyz.sh/)).
3. `git checkout -b FEATURENAME` (or `gcb FEATURENAME` if you're using [ohmyzsh](https://ohmyz.sh/)).
4. .. do the work
5. `make fmt`
6. `make lint`
7. .. fix if any
8. `make test`
1. Make a fork of the ucx repo (if you really want to contribute)
2. `git clone`
3. `git checkout main` (or `gcm` if you're using [ohmyzsh](https://ohmyz.sh/)).
4. `git pull` (or `gl` if you're using [ohmyzsh](https://ohmyz.sh/)).
5. `git checkout -b FEATURENAME` (or `gcb FEATURENAME` if you're using [ohmyzsh](https://ohmyz.sh/)).
6. .. do the work
7. `make fmt`
8. `make lint`
9. .. fix if any
10. `git commit -a`. Make sure to enter meaningful commit message title.
11. `git push origin FEATURENAME`
12. Go to GitHub UI and create PR. Alternatively, `gh pr create` (if you have [GitHub CLI](https://cli.github.com/) installed).
10. `make test`
11. .. fix if any
12. `git commit -a`. Make sure to enter a meaningful commit message title.
13. `git push origin FEATURENAME`
14. Go to GitHub UI and create PR. Alternatively, `gh pr create` (if you have [GitHub CLI](https://cli.github.com/) installed).
Use a meaningful pull request title because it'll appear in the release notes. Use `Resolves #NUMBER` in pull
request description to [automatically link it](https://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/using-keywords-in-issues-and-pull-requests#linking-a-pull-request-to-an-issue)
to an existing issue.
14. announce PR for the review
15. announce PR for the review

## Troubleshooting

3 changes: 2 additions & 1 deletion README.md
@@ -11,8 +11,9 @@ See [contributing instructions](CONTRIBUTING.md) to help improve this project.

## Installation

First clone this project repo to your local environment (better, make a fork in case you make changes to contribute back).

The `./install.sh` script will guide you through the installation process.
First, clone this repo to your local environment to get `install.sh` and the rest of the project (better yet, make a fork in case you make changes to contribute back).
Make sure you have Python 3.10 (or greater)
installed on your workstation, and you've configured authentication for
the [Databricks Workspace](https://databricks-sdk-py.readthedocs.io/en/latest/authentication.html#default-authentication-flow).
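
A quick way to verify that authentication is wired up before running `./install.sh` is a short SDK check. This is a minimal sketch, assuming `databricks-sdk` is installed and the default authentication flow (environment variables or `~/.databrickscfg`) is configured:

```python
from databricks.sdk import WorkspaceClient

# Picks up credentials via the default authentication flow
# (env vars, ~/.databrickscfg profile, etc.).
w = WorkspaceClient()

# If this prints your username, the workspace credentials work.
print(w.current_user.me().user_name)
```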
2 changes: 1 addition & 1 deletion src/databricks/labs/ucx/install.py
@@ -251,7 +251,7 @@ def _create_readme(self):
self._ws.workspace.upload(path, intro.encode("utf8"), overwrite=True)
url = self._notebook_link(path)
logger.info(f"Created README notebook with job overview: {url}")
msg = "Open job overview in README notebook in your home directory"
msg = "Open job overview in README notebook in your home directory ?"
if self._prompts and self._question(msg, default="yes") == "yes":
webbrowser.open(url)

135 changes: 69 additions & 66 deletions src/databricks/labs/ucx/runtime.py
@@ -23,28 +23,29 @@ def setup_schema(cfg: MigrationConfig):

@task("assessment", depends_on=[setup_schema], notebook="hive_metastore/tables.scala")
def crawl_tables(_: MigrationConfig):
"""During this operation, a systematic scan is conducted, encompassing every table within the Hive Metastore.
This scan extracts essential details associated with each table, including its unique identifier or name, table
format, storage location details.
"""In this procedure, we systematically scan every table stored within the Hive Metastore. This scanning process
retrieves vital information for each table, which includes its distinct identifier or name, table format, and
storage location details.
The extracted metadata is subsequently organized and cataloged within a dedicated storage entity known as
the `$inventory.tables` table. This table functions as a comprehensive inventory, providing a structured and
easily accessible reference point for users, data engineers, and administrators."""
The gathered metadata is subsequently organized and documented within a designated storage entity referred to
as the `$inventory.tables` table. This table serves as an extensive inventory, offering a well-structured and
readily accessible point of reference for users, data engineers, and administrators."""
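
As a rough illustration of what the crawl produces, a spot-check of the resulting inventory might look like the sketch below. It assumes a Databricks notebook (where `spark` is in scope) and an inventory database named `ucx`; the column names are illustrative, not taken from this diff:

```python
# Inspect the table inventory produced by the crawl; "ucx" stands in for the
# configured $inventory database, and the column names are assumptions.
tables = spark.sql(
    "SELECT database, name, object_type, table_format, location "
    "FROM hive_metastore.ucx.tables"
)
tables.show(truncate=False)
```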


@task("assessment", depends_on=[crawl_tables], job_cluster="tacl")
def crawl_grants(cfg: MigrationConfig):
"""During this operation, our process is designed to systematically scan and retrieve Legacy Table ACLs from
the Hive Metastore. This includes comprehensive details such as user and group permissions, role-based access
settings, and any custom access configurations. These ACLs are then thoughtfully organized and securely stored
within the `$inventory.grants` table. This dedicated table serves as a central repository, safeguarding the
continuity of access control data as we transition to the Databricks Unity Catalog.
By undertaking this meticulous migration of Legacy Table ACLs, we ensure that the data governance and security
framework established within our legacy Hive Metastore environment seamlessly carries over to our new Databricks
Unity Catalog setup. This approach not only safeguards data integrity and access control but also guarantees
a smooth and secure transition for our data assets, bolstering our commitment to data security and compliance
throughout the migration process and beyond."""
"""During this process, our methodology is purposefully designed to systematically scan and retrieve ACLs
(Access Control Lists) associated with Legacy Tables from the Hive Metastore. These ACLs encompass comprehensive
information, including permissions for users and groups, role-based access settings, and any custom access
configurations. These ACLs are then thoughtfully structured and securely stored within the `$inventory.grants`
table. This dedicated table serves as a central repository, ensuring the uninterrupted preservation of access
control data as we transition to the Databricks Unity Catalog.
By meticulously migrating these Legacy Table ACLs, we guarantee the seamless transfer of the data governance and
security framework established in our legacy Hive Metastore environment to our new Databricks Unity Catalog
setup. This approach not only safeguards data integrity and access control but also ensures a smooth and
secure transition for our data assets. It reinforces our commitment to data security and compliance throughout the
migration process and beyond."""
ws = WorkspaceClient(config=cfg.to_databricks_config())
tacls = TaclToolkit(
ws, inventory_catalog="hive_metastore", inventory_schema=cfg.inventory_database, databases=cfg.tacl.databases
@@ -54,32 +55,32 @@ def crawl_grants(cfg: MigrationConfig):
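
For a sense of what lands in `$inventory.grants`, a hypothetical aggregation in a notebook, under the same assumptions as above (inventory database named `ucx`; `principal` and `action_type` are illustrative column names):

```python
# Summarize the inventoried legacy ACLs per principal; the column names are
# assumptions about the $inventory.grants schema, not shown in this diff.
grants = spark.table("hive_metastore.ucx.grants")
grants.groupBy("principal", "action_type").count().orderBy("principal").show()
```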

@task("assessment", depends_on=[setup_schema])
def inventorize_mounts(cfg: MigrationConfig):
"""In this part of the assessment, we're going to scope the mount points that are going to be
migrated into Unity Catalog. Since these objects are not supported in the UC paragidm, part of the migration phase
is to migrate them into Unity Catalog External Locations.
"""In this segment of the assessment, we will define the scope of the mount points intended for migration into the
Unity Catalog. As these objects are not compatible with the Unity Catalog paradigm, a key component of the
migration process involves transferring them to Unity Catalog External Locations.
The assessment is going in the workspace to list all the Mount points that has been created, and then store them in
the `$inventory.mounts` table, which will allow you to have a snapshot of your existing Mount Point infrastructure.
"""
The assessment involves scanning the workspace to compile a list of all existing mount points and subsequently
storing this information in the `$inventory.mounts` table. This step enables you to create a snapshot of your
current Mount Point infrastructure, which is crucial for planning the migration."""
ws = WorkspaceClient(config=cfg.to_databricks_config())
mounts = Mounts(backend=RuntimeBackend(), ws=ws, inventory_database=cfg.inventory_database)
mounts.inventorize_mounts()
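
To gauge how many Unity Catalog External Locations the migration will need, one could group the inventoried mount points by storage scheme. A sketch under the same assumptions (notebook context, `ucx` inventory database, illustrative `source` column name):

```python
from pyspark.sql import functions as F

# Group mount points by cloud storage scheme (s3/abfss/gs/...); "source" is
# an assumption about the $inventory.mounts schema.
mounts = spark.table("hive_metastore.ucx.mounts")
mounts.groupBy(
    F.split(F.col("source"), "://").getItem(0).alias("scheme")
).count().show()
```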


@task("assessment", depends_on=[setup_schema])
def inventorize_permissions(cfg: MigrationConfig):
"""As we embark on the complex migration journey from Hive Metastore to the Databricks Unity Catalog, a pivotal
aspect of this transition is the comprehensive examination and preservation of permissions associated with a myriad
of Databricks Workspace objects. These objects encompass a wide spectrum of resources such as clusters, cluster
policies, jobs, models, experiments, SQL warehouses, SQL alerts, dashboards, queries, AWS IAM instance profiles,
and secret scopes. Ensuring the continuity of permissions is essential not only for maintaining data security but
also for enabling a seamless and secure migration process.
Our meticulously designed operation systematically scans and extracts permissions across these diverse Databricks
Workspace objects. This encompasses user and group access rights, role-based permissions, custom access
configurations, and any specialized policies governing access to these resources. The outcome of this thorough scan
is methodically stored within the `$inventory.permissions` table, which serves as a central repository for
preserving and managing these critical access control details."""
"""As we commence the intricate migration process from Hive Metastore to the Databricks Unity Catalog, a critical
element of this transition is the thorough examination and preservation of permissions linked to a wide array of
Databricks Workspace components. These components encompass a broad spectrum of resources, including clusters,
cluster policies, jobs, models, experiments, SQL warehouses, SQL alerts, dashboards, queries, AWS IAM instance
profiles, and secret scopes. Ensuring the uninterrupted continuity of permissions is of paramount importance,
as it not only upholds data security but also facilitates a smooth and secure migration journey.
Our carefully designed procedure systematically scans and extracts permissions associated with these diverse
Databricks Workspace objects. This process encompasses rights granted to users and groups, role-based permissions,
custom access configurations, and any specialized policies governing resource access. The results of this
meticulous scan are methodically stored within the `$inventory.permissions` table, which serves as a central
repository for preserving and managing these crucial access control details."""
toolkit = GroupMigrationToolkit(cfg)
toolkit.prepare_environment()
toolkit.cleanup_inventory_table()
@@ -88,47 +89,49 @@ def inventorize_permissions(cfg: MigrationConfig):
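
A hypothetical glance at the coverage of the permissions inventory, per workspace object type, under the same notebook assumptions (`object_type` is an illustrative column name):

```python
# Count inventoried permission records per workspace object type (clusters,
# jobs, dashboards, ...); the schema details are assumed, not shown here.
permissions = spark.table("hive_metastore.ucx.permissions")
permissions.groupBy("object_type").count().orderBy("count", ascending=False).show()
```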

@task("assessment", depends_on=[crawl_tables, crawl_grants, inventorize_permissions], dashboard="assessment")
def assessment_report(_: MigrationConfig):
"""This report is meticulously crafted to evaluate and assess the readiness of a specific workspace for
the seamless adoption of the Unity Catalog.
"""This meticulously prepared report serves the purpose of evaluating and gauging the preparedness of a specific
workspace for a smooth transition to the Unity Catalog.
Our assessment process encompasses a thorough analysis of various critical aspects, including data schemas, metadata
structures, permissions, access controls, data assets, and dependencies within the workspace. We delve deep into
the intricacies of the existing environment, taking into consideration factors such as the complexity of data
models, the intricacy of ACLs, the presence of custom scripts, and the overall data ecosystem.
Our assessment procedure involves a comprehensive examination of various critical elements, including data schemas,
metadata structures, permissions, access controls, data assets, and dependencies within the workspace. We dive deep
into the intricacies of the current environment, taking into account factors like the complexity of data models,
the intricacy of access control lists (ACLs), the existence of custom scripts, and the overall data ecosystem.
The result of this meticulous assessment is a detailed report that provides a holistic view of the workspace's
readiness for migration to the Databricks Unity Catalog. This report serves as a valuable guide, offering insights,
recommendations, and actionable steps to ensure a smooth and successful transition. It aids data engineers,
administrators, and decision-makers in making informed choices, mitigating potential challenges, and optimizing
the migration strategy.
The outcome of this thorough assessment is a comprehensive report that offers a holistic perspective on the
workspace's readiness for migration to the Databricks Unity Catalog. This report serves as a valuable resource,
offering insights, recommendations, and practical steps to ensure a seamless and successful transition.
It assists data engineers, administrators, and decision-makers in making informed decisions, addressing potential
challenges, and optimizing the migration strategy.
By creating this readiness assessment report, we demonstrate our commitment to a well-planned, risk-mitigated
migration process. It ensures that our migration to the Databricks Unity Catalog is not only efficient but also
aligns seamlessly with our data governance, security, and operational requirements, setting the stage for a new era
of data management excellence."""
Through the creation of this readiness assessment report, we demonstrate our commitment to a well-planned,
risk-mitigated migration process. It guarantees that our migration to the Databricks Unity Catalog is not only
efficient but also seamlessly aligns with our data governance, security, and operational requirements, paving the
way for a new era of excellence in data management."""


@task("migrate-groups", depends_on=[inventorize_permissions])
def migrate_permissions(cfg: MigrationConfig):
"""As we embark on the intricate migration journey from Hive Metastore to the Databricks Unity Catalog, a pivotal
phase in this transition involves the careful orchestration of permissions. This multifaceted operation includes
the application of permissions to designated backup groups, the seamless substitution of workspace groups with
account groups, and the subsequent application of permissions to these newly formed account groups.
"""As we embark on the complex journey of migrating from Hive Metastore to the Databricks Unity Catalog,
a crucial phase in this transition involves the careful management of permissions.
This intricate process entails several key steps: first, applying permissions to designated backup groups;
second, smoothly substituting workspace groups with account groups;
and finally, applying permissions to these newly established account groups.
During this meticulous process, existing permissions are thoughtfully mapped to backup groups, ensuring that data
security and access control remain robust and consistent throughout the migration. Simultaneously, workspace groups
are gracefully replaced with account groups to align with the structure and policies of the Databricks Unity
Catalog.
Throughout this meticulous process, we ensure that existing permissions are thoughtfully mapped to backup groups
to maintain robust and consistent data security and access control during the migration.
Once this transition is complete, permissions are diligently applied to the newly established account groups,
Concurrently, we gracefully replace workspace groups with account groups to align with the structure and policies
of the Databricks Unity Catalog.
Once this transition is complete, we diligently apply permissions to the newly formed account groups,
preserving the existing access control framework while facilitating the seamless integration of data assets into
the Unity Catalog environment. This meticulous orchestration of permissions guarantees the continuity of data
security, minimizes disruption to data workflows, and enables a smooth migration experience for users and
administrators alike.
the Unity Catalog environment.
By undertaking this precise operation, we ensure that the transition to the Databricks Unity Catalog not only meets
data security and governance standards but also enhances the overall efficiency and manageability of our data
ecosystem, setting the stage for a new era of data management excellence within our organization."""
This careful orchestration of permissions guarantees the continuity of data security, minimizes disruptions to data
workflows, and ensures a smooth migration experience for both users and administrators. By executing this precise
operation, we not only meet data security and governance standards but also enhance the overall efficiency and
manageability of our data ecosystem, laying the foundation for a new era of data management excellence within our
organization."""
toolkit = GroupMigrationToolkit(cfg)
toolkit.prepare_environment()
toolkit.apply_permissions_to_backup_groups()
Expand All @@ -138,7 +141,7 @@ def migrate_permissions(cfg: MigrationConfig):

@task("migrate-groups-cleanup", depends_on=[migrate_permissions])
def delete_backup_groups(cfg: MigrationConfig):
"""Removes backup groups"""
"""Removes workspace-level backup groups"""
toolkit = GroupMigrationToolkit(cfg)
toolkit.prepare_environment()
toolkit.delete_backup_groups()
