Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added functionality to determine migration method based on DBFS Root #759

Merged
merged 3 commits into from
Jan 8, 2024

Conversation

FastLee
Copy link
Contributor

@FastLee FastLee commented Jan 7, 2024

Change migration functionality to base migration type on whether the table is on DBFS root or not.

  • If the table location on a mount, sync the table to an external UC table
  • If the table points to a databrick's dataset skip it
  • If the table location is on a dbfs root location (dbfs location that is not a mount or a databricks dataset) deep clone the table to a managed UC table.
  • Otherwise sync the table to an external UC table

Copy link

codecov bot commented Jan 7, 2024

Codecov Report

Attention: 6 lines in your changes are missing coverage. Please review.

Comparison is base (ff97db8) 81.73% compared to head (010e9db) 81.70%.

Files Patch % Lines
src/databricks/labs/ucx/hive_metastore/mapping.py 0.00% 2 Missing and 1 partial ⚠️
src/databricks/labs/ucx/hive_metastore/tables.py 91.66% 1 Missing and 1 partial ⚠️
...atabricks/labs/ucx/hive_metastore/table_migrate.py 88.88% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #759      +/-   ##
==========================================
- Coverage   81.73%   81.70%   -0.03%     
==========================================
  Files          39       39              
  Lines        4374     4390      +16     
  Branches      810      815       +5     
==========================================
+ Hits         3575     3587      +12     
- Misses        593      596       +3     
- Partials      206      207       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@nfx nfx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also add integration test

src/databricks/labs/ucx/hive_metastore/mapping.py Outdated Show resolved Hide resolved
@@ -65,16 +66,13 @@ def _migrate_managed_table(self, src_table: Table, rule: Rule):

def _migrate_view(self, src_table: Table, rule: Rule):
target_table_key = rule.as_uc_table_key
table_migrate_sql = src_table.uc_create_sql(target_table_key)
table_migrate_sql = src_table.sql_migrate_view(target_table_key)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: this method would have been so much more natural being in TableToMigrate class, right? ;)

It has both the rule and the table . Can be in separate pr

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a nitpick at all. That makes sense. I have some other style things. I'll do it in a separate PR.

]
table_migrate = TablesMigrate(table_crawler, client, backend, table_mapping)
table_migrate.migrate_tables()

assert (list(backend.queries)) == [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to split this statement? It's more readable this way

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume you meant to break off the assert, in which case I have.

src/databricks/labs/ucx/hive_metastore/tables.py Outdated Show resolved Hide resolved
@nfx nfx merged commit db580a9 into main Jan 8, 2024
6 of 7 checks passed
@nfx nfx deleted the feature/dbfs_root_migrate_334 branch January 8, 2024 00:51
nfx added a commit that referenced this pull request Jan 12, 2024
* Added assessment step to estimate the size of DBFS root tables ([#741](#741)).
* Added `TableMapping` functionality to table migrate ([#752](#752)).
* Added `databricks labs ucx move` command to move tables and schemas between catalogs ([#756](#756)).
* Added functionality to determine migration method based on DBFS Root ([#759](#759)).
* Added `get_tables_to_migrate` functionality in the mapping module ([#755](#755)).
* Added retry and rate limit to rename workspace group operation and corrected rate limit for reflecting account groups to workspace ([#751](#751)).
* Adopted `databricks-labs-blueprint` library for common utilities to be reused in the other projects ([#758](#758)).
* Converted `RuntimeBackend` query executions exceptions to SDK exceptions ([#769](#769)).
* Fixed issue with missing users and temp groups after workspace-local groups migration and skip table when crawling table size if it does not exist anymore ([#770](#770)).
* Improved error handling by not failing group rename step if a group was removed from account before reflecting it to workspace ([#762](#762)).
* Improved error message inference from failed workflow runs ([#753](#753)).
* Moved `TablesMigrate` to a separate module ([#747](#747)).
* Reorganized assessment dashboard to increase readability ([#738](#738)).
* Updated databricks-sdk requirement from ~=0.16.0 to ~=0.17.0 ([#773](#773)).
* Verify metastore exists in current workspace ([#735](#735)).
@nfx nfx mentioned this pull request Jan 12, 2024
nfx added a commit that referenced this pull request Jan 12, 2024
* Added assessment step to estimate the size of DBFS root tables
([#741](#741)).
* Added `TableMapping` functionality to table migrate
([#752](#752)).
* Added `databricks labs ucx move` command to move tables and schemas
between catalogs
([#756](#756)).
* Added functionality to determine migration method based on DBFS Root
([#759](#759)).
* Added `get_tables_to_migrate` functionality in the mapping module
([#755](#755)).
* Added retry and rate limit to rename workspace group operation and
corrected rate limit for reflecting account groups to workspace
([#751](#751)).
* Adopted `databricks-labs-blueprint` library for common utilities to be
reused in the other projects
([#758](#758)).
* Converted `RuntimeBackend` query executions exceptions to SDK
exceptions ([#769](#769)).
* Fixed issue with missing users and temp groups after workspace-local
groups migration and skip table when crawling table size if it does not
exist anymore ([#770](#770)).
* Improved error handling by not failing group rename step if a group
was removed from account before reflecting it to workspace
([#762](#762)).
* Improved error message inference from failed workflow runs
([#753](#753)).
* Moved `TablesMigrate` to a separate module
([#747](#747)).
* Reorganized assessment dashboard to increase readability
([#738](#738)).
* Updated databricks-sdk requirement from ~=0.16.0 to ~=0.17.0
([#773](#773)).
* Verify metastore exists in current workspace
([#735](#735)).
FastLee pushed a commit that referenced this pull request Jan 19, 2024
* Added assessment step to estimate the size of DBFS root tables
([#741](#741)).
* Added `TableMapping` functionality to table migrate
([#752](#752)).
* Added `databricks labs ucx move` command to move tables and schemas
between catalogs
([#756](#756)).
* Added functionality to determine migration method based on DBFS Root
([#759](#759)).
* Added `get_tables_to_migrate` functionality in the mapping module
([#755](#755)).
* Added retry and rate limit to rename workspace group operation and
corrected rate limit for reflecting account groups to workspace
([#751](#751)).
* Adopted `databricks-labs-blueprint` library for common utilities to be
reused in the other projects
([#758](#758)).
* Converted `RuntimeBackend` query executions exceptions to SDK
exceptions ([#769](#769)).
* Fixed issue with missing users and temp groups after workspace-local
groups migration and skip table when crawling table size if it does not
exist anymore ([#770](#770)).
* Improved error handling by not failing group rename step if a group
was removed from account before reflecting it to workspace
([#762](#762)).
* Improved error message inference from failed workflow runs
([#753](#753)).
* Moved `TablesMigrate` to a separate module
([#747](#747)).
* Reorganized assessment dashboard to increase readability
([#738](#738)).
* Updated databricks-sdk requirement from ~=0.16.0 to ~=0.17.0
([#773](#773)).
* Verify metastore exists in current workspace
([#735](#735)).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants