Mount Point crawler lists /Volume with four variations which is confusing #779
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@           Coverage Diff           @@
##             main     #779   +/-   ##
=======================================
  Coverage   82.87%   82.88%
=======================================
  Files          39       39
  Lines        4567     4569    +2
  Branches      849      850    +1
=======================================
+ Hits         3785     3787    +2
  Misses        580      580
  Partials      202      202
Mount(name="/Volume", source="DbfsReserved"),
Mount(name="/Volumes", source="DbfsReserved"),
Mount(name="/volume", source="DbfsReserved"),
Mount(name="/volumes", source="DbfsReserved"),
Add normal mount and assert it appears
Done!
def test_list_mounts_should_return_a_deduped_list_of_mount_without_variable_volume_names():
    mounts = [
        Mount(name="/Volume", source="DbfsReserved"),
        Mount(name="/Volumes", source="DbfsReserved"),
Can we just filter out dbfs reserved mounts instead?
checked for DbfsReserved.
* Added `databricks labs ucx validate-groups-membership` command to validate groups to see if they have same membership across account and workspace level ([#772](#772)).
* Added baseline for getting Azure Resource Role Assignments ([#764](#764)).
* Added issue and pull request templates ([#791](#791)).
* Added linked issues to PR template ([#793](#793)).
* Added optional `debug_truncate_bytes` parameter to the config and extend the default log truncation limit ([#782](#782)).
* Added support for crawling grants and applying Hive Metastore UDF ACLs ([#812](#812)).
* Changed Python requirement from 3.10.6 to 3.10 ([#805](#805)).
* Extend error handling of delta issues in crawlers and hive metastore ([#795](#795)).
* Fixed `databricks labs ucx repair-run` command to execute correctly ([#801](#801)).
* Fixed handling of `DELTASHARING` table format ([#802](#802)).
* Fixed listing of workflows via CLI ([#811](#811)).
* Fixed logger import path for DEBUG notebook ([#792](#792)).
* Fixed move table command to delete table/view regardless if permissions are present, skipping corrupted tables when crawling table size and making existing tests more stable ([#777](#777)).
* Fixed the issue of `databricks labs ucx installations` and `databricks labs ucx manual-workspace-info` ([#814](#814)).
* Increase the unit test coverage for cli.py ([#800](#800)).
* Mount Point crawler lists /Volume with four variations which is confusing ([#779](#779)).
* Updated README.md to remove mention of deprecated install.sh ([#781](#781)).
* Updated `bug` issue template ([#797](#797)).
* Fixed writing log readme in multiprocess safe way ([#794](#794)).
This pull request includes changes to the `locations.py` file in the `databricks/labs/ucx/hive_metastore` directory and the addition of a new unit test in `test_locations.py`. The changes in `locations.py` involve modifying the `_deduplicate_mounts` method in the `Mounts` class. The method now checks if the name of the `Mount` object contains the word "volume", and if so, it standardizes the name to "/Volume". This is done to ensure that mounts with different casing for the word "volume" are treated as the same mount and not duplicated in the deduplicated list.

The new unit test in `test_locations.py` verifies that the `_deduplicate_mounts` method correctly deduplicates a list of `Mount` objects with different casing for the word "volume" in their names. The test creates a list of `Mount` objects with different casing for the word "volume" and passes it to the `_deduplicate_mounts` method. The test then asserts that the deduplicated list contains only one `Mount` object with the name "/Volume". This test ensures that the changes to the `_deduplicate_mounts` method work as intended and prevent duplication of mounts with different casing for the word "volume" in their names.

In summary, the changes in this pull request modify the `_deduplicate_mounts` method to standardize the name of mounts containing the word "volume" and add a unit test to ensure that the method correctly deduplicates a list of `Mount` objects with different casing for the word "volume" in their names.
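For illustration, the behaviour described above can be sketched roughly as follows. This is not the code from `locations.py`: the `Mount` dataclass is a stand-in with only the two fields visible in the diff, `deduplicate_mounts` is a hypothetical free function rather than the `Mounts._deduplicate_mounts` method, and the `/mnt/data` mount with its source is an invented example. The sketch assumes that the `DbfsReserved` check mentioned in the review thread is combined with the name normalization from the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mount:
    # Stand-in for the project's Mount record (assumption: only these two fields).
    name: str
    source: str


def deduplicate_mounts(mounts: list[Mount]) -> list[Mount]:
    """Collapse the reserved /Volume variants into one entry, keeping input order."""
    seen: set[Mount] = set()
    deduplicated: list[Mount] = []
    for mount in mounts:
        # Only DBFS-reserved volume mounts are normalized (assumed rule).
        if mount.source == "DbfsReserved" and mount.name.lower() in {"/volume", "/volumes"}:
            mount = Mount(name="/Volume", source=mount.source)
        if mount not in seen:
            seen.add(mount)
            deduplicated.append(mount)
    return deduplicated


mounts = [
    Mount(name="/Volume", source="DbfsReserved"),
    Mount(name="/Volumes", source="DbfsReserved"),
    Mount(name="/volume", source="DbfsReserved"),
    Mount(name="/volumes", source="DbfsReserved"),
    Mount(name="/mnt/data", source="s3a://example-bucket/data"),  # hypothetical normal mount
]
result = deduplicate_mounts(mounts)
```

With this input, the four reserved variants collapse into a single `/Volume` entry, and the ordinary mount is kept, which mirrors the reviewer's request to add a normal mount and assert that it still appears.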