Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mount Point crawler lists /Volume with four variations which is confusing #779

Merged
merged 5 commits into from
Jan 14, 2024

Conversation

dipankarkush-db
Copy link
Contributor

@dipankarkush-db dipankarkush-db commented Jan 12, 2024

This pull request includes changes to the locations.py file in the databricks/labs/ucx/hive_metastore directory and the addition of a new unit test in test_locations.py. The changes in locations.py involve modifying the _deduplicate_mounts method in the Mounts class. The method now checks if the name of the Mount object contains the word "volume", and if so, it standardizes the name to "/Volume". This is done to ensure that mounts with different casing for the word "volume" are treated as the same mount and not duplicated in the deduplicated list.

The new unit test in test_locations.py verifies that the _deduplicate_mounts method correctly deduplicates a list of Mount objects with different casing for the word "volume" in their names. The test creates a list of Mount objects with different casing for the word "volume" and passes it to the _deduplicate_mounts method. The test then asserts that the deduplicated list contains only one Mount object with the name "/Volume". This test ensures that the changes to the _deduplicate_mounts method work as intended and prevent duplication of mounts with different casing for the word "volume" in their names.

In summary, the changes in this pull request modify the _deduplicate_mounts method to standardize the name of mounts containing the word "volume" and add a unit test to ensure that the method correctly deduplicates a list of Mount objects with different casing for the word "volume" in their names.

Copy link

codecov bot commented Jan 12, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (3f2bf08) 82.87% compared to head (dcc4dd7) 82.88%.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #779   +/-   ##
=======================================
  Coverage   82.87%   82.88%           
=======================================
  Files          39       39           
  Lines        4567     4569    +2     
  Branches      849      850    +1     
=======================================
+ Hits         3785     3787    +2     
  Misses        580      580           
  Partials      202      202           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@dipankarkush-db dipankarkush-db changed the title Fixes #622 Mount Point crawler lists /Volume with four variations which is confusing Jan 12, 2024
Mount(name="/Volume", source="DbfsReserved"),
Mount(name="/Volumes", source="DbfsReserved"),
Mount(name="/volume", source="DbfsReserved"),
Mount(name="/volumes", source="DbfsReserved"),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add normal mount and assert it appears

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

def test_list_mounts_should_return_a_deduped_list_of_mount_without_variable_volume_names():
mounts = [
Mount(name="/Volume", source="DbfsReserved"),
Mount(name="/Volumes", source="DbfsReserved"),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we just filter out dbfs reserved mounts instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

checked for DbfsReserved.

@nfx nfx merged commit 6984873 into main Jan 14, 2024
7 checks passed
@nfx nfx deleted the fix/mount_point_volume branch January 14, 2024 10:39
nfx added a commit that referenced this pull request Jan 19, 2024
* Added `databricks labs ucx validate-groups-membership` command to validate groups to see if they have same membership across acount and workspace level ([#772](#772)).
* Added baseline for getting Azure Resource Role Assignments ([#764](#764)).
* Added issue and pull request templates ([#791](#791)).
* Added linked issues to PR template ([#793](#793)).
* Added optional `debug_truncate_bytes` parameter to the config and extend the default log truncation limit ([#782](#782)).
* Added support for crawling grants and applying Hive Metastore UDF ACLs ([#812](#812)).
* Changed Python requirement from 3.10.6 to 3.10 ([#805](#805)).
* Extend error handling of delta issues in crawlers and hive metastore ([#795](#795)).
* Fixed `databricks labs ucx repair-run` command to execute correctly ([#801](#801)).
* Fixed handling of `DELTASHARING` table format ([#802](#802)).
* Fixed listing of workflows via CLI ([#811](#811)).
* Fixed logger import path for DEBUG notebook ([#792](#792)).
* Fixed move table command to delete table/view regardless if permissions are present, skipping corrupted tables when crawling table size and making existing tests more stable ([#777](#777)).
* Fixed the issue of `databricks labs ucx installations` and `databricks labs ucx manual-workspace-info` ([#814](#814)).
* Increase the unit test coverage for cli.py ([#800](#800)).
* Mount Point crawler lists /Volume with four variations which is confusing ([#779](#779)).
* Updated README.md to remove mention of deprecated install.sh ([#781](#781)).
* Updated `bug` issue template ([#797](#797)).
* Fixed writing log readme in multiprocess safe way ([#794](#794)).
@nfx nfx mentioned this pull request Jan 19, 2024
nfx added a commit that referenced this pull request Jan 19, 2024
* Added `databricks labs ucx validate-groups-membership` command to
validate groups to see if they have same membership across acount and
workspace level
([#772](#772)).
* Added baseline for getting Azure Resource Role Assignments
([#764](#764)).
* Added issue and pull request templates
([#791](#791)).
* Added linked issues to PR template
([#793](#793)).
* Added optional `debug_truncate_bytes` parameter to the config and
extend the default log truncation limit
([#782](#782)).
* Added support for crawling grants and applying Hive Metastore UDF ACLs
([#812](#812)).
* Changed Python requirement from 3.10.6 to 3.10
([#805](#805)).
* Extend error handling of delta issues in crawlers and hive metastore
([#795](#795)).
* Fixed `databricks labs ucx repair-run` command to execute correctly
([#801](#801)).
* Fixed handling of `DELTASHARING` table format
([#802](#802)).
* Fixed listing of workflows via CLI
([#811](#811)).
* Fixed logger import path for DEBUG notebook
([#792](#792)).
* Fixed move table command to delete table/view regardless if
permissions are present, skipping corrupted tables when crawling table
size and making existing tests more stable
([#777](#777)).
* Fixed the issue of `databricks labs ucx installations` and `databricks
labs ucx manual-workspace-info`
([#814](#814)).
* Increase the unit test coverage for cli.py
([#800](#800)).
* Mount Point crawler lists /Volume with four variations which is
confusing ([#779](#779)).
* Updated README.md to remove mention of deprecated install.sh
([#781](#781)).
* Updated `bug` issue template
([#797](#797)).
* Fixed writing log readme in multiprocess safe way
([#794](#794)).
dmoore247 pushed a commit that referenced this pull request Mar 23, 2024
* Added `databricks labs ucx validate-groups-membership` command to
validate groups to see if they have same membership across acount and
workspace level
([#772](#772)).
* Added baseline for getting Azure Resource Role Assignments
([#764](#764)).
* Added issue and pull request templates
([#791](#791)).
* Added linked issues to PR template
([#793](#793)).
* Added optional `debug_truncate_bytes` parameter to the config and
extend the default log truncation limit
([#782](#782)).
* Added support for crawling grants and applying Hive Metastore UDF ACLs
([#812](#812)).
* Changed Python requirement from 3.10.6 to 3.10
([#805](#805)).
* Extend error handling of delta issues in crawlers and hive metastore
([#795](#795)).
* Fixed `databricks labs ucx repair-run` command to execute correctly
([#801](#801)).
* Fixed handling of `DELTASHARING` table format
([#802](#802)).
* Fixed listing of workflows via CLI
([#811](#811)).
* Fixed logger import path for DEBUG notebook
([#792](#792)).
* Fixed move table command to delete table/view regardless if
permissions are present, skipping corrupted tables when crawling table
size and making existing tests more stable
([#777](#777)).
* Fixed the issue of `databricks labs ucx installations` and `databricks
labs ucx manual-workspace-info`
([#814](#814)).
* Increase the unit test coverage for cli.py
([#800](#800)).
* Mount Point crawler lists /Volume with four variations which is
confusing ([#779](#779)).
* Updated README.md to remove mention of deprecated install.sh
([#781](#781)).
* Updated `bug` issue template
([#797](#797)).
* Fixed writing log readme in multiprocess safe way
([#794](#794)).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants