Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cleaned up Job assessment and Cluster assessment to improve testing and reduce redundancy. #825

Merged
merged 5 commits into from
Jan 24, 2024

Conversation

FastLee
Copy link
Contributor

@FastLee FastLee commented Jan 22, 2024

Changes

Linked issues

closes #818
Relates to #823

Resolves #..

Functionality

  • added relevant user documentation
  • added new CLI command
  • modified existing command: databricks labs ucx ...
  • added a new workflow
  • modified existing workflow: ...
  • added a new table
  • modified existing table: ...

Tests

  • manually tested
  • added unit tests
  • added integration tests
  • verified on staging environment (screenshot attached)

Copy link

codecov bot commented Jan 22, 2024

Codecov Report

Attention: 2 lines in your changes are missing coverage. Please review.

Comparison is base (e36db5f) 85.39% compared to head (6d12d1a) 85.61%.
Report is 2 commits behind head on main.

Files Patch % Lines
src/databricks/labs/ucx/assessment/crawlers.py 33.33% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #825      +/-   ##
==========================================
+ Coverage   85.39%   85.61%   +0.21%     
==========================================
  Files          40       41       +1     
  Lines        5031     5212     +181     
  Branches      921      950      +29     
==========================================
+ Hits         4296     4462     +166     
- Misses        523      536      +13     
- Partials      212      214       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Collaborator

@nfx nfx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see #823

tests/unit/assessment/test_clusters.py Outdated Show resolved Hide resolved
assert len(result_set) == 1
assert result_set[0].success == 0
match = re.findall(fail_regex, result_set[0].failures)
assert len(match) == 2
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parse JSON and assert on concrete failures, don't rely tests on regexes. And mark this PR ready for review.

Added tests.
@FastLee FastLee changed the title Fix Multiple Failures in Cluster Assessment Cleaned up Job assessment and Cluster assessment to improve testing and reduce redundancy. Jan 24, 2024
@FastLee FastLee requested a review from nfx January 24, 2024 03:49
Copy link
Collaborator

@nfx nfx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Beautiful

def test_cluster_assessment():
ws = workspace_client_mock(clusters="assortment-conf.json")
crawler = ClustersCrawler(ws, MockBackend(), "ucx")
result_set = list(crawler.snapshot())

assert len(result_set) == 4
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you do assertions on failure messages received? That way we'll be confident in tests doing what they should

assert len(result_set) == 1
assert result_set[0].success == 0
failures = json.loads(result_set[0].failures)
assert 'unsupported config: spark.databricks.passthrough.enabled' in failures
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest to create some helper function for this, as we'll be doing it a lot

@nfx nfx marked this pull request as ready for review January 24, 2024 08:44
@nfx nfx requested review from a team and HariGS-DB January 24, 2024 08:44
@nfx nfx merged commit c03596e into main Jan 24, 2024
6 of 7 checks passed
@nfx nfx deleted the fix/clusters_multiple_issues_818 branch January 24, 2024 08:45
nfx pushed a commit that referenced this pull request Jan 25, 2024
Fixed and issue introduced with PR #825
qziyuan added a commit that referenced this pull request Jan 25, 2024
- fix conflicts in assessment/clusters.py and assessment/jobs.py from the PR #825 and PR #838
- move _check_cluster_failures logic into assessment/crawlers.py and let jobs and clusters call this function
nfx added a commit that referenced this pull request Jan 26, 2024
* Added `databricks labs ucx alias` command to create a view of tables from one schema/catalog in another schema/catalog ([#837](#837)).
* Added `databricks labs ucx save-aws-iam-profiles` command to scan instance profiles identify AWS S3 access and save a CSV with permissions ([#817](#817)).
* Added total view counts in the assessment dashboard ([#834](#834)).
* Cleaned up `assess_jobs` and `assess_clusters` tasks in the `assessment` workflow to improve testing and reduce redundancy.([#825](#825)).
* Added documentation for the assessment report ([#806](#806)).
* Fixed escaping for SQL object names ([#836](#836)).

Dependency updates:

 * Updated databricks-sdk requirement from ~=0.17.0 to ~=0.18.0 ([#832](#832)).
@nfx nfx mentioned this pull request Jan 26, 2024
nfx added a commit that referenced this pull request Jan 26, 2024
* Added `databricks labs ucx alias` command to create a view of tables
from one schema/catalog in another schema/catalog
([#837](#837)).
* Added `databricks labs ucx save-aws-iam-profiles` command to scan
instance profiles identify AWS S3 access and save a CSV with permissions
([#817](#817)).
* Added total view counts in the assessment dashboard
([#834](#834)).
* Cleaned up `assess_jobs` and `assess_clusters` tasks in the
`assessment` workflow to improve testing and reduce
redundancy.([#825](#825)).
* Added documentation for the assessment report
([#806](#806)).
* Fixed escaping for SQL object names
([#836](#836)).

Dependency updates:

* Updated databricks-sdk requirement from ~=0.17.0 to ~=0.18.0
([#832](#832)).
dmoore247 pushed a commit that referenced this pull request Mar 23, 2024
dmoore247 pushed a commit that referenced this pull request Mar 23, 2024
Fixed and issue introduced with PR #825
dmoore247 pushed a commit that referenced this pull request Mar 23, 2024
* Added `databricks labs ucx alias` command to create a view of tables
from one schema/catalog in another schema/catalog
([#837](#837)).
* Added `databricks labs ucx save-aws-iam-profiles` command to scan
instance profiles identify AWS S3 access and save a CSV with permissions
([#817](#817)).
* Added total view counts in the assessment dashboard
([#834](#834)).
* Cleaned up `assess_jobs` and `assess_clusters` tasks in the
`assessment` workflow to improve testing and reduce
redundancy.([#825](#825)).
* Added documentation for the assessment report
([#806](#806)).
* Fixed escaping for SQL object names
([#836](#836)).

Dependency updates:

* Updated databricks-sdk requirement from ~=0.17.0 to ~=0.18.0
([#832](#832)).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG]: Only one issue per cluster was identified during assessment (v0.8.0)
2 participants