Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor Azure service principal crawler and fix bug where tenant_id inside secret scope is not detected #942

Merged
merged 17 commits into from
Feb 15, 2024

Conversation

nkvuong
Copy link
Contributor

@nkvuong nkvuong commented Feb 13, 2024

Changes

  • Fix tenant_id detection logic In the case where spn endpoint configuration is also inside a secret scope.
  • Refactoring spn crawler logic to make it more readable

Linked issues

Closes #896

Tests

  • manually tested
  • added unit tests
  • added integration tests
  • verified on staging environment (screenshot attached)

Copy link

codecov bot commented Feb 13, 2024

Codecov Report

Attention: 12 lines in your changes are missing coverage. Please review.

Comparison is base (fd0b604) 87.82% compared to head (6f604b0) 87.88%.

Files Patch % Lines
src/databricks/labs/ucx/assessment/azure.py 88.77% 6 Missing and 5 partials ⚠️
src/databricks/labs/ucx/assessment/clusters.py 85.71% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #942      +/-   ##
==========================================
+ Coverage   87.82%   87.88%   +0.06%     
==========================================
  Files          43       43              
  Lines        5230     5185      -45     
  Branches      945      929      -16     
==========================================
- Hits         4593     4557      -36     
+ Misses        434      432       -2     
+ Partials      203      196       -7     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

}
)
return spn_list
return list(generate_service_principals(deduped_service_principals))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you make a set of AzureServicePrincipalInfo and return those directly instead of working with dictionaries - you don't need a dedupe block - dataclasses already have __hash__.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point :)

storage_account: str | None


def generate_service_principals(service_principals: list[dict]):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't need this top-level function ;)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also cleaned up some nonsensical tests

@@ -23,161 +23,113 @@
@dataclass
class AzureServicePrincipalInfo:
# fs.azure.account.oauth2.client.id
application_id: str
application_id: str | None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't it broken if all fields are none? we need at least application_id

tenant_id = client_endpoint_list[3]

# add output to spn list
spn_list.append(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i meant creating the AzureServicePrincipalInfo from here directly, without using intermediate dictionary structure. can you change this place and the other couple of places where we construct this dict?

assert secret.value is not None
return base64.b64decode(secret.value).decode("utf-8")
except NotFound:
logger.warning(f'removed on the backend: {secret_scope}"/"{secret_key}')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
logger.warning(f'removed on the backend: {secret_scope}"/"{secret_key}')
logger.warning(f'removed on the backend: {secret_scope}/{secret_key}')

Copy link
Contributor

@nfx nfx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lgtm

@nkvuong nkvuong changed the title Azure service principal crawler bug fix & refactoring Refactor Azure service principal crawler and fix bug where tenant_id inside secret scope is not detected Feb 15, 2024
@nkvuong nkvuong changed the title Refactor Azure service principal crawler and fix bug where tenant_id inside secret scope is not detected Refactor Azure service principal crawler and fix bug where tenant_id inside secret scope is not detected Feb 15, 2024
@nfx nfx merged commit 2a6b090 into main Feb 15, 2024
6 of 7 checks passed
@nfx nfx deleted the fix/spn-crawler branch February 15, 2024 17:26
nfx added a commit that referenced this pull request Feb 16, 2024
* Added CLI Command `databricks labs ucx principal-prefix-access` ([#949](#949)).
* Added a widget with all jobs to track migration progress ([#940](#940)).
* Added legacy cluster types to the assessment result ([#932](#932)).
* Cleanup of install documentation ([#951](#951), [#947](#947)).
* Fixed `WorkspaceConfig` initialization for `DEBUG` notebook ([#934](#934)).
* Fixed installer not opening config file during the installation ([#945](#945)).
* Fixed groups in config file not considered for group migration job ([#943](#943)).
* Fixed bug where `tenant_id` inside secret scope is not detected ([#942](#942)).
@nfx nfx mentioned this pull request Feb 16, 2024
nfx added a commit that referenced this pull request Feb 16, 2024
* Added CLI Command `databricks labs ucx principal-prefix-access`
([#949](#949)).
* Added a widget with all jobs to track migration progress
([#940](#940)).
* Added legacy cluster types to the assessment result
([#932](#932)).
* Cleanup of install documentation
([#951](#951),
[#947](#947)).
* Fixed `WorkspaceConfig` initialization for `DEBUG` notebook
([#934](#934)).
* Fixed installer not opening config file during the installation
([#945](#945)).
* Fixed groups in config file not considered for group migration job
([#943](#943)).
* Fixed bug where `tenant_id` inside secret scope is not detected
([#942](#942)).
dmoore247 pushed a commit that referenced this pull request Mar 23, 2024
…` inside secret scope is not detected (#942)

## Changes
- Fix tenant_id detection logic In the case where spn endpoint
configuration is also inside a secret scope.
- Refactoring spn crawler logic to make it more readable

### Linked issues
Closes #896

### Tests
<!-- How is this tested? Please see the checklist below and also
describe any other relevant tests -->

- [x] manually tested
- [ ] added unit tests
- [ ] added integration tests
- [ ] verified on staging environment (screenshot attached)
dmoore247 pushed a commit that referenced this pull request Mar 23, 2024
* Added CLI Command `databricks labs ucx principal-prefix-access`
([#949](#949)).
* Added a widget with all jobs to track migration progress
([#940](#940)).
* Added legacy cluster types to the assessment result
([#932](#932)).
* Cleanup of install documentation
([#951](#951),
[#947](#947)).
* Fixed `WorkspaceConfig` initialization for `DEBUG` notebook
([#934](#934)).
* Fixed installer not opening config file during the installation
([#945](#945)).
* Fixed groups in config file not considered for group migration job
([#943](#943)).
* Fixed bug where `tenant_id` inside secret scope is not detected
([#942](#942)).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG]: SPN crawling logic does not work correctly for cluster
2 participants