`crawl_permissions` takes too long to run on a large workspace
#380
Comments
we cannot defy the laws of physics with networking...
Make it optional!
This would make migration of local groups impossible
@nfx The API runs at 4-50 requests per second and is pretty slow on a good day (>1 second per call). I just noticed that we default parallelism to 8, but the number of vCPUs per driver/worker is 2, so we're oversubscribed.
@dmoore247 The Python Global Interpreter Lock doesn't have any effect on IO, so CPU oversubscription concerns are not applicable here.
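To illustrate the point about IO-bound work, here is a minimal sketch rather than the actual ucx code. `fake_permissions_call` is a hypothetical stand-in that sleeps instead of calling the real API, and the latency and thread counts are made-up numbers. Because a sleeping (or socket-waiting) thread releases the GIL, more threads than vCPUs can still shorten wall-clock time.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_permissions_call(object_id: int, latency: float = 0.05) -> tuple[int, str]:
    """Hypothetical stand-in for one blocking permissions API request."""
    time.sleep(latency)  # sleeping releases the GIL, like waiting on a socket
    return object_id, "ok"


def crawl(object_ids, num_threads: int):
    """Fetch all objects with a thread pool; return (results, elapsed seconds)."""
    started = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # pool.map preserves input order in its results
        results = list(pool.map(fake_permissions_call, object_ids))
    return results, time.perf_counter() - started


if __name__ == "__main__":
    for n in (1, 2, 8):
        _, elapsed = crawl(range(16), n)
        print(f"{n} threads: {elapsed:.2f}s")
```

In a real workspace the gain is presumably capped by the API itself: at the 4-50 requests per second mentioned above, adding threads beyond what the server will accept would not help.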
## Changes
- Add experimental support for group permission migrations using the new API. The new workflow is called `migrate-groups-experimental`; it uses the new API for all permissions migration except Legacy Table ACL, which still leverages the current approach.
- This API does not require specifying resources to migrate, which simplifies the codebase.
- Extend integration tests for both the existing and new code paths.
- Add initial performance testing code. Initial results suggest that the new API has much better scaling behaviour.

### Linked issues
Resolves #380

### Functionality
- [x] added new workflow: `migrate-groups-experimental`

### Tests
- [x] manually tested
- [x] added unit tests
- [x] added integration tests
- [x] verified on staging environment (screenshot attached)
In testing the assessment job on a large workspace (3000+ users), `crawl_permissions` takes over 8 hours to run. This job has not run to completion for this workspace, starting with ucx 0.2.0 and also on ucx 0.3.0.
Suggest...
a. make it run faster (more threads)
b. do less work
c. make it save checkpoints and restart from where it last failed.
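Suggestion (c) could look roughly like the sketch below. It is only an illustration under assumptions: `crawl_with_checkpoint`, the `fetch` callback, and the JSON checkpoint file are hypothetical names and choices, not part of ucx. Each finished object is written to disk, so a rerun after a failure skips what was already crawled.

```python
import json
from pathlib import Path


def crawl_with_checkpoint(object_ids, fetch, checkpoint_path):
    """Call fetch(object_id) for each ID, persisting results so a rerun can resume.

    `fetch` is a hypothetical callable returning JSON-serializable data.
    """
    path = Path(checkpoint_path)
    # Load results saved by a previous (possibly failed) run, if any.
    done = json.loads(path.read_text()) if path.exists() else {}
    for object_id in object_ids:
        key = str(object_id)  # JSON object keys are always strings
        if key in done:
            continue  # already crawled in an earlier run
        done[key] = fetch(object_id)
        # Save after every item; simple, but a real job would likely batch writes.
        path.write_text(json.dumps(done))
    return done
```

If `fetch` raises partway through, everything completed before the failing item is already in the checkpoint file, and calling the function again with the same path continues from there.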