Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

VM: create LRO polling mechanism improvement #1346

Closed
yugangw-msft opened this issue Nov 16, 2016 · 9 comments
Closed

VM: create LRO polling mechanism improvement #1346

yugangw-msft opened this issue Nov 16, 2016 · 9 comments
Assignees
Labels
Compute az vm/vmss/image/disk/snapshot
Milestone

Comments

@yugangw-msft
Copy link
Contributor

For now, we pull every 5 seconds.
We need to have this aggressive polling only for the first 400 seconds. After that we can have an incremental back-off. Having a flat polling every 5 seconds will have high risk of running into the ARM throttling limits.

@yugangw-msft yugangw-msft added this to the Sprint 7 milestone Nov 16, 2016
@yugangw-msft yugangw-msft self-assigned this Nov 16, 2016
@tjprescott
Copy link
Member

The whole point of the 5 second polling interval was so that truly long running LROs (like VM create) didn't incur a penalty of roughly 1/2 the polling delay. If we end up using a short delay for short LROs and a long delay for long LROs, it seems like we have the same perf hit for the long LROs while just adding more requests for the short ones for no discernible benefit (don't know of anyone complaining about the relatively quick LROs).

@yugangw-msft
Copy link
Contributor Author

Agreed with your point.
Since it is for VM, and most time, I saw the VM gets created within 3 minutes(180 seconds). So for now, I think this is for edge case when the service was slowed down for reasons.

@yugangw-msft
Copy link
Contributor Author

Also, this bug was opened per folks doing perf evaluation from other teams

@tjprescott
Copy link
Member

If we are going to end up with long polling intervals for long LROs, then why not just go back to the default of 30 sec?

@yugangw-msft
Copy link
Contributor Author

agree

@mayurid mayurid added the Compute az vm/vmss/image/disk/snapshot label Nov 16, 2016
@yugangw-msft
Copy link
Contributor Author

I suggest we live with it, for 2 reasons

  1. 400 seconds is 6 minutes and 20 seconds. The VM creation currently takes about 2~4 minutes, so the new logic will not make an impact mostly.
    2 . Python AutoRestAzure LRO polling doesn't appear to have the extensibility to dynamically update the polling interval, unless we hack it. //cc: @annatisch @lmazuel

If #1 becomes a problem say it starts to take more than 400 seconds, we can consider to updating the AutoRestAzure

@lmazuel
Copy link
Member

lmazuel commented Dec 7, 2016

Are you saying we should ignore the RetryAfter Header of the ARM spec? Or provide a callback or something like this to open to customisation easily?

@yugangw-msft
Copy link
Contributor Author

yugangw-msft commented Dec 7, 2016

Are you saying we should ignore the RetryAfter Header of the ARM spec?

@lmazuel, no, we should always respect 'RetryAfter'. For 'vm create' the service doesn't respond with RetryAfter, so autorest_azure ends up with a sleep

or provide a callback or something like this to open to customisation easily?

If needed, but like I mentioned early on, I suggest not to do it right now till we have user data

@tjprescott tjprescott modified the milestones: Backlog, Sprint 7 Dec 15, 2016
@yugangw-msft
Copy link
Contributor Author

Closing per reasons I provided. We can reactivate it when we have new data

@haroldrandom haroldrandom added the Compute az vm/vmss/image/disk/snapshot label Oct 25, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Compute az vm/vmss/image/disk/snapshot
Projects
None yet
Development

No branches or pull requests

5 participants