Failures to restore from Azure artifacts feeds due to throttling #4190
Took me a while to track down the exact configuration for NuGet.config (it's not documented yet). Updated the issue description with it.
According to AzDO telemetry, a large number of our AzDO build pool machines are being detected as the same machine. That puts them all in the same throttling bucket, which makes things much worse for our BYOC pools than for the hosted pools.
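To make the shared-bucket effect concrete, here is a toy simulation. It assumes a simple token-bucket limiter keyed by machine identity; AzDO's actual throttling algorithm, limits, and keys aren't described in this thread, so `TokenBucket`, `count_throttled`, and all the numbers are hypothetical.

```python
class TokenBucket:
    """Hypothetical limiter: allows bursts up to `capacity`, refills `refill_per_sec` tokens per second."""

    def __init__(self, capacity: int, refill_per_sec: int) -> None:
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = capacity

    def refill(self) -> None:
        self.tokens = min(self.capacity, self.tokens + self.refill_per_sec)

    def try_take(self) -> bool:
        # One token per request; an empty bucket means the request is throttled (HTTP 429).
        if self.tokens > 0:
            self.tokens -= 1
            return True
        return False


def count_throttled(machine_ids: list[str], requests_per_sec: int, seconds: int,
                    capacity: int, refill_per_sec: int) -> int:
    """Count rejected requests when the limiter keys its buckets by the reported machine ID."""
    buckets: dict[str, TokenBucket] = {}
    throttled = 0
    for _ in range(seconds):
        # Each simulated second, existing buckets refill first, then every machine sends its requests.
        for bucket in buckets.values():
            bucket.refill()
        for machine_id in machine_ids:
            bucket = buckets.get(machine_id)
            if bucket is None:
                bucket = buckets[machine_id] = TokenBucket(capacity, refill_per_sec)
            for _ in range(requests_per_sec):
                if not bucket.try_take():
                    throttled += 1
    return throttled


# Ten build agents with identical demand: correctly distinguished vs. all detected as one machine.
distinct = [f"agent-{i}" for i in range(10)]
shared = ["same-machine"] * 10
print(count_throttled(distinct, 5, 10, capacity=10, refill_per_sec=10))  # 0
print(count_throttled(shared, 5, 10, capacity=10, refill_per_sec=10))    # 400
```

The demand is the same in both runs (10 agents × 5 requests/s × 10 s = 500 requests); only the identity changes. When the agents collapse into one bucket, 400 of the 500 requests are rejected, which is the kind of gap the BYOC pools would see relative to hosted agents.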
As a short-term solution, the concurrency limits for the dnceng instance have been increased, and that seems to have helped somewhat. The Azure DevOps team has found that we are now hitting some other throttling limits and is investigating. The AzDO team is also considering long-term solutions, such as filling the gaps that currently block us from adopting upstream feeds (which would reduce the number of feeds we need to list in our repos' NuGet.config files) and working with the NuGet team on improvements.
@riarenas - Is this still a thing?
Yes. We have had some quota increases to help with this in the short term, but we haven't heard back about a long-term solution.
We've received additional reports of 429s during restore operations in attempt 1 of these builds: https://dev.azure.com/dnceng/internal/_build/results?buildId=587047&_a=summary
I reached out in the thread we had with the Azure Artifacts group about throttling. CC @wtgodbe
The Azure Artifacts team said the problems on 4/3 were due to dnceng using 60% of the traffic for their scale units while those units were completely scaled down. Additionally, yesterday we saw IP throttling come back in many more cases. Using runfo, I was able to find these in runtime, and we have additional reports from Roslyn, where the error surfaced as a timeout instead of a build failure.
I asked the Artifacts team for increased quota, as we're ramping up usage of the feeds.
@riarenas thanks for looking into this. Do we have an ETA? Would it make sense to bring the dotnet blob feed back as a restore source in the meantime?
No ETA. I'll create PRs to re-add dotnet-core as a backup if we don't hear from them soon.
Thanks @riarenas
AzDO folks have increased our limits. I'll keep this in FR for a bit to see whether this gives us some relief, and move it back to general tracking afterwards, since our feed usage is only going to increase in the near term (we haven't moved ASPNet or Installer to rely entirely on AzDO feeds yet).
The new limits seem to have stuck. I haven't seen any more throttling during restore since the limits were raised, so I'll remove this from FR. The AzDO team said they are evaluating more sustainable options to handle our load. I'll keep this open, because I think that if we onboarded another big repo to use only AzDO feeds, we'd start seeing this again.
We have reached out to the Azure Artifacts team again for options: https://github.com/dotnet/core-eng/issues/9681
OK, AzDO is saying they've fixed it (for real) now.
After the recent AzDO changes, it doesn't look like we'll be easily throttled again, so I don't think there's much value in keeping this long-standing issue open. We can open new issues for any sporadic throttling we see.
We have started seeing throttling errors when restoring NuGet packages from Azure Artifacts. They manifest like this:
This does not appear to be causing widespread failures so far, but as we increase our reliance on these feeds, we're starting to get more reports.
Example builds where this has been seen:
https://dev.azure.com/dnceng/public/_build/results?buildId=398862
https://dnceng.visualstudio.com/internal/_build/results?buildId=384184
AzDO puts all unauthenticated NuGet requests into the same throttling bucket. Because NuGet looks up each package on every configured feed, the problem gets worse the more feeds your NuGet.config lists and the more packages you need to restore.
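As a back-of-the-envelope sketch of that fan-out, assume a restore issues one lookup per distinct package ID per configured feed, with no HTTP cache hits. NuGet's real request pattern varies with protocol version and caching, so `feed_lookups`, the package IDs, and the feed names below are hypothetical.

```python
from itertools import product


def feed_lookups(package_ids: list[str], feeds: list[str]) -> list[tuple[str, str]]:
    """Return one (feed, package_id) pair per lookup, assuming every feed is queried for every package."""
    unique_ids = sorted(set(package_ids))  # duplicate references to a package are looked up once
    return list(product(feeds, unique_ids))


# Hypothetical restore of 150 distinct packages against 2 vs. 8 configured feeds.
packages = [f"Package.{i}" for i in range(150)]
print(len(feed_lookups(packages, ["feed-a", "feed-b"])))                 # 300
print(len(feed_lookups(packages, [f"feed-{c}" for c in "abcdefgh"])))    # 1200
```

Under this model the request count is packages × feeds, so for the same set of packages, going from 2 to 8 feeds quadruples the load on the shared unauthenticated bucket.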
The short-term suggestions from the AzDO team are:
We are still in talks with the AzDO team, because these workarounds would require a lot of changes to our infrastructure.