Failures to restore from Azure artifacts feeds due to throttling #4190

Closed
riarenas opened this issue Oct 23, 2019 · 19 comments

@riarenas
Member

riarenas commented Oct 23, 2019

We have started seeing some throttling errors when attempting to restore NuGet packages from Azure Artifacts that manifest like this:

/root/coresetup/.dotnet/sdk/3.0.100/NuGet.targets(123,5): error : Failed to download package 'transport.runtime.linux-musl-x64.Microsoft.NETCore.Jit.3.1.0-preview1.19504.3' from 'https://pkgs.dev.azure.com/dnceng/9ee6d478-d288-47f7-aacc-f6e6d082ae6d/_packaging/9c2ea29a-00e0-4bae-b470-161fdab1f360/nuget/v3/flat2/transport.runtime.linux-musl-x64.microsoft.netcore.jit/3.1.0-preview1.19504.3/transport.runtime.linux-musl-x64.microsoft.netcore.jit.3.1.0-preview1.19504.3.nupkg'.
/root/coresetup/.dotnet/sdk/3.0.100/NuGet.targets(123,5): error : Response status code does not indicate success: 429 (Request was blocked due to exceeding usage of resource 'Concurrency' in namespace 'IPAddress'. For more information on why your request was blocked, see the topic "Rate limits" on the Microsoft Web site (https://go.microsoft.com/fwlink/?LinkId=823950). (DevOps Activity ID: 5B41D91F-6ED5-41D1-814B-0328F8821422)).
##[error]/root/coresetup/.dotnet/sdk/3.0.100/NuGet.targets(123,5): error : Failed to download package 'transport.runtime.linux-musl-x64.Microsoft.NETCore.Jit.3.1.0-preview1.19504.3' from 'https://pkgs.dev.azure.com/dnceng/9ee6d478-d288-47f7-aacc-f6e6d082ae6d/_packaging/9c2ea29a-00e0-4bae-b470-161fdab1f360/nuget/v3/flat2/transport.runtime.linux-musl-x64.microsoft.netcore.jit/3.1.0-preview1.19504.3/transport.runtime.linux-musl-x64.microsoft.netcore.jit.3.1.0-preview1.19504.3.nupkg'.

This does not appear to be causing widespread failures so far, but as we increase our reliance on these feeds, we're starting to get more reports.

Example builds where this has been seen:
https://dev.azure.com/dnceng/public/_build/results?buildId=398862
https://dnceng.visualstudio.com/internal/_build/results?buildId=384184

AzDO puts all unauthenticated NuGet requests into the same throttling bucket. Because NuGet looks up each package on every configured feed, the problem gets worse the more feeds you have in your NuGet config and the more packages you need to restore.

The short term suggestions from the AzDO team are:

  • Reduce the number of feeds used (their docs mention we should only be using a single feed for restores)
  • Use private feeds instead of public ones, so that NuGet won't make unauthenticated requests
  • Change NuGet.config to limit maxHttpRequestsPerSource:
    <config>
        <add key='maxHttpRequestsPerSource' value='2' />
    </config>

We are still in talks with the AzDO team, as these workarounds would require a lot of changes to our infrastructure.
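
For illustration, here is a minimal sketch of how the single-feed and maxHttpRequestsPerSource suggestions might look together in one NuGet.config. The feed key `example-feed` and the `ORG`, `PROJECT`, and `FEED` URL segments are placeholders I made up, not our real feeds, and the value of 2 is simply the one quoted above rather than a tuned number.

```xml
<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <config>
    <!-- Cap the number of parallel HTTP requests NuGet sends to each source -->
    <add key="maxHttpRequestsPerSource" value="2" />
  </config>
  <packageSources>
    <!-- Drop any inherited sources so restore only talks to one feed -->
    <clear />
    <!-- Placeholder feed: substitute the real organization, project, and feed names -->
    <add key="example-feed" value="https://pkgs.dev.azure.com/ORG/PROJECT/_packaging/FEED/nuget/v3/index.json" />
  </packageSources>
</configuration>
```

With a single source, each package lookup goes to one feed instead of fanning out across every feed in the config, which is the multiplier described above.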

@riarenas riarenas self-assigned this Oct 23, 2019
@JohnTortugo
Contributor

@riarenas
Member Author

Took me a while to track down the exact configuration for NuGet.config (it's not documented yet). Updated the issue description with it:

    <config>
        <add key='maxHttpRequestsPerSource' value='2' />
    </config>

@riarenas
Member Author

According to AzDO telemetry, a large number of our AzDO build pool machines are being detected as the same machine, which puts them all in the same throttling bucket and makes things a lot worse for our BYOC pools than for the hosted ones.

@riarenas
Member Author

riarenas commented Nov 1, 2019

As a short-term solution, the concurrency limits for the dnceng instance have been increased, and it seems to have helped somewhat.

The Azure DevOps team has found that we are now hitting some other throttling limits, and are in the process of investigating.

The AzDO team is also considering some long-term solutions like filling the gaps that are blocking us from adopting upstream feeds, which would reduce the number of feeds we need to specify in our repos' NuGet.config, and looking at improvements with the NuGet team.

@JohnTortugo
Contributor

@riarenas - Is this still a thing?

@riarenas
Member Author

Yes. We have had some quota increases to help with this in the short term, but we haven't heard back about a long-term solution.

@riarenas
Member Author

riarenas commented Apr 3, 2020

We've received additional reports of 429s during restore operations in attempt 1 of these builds:

https://dev.azure.com/dnceng/internal/_build/results?buildId=587047&_a=summary
https://dev.azure.com/dnceng/internal/_build/results?buildId=587046&_a=summary
https://dev.azure.com/dnceng/internal/_build%2Fresults?buildId=587045&_a=summary

I reached out in the thread we have with the Azure Artifacts group about throttling.

CC @wtgodbe

@riarenas
Member Author

riarenas commented Apr 8, 2020

The Azure Artifacts team said the problems on 4/3 were due to dnceng using 60% of the traffic for their scale units when they were completely scaled down. Additionally, yesterday we saw IP throttling come back in a lot more cases:

Using runfo I was able to find these from runtime, but we have additional reports from Roslyn, where the error was reported as a timeout instead of a build failure error.

| Build | Kind | Timeline Record |
| --- | --- | --- |
| 592529 | PR dotnet/runtime#34666 | Build System.Private.CoreLib |
| 592529 | PR dotnet/runtime#34666 | Build System.Private.CoreLib |
| 592488 | PR dotnet/runtime#34519 | Build product |
| 592488 | PR dotnet/runtime#34519 | Build managed product components and packages |
| 592488 | PR dotnet/runtime#34519 | Build managed product components and packages |
| 592488 | PR dotnet/runtime#34519 | Build managed product components and packages |
| 592482 | PR dotnet/runtime#34054 | Restore and Build Product |
| 592482 | PR dotnet/runtime#34054 | Restore and Build Product |
| 592437 | PR dotnet/runtime#34522 | Restore and Build Product |
| 592437 | PR dotnet/runtime#34522 | Build CoreCLR Runtime |
| 592437 | PR dotnet/runtime#34522 | Restore and Build Product |
| 592437 | PR dotnet/runtime#34522 | Restore and Build Product |
| 592437 | PR dotnet/runtime#34522 | Build CoreCLR Runtime |
| 592437 | PR dotnet/runtime#34522 | Restore and Build Product |
| 592404 | Rolling | Prepare TestHost with runtime CoreCLR |
| 592404 | Rolling | Build System.Private.CoreLib |
| 592404 | Rolling | Build product |
| 592404 | Rolling | Build System.Private.CoreLib |
| 592404 | Rolling | Build managed product components and packages |
| 592417 | PR dotnet/runtime#34663 | Restore and Build Product |
| 592417 | PR dotnet/runtime#34663 | Build managed product components and packages |
| 592417 | PR dotnet/runtime#34663 | Restore and Build Product |
| 592415 | PR dotnet/runtime#34665 | Build managed product components and packages |
| 592415 | PR dotnet/runtime#34665 | Build System.Private.CoreLib |
| 592415 | PR dotnet/runtime#34665 | Build managed product components and packages |
| 592105 | PR dotnet/runtime#34654 | Build product |
| 592105 | PR dotnet/runtime#34654 | Build product |
| 592105 | PR dotnet/runtime#34654 | Restore and Build |
| 592105 | PR dotnet/runtime#34654 | Build System.Private.CoreLib |
| 592105 | PR dotnet/runtime#34654 | Build System.Private.CoreLib |
| 592105 | PR dotnet/runtime#34654 | Build System.Private.CoreLib |
| 592194 | PR dotnet/runtime#34658 | Restore and Build Product |
| 592194 | PR dotnet/runtime#34658 | Restore blob feed tasks |
| 592194 | PR dotnet/runtime#34658 | Build managed product components and packages |
| 592187 | PR dotnet/runtime#34518 | Build System.Private.CoreLib |
| 592187 | PR dotnet/runtime#34518 | Restore and Build Product |
| 592313 | PR dotnet/runtime#34662 | Restore and Build Product |
| 592313 | PR dotnet/runtime#34662 | Restore and Build Product |
| 592295 | PR dotnet/runtime#34661 | Restore and Build Product |
| 592295 | PR dotnet/runtime#34661 | Restore and Build Product |
| 592380 | PR dotnet/runtime#34664 | Build product |
| 592380 | PR dotnet/runtime#34664 | Build System.Private.CoreLib |
| 592380 | PR dotnet/runtime#34664 | Build System.Private.CoreLib |
| 592252 | PR dotnet/runtime#34659 | Build managed product components and packages |
| 592272 | PR dotnet/runtime#34521 | Build product |
| 592272 | PR dotnet/runtime#34521 | Build product |
| 592272 | PR dotnet/runtime#34521 | Restore and Build Product |
| 592272 | PR dotnet/runtime#34521 | Build product |
| 592272 | PR dotnet/runtime#34521 | Restore and Build Product |
| 592192 | PR dotnet/runtime#34274 | Build |
| 592192 | PR dotnet/runtime#34274 | Build System.Private.CoreLib |
| 592192 | PR dotnet/runtime#34274 | Build |
| 592052 | PR dotnet/runtime#34432 | Restore and Build Product |
| 592052 | PR dotnet/runtime#34432 | Build managed product components and packages |
| 592040 | PR dotnet/runtime#33902 | Restore and Build Product |
| 592040 | PR dotnet/runtime#33902 | Restore and Build Product |
| 592080 | Rolling | Build product |
| 592080 | Rolling | Build product |
| 592080 | Rolling | Build product |
| 592080 | Rolling | Restore and Build Product |
| 592037 | PR dotnet/runtime#34652 | Build |
| 592037 | PR dotnet/runtime#34652 | Restore and Build Product |
| 592037 | PR dotnet/runtime#34652 | Restore and Build Product |
| 592032 | PR dotnet/runtime#34651 | Restore and Build Product |
| 591984 | PR dotnet/runtime#33733 | Build managed test components |
| 591984 | PR dotnet/runtime#33733 | Build |
| 591984 | PR dotnet/runtime#33733 | Build |
| 591984 | PR dotnet/runtime#33733 | Build |
| 592074 | PR dotnet/runtime#34211 | Build managed product components and packages |
| 592074 | PR dotnet/runtime#34211 | Restore and Build Product |
| 592074 | PR dotnet/runtime#34211 | Restore and Build Product |
| 592074 | PR dotnet/runtime#34211 | Restore and Build Product |
| 592074 | PR dotnet/runtime#34211 | Build System.Private.CoreLib |
| 592061 | PR dotnet/runtime#34650 | Restore and Build Product |
| 592061 | PR dotnet/runtime#34650 | Restore and Build Product |
| 592061 | PR dotnet/runtime#34650 | Restore and Build Product |
| 592061 | PR dotnet/runtime#34650 | Build managed product components and packages |
| 592061 | PR dotnet/runtime#34650 | Build managed product components and packages |
| 592061 | PR dotnet/runtime#34650 | Restore and Build Product |
| 592402 | PR dotnet/roslyn#43152 | Build |

@riarenas
Member Author

riarenas commented Apr 8, 2020

I asked the artifacts team for increased quota as we're ramping up usage of the feeds.

@safern
Member

safern commented Apr 8, 2020

@riarenas thanks for looking into this, do we have an ETA?

Would it make sense to bring the dotnet blob feed back as a restore source in the meantime?

@riarenas
Member Author

riarenas commented Apr 8, 2020

No ETA.

I'll create PRs to re-add dotnet-core as a backup if we don't hear from them soon.

@safern
Member

safern commented Apr 8, 2020

Thanks @riarenas

@riarenas
Member Author

AzDO folks have increased our limits. I'll keep this in FR for a bit to see if this gives us some relief, and move it back to general tracking afterwards, as our feed usage is only going to increase in the near term (we haven't moved ASPNet or Installer to relying entirely on AzDO feeds yet).

@riarenas
Member Author

The new limits seem to have stuck. Haven't seen any more throttling during restore since the limits were raised. I'll remove this from FR.

The AzDO team said they are evaluating more sustainable options to handle our load. I'll keep this open because I think we'd start seeing this again if we onboarded another big repo to using only AzDO feeds.

@riarenas
Member Author

We have reached out to the Azure Artifacts team again for options: https://github.com/dotnet/core-eng/issues/9681

@markwilkie
Member

Ok - AzDO is saying they've fixed it (for real) now.

@riarenas
Member Author

After the recent AzDO changes, it doesn't look like we'll be easily throttled again, so I don't think there's much value in keeping this long-standing issue open anymore. We can open new issues for any sporadic throttling we see.
