
NuGetSdkResolver adds significant overhead to evaluation and should not be used #4025

Open
Tracked by #7249
davkean opened this issue Dec 21, 2018 · 64 comments
Labels: backlog, For consideration, Iteration:2023March, Iteration:2023April, Iteration:2023May, Iteration:2023June, iteration:2023August, Partner request, performance, Performance-Scenario-Solution-Open, Priority:1, size:5, Status:InProgress, triaged


@davkean (Member) commented Dec 21, 2018

The NuGetSdkResolver is extremely slow at evaluation time.

Based on investigation in CoreFX, I found:

  • In the restored state, it added 180–400 ms of overhead to a single project evaluation
  • In the unrestored state, it added 1000–6000 ms of overhead to a single project evaluation

I've looked at the original design, and it's built on an incorrect premise: that its performance impact will only affect the loading of projects that use it. The image it shows, where projects are being loaded in the background via Asynchronous Solution Load (ASL), depicts a feature that was removed from the product in 15.5.

If a project opts into this resolver, it has the following effects:

  • On solution load, blocks interaction with VS until it has completed downloading & resolving the SDK.
  • Increases the time it takes for the project-system to send design-time build results to Roslyn, resulting in delayed IntelliSense results
  • Increases the time it takes for designers to open
  • Increases the time it takes for tests to be discovered
  • Increases the time it takes for VS to react to changing the active configuration
  • Increases the time it takes, and blocks interaction with VS, when making changes to a project, such as adding/removing a file
  • Increases design-time build times for every project that references it
  • Increases all of the above when switching branches

When we designed SDK resolvers, it was explicitly called out that, because they are used during evaluation, they must be extremely fast and must not hit the network. While this resolver only hits the network in the unrestored state, it still has a large negative impact on evaluation time when the package is already downloaded.

This is the entire reason that NuGet restore does not run during the build while inside Visual Studio.

Rogue resolvers can cause Visual Studio and other IDEs to be blamed for performance and UI-delay issues. Please remove this resolver, or change its design to play nicely with Visual Studio.

@livarcocc (Contributor) commented Dec 21, 2018

ACK on the negative performance impact. However, taking the feature out at this point is an impossibility.

Could we work together to figure out how to change the current design in a way to meet the perf requirements for VS?

@livarcocc livarcocc added this to the Discussion milestone Dec 21, 2018
@livarcocc livarcocc self-assigned this Dec 21, 2018
@nguerrera (Contributor)

@davkean Do you have the repro steps for how this measurement is taken?

It is very bizarre that the nuget sdk resolver would be any worse than the regular resolver in the restored state.

@nguerrera nguerrera modified the milestones: Discussion, MSBuild 16.1 Feb 5, 2019
@nguerrera (Contributor)

Putting this in 16.1 based on discussion just now. The resolver actually now lives in the NuGet repo, but first we should check that it is not something MSBuild is doing when calling the resolver that is taking the time.

@nguerrera (Contributor)

Looking at the code (https://github.com/NuGet/NuGet.Client/blob/dev/src/NuGet.Core/Microsoft.Build.NuGetSdkResolver/NuGetSdkResolver.cs), it really doesn't do much in the restored state.

@davkean (Member, Author) commented Feb 5, 2019

It was doing work, loading files off disk and parsing them, if I remember correctly. I wish I could remember how I verified the overhead; I think I isolated it in a single project and measured evaluation time. It was the overwhelming cause of the CoreFX evaluation time.

@nguerrera (Contributor)

I can repro this:

Evaluation time for dotnet new console: 178 ms
Evaluation time for dotnet new console + MSBuild.Sdk.Extras (unrestored): 3.8 seconds
Evaluation time for dotnet new console + MSBuild.Sdk.Extras (restored): 381 ms

@nguerrera (Contributor)

I will profile this simple repro.

@nguerrera (Contributor)

[profiling screenshot]

Some quick thoughts:

It is spending the most time parsing global.json, which is sad because my global.json is 74 bytes. We should look at switching to the new CoreFX JSON reader; there's a source-package version for .NET Framework, IIRC. This file is trivial; we don't need a fancy deserializer.

That said, it will avoid doing this more than once per build across projects:

https://github.com/NuGet/NuGet.Client/blob/b6cd1677ae2bd3b07f4cc23c2e8d408f784e8b05/src/NuGet.Core/Microsoft.Build.NuGetSdkResolver/NuGetSdkResolver.cs#L89-L90

But I suspect VS evaluations aren't retaining this context object between projects. Fixing that, if possible, would help non-NuGet SDK evaluation considerably too.

Another major contributor is nuget loading its configuration files. I presume it needs to do this to answer the question of where a downloaded package is because values in the config can impact the answer to that. The parsing there can probably be improved as well.

For both of these we can possibly cache things in static state and check config timestamps. I did some trickery like that in the non-NuGet resolver.
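
That static-cache-with-timestamp-check idea could look roughly like this (a hypothetical sketch with invented names, not the actual resolver code):

```csharp
// Hypothetical sketch: cache file content (global.json, NuGet.config, ...)
// in static state, invalidated by the file's last-write timestamp.
using System;
using System.Collections.Concurrent;
using System.IO;

internal static class TimestampedFileCache
{
    private static readonly ConcurrentDictionary<string, (DateTime LastWrite, string Content)> Cache =
        new ConcurrentDictionary<string, (DateTime, string)>();

    // Returns cached content unless the file changed on disk since the last read.
    public static string GetOrRead(string path)
    {
        DateTime lastWrite = File.GetLastWriteTimeUtc(path);
        if (Cache.TryGetValue(path, out var entry) && entry.LastWrite == lastWrite)
        {
            return entry.Content;
        }

        string content = File.ReadAllText(path);
        Cache[path] = (lastWrite, content);
        return content;
    }
}
```

The timestamp check keeps the cache correct when a user edits global.json between evaluations, while avoiding repeated reads and parses within a solution load.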

@davidfowl (Member)

> It is spending the most time parsing global.json, which is sad because my global.json is 74 bytes. We should look at switching to the new corefx json reader. There's a sources version for netfx IIRC. This file is trivial, we don't need a fancy deserializer.

You don't have to switch if that's a big short-term cost. Just use the JsonReader directly (since the global.json schema is fixed). The CoreFX reader should be lower-allocation regardless, though...
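
A minimal sketch of what that could look like with the low-level CoreFX reader (hypothetical code using System.Text.Json's Utf8JsonReader; not what NuGet ships):

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.Json;

internal static class GlobalJsonSdkParser
{
    // Pulls the "msbuild-sdks" name/version pairs out of global.json
    // without a full object-mapping deserializer.
    public static Dictionary<string, string> ParseMsBuildSdks(string json)
    {
        var sdks = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        var reader = new Utf8JsonReader(Encoding.UTF8.GetBytes(json));

        bool inSdksObject = false;
        int sdksDepth = -1;

        while (reader.Read())
        {
            if (!inSdksObject
                && reader.TokenType == JsonTokenType.PropertyName
                && reader.CurrentDepth == 1
                && reader.ValueTextEquals("msbuild-sdks"))
            {
                reader.Read(); // advance to the StartObject of "msbuild-sdks"
                inSdksObject = reader.TokenType == JsonTokenType.StartObject;
                sdksDepth = reader.CurrentDepth;
            }
            else if (inSdksObject)
            {
                if (reader.TokenType == JsonTokenType.EndObject && reader.CurrentDepth == sdksDepth)
                {
                    break; // done with the "msbuild-sdks" object
                }
                if (reader.TokenType == JsonTokenType.PropertyName)
                {
                    string name = reader.GetString();
                    reader.Read(); // the version string value
                    sdks[name] = reader.GetString();
                }
            }
        }

        return sdks;
    }
}
```

Since the schema is fixed and tiny, this avoids both the deserializer machinery and most of the allocation.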

@nguerrera (Contributor)

So it's mostly JIT time being measured there: 140 ms of jitting Newtonsoft.Json.

@davidfowl (Member)

😮 wow

@nguerrera (Contributor)

Have to check whether build at least reuses that jitting. I think it might, if the version of NuGet in the SDK matches the version used by the NuGet MSBuild resolver. The location of the DLL is different for full MSBuild.

@davkean (Member, Author) commented Feb 5, 2019

This time matches almost identically what I found in dotnet/sdk#1483.

@nguerrera (Contributor)

[profiling screenshot]

@davkean (Member, Author) commented Feb 5, 2019

Nope, strike that, JSON.NET was way less than above.

@davidfowl (Member)

We could just NGEN the one that comes with VS.


@davkean (Member, Author) commented Feb 5, 2019

Yep, just to grab a couple of strings from that file. Hell, I betcha a regex would beat this hands down.

@nguerrera (Contributor)

Yeah, there's a ReadAllText + IndexOf before it to avoid the deserialization, and that's not even showing up. 😆

https://github.com/NuGet/NuGet.Client/blob/b6cd1677ae2bd3b07f4cc23c2e8d408f784e8b05/src/NuGet.Core/Microsoft.Build.NuGetSdkResolver/GlobalJsonReader.cs#L39-L46

@nguerrera (Contributor) commented Feb 5, 2019

I'll study some more tomorrow. This is interesting, but it doesn't fully explain seeing 200–400 ms per project. I'm seeing 200 ms, of which 140 ms is jitting that would not have happened N times when loading a full solution, right?

Mind you, above you said: "I think I isolated it in a single project and measured evaluation time". Is it possible your 200–400 ms included jitting?

@davkean (Member, Author) commented Feb 5, 2019

When I was looking at traces for CoreFX, the resolver was showing up as significant enough to make me isolate this case and measure it by itself. The best way to measure this would be to grab a real-world case, such as the CoreFX solution I shared.

@nguerrera (Contributor)

Yep. Will do that.

@nguerrera (Contributor)

The remaining 60ms is still quite bad. Just wondering if the isolation process bloated it to 200ms. I will dig.

@davkean (Member, Author) commented Feb 5, 2019

Bear in mind that the cost during that solution load will be ballooned by the number of evaluations (we evaluate every single project and configuration because the solution is incomplete), but it should accurately reflect the total evaluation cost.

@jeffkl (Contributor) commented Feb 5, 2019

The parsed SDK versions from global.json are cached so it should only be read once.

https://github.com/NuGet/NuGet.Client/blob/b6cd1677ae2bd3b07f4cc23c2e8d408f784e8b05/src/NuGet.Core/Microsoft.Build.NuGetSdkResolver/NuGetSdkResolver.cs#L81

MSBuild caches the resolved path for a particular SDK, so that should only happen once. But the NuGet settings are read for each unique SDK; that could be improved.

@panopticoncentral panopticoncentral added the Performance-Scenario-Solution-Open This issue affects solution open performance. label Mar 31, 2020
@dsplaisted (Member)

I have a proposal to avoid the network access on the UI thread. See the "NuGet SDK Resolver" section in this proposal: dotnet/designs#104

Basically we are going to update the MSBuild SDK resolver interface so that a resolver can return no resolved SDK paths without failing. A resolver will also be able to return items that should be added to the evaluation.

We can leverage this to fix the issue where the NuGet resolver downloads files on the UI thread. In VS, the resolver should run in a mode that does not allow downloading SDKs. Rather, if it can't find the SDK locally, it should return success and include a MissingMSBuildSDK item describing the SDK it would have downloaded. The VS project system should check for those items after evaluation, and if any are present, spin off a non-UI-thread task to download them.
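
A rough sketch of what that resolver change could look like (hypothetical code; the overloads shown and the MissingMSBuildSDK item name are illustrative, not a committed API):

```csharp
using System.Collections.Generic;
using Microsoft.Build.Framework;

public class SketchNuGetSdkResolver : SdkResolver
{
    public override string Name => "SketchNuGetSdkResolver";
    public override int Priority => 5000;

    public override SdkResult Resolve(SdkReference sdkReference,
        SdkResolverContext context, SdkResultFactory factory)
    {
        // TryFindInPackageCache is a hypothetical helper for this sketch.
        string localPath = TryFindInPackageCache(sdkReference);
        if (localPath != null)
        {
            return factory.IndicateSuccess(localPath, sdkReference.Version);
        }

        if (context.IsRunningInVisualStudio)
        {
            // Don't hit the network on the UI thread: report success with no
            // resolved path, and hand VS an item describing what is missing so
            // the project system can restore it off the UI thread and re-evaluate.
            return factory.IndicateSuccess(
                paths: new string[0],
                version: null,
                propertiesToAdd: null,
                itemsToAdd: new Dictionary<string, SdkResultItem>
                {
                    ["MissingMSBuildSDK"] = new SdkResultItem(
                        sdkReference.Name,
                        new Dictionary<string, string> { ["Version"] = sdkReference.Version })
                },
                warnings: null);
        }

        // Outside VS (command-line builds), restoring the SDK package here is acceptable.
        return factory.IndicateFailure(new[] { $"Could not resolve SDK {sdkReference.Name}" });
    }

    private static string TryFindInPackageCache(SdkReference reference) => null; // placeholder
}
```

The key design point is that in VS the resolver never blocks on the network; it degrades to a "success with no path" result carrying enough metadata for the project system to do the download asynchronously.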

@marcpopMSFT marcpopMSFT added For consideration Used for items on the backlog to raise them to the top of that list for discussion Partner request labels Apr 7, 2020
@panopticoncentral panopticoncentral removed this from the MSBuild 16.6 milestone Jun 9, 2020
@marcpopMSFT (Member)

@dsplaisted with the optional workload work getting in, is it now possible for project system to update to handle this differently?

@dsplaisted (Member)

This has been unblocked since the new SDK resolver support went in to MSBuild. We would need to make changes to the NuGet SDK Resolver, and the project system would also need to make changes. It would be nice to do that for 5.0.200 / VS 16.9.

@panopticoncentral panopticoncentral added the Priority:1 Work that is critical for the release, but we could probably ship without label Mar 23, 2021
@davkean (Member, Author) commented Jun 3, 2021

This is another issue caused by it: https://developercommunity.visualstudio.com/t/VisualStudio-was-hang-when-open-the-wpf-/1429662. We have ~57 threads all blocked behind the NuGet resolver.


@aortiz-msft

NuGet Epic to address perf issues in the NuGetSDKResolver: NuGet/Home#11441.

Please feel free to reach out to @jeffkl directly.

@ladipro (Member) commented Apr 26, 2023

I'm looking at this issue, trying to break it down into units of work across the affected components. A couple of observations:

  1. Avoid downloading NuGet packages during evaluation
    I'm actually not sure it should be built on top of the new SDK resolver support. If the SDK cannot be resolved, there is no point in continuing evaluation. Instead of defining items on the evaluated project to indicate which NuGet packages need to be restored, the resolver may as well just throw an exception with this info, which the project system would catch. Maybe I'm not understanding the flow correctly, though; I will need to follow up.

The work needed here:

  • From NuGet SDK resolver:

    • Detect that it's running in VS and is not allowed to restore. The flag for VS is already there as SdkResolverContext.IsRunningInVisualStudio, but presumably we'll want a way to opt in/out to experiment, so we may need an additional flag.
    • If the above holds true, indicate the unresolved package via an exception or via ItemsToAdd in the new SDK interface.
    • Overall, this is a fairly small work item.
  • From MSBuild:

    • Depending on the exact mechanism of indicating the need to restore, MSBuild may need to aggregate the information from all resolver invocations. This is because <Project Sdk="Package1/Version1;Package2/Version2"> is perfectly legal, whether intended or not.
    • Very small amount of work.
  • From the project system:

    • This would depend on the desired UX. We may want VS to automatically restore the missing packages and then retry evaluating the affected projects, all within the context of solution load. Or finish loading the solution, leaving the affected projects 'unloaded', and then asynchronously pop a dialog? Or take the user directly to the VS NuGet Manager? Or something else. Very much an open question; feedback welcome.
  2. Avoid JITting
    There is indeed some non-trivial JITting cost when a process runs SDK resolution for the first time. For VS, the first project loaded after VS starts pays the price. On the command line it's generally every build (i.e. every MSBuild.exe invocation). On my machine I measured it at 45 ms for Microsoft.Build.NuGetSdkResolver.dll and 85 ms for Microsoft.DotNet.MSBuildSdkResolver.dll. The latter has nothing to do with NuGet SDK resolution, but it's always invoked, and because it's more expensive to JIT than the NuGet resolver, if we decide to address JITting it would make sense to prioritize the base SDK resolver first.

I have played with prototyping a change to NGEN Microsoft.Build.NuGetSdkResolver.dll plus its dependency Microsoft.Deployment.DotNet.Releases, and it's not straightforward. MSBuild uses Assembly.LoadFrom, which immediately disqualifies the native image from being loaded. To be able to Assembly.Load by full name, we 1) need to know the full name and 2) need to add a binding redirect to devenv.exe.config and MSBuild.exe.config so the CLR binder can find the file. For this we need to know the assembly version, and in some cases this appears to hit unfortunate layering & versioning issues, i.e. MSBuild generally doesn't know the specific .NET SDK version it's going to be part of. Not unsolvable, but likely P2 in the grand scheme of things.

@dsplaisted (Member)

> I'm actually not sure if it should be built on top of the new SDK resolver support. If the SDK cannot be resolved, there is no point in continuing evaluation. Instead of defining items on the evaluated project to indicate which NuGet packages need to be restored, the resolver may as well just throw an exception with this info, which the project system would catch. Maybe I'm not understanding the flow correctly, though, will need to follow up.

The resolver needs to communicate this information to Visual Studio. You could probably include it as data on an exception, but adding items seems cleaner to me. Also if you do it via items, then I don't think you need to do the work in MSBuild to aggregate the information about which packages need to be downloaded, as there will just be multiple items with the information from the different resolver invocations.

@ladipro (Member) commented Apr 27, 2023

True; I think in general it's uglier to rely on structured data attached to an exception. On the other hand, if the resolver fails in this way, there's conceptually no value in evaluating the project: other than the items added via ItemsToAdd, it's pretty much useless. Perf wouldn't be an issue, since this is an error path. But what I'm debating is whether a dummy ProjectInstance fits the CPS internal model or not, i.e. whether it's good for CPS to always receive a valid object, or whether that is a potential source of confusion downstream. (Ex: would it be good or bad for such a ProjectInstance to be persisted to the project system's evaluation cache?)
