-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
LockFreeReaderHashtable growing algorithm results in an arithmetic overflow #102131
Comments
Tagging subscribers to this area: @agocke, @MichalStrehovsky, @jkotas |
I am not sure if I can make an abridged repro, so here's the full ILC repro for a start: https://1drv.ms/u/s!AgaMhbe7wFq8m8pg-U17VEoTsQ6rGg?e=iAcv7t |
If it's compiling for 20 minutes, it's 99% sure you hit a "wide" generic recursion and you'll either hit this failure mode, or OOM (we do detect a "deep" recursion, but "wide" is a problem). Try the switch from #97235 (comment) to see the warnings that should point you where the recursion is. |
The project is huge. I use it for NativeAOT testing because it tends to hit all sorts of limits. 20 minutes is actually not a bad time given my past attempts :-)
Sure, will give it a try. |
I don't think
|
Yep, looks like there's no generic recursion, just a ton of stuff. If you re-run the failing ilc execution by adding
There's no clear sign this is a recursion, just tons of types and variations. I don't know if there's anything to do. Even if this succeeded, we're likely to run into PE file format limits. Cc @eerhardt if he heard about Microsoft.Graph and why is it 35 MB in IL. Basically this is already bigger than all of BCL combined and apparently doesn't trim very well, and add generic specialization and it becomes incompilable. |
Thanks for checking it. The annoying thing is that we use very small subset of the |
Looks like the reported issue got closed due to inactivity microsoftgraph/msgraph-sdk-dotnet#1702. |
The quoted issue in particular states:
That's certainly not working as advertised. Neither is the approach of generating your own SDK subset using Microsoft Kiota because it generates the exact same Model bloat full of classes that are never even referenced. |
@filipnavara, is it reproduced with isolated repro; like a dummy project with calls to bunch of Graph endpoints? I tried a sample app: https://github.com/microsoftgraph/msgraph-sample-github-connector-dotnet/tree/main (changed net8 -> net9 and published with PublishAot on osx-arm64 with daily build), it seemed to have published in a few seconds. |
I suppose I can try to reduce the repro. The whole app was always on the verge of hitting some limits, so it's very well possible that this was just the last thing that pushed it over, but the number of classes from the Graph assembly is certainly too high to ignore... |
Also, I suppose the LINQ optimizations in .NET 9 may also play a role. There's a LOT of LINQ specializations and apparently each added interface basically multiplies those, resulting in each specialized iterator taking 4 slots in this hash table which certainly doesn't help things. |
Yup, and this #101969 😉 |
I already tried replacing LINQ with the one from 8.0 and also with the one used on WASM (that has less generic code). It didn't seem to help. |
I am aware of LinqGen but it’s not trivial to just add it to project of this size :) |
For what's it worth. I enabled the verbose logging, redirected it to file and let it run until the overflow happens.
|
Seems like the minimal repro is using Microsoft.Graph;
var httpClient = new HttpClient();
var graphClient = new GraphServiceClient(httpClient);
var me = await graphClient.Me.GetAsync(); <Project Sdk="Microsoft.NET.Sdk">
<PropertyGroup>
<OutputType>Exe</OutputType>
<TargetFramework>net9.0</TargetFramework>
<RootNamespace>msgraph_repro</RootNamespace>
<ImplicitUsings>enable</ImplicitUsings>
<Nullable>enable</Nullable>
<PublishAot>true</PublishAot>
</PropertyGroup>
<ItemGroup>
<PackageReference Include="Microsoft.Graph" Version="5.48.0" />
</ItemGroup>
</Project> |
Nice repro! The other GitHub sample wasn't using .me.GetAsync(), but instead |
There are some endpoints that trigger it and others that don't. We hit it with more than one, but this was the first one I found to reproduce it. The compilation on my machine (Ryzen 7950 on Windows) fails after 30 minutes. |
Okay, this took a while to untangle because all the tools we have are crashing on this. I'm going to channel some frustration into this response. I see the problem, it's unclear how hard it will be to fix. It needs to be fixed in Kiota. First some intro. LINQ is implemented on top of a ton of generic virtual methods. Generic virtual methods tend to have combinatorial explosions. Think: interface I
{
void Do<T>();
}
class A : I
{
public void Do<T>();
}
class B : I
{
public void Do<T>();
} Then somewhere we do Looking at how this works with LINQ: using System.Diagnostics.CodeAnalysis;
using System.Reflection;
GetEnumValue<MyEnum>();
GetEnumValue<YourEnum>();
static T? GetEnumValue<[DynamicallyAccessedMembers(DynamicallyAccessedMemberTypes.PublicFields)] T>() where T : struct, Enum
{
var rawValue = "foo,bar";
if (string.IsNullOrEmpty(rawValue)) return null;
var type = typeof(T);
if (type.GetCustomAttributes<FlagsAttribute>().Any())
{
return (T)(object)rawValue!
.Split(',')
.Select(x => Enum.TryParse<T>(x, true, out var result) ? result : (T?)null)
.Where(x => !x.Equals(null))
.Select(x => (int)(object)x!)
.Sum();
}
else
return Enum.TryParse<T>(rawValue, true, out var result) ? result : null;
}
enum MyEnum { }
enum YourEnum { } Notice that I'm doing a bunch of select/where with MyEnum and YourEnum, separately. But compile this with AOT and look at what was generated with sizoscope: Notice IteratorSelectIterator over YourEnum getting a Select method body for MyEnum. Notice IteratorSelectIterator types instantiated over both YourEnum and MyEnum, in all possible combinations. You might think, Michal, what a crazy example of LINQ use, nobody writes code like that. Nobody writes LINQ is everywhere, including all the generated code, even in places that look trivially avoidable. The good news is that Copilot is pretty good at "can you rewrite this code without LINQ" and produces more readable and efficient code than the original. |
(This also explains why Microsoft.Graph is 35 MB of IL. It's all the lambdas.) |
I guess we should file an issue with Kiota, right? I can do that. Stepping aside from the specific scenario, do you have any ideas on how to limit the generic expansion or report better errors? Not asking for a miracle, but I assume people may inadvertently run into this and diagnosing it is quite painful with the crashing tools. I was unable to produce DGML/MSTAT for any non-trivial code that would make it easier to find the root cause and create a repro. |
There is no single reason. It's a death by a thousand papercuts. I honestly don't know what approach could be used here. It's a philosophical problem akin to sorites paradox. The approach I ended up using was this tool: https://github.com/dotnet/runtime/tree/4246ba19bd196c5f374d94e5c1fc7b21d53bd9fc/src/coreclr/tools/aot/DependencyGraphViewer. Click the "Use ETW" button, it will listen to dependency graph event in any active ILC compilation. Then I ran ILC with |
I guess what we could do is to add a command line switch that will abort compilation after a user-specified number of methods and spits out whatever logs were requested and a broken object file. That might help. |
Thanks for the analysis and documenting the |
I guess I would be happy if the DGML file was written even on unsuccessful compilation (assuming compilation with |
Btw, fixing up the Enum reading/writing in |
It's not unreasonable to do this by default. The C# compiler has a specific error message for this "This expression is too large or complex to compile." Basically used for when the program starts running up against physical limits. |
The underlying issue in the Microsoft Graph SDK / Kiota is fixed and I can confirm that the build runs to the end now. Should we close this issue? Should we keep it around to track improvements to the diagnostic options? |
I think it may be better to define a hard limit (in byte or number of generic types we can handle), then issue an descriptive message if it exceeds the limit (and/or catch |
Just a side-note, there are still places in the Microsoft.Graph SDK that emit expensive LINQ even with the serialization fix: ![]() The app still has ~90Mb (30%) of the LINQ generic expansions where at least some of them are from parts of the Graph SDK that we don't use. On the upside I have MSTAT and DGML log files now. UPD: Hopefully I found the culprit. |
https://github.com/microsoft/kiota/releases/tag/v1.16.0 removed linq from kiota generated code and kiota.* dependencies |
msgraph waiting for update microsoftgraph/msgraph-sdk-dotnet#2573 |
All the recent Kiota fixes made it viable to use in the NativeAOT scenarios and it no longer crashes the AOT compiler. I shared some numbers earlier on individual PRs/issues. Notably, the initial fixes went from Kiota choking the AOT compiler to successful build. Later fixes cut down 60 MB of the resulting app size (Microsoft Graph SDK and few other dependencies). We are now down to ~26 MB for all LINQ instances in the whole app according to Sizoscope and so Kiota is no longer the biggest contributor. |
should cut down more after microsoftgraph/msgraph-sdk-dotnet#2573 |
That requires rebuilding all the SDKs / models that we depend on. I can do most of that myself but I'd prefer to wait at least for the Graph SDK packages from upstream. |
@filipnavara latest release should have all the no-linq changes if you wana give it a try |
Thanks for the heads up! |
I tried to compile quite a big project with ILC and it always fails like ~20 minutes later with this error:
Notably, I tried the same experiment on the same project couple of months ago. Obviously both the project and ILC evolved, but it seems to have arbitrarily trip a threshold in few places where the hash table expansion goes too big.
The text was updated successfully, but these errors were encountered: