Don't ignore the case when dataflow doesn't know what's going on #101031

MichalStrehovsky · 2024-04-14T21:05:27Z

TIL that when dataflow doesn't know something, it will model it as no value at all.

dotnet-policy-service · 2024-04-14T21:05:57Z

Tagging subscribers to this area: @agocke, @MichalStrehovsky, @jkotas
See info in area-owners.md if you want to be subscribed.

MichalStrehovsky · 2024-04-14T21:07:00Z

I assume it also fixes #101010.

agocke · 2024-04-15T02:58:32Z

Can we unit test this?

jkotas · 2024-04-15T04:46:37Z

src/coreclr/tools/aot/ILCompiler.Compiler/Compiler/Dataflow/ReflectionMethodBodyScanner.cs

@@ -656,6 +656,12 @@ public override bool HandleCall(MethodIL callingMethodBody, MethodDesc calledMet
                        if (Intrinsics.GetIntrinsicIdForMethod(callingMethodDefinition) == IntrinsicId.RuntimeReflectionExtensions_GetMethodInfo)
                            break;

+                        if (param.IsEmpty())
+                        {
+                            // The static value is unknown and the below `foreach` won't execute


Why is the value unknown in my repro?

I would expect that the dataflow should be able to trace through new Action(Test<string>).Method without problems.

Known limitation of the dataflow analysis (#93720 is the tracking issue).

I thought #93720 was about not tracking variable types for GetType - how does that explain this issue? If we don't trace through new Action(Test<string>), I would have expected it to show up as an unknown value, not empty.

I thought unknown/empty were interchangeable, but now I understand the difference. We eventually want to know the type allocated through the new Action so that GetType (or in this case Delegate.get_Method) knows the type it's operating on.

In this case, we end up with empty instead of unknown because constructors take this path (they "return void"):

runtime/src/tools/illink/src/ILLink.Shared/TrimAnalysis/HandleCallAction.cs

Lines 1168 to 1170 in 3d0da2c

// For now, if the intrinsic doesn't set a return value, fall back on the annotations.
// Note that this will be DynamicallyAccessedMembers.None for the intrinsics which don't return types.
returnValue ??= calledMethod.ReturnsVoid () ? MultiValueLattice.Top : annotatedMethodReturnValue;

And we can no longer bash it to unknown here because the HandleCallAction returned true (i.e. handledFunction):

runtime/src/coreclr/tools/aot/ILCompiler.Compiler/Compiler/Dataflow/MethodBodyScanner.cs

Lines 1302 to 1319 in 3d0da2c

// Handle the return value or newobj result
if (!handledFunction)
{
    if (isNewObj)
    {
        if (newObjValue == null)
            methodReturnValue = UnknownValue.Instance;
        else
            methodReturnValue = newObjValue;
    }
    else
    {
        if (!calledMethod.Signature.ReturnType.IsVoid)
        {
            methodReturnValue = UnknownValue.Instance;
        }
    }
}

Interesting, that looks like something we should fix (not that it'll help with the issue this PR is addressing). Thanks.

We eventually want to know the type allocated through the new Action so that GetType (or in this case Delegate.get_Method) knows the type it's operating on.

To fix tracking of new Action(Test<string>), I think we'd need to introduce a new dataflow value for Action that tracks the ldftn result, and handle that case in the get_Method intrinsic. It seems like just knowing that the input has type Action doesn't solve the problem, so I'm not sure #93720 is the right tracking issue. I think that one is more specific to GetType handling.

We don't need to know what method the Action points to, just the concrete type the Delegate.get_Method method is called on (System.Action in this case).

I'm missing something - why don't we need to know the method it points to? Is it because we keep metadata for all delegate targets if there's any call to get_Method? I was assuming we'd ideally only keep metadata for those delegate targets that had a get_Method call.

The compiler keeps track of delegate type + method. If we know the delegate type, we know all the methods it can be used with.

We don't track this at method granularity because doing new Action(Foo).Method is very rare. People only do this if they want to work around the lack of methodof in C#. What happens 95% of the time is that we construct a delegate somewhere and a different part of the program calls Delegate.get_Method on it. That's why we only care about the delegate type.

Dataflow losing track of the type in the trivial new Action(Foo).Method pattern is throwing a wrench in this because if someone uses this to work around methodof, we only see Delegate.get_Method was called on some unknown thing and we need to disable the optimization globally because this could be any delegate.
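
The bookkeeping described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the DelegateTargetTracker class and all of its members are hypothetical names invented here, not the actual ILCompiler implementation. It records which methods each delegate type was constructed over, which delegate types had Delegate.get_Method called on them, and whether any such call had a receiver whose type dataflow could not determine (which disables the optimization for every delegate type).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical whole-program bookkeeping; NOT the actual ILCompiler code.
public sealed class DelegateTargetTracker
{
    // Delegate type name -> methods that delegates of that type were created over.
    private readonly Dictionary<string, HashSet<string>> _targetsByDelegateType =
        new Dictionary<string, HashSet<string>>();

    // Delegate types on which a call to Delegate.get_Method was observed.
    private readonly HashSet<string> _typesWithGetMethod = new HashSet<string>();

    // Set when get_Method was called on a receiver of undetermined type.
    private bool _getMethodOnUnknownReceiver;

    public void RecordDelegateCreation(string delegateType, string targetMethod)
    {
        if (!_targetsByDelegateType.TryGetValue(delegateType, out var targets))
        {
            targets = new HashSet<string>();
            _targetsByDelegateType[delegateType] = targets;
        }
        targets.Add(targetMethod);
    }

    public void RecordGetMethodCall(string receiverDelegateType)
        => _typesWithGetMethod.Add(receiverDelegateType);

    // The receiver could be any delegate, so the optimization is off globally.
    public void RecordGetMethodCallOnUnknownReceiver()
        => _getMethodOnUnknownReceiver = true;

    // True when the target method may be observed through Delegate.Method.
    public bool NeedsReflectionMetadata(string delegateType, string targetMethod)
    {
        if (!_targetsByDelegateType.TryGetValue(delegateType, out var targets)
            || !targets.Contains(targetMethod))
        {
            return false;
        }

        return _getMethodOnUnknownReceiver || _typesWithGetMethod.Contains(delegateType);
    }
}
```

In this model, a single call to RecordGetMethodCallOnUnknownReceiver makes every recorded delegate target require metadata, which mirrors the "disable the optimization globally" behavior described above.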

Got it, thank you!

MichalStrehovsky · 2024-04-15T05:29:36Z

Can we unit test this?

The test would be testing a scenario that only exists due to a dataflow issue we keep running into. Sven just ran into it last week in #100786. Once we fix the underlying issue, we'd have to delete the test because this will no longer be a problem.

We also don't have facilities to unit test this. This would have to be an E2E test and because it's testing "specific pattern inhibits a global optimization in the whole program", it would have to be a brand new test project because this can't be folded into any of the existing ones. All in all, it felt like too much hassle to write a test with no long-term benefit.

We do have tests for "global optimization is active" and "global optimization is disabled", just not for this specific pattern that disables it. I'm also a bit confused why this pattern exists in the first place and why dataflow has a distinction between "no known value (IsEmpty)" and "an unknown value (IsUnknown)".

am11 · 2024-04-15T05:47:13Z

I assume it also fixes #101010.

Does it affect the 50 kB save from #100916?

MichalStrehovsky · 2024-04-15T05:50:17Z

I assume it also fixes #101010.

Does it affect the 50 kB save from #100916?

No, this doesn't bring back the LINQ expressions that were deleted there.

sbomer · 2024-04-15T16:36:49Z

why dataflow has a distinction between "no known value (IsEmpty)" and "an unknown value (IsUnknown)".

Empty is supposed to mean "we know that this abstract value cannot represent any real values", for example to model the return value of a method that is known to throw at runtime, whereas unknown means "it could be anything".
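
A minimal sketch of why the distinction matters for the check added in this PR, assuming a simplified set-based value. The AbstractValue and DelegateUseAnalysis types below are hypothetical and are not the real ILLink/ILCompiler types. Iterating an empty value runs the loop body zero times, so a check placed only inside the loop never fires. Because dataflow currently produces an empty value in a place where unknown was intended, the sketch treats empty conservatively.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified model of a dataflow value: a set of possible
// tracked values. These are NOT the real ILLink/ILCompiler types.
public sealed class AbstractValue
{
    // Marker meaning "this could be anything".
    public static readonly string Unknown = "<unknown>";

    private readonly HashSet<string> _possibleValues;

    public AbstractValue(params string[] possibleValues)
    {
        _possibleValues = new HashSet<string>(possibleValues);
    }

    // Empty: the value represents no real values at all.
    public bool IsEmpty => _possibleValues.Count == 0;

    public IEnumerable<string> Values => _possibleValues;
}

public static class DelegateUseAnalysis
{
    // Returns true when the analysis must conservatively assume that
    // get_Method may have been called on any delegate type.
    public static bool MustAssumeAnyDelegate(AbstractValue receiver)
    {
        // Without this check, an empty receiver would skip the loop below
        // entirely and be treated as if there were nothing to handle.
        if (receiver.IsEmpty)
            return true;

        foreach (string value in receiver.Values)
        {
            if (value == AbstractValue.Unknown)
                return true;
        }

        return false;
    }
}
```

With this model, an empty receiver and an unknown receiver both force the conservative answer, while a receiver known to be System.Action does not.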

eerhardt · 2024-04-15T22:12:05Z

Can we add a test for this? Looks like it broke in dotnet/aspnetcore. dotnet/aspnetcore#55010 broke because of this. It might be good to have a "Native AOT" test that uses DI like this test in dotnet/aspnetcore:

https://github.com/dotnet/aspnetcore/blob/d594fc22076db3616a250817c3ca550fcbe69562/src/DefaultBuilder/test/Microsoft.AspNetCore.NativeAotTests/UseStartupThrowsForStructContainersTest.cs#L20

MichalStrehovsky · 2024-04-15T22:41:06Z

Can we add a test for this? Looks like it broke in dotnet/aspnetcore. dotnet/aspnetcore#55010 broke because of this. It might be good to have a "Native AOT" test that uses DI like this test in dotnet/aspnetcore:

https://github.com/dotnet/aspnetcore/blob/d594fc22076db3616a250817c3ca550fcbe69562/src/DefaultBuilder/test/Microsoft.AspNetCore.NativeAotTests/UseStartupThrowsForStructContainersTest.cs#L20

I already wrote my opinion on testing this in #101031 (comment).

We're running into two issues in dataflow analysis (modeling something that should be Unknown as Empty, and not being able to see through new Action().Method). Either of them not existing would be enough to have all codepaths covered by our existing testing. Creating a new test to cover a scenario that is going to be impossible to exercise doesn't seem like a great use of time (and CI resources in the future, since this must be a standalone test that cannot have anything else in it).

Given this is now blocking codeflow we either need to approve and merge this ASAP, or do a revert of the previous PR. @sbomer does this look good? I'm off to bed.

sbomer

I'm good with this change for now to unblock dependency flow. I'd like to fix the issue where this is not tracked as "unknown", but I can look into that as a follow-up.

eerhardt · 2024-04-15T23:13:52Z

My intention on testing was more of the end-to-end scenario that broke. So we wouldn't "have to delete the test because this will no longer be a problem.".

We also don't have facilities to unit test this.

We do have the facilities to test the scenario though. See

runtime/src/libraries/System.Diagnostics.DiagnosticSource/tests/NativeAotTests/System.Diagnostics.DiagnosticSource.NativeAotTests.proj

Lines 5 to 6 in 69062fd

    
<TestConsoleAppSourceFiles Include="DiagnosticSourceEventSourceTests.cs"
                           EnabledProperties="EventSourceSupport" />

runtime/src/libraries/System.Diagnostics.DiagnosticSource/tests/NativeAotTests/DiagnosticSourceEventSourceTests.cs

Lines 45 to 49 in 69062fd

    
public static int Main()
{
    DiagnosticSource diagnosticSource = new DiagnosticListener("TestDiagnosticListener");
    using (var listener = new TestEventListener())
    {

We can write a Native AOT app just like the above for the ActivatorUtilities class that was broken in #101010. We can basically just use the repro provided in the bug as the test app - all of the referenced code is built in dotnet/runtime. That way we can get a little more "end-to-end" tests here.

jkotas · 2024-04-16T00:23:39Z

Merging to unblock codeflow

jkotas · 2024-04-16T00:27:21Z

/ba-g ignoring unrelated timeout

agocke · 2024-04-16T04:12:08Z

As general policy, I think every regression should have a regression test. Obviously, if it's difficult or impossible to construct such a test, never mind. But if it's straightforward, there's by definition at least one instance of the bug actually making it into the product, which is enough in my opinion to merit the test. We run many thousands of tests on every PR and many of those have never, and will never, catch a bug in production. I'd rather invest in infrastructure improvements in test execution than avoid writing tests. We also accidentally revert PRs on occasion, and even though we sometimes revert the tests as well, any extra coverage to avoid that is very useful at scale.

MichalStrehovsky · 2024-04-16T05:58:01Z

TL;DR: Any test I can think of writing for this will stop exercising the codepath added here by the end of 9.0, but will forever cost us seconds in CI time because it will have to be the most expensive test kind we have.

You'll have a hard time finding a PR of mine that doesn't have a test. I write tests for pretty much any code I write. Writing a good and long-term useful test exercising the code path I'm adding here is hard, if not impossible. I actually don't know if this codepath is going to be hittable at all once Sven fixes the Empty/Unknown confusion in dataflow; we might want to just revert this PR then.

We can write a Native AOT app just like the above for the ActivatorUtilities class that was broken in #101010. We can basically just use the repro provided in the bug as the test app

This test would not be long term useful. The line that is throwing the exception initializes a field in the cctor that is not even used in native AOT - we set it but never read it. Ironically, the only observable effect of the line is that it needs to disable whole program analysis of delegates due to dataflow analysis deficiencies. It is not the only reason why ASP.NET gets delegate whole program analysis disabled (the other reason is #96528) but once/if it becomes the only reason to disable the optimization in ASP.NET, I'm going to submit another PR that takes it out of statically reachable code on native AOT (I considered doing that in #100916 already and wrote it as an option in PR description). One way or the other, the problematic line is not going to trigger the if check I'm adding in this PR by the end of 9.0 (either because the Delegate.get_Method call will be removed from closure, or dataflow will get the necessary improvement).

We run many thousands of tests on every PR and many of those have never, and will never, catch a bug in production

We spent a massive amount of time as a team over many years limiting the number of tests we run. We audited and deleted redundant tests between src/tests and src/libraries. We spent (and are spending) tons of time limiting the number of standalone tests in src/tests.

This can only be tested by adding another fully standalone test (the most expensive test kind we have) that takes seconds to build and will become redundant the moment we fix any of these:

1. Dataflow bug that confuses Unknown with Empty (Dataflow analysis models result of newobj as MultiValueLattice.Top #101102)
2. Improve tracking of new in dataflow (GetType() behave strange #93720)

1 sounds like a 9.0 bug. 2 would be very nice to fix in 9.0 because we (and our customers - this is customer-reported) keep running into it.

martincostello · 2024-04-16T06:22:37Z

End-to-end tests don't ever really become redundant unless you delete the public APIs they depend on, because they test a whole bunch of things for user scenarios without depending on implementation details. A highly-targeted "unit" test? Sure, that can become redundant as refactoring happens.

The repro is an example of user code (disregard that the delegate did nothing, that was to create a minimal repro to make the specific issue easy to diagnose and fix) that should work regardless of what implementation details it hits or what internals become redundant over time.

If it were made a bit less minimal (say, the delegate set the base address for the requests, as it did in the original app it's derived from) and checked a few extra cases then you'd get more bang-for-your-buck out of it for the time spent compiling it. In the extreme case you could even compile (most of) the whole app I found #101010 in originally and cover even more user scenarios with that one compilation.

am11 · 2024-04-16T07:04:48Z

No, this doesn't bring back the LINQ expressions that were deleted there.

I meant overall stats before the previous PR and after this PR; now we are keeping more symbols. Sounds like it's still a win; I was just curious whether we re-measured "This makes ASP.NET 50 kB smaller with native AOT".

another fully standalone test (the most expensive test kind we have)

There are end-to-end tests for AOT which run in PRs of the SDK / installer repos. Should we lower the bar for adding tests in those repos when they were deemed too expensive for runtime? It slipped through the cracks and made it to a preview release due to a test gap. Murphy's law says it can happen again.

MichalStrehovsky · 2024-04-16T07:15:13Z

If it were made a bit less minimal (say, the delegate set the base address for the requests, as it did in the original app it's derived from) and checked a few extra cases then you'd get more bang-for-your-buck out of it for the time spent compiling it.

The test needs to test exactly this code path and nothing else. To hit this bug, the program needs to have a call to Delegate.get_Method that hits the bug and no other call to Delegate.get_Method that could paper over the bug. Dragging in more code just increases the chances the bug will be papered over. If we try to make this test "more useful", it will just stop being a regression test for this bug: it will become yet another DI test in this repo that cannot catch this kind of bug (we already have a lot of DI tests in this repo that do not catch this bug). For example the repro in https://github.com/martincostello/dotnet-runtime-101010-repro can be fixed by adding /p:EventSourceSupport=true to the dotnet publish line (EventSourceSupport brings in code that papers over the bug).

On the other hand if we keep the test small and razor sharp testing this, it will stop testing the bug by the end of 9.0 (it's not a question of "if", only "when") and it's questionable what other residual value it will have due to it being focused on one thing only.

The other E2E tests we have in this repo are testing trimming warning suppressions and they will be useful as long as the suppressions exist.

I meant overall stats before the previous PR and after this PR; now we are keeping more symbols. Sounds like it's still a win; I was just curious whether we re-measured "This makes ASP.NET 50 kB smaller with native AOT".

ASP.NET doesn't benefit from this optimization yet due to lack of #96528 so I expect the impact of this PR to be zero. (ASP.NET sets EventSourceSupport=true that will paper over this bug because it brings more reasons to disable the optimization.)

It slipped through the cracks and made it to a preview release due to a test gap.

Did it? It was hit as ASP.NET core repo was ingesting the runtime build with this.

martincostello · 2024-04-16T07:25:55Z

It certainly made its way all the way to be merged into dotnet/sdk, otherwise I wouldn't have found it.

eerhardt · 2024-04-16T15:45:34Z

TL;DR: Any test I can think of writing for this will stop exercising the codepath added here by the end of 9.0, but will forever cost us seconds in CI time because it will have to be the most expensive test kind we have.

It was hit as ASP.NET core repo was ingesting the runtime build with this.

Note that the test in the ASP.NET Core repo that caught this bug wasn't for this exact codepath, but still caught the bug anyway. The test that caught it is named UseStartupThrowsForStructContainersTest. It was testing a specific exceptional case in ASP.NET Core, but since it was an end-to-end functional test, it covers a lot of ground.

Adding a new end-to-end functional test for the scenario that broke here will be useful long term because it will ensure the scenario keeps working as we make changes in both the native AOT runtime and the ActivatorUtilities class.

Not all tests need to be unit tests. Not all tests need to be end-to-end functional tests. We can use both to get the best coverage. Yes, these end-to-end functional tests are expensive - so let's not write hundreds of them. But a handful of them, especially when the scenario they cover is critical to the .NET stack (and has been broken in the past), is important to ensure our changes don't introduce breaks.

Even if the codepath being fixed here will be deleted by the end of 9.0, the scenario that broke isn't going anywhere. Knowing that we will change the codepath makes it even more valuable to test the scenario. The test will ensure we don't break this scenario again when we do make changes.

jkotas · 2024-04-16T16:24:41Z

We have a tiered validation system. We do accept that there are going to be bugs that won't get caught by dotnet/runtime pre-checkin tests and that will only be caught by dependent repos or outer loop validation.

We do have https://github.com/dotnet/runtime/tree/main/src/libraries/Microsoft.Extensions.DependencyInjection/tests/TrimmingTests, which is a basic end-to-end test that exercises DI and ActivatorUtilities in a simple app. Does this test run in native AOT outer loop runs? If not, it is something to fix.

The question to ask before adding new end-to-end tests: How many breaks would have been caught by the new end-to-end tests in last year?

There is a dotnet/runtime bug caught by dependent repos every other week. It is a different end-to-end scenario and a different corner case each time. We cannot afford to add a new end-to-end test for each of these.

eerhardt · 2024-04-16T16:27:59Z

We do have https://github.com/dotnet/runtime/tree/main/src/libraries/Microsoft.Extensions.DependencyInjection/tests/TrimmingTests, which is a basic end-to-end test that exercises DI and ActivatorUtilities in a simple app. Does this test run in native AOT outer loop runs? If not, it is something to fix.

No, those tests only run with PublishTrimmed=true. They don't run with PublishAot=true. You can tell because the project is named TrimmingTests.proj. See

runtime/Directory.Build.props

Lines 399 to 400 in 6f3b1a6

    
<IsTrimmingTestProject Condition="$(MSBuildProjectName.EndsWith('.TrimmingTests'))">true</IsTrimmingTestProject>
<IsNativeAotTestProject Condition="$(MSBuildProjectName.EndsWith('.NativeAotTests'))">true</IsNativeAotTestProject>

agocke · 2024-04-16T17:49:41Z

We have a tiered validation system. We do accept that there are going to be bugs that won't get caught by dotnet/runtime pre-checkin tests and that will only be caught by dependent repos or outer loop validation.

I do agree with this statement. I'm not super concerned with where the test that covers regressions lives, I'm more concerned with the fact that it exists. If it's too expensive to run in PR, I'm fine with it running with outerloop. I'm even fine with it living in another repo (as with the SDK tests) as long as we can be confident that test actually exercises the core scenario that regressed.

am11 · 2024-04-17T08:25:38Z

Did it? It was hit as ASP.NET core repo was ingesting the runtime build with this.

#101026 (comment) repros in 9.0 p3 and does not repro with 8.0. So no test in any repo has caught this regression in a while.

MichalStrehovsky · 2024-04-17T08:50:16Z

Did it? It was hit as ASP.NET core repo was ingesting the runtime build with this.

#101026 (comment) repros in 9.0 p3 and does not repro with 8.0. So no test in any repo has caught this regression in a while.

For the purposes of delegate optimization that I'm changing here, this:

class Program
{
    static void Main() => Console.WriteLine(new Action(Test<string>).Method);

    static void Test<T>() { }
}

Is no different from this:

class Program
{
    static Action GetTest() => new Action(Test<string>);

    static void Main() => Console.WriteLine(GetTest().Method);

    static void Test<T>() { }
}

Or this:

class Program
{
    static Action TheTest => new Action(Test<string>);

    static void Main() => Console.WriteLine(TheTest.Method);

    static void Test<T>() { }
}

The latter two do not run into this bug. I could come up with many more that do not run into the bug. The delegate optimization does not concern itself with these shapes. Dataflow analysis does. We have delegate optimization tests that test one of these shapes.

We're not going to add standalone E2E tests, each taking seconds to build, for every one of these variations. It is the job of the dataflow analysis to test these variations in its unit tests, which take milliseconds to run and are the place where dataflow analysis is tested.

I'm not fixing the dataflow analysis bug, merely working around it. The fix in the data flow analysis is tracked in #101102. Once that's fixed, my workaround in this PR won't even be needed, it will just cover some corner case I don't even know how to hit (I'll leave it to Sven whether we should just roll back this PR then).

We cannot unit test the delegate optimization. We can unit test dataflow. We do not test dataflow with delegate optimization tests. We do not test dataflow analysis with DI tests. We test it with dataflow tests.

Instead of tracking the return value as "TopValue" or "unknown", this models the constructor as returning a value with a static type when called with newobj, letting us undo the workaround from #101031.

Update ReflectionMethodBodyScanner.cs

0c828b1

MichalStrehovsky requested a review from sbomer April 14, 2024 21:05

dotnet-issue-labeler bot added the area-NativeAOT-coreclr label Apr 14, 2024

dotnet-policy-service bot assigned MichalStrehovsky Apr 14, 2024

jkotas reviewed Apr 15, 2024

sbomer approved these changes Apr 15, 2024

jkotas merged commit d41c3db into main Apr 16, 2024
88 of 90 checks passed

jkotas deleted the MichalStrehovsky-patch-2 branch April 16, 2024 00:28

MichalStrehovsky mentioned this pull request Apr 16, 2024

Dataflow analysis models result of newobj as MultiValueLattice.Top #101102

Closed

eerhardt mentioned this pull request Apr 16, 2024

Run DependencyInjection trimming tests as native AOT. #101129

Closed

sbomer mentioned this pull request Apr 17, 2024

Preserve static type info for return value of ctor #101212

Merged

matouskozak pushed a commit to matouskozak/runtime that referenced this pull request Apr 30, 2024

Update ReflectionMethodBodyScanner.cs (dotnet#101031)

e8d3893

github-actions bot locked and limited conversation to collaborators May 18, 2024
