Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't ignore the case when dataflow doesn't know what's going on #101031

Merged
merged 1 commit into from
Apr 16, 2024

Conversation

MichalStrehovsky
Copy link
Member

Fixes #101026.

TIL that when dataflow doesn't know something, it will model it as no value at all.

Copy link
Contributor

Tagging subscribers to this area: @agocke, @MichalStrehovsky, @jkotas
See info in area-owners.md if you want to be subscribed.

@MichalStrehovsky
Copy link
Member Author

I assume it also fixes #101010.

@agocke
Copy link
Member

agocke commented Apr 15, 2024

Can we unit test this?

@@ -656,6 +656,12 @@ public override bool HandleCall(MethodIL callingMethodBody, MethodDesc calledMet
if (Intrinsics.GetIntrinsicIdForMethod(callingMethodDefinition) == IntrinsicId.RuntimeReflectionExtensions_GetMethodInfo)
break;

if (param.IsEmpty())
{
// The static value is unknown and the below `foreach` won't execute
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the value unknown in my repro?

I would expect that the dataflow should be able to trace through new Action(Test<string>).Method without problems.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Known limitation of the dataflow analysis (#93720 is the tracking issue).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought #93720 was about not tracking variable types for GetType - how does that explain this issue? If we don't trace through new Action(Test<string>), I would have expected it to show up as an unknown value, not empty.

Copy link
Member Author

@MichalStrehovsky MichalStrehovsky Apr 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought #93720 was about not tracking variable types for GetType - how does that explain this issue? If we don't trace through new Action(Test<string>), I would have expected it to show up as an unknown value, not empty.

I thought unknown/empty are interchangeable but now I understand the difference. We eventually want to know the type allocated through the new Action so that GetType (or in this case Delegate.get_Method) know the type it's operating on.

In this case, we end up with empty instead of unknown because constructors take this path (they "return void"):

// For now, if the intrinsic doesn't set a return value, fall back on the annotations.
// Note that this will be DynamicallyAccessedMembers.None for the intrinsics which don't return types.
returnValue ??= calledMethod.ReturnsVoid () ? MultiValueLattice.Top : annotatedMethodReturnValue;

And we can no longer bash it to unknown here because the HandleCallAction returned true (i.e. handledFunction):

// Handle the return value or newobj result
if (!handledFunction)
{
if (isNewObj)
{
if (newObjValue == null)
methodReturnValue = UnknownValue.Instance;
else
methodReturnValue = newObjValue;
}
else
{
if (!calledMethod.Signature.ReturnType.IsVoid)
{
methodReturnValue = UnknownValue.Instance;
}
}
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, that looks like something we should fix (not that it'll help with the issue this PR is addressing). Thanks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We eventually want to know the type allocated through the new Action so that GetType (or in this case Delegate.get_Method) know the type it's operating on.

To fix tracking of new Action(Test<string>), I think we'd need to introduce a new dataflow value for Action that tracks the ldftn result, and handle that case in the get_Method intrinsic. It seems like just knowing that the input has type Action doesn't solve the problem, so I'm not sure #93720 is the right tracking issue. I think that one is more specific to GetType handling.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need to know what method the Action points to, just the concrete type the Delegate.get_Method method is called on (System.Action in this case).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm missing something - why don't we need to know the method it points to? Is it because we keep metadata for all delegate targets if there's any call to get_Method? I was assuming we'd ideally only keep metadata for those delegate targets that had a get_Method call.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The compiler keeps track of delegate type + method. If we know delegate type, we know all the methods it can be used with.

We don't track this at method granularity because doing new Action(Foo).Method is very rare. People only do this if they want to work around lack of methodof in C#. What happens 95% of time is that we construct a delegate somewhere and a different part of the program will call Delegate.get_Method on it. That's why we only care about delegate type.

Dataflow losing track of the type in the trivial new Action(Foo).Method pattern is throwing a wrench in this because if someone uses this to work around methodof, we only see Delegate.get_Method was called on some unknown thing and we need to disable the optimization globally because this could be any delegate.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, thank you!

@MichalStrehovsky
Copy link
Member Author

Can we unit test this?

The test would be testing a scenario that only exists due to a dataflow issue we keep running into. Sven just ran into it last week in #100786. Once we fix the underlying issue, we'd have to delete the test because this will no longer be a problem.

We also don't have facilities to unit test this. This would have to be an E2E test and because it's testing "specific pattern inhibits a global optimization in the whole program", it would have to be a brand new test project because this can't be folded into any of the existing ones. All in all, felt like too much hassle to write a test with no longer term benefit.

We do have tests for "global optimization is active" and "global optimization is disabled", just not for this specific pattern that disables it. I'm also a bit confused why this pattern exists in the first place and why dataflow has a distinction between "no known value (IsEmpty)" and "an unknown value (IsUnknown)".

@am11
Copy link
Member

am11 commented Apr 15, 2024

I assume it also fixes #101010.

Does it affect the 50 kB save from #100916?

@MichalStrehovsky
Copy link
Member Author

I assume it also fixes #101010.

Does it affect the 50 kB save from #100916?

No, this doesn't bring back the LINQ expressions that were deleted there.

@sbomer
Copy link
Member

sbomer commented Apr 15, 2024

why dataflow has a distinction between "no known value (IsEmpty)" and "an unknown value (IsUnknown)".

Empty is supposed to mean "we know that this abstract value cannot represent any real values", for example to model the return value of a method that is known to throw at runtime, whereas unknown means "it could be anything".

@eerhardt
Copy link
Member

Can we add a test for this? Looks like it broke in dotnet/aspnetcore. dotnet/aspnetcore#55010 broke because of this. It might be good to have a "Native AOT" test that uses DI like this test in dotnet/aspnetcore:

https://github.com/dotnet/aspnetcore/blob/d594fc22076db3616a250817c3ca550fcbe69562/src/DefaultBuilder/test/Microsoft.AspNetCore.NativeAotTests/UseStartupThrowsForStructContainersTest.cs#L20

@MichalStrehovsky
Copy link
Member Author

Can we add a test for this? Looks like it broke in dotnet/aspnetcore. dotnet/aspnetcore#55010 broke because of this. It might be good to have a "Native AOT" test that uses DI like this test in dotnet/aspnetcore:

https://github.com/dotnet/aspnetcore/blob/d594fc22076db3616a250817c3ca550fcbe69562/src/DefaultBuilder/test/Microsoft.AspNetCore.NativeAotTests/UseStartupThrowsForStructContainersTest.cs#L20

I already wrote my opinion on testing this in #101031 (comment).

We're running into two issues in dataflow analysis (modeling something that should be Unknown as Empty, and not being able to see through new Action().Method). Either of them not existing would be enough to have all codepaths covered by our existing testing. Creating a new test to cover a scenario that is going to be impossible to exercise doesn't seem like a great use of time (and CI resources in the future, since this must be a standalone test that cannot have anything else in it).

Given this is now blocking codeflow we either need to approve and merge this ASAP, or do a revert of the previous PR. @sbomer does this look good? I'm off to bed.

Copy link
Member

@sbomer sbomer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm good with this change for now to unblock dependency flow. I'd like to fix the issue where this is not tracked as "unknown", but I can look into that as a follow-up.

@eerhardt
Copy link
Member

My intention on testing was more of the end-to-end scenario that broke. So we wouldn't "have to delete the test because this will no longer be a problem.".

We also don't have facilities to unit test this.

We do have the facilities to test the scenario though. See

<TestConsoleAppSourceFiles Include="DiagnosticSourceEventSourceTests.cs"
EnabledProperties="EventSourceSupport" />

public static int Main()
{
DiagnosticSource diagnosticSource = new DiagnosticListener("TestDiagnosticListener");
using (var listener = new TestEventListener())
{

We can write a Native AOT app just like the above for the ActivatorUtilities class that was broken in #101010. We can basically just use the repro provided in the bug as the test app - all of the referenced code is built in dotnet/runtime. That way we can get a little more "end-to-end" tests here.

@jkotas
Copy link
Member

jkotas commented Apr 16, 2024

Merging to unblock codeflow

@jkotas
Copy link
Member

jkotas commented Apr 16, 2024

/ba-g ignoring unrelated timeout

@jkotas jkotas merged commit d41c3db into main Apr 16, 2024
88 of 90 checks passed
@jkotas jkotas deleted the MichalStrehovsky-patch-2 branch April 16, 2024 00:28
@agocke
Copy link
Member

agocke commented Apr 16, 2024

As general policy, I think every regression should have a regression test. Obviously if it's difficult or impossible to construct such a test, never mind. But if it's straightforward, there's by definition at least one incidence of the bug actually making into the product, which is enough in my opinion to merit the test. We run many thousands of tests on every PR and many of those have never, and will never, catch a bug in production. I'd rather invest in infrastructure improvements in test execution rather than avoid writing tests. We also accidentally revert PRs on occasion and even though we sometimes revert the tests as well, any extra coverage to avoid that is very useful at scale.

@MichalStrehovsky
Copy link
Member Author

TL;DR: Any test I can think of writing for this will stop exercising the codepath added here by the end of 9.0, but will forever cost us seconds in CI time because it will have to be the most expensive test kind we have.

You'll have a hard time finding a PR of mine that wouldn't have a test. I do write test for pretty much any code I write. Writing a good and long term useful test exercising the code path I'm adding here is hard, if not impossible. I actually don't know if this codepath is going to be hittable at all once Sven fixes the Empty/Unknown confusion in dataflow; we might want to just revert this PR then.

We can write a Native AOT app just like the above for the ActivatorUtilities class that was broken in #101010. We can basically just use the repro provided in the bug as the test app

This test would not be long term useful. The line that is throwing the exception initializes a field in the cctor that is not even used in native AOT - we set it but never read it. Ironically, the only observable effect of the line is that it needs to disable whole program analysis of delegates due to dataflow analysis deficiencies. It is not the only reason why ASP.NET gets delegate whole program analysis disabled (the other reason is #96528) but once/if it becomes the only reason to disable the optimization in ASP.NET, I'm going to submit another PR that takes it out of statically reachable code on native AOT (I considered doing that in #100916 already and wrote it as an option in PR description). One way or the other, the problematic line is not going to trigger the if check I'm adding in this PR by the end of 9.0 (either because the Delegate.get_Method call will be removed from closure, or dataflow will get the necessary improvement).

We run many thousands of tests on every PR and many of those have never, and will never, catch a bug in production

We spend massive amount of time as a team over many years limiting the number of tests we run. We audited and deleted redundant tests between src/tests and src/libraries. We spent (and are spending) tons of time limiting the number of standalone tests in src/tests.

This can only be tested by adding another fully standalone test (the most expensive test kind we have) that takes seconds to build and will become redundant the moment we fix any of these:

  1. Dataflow bug that confuses Unknown with Empty (Dataflow analysis models result of newobj as MultiValueLattice.Top #101102)
  2. Improve tracking of new in dataflow (GetType() behave strange #93720)

1 sounds like a 9.0 bug. 2 would be very nice to fix in 9.0 because we (and our customers - this is customer-reported) keep running into it.

@martincostello
Copy link
Member

End-to-end tests don't ever really become redundant unless you delete the public APIs they depend on, because they test a whole bunch of things for user scenarios without depending on implementation details. A highly-targeted "unit" test? Sure, that can become redundant as refactoring happens.

The repro is an example of user code (disregard that the delegate did nothing, that was to create a minimal repro to make the specific issue easy to diagnose and fix) that should work regardless of what implementation details it hits or what internals become redundant over time.

If it were made a bit less minimal (say, the delegate set the base address for the requests, as it did in the original app it's derived from) and checked a few extra cases then you'd get more bang-for-your-buck out of it for the time spent compiling it. In the extreme case you could even compile (most of) the whole app I found #101010 in originally and cover even more user scenarios with that one compilation.

@am11
Copy link
Member

am11 commented Apr 16, 2024

No, this doesn't bring back the LINQ expressions that were deleted there.

I meant overall stats before the previous PR and after this PR; now we are keeping more symbols Sounds like still a win, was just curious if we re-measured "This makes ASP.NET 50 kB smaller with native AOT".

another fully standalone test (the most expensive test kind we have)

There are end-to-end tests for AOT which run in PRs of SDK / installer repos. Should we lower the bar of adding tests in those repos which were deemed expensive-for-runtime? It slipped though the cracks and made it to a preview release due to a test gap. Murphy's law says it can happen again.

@MichalStrehovsky
Copy link
Member Author

If it were made a bit less minimal (say, the delegate set the base address for the requests, as it did in the original app it's derived from) and checked a few extra cases then you'd get more bang-for-your-buck out of it for the time spent compiling it.

The test needs to test exactly this code path and nothing else. To hit this bug, the program needs to have a call to Delegate.get_Method that hits the bug and no other call to Delegate.get_Method that could paper over the bug. Dragging in more code just increases the chances the bug will be papered over. If we try to make this test "more useful", it will just stop being a regression test for this bug: it will become yet another DI test in this repo that cannot catch this kind of bug (we already have a lot of DI tests in this repo that do not catch this bug). For example the repro in https://github.com/martincostello/dotnet-runtime-101010-repro can be fixed by adding /p:EventSourceSupport=true to the dotnet publish line (EventSourceSupport brings in code that papers over the bug).

On the other hand if we keep the test small and razor sharp testing this, it will stop testing the bug by the end of 9.0 (it's not a question of "if", only "when") and it's questionable what other residual value it will have due to it being focused on one thing only.

The other E2E tests we have in this repo are testing trimming warning suppressions and they will be useful as long as the suppressions exist.

I meant overall stats before the previous PR and after this PR; now we are keeping more symbols Sounds like still a win, was just curious if we re-measured "This makes ASP.NET 50 kB smaller with native AOT".

ASP.NET doesn't benefit from this optimization yet due to lack of #96528 so I expect the impact of this PR to be zero. (ASP.NET sets EventSourceSupport=true that will paper over this bug because it brings more reasons to disable the optimization.)

It slipped though the cracks and made it to a preview release due to a test gap.

Did it? It was hit as ASP.NET core repo was ingesting the runtime build with this.

@martincostello
Copy link
Member

It certainly made its way all the way to be merged into dotnet/sdk, otherwise I wouldn't have found it.

@eerhardt
Copy link
Member

TL;DR: Any test I can think of writing for this will stop exercising the codepath added here by the end of 9.0, but will forever cost us seconds in CI time because it will have to be the most expensive test kind we have.

It was hit as ASP.NET core repo was ingesting the runtime build with this.

Note that the test in the ASP.NET Core repo that caught this bug wasn't for this exact codepath, but still caught the bug anyway. The test that caught it is named UseStartupThrowsForStructContainersTest. It was testing a specific exceptional case in ASP.NET Core, but since it was an end-to-end functional test, it covers a lot of ground.

Adding a new end-to-end functional test for the scenario that broke here will be long term useful because it will ensure the scenario keeps working as we make changes in both in the nativeAOT runtime and in the ActivatorUtilities class.

Not all tests need to be unit tests. Not all tests need to be end-to-end functional tests. We can use both to get the best coverage. Yes, these end-to-end functional tests are expensive - so let's not write hundreds of them. But a handful of them, especially when the scenario they cover is critical to the .NET stack (and has been broken in the past), is important to ensure our changes don't introduce breaks.

Even if the codepath being fixed here will be deleted by the end of 9.0, the scenario that broke isn't going anywhere. Knowing that we will change the codepath makes it even more valuable to test the scenario. The test will ensure we don't break this scenario again when we do make changes.

@jkotas
Copy link
Member

jkotas commented Apr 16, 2024

We have tiered validation system. We do accept that there are going to be bugs that won't get caught by dotnet/runtime pre-checkin tests and that will be only caught by dependent repos or outer loop validation.

We do have https://github.com/dotnet/runtime/tree/main/src/libraries/Microsoft.Extensions.DependencyInjection/tests/TrimmingTests that is basic end-to-end test that exercises DI and ActivatorUtilities in a simple app. Does this test run in native AOT outer loop runs? If not, it is something to fix.

The question to ask before adding new end-to-end tests: How many breaks would have been caught by the new end-to-end tests in last year?

There is dotnet/runtime bug caught by dependent repos every other week. It is a different end-to-end scenario and different corner case each time. We cannot afford to add a new end-to-end test for each of these.

@eerhardt
Copy link
Member

We do have https://github.com/dotnet/runtime/tree/main/src/libraries/Microsoft.Extensions.DependencyInjection/tests/TrimmingTests that is basic end-to-end test that exercises DI and ActivatorUtilities in a simple app. Does this test run in native AOT outer loop runs? If not, it is something to fix.

No, those tests only run with PublishTrimmed=true. They don't run with PublishAot=true. You can tell becuase the project is named TrimmingTests.proj. See

<IsTrimmingTestProject Condition="$(MSBuildProjectName.EndsWith('.TrimmingTests'))">true</IsTrimmingTestProject>
<IsNativeAotTestProject Condition="$(MSBuildProjectName.EndsWith('.NativeAotTests'))">true</IsNativeAotTestProject>

@agocke
Copy link
Member

agocke commented Apr 16, 2024

We have tiered validation system. We do accept that there are going to be bugs that won't get caught by dotnet/runtime pre-checkin tests and that will be only caught by dependent repos or outer loop validation.

I do agree with this statement. I'm not super concerned with where the test that covers regressions lives, I'm more concerned with the fact that it exists. If it's too expensive to run in PR, I'm fine with it running with outerloop. I'm even fine with it living in another repo (as with the SDK tests) as long as we can be confident that test actually exercises the core scenario that regressed.

@am11
Copy link
Member

am11 commented Apr 17, 2024

Did it? It was hit as ASP.NET core repo was ingesting the runtime build with this.

#101026 (comment) repros in 9.0 p3 and does not repro with 8.0. So no test in any repo has caught this regression in a while.

@MichalStrehovsky
Copy link
Member Author

Did it? It was hit as ASP.NET core repo was ingesting the runtime build with this.

#101026 (comment) repros in 9.0 p3 and does not repro with 8.0. So no test in any repo has caught this regression in a while.

For the purposes of delegate optimization that I'm changing here, this:

class Program
{
    static void Main() => Console.WriteLine(new Action(Test<string>).Method);

    static void Test<T>() { }
}

Is no different from this:

class Program
{
    static Action GetTest() => new Action(Test<string>);

    static void Main() => Console.WriteLine(GetTest().Method);

    static void Test<T>() { }
}

Or this:

class Program
{
    static Action TheTest => new Action(Test<string>);

    static void Main() => Console.WriteLine(TheTest.Method);

    static void Test<T>() { }
}

The latter two do not run into this bug. I could come up with many more that do not run into the bug. The delegate optimization does not concern itself with these shapes. Dataflow analysis does. We have delegate optimization tests that tests one of these shapes.

We're not going to add E2E standalone tests for all of these variations that take seconds to build to test all possible variations. It is the job of the dataflow analysis to test these variations in it's unit tests that take milliseconds to run and are the place where dataflow analysis is tested.

I'm not fixing the dataflow analysis bug, merely working around it. The fix in the data flow analysis is tracked in #101102. Once that's fixed, my workaround in this PR won't even be needed, it will just cover some corner case I don't even know how to hit (I'll leave it to Sven whether we should just roll back this PR then).

We cannot unit test the delegate optimization. We can unit test dataflow. We do not test dataflow with delegate optimization tests. We do not test dataflow analysis with DI tests. We test it with dataflow tests.

matouskozak pushed a commit to matouskozak/runtime that referenced this pull request Apr 30, 2024
sbomer added a commit that referenced this pull request Apr 30, 2024
Instead of tracking the return value as "TopValue" or "unknown",
this models the constructor as returning a value with a static
type when called with newobj, letting us undo the workaround from
#101031.
michaelgsharp pushed a commit to michaelgsharp/runtime that referenced this pull request May 9, 2024
Instead of tracking the return value as "TopValue" or "unknown",
this models the constructor as returning a value with a static
type when called with newobj, letting us undo the workaround from
dotnet#101031.
@github-actions github-actions bot locked and limited conversation to collaborators May 18, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[NativeAOT] Trivial use of Delegate.Method is not enabling delegate targets for reflection
7 participants