Use canonical method in TransitionFrames whenever we parse signatures #33733
I couldn't add an area label to this PR. Check out this page to find out which area owner to ping, or please add exactly one area label to help train me in the future. |
I believe that this is expected to be handled by passing in |
@jkotas to make sure I understand the issue correctly, is this supposed to be handled on the TypeLoader side, to ensure we do not construct new types? |
@fadimounir I understand that approxparents is being passed correctly, but why isn't it preventing the exact type load? It seems that should be fixed instead of this. |
This is a type that has never been loaded before, and approxparents doesn't help in this case of server GC (and AFAIK, not even with regular GC). The issue with server GC is that it doesn't have a valid My understanding is that approxparents won't help if the type has not been previously loaded to at least approxparents, because we should never be allowed to create types (not even to the level of approxparents) during GC. Please correct me if my understanding is wrong. One possible easy way to fix this is to handle the null I currently see 2 patterns here where we could be calling into a MethodDesc's entry point without potentially loading the valuetypes in the signature:
In these two cases, when jitting the callers of such methods, the JIT is dealing with the canonical version of things, so all the places where the JIT/JIT interface calls into the The possible fix I see here is to call the If this doesn't feel like the right thing to do, this means that when we load these MethodDescs on the TypeLoader side, we should perform this step of walking the parameters and loading them, but this could be an overkill, and I think it might be better to perform that kind of loading only when we're about to execute the methods. @jkotas, @davidwrighton I'd like to get your thoughts on this proposed fix. If you'd like, I can post an updated PR with what I'm proposing for clarification. |
@fadimounir My understanding from our offline conversations is that there is some attempt to load a type like |
@davidwrighton thanks for bringing that to my attention. Let me examine this further and see why we're still attempting to load the fully instantiated type even though the |
Looks like the issue here is that First of all, is there a reason why Also, would it make sense to add something like this for each instantiation type argument we load, before attempting to load the instantiation itself (to be on the safe side)?
if (dropGenericArgumentLevel && level == CLASS_LOAD_APPROXPARENTS)
{
    typeHnd = ClassLoader::CanonicalizeGenericArg(typeHnd);
} |
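For readers less familiar with canonical forms, here is a minimal toy sketch of the idea behind canonicalizing a generic argument: reference-type arguments collapse to a shared placeholder, while value-type arguments stay exact. Everything in it (ToyType, Kind, CanonicalizeArg, CanonicalInstantiationName) is a hypothetical stand-in and not the CoreCLR TypeHandle or ClassLoader API. It also ignores nested generic value types, which the real runtime would need to canonicalize recursively.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy model only: hypothetical stand-ins, not CoreCLR types.
enum class Kind { Reference, Value };

struct ToyType {
    std::string name;
    Kind kind;
};

// Placeholder shared by all reference-type arguments in this toy model.
const ToyType kCanon{"__Canon", Kind::Reference};

// Reference types collapse to the placeholder; value types are kept as-is.
ToyType CanonicalizeArg(const ToyType& arg) {
    return arg.kind == Kind::Reference ? kCanon : arg;
}

// Builds the name of the canonical instantiation, e.g. "List<__Canon>".
std::string CanonicalInstantiationName(const std::string& genericName,
                                       const std::vector<ToyType>& args) {
    std::string result = genericName + "<";
    for (std::size_t i = 0; i < args.size(); ++i) {
        if (i != 0) result += ",";
        result += CanonicalizeArg(args[i]).name;
    }
    return result + ">";
}
```

In this model, asking for the canonical form never requires the exact instantiation to exist, which is the property the stackwalk/GC path seems to want.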
@fadimounir that sounds like the right fix. I'm wrestling with 2 questions as I struggle to give you advice.
I'd like to see the results of a full outerloop (including some GCSTRESS) with the change applied to VAR and MVAR, and not protected to only the stackwalking/gc scenarios. |
I do not think you can ever get substitutions with MVARs today. Substitutions are only used for method overriding, which makes them applicable to type VARs only. |
We don't have the same problem with VARs. I couldn't repro any failures there, not even with static methods on generic types. If there's a risk of a breaking change here, we can limit it to MVAR and only to the GC/stackwalk scenario:
thRet = (psig.GetTypeVariableThrowing(pModule, typ, fLoadTypes, pTypeContext));
if (fLoadTypes == ClassLoader::LoadTypes)
    ClassLoader::EnsureLoaded(thRet, level);
if ((IsGCThread() || IsStackWalkerThread()) && dropGenericArgumentLevel && level == CLASS_LOAD_APPROXPARENTS)
    thRet = ClassLoader::CanonicalizeGenericArg(thRet);
However, I'll submit a new commit to this PR where the changes are not just scoped to GC/stackwalking, and run an outerloop GCSTRESS job to see if that causes any failures. |
The reason I wanted the run on the VAR cases is that we have much better coverage for odd things happening to generic types than to methods. I'm not entirely confident it won't break anything, and I'm not even sure that if it doesn't we shouldn't only fix MVAR, but having some confidence that this change doesn't negatively impact even when applied to both VAR and MVAR would be good. |
Sounds good. I pushed a commit that applies canonicalization after the call to |
The gcstress results are not as clean as baseline. I'll investigate further |
@davidwrighton I reran the gcstress tests after submitting a fix around pinvokes, and the results are now comparable to the baseline (modulo the failures related to #33366). I believe the changes should be ready to merge, although I have seen the regression test that I added fail non-deterministically on Windows arm configurations. It's not an AV or anything, but for some reason the test script returns 3 as an exit code, and the test does not emit any stdout. No idea what's going on there. I've been running the test constantly in a loop on a separate Windows arm device, using the same payload from the CI, and haven't seen any crashes. Not sure how to investigate this issue further. There are no dumps from the CI, and it doesn't seem like the test really crashed. Here's the test output:
GC\Regressions\Github\runtime_32848\runtime_32848\runtime_32848.cmd [FAIL]
Return code: 1
Raw output file: D:\h\w\B2610973\w\AF970944\e\GC\Regressions\Reports\GC.Regressions\Github\runtime_32848\runtime_32848\runtime_32848.output.txt
Raw output:
BEGIN EXECUTION
"D:\h\w\B2610973\p\corerun.exe" runtime_32848.dll
Expected: 100
Actual: 3
END EXECUTION - FAILED
FAILED
Test Harness Exitcode is : 1
To run the test:
> set CORE_ROOT=D:\h\w\B2610973\p
> D:\h\w\B2610973\w\AF970944\e\GC\Regressions\Github\runtime_32848\runtime_32848\runtime_32848.cmd
Expected: True
Actual: False
Stack Trace:
F:\workspace\_work\1\s\artifacts\tests\coreclr\Windows_NT.arm.Checked\TestWrappers\GC.Regressions\GC.Regressions.XUnitWrapper.cs(140,0): at GC_Regressions._Github_runtime_32848_runtime_32848_runtime_32848_._Github_runtime_32848_runtime_32848_runtime_32848_cmd()
Output:
Return code: 1
Raw output file: D:\h\w\B2610973\w\AF970944\e\GC\Regressions\Reports\GC.Regressions\Github\runtime_32848\runtime_32848\runtime_32848.output.txt
Raw output:
BEGIN EXECUTION
"D:\h\w\B2610973\p\corerun.exe" runtime_32848.dll
Expected: 100
Actual: 3
END EXECUTION - FAILED
FAILED
Test Harness Exitcode is : 1
To run the test:
> set CORE_ROOT=D:\h\w\B2610973\p
> D:\h\w\B2610973\w\AF970944\e\GC\Regressions\Github\runtime_32848\runtime_32848\runtime_32848.cmd |
I suspect something very fishy is happening with test startup somewhere. Could you modify the test to always print out a message at the start of Main? That might help show the problem. |
src/coreclr/tests/src/GC/Regressions/Github/runtime_32848/runtime_32848.csproj
@davidwrighton Same issue with the test failure here. I added a |
Taking a look |
I had a quick phone chat with Fadi. I'm going to take one of our machines that ran this earlier offline and try out these bits on it, and synch up with Fadi later on. My naive guess here is that the precommands being included might be messing up the current path. |
The issue is a CHK assert coming out of the runtime, which when dismissed by the dialog handler on the machine manifests as exit code 3. I'm working with Fadi on ideas but it's clear the machines we have have a newer OS and different hardware, so that is likely the source of the difference between local and remote repro of the bug. |
I changed all calls to |
No crash dumps with _ASSERTE. I'll add some dummy code for now that will cause an AV, so I can get a dump for the failure |
…rgumentLevel is TRUE
With Matt's help, I was able to get my hands on a crash dump and look at the callstack:
0:000> !clrstack -f
OS Thread Id: 0x2b3c (0)
Child SP IP Call Site
04E0C998 68446D20 coreclr!common_assert_to_message_box + 120 at minkernel\crts\ucrt\src\appcrt\startup\assert.cpp:389
04E0CE50 68446C9F coreclr!common_assert + 79 at minkernel\crts\ucrt\src\appcrt\startup\assert.cpp:424
04E0CE70 68447BEB coreclr!_wassert + 27 at minkernel\crts\ucrt\src\appcrt\startup\assert.cpp:444
04E0CE90 68359225 coreclr!GCToOSInterface::GetCurrentProcessorNumber + 57 at F:\workspace\_work\1\s\src\coreclr\src\vm\gcenv.os.cpp:274
04E0CEA8 6839D0C3 coreclr!SVR::heap_select::select_heap + 19 at F:\workspace\_work\1\s\src\coreclr\src\gc\gc.cpp:5267
04E0CED0 683866CF coreclr!SVR::gc_heap::balance_heaps + 31 at F:\workspace\_work\1\s\src\coreclr\src\gc\gc.cpp:14174
04E0CF28 6838281B coreclr!SVR::gc_heap::allocate_more_space + 31 at F:\workspace\_work\1\s\src\coreclr\src\gc\gc.cpp:14533
04E0CF50 6837D731 coreclr!SVR::GCHeap::Alloc + 273 at F:\workspace\_work\1\s\src\coreclr\src\gc\gc.cpp:37197
04E0CF90 68236FC5 coreclr!Alloc + 205 at F:\workspace\_work\1\s\src\coreclr\src\vm\gchelpers.cpp:242
04E0CFB8 682377B1 coreclr!AllocateObject + 157 at F:\workspace\_work\1\s\src\coreclr\src\vm\gchelpers.cpp:1024
04E0CFE8 681CF3F1 coreclr!JIT_New + 209 at F:\workspace\_work\1\s\src\coreclr\src\vm\jithelpers.cpp:2313
04E0D024 [HelperMethodFrame: 04e0d024]
04E0D0C0 299ad68a system.console.dll!System.Console..cctor() + 30
04E0DBFC [HelperMethodFrame: 04e0dbfc]
04E0DC98 299ad590 system.console.dll!System.Console.get_Out() + 40
04E0DCC0 299ad140 system.console.dll!System.Console.WriteLine(System.String) + 24
04E0DCD8 299acdbe runtime_32848.dll!Test.Main() + 34
There's nothing special here: it's a simple Console.WriteLine that is triggering the assert, and the only special thing is that server GC is enabled. I also couldn't find any evidence of heap corruption. There is a pretty large number of threads though (27 threads). It could be the case that this is just hitting a corner case bug in the OS. The processor number returned by the So here's what I'm going to do:
|
Also changing a bunch of assert() calls to _ASSERTE. Usually when _ASSERTE fails in CI lab runs, we tend to get crash dumps associated with test results, unlike assert() which shows a GUI dialog that DHandler dismisses by clicking on the Abort button.
The test will run with gcstress as part of the gcstress lab runs
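To illustrate the difference described above between an assert that raises a dialog and one that fails fast, here is a minimal sketch of a fail-fast assert macro with a replaceable handler. All names (TOY_ASSERTE, ToyAssertHandler, ToyFailFast, g_toyAssertHandler) are hypothetical, and this is not how CoreCLR's _ASSERTE is implemented. std::abort is only a stand-in for a fail-fast primitive; on some platforms and CRT configurations it can itself raise a dialog, so treat that choice as an assumption.

```cpp
#include <cstdio>
#include <cstdlib>

// Hypothetical names throughout; this is not CoreCLR's _ASSERTE.
using ToyAssertHandler = void (*)(const char* expr, const char* file, int line);

// Default handler: log the failed expression and terminate abnormally.
// std::abort is an assumed stand-in for the platform's fail-fast mechanism.
inline void ToyFailFast(const char* expr, const char* file, int line) {
    std::fprintf(stderr, "Assert failed: %s (%s:%d)\n", expr, file, line);
    std::abort();
}

// Replaceable so harnesses can observe failures (C++17 inline variable).
inline ToyAssertHandler g_toyAssertHandler = ToyFailFast;

// Evaluates expr once; calls the current handler only when it is false.
#define TOY_ASSERTE(expr)                                       \
    do {                                                        \
        if (!(expr)) {                                          \
            g_toyAssertHandler(#expr, __FILE__, __LINE__);      \
        }                                                       \
    } while (0)
```

The point of the design is only that the failure path never waits on user interaction, so an unattended lab machine ends with a crashed process rather than a dismissed dialog.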
I've been investigating the crash dump from #32848, and the main issue is that the GC is calling into the TypeLoader to construct new types. This should never be allowed, and what made it worse in that crash dump is the fact that server GC was being used, so the call to GetThread() was returning NULL, causing an AV during type construction.
The AV needs the following conditions to repro:
cc @dotnet/crossgen-contrib
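As a rough illustration of the failure condition in the description above, the toy sketch below models a type table that refuses to construct new entries when no thread object is attached to the calling thread. It is not the fix made in this PR, and every name in it (ToyThread, t_currentThread, ToyTypeTable, Lookup) is hypothetical rather than a CoreCLR API. It only shows the invariant being discussed: lookups of already-loaded types are fine anywhere, but construction must not happen on a thread with a null thread object.

```cpp
#include <optional>
#include <string>
#include <unordered_set>

// Hypothetical stand-ins; none of these names are CoreCLR APIs.
struct ToyThread { int id; };

// Stays null on threads that never attached a ToyThread, loosely mirroring
// the report that GetThread() returned NULL on server GC threads.
thread_local ToyThread* t_currentThread = nullptr;

class ToyTypeTable {
public:
    // Returns the name if the type is already loaded. A new entry is created
    // only when the caller allows loading AND a thread object is attached.
    std::optional<std::string> Lookup(const std::string& name, bool allowLoad) {
        if (loaded_.count(name) != 0) return name;
        if (!allowLoad || t_currentThread == nullptr) return std::nullopt;
        loaded_.insert(name);
        return name;
    }

private:
    std::unordered_set<std::string> loaded_;
};
```

Returning an empty result instead of dereferencing a null thread turns the AV into a recoverable "not loaded" answer in this model; whether that is acceptable in the real runtime is a separate question from the one this PR addresses.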