Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[NativeAot] win-x86: Funclets getting double called #99687

Closed
filipnavara opened this issue Mar 13, 2024 · 9 comments · Fixed by #99718
Closed

[NativeAot] win-x86: Funclets getting double called #99687

filipnavara opened this issue Mar 13, 2024 · 9 comments · Fixed by #99718

Comments

@filipnavara
Copy link
Member

Several of the runtime tests fail with a similar symptom:

in try
  in try
    in try
    in finally
  in filter
in filter
    in finally
  in filter
in filter
caught an exception!

FAILED!

[EXPECTED OUTPUT]
in try
  in try
    in try
    in finally
  in filter
in filter
caught an exception!

[DIFF RESULT]
< caught an exception!
---
>     in finally

Logs of all failed tests that hit this symptom:
testlogs.zip

Copy link
Contributor

Tagging subscribers to this area: @agocke, @MichalStrehovsky, @jkotas
See info in area-owners.md if you want to be subscribed.

@dotnet-policy-service dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label Mar 13, 2024
@jkotas
Copy link
Member

jkotas commented Mar 13, 2024

FWIW, I expect that there is going to be a long tail of issues related to Win x86 exception handling because of #99191 (comment) . I think we may need to get both CoreCLR and NAOT onto same plan to make Win x86 support in NAOT productized (or rethink the test strategy for Win x86 NAOT).

@filipnavara
Copy link
Member Author

filipnavara commented Mar 13, 2024

FWIW, I expect that there is going to be a long tail of issues related to Win x86 exception handling because of #99191 (comment) .

Very likely. I was hoping that there was some exposure of the code during linux-x86 CoreCLR bring up but apparently there are still gaps (possibly specific to the NativeAOT integration of the code).

I think we may need to get both CoreCLR and NAOT onto same plan to make Win x86 support in NAOT productized (or rethink the test strategy for Win x86 NAOT).

I'm focusing on getting the runtime tests passing in this configuration which is attacking the "test strategy" angle. I am open to exploring the CoreCLR+funclets option too. I just expect to run into the same issues regardless of whether we hit them first in CoreCLR or NativeAOT, so it makes sense to investigate them.

@jkotas
Copy link
Member

jkotas commented Mar 13, 2024

exposure of the code during linux-x86 CoreCLR bring up

FWIW, linux-x86 on CoreCLR has not been brought up to our regular quality bar, and it is completely broken at the moment - #96150 (comment) .

I am open to exploring the CoreCLR+funclets option too. I just expect to run into the same issues regardless of whether we hit them first in CoreCLR or NativeAOT, so it makes sense to investigate them.

Once you deal with the easy to repro functional issues, there are going to be stress related issues after that (GC reporting, etc.). We do not have the infrastructure like GC stress to find them on native AOT proactively. We depend on regular CoreCLR for that.

@filipnavara
Copy link
Member Author

filipnavara commented Mar 13, 2024

I suspect the cause will be switched argument order on the stack:

We do not have the infrastructure like GC stress to find them on native AOT proactively.

(I'd definitely prefer not to revive the GC stress code in NativeAOT.)

@filipnavara
Copy link
Member Author

So far I tracked it down to the "tryEnd"/"tryLength" offset of EH_CLAUSE_FAULT being incorrect in the decoded EH info. This causes it to incorrectly match by code offset during the second pass processing.

@filipnavara
Copy link
Member Author

win-x86: jitdump.x86.txt
win-x64: jitdump.x64.txt

X86:

3 EH table entries, 0 duplicate clauses, 3 total EH entries reported to VM
setEHcount(cEH=3)
EH#0: try [0018..002A) handled by [002E..0053) (finally)
EH#1: try [000E..002A) handled by [0060..0071) filter at [0053..0060)
EH#2: try [0004..002B) handled by [007E..008F) filter at [0071..007E)

X64:

3 EH table entries, 0 duplicate clauses, 0 cloned finallys, 3 total EH entries reported to VM
setEHcount(cEH=3)
EH#0: try [0025..0032) handled by [0043..0075) (finally)
EH#1: try [0018..003B) handled by [0090..00B0) filter at [0075..0090)
EH#2: try [000B..003C) handled by [00CB..00EB) filter at [00B0..00CB)

The end offset of try in EH#0 on x86 is wrong.

@filipnavara
Copy link
Member Author

I suspect this may be some artefact of FEATURE_EH_CALLFINALLY_THUNKS = 0.

@filipnavara
Copy link
Member Author

Confirmed my suspicion. This fixes it:

--- a/src/coreclr/jit/targetx86.h
+++ b/src/coreclr/jit/targetx86.h
@@ -54,8 +54,13 @@
   #define FEATURE_EH               1       // To aid platform bring-up, eliminate exceptional EH clauses (catch, filter,
                                            // filter-handler, fault) and directly execute 'finally' clauses.

+#ifdef FEATURE_EH_FUNCLETS
+  #define FEATURE_EH_CALLFINALLY_THUNKS 1  // Generate call-to-finally code in "thunks" in the enclosing EH region,
+                                           // protected by "cloned finally" clauses.
+#else
   #define FEATURE_EH_CALLFINALLY_THUNKS 0  // Generate call-to-finally code in "thunks" in the enclosing EH region,
                                            // protected by "cloned finally" clauses.
+#endif
   #define ETW_EBP_FRAMED           1       // if 1 we cannot use EBP as a scratch register and must create EBP based
                                            // frames for most methods
   #define CSE_CONSTS               1       // Enable if we want to CSE constants

@dotnet-policy-service dotnet-policy-service bot removed the untriaged New issue has not been triaged by the area owner label Mar 28, 2024
@github-actions github-actions bot locked and limited conversation to collaborators Apr 28, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

2 participants