Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[wasm] Blazor template AOT failures on windows. #49588

Closed
lewing opened this issue Mar 14, 2021 · 6 comments
Closed

[wasm] Blazor template AOT failures on windows. #49588

lewing opened this issue Mar 14, 2021 · 6 comments
Assignees
Labels
arch-wasm WebAssembly architecture area-Codegen-AOT-mono
Milestone

Comments

@lewing
Copy link
Member

lewing commented Mar 14, 2021

We're getting a runtime exception after an Windows AOT build of the template app. It looks like we're generating incorrect bitcode when emitting direct icalls to ves_icall_System_Environment_get_TickCount and ves_icall_System_Environment_get_TickCount64 methods. It looks like a possible collision happening in the native symbol resolution.

It presents as
image

The relevant section of llvm-dis output: diff is from windows (-failing) -> osx (+working)

-  %4 = notail call i32 @ves_icall_System_Environment_get_TickCount64()
+  %4 = notail call i32 @ves_icall_System_Environment_get_TickCount()
   %5 = load i32*, i32** @aotconst_interruption_request_flag, align 4, !invariant.load !0
   %t6 = load i32, i32* %5, align 4
   %6 = icmp eq i32 %t6, 0
   %7 = call i1 @llvm.expect.i1(i1 %6, i1 true)
   br i1 %7, label %BB4, label %BB5
 
+BB4:                                              ; preds = %BB6, %BB2
+  br label %BB1
+
 BB5:                                              ; preds = %BB2
   %8 = load i32*, i32** getelementptr inbounds ([16 x i32*], [16 x i32*]* @dummy_got, i32 0, i32 0), align 4
   %9 = bitcast i32* %8 to i32* ()*
@@ -63229,22 +63238,19 @@ BB5:                                              ; preds = %BB2
   %13 = call i1 @llvm.expect.i1(i1 %12, i1 true)
   br i1 %13, label %BB6, label %BB7
 
-BB4:                                              ; preds = %BB6, %BB2
-  br label %BB1
-
 BB1:                                              ; preds = %BB4
   %14 = phi i32 [ %4, %BB4 ]
   ret i32 %14
 
+BB6:                                              ; preds = %BB5
+  br label %BB4
+
 BB7:                                              ; preds = %BB5
   %15 = load volatile i32*, i32** %entry, align 4
   %16 = ptrtoint i32* %15 to i32
   %t21 = add i32 %16, 68
   %17 = icmp eq i32 %t21, 0
-  br i1 %17, label %EX_BB3, label %NOEX_BB5, !make.implicit !2
-
-BB6:                                              ; preds = %BB5
-  br label %BB4
+  br i1 %17, label %EX_BB3, label %NOEX_BB5, !make.implicit !3
 
 EX_BB3:                                           ; preds = %BB7
   br label %EX2_BB4
@@ -63264,11 +63270,11 @@ NOEX_BB5:                                         ; preds = %BB7
 
 NOTINITED_BB7:                                    ; No predecessors!
   call coldcc void @mono_aot_corlib_init_method(i32* bitcast ([6 x i8]* @info_aot_wrapper_corlib_System_System_dot_Environment__get_TickCount_pinvoke_i4_i4_ to i32*))
-  store i8 1, i8* getelementptr inbounds ([13021 x i8], [13021 x i8]* @mono_inited, i32 0, i32 281), align 1
+  store i8 1, i8* getelementptr inbounds ([13027 x i8], [13027 x i8]* @mono_inited, i32 0, i32 281), align 1
   br label %INITED_BB2
 }
 
-declare hidden i32 @ves_icall_System_Environment_get_TickCount64()
+declare hidden i32 @ves_icall_System_Environment_get_TickCount()
 
 ; Function Attrs: noinline uwtable
 define dso_local i64 @aot_wrapper_corlib_System_System_dot_Environment__get_TickCount64_pinvoke_i8_i8_(i32* %dummy_arg) #8 {
@@ -63278,7 +63284,7 @@ BB0:
   br label %INIT_BB1
 
 INIT_BB1:                                         ; preds = %BB0
-  %is_inited = load i8, i8* getelementptr inbounds ([13021 x i8], [13021 x i8]* @mono_inited, i32 0, i32 282), align 1
+  %is_inited = load i8, i8* getelementptr inbounds ([13027 x i8], [13027 x i8]* @mono_inited, i32 0, i32 282), align 1
   %0 = call i8 @llvm.expect.i8(i8 %is_inited, i8 1)
   %1 = icmp eq i8 %0, 0
   br i1 %1, label %INITED_BB2, label %INITED_BB2
@@ -63293,40 +63299,40 @@ BB3:                                              ; preds = %INITED_BB2
   br label %BB2
 
 BB2:                                              ; preds = %BB3
-  %4 = notail call i64 bitcast (i32 ()* @ves_icall_System_Environment_get_TickCount64 to i64 ()*)()
+  %4 = notail call i64 @ves_icall_System_Environment_get_TickCount64()
   %5 = load i32*, i32** @aotconst_interruption_request_flag, align 4, !invariant.load !0
   %t10 = load i32, i32* %5, align 4
   %6 = icmp eq i32 %t10, 0
   %7 = call i1 @llvm.expect.i1(i1 %6, i1 true)
   br i1 %7, label %BB4, label %BB5
 
-BB5:                                              ; preds = %BB2
-  %8 = load i32*, i32** getelementptr inbounds ([16 x i32*], [16 x i32*]* @dummy_got, i32 0, i32 0), align 4
-  %9 = bitcast i32* %8 to i32* ()*
-  %10 = notail call i32* @aot_wrapper_icall_mono_thread_interruption_checkpoint()
-  store i32* %10, i32** %entry, align 4
-  %11 = ptrtoint i32* %10 to i32
-  %12 = icmp eq i32 %11, 0
-  %13 = call i1 @llvm.expect.i1(i1 %12, i1 true)
-  br i1 %13, label %BB6, label %BB7
-
 BB4:                                              ; preds = %BB6, %BB2
-  %14 = phi i64 [ %4, %BB2 ], [ %4, %BB6 ]
+  %8 = phi i64 [ %4, %BB2 ], [ %4, %BB6 ]
   br label %BB1
 
+BB5:                                              ; preds = %BB2
+  %9 = load i32*, i32** getelementptr inbounds ([16 x i32*], [16 x i32*]* @dummy_got, i32 0, i32 0), align 4
+  %10 = bitcast i32* %9 to i32* ()*
+  %11 = notail call i32* @aot_wrapper_icall_mono_thread_interruption_checkpoint()
+  store i32* %11, i32** %entry, align 4
+  %12 = ptrtoint i32* %11 to i32
+  %13 = icmp eq i32 %12, 0
+  %14 = call i1 @llvm.expect.i1(i1 %13, i1 true)
+  br i1 %14, label %BB6, label %BB7
+
 BB1:                                              ; preds = %BB4
-  %15 = phi i64 [ %14, %BB4 ]
+  %15 = phi i64 [ %8, %BB4 ]
   ret i64 %15
 
+BB6:                                              ; preds = %BB5
+  br label %BB4
+
 BB7:                                              ; preds = %BB5
   %16 = load volatile i32*, i32** %entry, align 4
   %17 = ptrtoint i32* %16 to i32
   %t31 = add i32 %17, 68
   %18 = icmp eq i32 %t31, 0
-  br i1 %18, label %EX_BB3, label %NOEX_BB5, !make.implicit !2
-
-BB6:                                              ; preds = %BB5
-  br label %BB4
+  br i1 %18, label %EX_BB3, label %NOEX_BB5, !make.implicit !3
 
 EX_BB3:                                           ; preds = %BB7
   br label %EX2_BB4
@@ -63346,10 +63352,12 @@ NOEX_BB5:                                         ; preds = %BB7
 
 NOTINITED_BB7:                                    ; No predecessors!
   call coldcc void @mono_aot_corlib_init_method(i32* bitcast ([6 x i8]* @info_aot_wrapper_corlib_System_System_dot_Environment__get_TickCount64_pinvoke_i8_i8_ to i32*))
-  store i8 1, i8* getelementptr inbounds ([13021 x i8], [13021 x i8]* @mono_inited, i32 0, i32 282), align 1
+  store i8 1, i8* getelementptr inbounds ([13027 x i8], [13027 x i8]* @mono_inited, i32 0, i32 282), align 1
   br label %INITED_BB2
 }
 
+declare hidden i64 @ves_icall_System_Environment_get_TickCount64()
+
 ; Function Attrs: noinline uwtable
 define internal void @corlib_System_Environment_FailFast_string(i32* %arg_message, i32* %dummy_arg) #8 {
@lewing lewing added arch-wasm WebAssembly architecture area-Codegen-AOT-mono labels Mar 14, 2021
@lewing lewing added this to the 6.0.0 milestone Mar 14, 2021
@dotnet-issue-labeler dotnet-issue-labeler bot added the untriaged New issue has not been triaged by the area owner label Mar 14, 2021
@ghost
Copy link

ghost commented Mar 14, 2021

Tagging subscribers to 'arch-wasm': @lewing
See info in area-owners.md if you want to be subscribed.

Issue Details

We're getting an exception when AOT'ing the template app on windows. It looks like we're generating incorrect bitcode when emitting direct icalls to ves_icall_System_Environment_get_TickCount and ves_icall_System_Environment_get_TickCount64 methods. It looks like a possible collision happening in the native symbol resolution.

It presents like
image

-  %4 = notail call i32 @ves_icall_System_Environment_get_TickCount64()
+  %4 = notail call i32 @ves_icall_System_Environment_get_TickCount()
   %5 = load i32*, i32** @aotconst_interruption_request_flag, align 4, !invariant.load !0
   %t6 = load i32, i32* %5, align 4
   %6 = icmp eq i32 %t6, 0
   %7 = call i1 @llvm.expect.i1(i1 %6, i1 true)
   br i1 %7, label %BB4, label %BB5
 
+BB4:                                              ; preds = %BB6, %BB2
+  br label %BB1
+
 BB5:                                              ; preds = %BB2
   %8 = load i32*, i32** getelementptr inbounds ([16 x i32*], [16 x i32*]* @dummy_got, i32 0, i32 0), align 4
   %9 = bitcast i32* %8 to i32* ()*
@@ -63229,22 +63238,19 @@ BB5:                                              ; preds = %BB2
   %13 = call i1 @llvm.expect.i1(i1 %12, i1 true)
   br i1 %13, label %BB6, label %BB7
 
-BB4:                                              ; preds = %BB6, %BB2
-  br label %BB1
-
 BB1:                                              ; preds = %BB4
   %14 = phi i32 [ %4, %BB4 ]
   ret i32 %14
 
+BB6:                                              ; preds = %BB5
+  br label %BB4
+
 BB7:                                              ; preds = %BB5
   %15 = load volatile i32*, i32** %entry, align 4
   %16 = ptrtoint i32* %15 to i32
   %t21 = add i32 %16, 68
   %17 = icmp eq i32 %t21, 0
-  br i1 %17, label %EX_BB3, label %NOEX_BB5, !make.implicit !2
-
-BB6:                                              ; preds = %BB5
-  br label %BB4
+  br i1 %17, label %EX_BB3, label %NOEX_BB5, !make.implicit !3
 
 EX_BB3:                                           ; preds = %BB7
   br label %EX2_BB4
@@ -63264,11 +63270,11 @@ NOEX_BB5:                                         ; preds = %BB7
 
 NOTINITED_BB7:                                    ; No predecessors!
   call coldcc void @mono_aot_corlib_init_method(i32* bitcast ([6 x i8]* @info_aot_wrapper_corlib_System_System_dot_Environment__get_TickCount_pinvoke_i4_i4_ to i32*))
-  store i8 1, i8* getelementptr inbounds ([13021 x i8], [13021 x i8]* @mono_inited, i32 0, i32 281), align 1
+  store i8 1, i8* getelementptr inbounds ([13027 x i8], [13027 x i8]* @mono_inited, i32 0, i32 281), align 1
   br label %INITED_BB2
 }
 
-declare hidden i32 @ves_icall_System_Environment_get_TickCount64()
+declare hidden i32 @ves_icall_System_Environment_get_TickCount()
 
 ; Function Attrs: noinline uwtable
 define dso_local i64 @aot_wrapper_corlib_System_System_dot_Environment__get_TickCount64_pinvoke_i8_i8_(i32* %dummy_arg) #8 {
@@ -63278,7 +63284,7 @@ BB0:
   br label %INIT_BB1
 
 INIT_BB1:                                         ; preds = %BB0
-  %is_inited = load i8, i8* getelementptr inbounds ([13021 x i8], [13021 x i8]* @mono_inited, i32 0, i32 282), align 1
+  %is_inited = load i8, i8* getelementptr inbounds ([13027 x i8], [13027 x i8]* @mono_inited, i32 0, i32 282), align 1
   %0 = call i8 @llvm.expect.i8(i8 %is_inited, i8 1)
   %1 = icmp eq i8 %0, 0
   br i1 %1, label %INITED_BB2, label %INITED_BB2
@@ -63293,40 +63299,40 @@ BB3:                                              ; preds = %INITED_BB2
   br label %BB2
 
 BB2:                                              ; preds = %BB3
-  %4 = notail call i64 bitcast (i32 ()* @ves_icall_System_Environment_get_TickCount64 to i64 ()*)()
+  %4 = notail call i64 @ves_icall_System_Environment_get_TickCount64()
   %5 = load i32*, i32** @aotconst_interruption_request_flag, align 4, !invariant.load !0
   %t10 = load i32, i32* %5, align 4
   %6 = icmp eq i32 %t10, 0
   %7 = call i1 @llvm.expect.i1(i1 %6, i1 true)
   br i1 %7, label %BB4, label %BB5
 
-BB5:                                              ; preds = %BB2
-  %8 = load i32*, i32** getelementptr inbounds ([16 x i32*], [16 x i32*]* @dummy_got, i32 0, i32 0), align 4
-  %9 = bitcast i32* %8 to i32* ()*
-  %10 = notail call i32* @aot_wrapper_icall_mono_thread_interruption_checkpoint()
-  store i32* %10, i32** %entry, align 4
-  %11 = ptrtoint i32* %10 to i32
-  %12 = icmp eq i32 %11, 0
-  %13 = call i1 @llvm.expect.i1(i1 %12, i1 true)
-  br i1 %13, label %BB6, label %BB7
-
 BB4:                                              ; preds = %BB6, %BB2
-  %14 = phi i64 [ %4, %BB2 ], [ %4, %BB6 ]
+  %8 = phi i64 [ %4, %BB2 ], [ %4, %BB6 ]
   br label %BB1
 
+BB5:                                              ; preds = %BB2
+  %9 = load i32*, i32** getelementptr inbounds ([16 x i32*], [16 x i32*]* @dummy_got, i32 0, i32 0), align 4
+  %10 = bitcast i32* %9 to i32* ()*
+  %11 = notail call i32* @aot_wrapper_icall_mono_thread_interruption_checkpoint()
+  store i32* %11, i32** %entry, align 4
+  %12 = ptrtoint i32* %11 to i32
+  %13 = icmp eq i32 %12, 0
+  %14 = call i1 @llvm.expect.i1(i1 %13, i1 true)
+  br i1 %14, label %BB6, label %BB7
+
 BB1:                                              ; preds = %BB4
-  %15 = phi i64 [ %14, %BB4 ]
+  %15 = phi i64 [ %8, %BB4 ]
   ret i64 %15
 
+BB6:                                              ; preds = %BB5
+  br label %BB4
+
 BB7:                                              ; preds = %BB5
   %16 = load volatile i32*, i32** %entry, align 4
   %17 = ptrtoint i32* %16 to i32
   %t31 = add i32 %17, 68
   %18 = icmp eq i32 %t31, 0
-  br i1 %18, label %EX_BB3, label %NOEX_BB5, !make.implicit !2
-
-BB6:                                              ; preds = %BB5
-  br label %BB4
+  br i1 %18, label %EX_BB3, label %NOEX_BB5, !make.implicit !3
 
 EX_BB3:                                           ; preds = %BB7
   br label %EX2_BB4
@@ -63346,10 +63352,12 @@ NOEX_BB5:                                         ; preds = %BB7
 
 NOTINITED_BB7:                                    ; No predecessors!
   call coldcc void @mono_aot_corlib_init_method(i32* bitcast ([6 x i8]* @info_aot_wrapper_corlib_System_System_dot_Environment__get_TickCount64_pinvoke_i8_i8_ to i32*))
-  store i8 1, i8* getelementptr inbounds ([13021 x i8], [13021 x i8]* @mono_inited, i32 0, i32 282), align 1
+  store i8 1, i8* getelementptr inbounds ([13027 x i8], [13027 x i8]* @mono_inited, i32 0, i32 282), align 1
   br label %INITED_BB2
 }
 
+declare hidden i64 @ves_icall_System_Environment_get_TickCount64()
+
 ; Function Attrs: noinline uwtable
 define internal void @corlib_System_Environment_FailFast_string(i32* %arg_message, i32* %dummy_arg) #8 {
Author: lewing
Assignees: vargaz
Labels:

arch-wasm, area-Codegen-AOT-mono

Milestone: 6.0.0

@lewing lewing removed the untriaged New issue has not been triaged by the area owner label Mar 14, 2021
@lewing
Copy link
Member Author

lewing commented Mar 14, 2021

cc @pranavkm

@lewing
Copy link
Member Author

lewing commented Mar 15, 2021

msvc was merging the functions with /OPT:ICF and breaking the direct icall function lookup.

@jaykrell
Copy link
Contributor

@hez2010
Copy link
Contributor

hez2010 commented Mar 15, 2021

https://developercommunity.visualstudio.com/t/link-opticf-is-too-aggressive/1369631

I think this is by design, see notes in https://docs.microsoft.com/en-us/cpp/build/reference/opt-optimizations?view=msvc-160#arguments.

Because /OPT:ICF can cause the same address to be assigned to different functions or read-only data members (that is, const variables when compiled by using /Gy), it can break a program that depends on unique addresses for functions or read-only data members. For more information, see /Gy (Enable Function-Level Linking).

To fix it you need pass /OPT:NOICF to disable this behavior.

@jaykrell
Copy link
Contributor

The design imho is not quite right. There should be an option that says, never fold address-taken functions.
I believe gcc and/or clang are like that. I don't expect it to change but couldn't let it go w/o protest to msvc.

@lewing lewing closed this as completed Mar 15, 2021
radical added a commit to radical/runtime that referenced this issue Mar 16, 2021
radical added a commit that referenced this issue Mar 29, 2021
* [wasm] Wasm.Build.Tests: add support for shared builds

- Essentially, we want to share builds wherever possible. Example cases:

    - Same build, but run with different hosts like v8/chrome/safari, as
      separate test runs
    - Same build but run with different command line arguments

- Sharing builds especially helps when we are AOT'ing, which is slow!

- This is done by caching the builds with the key:

    `public record BuildArgs(string ProjectName, string Config, bool AOT, string ProjectFileContents, string? ExtraBuildArgs);`

- Also, ` SharedBuildClassFixture` is added, so that the builds can be
  cleaned up after all the tests in a particular class have finished
  running.

- Each test run gets a randomly generated test id. This is used for
  creating:
  1. build paths, like `artifacts/bin/Wasm.Build.Tests/net6.0-Release/browser-wasm/xharness-output/logs/n1xwbqxi.ict`
  2. and the log for running with xharness, eg. for Chrome, are in
     `artifacts/bin/Wasm.Build.Tests/net6.0-Release/browser-wasm/xharness-output/logs/n1xwbqxi.ict/Chrome/`

- split `WasmBuildAppTest.cs` into : `BuildTestBase.cs`, and
  `MainWithArgsTests.cs`.

* [wasm] Wasm.Build.Tests: Add test for relinking with InvariantGlobalization

* [wasm] Wasm.Build.Tests - Check `CultureInfo` for invariant culture

.. tests. Code stolen from @maximlipin's #49204

* fix invariant+aot test

* [wasm] Fix the order of include paths

For AOT we generate `pinvoke-table.h` in the obj directory. But there is
one present in the runtime pack too.

In my earlier changes the order in which these were passed as include
search paths was changed from:

`"-I/runtime/pack/microsoft.netcore.app.runtime.browser-wasm/Release/runtimes/browser-wasm/native/include/wasm" "-Iartifacts/obj/mono/Wasm.Console.Sample/wasm/Release/browser-wasm/wasm/"`

.. which meant that the one from the runtime pack took precedence, and
got used. So, fix the order!

And change the property names to indicate where they are sourced from.

* [wasm] Only test with Release config on CI

* [wasm] Fallback to `dotnet xharness` if `XHARNESS_CLI_PATH` is not set.

The environment variable is set on helix. During local testing it can be
useful when using a locally built xharness.

* [wasm] fix invariant test - 'en-ES' -> 'es-ES'

* [wasm] RunWithEmSdkEnv: log the working directory also

* [wasm] Re-enable wasm build tests

* [wasm] Add regression test for issue #49588

* fix test

* [wasm] Cleanup, and add more tests

* Update tests to track wasm relinking being default in some cases

* Fix InvariantGlobalization to track change in wasm relinking defaults

* [wasm] Update emsdk check message to track changes

* [wasm] Update tests to track changes

* [wasm] Move Scenario=BuildWasmApps to be submitted first

TLDR;
- this might help with the job getting scheduled first, and thus having
  a chance at completing at the same time as others.

Reasoning:

The problem this is trying to fix is:

1. The helix step submits 3 jobs:
    a. library tests to be run with v8
    b. library tests to be run with browser (scenario=wasmtestonbrowser)
    c. Wasm.Build.Tests (scenario=buildwasmapps)

2. The 3 jobs, individually take roughly 30mins each
3. And they get submitted at roughly the same time
4. But .. the first two seem to complete earlier, and the 3rd one
   completes 25-30mins later.

The hypothesis is that all the machines might be busy processing the
200+ work items from each of the first two steps, and so
Wasm.Build.Tests get scheduled pretty late.

So, here we move that to be submitted first, in the hope that it would
be able to run in parallel with the other jobs.
@ghost ghost locked as resolved and limited conversation to collaborators Apr 14, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
arch-wasm WebAssembly architecture area-Codegen-AOT-mono
Projects
None yet
Development

No branches or pull requests

4 participants