Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix thread state transition in get_function_pointer #9343

Merged
merged 1 commit into from
Sep 30, 2024

Conversation

filipnavara
Copy link
Member

get_function_pointer can be called from any thread. In particular, it can be called from Android thread pool thread when method marshaling is used. We need to ensure those threads stay in the GC safe mode from the runtime perspective once we stop executing user code on those threads. mono_thread_attach switches the thread to GC unsafe mode and it stays there after we stop executing user code. The next GC will try to suspend the thread which cannot succeed since there's no cooperative runtime code running on that thread anymore.

The fix is to use mono_jit_thread_attach which mimics what the iOS/macOS interop does. It creates the necessary MonoVM thread structures for JIT to work but doesn't transition the thread state.

Fixes #8253

`get_function_pointer` can be called from any thread. In particular, it can be called from Android thread pool thread when method marshaling is used. We need to ensure those threads stay in the GC safe mode from the runtime perspective once we stop executing user code on those threads. `mono_thread_attach` switches the thread to GC unsafe mode and it stays there after we stop executing user code. The next GC will try to suspend the thread which cannot succeed since there's no cooperative runtime code running on that thread anymore.

The fix is to use `mono_jit_thread_attach` which mimics what the iOS/macOS interop does. It creates the necessary MonoVM thread structures for JIT to work but doesn't transition the thread state.
@jonpryor
Copy link
Member

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

jonpryor pushed a commit that referenced this pull request Sep 30, 2024
Context: #8253
Context: #9343

We've had a number of reports about *hangs* when LLVM Marshal Methods
are enabled (see also #8253), but we had been willing
to accept having LLVM Marshal Methods enabled by default as it helps
improve app startup time.

However, [a customer reported a native crash][0] when LLVM Marshal
Methods are enabled:

> ![Thread Stack Trace][1]

	syscall at line 28 within libc
	__futex_wait_ex(void volatile*, bool, int, bool, timespec const*) at line 144 within libc
	sem_wait at line 108 within libc
	within libmonosgen-2.0.so (Build Id: …)
	within <anonymous:…>

This native crash suggests "something going wrong" within MonoVM.
We find this "scary" enough that we feel it is once again prudent
to disable LLVM Marshal Methods by default in .NET 9, by setting the
default value of `$(AndroidEnableMarshalMethods)`=False.

Even though #9343 has a potential fix for the hang, we feel that
we're too close to .NET 9 GA to validate in the wild.  This feature
will need to wait for .NET 10.  🙁

To explicitly enable LLVM Marshal Methods, set
`$(AndroidEnableMarshalMethods)`=True in your App `.csproj`.

[0]: https://discord.com/channels/732297728826277939/732297837953679412/1288758900627345463
[1]: https://cdn.discordapp.com/attachments/732297837953679412/1288758900522483712/05F6ED48-2BBA-4FC0-A54F-81BB57606500.png?ex=66f659c1&is=66f50841&hm=8f6f78f283f8c03e9fb37a452ad5d0babaaad3dd4bcc1b4887f4dcf60f651c12&
jonathanpeppers pushed a commit to jonathanpeppers/xamarin-android that referenced this pull request Sep 30, 2024
Context: dotnet#8253
Context: dotnet#9343

We've had a number of reports about *hangs* when LLVM Marshal Methods
are enabled (see also dotnet#8253), but we had been willing
to accept having LLVM Marshal Methods enabled by default as it helps
improve app startup time.

However, [a customer reported a native crash][0] when LLVM Marshal
Methods are enabled:

> ![Thread Stack Trace][1]

	syscall at line 28 within libc
	__futex_wait_ex(void volatile*, bool, int, bool, timespec const*) at line 144 within libc
	sem_wait at line 108 within libc
	within libmonosgen-2.0.so (Build Id: …)
	within <anonymous:…>

This native crash suggests "something going wrong" within MonoVM.
We find this "scary" enough that we feel it is once again prudent
to disable LLVM Marshal Methods by default in .NET 9, by setting the
default value of `$(AndroidEnableMarshalMethods)`=False.

Even though dotnet#9343 has a potential fix for the hang, we feel that
we're too close to .NET 9 GA to validate in the wild.  This feature
will need to wait for .NET 10.  🙁

To explicitly enable LLVM Marshal Methods, set
`$(AndroidEnableMarshalMethods)`=True in your App `.csproj`.

[0]: https://discord.com/channels/732297728826277939/732297837953679412/1288758900627345463
[1]: https://cdn.discordapp.com/attachments/732297837953679412/1288758900522483712/05F6ED48-2BBA-4FC0-A54F-81BB57606500.png?ex=66f659c1&is=66f50841&hm=8f6f78f283f8c03e9fb37a452ad5d0babaaad3dd4bcc1b4887f4dcf60f651c12&
jonathanpeppers added a commit that referenced this pull request Sep 30, 2024
…9346)

Context: #8253
Context: #9343

We've had a number of reports about *hangs* when LLVM Marshal Methods
are enabled (see also #8253), but we had been willing
to accept having LLVM Marshal Methods enabled by default as it helps
improve app startup time.

However, [a customer reported a native crash][0] when LLVM Marshal
Methods are enabled:

> ![Thread Stack Trace][1]

	syscall at line 28 within libc
	__futex_wait_ex(void volatile*, bool, int, bool, timespec const*) at line 144 within libc
	sem_wait at line 108 within libc
	within libmonosgen-2.0.so (Build Id: …)
	within <anonymous:…>

This native crash suggests "something going wrong" within MonoVM.
We find this "scary" enough that we feel it is once again prudent
to disable LLVM Marshal Methods by default in .NET 9, by setting the
default value of `$(AndroidEnableMarshalMethods)`=False.

Even though #9343 has a potential fix for the hang, we feel that
we're too close to .NET 9 GA to validate in the wild.  This feature
will need to wait for .NET 10.  🙁

To explicitly enable LLVM Marshal Methods, set
`$(AndroidEnableMarshalMethods)`=True in your App `.csproj`.

[0]: https://discord.com/channels/732297728826277939/732297837953679412/1288758900627345463
[1]: https://cdn.discordapp.com/attachments/732297837953679412/1288758900522483712/05F6ED48-2BBA-4FC0-A54F-81BB57606500.png?ex=66f659c1&is=66f50841&hm=8f6f78f283f8c03e9fb37a452ad5d0babaaad3dd4bcc1b4887f4dcf60f651c12&

Co-authored-by: Marek Habersack <grendel@twistedcode.net>
@jonathanpeppers jonathanpeppers merged commit 27b5d2e into dotnet:main Sep 30, 2024
55 of 57 checks passed
@filipnavara filipnavara deleted the patch-2 branch September 30, 2024 21:04
jonathanpeppers pushed a commit that referenced this pull request Sep 30, 2024
)

Fixes: #8253

`get_function_pointer` can be called from any thread. In particular,
it can be called from Android thread pool thread when method
marshaling is used. We need to ensure those threads stay in the GC
safe mode from the runtime perspective once we stop executing user
code on those threads. `mono_thread_attach` switches the thread to GC
unsafe mode and it stays there after we stop executing user code. The
next GC will try to suspend the thread which cannot succeed since
there's no cooperative runtime code running on that thread anymore.

The fix is to use `mono_jit_thread_attach` which mimics what the
iOS/macOS interop does. It creates the necessary MonoVM thread
structures for JIT to work but doesn't transition the thread state.
grendello added a commit that referenced this pull request Oct 1, 2024
dellis1972 pushed a commit that referenced this pull request Oct 2, 2024
@github-actions github-actions bot locked and limited conversation to collaborators Oct 31, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Known samples for AndroidEnableMarshalMethods=true
4 participants