Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DSO handling for like-named libraries is broken #9081

Closed
daltzctr opened this issue Jul 3, 2024 · 7 comments · Fixed by #9117
Closed

DSO handling for like-named libraries is broken #9081

daltzctr opened this issue Jul 3, 2024 · 7 comments · Fixed by #9117
Assignees
Labels
Area: App+Library Build Issues when building Library projects or Application projects. Area: App Runtime Issues in `libmonodroid.so`. bug Component does not function as intended.
Milestone

Comments

@daltzctr
Copy link

daltzctr commented Jul 3, 2024

Android framework version

net9.0-android

Affected platform version

.NET 9 Preview 5

Description

It crashes

Steps to Reproduce

Run the following project in release mode with .NET 9 Preview 5 workloads.

Repro: https://github.com/daltzctr/maui-dotnet9-crash

Did you find any workaround?

Disable AOT

Relevant log output

No response

@daltzctr daltzctr added Area: App Runtime Issues in `libmonodroid.so`. needs-triage Issues that need to be assigned. labels Jul 3, 2024
@grendello grendello added Area: App+Library Build Issues when building Library projects or Application projects. and removed needs-triage Issues that need to be assigned. labels Jul 3, 2024
@grendello grendello added this to the .NET 9 milestone Jul 3, 2024
@grendello
Copy link
Contributor

So, I've been pondering how to fix this issue on our end and I don't see a way to get rid of the ambiguity involved here. Here's what's going on:

We have two input items:

  • SkiaSharp.dll, the managed library
  • libSkiaSharp.so, native library used by the managed one

At build time, the AOT compiler adds a third item, the AOT image generated from SkiaSharp.dll, which is written to file named SkiaSharp.dll.so and packaged by us under the libaot-SkiaSharp.dll.so name.
As part of the build, we mutate the input shared library name and generate an xxHash for each variation. Then, at the runtime, when MonoVM asks us to load a native library, we calculate the xxHash and look it up in a table generated at build time. This works very well, except in this scenario we have an ambiguity:

        %struct.DSOCacheEntry {
                i64 u0x12e73d483788709d, ; from name: SkiaSharp.so
                i64 u0x3cb282562b838c95, ; uint64_t real_name_hash
                i1 false, ; bool ignore
                ptr @.DSOCacheEntry.23_name, ; name: libaot-SkiaSharp.dll.so
                ptr null; void* handle
        }, ; 71
        %struct.DSOCacheEntry {
                i64 u0x12e73d483788709d, ; from name: SkiaSharp.so
                i64 u0x43db119dcc3147fa, ; uint64_t real_name_hash
                i1 false, ; bool ignore
                ptr @.DSOCacheEntry.7_name, ; name: libSkiaSharp.so
                ptr null; void* handle
        }, ; 72

Input names of SkiaSharp.dll.so and libSkiaSharp.so were at some point mutated to the same form of SkiaSharp.so - a reasonable mutation, because managed code may refer to libSkiaSharp.so using the [DllImport ("SkiaSharp")] attribute and so MonoVM will asks us to load SkiaSharp.so. However, the Mono JIT will at some point load the SkiaSharp.dll library and try to find its AOT image, resulting in request to load SkiaSharp.so as well! This time, however, MonoVM wants to find what we have packaged as libaot-SkiaSharp.dll.so and we have a conflict.

Putting aside the fact that the lookup table should not have any entries with the same hashes, we have a decision to make when MonoVM asks us to dlopen a library named SkiaSharp.so. There are two contexts in this case:

  1. We're being asked to do so as part of the P/Invoke mechanism. This is, essentially, not ambiguous since we know Mono wants the "real" library, not the AOT image
  2. The dlopen hook in our runtime is invoked by MonoVM to load the shared library, but we have no idea what's the purpose - it can refer both to libSkiaSharp.so (for some reason) and to libaot-SkiaSharp.dll.so and we can't resolve that ambiguity without more context.

The immediate solution, and one which I'm going to implement now, is to completely ignore libaot-SkiaSharp.dll.so when building the app - we won't package it, we won't load it when Mono JIT asks for its AOT image. Instead, we will in this case load the libSkiaSharp.so library, and once this dotnet issue is fixed, MonoVM's JIT will simply ignore the library, finding that it's not an AOT image. This is reasonable, because the application can work just fine without the AOT image loaded, but it cannot work without libSkiaSharp.so loaded (all the p/invokes would fail).

The better solution is to avoid the ambiguity. In order to do that, I think the best way would be if Mono JIT asked to load aot-Assembly.so when it loads Assembly.dll and looks for its AOT image - this could be implemented only on Android or it could be used on all the platforms. Furthermore, AOT image loader should then refrain from asking for other permutations on the name, i.e. it should not follow with the attempt to load Assembly.so etc.

/cc @lambdageek

@lambdageek
Copy link
Member

lambdageek commented Jul 8, 2024

the Mono JIT will at some point load the SkiaSharp.dll library and try to find its AOT image, resulting in request to load SkiaSharp.so as well

I'm not sure this is true. Unless I'm completely misreading the code, the name we ask for is assembly->image->name + ".so":

https://github.com/dotnet/runtime/blob/e3363265b20905a55f4c6a7560f70150692548f7/src/mono/mono/mini/aot-runtime.c#L2049

That is the first thing we try to find is SkiaSharp.dll.so which does not conflict with the native libSkiaSharp.so

In order to do that, I think the best way would be if Mono JIT asked to load aot-Assembly.so when it loads Assembly.dll and looks for its AOT image - this could be implemented only on Android or it could be used on all the platforms.

I think if we were to implement this, I would suggest something like mono-aot://assembly.dll and then Android SDK's native loader hook could resolve that however it wants.

(This code path would be primarily android only - ios and wasm use static linking for AOT images so they never do the dlopen call at all)

Furthermore, AOT image loader should then refrain from asking for other permutations on the name, i.e. it should not follow with the attempt to load Assembly.so etc.

I think we can probably add a new flag to mono_dl_open - something like MONO_DL_FALLBACK_ONLY that would just pass the name to the native loader fallback function and not attempt a native dlopen at all

I'll add a comment to dotnet/runtime#104397

@grendello
Copy link
Contributor

@lambdageek I'll get to working on this this week, will log everything relevant and we'll see what's what. But I think the runtime requests SkiaSharp.so, which causes the whole confusion.

@grendello
Copy link
Contributor

With regards to mono-aot:// - can we make it as short as possible? If you want to follow the "protocol" syntax, can we make it aot://? The reason I ask is that we calculate the xxHash from the string, and I'd like it to take as little time as possible, even if it saves 1.23 cycles :)

@jpobst jpobst added the bug Component does not function as intended. label Jul 9, 2024
@daltzctr
Copy link
Author

Any status update on this?

@grendello
Copy link
Contributor

As dotnet/runtime#104397 is still open, I can't implement the original plan as described above but I'll see if I can come up with a temporary workaround until the runtime issues are fixed.

@grendello
Copy link
Contributor

PR #9117 implements a new approach to fixing/working around this issue (the runtime changes @lambdageek is working on are still needed and useful) - it splits the "regular" DSO cache and AOT DSO cache into separate sets. This way we avoid ambiguities like the one in this issue. The PR knows when it is being asked to find a shared library for a p/invoke (this is done via the p/invoke override MonoVM hook, so Mono itself doesn't load the DSO in question) and when Mono might be asking to load the AOT DSO. In the latter case, the PR first checks the AOT DSO cache for a match and then the "regular" cache, as we know p/invoke calls won't be going along this code path.

Attached are binaries from the PR build which contain the changes. @daltzctr to test them, you'll need to find location of the .NET for Android SDK on your disk and replace Xamarin.Android.Build.Tasks.* with the version from the zip below and then find instances of libmono-android.debug.so and libmono-android.release.so in the SDK tree and replace them with the corresponding ABI versions from the zip below.

The test app doesn't crash for me with those changes, it however crashes with something else:

07-18 18:58:33.318 11133 11133 F mono-rt : [ERROR] FATAL UNHANDLED EXCEPTION: System.MethodAccessException: Method `Android.Runtime.AndroidEnvironmentInternal.UnhandledException(System.Exception)' is inaccessible from method `AndroidX.Fragment.App.Fragment.n_OnCreateView_Landroid_view_LayoutInflater_Landroid_view_ViewGroup_Landroid_os_Bundle__mm_wrapper(intptr,intptr,intptr,intptr,intptr)'
07-18 18:58:33.318 11133 11133 F mono-rt :    at AndroidX.Fragment.App.Fragment.n_OnCreateView_Landroid_view_LayoutInflater_Landroid_view_ViewGroup_Landroid_os_Bundle__mm_wrapper(IntPtr jnienv, IntPtr native__this, IntPtr native_inflater, IntPtr native_container, IntPtr native_savedInstanceState)

This is a very weird exception, which might have to do with marshal methods support in our runtime. If you hit the same issue, disable marshal methods by setting the AndroidEnableMarshalMethods MSBuild property to false
binaries.zip

@github-actions github-actions bot locked and limited conversation to collaborators Sep 29, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Area: App+Library Build Issues when building Library projects or Application projects. Area: App Runtime Issues in `libmonodroid.so`. bug Component does not function as intended.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants