Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement signature-based thunking of function pointers #1868

Merged

Conversation

neobrain
Copy link
Member

@neobrain neobrain commented Jul 29, 2022

Overview

This PR adds support for manual marshaling of function pointers across guest<->host boundaries and applies it for extending X11 thunks. Previously, marshaling was restricted to function pointers detected by the generator in specific scenarios (e.g. functions queried through glGetProcAddr or callback parameters used in e.g. XSetIOError). We were unable to thunk many things not falling into those categories, such as function pointers embedded in structures/vtables or global function pointers.

The underlying features guest-callable host functions and host-trampolines for guest functions are now based solely on the signature of the target function. For example, instead of generating separate host-trampolines for glEnd and glFlush, only a single host-trampoline is generated for the function signature void(). Even without struct repacking (#1611), this enables manual thunking of vtables and other advanced forms of function pointers.

Effect on compatibility

X11 thunks are much more functional now, making many more applications work with OpenGL thunking, such as:

  • FTL
  • Steel Rats (Vulkan game)
  • Little Inferno
  • 7 Billion Humans
  • Neverwinter Nights: Enhanced edition
  • Magicka 2
  • Super Hexagon
  • They Bleed Pixels
  • World of Goo
  • VVVVVV
  • Unbound: Worlds Apart (Vulkan game)
  • SuperHOT
  • Super Meat Boy
  • SteamWorld Quest: Hand of Gilgamech
  • SteamWorld Dig 2
  • Starbound
  • SOMA
  • Snowyland
  • Iconoclasts
  • Shovel Knight
  • Forsaken Remastered
  • Xonotic

(Thanks to @Sonicadvance1 for testing most of these!)

Implementation details

A tricky challenge for guest-function pointers called from the host is that they combine information from the guest-side (target address, packing function) and from the host-side (unpacking function) to generate host-trampolines. With automatic marshaling, this was resolved by referring to host-side information by its SHA256 identifier, but that is not practical for handwritten code. Thunking XInitThreads is a good stress-test here, since in that function we need to wrap a host function pointer in a custom function that sets up a vtable for use by the guest.

I tried out different approaches to avoid this, nicknamed ALTERNATIVE 1, ALTERNATIVE 2 and MAIN VARIANT:

  • ALTERNATIVE 1: Sets up host-trampolines purely on the guest-side. For XInitThreads, this approach requires a lot of "bouncing" between guest and host: The guest first calls the native library function, and then it needs to query a function pointer (ACTUAL_XInitDisplayLock_fn) from the host (and make it guest-callable), replace ACTUAL_XInitDisplayLock_fn with a host-trampoline wrapping a custom guest function, which then takes care of making some vtable entries guest-callable.
  • ALTERNATIVE 2: Allocates host-trampolines on the guest and finalizes them on the guest-side. For XInitThreads, this slightly cleans up the wrapping of XInitDisplayLock to be wrapped on the host directly, resulting in less bouncing between guest and host.
  • MAIN VARIANT: Allows allocating host-trampolines from either guest or host, and always finalizes them on the host. For XInitThreads, this moves most nontrivial logic to the host-side, making for minimal bouncing and more localized code logic.

MAIN VARIANT is the version proposed for merging, but for illustration the other ones are included here as well. They'll be dropped from this PR once the code is fully ready.

TODO

  • Deprecate old callback generation
  • Pick more fitting names for new interfaces
  • Add abstraction over the AllocateHostTrampolineForGuestFunction pattern used in MAIN VARIANT
  • Enable -rdynamic so host thunk libraries can use the new FEX APIs directly
  • Fix thunkgentest
  • Install libThunksAPI.so to the proper location (not applicable anymore since that library was dropped)

(void*&)InitFN = dlsym(Handle, InitSym.c_str());
if (!InitFN) {
ERROR_AND_DIE_FMT("LoadLib: Failed to find export {}", InitSym);
}

auto Exports = InitFN();
auto Exports = InitFN((uintptr_t)MakeHostTrampolineForGuestFunction, (uintptr_t)FinalizeHostTrampolineForGuestFunction);
Copy link
Contributor

@skmp skmp Jul 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe make it a struct here with the FEX api we want to be available?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New version uses FEX symbols directly, which allows for keeping the shared set of interfaces more easily in sync.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. How would you feel about FEX exposing a data symbol with function pointers inside? Makes it easier for versioning.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where do we need things to be versioned?

constexpr auto CBIndex = sizeof...(Args);
uintptr_t cb;
static_assert(CBIndex <= 17 || CBIndex == 23);
if constexpr(CBIndex == 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this be done with a0 being a undefined sized array + alloca?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Replying here since GitHub doesn't have threads in the "main" comment page)

I like the generic PackedArguments implementation, though I'd like to see its impact on compile time before adopting something like it. The current approach is very predictable in this regard due to being simple (if not particularly elegant), whereas the metaprogramming involved here could be more involved in terms of computational expense and symbol pollution (to store the offset for each set of template arguments).

Copy link
Contributor

@skmp skmp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did a quick read of iths.

Overall, this looks like a step in the right direction with nice cleanups.

A couple of corner cases that we should support:

  • Callback marshaling might need to be overriden using a <entrypoing, argnum> identifier. Not all callbacks with the same signature are guaranteed to need the same marshaling - however most will not need custom logic.
  • I don't think we should use PackedArguments as a template, as it complicates things. If we have to, I think a generic implementation is best.
  • From previous work, it could be wise to do the marshaling after the pack/unpack step, for efficiency. I think this is done partially in this PR, but I need to read more carefully.

WRT packing structs, I think we need to define them like

// Definitions
struct struct-name;

union Host<struct-name> { 
HostField<type, HostOffset> FieldName;
....
}

struct Guest<struct-name> {
HostField<type, GuestOffset> FieldName;
,,,,
}

// Copy operators
Host<struct-name>::operator=(const Guest<struct-name> &other) {
this->FieldName = other->FieldName;
....
}

Guest<struct-name>::operator=(const Host<struct-name> &other) {
this->FieldName = other->FieldName;
....
}

Then we can overload the copy operators per type, at any level we want, for custom marshaling.

We can provide some defaults for pointers, same sized ints, strings, etc

@skmp
Copy link
Contributor

skmp commented Jul 30, 2022

For custom callback mashaling we can do

template<auto Entrypoint, size_t ArgPos, auto CallbackSignature>
using MarshalCallback = SignatureBasedMarshaller<CallbackSignature>;

Then use MarshalCallback<__func__, __arg_pos__, type> which will by default resolve by callback signature, but can be overridden with a partial template specialization.

@skmp
Copy link
Contributor

skmp commented Jul 30, 2022

Got generic PackedArguments I think we can do

template<typename... Args>
struct PackedArguments {
uint8_t storage[Size()];

constexpr auto& operator[](constexpr ArgNum) {
if (ArgNum > sizeof(Args)) { // die }
size_t offs = 0;
if constexpr (ArgNum = 0) { return reinterpret_cast<Args[ArgNum]&>storage[offs] }
offs += sizeof(Args[0]);
if constexpr (ArgNum = 1) { return reinterpret_cast<Args[ArgNum]&>storage[offs] }
offs += sizeof(Args[1]);
...
// Probably there's some better way here, maybe with a recursive template, a recursion or a constexpr loop
}

static constexpr Size() { 
// Add up sizes of Args...
return ...;
}
}

Then for use

auto Args = (PackedArgs<...>*)alloca(sizeof(PackedArgs<...>));

@skmp
Copy link
Contributor

skmp commented Jul 30, 2022

This would also solve edge cases with different alignment requirements too (x86 has a bit looser alignment, so offset can get out of sync unless packed)

@Sonicadvance1
Copy link
Member

Sonicadvance1 commented Jul 30, 2022

Some testing results on Snapdragon 888.

NecroDancer:

  • Jumps from x86 code to AArch64 code in libX11
  • _XInitImageFuncPtrs passes internal functions to the argument
    • XCreateImage and Whatever this crashFunction is

Neverwinter Nights: Enhanced edition:

  • Infinite loop?

Magicka 2: Vulkan game but GL/X11 thunk changes behaviour

[SDL] SDL Error: Couldn't create window
[Crash] Error in foundation/os/linux/window.cpp:288, build (no build identifier)

Assertion failed: handle != nullptr
Then SIGSEGV

Papers Please:
Could not create SDL window: Couldn't create window.

They Bleed Pixels:
A fatal error has occurred.
I guess failed to create window?

World of Goo:
terminate called without an active exception

VVVVVV:
Error: could not set the game to windowed mode: Invalid window

Surviving Mars:
symbol lookup error: /mnt/AutoNFS/LinuxSteamGames/steamapps/common/Surviving Mars/MarsSteam: undefined symbol: XkbGetKeyboard

Super Meat Boy:
SDL_CreateWindow(640, 480, OPENGL) failed: Couldn't create window

SteamWorld Quest: Hand of Gilgamech:
SteamWorld Dig 2:
ErrorReporting.cpp:17 ERROR: ShowErrorMessage: Could not create the game window, error: Couldn't create window

Starbound:
[Error] Fatal Exception caught: (ApplicationException) Application: Could not create SDL window: Couldn't create window

Dicey Dungeons:

Could not create SDL window: Couldn't create window.
Could not create SDL renderer: Invalid window.

Little Inferno:
7 Billion Humans:
Super Hexagon:
Vagante:
Urban Trial Playground:
Super Inefficient Golf:
Stellaris:
SOMA:
Snowyland:

  • Crash without message

SuperHot:

  • Crash without message after loading xcb things

Unbound Worlds Apart: Vulkan Game

  • Not affected by these changes
  • Works fine with or without vulkan thunks enabled at the same time

Works
FTL

Doesn't get captured
Tomb Raider 2013

  • Spooky, no idea how this manages to escape us.

@neobrain
Copy link
Member Author

neobrain commented Aug 2, 2022

Thanks for the testing and reviews! (I'll get to the reviews later when the remaining kinks are worked out)

When running games in Steam, something is causing libX11 to be dropped from the symbol lookup set compared to running outside of Steam. That's what's causing most of the issues @Sonicadvance1 mentioned. As a workaround, set export LD_PRELOAD=/usr/share/fex-emu/GuestThunks/libX11-guest.so before starting Steam.

I also added another commit to add more X11 symbols, which should get most of the games mentioned above working. Personally, I successfully tested 7 Billion Humans, Magicka 2, SteamWorld Dig 2, Super Meat Boy, and World of Goo. The only thing I couldn't get working is the (Windows-only) VVVVVV demo, which doesn't even launch for me even without thunks. notepad.exe started working with this new commit though, so at least the winex11 driver itself is functional now.

@neobrain neobrain force-pushed the feature_thunk_funcptrs_by_signature branch from bd63b02 to 9e3fc95 Compare August 4, 2022 16:44
@neobrain
Copy link
Member Author

neobrain commented Aug 4, 2022

Rebased on latest main including the fix for Steam's gameoverlayrenderer.so from #1880, so this version should have compatibility with many (most?) of the titles tested above without any ugly LD_PRELOAD hacks 🥳

I also added a mechanism for host thunk libraries to more directly call FEX functions. Used interfaces must be wrapped in ThunksAPI (due to our use of --no-undefined when compiling host thunk libs), which isn't super convenient but on the other hand does prevent excessive mixing across module boundaries :)

@neobrain
Copy link
Member Author

neobrain commented Aug 4, 2022

Callback marshaling might need to be overriden using a <entrypoing, argnum> identifier. Not all callbacks with the same signature are guaranteed to need the same marshaling - however most will not need custom logic.

Worth keeping in mind. That would be material for a future PR adding thunk APIs that require this support. Without a concrete use case it's unclear to me what the best way to define such overrides is, as not all function pointers have a particular API endpoint associated to them.

We also still support manual function pointer adaption, so the custom_host_impl/custom_guest_entrypoint annotations may already serve some of those future use cases well.

This would also solve edge cases with different alignment requirements too (x86 has a bit looser alignment, so offset can get out of sync unless packed)

Scary. Maybe we should have the generator emit static_asserts in packing/unpacking functions to catch this? Not sure we can statically determine the expected size in the generator though.

@Sonicadvance1
Copy link
Member

This is necessary to compile post-merge

diff --git a/ThunkLibs/GuestLibs/CMakeLists.txt b/ThunkLibs/GuestLibs/CMakeLists.txt
index 53aa0731..7e53b41b 100644
--- a/ThunkLibs/GuestLibs/CMakeLists.txt
+++ b/ThunkLibs/GuestLibs/CMakeLists.txt
@@ -40,7 +40,7 @@ function(generate NAME SOURCE_FILE)
       OUTPUT "${OUTFILE}"
       DEPENDS "${GENERATOR_EXE}"
       DEPENDS "${SOURCE_FILE}"
-      COMMAND "${GENERATOR_EXE}" "${SOURCE_FILE}" "${NAME}" "-${WHAT}" "${OUTFILE}" -- -std=c++17
+      COMMAND "${GENERATOR_EXE}" "${SOURCE_FILE}" "${NAME}" "-${WHAT}" "${OUTFILE}" -- -std=c++17 -DGUEST_THUNK_LIBRARY
             # Expand include directories to space-separated list of -isystem parameters
            "$<$<BOOL:${prop}>:;-isystem$<JOIN:${prop},;-isystem>>"
       VERBATIM

Necrodancer needs the following:

diff --git a/ThunkLibs/libGL/libGL_Guest.cpp b/ThunkLibs/libGL/libGL_Guest.cpp
index 65635cb8..08b2daca 100644
--- a/ThunkLibs/libGL/libGL_Guest.cpp
+++ b/ThunkLibs/libGL/libGL_Guest.cpp
@@ -43,14 +43,19 @@ const std::unordered_map<std::string_view, uintptr_t /* guest function address *
     });

 extern "C" {
-       voidFunc *glXGetProcAddress(const GLubyte *procname) {
+  voidFunc *glXGetProcAddress(const GLubyte *procname) {
     auto Ret = fexfn_pack_glXGetProcAddress(procname);
     if (!Ret) {
-        return nullptr;
+      return nullptr;
     }

     auto TargetFuncIt = HostPtrInvokers.find(reinterpret_cast<const char*>(procname));
     if (TargetFuncIt == HostPtrInvokers.end()) {
+      std::string_view procname_s { reinterpret_cast<const char*>(procname) };
+      if (procname_s == "glXGetProcAddress" ||
+          procname_s == "glXGetProcAddress") {
+        return reinterpret_cast<voidFunc*>(glXGetProcAddress);
+      }
       // Extension found in host but not in our interface definition => treat as fatal error
       fprintf(stderr, "glXGetProcAddress: not found %s\n", procname);
       __builtin_trap();
@@ -58,11 +63,11 @@ extern "C" {

     LinkAddressToFunction((uintptr_t)Ret, TargetFuncIt->second);
     return Ret;
-       }
+  }

-       voidFunc *glXGetProcAddressARB(const GLubyte *procname) {
-               return glXGetProcAddress(procname);
-       }
+  voidFunc *glXGetProcAddressARB(const GLubyte *procname) {
+    return glXGetProcAddress(procname);
+  }
 }

 // libGL.so must pull in libX11.so as a dependency. Referencing some libX11
diff --git a/ThunkLibs/libGL/libGL_interface.cpp b/ThunkLibs/libGL/libGL_interface.cpp
index 1822cbc0..d1d9367a 100644
--- a/ThunkLibs/libGL/libGL_interface.cpp
+++ b/ThunkLibs/libGL/libGL_interface.cpp
@@ -94,6 +94,7 @@ template<> struct fex_gen_config<glXSelectEvent> {};
 template<> struct fex_gen_config<glXSwapBuffers> {};
 template<> struct fex_gen_config<glXUseXFont> {};
 template<> struct fex_gen_config<glXWaitGL> {};
+template<> struct fex_gen_config<glXWaitX> {};
 template<> struct fex_gen_config<glXChooseVisual> {};
 template<> struct fex_gen_config<glXGetVisualFromFBConfig> {};

But then it crashes.

@Sonicadvance1
Copy link
Member

Sonicadvance1 commented Aug 4, 2022

With updated PR and prior patches:

Works:

  • FTL
  • Steel Rats (Vulkan game)
  • Little Inferno
  • 7 Billion Humans
  • Neverwinter Nights: Enhanced edition
  • Magicka 2
    • FXAA shader compilation failure on Freedreno (Weird one), but runs
  • Papers, Please
    • Needs not found symbols to be not fatal
    • Tries to find glFrustrumf which is GLES 1.x only
  • Super Hexagon
  • They Bleed Pixels
  • World of Goo
  • VVVVVV
  • Unbound: Worlds Apart (Vulkan game)
  • SuperHOT
  • Super Meat Boy
  • SteamWorld Quest: Hand of Gilgamech
    • This one absolutely spams "Finalizing Trampoline" repeat messages
  • SteamWorld Dig 2
  • Starbound
    • Also spams finalizing trampoline
  • SOMA
  • Snowyland
    • Also spams finalizing trampoline
  • Dicey Dungeons
    • Needs not found symbols to be not fatal
    • Tries to find glFrustrumf which is GLES 1.x only
  • Iconoclasts
  • Shovel Knight

Doesn't work:

  • Crypt of the Necrodancer
  • Vagante
  • Urban Trial Playground
  • Super Inefficient Golf
    • Unreal Engine game

Needs more testing:

  • Surviving Mars
    • Early exits claiming it needs a GL 4.5 capable GPU. Will need to test with Zink and also radeonsi.
    • At least it creates a GL context and gets correct data now.
  • Stellaris
    • Paradox Launcher doesn't work, need to test the game alone
  • Ion. Fury
    • It asplode.

32-bit game:

  • Tomb Raider
    • This is why this didn't get captured previously.

So overall with previous patches and making glXGetProcAddress non-fatal if it can't find a host pointer. I'm very pleased with this result and am looking forward to having this in this month's release this weekend.
Hopefully you can finish up the remaining work and clean it up so we can merge it for more stable GL thunks.

@Sonicadvance1
Copy link
Member

Just to make sure that I double check this Install libThunksAPI.so to the proper location task.
Needs to end up in /usr/lib/<triple-suffix>.
Right now in my testing it is ending up as /usr/lib/libThunksAPI.so

I would personally recommend a cmake option called LIB_SUFFIX which defaults to nothing and then the install path ends up being ${CMAKE_INSTALL_PREFIX}/lib${LIB_SUFFIX}/libThunksAPI.so

This allows us to support multi-arch distros and non-multiarch with just a configuration option, matching exactly what renderdoc does in this case.

@neobrain neobrain force-pushed the feature_thunk_funcptrs_by_signature branch from 9e3fc95 to 2aa6204 Compare August 5, 2022 17:17
@neobrain neobrain marked this pull request as ready for review August 5, 2022 17:18
@neobrain
Copy link
Member Author

neobrain commented Aug 5, 2022

Ready for review! (BUT DON'T MERGE; Some "lost" changes are grouped in the last commit that I'll need to squash tomorrow, but that shouldn't affect review) fully cleaned up now

The latest version tidies up the commit history, fixes the Finalizing Trampoline spam, and adds support for yet more function pointers (unlikely to fix any further games though).

The GL changes look fine but I'd rather not also put them into this PR, we can do a follow-up one though.

@skmp
Copy link
Contributor

skmp commented Aug 6, 2022

(Looking to the bash bug now, will review right after)

This changes how host trampolines for guest functions are created. Instead
of doing this purely on the guest-side, it's either the host-side that
creates them in a single step *or* a cooperative two-step initialization
process must be used. In the latter, trampolines are allocated and partially
initialized on the guest and must be finalized on the host before use.
@neobrain neobrain force-pushed the feature_thunk_funcptrs_by_signature branch from 4829c05 to 4b7ed95 Compare August 8, 2022 14:30
@neobrain
Copy link
Member Author

neobrain commented Aug 8, 2022

Rebased, addressed review feedback, and did a final round of cleanups on commit history. Ready for merge from my end.

Copy link
Member

@Sonicadvance1 Sonicadvance1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

\o/

@Sonicadvance1 Sonicadvance1 merged commit 62d9961 into FEX-Emu:main Aug 9, 2022
* is to allocate the trampoline for a given GuestTarget/GuestUnpacker on the guest-side,
* and provide the HostPacker host-side.
*/
__attribute__((visibility("default")))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we have a FEX_DEFAULT_VISIBILITY macro for this?

Copy link
Contributor

@skmp skmp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, this has some nice cleanups, and it's great to see host interface with FEX.

I totally disagree on having to pass the unpacker/packers from the guest to host, why can't we keep using the sha256 interfaces to keep things consistent and decoupled?

It is really unfortunate this was merged before a thorough review.

@@ -45,7 +72,7 @@ struct fex_guest_function_ptr {

#define EXPORTS(name) \
extern "C" { \
ExportEntry* fexthunks_exports_##name() { \
ExportEntry* fexthunks_exports_##name(uintptr_t allocate, uintptr_t finalize) { \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

uintptr_t allocate, uintptr_t finalize should be removed

@@ -75,4 +102,89 @@ struct GuestcallInfo {
#elif defined(_M_ARM_64)
#define LOAD_INTERNAL_GUESTPTR_VIA_CUSTOM_ABI(target_variable) \
asm volatile("mov %0, x11" : "=r" (target_variable))
#else
#define LOAD_INTERNAL_GUESTPTR_VIA_CUSTOM_ABI(target_variable) \
abort()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be at least an stderr print here

// doesn't know about them when linking thunk libraries. This issue is avoided
// by declaring the functions as weak symbols. Their implementation in this
// file serves as a panicking fallback if matching symbols are not found.
namespace FEXCore {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also import some stuff from LogMan here for logging

auto args = reinterpret_cast<PackedArguments<Result, Args..., uintptr_t>*>(argsv);
constexpr auto CBIndex = sizeof...(Args);
uintptr_t cb;
static_assert(CBIndex <= 17 || CBIndex == 23);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be folded directly in PackedArguments. We could name the last argument aLast in the template specialisations, to make it simpler.

@neobrain neobrain deleted the feature_thunk_funcptrs_by_signature branch August 9, 2022 14:58
@neobrain
Copy link
Member Author

I totally disagree on having to pass the unpacker/packers from the guest to host, why can't we keep using the sha256 interfaces to keep things consistent and decoupled?

Adding signature-based callbacks to the host-side export table and working with it in the particular XInitThreads use case was impractical. It required very careful back-and-forth marshaling of various symbols that I didn't feel comfortable seeing applied on a broader scale. I don't remember the specifics, but it was bad enough that I didn't even consider including it as an "Alternative 3".

Instead of reinventing ad-hoc solutions on a case-by-case basis, the this PR establishes the "prepare-finalize" approach as a pattern at the API level. This ensures uniformity across use cases and provides for a canonical place (Thunks.cpp) to lookup documentation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants