-
Notifications
You must be signed in to change notification settings - Fork 123
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement signature-based thunking of function pointers #1868
Implement signature-based thunking of function pointers #1868
Conversation
(void*&)InitFN = dlsym(Handle, InitSym.c_str()); | ||
if (!InitFN) { | ||
ERROR_AND_DIE_FMT("LoadLib: Failed to find export {}", InitSym); | ||
} | ||
|
||
auto Exports = InitFN(); | ||
auto Exports = InitFN((uintptr_t)MakeHostTrampolineForGuestFunction, (uintptr_t)FinalizeHostTrampolineForGuestFunction); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe make it a struct here with the FEX api we want to be available?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New version uses FEX symbols directly, which allows for keeping the shared set of interfaces more easily in sync.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. How would you feel about FEX exposing a data symbol with function pointers inside? Makes it easier for versioning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where do we need things to be versioned?
ThunkLibs/include/common/Host.h
Outdated
constexpr auto CBIndex = sizeof...(Args); | ||
uintptr_t cb; | ||
static_assert(CBIndex <= 17 || CBIndex == 23); | ||
if constexpr(CBIndex == 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can this be done with a0 being a undefined sized array + alloca?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Replying here since GitHub doesn't have threads in the "main" comment page)
I like the generic PackedArguments implementation, though I'd like to see its impact on compile time before adopting something like it. The current approach is very predictable in this regard due to being simple (if not particularly elegant), whereas the metaprogramming involved here could be more involved in terms of computational expense and symbol pollution (to store the offset for each set of template arguments).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did a quick read of iths.
Overall, this looks like a step in the right direction with nice cleanups.
A couple of corner cases that we should support:
- Callback marshaling might need to be overriden using a
<entrypoing, argnum>
identifier. Not all callbacks with the same signature are guaranteed to need the same marshaling - however most will not need custom logic. - I don't think we should use PackedArguments as a template, as it complicates things. If we have to, I think a generic implementation is best.
- From previous work, it could be wise to do the marshaling after the pack/unpack step, for efficiency. I think this is done partially in this PR, but I need to read more carefully.
WRT packing structs, I think we need to define them like
// Definitions
struct struct-name;
union Host<struct-name> {
HostField<type, HostOffset> FieldName;
....
}
struct Guest<struct-name> {
HostField<type, GuestOffset> FieldName;
,,,,
}
// Copy operators
Host<struct-name>::operator=(const Guest<struct-name> &other) {
this->FieldName = other->FieldName;
....
}
Guest<struct-name>::operator=(const Host<struct-name> &other) {
this->FieldName = other->FieldName;
....
}
Then we can overload the copy operators per type, at any level we want, for custom marshaling.
We can provide some defaults for pointers, same sized ints, strings, etc
For custom callback mashaling we can do
Then use |
Got generic
Then for use
|
This would also solve edge cases with different alignment requirements too (x86 has a bit looser alignment, so offset can get out of sync unless packed) |
Some testing results on Snapdragon 888. NecroDancer:
Neverwinter Nights: Enhanced edition:
Magicka 2: Vulkan game but GL/X11 thunk changes behaviour
Papers Please: They Bleed Pixels: World of Goo: VVVVVV: Surviving Mars: Super Meat Boy: SteamWorld Quest: Hand of Gilgamech: Starbound: Dicey Dungeons:
Little Inferno:
SuperHot:
Unbound Worlds Apart: Vulkan Game
Works Doesn't get captured
|
Thanks for the testing and reviews! (I'll get to the reviews later when the remaining kinks are worked out) When running games in Steam, something is causing libX11 to be dropped from the symbol lookup set compared to running outside of Steam. That's what's causing most of the issues @Sonicadvance1 mentioned. As a workaround, set I also added another commit to add more X11 symbols, which should get most of the games mentioned above working. Personally, I successfully tested 7 Billion Humans, Magicka 2, SteamWorld Dig 2, Super Meat Boy, and World of Goo. The only thing I couldn't get working is the (Windows-only) VVVVVV demo, which doesn't even launch for me even without thunks. |
bd63b02
to
9e3fc95
Compare
Rebased on latest main including the fix for Steam's gameoverlayrenderer.so from #1880, so this version should have compatibility with many (most?) of the titles tested above without any ugly I also added a mechanism for host thunk libraries to more directly call FEX functions. Used interfaces must be wrapped in ThunksAPI (due to our use of |
Worth keeping in mind. That would be material for a future PR adding thunk APIs that require this support. Without a concrete use case it's unclear to me what the best way to define such overrides is, as not all function pointers have a particular API endpoint associated to them. We also still support manual function pointer adaption, so the
Scary. Maybe we should have the generator emit |
This is necessary to compile post-merge diff --git a/ThunkLibs/GuestLibs/CMakeLists.txt b/ThunkLibs/GuestLibs/CMakeLists.txt
index 53aa0731..7e53b41b 100644
--- a/ThunkLibs/GuestLibs/CMakeLists.txt
+++ b/ThunkLibs/GuestLibs/CMakeLists.txt
@@ -40,7 +40,7 @@ function(generate NAME SOURCE_FILE)
OUTPUT "${OUTFILE}"
DEPENDS "${GENERATOR_EXE}"
DEPENDS "${SOURCE_FILE}"
- COMMAND "${GENERATOR_EXE}" "${SOURCE_FILE}" "${NAME}" "-${WHAT}" "${OUTFILE}" -- -std=c++17
+ COMMAND "${GENERATOR_EXE}" "${SOURCE_FILE}" "${NAME}" "-${WHAT}" "${OUTFILE}" -- -std=c++17 -DGUEST_THUNK_LIBRARY
# Expand include directories to space-separated list of -isystem parameters
"$<$<BOOL:${prop}>:;-isystem$<JOIN:${prop},;-isystem>>"
VERBATIM Necrodancer needs the following: diff --git a/ThunkLibs/libGL/libGL_Guest.cpp b/ThunkLibs/libGL/libGL_Guest.cpp
index 65635cb8..08b2daca 100644
--- a/ThunkLibs/libGL/libGL_Guest.cpp
+++ b/ThunkLibs/libGL/libGL_Guest.cpp
@@ -43,14 +43,19 @@ const std::unordered_map<std::string_view, uintptr_t /* guest function address *
});
extern "C" {
- voidFunc *glXGetProcAddress(const GLubyte *procname) {
+ voidFunc *glXGetProcAddress(const GLubyte *procname) {
auto Ret = fexfn_pack_glXGetProcAddress(procname);
if (!Ret) {
- return nullptr;
+ return nullptr;
}
auto TargetFuncIt = HostPtrInvokers.find(reinterpret_cast<const char*>(procname));
if (TargetFuncIt == HostPtrInvokers.end()) {
+ std::string_view procname_s { reinterpret_cast<const char*>(procname) };
+ if (procname_s == "glXGetProcAddress" ||
+ procname_s == "glXGetProcAddress") {
+ return reinterpret_cast<voidFunc*>(glXGetProcAddress);
+ }
// Extension found in host but not in our interface definition => treat as fatal error
fprintf(stderr, "glXGetProcAddress: not found %s\n", procname);
__builtin_trap();
@@ -58,11 +63,11 @@ extern "C" {
LinkAddressToFunction((uintptr_t)Ret, TargetFuncIt->second);
return Ret;
- }
+ }
- voidFunc *glXGetProcAddressARB(const GLubyte *procname) {
- return glXGetProcAddress(procname);
- }
+ voidFunc *glXGetProcAddressARB(const GLubyte *procname) {
+ return glXGetProcAddress(procname);
+ }
}
// libGL.so must pull in libX11.so as a dependency. Referencing some libX11
diff --git a/ThunkLibs/libGL/libGL_interface.cpp b/ThunkLibs/libGL/libGL_interface.cpp
index 1822cbc0..d1d9367a 100644
--- a/ThunkLibs/libGL/libGL_interface.cpp
+++ b/ThunkLibs/libGL/libGL_interface.cpp
@@ -94,6 +94,7 @@ template<> struct fex_gen_config<glXSelectEvent> {};
template<> struct fex_gen_config<glXSwapBuffers> {};
template<> struct fex_gen_config<glXUseXFont> {};
template<> struct fex_gen_config<glXWaitGL> {};
+template<> struct fex_gen_config<glXWaitX> {};
template<> struct fex_gen_config<glXChooseVisual> {};
template<> struct fex_gen_config<glXGetVisualFromFBConfig> {}; But then it crashes. |
With updated PR and prior patches: Works:
Doesn't work:
Needs more testing:
32-bit game:
So overall with previous patches and making glXGetProcAddress non-fatal if it can't find a host pointer. I'm very pleased with this result and am looking forward to having this in this month's release this weekend. |
Just to make sure that I double check this I would personally recommend a cmake option called This allows us to support multi-arch distros and non-multiarch with just a configuration option, matching exactly what renderdoc does in this case. |
9e3fc95
to
2aa6204
Compare
Ready for review! The latest version tidies up the commit history, fixes the The GL changes look fine but I'd rather not also put them into this PR, we can do a follow-up one though. |
(Looking to the bash bug now, will review right after) |
This changes how host trampolines for guest functions are created. Instead of doing this purely on the guest-side, it's either the host-side that creates them in a single step *or* a cooperative two-step initialization process must be used. In the latter, trampolines are allocated and partially initialized on the guest and must be finalized on the host before use.
This recurring pattern combines the existing helpers GetCallerForHostFunction and LinkAddressToFunction.
4829c05
to
4b7ed95
Compare
Rebased, addressed review feedback, and did a final round of cleanups on commit history. Ready for merge from my end. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
\o/
* is to allocate the trampoline for a given GuestTarget/GuestUnpacker on the guest-side, | ||
* and provide the HostPacker host-side. | ||
*/ | ||
__attribute__((visibility("default"))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't we have a FEX_DEFAULT_VISIBILITY macro for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, this has some nice cleanups, and it's great to see host interface with FEX.
I totally disagree on having to pass the unpacker/packers from the guest to host, why can't we keep using the sha256 interfaces to keep things consistent and decoupled?
It is really unfortunate this was merged before a thorough review.
@@ -45,7 +72,7 @@ struct fex_guest_function_ptr { | |||
|
|||
#define EXPORTS(name) \ | |||
extern "C" { \ | |||
ExportEntry* fexthunks_exports_##name() { \ | |||
ExportEntry* fexthunks_exports_##name(uintptr_t allocate, uintptr_t finalize) { \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
uintptr_t allocate, uintptr_t finalize
should be removed
@@ -75,4 +102,89 @@ struct GuestcallInfo { | |||
#elif defined(_M_ARM_64) | |||
#define LOAD_INTERNAL_GUESTPTR_VIA_CUSTOM_ABI(target_variable) \ | |||
asm volatile("mov %0, x11" : "=r" (target_variable)) | |||
#else | |||
#define LOAD_INTERNAL_GUESTPTR_VIA_CUSTOM_ABI(target_variable) \ | |||
abort() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There should be at least an stderr print here
// doesn't know about them when linking thunk libraries. This issue is avoided | ||
// by declaring the functions as weak symbols. Their implementation in this | ||
// file serves as a panicking fallback if matching symbols are not found. | ||
namespace FEXCore { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should also import some stuff from LogMan here for logging
auto args = reinterpret_cast<PackedArguments<Result, Args..., uintptr_t>*>(argsv); | ||
constexpr auto CBIndex = sizeof...(Args); | ||
uintptr_t cb; | ||
static_assert(CBIndex <= 17 || CBIndex == 23); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be folded directly in PackedArguments
. We could name the last argument aLast
in the template specialisations, to make it simpler.
Adding signature-based callbacks to the host-side export table and working with it in the particular XInitThreads use case was impractical. It required very careful back-and-forth marshaling of various symbols that I didn't feel comfortable seeing applied on a broader scale. I don't remember the specifics, but it was bad enough that I didn't even consider including it as an "Alternative 3". Instead of reinventing ad-hoc solutions on a case-by-case basis, the this PR establishes the "prepare-finalize" approach as a pattern at the API level. This ensures uniformity across use cases and provides for a canonical place (Thunks.cpp) to lookup documentation. |
Overview
This PR adds support for manual marshaling of function pointers across guest<->host boundaries and applies it for extending X11 thunks. Previously, marshaling was restricted to function pointers detected by the generator in specific scenarios (e.g. functions queried through
glGetProcAddr
or callback parameters used in e.g.XSetIOError
). We were unable to thunk many things not falling into those categories, such as function pointers embedded in structures/vtables or global function pointers.The underlying features guest-callable host functions and host-trampolines for guest functions are now based solely on the signature of the target function. For example, instead of generating separate host-trampolines for
glEnd
andglFlush
, only a single host-trampoline is generated for the function signaturevoid()
. Even without struct repacking (#1611), this enables manual thunking of vtables and other advanced forms of function pointers.Effect on compatibility
X11 thunks are much more functional now, making many more applications work with OpenGL thunking, such as:
(Thanks to @Sonicadvance1 for testing most of these!)
Implementation details
A tricky challenge for guest-function pointers called from the host is that they combine information from the guest-side (target address, packing function) and from the host-side (unpacking function) to generate host-trampolines. With automatic marshaling, this was resolved by referring to host-side information by its SHA256 identifier, but that is not practical for handwritten code. Thunking
XInitThreads
is a good stress-test here, since in that function we need to wrap a host function pointer in a custom function that sets up a vtable for use by the guest.I tried out different approaches to avoid this, nicknamed
ALTERNATIVE 1
,ALTERNATIVE 2
andMAIN VARIANT
:ACTUAL_XInitDisplayLock_fn
) from the host (and make it guest-callable), replaceACTUAL_XInitDisplayLock_fn
with a host-trampoline wrapping a custom guest function, which then takes care of making some vtable entries guest-callable.XInitDisplayLock
to be wrapped on the host directly, resulting in less bouncing between guest and host.MAIN VARIANT
is the version proposed for merging, but for illustration the other ones are included here as well. They'll be dropped from this PR once the code is fully ready.TODO
AllocateHostTrampolineForGuestFunction
pattern used in MAIN VARIANT-rdynamic
so host thunk libraries can use the new FEX APIs directlythunkgentest
Install(not applicable anymore since that library was dropped)libThunksAPI.so
to the proper location