dlopen dylib executables from memory for Android/Linuxish platforms, not disk #3845

Closed
4 tasks
benvanik opened this issue Nov 15, 2020 · 4 comments
Labels
codegen Shared code generation infrastructure and dialects ew 🤮 That shouldn't be that way and needs cleanup performance ⚡ Performance/optimization related work across the compiler and runtime platform/android 🤖 Android-specific build, execution, benchmarking, and deployment runtime Relating to the IREE runtime library

Comments

@benvanik
Collaborator

We should avoid writing the dylib file out to disk wherever the platform allows it. There are some options:

On Linuxy-things we can (it seems) just dlopen from /proc/self/fd/NN. Need to verify positioning - could dup + seek if it takes the position of the file, and if not could try mmapping the memory again (pages should be identical) but at an offset (not sure if that works).
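A minimal sketch of the procfs route, assuming a Linux-like system with /proc mounted. `dlopen_from_fd` is a hypothetical helper name, not something in the codebase. As far as I can tell, opening the /proc/self/fd/NN path reopens the file from its start, so this probably ignores the fd's current position; that is exactly the part that still needs verifying.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>
#include <stdio.h>

// Hypothetical helper: asks dlopen to load the file behind an already-open fd
// by routing through procfs. Returns NULL on failure (see dlerror()).
static void* dlopen_from_fd(int fd) {
  char path[64];
  int n = snprintf(path, sizeof(path), "/proc/self/fd/%d", fd);
  assert(n > 0 && (size_t)n < sizeof(path));
  (void)n;  // silence unused warnings when asserts are compiled out
  // RTLD_LOCAL keeps the executable's symbols out of the global namespace.
  return dlopen(path, RTLD_NOW | RTLD_LOCAL);
}
```

Older glibc versions need `-ldl` at link time.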

On BSD there's fdlopen that takes the fd directly. Also need to verify positioning.

On Android we can use the ANDROID_DLEXT_USE_LIBRARY_FD and ANDROID_DLEXT_USE_LIBRARY_FD_OFFSET flags of android_dlopen_ext to point dlopen directly to a byte offset in an fd: https://developer.android.com/ndk/reference/group/libdl
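Roughly what the Android call could look like. This is an untested sketch: `dlopen_from_fd_at_offset` and the placeholder library name are made up for illustration, and the non-Android branch is only a stub so the snippet compiles elsewhere. I believe the linker wants the offset to be page-aligned, which is worth checking against how the dylib is laid out in the flatbuffer.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__ANDROID__)
#include <android/dlext.h>
#endif

// Hypothetical helper: loads a shared object that starts `offset` bytes into
// the file referred to by `fd`. Returns NULL on failure or off Android.
static void* dlopen_from_fd_at_offset(int fd, int64_t offset) {
  assert(offset >= 0);
#if defined(__ANDROID__)
  android_dlextinfo info;
  memset(&info, 0, sizeof(info));
  info.flags =
      ANDROID_DLEXT_USE_LIBRARY_FD | ANDROID_DLEXT_USE_LIBRARY_FD_OFFSET;
  info.library_fd = fd;
  info.library_fd_offset = (off64_t)offset;
  // Placeholder name (assumption): the bytes come from the fd, not this path.
  return android_dlopen_ext("embedded_executable.so", RTLD_NOW | RTLD_LOCAL,
                            &info);
#else
  (void)fd;
  (void)offset;
  return NULL;  // Stub: other platforms need a different strategy.
#endif
}
```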

Unknown on mac yet. Could stick with the fallback to start.

If one of these functions can't dlopen from the fd's current position and needs an explicit offset instead, we can exploit the fact that flatbuffers are just pointers into contiguous address ranges: take the pointer to the dylib in the executable and subtract the base pointer of the mmapped region to get the file offset in the originally-mapped fd.
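The pointer arithmetic is tiny; a sketch with hypothetical names, where `mapping_file_offset` is the file offset that was originally passed to mmap (0 if the whole file was mapped):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Returns the offset of `ptr` within the file backing the mapping, or -1 if
// `ptr` does not point into [mapping_base, mapping_base + mapping_length).
static int64_t file_offset_of(const void* mapping_base, size_t mapping_length,
                              uint64_t mapping_file_offset, const void* ptr) {
  assert(mapping_base != NULL && ptr != NULL);
  uintptr_t base = (uintptr_t)mapping_base;
  uintptr_t p = (uintptr_t)ptr;
  if (p < base || p - base >= mapping_length) return -1;
  return (int64_t)(mapping_file_offset + (uint64_t)(p - base));
}
```

The bounds check matters because a flatbuffer that was copied rather than mapped would give a pointer that has no meaningful file offset.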

We can keep the current disk approach as the ultimate fallback, but there's a better fallback where we use memfd_create+tmpfs, memcpy into that from the flatbuffer mmapped memory, and then dlopen from that fd/fdlopen/android_dlopen_ext/etc. That wires memory that we otherwise wouldn't, but executable binaries are small (no large constants) and compared to touching the disk at all it's probably in the ballpark of 1000x+ faster.
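A sketch of the memfd fallback, assuming Linux with glibc 2.27+ (for the `memfd_create` wrapper) and /proc mounted. The helper names and the "embedded_dylib" debug name are made up for illustration; error handling is minimal.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

// Copies `length` bytes into a fresh anonymous in-memory file.
// Returns the fd, or -1 on failure.
static int copy_to_memfd(const void* data, size_t length) {
  assert(data != NULL);
  if (length == 0) return -1;
  int fd = memfd_create("embedded_dylib", MFD_CLOEXEC);
  if (fd < 0) return -1;
  if (ftruncate(fd, (off_t)length) != 0) {
    close(fd);
    return -1;
  }
  void* dst = mmap(NULL, length, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
  if (dst == MAP_FAILED) {
    close(fd);
    return -1;
  }
  memcpy(dst, data, length);  // e.g. from the mmapped flatbuffer
  munmap(dst, length);
  return fd;
}

// Loads a shared object image held in memory without touching the disk.
static void* dlopen_from_memory(const void* data, size_t length) {
  int fd = copy_to_memfd(data, length);
  if (fd < 0) return NULL;
  char path[64];
  snprintf(path, sizeof(path), "/proc/self/fd/%d", fd);
  void* handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
  close(fd);  // mappings created by the loader outlive the fd
  return handle;
}
```

On Android the same fd could be handed to android_dlopen_ext instead of going through the procfs path.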

  • test: linux w/ dlopen to existing wrapper mmap fd at position
  • test: linux w/ dlopen to memfd_create memcpy'd memory
  • test: android w/ android_dlopen_ext to existing wrapper mmap fd at position
  • test: android w/ android_dlopen_ext to memfd_create memcpy'd memory

If we can get these working then on those platforms we can completely eliminate file IO from the runtime. Both approaches also set us up for sandboxing (where we can map the wrapper flatbuffer in one process and then share the sealed fd with the sandbox process such that it can share the same pages for the executables).
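For the sandboxing angle, a sealed memfd could be produced along these lines. This is a sketch assuming Linux memfd sealing; `create_sealed_memfd` and the debug name are hypothetical, and it uses write() rather than a shared writable mapping because F_SEAL_WRITE cannot be applied while such a mapping exists.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

// Creates an in-memory file holding `data` and seals it so that no process
// holding the fd can resize or modify it. Returns the fd, or -1 on failure.
static int create_sealed_memfd(const void* data, size_t length) {
  assert(data != NULL || length == 0);
  int fd = memfd_create("sealed_executable", MFD_CLOEXEC | MFD_ALLOW_SEALING);
  if (fd < 0) return -1;
  const char* p = (const char*)data;
  size_t remaining = length;
  while (remaining > 0) {
    ssize_t n = write(fd, p, remaining);
    if (n <= 0) {  // sketch: no EINTR retry
      close(fd);
      return -1;
    }
    p += n;
    remaining -= (size_t)n;
  }
  int seals = F_SEAL_SHRINK | F_SEAL_GROW | F_SEAL_WRITE | F_SEAL_SEAL;
  if (fcntl(fd, F_ADD_SEALS, seals) != 0) {
    close(fd);
    return -1;
  }
  return fd;
}
```

The sandbox process would receive the fd (for example over a unix socket) and can confirm the seals with `fcntl(fd, F_GET_SEALS)` before mapping it.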

@benvanik benvanik added runtime Relating to the IREE runtime library codegen Shared code generation infrastructure and dialects performance ⚡ Performance/optimization related work across the compiler and runtime platform/android 🤖 Android-specific build, execution, benchmarking, and deployment labels Nov 15, 2020
@benvanik benvanik added this to the 2020Q4 Core milestone Nov 15, 2020
@benvanik
Collaborator Author

benvanik commented Nov 15, 2020

There are (ahem) ways of doing this on both windows and mac using executable pages (effectively what JITs do), providing our own relocation based on platform: https://github.com/fancycode/MemoryModule, https://github.com/malisal/loaders

Even better than that would be to exploit the fact that we constrain these executables: we can just embed ELF 100% of the time and have the world's smallest ELF loader. They really are not complex for what we are doing (we aren't trying to build entire bundled applications with these, but instead just have a few naked functions with no global state, no C++, etc): https://github.com/malisal/loaders/blob/master/elf/elf.c#L51-L186
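To give a feel for how little parsing is involved, here is a sketch of the first step such a loader takes: validating the header and walking the program headers to find the PT_LOAD segments it would need to map. It assumes a 64-bit, host-endian image and uses the Linux `<elf.h>` definitions for brevity (a portable loader would carry its own structs); `count_load_segments` is a made-up name and relocation is not shown.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <string.h>

// Returns the number of PT_LOAD segments in a 64-bit shared-object ELF image
// held in memory, or -1 if the image fails basic validation.
static int count_load_segments(const void* data, size_t length) {
  assert(data != NULL);
  Elf64_Ehdr ehdr;
  if (length < sizeof(ehdr)) return -1;
  memcpy(&ehdr, data, sizeof(ehdr));  // copy out to avoid unaligned reads
  if (memcmp(ehdr.e_ident, ELFMAG, SELFMAG) != 0) return -1;
  if (ehdr.e_ident[EI_CLASS] != ELFCLASS64) return -1;
  if (ehdr.e_type != ET_DYN) return -1;
  if (ehdr.e_phentsize != sizeof(Elf64_Phdr)) return -1;
  // Make sure the whole program header table lies inside the image.
  if (ehdr.e_phoff > length ||
      (size_t)ehdr.e_phnum * sizeof(Elf64_Phdr) > length - ehdr.e_phoff) {
    return -1;
  }
  int count = 0;
  for (size_t i = 0; i < ehdr.e_phnum; ++i) {
    Elf64_Phdr phdr;
    memcpy(&phdr,
           (const char*)data + ehdr.e_phoff + i * sizeof(Elf64_Phdr),
           sizeof(phdr));
    if (phdr.p_type == PT_LOAD) ++count;
  }
  return count;
}
```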

Smash that together with something like this for the pages: https://sourcegraph.com/github.com/bkaradzic/SwiftShader/-/blob/src/Reactor/ExecutableMemory.cpp
and you have a simple loader that should work everywhere we can run a JIT. Clearly not as good as dlopening the file (it wires pages, it requires being able to alloc executable pages, defeats code-signing, etc), but for Windows and MacOS that seems fine. iOS would need its own solution.
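The executable-page half can be sketched for POSIX systems as below; this is not the SwiftShader code linked above, just the usual write-then-flip-to-executable pattern. `alloc_executable_copy` is a hypothetical name, and platforms with stricter policies (code signing, W^X enforcement) may reject the mprotect or need extra flags.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

// Copies `length` bytes of machine code into fresh pages and makes them
// read+execute. Returns NULL on failure; release with munmap(ptr, length).
static void* alloc_executable_copy(const void* code, size_t length) {
  assert(code != NULL);
  if (length == 0) return NULL;
  // Start writable but not executable so the copy never needs W+X pages.
  void* pages = mmap(NULL, length, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (pages == MAP_FAILED) return NULL;
  memcpy(pages, code, length);
  if (mprotect(pages, length, PROT_READ | PROT_EXEC) != 0) {
    munmap(pages, length);
    return NULL;
  }
  // Needed on architectures with separate instruction caches (e.g. ARM).
  __builtin___clear_cache((char*)pages, (char*)pages + length);
  return pages;
}
```

These pages are private and wired, which is the cost mentioned above compared to letting the system loader map the file.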

@benvanik
Collaborator Author

We can use #3909 to provide the backing allocation for executable memory or mapping into the files.

@benvanik benvanik added the ew 🤮 That shouldn't be that way and needs cleanup label Nov 22, 2020
benvanik added a commit that referenced this issue Mar 24, 2021
This allowed for a lot of file IO code to go away - there was needless
abstraction here as there was only a single user of a lot of these things
that was already platform-specialized.

Progress on #4369 and #3848.
Fixes #4642.
Unblocks #3845, which can now be added cleanly.
@benvanik
Collaborator Author

With #5221 we can now implement variants of iree/base/internal/dynamic_library_* for various platforms. It may make sense to split out the file-based loading from the memory-based loading into separate files, or do everything in platform switch blocks (which should end up with 10-20 lines per platform, so not too bad). Either way, now that all of the IO and dylib manipulation are in a single file there's no need for abstractions to make all this happen.

@benvanik
Collaborator Author

benvanik commented Oct 1, 2021

No longer needed; the embedded elf approach is the way to go here and only if someone finds themselves absolutely needing platform-native shared objects should we deal with such tomfoolery.
