Linking
Emscripten supports linking object files - containing LLVM bitcode - statically. This lets most build systems work with Emscripten with little or no changes (see Building Projects).
In addition, Emscripten as of 1.32.2 has support for dynamic linking of JavaScript modules. This reduces throughput a little, so for best performance you should avoid it. However, the slowdown should be small and can be reduced with proper design of the modules; see below for details.
Some use cases where dynamic linking can be useful:
- Fast iteration times during development. Build your app as several libraries, and only rebuild the one you just modified. This avoids recompiling the entire world each time.
- Applications where some code changes more than other code. For example, you might build a core game engine once, then build some game-specific code separately, and link them dynamically. The split can be helpful if you download the core engine only once, but have multiple game-specific code modules (different games, or game updates).
- Avoiding Chrome memory issues with a single large asm.js module. There is a known issue on Chrome where compiling a single big codebase can run out of memory. Splitting the codebase up into smaller parts can work around this limitation (but hopefully it will be fixed in the browser).
Before we get to dynamic linking, let's talk about static linking. Emscripten's linking model is a little different than most native platforms. To understand it, consider that native linking models work in a setting where the following facts are true:
- The application runs directly on the local system, and has access to local system libraries, like C and C++ standard libraries, and others.
- Code size is not a big concern. In part this is because the system libraries already exist on the system, so "hello world" in C++ can be small, even if it uses a large amount of iostream code in the C++ standard library. Code size does influence cold startup times, in that more code takes longer to load from disk, but the cost is generally not significant, and modern OSes mitigate it in various ways, like caching apps they expect to be loaded.
In Emscripten's case, code is typically going to run on the web. That means the following:
- The application is running in a sandbox. It has no local system libraries to dynamically link to; it must ship its own system library code.
- Code size is a major concern, as the application's code is downloaded over the internet, which is many orders of magnitude slower than loading an installed native app from one's local machine.
For that reason, Emscripten's "object files" are simply LLVM bitcode. That bitcode has all the high-level information to perform efficient dead code elimination, especially for a standalone app, which is what we have. In other words, you statically link in the C standard library, and we strip out the parts (most of it!) that you don't actually use. Emscripten also automatically handles system libraries for you, in order to do the best possible job it can at getting them small.
An additional factor here is that Emscripten has "js libraries" - system libraries written in JavaScript. Such system libraries are the way we access APIs on the web. They are also a convenient way for people to connect compiled code and handwritten code on the same page. Thus, unlike native platforms, Emscripten has two types of system libraries: those containing LLVM bitcode and those containing JavaScript. This is another reason for Emscripten to handle system libraries in a special way, and in particular in a way that lets it strip out as much of those js libraries as it can, leaving only what is actually used. Again, that works best in the context of statically linking a standalone app with no external dependencies.
A downside to this approach is that it means we have focused less on dynamic linking.
Emscripten's dynamic linking is fairly simple: you build several separate code "modules" containing JavaScript, and can link them at runtime. The linking basically connects up the unresolved symbols in each module with the implemented symbols in the others, in the simplest of ways. It does not currently support nuances like weak symbols or corner cases of linkonce semantics. It should work fine on "simple" code, that is, code not using fancy features from C/C++ extensions that affect linking. In short, dynamic linking just hooks up an unresolved symbol to an implementation of it in another module, on a first-come, first-served basis.
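The linking model described above can be pictured with a small toy model. This is only an illustration, not Emscripten's actual linker: the module shape (`name`, `defines`, `needs`) and the function `linkModules` are hypothetical, and the sketch assumes that "first come, first served" means the first loaded module that defines a symbol wins.

```javascript
// Toy model only: NOT Emscripten's linker, just an illustration of
// "first come, first served" symbol resolution between modules.
// Each hypothetical module lists the symbols it defines and the ones it needs.
function linkModules(modules) {
  const symbolTable = new Map(); // symbol name -> { owner, impl }

  // Pass 1: register definitions in load order; the first definition wins.
  for (const mod of modules) {
    for (const [name, impl] of Object.entries(mod.defines)) {
      if (!symbolTable.has(name)) {
        symbolTable.set(name, { owner: mod.name, impl });
      }
    }
  }

  // Pass 2: record which needed symbols have no definition in any module.
  const unresolved = [];
  for (const mod of modules) {
    for (const name of mod.needs) {
      if (!symbolTable.has(name)) unresolved.push(mod.name + ':' + name);
    }
  }
  return { symbolTable, unresolved };
}

const main = {
  name: 'main',
  defines: { greet: () => 'hello from main' },
  needs: ['square'],
};
const sideA = {
  name: 'sideA',
  defines: { square: (x) => x * x, greet: () => 'hello from sideA' },
  needs: ['greet', 'missing'],
};

const linked = linkModules([main, sideA]);
console.log(linked.symbolTable.get('greet').owner);   // which module won 'greet'
console.log(linked.symbolTable.get('square').impl(7)); // call across modules
console.log(linked.unresolved);                        // symbols nobody defines
```

Note that there is no handling of weak symbols or linkonce semantics in this model, which mirrors the limitation described above: a duplicate definition is simply ignored.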
System libraries do utilize some more advanced linking features. For that reason, Emscripten tries to simplify the problem as follows: There are two types of shared modules:
- Main modules, which have system libraries linked in.
- Side modules, which do not have system libraries linked in.
A project should contain exactly one main module. It can then be linked at runtime to multiple side modules. This model also makes other things simpler: only the singleton main module has the general JavaScript environment setup code to connect to the web page and so forth, while side modules contain just the pure compiled code and nothing more.
The one tricky aspect to this design is that a side module might need a system library that the main doesn't know about. See the section on system libraries, below, for how to handle that.
Note that the "main module" doesn't need to contain the main() function. It could just as easily be in a side module. What makes the main module the "main" module is just that there is only one main module, and only it has system libs linked in.
(Note that system libraries are linked in to the main module statically. We still have some optimizations from doing it that way, even if we can't dead code eliminate as well as we'd like.)
If you want to jump to see running code, you can look in the test suite. There are test_dylink_* tests that test general dynamic linking, and test_dlfcn_* tests that test dlopen() specifically. Otherwise, we describe the procedure now.
- Build one part of your code as the main module, using -s MAIN_MODULE=1. (You hopefully don't need to, but you may be required to do something for system libraries here; see below.)
- Build other parts of your code as side modules, using -s SIDE_MODULE=1.
Note that both should have suffix .js, as they contain JavaScript (emcc uses suffixes to know what to emit). If you want, you can then rename the side modules to .so or such (but it is just a superficial change).
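As a rough sketch, the two build steps might look like the command lines assembled below. The file names (`main.c`, `side.c`, `main.js`, `libsomething.js`) and the helper `emccCommand` are hypothetical placeholders; only the `-s MAIN_MODULE=1` and `-s SIDE_MODULE=1` flags come from the text above, and `-o` is emcc's usual output option.

```javascript
// Sketch: assemble the emcc invocations for one main and one side module.
// File names are hypothetical placeholders, not taken from a real project.
function emccCommand(source, output, moduleFlag) {
  // moduleFlag is 'MAIN_MODULE=1' or 'SIDE_MODULE=1', passed through -s.
  // The output name ends in .js because emcc picks what to emit by suffix.
  return ['emcc', source, '-s', moduleFlag, '-o', output];
}

const mainCmd = emccCommand('main.c', 'main.js', 'MAIN_MODULE=1');
const sideCmd = emccCommand('side.c', 'libsomething.js', 'SIDE_MODULE=1');

console.log(mainCmd.join(' '));
console.log(sideCmd.join(' '));
```

The printed lines can be pasted into a shell, or the arrays can be handed to Node's `child_process.spawnSync(cmd[0], cmd.slice(1))` on a machine where emcc is on the PATH.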
You then need to tell the main module to load the sides. You can do that using the Module object, with something like
Module.dynamicLibraries = ['libsomething.js'];
At runtime, when you run the main module, if it sees dynamicLibraries on Module, then it loads them one by one and links them. The running application then can access code from any of the modules linked together.
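A slightly fuller sketch of that setup follows. It assumes the snippet runs before the main module's generated script (for example in an earlier script tag, or in a file passed to emcc with `--pre-js`); the second library name, `libother.js`, is a hypothetical placeholder.

```javascript
// Sketch: make sure Module exists, then list the side modules to load.
// This must run before the main module's generated code starts.
var Module = typeof Module !== 'undefined' ? Module : {};

// Hypothetical file names; each entry is a side module built with
// -s SIDE_MODULE=1. The main module loads and links them one by one.
Module.dynamicLibraries = ['libsomething.js', 'libother.js'];

console.log(Module.dynamicLibraries.join(', '));
```

Reusing an existing `Module` object rather than overwriting it keeps any other settings that were already placed on it.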
dlopen() is slightly simpler than general dynamic linking. The procedure begins in the same way, with the same flags used to build the main and side modules. The difference is that you do not use Module.dynamicLibraries; instead, you must load the side module into the filesystem, so that dlopen (or fopen, etc.) can access it. That's basically it - you can then use dlopen(), dlsym(), etc. normally.
As mentioned earlier, system libraries are handled in a special way by the Emscripten linker, and in dynamic linking, only the main module is linked against system libraries. A possible issue is if a side module needs a system library that the main does not. If so, you'll get a runtime error. This section explains what to do to fix that.
To get around this, you can build the main module with EMCC_FORCE_STDLIBS=1 in the environment to force inclusion of all standard libs. A more refined approach is to build the side module with -v in order to see which system libs are actually needed - look for including lib[...] messages - and then build the main module with something like EMCC_FORCE_STDLIBS=libcxx,libcxxabi (if you need those two libs).
Emscripten's dynamic linking typically causes a 5-10% slowdown, in good conditions.
We need to work around the fact that function pointer calls in asm.js cannot call out of the module. To relax that limitation, we use Emscripten's emulated function pointers option, which implements function tables outside of asm.js code, in a single shared area for all the modules. This makes cross-module calls work, but they aren't fast. To speed them up, we have a fast-path if the function pointer call is in the current module. If so, we use a local copy of the function tables, which is much more efficient.
This means that performance will be best if you minimize the number of calls between modules, and in particular the number of function pointer calls between modules. In the worst case, if all calls are cross-module, then the slowdown can be very large, perhaps running half as fast. But if you design the modules so most calls are inside the same module, the 5-10% figure should be achievable (that number was measured on box2d and bullet, which should be fairly realistic C++ codebases, including virtual calls and so forth).
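The fast path described above can be pictured with the following toy model. It is not Emscripten's generated code: `registerModule`, `callPointer`, and the table layout are hypothetical, and the sketch only assumes that each module owns a contiguous range of function pointers in one shared table.

```javascript
// Toy model only, not Emscripten's generated code: one function table shared
// by all modules, plus a per-module local copy used as a fast path.
const sharedTable = [];

function registerModule(name, functions) {
  const start = sharedTable.length;
  sharedTable.push(...functions);
  return {
    name,
    start,                         // first pointer owned by this module
    end: sharedTable.length,       // one past the last owned pointer
    localTable: functions.slice(), // local copy for same-module calls
    stats: { fast: 0, slow: 0 },   // counters, only for illustration
  };
}

function callPointer(caller, ptr, ...args) {
  if (ptr >= caller.start && ptr < caller.end) {
    caller.stats.fast++;           // target lives in the calling module
    return caller.localTable[ptr - caller.start](...args);
  }
  caller.stats.slow++;             // cross-module call through the shared table
  return sharedTable[ptr](...args);
}

const engine = registerModule('engine', [(x) => x + 1, (x) => x * 2]);
const game = registerModule('game', [(x) => x - 1]);

console.log(callPointer(engine, 1, 10)); // pointer owned by 'engine'
console.log(callPointer(engine, 2, 10)); // pointer owned by 'game'
console.log(engine.stats);
```

In this model the cost difference is only counted, not measured; the point is that a design where most pointers resolve inside the caller's own range stays on the fast path.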
By default, main modules disable dead code elimination. That means that all the code compiled remains in the output, including all system libraries linked in, and also all the JS library code.
That is the default behavior since it is the least surprising. But it is also possible to use normal dead code elimination, by building with -s MAIN_MODULE=2 (instead of 1). In that mode, the main module is built normally, with no special behavior for keeping code alive. It is then your responsibility to make sure that code that side modules need is kept alive. You can do this in the usual ways, like adding to EXPORTED_FUNCTIONS. See other.test_minimal_dynamic for an example of this in action.
Native linkers generally only run code when all symbols are resolved. Emscripten's dynamic linker hooks up symbols to unresolved references to those symbols dynamically. As a result, we don't check if any symbols remain unresolved, and code can start to run even if there are. It will run successfully if they are not called in practice. If they are, you will get a runtime error. What went wrong should be clear from the stack trace (in an unminified build); building with -s ASSERTIONS=1 can help some more.
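The behavior described above, where an unresolved symbol only fails when it is actually called, can be sketched as follows. This is a toy model: `resolveImports` and the error message text are hypothetical and do not reflect what Emscripten actually prints.

```javascript
// Toy model only: unresolved symbols do not stop the program from starting;
// they fail only if they are actually called.
function resolveImports(needed, available) {
  const imports = {};
  for (const name of needed) {
    imports[name] = Object.prototype.hasOwnProperty.call(available, name)
      ? available[name]
      // No implementation found: install a stub that throws when invoked.
      : () => { throw new Error('unresolved symbol called: ' + name); };
  }
  return imports;
}

const imports = resolveImports(['add', 'missingFn'], { add: (a, b) => a + b });

console.log(imports.add(2, 3)); // works: 'add' was resolved
try {
  imports.missingFn();          // fails only now, at call time
} catch (e) {
  console.log(e.message);
}
```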
As a simple consequence of how it is implemented, Emscripten's dynamic linker can perform general dynamic linking - not just dlopen - at runtime! For example, you can write this in your C code:
EM_ASM({
Runtime.loadDynamicLibrary('sideModule.js');
});
That will load and link a side module, entirely at runtime. If your module uses symbols that are resolved in that side module, they will be accessible. Note that you probably shouldn't depend on this feature, but it might be useful.
- A known limitation is that while functions work fine, global variables that are linked might not. We link globals through function calls, and try to call them rarely - once per basic block. If you link and call the symbol within the same basic block, bad stuff might happen.
Dynamic linking is supported in WASM with the following caveats:
- dlopen(NULL, ...) (self-loading) is not supported yet, cf. test_dlfcn_self.
- If you use EMULATE_FUNCTION_POINTER_CASTS=1, define it when compiling both the main and the side modules. This is not needed with WASM=0, but omitting it causes runtime errors with WASM=1.
- Consider testing the patch at #5436 if you get errors about symbol _emscripten_glUniform1f$legalf32 on startup.
Dynamic linking together with pthreads is currently not available as of 2018-05 (-s LINKABLE=1 is not supported with -s USE_PTHREADS>0!).