Strip symbols from produced .wasm #6904
These are the C++ mangled symbols I'm referring to: $ strings test.wasm | grep namespace
Note that I also have the global constructors issue mentioned in this post: https://groups.google.com/forum/#!topic/emscripten-discuss/8jXT3-vQWUQ In my (-Os optimized) build, the mangled strings account for more than 300 KB of the 7.3 MB .wasm, and the Thanks,
@vedadkajtaz Can you use a code explorer, wabt, or a hex viewer to tell which sections contain these strings/symbols? They can be in debug/DWARF sections, the names or linking sections, or just strings in the data section.
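One way to answer the question above without extra tooling is to walk the section headers of the binary and count mangled-looking strings per section. The following is a minimal sketch, not part of emscripten or wabt: the function names are hypothetical, and the regular expression is only a rough heuristic for Itanium-style nested names such as N9namespace5ClassE.

```python
import re

# Standard wasm section ids; id 0 is a custom section whose payload starts with its name.
SECTION_NAMES = {0: "custom", 1: "type", 2: "import", 3: "function", 4: "table",
                 5: "memory", 6: "global", 7: "export", 8: "start", 9: "element",
                 10: "code", 11: "data", 12: "datacount"}

# Heuristic (assumption): nested Itanium names look like N<len><id><len><id>...E
MANGLED = re.compile(rb"N(?:\d+[A-Za-z_][A-Za-z0-9_]*)+E")


def read_u32_leb(buf, pos):
    """Decode an unsigned LEB128 integer; return (value, next position)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return result, pos
        shift += 7


def mangled_by_section(wasm):
    """Map section label -> number of mangled-looking strings found in it."""
    if wasm[:4] != b"\0asm":
        raise ValueError("not a wasm binary")
    pos = 8  # skip the 4-byte magic and 4-byte version
    counts = {}
    while pos < len(wasm):
        sec_id = wasm[pos]
        size, body = read_u32_leb(wasm, pos + 1)
        payload = wasm[body:body + size]
        label = SECTION_NAMES.get(sec_id, f"unknown({sec_id})")
        if sec_id == 0:
            name_len, name_pos = read_u32_leb(payload, 0)
            name = payload[name_pos:name_pos + name_len]
            label = "custom:" + name.decode("utf-8", "replace")
        hits = len(MANGLED.findall(payload))
        if hits:
            counts[label] = counts.get(label, 0) + hits
        pos = body + size
    return counts
```

Calling mangled_by_section(open("test.wasm", "rb").read()) returns a dictionary keyed by section; hits under "data" point at RTTI or other string constants, while hits under "custom:name" or "export" point at symbol names.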
+1 to @yurydelendik's comment. Strings like that can come from multiple areas:
To see what's going on in your case, aside from exploring the wasm, you can look in the JS (maybe with |
Hello, Thanks for the quick replies. I've transformed the binary into a .wat file (my intent was to parse it, get rid of and/or obfuscate the symbols, and re-transform it into a .wasm, until a better solution is found). The global constructors are in the export section, e.g.:
Btw, notice that two thirds of these point to the same function number (whose signature does not seem to expect any arguments to distinguish the callers), so I guess there is room for optimization here as well. The mangled symbols are in the data section, unfortunately in the middle of segments, making them hard to parse, e.g.:
I've researched the issue, and if I understand correctly, these are RTTI symbols. We do need the RTTI feature (for dynamic_cast), yet I'm surprised that the compiler really needs to retain all of those symbols. There are a couple of typeid() calls in the code; I'll try to get rid of them all and see if this still happens. No Also, I did try Regarding the JS, only the global constructors appear in the generated file, no mangled RTTI symbols. Basically, this is what it looks like:
followed by hundreds of others. Then, further on:
again, followed by hundreds of others. Thanks,
Yeah, names in the data section are likely RTTI. They could also be something like an assert message, although I think that can only generate a string for the filename, not the function.
Interesting. I think what's going on there is that the duplicate function eliminator pass has merged those functions' implementations. How important is it for you to not have global constructor function names? We can remove those (by exporting a single "runGlobalConstructors" which calls them in wasm; then the only string would be the name of that singleton). It hasn't been a priority until now, but it shouldn't be too much work to do.
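To make the singleton idea concrete, here is a rough sketch that emits .wat text for one exported wrapper calling each constructor in order. It is only an illustration of the shape of the result, not how binaryen would implement it; build_ctor_wrapper and the $ctor_a / $ctor_b names are hypothetical, and it assumes every constructor takes no arguments and returns nothing.

```python
def build_ctor_wrapper(ctor_funcs, export_name="runGlobalConstructors"):
    """Return .wat text for a single exported function that calls each ctor in order.

    ctor_funcs holds .wat function references, e.g. "$ctor_a" or a numeric index.
    Duplicates are kept on purpose: each original entry must still be called once.
    """
    calls = "\n".join(f"  (call {ref})" for ref in ctor_funcs)
    return (f"(func ${export_name}\n{calls}\n)\n"
            f'(export "{export_name}" (func ${export_name}))')


if __name__ == "__main__":
    print(build_ctor_wrapper(["$ctor_a", "$ctor_b"]))
```

With this shape, the per-constructor exports can be dropped and the JS side only needs to call the one remaining export.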
Regarding the RTTI: the production build of the iOS version (which shares over 95% of the code) exposes roughly the same mangled symbols (slightly more, actually), so we may dismiss this as an issue that is not specific to emscripten. My attempt of removing all the I'll probably figure out a solution to obfuscate those in the binaries for all platforms (or decide that we don't care, but I doubt it).
Possible. This is the (obfuscated) excerpt from the .wat:
And the function definition:
So, the actual job seems to be done in 4995.
Well, it is doubly important for this project: the size (both download and wasm compilation time, which seems to grow exponentially with the .wasm file size on some browsers), and the exposure of internals (unless we decide that it doesn't matter, as stated above). Perhaps you could point me to the place in the code where this process, and your "runGlobalConstructors" suggestion, would take place; I might be able to help (unless it's Python code, which I have no experience in). Thanks, your help is much appreciated.
My current thought is that this could be done in binaryen's It would involve a little Python code, though, in emscripten's |
Thanks for the feedback, I'll take a look. FYI I've successfully implemented an RTTI obfuscator for the .wasm and the asm.js .mem files (and will try applying it to the iOS and Android binaries as well).
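The obfuscator mentioned above was not posted in the thread. As a rough illustration of how such a tool could work, the sketch below overwrites each NUL-terminated, mangled-looking name in a binary blob with a same-length string derived from its hash, so that no offsets or segment sizes change. The function name is hypothetical, the pattern is a heuristic that can miss names or hit unrelated data, and short names could in principle collide after truncation.

```python
import hashlib
import re

# Heuristic (assumption): RTTI type names are stored as NUL-terminated strings
# of the form N<len><id><len><id>...E, e.g. N9namespace5ClassE.
RTTI_NAME = re.compile(rb"N(?:\d+[A-Za-z_][A-Za-z0-9_]*)+E(?=\x00)")


def obfuscate_rtti_names(blob):
    """Replace each mangled-looking name with a same-length hash-derived string."""
    def scramble(match):
        name = match.group(0)
        digest = hashlib.sha256(name).hexdigest().encode("ascii")
        # Repeat the digest if the name is longer than it, then cut to the
        # original length so every byte offset in the blob stays valid.
        repeated = digest * (len(name) // len(digest) + 1)
        return repeated[:len(name)]

    return RTTI_NAME.sub(scramble, blob)
```

Because the replacement is deterministic, identical names map to identical output, which keeps string-based type comparisons consistent; anything that prints or demangles type_info::name() will of course show the scrambled text.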
Hmm, is binaryen involved when building the asm.js target? The generated asm.js .js exposes exactly the same issue:
Binaryen is not used for asm.js, but the backend emits the same constructor list for both. But yeah, that means that if we optimize this in binaryen it would not help asm.js. For asm.js though, doing a text replacement to obfuscated names should be pretty easy.
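A text replacement of that kind could look roughly like the sketch below. It assumes the constructor functions follow LLVM's GLOBAL__sub_I_ naming with one or more leading underscores (the actual excerpt from the generated file is not shown in the thread), and shorten_ctor_names and the __c0, __c1 replacement scheme are hypothetical. It does not check that the new names are unused elsewhere in the file.

```python
import re

# Assumption: global constructor functions are named like __GLOBAL__sub_I_<file>.
CTOR_NAME = re.compile(r"\b_+GLOBAL__sub_I_\w+")


def shorten_ctor_names(js_source):
    """Rename every constructor identifier consistently; return (new source, mapping)."""
    mapping = {}

    def rename(match):
        old = match.group(0)
        if old not in mapping:
            mapping[old] = f"__c{len(mapping)}"
        return mapping[old]

    return CTOR_NAME.sub(rename, js_source), mapping
```

The returned mapping can be saved alongside the build so that stack traces can still be translated back to the original names.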
Indeed.
Where does this take place?
In asm2wasm, that's the Looking forward, I think it's more important to support asm2wasm + the wasm backend as opposed to asm2wasm + asm.js, so doing this once in binaryen seems simplest (+ some other solution for asm.js if needed in the meantime). Another reason for doing it in binaryen is that the wasm backend may add some complexity here - we probably can't just collapse all the ctors into a singleton when using wasm object files, in particular, as the ctors may need to be linked and reordered etc. later. So this can only happen in the very final linking stage (where binaryen runs). |
Hello, FYI I'm getting this error while running unmodified wasm-ctor-eval (triggered by
That's expected - evalling of ctors has to stop if it sees an import may be called, like |
Thanks, will try without SAFE_HEAP and will let you know.
After disabling SAFE_HEAP, ASSERTIONS and STACK_OVERFLOW_CHECK, I get:
Invokes could be due to exceptions or setjmp. Very hard to optimize with those around.
I see. I played a bit with the However, most (if not all) of the constructors lack any emscripten-specific code (hence no |
Actually, my debugging output was somehow truncated, hence the previous statement was not accurate. We're now down to 34 global constructors. Out of these 34, eval of:
I wonder whether the tool can safely recover from a FailToEvalException thrown from instance.callExport()? If so, we'd still get the benefit of getting rid of most of the global constructors in my case. The others could then be merged into a single function call.
It appears it was timing out. The |
This issue has been automatically marked as stale because there has been no activity in the past year. It will be closed automatically if no further activity occurs in the next 7 days. Feel free to re-open at any time if this issue is still relevant. |
If you are having something like this as the output of twiggy |
Hello,
I'm not sure whether this is actually an emscripten or a clang issue, but here it goes.
I cannot manage to find a consistent way of stripping the symbols from the produced .wasm file. Basically, we end up with all the namespace, class and function names publicly visible.
I've created a mini project to reproduce the issue (3 classes w/virtual methods).
Basically, providing -Os, -O2 or higher on one hand, and --llvm-lto 2 on the other, strips the symbols away.
However, it does not work for my actual project. Varying the -O option makes the number of visible symbols vary by a couple of hundred, but most of them still end up in the .wasm.
I have also tried -g0, --llvm-opts "['-strip-debug']", and postprocessing with wasm-opt, but haven't found a way to strip those symbols.
Any ideas?
Thanks!