Calling wasm function pointers from JS regressions #12733
Comments
I agree with most of this. We should have done better to avoid breaking external library authors who use this API. A few notes:
I think it would be great if we could find a way to completely remove the
Ah gotcha - did not look deep into that.
That is not accurate, for several reasons:
function dynCall(sig, ptr, args) {
  // Without WASM_BIGINT support we cannot directly call functions with i64 as
  // part of their signature, so we rely on the dynCall functions generated by
  // wasm-emscripten-finalize
  if (sig.indexOf('j') != -1) {
    return dynCallLegacy(sig, ptr, args);
  }
  assert(wasmTable.get(ptr), 'missing table entry in dynCall: ' + ptr);
  return wasmTable.get(ptr).apply(null, args);
}

That is, when not using WASM_BIGINT support, all dynamic signature dispatches first do a substring search, and that alone is already slower than static signature dispatch.
Benchmarking dynamic dispatch vs. static dispatch. In current Emscripten, using the dynamic dynCall('vii', ...) form:

#include <stdio.h>
#include <emscripten.h>
void foo(int param1, int param2)
{
}
typedef void (*fptr)(int param1, int param2);
EM_JS(int, callFunc, (fptr func, int param1, int param2), {
// Do periodic testing, maybe a JIT only kicks in after having yielded back to main loop?
setInterval(function() {
// var t0 = process.hrtime();
var t0 = performance.now();
for(var i = 0; i < 1000000; ++i) {
dynCall('vii', func, [param1, param2]);
dynCall('vii', func, [param1, param2]);
dynCall('vii', func, [param1, param2]);
dynCall('vii', func, [param1, param2]);
dynCall('vii', func, [param1, param2]);
dynCall('vii', func, [param1, param2]);
dynCall('vii', func, [param1, param2]);
dynCall('vii', func, [param1, param2]);
}
// var t1 = process.hrtime();
var t1 = performance.now();
// console.log('8 million calls took ' + ((t1[0]-t0[0])*1e3 + (t1[1]-t0[1])/1e6) + ' msecs.');
console.log('8 million calls took ' + (t1-t0) + ' msecs.');
}, 1000);
});
int main()
{
callFunc(foo, 42, 100);
}

and in Emscripten 2.0.1 with static dispatch:

#include <stdio.h>
#include <emscripten.h>
void foo(int param1, int param2)
{
}
typedef void (*fptr)(int param1, int param2);
EM_JS(int, callFunc, (fptr func, int param1, int param2), {
// Do periodic testing, maybe a JIT only kicks in after having yielded back to main loop?
setInterval(function() {
//var t0 = process.hrtime();
var t0 = performance.now();
for(var i = 0; i < 1000000; ++i) {
dynCall_vii(func, param1, param2);
dynCall_vii(func, param1, param2);
dynCall_vii(func, param1, param2);
dynCall_vii(func, param1, param2);
dynCall_vii(func, param1, param2);
dynCall_vii(func, param1, param2);
dynCall_vii(func, param1, param2);
dynCall_vii(func, param1, param2);
}
//var t1 = process.hrtime();
var t1 = performance.now();
// console.log('8 million calls took ' + ((t1[0]-t0[0])*1e3 + (t1[1]-t0[1])/1e6) + ' msecs.');
console.log('8 million calls took ' + (t1-t0) + ' msecs.');
}, 1000);
});
int main()
{
callFunc(foo, 42, 100);
}

I get the following results (dynamic dispatch Emscripten vs. static dispatch Emscripten, per engine): function pointer calls have become ~7.23x slower in Node.js, ~23.1x slower in Firefox, and ~7.83x slower in Chrome. This is not using WASM_BIGINT.

I would strongly recommend downplaying dynamic signature dispatch; it should only be used when one really has a dynamic dispatch problem. When one has static knowledge of the signature, static dispatch should always be used. This should be the case even for "cold" code, since many applications never need to do any dynamic dispatch, and so can avoid codegenning that function in the first place.
That does not seem to be the case, or the "where they are needed" bit has somehow changed? See the test case in the first comment. In Emscripten 2.0.1 and older, the fast static dispatch functions were available.
This part I was not aware of. @kripken, are you aware of this? One good thing is that we don't currently have any dynamic calls with i64 signatures (at least not in emscripten core); embind is one place they can arise.

Regarding your performance numbers, do you still see a slowdown when comparing just the old static dynCall_vii(funcPtr, arg1, arg2) to wasmTable.get(funcPtr)(arg1, arg2)? The work I've been doing has been assuming these are roughly equivalent in terms of performance (I mean, logically they are the same, right?). Hopefully the change to optimize makeDynCall will reduce it to just the direct table call.
Presumably callers of
I don't know what WASM_BIGINT does in Emscripten. Does it translate C++ i64 values to BigInts at the JS boundary?

Generally speaking, on the engine side, Wasm-BigInt integration being disabled means that Wasm functions with i64 parameters/results cannot be exported to or imported from JavaScript (the attempt will fail validation). Turning on Wasm-BigInt integration does not affect the performance of any functions that don't use i64 parameters/results. So overall I'm surprised by the claim that "WASM_BIGINT is detrimental to performance", but I don't know enough to outright dispute it.
Sorry, I should have explained: Emscripten historically converts i64 wasm parameters to pairs of i32s. This involves a Binaryen pass to rewrite all such imports/exports. On the JS side we then decompose numbers into pairs of numbers before calling such functions. When WASM_BIGINT is enabled, that legalization is no longer needed.
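To illustrate the JS-side decomposition described above, here is a minimal sketch (not Emscripten's actual helper; the splitI64 and takes_i64 names are made up) of splitting a 64-bit value into the (low, high) pair of signed i32s that a legalized import/export would take:

```js
// Sketch: split a 64-bit integer (held as a BigInt for exactness) into the
// (low, high) signed-i32 pair expected by a legalized wasm function.
function splitI64(value) {
  var lo = Number(BigInt.asIntN(32, value));         // low 32 bits, signed
  var hi = Number(BigInt.asIntN(32, value >> 32n));  // high 32 bits, signed
  return [lo, hi];
}

// So instead of calling a hypothetical exports.takes_i64(x) directly,
// a legalized build ends up calling something like exports.takes_i64(lo, hi).
console.log(splitI64(-5000000000n)); // [-705032704, -2]
```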
I have done some profiling on this. Overall I think it's rare for i64s to make a difference at all to speed.
The point about WASM_BIGINT is a side issue. Let me try to get the conversation back on track to the real problem I wanted to bring up: the issue is that static signature dispatch no longer works. @sbc100 mentioned that it should still work, but it does not (try the code out!). He also mentioned that one should use dynamic signature dispatch instead, but that is much slower. Is there a way that we can get static signature dispatch back? See also the comment in #12732 (comment).
Benchmarking direct wasmTable.get() dispatch:

#include <stdio.h>
#include <emscripten.h>
void foo(int param1, int param2)
{
}
typedef void (*fptr)(int param1, int param2);
EM_JS(int, callFunc, (fptr func, int param1, int param2), {
// Do periodic testing, maybe a JIT only kicks in after having yielded back to main loop?
setInterval(function() {
// var t0 = process.hrtime();
var t0 = performance.now();
for(var i = 0; i < 1000000; ++i) {
wasmTable.get(func)(param1, param2);
wasmTable.get(func)(param1, param2);
wasmTable.get(func)(param1, param2);
wasmTable.get(func)(param1, param2);
wasmTable.get(func)(param1, param2);
wasmTable.get(func)(param1, param2);
wasmTable.get(func)(param1, param2);
wasmTable.get(func)(param1, param2);
}
// var t1 = process.hrtime();
var t1 = performance.now();
// console.log('8 million calls took ' + ((t1[0]-t0[0])*1e3 + (t1[1]-t0[1])/1e6) + ' msecs.');
console.log('8 million calls took ' + (t1-t0) + ' msecs.');
}, 1000);
});
int main()
{
callFunc(foo, 42, 100);
}

yields the following results. Compared to static signature dispatch in Emscripten 2.0.1, going through wasmTable.get() on every call is still several times slower. It is noticeable though that the per-call wasmTable.get() lookup is itself expensive, and I would recommend that generated code avoid re-fetching the table entry on every call.
Interestingly, benchmarking a version that caches the wasmTable.get() result outside the loop:

#include <stdio.h>
#include <emscripten.h>
void foo(int param1, int param2)
{
}
typedef void (*fptr)(int param1, int param2);
EM_JS(int, callFunc, (fptr func, int param1, int param2), {
// Do periodic testing, maybe a JIT only kicks in after having yielded back to main loop?
setInterval(function() {
var t0 = process.hrtime();
// var t0 = performance.now();
var f = wasmTable.get(func);
for(var i = 0; i < 1000000; ++i) {
f(param1, param2);
f(param1, param2);
f(param1, param2);
f(param1, param2);
f(param1, param2);
f(param1, param2);
f(param1, param2);
f(param1, param2);
}
var t1 = process.hrtime();
// var t1 = performance.now();
console.log('8 million calls took ' + ((t1[0]-t0[0])*1e3 + (t1[1]-t0[1])/1e6) + ' msecs.');
// console.log('8 million calls took ' + (t1-t0) + ' msecs.');
}, 1000);
});
int main()
{
callFunc(foo, 42, 100);
}

gives much better results, with Firefox being super-fast. So in cases when one wants to call a function several times instead of just once, binding it beforehand via wasmTable.get() is the way to go.
Wow, thanks for benchmarking that. A 6-15x slowdown over the wasm-side dynCalls is more than I would have imagined. Do you think there are real-world programs that make dynCalls frequently enough to be impacted by this slowdown, though? How common are JS->wasm dynCalls in real-world programs? @kripken, do you think it's worth re-enabling the native dynCalls for optimizing builds?
Very interesting data! Good that you found this @juj, we did not appreciate how big a difference this makes.

It looks like that supports the conclusion that wasmTable.get() is itself slow. It also looks like even in a loop the VM doesn't realize the gets will all return the same thing (which I guess they might not, if the wasm can modify the table, which in the future it might). @jakobkummerow, thoughts on those?

The data also supports that caching the result of wasmTable.get() helps a lot. One possibility is to mirror the table into a normal JS object. We do have full knowledge of when the table is modified (e.g. in addFunction), so we could keep a mirror in sync, something like this:

// A mirror of wasmTable.
var tableGetTable = [];
function tableGet(index) {
// Grow the mirror if we need to.
if (index >= tableGetTable.length) {
tableGetTable.length = index + 1;
}
// Fetch from the table if we need to.
if (!tableGetTable[index]) {
tableGetTable[index] = wasmTable.get(index);
}
return tableGetTable[index];
}

That takes around 100ms for me, which is pretty close to what I get with just direct calls in the loop on a cached function (70ms, which is definitely the upper limit, and not something we can likely achieve in general), and much faster than constantly calling wasmTable.get(). The main downsides to this may be that it adds a little code size (at least a fixed amount, not per-sig), and that it needs some integration with the places that modify the table. Thoughts?
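As a usage sketch of the mirror above (assuming the tableGet() helper just shown and Emscripten's wasmTable global; the dynCall_vi-style names are only illustrative), a generated helper would go through the mirror instead of hitting the table on every call:

```js
// Per-call table lookup (the pattern the benchmarks above show to be slow):
function dynCall_vi_uncached(ptr, a0) {
  return wasmTable.get(ptr)(a0);
}

// Same call routed through the JS-side mirror: the real table is only
// consulted the first time a given index is seen.
function dynCall_vi_mirrored(ptr, a0) {
  return tableGet(ptr)(a0);
}
```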
That sounds like a good solution. Nice that it's actually a perf win over calling into wasm, that is great news. We already maintain some JS-side state alongside the table, so this fits in with that. The code size regression should be balanced by the code size wins we got when we removed the corresponding wasm functions.
@kripken I don't find it surprising that repeated wasmTable.get() calls are slow. So +1 to custom caching when Emscripten knows statically that repeated table accesses are unnecessary.
Just to be clear, the comparison here is between the wasm-side dynCall_vii export and a JS-side wasmTable.get(ptr)(...) call. It seems like the checks and assumptions would logically be the same in both cases, no? In both cases the args have to be converted to wasm types, and in both cases all the table access checks need to be performed.
Looking at this today, this regression still persists and is blocking us from updating. Testing the sample code in the first comment throws an error at runtime.
Would you be able to work on fixing this regression, @sbc100 @kripken?
See my proposal in #12733 (comment) - is that an acceptable solution for you? It should address the speed issue. However, it doesn't automatically make the old dynCall_sig helpers exist again.
The issue is unfortunately not about the speed regression, but about the breakage of the dynCall_sig() usage. Also, it is unclear to me what the replacement API for it is supposed to be.
Thought I would share what I did in Magnum, in the hope it helps resolve this regression. I managed to work around this by doing what … The other attempt that worked was passing …
A direct table access, I think. For example, for the vi signature:

function dynCall_vi(ptr, a0) {
  wasmTable.get(ptr)(a0);
}

And that can be optimized using the getTable caching discussed above. We could perhaps add an option to emit these little helpers for the signatures a project needs.
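For JS library code, the same pattern can also be generated at compile time with the makeDynCall macro. A small sketch using the older one-argument form that appears later in this issue (the library entry and its name are made up for illustration):

```js
// Hypothetical JS library snippet: invoke a C callback of signature 'vii'.
mergeInto(LibraryManager.library, {
  example_invoke_callback: function(funcPtr, a, b) {
    {{{ makeDynCall('vii') }}}(funcPtr, a, b);
  }
});
```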
Yes, I think we can add some kind of backward-compat option where we generate those little JS helpers. The previous behaviour was that binaryen would generate them for all sigs, so we can probably do the same. Is there some way we can opt into this behaviour automatically? Perhaps only do it if we see the string dynCall_ in the project's JS code?
That might be slightly brittle, I worry, and also more complex. An option to opt in to creating them sounds better to me personally.
Perhaps we could generate them by default for a few releases too, and include a debug message if they get used.
@sbc100 I think the functionality has already been removed for a while, so restoring it now for a few releases isn't that helpful? That is, there will always be a range in history without it. That's unfortunate, but it's what we have. I think it's good enough to add a new option going forward. If that sounds good I can work on a PR?
That would be great. Do you want to add the getTable optimization at the same time?
I started working on a PR, and would like to propose a way to solve the different issues that we are seeing. Check out #13289 (comment), and in particular https://github.com/emscripten-core/emscripten/pull/13289/files#diff-7edea7d5dbf970f71db122fcaaf14d6458c48594c1a6366a9ee37216b9b73304R5 for the set of APIs that I propose. Parts of the PR are still WIP/unfinished; I wanted to float this conversation over to you first before spending time finishing it all.
Hi guys, how can I use the new {{{ }}} construct in pre or post JS files? It seems like I can only use it in .jslib files. Thanks!
pre and post JS files don't currently run through our pre-processor. I tried to enable this back in #18525 but I had to revert it because it broke some folks. If you would like this to be enabled, or to have an option to use it, please open a new bug.
I created one here.
I believe these issues were addressed in #13296. Can this issue be closed now?
Yeah, I think so. |
There are at least two code syntax regressions that occurred due to #12059.

1. dynCall_sig(funcPtr, arg1, arg2, ...) no longer works, e.g. when calling a C function pointer from a JS library (a.c / library_a.js; a hypothetical sketch of such code follows after this list). It looks like the intended replacement is the several-orders-of-magnitude-slower dynamic dispatch

dynCall('sig', funcPtr, [arg1, arg2, ...]);

but that is not good enough, and should not be used by anyone unless they really are in a polymorphic situation with their function pointer signature. dynCall() is also currently not available in -s MINIMAL_RUNTIME=1 builds.

2. {{{ makeDynCall('sig') }}}(funcPtr, arg1, arg2, ...); no longer works. Added a test and fix for that in PR #12732 (Fix regression that broke calls to makeDynCall in JS library code).
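As a hypothetical illustration (all names made up; not the original a.c / library_a.js test case), regression 1 is about JS library code like this:

```js
// Hypothetical library_a.js: receives a C function pointer plus an argument
// and invokes it through the static, signature-specific helper.
mergeInto(LibraryManager.library, {
  call_user_callback: function(funcPtr, userData) {
    dynCall_vi(funcPtr, userData); // worked in Emscripten 2.0.1 and older, broken by #12059
  }
});
```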