Remove optimization for memory.copy(x, x, C) #3073
Start fuzzing. |
We can't just fold everything to nop, unless you can make sure:
Or am I missing something? |
I think this is the implicit trap issue. When the instruction can trap, we can't optimize it out. What I think we can do here is calculate the side effects of the entire node. Then since we know |
That sounds quite complicated. I have only one, extremely radical, solution: updating the validation rules in the bulk memory spec. If we follow the spec, the check is:
if (src + size > memory.size || dst + size > memory.size) {
  oob_trap();
}
What if we add one extra check at runtime?
if (src + size > memory.size || dst + size > memory.size) {
  if (src == dst) {
    return nop();
  } else {
    oob_trap();
  }
}
In that case we could safely perform this optimization. But I'm not sure whether it makes sense. It's a pretty radical solution, especially since bulk memory is already standardized. |
Ok, let's land this for now to unbreak things, and consider other options.
I don't think it's that complicated myself, but up to you. A spec change is possible to suggest, I have no idea of the chances there though. |
Hmm, I just thought it required a precise range check, similar to getMaxBits but concerned with the actual range of the value. But if you see a much simpler way, it would be nice to handle this. I'm just not sure all of this is worth it for such an infrequent case. |
Sorry for missing this in review. Changing the bulk memory spec would be infeasible at this point because the semantics of memory.copy have been discussed extensively already and the proposal has already moved to phase 4. |
Yes, but we already have a precedent with an even more drastic change to reference-types at phase 4.
Also, the spec has one special case that skips the bounds check when the number of bytes to copy is zero: https://github.com/WebAssembly/bulk-memory-operations/blob/996ef3f7f9feaa53fd094050e7baeea4c5c0c361/interpreter/exec/eval.ml#L323 |
More specifically, we already discussed these different possibilities for bulk memory bounds checking explicitly (see discussions linked from WebAssembly/bulk-memory-operations#126). The motivation for the current semantics is that it's the simplest possible conditional check (i.e.
EDIT: in the linked implementation, the zero special case is only executed if the bounds check succeeds (i.e. there is no |
I see, and I understand that the main design goal is to reduce runtime checking cost. But what if all these special cases were handled inside the main OOB check?
if (src + size > memory.size || dst + size > memory.size) {
  if (src == dst || size == 0) {
    return nop();
  }
  oob_trap();
} |
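To make the difference between the two checks concrete, here is a minimal Python sketch. It is only a model, not engine or spec code: `Trap`, `copy_current`, `copy_proposed`, and `traps` are hypothetical names introduced for illustration, memory is a plain `bytearray`, and addresses are assumed to be non-negative integers. `copy_current` follows the check quoted in the thread; `copy_proposed` nests the `src == dst` and `size == 0` special cases inside the out-of-bounds branch, so the in-bounds path still performs a single comparison.

```python
class Trap(Exception):
    """Stands in for a WebAssembly out-of-bounds trap (illustrative only)."""


def copy_current(mem: bytearray, dst: int, src: int, size: int) -> None:
    # Check as quoted in the thread: trap if either range runs past the end.
    if src + size > len(mem) or dst + size > len(mem):
        raise Trap("out of bounds")
    # The right-hand slice is copied first, so overlapping ranges are safe.
    mem[dst:dst + size] = mem[src:src + size]


def copy_proposed(mem: bytearray, dst: int, src: int, size: int) -> None:
    # Variant suggested in the thread: special cases live in the slow path only.
    if src + size > len(mem) or dst + size > len(mem):
        if src == dst or size == 0:
            return  # behaves like a nop instead of trapping
        raise Trap("out of bounds")
    mem[dst:dst + size] = mem[src:src + size]


def traps(fn, mem: bytearray, dst: int, src: int, size: int) -> bool:
    """Return True if calling fn with these arguments raises Trap."""
    try:
        fn(mem, dst, src, size)
    except Trap:
        return True
    return False
```

With a 16-byte memory, `traps(copy_current, bytearray(16), 20, 20, 4)` is true while the same call with `copy_proposed` is false. That gap is exactly why folding `memory.copy(x, x, C)` to a nop is unsound under the current check but would be sound under the proposed one.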
Maybe I'm not understanding, but in a runtime implementation with explicit bounds checks, I think those two checks would have to be performed independently of each other? i.e. |
The main idea make Because if Upd: found this discussion: |
That is currently the case. The check is described in the proposal overview.
To give extra context, the current semantics were decided as a result of discussions in WebAssembly/bulk-memory-operations#124 (towards the end). |
In the example you linked, would it be possible to preserve the trapping behaviour by replacing the 0-byte copy with a byte load on location |
It will significantly increase binary size due to we add this variant slightly better:
(module
  (memory 0)
  (func (export "a") (param i32 i32)
    local.get 0
    i32.load
    drop
    local.get 1
    i32.load
    drop
  )
)
But it also adds 1 extra byte compared to:
(module
  (memory 0)
  (func (export "a") (param i32 i32)
    local.get 0
    local.get 1
    i32.const 0
    memory.copy
  )
) |
Ah I see. Do constant zero-byte copies appear often in Wasm code passed to binaryen? The C/C++/Rust -> Wasm stage would have much more flexibility in eliminating them (or even the Wasm -> platform stage in engines). |
Binaryen is also used as the primary code generator, without LLVM, for a couple of languages. But even with LLVM it is quite useful: Emscripten, wasm-pack (Rust), and other toolkits or compilers use Binaryen in the final stage. The main goal of Binaryen is code size reduction. And you could always get |
It seems we can't simply remove
memory.copy(x, x, C)
even if all arguments are side-effect free, because the value of x
could exceed the memory bounds. See: #3038 (comment)
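
One way an optimizer could keep a restricted form of this fold is to apply it only when the copy provably cannot trap. The Python sketch below illustrates that idea under stated assumptions and is not Binaryen's implementation: `can_drop_self_copy` and its parameters are hypothetical, `None` stands for an operand whose value is not a known constant, and addresses are assumed to be non-negative. It relies on the fact that a WebAssembly memory never shrinks, so its declared minimum size in pages is a lower bound on its size at runtime.

```python
from typing import Optional

PAGE_SIZE = 65536  # WebAssembly page size in bytes


def can_drop_self_copy(addr: Optional[int], size: Optional[int],
                       min_pages: int) -> bool:
    """Return True only when memory.copy(addr, addr, size) cannot trap.

    addr and size are compile-time constants, or None when unknown
    (hypothetical representation chosen for this sketch).
    """
    if addr is None or size is None:
        # Unknown operands might be out of bounds, so the copy must stay.
        return False
    # Memory can only grow, so the declared minimum bounds the check.
    return addr + size <= min_pages * PAGE_SIZE
```

For example, `can_drop_self_copy(0, 8, 1)` holds, while `can_drop_self_copy(65536, 1, 1)` does not, because the single byte at address 65536 lies just past a one-page memory.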