Remove optimization for memory.copy(x, x, C) #3073

MaxGraey · 2020-08-23T16:03:25Z

It seems we couldn't simply remove memory.copy(x, x, C) even if all arguments are side effect free due to x value could exceeds memory bounds.

See: #3038 (comment)

MaxGraey · 2020-08-23T16:04:43Z

Start fuzzing.

MaxGraey · 2020-08-23T16:18:17Z

We can't just fold everything to nop. Unless you are until make sure:

all arguments are side effect free
dest + size < memory.size. But we can check this only in runtime for general case.

Or I miss something?

kripken · 2020-08-23T16:28:38Z

I think this is the implicit trap issue. When the instruction can trap, we can't optimize it out.

What I think we can do here is calculate the side effects of the entire node. Then since we know dst == src we can ignore the read and write of memory. Then we can check if any side effects remain, and only optimize if there are none.

MaxGraey · 2020-08-23T16:35:21Z

What I think we can do here is calculate the side effects of the entire node. Then since we know dst == src we can ignore the read and write of memory. Then we can check if any side effects remain, and only optimize if there are none.

That's sound quite complicated. But I have only one extremely radical solution) Yes, I mean update spec of validation for bulk-memories. If followed by spec there are no check for src == dst with early return. Instead of that it always check:

if (src + size > memory.size || dst + size > memory.size) {
  oob_trap();
}

what if we add one extra check for runtime?

if (src + size > memory.size || dst + size > memory.size) {
  if (src == dst) {
    return nop();
  } else {
    oob_trap();
  }
}

Or perform src == dst check even before (outside) main oop check.

in this case we could safely perform this optimization. But I'm not sure is it make sense) It's pretty radical solution. Especially due to bulk memory already standardized

kripken

Ok, let's land this for now to unbreak things, and consider other options.

kripken · 2020-08-23T19:33:11Z

I don't think it's that complicated myself, but up to you.

A spec change is possible to suggest, I have no idea of the chances there though.

MaxGraey · 2020-08-23T19:38:01Z

I don't think it's that complicated myself, but up to you.

Hmm, I just thought it required precise range check similar to getMaxBits but care about actual value's range. But if you see much more simpler way It will be nice to handle this. Just not sure all of this worth for such infrequent case.

tlively · 2020-08-24T05:20:10Z

Sorry for missing this in review. Changing the bulk memory spec would be infeasible at this point because the semantics of memory.copy have been discussed extensively already and the proposal has already moved to phase 4.

MaxGraey · 2020-08-24T07:24:24Z

Changing the bulk memory spec would be infeasible at this point because the semantics of memory.copy have been discussed extensively already and the proposal has already moved to phase 4

Yes, but we already have precedence with even more drastic change in reference-types on phase 4.

memory.copy have been discussed extensively

Also spec have one special case with skipping bounds check when size of copy bytes is zero: https://github.com/WebAssembly/bulk-memory-operations/blob/996ef3f7f9feaa53fd094050e7baeea4c5c0c361/interpreter/exec/eval.ml#L323

conrad-watt · 2020-08-24T09:21:33Z

More specifically, we already discussed these different possibilities for bulk memory bounds checking explicitly (see discussions linked from WebAssembly/bulk-memory-operations#126).

The motivation for the current semantics is that it's the simplest possible conditional check (i.e. max_offset+len > memory.size). Having a semantics where there's no trap if the size is 0 would require an additional check in some implementations.

~~If it's not possible to eliminate the op in this case, could it be rewritten to a bare byte load at the maximum byte index of the copy?~~
There might be an overflow issue.

EDIT: in the linked implementation, the zero special case is only executed if the bounds check succeeds (i.e. there is no trap; see the shift made here, combined with the when clause of the previous case https://github.com/WebAssembly/bulk-memory-operations/pull/126/files#diff-39718c053592dcf46dfea22c69b94f0eL313).

MaxGraey · 2020-08-24T09:34:40Z

I see and understand the main design goal is reduce runtime checking cost. But if all this special cases will do inside main oop check?

if (src + size > memory.size || dst + size > memory.size) {
  if (src == dst || size == 0) {
    return nop();
  }
  oob_trap();
}

conrad-watt · 2020-08-24T09:44:08Z

I see and understand the main design goal is reduce runtime checking cost. But if all this special cases will do inside main oop check?
if (src + size > memory.size || dst + size > memory.size) {
  if (src == dst || size == 0) {
    return nop();
  }
  oob_trap();
}

Maybe I'm not understanding, but in a runtime implementation with explicit bounds checks, I think those two checks would have to be performed independently of each other?

i.e. memory.copy(x, x, C): x+C could be in-bounds, but you would still need it to be a no-op in order to eliminate it.

MaxGraey · 2020-08-24T10:09:34Z

The main idea make memory.copy(x, x, C) -> nop possible even if x + C out of bound. Similar to memory.copy(x, x, 0) which it seems valid if x is out of bound as well. Or I not right understand how handling case with zero size?

Because if memory.copy(x, y, 0) should also trap if x > memory.size or y > memory.size. We should also remove another one optimization rule

Upd: found this discussion:
WebAssembly/bulk-memory-operations#111

conrad-watt · 2020-08-24T11:53:45Z

Because if memory.copy(x, y, 0) should also trap if x > memory.size or y > memory.size. We should also remove another one optimization rule

That is currently the case. The check is described in the proposal overview.

Upd: found this discussion:
WebAssembly/bulk-memory-operations#111

To give extra context, the current semantics were decided as a result of discussions in WebAssembly/bulk-memory-operations#124 (towards the end).

MaxGraey · 2020-08-24T11:59:34Z

Hmm, it seems in this case we should remove all reductions to nop in memory.copy optimization.
cc @kripken @tlively

conrad-watt · 2020-08-24T12:02:38Z

In the example you linked, would it be possible to preserve the trapping behaviour by replacing the 0-byte copy with a byte load on location max(x,y) + a drop?

MaxGraey · 2020-08-24T12:12:59Z

n the example you linked, would it be possible to preserve the trapping behaviour by replacing the 0-byte copy with a byte load on location max(x,y) + a drop?

It will significantly increase binary size due to we add i32.select, extra 2 local reads, one i32.gt, one load instruction and one drop instruction. So it doesn't make any sense unfortunately

this variant slightly better:

(module
  (memory 0)
  (func (export "a") (param i32 i32)
    local.get 0
    i32.load
    drop
    local.get 1
    i32.load
    drop
  )
)

But also add 1 extra byte compare to

(module
  (memory 0)
  (func (export "a") (param i32 i32)
    local.get 0
    local.get 1
    i32.const 0
    memory.copy
  )
)

conrad-watt · 2020-08-24T12:28:48Z

Ah I see. Do constant zero-byte copies appear often in Wasm code passed to binaryen? The C/C++/Rust -> Wasm stage would have much more flexibility in eliminating them (or even the Wasm -> platform stage in engines).

MaxGraey · 2020-08-24T12:48:53Z

Binaryen also used as prime codegenerator without LLVM for couple of languages. But even for LLVM it quite useful. emscripten, wasm-pack (Rust) and other toolkits or compilers use binaryen in final stage. The main goal of Binaryen is code size reduction. And you could always got mem.copy(x, x, s) or mem.copy(x, y, 0) cases after some optimizations passes like DCE, global / local simplification, inlining and etc.

remove optimization for memory.copy(x, x, C)

5cd2807

kripken approved these changes Aug 23, 2020

View reviewed changes

kripken merged commit 33ccea3 into WebAssembly:master Aug 23, 2020

MaxGraey deleted the fix-memcopy-2 branch August 23, 2020 19:32

MaxGraey mentioned this pull request Aug 24, 2020

Optimize bulk memory.copy #3038

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Remove optimization for memory.copy(x, x, C) #3073

Remove optimization for memory.copy(x, x, C) #3073

MaxGraey commented Aug 23, 2020

MaxGraey commented Aug 23, 2020

MaxGraey commented Aug 23, 2020 •

edited

Loading

kripken commented Aug 23, 2020

MaxGraey commented Aug 23, 2020 •

edited

Loading

kripken left a comment

kripken commented Aug 23, 2020

MaxGraey commented Aug 23, 2020 •

edited

Loading

tlively commented Aug 24, 2020

MaxGraey commented Aug 24, 2020

conrad-watt commented Aug 24, 2020 •

edited

Loading

MaxGraey commented Aug 24, 2020

conrad-watt commented Aug 24, 2020

MaxGraey commented Aug 24, 2020 •

edited

Loading

conrad-watt commented Aug 24, 2020

MaxGraey commented Aug 24, 2020

conrad-watt commented Aug 24, 2020 •

edited

Loading

MaxGraey commented Aug 24, 2020 •

edited

Loading

conrad-watt commented Aug 24, 2020 •

edited

Loading

MaxGraey commented Aug 24, 2020 •

edited

Loading

Remove optimization for memory.copy(x, x, C) #3073

Remove optimization for memory.copy(x, x, C) #3073

Conversation

MaxGraey commented Aug 23, 2020

MaxGraey commented Aug 23, 2020

MaxGraey commented Aug 23, 2020 • edited Loading

kripken commented Aug 23, 2020

MaxGraey commented Aug 23, 2020 • edited Loading

kripken left a comment

Choose a reason for hiding this comment

kripken commented Aug 23, 2020

MaxGraey commented Aug 23, 2020 • edited Loading

tlively commented Aug 24, 2020

MaxGraey commented Aug 24, 2020

conrad-watt commented Aug 24, 2020 • edited Loading

MaxGraey commented Aug 24, 2020

conrad-watt commented Aug 24, 2020

MaxGraey commented Aug 24, 2020 • edited Loading

conrad-watt commented Aug 24, 2020

MaxGraey commented Aug 24, 2020

conrad-watt commented Aug 24, 2020 • edited Loading

MaxGraey commented Aug 24, 2020 • edited Loading

conrad-watt commented Aug 24, 2020 • edited Loading

MaxGraey commented Aug 24, 2020 • edited Loading

MaxGraey commented Aug 23, 2020 •

edited

Loading

MaxGraey commented Aug 23, 2020 •

edited

Loading

MaxGraey commented Aug 23, 2020 •

edited

Loading

conrad-watt commented Aug 24, 2020 •

edited

Loading

MaxGraey commented Aug 24, 2020 •

edited

Loading

conrad-watt commented Aug 24, 2020 •

edited

Loading

MaxGraey commented Aug 24, 2020 •

edited

Loading

conrad-watt commented Aug 24, 2020 •

edited

Loading

MaxGraey commented Aug 24, 2020 •

edited

Loading