This repository has been archived by the owner on Dec 22, 2021. It is now read-only.

simdsignunsignedextendedload.md #77

Closed
wants to merge 8 commits

Conversation

rrwinterton

proposes 8, 16, 32 signed and unsigned extended load instructions.

Member

@dtig dtig left a comment


Thanks for creating the PR! Please add the new proposed instructions to SIMD.md in their own section, and include them in BinarySIMD.md with proposed opcode numbers as well. The Spec should be architecture agnostic, so while the details about Intel/ARM instructions are useful, I think they are documented well in the existing documentation in the issue, and don't need to be in the Spec.


Currently as proposed there is an instructions defined in the WASM SIMD ISA as follows.

**i8x16.mul** which is a register to register operation that takes 16 8 bit integers and
Contributor

@gnzlbg gnzlbg May 3, 2019


The instruction multiplies 32 8-bit integers: 16 integers from the first register with 16 integers from the second register, resulting in 16 8-bit integers.


**i8x16.mul** which is a register to register operation that takes 16 8 bit integers and

multiplies them together resulting in an 8 bit value. If the distribution of the integers it flat this
Contributor


typo: it -> is


Also what do you mean with the distribution of the integers being flat? And even if that is the case, does it always result in overflow? e.g., if all integers are 0, then the distribution would be as flat as it gets but no overflow would happen.


multiplies them together resulting in an 8 bit value. If the distribution of the integers it flat this

would result in a large percent of the instructions with overflow. This is a problem for many applications.
Contributor

@gnzlbg gnzlbg May 3, 2019


i8x16.mul implements wrapping integer multiplication, so I don't see why overflow is a problem there (the behavior of these on overflow is well-defined; it's neither an error nor UB), and wrapping multiplication is a very common and desirable operation to provide (the text reads a bit like "wrapping multiply is bad").
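To make the "well-defined wrapping" point concrete, here is a minimal plain-Python sketch of a lane-wise wrapping 8-bit multiply (the function name is hypothetical, chosen for illustration; it models unsigned lanes only):

```python
def mul_wrap_u8(a, b):
    """Lane-wise multiply modulo 256: overflow wraps around,
    it is neither an error nor undefined behavior."""
    return [(x * y) % 256 for x, y in zip(a, b)]

# 200 * 2 = 400 overflows an 8-bit lane and wraps to 144
print(mul_wrap_u8([200, 3], [2, 5]))  # -> [144, 15]
```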


### Proposed new instructions

Six new load instructions are being proposed to make integer multiplies easier. i8x16zxload, i8x16sxload, i16x8zxload, i16x8sxload, i32x4zxload, i32x4sxload. This would make i8, i16, i32 multiplies useful and more practical for applications such as machine learning, image compression and video and rendering data processing.The new instructions would take consecutive integers of the corresponding size and zero sign extend and sign extend the consecutive bytes, words or dword to the promoted size of signed or unsigned data. An example of zero sign extend is shown below: Intel and ARM both have this capability by doing the following:
Contributor


Nowhere does the text say what the proposed instructions actually do, what are their argument types, results, operational semantics, etc.


and unsigned overflow on the data it operates on.

The following is a partial sample example of how sign extended loads are be used in a matrix multiply of 8 bit integers:
Contributor


Please show WASM instead.

* i32x2.zxload
* i8x8.sxload
* i16x4.sxload
* i32x2.sxload
Contributor


All these types are 64-bit wide vector types, but the spec does not define v64 anywhere.

Contributor


I find the document and the issue quite unclear, so I might be completely misunderstanding what problem this is trying to address, but if what you want is to be able load, e.g., 4 x 16-bit integers into a v128 containing 4 x 32-bit integers by sign or zero extending them, then I would prefer something that's more similar to the scalar intrinsics, e.g., like i32x4.load_i16x4_s(memarg) -> v128.

Member

@dtig dtig Jul 31, 2019


I think the intent here is to introduce something equivalent to the intel pmovzx(b/d/q) instruction, so the operation would be as follows -

Packed_Zero_Extend_BYTE_to_WORD(DEST, SRC)
DEST[15:0] <- ZeroExtend(SRC[7:0]);
DEST[31:16] <- ZeroExtend(SRC[15:8]);
DEST[47:32] <- ZeroExtend(SRC[23:16]);
DEST[63:48] <- ZeroExtend(SRC[31:24]);
DEST[79:64] <- ZeroExtend(SRC[39:32]);
DEST[95:80] <- ZeroExtend(SRC[47:40]);
DEST[111:96] <- ZeroExtend(SRC[55:48]);
DEST[127:112] <- ZeroExtend(SRC[63:56]);

So it's not strictly a V128->V128 operation. I do agree that this would benefit from clearer names instead of the zx/sx convention, which is not very readable.
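The pmovzx-style pseudocode above can be simulated in a few lines of plain Python (the helper names here are illustrative only, not proposed instruction names):

```python
def load8x8_zero_extend(mem, offset):
    """Read 8 consecutive bytes and widen each to an unsigned
    16-bit lane (pmovzxbw-style: the source is 64 bits wide,
    the result is a full 8x16-bit vector)."""
    return [b & 0xFF for b in mem[offset:offset + 8]]

def load8x8_sign_extend(mem, offset):
    """Sign-extending variant: each byte is read as signed (-128..127)."""
    return [b - 256 if b >= 128 else b for b in mem[offset:offset + 8]]

data = bytes([0xFF, 1, 2, 3, 4, 5, 6, 7])
print(load8x8_zero_extend(data, 0))  # -> [255, 1, 2, 3, 4, 5, 6, 7]
print(load8x8_sign_extend(data, 0))  # -> [-1, 1, 2, 3, 4, 5, 6, 7]
```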

"pmaddwd %%xmm0, %%xmm3 \n\t"
"paddd %%xmm2, %%xmm4 \n\t"
"paddd %%xmm3, %%xmm5 \n\t"

Contributor


typo: backticks missing

* i32x2.sxload

As a result of these new instructions a multiply can now be done without worrying about signed

Contributor


typo: line break

**i8x16.mul** which is a register to register operation that takes 16 8 bit integers and

multiplies them together resulting in an 8 bit value. If the distribution of the integers it flat this

Contributor


typo: line break

Currently as proposed there is an instructions defined in the WASM SIMD ISA as follows.

**i8x16.mul** which is a register to register operation that takes 16 8 bit integers and

Contributor


typo: line break

@gnzlbg
Contributor

gnzlbg commented May 3, 2019

Please link the issue (#28) and the rendered document in the top-level comment.

Add extended load definitions to SIMD.md
@penzn
Contributor

penzn commented Jul 27, 2019

I believe we have populated the opcodes and instruction descriptions in this PR. One thing to note is that it implements #28 and not #21 (the latter proposes vector-to-vector operations).




* movzxbw
Member


Did you mean to include the packed versions of these instructions (ex: pmovzxbw/pmovsxbw)? Please link the exact instruction you mean if that's not the one.

@@ -0,0 +1,64 @@
### **Proposal WebAssembly SIMD Modification**
Member


Can this document be removed? While this is useful information, I don't see a need for this to be included here.

Contributor


Do you want it to be included in the body of the PR or just simply remove it? @rrwinterton, what do you think?

@@ -666,6 +666,15 @@ natural alignment.

Load a `v128` vector from the given heap address.

Extended loads:
Member


Can this be made clear here that this is not operating on 64-bit widths widening to a 128-bit vector? Having a concise description here will be useful.

Contributor


Do you mean to clarify that we are extending every lane, rather than widening a 64-bit value?

@penzn
Contributor

penzn commented Aug 20, 2019

The description of the change (so that we can delete the file):

Proposed new instructions

Six new load instructions are being proposed to make integer multiplies easier: i8x16.zxload, i8x16.sxload, i16x8.zxload, i16x8.sxload, i32x4.zxload, i32x4.sxload. This would make i8, i16, and i32 multiplies useful and more practical for applications such as machine learning, image compression, and video and rendering data processing. The new instructions would take consecutive integers of the corresponding size and zero-extend or sign-extend the consecutive bytes, words, or dwords to the promoted size of unsigned or signed data. An example of zero extension is shown below; Intel and ARM both have this capability via the following:

Intel Instructions:

  • movzxbw
  • movzxwd
  • movzxdq
  • movsxbw
  • movsxwd
  • movsxdq

ARM Instructions:

  • LDR X0, [X1] Load from the address in X1
  • LDR X0, [X1, #8] Load from address X1 + 8
  • LDR X0, [X1, X2] Load from address X1 + X2
  • LDR X0, [X1, X2, LSL, #3] Load from address X1 + (X2 << 3)
  • LDR X0, [X1, W2, SXTW] Load from address X1 + sign extend(W2)
  • LDR X0, [X1, W2, SXTW, #3] Load from address X1 + (sign extend(W2) << 3)

So the new instructions for WASM would be defined as follows:

  • i8x8.zxload
  • i16x4.zxload
  • i32x2.zxload
  • i8x8.sxload
  • i16x4.sxload
  • i32x2.sxload

As a result of these new instructions a multiply can now be done without worrying about signed and unsigned overflow on the data it operates on.

The following is a partial sample of how sign-extended loads would be used in a matrix multiply of 8-bit integers:

       "pmovzxbw 0x00(%[mem]), %%xmm0\n\t"
       "pshufd $0x00,%%xmm1,%%xmm2     \n\t"
       "pshufd $0x55,%%xmm1,%%xmm3     \n\t"
       "pmaddwd %%xmm0, %%xmm2         \n\t"
       "pmaddwd %%xmm0, %%xmm3         \n\t"
       "paddd %%xmm2, %%xmm4           \n\t"
       "paddd %%xmm3, %%xmm5           \n\t"
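The overflow argument behind the assembly above can be sketched in plain Python: widen the 8-bit lanes first (as the proposed extending loads would), and every product then fits comfortably in a 16-bit lane. The helper name is hypothetical, not a proposed WASM instruction:

```python
def widening_mul_i8(a, b):
    """Sign-extend each 8-bit lane to 16 bits, then multiply.
    The largest possible product magnitude is (-128)*(-128) = 16384,
    which fits in a signed 16-bit lane, so no overflow can occur."""
    sx = lambda v: v - 256 if v >= 128 else v
    return [sx(x) * sx(y) for x, y in zip(a, b)]

# 0xC8 = 200 unsigned = -56 signed; (-56)*(-56) = 3136, no wrap
print(widening_mul_i8(bytes([200, 100]), bytes([200, 2])))  # -> [3136, 200]
```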

penzn added a commit to penzn/simd that referenced this pull request Aug 20, 2019
Clarify what are the inputs and outputs to extended loads, remove the
file with the description (move it to issue comments).

For WebAssembly#77
@AndrewScheidecker
Contributor

These names would be more consistent with the scalar extending load instructions (i.e. i32.load8_s):

| original | alternative 1 | alternative 2 |
| --- | --- | --- |
| i8x8.zxload | i16x8.load8x8_u | i16x8.load8_u |
| i16x4.zxload | i32x4.load16x4_u | i32x4.load16_u |
| i32x2.zxload | i64x2.load32x2_u | i64x2.load32_u |
| i8x8.sxload | i16x8.load8x8_s | i16x8.load8_s |
| i16x4.sxload | i32x4.load16x4_s | i32x4.load16_s |
| i32x2.sxload | i64x2.load32x2_s | i64x2.load32_s |

@tlively
Member

tlively commented Aug 21, 2019

Thanks for those suggestions, @AndrewScheidecker! I personally prefer alternative 1 because they are more explicit about how much memory is actually being loaded.

@penzn
Contributor

penzn commented Aug 21, 2019

I like that scheme as well, incorporated in rrwinterton#4

@penzn penzn mentioned this pull request Aug 22, 2019
penzn pushed a commit to penzn/simd that referenced this pull request Aug 28, 2019
Incorporate review suggestions from WebAssembly#77
penzn pushed a commit to penzn/simd that referenced this pull request Aug 28, 2019
Incorporate review suggestions from WebAssembly#77
@penzn penzn mentioned this pull request Aug 28, 2019
@dtig dtig closed this Sep 13, 2019