
QuantizeLinear acceleration #2967

Merged · 42 commits · Oct 11, 2024

Conversation

@AlexandreEichenberger commented Oct 4, 2024

Simple change that computes the reciprocal needed for the scale factor outside of the inner loop, roughly cutting the time of QuantizeLinear in half.

The default is off; this can be enabled with the -O3 -enable-fast-math options, which at this time only affect the reciprocal computation for QuantizeLinear and DynamicQuantizeLinear.

Added a lit test with this option on.

At some point, we may want to turn this on by default, but it currently breaks 2 backend tests (because the values sit right at the border between 2 quantized values). I opened an issue in ONNX to explore whether we can fix it at the source. [ https://github.com/onnx/onnx/issues/6433 ]
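The transformation can be sketched in NumPy as follows. This is a hedged illustration of the idea, not the actual onnx-mlir lowering; the function names and the uint8 output range are assumptions for the sketch:

```python
import numpy as np

def quantize_div(x, scale, zero_point):
    # Baseline: one division per element inside the (vectorized) loop.
    return np.clip(np.rint(x / scale) + zero_point, 0, 255).astype(np.uint8)

def quantize_fast(x, scale, zero_point):
    # Fast-math variant: compute the reciprocal once, outside the loop,
    # and replace each per-element division with a multiplication.
    one_over_scale = 1.0 / scale
    return np.clip(np.rint(x * one_over_scale) + zero_point, 0, 255).astype(np.uint8)
```

A hardware divide is typically much slower than a multiply, which is where the roughly 2x win comes from. The catch is that x * (1/scale) is not bit-identical to x / scale, which is why the change is gated behind -enable-fast-math.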

Signed-off-by: Alexandre Eichenberger <alexe@us.ibm.com>
@tungld commented Oct 7, 2024

It seems there is precision loss in the backend test for DynamicQuantizeLinear. I see this in the Jenkins s390x build:

ref_outputs = [array([153, 255,   0,  26, 221, 179], dtype=uint8), array(0.01960784, dtype=float32), array(153, dtype=uint8)]
outputs = [array([153, 255,   0,  25, 221, 179], dtype=uint8), array(0.01960784, dtype=float32), array(153, dtype=uint8)]
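The off-by-one difference (26 vs 25) is consistent with x * (1/scale) landing just below a round-to-nearest-even boundary where x / scale lands on or above it. A minimal sketch: the scale is taken from the failing output above (~1/51); the input value 0.5 is a hypothetical example, not necessarily the one in the test:

```python
import numpy as np

s = np.float32(0.01960784)               # scale from the output above (~1/51)
x = np.float32(0.5)                      # hypothetical input near a .5 boundary

exact = np.rint(x / s)                   # reference path: divide
fast = np.rint(x * (np.float32(1) / s))  # fast-math path: multiply by reciprocal

# x / s lands essentially on 25.5, so the two paths can disagree by one
# quantization step, which is the kind of off-by-one seen in the Jenkins diff.
print(exact, fast)
```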

@AlexandreEichenberger

Yes, and I wonder if we should attempt to change the test in ONNX to enable this optimization.

Signed-off-by: Alexandre Eichenberger <alexe@us.ibm.com>
    !DISABLE_FAST_MATH_FOR_QL && isa<FloatType>(inputElementType);
    if (useOneOverScale) {
      Value one = create.math.constant(inputElementType, 1.0);
      oneOverScale = create.math.div(one, scale);
@tungld commented Oct 8, 2024

I wonder whether we can get slightly better performance if we prepare a vector here instead of a scalar. I guess that in the loop, create.math.mul(x, oneOverScale); will splat oneOverScale, whose type is scalar.

@AlexandreEichenberger commented Oct 8, 2024

Good idea, I can check if the splat is migrated outside of the loop.

@AlexandreEichenberger commented

No splats in the innermost loop; they are hoisted out:

    scf.for %arg3 = %c0_6 to %c65536_7 step %c16 {
      %8 = vector.load %reshape[%arg3] : memref<65536xf32>, vector<16xf32>
      %9 = arith.mulf %8, %6 : vector<16xf32>
      %10 = vector.shape_cast %9 : vector<16xf32> to vector<4x4xf32>
      %11 = vector.extract %10[0] : vector<4xf32> from vector<4x4xf32>
      %12 = "krnl.round_even"(%11) : (vector<4xf32>) -> vector<4xf32>
      %13 = vector.insert %12, %10 [0] : vector<4xf32> into vector<4x4xf32>
      %14 = vector.extract %10[1] : vector<4xf32> from vector<4x4xf32>
      %15 = "krnl.round_even"(%14) : (vector<4xf32>) -> vector<4xf32>
      %16 = vector.insert %15, %13 [1] : vector<4xf32> into vector<4x4xf32>
      %17 = vector.extract %10[2] : vector<4xf32> from vector<4x4xf32>
      %18 = "krnl.round_even"(%17) : (vector<4xf32>) -> vector<4xf32>
      %19 = vector.insert %18, %16 [2] : vector<4xf32> into vector<4x4xf32>
      %20 = vector.extract %10[3] : vector<4xf32> from vector<4x4xf32>
      %21 = "krnl.round_even"(%20) : (vector<4xf32>) -> vector<4xf32>
      %22 = vector.insert %21, %19 [3] : vector<4xf32> into vector<4x4xf32>
      %23 = vector.shape_cast %22 : vector<4x4xf32> to vector<16xf32>
      %24 = arith.addf %23, %7 : vector<16xf32>
      %25 = arith.maxnumf %24, %cst_0 : vector<16xf32>
      %26 = arith.minnumf %25, %cst : vector<16xf32>
      %27 = arith.fptoui %26 : vector<16xf32> to vector<16xi32>
      %28 = arith.trunci %27 : vector<16xi32> to vector<16xi8>
      %29 = builtin.unrealized_conversion_cast %28 : vector<16xi8> to vector<16xui8>
      vector.store %29, %reshape_5[%arg3] : memref<65536xui8>, vector<16xui8>
    }

@tungld commented

Great! Thanks!

Signed-off-by: Alexandre Eichenberger <alexe@us.ibm.com>
@AlexandreEichenberger

FYI, on z16 the time went from 95us to 14us when using -enable-fast-math in combination with the HW instruction for rounding. Without the HW instruction, -enable-fast-math results in 37us.
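For reference, the reported timings work out to the following speedups over the 95us baseline (simple arithmetic on the numbers above):

```python
baseline_us = 95.0  # reported z16 time before the change
fast_hw_us = 14.0   # -enable-fast-math + HW round instruction
fast_sw_us = 37.0   # -enable-fast-math without HW round

print(f"speedup with HW round: {baseline_us / fast_hw_us:.1f}x")     # 6.8x
print(f"speedup without HW round: {baseline_us / fast_sw_us:.1f}x")  # 2.6x
```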

@AlexandreEichenberger

@tungld it's ready for another review; I made the flag optional for the moment.

@tungld left a review

LGTM!

@tungld tungld merged commit c3dbcf8 into onnx:main Oct 11, 2024
7 checks passed
@jenkins-droid

Jenkins Linux amd64 Build #15832 [push] QuantizeLinear accelerat... started at 04:33

@jenkins-droid

Jenkins Linux ppc64le Build #14862 [push] QuantizeLinear accelerat... started at 05:45

@jenkins-droid

Jenkins Linux s390x Build #15835 [push] QuantizeLinear accelerat... started at 05:33

@jenkins-droid

Jenkins Linux amd64 Build #15832 [push] QuantizeLinear accelerat... passed after 1 hr 13 min

@jenkins-droid

Jenkins Linux s390x Build #15835 [push] QuantizeLinear accelerat... passed after 1 hr 36 min

@jenkins-droid

Jenkins Linux ppc64le Build #14862 [push] QuantizeLinear accelerat... passed after 2 hr 23 min

3 participants