
[Paddle Inference] refactor linear_compress #55490

Merged
18 commits merged into PaddlePaddle:develop on Aug 22, 2023

Conversation

lizhenyun01
Contributor

@lizhenyun01 lizhenyun01 commented Jul 17, 2023

PR types

Others

PR changes

Others

Description

Refactor the linear_compress API into weight_only_linear and llm_int8_linear, plus quant_for_infer (CPU) for quantizing the weights (a usage sketch follows this list):

  • weight_only_linear: fuses the weight-only int8/int4 GEMM/GEMV computation (int4 GEMV support pending) and automatically selects GEMM or GEMV based on the shape of x
  • llm_int8_linear: llm.int8 GEMM, with add-bias support added
  • quant_for_infer: quantizes the weight into the format required by weight_only / llm.int8
    API design document
    Pcard-74466
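
For illustration, a minimal usage sketch of the refactored APIs. It assumes the final names settled later in this review (weight_quantize with an algo argument) and the paddle.nn.quant import path; the signatures are inferred from the docstrings and examples quoted below, not taken verbatim from the diff:

import paddle
from paddle.nn.quant import weight_quantize, weight_only_linear  # assumed import path

# float16 inputs; randn results are cast afterwards, matching the docs examples in this PR
x = paddle.cast(paddle.randn([1, 2, 64]), dtype='float16')
w = paddle.cast(paddle.randn([64, 32]), dtype='float16')
bias = paddle.cast(paddle.randn([32]), dtype='float16')

# Quantize the weight (on CPU) into the format the fused kernel expects
qw, scale = weight_quantize(w, algo='weight_only_int8')

# Fused weight-only linear; GEMM vs. GEMV is picked from the shape of x
out = weight_only_linear(x, qw, bias=bias, weight_scale=scale, weight_dtype='int8')
print(out.shape)  # [1, 2, 32]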

@paddle-bot

paddle-bot bot commented Jul 17, 2023

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

const float* weight_scale_data = weight_scale.data<float>();
T* out_data = dev_ctx.template Alloc<T>(out);

int64_t m = 1;
Contributor

Not needed?


int64_t m = 1;
int64_t n = 1;
int64_t k = 1;
Contributor

@vivienfanghuagood vivienfanghuagood Jul 18, 2023

Is this declaring an int64_t and then force-casting it? That looks a bit odd.

#endif
}

template <typename T, bool Enable>
Contributor

Enable -> EnableFastGelu may be better?

@@ -3103,6 +3103,48 @@ void QrInferMeta(const MetaTensor& x,
r->set_dtype(x.dtype());
}

void QuantForCompressInferMeta(const MetaTensor& x,
Contributor

Note the naming here: "Quant" already implies "Compress".

from paddle.framework import in_dynamic_mode


def quant_for_compress(x, layout="weight_only_int8"):
Contributor

Same as above: this API name is not very appropriate.

return (out, scale)


def quantized_matmul(
Contributor

Is there still a class-based API?

weight,
bias=None,
weight_scale=None,
quant_method="None",
Contributor

Since weight and bias are involved here, the naming should be Linear rather than matmul.

@paddle-ci-bot

paddle-ci-bot bot commented Jul 27, 2023

Sorry to inform you that the CIs for 5c5a1da passed more than 7 days ago. To prevent PR conflicts, you need to re-run all CIs manually.

@lizhenyun01 lizhenyun01 changed the title refactor linear_compress as quantized_matmul [Paddle Inference] refactor linear_compress Aug 9, 2023
}
}
#else
LOG(ERROR) << "Please compile with cutlass to EnableUseCutlass()";
Contributor

  1. Use PADDLE_THROW(phi::errors::Unimplemented()) instead of LOG(ERROR).
  2. "EnableUseCutlass": please fix the grammar of this message.

Contributor Author

done

Args:
x (Tensor): The input Tensor to be quantized.
layout (str|None): The layout to quantize the Tensor into; must be one of 'weight_only_int8',
'weight_only_int4' and 'llm.int8'. Default: 'weight_only_int8'.
Contributor

  1. "layout" is inaccurate here; weight_only_int8 and the others are quantization types, not layouts.
  2. What is the difference between this API and the separate APIs below?

Contributor Author

The quant_for_infer op was originally used to quantize the weight into the layout required by weight_only and llm.int8; it only processes weights.

Contributor Author

@lizhenyun01 lizhenyun01 Aug 10, 2023

quant_for_infer -> weight_quantize;
layout -> algo
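
Under these renames, a call sketch (the paddle.nn.quant import path is an assumption; the accepted values carry over from the layout docstring above):

import paddle
from paddle.nn.quant import weight_quantize  # assumed final import path

w = paddle.cast(paddle.randn([64, 32]), dtype='float16')
# 'algo' replaces the former 'layout' argument and keeps the same values:
# 'weight_only_int8', 'weight_only_int4', or 'llm.int8'
out, scale = weight_quantize(w, algo='weight_only_int8')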

from paddle import _C_ops
from paddle.fluid.data_feeder import check_variable_and_dtype
from paddle.fluid.layer_helper import LayerHelper
from paddle.framework import in_dynamic_mode
Contributor

Do not import from fluid unless necessary.

Contributor Author

done

@@ -0,0 +1,309 @@
# Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved.
Contributor

Unit tests do not need to go under legacy_test.

Contributor Author

Moved to the quantization directory.

Contributor

@heavyrain-lzy heavyrain-lzy left a comment

LGTM for yaml

Contributor

@jzhang533 jzhang533 left a comment

LGTM for API change
It looks like paddle.nn.LinearCompress was never released, so there is no compatibility concern.

bias = paddle.cast(paddle.randn([32]), dtype='float16')
if paddle.device.cuda.get_device_capability()[0] >= 8:
    out = llm_int8_linear(x, weight, bias=bias, weight_scale=scale, threshold=6.0)
    print(out.shape)  # [1, 2, 32]
Contributor

  • These three paddle.cast calls make the example code look quite awkward.
  • This requires Ampere or newer, right? The docs say CUDA version >= 11.2 is required, but the example code checks compute capability, which is somewhat confusing.

Contributor Author

The compute capability check is also needed, mainly because of CI environment constraints; we will see whether the example can be written more cleanly later.
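
To make both documented requirements explicit, a hedged guard sketch (paddle.version.cuda() returning the build's CUDA version string is an assumption):

import paddle

def llm_int8_supported():
    # Requires CUDA >= 11.2 per the docs, and compute capability >= 8
    # (Ampere or newer) per the example code above.
    if not paddle.is_compiled_with_cuda():
        return False
    major, minor = (int(v) for v in paddle.version.cuda().split('.')[:2])
    if (major, minor) < (11, 2):
        return False
    return paddle.device.cuda.get_device_capability()[0] >= 8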

Contributor

@zhangbo9674 zhangbo9674 left a comment

LGTM for the use of print.

@heavengate heavengate merged commit ffff3da into PaddlePaddle:develop Aug 22, 2023
BeingGod pushed a commit to BeingGod/Paddle that referenced this pull request Sep 9, 2023
* Modify kernels to support quantized_matmul

---------

Co-authored-by: superxf <1208713646@qq.com>
@lizhenyun01 lizhenyun01 deleted the quantized_matmul branch July 18, 2024 09:09