Added support for quantization of fusion_gru #27518

wojtuss · 2020-09-23T12:58:44Z

PR types

New features

PR changes

OPs

Describe

This patch adds support for INT8 quantization of the fusion_gru op. It includes commits from PR #27481 and provides the rest of the functionality required for #27330.

This patch also adds a test that transforms a quant GRU model into an INT8 model. The saved INT8 model can be used for testing accuracy and performance:

ctest -R save_quant2_model_gru -V

Performance benchmarking will make sense only after the oneDNN commit is bumped to a version with an optimized GRU INT8 primitive, as the current oneDNN version provides only an unoptimized GRU INT8 kernel. The oneDNN version will most probably be updated by the end of this week.

With these changes, INT8 quantization of the fusion_gru op is enabled. However, quantizing all the quantizable operators in the GRU model does not work yet, because other operators, such as concat, do not yet support quantization with shift. For performance reasons it is desirable to have a sequence of quantized operators without dequantization/quantization in between, so support for quantization of the concat op with shift will be implemented as well. A PR with those changes should also come by the end of this week.
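To illustrate what "quantization with shift" refers to, here is a minimal sketch. It assumes the common convention `q = round(x * scale) + shift`, where the shift moves signed values into the unsigned 8-bit range. The function names, the scale of 127 and the shift of 128 are hypothetical and chosen for illustration only; the actual Paddle and oneDNN implementation may differ.

```python
def quantize_with_shift(values, scale, shift=128, qmin=0, qmax=255):
    """Map floats to unsigned 8-bit integers: q = clamp(round(x * scale) + shift)."""
    return [max(qmin, min(qmax, round(v * scale) + shift)) for v in values]


def dequantize_with_shift(qvalues, scale, shift=128):
    """Invert the mapping above: x ~= (q - shift) / scale."""
    return [(q - shift) / scale for q in qvalues]


# Hypothetical signed input in [-1, 1]; scale and shift are illustrative assumptions.
data = [-1.0, -0.5, 0.0, 0.5, 1.0]
scale = 127.0
quantized = quantize_with_shift(data, scale)
restored = dequantize_with_shift(quantized, scale)

for x, q, r in zip(data, quantized, restored):
    print(f"{x:+.3f} -> {q:3d} -> {r:+.4f}")
```

If an operator placed between two quantized operators cannot consume shifted data, the values have to be dequantized and quantized again around it, which is the overhead the paragraph above aims to avoid.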

[Update]
The patch now uses an updated oneDNN commit hash that contains the optimized GRU INT8 kernel. Here are the benchmark results of the saved GRU INT8 model on CLX 6248:

| Metric | fp32 | qat (fp32) | int8 | int8-qat diff | fp32/int8 ratio |
|---|---|---|---|---|---|
| Precision | 0.89211 | 0.89198 | 0.89221 | 0.00023 | |
| Recall | 0.89442 | 0.89449 | 0.89412 | -0.00037 | |
| F1 score | 0.89326 | 0.89323 | 0.89316 | -0.00007 | |
| Batch latency (ms) | 25.3818 | 27.8914 | 15.9434 | | 1.59 |

The command for GRU INT8 model benchmarking:

build/paddle/fluid/inference/tests/api/test_analyzer_lexical_analysis \
          --infer_model=build/third_party/inference_demo/quant/GRU_quant2_int8 \
          --infer_data=build/third_party/inference_demo/gru/GRU_eval_data.bin \
          --batch_size=50 \
          --cpu_num_threads=1 \
          --with_accuracy_layer=true \
          --use_analysis=false \
          --iterations=0

For GRU FP32, use the model from http://paddle-inference-dist.bj.bcebos.com/gru/GRU_eval_model_v2.tar.gz

There are still options for improving INT8 performance, and we are working on them.

paddle-bot-old · 2020-09-23T12:59:14Z

Thanks for your contribution!
Please wait for the CI result first. See the Paddle CI Manual for details.

grygielski

Great work! LGTM

jczaja

LGTM

use faster data format

grygielski

LGTM

luotao1 · 2020-09-29T09:29:31Z

2020-09-28 19:53:24 ****************
2020-09-28 19:53:25 0. You must have Dianhai approval for change 20+ files or add than 1000+ lines of content.
2020-09-28 19:53:25 There are 1 approved errors.
2020-09-28 19:53:25 ****************

Could you split this PR so that we can merge the parts ASAP?

wojtuss · 2020-09-29T10:53:30Z

Could you split this PR so that we can merge the parts ASAP?

@luotao1
I have rebased onto the develop branch, so the special approval is no longer required.

lidanqing-intel · 2020-09-29T13:28:41Z

@wangzhen-nlp Hi, this is the fusion_gru INT8 implementation. Would you like to review it?

lidanqing-intel · 2020-10-14T12:33:01Z

@wojtuss Next time we can use FPS as the measurement, because batch latency is the latency for one whole batch, which here is 50 samples.
We also measure with bs=1.
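As a rough illustration of that suggestion, the batch latencies reported in the benchmark table can be converted into throughput. This is only a sketch: the helper name is hypothetical, and it assumes that FPS here means samples per second and that every batch holds exactly 50 samples, matching `--batch_size=50` in the benchmark command.

```python
def fps_from_batch_latency(batch_latency_ms, batch_size):
    """Convert the latency of one batch (in ms) into samples per second."""
    return batch_size * 1000.0 / batch_latency_ms


# Batch latencies (ms) copied from the benchmark table in the PR description.
BATCH_SIZE = 50
latencies_ms = {"fp32": 25.3818, "qat (fp32)": 27.8914, "int8": 15.9434}

fps = {name: fps_from_batch_latency(ms, BATCH_SIZE) for name, ms in latencies_ms.items()}
for name, value in fps.items():
    print(f"{name}: {value:.1f} samples/s")
print(f"int8/fp32 throughput ratio: {fps['int8'] / fps['fp32']:.2f}")
```

The throughput ratio equals the fp32/int8 latency ratio, since the batch size cancels out. Unlike batch latency, FPS also stays comparable between runs with different batch sizes, such as bs=1.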

wojtuss added Intel int8 labels Sep 23, 2020

wojtuss requested review from jczaja and grygielski September 23, 2020 12:58

grygielski previously approved these changes Sep 23, 2020


wojtuss dismissed grygielski’s stale review via 92bf678 September 24, 2020 12:23

jczaja previously approved these changes Sep 24, 2020


wojtuss dismissed jczaja’s stale review via 28356f6 September 28, 2020 11:52

wojtuss force-pushed the wojtuss/fusion_gru_quantization branch from 92bf678 to 28356f6 Compare September 28, 2020 11:52

paddle-bot-old bot referenced this pull request Sep 28, 2020

Add support for (de/re)quantization with shift

842c9ee

Wojciech Uss added 4 commits September 29, 2020 05:51

Added support for quantization of fusion_gru

1eeb193

reverted clang_format version change

e0e7382

fix code format error

90cd9cc

use oneDNN GRU INT8 optimized version

0d4fc6d

use faster data format

grygielski previously approved these changes Sep 29, 2020


wojtuss closed this Sep 29, 2020

wojtuss reopened this Sep 29, 2020

wojtuss dismissed grygielski’s stale review via 0d4fc6d September 29, 2020 10:51

wojtuss force-pushed the wojtuss/fusion_gru_quantization branch from 28356f6 to 0d4fc6d Compare September 29, 2020 10:51

wojtuss requested a review from luotao1 October 1, 2020 06:28

luotao1 approved these changes Oct 1, 2020


luotao1 merged commit 966447e into PaddlePaddle:develop Oct 1, 2020
