
Added support for quantization of fusion_gru #27518

Merged

Conversation

wojtuss

@wojtuss wojtuss commented Sep 23, 2020

PR types

New features

PR changes

OPs

Describe

This patch adds support for INT8 quantization of the fusion_gru op. It includes commits from PR #27481 and provides the rest of the functionality required for #27330.

This patch also adds a test with transformation of a quant GRU model into an int8 model. The saved int8 model can be used for testing accuracy and performance:

ctest -R save_quant2_model_gru -V

Performance benchmarking will make sense only after updating the oneDNN version to a commit with an optimized GRU INT8 primitive, as the current oneDNN version provides only an unoptimized GRU INT8 kernel. The oneDNN version will most probably be updated by the end of this week.

With these changes, INT8 quantization of the fusion_gru op is enabled. However, quantization of all the quantizable operators in the GRU model does not work yet, because other operators like concat do not support quantization with shift yet. For performance reasons it is desirable to have a sequence of quantized operators without dequantization/quantization in between, so support for quantization of the concat op with shift will be implemented as well. A PR with those changes should come by the end of this week as well.
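As a rough illustration of what "quantization with shift" means here (a hedged sketch, not PaddlePaddle or oneDNN code; all names below are illustrative): an asymmetric float range is mapped onto uint8 by adding a zero-point shift, which operators like concat need when their input ranges are not symmetric around zero.

```python
import numpy as np

def quantize_with_shift(x, scale, shift):
    # Shifted (asymmetric) quantization: maps a non-symmetric float
    # range into uint8 [0, 255] via a zero-point shift.
    q = np.round(x * scale + shift)
    return np.clip(q, 0, 255).astype(np.uint8)

def dequantize_with_shift(q, scale, shift):
    return (q.astype(np.float32) - shift) / scale

# Example float range [-0.2, 1.0], width 1.2:
x = np.array([-0.2, 0.0, 0.5, 1.0], dtype=np.float32)
scale = 255.0 / 1.2          # quantization scale for the full range
shift = 0.2 * scale          # zero-point offset for the negative part
q = quantize_with_shift(x, scale, shift)
x_hat = dequantize_with_shift(q, scale, shift)
```

Without the shift, a uint8 quantizer could not represent the negative part of the range, which is why ops whose inputs span asymmetric ranges need shift support before they can join a quantized chain.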

[Update]
The patch now uses an updated oneDNN commit hash containing the optimized GRU INT8 kernel. Here are the benchmark results of the saved GRU INT8 model on CLX 6248:

|                    | fp32    | qat (fp32) | int8    | int8-qat diff | fp32/int8 ratio |
|--------------------|---------|------------|---------|---------------|-----------------|
| Precision          | 0.89211 | 0.89198    | 0.89221 | 0.00023       |                 |
| Recall             | 0.89442 | 0.89449    | 0.89412 | -0.00037      |                 |
| F1 score           | 0.89326 | 0.89323    | 0.89316 | -0.00007      |                 |
| batch latency (ms) | 25.3818 | 27.8914    | 15.9434 |               | 1.59            |

The command for GRU INT8 model benchmarking:

build/paddle/fluid/inference/tests/api/test_analyzer_lexical_analysis \
          --infer_model=build/third_party/inference_demo/quant/GRU_quant2_int8 \
          --infer_data=build/third_party/inference_demo/gru/GRU_eval_data.bin \
          --batch_size=50 \
          --cpu_num_threads=1 \
          --with_accuracy_layer=true \
          --use_analysis=false \
          --iterations=0

For GRU FP32 benchmarking, use the model from http://paddle-inference-dist.bj.bcebos.com/gru/GRU_eval_model_v2.tar.gz

There are still options to improve INT8 performance; we are working on them.

@paddle-bot-old

Thanks for your contribution!
Please wait for the result of CI first. See the Paddle CI Manual for details.

grygielski
grygielski previously approved these changes Sep 23, 2020
Contributor

@grygielski grygielski left a comment


Great work! LGTM

jczaja
jczaja previously approved these changes Sep 24, 2020
Copy link
Contributor

@jczaja jczaja left a comment


LGTM

grygielski
grygielski previously approved these changes Sep 29, 2020
Contributor

@grygielski grygielski left a comment


LGTM

@luotao1
Contributor

luotao1 commented Sep 29, 2020

2020-09-28 19:53:24 ****************
2020-09-28 19:53:25 0. You must have Dianhai approval for change 20+ files or add than 1000+ lines of content.
2020-09-28 19:53:25 There are 1 approved errors.
2020-09-28 19:53:25 ****************

Could you separate this PR, and we can merge them ASAP.

@wojtuss
Author

wojtuss commented Sep 29, 2020

Could you separate this PR, and we can merge them ASAP.

@luotao1
I have rebased to the develop branch. Now the special approval is not required.

@lidanqing-intel
Contributor

@wangzhen-nlp Hi, this is the fusion_gru INT8 implementation. Would you like to review it?

@wojtuss wojtuss requested a review from luotao1 October 1, 2020 06:28
@luotao1 luotao1 merged commit 966447e into PaddlePaddle:develop Oct 1, 2020
@lidanqing-intel
Contributor

@wojtuss Next time we can use fps as the measurement, because batch latency is the latency for one batch, which here is 50 samples.
We also measure bs=1.
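The conversion from per-batch latency to fps (throughput) is straightforward; the helper below is a sketch, not part of the benchmark tool, using the latencies from the table in the PR description:

```python
def fps(batch_latency_ms: float, batch_size: int) -> float:
    # Samples processed per second, derived from per-batch latency.
    return batch_size * 1000.0 / batch_latency_ms

# Latencies from the benchmark table above (batch_size=50):
fp32_fps = fps(25.3818, 50)   # ~1970 samples/s
int8_fps = fps(15.9434, 50)   # ~3136 samples/s
```

Note that the fps ratio equals the inverse latency ratio at a fixed batch size, so it reproduces the 1.59x fp32/int8 speedup from the table.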

5 participants