
rnn/lstm/gru dynamic quantization #5435

Merged: 60 commits merged into Tencent:master on May 8, 2024
Conversation

@nihui (Member) commented Apr 18, 2024

  • rnn
  • rnn-arm
  • lstm
  • lstm-arm
  • lstm-x86
  • gru
  • gru-arm
  • fix over load s8
  • coverage
  • doc
  • speed test
  • rnn aq
  • rnn-arm aq
  • lstm aq
  • lstm-arm aq
  • lstm-x86 aq
  • gru aq
  • gru-arm aq
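The task list above covers dynamic quantization for each RNN variant and backend: weights are stored as int8 with per-output-row scales, while activation scales are computed on the fly from each input. A minimal pure-Python sketch of that general scheme, purely illustrative (helper names like `quantize_rows` are made up and this is not ncnn's actual kernel code):

```python
def quantize_rows(weight):
    """Symmetric per-row int8 quantization: returns int8 rows and their scales."""
    q_rows, scales = [], []
    for row in weight:
        absmax = max(abs(v) for v in row) or 1.0
        scale = 127.0 / absmax
        q_rows.append([max(-127, min(127, round(v * scale))) for v in row])
        scales.append(scale)
    return q_rows, scales

def dynamic_quant_matvec(q_rows, scales, x):
    """int8 matrix-vector product with a dynamically computed activation scale."""
    absmax = max(abs(v) for v in x) or 1.0
    x_scale = 127.0 / absmax  # computed at inference time, per input
    qx = [max(-127, min(127, round(v * x_scale))) for v in x]
    # int32 accumulation, then dequantize with both weight and activation scales
    return [sum(qw * qv for qw, qv in zip(row, qx)) / (s * x_scale)
            for row, s in zip(q_rows, scales)]

# toy check: the quantized matvec approximates the float matvec
W = [[0.5, -1.0], [2.0, 0.25]]
x = [0.3, -0.7]
ref = [sum(w * v for w, v in zip(row, x)) for row in W]
q_rows, scales = quantize_rows(W)
approx = dynamic_quant_matvec(q_rows, scales, x)
```

This is why the int8 model halves on disk (weights shrink from fp16 to int8 plus a few scales) while accuracy stays close to fp32, as the MAE table below shows.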

@github-actions bot added the x86 label Apr 22, 2024
@nihui changed the title from [WIP] rnn/lstm/gru weight only quantization to rnn/lstm/gru weight only quantization Apr 24, 2024
@nihui changed the title from rnn/lstm/gru weight only quantization to [WIP] rnn/lstm/gru weight only quantization Apr 24, 2024
@nihui changed the title from [WIP] rnn/lstm/gru weight only quantization to [WIP] rnn/lstm/gru dynamic quantization Apr 28, 2024
@nihui (Member, Author) commented May 7, 2024

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import pnnx

class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()

        self.rnn = nn.RNN(input_size=256, hidden_size=256, num_layers=30)
        self.lstm = nn.LSTM(input_size=256, hidden_size=256, num_layers=30)
        self.gru = nn.GRU(input_size=256, hidden_size=256, num_layers=30)

    def forward(self, x):
        out0, _ = self.rnn(x)
        out1, _ = self.lstm(x)
        out2, _ = self.gru(x)
        return out0, out1, out2

net = Model().half().float()
net.eval()

torch.manual_seed(0)
x = torch.rand(300, 1, 256)

pnnx.export(net, "rnn.pt", x)
```
```shell
ncnn2int8 rnn.ncnn.param rnn.ncnn.bin rnn-int8.ncnn.param rnn-int8.ncnn.bin /dev/null
```
| rnn.ncnn.bin / rnn-int8.ncnn.bin | fp16 | int8 |
| --- | --- | --- |
| model size | 60.1M | 30.6M |

| qcom855plus MAE | fp32 | fp16 | int8 |
| --- | --- | --- | --- |
| 30-layer rnn | 0 | 2.29E-08 | 7.31E-08 |
| 30-layer lstm | 0 | 4.39E-09 | 5.54E-09 |
| 30-layer gru | 0 | 6.75E-09 | 1.96E-08 |

| qcom855plus single-thread time | fp32 | fp16 | int8 |
| --- | --- | --- | --- |
| 30-layer rnn | 45.16 | 24.81 | 19.87 |
| 30-layer lstm | 256.51 | 121.99 | 60.7 |
| 30-layer gru | 167.52 | 94.68 | 46.29 |

| i5-12400 single-thread time, 30-layer lstm-int8 model | |
| --- | --- |
| naive (sse2) | 95.24 |
| sse2 | 87.02 |
| avx | 64.85 |
| avx2 | 42.22 |
| avxvnni | 23.24 |
| avx512 | 27.95 |
| avx512vnni | 15.8 |
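The MAE rows above measure how closely the fp16 and int8 paths track the fp32 reference output. For readers reproducing the table, mean absolute error is simply the average element-wise absolute difference; a tiny sketch (hypothetical helper, not an ncnn API):

```python
def mae(a, b):
    """Mean absolute error between two equal-length output sequences."""
    assert len(a) == len(b)
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

# toy example: identical outputs give MAE 0, as in the fp32 column above
ref = [0.10, -0.20, 0.30]
out = [0.10, -0.20, 0.30]
```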

@nihui (Member, Author) commented May 8, 2024

| imx6d single-thread time | fp32 | int8 |
| --- | --- | --- |
| 30-layer rnn | 1392.22 | 504.83 |
| 30-layer lstm | 6063.91 | 1833.46 |
| 30-layer gru | 4357.59 | 1300.93 |

@nihui changed the title from [WIP] rnn/lstm/gru dynamic quantization to rnn/lstm/gru dynamic quantization May 8, 2024
@github-actions bot added the doc label May 8, 2024
@nihui merged commit 08b7d99 into Tencent:master May 8, 2024
64 of 66 checks passed