asr-sheep1/DRB_method
The Transformer model has achieved great success in speech recognition thanks to its powerful contextual modeling capability. However, the Transformer decoder attends to encoder output features that carry redundant information, which limits further improvement of decoding speed and hinders wider deployment of the model. We therefore propose a Transformer decoding acceleration method that compresses the acoustic feature sequence, called Discarding Redundant Blocks (DRB). It uses the spike sequence produced by connectionist temporal classification (CTC) to remove consecutive redundant blank frames from the encoder output features, shortening the feature sequence the decoder consumes, thereby reducing decoder computation and improving decoding speed.

DRB is applicable to speech recognition models with a CTC/AED structure: only a fine-tuning step on the original model is needed to obtain a significant decoding speedup, with little loss of recognition accuracy. Our method is implemented on top of the excellent open-source speech recognition toolkit WeNet and has been validated on the open-source Mandarin dataset AISHELL-1.

At present we release only the core code of the DRB method, together with training configurations for two different datasets. A complete pipeline, experimental results, and code/weight files will follow. See the weight_file directory for Baidu Netdisk links to the weight data; uploads are ongoing.

A more detailed and complete description of the method: https://link.cnki.net/doi/10.19678/j.issn.1000-3428.0065685
Official WeNet repository: https://github.com/wenet-e2e/wenet (this work uses the fork under this account)

How to use DRB:

Training:
1. Train a CTC/AED model to convergence.
2. Freeze the encoder and the CTC classifier, apply DRB to the encoder output features, and fine-tune only the decoder until the model converges.

Inference:
- Two-pass rescoring: this decoding method offers higher accuracy and recognition speed in the CTC/AED model structure, so applying DRB to it yields a significant GPU/CPU decoding speedup.
- Conventional attention-based autoregressive decoding: after DRB removes the redundancy from the encoder output, a significant decoding speedup is obtained on CPU devices at only a small cost in accuracy.

Minimal code sketches of these steps are given below.
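Below is a minimal sketch of the core DRB operation, written for illustration rather than taken from the released code. It assumes per-frame CTC posteriors are available alongside the encoder output, and it reads "removing consecutive redundant blank frames" as collapsing every run of blank frames down to a single frame; the function name drb_compress and the blank_id convention are our assumptions.

```python
import torch

def drb_compress(encoder_out: torch.Tensor,
                 ctc_probs: torch.Tensor,
                 blank_id: int = 0) -> torch.Tensor:
    """Drop consecutive redundant blank frames from one utterance.

    encoder_out: (T, D) encoder output features
    ctc_probs:   (T, V) frame-level CTC posteriors
    blank_id:    vocabulary index of the CTC blank symbol
    Returns a compressed feature sequence of shape (T', D), T' <= T.
    """
    # Frame-level CTC decisions: non-blank "spikes" carry the content,
    # while long runs of blanks are mostly redundant for the decoder.
    frame_labels = ctc_probs.argmax(dim=-1)      # (T,)
    is_blank = frame_labels.eq(blank_id)         # (T,) bool

    # Keep every non-blank frame, and only the first frame of each
    # run of consecutive blanks.
    keep = torch.ones_like(is_blank)
    keep[1:] = ~(is_blank[1:] & is_blank[:-1])
    return encoder_out[keep]
```

On typical utterances, blank frames dominate the CTC alignment, so T' is much smaller than T; since decoder cross-attention cost scales with the encoder sequence length, this directly cuts decoder computation.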
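For the fine-tuning step, the key point is that only decoder parameters are updated. A minimal sketch, assuming a model object with encoder, ctc, and decoder submodules (WeNet's ASRModel uses similar attribute names, but verify against your version; the learning rate is an arbitrary placeholder):

```python
import torch

# Freeze the encoder and the CTC classifier; only the decoder is tuned.
for module in (model.encoder, model.ctc):
    module.eval()  # disables dropout in the frozen parts; call again
                   # after any model.train() at the start of an epoch
    for p in module.parameters():
        p.requires_grad = False

# Optimize only the parameters that remain trainable (the decoder).
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4)
```

During this fine-tuning, the decoder is trained on DRB-compressed encoder outputs (as in the sketch above) so that it learns to attend over the shortened sequences.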
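At inference time, the compressed features simply stand in for the full encoder output wherever the decoder consumes it. The flow below is a hypothetical sketch: encoder, ctc_head, and attention_rescore are illustrative names, not the actual WeNet API.

```python
import torch

@torch.no_grad()
def recognize_with_drb(model, feats, blank_id=0):
    # Hypothetical component names; adapt to your model's real interface.
    encoder_out = model.encoder(feats)                   # (1, T, D)
    ctc_probs = model.ctc_head(encoder_out).softmax(-1)  # (1, T, V)

    # Shorten the sequence the decoder will cross-attend over.
    compressed = drb_compress(encoder_out[0], ctc_probs[0],
                              blank_id).unsqueeze(0)

    # Second pass: the attention decoder rescores CTC hypotheses over
    # T' << T frames, which is where the GPU/CPU speedup comes from.
    return model.attention_rescore(compressed)           # hypothetical
```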
About
DRB: Discarding redundant blocks to speed up Transformer decoding for speech recognition