ITN runtime. #2001

Merged
merged 13 commits into from
Sep 18, 2023

duj12
Contributor

@duj12 duj12 commented Sep 7, 2023

The FST-based ITN model is compiled from WeTextProcessing.
I built my own version, which fixes some known corner cases. You can download it from ModelScope.

For now, decoder_main and api_main (both with the libtorch backend) have been validated with ITN added.
Other runtimes may still have issues.

For decoder_main, the command and its result look like this:

dujing@dujing-pc:/mnt/d/work/code/wenet-itn/runtime/libtorch$ build_debug/bin/decoder_main --model_path ../resource/ASR/final.zip --unit_path ../resource/ASR/units.txt --wav_path ../resource/WAV/number.wav --fst_path ../resource/ASR/TLG.fst --dict_path ../resource/ASR/words.txt --itn_model_dir ../resource/ASR
test 2016年03月08日2018年05月09日呃132 个1358

For api_main, the command and its result look like this:

dujing@dujing-pc:/mnt/d/work/code/wenet-itn/runtime/libtorch$ build_debug/bin/api_main --model_dir ../resource/ASR --wav_path ../resource/WAV/number.wav
I0907 20:08:05.275250  8845 wenet_api.cc:48] Reading torch model
I0907 20:08:05.299885  8845 torch_asr_model.cc:35] Num intra-op threads: 1
I0907 20:08:06.364199  8845 torch_asr_model.cc:73] Torch Model Info:
I0907 20:08:06.364248  8845 torch_asr_model.cc:74]      subsampling_rate 4
I0907 20:08:06.364253  8845 torch_asr_model.cc:75]      right context 6
I0907 20:08:06.364255  8845 torch_asr_model.cc:76]      sos 11999
I0907 20:08:06.364257  8845 torch_asr_model.cc:77]      eos 11999
I0907 20:08:06.364260  8845 torch_asr_model.cc:78]      is bidirectional decoder 1
I0907 20:08:06.387041  8845 fst.h:820] FstImpl::ReadHeader: source: ../resource/ASR/TLG.fst, fst_type: vector, arc_type: standard, version: 2, flags: 0
I0907 20:08:12.140612  8845 wenet_api.cc:98] Reading ITN fst
I0907 20:08:12.142271  8845 fst.h:820] FstImpl::ReadHeader: source: ../resource/ASR/zh_itn_tagger.fst, fst_type: vector, arc_type: standard, version: 2, flags: 0
I0907 20:08:12.163174  8845 fst.h:820] FstImpl::ReadHeader: source: ../resource/ASR/zh_itn_verbalizer.fst, fst_type: vector, arc_type: standard, version: 2, flags: 0
I0907 20:08:12.235846  8845 asr_decoder.cc:104] Required 67 get 67
I0907 20:08:12.935179  8845 asr_decoder.cc:104] Required 64 get 64
I0907 20:08:16.780755  8845 asr_decoder.cc:200] Partial CTC result 二零一
I0907 20:08:16.780877  8845 asr_decoder.cc:104] Required 64 get 64
I0907 20:08:24.570278  8845 asr_decoder.cc:200] Partial CTC result 二零一六年
I0907 20:08:24.570405  8845 asr_decoder.cc:104] Required 64 get 64
I0907 20:08:29.565124  8845 asr_decoder.cc:200] Partial CTC result 二零一六年三月
I0907 20:08:29.565235  8845 asr_decoder.cc:104] Required 64 get 64
I0907 20:08:29.638185  8845 asr_decoder.cc:200] Partial CTC result 二零一六年三月八日
I0907 20:08:29.638310  8845 asr_decoder.cc:104] Required 64 get 64
I0907 20:08:29.677501  8845 asr_decoder.cc:200] Partial CTC result 二零一六年三月八日
I0907 20:08:29.677613  8845 asr_decoder.cc:104] Required 64 get 64
I0907 20:08:29.714907  8845 asr_decoder.cc:200] Partial CTC result 二零一六年三月八日二零一八年
I0907 20:08:29.715016  8845 asr_decoder.cc:104] Required 64 get 64
I0907 20:08:29.738862  8845 asr_decoder.cc:200] Partial CTC result 二零一六年三月八日二零一八年
I0907 20:08:29.738973  8845 asr_decoder.cc:104] Required 64 get 64
I0907 20:08:29.834138  8845 asr_decoder.cc:200] Partial CTC result 二零一六年三月八日二零一八年五月九
I0907 20:08:29.834255  8845 asr_decoder.cc:104] Required 64 get 64
I0907 20:08:29.867908  8845 asr_decoder.cc:200] Partial CTC result 二零一六年三月八日二零一八年五月九日
I0907 20:08:29.868029  8845 asr_decoder.cc:104] Required 64 get 64
I0907 20:08:29.898561  8845 asr_decoder.cc:200] Partial CTC result 二零一六年三月八日二零一八年五月九日
I0907 20:08:29.898604  8845 ctc_endpoint.cc:42] Endpointing rule rule2 activated: true,1160,7040
I0907 20:08:29.898610  8845 asr_decoder.cc:125] Endpoint is detected at 707
I0907 20:08:29.898674  8845 asr_decoder.cc:104] Required 64 get 64
I0907 20:08:29.939894  8845 asr_decoder.cc:200] Partial CTC result 二零一六年三月八日二零一八年五月九日
...
...

I0907 20:08:35.052362  8845 queue.h:633] AutoQueue: using LIFO discipline
I0907 20:08:35.052731  8845 asr_decoder.cc:200] Partial CTC result 2016年03月08日2018年05月09日132 个1358
I0907 20:08:35.905037  8845 asr_decoder.cc:84] Rescoring cost latency: 1341ms.
I0907 20:08:35.905140  8845 api_main.cc:42] 1 {
  "nbest" : [{
      "sentence" : "2016年03月08日2018年05月09日呃132 个1358 "
    }],
  "type" : "final_result"
}

Note that this ITN runtime code upgrades OpenFst from 1.6.5 to 1.7.2.
So when building TLG.fst, you should use the same OpenFst version (at least for now, the official OpenFst version in Kaldi is 1.7.2: https://github.com/kaldi-asr/kaldi/blob/master/tools/Makefile#L10).
Also make sure the pynini version used by WeTextProcessing is compatible (for now, an ITN model compiled with WeTextProcessing==0.1.1 runs successfully).
With this version, both the n-gram language model and the ITN model work (I have not yet verified the TLG.fst released by WeNet, but I expect it to work).

@robin1001
Collaborator

Great job! Please fix the lint problem.

@xingchensong
Member

Thanks! I'll take a look a bit later.

runtime/core/api/wenet_api.cc (outdated, resolved)
runtime/core/bin/CMakeLists.txt (resolved)
runtime/core/decoder/asr_decoder.h (outdated, resolved)
runtime/core/post_processor/post_processor.cc (outdated, resolved)
runtime/core/post_processor/post_processor.h (outdated, resolved)
@xingchensong
Member

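My suggestion: could we bring in the runtime part of wetext as a sub-project? Right now the wetext runtime code is effectively copied over, so whenever wetext changes its runtime we would have to sync it here. @robin1001, what do you think?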

@xingchensong
Member

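For bringing in wetext as a subproject, you can refer to my earlier implementation: main...xcsong-wetextprocessing
It was not merged at the time mainly because the rules were not yet stable, and because of the OpenFst version conflict, among other reasons.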

@robin1001
Collaborator

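My suggestion: could we bring in the runtime part of wetext as a sub-project? Right now the wetext runtime code is effectively copied over, so whenever wetext changes its runtime we would have to sync it here. @robin1001, what do you think?

Yes, I also recommend that approach: integrate it as a third-party library.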

@robin1001
Collaborator

We can upgrade openfst directly; there should be no real risk here.

@duj12
Contributor Author

duj12 commented Sep 13, 2023

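For bringing in wetext as a subproject, you can refer to my earlier implementation: main...xcsong-wetextprocessing. It was not merged at the time mainly because the rules were not yet stable, and because of the OpenFst version conflict, among other reasons.

I tried this today. With wetext pulled in as a third-party library, the build runs into a problem: wenet and wetext each contain a utils/string.h, and both include it with #include "utils/string.h". When the two libraries are built together, one of them fails to find the functions declared in the utils/string.h under its own namespace (https://github.com/duj12/wenet/blob/itn-wetext-sub/runtime/core/cmake/wetextprocessing.cmake#L25 is where building the .o fails to find wetext's string.h). This is a fatal issue that cannot be solved without changing the wenet or wetext source code.
In addition, because utils is a submodule of each library, one of the two utils has to be pulled out and built as a separate lib under a different name, which makes the build inconvenient and inelegant (https://github.com/duj12/wenet/blob/itn-wetext-sub/runtime/core/cmake/wetextprocessing.cmake#L10).

You and Zhendong can think about whether there is a good way to solve this. The simplest fix is to rename utils/string.h in wetext.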

@duj12
Contributor Author

duj12 commented Sep 13, 2023

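My suggestion: could we bring in the runtime part of wetext as a sub-project? Right now the wetext runtime code is effectively copied over, so whenever wetext changes its runtime we would have to sync it here. @robin1001, what do you think?

@xingchensong I copied this part of the code quite a while ago, so some function and method names have not been synced to the latest version, in which names start with an uppercase letter. I also changed the namespace, reused the utils module directly, and adjusted the code so that it passes the cpplint and clang-format checks.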

@robin1001
Collaborator

@xingchensong Please take a look.

@xingchensong
Member

Sorry, I missed this and only just saw it.

@xingchensong
Member

wetext has completed the file rename: wenet-e2e/WeTextProcessing#111

@duj12
Contributor Author

duj12 commented Sep 16, 2023

wetext has completed the file rename: wenet-e2e/WeTextProcessing#111

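@xingchensong The subproject-style integration is done. The build now works like this: as long as the ITN option is turned on, wetext_processor is compiled directly into the wenet library. No extra macros were added.
When using decoder_main, ITN is applied to the result if the ITN model path is passed in the arguments and the two specific fst files exist.
When using the API, ITN is applied to the result if the model directory contains the two specifically named fst files.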
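
The enabling rule described in this comment can be sketched as a small check. This is only an illustration under stated assumptions, not the code merged in this PR: `ShouldRunItn` and `FileExists` are hypothetical helper names, and the two file names are taken from the api_main log earlier in this thread.

```cpp
#include <fstream>
#include <string>

// File names as they appear in the api_main log above.
const char kItnTaggerName[] = "zh_itn_tagger.fst";
const char kItnVerbalizerName[] = "zh_itn_verbalizer.fst";

// Returns true if `path` can be opened for reading.
bool FileExists(const std::string& path) {
  std::ifstream in(path.c_str(), std::ios::binary);
  return in.good();
}

// Hypothetical helper: ITN is enabled only when a model directory is given
// and both the tagger and the verbalizer FST files can be opened in it.
bool ShouldRunItn(const std::string& itn_model_dir) {
  if (itn_model_dir.empty()) return false;
  return FileExists(itn_model_dir + "/" + kItnTaggerName) &&
         FileExists(itn_model_dir + "/" + kItnVerbalizerName);
}
```

Under this rule, passing `--itn_model_dir ../resource/ASR` as in the decoder_main command above would turn ITN on only if both files are present in that directory; otherwise the raw recognition text would be returned unchanged.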

@xingchensong
Member

Great, thanks!!!

@xingchensong xingchensong merged commit e8f451e into wenet-e2e:main Sep 18, 2023
6 checks passed
@@ -140,7 +140,7 @@ class AsrDecoder {
   std::shared_ptr<PostProcessor> post_processor_;
   std::shared_ptr<ContextGraph> context_graph_;

-  std::shared_ptr<fst::Fst<fst::StdArc>> fst_ = nullptr;
+  std::shared_ptr<fst::VectorFst<fst::StdArc>> fst_ = nullptr;
   // output symbol table
   std::shared_ptr<fst::SymbolTable> symbol_table_;
   // e2e unit symbol table
Member

update: with openfst 1.6.5 the FST was read as fst::Fst; after upgrading to openfst 1.7.2 for wetext, this has to be changed to fst::VectorFst.

Member

@xingchensong xingchensong Oct 11, 2023

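Note on a potential problem: the fstcompile used for graph construction in the recipe's make_tlg.sh must not come from the OpenFst bundled with wenet, https://github.com/kkm000/openfst/archive/refs/tags/win/1.7.2.1.tar.gz (which was modified for Windows compatibility). The official original openfst is required: http://www.openfst.org/twiki/pub/FST/FstDownload/openfst-1.7.2.tar.gz

This problem only affects graph construction. Building the graph with the original 1.7.2 and running inference with the kkm000 1.7.2 works normally.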

Member

TODO: add a reminder about this in the recipe, OR fix kkm000's openfst with a patch?

Collaborator

What is the difference between the original and the kkm version? What kind of patch would it take?

Member

(screenshot: mmexport1697069306007.png)

fstcompile reports this error.

I googled it and found an explanation: wincentbalin/compile-static-openfst#1

Member

@xingchensong xingchensong Oct 12, 2023

The original strategy was to use different openfst builds on Linux and Windows (kkm on Windows, mjansche on Linux). Later, in this PR, everything was unified on kkm: b59ca15


@maiphong0411 maiphong0411 Mar 5, 2024

Note on a potential problem: the fstcompile used for graph construction in the recipe's make_tlg.sh must not come from the OpenFst bundled with wenet, https://github.com/kkm000/openfst/archive/refs/tags/win/1.7.2.1.tar.gz (which was modified for Windows compatibility). The official original openfst is required: http://www.openfst.org/twiki/pub/FST/FstDownload/openfst-1.7.2.tar.gz

This problem only affects graph construction. Building the graph with the original 1.7.2 and running inference with the kkm000 1.7.2 works normally.

I tried using http://www.openfst.org/twiki/pub/FST/FstDownload/openfst-1.7.2.tar.gz, but it did not work; I got an fst error:
Screenshot 2024-03-05 111357

@maiphong0411

I also got a warning:
Screenshot 2024-03-05 111624

@roney123
Contributor

@duj12 With ITN added this way, won't the characters and timestamps in word_pieces be inconsistent with the text after ITN?
