Add C++ runtime for *streaming* faster conformer transducer from NeMo. #889
Conversation
@csukuangfj would we need StackStates and UnStackStates methods for this?
Yes, please refer to sherpa-onnx/sherpa-onnx/csrc/online-nemo-ctc-model.cc Lines 121 to 122 in 8af2af8
and sherpa-onnx/sherpa-onnx/csrc/online-nemo-ctc-model.cc Lines 156 to 157 in 8af2af8.
Note that for decoding, you can support only batch_size == 1.
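With batch_size == 1, these two methods become nearly trivial. A minimal sketch of the idea, using `std::vector<std::vector<float>>` as a stand-in for the move-only `Ort::Value` tensors (the names and types here are illustrative, not the actual sherpa-onnx API):

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Stand-in for a list of state tensors; in sherpa-onnx these are Ort::Value.
using States = std::vector<std::vector<float>>;

// With batch_size == 1, "stacking" the states of a single stream is just a
// move of that stream's state list.
States StackStates(std::vector<States> states_vec) {
  // assumption: states_vec.size() == 1 because batch_size == 1
  return std::move(states_vec[0]);
}

// Unstacking wraps the batched states back into a one-element vector,
// one entry per stream.
std::vector<States> UnStackStates(States states) {
  std::vector<States> ans;
  ans.push_back(std::move(states));
  return ans;
}
```

For larger batch sizes the real implementation concatenates tensors along the batch dimension, which is why the linked online-nemo-ctc-model.cc code is the reference to follow.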
Hi @csukuangfj,
All you need is to change the offline C++ version to an online version and to add two methods, e.g.:
void SetNeMoDecoderStates(std::vector<Ort::Value> states);
std::vector<Ort::Value> &GetNeMoDecoderStates();
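A minimal sketch of how an OnlineStream could carry the NeMo decoder states between chunks, with a plain vector standing in for `Ort::Value` (the class and member names are assumptions based on the suggestion above):

```cpp
#include <cassert>
#include <utility>
#include <vector>

// StateValue stands in for the move-only Ort::Value used in sherpa-onnx.
using StateValue = std::vector<float>;

// Sketch of an OnlineStream that keeps the NeMo decoder states between
// chunks: the setter stores the states, the getter hands back a mutable
// reference so the recognizer can read and update them per chunk.
class OnlineStream {
 public:
  void SetNeMoDecoderStates(std::vector<StateValue> states) {
    decoder_states_ = std::move(states);
  }

  std::vector<StateValue> &GetNeMoDecoderStates() { return decoder_states_; }

 private:
  std::vector<StateValue> decoder_states_;
};
```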
@csukuangfj could you review these changes please? Waiting for your feedback. Also, could you assist me with online-transducer-greedy-search-nemo-decoder.cc? Following offline-transducer-greedy-search-nemo-decoder.cc is not so helpful in this case, as it's streaming mode. Thank you
By the way, you need to change sherpa-onnx/sherpa-onnx/csrc/online-recognizer-impl.cc Lines 15 to 17 in 81346d1
and sherpa-onnx/sherpa-onnx/csrc/online-recognizer-impl.cc Lines 36 to 38 in 81346d1.
You can use the number of outputs from the decoder model to decide whether to create a normal recognizer or a NeMo one. You can refer to the referenced code to create a session for the decoder model, and to the following code to get the number of outputs for the decoder model.
You only need to support two kinds of transducer models in sherpa-onnx: one for the stateless transducer, and one for the NeMo stateful transducer.
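The dispatch can be as simple as checking the decoder's output count. A hedged sketch (the exact counts are assumptions: a stateless decoder typically returns only decoder_out, while the NeMo LSTM decoder also returns its hidden and cell states, so it has more than one output):

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Sketch of the dispatch: a stateless transducer decoder has a single output
// (decoder_out), while the NeMo stateful (LSTM) decoder also returns its
// hidden and cell states, giving it more than one output. The threshold of 1
// is an illustrative assumption, not taken from the sherpa-onnx sources.
std::string SelectTransducerImpl(int32_t num_decoder_outputs) {
  if (num_decoder_outputs == 1) {
    return "stateless-transducer";
  }
  return "nemo-stateful-transducer";
}
```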
We have both a C++ and a Python version of the non-streaming NeMo transducer greedy search. Please read them carefully. The only differences from the non-streaming one:

Hi @csukuangfj, thank you.
auto states = model_->StackStates(states_vec);
auto [t, ns] = model_->RunEncoder(std::move(x), std::move(states),
Please note that this is NeMo transducer encoder, not the one from icefall.
please refer to
sherpa-onnx/scripts/nemo/fast-conformer-hybrid-transducer-ctc/test-onnx-transducer.py
Lines 169 to 173 in 2db7775
self.encoder.get_inputs()[0].name: x.numpy(),
self.encoder.get_inputs()[1].name: x_lens.numpy(),
self.encoder.get_inputs()[2].name: self.cache_last_channel,
self.encoder.get_inputs()[3].name: self.cache_last_time,
self.encoder.get_inputs()[4].name: self.cache_last_channel_len,
and
sherpa-onnx/sherpa-onnx/csrc/online-nemo-ctc-model.cc
Lines 52 to 97 in 2db7775
std::vector<Ort::Value> Forward(Ort::Value x,
                                std::vector<Ort::Value> states) {
  Ort::Value &cache_last_channel = states[0];
  Ort::Value &cache_last_time = states[1];
  Ort::Value &cache_last_channel_len = states[2];

  int32_t batch_size = x.GetTensorTypeAndShapeInfo().GetShape()[0];

  std::array<int64_t, 1> length_shape{batch_size};
  Ort::Value length = Ort::Value::CreateTensor<int64_t>(
      allocator_, length_shape.data(), length_shape.size());
  int64_t *p_length = length.GetTensorMutableData<int64_t>();
  std::fill(p_length, p_length + batch_size, ChunkLength());

  // (B, T, C) -> (B, C, T)
  x = Transpose12(allocator_, &x);

  std::array<Ort::Value, 5> inputs = {
      std::move(x), View(&length), std::move(cache_last_channel),
      std::move(cache_last_time), std::move(cache_last_channel_len)};

  auto out =
      sess_->Run({}, input_names_ptr_.data(), inputs.data(), inputs.size(),
                 output_names_ptr_.data(), output_names_ptr_.size());
  // out[0]: logit
  // out[1]: logit_length
  // out[2:] states_next
  //
  // we need to remove out[1]
  std::vector<Ort::Value> ans;
  ans.reserve(out.size() - 1);

  for (int32_t i = 0; i != out.size(); ++i) {
    if (i == 1) {
      continue;
    }
    ans.push_back(std::move(out[i]));
  }

  return ans;
}
You never need to use processed_frames.
I hope that you can understand what we have written.
Remember that the hybrid transducer + CTC shares the same encoder, which means you can borrow what we have done for the streaming NeMo CTC.
Please compare carefully between
sherpa-onnx/scripts/nemo/fast-conformer-hybrid-transducer-ctc/test-onnx-ctc.py
Lines 127 to 132 in 2db7775
self.model.get_inputs()[0].name: x.numpy(),
self.model.get_inputs()[1].name: x_lens.numpy(),
self.model.get_inputs()[2].name: self.cache_last_channel,
self.model.get_inputs()[3].name: self.cache_last_time,
self.model.get_inputs()[4].name: self.cache_last_channel_len,
},
and
sherpa-onnx/scripts/nemo/fast-conformer-hybrid-transducer-ctc/test-onnx-transducer.py
Lines 169 to 173 in 2db7775
self.encoder.get_inputs()[0].name: x.numpy(),
self.encoder.get_inputs()[1].name: x_lens.numpy(),
self.encoder.get_inputs()[2].name: self.cache_last_channel,
self.encoder.get_inputs()[3].name: self.cache_last_time,
self.encoder.get_inputs()[4].name: self.cache_last_channel_len,
Thank you. I have borrowed the Forward method for the RunEncoder method in online-transducer-nemo-model.cc.
I have a question regarding the initialization of the decoder states in online-recognizer-transducer-nemo-impl.h
I define these two methods.
std::unique_ptr<OnlineStream> CreateStream() const override {
auto stream = std::make_unique<OnlineStream>(config_.feat_config);
stream->SetStates(model_->GetInitStates());
InitOnlineStream(stream.get());
return stream;
}
void InitOnlineStream(OnlineStream *stream) const {
auto r = decoder_->GetEmptyResult();
stream->SetResult(r);
stream->SetNeMoDecoderStates(model_->GetDecoderInitStates(batch_size_));
}
Should the line in InitOnlineStream be this?
stream->SetNeMoDecoderStates(decoder_->GetDecoderInitStates(batch_size_));
Ort::Value encoder_out = Transpose12(model_->Allocator(), &t[0]);

// defined in online-transducer-greedy-search-nemo-decoder.h
std::vector<OnlineTransducerDecoderResult> results =
    decoder_->Decode(std::move(encoder_out), std::move(t[1]));
You need to pass the decoder model states of the previous chunk to decoder_->Decode().
By the way, you can create a new method for decoder_ that takes an additional argument containing the decoder_states.
decoder_->Decode(std::move(encoder_out), std::move(t[1]),
                 std::move(out_states), &results, ss, n);
I made some changes in online-recognizer-transducer-nemo-impl.h and the Decode() method now takes in the states of previous chunks.
Ort::Value encoder_out = Transpose12(model_->Allocator(), &t[0]);

// defined in online-transducer-greedy-search-nemo-decoder.h
decoder_->Decode(std::move(encoder_out), std::move(t[1]),
By the way, you don't need to pass the encoder model states to the greedy search decoder.
Please pass the decoder model states to it instead.
Please read our Python streaming transducer greedy search decoding example carefully.
I have posted the example here one more time in case you have not read it.
decoder_->Decode(std::move(encoder_out), std::move(out_states), &results, ss, n);
Is this correct?
No. out_states is from the encoder.
Remember that out_states is used only for the internal states of the encoder model. We don't need to use it in the greedy search decoding.
We need to pass the LSTM states from the decoder model to the greedy search decoder.
I suggest again that you re-read the Python decoding example and figure out how the decoding works. (You need to know how LSTM works.)
working on it.
I understand that when the model is initialized, init_cache_state initializes the initial states of the encoder model. When decoding begins, before the first chunk is decoded, the decoder model comes into action and initializes the decoder states. After a chunk has been decoded, it emits the next states for the decoder model, which become the current states for the next chunk.
Yes, so you don't need the encoder states during greedy search decoding.
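The state flow described above can be modelled with toy stand-ins: the decoder's LSTM (h, c) state produced for one chunk is fed back in for the next chunk, while encoder states never enter the loop. This is only an illustration of the carry-over pattern, not the sherpa-onnx code:

```cpp
#include <cassert>
#include <vector>

// Toy stand-in for the decoder's LSTM state: a pair of (h, c) tensors.
struct LstmState {
  std::vector<float> h;
  std::vector<float> c;
};

// Zero-initialized state, as produced before the first chunk is decoded.
LstmState GetDecoderInitStates() { return LstmState{{0.0f}, {0.0f}}; }

// Pretend decoder step: consumes the previous state and emits the next one.
// (A real RunDecoder would also take the previous token and return logits.)
LstmState RunDecoder(const LstmState &prev) {
  return LstmState{{prev.h[0] + 1.0f}, {prev.c[0] + 1.0f}};
}

// Greedy-search skeleton over chunks: the decoder state produced while
// decoding one chunk becomes the input state for the next chunk. Encoder
// states are managed elsewhere and never enter this loop.
LstmState DecodeChunks(int num_chunks) {
  LstmState state = GetDecoderInitStates();
  for (int i = 0; i < num_chunks; ++i) {
    state = RunDecoder(state);  // carry the decoder state across chunks
  }
  return state;
}
```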
std::vector<Ort::Value> decoder_states = model_->GetDecoderInitStates(1);
decoder_->Decode(std::move(encoder_out), std::move(decoder_states), &results, ss, n);
}
GetDecoderInitStates fetches the initial states of the decoder. But I am not really sure of the implementation above.
Please read:
- (1) our Python code for NeMo streaming transducer greedy search decoding
- (2) our C++ code for NeMo non-streaming transducer greedy search decoding
Make sure you indeed understand the code.
Hi @csukuangfj,
thanks again.
I did, and I do understand the code.
I see in offline-transducer-greedy-search-nemo-decoder.cc how the RunDecoder method takes the initial state of the decoder:
model->RunDecoder(std::move(decoder_input_pair.first),
                  std::move(decoder_input_pair.second),
                  model->GetDecoderInitStates(1));
and I realize that I did it in a similar way above. I am missing where exactly I am going wrong.
I revised the code
// defined in online-transducer-greedy-search-nemo-decoder.h
std::vector<Ort::Value> decoder_states = model_->GetDecoderInitStates(1);
// updated decoder states are returned
decoder_states = decoder_->Decode(std::move(encoder_out),
std::move(decoder_states),
&results, ss, n);
std::vector<std::vector<Ort::Value>> next_states =
model_->UnStackStates(decoder_states);
Is this correct?
By the way, please make sure the code compiles successfully on your computer.
Hi @csukuangfj, I am unable to pinpoint and solve this compilation error. Could you please take a look?

[ 56%] Building CXX object sherpa-onnx/csrc/CMakeFiles/sherpa-onnx-core.dir/online-recognizer-impl.cc.o
In file included from /usr/include/c++/11/memory:76,
from /mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-recognizer-impl.h:8,
from /mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-recognizer-impl.cc:5:
/usr/include/c++/11/bits/unique_ptr.h: In instantiation of ‘typename std::_MakeUniq<_Tp>::__single_object std::make_unique(_Args&& ...) [with _Tp = sherpa_onnx::OnlineTransducerModifiedBeamSearchDecoder; _Args = {sherpa_onnx::OnlineTransducerModel*, sherpa_onnx::OnlineLM*, int&, float&, int&, float&, float&}; typename std::_MakeUniq<_Tp>::__single_object = std::unique_ptr<sherpa_onnx::OnlineTransducerModifiedBeamSearchDecoder, std::default_delete<sherpa_onnx::OnlineTransducerModifiedBeamSearchDecoder> >]’:
/mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-recognizer-transducer-impl.h:109:77: required from here
/usr/include/c++/11/bits/unique_ptr.h:962:30: error: invalid new-expression of abstract class type ‘sherpa_onnx::OnlineTransducerModifiedBeamSearchDecoder’
962 | { return unique_ptr<_Tp>(new _Tp(std::forward<_Args>(__args)...)); }
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-recognizer-transducer-impl.h:30,
from /mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-recognizer-impl.cc:9:
/mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-transducer-modified-beam-search-decoder.h:18:7: note: because the following virtual functions are pure within ‘sherpa_onnx::OnlineTransducerModifiedBeamSearchDecoder’:
18 | class OnlineTransducerModifiedBeamSearchDecoder
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-stream.h:17,
from /mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-recognizer.h:22,
from /mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-recognizer-impl.h:13,
from /mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-recognizer-impl.cc:5:
/mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-transducer-decoder.h:85:35: note: ‘virtual std::vector<Ort::Value> sherpa_onnx::OnlineTransducerDecoder::Decode_me(Ort::Value, std::vector<Ort::Value>, std::vector<sherpa_onnx::OnlineTransducerDecoderResult>*, sherpa_onnx::OnlineStream**, int32_t)’
85 | virtual std::vector<Ort::Value> Decode_me(Ort::Value encoder_out,
| ^~~~~~~~~
In file included from /usr/include/c++/11/memory:76,
from /mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-recognizer-impl.h:8,
from /mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-recognizer-impl.cc:5:
/usr/include/c++/11/bits/unique_ptr.h: In instantiation of ‘typename std::_MakeUniq<_Tp>::__single_object std::make_unique(_Args&& ...) [with _Tp = sherpa_onnx::OnlineTransducerGreedySearchDecoder; _Args = {sherpa_onnx::OnlineTransducerModel*, int&, float&, float&}; typename std::_MakeUniq<_Tp>::__single_object = std::unique_ptr<sherpa_onnx::OnlineTransducerGreedySearchDecoder, std::default_delete<sherpa_onnx::OnlineTransducerGreedySearchDecoder> >]’:
/mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-recognizer-transducer-impl.h:115:71: required from here
/usr/include/c++/11/bits/unique_ptr.h:962:30: error: invalid new-expression of abstract class type ‘sherpa_onnx::OnlineTransducerGreedySearchDecoder’
962 | { return unique_ptr<_Tp>(new _Tp(std::forward<_Args>(__args)...)); }
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-recognizer-transducer-impl.h:28,
from /mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-recognizer-impl.cc:9:
/mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-transducer-greedy-search-decoder.h:15:7: note: because the following virtual functions are pure within ‘sherpa_onnx::OnlineTransducerGreedySearchDecoder’:
15 | class OnlineTransducerGreedySearchDecoder : public OnlineTransducerDecoder {
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-stream.h:17,
from /mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-recognizer.h:22,
from /mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-recognizer-impl.h:13,
from /mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-recognizer-impl.cc:5:
/mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-transducer-decoder.h:85:35: note: ‘virtual std::vector<Ort::Value> sherpa_onnx::OnlineTransducerDecoder::Decode_me(Ort::Value, std::vector<Ort::Value>, std::vector<sherpa_onnx::OnlineTransducerDecoderResult>*, sherpa_onnx::OnlineStream**, int32_t)’
85 | virtual std::vector<Ort::Value> Decode_me(Ort::Value encoder_out,
| ^~~~~~~~~
In file included from /usr/include/c++/11/memory:76,
from /mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-recognizer-impl.h:8,
from /mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-recognizer-impl.cc:5:
/usr/include/c++/11/bits/unique_ptr.h: In instantiation of ‘typename std::_MakeUniq<_Tp>::__single_object std::make_unique(_Args&& ...) [with _Tp = sherpa_onnx::OnlineTransducerGreedySearchNeMoDecoder; _Args = {sherpa_onnx::OnlineTransducerNeMoModel*, float&}; typename std::_MakeUniq<_Tp>::__single_object = std::unique_ptr<sherpa_onnx::OnlineTransducerGreedySearchNeMoDecoder, std::default_delete<sherpa_onnx::OnlineTransducerGreedySearchNeMoDecoder> >]’:
/mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-recognizer-transducer-nemo-impl.h:53:75: required from here
/usr/include/c++/11/bits/unique_ptr.h:962:30: error: invalid new-expression of abstract class type ‘sherpa_onnx::OnlineTransducerGreedySearchNeMoDecoder’
962 | { return unique_ptr<_Tp>(new _Tp(std::forward<_Args>(__args)...)); }
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-recognizer-transducer-nemo-impl.h:26,
from /mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-recognizer-impl.cc:10:
/mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-transducer-greedy-search-nemo-decoder.h:15:7: note: because the following virtual functions are pure within ‘sherpa_onnx::OnlineTransducerGreedySearchNeMoDecoder’:
15 | class OnlineTransducerGreedySearchNeMoDecoder : public OnlineTransducerDecoder {
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-stream.h:17,
from /mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-recognizer.h:22,
from /mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-recognizer-impl.h:13,
from /mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-recognizer-impl.cc:5:
/mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-transducer-decoder.h:82:16: note: ‘virtual void sherpa_onnx::OnlineTransducerDecoder::Decode(Ort::Value, std::vector<sherpa_onnx::OnlineTransducerDecoderResult>*)’
82 | virtual void Decode(Ort::Value encoder_out,
| ^~~~~~
cc1plus: note: unrecognized command-line option ‘-Wno-missing-template-keyword’ may have been intended to silence earlier diagnostics
make[2]: *** [sherpa-onnx/csrc/CMakeFiles/sherpa-onnx-core.dir/build.make:832: sherpa-onnx/csrc/CMakeFiles/sherpa-onnx-core.dir/online-recognizer-impl.cc.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:1552: sherpa-onnx/csrc/CMakeFiles/sherpa-onnx-core.dir/all] Error 2
make: *** [Makefile:156: all] Error 2
@@ -82,6 +82,11 @@ class OnlineTransducerDecoder {
   virtual void Decode(Ort::Value encoder_out,
                       std::vector<OnlineTransducerDecoderResult> *result) = 0;

+  virtual std::vector<Ort::Value> Decode_me(Ort::Value encoder_out,
Your compilation error is caused by this method.
Please remove it.
Also, your greedy search decoder does not need to inherit from this class.
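In other words, the greedy search decoder can be a plain standalone class: with no abstract base, nothing is left pure-virtual and `std::make_unique` works directly. A sketch under that assumption (the types are simplified stand-ins for the `Ort::Value`-based ones):

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Simplified stand-ins for the Ort::Value-based types.
using States = std::vector<std::vector<float>>;

struct DecoderResult {
  std::vector<int> tokens;
};

// A plain class with no abstract base: nothing is virtual, so there are no
// pure-virtual methods to override and make_unique works directly.
class OnlineTransducerGreedySearchNeMoDecoder {
 public:
  // Takes the previous chunk's decoder states and returns the updated ones.
  States Decode(const std::vector<float> &encoder_out, States decoder_states,
                DecoderResult *result) const {
    // A real implementation would run the decoder/joiner over encoder_out
    // here; this sketch only emits a placeholder token.
    if (!encoder_out.empty()) {
      result->tokens.push_back(0);
    }
    return decoder_states;  // pass states through for the next chunk
  }
};
```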
OnlineRecognizerConfig config_;
SymbolTable symbol_table_;
std::unique_ptr<OnlineTransducerNeMoModel> model_;
std::unique_ptr<OnlineTransducerDecoder> decoder_;
You can build an instance of OnlineTransducerGreedySearchNeMoDecoder directly. OnlineTransducerGreedySearchNeMoDecoder does not need to inherit from OnlineTransducerNeMoModel.

Suggested change:
- std::unique_ptr<OnlineTransducerDecoder> decoder_;
+ std::unique_ptr<OnlineTransducerGreedySearchNeMoDecoder> decoder_;
Yes, I just fixed it, after reading that the greedy search decoder does not inherit from the online transducer decoder. Thank you.
// TODO(fangjun): Remember to change these constants if needed
int32_t frame_shift_ms = 10;
int32_t subsampling_factor = 4;
By the way, the subsampling factor of the NeMo transducer model is not 4. I think it is 8. Please recheck it.
fixed.
float frame_shift_in_seconds = 0.01;

// subsampling factor is 4
int32_t trailing_silence_frames = s->GetResult().num_trailing_blanks * 4;
Please replace 4 with the actual subsampling factor.
'subsampling_factor': 8,
Yes, it is indeed 8. Thank you.
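For reference, the arithmetic behind the trailing-silence check: each emitted blank covers subsampling_factor feature frames, and each frame covers frame_shift_in_seconds of audio. A small sketch:

```cpp
#include <cassert>

// Each emitted blank corresponds to `subsampling_factor` feature frames, and
// each feature frame covers `frame_shift_in_seconds` of audio.
float TrailingSilenceSeconds(int num_trailing_blanks, int subsampling_factor,
                             float frame_shift_in_seconds) {
  int trailing_silence_frames = num_trailing_blanks * subsampling_factor;
  return trailing_silence_frames * frame_shift_in_seconds;
}
```

With a 10 ms frame shift and a subsampling factor of 8, 25 trailing blanks correspond to roughly two seconds of silence.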
Ort::Value encoder_out = Transpose12(model_->Allocator(), &t[0]);

// defined in online-transducer-greedy-search-nemo-decoder.h
std::vector<Ort::Value> decoder_states = model_->GetDecoderInitStates(1);
Please get the decoder states from the stream.
Remember that we need to get the decoder states from the previous chunk.
Also, you need to save the decoder states of this chunk for the next chunk.
I hope that you indeed understand our Python decoding code for streaming NeMo stateful transducer.
Yes, I do understand the complete logic. Where I might be going wrong is in the details of the C++ implementation.
But I am trying my best here.
// **STEP-0**
// Get the initial states of the decoder.
std::vector<Ort::Value> &decoder_states = ss[0]->GetNeMoDecoderStates();
// Subsequent decoder states (for each chunk) are updated inside the Decode method.
// This returns the decoder state from the LAST chunk. We probably don't need it, so we can discard it.
decoder_states = decoder_->Decode(std::move(encoder_out),
                                  std::move(decoder_states),
                                  &result, ss, n);

Now, here is my logic inside the Decode method:

// **STEP-1**
// decoder_output_pair.second returns the next decoder state.
std::pair<Ort::Value, std::vector<Ort::Value>> decoder_output_pair =
    model->RunDecoder(std::move(decoder_input_pair.first),
                      std::move(decoder_states));

// now we step through each frame of the chunk
for (int32_t t = 0; t != num_rows; ++t) {
  // rest of the code
  if (y != blank_id) {
    // rest of the code
    // the last decoder state becomes the current state
    decoder_output_pair =
        model->RunDecoder(std::move(decoder_input_pair.first),
                          std::move(decoder_output_pair.second));
  }

  // **STEP-2**
  // Update the decoder states for the next chunk. So basically for every next
  // chunk, the last decoder state becomes the current state.
  decoder_states = std::move(decoder_output_pair.second);
}
@csukuangfj what do you think?
    &result, ss, n);

std::vector<std::vector<Ort::Value>> next_states =
    model_->UnStackStates(decoder_states);
For greedy search with batch size 1, I think we don't need to use stack or unstack states for the decoder model.
We will need them once we implement modified_beam_search decoding.
Please forget about stacking and unstacking states for the decoder model for now.
Okay, making the necessary changes now.
Thanks.
I suggest that you copy & paste our C++ greedy search decoding code for the non-streaming stateful NeMo transducer. Almost everything you need is already there.
…pdated, compilation success, decoding not working yet
Hi @csukuangfj,
Sure, will push new commits to your branch this week.
Hi @csukuangfj, to give you an example... I have the suspicion that something is wrong inside the greedy search decoder implementation.
…s good, then incorrect predictions.
@@ -46,6 +46,7 @@ static OnlineRecognizerResult Convert(const OnlineTransducerDecoderResult &src,
   r.timestamps.reserve(src.tokens.size());

+  for (auto i : src.tokens) {
+    if (i == -1) continue;
Could you describe in which case i is -1?
// defined in ./online-recognizer-transducer-impl.h
// static may or may not be here? TODOs
static OnlineRecognizerResult Convert(const OnlineTransducerDecoderResult &src,
Suggested change:
- static OnlineRecognizerResult Convert(const OnlineTransducerDecoderResult &src,
+ OnlineRecognizerResult Convert(const OnlineTransducerDecoderResult &src,

Please remove static.
OnlineTransducerDecoderResult
OnlineTransducerGreedySearchNeMoDecoder::GetEmptyResult() const {
  int32_t context_size = 8;
  int32_t blank_id = 0;  // always 0
  OnlineTransducerDecoderResult r;
  r.tokens.resize(context_size, -1);
  r.tokens.back() = blank_id;

  return r;
}
Please remove it, and remove any code related to OnlineTransducerGreedySearchNeMoDecoder::GetEmptyResult.
// Define and initialize encoder_out_length
Ort::MemoryInfo memory_info =
    Ort::MemoryInfo::CreateCpu(OrtDeviceAllocator, OrtMemTypeCPU);

int64_t length_value = 1;
std::vector<int64_t> length_shape = {1};

Ort::Value encoder_out_length = Ort::Value::CreateTensor<int64_t>(
    memory_info, &length_value, 1, length_shape.data(), length_shape.size());

const int64_t *p_length = encoder_out_length.GetTensorData<int64_t>();
I hope you can understand that for the streaming case, the encoder_out_length is dim1.
for (int32_t i = 0; i != batch_size; ++i) {
  const float *this_p = p + dim1 * dim2 * i;
  int32_t this_len = p_length[i];
Suggested change:
- int32_t this_len = p_length[i];
+ int32_t this_len = dim1;
You are almost there! I am merging it and will take care of the rest. Thank you for your contribution!
This PR integrates NeMo's faster conformer transducer into sherpa-onnx.
More commits to be added.