Add C++ runtime for *streaming* faster conformer transducer from NeMo. #889
Conversation
@csukuangfj would we need StackStates and UnStackStates methods for this?
Yes, please refer to sherpa-onnx/sherpa-onnx/csrc/online-nemo-ctc-model.cc Lines 121 to 122 in 8af2af8
and sherpa-onnx/sherpa-onnx/csrc/online-nemo-ctc-model.cc Lines 156 to 157 in 8af2af8.
Note that for decoding, you can support only batch_size == 1.
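With batch_size == 1, these two methods become nearly trivial. A minimal sketch of the idea, using `std::vector<std::vector<float>>` as a stand-in for the move-only `Ort::Value` tensors (the names and types here are illustrative, not the actual sherpa-onnx API):

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Stand-in for a list of state tensors; in sherpa-onnx these are Ort::Value.
using States = std::vector<std::vector<float>>;

// With batch_size == 1, "stacking" the states of a single stream is just a
// move of that stream's state list.
States StackStates(std::vector<States> states_vec) {
  // assumption: states_vec.size() == 1 because batch_size == 1
  return std::move(states_vec[0]);
}

// Unstacking wraps the batched states back into a one-element vector,
// one entry per stream.
std::vector<States> UnStackStates(States states) {
  std::vector<States> ans;
  ans.push_back(std::move(states));
  return ans;
}
```

For larger batch sizes the real implementation concatenates tensors along the batch dimension, which is why the linked online-nemo-ctc-model.cc code is the reference to follow.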
Hi @csukuangfj,
All you need is to change the offline C++ version to an online version and to add two methods, e.g.:
void SetNeMoDecoderStates(std::vector<Ort::Value> states);
std::vector<Ort::Value> &GetNeMoDecoderStates();
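A minimal sketch of how an OnlineStream could carry the NeMo decoder states between chunks, with a plain vector standing in for `Ort::Value` (the class and member names are assumptions based on the suggestion above):

```cpp
#include <cassert>
#include <utility>
#include <vector>

// StateValue stands in for the move-only Ort::Value used in sherpa-onnx.
using StateValue = std::vector<float>;

// Sketch of an OnlineStream that keeps the NeMo decoder states between
// chunks: the setter stores the states, the getter hands back a mutable
// reference so the recognizer can read and update them per chunk.
class OnlineStream {
 public:
  void SetNeMoDecoderStates(std::vector<StateValue> states) {
    decoder_states_ = std::move(states);
  }

  std::vector<StateValue> &GetNeMoDecoderStates() { return decoder_states_; }

 private:
  std::vector<StateValue> decoder_states_;
};
```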
@csukuangfj could you review these changes please? Waiting for your feedback. Also, could you assist me with online-transducer-greedy-search-nemo-decoder.cc? Following offline-transducer-greedy-search-nemo-decoder.cc is not so helpful in this case, as it's streaming mode. Thank you
By the way, you need to change sherpa-onnx/sherpa-onnx/csrc/online-recognizer-impl.cc Lines 15 to 17 in 81346d1
and sherpa-onnx/sherpa-onnx/csrc/online-recognizer-impl.cc Lines 36 to 38 in 81346d1.
You can use the number of outputs from the decoder model to decide whether to create a normal recognizer or a NeMo one. You can refer to the referenced code to create a session for the decoder model, and to the following code to get the number of outputs for the decoder model.
You only need to support two kinds of transducer models in sherpa-onnx: one for the stateless transducer, and one for the NeMo stateful transducer.
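The dispatch can be as simple as checking the decoder's output count. A hedged sketch (the exact counts are assumptions: a stateless decoder typically returns only decoder_out, while the NeMo LSTM decoder also returns its hidden and cell states, so it has more than one output):

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Sketch of the dispatch: a stateless transducer decoder has a single output
// (decoder_out), while the NeMo stateful (LSTM) decoder also returns its
// hidden and cell states, giving it more than one output. The threshold of 1
// is an illustrative assumption, not taken from the sherpa-onnx sources.
std::string SelectTransducerImpl(int32_t num_decoder_outputs) {
  if (num_decoder_outputs == 1) {
    return "stateless-transducer";
  }
  return "nemo-stateful-transducer";
}
```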
We have both a C++ and a Python version of the non-streaming NeMo transducer greedy search. Please read them carefully. The only differences from the non-streaming one:

Hi @csukuangfj, thank you.
auto states = model_->StackStates(states_vec);
auto [t, ns] = model_->RunEncoder(std::move(x), std::move(states),
Please note that this is NeMo transducer encoder, not the one from icefall.
please refer to
sherpa-onnx/scripts/nemo/fast-conformer-hybrid-transducer-ctc/test-onnx-transducer.py
Lines 169 to 173 in 2db7775
self.encoder.get_inputs()[0].name: x.numpy(),
self.encoder.get_inputs()[1].name: x_lens.numpy(),
self.encoder.get_inputs()[2].name: self.cache_last_channel,
self.encoder.get_inputs()[3].name: self.cache_last_time,
self.encoder.get_inputs()[4].name: self.cache_last_channel_len,
and
sherpa-onnx/sherpa-onnx/csrc/online-nemo-ctc-model.cc
Lines 52 to 97 in 2db7775
std::vector<Ort::Value> Forward(Ort::Value x,
                                std::vector<Ort::Value> states) {
  Ort::Value &cache_last_channel = states[0];
  Ort::Value &cache_last_time = states[1];
  Ort::Value &cache_last_channel_len = states[2];

  int32_t batch_size = x.GetTensorTypeAndShapeInfo().GetShape()[0];

  std::array<int64_t, 1> length_shape{batch_size};
  Ort::Value length = Ort::Value::CreateTensor<int64_t>(
      allocator_, length_shape.data(), length_shape.size());
  int64_t *p_length = length.GetTensorMutableData<int64_t>();
  std::fill(p_length, p_length + batch_size, ChunkLength());

  // (B, T, C) -> (B, C, T)
  x = Transpose12(allocator_, &x);

  std::array<Ort::Value, 5> inputs = {
      std::move(x), View(&length), std::move(cache_last_channel),
      std::move(cache_last_time), std::move(cache_last_channel_len)};

  auto out =
      sess_->Run({}, input_names_ptr_.data(), inputs.data(), inputs.size(),
                 output_names_ptr_.data(), output_names_ptr_.size());
  // out[0]: logit
  // out[1]: logit_length
  // out[2:] states_next
  //
  // we need to remove out[1]
  std::vector<Ort::Value> ans;
  ans.reserve(out.size() - 1);

  for (int32_t i = 0; i != out.size(); ++i) {
    if (i == 1) {
      continue;
    }
    ans.push_back(std::move(out[i]));
  }

  return ans;
}
You never need to use processed_frames.
I hope that you can understand what we have written.
Remember that the hybrid transducer + CTC shares the same encoder, which means you can borrow what we have done for the streaming NeMo CTC.
Please compare carefully between
sherpa-onnx/scripts/nemo/fast-conformer-hybrid-transducer-ctc/test-onnx-ctc.py
Lines 127 to 132 in 2db7775
self.model.get_inputs()[0].name: x.numpy(),
self.model.get_inputs()[1].name: x_lens.numpy(),
self.model.get_inputs()[2].name: self.cache_last_channel,
self.model.get_inputs()[3].name: self.cache_last_time,
self.model.get_inputs()[4].name: self.cache_last_channel_len,
},
and
sherpa-onnx/scripts/nemo/fast-conformer-hybrid-transducer-ctc/test-onnx-transducer.py
Lines 169 to 173 in 2db7775
self.encoder.get_inputs()[0].name: x.numpy(),
self.encoder.get_inputs()[1].name: x_lens.numpy(),
self.encoder.get_inputs()[2].name: self.cache_last_channel,
self.encoder.get_inputs()[3].name: self.cache_last_time,
self.encoder.get_inputs()[4].name: self.cache_last_channel_len,
Thank you. I have borrowed the Forward method for the RunEncoder method in online-transducer-nemo-model.cc.
I have a question regarding the initialization of the decoder states in online-recognizer-transducer-nemo-impl.h
I define these two methods.
std::unique_ptr<OnlineStream> CreateStream() const override {
auto stream = std::make_unique<OnlineStream>(config_.feat_config);
stream->SetStates(model_->GetInitStates());
InitOnlineStream(stream.get());
return stream;
}
void InitOnlineStream(OnlineStream *stream) const {
auto r = decoder_->GetEmptyResult();
stream->SetResult(r);
stream->SetNeMoDecoderStates(model_->GetDecoderInitStates(batch_size_));
}
Should the line in InitOnlineStream be this?
stream->SetNeMoDecoderStates(decoder_->GetDecoderInitStates(batch_size_));
Ort::Value encoder_out = Transpose12(model_->Allocator(), &t[0]);

// defined in online-transducer-greedy-search-nemo-decoder.h
std::vector<OnlineTransducerDecoderResult> results =
    decoder_->Decode(std::move(encoder_out), std::move(t[1]));
You need to pass the decoder model states of the previous chunk to decoder_->Decode().
By the way, you can create a new method for decoder_ that takes an additional argument containing the decoder_states.
decoder_->Decode(std::move(encoder_out), std::move(t[1]),
                 std::move(out_states), &results, ss, n);
I made some changes in online-recognizer-transducer-nemo-impl.h and the Decode() method now takes in the states of previous chunks.
Ort::Value encoder_out = Transpose12(model_->Allocator(), &t[0]);

// defined in online-transducer-greedy-search-nemo-decoder.h
decoder_->Decode(std::move(encoder_out), std::move(t[1]),
By the way, you don't need to pass the encoder model states to the greedy search decoder.
Please pass the decoder model states to it instead.
Please read our Python streaming transducer greedy search decoding example carefully.
I have posted the example here one more time in case you have not read it.
decoder_->Decode(std::move(encoder_out), std::move(out_states), &results, ss, n);
Is this correct?
No. out_states is from the encoder.
Remember that out_states is used only for the internal states of the encoder model. We don't need to use it in the greedy search decoding.
We need to pass the LSTM states from the decoder model to the greedy search decoder.
I suggest again that you re-read the Python decoding example and figure out how the decoding works. (You need to know how LSTM works.)
working on it.
I understand that when the model is initialized, init_cache_state initializes the initial states of the encoder model. When decoding begins, before the first chunk is decoded, the decoder model comes into action and initializes the decoder states. After a chunk has been decoded, it emits the next states for the decoder model, which become the current states for the next chunk.
Yes, so you don't need the encoder states during greedy search decoding.
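The state flow described above can be modelled with toy stand-ins: the decoder's LSTM (h, c) state produced for one chunk is fed back in for the next chunk, while encoder states never enter the loop. This is only an illustration of the carry-over pattern, not the sherpa-onnx code:

```cpp
#include <cassert>
#include <vector>

// Toy stand-in for the decoder's LSTM state: a pair of (h, c) tensors.
struct LstmState {
  std::vector<float> h;
  std::vector<float> c;
};

// Zero-initialized state, as produced before the first chunk is decoded.
LstmState GetDecoderInitStates() { return LstmState{{0.0f}, {0.0f}}; }

// Pretend decoder step: consumes the previous state and emits the next one.
// (A real RunDecoder would also take the previous token and return logits.)
LstmState RunDecoder(const LstmState &prev) {
  return LstmState{{prev.h[0] + 1.0f}, {prev.c[0] + 1.0f}};
}

// Greedy-search skeleton over chunks: the decoder state produced while
// decoding one chunk becomes the input state for the next chunk. Encoder
// states are managed elsewhere and never enter this loop.
LstmState DecodeChunks(int num_chunks) {
  LstmState state = GetDecoderInitStates();
  for (int i = 0; i < num_chunks; ++i) {
    state = RunDecoder(state);  // carry the decoder state across chunks
  }
  return state;
}
```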
std::vector<Ort::Value> decoder_states = model_->GetDecoderInitStates(1);
decoder_->Decode(std::move(encoder_out), std::move(decoder_states), &results, ss, n);
}
GetDecoderInitStates fetches the initial states of the decoder. But I am not really sure of the implementation above.
Please read:
- (1) our Python code for NeMo streaming transducer greedy search decoding
- (2) our C++ code for NeMo non-streaming transducer greedy search decoding
Make sure you indeed understand the code.
Hi @csukuangfj,
thanks again.
I did, and I do understand the code.
I see in offline-transducer-greedy-search-nemo-decoder.cc how the RunDecoder method takes the initial state of the decoder:
model->RunDecoder(std::move(decoder_input_pair.first),
                  std::move(decoder_input_pair.second),
                  model->GetDecoderInitStates(1));
and I realize that I did it in a similar way above. I am missing where exactly I am going wrong.
I revised the code
// defined in online-transducer-greedy-search-nemo-decoder.h
std::vector<Ort::Value> decoder_states = model_->GetDecoderInitStates(1);
// updated decoder states are returned
decoder_states = decoder_->Decode(std::move(encoder_out),
std::move(decoder_states),
&results, ss, n);
std::vector<std::vector<Ort::Value>> next_states =
model_->UnStackStates(decoder_states);
Is this correct?
By the way, please make sure the code compiles successfully on your computer.
Hi @csukuangfj, I am unable to pinpoint and solve this compilation error. Could you please take a look?

[ 56%] Building CXX object sherpa-onnx/csrc/CMakeFiles/sherpa-onnx-core.dir/online-recognizer-impl.cc.o
In file included from /usr/include/c++/11/memory:76,
from /mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-recognizer-impl.h:8,
from /mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-recognizer-impl.cc:5:
/usr/include/c++/11/bits/unique_ptr.h: In instantiation of ‘typename std::_MakeUniq<_Tp>::__single_object std::make_unique(_Args&& ...) [with _Tp = sherpa_onnx::OnlineTransducerModifiedBeamSearchDecoder; _Args = {sherpa_onnx::OnlineTransducerModel*, sherpa_onnx::OnlineLM*, int&, float&, int&, float&, float&}; typename std::_MakeUniq<_Tp>::__single_object = std::unique_ptr<sherpa_onnx::OnlineTransducerModifiedBeamSearchDecoder, std::default_delete<sherpa_onnx::OnlineTransducerModifiedBeamSearchDecoder> >]’:
/mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-recognizer-transducer-impl.h:109:77: required from here
/usr/include/c++/11/bits/unique_ptr.h:962:30: error: invalid new-expression of abstract class type ‘sherpa_onnx::OnlineTransducerModifiedBeamSearchDecoder’
962 | { return unique_ptr<_Tp>(new _Tp(std::forward<_Args>(__args)...)); }
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-recognizer-transducer-impl.h:30,
from /mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-recognizer-impl.cc:9:
/mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-transducer-modified-beam-search-decoder.h:18:7: note: because the following virtual functions are pure within ‘sherpa_onnx::OnlineTransducerModifiedBeamSearchDecoder’:
18 | class OnlineTransducerModifiedBeamSearchDecoder
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-stream.h:17,
from /mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-recognizer.h:22,
from /mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-recognizer-impl.h:13,
from /mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-recognizer-impl.cc:5:
/mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-transducer-decoder.h:85:35: note: ‘virtual std::vector<Ort::Value> sherpa_onnx::OnlineTransducerDecoder::Decode_me(Ort::Value, std::vector<Ort::Value>, std::vector<sherpa_onnx::OnlineTransducerDecoderResult>*, sherpa_onnx::OnlineStream**, int32_t)’
85 | virtual std::vector<Ort::Value> Decode_me(Ort::Value encoder_out,
| ^~~~~~~~~
In file included from /usr/include/c++/11/memory:76,
from /mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-recognizer-impl.h:8,
from /mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-recognizer-impl.cc:5:
/usr/include/c++/11/bits/unique_ptr.h: In instantiation of ‘typename std::_MakeUniq<_Tp>::__single_object std::make_unique(_Args&& ...) [with _Tp = sherpa_onnx::OnlineTransducerGreedySearchDecoder; _Args = {sherpa_onnx::OnlineTransducerModel*, int&, float&, float&}; typename std::_MakeUniq<_Tp>::__single_object = std::unique_ptr<sherpa_onnx::OnlineTransducerGreedySearchDecoder, std::default_delete<sherpa_onnx::OnlineTransducerGreedySearchDecoder> >]’:
/mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-recognizer-transducer-impl.h:115:71: required from here
/usr/include/c++/11/bits/unique_ptr.h:962:30: error: invalid new-expression of abstract class type ‘sherpa_onnx::OnlineTransducerGreedySearchDecoder’
962 | { return unique_ptr<_Tp>(new _Tp(std::forward<_Args>(__args)...)); }
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-recognizer-transducer-impl.h:28,
from /mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-recognizer-impl.cc:9:
/mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-transducer-greedy-search-decoder.h:15:7: note: because the following virtual functions are pure within ‘sherpa_onnx::OnlineTransducerGreedySearchDecoder’:
15 | class OnlineTransducerGreedySearchDecoder : public OnlineTransducerDecoder {
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-stream.h:17,
from /mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-recognizer.h:22,
from /mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-recognizer-impl.h:13,
from /mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-recognizer-impl.cc:5:
/mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-transducer-decoder.h:85:35: note: ‘virtual std::vector<Ort::Value> sherpa_onnx::OnlineTransducerDecoder::Decode_me(Ort::Value, std::vector<Ort::Value>, std::vector<sherpa_onnx::OnlineTransducerDecoderResult>*, sherpa_onnx::OnlineStream**, int32_t)’
85 | virtual std::vector<Ort::Value> Decode_me(Ort::Value encoder_out,
| ^~~~~~~~~
In file included from /usr/include/c++/11/memory:76,
from /mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-recognizer-impl.h:8,
from /mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-recognizer-impl.cc:5:
/usr/include/c++/11/bits/unique_ptr.h: In instantiation of ‘typename std::_MakeUniq<_Tp>::__single_object std::make_unique(_Args&& ...) [with _Tp = sherpa_onnx::OnlineTransducerGreedySearchNeMoDecoder; _Args = {sherpa_onnx::OnlineTransducerNeMoModel*, float&}; typename std::_MakeUniq<_Tp>::__single_object = std::unique_ptr<sherpa_onnx::OnlineTransducerGreedySearchNeMoDecoder, std::default_delete<sherpa_onnx::OnlineTransducerGreedySearchNeMoDecoder> >]’:
/mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-recognizer-transducer-nemo-impl.h:53:75: required from here
/usr/include/c++/11/bits/unique_ptr.h:962:30: error: invalid new-expression of abstract class type ‘sherpa_onnx::OnlineTransducerGreedySearchNeMoDecoder’
962 | { return unique_ptr<_Tp>(new _Tp(std::forward<_Args>(__args)...)); }
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-recognizer-transducer-nemo-impl.h:26,
from /mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-recognizer-impl.cc:10:
/mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-transducer-greedy-search-nemo-decoder.h:15:7: note: because the following virtual functions are pure within ‘sherpa_onnx::OnlineTransducerGreedySearchNeMoDecoder’:
15 | class OnlineTransducerGreedySearchNeMoDecoder : public OnlineTransducerDecoder {
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-stream.h:17,
from /mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-recognizer.h:22,
from /mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-recognizer-impl.h:13,
from /mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-recognizer-impl.cc:5:
/mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-transducer-decoder.h:82:16: note: ‘virtual void sherpa_onnx::OnlineTransducerDecoder::Decode(Ort::Value, std::vector<sherpa_onnx::OnlineTransducerDecoderResult>*)’
82 | virtual void Decode(Ort::Value encoder_out,
| ^~~~~~
cc1plus: note: unrecognized command-line option ‘-Wno-missing-template-keyword’ may have been intended to silence earlier diagnostics
make[2]: *** [sherpa-onnx/csrc/CMakeFiles/sherpa-onnx-core.dir/build.make:832: sherpa-onnx/csrc/CMakeFiles/sherpa-onnx-core.dir/online-recognizer-impl.cc.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:1552: sherpa-onnx/csrc/CMakeFiles/sherpa-onnx-core.dir/all] Error 2
make: *** [Makefile:156: all] Error 2
@@ -82,6 +82,11 @@ class OnlineTransducerDecoder {
   virtual void Decode(Ort::Value encoder_out,
                       std::vector<OnlineTransducerDecoderResult> *result) = 0;

+  virtual std::vector<Ort::Value> Decode_me(Ort::Value encoder_out,
Your compilation error is caused by this method.
Please remove it.
Also, your greedy search decoder does not need to inherit from this class.
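In other words, the greedy search decoder can be a plain standalone class: with no abstract base, nothing is left pure-virtual and `std::make_unique` works directly. A sketch under that assumption (the types are simplified stand-ins for the `Ort::Value`-based ones):

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Simplified stand-ins for the Ort::Value-based types.
using States = std::vector<std::vector<float>>;

struct DecoderResult {
  std::vector<int> tokens;
};

// A plain class with no abstract base: nothing is virtual, so there are no
// pure-virtual methods to override and make_unique works directly.
class OnlineTransducerGreedySearchNeMoDecoder {
 public:
  // Takes the previous chunk's decoder states and returns the updated ones.
  States Decode(const std::vector<float> &encoder_out, States decoder_states,
                DecoderResult *result) const {
    // A real implementation would run the decoder/joiner over encoder_out
    // here; this sketch only emits a placeholder token.
    if (!encoder_out.empty()) {
      result->tokens.push_back(0);
    }
    return decoder_states;  // pass states through for the next chunk
  }
};
```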
OnlineRecognizerConfig config_;
SymbolTable symbol_table_;
std::unique_ptr<OnlineTransducerNeMoModel> model_;
std::unique_ptr<OnlineTransducerDecoder> decoder_;
You can build an instance of OnlineTransducerGreedySearchNeMoDecoder directly. OnlineTransducerGreedySearchNeMoDecoder does not need to inherit from OnlineTransducerNeMoModel.

Suggested change:
- std::unique_ptr<OnlineTransducerDecoder> decoder_;
+ std::unique_ptr<OnlineTransducerGreedySearchNeMoDecoder> decoder_;
Yes, I just fixed it, after reading that the greedy search decoder does not inherit from the online transducer decoder. Thank you.
// TODO(fangjun): Remember to change these constants if needed
int32_t frame_shift_ms = 10;
int32_t subsampling_factor = 4;
By the way, the subsampling factor of the NeMo transducer model is not 4. I think it is 8. Please recheck it.
fixed.
float frame_shift_in_seconds = 0.01;

// subsampling factor is 4
int32_t trailing_silence_frames = s->GetResult().num_trailing_blanks * 4;
Please replace 4 with the actual subsampling factor.
'subsampling_factor': 8,
Yes, it is indeed 8. Thank you.
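For reference, the arithmetic behind the trailing-silence check: each emitted blank covers subsampling_factor feature frames, and each frame covers frame_shift_in_seconds of audio. A small sketch:

```cpp
#include <cassert>

// Each emitted blank corresponds to `subsampling_factor` feature frames, and
// each feature frame covers `frame_shift_in_seconds` of audio.
float TrailingSilenceSeconds(int num_trailing_blanks, int subsampling_factor,
                             float frame_shift_in_seconds) {
  int trailing_silence_frames = num_trailing_blanks * subsampling_factor;
  return trailing_silence_frames * frame_shift_in_seconds;
}
```

With a 10 ms frame shift and a subsampling factor of 8, 25 trailing blanks correspond to roughly two seconds of silence.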
Ort::Value encoder_out = Transpose12(model_->Allocator(), &t[0]);

// defined in online-transducer-greedy-search-nemo-decoder.h
std::vector<Ort::Value> decoder_states = model_->GetDecoderInitStates(1);
Please get the decoder states from the stream.
Remember that we need to get the decoder states from the previous chunk.
Also, you need to save the decoder states of this chunk for the next chunk.
I hope that you indeed understand our Python decoding code for streaming NeMo stateful transducer.
Yes, I do understand the complete logic. Where I might be going wrong is in the details of the C++ implementation.
But I am trying my best here.
// **STEP-0**
// Get the initial states of the decoder.
std::vector<Ort::Value> &decoder_states = ss[0]->GetNeMoDecoderStates();
// Subsequent decoder states (for each chunk) are updated inside the Decode method.
// This returns the decoder state from the LAST chunk. We probably don't need it, so we can discard it.
decoder_states = decoder_->Decode(std::move(encoder_out),
                                  std::move(decoder_states),
                                  &result, ss, n);

Now, here is my logic inside the Decode method:

// **STEP-1**
// decoder_output_pair.second returns the next decoder state.
std::pair<Ort::Value, std::vector<Ort::Value>> decoder_output_pair =
    model->RunDecoder(std::move(decoder_input_pair.first),
                      std::move(decoder_states));

// now we step through each frame of the chunk
for (int32_t t = 0; t != num_rows; ++t) {
  // rest of the code
  if (y != blank_id) {
    // rest of the code
    // the last decoder state becomes the current state
    decoder_output_pair =
        model->RunDecoder(std::move(decoder_input_pair.first),
                          std::move(decoder_output_pair.second));
  }

  // **STEP-2**
  // Update the decoder states for the next chunk. So basically for every next
  // chunk, the last decoder state becomes the current state.
  decoder_states = std::move(decoder_output_pair.second);
}
@csukuangfj what do you think?
    &result, ss, n);

std::vector<std::vector<Ort::Value>> next_states =
    model_->UnStackStates(decoder_states);
For greedy search with batch size 1, I think we don't need to use stack or unstack states for the decoder model.
We will need them once we implement modified_beam_search decoding.
Please forget about stacking and unstacking states for the decoder model for now.
Okay, making the necessary changes now.
Thanks.
I suggest that you copy & paste our C++ greedy search decoding code for the non-streaming stateful NeMo transducer. Almost everything you need is already there.
…pdated, compilation success, decoding not working yet
Hi @csukuangfj,
Sure, will push new commits to your branch this week.
Hi @csukuangfj, to give you an example... I have the suspicion that something is wrong inside the greedy search decoder implementation.
…s good, then incorrect predictions.
@@ -46,6 +46,7 @@ static OnlineRecognizerResult Convert(const OnlineTransducerDecoderResult &src,
   r.timestamps.reserve(src.tokens.size());

+  for (auto i : src.tokens) {
+    if (i == -1) continue;
Could you describe in which case i is -1?
// defined in ./online-recognizer-transducer-impl.h
// static may or may not be here? TODOs
static OnlineRecognizerResult Convert(const OnlineTransducerDecoderResult &src,
Suggested change:
- static OnlineRecognizerResult Convert(const OnlineTransducerDecoderResult &src,
+ OnlineRecognizerResult Convert(const OnlineTransducerDecoderResult &src,

Please remove static.
OnlineTransducerDecoderResult
OnlineTransducerGreedySearchNeMoDecoder::GetEmptyResult() const {
  int32_t context_size = 8;
  int32_t blank_id = 0;  // always 0
  OnlineTransducerDecoderResult r;
  r.tokens.resize(context_size, -1);
  r.tokens.back() = blank_id;

  return r;
}
Please remove it, and remove any code related to OnlineTransducerGreedySearchNeMoDecoder::GetEmptyResult.
// Define and initialize encoder_out_length
Ort::MemoryInfo memory_info =
    Ort::MemoryInfo::CreateCpu(OrtDeviceAllocator, OrtMemTypeCPU);

int64_t length_value = 1;
std::vector<int64_t> length_shape = {1};

Ort::Value encoder_out_length = Ort::Value::CreateTensor<int64_t>(
    memory_info, &length_value, 1, length_shape.data(), length_shape.size());

const int64_t *p_length = encoder_out_length.GetTensorData<int64_t>();
I hope you can understand that for the streaming case, the encoder_out_length is dim1.
for (int32_t i = 0; i != batch_size; ++i) {
  const float *this_p = p + dim1 * dim2 * i;
  int32_t this_len = p_length[i];
Suggested change:
- int32_t this_len = p_length[i];
+ int32_t this_len = dim1;
You are almost there! I am merging it and will take care of the rest. Thank you for your contribution!
This PR integrates NeMo's faster conformer transducer into sherpa-onnx.
More commits to be added.