
Add C++ runtime for *streaming* faster conformer transducer from NeMo. #889

Merged
merged 15 commits into from
May 30, 2024

Conversation

sangeet2020
Contributor

@sangeet2020 sangeet2020 commented May 17, 2024

This PR integrates NeMo's faster conformer transducer into sherpa-onnx.
More commits will be added.

@sangeet2020
Contributor Author

@csukuangfj would we need StackStates and UnStackStates methods for this?

@csukuangfj
Collaborator

@csukuangfj would we need StackStates and UnStackStates methods for this?

Yes, please refer to

std::vector<Ort::Value> StackStates(
std::vector<std::vector<Ort::Value>> states) const {

and

std::vector<std::vector<Ort::Value>> UnStackStates(
std::vector<Ort::Value> states) const {

Note that for decoding, you can support only batch_size == 1.

@sangeet2020
Contributor Author

Hi @csukuangfj ,
Could you please help me with online-transducer-greedy-search-nemo-decoder.cc? A basic outline would be good to start with.
Thank you

@csukuangfj
Collaborator

  1. Please refer to our Python example for online NeMo transducer greedy search decoding
    https://github.com/k2-fsa/sherpa-onnx/blob/master/scripts/nemo/fast-conformer-hybrid-transducer-ctc/test-onnx-transducer.py

  2. For simplicity, please support only batch size == 1 for greedy search

  3. Please refer to the offline NeMo transducer greedy search decoding in C++ at
    https://github.com/k2-fsa/sherpa-onnx/blob/master/sherpa-onnx/csrc/offline-transducer-greedy-search-nemo-decoder.h

All you need is to change the offline C++ version to an online version.

  4. The NeMo transducer is stateful, so you need to follow
    void SetStates(std::vector<Ort::Value> states);
    std::vector<Ort::Value> &GetStates();

to add two methods, e.g.,

 void SetNeMoDecoderStates(std::vector<Ort::Value> states); 
 std::vector<Ort::Value> &GetNeMoDecoderStates(); 

  5. You need to follow
    https://github.com/k2-fsa/sherpa-onnx/blob/master/sherpa-onnx/csrc/offline-recognizer-transducer-nemo-impl.h
    to add online-recognizer-transducer-nemo-impl.h

@sangeet2020
Contributor Author

sangeet2020 commented May 22, 2024

@csukuangfj could you review these changes, please? I'm waiting for your feedback.

Also, could you assist me with online-transducer-greedy-search-nemo-decoder.cc? Following offline-transducer-greedy-search-nemo-decoder.cc is not so helpful in this case, as this is streaming mode.

Thank You

@csukuangfj
Collaborator

By the way, you need to change

if (!config.model_config.transducer.encoder.empty()) {
return std::make_unique<OnlineRecognizerTransducerImpl>(config);
}

and

if (!config.model_config.transducer.encoder.empty()) {
return std::make_unique<OnlineRecognizerTransducerImpl>(mgr, config);
}

You can use the number of outputs from the decoder model to decide whether to create a normal OnlineRecognizerTransducerImpl or OnlineRecognizerTransducerNeMoImpl.

You can refer to

auto sess = std::make_unique<Ort::Session>(env, model_data, model_data_length,

to create a session for the decoder model
and refer to the following code to get the number of outputs for the decoder model
size_t node_count = sess->GetOutputCount();

You only need to support two kinds of transducer models in sherpa-onnx: one for stateless transducer, and one for NeMo stateful transducer.
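The output-count dispatch can be sketched as follows. The assumption that the stateless decoder has exactly one output while the NeMo stateful decoder also returns its states (and therefore has more than one output) is illustrative only; the actual counts must be checked against the exported models:

```cpp
#include <cassert>
#include <string>

// Hypothetical dispatch: a stateless (icefall-style) decoder model exposes a
// single output, while the NeMo stateful decoder also returns its LSTM
// states, so it has more than one output. The threshold is an assumption
// for illustration, not a documented sherpa-onnx invariant.
std::string PickTransducerImpl(size_t num_decoder_outputs) {
  return num_decoder_outputs > 1 ? "OnlineRecognizerTransducerNeMoImpl"
                                 : "OnlineRecognizerTransducerImpl";
}
```

In practice `sess->GetOutputCount()` on the decoder session supplies the argument.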

@csukuangfj
Collaborator

Following offline-transducer-greedy-search-nemo-decoder.cc is not so helpful in this case, as this is streaming mode

We have both a C++ and a Python version for the non-streaming NeMo transducer greedy search
and
a Python version for the streaming NeMo transducer greedy search.

Please read them carefully. The only differences from the non-streaming one:

  • You need to process chunk-by-chunk, where there are already code examples for stateless streaming transducer and for stateful NeMo CTC model
  • You need to save the decoder states across chunks
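The chunk-by-chunk pattern with carried states can be sketched with toy types (no ONNX involved; every name below is illustrative, standing in for the encoder caches such as cache_last_channel):

```cpp
#include <cassert>
#include <vector>

// Toy "encoder": its state is a running sum that persists across chunks,
// the same carry pattern as the real cache tensors.
struct ToyState { float sum = 0.0f; };

std::vector<float> RunChunk(const std::vector<float> &chunk, ToyState *state) {
  std::vector<float> out;
  out.reserve(chunk.size());
  for (float v : chunk) {
    state->sum += v;        // state updated in place, carried forward
    out.push_back(state->sum);
  }
  return out;
}

std::vector<float> DecodeStream(const std::vector<std::vector<float>> &chunks) {
  ToyState state;           // initialized once per stream
  std::vector<float> all;
  for (const auto &chunk : chunks) {
    auto out = RunChunk(chunk, &state);  // pass previous state, get new one
    all.insert(all.end(), out.begin(), out.end());
  }
  return all;
}
```

The point is that the state object outlives each chunk call; that is exactly what the stream has to hold for both the encoder and the decoder model.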

@sangeet2020
Contributor Author

Hi @csukuangfj ,
Thank you for the feedback.
I have made the necessary changes as you described above. Could you please review them?

Thank You

Resolved review threads:
  • sherpa-onnx/csrc/online-recognizer-impl.cc
  • sherpa-onnx/csrc/online-recognizer-transducer-nemo-impl.h

auto states = model_->StackStates(states_vec);

auto [t, ns] = model_->RunEncoder(std::move(x), std::move(states),
Collaborator

Please note that this is NeMo transducer encoder, not the one from icefall.

please refer to

self.encoder.get_inputs()[0].name: x.numpy(),
self.encoder.get_inputs()[1].name: x_lens.numpy(),
self.encoder.get_inputs()[2].name: self.cache_last_channel,
self.encoder.get_inputs()[3].name: self.cache_last_time,
self.encoder.get_inputs()[4].name: self.cache_last_channel_len,

and

std::vector<Ort::Value> Forward(Ort::Value x,
                                std::vector<Ort::Value> states) {
  Ort::Value &cache_last_channel = states[0];
  Ort::Value &cache_last_time = states[1];
  Ort::Value &cache_last_channel_len = states[2];

  int32_t batch_size = x.GetTensorTypeAndShapeInfo().GetShape()[0];

  std::array<int64_t, 1> length_shape{batch_size};
  Ort::Value length = Ort::Value::CreateTensor<int64_t>(
      allocator_, length_shape.data(), length_shape.size());
  int64_t *p_length = length.GetTensorMutableData<int64_t>();
  std::fill(p_length, p_length + batch_size, ChunkLength());

  // (B, T, C) -> (B, C, T)
  x = Transpose12(allocator_, &x);

  std::array<Ort::Value, 5> inputs = {
      std::move(x), View(&length), std::move(cache_last_channel),
      std::move(cache_last_time), std::move(cache_last_channel_len)};

  auto out =
      sess_->Run({}, input_names_ptr_.data(), inputs.data(), inputs.size(),
                 output_names_ptr_.data(), output_names_ptr_.size());
  // out[0]: logit
  // out[1]: logit_length
  // out[2:]: states_next
  //
  // we need to remove out[1]
  std::vector<Ort::Value> ans;
  ans.reserve(out.size() - 1);
  for (int32_t i = 0; i != out.size(); ++i) {
    if (i == 1) {
      continue;
    }
    ans.push_back(std::move(out[i]));
  }
  return ans;
}

You never need to use processed_frames.

I hope that you can understand what we have written.

Remember that the hybrid transducer + CTC shares the same encoder, which means you can borrow what we have done for the streaming NeMo CTC.

Please compare carefully between

self.model.get_inputs()[0].name: x.numpy(),
self.model.get_inputs()[1].name: x_lens.numpy(),
self.model.get_inputs()[2].name: self.cache_last_channel,
self.model.get_inputs()[3].name: self.cache_last_time,
self.model.get_inputs()[4].name: self.cache_last_channel_len,
},

and

self.encoder.get_inputs()[0].name: x.numpy(),
self.encoder.get_inputs()[1].name: x_lens.numpy(),
self.encoder.get_inputs()[2].name: self.cache_last_channel,
self.encoder.get_inputs()[3].name: self.cache_last_time,
self.encoder.get_inputs()[4].name: self.cache_last_channel_len,

Contributor Author

Thank you. I have borrowed the Forward method for the RunEncoder method in online-transducer-nemo-model.cc.

I have a question regarding the initialization of the decoder states in online-recognizer-transducer-nemo-impl.h.
I defined these two methods:

  std::unique_ptr<OnlineStream> CreateStream() const override {
    auto stream = std::make_unique<OnlineStream>(config_.feat_config);
    stream->SetStates(model_->GetInitStates());
    InitOnlineStream(stream.get());
    return stream;
  }

  void InitOnlineStream(OnlineStream *stream) const {
    auto r = decoder_->GetEmptyResult();

    stream->SetResult(r);
    stream->SetNeMoDecoderStates(model_->GetDecoderInitStates(batch_size_));
  }

Should the line in InitOnlineStream be this?

stream->SetNeMoDecoderStates(decoder_->GetDecoderInitStates(batch_size_));

Ort::Value encoder_out = Transpose12(model_->Allocator(), &t[0]);

// defined in online-transducer-greedy-search-nemo-decoder.h
std::vector<OnlineTransducerDecoderResult> results = decoder_-> Decode(std::move(encoder_out), std::move(t[1]));
Collaborator

You need to pass the decoder model states of the previous chunk to the decoder_->Decode().

By the way, you can create a new method for decoder_
to take an additional argument containing the decoder_states.

Contributor Author

    decoder_-> Decode(std::move(encoder_out), std::move(t[1]),
                      std::move(out_states), &results, ss, n);

I made some changes in online-recognizer-transducer-nemo-impl.h, and the Decode() method now takes in the states of the previous chunk.

Resolved review threads:
  • sherpa-onnx/csrc/online-transducer-nemo-model.h (6 comments)
  • sherpa-onnx/csrc/online-recognizer-transducer-nemo-impl.h (4 comments)
Ort::Value encoder_out = Transpose12(model_->Allocator(), &t[0]);

// defined in online-transducer-greedy-search-nemo-decoder.h
decoder_-> Decode(std::move(encoder_out), std::move(t[1]),
Collaborator

By the way, you don't need to pass the encoder model states to the greedy search decoder.

Please pass the decoder model states to it instead.

Please read carefully our python streaming transducer greedy search decoding example.

https://github.com/k2-fsa/sherpa-onnx/blob/master/scripts/nemo/fast-conformer-hybrid-transducer-ctc/test-onnx-transducer.py#L190

I have posted the example here one more time in case you have not read it.

Contributor Author

decoder_-> Decode(std::move(encoder_out), std::move(out_states), &results, ss, n);

Is this correct?

Collaborator

No. out_states is from the encoder.

Remember that out_states is used only for the internal states of the encoder model. We don't need to use it in the greedy search decoding.

We need to pass the LSTM states from the decoder model to the greedy search decoder.

I suggest you again that you re-read the python decoding example and figure out how the decoding works.

(you need to know how LSTM works)
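The state flow being described — the prediction network's (LSTM) state advances only when a non-blank token is emitted, and the final state is carried into the next chunk — can be sketched with a toy stand-in for the real decoder (everything here is illustrative; the real joiner produces logits, not token ids):

```cpp
#include <cassert>
#include <vector>

// Stand-in for the LSTM (h, c) states of the NeMo prediction network.
struct DecoderState { int last_token = 0; };  // 0 plays the role of blank

// Toy greedy search over one chunk: frame_tokens stands in for the argmax
// of the joiner output at each frame. The key point is the control flow:
// the decoder state is advanced ONLY when a non-blank token is emitted,
// and the final state is returned for the next chunk.
DecoderState GreedySearchChunk(const std::vector<int> &frame_tokens,
                               DecoderState state,
                               std::vector<int> *hyp) {
  const int blank_id = 0;
  for (int y : frame_tokens) {
    if (y != blank_id) {
      hyp->push_back(y);
      state.last_token = y;  // re-run the prediction network with y
    }
    // on blank: keep the same decoder state and move to the next frame
  }
  return state;
}
```

Encoder caches never appear here; only the decoder state crosses chunk boundaries in the search.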

Contributor Author

working on it.

Contributor Author
@sangeet2020 sangeet2020 May 24, 2024

I understand that when initializing the model at the beginning, init_cache_state initializes the initial states of the encoder model. Then, when decoding begins, before the first chunk is decoded, the decoder model comes into action and initializes the decoder states. After a chunk has been decoded, it emits the next states for the decoder model, which become the current states for the next chunk.

Collaborator

yes, so you don't need the encoder states during greedy search decoding.

Contributor Author
@sangeet2020 sangeet2020 May 24, 2024

    std::vector<Ort::Value> decoder_states = model_->GetDecoderInitStates(1);
    decoder_->Decode(std::move(encoder_out), std::move(decoder_states),
                     &results, ss, n);

GetNeMoDecoderStates fetches the initial states of the decoder, but I am not really sure about the implementation done above.

Collaborator

please read

  • (1) our python code for nemo streaming transducer greedy search decoding
  • (2) our c++ code for nemo non-streaming transducer greedy search decoding

make sure you indeed understand the code.

Contributor Author

Hi @csukuangfj ,
thanks again.
I did, and I do understand the code.

I see in offline-transducer-greedy-search-nemo-decoder.cc how the RunDecoder method takes the initial state of the decoder.

      model->RunDecoder(std::move(decoder_input_pair.first),
                        std::move(decoder_input_pair.second),
                        model->GetDecoderInitStates(1));

and I realize that above, I did it in a similar way. I am missing where exactly I am going wrong.

Contributor Author
@sangeet2020 sangeet2020 May 24, 2024

I revised the code

    // defined in online-transducer-greedy-search-nemo-decoder.h
    std::vector<Ort::Value> decoder_states = model_->GetDecoderInitStates(1);
    // updated decoder states are returned
    decoder_states = decoder_->Decode(std::move(encoder_out), 
                                      std::move(decoder_states), 
                                      &results, ss, n);

    std::vector<std::vector<Ort::Value>> next_states =
        model_->UnStackStates(decoder_states);

Is this correct?

@csukuangfj
Collaborator

By the way, please make sure the code compiles successfully on your computer.

@sangeet2020
Contributor Author

Hi @csukuangfj,

I am unable to pinpoint and solve this compilation error. Could you please take a look?

[ 56%] Building CXX object sherpa-onnx/csrc/CMakeFiles/sherpa-onnx-core.dir/online-recognizer-impl.cc.o
In file included from /usr/include/c++/11/memory:76,
                 from /mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-recognizer-impl.h:8,
                 from /mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-recognizer-impl.cc:5:
/usr/include/c++/11/bits/unique_ptr.h: In instantiation of ‘typename std::_MakeUniq<_Tp>::__single_object std::make_unique(_Args&& ...) [with _Tp = sherpa_onnx::OnlineTransducerModifiedBeamSearchDecoder; _Args = {sherpa_onnx::OnlineTransducerModel*, sherpa_onnx::OnlineLM*, int&, float&, int&, float&, float&}; typename std::_MakeUniq<_Tp>::__single_object = std::unique_ptr<sherpa_onnx::OnlineTransducerModifiedBeamSearchDecoder, std::default_delete<sherpa_onnx::OnlineTransducerModifiedBeamSearchDecoder> >]’:
/mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-recognizer-transducer-impl.h:109:77:   required from here
/usr/include/c++/11/bits/unique_ptr.h:962:30: error: invalid new-expression of abstract class type ‘sherpa_onnx::OnlineTransducerModifiedBeamSearchDecoder’
  962 |     { return unique_ptr<_Tp>(new _Tp(std::forward<_Args>(__args)...)); }
      |                              ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-recognizer-transducer-impl.h:30,
                 from /mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-recognizer-impl.cc:9:
/mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-transducer-modified-beam-search-decoder.h:18:7: note:   because the following virtual functions are pure within ‘sherpa_onnx::OnlineTransducerModifiedBeamSearchDecoder’:
   18 | class OnlineTransducerModifiedBeamSearchDecoder
      |       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-stream.h:17,
                 from /mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-recognizer.h:22,
                 from /mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-recognizer-impl.h:13,
                 from /mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-recognizer-impl.cc:5:
/mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-transducer-decoder.h:85:35: note:     ‘virtual std::vector<Ort::Value> sherpa_onnx::OnlineTransducerDecoder::Decode_me(Ort::Value, std::vector<Ort::Value>, std::vector<sherpa_onnx::OnlineTransducerDecoderResult>*, sherpa_onnx::OnlineStream**, int32_t)’
   85 |   virtual std::vector<Ort::Value> Decode_me(Ort::Value encoder_out,
      |                                   ^~~~~~~~~
In file included from /usr/include/c++/11/memory:76,
                 from /mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-recognizer-impl.h:8,
                 from /mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-recognizer-impl.cc:5:
/usr/include/c++/11/bits/unique_ptr.h: In instantiation of ‘typename std::_MakeUniq<_Tp>::__single_object std::make_unique(_Args&& ...) [with _Tp = sherpa_onnx::OnlineTransducerGreedySearchDecoder; _Args = {sherpa_onnx::OnlineTransducerModel*, int&, float&, float&}; typename std::_MakeUniq<_Tp>::__single_object = std::unique_ptr<sherpa_onnx::OnlineTransducerGreedySearchDecoder, std::default_delete<sherpa_onnx::OnlineTransducerGreedySearchDecoder> >]’:
/mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-recognizer-transducer-impl.h:115:71:   required from here
/usr/include/c++/11/bits/unique_ptr.h:962:30: error: invalid new-expression of abstract class type ‘sherpa_onnx::OnlineTransducerGreedySearchDecoder’
  962 |     { return unique_ptr<_Tp>(new _Tp(std::forward<_Args>(__args)...)); }
      |                              ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-recognizer-transducer-impl.h:28,
                 from /mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-recognizer-impl.cc:9:
/mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-transducer-greedy-search-decoder.h:15:7: note:   because the following virtual functions are pure within ‘sherpa_onnx::OnlineTransducerGreedySearchDecoder’:
   15 | class OnlineTransducerGreedySearchDecoder : public OnlineTransducerDecoder {
      |       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-stream.h:17,
                 from /mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-recognizer.h:22,
                 from /mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-recognizer-impl.h:13,
                 from /mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-recognizer-impl.cc:5:
/mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-transducer-decoder.h:85:35: note:     ‘virtual std::vector<Ort::Value> sherpa_onnx::OnlineTransducerDecoder::Decode_me(Ort::Value, std::vector<Ort::Value>, std::vector<sherpa_onnx::OnlineTransducerDecoderResult>*, sherpa_onnx::OnlineStream**, int32_t)’
   85 |   virtual std::vector<Ort::Value> Decode_me(Ort::Value encoder_out,
      |                                   ^~~~~~~~~
In file included from /usr/include/c++/11/memory:76,
                 from /mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-recognizer-impl.h:8,
                 from /mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-recognizer-impl.cc:5:
/usr/include/c++/11/bits/unique_ptr.h: In instantiation of ‘typename std::_MakeUniq<_Tp>::__single_object std::make_unique(_Args&& ...) [with _Tp = sherpa_onnx::OnlineTransducerGreedySearchNeMoDecoder; _Args = {sherpa_onnx::OnlineTransducerNeMoModel*, float&}; typename std::_MakeUniq<_Tp>::__single_object = std::unique_ptr<sherpa_onnx::OnlineTransducerGreedySearchNeMoDecoder, std::default_delete<sherpa_onnx::OnlineTransducerGreedySearchNeMoDecoder> >]’:
/mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-recognizer-transducer-nemo-impl.h:53:75:   required from here
/usr/include/c++/11/bits/unique_ptr.h:962:30: error: invalid new-expression of abstract class type ‘sherpa_onnx::OnlineTransducerGreedySearchNeMoDecoder’
  962 |     { return unique_ptr<_Tp>(new _Tp(std::forward<_Args>(__args)...)); }
      |                              ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-recognizer-transducer-nemo-impl.h:26,
                 from /mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-recognizer-impl.cc:10:
/mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-transducer-greedy-search-nemo-decoder.h:15:7: note:   because the following virtual functions are pure within ‘sherpa_onnx::OnlineTransducerGreedySearchNeMoDecoder’:
   15 | class OnlineTransducerGreedySearchNeMoDecoder : public OnlineTransducerDecoder {
      |       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-stream.h:17,
                 from /mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-recognizer.h:22,
                 from /mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-recognizer-impl.h:13,
                 from /mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-recognizer-impl.cc:5:
/mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-transducer-decoder.h:82:16: note:     ‘virtual void sherpa_onnx::OnlineTransducerDecoder::Decode(Ort::Value, std::vector<sherpa_onnx::OnlineTransducerDecoderResult>*)’
   82 |   virtual void Decode(Ort::Value encoder_out,
      |                ^~~~~~
cc1plus: note: unrecognized command-line option ‘-Wno-missing-template-keyword’ may have been intended to silence earlier diagnostics
make[2]: *** [sherpa-onnx/csrc/CMakeFiles/sherpa-onnx-core.dir/build.make:832: sherpa-onnx/csrc/CMakeFiles/sherpa-onnx-core.dir/online-recognizer-impl.cc.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:1552: sherpa-onnx/csrc/CMakeFiles/sherpa-onnx-core.dir/all] Error 2
make: *** [Makefile:156: all] Error 2

@@ -82,6 +82,11 @@ class OnlineTransducerDecoder {
virtual void Decode(Ort::Value encoder_out,
std::vector<OnlineTransducerDecoderResult> *result) = 0;

virtual std::vector<Ort::Value> Decode_me(Ort::Value encoder_out,
Collaborator

Your compilation error is caused by this method.

Please remove it.

Also, your greedy search decoder does not need to inherit from this class.

OnlineRecognizerConfig config_;
SymbolTable symbol_table_;
std::unique_ptr<OnlineTransducerNeMoModel> model_;
std::unique_ptr<OnlineTransducerDecoder> decoder_;
Collaborator

You can build an instance of OnlineTransducerGreedySearchNeMoDecoder directly.

OnlineTransducerGreedySearchNeMoDecoder does not need to inherit from OnlineTransducerDecoder.

Suggested change
std::unique_ptr<OnlineTransducerDecoder> decoder_;
std::unique_ptr<OnlineTransducerGreedySearchNeMoDecoder> decoder_;

Contributor Author

Yes, I just fixed it, after reading that the greedy search decoder does not inherit from OnlineTransducerDecoder. Thank you.


// TODO(fangjun): Remember to change these constants if needed
int32_t frame_shift_ms = 10;
int32_t subsampling_factor = 4;
Collaborator

By the way, the subsampling factor of the NeMo transducer model is not 4. I think it is 8. Please recheck it.
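As a sanity check on these constants, token counts convert to durations as follows (10 ms frame shift and subsampling factor 8, the values quoted in this thread; the helper name is illustrative):

```cpp
#include <cassert>

// One emitted token covers subsampling_factor feature frames, and each
// feature frame covers frame_shift_ms milliseconds of audio.
float TrailingSilenceSeconds(int num_trailing_blanks, int subsampling_factor,
                             int frame_shift_ms) {
  return num_trailing_blanks * subsampling_factor * frame_shift_ms / 1000.0f;
}
```

With factor 4 instead of 8, every reported silence would be half its true length, which is why the constant matters for endpointing.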

Contributor Author

fixed.

float frame_shift_in_seconds = 0.01;

// subsampling factor is 4
int32_t trailing_silence_frames = s->GetResult().num_trailing_blanks * 4;
Collaborator

Please replace 4 with the actual subsampling factor.

Contributor Author

'subsampling_factor': 8,
Yes, it is indeed 8. Thank you.

Ort::Value encoder_out = Transpose12(model_->Allocator(), &t[0]);

// defined in online-transducer-greedy-search-nemo-decoder.h
std::vector<Ort::Value> decoder_states = model_->GetDecoderInitStates(1);
Collaborator

Please get the decoder states from the stream.

Remember that we need to get the decoder states from the previous chunk.

Also, you need to save the decoder states of this chunk for the next chunk.

I hope that you indeed understand our Python decoding code for streaming NeMo stateful transducer.
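The get-from-stream / save-back pattern can be sketched with plain types. The `GetNeMoDecoderStates`/`SetNeMoDecoderStates` names mirror the methods discussed earlier in this thread; everything else (the toy stream, the float states, the +1 update) is illustrative:

```cpp
#include <cassert>
#include <utility>
#include <vector>

// A stream owns its decoder states across chunks.
class ToyStream {
 public:
  explicit ToyStream(std::vector<float> init_states)
      : decoder_states_(std::move(init_states)) {}
  std::vector<float> &GetNeMoDecoderStates() { return decoder_states_; }
  void SetNeMoDecoderStates(std::vector<float> states) {
    decoder_states_ = std::move(states);
  }

 private:
  std::vector<float> decoder_states_;
};

// Toy per-chunk decode: reads the states from the PREVIOUS chunk and
// produces updated states; the caller writes them back to the stream.
std::vector<float> DecodeChunk(const std::vector<float> &prev_states) {
  std::vector<float> next = prev_states;
  for (float &v : next) v += 1.0f;  // pretend the decoder advanced
  return next;
}

void ProcessChunks(ToyStream *stream, int num_chunks) {
  for (int i = 0; i != num_chunks; ++i) {
    auto next = DecodeChunk(stream->GetNeMoDecoderStates());
    stream->SetNeMoDecoderStates(std::move(next));  // saved for next chunk
  }
}
```

Calling GetDecoderInitStates on every chunk instead would reset this carry and lose the decoder's history, which matches the symptom discussed below.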

Contributor Author

Yes, I do understand the complete logic. What I might be getting wrong are inaccuracies in the C++ implementation.

But I'll try my best here.

    // **STEP-0**
    // Get the initial states of the decoder.
    std::vector<Ort::Value> &decoder_states = ss[0]->GetNeMoDecoderStates();

    // Subsequent decoder states are updated inside the Decode method.
    // This returns the decoder state from the LAST chunk. We probably don't
    // need it, so we can discard it.
    decoder_states = decoder_->Decode(std::move(encoder_out),
                                      std::move(decoder_states),
                                      &result, ss, n);

now, here is my logic inside the Decode method.

    // **STEP-1**
    // decoder_output_pair.second returns the next decoder state
    std::pair<Ort::Value, std::vector<Ort::Value>> decoder_output_pair =
        model->RunDecoder(std::move(decoder_input_pair.first),
                          std::move(decoder_states));

    // now we process each chunk in the input sequence
    for (int32_t t = 0; t != num_rows; ++t) {
      // rest of the code

      if (y != blank_id) {
        // rest of the code

        // the last decoder state becomes the current state
        decoder_output_pair =
            model->RunDecoder(std::move(decoder_input_pair.first),
                              std::move(decoder_output_pair.second));
      }

      // **STEP-2**
      // Update the decoder states for the next chunk: for every next chunk,
      // the last decoder state becomes the current state.
      decoder_states = std::move(decoder_output_pair.second);
    }

@csukuangfj what do you think?

&result, ss, n);

std::vector<std::vector<Ort::Value>> next_states =
model_->UnStackStates(decoder_states);
Collaborator

For greedy search with batch size 1, I think we don't need to use
stack or unstack states for the decoder model.

We will need them once we implement modified_beam_search decoding.

Please forget about things about stack and unstack states for the decoder model now.

Contributor Author

Okay, making the necessary changes now.
thanks

@csukuangfj
Collaborator

I suggest that you copy & paste our C++ greedy search decoding code for non-streaming stateful NeMo transducer
and then change the code to handle the states of the decoder model.

Almost everything you need is already there.

…pdated, compilation success, decoding not working yet
@sangeet2020
Contributor Author

Hi @csukuangfj ,
I really appreciate all your help throughout.
Could I please request that you fix the greedy decoder implementation? I've been stuck for quite some time now and can't find a way through this.
thank you

@csukuangfj
Collaborator

Sure, will push new commits to your branch this week.

@sangeet2020
Contributor Author

sangeet2020 commented May 29, 2024

Hi @csukuangfj ,
I made some minor changes. As of now, there are no errors, decoding works.
but the predictions are correct only for the first few decoding streams; after that, it starts producing incorrect predictions.

To give you an example..
CORRECT PREDICTION: after early nightfall the yellow lamps...
CURRENT PREDICTION: after the would light here and...

I have the suspicion that something is wrong inside the greedy search decoder implementation.

@@ -46,6 +46,7 @@ static OnlineRecognizerResult Convert(const OnlineTransducerDecoderResult &src,
r.timestamps.reserve(src.tokens.size());

for (auto i : src.tokens) {
if (i == -1) continue;
Collaborator

Could you describe in which case i is -1?


// defined in ./online-recognizer-transducer-impl.h
// static may or may not be here? TODDOs
static OnlineRecognizerResult Convert(const OnlineTransducerDecoderResult &src,
Collaborator

Suggested change
static OnlineRecognizerResult Convert(const OnlineTransducerDecoderResult &src,
OnlineRecognizerResult Convert(const OnlineTransducerDecoderResult &src,

Please remove static.

Comment on lines +40 to +50
OnlineTransducerDecoderResult
OnlineTransducerGreedySearchNeMoDecoder::GetEmptyResult() const {
int32_t context_size = 8;
int32_t blank_id = 0; // always 0
OnlineTransducerDecoderResult r;
r.tokens.resize(context_size, -1);
r.tokens.back() = blank_id;

return r;
}

Collaborator

Suggested change
OnlineTransducerDecoderResult
OnlineTransducerGreedySearchNeMoDecoder::GetEmptyResult() const {
int32_t context_size = 8;
int32_t blank_id = 0; // always 0
OnlineTransducerDecoderResult r;
r.tokens.resize(context_size, -1);
r.tokens.back() = blank_id;
return r;
}

Please remove it and remove any code related to
OnlineTransducerGreedySearchNeMoDecoder::GetEmptyResult.

Comment on lines +169 to +179
// Define and initialize encoder_out_length
Ort::MemoryInfo memory_info = Ort::MemoryInfo::CreateCpu(OrtDeviceAllocator, OrtMemTypeCPU);

int64_t length_value = 1;
std::vector<int64_t> length_shape = {1};

Ort::Value encoder_out_length = Ort::Value::CreateTensor<int64_t>(
memory_info, &length_value, 1, length_shape.data(), length_shape.size()
);

const int64_t *p_length = encoder_out_length.GetTensorData<int64_t>();
Collaborator

Suggested change
// Define and initialize encoder_out_length
Ort::MemoryInfo memory_info = Ort::MemoryInfo::CreateCpu(OrtDeviceAllocator, OrtMemTypeCPU);
int64_t length_value = 1;
std::vector<int64_t> length_shape = {1};
Ort::Value encoder_out_length = Ort::Value::CreateTensor<int64_t>(
memory_info, &length_value, 1, length_shape.data(), length_shape.size()
);
const int64_t *p_length = encoder_out_length.GetTensorData<int64_t>();

I hope you can understand that for the streaming case, the encoder_out_length is dim1.


for (int32_t i = 0; i != batch_size; ++i) {
const float *this_p = p + dim1 * dim2 * i;
int32_t this_len = p_length[i];
Collaborator

Suggested change
int32_t this_len = p_length[i];
int32_t this_len = dim1;

@csukuangfj
Collaborator

You are almost there!

I am merging it and will take care of the rest.

Thank you for your contribution!

@csukuangfj csukuangfj merged commit 3f472a9 into k2-fsa:master May 30, 2024
194 of 207 checks passed