predict: ctc_decoder parameters are never applied #365

Open
bertsky opened this issue Oct 2, 2024 · 0 comments

Collaborator

bertsky commented Oct 2, 2024

prepare_ctc_decoder_params(args.ctc_decoder)

This call merely post-processes some of the command-line choices. It never instantiates a CTCDecoderProcessor, nor does it replace the default one in the postprocessor pipeline, so the ctc_decoder parameters have no effect on prediction.

How is this supposed to have worked in the first place?
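
For illustration, here is a minimal sketch of the step that seems to be missing: taking the prepared parameters and swapping them into the decoder step of the post-processing pipeline. Everything except the name CTCDecoderProcessor is hypothetical (`DecoderParams`, `Processor`, `apply_decoder_params`, the field names, the "OtherProcessor" step); these are stand-ins and not Calamari's actual classes or API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DecoderParams:
    # Hypothetical stand-in for the CTC decoder settings parsed from the CLI.
    type: str = "default"
    dictionary: List[str] = field(default_factory=list)


@dataclass
class Processor:
    # Hypothetical stand-in for one step of the post-processing pipeline.
    name: str
    params: DecoderParams = field(default_factory=DecoderParams)


def apply_decoder_params(pipeline: List[Processor], params: DecoderParams) -> List[Processor]:
    """Return a copy of the pipeline whose CTC decoder step uses `params`."""
    replaced = False
    result = []
    for proc in pipeline:
        if proc.name == "CTCDecoderProcessor":
            # Replace the default decoder step with a configured one.
            result.append(Processor(proc.name, params))
            replaced = True
        else:
            result.append(proc)
    if not replaced:
        # Fail loudly instead of silently ignoring the user's settings.
        raise ValueError("pipeline has no CTCDecoderProcessor step to configure")
    return result


pipeline = [Processor("CTCDecoderProcessor"), Processor("OtherProcessor")]
params = DecoderParams(type="word_beam_search", dictionary=["Haus", "Maus"])
configured = apply_decoder_params(pipeline, params)
```

Raising an error when no decoder step is found is a design choice of this sketch: it would make a silently ignored option visible to the user.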

Also, the parameterization of this postprocessor raises more questions. Assuming some tests have been done with the dictionary feature (word beam search):

  • Why is non_word_chars not automatically configured to all the punctuation characters in the entire charset during training? (All the public models I see merely contain the default characters, i.e. only ASCII punctuation.)
  • Why is word_separator only whitespace by default? Shouldn't it also allow other cases, such as the hyphen (esp. in German)?
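
To make both suggestions concrete, here is a rough sketch of how the defaults could be derived. It assumes the model's charset is available as a list of strings. The helper names `punctuation_from_charset` and `split_words` are hypothetical and not part of Calamari. Treating Unicode category P* as "punctuation" is also an assumption of this sketch.

```python
import re
import unicodedata
from typing import Iterable, List


def punctuation_from_charset(charset: Iterable[str]) -> List[str]:
    """Collect all single-character charset entries in a Unicode punctuation category (P*)."""
    found = set()
    for ch in charset:
        if len(ch) != 1:
            continue  # skip blank labels or multi-character entries
        if unicodedata.category(ch).startswith("P"):
            found.add(ch)
    return sorted(found)


def split_words(text: str, separators: str = " ") -> List[str]:
    """Split text on any run of the given separator characters."""
    pattern = "[" + re.escape(separators) + "]+"
    return [w for w in re.split(pattern, text) if w]


# Hypothetical charset containing non-ASCII punctuation.
charset = ["a", "b", "ß", "1", " ", ".", ",", "-", "„", "“", "⸗"]
non_word_chars = punctuation_from_charset(charset)

whitespace_only = split_words("Haus-Maus Spiel")         # hyphen stays inside the word
with_hyphen = split_words("Haus-Maus Spiel", " -")       # hyphen also separates words
```

With whitespace as the only separator, "Haus-Maus" stays a single token and would have to appear verbatim in the dictionary. With the hyphen added, both parts can be looked up separately.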