network topology at CNN-RNN interface #353

Open
bertsky opened this issue Jan 21, 2024 · 0 comments
Labels
enhancement New feature or request training Concerns how to achieve good model quality

Comments

@bertsky
Collaborator

bertsky commented Jan 21, 2024

Calamari's network specs neither contain nor require a reshaping/projection operation before the first LSTM layer; it seems to be added automatically.

However, other traditional CNN-RNN implementations offer an alternative element: an LSTM which takes the height axis as its sequence axis and summarises it into a single output vector per width position:

  • Tesseract traditionally uses an Lfys<h> layer, e.g. in the default VGSL spec 1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx192
  • Kraken offers that as well (but usually just reshapes via an explicit S1(1x<h>)1,3 element)
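
To make the two alternatives concrete, here is a rough pure-Python sketch (not Calamari, Tesseract or Kraken code). It assumes a feature map of height 12 with 16 channels, which is what the Tesseract spec above would give after Ct3,3,16 and Mp3,3 on 36px input if the convolution preserves height. `TinyLSTM`, `reshape_columns` and `summarise_columns` are hypothetical names, and the weights are random, so only the shapes are meaningful:

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def reshape_columns(fmap):
    """Plain reshape: (H, W, C) -> W vectors of size H*C (height folded into features)."""
    height, width = len(fmap), len(fmap[0])
    return [[v for y in range(height) for v in fmap[y][x]] for x in range(width)]


class TinyLSTM:
    """Minimal LSTM cell with random weights, for illustration only."""

    def __init__(self, n_in, n_out, seed=0):
        rng = random.Random(seed)
        self.n_out = n_out
        # 4 gates (input, forget, output, candidate), each with n_out rows
        # over the concatenation [x, h, bias].
        self.w = [[rng.uniform(-0.1, 0.1) for _ in range(n_in + n_out + 1)]
                  for _ in range(4 * n_out)]

    def step(self, x, h, c):
        z = x + h + [1.0]
        pre = [sum(wi * zi for wi, zi in zip(row, z)) for row in self.w]
        n = self.n_out
        i = [sigmoid(p) for p in pre[0:n]]
        f = [sigmoid(p) for p in pre[n:2 * n]]
        o = [sigmoid(p) for p in pre[2 * n:3 * n]]
        g = [math.tanh(p) for p in pre[3 * n:4 * n]]
        c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        return h, c

    def summarise(self, seq):
        """Run over the whole sequence and return only the final hidden state."""
        h = [0.0] * self.n_out
        c = [0.0] * self.n_out
        for x in seq:
            h, c = self.step(x, h, c)
        return h


def summarise_columns(fmap, lstm):
    """Lfys-style: LSTM along the height axis per width position, (H, W, C) -> W vectors of size n_out."""
    height, width = len(fmap), len(fmap[0])
    return [lstm.summarise([fmap[y][x] for y in range(height)]) for x in range(width)]


# Toy feature map: H=12, W=5, C=16 (assumed CNN output, random values)
H, W, C = 12, 5, 16
rng = random.Random(1)
fmap = [[[rng.random() for _ in range(C)] for _ in range(W)] for _ in range(H)]

flat = reshape_columns(fmap)
summ = summarise_columns(fmap, TinyLSTM(C, 48))
print(len(flat), len(flat[0]))  # 5 192
print(len(summ), len(summ[0]))  # 5 48
```

So the reshape hands the first horizontal LSTM a fixed 192-dim vector per column in which every feature is tied to a vertical position, whereas the summarising LSTM hands it 48 learned features that need not be tied to absolute vertical position.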

Is it perhaps expected that the combination of reshape and CenterNormalizer will do a better job? I wonder whether this has ever been thoroughly investigated. Also, CenterNormalizer might degrade instead of improve horizontal statistics, esp. for handwriting (where some have even argued a need for deslanting), or with grayscale input.

@bertsky bertsky added enhancement New feature or request training Concerns how to achieve good model quality labels Oct 2, 2024