Use these options in order to ameliorate the results on Arabic datasets #279

Closed
Tailor2019 opened this issue Sep 26, 2021 · 9 comments

@Tailor2019

Tailor2019 commented Sep 26, 2021

Hello!
@ChWick
Can these options affect the results when training Calamari on Arabic datasets, and is it advisable to use them in the training command or not?
--data.pre_proc.processors.0.modes
--data.pre_proc.processors.1.modes
--data.pre_proc.processors.1.extra_params
--data.pre_proc.processors.1.line_height
--data.pre_proc.processors.2.modes
--data.pre_proc.processors.2.normalize DATA.PRE_PROC.PROCESSORS.2.NORMALIZE
--data.pre_proc.processors.2.invert DATA.PRE_PROC.PROCESSORS.2.INVERT
--data.pre_proc.processors.2.transpose DATA.PRE_PROC.PROCESSORS.2.TRANSPOSE
--data.pre_proc.processors.2.pad DATA.PRE_PROC.PROCESSORS.2.PAD
--data.pre_proc.processors.2.pad_value DATA.PRE_PROC.PROCESSORS.2.PAD_VALUE
--data.pre_proc.processors.3.modes
--data.pre_proc.processors.3.bidi_direction {LTR,RTL,AUTO,L,R,auto}
--data.pre_proc.processors.4.modes
--data.pre_proc.processors.5.modes
--data.pre_proc.processors.5.unicode_normalization DATA.PRE_PROC.PROCESSORS.5.UNICODE_NORMALIZATION
--data.pre_proc.processors.6.modes
--data.pre_proc.processors.6.replacement_groups
--data.pre_proc.processors.7.modes
Thanks for your reply

@andbue
Member

andbue commented Sep 27, 2021

The only thing that is worth worrying about when training on Arabic data is the bidi_direction. Most of the time, "auto" works just fine as it identifies strings containing only RTL or neutral characters and correctly sets the writing direction. There may be, however, some strings that can't be identified easily as RTL or LTR. In those cases, the algorithm defaults to LTR if you don't set bidi_direction=RTL:

>>> from bidi.algorithm import get_display
>>> list(get_display("سلام"))
['م', 'ا', 'ل', 'س']
>>> list(get_display("(1) 125", base_dir="L"))
['(', '1', ')', ' ', '1', '2', '5']
>>> list(get_display("(1) 125", base_dir="R"))
['1', '2', '5', ' ', '(', '1', ')']
>>> list(get_display("(1) 125"))
['(', '1', ')', ' ', '1', '2', '5']

If you know that all of your text is RTL, the safe option is to set bidi_direction=RTL.
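To see why a string like "(1) 125" falls back to LTR, here is a minimal sketch of first-strong-character detection (rules P2/P3 of the Unicode Bidirectional Algorithm, UAX #9) using only Python's standard `unicodedata` module. It is an assumption that this matches what "auto" does inside python-bidi; the helper names are hypothetical, and the sketch ignores isolates and embeddings.

```python
import unicodedata
from typing import Optional

# Bidi classes that count as "strong" when choosing the paragraph direction
# (UAX #9, rules P2/P3): L is left-to-right, R and AL are right-to-left.
STRONG_LTR = {"L"}
STRONG_RTL = {"R", "AL"}


def first_strong_direction(text: str) -> Optional[str]:
    """Return 'L' or 'R' from the first strong character, or None if there is none."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in STRONG_LTR:
            return "L"
        if bidi_class in STRONG_RTL:
            return "R"
    return None  # only weak/neutral characters (digits, punctuation, spaces)


def resolve_base_direction(text: str, fallback: str = "L") -> str:
    """Hypothetical 'auto' setting: use the detected direction, else the fallback."""
    return first_strong_direction(text) or fallback


for sample in ["سلام", "(1) 125", "abc"]:
    print(repr(sample), first_strong_direction(sample), resolve_base_direction(sample))
```

Digits, brackets and spaces are all weak or neutral, so "(1) 125" has no strong character and ends up with the fallback direction. Forcing the fallback to RTL is the effect of setting bidi_direction=RTL.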

@Tailor2019
Author

@andbue
thanks for your reply
Do these options have an effect on the architecture if we change their default values?
--data.pre_proc.processors.0.modes
--data.pre_proc.processors.1.modes
--data.pre_proc.processors.1.extra_params
--data.pre_proc.processors.1.line_height
--data.pre_proc.processors.2.modes
--data.pre_proc.processors.2.normalize DATA.PRE_PROC.PROCESSORS.2.NORMALIZE
--data.pre_proc.processors.2.invert DATA.PRE_PROC.PROCESSORS.2.INVERT
--data.pre_proc.processors.2.transpose DATA.PRE_PROC.PROCESSORS.2.TRANSPOSE
--data.pre_proc.processors.2.pad DATA.PRE_PROC.PROCESSORS.2.PAD
--data.pre_proc.processors.2.pad_value DATA.PRE_PROC.PROCESSORS.2.PAD_VALUE
--data.pre_proc.processors.3.modes
Thanks in advance!

@andbue
Member

andbue commented Sep 28, 2021

They don't have any effect on the network architecture (if that is what you mean). The only thing coming close to that might be the line_height parameter that changes the height in pixels the line images are scaled to (defaults to 48). The rest only affect the image preprocessing (center_normalizer parameters, image normalization, inverting, transposing, padding).
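The reason line_height is the one parameter "coming close" to the architecture is that it changes the size of the input the network sees. A minimal sketch of the size arithmetic, assuming the line image is scaled with its aspect ratio preserved (`scaled_size` is a hypothetical helper; Calamari's center normalizer does more than a plain resize):

```python
from typing import Tuple


def scaled_size(height: int, width: int, line_height: int = 48) -> Tuple[int, int]:
    """Return (height, width) after scaling a line image to `line_height` pixels.

    Assumes the aspect ratio is preserved, so the width is scaled by the
    same factor as the height.
    """
    if height <= 0 or width <= 0:
        raise ValueError("image dimensions must be positive")
    factor = line_height / height
    return line_height, max(1, round(width * factor))


# A 100 px high, 400 px wide line at the default height and at a larger one.
print(scaled_size(100, 400))
print(scaled_size(100, 400, line_height=64))
```

The layers stay the same, but a larger line_height yields taller and wider inputs, which affects memory use and the sequence length the network processes.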

@Tailor2019
Author

Tailor2019 commented Sep 28, 2021

Thanks a lot for your reply!
@andbue
Do the numbers 0, 1, 2, 3 refer to the preprocessing of the image in layer 0 (conv)...?
For example, if I change --data.pre_proc.processors.2.pad to 32, will it have a significant effect on my system?

@andbue
Member

andbue commented Sep 29, 2021

No, the numbers are only there to put the preprocessing functions in the correct order. If you set "pad" to 32, it will add 32 × line_height empty pixels (instead of the default 16 × line_height, if I'm not mistaken) to each side of the text line, nothing more.
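A minimal sketch of that padding with NumPy, assuming the line image is a 2-D array of shape (line_height, width) and that only the width is padded. `pad_line` is a hypothetical helper for illustration, not Calamari's code.

```python
import numpy as np


def pad_line(image: np.ndarray, pad: int, pad_value: int = 0) -> np.ndarray:
    """Add `pad` columns of `pad_value` to the left and right of a line image.

    `image` is assumed to have shape (line_height, width); the height is untouched.
    """
    return np.pad(image, ((0, 0), (pad, pad)), mode="constant", constant_values=pad_value)


line_height, width = 48, 300
line = np.ones((line_height, width), dtype=np.uint8)

for pad in (16, 32):
    padded = pad_line(line, pad)
    # Each side gains `pad` columns, i.e. pad * line_height pixels.
    print(pad, padded.shape, pad * line_height)
```

Going from 16 to 32 only widens the blank margins around the text; the text pixels themselves are unchanged.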

@Tailor2019
Author

Thanks for your reply!
@andbue
For example, for preprocessing function number 2 there are 3 options:
--data.pre_proc.processors.2.modes
--data.pre_proc.processors.2.normalize DATA.PRE_PROC.PROCESSORS.2.NORMALIZE
--data.pre_proc.processors.2.invert DATA.PRE_PROC.PROCESSORS.2.INVERT
Why don't we use different numbers for these preprocessing functions, instead of using the number "2" for all of them?
What is the role of the option "--data.pre_proc.processors.2.modes"?
Thanks in advance!

@ChWick
Member

ChWick commented Sep 30, 2021

During preprocessing there is a (customizable) list of preprocessors that are applied to the line images. Each of the preprocessors has an ID (the number in the command line arguments). Some have additional parameters (e.g. the NormalizeProcessor that can invert/transpose/pad... images).
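To illustrate why several options share the same number: the number selects one entry of the processor list, and the name after it is a parameter of that entry. The hypothetical helper below (not part of Calamari or paiargparse) splits some of the flag names quoted earlier in this thread into index and parameter.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Flags of the form --data.pre_proc.processors.<index>.<parameter>
FLAG_RE = re.compile(r"^--data\.pre_proc\.processors\.(\d+)\.(\w+)$")


def group_by_processor(flags: List[str]) -> Dict[int, List[str]]:
    """Map each processor index to the parameter names that belong to it."""
    grouped: Dict[int, List[str]] = defaultdict(list)
    for flag in flags:
        match = FLAG_RE.match(flag)
        if match:
            grouped[int(match.group(1))].append(match.group(2))
    return dict(grouped)


flags = [
    "--data.pre_proc.processors.2.modes",
    "--data.pre_proc.processors.2.normalize",
    "--data.pre_proc.processors.2.invert",
    "--data.pre_proc.processors.3.modes",
    "--data.pre_proc.processors.3.bidi_direction",
]
print(group_by_processor(flags))
```

All three "2" options configure one and the same processor; they are its parameters, not separate processors.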
The modes parameter is valid for every processor and states when to apply it (Training, Evaluation, Prediction). By default, a processor is always applied, but some processors, for example DataAugmentation, should only be applied during training.
The defaults are already sane, so you should never change these settings unless you know what you are doing.
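A toy model of how modes gate a processor, assuming each processor carries a set of modes and is skipped when the current mode is not in that set. `Mode`, `Processor` and `run_pipeline` are hypothetical stand-ins loosely modeled on the `PipelineMode` values listed later in this thread; this is not Calamari's implementation.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, FrozenSet, List


class Mode(Enum):
    TRAINING = "training"
    EVALUATION = "evaluation"
    PREDICTION = "prediction"


ALL_MODES: FrozenSet[Mode] = frozenset(Mode)


@dataclass
class Processor:
    name: str
    fn: Callable[[str], str]
    modes: FrozenSet[Mode] = ALL_MODES  # default: applied in every mode


def run_pipeline(processors: List[Processor], sample: str, mode: Mode) -> str:
    """Apply, in list order, only the processors whose modes include `mode`."""
    for proc in processors:
        if mode in proc.modes:
            sample = proc.fn(sample)
    return sample


pipeline = [
    Processor("strip", str.strip),
    Processor("augment", lambda s: s + " [augmented]", modes=frozenset({Mode.TRAINING})),
]

print(run_pipeline(pipeline, "  text  ", Mode.TRAINING))
print(run_pipeline(pipeline, "  text  ", Mode.PREDICTION))
```

Removing a mode from a processor's set simply skips that step in that scenario, which is why it only makes sense when the data has already been processed in the same way.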

@Tailor2019
Author

Thanks a lot for this explanation!
@ChWick
From the documentation we can guess that there are 8 preprocessors. Is the role of the modes parameter to activate the appropriate preprocessor?
What is its effect?
For example, for "--data.pre_proc.processors.5.modes" and "--data.pre_proc.processors.4.modes",
what is the contribution of the modes parameter in these 2 preprocessors?
Thanks in advance!

@andbue
Member

andbue commented Sep 30, 2021

The modes parameter, as @ChWick already stated, activates the processor for a specific scenario, e.g. for training or for prediction. You would only need to change it if your input data were already preprocessed (e.g. your images are already normalized, inverted and transposed or your text is already transformed into display order).

It is a bit harder to see which of the default preprocessors corresponds to which function. To find out, try something like

>>> from calamari_ocr.ocr.dataset.data import Data
>>> params = Data.default_params()
>>> list(enumerate(params.pre_proc.processors))
[
(0, CenterNormalizerProcessorParams(modes={<PipelineMode.TRAINING: 'training'>, <PipelineMode.PREDICTION: 'prediction'>, <PipelineMode.EVALUATION: 'evaluation'>}, extra_params=(4, 1.0, 0.3), line_height=-1)), 
(1, FinalPreparationProcessorParams(modes={<PipelineMode.TRAINING: 'training'>, <PipelineMode.PREDICTION: 'prediction'>, <PipelineMode.EVALUATION: 'evaluation'>}, normalize=True, invert=True, transpose=True, pad=16, pad_value=0)), 
(2, BidiTextProcessorParams(modes={<PipelineMode.TRAINING: 'training'>, <PipelineMode.TARGETS: 'targets'>, <PipelineMode.PREDICTION: 'prediction'>, <PipelineMode.EVALUATION: 'evaluation'>}, bidi_direction=<BidiDirection.AUTO: 'auto'>)), 
(3, StripTextProcessorParams(modes={<PipelineMode.TRAINING: 'training'>, <PipelineMode.TARGETS: 'targets'>, <PipelineMode.PREDICTION: 'prediction'>, <PipelineMode.EVALUATION: 'evaluation'>})), 
(4, TextNormalizerProcessorParams(modes={<PipelineMode.TRAINING: 'training'>, <PipelineMode.TARGETS: 'targets'>, <PipelineMode.PREDICTION: 'prediction'>, <PipelineMode.EVALUATION: 'evaluation'>}, unicode_normalization='NFC')), 
(5, TextRegularizerProcessorParams(modes={<PipelineMode.TRAINING: 'training'>, <PipelineMode.TARGETS: 'targets'>, <PipelineMode.PREDICTION: 'prediction'>, <PipelineMode.EVALUATION: 'evaluation'>}, replacement_groups=[<ReplacementGroup.Spaces: 'spaces'>], replacements=None)), 
(6, AugmentationProcessorParams(modes={<PipelineMode.TRAINING: 'training'>}, augmenter=DefaultDataAugmenterParams(), n_augmentations=0)), 
(7, PrepareSampleProcessorParams(modes={<PipelineMode.TRAINING: 'training'>, <PipelineMode.PREDICTION: 'prediction'>, <PipelineMode.EVALUATION: 'evaluation'>}))
]

The code for each of these preprocessors can be found in /imageprocessors (0, 1, 6, 7) or /textprocessors (2-5) at https://github.com/Calamari-OCR/calamari/tree/master/calamari_ocr/ocr/dataset.
@ChWick : maybe it would be helpful if paiargparse could somehow include the name of the preprocessor classes in the help strings?


4 participants