Use these options in order to ameliorate the results on Arabic datasets #279
Comments
The only thing that is worth worrying about when training on Arabic data is the bidi_direction. Most of the time, "auto" works just fine, as it identifies strings containing only RTL or neutral characters and correctly sets the writing direction. There may be, however, some strings that can't be identified easily as RTL or LTR. In those cases, the algorithm defaults to LTR if you don't set bidi_direction=RTL:
>>> from bidi.algorithm import get_display
>>> list(get_display("سلام"))
['م', 'ا', 'ل', 'س']
>>> list(get_display("(1) 125", base_dir="L"))
['(', '1', ')', ' ', '1', '2', '5']
>>> list(get_display("(1) 125", base_dir="R"))
['1', '2', '5', ' ', '(', '1', ')']
>>> list(get_display("(1) 125"))
['(', '1', ')', ' ', '1', '2', '5']
If you know that all of your text is RTL, the safe option is to set bidi_direction=RTL.
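To see why a string such as "(1) 125" is ambiguous, one can inspect the bidirectional class of each character. The sketch below uses only the standard library's unicodedata module. The helper name guess_direction is hypothetical, and it follows the "first strong character" heuristic of the Unicode Bidirectional Algorithm; it is not meant to reproduce the internals of python-bidi or Calamari.

```python
import unicodedata

# Bidi classes that count as "strong" in the Unicode Bidirectional Algorithm.
STRONG_RTL = {"R", "AL"}
STRONG_LTR = {"L"}


def guess_direction(text, default="L"):
    """Return "R" or "L" based on the first strong character in `text`.

    Hypothetical helper for illustration only; falls back to `default`
    when the string holds nothing but digits, punctuation and spaces.
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in STRONG_RTL:
            return "R"
        if bidi_class in STRONG_LTR:
            return "L"
    return default


# Print the per-character bidi classes next to the guessed direction.
for sample in ["سلام", "(1) 125", "abc"]:
    classes = [unicodedata.bidirectional(ch) for ch in sample]
    print(repr(sample), classes, guess_direction(sample))
```

A string made up only of digits, brackets and spaces has no strong character, so the result depends entirely on the default. That is the situation in which an explicit bidi_direction=RTL makes a difference.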
@andbue
They don't have any effect on the network architecture (if that is what you mean). The only one that comes close is the line_height parameter, which sets the height in pixels that the line images are scaled to (defaults to 48). The rest only affect the image preprocessing (center_normalizer parameters, image normalization, inverting, transposing, padding).
Thanks a lot for your reply!
No, the numbers are only there to put the preprocessing functions in the correct order. If you set "pad" to 32, it will add 32 x line_height empty pixels to each side of the text line (instead of the default 16 x line_height, if I'm not mistaken), nothing more.
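As a rough illustration of what "pad" does, the sketch below pads a toy line image on both sides. It assumes the image is stored as a plain list of pixel rows and that padding is added along the reading direction only. pad_line is a hypothetical helper, not Calamari's own code.

```python
def pad_line(image, pad=16, pad_value=0):
    """Add `pad` columns of `pad_value` to the left and right of every row."""
    side = [pad_value] * pad
    return [side + list(row) + side for row in image]


# A toy "line image": 3 pixels high, 4 pixels wide.
line = [
    [1, 1, 0, 1],
    [0, 1, 1, 0],
    [1, 0, 0, 1],
]

padded = pad_line(line, pad=32)
print(len(line[0]), "->", len(padded[0]))  # width grows by 2 * pad
```

The height of the image is unchanged; only empty columns are added before and after the text.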
Thanks for your reply!
During preprocessing there is a (customizable) list of preprocessors that are applied to the line images. Each of the preprocessors has an ID (the number in the command line arguments). Some have additional parameters (e.g. the
Thanks a lot for this explanation!
The modes parameter, as @ChWick already stated, activates the processor for a specific scenario, e.g. for training or for prediction. You would only need to change it if your input data were already preprocessed (e.g. your images are already normalized, inverted and transposed, or your text is already transformed into display order). It is a bit harder to see which of the default preprocessors corresponds to which function. To find out, try something like
>>> from calamari_ocr.ocr.dataset.data import Data
>>> params = Data.default_params()
>>> list(enumerate(params.pre_proc.processors))
[
(0, CenterNormalizerProcessorParams(modes={<PipelineMode.TRAINING: 'training'>, <PipelineMode.PREDICTION: 'prediction'>, <PipelineMode.EVALUATION: 'evaluation'>}, extra_params=(4, 1.0, 0.3), line_height=-1)),
(1, FinalPreparationProcessorParams(modes={<PipelineMode.TRAINING: 'training'>, <PipelineMode.PREDICTION: 'prediction'>, <PipelineMode.EVALUATION: 'evaluation'>}, normalize=True, invert=True, transpose=True, pad=16, pad_value=0)),
(2, BidiTextProcessorParams(modes={<PipelineMode.TRAINING: 'training'>, <PipelineMode.TARGETS: 'targets'>, <PipelineMode.PREDICTION: 'prediction'>, <PipelineMode.EVALUATION: 'evaluation'>}, bidi_direction=<BidiDirection.AUTO: 'auto'>)),
(3, StripTextProcessorParams(modes={<PipelineMode.TRAINING: 'training'>, <PipelineMode.TARGETS: 'targets'>, <PipelineMode.PREDICTION: 'prediction'>, <PipelineMode.EVALUATION: 'evaluation'>})),
(4, TextNormalizerProcessorParams(modes={<PipelineMode.TRAINING: 'training'>, <PipelineMode.TARGETS: 'targets'>, <PipelineMode.PREDICTION: 'prediction'>, <PipelineMode.EVALUATION: 'evaluation'>}, unicode_normalization='NFC')),
(5, TextRegularizerProcessorParams(modes={<PipelineMode.TRAINING: 'training'>, <PipelineMode.TARGETS: 'targets'>, <PipelineMode.PREDICTION: 'prediction'>, <PipelineMode.EVALUATION: 'evaluation'>}, replacement_groups=[<ReplacementGroup.Spaces: 'spaces'>], replacements=None)),
(6, AugmentationProcessorParams(modes={<PipelineMode.TRAINING: 'training'>}, augmenter=DefaultDataAugmenterParams(), n_augmentations=0)),
(7, PrepareSampleProcessorParams(modes={<PipelineMode.TRAINING: 'training'>, <PipelineMode.PREDICTION: 'prediction'>, <PipelineMode.EVALUATION: 'evaluation'>}))
]
The code for each of these preprocessors can be found in /imageprocessors (0, 1, 6, 7) or /textprocessors (2-5) at https://github.com/Calamari-OCR/calamari/tree/master/calamari_ocr/ocr/dataset.
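The index of a processor is not necessarily the same everywhere: the bidi processor is number 2 in the listing above but number 3 in the command-line options quoted in the question. It may therefore be safer to look the index up by class name before building a flag. The sketch below is only an illustration under assumptions: processor_index and bidi_flag are hypothetical helpers, and the stand-in classes merely mimic the names printed above so that the example runs without Calamari installed.

```python
def processor_index(processors, class_name):
    """Return the position of the first processor whose class is `class_name`."""
    for index, processor in enumerate(processors):
        if type(processor).__name__ == class_name:
            return index
    raise ValueError(f"no processor named {class_name!r}")


def bidi_flag(processors, direction="RTL"):
    """Build the command-line option that sets the bidi direction."""
    index = processor_index(processors, "BidiTextProcessorParams")
    return f"--data.pre_proc.processors.{index}.bidi_direction {direction}"


# Stand-ins for illustration only; with Calamari installed you would pass
# Data.default_params().pre_proc.processors instead of this list.
class CenterNormalizerProcessorParams: ...
class FinalPreparationProcessorParams: ...
class BidiTextProcessorParams: ...


stand_ins = [
    CenterNormalizerProcessorParams(),
    FinalPreparationProcessorParams(),
    BidiTextProcessorParams(),
]
print(bidi_flag(stand_ins))
```

Looking the processor up by name keeps the flag correct even if the default order of the pipeline differs in your installed version.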
Hello!
@ChWick
Can the following options affect the results when training Calamari on an Arabic dataset, and is it advisable to pass them in the training command?
--data.pre_proc.processors.0.modes
--data.pre_proc.processors.1.modes
--data.pre_proc.processors.1.extra_params
--data.pre_proc.processors.1.line_height
--data.pre_proc.processors.2.modes
--data.pre_proc.processors.2.normalize DATA.PRE_PROC.PROCESSORS.2.NORMALIZE
--data.pre_proc.processors.2.invert DATA.PRE_PROC.PROCESSORS.2.INVERT
--data.pre_proc.processors.2.transpose DATA.PRE_PROC.PROCESSORS.2.TRANSPOSE
--data.pre_proc.processors.2.pad DATA.PRE_PROC.PROCESSORS.2.PAD
--data.pre_proc.processors.2.pad_value DATA.PRE_PROC.PROCESSORS.2.PAD_VALUE
--data.pre_proc.processors.3.modes
--data.pre_proc.processors.3.bidi_direction {LTR,RTL,AUTO,L,R,auto}
--data.pre_proc.processors.4.modes
--data.pre_proc.processors.5.modes
--data.pre_proc.processors.5.unicode_normalization DATA.PRE_PROC.PROCESSORS.5.UNICODE_NORMALIZATION
--data.pre_proc.processors.6.modes
--data.pre_proc.processors.6.replacement_groups
--data.pre_proc.processors.7.modes
Thanks for your reply