
Detection model performs poorly when image is scaled (e.g. 1.5x in both dims) #1535

Closed
ajkdrag opened this issue Mar 28, 2024 · 9 comments
Labels
awaiting response Waiting for feedback type: bug Something isn't working

Comments

@ajkdrag

ajkdrag commented Mar 28, 2024

Bug description

Detection model performs poorly when image is scaled (e.g. 1.5x in both dims)

Code snippet to reproduce the bug

import cv2
img = cv2.resize(img, None, fx=1.5, fy=1.5, interpolation=cv2.INTER_CUBIC)

If I do something like this to my dataset, the detection model performs poorly.
I am using:

"preserve_aspect_ratio": True,
"symmetric_pad": True

Error traceback

No error, but poor bboxes.

Environment

DocTR version: 0.8.1
TensorFlow version: N/A
PyTorch version: N/A (torchvision N/A)
OpenCV version: N/A
OS: Debian GNU/Linux 11 (bullseye)
Python version: 3.8.18
Is CUDA available (TensorFlow): N/A
Is CUDA available (PyTorch): N/A
CUDA runtime version: 11.8.89
GPU models and configuration: GPU 0: Tesla T4
Nvidia driver version: 525.105.17
cuDNN version: Could not collect

Deep Learning backend

is_tf_available: False
is_torch_available: True

@ajkdrag ajkdrag added the type: bug Something isn't working label Mar 28, 2024
@felixdittrich92
Contributor

Hey @ajkdrag 👋,

Thanks for the feedback. :)
In general, upscaling is rarely a good idea, because interpolation can't add real detail to the image.
If you really need to do it, you could try super resolution (for example: https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/latent_upscale).

But yeah, in the last training runs we already extended the applied augmentations, though there is still room for additions like zoom (in/out), quality compression, etc.

Just out of interest: have you also tried the newly trained FAST models from the main branch? :)

@ajkdrag
Author

ajkdrag commented Mar 29, 2024

I have yet to try the FAST models. I saw the links were updated; will give it a shot today.
The problem I am facing (when testing with DB models) is that for one set of documents, whose dims are say 1176 x 762, if I upscale and then run, the detection output is still fine and the recognition output improves significantly, while for another set of images the detection output degrades and fewer boxes are captured.

@ajkdrag
Author

ajkdrag commented Mar 29, 2024

Also, in DBNet the preprocessing produces an image of size 1024x1024. Is it possible that for large rectangular docs like bank checks, resizing to a square messes things up?

@felixdittrich92
Contributor

felixdittrich92 commented Mar 29, 2024

@ajkdrag with preserve_aspect_ratio=True (default) and symmetric_pad=True (default) it shouldn't.
But feel free to play a bit with both parameters.
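To make the geometry concrete: with aspect ratio preserved and symmetric padding, a 1176 x 762 page targeted at the 1024 x 1024 detection input is scaled until the long side fits and the short side is padded equally on both sides, rather than being stretched. A pure-arithmetic sketch of that scheme (my own illustration of the usual resize-then-pad logic, not doctr's exact code):

```python
def resize_with_symmetric_pad(h, w, target=1024):
    """Scale (h, w) to fit inside a target x target square, then pad
    symmetrically. Returns the scaled size and the (top, bottom, left,
    right) padding. Illustrative only: assumes the usual
    resize-then-pad scheme."""
    scale = target / max(h, w)
    new_h, new_w = round(h * scale), round(w * scale)
    pad_h, pad_w = target - new_h, target - new_w
    top, left = pad_h // 2, pad_w // 2
    return (new_h, new_w), (top, pad_h - top, left, pad_w - left)

# A 762-high, 1176-wide check page: the long side becomes 1024,
# the short side becomes 664, padded 180 px above and below.
print(resize_with_symmetric_pad(762, 1176))
```

So the text is not distorted, but a wide document spends a large share of the 1024 x 1024 canvas on padding, which is one reason long pages can behave differently from square-ish ones.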
You can also try lowering the bin_thresh for DB models:

https://mindee.github.io/doctr/using_doctr/using_models.html#advanced-options
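Per those docs, the threshold is adjusted on an already-built predictor; roughly like this (the attribute path is taken from the linked "advanced options" page, so verify it against your doctr version):

```python
def set_bin_thresh(predictor, bin_thresh=0.3):
    """Lower the DB binarization threshold on an existing doctr
    ocr_predictor. Attribute path per the linked 'advanced options'
    docs; verify it matches your installed doctr version."""
    predictor.det_predictor.model.postprocessor.bin_thresh = bin_thresh
    return predictor
```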

@ajkdrag
Author

ajkdrag commented Mar 29, 2024

I tried the FAST model and it works pretty well, but I expected the FAST model to actually be "fast" :D It takes about a second per image, whereas Papers with Code reports it as near real-time. I am using the main branch with reparameterization.
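For reference, my loading code looks roughly like this (a sketch: `reparameterize` lives in `doctr.models.utils` on the main branch at the time of writing, but treat the exact import path and attribute layout as assumptions to check against your checkout):

```python
def build_fast_predictor():
    """Load a FAST-based predictor and fuse its rep-conv branches for
    inference. Import paths are assumptions based on the doctr main
    branch; verify against your installed version."""
    from doctr.models import ocr_predictor
    from doctr.models.utils import reparameterize

    predictor = ocr_predictor(det_arch="fast_base", pretrained=True)
    # Fold the training-time multi-branch convs into single convs
    predictor.det_predictor.model = reparameterize(predictor.det_predictor.model)
    return predictor
```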

@felixdittrich92
Contributor

felixdittrich92 commented Mar 29, 2024

Hey yeah 😃
That's not really comparable, because we work on larger images (1024x1024) and the model needs to detect/segment many more text instances. Additionally, we had to modify the postprocessing so that it also works well with text-rich documents (maybe we could track how much time goes into the postprocessing, but I don't think it will be too much).

All the papers (DB / FAST) were built for scene text detection in the wild and tested on datasets like IC15, etc.

@ajkdrag
Author

ajkdrag commented Mar 29, 2024

Got it. FAST works well, but I think I understand the issue now. For images that are "long", i.e. with an aspect ratio of say 1176 x 256, the bin_thresh is really tricky to work with. In my use case (scanned bank checks), I get images with this aspect ratio, and for a few batches bin_thresh = 0.2 works well, while for others I have to go down to 0.08. Could you suggest some tricks/workarounds for such use cases?
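One workaround I'm considering (my own sketch, not something doctr provides): sweep a few bin_thresh values per batch and keep the one that yields the most boxes. Here `detect_fn` is a hypothetical callable that runs detection at a given threshold and returns the boxes:

```python
def pick_bin_thresh(detect_fn, candidates=(0.3, 0.2, 0.1, 0.08)):
    """Try several binarization thresholds and keep the one yielding
    the most detected boxes. `detect_fn(thresh)` is a hypothetical
    callable that runs detection at that threshold and returns a list
    of boxes; returns (None, []) if every threshold finds nothing."""
    best_thresh, best_boxes = None, []
    for t in candidates:
        boxes = detect_fn(t)
        if len(boxes) > len(best_boxes):
            best_thresh, best_boxes = t, boxes
    return best_thresh, best_boxes
```

In practice you'd want a smarter score than raw box count (very low thresholds can over-segment), but it shows the shape of a per-batch sweep.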

@felixdittrich92
Contributor

Hey, sorry, I totally missed your message.
Have you found a way to handle it?

@felixdittrich92
Contributor

Moved to #1604
