
Detection model performs poorly when image is scaled (e.g. 1.5x in both dims) #1535

Closed
ajkdrag opened this issue Mar 28, 2024 · 9 comments
Labels
awaiting response Waiting for feedback type: bug Something isn't working

Comments

@ajkdrag

ajkdrag commented Mar 28, 2024

Bug description

Detection model performs poorly when image is scaled (e.g. 1.5x in both dims)

Code snippet to reproduce the bug

import cv2
img = cv2.resize(img, None, fx=1.5, fy=1.5, interpolation=cv2.INTER_CUBIC)

If I do something like this to my dataset, the detection model performs poorly.
I am using:

"preserve_aspect_ratio": True,
"symmetric_pad": True

Error traceback

No error, but poor bboxes.

Environment

DocTR version: 0.8.1
TensorFlow version: N/A
PyTorch version: N/A (torchvision N/A)
OpenCV version: N/A
OS: Debian GNU/Linux 11 (bullseye)
Python version: 3.8.18
Is CUDA available (TensorFlow): N/A
Is CUDA available (PyTorch): N/A
CUDA runtime version: 11.8.89
GPU models and configuration: GPU 0: Tesla T4
Nvidia driver version: 525.105.17
cuDNN version: Could not collect

Deep Learning backend

is_tf_available: False
is_torch_available: True

@ajkdrag ajkdrag added the type: bug Something isn't working label Mar 28, 2024
@felixdittrich92
Contributor

Hey @ajkdrag 👋,

Thanks for the feedback. :)
In general, upscaling is rarely a good idea, because interpolation can't add real detail to the image.
If you really need to do it, you could try super resolution (for example: https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/latent_upscale).

But yeah, in the last training runs we already extended the applied augmentations, though there is still room for additions like zoom (in/out), quality compression, etc.

Just out of interest: have you also tried the newly trained FAST models from the main branch? :)

@ajkdrag
Author

ajkdrag commented Mar 29, 2024

I have yet to try the FAST models. I saw the links were updated; will give it a shot today.
The problem I am facing (when testing with DB models) is that for one set of documents, whose dims are say 1176 x 762, if I upscale and then run, the detection output is still fine and the recognition output improves significantly, while for another set of images the detection output degrades and fewer boxes are captured.

@ajkdrag
Author

ajkdrag commented Mar 29, 2024

Also, in DBNet the preprocessing produces an image of size 1024x1024. Is it possible that for large rectangular docs like bank checks, resizing to a square messes things up?

@felixdittrich92
Contributor

felixdittrich92 commented Mar 29, 2024

@ajkdrag with preserve_aspect_ratio=True (default) and symmetric_pad=True (default) it shouldn't.
But feel free to play a bit with both parameters.
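To make the geometry concrete: with aspect ratio preserved and symmetric padding, a 1176 x 762 page targeted at the 1024 x 1024 detection input is scaled until the long side fits and the short side is padded equally on both sides, rather than being stretched. A pure-arithmetic sketch of that scheme (my own illustration of the usual resize-then-pad logic, not doctr's exact code):

```python
def resize_with_symmetric_pad(h, w, target=1024):
    """Scale (h, w) to fit inside a target x target square, then pad
    symmetrically. Returns the scaled size and the (top, bottom, left,
    right) padding. Illustrative only: assumes the usual
    resize-then-pad scheme."""
    scale = target / max(h, w)
    new_h, new_w = round(h * scale), round(w * scale)
    pad_h, pad_w = target - new_h, target - new_w
    top, left = pad_h // 2, pad_w // 2
    return (new_h, new_w), (top, pad_h - top, left, pad_w - left)

# A 762-high, 1176-wide check page: the long side becomes 1024,
# the short side becomes 664, padded 180 px above and below.
print(resize_with_symmetric_pad(762, 1176))
```

So the text is not distorted, but a wide document spends a large share of the 1024 x 1024 canvas on padding, which is one reason long pages can behave differently from square-ish ones.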
You can also try lowering the bin_thresh for DB models:

https://mindee.github.io/doctr/using_doctr/using_models.html#advanced-options
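Per those docs, the threshold is adjusted on an already-built predictor; roughly like this (the attribute path is taken from the linked "advanced options" page, so verify it against your doctr version):

```python
def set_bin_thresh(predictor, bin_thresh=0.3):
    """Lower the DB binarization threshold on an existing doctr
    ocr_predictor. Attribute path per the linked 'advanced options'
    docs; verify it matches your installed doctr version."""
    predictor.det_predictor.model.postprocessor.bin_thresh = bin_thresh
    return predictor
```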

@ajkdrag
Author

ajkdrag commented Mar 29, 2024

I tried the FAST model and it works pretty well, but I expected the FAST model to actually be "fast" :D It takes about a second per image, whereas Papers with Code reports it as near real-time. I am using the main branch with reparameterization.
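For reference, my loading code looks roughly like this (a sketch: `reparameterize` lives in `doctr.models.utils` on the main branch at the time of writing, but treat the exact import path and attribute layout as assumptions to check against your checkout):

```python
def build_fast_predictor():
    """Load a FAST-based predictor and fuse its rep-conv branches for
    inference. Import paths are assumptions based on the doctr main
    branch; verify against your installed version."""
    from doctr.models import ocr_predictor
    from doctr.models.utils import reparameterize

    predictor = ocr_predictor(det_arch="fast_base", pretrained=True)
    # Fold the training-time multi-branch convs into single convs
    predictor.det_predictor.model = reparameterize(predictor.det_predictor.model)
    return predictor
```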

@felixdittrich92
Contributor

felixdittrich92 commented Mar 29, 2024

Hey yeah 😃
That's not really comparable, because we work on larger images (1024x1024) and the model needs to detect/segment many more text instances. Additionally, we had to modify the postprocessing so that it also works well with text-rich documents (maybe we could track how much time goes into the postprocessing, but I don't think it will be too much).

All the papers (DB / FAST) were built for scene text detection in the wild and tested on datasets like IC15, etc.

@ajkdrag
Author

ajkdrag commented Mar 29, 2024

Got it. FAST works well, but I think I understand the issue now. For images that are "long", i.e. with an aspect ratio of say 1176 x 256, the bin_thresh is really tricky to work with. In my use case (scanned bank checks), I get images with this aspect ratio, and for a few batches bin_thresh = 0.2 works well, while for others I have to go down to 0.08. Could you suggest some tricks/workarounds for such use cases?
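One workaround I'm considering (my own sketch, not something doctr provides): sweep a few bin_thresh values per batch and keep the one that yields the most boxes. Here `detect_fn` is a hypothetical callable that runs detection at a given threshold and returns the boxes:

```python
def pick_bin_thresh(detect_fn, candidates=(0.3, 0.2, 0.1, 0.08)):
    """Try several binarization thresholds and keep the one yielding
    the most detected boxes. `detect_fn(thresh)` is a hypothetical
    callable that runs detection at that threshold and returns a list
    of boxes; returns (None, []) if every threshold finds nothing."""
    best_thresh, best_boxes = None, []
    for t in candidates:
        boxes = detect_fn(t)
        if len(boxes) > len(best_boxes):
            best_thresh, best_boxes = t, boxes
    return best_thresh, best_boxes
```

In practice you'd want a smarter score than raw box count (very low thresholds can over-segment), but it shows the shape of a per-batch sweep.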

@felixdittrich92
Contributor

Hey, sorry, I totally missed your message.
Have you found a way to handle it?

@felixdittrich92
Contributor

Moved to #1604
