Adding TableNet model to extract tabular data #524
Comments
Hi @felixdittrich92, thanks for bringing this to the table; it is a very interesting and useful feature. To answer this question we can look at the speed of your model: can you benchmark it on your side? If it is fast enough, we can start by implementing it separately in a new module that runs independently from the main pipeline. We can first implement the model in PyTorch as you suggested, provide a pretrained version (.pt) in the config, and tackle the dataset/training script integration later on! Have a nice day! 😄 |
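For the benchmark, something along these lines might be enough as a starting point. It is only a minimal sketch using the Python standard library: `benchmark` and `fake_inference` are hypothetical names, and `fake_inference` is a stand-in that would need to be replaced by a single forward pass of the actual TableNet model on a representative image.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(fn: Callable[[], object], warmup: int = 3, runs: int = 20) -> Dict[str, float]:
    """Time `fn` over several runs and report latency statistics in seconds."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    # Warm-up calls are discarded so one-off costs (lazy init, caches) do not skew the results.
    for _ in range(warmup):
        fn()
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return {
        "mean_s": statistics.mean(timings),
        "median_s": statistics.median(timings),
        "min_s": min(timings),
        "max_s": max(timings),
    }


def fake_inference() -> int:
    # Hypothetical stand-in for one forward pass; replace with the real model call.
    return sum(i * i for i in range(10_000))


if __name__ == "__main__":
    stats = benchmark(fake_inference, warmup=2, runs=10)
    print({name: round(value, 6) for name, value in stats.items()})
```

Reporting the median alongside the mean helps when a few runs are slowed down by other processes, and it is worth stating the hardware (CPU or GPU) and input size next to the numbers.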
@charlesmindee |
@charlesmindee
What do you think? |
Hi @felixdittrich92, thanks for the benchmark. Does the ONNX model that takes 3s to run include the OCR task as well? (I understand that it doesn't include Tesseract, but is there any other module apart from the raw TableNet?) Have a nice day! 😄 |
@charlesmindee I wish you the same |
Hi @felixdittrich92, it is absolutely not a problem if we don't take care of this in the near future. It would indeed be great for us if you could share the dataset/training scripts, but don't get too wrapped up in it! Best! |
@charlesmindee |
Topic for |
Add a TableNet model to extract tabular data as a dataframe from images.
(I have a ready-to-use model (.pt) trained on the Marmot dataset and need a bit of guidance on where to add it, preferably as ONNX. For self-training I can also add it in reference, and the same for the dataset, but only in PyTorch (Lightning).)
After the restructuring / hocr pdfa export
@fg-mindee @charlesmindee