Question about identifying table content #514

Open
jribault opened this issue Oct 17, 2019 · 1 comment

Comments

@jribault

Hi,

I'm trying to identify patents (country, number, date, etc.; I have my own model). It works pretty well on running text but not very well when the patents are listed in a table. Do you have any advice on table recognition? Should I try to preprocess the text to remove the tables? Should I modify the template?

I'm a bit stuck, so any advice is welcome :)

@Sunnycheey

My idea would be: first detect the coordinates of the tables with GROBID, then use another tool such as Tabula to process the desired region.
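A minimal sketch of the first half of that idea, in Python. It assumes GROBID's full-text service was called with figure coordinates enabled (the `teiCoordinates` option set to `figure`), that tables then appear in the TEI as `<figure type="table">` elements with a `coords` attribute of the form `page,x,y,width,height` (several boxes separated by `;`, in PDF points from the top-left of the page), and that the region is handed to tabula-py, whose `read_pdf` accepts `pages` and an `area` of `[top, left, bottom, right]`. The `table_regions` helper and the sample TEI are hypothetical, written only for illustration.

```python
import xml.etree.ElementTree as ET
from collections import namedtuple
from typing import Dict, List, Tuple

TEI = "{http://www.tei-c.org/ns/1.0}"

# One enclosing box per (table, page); PDF points, top-left origin (assumed).
TableRegion = namedtuple("TableRegion", ["page", "top", "left", "bottom", "right"])


def table_regions(tei_xml: str) -> List[TableRegion]:
    """Hypothetical helper: collect table bounding boxes from GROBID TEI output."""
    root = ET.fromstring(tei_xml)
    regions: List[TableRegion] = []
    for fig in root.iter(TEI + "figure"):
        coords = fig.get("coords")
        # Keep only tables, and only if coordinates were emitted for them.
        if fig.get("type") != "table" or not coords:
            continue
        per_page: Dict[int, Tuple[float, float, float, float]] = {}
        for box in coords.split(";"):
            if not box.strip():
                continue
            page_s, x_s, y_s, w_s, h_s = box.split(",")
            page = int(page_s)
            x, y, w, h = float(x_s), float(y_s), float(w_s), float(h_s)
            top, left, bottom, right = y, x, y + h, x + w
            if page in per_page:
                # Grow the existing box so it also encloses this fragment.
                old_top, old_left, old_bottom, old_right = per_page[page]
                top, left = min(old_top, top), min(old_left, left)
                bottom, right = max(old_bottom, bottom), max(old_right, right)
            per_page[page] = (top, left, bottom, right)
        for page in sorted(per_page):
            regions.append(TableRegion(page, *per_page[page]))
    return regions


SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
<figure type="table" coords="3,72.0,100.0,450.0,200.0;3,72.0,310.0,450.0,40.0">
  <head>Table 1: cited patents</head>
</figure>
<figure coords="4,50,50,100,100"><head>Figure 1</head></figure>
</body></text></TEI>"""

for region in table_regions(SAMPLE_TEI):
    area = [region.top, region.left, region.bottom, region.right]
    print(region.page, area)  # prints: 3 [100.0, 72.0, 350.0, 522.0]
    # With tabula-py installed, the region could then be extracted with e.g.
    # tabula.read_pdf("paper.pdf", pages=region.page, area=area)
```

Merging a table's fragments into one enclosing box per page keeps the Tabula call simple, but it can pull in surrounding text if the fragments are far apart; passing each fragment separately is the more conservative alternative.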
