Figures and tables in the back / annex section ignored #737

de-code · 2021-04-14T10:11:52Z

This is related to #698

Some documents have main figures and supplementary figures.
In such cases, if the segmentation model labels the supplementary figures as annex,
their content is passed separately to the fulltext model.
Even if the fulltext model then correctly labels that content as a figure, the figures from the annex are not included in the output.

de-code · 2021-04-14T10:12:44Z

This seems to be due to FullTextParser processing figures and tables from the body only.
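
To illustrate the suspected behaviour, here is a minimal Python sketch. It is not GROBID's code: `Segment`, `collect_figures`, and the `include_annex` flag are hypothetical names made up for this example, and the label strings are only assumed to mirror the segmentation and fulltext labels discussed above.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """Hypothetical stand-in for one area produced by the segmentation model."""
    label: str  # assumed segmentation label, e.g. "body" or "annex"
    # (fulltext_label, text) pairs as the fulltext model might label them
    items: list = field(default_factory=list)


def collect_figures(segments, include_annex=False):
    """Return (segment_label, text) for every item labelled figure or table.

    With include_annex=False only body segments are scanned, which is the
    behaviour this issue describes. With include_annex=True the annex is
    scanned as well.
    """
    wanted = {"body", "annex"} if include_annex else {"body"}
    return [
        (segment.label, text)
        for segment in segments
        if segment.label in wanted
        for fulltext_label, text in segment.items
        if fulltext_label in ("figure", "table")
    ]


segments = [
    Segment("body", [("paragraph", "Intro"), ("figure", "Figure 1")]),
    Segment("annex", [("figure", "Extended Data Figure 1"), ("table", "Table S1")]),
]

body_only = collect_figures(segments)
with_annex = collect_figures(segments, include_annex=True)
```

In this sketch `body_only` contains just the main figure, while `with_annex` also picks up the supplementary figure and table, each tagged with the segment it came from.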

lfoppiano · 2021-04-16T01:50:34Z

@de-code do you have a PDF for testing?

de-code · 2021-04-16T07:45:21Z

One example is DOI 10.1101/306803 or 306803v1 (from the bioRxiv 10k validation dataset).
It has "Extended Data Figure 1" etc.
I haven't tested whether they are going to get extracted well with the default models.
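
One way to check a given document is to count the `<figure>` elements in the body and in the back of the TEI output. The sketch below assumes the output uses the standard TEI namespace with `<body>` and `<back>` under `<text>`. The helper name `count_figures` and the inline sample are made up for illustration and are not real GROBID output.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def count_figures(tei_xml):
    """Count <figure> elements (figures and tables) under <body> and <back>."""
    root = ET.fromstring(tei_xml)
    counts = {}
    for part in ("body", "back"):
        section = root.find(f".//tei:text/tei:{part}", TEI_NS)
        counts[part] = (
            0 if section is None
            else len(section.findall(".//tei:figure", TEI_NS))
        )
    return counts


# Hand-written sample, not actual output for the document above.
sample = (
    '<TEI xmlns="http://www.tei-c.org/ns/1.0"><text>'
    '<body><figure/><figure type="table"/></body>'
    '<back><div type="annex"><p>Extended Data Figure 1</p></div></back>'
    '</text></TEI>'
)

counts = count_figures(sample)
```

A result with figures under `body` but none under `back`, for a PDF that visibly has supplementary figures, would match the behaviour reported here.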

lfoppiano · 2024-12-19T12:46:01Z

Hi @de-code (long time 😄), I just stumbled upon this issue. Indeed, it seems the tables in the back are not processed.
According to my understanding of the guidelines, they should be output in the annex, since that is how they are annotated for the segmentation model.
The PR #963 is slowly gathering priority 😸

This was referenced Apr 14, 2021

added annex figures, tables, equations elifesciences/grobid#36

Merged

added annex figures, tables, equations #738

Open

lfoppiano added the bug and enhancement labels Dec 19, 2024
