From 8484a950923a396df61613b7740a22a5e4ade31b Mon Sep 17 00:00:00 2001 From: Luca Foppiano Date: Wed, 27 Nov 2024 16:00:37 +0000 Subject: [PATCH] more documentation --- doc/Grobid-specialized-processes.md | 48 +++- .../article_light/benchmaking-bioxiv.md | 72 ++++++ .../article_light/benchmaking-elife.md | 72 ++++++ .../flavors/article_light/benchmaking-plos.md | 72 ++++++ .../flavors/article_light/benchmaking-pmc.md | 72 ++++++ .../article_light_ref/benchmaking-bioxiv.md | 0 .../article_light_ref/benchmaking-elife.md | 202 ++++++++++++++++ .../article_light_ref/benchmaking-plos.md | 218 ++++++++++++++++++ .../article_light_ref/benchmaking-pmc.md | 202 ++++++++++++++++ 9 files changed, 952 insertions(+), 6 deletions(-) create mode 100644 doc/benchmarks/flavors/article_light/benchmaking-bioxiv.md create mode 100644 doc/benchmarks/flavors/article_light/benchmaking-elife.md create mode 100644 doc/benchmarks/flavors/article_light/benchmaking-plos.md create mode 100644 doc/benchmarks/flavors/article_light/benchmaking-pmc.md create mode 100644 doc/benchmarks/flavors/article_light_ref/benchmaking-bioxiv.md create mode 100644 doc/benchmarks/flavors/article_light_ref/benchmaking-elife.md create mode 100644 doc/benchmarks/flavors/article_light_ref/benchmaking-plos.md create mode 100644 doc/benchmarks/flavors/article_light_ref/benchmaking-pmc.md diff --git a/doc/Grobid-specialized-processes.md b/doc/Grobid-specialized-processes.md index 475735ff83..700c13f316 100644 --- a/doc/Grobid-specialized-processes.md +++ b/doc/Grobid-specialized-processes.md @@ -25,15 +25,31 @@ Following, an updated view of the cascade architecture: At the moment, the flavored processes are available as follows: -| Identifier | Flavored models | Description | Advantages and Limitations | 
-|-----------------------|---------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| `article/light` | `segmentation`, `header` | Simple process that extracts only title, authors, publication date and doi from the header, and put everything else in the body | Simple model that can work with any document and bring the advantage of pdfalto processing which solves many issue with text ordering and column recognition. Limitation are that all noise not being part of the article, such as references, page numbers, headnotes, and footnotes are also included in the body. | -| `article/light-ref` | `segmentation`, `header` | Simple process that extracts only title, authors, publication date and doi from the header, the references, and put everything else in the body | Variation of the `article/light` that includes the recognision of references. More versatile than `article/light` in the realm of variation of scientific articles, such as corrections, erratums, letters which may contain references. 
| +| Name | Identifier | Flavored models | Description | Advantages and Limitations | +|-----------------------------------------------|-----------------------|---------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| Article lightweight structure | `article/light` | `segmentation`, `header` | Simple process that extracts only title, authors, publication date and doi from the header, and put everything else in the body | Simple model that can work with any document and bring the advantage of pdfalto processing which solves many issue with text ordering and column recognition. Limitation are that all noise not being part of the article, such as references, page numbers, headnotes, and footnotes are also included in the body. | +| Article lightweight structure with references | `article/light-ref` | `segmentation`, `header` | Simple process that extracts only title, authors, publication date and doi from the header, the references, and put everything else in the body | Variation of the `article/light` that includes the recognision of references. More versatile than `article/light` in the realm of variation of scientific articles, such as corrections, erratums, letters which may contain references. | ## Benchmarking -The evaluation of the flavors is performed in the same way as the standard processing for scientific articles. 
-However, the evaluation is performed on a reduced set of fields: +The evaluation of the flavors is performed in the same way as the standard processing for scientific articles, using: + +- **BidLSTM_ChainCRF_FEATURES** as sequence labeling for the header model + +- **BidLSTM_ChainCRF_FEATURES** as sequence labeling for the reference-segmenter model + +- **BidLSTM_CRF_FEATURES** as sequence labeling for the citation model + +- **BidLSTM_CRF_FEATURES** as sequence labeling for the affiliation-address model + +- **CRF Wapiti** as the sequence labeling engine for all other models. + +Header extractions are consolidated by default with the [biblio-glutton](https://github.com/kermitt2/biblio-glutton) service (the results with the CrossRef REST API as consolidation service should be similar, but processing is much slower). + +The evaluation, which usually creates Grobid files with the suffix `fulltext.tei.xml`, also uses the flavor in the suffix: for example, files for `article/light` are suffixed with `article_light.tei.xml`. +This makes it possible to run the evaluation for multiple flavors without losing the Grobid processed files. + +The evaluation is performed on a reduced set of fields: | Flavor | Header fields | Fulltext fields | Citation fields | |---------------------|--------------------------------------|-----------------|----------------------------------| @@ -41,3 +57,23 @@ However, the evaluation is performed on a reduced set of fields: | `article/light-ref` | `title`, `first author`, `authors` | N/A | Same as the standard processing* | (*) for this flavor the citation model is included to avoid regressions, as the citation parsing is performed using the standard citation model + +The benchmark results are listed below, with links to the full reports. + +### Article lightweight structure + +| Corpus | Header (avg. 
micro F1 Ratcliff/Obershelp@0.95) | Full report | +|-----------------|------------------------------------------------|----------------------------------------------------------------------------------| +| Bioxiv | 89.4 | [benchmaking-bioxiv.md](benchmarks/flavors/article_light/benchmaking-bioxiv.md) | +| PMC_sample_1943 | 95.71 | [benchmaking-pmc.md](benchmarks/flavors/article_light/benchmaking-pmc.md) | +| PLOS_1000 | 99.37 | [benchmaking-plos.md](benchmarks/flavors/article_light/benchmaking-plos.md) | +| eLife_984 | 90.97 | [benchmaking-elife.md](benchmarks/flavors/article_light/benchmaking-elife.md) | + +### Article lightweight structure with references + +| Corpus | Header (avg. micro F1 Ratcliff/Obershelp@0.95) | Citations (Instance-level f-score (RatcliffObershelp)) | Full report | +|-----------------|------------------------------------------------|--------------------------------------------------------|-------------------------------------------------------------------------------------| +| Bioxiv | 89.79 | 56.31 | [benchmaking-bioxiv.md](benchmarks/flavors/article_light_ref/benchmaking-bioxiv.md) | +| PMC_sample_1943 | 95.74 | 58.78 | [benchmaking-pmc.md](benchmarks/flavors/article_light_ref/benchmaking-pmc.md) | +| PLOS_1000 | 99.52 | 48.04 | [benchmaking-plos.md](benchmarks/flavors/article_light_ref/benchmaking-plos.md) | +| eLife_984 | 91.35 | 76.14 | [benchmaking-elife.md](benchmarks/flavors/article_light_ref/benchmaking-elife.md) | diff --git a/doc/benchmarks/flavors/article_light/benchmaking-bioxiv.md b/doc/benchmarks/flavors/article_light/benchmaking-bioxiv.md new file mode 100644 index 0000000000..5eae51353c --- /dev/null +++ b/doc/benchmarks/flavors/article_light/benchmaking-bioxiv.md @@ -0,0 +1,72 @@ +## Header metadata + +Evaluation on 1996 random PDF files out of 1998 PDF (ratio 1.0). 
+ +#### Strict Matching (exact matches) + +**Field-level results** + +| label | precision | recall | f1 | support | +|-----------------------------|-----------|-----------|-----------|---------| +| authors | 82.92 | 81.5 | 82.2 | 1995 | +| first_author | 96.33 | 94.78 | 95.55 | 1993 | +| title | 78.16 | 73.7 | 75.86 | 1996 | +| | | | | | +| **all fields (micro avg.)** | **85.91** | **83.32** | **84.59** | 5984 | +| all fields (macro avg.) | 85.8 | 83.33 | 84.54 | 5984 | + +#### Soft Matching (ignoring punctuation, case and space characters mismatches) + +**Field-level results** + +| label | precision | recall | f1 | support | +|-----------------------------|-----------|-----------|----------|---------| +| authors | 83.53 | 82.11 | 82.81 | 1995 | +| first_author | 96.63 | 95.08 | 95.85 | 1993 | +| title | 80.66 | 76.05 | 78.29 | 1996 | +| | | | | | +| **all fields (micro avg.)** | **87.03** | **84.41** | **85.7** | 5984 | +| all fields (macro avg.) | 86.94 | 84.41 | 85.65 | 5984 | + +#### Levenshtein Matching (Minimum Levenshtein distance at 0.8) + +**Field-level results** + +| label | precision | recall | f1 | support | +|-----------------------------|-----------|-----------|-----------|---------| +| authors | 91.59 | 90.03 | 90.8 | 1995 | +| first_author | 96.84 | 95.28 | 96.05 | 1993 | +| title | 92.03 | 86.77 | 89.32 | 1996 | +| | | | | | +| **all fields (micro avg.)** | **93.5** | **90.69** | **92.08** | 5984 | +| all fields (macro avg.) | 93.48 | 90.69 | 92.06 | 5984 | + +#### Ratcliff/Obershelp Matching (Minimum Ratcliff/Obershelp similarity at 0.95) + +**Field-level results** + +| label | precision | recall | f1 | support | +|-----------------------------|-----------|-----------|----------|---------| +| authors | 87.51 | 86.02 | 86.75 | 1995 | +| first_author | 96.33 | 94.78 | 95.55 | 1993 | +| title | 88.42 | 83.37 | 85.82 | 1996 | +| | | | | | +| **all fields (micro avg.)** | **90.78** | **88.05** | **89.4** | 5984 | +| all fields (macro avg.) 
| 90.75 | 88.05 | 89.37 | 5984 | + +#### Instance-level results + +``` +Total expected instances: 1996 +Total correct instances: 1278 (strict) +Total correct instances: 1312 (soft) +Total correct instances: 1613 (Levenshtein) +Total correct instances: 1496 (ObservedRatcliffObershelp) + +Instance-level recall: 64.03 (strict) +Instance-level recall: 65.73 (soft) +Instance-level recall: 80.81 (Levenshtein) +Instance-level recall: 74.95 (RatcliffObershelp) +``` + +Evaluation metrics produced in 15.364 seconds diff --git a/doc/benchmarks/flavors/article_light/benchmaking-elife.md b/doc/benchmarks/flavors/article_light/benchmaking-elife.md new file mode 100644 index 0000000000..ad355b6813 --- /dev/null +++ b/doc/benchmarks/flavors/article_light/benchmaking-elife.md @@ -0,0 +1,72 @@ +## Header metadata + +Evaluation on 957 random PDF files out of 982 PDF (ratio 1.0). + +#### Strict Matching (exact matches) + +**Field-level results** + +| label | precision | recall | f1 | support | +|-----------------------------|-----------|-----------|-----------|---------| +| authors | 78.74 | 78.16 | 78.45 | 957 | +| first_author | 92 | 91.42 | 91.71 | 956 | +| title | 89.92 | 87.67 | 88.78 | 957 | +| | | | | | +| **all fields (micro avg.)** | **86.87** | **85.75** | **86.31** | 2870 | +| all fields (macro avg.) | 86.89 | 85.75 | 86.31 | 2870 | + +#### Soft Matching (ignoring punctuation, case and space characters mismatches) + +**Field-level results** + +| label | precision | recall | f1 | support | +|-----------------------------|-----------|-----------|-----------|---------| +| authors | 79.05 | 78.47 | 78.76 | 957 | +| first_author | 92 | 91.42 | 91.71 | 956 | +| title | 97 | 94.57 | 95.77 | 957 | +| | | | | | +| **all fields (micro avg.)** | **89.3** | **88.15** | **88.73** | 2870 | +| all fields (macro avg.) 
| 89.35 | 88.15 | 88.75 | 2870 | + +#### Levenshtein Matching (Minimum Levenshtein distance at 0.8) + +**Field-level results** + +| label | precision | recall | f1 | support | +|-----------------------------|-----------|-----------|-----------|---------| +| authors | 90.53 | 89.86 | 90.19 | 957 | +| first_author | 92.32 | 91.74 | 92.03 | 956 | +| title | 98.5 | 96.03 | 97.25 | 957 | +| | | | | | +| **all fields (micro avg.)** | **93.75** | **92.54** | **93.14** | 2870 | +| all fields (macro avg.) | 93.78 | 92.54 | 93.16 | 2870 | + +#### Ratcliff/Obershelp Matching (Minimum Ratcliff/Obershelp similarity at 0.95) + +**Field-level results** + +| label | precision | recall | f1 | support | +|-----------------------------|-----------|-----------|-----------|---------| +| authors | 84.32 | 83.7 | 84.01 | 957 | +| first_author | 92 | 91.42 | 91.71 | 956 | +| title | 98.5 | 96.03 | 97.25 | 957 | +| | | | | | +| **all fields (micro avg.)** | **91.56** | **90.38** | **90.97** | 2870 | +| all fields (macro avg.) | 91.61 | 90.38 | 90.99 | 2870 | + +#### Instance-level results + +``` +Total expected instances: 957 +Total correct instances: 678 (strict) +Total correct instances: 729 (soft) +Total correct instances: 811 (Levenshtein) +Total correct instances: 773 (ObservedRatcliffObershelp) + +Instance-level recall: 70.85 (strict) +Instance-level recall: 76.18 (soft) +Instance-level recall: 84.74 (Levenshtein) +Instance-level recall: 80.77 (RatcliffObershelp) +``` + +Evaluation metrics produced in 13.732 seconds diff --git a/doc/benchmarks/flavors/article_light/benchmaking-plos.md b/doc/benchmarks/flavors/article_light/benchmaking-plos.md new file mode 100644 index 0000000000..51ea391fb1 --- /dev/null +++ b/doc/benchmarks/flavors/article_light/benchmaking-plos.md @@ -0,0 +1,72 @@ +## Header metadata + +Evaluation on 1000 random PDF files out of 998 PDF (ratio 1.0). 
+ +#### Strict Matching (exact matches) + +**Field-level results** + +| label | precision | recall | f1 | support | +|-----------------------------|-----------|-----------|-----------|---------| +| authors | 98.97 | 99.28 | 99.12 | 969 | +| first_author | 99.28 | 99.59 | 99.43 | 969 | +| title | 95.79 | 95.5 | 95.64 | 1000 | +| | | | | | +| **all fields (micro avg.)** | **97.99** | **98.09** | **98.04** | 2938 | +| all fields (macro avg.) | 98.01 | 98.12 | 98.07 | 2938 | + +#### Soft Matching (ignoring punctuation, case and space characters mismatches) + +**Field-level results** + +| label | precision | recall | f1 | support | +|-----------------------------|-----------|-----------|-----------|---------| +| authors | 98.97 | 99.28 | 99.12 | 969 | +| first_author | 99.28 | 99.59 | 99.43 | 969 | +| title | 99.3 | 99 | 99.15 | 1000 | +| | | | | | +| **all fields (micro avg.)** | **99.18** | **99.29** | **99.23** | 2938 | +| all fields (macro avg.) | 99.18 | 99.29 | 99.24 | 2938 | + +#### Levenshtein Matching (Minimum Levenshtein distance at 0.8) + +**Field-level results** + +| label | precision | recall | f1 | support | +|-----------------------------|-----------|-----------|-----------|---------| +| authors | 99.28 | 99.59 | 99.43 | 969 | +| first_author | 99.38 | 99.69 | 99.54 | 969 | +| title | 99.7 | 99.4 | 99.55 | 1000 | +| | | | | | +| **all fields (micro avg.)** | **99.46** | **99.56** | **99.51** | 2938 | +| all fields (macro avg.) | 99.45 | 99.56 | 99.51 | 2938 | + +#### Ratcliff/Obershelp Matching (Minimum Ratcliff/Obershelp similarity at 0.95) + +**Field-level results** + +| label | precision | recall | f1 | support | +|-----------------------------|-----------|-----------|-----------|---------| +| authors | 99.18 | 99.48 | 99.33 | 969 | +| first_author | 99.28 | 99.59 | 99.43 | 969 | +| title | 99.5 | 99.2 | 99.35 | 1000 | +| | | | | | +| **all fields (micro avg.)** | **99.32** | **99.42** | **99.37** | 2938 | +| all fields (macro avg.) 
| 99.32 | 99.42 | 99.37 | 2938 | + +#### Instance-level results + +``` +Total expected instances: 1000 +Total correct instances: 950 (strict) +Total correct instances: 985 (soft) +Total correct instances: 989 (Levenshtein) +Total correct instances: 988 (ObservedRatcliffObershelp) + +Instance-level recall: 95 (strict) +Instance-level recall: 98.5 (soft) +Instance-level recall: 98.9 (Levenshtein) +Instance-level recall: 98.8 (RatcliffObershelp) +``` + +Evaluation metrics produced in 12.571 seconds diff --git a/doc/benchmarks/flavors/article_light/benchmaking-pmc.md b/doc/benchmarks/flavors/article_light/benchmaking-pmc.md new file mode 100644 index 0000000000..ce7d5bc78c --- /dev/null +++ b/doc/benchmarks/flavors/article_light/benchmaking-pmc.md @@ -0,0 +1,72 @@ +## Header metadata + +Evaluation on 1943 random PDF files out of 1941 PDF (ratio 1.0). + +#### Strict Matching (exact matches) + +**Field-level results** + +| label | precision | recall | f1 | support | +|-----------------------------|-----------|-----------|-----------|---------| +| authors | 92.5 | 92.17 | 92.34 | 1941 | +| first_author | 96.28 | 95.93 | 96.1 | 1941 | +| title | 84.28 | 83.32 | 83.8 | 1943 | +| | | | | | +| **all fields (micro avg.)** | **91.03** | **90.47** | **90.75** | 5825 | +| all fields (macro avg.) | 91.02 | 90.47 | 90.75 | 5825 | + +#### Soft Matching (ignoring punctuation, case and space characters mismatches) + +**Field-level results** + +| label | precision | recall | f1 | support | +|-----------------------------|-----------|-----------|-----------|---------| +| authors | 94.42 | 94.08 | 94.25 | 1941 | +| first_author | 96.64 | 96.29 | 96.46 | 1941 | +| title | 91.98 | 90.94 | 91.46 | 1943 | +| | | | | | +| **all fields (micro avg.)** | **94.35** | **93.77** | **94.06** | 5825 | +| all fields (macro avg.) 
| 94.35 | 93.77 | 94.06 | 5825 | + +#### Levenshtein Matching (Minimum Levenshtein distance at 0.8) + +**Field-level results** + +| label | precision | recall | f1 | support | +|-----------------------------|-----------|----------|----------|---------| +| authors | 96.54 | 96.19 | 96.36 | 1941 | +| first_author | 96.95 | 96.6 | 96.77 | 1941 | +| title | 98.13 | 97.01 | 97.57 | 1943 | +| | | | | | +| **all fields (micro avg.)** | **97.2** | **96.6** | **96.9** | 5825 | +| all fields (macro avg.) | 97.2 | 96.6 | 96.9 | 5825 | + +#### Ratcliff/Obershelp Matching (Minimum Ratcliff/Obershelp similarity at 0.95) + +**Field-level results** + +| label | precision | recall | f1 | support | +|-----------------------------|-----------|-----------|-----------|---------| +| authors | 95.6 | 95.26 | 95.43 | 1941 | +| first_author | 96.28 | 95.93 | 96.1 | 1941 | +| title | 96.15 | 95.06 | 95.6 | 1943 | +| | | | | | +| **all fields (micro avg.)** | **96.01** | **95.42** | **95.71** | 5825 | +| all fields (macro avg.) 
| 96.01 | 95.42 | 95.71 | 5825 | + +#### Instance-level results + +``` +Total expected instances: 1943 +Total correct instances: 1511 (strict) +Total correct instances: 1675 (soft) +Total correct instances: 1820 (Levenshtein) +Total correct instances: 1766 (ObservedRatcliffObershelp) + +Instance-level recall: 77.77 (strict) +Instance-level recall: 86.21 (soft) +Instance-level recall: 93.67 (Levenshtein) +Instance-level recall: 90.89 (RatcliffObershelp) +``` + +Evaluation metrics produced in 14.6 seconds diff --git a/doc/benchmarks/flavors/article_light_ref/benchmaking-bioxiv.md b/doc/benchmarks/flavors/article_light_ref/benchmaking-bioxiv.md new file mode 100644 index 0000000000..e69de29bb2 diff --git a/doc/benchmarks/flavors/article_light_ref/benchmaking-elife.md b/doc/benchmarks/flavors/article_light_ref/benchmaking-elife.md new file mode 100644 index 0000000000..f7a76a269f --- /dev/null +++ b/doc/benchmarks/flavors/article_light_ref/benchmaking-elife.md @@ -0,0 +1,202 @@ +## Header metadata + +Evaluation on 984 random PDF files out of 982 PDF (ratio 1.0). + +#### Strict Matching (exact matches) + +**Field-level results** + +| label | precision | recall | f1 | support | +|-----------------------------|-----------|-----------|-----------|---------| +| authors | 80.86 | 79.96 | 80.41 | 983 | +| first_author | 91.77 | 90.84 | 91.3 | 982 | +| title | 89.68 | 87.4 | 88.52 | 984 | +| | | | | | +| **all fields (micro avg.)** | **87.43** | **86.06** | **86.74** | 2949 | +| all fields (macro avg.) 
| 87.44 | 86.06 | 86.74 | 2949 | + +#### Soft Matching (ignoring punctuation, case and space characters mismatches) + +**Field-level results** + +| label | precision | recall | f1 | support | +|-----------------------------|-----------|----------|----------|---------| +| authors | 81.17 | 80.26 | 80.72 | 983 | +| first_author | 91.77 | 90.84 | 91.3 | 982 | +| title | 96.56 | 94.11 | 95.32 | 984 | +| | | | | | +| **all fields (micro avg.)** | **89.8** | **88.4** | **89.1** | 2949 | +| all fields (macro avg.) | 89.83 | 88.4 | 89.11 | 2949 | + +#### Levenshtein Matching (Minimum Levenshtein distance at 0.8) + +**Field-level results** + +| label | precision | recall | f1 | support | +|-----------------------------|-----------|-----------|-----------|---------| +| authors | 93.11 | 92.07 | 92.58 | 983 | +| first_author | 92.08 | 91.14 | 91.61 | 982 | +| title | 98.02 | 95.53 | 96.76 | 984 | +| | | | | | +| **all fields (micro avg.)** | **94.39** | **92.91** | **93.64** | 2949 | +| all fields (macro avg.) | 94.4 | 92.91 | 93.65 | 2949 | + +#### Ratcliff/Obershelp Matching (Minimum Ratcliff/Obershelp similarity at 0.95) + +**Field-level results** + +| label | precision | recall | f1 | support | +|-----------------------------|-----------|-----------|-----------|---------| +| authors | 86.52 | 85.55 | 86.04 | 983 | +| first_author | 91.77 | 90.84 | 91.3 | 982 | +| title | 98.02 | 95.53 | 96.76 | 984 | +| | | | | | +| **all fields (micro avg.)** | **92.08** | **90.64** | **91.35** | 2949 | +| all fields (macro avg.) 
| 92.1 | 90.64 | 91.36 | 2949 | + +#### Instance-level results + +``` +Total expected instances: 984 +Total correct instances: 713 (strict) +Total correct instances: 766 (soft) +Total correct instances: 854 (Levenshtein) +Total correct instances: 814 (ObservedRatcliffObershelp) + +Instance-level recall: 72.46 (strict) +Instance-level recall: 77.85 (soft) +Instance-level recall: 86.79 (Levenshtein) +Instance-level recall: 82.72 (RatcliffObershelp) +``` + +## Citation metadata + +Evaluation on 984 random PDF files out of 982 PDF (ratio 1.0). + +#### Strict Matching (exact matches) + +**Field-level results** + +| label | precision | recall | f1 | support | +|-----------------------------|-----------|-----------|-----------|---------| +| authors | 79.44 | 78.36 | 78.9 | 63265 | +| date | 95.93 | 94.2 | 95.05 | 63662 | +| first_author | 94.84 | 93.5 | 94.17 | 63265 | +| inTitle | 95.81 | 94.89 | 95.35 | 63213 | +| issue | 1.98 | 75 | 3.86 | 16 | +| page | 96.25 | 95.43 | 95.84 | 53375 | +| title | 90.28 | 90.91 | 90.6 | 62044 | +| volume | 97.91 | 98.4 | 98.15 | 61049 | +| | | | | | +| **all fields (micro avg.)** | **92.71** | **92.14** | **92.42** | 429889 | +| all fields (macro avg.) | 81.56 | 90.09 | 81.49 | 429889 | + +#### Soft Matching (ignoring punctuation, case and space characters mismatches) + +**Field-level results** + +| label | precision | recall | f1 | support | +|-----------------------------|-----------|-----------|-----------|---------| +| authors | 79.58 | 78.49 | 79.03 | 63265 | +| date | 95.93 | 94.2 | 95.05 | 63662 | +| first_author | 94.92 | 93.58 | 94.25 | 63265 | +| inTitle | 96.3 | 95.37 | 95.83 | 63213 | +| issue | 1.98 | 75 | 3.86 | 16 | +| page | 96.25 | 95.43 | 95.84 | 53375 | +| title | 95.95 | 96.62 | 96.28 | 62044 | +| volume | 97.91 | 98.4 | 98.15 | 61049 | +| | | | | | +| **all fields (micro avg.)** | **93.64** | **93.07** | **93.35** | 429889 | +| all fields (macro avg.) 
| 82.35 | 90.89 | 82.29 | 429889 | + +#### Levenshtein Matching (Minimum Levenshtein distance at 0.8) + +**Field-level results** + +| label | precision | recall | f1 | support | +|-----------------------------|-----------|-----------|-----------|---------| +| authors | 93.32 | 92.05 | 92.68 | 63265 | +| date | 95.93 | 94.2 | 95.05 | 63662 | +| first_author | 95.37 | 94.03 | 94.69 | 63265 | +| inTitle | 96.62 | 95.7 | 96.16 | 63213 | +| issue | 1.98 | 75 | 3.86 | 16 | +| page | 96.25 | 95.43 | 95.84 | 53375 | +| title | 97.66 | 98.34 | 98 | 62044 | +| volume | 97.91 | 98.4 | 98.15 | 61049 | +| | | | | | +| **all fields (micro avg.)** | **96.01** | **95.42** | **95.72** | 429889 | +| all fields (macro avg.) | 84.38 | 92.89 | 84.3 | 429889 | + +#### Ratcliff/Obershelp Matching (Minimum Ratcliff/Obershelp similarity at 0.95) + +**Field-level results** + +| label | precision | recall | f1 | support | +|-----------------------------|-----------|-----------|-----------|---------| +| authors | 86.75 | 85.57 | 86.16 | 63265 | +| date | 95.93 | 94.2 | 95.05 | 63662 | +| first_author | 94.85 | 93.52 | 94.18 | 63265 | +| inTitle | 96.3 | 95.38 | 95.84 | 63213 | +| issue | 1.98 | 75 | 3.86 | 16 | +| page | 96.25 | 95.43 | 95.84 | 53375 | +| title | 97.51 | 98.19 | 97.85 | 62044 | +| volume | 97.91 | 98.4 | 98.15 | 61049 | +| | | | | | +| **all fields (micro avg.)** | **94.91** | **94.33** | **94.62** | 429889 | +| all fields (macro avg.) 
| 83.44 | 91.96 | 83.37 | 429889 | + +#### Instance-level results + +``` +Total expected instances: 63664 +Total extracted instances: 66390 +Total correct instances: 42407 (strict) +Total correct instances: 45251 (soft) +Total correct instances: 52911 (Levenshtein) +Total correct instances: 49510 (RatcliffObershelp) + +Instance-level precision: 63.88 (strict) +Instance-level precision: 68.16 (soft) +Instance-level precision: 79.7 (Levenshtein) +Instance-level precision: 74.57 (RatcliffObershelp) + +Instance-level recall: 66.61 (strict) +Instance-level recall: 71.08 (soft) +Instance-level recall: 83.11 (Levenshtein) +Instance-level recall: 77.77 (RatcliffObershelp) + +Instance-level f-score: 65.21 (strict) +Instance-level f-score: 69.59 (soft) +Instance-level f-score: 81.37 (Levenshtein) +Instance-level f-score: 76.14 (RatcliffObershelp) + +Matching 1 : 58739 + +Matching 2 : 1008 + +Matching 3 : 1244 + +Matching 4 : 366 + +Total matches : 61357 +``` + +#### Citation context resolution + +``` + +Total expected references: 63664 - 64.7 references per article +Total predicted references: 66390 - 67.47 references per article + +Total expected citation contexts: 109022 - 110.79 citation contexts per article +Total predicted citation contexts: 0 - 0 citation contexts per article + +Total correct predicted citation contexts: 0 - 0 citation contexts per article +Total wrong predicted citation contexts: 0 (wrong callout matching, callout missing in NLM, or matching with a bib. ref. not aligned with a bib.ref. 
in NLM) + +Precision citation contexts: NaN +Recall citation contexts: 0 +fscore citation contexts: NaN +``` + +Evaluation metrics produced in 1541.928 seconds diff --git a/doc/benchmarks/flavors/article_light_ref/benchmaking-plos.md b/doc/benchmarks/flavors/article_light_ref/benchmaking-plos.md new file mode 100644 index 0000000000..7dce593057 --- /dev/null +++ b/doc/benchmarks/flavors/article_light_ref/benchmaking-plos.md @@ -0,0 +1,218 @@ + +## Header metadata + +Evaluation on 1000 random PDF files out of 998 PDF (ratio 1.0). + +#### Strict Matching (exact matches) + +**Field-level results** + +| label | precision | recall | f1 | support | +|--- |--- |--- |--- |--- | +| authors | 99.18 | 99.28 | 99.23 | 969 | +| first_author | 99.48 | 99.59 | 99.54 | 969 | +| title | 95.89 | 95.7 | 95.8 | 1000 | +| | | | | | +| **all fields (micro avg.)** | **98.16** | **98.16** | **98.16** | 2938 | +| all fields (macro avg.) | 98.18 | 98.19 | 98.19 | 2938 | + + + +#### Soft Matching (ignoring punctuation, case and space characters mismatches) + +**Field-level results** + +| label | precision | recall | f1 | support | +|--- |--- |--- |--- |--- | +| authors | 99.18 | 99.28 | 99.23 | 969 | +| first_author | 99.48 | 99.59 | 99.54 | 969 | +| title | 99.5 | 99.3 | 99.4 | 1000 | +| | | | | | +| **all fields (micro avg.)** | **99.39** | **99.39** | **99.39** | 2938 | +| all fields (macro avg.) | 99.39 | 99.39 | 99.39 | 2938 | + + + +#### Levenshtein Matching (Minimum Levenshtein distance at 0.8) + +**Field-level results** + +| label | precision | recall | f1 | support | +|--- |--- |--- |--- |--- | +| authors | 99.48 | 99.59 | 99.54 | 969 | +| first_author | 99.59 | 99.69 | 99.64 | 969 | +| title | 99.7 | 99.5 | 99.6 | 1000 | +| | | | | | +| **all fields (micro avg.)** | **99.59** | **99.59** | **99.59** | 2938 | +| all fields (macro avg.) 
| 99.59 | 99.59 | 99.59 | 2938 | + + + +#### Ratcliff/Obershelp Matching (Minimum Ratcliff/Obershelp similarity at 0.95) + +**Field-level results** + +| label | precision | recall | f1 | support | +|--- |--- |--- |--- |--- | +| authors | 99.38 | 99.48 | 99.43 | 969 | +| first_author | 99.48 | 99.59 | 99.54 | 969 | +| title | 99.7 | 99.5 | 99.6 | 1000 | +| | | | | | +| **all fields (micro avg.)** | **99.52** | **99.52** | **99.52** | 2938 | +| all fields (macro avg.) | 99.52 | 99.52 | 99.52 | 2938 | + + +#### Instance-level results + +``` +Total expected instances: 1000 +Total correct instances: 952 (strict) +Total correct instances: 988 (soft) +Total correct instances: 992 (Levenshtein) +Total correct instances: 991 (ObservedRatcliffObershelp) + +Instance-level recall: 95.2 (strict) +Instance-level recall: 98.8 (soft) +Instance-level recall: 99.2 (Levenshtein) +Instance-level recall: 99.1 (RatcliffObershelp) +``` + + +## Citation metadata + +Evaluation on 1000 random PDF files out of 998 PDF (ratio 1.0). + +#### Strict Matching (exact matches) + +**Field-level results** + +| label | precision | recall | f1 | support | +|--- |--- |--- |--- |--- | +| authors | 81.13 | 78.41 | 79.75 | 44770 | +| date | 84.56 | 81.24 | 82.87 | 45457 | +| first_author | 91.44 | 88.34 | 89.86 | 44770 | +| inTitle | 81.61 | 83.57 | 82.58 | 42795 | +| issue | 93.48 | 92.7 | 93.09 | 18983 | +| page | 93.63 | 77.54 | 84.83 | 40844 | +| title | 59.94 | 60.47 | 60.2 | 43101 | +| volume | 95.82 | 96.1 | 95.96 | 40458 | +| | | | | | +| **all fields (micro avg.)** | **84.18** | **81.44** | **82.78** | 321178 | +| all fields (macro avg.) 
| 85.2 | 82.29 | 83.64 | 321178 |
+
+
+
+#### Soft Matching (ignoring punctuation, case and space characters mismatches)
+
+**Field-level results**
+
+| label | precision | recall | f1 | support |
+|--- |--- |--- |--- |--- |
+| authors | 81.45 | 78.71 | 80.06 | 44770 |
+| date | 84.56 | 81.24 | 82.87 | 45457 |
+| first_author | 91.66 | 88.55 | 90.08 | 44770 |
+| inTitle | 85.44 | 87.49 | 86.45 | 42795 |
+| issue | 93.48 | 92.7 | 93.09 | 18983 |
+| page | 93.63 | 77.54 | 84.83 | 40844 |
+| title | 91.92 | 92.74 | 92.33 | 43101 |
+| volume | 95.82 | 96.1 | 95.96 | 40458 |
+| | | | | |
+| **all fields (micro avg.)** | **89.27** | **86.36** | **87.79** | 321178 |
+| all fields (macro avg.) | 89.74 | 86.88 | 88.21 | 321178 |
+
+
+
+#### Levenshtein Matching (Minimum Levenshtein distance at 0.8)
+
+**Field-level results**
+
+| label | precision | recall | f1 | support |
+|--- |--- |--- |--- |--- |
+| authors | 90.61 | 87.57 | 89.06 | 44770 |
+| date | 84.56 | 81.24 | 82.87 | 45457 |
+| first_author | 92.19 | 89.07 | 90.6 | 44770 |
+| inTitle | 86.38 | 88.45 | 87.41 | 42795 |
+| issue | 93.48 | 92.7 | 93.09 | 18983 |
+| page | 93.63 | 77.54 | 84.83 | 40844 |
+| title | 94.52 | 95.35 | 94.93 | 43101 |
+| volume | 95.82 | 96.1 | 95.96 | 40458 |
+| | | | | |
+| **all fields (micro avg.)** | **91.11** | **88.15** | **89.61** | 321178 |
+| all fields (macro avg.) | 91.4 | 88.5 | 89.84 | 321178 |
+
+
+
+#### Ratcliff/Obershelp Matching (Minimum Ratcliff/Obershelp similarity at 0.95)
+
+**Field-level results**
+
+| label | precision | recall | f1 | support |
+|--- |--- |--- |--- |--- |
+| authors | 84.9 | 82.05 | 83.45 | 44770 |
+| date | 84.56 | 81.24 | 82.87 | 45457 |
+| first_author | 91.44 | 88.34 | 89.86 | 44770 |
+| inTitle | 85.09 | 87.13 | 86.1 | 42795 |
+| issue | 93.48 | 92.7 | 93.09 | 18983 |
+| page | 93.63 | 77.54 | 84.83 | 40844 |
+| title | 93.9 | 94.73 | 94.31 | 43101 |
+| volume | 95.82 | 96.1 | 95.96 | 40458 |
+| | | | | |
+| **all fields (micro avg.)** | **89.94** | **87.02** | **88.46** | 321178 |
+| all fields (macro avg.) | 90.35 | 87.48 | 88.81 | 321178 |
+
+
+#### Instance-level results
+
+```
+Total expected instances: 48449
+Total extracted instances: 48344
+Total correct instances: 13485 (strict)
+Total correct instances: 22253 (soft)
+Total correct instances: 24898 (Levenshtein)
+Total correct instances: 23252 (RatcliffObershelp)
+
+Instance-level precision: 27.89 (strict)
+Instance-level precision: 46.03 (soft)
+Instance-level precision: 51.5 (Levenshtein)
+Instance-level precision: 48.1 (RatcliffObershelp)
+
+Instance-level recall: 27.83 (strict)
+Instance-level recall: 45.93 (soft)
+Instance-level recall: 51.39 (Levenshtein)
+Instance-level recall: 47.99 (RatcliffObershelp)
+
+Instance-level f-score: 27.86 (strict)
+Instance-level f-score: 45.98 (soft)
+Instance-level f-score: 51.45 (Levenshtein)
+Instance-level f-score: 48.04 (RatcliffObershelp)
+
+Matching 1 : 35367
+
+Matching 2 : 1257
+
+Matching 3 : 3269
+
+Matching 4 : 1801
+
+Total matches : 41694
+```
+
+
+#### Citation context resolution
+```
+
+Total expected references: 48449 - 48.45 references per article
+Total predicted references: 48344 - 48.34 references per article
+
+Total expected citation contexts: 69755 - 69.75 citation contexts per article
+Total predicted citation contexts: 0 - 0 citation contexts per article
+
+Total correct predicted citation contexts: 0 - 0 citation contexts per article
+Total wrong predicted citation contexts: 0 (wrong callout matching, callout missing in NLM, or matching with a bib. ref. not aligned with a bib.ref. in NLM)
+
+Precision citation contexts: NaN
+Recall citation contexts: 0
+fscore citation contexts: NaN
+```
+
+Evaluation metrics produced in 893.539 seconds
diff --git a/doc/benchmarks/flavors/article_light_ref/benchmaking-pmc.md b/doc/benchmarks/flavors/article_light_ref/benchmaking-pmc.md
new file mode 100644
index 0000000000..58d81a0d2a
--- /dev/null
+++ b/doc/benchmarks/flavors/article_light_ref/benchmaking-pmc.md
@@ -0,0 +1,202 @@
+## Header metadata
+
+Evaluation on 1943 random PDF files out of 1941 PDF (ratio 1.0).
+
+#### Strict Matching (exact matches)
+
+**Field-level results**
+
+| label | precision | recall | f1 | support |
+|-----------------------------|-----------|-----------|-----------|---------|
+| authors | 92.45 | 92.07 | 92.26 | 1941 |
+| first_author | 96.38 | 95.98 | 96.18 | 1941 |
+| title | 84.27 | 83.27 | 83.77 | 1943 |
+| | | | | |
+| **all fields (micro avg.)** | **91.05** | **90.44** | **90.74** | 5825 |
+| all fields (macro avg.) | 91.03 | 90.44 | 90.73 | 5825 |
+
+#### Soft Matching (ignoring punctuation, case and space characters mismatches)
+
+**Field-level results**
+
+| label | precision | recall | f1 | support |
+|-----------------------------|-----------|-----------|-----------|---------|
+| authors | 94.36 | 93.97 | 94.17 | 1941 |
+| first_author | 96.74 | 96.34 | 96.54 | 1941 |
+| title | 92.03 | 90.94 | 91.48 | 1943 |
+| | | | | |
+| **all fields (micro avg.)** | **94.38** | **93.75** | **94.07** | 5825 |
+| all fields (macro avg.) | 94.38 | 93.75 | 94.06 | 5825 |
+
+#### Levenshtein Matching (Minimum Levenshtein distance at 0.8)
+
+**Field-level results**
+
+| label | precision | recall | f1 | support |
+|-----------------------------|-----------|-----------|-----------|---------|
+| authors | 96.59 | 96.19 | 96.39 | 1941 |
+| first_author | 97.05 | 96.65 | 96.85 | 1941 |
+| title | 98.18 | 97.01 | 97.59 | 1943 |
+| | | | | |
+| **all fields (micro avg.)** | **97.27** | **96.62** | **96.94** | 5825 |
+| all fields (macro avg.) | 97.27 | 96.62 | 96.94 | 5825 |
+
+#### Ratcliff/Obershelp Matching (Minimum Ratcliff/Obershelp similarity at 0.95)
+
+**Field-level results**
+
+| label | precision | recall | f1 | support |
+|-----------------------------|-----------|-----------|-----------|---------|
+| authors | 95.6 | 95.21 | 95.41 | 1941 |
+| first_author | 96.38 | 95.98 | 96.18 | 1941 |
+| title | 96.2 | 95.06 | 95.63 | 1943 |
+| | | | | |
+| **all fields (micro avg.)** | **96.06** | **95.42** | **95.74** | 5825 |
+| all fields (macro avg.) | 96.06 | 95.42 | 95.74 | 5825 |
+
+#### Instance-level results
+
+```
+Total expected instances: 1943
+Total correct instances: 1507 (strict)
+Total correct instances: 1672 (soft)
+Total correct instances: 1820 (Levenshtein)
+Total correct instances: 1764 (ObservedRatcliffObershelp)
+
+Instance-level recall: 77.56 (strict)
+Instance-level recall: 86.05 (soft)
+Instance-level recall: 93.67 (Levenshtein)
+Instance-level recall: 90.79 (RatcliffObershelp)
+```
+
+## Citation metadata
+
+Evaluation on 1943 random PDF files out of 1941 PDF (ratio 1.0).
+
+#### Strict Matching (exact matches)
+
+**Field-level results**
+
+| label | precision | recall | f1 | support |
+|-----------------------------|-----------|-----------|-----------|---------|
+| authors | 82.75 | 75.91 | 79.18 | 85778 |
+| date | 93.93 | 83.72 | 88.53 | 87067 |
+| first_author | 89.45 | 82.02 | 85.57 | 85778 |
+| inTitle | 72.53 | 71.31 | 71.91 | 81007 |
+| issue | 87.61 | 87.48 | 87.54 | 16635 |
+| page | 93.43 | 82.97 | 87.89 | 80501 |
+| title | 79.19 | 74.82 | 76.94 | 80736 |
+| volume | 95.14 | 89.25 | 92.1 | 80067 |
+| | | | | |
+| **all fields (micro avg.)** | **86.51** | **80.21** | **83.24** | 597569 |
+| all fields (macro avg.) | 86.75 | 80.93 | 83.71 | 597569 |
+
+#### Soft Matching (ignoring punctuation, case and space characters mismatches)
+
+**Field-level results**
+
+| label | precision | recall | f1 | support |
+|-----------------------------|-----------|-----------|-----------|---------|
+| authors | 83.22 | 76.34 | 79.63 | 85778 |
+| date | 93.93 | 83.72 | 88.53 | 87067 |
+| first_author | 89.62 | 82.18 | 85.74 | 85778 |
+| inTitle | 84.16 | 82.74 | 83.45 | 81007 |
+| issue | 87.61 | 87.48 | 87.54 | 16635 |
+| page | 93.43 | 82.97 | 87.89 | 80501 |
+| title | 90.89 | 85.86 | 88.3 | 80736 |
+| volume | 95.14 | 89.25 | 92.1 | 80067 |
+| | | | | |
+| **all fields (micro avg.)** | **89.88** | **83.34** | **86.49** | 597569 |
+| all fields (macro avg.) | 89.75 | 83.82 | 86.65 | 597569 |
+
+#### Levenshtein Matching (Minimum Levenshtein distance at 0.8)
+
+**Field-level results**
+
+| label | precision | recall | f1 | support |
+|-----------------------------|-----------|-----------|-----------|---------|
+| authors | 88.86 | 81.51 | 85.03 | 85778 |
+| date | 93.93 | 83.72 | 88.53 | 87067 |
+| first_author | 89.81 | 82.35 | 85.92 | 85778 |
+| inTitle | 85.4 | 83.96 | 84.68 | 81007 |
+| issue | 87.61 | 87.48 | 87.54 | 16635 |
+| page | 93.43 | 82.97 | 87.89 | 80501 |
+| title | 93.21 | 88.06 | 90.56 | 80736 |
+| volume | 95.14 | 89.25 | 92.1 | 80067 |
+| | | | | |
+| **all fields (micro avg.)** | **91.21** | **84.57** | **87.77** | 597569 |
+| all fields (macro avg.) | 90.92 | 84.91 | 87.78 | 597569 |
+
+#### Ratcliff/Obershelp Matching (Minimum Ratcliff/Obershelp similarity at 0.95)
+
+**Field-level results**
+
+| label | precision | recall | f1 | support |
+|-----------------------------|-----------|----------|-----------|---------|
+| authors | 85.68 | 78.59 | 81.98 | 85778 |
+| date | 93.93 | 83.72 | 88.53 | 87067 |
+| first_author | 89.46 | 82.04 | 85.59 | 85778 |
+| inTitle | 82.76 | 81.36 | 82.05 | 81007 |
+| issue | 87.61 | 87.48 | 87.54 | 16635 |
+| page | 93.43 | 82.97 | 87.89 | 80501 |
+| title | 92.81 | 87.68 | 90.17 | 80736 |
+| volume | 95.14 | 89.25 | 92.1 | 80067 |
+| | | | | |
+| **all fields (micro avg.)** | **90.27** | **83.7** | **86.86** | 597569 |
+| all fields (macro avg.) | 90.1 | 84.14 | 86.98 | 597569 |
+
+#### Instance-level results
+
+```
+Total expected instances: 90125
+Total extracted instances: 86410
+Total correct instances: 38449 (strict)
+Total correct instances: 50473 (soft)
+Total correct instances: 55286 (Levenshtein)
+Total correct instances: 51882 (RatcliffObershelp)
+
+Instance-level precision: 44.5 (strict)
+Instance-level precision: 58.41 (soft)
+Instance-level precision: 63.98 (Levenshtein)
+Instance-level precision: 60.04 (RatcliffObershelp)
+
+Instance-level recall: 42.66 (strict)
+Instance-level recall: 56 (soft)
+Instance-level recall: 61.34 (Levenshtein)
+Instance-level recall: 57.57 (RatcliffObershelp)
+
+Instance-level f-score: 43.56 (strict)
+Instance-level f-score: 57.18 (soft)
+Instance-level f-score: 62.63 (Levenshtein)
+Instance-level f-score: 58.78 (RatcliffObershelp)
+
+Matching 1 : 67871
+
+Matching 2 : 4150
+
+Matching 3 : 1863
+
+Matching 4 : 672
+
+Total matches : 74556
+```
+
+#### Citation context resolution
+
+```
+
+Total expected references: 90125 - 46.38 references per article
+Total predicted references: 86410 - 44.47 references per article
+
+Total expected citation contexts: 139835 - 71.97 citation contexts per article
+Total predicted citation contexts: 0 - 0 citation contexts per article
+
+Total correct predicted citation contexts: 0 - 0 citation contexts per article
+Total wrong predicted citation contexts: 0 (wrong callout matching, callout missing in NLM, or matching with a bib. ref. not aligned with a bib.ref. in NLM)
+
+Precision citation contexts: NaN
+Recall citation contexts: 0
+fscore citation contexts: NaN
+```
+
+Evaluation metrics produced in 1474.976 seconds