Add wiki-all-6-3 BM25 regressions (#2067)

Add wiki-all-6-3 BM25 QA regressions
castorini · Mar 4, 2023 · 4b7662c
1 parent 1609686

Showing 3 changed files with 301 additions and 0 deletions.
diff --git a/docs/regressions-wiki-all-6-3-tamber-bm25.md b/docs/regressions-wiki-all-6-3-tamber-bm25.md
@@ -0,0 +1,159 @@
+# Anserini Regressions: QA with wiki-all-6-3-tamber Corpus
+
+**Models**: BM25
+
+This page documents QA regression experiments on the wiki-all-6-3-tamber corpus, which is integrated into Anserini's regression testing framework.
+
+The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/wiki-all-6-3-tamber-bm25.yaml).
+Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/wiki-all-6-3-tamber-bm25.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead.
+
+From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end:
+
+```bash
+python src/main/python/run_regression.py --index --verify --search --regression wiki-all-6-3-tamber-bm25
+```
+
+## Indexing
+
+Typical indexing command:
+
+```bash
+target/appassembler/bin/IndexCollection \
+  -collection MrTyDiCollection \
+  -input /path/to/wiki-all-6-3-tamber \
+  -index indexes/lucene-index.wiki-all-6-3-tamber/ \
+  -generator DefaultLuceneDocumentGenerator \
+  -threads 20 -storeRaw \
+  >& logs/log.wiki-all-6-3-tamber &
+```
+
+The directory `/path/to/wiki-all-6-3-tamber/` should contain the wiki-all-6-3-tamber passage collection, which can be downloaded from [here](https://huggingface.co/datasets/castorini/odqa-wiki-corpora).
+
+For additional details, see explanation of [common indexing options](common-indexing-options.md).
+
+## Retrieval
+
+Topics are stored in [`src/main/resources/topics-and-qrels/`](../src/main/resources/topics-and-qrels/).
+The regression experiments here evaluate on the test sets of several QA datasets: Natural Questions, TriviaQA, SQuAD, WebQuestions, and CuratedTREC (as distributed with DPR), plus the EfficientQA version of Natural Questions.
+
+After indexing has completed, you should be able to perform retrieval as follows:
+
+```bash
+target/appassembler/bin/SearchCollection \
+  -index indexes/lucene-index.wiki-all-6-3-tamber/ \
+  -topics src/main/resources/topics-and-qrels/topics.dpr.nq.test.txt \
+  -topicreader DprNq \
+  -output runs/run.wiki-all-6-3-tamber.bm25.topics.dpr.nq.test.txt \
+  -bm25 &
+target/appassembler/bin/SearchCollection \
+  -index indexes/lucene-index.wiki-all-6-3-tamber/ \
+  -topics src/main/resources/topics-and-qrels/topics.dpr.trivia.test.txt \
+  -topicreader DprNq \
+  -output runs/run.wiki-all-6-3-tamber.bm25.topics.dpr.trivia.test.txt \
+  -bm25 &
+target/appassembler/bin/SearchCollection \
+  -index indexes/lucene-index.wiki-all-6-3-tamber/ \
+  -topics src/main/resources/topics-and-qrels/topics.dpr.squad.test.txt \
+  -topicreader DprJsonl \
+  -output runs/run.wiki-all-6-3-tamber.bm25.topics.dpr.squad.test.txt \
+  -bm25 &
+target/appassembler/bin/SearchCollection \
+  -index indexes/lucene-index.wiki-all-6-3-tamber/ \
+  -topics src/main/resources/topics-and-qrels/topics.dpr.wq.test.txt \
+  -topicreader DprJsonl \
+  -output runs/run.wiki-all-6-3-tamber.bm25.topics.dpr.wq.test.txt \
+  -bm25 &
+target/appassembler/bin/SearchCollection \
+  -index indexes/lucene-index.wiki-all-6-3-tamber/ \
+  -topics src/main/resources/topics-and-qrels/topics.dpr.curated.test.txt \
+  -topicreader DprJsonl \
+  -output runs/run.wiki-all-6-3-tamber.bm25.topics.dpr.curated.test.txt \
+  -bm25 &
+target/appassembler/bin/SearchCollection \
+  -index indexes/lucene-index.wiki-all-6-3-tamber/ \
+  -topics src/main/resources/topics-and-qrels/topics.nq.test.txt \
+  -topicreader DprNq \
+  -output runs/run.wiki-all-6-3-tamber.bm25.topics.nq.test.txt \
+  -bm25 &
+```
+
+Before evaluation, the TREC-format runs must be converted into DPR's JSON retrieval format:
+```bash
+python -m pyserini.eval.convert_trec_run_to_dpr_retrieval_run \
+  --index indexes/lucene-index.wiki-all-6-3-tamber/ \
+  --topics dpr-nq-test \
+  --input runs/run.wiki-all-6-3-tamber.bm25.topics.dpr.nq.test.txt \
+  --output runs/run.wiki-all-6-3-tamber.bm25.topics.dpr.nq.test.txt.json \
+  --combine-title-text &
+python -m pyserini.eval.convert_trec_run_to_dpr_retrieval_run \
+  --index indexes/lucene-index.wiki-all-6-3-tamber/ \
+  --topics dpr-trivia-test \
+  --input runs/run.wiki-all-6-3-tamber.bm25.topics.dpr.trivia.test.txt \
+  --output runs/run.wiki-all-6-3-tamber.bm25.topics.dpr.trivia.test.txt.json \
+  --combine-title-text &
+python -m pyserini.eval.convert_trec_run_to_dpr_retrieval_run \
+  --index indexes/lucene-index.wiki-all-6-3-tamber/ \
+  --topics dpr-squad-test \
+  --input runs/run.wiki-all-6-3-tamber.bm25.topics.dpr.squad.test.txt \
+  --output runs/run.wiki-all-6-3-tamber.bm25.topics.dpr.squad.test.txt.json \
+  --combine-title-text &
+python -m pyserini.eval.convert_trec_run_to_dpr_retrieval_run \
+  --index indexes/lucene-index.wiki-all-6-3-tamber/ \
+  --topics dpr-wq-test \
+  --input runs/run.wiki-all-6-3-tamber.bm25.topics.dpr.wq.test.txt \
+  --output runs/run.wiki-all-6-3-tamber.bm25.topics.dpr.wq.test.txt.json \
+  --combine-title-text &
+python -m pyserini.eval.convert_trec_run_to_dpr_retrieval_run \
+  --index indexes/lucene-index.wiki-all-6-3-tamber/ \
+  --topics dpr-curated-test \
+  --input runs/run.wiki-all-6-3-tamber.bm25.topics.dpr.curated.test.txt \
+  --output runs/run.wiki-all-6-3-tamber.bm25.topics.dpr.curated.test.txt.json \
+  --combine-title-text  --regex &
+python -m pyserini.eval.convert_trec_run_to_dpr_retrieval_run \
+  --index indexes/lucene-index.wiki-all-6-3-tamber/ \
+  --topics nq-test \
+  --input runs/run.wiki-all-6-3-tamber.bm25.topics.nq.test.txt \
+  --output runs/run.wiki-all-6-3-tamber.bm25.topics.nq.test.txt.json \
+  --combine-title-text &
+```
+
+Evaluation can then be performed with Pyserini's scripts:
+
+```bash
+python -m pyserini.eval.evaluate_dpr_retrieval --topk 20 --retrieval runs/run.wiki-all-6-3-tamber.bm25.topics.dpr.nq.test.txt.json
+python -m pyserini.eval.evaluate_dpr_retrieval --topk 100 --retrieval runs/run.wiki-all-6-3-tamber.bm25.topics.dpr.nq.test.txt.json
+python -m pyserini.eval.evaluate_dpr_retrieval --topk 20 --retrieval runs/run.wiki-all-6-3-tamber.bm25.topics.dpr.trivia.test.txt.json
+python -m pyserini.eval.evaluate_dpr_retrieval --topk 100 --retrieval runs/run.wiki-all-6-3-tamber.bm25.topics.dpr.trivia.test.txt.json
+python -m pyserini.eval.evaluate_dpr_retrieval --topk 20 --retrieval runs/run.wiki-all-6-3-tamber.bm25.topics.dpr.squad.test.txt.json
+python -m pyserini.eval.evaluate_dpr_retrieval --topk 100 --retrieval runs/run.wiki-all-6-3-tamber.bm25.topics.dpr.squad.test.txt.json
+python -m pyserini.eval.evaluate_dpr_retrieval --topk 20 --retrieval runs/run.wiki-all-6-3-tamber.bm25.topics.dpr.wq.test.txt.json
+python -m pyserini.eval.evaluate_dpr_retrieval --topk 100 --retrieval runs/run.wiki-all-6-3-tamber.bm25.topics.dpr.wq.test.txt.json
+python -m pyserini.eval.evaluate_dpr_retrieval --topk 20 --retrieval runs/run.wiki-all-6-3-tamber.bm25.topics.dpr.curated.test.txt.json
+python -m pyserini.eval.evaluate_dpr_retrieval --topk 100 --retrieval runs/run.wiki-all-6-3-tamber.bm25.topics.dpr.curated.test.txt.json
+python -m pyserini.eval.evaluate_dpr_retrieval --topk 20 --retrieval runs/run.wiki-all-6-3-tamber.bm25.topics.nq.test.txt.json
+python -m pyserini.eval.evaluate_dpr_retrieval --topk 100 --retrieval runs/run.wiki-all-6-3-tamber.bm25.topics.nq.test.txt.json
+```
+
+## Effectiveness
+
+With the above commands, you should be able to reproduce the following results:
+
+| **top_20_accuracy**                                                                                          | **BM25 (default parameters)**|
+|:-------------------------------------------------------------------------------------------------------------|-----------|
+| [DPR: Natural Questions Test](https://github.com/facebookresearch/DPR)                                       | 0.6604    |
+| [DPR: TriviaQA Test](https://github.com/facebookresearch/DPR)                                                | 0.7832    |
+| [DPR: SQuAD Test](https://github.com/facebookresearch/DPR)                                                   | 0.7265    |
+| [DPR: WebQuestions Test](https://github.com/facebookresearch/DPR)                                            | 0.6403    |
+| [DPR: CuratedTREC Test](https://github.com/facebookresearch/DPR)                                             | 0.8055    |
+| [EfficientQA: Natural Questions Test](https://efficientqa.github.io/)                                        | 0.6665    |
+| **top_100_accuracy**                                                                                         | **BM25 (default parameters)**|
+| [DPR: Natural Questions Test](https://github.com/facebookresearch/DPR)                                       | 0.8083    |
+| [DPR: TriviaQA Test](https://github.com/facebookresearch/DPR)                                                | 0.8482    |
+| [DPR: SQuAD Test](https://github.com/facebookresearch/DPR)                                                   | 0.8325    |
+| [DPR: WebQuestions Test](https://github.com/facebookresearch/DPR)                                            | 0.7874    |
+| [DPR: CuratedTREC Test](https://github.com/facebookresearch/DPR)                                             | 0.9135    |
+| [EfficientQA: Natural Questions Test](https://efficientqa.github.io/)                                        | 0.8166    |
+
+## Reproduction Log[*](reproducibility.md)
+
+To add to this reproduction log, modify [this template](../src/main/resources/docgen/templates/wiki-all-6-3-tamber-bm25.template) and run `bin/build.sh` to rebuild the documentation.
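The `top_20_accuracy` and `top_100_accuracy` figures in the table above follow the usual DPR-style definition. A question counts as answered at depth k if at least one of its top-k retrieved passages contains a gold answer. The following minimal sketch shows that computation. It is not Pyserini's `evaluate_dpr_retrieval` code. It assumes a hypothetical, simplified input in which each question's ranked passages have already been judged for answer containment.

```python
from typing import Dict, List, Sequence


def top_k_accuracy(hits: Dict[str, Sequence[bool]], ks: List[int]) -> Dict[int, float]:
    """Fraction of questions with at least one answer-bearing passage in the top k.

    `hits` maps a question id to a ranked list of booleans, where hits[q][i]
    says whether the passage at rank i+1 contains a gold answer
    (a hypothetical input format chosen for illustration).
    """
    if not hits:
        raise ValueError("no questions to evaluate")
    results = {}
    for k in ks:
        # A question is answered at depth k if any of its first k flags is True.
        answered = sum(1 for flags in hits.values() if any(flags[:k]))
        results[k] = answered / len(hits)
    return results


if __name__ == "__main__":
    # Three toy questions with five retrieved passages each.
    toy = {
        "q1": [False, True, False, False, False],   # answer first appears at rank 2
        "q2": [False, False, False, False, True],   # answer first appears at rank 5
        "q3": [False, False, False, False, False],  # no answer retrieved
    }
    for k, acc in top_k_accuracy(toy, [1, 2, 5]).items():
        print(f"top_{k}_accuracy: {acc:.4f}")
```

Accuracy at depth k can only grow as k increases. Each top-100 figure in the table should therefore be at least its top-20 counterpart, which gives a quick sanity check on reproduced numbers.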
diff --git a/src/main/resources/docgen/templates/wiki-all-6-3-tamber-bm25.template b/src/main/resources/docgen/templates/wiki-all-6-3-tamber-bm25.template
@@ -0,0 +1,58 @@
+# Anserini Regressions: QA with wiki-all-6-3-tamber Corpus
+
+**Models**: BM25
+
+This page documents QA regression experiments on the wiki-all-6-3-tamber corpus, which is integrated into Anserini's regression testing framework.
+
+The exact configurations for these regressions are stored in [this YAML file](${yaml}).
+Note that this page is automatically generated from [this template](${template}) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead.
+
+From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end:
+
+```bash
+python src/main/python/run_regression.py --index --verify --search --regression ${test_name}
+```
+
+## Indexing
+
+Typical indexing command:
+
+```bash
+${index_cmds}
+```
+
+The directory `/path/to/${corpus}/` should contain the wiki-all-6-3-tamber passage collection, which can be downloaded from [here](https://huggingface.co/datasets/castorini/odqa-wiki-corpora).
+
+For additional details, see explanation of [common indexing options](common-indexing-options.md).
+
+## Retrieval
+
+Topics are stored in [`src/main/resources/topics-and-qrels/`](../src/main/resources/topics-and-qrels/).
+The regression experiments here evaluate on the test sets of several QA datasets: Natural Questions, TriviaQA, SQuAD, WebQuestions, and CuratedTREC (as distributed with DPR), plus the EfficientQA version of Natural Questions.
+
+After indexing has completed, you should be able to perform retrieval as follows:
+
+```bash
+${ranking_cmds}
+```
+
+Before evaluation, the TREC-format runs must be converted into DPR's JSON retrieval format:
+```bash
+${converting_cmds}
+```
+
+Evaluation can then be performed with Pyserini's scripts:
+
+```bash
+${eval_cmds}
+```
+
+## Effectiveness
+
+With the above commands, you should be able to reproduce the following results:
+
+${effectiveness}
+
+## Reproduction Log[*](reproducibility.md)
+
+To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation.
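The template's `${...}` placeholders (`${yaml}`, `${template}`, `${test_name}`, `${corpus}`, `${index_cmds}`, and so on) are filled in by Anserini's documentation generator when `bin/build.sh` runs. The following sketch is illustrative only. Python's standard `string.Template` happens to use the same `${name}` syntax, so it can show how YAML fields map onto the generated indexing command. It is not Anserini's generator, and `build_index_cmd` is a hypothetical helper.

```python
from string import Template

# Subset of wiki-all-6-3-tamber-bm25.yaml, written as a plain dict to avoid a YAML dependency.
CONFIG = {
    "corpus": "wiki-all-6-3-tamber",
    "index_path": "indexes/lucene-index.wiki-all-6-3-tamber/",
    "collection_class": "MrTyDiCollection",
    "generator_class": "DefaultLuceneDocumentGenerator",
    "index_threads": 20,
    "index_options": "-storeRaw",
}


def build_index_cmd(cfg: dict) -> str:
    """Hypothetical helper: assemble an IndexCollection command shaped like the generated page's."""
    # Every line except the last ends with a shell line continuation (" \").
    return " \\\n".join([
        "target/appassembler/bin/IndexCollection",
        f"  -collection {cfg['collection_class']}",
        f"  -input /path/to/{cfg['corpus']}",
        f"  -index {cfg['index_path']}",
        f"  -generator {cfg['generator_class']}",
        f"  -threads {cfg['index_threads']} {cfg['index_options']}",
        f"  >& logs/log.{cfg['corpus']} &",
    ])


# A fragment of the template above, reusing its ${...} placeholders.
FRAGMENT = Template(
    "Typical indexing command:\n\n"
    "${index_cmds}\n\n"
    "The directory `/path/to/${corpus}/` should contain the passage collection.\n"
)

if __name__ == "__main__":
    # substitute() raises KeyError if a placeholder has no value, so gaps fail loudly.
    print(FRAGMENT.substitute(corpus=CONFIG["corpus"], index_cmds=build_index_cmd(CONFIG)))
```

Strict substitution is a deliberate choice here. A missing value stops the build instead of leaving a literal `${...}` in the published page.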
diff --git a/src/main/resources/regression/wiki-all-6-3-tamber-bm25.yaml b/src/main/resources/regression/wiki-all-6-3-tamber-bm25.yaml
@@ -0,0 +1,84 @@
+---
+corpus: wiki-all-6-3-tamber
+corpus_path: collections/wikipedia/wiki-all-6-3-tamber
+
+index_path: indexes/lucene-index.wiki-all-6-3-tamber/
+collection_class: MrTyDiCollection
+generator_class: DefaultLuceneDocumentGenerator
+index_threads: 20
+index_options: -storeRaw
+index_stats:
+  documents: 76680040
+  documents (non-empty): 76680037
+  total terms: 5064706668
+
+conversions:
+  - command: python -m pyserini.eval.convert_trec_run_to_dpr_retrieval_run
+    params: --combine-title-text
+    in_file_ext: ""
+    out_file_ext: .json
+
+metrics:
+  - metric: top_20_accuracy
+    command: python -m pyserini.eval.evaluate_dpr_retrieval
+    params:  --topk 20 --retrieval
+    separator: " "
+    parse_index: 1
+    metric_precision: 4
+    can_combine: false
+  - metric: top_100_accuracy
+    command: python -m pyserini.eval.evaluate_dpr_retrieval
+    params:  --topk 100 --retrieval
+    separator: " "
+    parse_index: 1
+    metric_precision: 4
+    can_combine: false
+
+topic_root: src/main/resources/topics-and-qrels/
+qrels_root:
+topics:
+  - name: "[DPR: Natural Questions Test](https://github.com/facebookresearch/DPR)"
+    id: dpr-nq-test
+    path: topics.dpr.nq.test.txt
+    topic_reader: DprNq
+  - name: "[DPR: TriviaQA Test](https://github.com/facebookresearch/DPR)"
+    id: dpr-trivia-test
+    path: topics.dpr.trivia.test.txt
+    topic_reader: DprNq
+  - name: "[DPR: SQuAD Test](https://github.com/facebookresearch/DPR)"
+    id: dpr-squad-test
+    path: topics.dpr.squad.test.txt
+    topic_reader: DprJsonl
+  - name: "[DPR: WebQuestions Test](https://github.com/facebookresearch/DPR)"
+    id: dpr-wq-test
+    path: topics.dpr.wq.test.txt
+    topic_reader: DprJsonl
+  - name: "[DPR: CuratedTREC Test](https://github.com/facebookresearch/DPR)"
+    id: dpr-curated-test
+    path: topics.dpr.curated.test.txt
+    topic_reader: DprJsonl
+    convert_params: --regex
+  - name: "[EfficientQA: Natural Questions Test](https://efficientqa.github.io/)"
+    id: nq-test
+    path: topics.nq.test.txt
+    topic_reader: DprNq
+
+models:
+  - name: bm25
+    display: BM25 (default parameters)
+    params: -bm25
+    results:
+      top_20_accuracy:
+        - 0.6604
+        - 0.7832
+        - 0.7265
+        - 0.6403
+        - 0.8055
+        - 0.6665
+      top_100_accuracy:
+        - 0.8083
+        - 0.8482
+        - 0.8325
+        - 0.7874
+        - 0.9135
+        - 0.8166
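In the `metrics` section, `separator: " "` and `parse_index: 1` describe how a score is read from a line of evaluator output. `metric_precision: 4` matches the four decimal places used in `results`. Each `results` list follows the order of the `topics` list; for example, 0.6604 lines up with DPR Natural Questions in the table. The following sketch illustrates that reading under stated assumptions. The sample output line (`Top20<TAB>accuracy: 0.6604`) is an assumed format for `evaluate_dpr_retrieval`. `parse_metric` and `check_results` are hypothetical helpers, not the code in `run_regression.py`.

```python
from typing import Dict, List

# Order of the `topics` entries in the YAML; the `results` lists follow the same order.
TOPIC_IDS = ["dpr-nq-test", "dpr-trivia-test", "dpr-squad-test",
             "dpr-wq-test", "dpr-curated-test", "nq-test"]

EXPECTED = {
    "top_20_accuracy": [0.6604, 0.7832, 0.7265, 0.6403, 0.8055, 0.6665],
    "top_100_accuracy": [0.8083, 0.8482, 0.8325, 0.7874, 0.9135, 0.8166],
}


def parse_metric(line: str, separator: str = " ", parse_index: int = 1) -> float:
    """Split one evaluator output line on `separator` and read field `parse_index` (hypothetical helper)."""
    return float(line.strip().split(separator)[parse_index])


def check_results(metric: str, observed: Dict[str, float], precision: int = 4) -> List[str]:
    """Return topic ids whose observed score, rounded to `precision` digits, differs from the expected one."""
    mismatches = []
    for topic_id, expected in zip(TOPIC_IDS, EXPECTED[metric]):
        if round(observed[topic_id], precision) != round(expected, precision):
            mismatches.append(topic_id)
    return mismatches


if __name__ == "__main__":
    # Assumed output format: "Top20\taccuracy:" contains no space, so field 1 is the score.
    score = parse_metric("Top20\taccuracy: 0.6604")
    observed = dict(zip(TOPIC_IDS, EXPECTED["top_20_accuracy"]))
    observed["dpr-nq-test"] = score
    print(score, check_results("top_20_accuracy", observed))
```

This sketch compares scores for exact equality after rounding. A real verification step might instead allow a small tolerance.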