Add DL23 passage and DL22/23 doc bindings + BM25 regressions (#2403)
lintool authored Mar 4, 2024
1 parent ed443a8 commit a67cabe
Showing 107 changed files with 2,018 additions and 295 deletions.
48 changes: 24 additions & 24 deletions README.md
@@ -379,19 +379,19 @@ See individual pages for details.

### MS MARCO V2 Passage Regressions

| | dev | DL21 | DL22 |
|--------------------------------------------|:---------------------------------------------------------------------------:|:---------------------------------------------------------------------:|:---------------------------------------------------------------------:|
| **Unsupervised Lexical, Original Corpus** |
| baselines | [+](docs/regressions/regressions-msmarco-v2-passage.md) | [+](docs/regressions/regressions-dl21-passage.md) | [+](docs/regressions/regressions-dl22-passage.md) |
| doc2query-T5 | [+](docs/regressions/regressions-msmarco-v2-passage-d2q-t5.md) | [+](docs/regressions/regressions-dl21-passage-d2q-t5.md) | [+](docs/regressions/regressions-dl22-passage-d2q-t5.md) |
| **Unsupervised Lexical, Augmented Corpus** |
| baselines | [+](docs/regressions/regressions-msmarco-v2-passage-augmented.md) | [+](docs/regressions/regressions-dl21-passage-augmented.md) | [+](docs/regressions/regressions-dl22-passage-augmented.md) |
| doc2query-T5 | [+](docs/regressions/regressions-msmarco-v2-passage-augmented-d2q-t5.md) | [+](docs/regressions/regressions-dl21-passage-augmented-d2q-t5.md) | [+](docs/regressions/regressions-dl22-passage-augmented-d2q-t5.md) |
| **Learned Sparse Lexical** |
| uniCOIL noexp zero-shot | [](docs/regressions/regressions-msmarco-v2-passage-unicoil-noexp-0shot.md) | [](docs/regressions/regressions-dl21-passage-unicoil-noexp-0shot.md) | [](docs/regressions/regressions-dl22-passage-unicoil-noexp-0shot.md) |
| uniCOIL with doc2query-T5 zero-shot | [](docs/regressions/regressions-msmarco-v2-passage-unicoil-0shot.md) | [](docs/regressions/regressions-dl21-passage-unicoil-0shot.md) | [](docs/regressions/regressions-dl22-passage-unicoil-0shot.md) |
| SPLADE++ CoCondenser-EnsembleDistil | [](docs/regressions/regressions-msmarco-v2-passage-splade-pp-ed.md) | [](docs/regressions/regressions-dl21-passage-splade-pp-ed.md) | [](docs/regressions/regressions-dl22-passage-splade-pp-ed.md) |
| SPLADE++ CoCondenser-SelfDistil | [](docs/regressions/regressions-msmarco-v2-passage-splade-pp-sd.md) | [](docs/regressions/regressions-dl21-passage-splade-pp-sd.md) | [](docs/regressions/regressions-dl22-passage-splade-pp-sd.md) |
| | dev | DL21 | DL22 | DL23 |
|--------------------------------------------|:---------------------------------------------------------------------------:|:---------------------------------------------------------------------:|:---------------------------------------------------------------------:|:-----------------------------------------------------------:|
| **Unsupervised Lexical, Original Corpus** | | | | |
| baselines | [+](docs/regressions/regressions-msmarco-v2-passage.md) | [+](docs/regressions/regressions-dl21-passage.md) | [+](docs/regressions/regressions-dl22-passage.md) | [+](docs/regressions/regressions-dl23-passage.md) |
| doc2query-T5 | [+](docs/regressions/regressions-msmarco-v2-passage-d2q-t5.md) | [+](docs/regressions/regressions-dl21-passage-d2q-t5.md) | [+](docs/regressions/regressions-dl22-passage-d2q-t5.md) | |
| **Unsupervised Lexical, Augmented Corpus** | | | | |
| baselines | [+](docs/regressions/regressions-msmarco-v2-passage-augmented.md) | [+](docs/regressions/regressions-dl21-passage-augmented.md) | [+](docs/regressions/regressions-dl22-passage-augmented.md) | [+](docs/regressions/regressions-dl23-passage-augmented.md) |
| doc2query-T5 | [+](docs/regressions/regressions-msmarco-v2-passage-augmented-d2q-t5.md) | [+](docs/regressions/regressions-dl21-passage-augmented-d2q-t5.md) | [+](docs/regressions/regressions-dl22-passage-augmented-d2q-t5.md) | |
| **Learned Sparse Lexical** | | | | |
| uniCOIL noexp zero-shot | [](docs/regressions/regressions-msmarco-v2-passage-unicoil-noexp-0shot.md) | [](docs/regressions/regressions-dl21-passage-unicoil-noexp-0shot.md) | [](docs/regressions/regressions-dl22-passage-unicoil-noexp-0shot.md) | |
| uniCOIL with doc2query-T5 zero-shot | [](docs/regressions/regressions-msmarco-v2-passage-unicoil-0shot.md) | [](docs/regressions/regressions-dl21-passage-unicoil-0shot.md) | [](docs/regressions/regressions-dl22-passage-unicoil-0shot.md) | |
| SPLADE++ CoCondenser-EnsembleDistil | [](docs/regressions/regressions-msmarco-v2-passage-splade-pp-ed.md) | [](docs/regressions/regressions-dl21-passage-splade-pp-ed.md) | [](docs/regressions/regressions-dl22-passage-splade-pp-ed.md) | |
| SPLADE++ CoCondenser-SelfDistil | [](docs/regressions/regressions-msmarco-v2-passage-splade-pp-sd.md) | [](docs/regressions/regressions-dl21-passage-splade-pp-sd.md) | [](docs/regressions/regressions-dl22-passage-splade-pp-sd.md) | |

### Available Corpora for Download

@@ -408,17 +408,17 @@ See individual pages for details.

### MS MARCO V2 Document Regressions

| | dev | DL21 |
|-----------------------------------------|:------------------------------------------------------------------------------------:|:------------------------------------------------------------------------------:|
| **Unsupervised Lexical, Complete Doc** |
| baselines | [+](docs/regressions/regressions-msmarco-v2-doc.md) | [+](docs/regressions/regressions-dl21-doc.md) |
| doc2query-T5 | [+](docs/regressions/regressions-msmarco-v2-doc-d2q-t5.md) | [+](docs/regressions/regressions-dl21-doc-d2q-t5.md) |
| **Unsupervised Lexical, Segmented Doc** |
| baselines | [+](docs/regressions/regressions-msmarco-v2-doc-segmented.md) | [+](docs/regressions/regressions-dl21-doc-segmented.md) |
| doc2query-T5 | [+](docs/regressions/regressions-msmarco-v2-doc-segmented-d2q-t5.md) | [+](docs/regressions/regressions-dl21-doc-segmented-d2q-t5.md) |
| **Learned Sparse Lexical** |
| uniCOIL noexp zero-shot | [](docs/regressions/regressions-msmarco-v2-doc-segmented-unicoil-noexp-0shot-v2.md) | [](docs/regressions/regressions-dl21-doc-segmented-unicoil-noexp-0shot-v2.md) |
| uniCOIL with doc2query-T5 zero-shot | [](docs/regressions/regressions-msmarco-v2-doc-segmented-unicoil-0shot-v2.md) | [](docs/regressions/regressions-dl21-doc-segmented-unicoil-0shot-v2.md) |
| | dev | DL21 | DL22 | DL23 |
|-----------------------------------------|:------------------------------------------------------------------------------------:|:------------------------------------------------------------------------------:|:-------------------------------------------------------:|:-------------------------------------------------------:|
| **Unsupervised Lexical, Complete Doc** | | | | |
| baselines | [+](docs/regressions/regressions-msmarco-v2-doc.md) | [+](docs/regressions/regressions-dl21-doc.md) | [+](docs/regressions/regressions-dl22-doc.md) | [+](docs/regressions/regressions-dl23-doc.md) |
| doc2query-T5 | [+](docs/regressions/regressions-msmarco-v2-doc-d2q-t5.md) | [+](docs/regressions/regressions-dl21-doc-d2q-t5.md) | | |
| **Unsupervised Lexical, Segmented Doc** | | | | |
| baselines | [+](docs/regressions/regressions-msmarco-v2-doc-segmented.md) | [+](docs/regressions/regressions-dl21-doc-segmented.md) | [+](docs/regressions/regressions-dl22-doc-segmented.md) | [+](docs/regressions/regressions-dl23-doc-segmented.md) |
| doc2query-T5 | [+](docs/regressions/regressions-msmarco-v2-doc-segmented-d2q-t5.md) | [+](docs/regressions/regressions-dl21-doc-segmented-d2q-t5.md) | | |
| **Learned Sparse Lexical** | | | | |
| uniCOIL noexp zero-shot | [](docs/regressions/regressions-msmarco-v2-doc-segmented-unicoil-noexp-0shot-v2.md) | [](docs/regressions/regressions-dl21-doc-segmented-unicoil-noexp-0shot-v2.md) | | |
| uniCOIL with doc2query-T5 zero-shot | [](docs/regressions/regressions-msmarco-v2-doc-segmented-unicoil-0shot-v2.md) | [](docs/regressions/regressions-dl21-doc-segmented-unicoil-0shot-v2.md) | | |

### Available Corpora for Download

9 changes: 9 additions & 0 deletions docs/regressions.md
@@ -207,6 +207,15 @@ nohup python src/main/python/run_regression.py --index --verify --search --regre
nohup python src/main/python/run_regression.py --index --verify --search --regression dl22-passage-unicoil-0shot >& logs/log.dl22-passage-unicoil-0shot &
nohup python src/main/python/run_regression.py --index --verify --search --regression dl22-passage-splade-pp-ed >& logs/log.dl22-passage-splade-pp-ed &
nohup python src/main/python/run_regression.py --index --verify --search --regression dl22-passage-splade-pp-sd >& logs/log.dl22-passage-splade-pp-sd &

nohup python src/main/python/run_regression.py --index --verify --search --regression dl22-doc >& logs/log.dl22-doc &
nohup python src/main/python/run_regression.py --index --verify --search --regression dl22-doc-segmented >& logs/log.dl22-doc-segmented &

nohup python src/main/python/run_regression.py --index --verify --search --regression dl23-passage >& logs/log.dl23-passage &
nohup python src/main/python/run_regression.py --index --verify --search --regression dl23-passage-augmented >& logs/log.dl23-passage-augmented &

nohup python src/main/python/run_regression.py --index --verify --search --regression dl23-doc >& logs/log.dl23-doc &
nohup python src/main/python/run_regression.py --index --verify --search --regression dl23-doc-segmented >& logs/log.dl23-doc-segmented &
```

</details>
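
The six commands added in this hunk all follow one template, differing only in the regression name. As a minimal sketch (not part of the commit), the Python snippet below regenerates those command lines from a list of names; the list `NEW_REGRESSIONS` and the helper `regression_command` are hypothetical names introduced here for illustration, and the script only prints the commands rather than running them.

```python
# Regression names added by this commit, copied from the hunk above.
NEW_REGRESSIONS = [
    "dl22-doc",
    "dl22-doc-segmented",
    "dl23-passage",
    "dl23-passage-augmented",
    "dl23-doc",
    "dl23-doc-segmented",
]


def regression_command(name: str) -> str:
    """Build the command line shown in docs/regressions.md for one regression.

    Hypothetical helper: it only formats a string and does not invoke anything.
    """
    return (
        "nohup python src/main/python/run_regression.py "
        f"--index --verify --search --regression {name} "
        f">& logs/log.{name} &"
    )


if __name__ == "__main__":
    # Print one command per regression, ready to paste into a shell.
    for name in NEW_REGRESSIONS:
        print(regression_command(name))
```

Printing rather than executing keeps the sketch free of side effects; each emitted line assumes a `logs/` directory already exists, as the commands in the documentation do.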
9 changes: 4 additions & 5 deletions docs/regressions/regressions-dl21-doc-d2q-t5.md
@@ -2,14 +2,14 @@

**Models**: BM25 on complete documents with doc2query-T5 expansions

This page describes experiments, integrated into Anserini's regression testing framework, on the [TREC 2021 Deep Learning Track document ranking task](https://trec.nist.gov/data/deep2021.html) using the MS MARCO V2 document collection (with doc2query-T5 expansions).
This page describes experiments, integrated into Anserini's regression testing framework, on the [TREC 2021 Deep Learning Track document ranking task](https://trec.nist.gov/data/deep2021.html) using the MS MARCO V2 document corpus (with doc2query-T5 expansions).
For additional instructions on working with the MS MARCO V2 document corpus, refer to [this page](../../docs/experiments-msmarco-v2.md).

Note that the NIST relevance judgments provide far more relevant documents per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast).
For additional instructions on working with MS MARCO V2 document collection, refer to [this page](../../docs/experiments-msmarco-v2.md).

Note that there are four different bag-of-words regression conditions for this task, and this page describes the following:

+ **Indexing Condition:** each document in the MS MARCO V2 document collection is treated as a unit of indexing
+ **Indexing Condition:** each document in the MS MARCO V2 document corpus is treated as a unit of indexing
+ **Expansion Condition:** doc2query-T5

The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/dl21-doc-d2q-t5.yaml).
@@ -43,8 +43,7 @@ For additional details, see explanation of [common indexing options](../../docs/
## Retrieval

Topics and qrels are stored [here](https://github.com/castorini/anserini-tools/tree/master/topics-and-qrels), which is linked to the Anserini repo as a submodule.
The regression experiments here evaluate on the 57 topics for which NIST has provided judgments as part of the TREC 2021 Deep Learning Track.
The original data can be found [here](https://trec.nist.gov/data/deep2021.html).
The regression experiments here evaluate on the 57 topics for which NIST has provided judgments as part of the [TREC 2021 Deep Learning Track](https://trec.nist.gov/data/deep2021.html).

After indexing has completed, you should be able to perform retrieval as follows:

9 changes: 4 additions & 5 deletions docs/regressions/regressions-dl21-doc-segmented-d2q-t5.md
@@ -2,14 +2,14 @@

**Models**: BM25 on segmented documents with doc2query-T5 expansions

This page describes experiments, integrated into Anserini's regression testing framework, on the [TREC 2021 Deep Learning Track document ranking task](https://trec.nist.gov/data/deep2021.html) using the MS MARCO V2 _segmented_ document collection (with doc2query-T5 expansions).
This page describes experiments, integrated into Anserini's regression testing framework, on the [TREC 2021 Deep Learning Track document ranking task](https://trec.nist.gov/data/deep2021.html) using the MS MARCO V2 _segmented_ document corpus (with doc2query-T5 expansions).
For additional instructions on working with the MS MARCO V2 document corpus, refer to [this page](../../docs/experiments-msmarco-v2.md).

Note that the NIST relevance judgments provide far more relevant documents per topic, unlike the "sparse" judgments provided by Microsoft (these are sometimes called "dense" judgments to emphasize this contrast).
For additional instructions on working with MS MARCO V2 document collection, refer to [this page](../../docs/experiments-msmarco-v2.md).

Note that there are four different bag-of-words regression conditions for this task, and this page describes the following:

+ **Indexing Condition:** each segment in the MS MARCO V2 _segmented_ document collection is treated as a unit of indexing
+ **Indexing Condition:** each segment in the MS MARCO V2 _segmented_ document corpus is treated as a unit of indexing
+ **Expansion Condition:** doc2query-T5

The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/dl21-doc-segmented-d2q-t5.yaml).
@@ -43,8 +43,7 @@ For additional details, see explanation of [common indexing options](../../docs/
## Retrieval

Topics and qrels are stored [here](https://github.com/castorini/anserini-tools/tree/master/topics-and-qrels), which is linked to the Anserini repo as a submodule.
The regression experiments here evaluate on the 57 topics for which NIST has provided judgments as part of the TREC 2021 Deep Learning Track.
The original data can be found [here](https://trec.nist.gov/data/deep2021.html).
The regression experiments here evaluate on the 57 topics for which NIST has provided judgments as part of the [TREC 2021 Deep Learning Track](https://trec.nist.gov/data/deep2021.html).

After indexing has completed, you should be able to perform retrieval as follows:
