From 8010d5c0b066f0316c6c506170274f8f7d558f73 Mon Sep 17 00:00:00 2001 From: Jimmy Lin Date: Tue, 28 Jun 2022 14:42:51 -0400 Subject: [PATCH] Add more Rocchio conditions for MS MARCO v1 and V2 (#1921) Additional changes: + Tweaks to experiments-msmarco-passage.md and experiments-msmarco-doc.md + Fixed (some) incorrect dates on when tuning was performed for MS MARCO v1/v2 doc/passage (and d2q-T5) + Added missing tuned2 conditions to dl19-doc + Added missing ax/bm25prf conditions to dl20-doc and msmarco-doc + Fixed bug in neg Rocchio condition on passage d2q (-rerankCutoff 1000) --- docs/experiments-msmarco-doc.md | 125 +++++++++++------- docs/experiments-msmarco-passage.md | 56 ++++---- docs/regressions-dl19-doc-docTTTTTquery.md | 44 ++++-- ...ssions-dl19-doc-segmented-docTTTTTquery.md | 44 ++++-- docs/regressions-dl19-doc-segmented.md | 2 +- docs/regressions-dl19-doc.md | 93 +++++++++++-- .../regressions-dl19-passage-docTTTTTquery.md | 90 +++++++++++-- docs/regressions-dl20-doc-docTTTTTquery.md | 44 ++++-- ...ssions-dl20-doc-segmented-docTTTTTquery.md | 44 ++++-- docs/regressions-dl20-doc-segmented.md | 2 +- docs/regressions-dl20-doc.md | 94 +++++++++++-- .../regressions-dl20-passage-docTTTTTquery.md | 90 +++++++++++-- docs/regressions-msmarco-doc-docTTTTTquery.md | 44 ++++-- ...ons-msmarco-doc-segmented-docTTTTTquery.md | 44 ++++-- docs/regressions-msmarco-doc-segmented.md | 2 +- docs/regressions-msmarco-doc.md | 94 +++++++++++-- ...gressions-msmarco-passage-docTTTTTquery.md | 14 +- docs/regressions-msmarco-passage.md | 2 +- .../templates/dl19-doc-docTTTTTquery.template | 2 +- .../dl19-doc-segmented-docTTTTTquery.template | 2 +- .../templates/dl19-doc-segmented.template | 2 +- .../docgen/templates/dl19-doc.template | 3 +- .../templates/dl20-doc-docTTTTTquery.template | 2 +- .../dl20-doc-segmented-docTTTTTquery.template | 2 +- .../templates/dl20-doc-segmented.template | 2 +- .../docgen/templates/dl20-doc.template | 4 +- .../msmarco-doc-docTTTTTquery.template | 2 +- ...marco-doc-segmented-docTTTTTquery.template | 2 +- .../templates/msmarco-doc-segmented.template | 2 +- .../docgen/templates/msmarco-doc.template | 4 +- .../docgen/templates/msmarco-passage.template | 2 +- .../regression/dl19-doc-docTTTTTquery.yaml | 24 ++++ .../dl19-doc-segmented-docTTTTTquery.yaml | 24 ++++ src/main/resources/regression/dl19-doc.yaml | 72 ++++++++++ .../dl19-passage-docTTTTTquery.yaml | 74 ++++++++++- .../regression/dl20-doc-docTTTTTquery.yaml | 24 ++++ .../dl20-doc-segmented-docTTTTTquery.yaml | 24 ++++ src/main/resources/regression/dl20-doc.yaml | 74 ++++++++++- .../dl20-passage-docTTTTTquery.yaml | 74 ++++++++++- .../regression/msmarco-doc-docTTTTTquery.yaml | 24 ++++ .../msmarco-doc-segmented-docTTTTTquery.yaml | 24 ++++ .../resources/regression/msmarco-doc.yaml | 74 ++++++++++- .../msmarco-passage-docTTTTTquery.yaml | 30 ++--- 43 files changed, 1269 insertions(+), 232 deletions(-) diff --git a/docs/experiments-msmarco-doc.md b/docs/experiments-msmarco-doc.md index 758f1a38d2..4ed6c38008 100644 --- a/docs/experiments-msmarco-doc.md +++ b/docs/experiments-msmarco-doc.md @@ -3,7 +3,7 @@ This page contains instructions for running BM25 baselines on the [MS MARCO *document* ranking task](https://microsoft.github.io/msmarco/). Note that there is a separate [MS MARCO *passage* ranking task](experiments-msmarco-passage.md). 
-**Setup Note:** If you're instantiating an Ubuntu VM on your system or on cloud (AWS and GCP), try to provision enough resources as the tasks such as building the index could take some time to finish such as RAM > 8GB and storage > 100 GB (SSD). This will prevent going back and fixing machine configuration again and again. +This exercise will require a machine with >8 GB RAM and at least 40 GB free disk space. If you're a Waterloo undergraduate going through this guide as the [screening exercise](https://github.com/lintool/guide/blob/master/ura.md) of joining my research group, make sure you do the [passage ranking exercise](experiments-msmarco-passage.md) first. Similarly, try to understand what you're actually doing, instead of simply [cargo culting](https://en.wikipedia.org/wiki/Cargo_cult_programming) (i.e., blinding copying and pasting commands into a shell). @@ -13,7 +13,7 @@ Similarly, try to understand what you're actually doing, instead of simply [carg We're going to use the repository's root directory as the working directory. First, we need to download and extract the MS MARCO document dataset: -``` +```bash mkdir collections/msmarco-doc wget https://msmarco.blob.core.windows.net/msmarcoranking/msmarco-docs.trec.gz -P collections/msmarco-doc @@ -30,10 +30,14 @@ To confirm, `msmarco-docs.trec.gz` should have MD5 checksum of `d4863e4f342982b5 There's no need to uncompress the file, as Anserini can directly index gzipped files. Build the index with the following command: -``` -sh target/appassembler/bin/IndexCollection -threads 1 -collection CleanTrecCollection \ - -generator DefaultLuceneDocumentGenerator -input collections/msmarco-doc \ - -index indexes/msmarco-doc/lucene-index-msmarco -storePositions -storeDocvectors -storeRaw +```bash +target/appassembler/bin/IndexCollection \ + -collection CleanTrecCollection \ + -input collections/msmarco-doc \ + -index indexes/msmarco-doc/lucene-index-msmarco \ + -generator DefaultLuceneDocumentGenerator \ + -threads 1 \ + -storePositions -storeDocvectors -storeRaw ``` On a modern desktop with an SSD, indexing takes around 40 minutes. @@ -45,11 +49,14 @@ There should be a total of 3,213,835 documents indexed. After indexing finishes, we can do a retrieval run. The dev queries are already stored in our repo: -``` -target/appassembler/bin/SearchCollection -hits 1000 -parallelism 4 \ - -index indexes/msmarco-doc/lucene-index-msmarco \ - -topicreader TsvInt -topics src/main/resources/topics-and-qrels/topics.msmarco-doc.dev.txt \ - -output runs/run.msmarco-doc.dev.bm25.txt -bm25 +```bash +target/appassembler/bin/SearchCollection \ + -index indexes/msmarco-doc/lucene-index-msmarco \ + -topics src/main/resources/topics-and-qrels/topics.msmarco-doc.dev.txt \ + -topicreader TsvInt \ + -output runs/run.msmarco-doc.dev.bm25.txt \ + -parallelism 4 \ + -bm25 -hits 1000 ``` Retrieval speed will vary by machine: @@ -58,8 +65,9 @@ Adjust the parallelism by changing the `-parallelism` argument. After the run completes, we can evaluate with `trec_eval`: -``` -$ tools/eval/trec_eval.9.0.4/trec_eval -c -mmap -mrecall.1000 src/main/resources/topics-and-qrels/qrels.msmarco-doc.dev.txt runs/run.msmarco-doc.dev.bm25.txt +```bash +$ tools/eval/trec_eval.9.0.4/trec_eval -c -mmap -mrecall.1000 \ + src/main/resources/topics-and-qrels/qrels.msmarco-doc.dev.txt runs/run.msmarco-doc.dev.bm25.txt map all 0.2310 recall_1000 all 0.8856 ``` @@ -67,7 +75,7 @@ recall_1000 all 0.8856 Let's compare to the baselines provided by Microsoft. 
First, download: -``` +```bash wget https://msmarco.blob.core.windows.net/msmarcoranking/msmarco-docdev-top100.gz -P runs gunzip runs/msmarco-docdev-top100.gz ``` @@ -75,11 +83,13 @@ gunzip runs/msmarco-docdev-top100.gz Then, run `trec_eval` to compare. Note that to be fair, we restrict evaluation to top 100 hits per topic (which is what Microsoft provides): -``` -$ tools/eval/trec_eval.9.0.4/trec_eval -c -mmap -M 100 src/main/resources/topics-and-qrels/qrels.msmarco-doc.dev.txt runs/msmarco-docdev-top100 +```bash +$ tools/eval/trec_eval.9.0.4/trec_eval -c -mmap -M 100 \ + src/main/resources/topics-and-qrels/qrels.msmarco-doc.dev.txt runs/msmarco-docdev-top100 map all 0.2219 -$ tools/eval/trec_eval.9.0.4/trec_eval -c -mmap -M 100 src/main/resources/topics-and-qrels/qrels.msmarco-doc.dev.txt runs/run.msmarco-doc.dev.bm25.txt +$ tools/eval/trec_eval.9.0.4/trec_eval -c -mmap -M 100 \ + src/main/resources/topics-and-qrels/qrels.msmarco-doc.dev.txt runs/run.msmarco-doc.dev.bm25.txt map all 0.2303 ``` @@ -91,18 +101,22 @@ Let's try to reproduce runs on there! A few minor details to pay attention to: the official metric is MRR@100, so we want to only return the top 100 hits, and the submission files to the leaderboard have a slightly different format. ```bash -target/appassembler/bin/SearchCollection -hits 100 -parallelism 4 \ - -index indexes/msmarco-doc/lucene-index-msmarco \ - -topicreader TsvInt -topics src/main/resources/topics-and-qrels/topics.msmarco-doc.dev.txt \ - -output runs/run.msmarco-doc.leaderboard-dev.bm25base.txt -format msmarco \ - -bm25 -bm25.k1 0.9 -bm25.b 0.4 +target/appassembler/bin/SearchCollection \ + -index indexes/msmarco-doc/lucene-index-msmarco \ + -topics src/main/resources/topics-and-qrels/topics.msmarco-doc.dev.txt \ + -topicreader TsvInt \ + -output runs/run.msmarco-doc.leaderboard-dev.bm25base.txt -format msmarco \ + -parallelism 4 \ + -bm25 -bm25.k1 0.9 -bm25.b 0.4 -hits 100 ``` The command above uses the default BM25 parameters (`k1=0.9`, `b=0.4`), and note we set `-hits 100`. 
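The `-format msmarco` flag writes the run in the leaderboard's submission layout, which is roughly one `qid<TAB>docid<TAB>rank` triple per line instead of the six-column TREC run format used elsewhere in this guide. The snippet below is only an illustration of that difference (the repository ships `tools/scripts/msmarco/convert_msmarco_to_trec_run.py` for the real conversion; the run tag and placeholder score here are made up):

```python
def msmarco_run_to_trec(msmarco_path: str, trec_path: str, tag: str = "illustrative") -> None:
    """Rewrite qid<TAB>docid<TAB>rank lines as TREC run lines:
    qid Q0 docid rank score tag, with a dummy score derived from the rank."""
    with open(msmarco_path) as fin, open(trec_path, "w") as fout:
        for line in fin:
            qid, docid, rank = line.split()
            score = 1000 - int(rank)  # placeholder score that decreases with rank
            fout.write(f"{qid} Q0 {docid} {rank} {score} {tag}\n")
```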
Command for evaluation: ```bash -$ python tools/scripts/msmarco/msmarco_doc_eval.py --judgments src/main/resources/topics-and-qrels/qrels.msmarco-doc.dev.txt --run runs/run.msmarco-doc.leaderboard-dev.bm25base.txt +$ python tools/scripts/msmarco/msmarco_doc_eval.py \ + --judgments src/main/resources/topics-and-qrels/qrels.msmarco-doc.dev.txt \ + --run runs/run.msmarco-doc.leaderboard-dev.bm25base.txt ##################### MRR @100: 0.23005723505603573 QueriesRanked: 5193 @@ -114,17 +128,21 @@ The above run corresponds to "Anserini's BM25, default parameters (k1=0.9, b=0.4 Here's the invocation for BM25 with parameters optimized for recall@100 (`k1=4.46`, `b=0.82`): ```bash -target/appassembler/bin/SearchCollection -hits 100 -parallelism 4 \ - -index indexes/msmarco-doc/lucene-index-msmarco \ - -topicreader TsvInt -topics src/main/resources/topics-and-qrels/topics.msmarco-doc.dev.txt \ - -output runs/run.msmarco-doc.leaderboard-dev.bm25tuned.txt -format msmarco \ - -bm25 -bm25.k1 4.46 -bm25.b 0.82 +target/appassembler/bin/SearchCollection \ + -index indexes/msmarco-doc/lucene-index-msmarco \ + -topics src/main/resources/topics-and-qrels/topics.msmarco-doc.dev.txt \ + -topicreader TsvInt \ + -output runs/run.msmarco-doc.leaderboard-dev.bm25tuned.txt -format msmarco \ + -parallelism 4 \ + -bm25 -bm25.k1 4.46 -bm25.b 0.82 -hits 100 ``` Command for evaluation: ```bash -$ python tools/scripts/msmarco/msmarco_doc_eval.py --judgments src/main/resources/topics-and-qrels/qrels.msmarco-doc.dev.txt --run runs/run.msmarco-doc.leaderboard-dev.bm25tuned.txt +$ python tools/scripts/msmarco/msmarco_doc_eval.py \ + --judgments src/main/resources/topics-and-qrels/qrels.msmarco-doc.dev.txt \ + --run runs/run.msmarco-doc.leaderboard-dev.bm25tuned.txt ##################### MRR @100: 0.2770296928568702 QueriesRanked: 5193 @@ -139,7 +157,7 @@ It is well known that BM25 parameter tuning is important. The setting of `k1=0.9`, `b=0.4` is often used as a default. Let's try to do better! -We tuned BM25 using the queries found [here](https://github.com/castorini/Anserini-data/tree/master/MSMARCO): these are five different sets of 10k samples from the training queries (using the `shuf` command). +We tuned BM25 using the queries found [here](https://github.com/castorini/anserini-data/tree/master/MSMARCO): these are five different sets of 10k samples from the training queries (using the `shuf` command). The basic approach is grid search of parameter values in tenth increments. We tuned on each individual set and then averaged parameter values across all five sets (this has the effect of regularization). In separate trials, we optimized for: @@ -151,35 +169,42 @@ It turns out that optimizing for MRR@10 and MAP yields the same settings. 
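Before looking at the numbers, here is a minimal sketch of the grid-search-and-average recipe just described, under stated assumptions: `score_run` stands in for a full retrieval-plus-evaluation pass over one query sample (e.g., running `SearchCollection` with the candidate parameters and scoring the resulting run), and the grid bounds are illustrative rather than the exact ranges we swept.

```python
import itertools
import statistics
from typing import Callable, List, Tuple

def tune_bm25(samples: List[str],
              score_run: Callable[[str, float, float], float]) -> Tuple[float, float]:
    """Grid-search (k1, b) in tenth increments on each query sample, then
    average the per-sample optima across samples (a mild regularizer)."""
    k1_grid = [round(0.1 * i, 1) for i in range(1, 51)]  # 0.1 .. 5.0 (assumed bounds)
    b_grid = [round(0.1 * i, 1) for i in range(0, 11)]   # 0.0 .. 1.0
    best = [max(itertools.product(k1_grid, b_grid),
                key=lambda kb: score_run(sample, *kb))
            for sample in samples]
    return (round(statistics.mean(k1 for k1, _ in best), 2),
            round(statistics.mean(b for _, b in best), 2))
```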
Here's the comparison between different parameter settings: -Setting | MRR@100 | MAP | Recall@1000 | -:----------------------------------------------------------------------|--------:|-------:|------------:| -Default (`k1=0.9`, `b=0.4`) | 0.2301 | 0.2310 | 0.8856 | -Optimized for MRR@100/MAP (`k1=3.8`, `b=0.87`) | 0.2784 | 0.2789 | 0.9326 | -Optimized for recall@100 (`k1=4.46`, `b=0.82`) | 0.2770 | 0.2775 | 0.9357 | +| Setting | MRR@100 | MAP | Recall@1000 | +|:-----------------------------------------------|--------:|-------:|------------:| +| Default (`k1=0.9`, `b=0.4`) | 0.2301 | 0.2310 | 0.8856 | +| Optimized for MRR@100/MAP (`k1=3.8`, `b=0.87`) | 0.2784 | 0.2789 | 0.9326 | +| Optimized for recall@100 (`k1=4.46`, `b=0.82`) | 0.2770 | 0.2775 | 0.9357 | As expected, BM25 tuning makes a big difference! Note that MRR@100 is computed with the leaderboard eval script (with 100 hits per query), while the other two metrics are computed with `trec_eval` (with 1000 hits per query). So, we need to use different search programs, for example: -``` -$ target/appassembler/bin/SearchCollection -hits 1000 -parallelism 4 \ - -index indexes/msmarco-doc/lucene-index-msmarco \ - -topicreader TsvInt -topics src/main/resources/topics-and-qrels/topics.msmarco-doc.dev.txt \ - -output runs/run.msmarco-doc.dev.opt-mrr.txt \ - -bm25 -bm25.k1 3.8 -bm25.b 0.87 - -$ tools/eval/trec_eval.9.0.4/trec_eval -c -mmap -mrecall.1000 src/main/resources/topics-and-qrels/qrels.msmarco-doc.dev.txt runs/run.msmarco-doc.dev.opt-mrr.txt +```bash +$ target/appassembler/bin/SearchCollection \ + -index indexes/msmarco-doc/lucene-index-msmarco \ + -topics src/main/resources/topics-and-qrels/topics.msmarco-doc.dev.txt \ + -topicreader TsvInt \ + -output runs/run.msmarco-doc.dev.opt-mrr.txt \ + -parallelism 4 \ + -bm25 -bm25.k1 3.8 -bm25.b 0.87 -hits 1000 + +$ tools/eval/trec_eval.9.0.4/trec_eval -c -mmap -mrecall.1000 \ + src/main/resources/topics-and-qrels/qrels.msmarco-doc.dev.txt runs/run.msmarco-doc.dev.opt-mrr.txt map all 0.2789 recall_1000 all 0.9326 -$ target/appassembler/bin/SearchCollection -hits 100 -parallelism 4 \ - -index indexes/msmarco-doc/lucene-index-msmarco \ - -topicreader TsvInt -topics src/main/resources/topics-and-qrels/topics.msmarco-doc.dev.txt \ - -output runs/run.msmarco-doc.leaderboard-dev.opt-mrr.txt -format msmarco \ - -bm25 -bm25.k1 3.8 -bm25.b 0.87 - -$ python tools/scripts/msmarco/msmarco_doc_eval.py --judgments src/main/resources/topics-and-qrels/qrels.msmarco-doc.dev.txt --run runs/run.msmarco-doc.leaderboard-dev.opt-mrr.txt +$ target/appassembler/bin/SearchCollection \ + -index indexes/msmarco-doc/lucene-index-msmarco \ + -topics src/main/resources/topics-and-qrels/topics.msmarco-doc.dev.txt \ + -topicreader TsvInt \ + -output runs/run.msmarco-doc.leaderboard-dev.opt-mrr.txt -format msmarco \ + -parallelism 4 \ + -bm25 -bm25.k1 3.8 -bm25.b 0.87 -hits 100 + +$ python tools/scripts/msmarco/msmarco_doc_eval.py \ + --judgments src/main/resources/topics-and-qrels/qrels.msmarco-doc.dev.txt \ + --run runs/run.msmarco-doc.leaderboard-dev.opt-mrr.txt ##################### MRR @100: 0.27836767424339787 QueriesRanked: 5193 diff --git a/docs/experiments-msmarco-passage.md b/docs/experiments-msmarco-passage.md index bf5ca7e1a7..db21996661 100644 --- a/docs/experiments-msmarco-passage.md +++ b/docs/experiments-msmarco-passage.md @@ -2,9 +2,8 @@ This page contains instructions for running BM25 baselines on the [MS MARCO *passage* ranking task](https://microsoft.github.io/msmarco/). 
Note that there is a separate [MS MARCO *document* ranking task](experiments-msmarco-doc.md). -We also have a [separate page](experiments-doc2query.md) describing document expansion experiments (doc2query) for this task. -**Setup Note:** If you're instantiating an Ubuntu VM on your system or on cloud (AWS and GCP) for this particular task, try to provision enough resources as the tasks could take some time to finish such as RAM > 6GB and storage ~ 100 GB (SSD). This will prevent going back and fixing machine configuration again and again. +This exercise will require a machine with >8 GB RAM and at least 15 GB free disk space . If you're a Waterloo undergraduate going through this guide as the [screening exercise](https://github.com/lintool/guide/blob/master/ura.md) of joining my research group, try to understand what you're actually doing, instead of simply [cargo culting](https://en.wikipedia.org/wiki/Cargo_cult_programming) (i.e., blinding copying and pasting commands into a shell). In particular, you'll want to pay attention to the "What's going on here?" sections. @@ -58,8 +57,8 @@ Next, we need to convert the MS MARCO tsv collection into Anserini's jsonl files ```bash python tools/scripts/msmarco/convert_collection_to_jsonl.py \ - --collection-path collections/msmarco-passage/collection.tsv \ - --output-folder collections/msmarco-passage/collection_jsonl + --collection-path collections/msmarco-passage/collection.tsv \ + --output-folder collections/msmarco-passage/collection_jsonl ``` The above script should generate 9 jsonl files in `collections/msmarco-passage/collection_jsonl`, each with 1M lines (except for the last one, which should have 841,823 lines). @@ -70,9 +69,12 @@ The above script should generate 9 jsonl files in `collections/msmarco-passage/c We can now index these docs as a `JsonCollection` using Anserini: ```bash -sh target/appassembler/bin/IndexCollection -threads 9 -collection JsonCollection \ - -generator DefaultLuceneDocumentGenerator -input collections/msmarco-passage/collection_jsonl \ - -index indexes/msmarco-passage/lucene-index-msmarco -storePositions -storeDocvectors -storeRaw +target/appassembler/bin/IndexCollection \ + -collection JsonCollection \ + -input collections/msmarco-passage/collection_jsonl \ + -index indexes/msmarco-passage/lucene-index-msmarco \ + -generator DefaultLuceneDocumentGenerator \ + -threads 9 -storePositions -storeDocvectors -storeRaw ``` Upon completion, we should have an index with 8,841,823 documents. @@ -85,9 +87,9 @@ Since queries of the set are too many (+100k), it would take a long time to retr ```bash python tools/scripts/msmarco/filter_queries.py \ - --qrels collections/msmarco-passage/qrels.dev.small.tsv \ - --queries collections/msmarco-passage/queries.dev.tsv \ - --output collections/msmarco-passage/queries.dev.small.tsv + --qrels collections/msmarco-passage/qrels.dev.small.tsv \ + --queries collections/msmarco-passage/queries.dev.tsv \ + --output collections/msmarco-passage/queries.dev.small.tsv ``` The output queries file should contain 6980 lines. 
@@ -119,11 +121,13 @@ These queries are taken from Bing search logs, so they're "realistic" web querie We can now perform a retrieval run using this smaller set of queries: ```bash -sh target/appassembler/bin/SearchCollection -hits 1000 -parallelism 4 \ - -index indexes/msmarco-passage/lucene-index-msmarco \ - -topicreader TsvInt -topics collections/msmarco-passage/queries.dev.small.tsv \ - -output runs/run.msmarco-passage.dev.small.tsv -format msmarco \ - -bm25 -bm25.k1 0.82 -bm25.b 0.68 +target/appassembler/bin/SearchCollection \ + -index indexes/msmarco-passage/lucene-index-msmarco \ + -topics collections/msmarco-passage/queries.dev.small.tsv \ + -topicreader TsvInt \ + -output runs/run.msmarco-passage.dev.small.tsv -format msmarco \ + -parallelism 4 \ + -bm25 -bm25.k1 0.82 -bm25.b 0.68 -hits 1000 ``` The above command uses BM25 with tuned parameters `k1=0.82`, `b=0.68`. @@ -244,19 +248,19 @@ For that we first need to convert runs and qrels files to the TREC format: ```bash python tools/scripts/msmarco/convert_msmarco_to_trec_run.py \ - --input runs/run.msmarco-passage.dev.small.tsv \ - --output runs/run.msmarco-passage.dev.small.trec + --input runs/run.msmarco-passage.dev.small.tsv \ + --output runs/run.msmarco-passage.dev.small.trec python tools/scripts/msmarco/convert_msmarco_to_trec_qrels.py \ - --input collections/msmarco-passage/qrels.dev.small.tsv \ - --output collections/msmarco-passage/qrels.dev.small.trec + --input collections/msmarco-passage/qrels.dev.small.tsv \ + --output collections/msmarco-passage/qrels.dev.small.trec ``` And run the `trec_eval` tool: ```bash tools/eval/trec_eval.9.0.4/trec_eval -c -mrecall.1000 -mmap \ - collections/msmarco-passage/qrels.dev.small.trec runs/run.msmarco-passage.dev.small.trec + collections/msmarco-passage/qrels.dev.small.trec runs/run.msmarco-passage.dev.small.trec ``` The output should be: @@ -296,13 +300,11 @@ It turns out that optimizing for MRR@10 and MAP yields the same settings. Here's the comparison between the Anserini default and optimized parameters: -Setting | MRR@10 | MAP | Recall@1000 | -:---------------------------|-------:|-------:|------------:| -Default (`k1=0.9`, `b=0.4`) | 0.1840 | 0.1926 | 0.8526 -Optimized for recall@1000 (`k1=0.82`, `b=0.68`) | 0.1874 | 0.1957 | 0.8573 -Optimized for MRR@10/MAP (`k1=0.60`, `b=0.62`) | 0.1892 | 0.1972 | 0.8555 - -To reproduce these results, the `SearchMsmarco` class above takes `k1` and `b` parameters as command-line arguments, e.g., `-k1 0.60 -b 0.62` (note that the default setting is `k1=0.82` and `b=0.68`). +| Setting | MRR@10 | MAP | Recall@1000 | +|:------------------------------------------------|-------:|-------:|------------:| +| Default (`k1=0.9`, `b=0.4`) | 0.1840 | 0.1926 | 0.8526 | +| Optimized for recall@1000 (`k1=0.82`, `b=0.68`) | 0.1874 | 0.1957 | 0.8573 | +| Optimized for MRR@10/MAP (`k1=0.60`, `b=0.62`) | 0.1892 | 0.1972 | 0.8555 | As mentioned above, the BM25 run with `k1=0.82`, `b=0.68` corresponds to the entry "BM25 (Lucene8, tuned)" dated 2019/06/26 on the [MS MARCO Passage Ranking Leaderboard](https://microsoft.github.io/msmarco/). The BM25 run with default parameters `k1=0.9`, `b=0.4` roughly corresponds to the entry "BM25 (Anserini)" dated 2019/04/10 (but Anserini was using Lucene 7.6 at the time). 
diff --git a/docs/regressions-dl19-doc-docTTTTTquery.md b/docs/regressions-dl19-doc-docTTTTTquery.md index fa7652964b..5a76d12163 100644 --- a/docs/regressions-dl19-doc-docTTTTTquery.md +++ b/docs/regressions-dl19-doc-docTTTTTquery.md @@ -69,6 +69,13 @@ target/appassembler/bin/SearchCollection \ -output runs/run.msmarco-doc-docTTTTTquery.bm25-default+rm3.topics.dl19-doc.txt \ -bm25 -rm3 & +target/appassembler/bin/SearchCollection \ + -index indexes/lucene-index.msmarco-doc-docTTTTTquery/ \ + -topics src/main/resources/topics-and-qrels/topics.dl19-doc.txt \ + -topicreader TsvInt \ + -output runs/run.msmarco-doc-docTTTTTquery.bm25-default+rocchio.topics.dl19-doc.txt \ + -bm25 -rocchio & + target/appassembler/bin/SearchCollection \ -index indexes/lucene-index.msmarco-doc-docTTTTTquery/ \ -topics src/main/resources/topics-and-qrels/topics.dl19-doc.txt \ @@ -82,6 +89,13 @@ target/appassembler/bin/SearchCollection \ -topicreader TsvInt \ -output runs/run.msmarco-doc-docTTTTTquery.bm25-tuned+rm3.topics.dl19-doc.txt \ -bm25 -bm25.k1 4.68 -bm25.b 0.87 -rm3 & + +target/appassembler/bin/SearchCollection \ + -index indexes/lucene-index.msmarco-doc-docTTTTTquery/ \ + -topics src/main/resources/topics-and-qrels/topics.dl19-doc.txt \ + -topicreader TsvInt \ + -output runs/run.msmarco-doc-docTTTTTquery.bm25-tuned+rocchio.topics.dl19-doc.txt \ + -bm25 -bm25.k1 4.68 -bm25.b 0.87 -rocchio & ``` Evaluation can be performed using `trec_eval`: @@ -97,6 +111,11 @@ tools/eval/trec_eval.9.0.4/trec_eval -c -m ndcg_cut.10 src/main/resources/topics tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.100 src/main/resources/topics-and-qrels/qrels.dl19-doc.txt runs/run.msmarco-doc-docTTTTTquery.bm25-default+rm3.topics.dl19-doc.txt tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.1000 src/main/resources/topics-and-qrels/qrels.dl19-doc.txt runs/run.msmarco-doc-docTTTTTquery.bm25-default+rm3.topics.dl19-doc.txt +tools/eval/trec_eval.9.0.4/trec_eval -c -M 100 -m map src/main/resources/topics-and-qrels/qrels.dl19-doc.txt runs/run.msmarco-doc-docTTTTTquery.bm25-default+rocchio.topics.dl19-doc.txt +tools/eval/trec_eval.9.0.4/trec_eval -c -m ndcg_cut.10 src/main/resources/topics-and-qrels/qrels.dl19-doc.txt runs/run.msmarco-doc-docTTTTTquery.bm25-default+rocchio.topics.dl19-doc.txt +tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.100 src/main/resources/topics-and-qrels/qrels.dl19-doc.txt runs/run.msmarco-doc-docTTTTTquery.bm25-default+rocchio.topics.dl19-doc.txt +tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.1000 src/main/resources/topics-and-qrels/qrels.dl19-doc.txt runs/run.msmarco-doc-docTTTTTquery.bm25-default+rocchio.topics.dl19-doc.txt + tools/eval/trec_eval.9.0.4/trec_eval -c -M 100 -m map src/main/resources/topics-and-qrels/qrels.dl19-doc.txt runs/run.msmarco-doc-docTTTTTquery.bm25-tuned.topics.dl19-doc.txt tools/eval/trec_eval.9.0.4/trec_eval -c -m ndcg_cut.10 src/main/resources/topics-and-qrels/qrels.dl19-doc.txt runs/run.msmarco-doc-docTTTTTquery.bm25-tuned.topics.dl19-doc.txt tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.100 src/main/resources/topics-and-qrels/qrels.dl19-doc.txt runs/run.msmarco-doc-docTTTTTquery.bm25-tuned.topics.dl19-doc.txt @@ -106,26 +125,31 @@ tools/eval/trec_eval.9.0.4/trec_eval -c -M 100 -m map src/main/resources/topics- tools/eval/trec_eval.9.0.4/trec_eval -c -m ndcg_cut.10 src/main/resources/topics-and-qrels/qrels.dl19-doc.txt runs/run.msmarco-doc-docTTTTTquery.bm25-tuned+rm3.topics.dl19-doc.txt tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.100 
src/main/resources/topics-and-qrels/qrels.dl19-doc.txt runs/run.msmarco-doc-docTTTTTquery.bm25-tuned+rm3.topics.dl19-doc.txt tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.1000 src/main/resources/topics-and-qrels/qrels.dl19-doc.txt runs/run.msmarco-doc-docTTTTTquery.bm25-tuned+rm3.topics.dl19-doc.txt + +tools/eval/trec_eval.9.0.4/trec_eval -c -M 100 -m map src/main/resources/topics-and-qrels/qrels.dl19-doc.txt runs/run.msmarco-doc-docTTTTTquery.bm25-tuned+rocchio.topics.dl19-doc.txt +tools/eval/trec_eval.9.0.4/trec_eval -c -m ndcg_cut.10 src/main/resources/topics-and-qrels/qrels.dl19-doc.txt runs/run.msmarco-doc-docTTTTTquery.bm25-tuned+rocchio.topics.dl19-doc.txt +tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.100 src/main/resources/topics-and-qrels/qrels.dl19-doc.txt runs/run.msmarco-doc-docTTTTTquery.bm25-tuned+rocchio.topics.dl19-doc.txt +tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.1000 src/main/resources/topics-and-qrels/qrels.dl19-doc.txt runs/run.msmarco-doc-docTTTTTquery.bm25-tuned+rocchio.topics.dl19-doc.txt ``` ## Effectiveness With the above commands, you should be able to reproduce the following results: -| **AP@100** | **BM25 (default)**| **+RM3** | **BM25 (tuned)**| **+RM3** | -|:-------------------------------------------------------------------------------------------------------------|-----------|-----------|-----------|-----------| -| [DL19 (Doc)](https://trec.nist.gov/data/deep2019.html) | 0.2700 | 0.3045 | 0.2620 | 0.2814 | -| **nDCG@10** | **BM25 (default)**| **+RM3** | **BM25 (tuned)**| **+RM3** | -| [DL19 (Doc)](https://trec.nist.gov/data/deep2019.html) | 0.5968 | 0.5897 | 0.5972 | 0.6080 | -| **R@100** | **BM25 (default)**| **+RM3** | **BM25 (tuned)**| **+RM3** | -| [DL19 (Doc)](https://trec.nist.gov/data/deep2019.html) | 0.4198 | 0.4465 | 0.3992 | 0.4119 | -| **R@1000** | **BM25 (default)**| **+RM3** | **BM25 (tuned)**| **+RM3** | -| [DL19 (Doc)](https://trec.nist.gov/data/deep2019.html) | 0.7190 | 0.7738 | 0.6867 | 0.7177 | +| **AP@100** | **BM25 (default)**| **+RM3** | **+Rocchio**| **BM25 (tuned)**| **+RM3** | **+Rocchio**| +|:-------------------------------------------------------------------------------------------------------------|-----------|-----------|-----------|-----------|-----------|-----------| +| [DL19 (Doc)](https://trec.nist.gov/data/deep2019.html) | 0.2700 | 0.3045 | 0.3092 | 0.2620 | 0.2814 | 0.2843 | +| **nDCG@10** | **BM25 (default)**| **+RM3** | **+Rocchio**| **BM25 (tuned)**| **+RM3** | **+Rocchio**| +| [DL19 (Doc)](https://trec.nist.gov/data/deep2019.html) | 0.5968 | 0.5897 | 0.5956 | 0.5972 | 0.6080 | 0.6141 | +| **R@100** | **BM25 (default)**| **+RM3** | **+Rocchio**| **BM25 (tuned)**| **+RM3** | **+Rocchio**| +| [DL19 (Doc)](https://trec.nist.gov/data/deep2019.html) | 0.4198 | 0.4465 | 0.4505 | 0.3992 | 0.4119 | 0.4246 | +| **R@1000** | **BM25 (default)**| **+RM3** | **+Rocchio**| **BM25 (tuned)**| **+RM3** | **+Rocchio**| +| [DL19 (Doc)](https://trec.nist.gov/data/deep2019.html) | 0.7190 | 0.7738 | 0.7758 | 0.6867 | 0.7177 | 0.7276 | Explanation of settings: + The setting "default" refers the default BM25 settings of `k1=0.9`, `b=0.4`. -+ The setting "tuned" refers to `k1=4.68`, `b=0.87`, tuned using the MS MARCO document sparse judgments to optimize for recall@100 (i.e., for first-stage retrieval) on 2019/12. ++ The setting "tuned" refers to `k1=4.68`, `b=0.87`, tuned in 2020/12 using the MS MARCO document sparse judgments to optimize for recall@100 (i.e., for first-stage retrieval). 
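The new `-rocchio` conditions above run Rocchio pseudo-relevance feedback on top of the BM25 ranking: the query is re-expressed as a weighted term vector nudged toward the centroid of the top-ranked documents, and retrieval is repeated with the expanded query. A rough positive-feedback-only sketch follows; the term weighting, parameter defaults, and number of expansion terms are illustrative assumptions, not Anserini's actual settings.

```python
from collections import Counter
from typing import Dict, Iterable

def rocchio_expand(query: Dict[str, float],
                   feedback_docs: Iterable[Dict[str, float]],
                   alpha: float = 1.0,
                   beta: float = 0.75,
                   num_terms: int = 50) -> Dict[str, float]:
    """Move the query vector toward the centroid of the top-ranked
    (pseudo-relevant) document vectors, keeping the strongest terms."""
    docs = list(feedback_docs)
    centroid = Counter()
    for vec in docs:
        for term, weight in vec.items():
            centroid[term] += weight / len(docs)

    expanded = Counter({term: alpha * weight for term, weight in query.items()})
    for term, weight in centroid.most_common(num_terms):
        expanded[term] += beta * weight
    return dict(expanded)
```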
Settings tuned on the MS MARCO document sparse judgments _may not_ work well on the TREC dense judgments. diff --git a/docs/regressions-dl19-doc-segmented-docTTTTTquery.md b/docs/regressions-dl19-doc-segmented-docTTTTTquery.md index 32b8bb65af..65a194c134 100644 --- a/docs/regressions-dl19-doc-segmented-docTTTTTquery.md +++ b/docs/regressions-dl19-doc-segmented-docTTTTTquery.md @@ -70,6 +70,13 @@ target/appassembler/bin/SearchCollection \ -output runs/run.msmarco-doc-segmented-docTTTTTquery.bm25-default+rm3.topics.dl19-doc.txt \ -bm25 -rm3 -hits 10000 -selectMaxPassage -selectMaxPassage.delimiter "#" -selectMaxPassage.hits 1000 & +target/appassembler/bin/SearchCollection \ + -index indexes/lucene-index.msmarco-doc-segmented-docTTTTTquery/ \ + -topics src/main/resources/topics-and-qrels/topics.dl19-doc.txt \ + -topicreader TsvInt \ + -output runs/run.msmarco-doc-segmented-docTTTTTquery.bm25-default+rocchio.topics.dl19-doc.txt \ + -bm25 -rocchio -hits 10000 -selectMaxPassage -selectMaxPassage.delimiter "#" -selectMaxPassage.hits 1000 & + target/appassembler/bin/SearchCollection \ -index indexes/lucene-index.msmarco-doc-segmented-docTTTTTquery/ \ -topics src/main/resources/topics-and-qrels/topics.dl19-doc.txt \ @@ -83,6 +90,13 @@ target/appassembler/bin/SearchCollection \ -topicreader TsvInt \ -output runs/run.msmarco-doc-segmented-docTTTTTquery.bm25-tuned+rm3.topics.dl19-doc.txt \ -bm25 -bm25.k1 2.56 -bm25.b 0.59 -rm3 -hits 10000 -selectMaxPassage -selectMaxPassage.delimiter "#" -selectMaxPassage.hits 1000 & + +target/appassembler/bin/SearchCollection \ + -index indexes/lucene-index.msmarco-doc-segmented-docTTTTTquery/ \ + -topics src/main/resources/topics-and-qrels/topics.dl19-doc.txt \ + -topicreader TsvInt \ + -output runs/run.msmarco-doc-segmented-docTTTTTquery.bm25-tuned+rocchio.topics.dl19-doc.txt \ + -bm25 -bm25.k1 2.56 -bm25.b 0.59 -rocchio -hits 10000 -selectMaxPassage -selectMaxPassage.delimiter "#" -selectMaxPassage.hits 1000 & ``` Evaluation can be performed using `trec_eval`: @@ -98,6 +112,11 @@ tools/eval/trec_eval.9.0.4/trec_eval -c -m ndcg_cut.10 src/main/resources/topics tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.100 src/main/resources/topics-and-qrels/qrels.dl19-doc.txt runs/run.msmarco-doc-segmented-docTTTTTquery.bm25-default+rm3.topics.dl19-doc.txt tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.1000 src/main/resources/topics-and-qrels/qrels.dl19-doc.txt runs/run.msmarco-doc-segmented-docTTTTTquery.bm25-default+rm3.topics.dl19-doc.txt +tools/eval/trec_eval.9.0.4/trec_eval -c -M 100 -m map src/main/resources/topics-and-qrels/qrels.dl19-doc.txt runs/run.msmarco-doc-segmented-docTTTTTquery.bm25-default+rocchio.topics.dl19-doc.txt +tools/eval/trec_eval.9.0.4/trec_eval -c -m ndcg_cut.10 src/main/resources/topics-and-qrels/qrels.dl19-doc.txt runs/run.msmarco-doc-segmented-docTTTTTquery.bm25-default+rocchio.topics.dl19-doc.txt +tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.100 src/main/resources/topics-and-qrels/qrels.dl19-doc.txt runs/run.msmarco-doc-segmented-docTTTTTquery.bm25-default+rocchio.topics.dl19-doc.txt +tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.1000 src/main/resources/topics-and-qrels/qrels.dl19-doc.txt runs/run.msmarco-doc-segmented-docTTTTTquery.bm25-default+rocchio.topics.dl19-doc.txt + tools/eval/trec_eval.9.0.4/trec_eval -c -M 100 -m map src/main/resources/topics-and-qrels/qrels.dl19-doc.txt runs/run.msmarco-doc-segmented-docTTTTTquery.bm25-tuned.topics.dl19-doc.txt tools/eval/trec_eval.9.0.4/trec_eval -c -m ndcg_cut.10 
src/main/resources/topics-and-qrels/qrels.dl19-doc.txt runs/run.msmarco-doc-segmented-docTTTTTquery.bm25-tuned.topics.dl19-doc.txt tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.100 src/main/resources/topics-and-qrels/qrels.dl19-doc.txt runs/run.msmarco-doc-segmented-docTTTTTquery.bm25-tuned.topics.dl19-doc.txt @@ -107,26 +126,31 @@ tools/eval/trec_eval.9.0.4/trec_eval -c -M 100 -m map src/main/resources/topics- tools/eval/trec_eval.9.0.4/trec_eval -c -m ndcg_cut.10 src/main/resources/topics-and-qrels/qrels.dl19-doc.txt runs/run.msmarco-doc-segmented-docTTTTTquery.bm25-tuned+rm3.topics.dl19-doc.txt tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.100 src/main/resources/topics-and-qrels/qrels.dl19-doc.txt runs/run.msmarco-doc-segmented-docTTTTTquery.bm25-tuned+rm3.topics.dl19-doc.txt tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.1000 src/main/resources/topics-and-qrels/qrels.dl19-doc.txt runs/run.msmarco-doc-segmented-docTTTTTquery.bm25-tuned+rm3.topics.dl19-doc.txt + +tools/eval/trec_eval.9.0.4/trec_eval -c -M 100 -m map src/main/resources/topics-and-qrels/qrels.dl19-doc.txt runs/run.msmarco-doc-segmented-docTTTTTquery.bm25-tuned+rocchio.topics.dl19-doc.txt +tools/eval/trec_eval.9.0.4/trec_eval -c -m ndcg_cut.10 src/main/resources/topics-and-qrels/qrels.dl19-doc.txt runs/run.msmarco-doc-segmented-docTTTTTquery.bm25-tuned+rocchio.topics.dl19-doc.txt +tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.100 src/main/resources/topics-and-qrels/qrels.dl19-doc.txt runs/run.msmarco-doc-segmented-docTTTTTquery.bm25-tuned+rocchio.topics.dl19-doc.txt +tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.1000 src/main/resources/topics-and-qrels/qrels.dl19-doc.txt runs/run.msmarco-doc-segmented-docTTTTTquery.bm25-tuned+rocchio.topics.dl19-doc.txt ``` ## Effectiveness With the above commands, you should be able to reproduce the following results: -| **AP@100** | **BM25 (default)**| **+RM3** | **BM25 (tuned)**| **+RM3** | -|:-------------------------------------------------------------------------------------------------------------|-----------|-----------|-----------|-----------| -| [DL19 (Doc)](https://trec.nist.gov/data/deep2019.html) | 0.2798 | 0.3021 | 0.2658 | 0.2893 | -| **nDCG@10** | **BM25 (default)**| **+RM3** | **BM25 (tuned)**| **+RM3** | -| [DL19 (Doc)](https://trec.nist.gov/data/deep2019.html) | 0.6119 | 0.6297 | 0.6273 | 0.6239 | -| **R@100** | **BM25 (default)**| **+RM3** | **BM25 (tuned)**| **+RM3** | -| [DL19 (Doc)](https://trec.nist.gov/data/deep2019.html) | 0.4093 | 0.4392 | 0.4026 | 0.4237 | -| **R@1000** | **BM25 (default)**| **+RM3** | **BM25 (tuned)**| **+RM3** | -| [DL19 (Doc)](https://trec.nist.gov/data/deep2019.html) | 0.7165 | 0.7481 | 0.6707 | 0.7066 | +| **AP@100** | **BM25 (default)**| **+RM3** | **+Rocchio**| **BM25 (tuned)**| **+RM3** | **+Rocchio**| +|:-------------------------------------------------------------------------------------------------------------|-----------|-----------|-----------|-----------|-----------|-----------| +| [DL19 (Doc)](https://trec.nist.gov/data/deep2019.html) | 0.2798 | 0.3021 | 0.3074 | 0.2658 | 0.2893 | 0.2913 | +| **nDCG@10** | **BM25 (default)**| **+RM3** | **+Rocchio**| **BM25 (tuned)**| **+RM3** | **+Rocchio**| +| [DL19 (Doc)](https://trec.nist.gov/data/deep2019.html) | 0.6119 | 0.6297 | 0.6295 | 0.6273 | 0.6239 | 0.6244 | +| **R@100** | **BM25 (default)**| **+RM3** | **+Rocchio**| **BM25 (tuned)**| **+RM3** | **+Rocchio**| +| [DL19 (Doc)](https://trec.nist.gov/data/deep2019.html) | 0.4093 | 0.4392 | 0.4483 | 0.4026 | 0.4237 | 
0.4271 | +| **R@1000** | **BM25 (default)**| **+RM3** | **+Rocchio**| **BM25 (tuned)**| **+RM3** | **+Rocchio**| +| [DL19 (Doc)](https://trec.nist.gov/data/deep2019.html) | 0.7165 | 0.7481 | 0.7520 | 0.6707 | 0.7066 | 0.7189 | Explanation of settings: + The setting "default" refers the default BM25 settings of `k1=0.9`, `b=0.4`. -+ The setting "tuned" refers to `k1=2.56`, `b=0.59`, tuned using the MS MARCO document sparse judgments to optimize for recall@100 (i.e., for first-stage retrieval) on 2020/12. ++ The setting "tuned" refers to `k1=2.56`, `b=0.59`, tuned in 2020/12 using the MS MARCO document sparse judgments to optimize for recall@100 (i.e., for first-stage retrieval). Settings tuned on the MS MARCO document sparse judgments _may not_ work well on the TREC dense judgments. diff --git a/docs/regressions-dl19-doc-segmented.md b/docs/regressions-dl19-doc-segmented.md index 29e34b826a..a4e4c3c507 100644 --- a/docs/regressions-dl19-doc-segmented.md +++ b/docs/regressions-dl19-doc-segmented.md @@ -222,7 +222,7 @@ With the above commands, you should be able to reproduce the following results: Explanation of settings: + The setting "default" refers the default BM25 settings of `k1=0.9`, `b=0.4`. -+ The setting "tuned" refers to `k1=2.16`, `b=0.61`, tuned using the MS MARCO document sparse judgments to optimize for recall@100 (i.e., for first-stage retrieval) on 2020/12. ++ The setting "tuned" refers to `k1=2.16`, `b=0.61`, tuned in 2020/12 using the MS MARCO document sparse judgments to optimize for recall@100 (i.e., for first-stage retrieval). Settings tuned on the MS MARCO document sparse judgments _may not_ work well on the TREC dense judgments. diff --git a/docs/regressions-dl19-doc.md b/docs/regressions-dl19-doc.md index e146e42486..d2ddfc750e 100644 --- a/docs/regressions-dl19-doc.md +++ b/docs/regressions-dl19-doc.md @@ -138,6 +138,48 @@ target/appassembler/bin/SearchCollection \ -topicreader TsvInt \ -output runs/run.msmarco-doc.bm25-tuned+prf.topics.dl19-doc.txt \ -bm25 -bm25.k1 3.44 -bm25.b 0.87 -bm25prf & + +target/appassembler/bin/SearchCollection \ + -index indexes/lucene-index.msmarco-doc/ \ + -topics src/main/resources/topics-and-qrels/topics.dl19-doc.txt \ + -topicreader TsvInt \ + -output runs/run.msmarco-doc.bm25-tuned2.topics.dl19-doc.txt \ + -bm25 -bm25.k1 4.46 -bm25.b 0.82 & + +target/appassembler/bin/SearchCollection \ + -index indexes/lucene-index.msmarco-doc/ \ + -topics src/main/resources/topics-and-qrels/topics.dl19-doc.txt \ + -topicreader TsvInt \ + -output runs/run.msmarco-doc.bm25-tuned2+rm3.topics.dl19-doc.txt \ + -bm25 -bm25.k1 4.46 -bm25.b 0.82 -rm3 & + +target/appassembler/bin/SearchCollection \ + -index indexes/lucene-index.msmarco-doc/ \ + -topics src/main/resources/topics-and-qrels/topics.dl19-doc.txt \ + -topicreader TsvInt \ + -output runs/run.msmarco-doc.bm25-tuned2+rocchio.topics.dl19-doc.txt \ + -bm25 -bm25.k1 4.46 -bm25.b 0.82 -rocchio & + +target/appassembler/bin/SearchCollection \ + -index indexes/lucene-index.msmarco-doc/ \ + -topics src/main/resources/topics-and-qrels/topics.dl19-doc.txt \ + -topicreader TsvInt \ + -output runs/run.msmarco-doc.bm25-tuned2+rocchio-neg.topics.dl19-doc.txt \ + -bm25 -bm25.k1 4.46 -bm25.b 0.82 -rocchio -rocchio.useNegative -rerankCutoff 1000 & + +target/appassembler/bin/SearchCollection \ + -index indexes/lucene-index.msmarco-doc/ \ + -topics src/main/resources/topics-and-qrels/topics.dl19-doc.txt \ + -topicreader TsvInt \ + -output runs/run.msmarco-doc.bm25-tuned2+ax.topics.dl19-doc.txt \ + -bm25 -bm25.k1 4.46 
-bm25.b 0.82 -axiom -axiom.deterministic -rerankCutoff 20 & + +target/appassembler/bin/SearchCollection \ + -index indexes/lucene-index.msmarco-doc/ \ + -topics src/main/resources/topics-and-qrels/topics.dl19-doc.txt \ + -topicreader TsvInt \ + -output runs/run.msmarco-doc.bm25-tuned2+prf.topics.dl19-doc.txt \ + -bm25 -bm25.k1 4.46 -bm25.b 0.82 -bm25prf & ``` Evaluation can be performed using `trec_eval`: @@ -202,26 +244,57 @@ tools/eval/trec_eval.9.0.4/trec_eval -c -M 100 -m map src/main/resources/topics- tools/eval/trec_eval.9.0.4/trec_eval -c -m ndcg_cut.10 src/main/resources/topics-and-qrels/qrels.dl19-doc.txt runs/run.msmarco-doc.bm25-tuned+prf.topics.dl19-doc.txt tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.100 src/main/resources/topics-and-qrels/qrels.dl19-doc.txt runs/run.msmarco-doc.bm25-tuned+prf.topics.dl19-doc.txt tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.1000 src/main/resources/topics-and-qrels/qrels.dl19-doc.txt runs/run.msmarco-doc.bm25-tuned+prf.topics.dl19-doc.txt + +tools/eval/trec_eval.9.0.4/trec_eval -c -M 100 -m map src/main/resources/topics-and-qrels/qrels.dl19-doc.txt runs/run.msmarco-doc.bm25-tuned2.topics.dl19-doc.txt +tools/eval/trec_eval.9.0.4/trec_eval -c -m ndcg_cut.10 src/main/resources/topics-and-qrels/qrels.dl19-doc.txt runs/run.msmarco-doc.bm25-tuned2.topics.dl19-doc.txt +tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.100 src/main/resources/topics-and-qrels/qrels.dl19-doc.txt runs/run.msmarco-doc.bm25-tuned2.topics.dl19-doc.txt +tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.1000 src/main/resources/topics-and-qrels/qrels.dl19-doc.txt runs/run.msmarco-doc.bm25-tuned2.topics.dl19-doc.txt + +tools/eval/trec_eval.9.0.4/trec_eval -c -M 100 -m map src/main/resources/topics-and-qrels/qrels.dl19-doc.txt runs/run.msmarco-doc.bm25-tuned2+rm3.topics.dl19-doc.txt +tools/eval/trec_eval.9.0.4/trec_eval -c -m ndcg_cut.10 src/main/resources/topics-and-qrels/qrels.dl19-doc.txt runs/run.msmarco-doc.bm25-tuned2+rm3.topics.dl19-doc.txt +tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.100 src/main/resources/topics-and-qrels/qrels.dl19-doc.txt runs/run.msmarco-doc.bm25-tuned2+rm3.topics.dl19-doc.txt +tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.1000 src/main/resources/topics-and-qrels/qrels.dl19-doc.txt runs/run.msmarco-doc.bm25-tuned2+rm3.topics.dl19-doc.txt + +tools/eval/trec_eval.9.0.4/trec_eval -c -M 100 -m map src/main/resources/topics-and-qrels/qrels.dl19-doc.txt runs/run.msmarco-doc.bm25-tuned2+rocchio.topics.dl19-doc.txt +tools/eval/trec_eval.9.0.4/trec_eval -c -m ndcg_cut.10 src/main/resources/topics-and-qrels/qrels.dl19-doc.txt runs/run.msmarco-doc.bm25-tuned2+rocchio.topics.dl19-doc.txt +tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.100 src/main/resources/topics-and-qrels/qrels.dl19-doc.txt runs/run.msmarco-doc.bm25-tuned2+rocchio.topics.dl19-doc.txt +tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.1000 src/main/resources/topics-and-qrels/qrels.dl19-doc.txt runs/run.msmarco-doc.bm25-tuned2+rocchio.topics.dl19-doc.txt + +tools/eval/trec_eval.9.0.4/trec_eval -c -M 100 -m map src/main/resources/topics-and-qrels/qrels.dl19-doc.txt runs/run.msmarco-doc.bm25-tuned2+rocchio-neg.topics.dl19-doc.txt +tools/eval/trec_eval.9.0.4/trec_eval -c -m ndcg_cut.10 src/main/resources/topics-and-qrels/qrels.dl19-doc.txt runs/run.msmarco-doc.bm25-tuned2+rocchio-neg.topics.dl19-doc.txt +tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.100 src/main/resources/topics-and-qrels/qrels.dl19-doc.txt runs/run.msmarco-doc.bm25-tuned2+rocchio-neg.topics.dl19-doc.txt 
+tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.1000 src/main/resources/topics-and-qrels/qrels.dl19-doc.txt runs/run.msmarco-doc.bm25-tuned2+rocchio-neg.topics.dl19-doc.txt + +tools/eval/trec_eval.9.0.4/trec_eval -c -M 100 -m map src/main/resources/topics-and-qrels/qrels.dl19-doc.txt runs/run.msmarco-doc.bm25-tuned2+ax.topics.dl19-doc.txt +tools/eval/trec_eval.9.0.4/trec_eval -c -m ndcg_cut.10 src/main/resources/topics-and-qrels/qrels.dl19-doc.txt runs/run.msmarco-doc.bm25-tuned2+ax.topics.dl19-doc.txt +tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.100 src/main/resources/topics-and-qrels/qrels.dl19-doc.txt runs/run.msmarco-doc.bm25-tuned2+ax.topics.dl19-doc.txt +tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.1000 src/main/resources/topics-and-qrels/qrels.dl19-doc.txt runs/run.msmarco-doc.bm25-tuned2+ax.topics.dl19-doc.txt + +tools/eval/trec_eval.9.0.4/trec_eval -c -M 100 -m map src/main/resources/topics-and-qrels/qrels.dl19-doc.txt runs/run.msmarco-doc.bm25-tuned2+prf.topics.dl19-doc.txt +tools/eval/trec_eval.9.0.4/trec_eval -c -m ndcg_cut.10 src/main/resources/topics-and-qrels/qrels.dl19-doc.txt runs/run.msmarco-doc.bm25-tuned2+prf.topics.dl19-doc.txt +tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.100 src/main/resources/topics-and-qrels/qrels.dl19-doc.txt runs/run.msmarco-doc.bm25-tuned2+prf.topics.dl19-doc.txt +tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.1000 src/main/resources/topics-and-qrels/qrels.dl19-doc.txt runs/run.msmarco-doc.bm25-tuned2+prf.topics.dl19-doc.txt ``` ## Effectiveness With the above commands, you should be able to reproduce the following results: -| **AP@100** | **BM25 (default)**| **+RM3** | **+Rocchio**| **+Rocchio***| **+Ax** | **+PRF** | **BM25 (tuned)**| **+RM3** | **+Rocchio**| **+Rocchio***| **+Ax** | **+PRF** | -|:-------------------------------------------------------------------------------------------------------------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------| -| [DL19 (Doc)](https://trec.nist.gov/data/deep2019.html) | 0.2434 | 0.2774 | 0.2811 | 0.2813 | 0.2454 | 0.2541 | 0.2311 | 0.2684 | 0.2683 | 0.2670 | 0.2792 | 0.2774 | -| **nDCG@10** | **BM25 (default)**| **+RM3** | **+Rocchio**| **+Rocchio***| **+Ax** | **+PRF** | **BM25 (tuned)**| **+RM3** | **+Rocchio**| **+Rocchio***| **+Ax** | **+PRF** | -| [DL19 (Doc)](https://trec.nist.gov/data/deep2019.html) | 0.5176 | 0.5170 | 0.5256 | 0.5279 | 0.4732 | 0.5107 | 0.5139 | 0.5445 | 0.5445 | 0.5419 | 0.5203 | 0.5294 | -| **R@100** | **BM25 (default)**| **+RM3** | **+Rocchio**| **+Rocchio***| **+Ax** | **+PRF** | **BM25 (tuned)**| **+RM3** | **+Rocchio**| **+Rocchio***| **+Ax** | **+PRF** | -| [DL19 (Doc)](https://trec.nist.gov/data/deep2019.html) | 0.3949 | 0.4189 | 0.4261 | 0.4259 | 0.3946 | 0.4003 | 0.3853 | 0.4186 | 0.4254 | 0.4224 | 0.4378 | 0.4295 | -| **R@1000** | **BM25 (default)**| **+RM3** | **+Rocchio**| **+Rocchio***| **+Ax** | **+PRF** | **BM25 (tuned)**| **+RM3** | **+Rocchio**| **+Rocchio***| **+Ax** | **+PRF** | -| [DL19 (Doc)](https://trec.nist.gov/data/deep2019.html) | 0.6966 | 0.7503 | 0.7546 | 0.7530 | 0.7323 | 0.7357 | 0.6804 | 0.7288 | 0.7371 | 0.7376 | 0.7532 | 0.7559 | +| **AP@100** | **BM25 (default)**| **+RM3** | **+Rocchio**| **+Rocchio***| **+Ax** | **+PRF** | **BM25 (tuned)**| **+RM3** | **+Rocchio**| **+Rocchio***| **+Ax** | **+PRF** | **BM25 (tuned2)**| **+RM3** | **+Rocchio**| **+Rocchio***| **+Ax** | **+PRF** | 
+|:-------------------------------------------------------------------------------------------------------------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------| +| [DL19 (Doc)](https://trec.nist.gov/data/deep2019.html) | 0.2434 | 0.2774 | 0.2811 | 0.2813 | 0.2454 | 0.2541 | 0.2311 | 0.2684 | 0.2683 | 0.2670 | 0.2792 | 0.2774 | 0.2336 | 0.2643 | 0.2657 | 0.2670 | 0.2724 | 0.2815 | +| **nDCG@10** | **BM25 (default)**| **+RM3** | **+Rocchio**| **+Rocchio***| **+Ax** | **+PRF** | **BM25 (tuned)**| **+RM3** | **+Rocchio**| **+Rocchio***| **+Ax** | **+PRF** | **BM25 (tuned2)**| **+RM3** | **+Rocchio**| **+Rocchio***| **+Ax** | **+PRF** | +| [DL19 (Doc)](https://trec.nist.gov/data/deep2019.html) | 0.5176 | 0.5170 | 0.5256 | 0.5279 | 0.4732 | 0.5107 | 0.5139 | 0.5445 | 0.5445 | 0.5419 | 0.5203 | 0.5294 | 0.5233 | 0.5526 | 0.5584 | 0.5567 | 0.5093 | 0.5360 | +| **R@100** | **BM25 (default)**| **+RM3** | **+Rocchio**| **+Rocchio***| **+Ax** | **+PRF** | **BM25 (tuned)**| **+RM3** | **+Rocchio**| **+Rocchio***| **+Ax** | **+PRF** | **BM25 (tuned2)**| **+RM3** | **+Rocchio**| **+Rocchio***| **+Ax** | **+PRF** | +| [DL19 (Doc)](https://trec.nist.gov/data/deep2019.html) | 0.3949 | 0.4189 | 0.4261 | 0.4259 | 0.3946 | 0.4003 | 0.3853 | 0.4186 | 0.4254 | 0.4224 | 0.4378 | 0.4295 | 0.3849 | 0.4131 | 0.4164 | 0.4172 | 0.4332 | 0.4310 | +| **R@1000** | **BM25 (default)**| **+RM3** | **+Rocchio**| **+Rocchio***| **+Ax** | **+PRF** | **BM25 (tuned)**| **+RM3** | **+Rocchio**| **+Rocchio***| **+Ax** | **+PRF** | **BM25 (tuned2)**| **+RM3** | **+Rocchio**| **+Rocchio***| **+Ax** | **+PRF** | +| [DL19 (Doc)](https://trec.nist.gov/data/deep2019.html) | 0.6966 | 0.7503 | 0.7546 | 0.7530 | 0.7323 | 0.7357 | 0.6804 | 0.7288 | 0.7371 | 0.7376 | 0.7532 | 0.7559 | 0.6757 | 0.7189 | 0.7299 | 0.7312 | 0.7474 | 0.7577 | Explanation of settings: + The setting "default" refers the default BM25 settings of `k1=0.9`, `b=0.4`. -+ The setting "tuned" refers to `k1=3.44`, `b=0.87`, tuned using the MS MARCO document sparse judgments on 2019/06. ++ The setting "tuned" refers to `k1=3.44`, `b=0.87`, tuned in 2019/06 using the MS MARCO document sparse judgments to optimize for MAP and used for TREC 2019 Deep Learning Track baseline runs. ++ The setting "tuned2" refers to `k1=4.46`, `b=0.82`, tuned in 2020/12 using the MS MARCO document sparse judgments to optimize for recall@100 (i.e., for first-stage retrieval); see [this page](experiments-msmarco-doc.md) additional details. Settings tuned on the MS MARCO document sparse judgments _may not_ work well on the TREC dense judgments. 
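One note on the table: the +Rocchio* columns appear to correspond to the `rocchio-neg` runs above, which add `-rocchio.useNegative` (with `-rerankCutoff 1000`) so that low-ranked feedback documents also contribute, with a negative sign. In the classic formulation, with weights α, β, γ, the expanded query is:

```latex
\vec{q}_{\text{new}} = \alpha\,\vec{q}_{\text{orig}}
  + \beta\,\frac{1}{|D_r|}\sum_{\vec{d}\in D_r}\vec{d}
  - \gamma\,\frac{1}{|D_{nr}|}\sum_{\vec{d}\in D_{nr}}\vec{d}
```

Here D_r is the set of top-ranked (pseudo-relevant) documents and D_nr the set treated as non-relevant; the positive-only +Rocchio runs amount to setting γ = 0. How Anserini selects D_nr and its default weights are implementation details not covered on this page.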
diff --git a/docs/regressions-dl19-passage-docTTTTTquery.md b/docs/regressions-dl19-passage-docTTTTTquery.md index 28ea68eaea..130fb19ce9 100644 --- a/docs/regressions-dl19-passage-docTTTTTquery.md +++ b/docs/regressions-dl19-passage-docTTTTTquery.md @@ -59,6 +59,20 @@ target/appassembler/bin/SearchCollection \ -output runs/run.msmarco-passage-docTTTTTquery.bm25-default+rm3.topics.dl19-passage.txt \ -bm25 -rm3 & +target/appassembler/bin/SearchCollection \ + -index indexes/lucene-index.msmarco-passage-docTTTTTquery/ \ + -topics src/main/resources/topics-and-qrels/topics.dl19-passage.txt \ + -topicreader TsvInt \ + -output runs/run.msmarco-passage-docTTTTTquery.bm25-default+rocchio.topics.dl19-passage.txt \ + -bm25 -rocchio & + +target/appassembler/bin/SearchCollection \ + -index indexes/lucene-index.msmarco-passage-docTTTTTquery/ \ + -topics src/main/resources/topics-and-qrels/topics.dl19-passage.txt \ + -topicreader TsvInt \ + -output runs/run.msmarco-passage-docTTTTTquery.bm25-default+rocchio-neg.topics.dl19-passage.txt \ + -bm25 -rocchio -rocchio.useNegative -rerankCutoff 1000 & + target/appassembler/bin/SearchCollection \ -index indexes/lucene-index.msmarco-passage-docTTTTTquery/ \ -topics src/main/resources/topics-and-qrels/topics.dl19-passage.txt \ @@ -73,6 +87,20 @@ target/appassembler/bin/SearchCollection \ -output runs/run.msmarco-passage-docTTTTTquery.bm25-tuned+rm3.topics.dl19-passage.txt \ -bm25 -bm25.k1 0.82 -bm25.b 0.68 -rm3 & +target/appassembler/bin/SearchCollection \ + -index indexes/lucene-index.msmarco-passage-docTTTTTquery/ \ + -topics src/main/resources/topics-and-qrels/topics.dl19-passage.txt \ + -topicreader TsvInt \ + -output runs/run.msmarco-passage-docTTTTTquery.bm25-tuned+rocchio.topics.dl19-passage.txt \ + -bm25 -bm25.k1 0.82 -bm25.b 0.68 -rocchio & + +target/appassembler/bin/SearchCollection \ + -index indexes/lucene-index.msmarco-passage-docTTTTTquery/ \ + -topics src/main/resources/topics-and-qrels/topics.dl19-passage.txt \ + -topicreader TsvInt \ + -output runs/run.msmarco-passage-docTTTTTquery.bm25-tuned+rocchio-neg.topics.dl19-passage.txt \ + -bm25 -bm25.k1 0.82 -bm25.b 0.68 -rocchio -rocchio.useNegative -rerankCutoff 1000 & + target/appassembler/bin/SearchCollection \ -index indexes/lucene-index.msmarco-passage-docTTTTTquery/ \ -topics src/main/resources/topics-and-qrels/topics.dl19-passage.txt \ @@ -86,6 +114,20 @@ target/appassembler/bin/SearchCollection \ -topicreader TsvInt \ -output runs/run.msmarco-passage-docTTTTTquery.bm25-tuned2+rm3.topics.dl19-passage.txt \ -bm25 -bm25.k1 2.18 -bm25.b 0.86 -rm3 & + +target/appassembler/bin/SearchCollection \ + -index indexes/lucene-index.msmarco-passage-docTTTTTquery/ \ + -topics src/main/resources/topics-and-qrels/topics.dl19-passage.txt \ + -topicreader TsvInt \ + -output runs/run.msmarco-passage-docTTTTTquery.bm25-tuned2+rocchio.topics.dl19-passage.txt \ + -bm25 -bm25.k1 2.18 -bm25.b 0.86 -rocchio & + +target/appassembler/bin/SearchCollection \ + -index indexes/lucene-index.msmarco-passage-docTTTTTquery/ \ + -topics src/main/resources/topics-and-qrels/topics.dl19-passage.txt \ + -topicreader TsvInt \ + -output runs/run.msmarco-passage-docTTTTTquery.bm25-tuned2+rocchio-neg.topics.dl19-passage.txt \ + -bm25 -bm25.k1 2.18 -bm25.b 0.86 -rocchio -rocchio.useNegative -rerankCutoff 1000 & ``` Evaluation can be performed using `trec_eval`: @@ -101,6 +143,16 @@ tools/eval/trec_eval.9.0.4/trec_eval -m ndcg_cut.10 -c src/main/resources/topics tools/eval/trec_eval.9.0.4/trec_eval -m recall.100 -c -l 2 
src/main/resources/topics-and-qrels/qrels.dl19-passage.txt runs/run.msmarco-passage-docTTTTTquery.bm25-default+rm3.topics.dl19-passage.txt tools/eval/trec_eval.9.0.4/trec_eval -m recall.1000 -c -l 2 src/main/resources/topics-and-qrels/qrels.dl19-passage.txt runs/run.msmarco-passage-docTTTTTquery.bm25-default+rm3.topics.dl19-passage.txt +tools/eval/trec_eval.9.0.4/trec_eval -m map -c -l 2 src/main/resources/topics-and-qrels/qrels.dl19-passage.txt runs/run.msmarco-passage-docTTTTTquery.bm25-default+rocchio.topics.dl19-passage.txt +tools/eval/trec_eval.9.0.4/trec_eval -m ndcg_cut.10 -c src/main/resources/topics-and-qrels/qrels.dl19-passage.txt runs/run.msmarco-passage-docTTTTTquery.bm25-default+rocchio.topics.dl19-passage.txt +tools/eval/trec_eval.9.0.4/trec_eval -m recall.100 -c -l 2 src/main/resources/topics-and-qrels/qrels.dl19-passage.txt runs/run.msmarco-passage-docTTTTTquery.bm25-default+rocchio.topics.dl19-passage.txt +tools/eval/trec_eval.9.0.4/trec_eval -m recall.1000 -c -l 2 src/main/resources/topics-and-qrels/qrels.dl19-passage.txt runs/run.msmarco-passage-docTTTTTquery.bm25-default+rocchio.topics.dl19-passage.txt + +tools/eval/trec_eval.9.0.4/trec_eval -m map -c -l 2 src/main/resources/topics-and-qrels/qrels.dl19-passage.txt runs/run.msmarco-passage-docTTTTTquery.bm25-default+rocchio-neg.topics.dl19-passage.txt +tools/eval/trec_eval.9.0.4/trec_eval -m ndcg_cut.10 -c src/main/resources/topics-and-qrels/qrels.dl19-passage.txt runs/run.msmarco-passage-docTTTTTquery.bm25-default+rocchio-neg.topics.dl19-passage.txt +tools/eval/trec_eval.9.0.4/trec_eval -m recall.100 -c -l 2 src/main/resources/topics-and-qrels/qrels.dl19-passage.txt runs/run.msmarco-passage-docTTTTTquery.bm25-default+rocchio-neg.topics.dl19-passage.txt +tools/eval/trec_eval.9.0.4/trec_eval -m recall.1000 -c -l 2 src/main/resources/topics-and-qrels/qrels.dl19-passage.txt runs/run.msmarco-passage-docTTTTTquery.bm25-default+rocchio-neg.topics.dl19-passage.txt + tools/eval/trec_eval.9.0.4/trec_eval -m map -c -l 2 src/main/resources/topics-and-qrels/qrels.dl19-passage.txt runs/run.msmarco-passage-docTTTTTquery.bm25-tuned.topics.dl19-passage.txt tools/eval/trec_eval.9.0.4/trec_eval -m ndcg_cut.10 -c src/main/resources/topics-and-qrels/qrels.dl19-passage.txt runs/run.msmarco-passage-docTTTTTquery.bm25-tuned.topics.dl19-passage.txt tools/eval/trec_eval.9.0.4/trec_eval -m recall.100 -c -l 2 src/main/resources/topics-and-qrels/qrels.dl19-passage.txt runs/run.msmarco-passage-docTTTTTquery.bm25-tuned.topics.dl19-passage.txt @@ -111,6 +163,16 @@ tools/eval/trec_eval.9.0.4/trec_eval -m ndcg_cut.10 -c src/main/resources/topics tools/eval/trec_eval.9.0.4/trec_eval -m recall.100 -c -l 2 src/main/resources/topics-and-qrels/qrels.dl19-passage.txt runs/run.msmarco-passage-docTTTTTquery.bm25-tuned+rm3.topics.dl19-passage.txt tools/eval/trec_eval.9.0.4/trec_eval -m recall.1000 -c -l 2 src/main/resources/topics-and-qrels/qrels.dl19-passage.txt runs/run.msmarco-passage-docTTTTTquery.bm25-tuned+rm3.topics.dl19-passage.txt +tools/eval/trec_eval.9.0.4/trec_eval -m map -c -l 2 src/main/resources/topics-and-qrels/qrels.dl19-passage.txt runs/run.msmarco-passage-docTTTTTquery.bm25-tuned+rocchio.topics.dl19-passage.txt +tools/eval/trec_eval.9.0.4/trec_eval -m ndcg_cut.10 -c src/main/resources/topics-and-qrels/qrels.dl19-passage.txt runs/run.msmarco-passage-docTTTTTquery.bm25-tuned+rocchio.topics.dl19-passage.txt +tools/eval/trec_eval.9.0.4/trec_eval -m recall.100 -c -l 2 src/main/resources/topics-and-qrels/qrels.dl19-passage.txt 
runs/run.msmarco-passage-docTTTTTquery.bm25-tuned+rocchio.topics.dl19-passage.txt +tools/eval/trec_eval.9.0.4/trec_eval -m recall.1000 -c -l 2 src/main/resources/topics-and-qrels/qrels.dl19-passage.txt runs/run.msmarco-passage-docTTTTTquery.bm25-tuned+rocchio.topics.dl19-passage.txt + +tools/eval/trec_eval.9.0.4/trec_eval -m map -c -l 2 src/main/resources/topics-and-qrels/qrels.dl19-passage.txt runs/run.msmarco-passage-docTTTTTquery.bm25-tuned+rocchio-neg.topics.dl19-passage.txt +tools/eval/trec_eval.9.0.4/trec_eval -m ndcg_cut.10 -c src/main/resources/topics-and-qrels/qrels.dl19-passage.txt runs/run.msmarco-passage-docTTTTTquery.bm25-tuned+rocchio-neg.topics.dl19-passage.txt +tools/eval/trec_eval.9.0.4/trec_eval -m recall.100 -c -l 2 src/main/resources/topics-and-qrels/qrels.dl19-passage.txt runs/run.msmarco-passage-docTTTTTquery.bm25-tuned+rocchio-neg.topics.dl19-passage.txt +tools/eval/trec_eval.9.0.4/trec_eval -m recall.1000 -c -l 2 src/main/resources/topics-and-qrels/qrels.dl19-passage.txt runs/run.msmarco-passage-docTTTTTquery.bm25-tuned+rocchio-neg.topics.dl19-passage.txt + tools/eval/trec_eval.9.0.4/trec_eval -m map -c -l 2 src/main/resources/topics-and-qrels/qrels.dl19-passage.txt runs/run.msmarco-passage-docTTTTTquery.bm25-tuned2.topics.dl19-passage.txt tools/eval/trec_eval.9.0.4/trec_eval -m ndcg_cut.10 -c src/main/resources/topics-and-qrels/qrels.dl19-passage.txt runs/run.msmarco-passage-docTTTTTquery.bm25-tuned2.topics.dl19-passage.txt tools/eval/trec_eval.9.0.4/trec_eval -m recall.100 -c -l 2 src/main/resources/topics-and-qrels/qrels.dl19-passage.txt runs/run.msmarco-passage-docTTTTTquery.bm25-tuned2.topics.dl19-passage.txt @@ -120,21 +182,31 @@ tools/eval/trec_eval.9.0.4/trec_eval -m map -c -l 2 src/main/resources/topics-an tools/eval/trec_eval.9.0.4/trec_eval -m ndcg_cut.10 -c src/main/resources/topics-and-qrels/qrels.dl19-passage.txt runs/run.msmarco-passage-docTTTTTquery.bm25-tuned2+rm3.topics.dl19-passage.txt tools/eval/trec_eval.9.0.4/trec_eval -m recall.100 -c -l 2 src/main/resources/topics-and-qrels/qrels.dl19-passage.txt runs/run.msmarco-passage-docTTTTTquery.bm25-tuned2+rm3.topics.dl19-passage.txt tools/eval/trec_eval.9.0.4/trec_eval -m recall.1000 -c -l 2 src/main/resources/topics-and-qrels/qrels.dl19-passage.txt runs/run.msmarco-passage-docTTTTTquery.bm25-tuned2+rm3.topics.dl19-passage.txt + +tools/eval/trec_eval.9.0.4/trec_eval -m map -c -l 2 src/main/resources/topics-and-qrels/qrels.dl19-passage.txt runs/run.msmarco-passage-docTTTTTquery.bm25-tuned2+rocchio.topics.dl19-passage.txt +tools/eval/trec_eval.9.0.4/trec_eval -m ndcg_cut.10 -c src/main/resources/topics-and-qrels/qrels.dl19-passage.txt runs/run.msmarco-passage-docTTTTTquery.bm25-tuned2+rocchio.topics.dl19-passage.txt +tools/eval/trec_eval.9.0.4/trec_eval -m recall.100 -c -l 2 src/main/resources/topics-and-qrels/qrels.dl19-passage.txt runs/run.msmarco-passage-docTTTTTquery.bm25-tuned2+rocchio.topics.dl19-passage.txt +tools/eval/trec_eval.9.0.4/trec_eval -m recall.1000 -c -l 2 src/main/resources/topics-and-qrels/qrels.dl19-passage.txt runs/run.msmarco-passage-docTTTTTquery.bm25-tuned2+rocchio.topics.dl19-passage.txt + +tools/eval/trec_eval.9.0.4/trec_eval -m map -c -l 2 src/main/resources/topics-and-qrels/qrels.dl19-passage.txt runs/run.msmarco-passage-docTTTTTquery.bm25-tuned2+rocchio-neg.topics.dl19-passage.txt +tools/eval/trec_eval.9.0.4/trec_eval -m ndcg_cut.10 -c src/main/resources/topics-and-qrels/qrels.dl19-passage.txt 
runs/run.msmarco-passage-docTTTTTquery.bm25-tuned2+rocchio-neg.topics.dl19-passage.txt +tools/eval/trec_eval.9.0.4/trec_eval -m recall.100 -c -l 2 src/main/resources/topics-and-qrels/qrels.dl19-passage.txt runs/run.msmarco-passage-docTTTTTquery.bm25-tuned2+rocchio-neg.topics.dl19-passage.txt +tools/eval/trec_eval.9.0.4/trec_eval -m recall.1000 -c -l 2 src/main/resources/topics-and-qrels/qrels.dl19-passage.txt runs/run.msmarco-passage-docTTTTTquery.bm25-tuned2+rocchio-neg.topics.dl19-passage.txt ``` ## Effectiveness With the above commands, you should be able to reproduce the following results: -| **AP@1000** | **BM25 (default)**| **+RM3** | **BM25 (tuned)**| **+RM3** | **BM25 (tuned2)**| **+RM3** | -|:-------------------------------------------------------------------------------------------------------------|-----------|-----------|-----------|-----------|-----------|-----------| -| [DL19 (Passage)](https://trec.nist.gov/data/deep2019.html) | 0.4034 | 0.4485 | 0.4052 | 0.4520 | 0.4046 | 0.4360 | -| **nDCG@10** | **BM25 (default)**| **+RM3** | **BM25 (tuned)**| **+RM3** | **BM25 (tuned2)**| **+RM3** | -| [DL19 (Passage)](https://trec.nist.gov/data/deep2019.html) | 0.6417 | 0.6548 | 0.6482 | 0.6614 | 0.6336 | 0.6528 | -| **R@100** | **BM25 (default)**| **+RM3** | **BM25 (tuned)**| **+RM3** | **BM25 (tuned2)**| **+RM3** | -| [DL19 (Passage)](https://trec.nist.gov/data/deep2019.html) | 0.5917 | 0.6335 | 0.5910 | 0.6402 | 0.5880 | 0.6046 | -| **R@1000** | **BM25 (default)**| **+RM3** | **BM25 (tuned)**| **+RM3** | **BM25 (tuned2)**| **+RM3** | -| [DL19 (Passage)](https://trec.nist.gov/data/deep2019.html) | 0.8310 | 0.8861 | 0.8269 | 0.8826 | 0.8134 | 0.8424 | +| **AP@1000** | **BM25 (default)**| **+RM3** | **+Rocchio**| **+Rocchio***| **BM25 (tuned)**| **+RM3** | **+Rocchio**| **+Rocchio***| **BM25 (tuned2)**| **+RM3** | **+Rocchio**| **+Rocchio***| +|:-------------------------------------------------------------------------------------------------------------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------| +| [DL19 (Passage)](https://trec.nist.gov/data/deep2019.html) | 0.4034 | 0.4485 | 0.4469 | 0.4441 | 0.4052 | 0.4520 | 0.4525 | 0.4511 | 0.4046 | 0.4360 | 0.4339 | 0.4338 | +| **nDCG@10** | **BM25 (default)**| **+RM3** | **+Rocchio**| **+Rocchio***| **BM25 (tuned)**| **+RM3** | **+Rocchio**| **+Rocchio***| **BM25 (tuned2)**| **+RM3** | **+Rocchio**| **+Rocchio***| +| [DL19 (Passage)](https://trec.nist.gov/data/deep2019.html) | 0.6417 | 0.6548 | 0.6538 | 0.6444 | 0.6482 | 0.6614 | 0.6617 | 0.6591 | 0.6336 | 0.6528 | 0.6559 | 0.6558 | +| **R@100** | **BM25 (default)**| **+RM3** | **+Rocchio**| **+Rocchio***| **BM25 (tuned)**| **+RM3** | **+Rocchio**| **+Rocchio***| **BM25 (tuned2)**| **+RM3** | **+Rocchio**| **+Rocchio***| +| [DL19 (Passage)](https://trec.nist.gov/data/deep2019.html) | 0.5917 | 0.6335 | 0.6338 | 0.6350 | 0.5910 | 0.6402 | 0.6406 | 0.6395 | 0.5880 | 0.6046 | 0.6014 | 0.5981 | +| **R@1000** | **BM25 (default)**| **+RM3** | **+Rocchio**| **+Rocchio***| **BM25 (tuned)**| **+RM3** | **+Rocchio**| **+Rocchio***| **BM25 (tuned2)**| **+RM3** | **+Rocchio**| **+Rocchio***| +| [DL19 (Passage)](https://trec.nist.gov/data/deep2019.html) | 0.8310 | 0.8861 | 0.8855 | 0.8861 | 0.8269 | 0.8826 | 0.8838 | 0.8877 | 0.8134 | 0.8424 | 0.8465 | 0.8488 | Explanation of settings: diff --git a/docs/regressions-dl20-doc-docTTTTTquery.md b/docs/regressions-dl20-doc-docTTTTTquery.md index 
c8b4723d54..4a2688021a 100644 --- a/docs/regressions-dl20-doc-docTTTTTquery.md +++ b/docs/regressions-dl20-doc-docTTTTTquery.md @@ -69,6 +69,13 @@ target/appassembler/bin/SearchCollection \ -output runs/run.msmarco-doc-docTTTTTquery.bm25-default+rm3.topics.dl20.txt \ -bm25 -rm3 & +target/appassembler/bin/SearchCollection \ + -index indexes/lucene-index.msmarco-doc-docTTTTTquery/ \ + -topics src/main/resources/topics-and-qrels/topics.dl20.txt \ + -topicreader TsvInt \ + -output runs/run.msmarco-doc-docTTTTTquery.bm25-default+rocchio.topics.dl20.txt \ + -bm25 -rocchio & + target/appassembler/bin/SearchCollection \ -index indexes/lucene-index.msmarco-doc-docTTTTTquery/ \ -topics src/main/resources/topics-and-qrels/topics.dl20.txt \ @@ -82,6 +89,13 @@ target/appassembler/bin/SearchCollection \ -topicreader TsvInt \ -output runs/run.msmarco-doc-docTTTTTquery.bm25-tuned+rm3.topics.dl20.txt \ -bm25 -bm25.k1 4.68 -bm25.b 0.87 -rm3 & + +target/appassembler/bin/SearchCollection \ + -index indexes/lucene-index.msmarco-doc-docTTTTTquery/ \ + -topics src/main/resources/topics-and-qrels/topics.dl20.txt \ + -topicreader TsvInt \ + -output runs/run.msmarco-doc-docTTTTTquery.bm25-tuned+rocchio.topics.dl20.txt \ + -bm25 -bm25.k1 4.68 -bm25.b 0.87 -rocchio & ``` Evaluation can be performed using `trec_eval`: @@ -97,6 +111,11 @@ tools/eval/trec_eval.9.0.4/trec_eval -c -m ndcg_cut.10 src/main/resources/topics tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.100 src/main/resources/topics-and-qrels/qrels.dl20-doc.txt runs/run.msmarco-doc-docTTTTTquery.bm25-default+rm3.topics.dl20.txt tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.1000 src/main/resources/topics-and-qrels/qrels.dl20-doc.txt runs/run.msmarco-doc-docTTTTTquery.bm25-default+rm3.topics.dl20.txt +tools/eval/trec_eval.9.0.4/trec_eval -c -M 100 -m map src/main/resources/topics-and-qrels/qrels.dl20-doc.txt runs/run.msmarco-doc-docTTTTTquery.bm25-default+rocchio.topics.dl20.txt +tools/eval/trec_eval.9.0.4/trec_eval -c -m ndcg_cut.10 src/main/resources/topics-and-qrels/qrels.dl20-doc.txt runs/run.msmarco-doc-docTTTTTquery.bm25-default+rocchio.topics.dl20.txt +tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.100 src/main/resources/topics-and-qrels/qrels.dl20-doc.txt runs/run.msmarco-doc-docTTTTTquery.bm25-default+rocchio.topics.dl20.txt +tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.1000 src/main/resources/topics-and-qrels/qrels.dl20-doc.txt runs/run.msmarco-doc-docTTTTTquery.bm25-default+rocchio.topics.dl20.txt + tools/eval/trec_eval.9.0.4/trec_eval -c -M 100 -m map src/main/resources/topics-and-qrels/qrels.dl20-doc.txt runs/run.msmarco-doc-docTTTTTquery.bm25-tuned.topics.dl20.txt tools/eval/trec_eval.9.0.4/trec_eval -c -m ndcg_cut.10 src/main/resources/topics-and-qrels/qrels.dl20-doc.txt runs/run.msmarco-doc-docTTTTTquery.bm25-tuned.topics.dl20.txt tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.100 src/main/resources/topics-and-qrels/qrels.dl20-doc.txt runs/run.msmarco-doc-docTTTTTquery.bm25-tuned.topics.dl20.txt @@ -106,26 +125,31 @@ tools/eval/trec_eval.9.0.4/trec_eval -c -M 100 -m map src/main/resources/topics- tools/eval/trec_eval.9.0.4/trec_eval -c -m ndcg_cut.10 src/main/resources/topics-and-qrels/qrels.dl20-doc.txt runs/run.msmarco-doc-docTTTTTquery.bm25-tuned+rm3.topics.dl20.txt tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.100 src/main/resources/topics-and-qrels/qrels.dl20-doc.txt runs/run.msmarco-doc-docTTTTTquery.bm25-tuned+rm3.topics.dl20.txt tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.1000 
src/main/resources/topics-and-qrels/qrels.dl20-doc.txt runs/run.msmarco-doc-docTTTTTquery.bm25-tuned+rm3.topics.dl20.txt + +tools/eval/trec_eval.9.0.4/trec_eval -c -M 100 -m map src/main/resources/topics-and-qrels/qrels.dl20-doc.txt runs/run.msmarco-doc-docTTTTTquery.bm25-tuned+rocchio.topics.dl20.txt +tools/eval/trec_eval.9.0.4/trec_eval -c -m ndcg_cut.10 src/main/resources/topics-and-qrels/qrels.dl20-doc.txt runs/run.msmarco-doc-docTTTTTquery.bm25-tuned+rocchio.topics.dl20.txt +tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.100 src/main/resources/topics-and-qrels/qrels.dl20-doc.txt runs/run.msmarco-doc-docTTTTTquery.bm25-tuned+rocchio.topics.dl20.txt +tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.1000 src/main/resources/topics-and-qrels/qrels.dl20-doc.txt runs/run.msmarco-doc-docTTTTTquery.bm25-tuned+rocchio.topics.dl20.txt ``` ## Effectiveness With the above commands, you should be able to reproduce the following results: -| **AP@100** | **BM25 (default)**| **+RM3** | **BM25 (tuned)**| **+RM3** | -|:-------------------------------------------------------------------------------------------------------------|-----------|-----------|-----------|-----------| -| [DL20 (Doc)](https://trec.nist.gov/data/deep2020.html) | 0.4230 | 0.4229 | 0.4099 | 0.4104 | -| **nDCG@10** | **BM25 (default)**| **+RM3** | **BM25 (tuned)**| **+RM3** | -| [DL20 (Doc)](https://trec.nist.gov/data/deep2020.html) | 0.5885 | 0.5407 | 0.5852 | 0.5743 | -| **R@100** | **BM25 (default)**| **+RM3** | **BM25 (tuned)**| **+RM3** | -| [DL20 (Doc)](https://trec.nist.gov/data/deep2020.html) | 0.6414 | 0.6555 | 0.6178 | 0.6127 | -| **R@1000** | **BM25 (default)**| **+RM3** | **BM25 (tuned)**| **+RM3** | -| [DL20 (Doc)](https://trec.nist.gov/data/deep2020.html) | 0.8403 | 0.8596 | 0.8105 | 0.8240 | +| **AP@100** | **BM25 (default)**| **+RM3** | **+Rocchio**| **BM25 (tuned)**| **+RM3** | **+Rocchio**| +|:-------------------------------------------------------------------------------------------------------------|-----------|-----------|-----------|-----------|-----------|-----------| +| [DL20 (Doc)](https://trec.nist.gov/data/deep2020.html) | 0.4230 | 0.4229 | 0.4218 | 0.4099 | 0.4104 | 0.4151 | +| **nDCG@10** | **BM25 (default)**| **+RM3** | **+Rocchio**| **BM25 (tuned)**| **+RM3** | **+Rocchio**| +| [DL20 (Doc)](https://trec.nist.gov/data/deep2020.html) | 0.5885 | 0.5407 | 0.5416 | 0.5852 | 0.5743 | 0.5733 | +| **R@100** | **BM25 (default)**| **+RM3** | **+Rocchio**| **BM25 (tuned)**| **+RM3** | **+Rocchio**| +| [DL20 (Doc)](https://trec.nist.gov/data/deep2020.html) | 0.6414 | 0.6555 | 0.6627 | 0.6178 | 0.6127 | 0.6230 | +| **R@1000** | **BM25 (default)**| **+RM3** | **+Rocchio**| **BM25 (tuned)**| **+RM3** | **+Rocchio**| +| [DL20 (Doc)](https://trec.nist.gov/data/deep2020.html) | 0.8403 | 0.8596 | 0.8641 | 0.8105 | 0.8240 | 0.8316 | Explanation of settings: + The setting "default" refers the default BM25 settings of `k1=0.9`, `b=0.4`. -+ The setting "tuned" refers to `k1=4.68`, `b=0.87`, tuned using the MS MARCO document sparse judgments to optimize for recall@100 (i.e., for first-stage retrieval) on 2019/12. ++ The setting "tuned" refers to `k1=4.68`, `b=0.87`, tuned in 2020/12 using the MS MARCO document sparse judgments to optimize for recall@100 (i.e., for first-stage retrieval). Settings tuned on the MS MARCO document sparse judgments _may not_ work well on the TREC dense judgments. 
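As background for the `-rocchio` and `-rocchio.useNegative` conditions added throughout this patch, the sketch below shows the classic Rocchio update these runs are based on; it is illustrative only, and Anserini's exact parameter names, weights, and defaults are not asserted here. The initial BM25 ranking supplies a pool of feedback documents (capped by `-rerankCutoff` where specified), the query vector is pulled toward the centroid of the top-ranked (pseudo-relevant) documents, and the `useNegative` variant additionally subtracts a centroid of pseudo-non-relevant documents, presumably drawn from lower in the reranking pool:

$$
\vec{q}\,' \;=\; \alpha\,\vec{q} \;+\; \beta\,\frac{1}{|D_r|}\sum_{d \in D_r}\vec{d} \;-\; \gamma\,\frac{1}{|D_{nr}|}\sum_{d \in D_{nr}}\vec{d}
$$

Here $D_r$ and $D_{nr}$ denote the pseudo-relevant and pseudo-non-relevant feedback sets and $\alpha$, $\beta$, $\gamma$ are mixing weights. With $\gamma = 0$ this reduces to the plain `-rocchio` condition; the `-rocchio -rocchio.useNegative -rerankCutoff 1000` runs (reported as "+Rocchio*" in the tables where they appear) also use the negative term.
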
diff --git a/docs/regressions-dl20-doc-segmented-docTTTTTquery.md b/docs/regressions-dl20-doc-segmented-docTTTTTquery.md index 5ceeed476a..59f67d6114 100644 --- a/docs/regressions-dl20-doc-segmented-docTTTTTquery.md +++ b/docs/regressions-dl20-doc-segmented-docTTTTTquery.md @@ -70,6 +70,13 @@ target/appassembler/bin/SearchCollection \ -output runs/run.msmarco-doc-segmented-docTTTTTquery.bm25-default+rm3.topics.dl20.txt \ -bm25 -rm3 -hits 10000 -selectMaxPassage -selectMaxPassage.delimiter "#" -selectMaxPassage.hits 1000 & +target/appassembler/bin/SearchCollection \ + -index indexes/lucene-index.msmarco-doc-segmented-docTTTTTquery/ \ + -topics src/main/resources/topics-and-qrels/topics.dl20.txt \ + -topicreader TsvInt \ + -output runs/run.msmarco-doc-segmented-docTTTTTquery.bm25-default+rocchio.topics.dl20.txt \ + -bm25 -rocchio -hits 10000 -selectMaxPassage -selectMaxPassage.delimiter "#" -selectMaxPassage.hits 1000 & + target/appassembler/bin/SearchCollection \ -index indexes/lucene-index.msmarco-doc-segmented-docTTTTTquery/ \ -topics src/main/resources/topics-and-qrels/topics.dl20.txt \ @@ -83,6 +90,13 @@ target/appassembler/bin/SearchCollection \ -topicreader TsvInt \ -output runs/run.msmarco-doc-segmented-docTTTTTquery.bm25-tuned+rm3.topics.dl20.txt \ -bm25 -bm25.k1 2.56 -bm25.b 0.59 -rm3 -hits 10000 -selectMaxPassage -selectMaxPassage.delimiter "#" -selectMaxPassage.hits 1000 & + +target/appassembler/bin/SearchCollection \ + -index indexes/lucene-index.msmarco-doc-segmented-docTTTTTquery/ \ + -topics src/main/resources/topics-and-qrels/topics.dl20.txt \ + -topicreader TsvInt \ + -output runs/run.msmarco-doc-segmented-docTTTTTquery.bm25-tuned+rocchio.topics.dl20.txt \ + -bm25 -bm25.k1 2.56 -bm25.b 0.59 -rocchio -hits 10000 -selectMaxPassage -selectMaxPassage.delimiter "#" -selectMaxPassage.hits 1000 & ``` Evaluation can be performed using `trec_eval`: @@ -98,6 +112,11 @@ tools/eval/trec_eval.9.0.4/trec_eval -c -m ndcg_cut.10 src/main/resources/topics tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.100 src/main/resources/topics-and-qrels/qrels.dl20-doc.txt runs/run.msmarco-doc-segmented-docTTTTTquery.bm25-default+rm3.topics.dl20.txt tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.1000 src/main/resources/topics-and-qrels/qrels.dl20-doc.txt runs/run.msmarco-doc-segmented-docTTTTTquery.bm25-default+rm3.topics.dl20.txt +tools/eval/trec_eval.9.0.4/trec_eval -c -M 100 -m map src/main/resources/topics-and-qrels/qrels.dl20-doc.txt runs/run.msmarco-doc-segmented-docTTTTTquery.bm25-default+rocchio.topics.dl20.txt +tools/eval/trec_eval.9.0.4/trec_eval -c -m ndcg_cut.10 src/main/resources/topics-and-qrels/qrels.dl20-doc.txt runs/run.msmarco-doc-segmented-docTTTTTquery.bm25-default+rocchio.topics.dl20.txt +tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.100 src/main/resources/topics-and-qrels/qrels.dl20-doc.txt runs/run.msmarco-doc-segmented-docTTTTTquery.bm25-default+rocchio.topics.dl20.txt +tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.1000 src/main/resources/topics-and-qrels/qrels.dl20-doc.txt runs/run.msmarco-doc-segmented-docTTTTTquery.bm25-default+rocchio.topics.dl20.txt + tools/eval/trec_eval.9.0.4/trec_eval -c -M 100 -m map src/main/resources/topics-and-qrels/qrels.dl20-doc.txt runs/run.msmarco-doc-segmented-docTTTTTquery.bm25-tuned.topics.dl20.txt tools/eval/trec_eval.9.0.4/trec_eval -c -m ndcg_cut.10 src/main/resources/topics-and-qrels/qrels.dl20-doc.txt runs/run.msmarco-doc-segmented-docTTTTTquery.bm25-tuned.topics.dl20.txt tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.100 
src/main/resources/topics-and-qrels/qrels.dl20-doc.txt runs/run.msmarco-doc-segmented-docTTTTTquery.bm25-tuned.topics.dl20.txt @@ -107,26 +126,31 @@ tools/eval/trec_eval.9.0.4/trec_eval -c -M 100 -m map src/main/resources/topics- tools/eval/trec_eval.9.0.4/trec_eval -c -m ndcg_cut.10 src/main/resources/topics-and-qrels/qrels.dl20-doc.txt runs/run.msmarco-doc-segmented-docTTTTTquery.bm25-tuned+rm3.topics.dl20.txt tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.100 src/main/resources/topics-and-qrels/qrels.dl20-doc.txt runs/run.msmarco-doc-segmented-docTTTTTquery.bm25-tuned+rm3.topics.dl20.txt tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.1000 src/main/resources/topics-and-qrels/qrels.dl20-doc.txt runs/run.msmarco-doc-segmented-docTTTTTquery.bm25-tuned+rm3.topics.dl20.txt + +tools/eval/trec_eval.9.0.4/trec_eval -c -M 100 -m map src/main/resources/topics-and-qrels/qrels.dl20-doc.txt runs/run.msmarco-doc-segmented-docTTTTTquery.bm25-tuned+rocchio.topics.dl20.txt +tools/eval/trec_eval.9.0.4/trec_eval -c -m ndcg_cut.10 src/main/resources/topics-and-qrels/qrels.dl20-doc.txt runs/run.msmarco-doc-segmented-docTTTTTquery.bm25-tuned+rocchio.topics.dl20.txt +tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.100 src/main/resources/topics-and-qrels/qrels.dl20-doc.txt runs/run.msmarco-doc-segmented-docTTTTTquery.bm25-tuned+rocchio.topics.dl20.txt +tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.1000 src/main/resources/topics-and-qrels/qrels.dl20-doc.txt runs/run.msmarco-doc-segmented-docTTTTTquery.bm25-tuned+rocchio.topics.dl20.txt ``` ## Effectiveness With the above commands, you should be able to reproduce the following results: -| **AP@100** | **BM25 (default)**| **+RM3** | **BM25 (tuned)**| **+RM3** | -|:-------------------------------------------------------------------------------------------------------------|-----------|-----------|-----------|-----------| -| [DL20 (Doc)](https://trec.nist.gov/data/deep2020.html) | 0.4150 | 0.4268 | 0.4047 | 0.4025 | -| **nDCG@10** | **BM25 (default)**| **+RM3** | **BM25 (tuned)**| **+RM3** | -| [DL20 (Doc)](https://trec.nist.gov/data/deep2020.html) | 0.5957 | 0.5850 | 0.5943 | 0.5724 | -| **R@100** | **BM25 (default)**| **+RM3** | **BM25 (tuned)**| **+RM3** | -| [DL20 (Doc)](https://trec.nist.gov/data/deep2020.html) | 0.6201 | 0.6443 | 0.6195 | 0.6394 | -| **R@1000** | **BM25 (default)**| **+RM3** | **BM25 (tuned)**| **+RM3** | -| [DL20 (Doc)](https://trec.nist.gov/data/deep2020.html) | 0.8046 | 0.8270 | 0.7968 | 0.8172 | +| **AP@100** | **BM25 (default)**| **+RM3** | **+Rocchio**| **BM25 (tuned)**| **+RM3** | **+Rocchio**| +|:-------------------------------------------------------------------------------------------------------------|-----------|-----------|-----------|-----------|-----------|-----------| +| [DL20 (Doc)](https://trec.nist.gov/data/deep2020.html) | 0.4150 | 0.4268 | 0.4297 | 0.4047 | 0.4025 | 0.4084 | +| **nDCG@10** | **BM25 (default)**| **+RM3** | **+Rocchio**| **BM25 (tuned)**| **+RM3** | **+Rocchio**| +| [DL20 (Doc)](https://trec.nist.gov/data/deep2020.html) | 0.5957 | 0.5850 | 0.5873 | 0.5943 | 0.5724 | 0.5809 | +| **R@100** | **BM25 (default)**| **+RM3** | **+Rocchio**| **BM25 (tuned)**| **+RM3** | **+Rocchio**| +| [DL20 (Doc)](https://trec.nist.gov/data/deep2020.html) | 0.6201 | 0.6443 | 0.6475 | 0.6195 | 0.6394 | 0.6432 | +| **R@1000** | **BM25 (default)**| **+RM3** | **+Rocchio**| **BM25 (tuned)**| **+RM3** | **+Rocchio**| +| [DL20 (Doc)](https://trec.nist.gov/data/deep2020.html) | 0.8046 | 0.8270 | 0.8365 | 0.7968 | 0.8172 | 
0.8233 | Explanation of settings: + The setting "default" refers the default BM25 settings of `k1=0.9`, `b=0.4`. -+ The setting "tuned" refers to `k1=2.56`, `b=0.59`, tuned using the MS MARCO document sparse judgments to optimize for recall@100 (i.e., for first-stage retrieval) on 2019/12. ++ The setting "tuned" refers to `k1=2.56`, `b=0.59`, tuned in 2020/12 using the MS MARCO document sparse judgments to optimize for recall@100 (i.e., for first-stage retrieval). Settings tuned on the MS MARCO document sparse judgments _may not_ work well on the TREC dense judgments. diff --git a/docs/regressions-dl20-doc-segmented.md b/docs/regressions-dl20-doc-segmented.md index c9f7c54bb6..6aefffa2b1 100644 --- a/docs/regressions-dl20-doc-segmented.md +++ b/docs/regressions-dl20-doc-segmented.md @@ -222,7 +222,7 @@ With the above commands, you should be able to reproduce the following results: Explanation of settings: + The setting "default" refers the default BM25 settings of `k1=0.9`, `b=0.4`. -+ The setting "tuned" refers to `k1=2.16`, `b=0.61`, tuned using the MS MARCO document sparse judgments to optimize for recall@100 (i.e., for first-stage retrieval) on 2020/12. ++ The setting "tuned" refers to `k1=2.16`, `b=0.61`, tuned in 2020/12 using the MS MARCO document sparse judgments to optimize for recall@100 (i.e., for first-stage retrieval). Settings tuned on the MS MARCO document sparse judgments _may not_ work well on the TREC dense judgments. diff --git a/docs/regressions-dl20-doc.md b/docs/regressions-dl20-doc.md index c7cee750ec..d10923e7fa 100644 --- a/docs/regressions-dl20-doc.md +++ b/docs/regressions-dl20-doc.md @@ -83,6 +83,20 @@ target/appassembler/bin/SearchCollection \ -output runs/run.msmacro-doc.bm25-default+rocchio-neg.topics.dl20.txt \ -bm25 -rocchio -rocchio.useNegative -rerankCutoff 1000 & +target/appassembler/bin/SearchCollection \ + -index indexes/lucene-index.msmarco-doc/ \ + -topics src/main/resources/topics-and-qrels/topics.dl20.txt \ + -topicreader TsvInt \ + -output runs/run.msmacro-doc.bm25-default+ax.topics.dl20.txt \ + -bm25 -axiom -axiom.deterministic -rerankCutoff 20 & + +target/appassembler/bin/SearchCollection \ + -index indexes/lucene-index.msmarco-doc/ \ + -topics src/main/resources/topics-and-qrels/topics.dl20.txt \ + -topicreader TsvInt \ + -output runs/run.msmacro-doc.bm25-default+prf.topics.dl20.txt \ + -bm25 -bm25prf & + target/appassembler/bin/SearchCollection \ -index indexes/lucene-index.msmarco-doc/ \ -topics src/main/resources/topics-and-qrels/topics.dl20.txt \ @@ -111,6 +125,20 @@ target/appassembler/bin/SearchCollection \ -output runs/run.msmacro-doc.bm25-tuned+rocchio-neg.topics.dl20.txt \ -bm25 -bm25.k1 3.44 -bm25.b 0.87 -rocchio -rocchio.useNegative -rerankCutoff 1000 & +target/appassembler/bin/SearchCollection \ + -index indexes/lucene-index.msmarco-doc/ \ + -topics src/main/resources/topics-and-qrels/topics.dl20.txt \ + -topicreader TsvInt \ + -output runs/run.msmacro-doc.bm25-tuned+ax.topics.dl20.txt \ + -bm25 -bm25.k1 3.44 -bm25.b 0.87 -axiom -axiom.deterministic -rerankCutoff 20 & + +target/appassembler/bin/SearchCollection \ + -index indexes/lucene-index.msmarco-doc/ \ + -topics src/main/resources/topics-and-qrels/topics.dl20.txt \ + -topicreader TsvInt \ + -output runs/run.msmacro-doc.bm25-tuned+prf.topics.dl20.txt \ + -bm25 -bm25.k1 3.44 -bm25.b 0.87 -bm25prf & + target/appassembler/bin/SearchCollection \ -index indexes/lucene-index.msmarco-doc/ \ -topics src/main/resources/topics-and-qrels/topics.dl20.txt \ @@ -138,6 +166,20 @@ 
target/appassembler/bin/SearchCollection \ -topicreader TsvInt \ -output runs/run.msmacro-doc.bm25-tuned2+rocchio-neg.topics.dl20.txt \ -bm25 -bm25.k1 4.46 -bm25.b 0.82 -rocchio -rocchio.useNegative -rerankCutoff 1000 & + +target/appassembler/bin/SearchCollection \ + -index indexes/lucene-index.msmarco-doc/ \ + -topics src/main/resources/topics-and-qrels/topics.dl20.txt \ + -topicreader TsvInt \ + -output runs/run.msmacro-doc.bm25-tuned2+ax.topics.dl20.txt \ + -bm25 -bm25.k1 4.46 -bm25.b 0.82 -axiom -axiom.deterministic -rerankCutoff 20 & + +target/appassembler/bin/SearchCollection \ + -index indexes/lucene-index.msmarco-doc/ \ + -topics src/main/resources/topics-and-qrels/topics.dl20.txt \ + -topicreader TsvInt \ + -output runs/run.msmacro-doc.bm25-tuned2+prf.topics.dl20.txt \ + -bm25 -bm25.k1 4.46 -bm25.b 0.82 -bm25prf & ``` Evaluation can be performed using `trec_eval`: @@ -163,6 +205,16 @@ tools/eval/trec_eval.9.0.4/trec_eval -c -m ndcg_cut.10 src/main/resources/topics tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.100 src/main/resources/topics-and-qrels/qrels.dl20-doc.txt runs/run.msmacro-doc.bm25-default+rocchio-neg.topics.dl20.txt tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.1000 src/main/resources/topics-and-qrels/qrels.dl20-doc.txt runs/run.msmacro-doc.bm25-default+rocchio-neg.topics.dl20.txt +tools/eval/trec_eval.9.0.4/trec_eval -c -M 100 -m map src/main/resources/topics-and-qrels/qrels.dl20-doc.txt runs/run.msmacro-doc.bm25-default+ax.topics.dl20.txt +tools/eval/trec_eval.9.0.4/trec_eval -c -m ndcg_cut.10 src/main/resources/topics-and-qrels/qrels.dl20-doc.txt runs/run.msmacro-doc.bm25-default+ax.topics.dl20.txt +tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.100 src/main/resources/topics-and-qrels/qrels.dl20-doc.txt runs/run.msmacro-doc.bm25-default+ax.topics.dl20.txt +tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.1000 src/main/resources/topics-and-qrels/qrels.dl20-doc.txt runs/run.msmacro-doc.bm25-default+ax.topics.dl20.txt + +tools/eval/trec_eval.9.0.4/trec_eval -c -M 100 -m map src/main/resources/topics-and-qrels/qrels.dl20-doc.txt runs/run.msmacro-doc.bm25-default+prf.topics.dl20.txt +tools/eval/trec_eval.9.0.4/trec_eval -c -m ndcg_cut.10 src/main/resources/topics-and-qrels/qrels.dl20-doc.txt runs/run.msmacro-doc.bm25-default+prf.topics.dl20.txt +tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.100 src/main/resources/topics-and-qrels/qrels.dl20-doc.txt runs/run.msmacro-doc.bm25-default+prf.topics.dl20.txt +tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.1000 src/main/resources/topics-and-qrels/qrels.dl20-doc.txt runs/run.msmacro-doc.bm25-default+prf.topics.dl20.txt + tools/eval/trec_eval.9.0.4/trec_eval -c -M 100 -m map src/main/resources/topics-and-qrels/qrels.dl20-doc.txt runs/run.msmacro-doc.bm25-tuned.topics.dl20.txt tools/eval/trec_eval.9.0.4/trec_eval -c -m ndcg_cut.10 src/main/resources/topics-and-qrels/qrels.dl20-doc.txt runs/run.msmacro-doc.bm25-tuned.topics.dl20.txt tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.100 src/main/resources/topics-and-qrels/qrels.dl20-doc.txt runs/run.msmacro-doc.bm25-tuned.topics.dl20.txt @@ -183,6 +235,16 @@ tools/eval/trec_eval.9.0.4/trec_eval -c -m ndcg_cut.10 src/main/resources/topics tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.100 src/main/resources/topics-and-qrels/qrels.dl20-doc.txt runs/run.msmacro-doc.bm25-tuned+rocchio-neg.topics.dl20.txt tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.1000 src/main/resources/topics-and-qrels/qrels.dl20-doc.txt 
runs/run.msmacro-doc.bm25-tuned+rocchio-neg.topics.dl20.txt +tools/eval/trec_eval.9.0.4/trec_eval -c -M 100 -m map src/main/resources/topics-and-qrels/qrels.dl20-doc.txt runs/run.msmacro-doc.bm25-tuned+ax.topics.dl20.txt +tools/eval/trec_eval.9.0.4/trec_eval -c -m ndcg_cut.10 src/main/resources/topics-and-qrels/qrels.dl20-doc.txt runs/run.msmacro-doc.bm25-tuned+ax.topics.dl20.txt +tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.100 src/main/resources/topics-and-qrels/qrels.dl20-doc.txt runs/run.msmacro-doc.bm25-tuned+ax.topics.dl20.txt +tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.1000 src/main/resources/topics-and-qrels/qrels.dl20-doc.txt runs/run.msmacro-doc.bm25-tuned+ax.topics.dl20.txt + +tools/eval/trec_eval.9.0.4/trec_eval -c -M 100 -m map src/main/resources/topics-and-qrels/qrels.dl20-doc.txt runs/run.msmacro-doc.bm25-tuned+prf.topics.dl20.txt +tools/eval/trec_eval.9.0.4/trec_eval -c -m ndcg_cut.10 src/main/resources/topics-and-qrels/qrels.dl20-doc.txt runs/run.msmacro-doc.bm25-tuned+prf.topics.dl20.txt +tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.100 src/main/resources/topics-and-qrels/qrels.dl20-doc.txt runs/run.msmacro-doc.bm25-tuned+prf.topics.dl20.txt +tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.1000 src/main/resources/topics-and-qrels/qrels.dl20-doc.txt runs/run.msmacro-doc.bm25-tuned+prf.topics.dl20.txt + tools/eval/trec_eval.9.0.4/trec_eval -c -M 100 -m map src/main/resources/topics-and-qrels/qrels.dl20-doc.txt runs/run.msmacro-doc.bm25-tuned2.topics.dl20.txt tools/eval/trec_eval.9.0.4/trec_eval -c -m ndcg_cut.10 src/main/resources/topics-and-qrels/qrels.dl20-doc.txt runs/run.msmacro-doc.bm25-tuned2.topics.dl20.txt tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.100 src/main/resources/topics-and-qrels/qrels.dl20-doc.txt runs/run.msmacro-doc.bm25-tuned2.topics.dl20.txt @@ -202,27 +264,37 @@ tools/eval/trec_eval.9.0.4/trec_eval -c -M 100 -m map src/main/resources/topics- tools/eval/trec_eval.9.0.4/trec_eval -c -m ndcg_cut.10 src/main/resources/topics-and-qrels/qrels.dl20-doc.txt runs/run.msmacro-doc.bm25-tuned2+rocchio-neg.topics.dl20.txt tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.100 src/main/resources/topics-and-qrels/qrels.dl20-doc.txt runs/run.msmacro-doc.bm25-tuned2+rocchio-neg.topics.dl20.txt tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.1000 src/main/resources/topics-and-qrels/qrels.dl20-doc.txt runs/run.msmacro-doc.bm25-tuned2+rocchio-neg.topics.dl20.txt + +tools/eval/trec_eval.9.0.4/trec_eval -c -M 100 -m map src/main/resources/topics-and-qrels/qrels.dl20-doc.txt runs/run.msmacro-doc.bm25-tuned2+ax.topics.dl20.txt +tools/eval/trec_eval.9.0.4/trec_eval -c -m ndcg_cut.10 src/main/resources/topics-and-qrels/qrels.dl20-doc.txt runs/run.msmacro-doc.bm25-tuned2+ax.topics.dl20.txt +tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.100 src/main/resources/topics-and-qrels/qrels.dl20-doc.txt runs/run.msmacro-doc.bm25-tuned2+ax.topics.dl20.txt +tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.1000 src/main/resources/topics-and-qrels/qrels.dl20-doc.txt runs/run.msmacro-doc.bm25-tuned2+ax.topics.dl20.txt + +tools/eval/trec_eval.9.0.4/trec_eval -c -M 100 -m map src/main/resources/topics-and-qrels/qrels.dl20-doc.txt runs/run.msmacro-doc.bm25-tuned2+prf.topics.dl20.txt +tools/eval/trec_eval.9.0.4/trec_eval -c -m ndcg_cut.10 src/main/resources/topics-and-qrels/qrels.dl20-doc.txt runs/run.msmacro-doc.bm25-tuned2+prf.topics.dl20.txt +tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.100 src/main/resources/topics-and-qrels/qrels.dl20-doc.txt 
runs/run.msmacro-doc.bm25-tuned2+prf.topics.dl20.txt +tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.1000 src/main/resources/topics-and-qrels/qrels.dl20-doc.txt runs/run.msmacro-doc.bm25-tuned2+prf.topics.dl20.txt ``` ## Effectiveness With the above commands, you should be able to reproduce the following results: -| **AP@100** | **BM25 (default)**| **+RM3** | **+Rocchio**| **+Rocchio***| **BM25 (tuned)**| **+RM3** | **+Rocchio**| **+Rocchio***| **BM25 (tuned2)**| **+RM3** | **+Rocchio**| **+Rocchio***| -|:-------------------------------------------------------------------------------------------------------------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------| -| [DL20 (Doc)](https://trec.nist.gov/data/deep2020.html) | 0.3793 | 0.4014 | 0.4089 | 0.4096 | 0.3631 | 0.3592 | 0.3634 | 0.3628 | 0.3581 | 0.3619 | 0.3628 | 0.3627 | -| **nDCG@10** | **BM25 (default)**| **+RM3** | **+Rocchio**| **+Rocchio***| **BM25 (tuned)**| **+RM3** | **+Rocchio**| **+Rocchio***| **BM25 (tuned2)**| **+RM3** | **+Rocchio**| **+Rocchio***| -| [DL20 (Doc)](https://trec.nist.gov/data/deep2020.html) | 0.5286 | 0.5225 | 0.5192 | 0.5212 | 0.5070 | 0.5124 | 0.5070 | 0.5127 | 0.5061 | 0.5238 | 0.5199 | 0.5189 | -| **R@100** | **BM25 (default)**| **+RM3** | **+Rocchio**| **+Rocchio***| **BM25 (tuned)**| **+RM3** | **+Rocchio**| **+Rocchio***| **BM25 (tuned2)**| **+RM3** | **+Rocchio**| **+Rocchio***| -| [DL20 (Doc)](https://trec.nist.gov/data/deep2020.html) | 0.6110 | 0.6414 | 0.6425 | 0.6431 | 0.5935 | 0.5977 | 0.6057 | 0.6045 | 0.5860 | 0.5995 | 0.6017 | 0.6013 | -| **R@1000** | **BM25 (default)**| **+RM3** | **+Rocchio**| **+Rocchio***| **BM25 (tuned)**| **+RM3** | **+Rocchio**| **+Rocchio***| **BM25 (tuned2)**| **+RM3** | **+Rocchio**| **+Rocchio***| -| [DL20 (Doc)](https://trec.nist.gov/data/deep2020.html) | 0.8085 | 0.8257 | 0.8273 | 0.8270 | 0.7876 | 0.8116 | 0.8199 | 0.8238 | 0.7776 | 0.8180 | 0.8217 | 0.8267 | +| **AP@100** | **BM25 (default)**| **+RM3** | **+Rocchio**| **+Rocchio***| **+Ax** | **+PRF** | **BM25 (tuned)**| **+RM3** | **+Rocchio**| **+Rocchio***| **+Ax** | **+PRF** | **BM25 (tuned2)**| **+RM3** | **+Rocchio**| **+Rocchio***| **+Ax** | **+PRF** | +|:-------------------------------------------------------------------------------------------------------------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------| +| [DL20 (Doc)](https://trec.nist.gov/data/deep2020.html) | 0.3793 | 0.4014 | 0.4089 | 0.4096 | 0.3133 | 0.3515 | 0.3631 | 0.3592 | 0.3634 | 0.3628 | 0.3462 | 0.3561 | 0.3581 | 0.3619 | 0.3628 | 0.3627 | 0.3498 | 0.3530 | +| **nDCG@10** | **BM25 (default)**| **+RM3** | **+Rocchio**| **+Rocchio***| **+Ax** | **+PRF** | **BM25 (tuned)**| **+RM3** | **+Rocchio**| **+Rocchio***| **+Ax** | **+PRF** | **BM25 (tuned2)**| **+RM3** | **+Rocchio**| **+Rocchio***| **+Ax** | **+PRF** | +| [DL20 (Doc)](https://trec.nist.gov/data/deep2020.html) | 0.5286 | 0.5225 | 0.5192 | 0.5212 | 0.4275 | 0.4680 | 0.5070 | 0.5124 | 0.5070 | 0.5127 | 0.4941 | 0.4785 | 0.5061 | 0.5238 | 0.5199 | 0.5189 | 0.5106 | 0.4775 | +| **R@100** | **BM25 (default)**| **+RM3** | **+Rocchio**| **+Rocchio***| **+Ax** | **+PRF** | **BM25 (tuned)**| **+RM3** | **+Rocchio**| **+Rocchio***| **+Ax** | **+PRF** | **BM25 (tuned2)**| **+RM3** | **+Rocchio**| **+Rocchio***| **+Ax** | 
**+PRF** | +| [DL20 (Doc)](https://trec.nist.gov/data/deep2020.html) | 0.6110 | 0.6414 | 0.6425 | 0.6431 | 0.5714 | 0.6104 | 0.5935 | 0.5977 | 0.6057 | 0.6045 | 0.6078 | 0.6198 | 0.5860 | 0.5995 | 0.6017 | 0.6013 | 0.5988 | 0.6171 | +| **R@1000** | **BM25 (default)**| **+RM3** | **+Rocchio**| **+Rocchio***| **+Ax** | **+PRF** | **BM25 (tuned)**| **+RM3** | **+Rocchio**| **+Rocchio***| **+Ax** | **+PRF** | **BM25 (tuned2)**| **+RM3** | **+Rocchio**| **+Rocchio***| **+Ax** | **+PRF** | +| [DL20 (Doc)](https://trec.nist.gov/data/deep2020.html) | 0.8085 | 0.8257 | 0.8273 | 0.8270 | 0.8063 | 0.8084 | 0.7876 | 0.8116 | 0.8199 | 0.8238 | 0.8455 | 0.8163 | 0.7776 | 0.8180 | 0.8217 | 0.8267 | 0.8435 | 0.8210 | Explanation of settings: + The setting "default" refers the default BM25 settings of `k1=0.9`, `b=0.4`. -+ The setting "tuned" refers to `k1=3.44`, `b=0.87`, tuned on 2019/06 and used for TREC 2019 Deep Learning Track baseline runs. -+ The setting "tuned2" refers to `k1=4.46`, `b=0.82`, tuned using the MS MARCO document sparse judgments to optimize for recall@100 (i.e., for first-stage retrieval) on 2019/12; see [this page](experiments-msmarco-doc.md) additional details. ++ The setting "tuned" refers to `k1=3.44`, `b=0.87`, tuned in 2019/06 using the MS MARCO document sparse judgments to optimize for MAP and used for TREC 2019 Deep Learning Track baseline runs. ++ The setting "tuned2" refers to `k1=4.46`, `b=0.82`, tuned in 2020/12 using the MS MARCO document sparse judgments to optimize for recall@100 (i.e., for first-stage retrieval); see [this page](experiments-msmarco-doc.md) additional details. Settings tuned on the MS MARCO document sparse judgments _may not_ work well on the TREC dense judgments. diff --git a/docs/regressions-dl20-passage-docTTTTTquery.md b/docs/regressions-dl20-passage-docTTTTTquery.md index 548e70e3c9..2ee0e3ab99 100644 --- a/docs/regressions-dl20-passage-docTTTTTquery.md +++ b/docs/regressions-dl20-passage-docTTTTTquery.md @@ -59,6 +59,20 @@ target/appassembler/bin/SearchCollection \ -output runs/run.msmarco-passage-docTTTTTquery.bm25-default+rm3.topics.dl20.txt \ -bm25 -rm3 & +target/appassembler/bin/SearchCollection \ + -index indexes/lucene-index.msmarco-passage-docTTTTTquery/ \ + -topics src/main/resources/topics-and-qrels/topics.dl20.txt \ + -topicreader TsvInt \ + -output runs/run.msmarco-passage-docTTTTTquery.bm25-default+rocchio.topics.dl20.txt \ + -bm25 -rocchio & + +target/appassembler/bin/SearchCollection \ + -index indexes/lucene-index.msmarco-passage-docTTTTTquery/ \ + -topics src/main/resources/topics-and-qrels/topics.dl20.txt \ + -topicreader TsvInt \ + -output runs/run.msmarco-passage-docTTTTTquery.bm25-default+rocchio-neg.topics.dl20.txt \ + -bm25 -rocchio -rocchio.useNegative -rerankCutoff 1000 & + target/appassembler/bin/SearchCollection \ -index indexes/lucene-index.msmarco-passage-docTTTTTquery/ \ -topics src/main/resources/topics-and-qrels/topics.dl20.txt \ @@ -73,6 +87,20 @@ target/appassembler/bin/SearchCollection \ -output runs/run.msmarco-passage-docTTTTTquery.bm25-tuned+rm3.topics.dl20.txt \ -bm25 -bm25.k1 0.82 -bm25.b 0.68 -rm3 & +target/appassembler/bin/SearchCollection \ + -index indexes/lucene-index.msmarco-passage-docTTTTTquery/ \ + -topics src/main/resources/topics-and-qrels/topics.dl20.txt \ + -topicreader TsvInt \ + -output runs/run.msmarco-passage-docTTTTTquery.bm25-tuned+rocchio.topics.dl20.txt \ + -bm25 -bm25.k1 0.82 -bm25.b 0.68 -rocchio & + +target/appassembler/bin/SearchCollection \ + -index 
indexes/lucene-index.msmarco-passage-docTTTTTquery/ \ + -topics src/main/resources/topics-and-qrels/topics.dl20.txt \ + -topicreader TsvInt \ + -output runs/run.msmarco-passage-docTTTTTquery.bm25-tuned+rocchio-neg.topics.dl20.txt \ + -bm25 -bm25.k1 0.82 -bm25.b 0.68 -rocchio -rocchio.useNegative -rerankCutoff 1000 & + target/appassembler/bin/SearchCollection \ -index indexes/lucene-index.msmarco-passage-docTTTTTquery/ \ -topics src/main/resources/topics-and-qrels/topics.dl20.txt \ @@ -86,6 +114,20 @@ target/appassembler/bin/SearchCollection \ -topicreader TsvInt \ -output runs/run.msmarco-passage-docTTTTTquery.bm25-tuned2+rm3.topics.dl20.txt \ -bm25 -bm25.k1 2.18 -bm25.b 0.86 -rm3 & + +target/appassembler/bin/SearchCollection \ + -index indexes/lucene-index.msmarco-passage-docTTTTTquery/ \ + -topics src/main/resources/topics-and-qrels/topics.dl20.txt \ + -topicreader TsvInt \ + -output runs/run.msmarco-passage-docTTTTTquery.bm25-tuned2+rocchio.topics.dl20.txt \ + -bm25 -bm25.k1 2.18 -bm25.b 0.86 -rocchio & + +target/appassembler/bin/SearchCollection \ + -index indexes/lucene-index.msmarco-passage-docTTTTTquery/ \ + -topics src/main/resources/topics-and-qrels/topics.dl20.txt \ + -topicreader TsvInt \ + -output runs/run.msmarco-passage-docTTTTTquery.bm25-tuned2+rocchio-neg.topics.dl20.txt \ + -bm25 -bm25.k1 2.18 -bm25.b 0.86 -rocchio -rocchio.useNegative -rerankCutoff 1000 & ``` Evaluation can be performed using `trec_eval`: @@ -101,6 +143,16 @@ tools/eval/trec_eval.9.0.4/trec_eval -m ndcg_cut.10 -c src/main/resources/topics tools/eval/trec_eval.9.0.4/trec_eval -m recall.100 -c -l 2 src/main/resources/topics-and-qrels/qrels.dl20-passage.txt runs/run.msmarco-passage-docTTTTTquery.bm25-default+rm3.topics.dl20.txt tools/eval/trec_eval.9.0.4/trec_eval -m recall.1000 -c -l 2 src/main/resources/topics-and-qrels/qrels.dl20-passage.txt runs/run.msmarco-passage-docTTTTTquery.bm25-default+rm3.topics.dl20.txt +tools/eval/trec_eval.9.0.4/trec_eval -m map -c -l 2 src/main/resources/topics-and-qrels/qrels.dl20-passage.txt runs/run.msmarco-passage-docTTTTTquery.bm25-default+rocchio.topics.dl20.txt +tools/eval/trec_eval.9.0.4/trec_eval -m ndcg_cut.10 -c src/main/resources/topics-and-qrels/qrels.dl20-passage.txt runs/run.msmarco-passage-docTTTTTquery.bm25-default+rocchio.topics.dl20.txt +tools/eval/trec_eval.9.0.4/trec_eval -m recall.100 -c -l 2 src/main/resources/topics-and-qrels/qrels.dl20-passage.txt runs/run.msmarco-passage-docTTTTTquery.bm25-default+rocchio.topics.dl20.txt +tools/eval/trec_eval.9.0.4/trec_eval -m recall.1000 -c -l 2 src/main/resources/topics-and-qrels/qrels.dl20-passage.txt runs/run.msmarco-passage-docTTTTTquery.bm25-default+rocchio.topics.dl20.txt + +tools/eval/trec_eval.9.0.4/trec_eval -m map -c -l 2 src/main/resources/topics-and-qrels/qrels.dl20-passage.txt runs/run.msmarco-passage-docTTTTTquery.bm25-default+rocchio-neg.topics.dl20.txt +tools/eval/trec_eval.9.0.4/trec_eval -m ndcg_cut.10 -c src/main/resources/topics-and-qrels/qrels.dl20-passage.txt runs/run.msmarco-passage-docTTTTTquery.bm25-default+rocchio-neg.topics.dl20.txt +tools/eval/trec_eval.9.0.4/trec_eval -m recall.100 -c -l 2 src/main/resources/topics-and-qrels/qrels.dl20-passage.txt runs/run.msmarco-passage-docTTTTTquery.bm25-default+rocchio-neg.topics.dl20.txt +tools/eval/trec_eval.9.0.4/trec_eval -m recall.1000 -c -l 2 src/main/resources/topics-and-qrels/qrels.dl20-passage.txt runs/run.msmarco-passage-docTTTTTquery.bm25-default+rocchio-neg.topics.dl20.txt + tools/eval/trec_eval.9.0.4/trec_eval -m map -c -l 2 
src/main/resources/topics-and-qrels/qrels.dl20-passage.txt runs/run.msmarco-passage-docTTTTTquery.bm25-tuned.topics.dl20.txt tools/eval/trec_eval.9.0.4/trec_eval -m ndcg_cut.10 -c src/main/resources/topics-and-qrels/qrels.dl20-passage.txt runs/run.msmarco-passage-docTTTTTquery.bm25-tuned.topics.dl20.txt tools/eval/trec_eval.9.0.4/trec_eval -m recall.100 -c -l 2 src/main/resources/topics-and-qrels/qrels.dl20-passage.txt runs/run.msmarco-passage-docTTTTTquery.bm25-tuned.topics.dl20.txt @@ -111,6 +163,16 @@ tools/eval/trec_eval.9.0.4/trec_eval -m ndcg_cut.10 -c src/main/resources/topics tools/eval/trec_eval.9.0.4/trec_eval -m recall.100 -c -l 2 src/main/resources/topics-and-qrels/qrels.dl20-passage.txt runs/run.msmarco-passage-docTTTTTquery.bm25-tuned+rm3.topics.dl20.txt tools/eval/trec_eval.9.0.4/trec_eval -m recall.1000 -c -l 2 src/main/resources/topics-and-qrels/qrels.dl20-passage.txt runs/run.msmarco-passage-docTTTTTquery.bm25-tuned+rm3.topics.dl20.txt +tools/eval/trec_eval.9.0.4/trec_eval -m map -c -l 2 src/main/resources/topics-and-qrels/qrels.dl20-passage.txt runs/run.msmarco-passage-docTTTTTquery.bm25-tuned+rocchio.topics.dl20.txt +tools/eval/trec_eval.9.0.4/trec_eval -m ndcg_cut.10 -c src/main/resources/topics-and-qrels/qrels.dl20-passage.txt runs/run.msmarco-passage-docTTTTTquery.bm25-tuned+rocchio.topics.dl20.txt +tools/eval/trec_eval.9.0.4/trec_eval -m recall.100 -c -l 2 src/main/resources/topics-and-qrels/qrels.dl20-passage.txt runs/run.msmarco-passage-docTTTTTquery.bm25-tuned+rocchio.topics.dl20.txt +tools/eval/trec_eval.9.0.4/trec_eval -m recall.1000 -c -l 2 src/main/resources/topics-and-qrels/qrels.dl20-passage.txt runs/run.msmarco-passage-docTTTTTquery.bm25-tuned+rocchio.topics.dl20.txt + +tools/eval/trec_eval.9.0.4/trec_eval -m map -c -l 2 src/main/resources/topics-and-qrels/qrels.dl20-passage.txt runs/run.msmarco-passage-docTTTTTquery.bm25-tuned+rocchio-neg.topics.dl20.txt +tools/eval/trec_eval.9.0.4/trec_eval -m ndcg_cut.10 -c src/main/resources/topics-and-qrels/qrels.dl20-passage.txt runs/run.msmarco-passage-docTTTTTquery.bm25-tuned+rocchio-neg.topics.dl20.txt +tools/eval/trec_eval.9.0.4/trec_eval -m recall.100 -c -l 2 src/main/resources/topics-and-qrels/qrels.dl20-passage.txt runs/run.msmarco-passage-docTTTTTquery.bm25-tuned+rocchio-neg.topics.dl20.txt +tools/eval/trec_eval.9.0.4/trec_eval -m recall.1000 -c -l 2 src/main/resources/topics-and-qrels/qrels.dl20-passage.txt runs/run.msmarco-passage-docTTTTTquery.bm25-tuned+rocchio-neg.topics.dl20.txt + tools/eval/trec_eval.9.0.4/trec_eval -m map -c -l 2 src/main/resources/topics-and-qrels/qrels.dl20-passage.txt runs/run.msmarco-passage-docTTTTTquery.bm25-tuned2.topics.dl20.txt tools/eval/trec_eval.9.0.4/trec_eval -m ndcg_cut.10 -c src/main/resources/topics-and-qrels/qrels.dl20-passage.txt runs/run.msmarco-passage-docTTTTTquery.bm25-tuned2.topics.dl20.txt tools/eval/trec_eval.9.0.4/trec_eval -m recall.100 -c -l 2 src/main/resources/topics-and-qrels/qrels.dl20-passage.txt runs/run.msmarco-passage-docTTTTTquery.bm25-tuned2.topics.dl20.txt @@ -120,21 +182,31 @@ tools/eval/trec_eval.9.0.4/trec_eval -m map -c -l 2 src/main/resources/topics-an tools/eval/trec_eval.9.0.4/trec_eval -m ndcg_cut.10 -c src/main/resources/topics-and-qrels/qrels.dl20-passage.txt runs/run.msmarco-passage-docTTTTTquery.bm25-tuned2+rm3.topics.dl20.txt tools/eval/trec_eval.9.0.4/trec_eval -m recall.100 -c -l 2 src/main/resources/topics-and-qrels/qrels.dl20-passage.txt runs/run.msmarco-passage-docTTTTTquery.bm25-tuned2+rm3.topics.dl20.txt 
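# Editorial note on the trec_eval flags in this listing (added commentary, not part of the generated regression doc):
#   -c     averages each measure over all topics in the qrels, scoring topics with no results as zero
#   -l 2   restricts the binary measures (map, recall.*) to passages judged 2 or higher, the usual
#          TREC Deep Learning Track convention for graded passage qrels; the ndcg_cut.10 commands
#          omit it because nDCG uses the full graded labels
#   -m     selects the measure to report (map, ndcg_cut.10, recall.100, recall.1000)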
tools/eval/trec_eval.9.0.4/trec_eval -m recall.1000 -c -l 2 src/main/resources/topics-and-qrels/qrels.dl20-passage.txt runs/run.msmarco-passage-docTTTTTquery.bm25-tuned2+rm3.topics.dl20.txt + +tools/eval/trec_eval.9.0.4/trec_eval -m map -c -l 2 src/main/resources/topics-and-qrels/qrels.dl20-passage.txt runs/run.msmarco-passage-docTTTTTquery.bm25-tuned2+rocchio.topics.dl20.txt +tools/eval/trec_eval.9.0.4/trec_eval -m ndcg_cut.10 -c src/main/resources/topics-and-qrels/qrels.dl20-passage.txt runs/run.msmarco-passage-docTTTTTquery.bm25-tuned2+rocchio.topics.dl20.txt +tools/eval/trec_eval.9.0.4/trec_eval -m recall.100 -c -l 2 src/main/resources/topics-and-qrels/qrels.dl20-passage.txt runs/run.msmarco-passage-docTTTTTquery.bm25-tuned2+rocchio.topics.dl20.txt +tools/eval/trec_eval.9.0.4/trec_eval -m recall.1000 -c -l 2 src/main/resources/topics-and-qrels/qrels.dl20-passage.txt runs/run.msmarco-passage-docTTTTTquery.bm25-tuned2+rocchio.topics.dl20.txt + +tools/eval/trec_eval.9.0.4/trec_eval -m map -c -l 2 src/main/resources/topics-and-qrels/qrels.dl20-passage.txt runs/run.msmarco-passage-docTTTTTquery.bm25-tuned2+rocchio-neg.topics.dl20.txt +tools/eval/trec_eval.9.0.4/trec_eval -m ndcg_cut.10 -c src/main/resources/topics-and-qrels/qrels.dl20-passage.txt runs/run.msmarco-passage-docTTTTTquery.bm25-tuned2+rocchio-neg.topics.dl20.txt +tools/eval/trec_eval.9.0.4/trec_eval -m recall.100 -c -l 2 src/main/resources/topics-and-qrels/qrels.dl20-passage.txt runs/run.msmarco-passage-docTTTTTquery.bm25-tuned2+rocchio-neg.topics.dl20.txt +tools/eval/trec_eval.9.0.4/trec_eval -m recall.1000 -c -l 2 src/main/resources/topics-and-qrels/qrels.dl20-passage.txt runs/run.msmarco-passage-docTTTTTquery.bm25-tuned2+rocchio-neg.topics.dl20.txt ``` ## Effectiveness With the above commands, you should be able to reproduce the following results: -| **AP@1000** | **BM25 (default)**| **+RM3** | **BM25 (tuned)**| **+RM3** | **BM25 (tuned2)**| **+RM3** | -|:-------------------------------------------------------------------------------------------------------------|-----------|-----------|-----------|-----------|-----------|-----------| -| [DL20 (Passage)](https://trec.nist.gov/data/deep2020.html) | 0.4074 | 0.4295 | 0.4082 | 0.4296 | 0.4171 | 0.4347 | -| **nDCG@10** | **BM25 (default)**| **+RM3** | **BM25 (tuned)**| **+RM3** | **BM25 (tuned2)**| **+RM3** | -| [DL20 (Passage)](https://trec.nist.gov/data/deep2020.html) | 0.6187 | 0.6172 | 0.6192 | 0.6177 | 0.6265 | 0.6232 | -| **R@100** | **BM25 (default)**| **+RM3** | **BM25 (tuned)**| **+RM3** | **BM25 (tuned2)**| **+RM3** | -| [DL20 (Passage)](https://trec.nist.gov/data/deep2020.html) | 0.7044 | 0.7153 | 0.7046 | 0.7143 | 0.7044 | 0.7109 | -| **R@1000** | **BM25 (default)**| **+RM3** | **BM25 (tuned)**| **+RM3** | **BM25 (tuned2)**| **+RM3** | -| [DL20 (Passage)](https://trec.nist.gov/data/deep2020.html) | 0.8452 | 0.8699 | 0.8443 | 0.8692 | 0.8393 | 0.8609 | +| **AP@1000** | **BM25 (default)**| **+RM3** | **+Rocchio**| **+Rocchio***| **BM25 (tuned)**| **+RM3** | **+Rocchio**| **+Rocchio***| **BM25 (tuned2)**| **+RM3** | **+Rocchio**| **+Rocchio***| +|:-------------------------------------------------------------------------------------------------------------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------| +| [DL20 (Passage)](https://trec.nist.gov/data/deep2020.html) | 0.4074 | 0.4295 | 0.4246 | 0.4272 | 0.4082 | 0.4296 | 0.4269 | 0.4279 | 0.4171 | 0.4347 | 0.4376 | 0.4366 | +| 
**nDCG@10** | **BM25 (default)**| **+RM3** | **+Rocchio**| **+Rocchio***| **BM25 (tuned)**| **+RM3** | **+Rocchio**| **+Rocchio***| **BM25 (tuned2)**| **+RM3** | **+Rocchio**| **+Rocchio***| +| [DL20 (Passage)](https://trec.nist.gov/data/deep2020.html) | 0.6187 | 0.6172 | 0.6102 | 0.6147 | 0.6192 | 0.6177 | 0.6152 | 0.6173 | 0.6265 | 0.6232 | 0.6224 | 0.6279 | +| **R@100** | **BM25 (default)**| **+RM3** | **+Rocchio**| **+Rocchio***| **BM25 (tuned)**| **+RM3** | **+Rocchio**| **+Rocchio***| **BM25 (tuned2)**| **+RM3** | **+Rocchio**| **+Rocchio***| +| [DL20 (Passage)](https://trec.nist.gov/data/deep2020.html) | 0.7044 | 0.7153 | 0.7239 | 0.7170 | 0.7046 | 0.7143 | 0.7227 | 0.7223 | 0.7044 | 0.7109 | 0.7126 | 0.7125 | +| **R@1000** | **BM25 (default)**| **+RM3** | **+Rocchio**| **+Rocchio***| **BM25 (tuned)**| **+RM3** | **+Rocchio**| **+Rocchio***| **BM25 (tuned2)**| **+RM3** | **+Rocchio**| **+Rocchio***| +| [DL20 (Passage)](https://trec.nist.gov/data/deep2020.html) | 0.8452 | 0.8699 | 0.8675 | 0.8700 | 0.8443 | 0.8692 | 0.8694 | 0.8689 | 0.8393 | 0.8609 | 0.8641 | 0.8657 | Explanation of settings: diff --git a/docs/regressions-msmarco-doc-docTTTTTquery.md b/docs/regressions-msmarco-doc-docTTTTTquery.md index 430f21618b..67c69d0975 100644 --- a/docs/regressions-msmarco-doc-docTTTTTquery.md +++ b/docs/regressions-msmarco-doc-docTTTTTquery.md @@ -64,6 +64,13 @@ target/appassembler/bin/SearchCollection \ -output runs/run.msmarco-doc-docTTTTTquery.bm25-default+rm3.topics.msmarco-doc.dev.txt \ -bm25 -rm3 & +target/appassembler/bin/SearchCollection \ + -index indexes/lucene-index.msmarco-doc-docTTTTTquery/ \ + -topics src/main/resources/topics-and-qrels/topics.msmarco-doc.dev.txt \ + -topicreader TsvInt \ + -output runs/run.msmarco-doc-docTTTTTquery.bm25-default+rocchio.topics.msmarco-doc.dev.txt \ + -bm25 -rocchio & + target/appassembler/bin/SearchCollection \ -index indexes/lucene-index.msmarco-doc-docTTTTTquery/ \ -topics src/main/resources/topics-and-qrels/topics.msmarco-doc.dev.txt \ @@ -77,6 +84,13 @@ target/appassembler/bin/SearchCollection \ -topicreader TsvInt \ -output runs/run.msmarco-doc-docTTTTTquery.bm25-tuned+rm3.topics.msmarco-doc.dev.txt \ -bm25 -bm25.k1 4.68 -bm25.b 0.87 -rm3 & + +target/appassembler/bin/SearchCollection \ + -index indexes/lucene-index.msmarco-doc-docTTTTTquery/ \ + -topics src/main/resources/topics-and-qrels/topics.msmarco-doc.dev.txt \ + -topicreader TsvInt \ + -output runs/run.msmarco-doc-docTTTTTquery.bm25-tuned+rocchio.topics.msmarco-doc.dev.txt \ + -bm25 -bm25.k1 4.68 -bm25.b 0.87 -rocchio & ``` Evaluation can be performed using `trec_eval`: @@ -92,6 +106,11 @@ tools/eval/trec_eval.9.0.4/trec_eval -c -M 100 -m recip_rank src/main/resources/ tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.100 src/main/resources/topics-and-qrels/qrels.msmarco-doc.dev.txt runs/run.msmarco-doc-docTTTTTquery.bm25-default+rm3.topics.msmarco-doc.dev.txt tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.1000 src/main/resources/topics-and-qrels/qrels.msmarco-doc.dev.txt runs/run.msmarco-doc-docTTTTTquery.bm25-default+rm3.topics.msmarco-doc.dev.txt +tools/eval/trec_eval.9.0.4/trec_eval -c -m map src/main/resources/topics-and-qrels/qrels.msmarco-doc.dev.txt runs/run.msmarco-doc-docTTTTTquery.bm25-default+rocchio.topics.msmarco-doc.dev.txt +tools/eval/trec_eval.9.0.4/trec_eval -c -M 100 -m recip_rank src/main/resources/topics-and-qrels/qrels.msmarco-doc.dev.txt runs/run.msmarco-doc-docTTTTTquery.bm25-default+rocchio.topics.msmarco-doc.dev.txt +tools/eval/trec_eval.9.0.4/trec_eval 
-c -m recall.100 src/main/resources/topics-and-qrels/qrels.msmarco-doc.dev.txt runs/run.msmarco-doc-docTTTTTquery.bm25-default+rocchio.topics.msmarco-doc.dev.txt +tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.1000 src/main/resources/topics-and-qrels/qrels.msmarco-doc.dev.txt runs/run.msmarco-doc-docTTTTTquery.bm25-default+rocchio.topics.msmarco-doc.dev.txt + tools/eval/trec_eval.9.0.4/trec_eval -c -m map src/main/resources/topics-and-qrels/qrels.msmarco-doc.dev.txt runs/run.msmarco-doc-docTTTTTquery.bm25-tuned.topics.msmarco-doc.dev.txt tools/eval/trec_eval.9.0.4/trec_eval -c -M 100 -m recip_rank src/main/resources/topics-and-qrels/qrels.msmarco-doc.dev.txt runs/run.msmarco-doc-docTTTTTquery.bm25-tuned.topics.msmarco-doc.dev.txt tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.100 src/main/resources/topics-and-qrels/qrels.msmarco-doc.dev.txt runs/run.msmarco-doc-docTTTTTquery.bm25-tuned.topics.msmarco-doc.dev.txt @@ -101,26 +120,31 @@ tools/eval/trec_eval.9.0.4/trec_eval -c -m map src/main/resources/topics-and-qre tools/eval/trec_eval.9.0.4/trec_eval -c -M 100 -m recip_rank src/main/resources/topics-and-qrels/qrels.msmarco-doc.dev.txt runs/run.msmarco-doc-docTTTTTquery.bm25-tuned+rm3.topics.msmarco-doc.dev.txt tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.100 src/main/resources/topics-and-qrels/qrels.msmarco-doc.dev.txt runs/run.msmarco-doc-docTTTTTquery.bm25-tuned+rm3.topics.msmarco-doc.dev.txt tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.1000 src/main/resources/topics-and-qrels/qrels.msmarco-doc.dev.txt runs/run.msmarco-doc-docTTTTTquery.bm25-tuned+rm3.topics.msmarco-doc.dev.txt + +tools/eval/trec_eval.9.0.4/trec_eval -c -m map src/main/resources/topics-and-qrels/qrels.msmarco-doc.dev.txt runs/run.msmarco-doc-docTTTTTquery.bm25-tuned+rocchio.topics.msmarco-doc.dev.txt +tools/eval/trec_eval.9.0.4/trec_eval -c -M 100 -m recip_rank src/main/resources/topics-and-qrels/qrels.msmarco-doc.dev.txt runs/run.msmarco-doc-docTTTTTquery.bm25-tuned+rocchio.topics.msmarco-doc.dev.txt +tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.100 src/main/resources/topics-and-qrels/qrels.msmarco-doc.dev.txt runs/run.msmarco-doc-docTTTTTquery.bm25-tuned+rocchio.topics.msmarco-doc.dev.txt +tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.1000 src/main/resources/topics-and-qrels/qrels.msmarco-doc.dev.txt runs/run.msmarco-doc-docTTTTTquery.bm25-tuned+rocchio.topics.msmarco-doc.dev.txt ``` ## Effectiveness With the above commands, you should be able to reproduce the following results: -| **AP@1000** | **BM25 (default)**| **+RM3** | **BM25 (tuned)**| **+RM3** | -|:-------------------------------------------------------------------------------------------------------------|-----------|-----------|-----------|-----------| -| [MS MARCO Doc: Dev](https://github.com/microsoft/MSMARCO-Document-Ranking) | 0.2886 | 0.1839 | 0.3273 | 0.2627 | -| **RR@100** | **BM25 (default)**| **+RM3** | **BM25 (tuned)**| **+RM3** | -| [MS MARCO Doc: Dev](https://github.com/microsoft/MSMARCO-Document-Ranking) | 0.2880 | 0.1831 | 0.3269 | 0.2621 | -| **R@100** | **BM25 (default)**| **+RM3** | **BM25 (tuned)**| **+RM3** | -| [MS MARCO Doc: Dev](https://github.com/microsoft/MSMARCO-Document-Ranking) | 0.7993 | 0.7420 | 0.8612 | 0.8379 | -| **R@1000** | **BM25 (default)**| **+RM3** | **BM25 (tuned)**| **+RM3** | -| [MS MARCO Doc: Dev](https://github.com/microsoft/MSMARCO-Document-Ranking) | 0.9259 | 0.9128 | 0.9553 | 0.9524 | +| **AP@1000** | **BM25 (default)**| **+RM3** | **+Rocchio**| **BM25 (tuned)**| **+RM3** | **+Rocchio**| 
+|:-------------------------------------------------------------------------------------------------------------|-----------|-----------|-----------|-----------|-----------|-----------| +| [MS MARCO Doc: Dev](https://github.com/microsoft/MSMARCO-Document-Ranking) | 0.2886 | 0.1839 | 0.1841 | 0.3273 | 0.2627 | 0.2647 | +| **RR@100** | **BM25 (default)**| **+RM3** | **+Rocchio**| **BM25 (tuned)**| **+RM3** | **+Rocchio**| +| [MS MARCO Doc: Dev](https://github.com/microsoft/MSMARCO-Document-Ranking) | 0.2880 | 0.1831 | 0.1833 | 0.3269 | 0.2621 | 0.2642 | +| **R@100** | **BM25 (default)**| **+RM3** | **+Rocchio**| **BM25 (tuned)**| **+RM3** | **+Rocchio**| +| [MS MARCO Doc: Dev](https://github.com/microsoft/MSMARCO-Document-Ranking) | 0.7993 | 0.7420 | 0.7441 | 0.8612 | 0.8379 | 0.8448 | +| **R@1000** | **BM25 (default)**| **+RM3** | **+Rocchio**| **BM25 (tuned)**| **+RM3** | **+Rocchio**| +| [MS MARCO Doc: Dev](https://github.com/microsoft/MSMARCO-Document-Ranking) | 0.9259 | 0.9128 | 0.9128 | 0.9553 | 0.9524 | 0.9534 | Explanation of settings: + The setting "default" refers the default BM25 settings of `k1=0.9`, `b=0.4`. -+ The setting "tuned" refers to `k1=4.68`, `b=0.87`, tuned to optimize for recall@100 (i.e., for first-stage retrieval) on 2019/12. ++ The setting "tuned" refers to `k1=4.68`, `b=0.87`, tuned in 2020/12 using the MS MARCO document sparse judgments to optimize for recall@100 (i.e., for first-stage retrieval). In these runs, we are retrieving the top 1000 hits for each query and using `trec_eval` to evaluate all 1000 hits. This lets us measure R@100 and R@1000; the latter is particularly important when these runs are used as first-stage retrieval. diff --git a/docs/regressions-msmarco-doc-segmented-docTTTTTquery.md b/docs/regressions-msmarco-doc-segmented-docTTTTTquery.md index f79af6a61a..9efc5d5574 100644 --- a/docs/regressions-msmarco-doc-segmented-docTTTTTquery.md +++ b/docs/regressions-msmarco-doc-segmented-docTTTTTquery.md @@ -65,6 +65,13 @@ target/appassembler/bin/SearchCollection \ -output runs/run.msmarco-doc-segmented-docTTTTTquery.bm25-default+rm3.topics.msmarco-doc.dev.txt \ -bm25 -rm3 -hits 10000 -selectMaxPassage -selectMaxPassage.delimiter "#" -selectMaxPassage.hits 1000 & +target/appassembler/bin/SearchCollection \ + -index indexes/lucene-index.msmarco-doc-segmented-docTTTTTquery/ \ + -topics src/main/resources/topics-and-qrels/topics.msmarco-doc.dev.txt \ + -topicreader TsvInt \ + -output runs/run.msmarco-doc-segmented-docTTTTTquery.bm25-default+rocchio.topics.msmarco-doc.dev.txt \ + -bm25 -rocchio -hits 10000 -selectMaxPassage -selectMaxPassage.delimiter "#" -selectMaxPassage.hits 1000 & + target/appassembler/bin/SearchCollection \ -index indexes/lucene-index.msmarco-doc-segmented-docTTTTTquery/ \ -topics src/main/resources/topics-and-qrels/topics.msmarco-doc.dev.txt \ @@ -78,6 +85,13 @@ target/appassembler/bin/SearchCollection \ -topicreader TsvInt \ -output runs/run.msmarco-doc-segmented-docTTTTTquery.bm25-tuned+rm3.topics.msmarco-doc.dev.txt \ -bm25 -bm25.k1 2.56 -bm25.b 0.59 -rm3 -hits 10000 -selectMaxPassage -selectMaxPassage.delimiter "#" -selectMaxPassage.hits 1000 & + +target/appassembler/bin/SearchCollection \ + -index indexes/lucene-index.msmarco-doc-segmented-docTTTTTquery/ \ + -topics src/main/resources/topics-and-qrels/topics.msmarco-doc.dev.txt \ + -topicreader TsvInt \ + -output runs/run.msmarco-doc-segmented-docTTTTTquery.bm25-tuned+rocchio.topics.msmarco-doc.dev.txt \ + -bm25 -bm25.k1 2.56 -bm25.b 0.59 -rocchio -hits 10000 -selectMaxPassage 
-selectMaxPassage.delimiter "#" -selectMaxPassage.hits 1000 & ``` Evaluation can be performed using `trec_eval`: @@ -93,6 +107,11 @@ tools/eval/trec_eval.9.0.4/trec_eval -c -M 100 -m recip_rank src/main/resources/ tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.100 src/main/resources/topics-and-qrels/qrels.msmarco-doc.dev.txt runs/run.msmarco-doc-segmented-docTTTTTquery.bm25-default+rm3.topics.msmarco-doc.dev.txt tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.1000 src/main/resources/topics-and-qrels/qrels.msmarco-doc.dev.txt runs/run.msmarco-doc-segmented-docTTTTTquery.bm25-default+rm3.topics.msmarco-doc.dev.txt +tools/eval/trec_eval.9.0.4/trec_eval -c -m map src/main/resources/topics-and-qrels/qrels.msmarco-doc.dev.txt runs/run.msmarco-doc-segmented-docTTTTTquery.bm25-default+rocchio.topics.msmarco-doc.dev.txt +tools/eval/trec_eval.9.0.4/trec_eval -c -M 100 -m recip_rank src/main/resources/topics-and-qrels/qrels.msmarco-doc.dev.txt runs/run.msmarco-doc-segmented-docTTTTTquery.bm25-default+rocchio.topics.msmarco-doc.dev.txt +tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.100 src/main/resources/topics-and-qrels/qrels.msmarco-doc.dev.txt runs/run.msmarco-doc-segmented-docTTTTTquery.bm25-default+rocchio.topics.msmarco-doc.dev.txt +tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.1000 src/main/resources/topics-and-qrels/qrels.msmarco-doc.dev.txt runs/run.msmarco-doc-segmented-docTTTTTquery.bm25-default+rocchio.topics.msmarco-doc.dev.txt + tools/eval/trec_eval.9.0.4/trec_eval -c -m map src/main/resources/topics-and-qrels/qrels.msmarco-doc.dev.txt runs/run.msmarco-doc-segmented-docTTTTTquery.bm25-tuned.topics.msmarco-doc.dev.txt tools/eval/trec_eval.9.0.4/trec_eval -c -M 100 -m recip_rank src/main/resources/topics-and-qrels/qrels.msmarco-doc.dev.txt runs/run.msmarco-doc-segmented-docTTTTTquery.bm25-tuned.topics.msmarco-doc.dev.txt tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.100 src/main/resources/topics-and-qrels/qrels.msmarco-doc.dev.txt runs/run.msmarco-doc-segmented-docTTTTTquery.bm25-tuned.topics.msmarco-doc.dev.txt @@ -102,26 +121,31 @@ tools/eval/trec_eval.9.0.4/trec_eval -c -m map src/main/resources/topics-and-qre tools/eval/trec_eval.9.0.4/trec_eval -c -M 100 -m recip_rank src/main/resources/topics-and-qrels/qrels.msmarco-doc.dev.txt runs/run.msmarco-doc-segmented-docTTTTTquery.bm25-tuned+rm3.topics.msmarco-doc.dev.txt tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.100 src/main/resources/topics-and-qrels/qrels.msmarco-doc.dev.txt runs/run.msmarco-doc-segmented-docTTTTTquery.bm25-tuned+rm3.topics.msmarco-doc.dev.txt tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.1000 src/main/resources/topics-and-qrels/qrels.msmarco-doc.dev.txt runs/run.msmarco-doc-segmented-docTTTTTquery.bm25-tuned+rm3.topics.msmarco-doc.dev.txt + +tools/eval/trec_eval.9.0.4/trec_eval -c -m map src/main/resources/topics-and-qrels/qrels.msmarco-doc.dev.txt runs/run.msmarco-doc-segmented-docTTTTTquery.bm25-tuned+rocchio.topics.msmarco-doc.dev.txt +tools/eval/trec_eval.9.0.4/trec_eval -c -M 100 -m recip_rank src/main/resources/topics-and-qrels/qrels.msmarco-doc.dev.txt runs/run.msmarco-doc-segmented-docTTTTTquery.bm25-tuned+rocchio.topics.msmarco-doc.dev.txt +tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.100 src/main/resources/topics-and-qrels/qrels.msmarco-doc.dev.txt runs/run.msmarco-doc-segmented-docTTTTTquery.bm25-tuned+rocchio.topics.msmarco-doc.dev.txt +tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.1000 src/main/resources/topics-and-qrels/qrels.msmarco-doc.dev.txt 
runs/run.msmarco-doc-segmented-docTTTTTquery.bm25-tuned+rocchio.topics.msmarco-doc.dev.txt ``` ## Effectiveness With the above commands, you should be able to reproduce the following results: -| **AP@1000** | **BM25 (default)**| **+RM3** | **BM25 (tuned)**| **+RM3** | -|:-------------------------------------------------------------------------------------------------------------|-----------|-----------|-----------|-----------| -| [MS MARCO Doc: Dev](https://github.com/microsoft/MSMARCO-Document-Ranking) | 0.3184 | 0.2823 | 0.3213 | 0.2989 | -| **RR@100** | **BM25 (default)**| **+RM3** | **BM25 (tuned)**| **+RM3** | -| [MS MARCO Doc: Dev](https://github.com/microsoft/MSMARCO-Document-Ranking) | 0.3179 | 0.2818 | 0.3209 | 0.2985 | -| **R@100** | **BM25 (default)**| **+RM3** | **BM25 (tuned)**| **+RM3** | -| [MS MARCO Doc: Dev](https://github.com/microsoft/MSMARCO-Document-Ranking) | 0.8479 | 0.8479 | 0.8627 | 0.8556 | -| **R@1000** | **BM25 (default)**| **+RM3** | **BM25 (tuned)**| **+RM3** | -| [MS MARCO Doc: Dev](https://github.com/microsoft/MSMARCO-Document-Ranking) | 0.9490 | 0.9547 | 0.9530 | 0.9567 | +| **AP@1000** | **BM25 (default)**| **+RM3** | **+Rocchio**| **BM25 (tuned)**| **+RM3** | **+Rocchio**| +|:-------------------------------------------------------------------------------------------------------------|-----------|-----------|-----------|-----------|-----------|-----------| +| [MS MARCO Doc: Dev](https://github.com/microsoft/MSMARCO-Document-Ranking) | 0.3184 | 0.2823 | 0.2846 | 0.3213 | 0.2989 | 0.2998 | +| **RR@100** | **BM25 (default)**| **+RM3** | **+Rocchio**| **BM25 (tuned)**| **+RM3** | **+Rocchio**| +| [MS MARCO Doc: Dev](https://github.com/microsoft/MSMARCO-Document-Ranking) | 0.3179 | 0.2818 | 0.2841 | 0.3209 | 0.2985 | 0.2994 | +| **R@100** | **BM25 (default)**| **+RM3** | **+Rocchio**| **BM25 (tuned)**| **+RM3** | **+Rocchio**| +| [MS MARCO Doc: Dev](https://github.com/microsoft/MSMARCO-Document-Ranking) | 0.8479 | 0.8479 | 0.8479 | 0.8627 | 0.8556 | 0.8600 | +| **R@1000** | **BM25 (default)**| **+RM3** | **+Rocchio**| **BM25 (tuned)**| **+RM3** | **+Rocchio**| +| [MS MARCO Doc: Dev](https://github.com/microsoft/MSMARCO-Document-Ranking) | 0.9490 | 0.9547 | 0.9551 | 0.9530 | 0.9567 | 0.9571 | Explanation of settings: + The setting "default" refers the default BM25 settings of `k1=0.9`, `b=0.4`. -+ The setting "tuned" refers to `k1=2.56`, `b=0.59`, tuned to optimize for recall@100 (i.e., for first-stage retrieval) on 2019/12. ++ The setting "tuned" refers to `k1=2.56`, `b=0.59`, tuned in 2020/12 using the MS MARCO document sparse judgments to optimize for recall@100 (i.e., for first-stage retrieval). In these runs, we are retrieving the top 1000 hits for each query and using `trec_eval` to evaluate all 1000 hits. Since we're in the passage condition, we fetch the 10000 passages and select the top 1000 documents using MaxP. diff --git a/docs/regressions-msmarco-doc-segmented.md b/docs/regressions-msmarco-doc-segmented.md index 1edba8cb60..efd6cb58ec 100644 --- a/docs/regressions-msmarco-doc-segmented.md +++ b/docs/regressions-msmarco-doc-segmented.md @@ -217,7 +217,7 @@ With the above commands, you should be able to reproduce the following results: Explanation of settings: + The setting "default" refers the default BM25 settings of `k1=0.9`, `b=0.4`. -+ The setting "tuned" refers to `k1=2.16`, `b=0.61`, tuned to optimize for recall@100 (i.e., for first-stage retrieval) on 2019/12. 
++ The setting "tuned" refers to `k1=2.16`, `b=0.61`, tuned in 2020/12 using the MS MARCO document sparse judgments to optimize for recall@100 (i.e., for first-stage retrieval). In these runs, we are retrieving the top 1000 hits for each query and using `trec_eval` to evaluate all 1000 hits. Since we're in the passage condition, we fetch the 10000 passages and select the top 1000 documents using MaxP. diff --git a/docs/regressions-msmarco-doc.md b/docs/regressions-msmarco-doc.md index b4fffceb07..e901284e2c 100644 --- a/docs/regressions-msmarco-doc.md +++ b/docs/regressions-msmarco-doc.md @@ -78,6 +78,20 @@ target/appassembler/bin/SearchCollection \ -output runs/run.msmarco-doc.bm25-default+rocchio-neg.topics.msmarco-doc.dev.txt \ -bm25 -rocchio -rocchio.useNegative -rerankCutoff 1000 & +target/appassembler/bin/SearchCollection \ + -index indexes/lucene-index.msmarco-doc/ \ + -topics src/main/resources/topics-and-qrels/topics.msmarco-doc.dev.txt \ + -topicreader TsvInt \ + -output runs/run.msmarco-doc.bm25-default+ax.topics.msmarco-doc.dev.txt \ + -bm25 -axiom -axiom.deterministic -rerankCutoff 20 & + +target/appassembler/bin/SearchCollection \ + -index indexes/lucene-index.msmarco-doc/ \ + -topics src/main/resources/topics-and-qrels/topics.msmarco-doc.dev.txt \ + -topicreader TsvInt \ + -output runs/run.msmarco-doc.bm25-default+prf.topics.msmarco-doc.dev.txt \ + -bm25 -bm25prf & + target/appassembler/bin/SearchCollection \ -index indexes/lucene-index.msmarco-doc/ \ -topics src/main/resources/topics-and-qrels/topics.msmarco-doc.dev.txt \ @@ -106,6 +120,20 @@ target/appassembler/bin/SearchCollection \ -output runs/run.msmarco-doc.bm25-tuned+rocchio-neg.topics.msmarco-doc.dev.txt \ -bm25 -bm25.k1 3.44 -bm25.b 0.87 -rocchio -rocchio.useNegative -rerankCutoff 1000 & +target/appassembler/bin/SearchCollection \ + -index indexes/lucene-index.msmarco-doc/ \ + -topics src/main/resources/topics-and-qrels/topics.msmarco-doc.dev.txt \ + -topicreader TsvInt \ + -output runs/run.msmarco-doc.bm25-tuned+ax.topics.msmarco-doc.dev.txt \ + -bm25 -bm25.k1 3.44 -bm25.b 0.87 -axiom -axiom.deterministic -rerankCutoff 20 & + +target/appassembler/bin/SearchCollection \ + -index indexes/lucene-index.msmarco-doc/ \ + -topics src/main/resources/topics-and-qrels/topics.msmarco-doc.dev.txt \ + -topicreader TsvInt \ + -output runs/run.msmarco-doc.bm25-tuned+prf.topics.msmarco-doc.dev.txt \ + -bm25 -bm25.k1 3.44 -bm25.b 0.87 -bm25prf & + target/appassembler/bin/SearchCollection \ -index indexes/lucene-index.msmarco-doc/ \ -topics src/main/resources/topics-and-qrels/topics.msmarco-doc.dev.txt \ @@ -133,6 +161,20 @@ target/appassembler/bin/SearchCollection \ -topicreader TsvInt \ -output runs/run.msmarco-doc.bm25-tuned2+rocchio-neg.topics.msmarco-doc.dev.txt \ -bm25 -bm25.k1 4.46 -bm25.b 0.82 -rocchio -rocchio.useNegative -rerankCutoff 1000 & + +target/appassembler/bin/SearchCollection \ + -index indexes/lucene-index.msmarco-doc/ \ + -topics src/main/resources/topics-and-qrels/topics.msmarco-doc.dev.txt \ + -topicreader TsvInt \ + -output runs/run.msmarco-doc.bm25-tuned2+ax.topics.msmarco-doc.dev.txt \ + -bm25 -bm25.k1 4.46 -bm25.b 0.82 -axiom -axiom.deterministic -rerankCutoff 20 & + +target/appassembler/bin/SearchCollection \ + -index indexes/lucene-index.msmarco-doc/ \ + -topics src/main/resources/topics-and-qrels/topics.msmarco-doc.dev.txt \ + -topicreader TsvInt \ + -output runs/run.msmarco-doc.bm25-tuned2+prf.topics.msmarco-doc.dev.txt \ + -bm25 -bm25.k1 4.46 -bm25.b 0.82 -bm25prf & ``` Evaluation can be performed 
using `trec_eval`: @@ -158,6 +200,16 @@ tools/eval/trec_eval.9.0.4/trec_eval -c -M 100 -m recip_rank src/main/resources/ tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.100 src/main/resources/topics-and-qrels/qrels.msmarco-doc.dev.txt runs/run.msmarco-doc.bm25-default+rocchio-neg.topics.msmarco-doc.dev.txt tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.1000 src/main/resources/topics-and-qrels/qrels.msmarco-doc.dev.txt runs/run.msmarco-doc.bm25-default+rocchio-neg.topics.msmarco-doc.dev.txt +tools/eval/trec_eval.9.0.4/trec_eval -c -m map src/main/resources/topics-and-qrels/qrels.msmarco-doc.dev.txt runs/run.msmarco-doc.bm25-default+ax.topics.msmarco-doc.dev.txt +tools/eval/trec_eval.9.0.4/trec_eval -c -M 100 -m recip_rank src/main/resources/topics-and-qrels/qrels.msmarco-doc.dev.txt runs/run.msmarco-doc.bm25-default+ax.topics.msmarco-doc.dev.txt +tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.100 src/main/resources/topics-and-qrels/qrels.msmarco-doc.dev.txt runs/run.msmarco-doc.bm25-default+ax.topics.msmarco-doc.dev.txt +tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.1000 src/main/resources/topics-and-qrels/qrels.msmarco-doc.dev.txt runs/run.msmarco-doc.bm25-default+ax.topics.msmarco-doc.dev.txt + +tools/eval/trec_eval.9.0.4/trec_eval -c -m map src/main/resources/topics-and-qrels/qrels.msmarco-doc.dev.txt runs/run.msmarco-doc.bm25-default+prf.topics.msmarco-doc.dev.txt +tools/eval/trec_eval.9.0.4/trec_eval -c -M 100 -m recip_rank src/main/resources/topics-and-qrels/qrels.msmarco-doc.dev.txt runs/run.msmarco-doc.bm25-default+prf.topics.msmarco-doc.dev.txt +tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.100 src/main/resources/topics-and-qrels/qrels.msmarco-doc.dev.txt runs/run.msmarco-doc.bm25-default+prf.topics.msmarco-doc.dev.txt +tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.1000 src/main/resources/topics-and-qrels/qrels.msmarco-doc.dev.txt runs/run.msmarco-doc.bm25-default+prf.topics.msmarco-doc.dev.txt + tools/eval/trec_eval.9.0.4/trec_eval -c -m map src/main/resources/topics-and-qrels/qrels.msmarco-doc.dev.txt runs/run.msmarco-doc.bm25-tuned.topics.msmarco-doc.dev.txt tools/eval/trec_eval.9.0.4/trec_eval -c -M 100 -m recip_rank src/main/resources/topics-and-qrels/qrels.msmarco-doc.dev.txt runs/run.msmarco-doc.bm25-tuned.topics.msmarco-doc.dev.txt tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.100 src/main/resources/topics-and-qrels/qrels.msmarco-doc.dev.txt runs/run.msmarco-doc.bm25-tuned.topics.msmarco-doc.dev.txt @@ -178,6 +230,16 @@ tools/eval/trec_eval.9.0.4/trec_eval -c -M 100 -m recip_rank src/main/resources/ tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.100 src/main/resources/topics-and-qrels/qrels.msmarco-doc.dev.txt runs/run.msmarco-doc.bm25-tuned+rocchio-neg.topics.msmarco-doc.dev.txt tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.1000 src/main/resources/topics-and-qrels/qrels.msmarco-doc.dev.txt runs/run.msmarco-doc.bm25-tuned+rocchio-neg.topics.msmarco-doc.dev.txt +tools/eval/trec_eval.9.0.4/trec_eval -c -m map src/main/resources/topics-and-qrels/qrels.msmarco-doc.dev.txt runs/run.msmarco-doc.bm25-tuned+ax.topics.msmarco-doc.dev.txt +tools/eval/trec_eval.9.0.4/trec_eval -c -M 100 -m recip_rank src/main/resources/topics-and-qrels/qrels.msmarco-doc.dev.txt runs/run.msmarco-doc.bm25-tuned+ax.topics.msmarco-doc.dev.txt +tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.100 src/main/resources/topics-and-qrels/qrels.msmarco-doc.dev.txt runs/run.msmarco-doc.bm25-tuned+ax.topics.msmarco-doc.dev.txt +tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.1000 
src/main/resources/topics-and-qrels/qrels.msmarco-doc.dev.txt runs/run.msmarco-doc.bm25-tuned+ax.topics.msmarco-doc.dev.txt + +tools/eval/trec_eval.9.0.4/trec_eval -c -m map src/main/resources/topics-and-qrels/qrels.msmarco-doc.dev.txt runs/run.msmarco-doc.bm25-tuned+prf.topics.msmarco-doc.dev.txt +tools/eval/trec_eval.9.0.4/trec_eval -c -M 100 -m recip_rank src/main/resources/topics-and-qrels/qrels.msmarco-doc.dev.txt runs/run.msmarco-doc.bm25-tuned+prf.topics.msmarco-doc.dev.txt +tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.100 src/main/resources/topics-and-qrels/qrels.msmarco-doc.dev.txt runs/run.msmarco-doc.bm25-tuned+prf.topics.msmarco-doc.dev.txt +tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.1000 src/main/resources/topics-and-qrels/qrels.msmarco-doc.dev.txt runs/run.msmarco-doc.bm25-tuned+prf.topics.msmarco-doc.dev.txt + tools/eval/trec_eval.9.0.4/trec_eval -c -m map src/main/resources/topics-and-qrels/qrels.msmarco-doc.dev.txt runs/run.msmarco-doc.bm25-tuned2.topics.msmarco-doc.dev.txt tools/eval/trec_eval.9.0.4/trec_eval -c -M 100 -m recip_rank src/main/resources/topics-and-qrels/qrels.msmarco-doc.dev.txt runs/run.msmarco-doc.bm25-tuned2.topics.msmarco-doc.dev.txt tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.100 src/main/resources/topics-and-qrels/qrels.msmarco-doc.dev.txt runs/run.msmarco-doc.bm25-tuned2.topics.msmarco-doc.dev.txt @@ -197,27 +259,37 @@ tools/eval/trec_eval.9.0.4/trec_eval -c -m map src/main/resources/topics-and-qre tools/eval/trec_eval.9.0.4/trec_eval -c -M 100 -m recip_rank src/main/resources/topics-and-qrels/qrels.msmarco-doc.dev.txt runs/run.msmarco-doc.bm25-tuned2+rocchio-neg.topics.msmarco-doc.dev.txt tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.100 src/main/resources/topics-and-qrels/qrels.msmarco-doc.dev.txt runs/run.msmarco-doc.bm25-tuned2+rocchio-neg.topics.msmarco-doc.dev.txt tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.1000 src/main/resources/topics-and-qrels/qrels.msmarco-doc.dev.txt runs/run.msmarco-doc.bm25-tuned2+rocchio-neg.topics.msmarco-doc.dev.txt + +tools/eval/trec_eval.9.0.4/trec_eval -c -m map src/main/resources/topics-and-qrels/qrels.msmarco-doc.dev.txt runs/run.msmarco-doc.bm25-tuned2+ax.topics.msmarco-doc.dev.txt +tools/eval/trec_eval.9.0.4/trec_eval -c -M 100 -m recip_rank src/main/resources/topics-and-qrels/qrels.msmarco-doc.dev.txt runs/run.msmarco-doc.bm25-tuned2+ax.topics.msmarco-doc.dev.txt +tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.100 src/main/resources/topics-and-qrels/qrels.msmarco-doc.dev.txt runs/run.msmarco-doc.bm25-tuned2+ax.topics.msmarco-doc.dev.txt +tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.1000 src/main/resources/topics-and-qrels/qrels.msmarco-doc.dev.txt runs/run.msmarco-doc.bm25-tuned2+ax.topics.msmarco-doc.dev.txt + +tools/eval/trec_eval.9.0.4/trec_eval -c -m map src/main/resources/topics-and-qrels/qrels.msmarco-doc.dev.txt runs/run.msmarco-doc.bm25-tuned2+prf.topics.msmarco-doc.dev.txt +tools/eval/trec_eval.9.0.4/trec_eval -c -M 100 -m recip_rank src/main/resources/topics-and-qrels/qrels.msmarco-doc.dev.txt runs/run.msmarco-doc.bm25-tuned2+prf.topics.msmarco-doc.dev.txt +tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.100 src/main/resources/topics-and-qrels/qrels.msmarco-doc.dev.txt runs/run.msmarco-doc.bm25-tuned2+prf.topics.msmarco-doc.dev.txt +tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.1000 src/main/resources/topics-and-qrels/qrels.msmarco-doc.dev.txt runs/run.msmarco-doc.bm25-tuned2+prf.topics.msmarco-doc.dev.txt ``` ## Effectiveness With the above commands, you 
should be able to reproduce the following results: -| **AP@1000** | **BM25 (default)**| **+RM3** | **+Rocchio**| **+Rocchio***| **BM25 (tuned)**| **+RM3** | **+Rocchio**| **+Rocchio***| **BM25 (tuned2)**| **+RM3** | **+Rocchio**| **+Rocchio***| -|:-------------------------------------------------------------------------------------------------------------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------| -| [MS MARCO Doc: Dev](https://github.com/microsoft/MSMARCO-Document-Ranking) | 0.2305 | 0.1631 | 0.1632 | 0.1630 | 0.2784 | 0.2289 | 0.2280 | 0.2271 | 0.2774 | 0.2239 | 0.2248 | 0.2231 | -| **RR@100** | **BM25 (default)**| **+RM3** | **+Rocchio**| **+Rocchio***| **BM25 (tuned)**| **+RM3** | **+Rocchio**| **+Rocchio***| **BM25 (tuned2)**| **+RM3** | **+Rocchio**| **+Rocchio***| -| [MS MARCO Doc: Dev](https://github.com/microsoft/MSMARCO-Document-Ranking) | 0.2299 | 0.1622 | 0.1624 | 0.1622 | 0.2778 | 0.2282 | 0.2274 | 0.2264 | 0.2768 | 0.2231 | 0.2242 | 0.2224 | -| **R@100** | **BM25 (default)**| **+RM3** | **+Rocchio**| **+Rocchio***| **BM25 (tuned)**| **+RM3** | **+Rocchio**| **+Rocchio***| **BM25 (tuned2)**| **+RM3** | **+Rocchio**| **+Rocchio***| -| [MS MARCO Doc: Dev](https://github.com/microsoft/MSMARCO-Document-Ranking) | 0.7281 | 0.6767 | 0.6763 | 0.6792 | 0.8069 | 0.7878 | 0.7901 | 0.7922 | 0.8070 | 0.7791 | 0.7878 | 0.7863 | -| **R@1000** | **BM25 (default)**| **+RM3** | **+Rocchio**| **+Rocchio***| **BM25 (tuned)**| **+RM3** | **+Rocchio**| **+Rocchio***| **BM25 (tuned2)**| **+RM3** | **+Rocchio**| **+Rocchio***| -| [MS MARCO Doc: Dev](https://github.com/microsoft/MSMARCO-Document-Ranking) | 0.8856 | 0.8791 | 0.8789 | 0.8808 | 0.9324 | 0.9314 | 0.9334 | 0.9326 | 0.9357 | 0.9305 | 0.9316 | 0.9316 | +| **AP@1000** | **BM25 (default)**| **+RM3** | **+Rocchio**| **+Rocchio***| **+Ax** | **+PRF** | **BM25 (tuned)**| **+RM3** | **+Rocchio**| **+Rocchio***| **+Ax** | **+PRF** | **BM25 (tuned2)**| **+RM3** | **+Rocchio**| **+Rocchio***| **+Ax** | **+PRF** | +|:-------------------------------------------------------------------------------------------------------------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------| +| [MS MARCO Doc: Dev](https://github.com/microsoft/MSMARCO-Document-Ranking) | 0.2305 | 0.1631 | 0.1632 | 0.1630 | 0.1146 | 0.1357 | 0.2784 | 0.2289 | 0.2280 | 0.2271 | 0.1888 | 0.1559 | 0.2774 | 0.2239 | 0.2248 | 0.2231 | 0.1886 | 0.1530 | +| **RR@100** | **BM25 (default)**| **+RM3** | **+Rocchio**| **+Rocchio***| **+Ax** | **+PRF** | **BM25 (tuned)**| **+RM3** | **+Rocchio**| **+Rocchio***| **+Ax** | **+PRF** | **BM25 (tuned2)**| **+RM3** | **+Rocchio**| **+Rocchio***| **+Ax** | **+PRF** | +| [MS MARCO Doc: Dev](https://github.com/microsoft/MSMARCO-Document-Ranking) | 0.2299 | 0.1622 | 0.1624 | 0.1622 | 0.1135 | 0.1347 | 0.2778 | 0.2282 | 0.2274 | 0.2264 | 0.1880 | 0.1550 | 0.2768 | 0.2231 | 0.2242 | 0.2224 | 0.1878 | 0.1521 | +| **R@100** | **BM25 (default)**| **+RM3** | **+Rocchio**| **+Rocchio***| **+Ax** | **+PRF** | **BM25 (tuned)**| **+RM3** | **+Rocchio**| **+Rocchio***| **+Ax** | **+PRF** | **BM25 (tuned2)**| **+RM3** | **+Rocchio**| **+Rocchio***| **+Ax** | **+PRF** | +| [MS MARCO Doc: Dev](https://github.com/microsoft/MSMARCO-Document-Ranking) | 0.7281 | 0.6767 | 0.6763 | 0.6792 | 0.5754 | 0.6374 | 
0.8069 | 0.7878 | 0.7901 | 0.7922 | 0.7560 | 0.6852 | 0.8070 | 0.7791 | 0.7878 | 0.7863 | 0.7526 | 0.6825 | +| **R@1000** | **BM25 (default)**| **+RM3** | **+Rocchio**| **+Rocchio***| **+Ax** | **+PRF** | **BM25 (tuned)**| **+RM3** | **+Rocchio**| **+Rocchio***| **+Ax** | **+PRF** | **BM25 (tuned2)**| **+RM3** | **+Rocchio**| **+Rocchio***| **+Ax** | **+PRF** | +| [MS MARCO Doc: Dev](https://github.com/microsoft/MSMARCO-Document-Ranking) | 0.8856 | 0.8791 | 0.8789 | 0.8808 | 0.8373 | 0.8471 | 0.9324 | 0.9314 | 0.9334 | 0.9326 | 0.9268 | 0.8760 | 0.9357 | 0.9305 | 0.9316 | 0.9316 | 0.9249 | 0.8766 | Explanation of settings: + The setting "default" refers the default BM25 settings of `k1=0.9`, `b=0.4`. -+ The setting "tuned" refers to `k1=3.44`, `b=0.87`, tuned on 2019/06 and used for TREC 2019 Deep Learning Track baseline runs. -+ The setting "tuned2" refers to `k1=4.46`, `b=0.82`, tuned to optimize for recall@100 (i.e., for first-stage retrieval) on 2019/12; see [this page](experiments-msmarco-doc.md) additional details. ++ The setting "tuned" refers to `k1=3.44`, `b=0.87`, tuned in 2019/06 using the MS MARCO document sparse judgments to optimize for MAP and used for TREC 2019 Deep Learning Track baseline runs. ++ The setting "tuned2" refers to `k1=4.46`, `b=0.82`, tuned in 2020/12 using the MS MARCO document sparse judgments to optimize for recall@100 (i.e., for first-stage retrieval); see [this page](experiments-msmarco-doc.md) additional details. See [this page](experiments-msmarco-doc.md) for more details on tuning. diff --git a/docs/regressions-msmarco-passage-docTTTTTquery.md b/docs/regressions-msmarco-passage-docTTTTTquery.md index aeb2de1cfd..015ffb99d7 100644 --- a/docs/regressions-msmarco-passage-docTTTTTquery.md +++ b/docs/regressions-msmarco-passage-docTTTTTquery.md @@ -70,7 +70,7 @@ target/appassembler/bin/SearchCollection \ -topics src/main/resources/topics-and-qrels/topics.msmarco-passage.dev-subset.txt \ -topicreader TsvInt \ -output runs/run.msmarco-passage-docTTTTTquery.bm25-default+rocchio-neg.topics.msmarco-passage.dev-subset.txt \ - -bm25 -rocchio -rocchio.useNegative & + -bm25 -rocchio -rocchio.useNegative -rerankCutoff 1000 & target/appassembler/bin/SearchCollection \ -index indexes/lucene-index.msmarco-passage-docTTTTTquery/ \ @@ -98,7 +98,7 @@ target/appassembler/bin/SearchCollection \ -topics src/main/resources/topics-and-qrels/topics.msmarco-passage.dev-subset.txt \ -topicreader TsvInt \ -output runs/run.msmarco-passage-docTTTTTquery.bm25-tuned+rocchio-neg.topics.msmarco-passage.dev-subset.txt \ - -bm25 -bm25.k1 0.82 -bm25.b 0.68 -rocchio -rocchio.useNegative & + -bm25 -bm25.k1 0.82 -bm25.b 0.68 -rocchio -rocchio.useNegative -rerankCutoff 1000 & target/appassembler/bin/SearchCollection \ -index indexes/lucene-index.msmarco-passage-docTTTTTquery/ \ @@ -126,7 +126,7 @@ target/appassembler/bin/SearchCollection \ -topics src/main/resources/topics-and-qrels/topics.msmarco-passage.dev-subset.txt \ -topicreader TsvInt \ -output runs/run.msmarco-passage-docTTTTTquery.bm25-tuned2+rocchio-neg.topics.msmarco-passage.dev-subset.txt \ - -bm25 -bm25.k1 2.18 -bm25.b 0.86 -rocchio -rocchio.useNegative & + -bm25 -bm25.k1 2.18 -bm25.b 0.86 -rocchio -rocchio.useNegative -rerankCutoff 1000 & ``` Evaluation can be performed using `trec_eval`: @@ -199,13 +199,13 @@ With the above commands, you should be able to reproduce the following results: | **AP@1000** | **BM25 (default)**| **+RM3** | **+Rocchio**| **+Rocchio***| **BM25 (tuned)**| **+RM3** | **+Rocchio**| **+Rocchio***| **BM25 
(tuned2)**| **+RM3** | **+Rocchio**| **+Rocchio***| |:-------------------------------------------------------------------------------------------------------------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------| -| [MS MARCO Passage: Dev](https://github.com/microsoft/MSMARCO-Passage-Ranking) | 0.2805 | 0.2243 | 0.2260 | 0.2255 | 0.2850 | 0.2266 | 0.2278 | 0.2280 | 0.2893 | 0.2464 | 0.2489 | 0.2492 | +| [MS MARCO Passage: Dev](https://github.com/microsoft/MSMARCO-Passage-Ranking) | 0.2805 | 0.2243 | 0.2260 | 0.2234 | 0.2850 | 0.2266 | 0.2278 | 0.2276 | 0.2893 | 0.2464 | 0.2489 | 0.2467 | | **RR@10** | **BM25 (default)**| **+RM3** | **+Rocchio**| **+Rocchio***| **BM25 (tuned)**| **+RM3** | **+Rocchio**| **+Rocchio***| **BM25 (tuned2)**| **+RM3** | **+Rocchio**| **+Rocchio***| -| [MS MARCO Passage: Dev](https://github.com/microsoft/MSMARCO-Passage-Ranking) | 0.2723 | 0.2140 | 0.2158 | 0.2155 | 0.2768 | 0.2162 | 0.2174 | 0.2176 | 0.2816 | 0.2374 | 0.2396 | 0.2400 | +| [MS MARCO Passage: Dev](https://github.com/microsoft/MSMARCO-Passage-Ranking) | 0.2723 | 0.2140 | 0.2158 | 0.2129 | 0.2768 | 0.2162 | 0.2174 | 0.2173 | 0.2816 | 0.2374 | 0.2396 | 0.2373 | | **R@100** | **BM25 (default)**| **+RM3** | **+Rocchio**| **+Rocchio***| **BM25 (tuned)**| **+RM3** | **+Rocchio**| **+Rocchio***| **BM25 (tuned2)**| **+RM3** | **+Rocchio**| **+Rocchio***| -| [MS MARCO Passage: Dev](https://github.com/microsoft/MSMARCO-Passage-Ranking) | 0.8192 | 0.7995 | 0.8017 | 0.8054 | 0.8190 | 0.8033 | 0.8052 | 0.8079 | 0.8277 | 0.8228 | 0.8238 | 0.8265 | +| [MS MARCO Passage: Dev](https://github.com/microsoft/MSMARCO-Passage-Ranking) | 0.8192 | 0.7995 | 0.8017 | 0.8023 | 0.8190 | 0.8033 | 0.8052 | 0.8071 | 0.8277 | 0.8228 | 0.8238 | 0.8227 | | **R@1000** | **BM25 (default)**| **+RM3** | **+Rocchio**| **+Rocchio***| **BM25 (tuned)**| **+RM3** | **+Rocchio**| **+Rocchio***| **BM25 (tuned2)**| **+RM3** | **+Rocchio**| **+Rocchio***| -| [MS MARCO Passage: Dev](https://github.com/microsoft/MSMARCO-Passage-Ranking) | 0.9470 | 0.9463 | 0.9467 | 0.9474 | 0.9471 | 0.9479 | 0.9496 | 0.9497 | 0.9506 | 0.9528 | 0.9535 | 0.9534 | +| [MS MARCO Passage: Dev](https://github.com/microsoft/MSMARCO-Passage-Ranking) | 0.9470 | 0.9463 | 0.9467 | 0.9475 | 0.9471 | 0.9479 | 0.9496 | 0.9493 | 0.9506 | 0.9528 | 0.9535 | 0.9539 | Explanation of settings: diff --git a/docs/regressions-msmarco-passage.md b/docs/regressions-msmarco-passage.md index a83ae447f7..2842473a06 100644 --- a/docs/regressions-msmarco-passage.md +++ b/docs/regressions-msmarco-passage.md @@ -207,7 +207,7 @@ With the above commands, you should be able to reproduce the following results: Explanation of settings: + The setting "default" refers the default BM25 settings of `k1=0.9`, `b=0.4`. -+ The setting "tuned" refers to `k1=0.82`, `b=0.68`, as described in [this page](experiments-msmarco-passage.md). Note that results here are slightly different from the above referenced page because those experiments used `SearchMsmarco` to generate runs in the MS MARCO format, and then converted them into the TREC format, which is slightly lossy (due to tie-breaking effects). ++ The setting "tuned" refers to `k1=0.82`, `b=0.68`, as described in [this page](experiments-msmarco-passage.md). 
To generate runs corresponding to the submissions on the [MS MARCO Passage Ranking Leaderboard](https://microsoft.github.io/msmarco/), follow the instructions below: diff --git a/src/main/resources/docgen/templates/dl19-doc-docTTTTTquery.template b/src/main/resources/docgen/templates/dl19-doc-docTTTTTquery.template index 27e790b362..0b0afe237a 100644 --- a/src/main/resources/docgen/templates/dl19-doc-docTTTTTquery.template +++ b/src/main/resources/docgen/templates/dl19-doc-docTTTTTquery.template @@ -67,7 +67,7 @@ ${effectiveness} Explanation of settings: + The setting "default" refers the default BM25 settings of `k1=0.9`, `b=0.4`. -+ The setting "tuned" refers to `k1=4.68`, `b=0.87`, tuned using the MS MARCO document sparse judgments to optimize for recall@100 (i.e., for first-stage retrieval) on 2019/12. ++ The setting "tuned" refers to `k1=4.68`, `b=0.87`, tuned in 2020/12 using the MS MARCO document sparse judgments to optimize for recall@100 (i.e., for first-stage retrieval). Settings tuned on the MS MARCO document sparse judgments _may not_ work well on the TREC dense judgments. diff --git a/src/main/resources/docgen/templates/dl19-doc-segmented-docTTTTTquery.template b/src/main/resources/docgen/templates/dl19-doc-segmented-docTTTTTquery.template index b6de276896..31ffdea3f5 100644 --- a/src/main/resources/docgen/templates/dl19-doc-segmented-docTTTTTquery.template +++ b/src/main/resources/docgen/templates/dl19-doc-segmented-docTTTTTquery.template @@ -68,7 +68,7 @@ ${effectiveness} Explanation of settings: + The setting "default" refers the default BM25 settings of `k1=0.9`, `b=0.4`. -+ The setting "tuned" refers to `k1=2.56`, `b=0.59`, tuned using the MS MARCO document sparse judgments to optimize for recall@100 (i.e., for first-stage retrieval) on 2020/12. ++ The setting "tuned" refers to `k1=2.56`, `b=0.59`, tuned in 2020/12 using the MS MARCO document sparse judgments to optimize for recall@100 (i.e., for first-stage retrieval). Settings tuned on the MS MARCO document sparse judgments _may not_ work well on the TREC dense judgments. diff --git a/src/main/resources/docgen/templates/dl19-doc-segmented.template b/src/main/resources/docgen/templates/dl19-doc-segmented.template index 3d0cba8fa1..83d2f17db7 100644 --- a/src/main/resources/docgen/templates/dl19-doc-segmented.template +++ b/src/main/resources/docgen/templates/dl19-doc-segmented.template @@ -68,7 +68,7 @@ ${effectiveness} Explanation of settings: + The setting "default" refers the default BM25 settings of `k1=0.9`, `b=0.4`. -+ The setting "tuned" refers to `k1=2.16`, `b=0.61`, tuned using the MS MARCO document sparse judgments to optimize for recall@100 (i.e., for first-stage retrieval) on 2020/12. ++ The setting "tuned" refers to `k1=2.16`, `b=0.61`, tuned in 2020/12 using the MS MARCO document sparse judgments to optimize for recall@100 (i.e., for first-stage retrieval). Settings tuned on the MS MARCO document sparse judgments _may not_ work well on the TREC dense judgments. diff --git a/src/main/resources/docgen/templates/dl19-doc.template b/src/main/resources/docgen/templates/dl19-doc.template index 0d4f0f0908..cc048f596b 100644 --- a/src/main/resources/docgen/templates/dl19-doc.template +++ b/src/main/resources/docgen/templates/dl19-doc.template @@ -67,7 +67,8 @@ ${effectiveness} Explanation of settings: + The setting "default" refers the default BM25 settings of `k1=0.9`, `b=0.4`. -+ The setting "tuned" refers to `k1=3.44`, `b=0.87`, tuned using the MS MARCO document sparse judgments on 2019/06. 
++ The setting "tuned" refers to `k1=3.44`, `b=0.87`, tuned in 2019/06 using the MS MARCO document sparse judgments to optimize for MAP and used for TREC 2019 Deep Learning Track baseline runs.
++ The setting "tuned2" refers to `k1=4.46`, `b=0.82`, tuned in 2020/12 using the MS MARCO document sparse judgments to optimize for recall@100 (i.e., for first-stage retrieval); see [this page](experiments-msmarco-doc.md) for additional details.
 
 Settings tuned on the MS MARCO document sparse judgments _may not_ work well on the TREC dense judgments.
 
diff --git a/src/main/resources/docgen/templates/dl20-doc-docTTTTTquery.template b/src/main/resources/docgen/templates/dl20-doc-docTTTTTquery.template
index 42ea024b16..59d32fdf67 100644
--- a/src/main/resources/docgen/templates/dl20-doc-docTTTTTquery.template
+++ b/src/main/resources/docgen/templates/dl20-doc-docTTTTTquery.template
@@ -67,7 +67,7 @@ ${effectiveness}
 Explanation of settings:
 
 + The setting "default" refers the default BM25 settings of `k1=0.9`, `b=0.4`.
-+ The setting "tuned" refers to `k1=4.68`, `b=0.87`, tuned using the MS MARCO document sparse judgments to optimize for recall@100 (i.e., for first-stage retrieval) on 2019/12.
++ The setting "tuned" refers to `k1=4.68`, `b=0.87`, tuned in 2020/12 using the MS MARCO document sparse judgments to optimize for recall@100 (i.e., for first-stage retrieval).
 
 Settings tuned on the MS MARCO document sparse judgments _may not_ work well on the TREC dense judgments.
 
diff --git a/src/main/resources/docgen/templates/dl20-doc-segmented-docTTTTTquery.template b/src/main/resources/docgen/templates/dl20-doc-segmented-docTTTTTquery.template
index 3a5126672d..342999c592 100644
--- a/src/main/resources/docgen/templates/dl20-doc-segmented-docTTTTTquery.template
+++ b/src/main/resources/docgen/templates/dl20-doc-segmented-docTTTTTquery.template
@@ -68,7 +68,7 @@ ${effectiveness}
 Explanation of settings:
 
 + The setting "default" refers the default BM25 settings of `k1=0.9`, `b=0.4`.
-+ The setting "tuned" refers to `k1=2.56`, `b=0.59`, tuned using the MS MARCO document sparse judgments to optimize for recall@100 (i.e., for first-stage retrieval) on 2019/12.
++ The setting "tuned" refers to `k1=2.56`, `b=0.59`, tuned in 2020/12 using the MS MARCO document sparse judgments to optimize for recall@100 (i.e., for first-stage retrieval).
 
 Settings tuned on the MS MARCO document sparse judgments _may not_ work well on the TREC dense judgments.
 
diff --git a/src/main/resources/docgen/templates/dl20-doc-segmented.template b/src/main/resources/docgen/templates/dl20-doc-segmented.template
index d1a30aa600..7825806558 100644
--- a/src/main/resources/docgen/templates/dl20-doc-segmented.template
+++ b/src/main/resources/docgen/templates/dl20-doc-segmented.template
@@ -68,7 +68,7 @@ ${effectiveness}
 Explanation of settings:
 
 + The setting "default" refers the default BM25 settings of `k1=0.9`, `b=0.4`.
-+ The setting "tuned" refers to `k1=2.16`, `b=0.61`, tuned using the MS MARCO document sparse judgments to optimize for recall@100 (i.e., for first-stage retrieval) on 2020/12.
++ The setting "tuned" refers to `k1=2.16`, `b=0.61`, tuned in 2020/12 using the MS MARCO document sparse judgments to optimize for recall@100 (i.e., for first-stage retrieval).
 
 Settings tuned on the MS MARCO document sparse judgments _may not_ work well on the TREC dense judgments.
 
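Aside on the tuning procedure referenced in these templates: the "tuned" parameters come from sweeping BM25's `k1`/`b` against the MS MARCO sparse dev judgments and keeping the configuration with the best recall@100. A minimal bash sketch of such a sweep, reusing only the `SearchCollection` and `trec_eval` invocations that already appear in this patch, is shown below; the index and qrels paths mirror the msmarco-doc commands above, while the grid values and run-file names are illustrative placeholders rather than the grid actually used.

```bash
#!/bin/bash
# Illustrative sketch of a k1/b sweep scored by recall@100 on the MS MARCO doc dev set.
# Assumes the lucene-index.msmarco-doc index has been built as in the commands above;
# the grid values and run-file naming are placeholders, not the settings actually swept.

QRELS=src/main/resources/topics-and-qrels/qrels.msmarco-doc.dev.txt
TOPICS=src/main/resources/topics-and-qrels/topics.msmarco-doc.dev.txt
INDEX=indexes/lucene-index.msmarco-doc/

for k1 in 2.0 3.0 4.0 5.0; do
  for b in 0.6 0.7 0.8 0.9; do
    run=runs/run.msmarco-doc.dev.bm25.k1_${k1}.b_${b}.txt
    target/appassembler/bin/SearchCollection \
      -index ${INDEX} \
      -topics ${TOPICS} \
      -topicreader TsvInt \
      -output ${run} \
      -bm25 -bm25.k1 ${k1} -bm25.b ${b} -hits 1000
    # trec_eval prints "recall_100  all  <score>"; keep the third field.
    score=$(tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.100 ${QRELS} ${run} | awk '{print $3}')
    echo "k1=${k1} b=${b} recall@100=${score}"
  done
done
```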
diff --git a/src/main/resources/docgen/templates/dl20-doc.template b/src/main/resources/docgen/templates/dl20-doc.template
index 61f270dd62..2923b1eb4b 100644
--- a/src/main/resources/docgen/templates/dl20-doc.template
+++ b/src/main/resources/docgen/templates/dl20-doc.template
@@ -67,8 +67,8 @@ ${effectiveness}
 Explanation of settings:
 
 + The setting "default" refers the default BM25 settings of `k1=0.9`, `b=0.4`.
-+ The setting "tuned" refers to `k1=3.44`, `b=0.87`, tuned on 2019/06 and used for TREC 2019 Deep Learning Track baseline runs.
-+ The setting "tuned2" refers to `k1=4.46`, `b=0.82`, tuned using the MS MARCO document sparse judgments to optimize for recall@100 (i.e., for first-stage retrieval) on 2019/12; see [this page](experiments-msmarco-doc.md) additional details.
++ The setting "tuned" refers to `k1=3.44`, `b=0.87`, tuned in 2019/06 using the MS MARCO document sparse judgments to optimize for MAP and used for TREC 2019 Deep Learning Track baseline runs.
++ The setting "tuned2" refers to `k1=4.46`, `b=0.82`, tuned in 2020/12 using the MS MARCO document sparse judgments to optimize for recall@100 (i.e., for first-stage retrieval); see [this page](experiments-msmarco-doc.md) for additional details.
 
 Settings tuned on the MS MARCO document sparse judgments _may not_ work well on the TREC dense judgments.
 
diff --git a/src/main/resources/docgen/templates/msmarco-doc-docTTTTTquery.template b/src/main/resources/docgen/templates/msmarco-doc-docTTTTTquery.template
index b400fe9a29..24d611781f 100644
--- a/src/main/resources/docgen/templates/msmarco-doc-docTTTTTquery.template
+++ b/src/main/resources/docgen/templates/msmarco-doc-docTTTTTquery.template
@@ -62,7 +62,7 @@ ${effectiveness}
 Explanation of settings:
 
 + The setting "default" refers the default BM25 settings of `k1=0.9`, `b=0.4`.
-+ The setting "tuned" refers to `k1=4.68`, `b=0.87`, tuned to optimize for recall@100 (i.e., for first-stage retrieval) on 2019/12.
++ The setting "tuned" refers to `k1=4.68`, `b=0.87`, tuned in 2020/12 using the MS MARCO document sparse judgments to optimize for recall@100 (i.e., for first-stage retrieval).
 
 In these runs, we are retrieving the top 1000 hits for each query and using `trec_eval` to evaluate all 1000 hits.
 This lets us measure R@100 and R@1000; the latter is particularly important when these runs are used as first-stage retrieval.
diff --git a/src/main/resources/docgen/templates/msmarco-doc-segmented-docTTTTTquery.template b/src/main/resources/docgen/templates/msmarco-doc-segmented-docTTTTTquery.template
index 5609bb081e..b3d40a62d4 100644
--- a/src/main/resources/docgen/templates/msmarco-doc-segmented-docTTTTTquery.template
+++ b/src/main/resources/docgen/templates/msmarco-doc-segmented-docTTTTTquery.template
@@ -63,7 +63,7 @@ ${effectiveness}
 Explanation of settings:
 
 + The setting "default" refers the default BM25 settings of `k1=0.9`, `b=0.4`.
-+ The setting "tuned" refers to `k1=2.56`, `b=0.59`, tuned to optimize for recall@100 (i.e., for first-stage retrieval) on 2019/12.
++ The setting "tuned" refers to `k1=2.56`, `b=0.59`, tuned in 2020/12 using the MS MARCO document sparse judgments to optimize for recall@100 (i.e., for first-stage retrieval).
 
 In these runs, we are retrieving the top 1000 hits for each query and using `trec_eval` to evaluate all 1000 hits.
 Since we're in the passage condition, we fetch the 10000 passages and select the top 1000 documents using MaxP.
diff --git a/src/main/resources/docgen/templates/msmarco-doc-segmented.template b/src/main/resources/docgen/templates/msmarco-doc-segmented.template
index 24db33133d..da0be3a83f 100644
--- a/src/main/resources/docgen/templates/msmarco-doc-segmented.template
+++ b/src/main/resources/docgen/templates/msmarco-doc-segmented.template
@@ -63,7 +63,7 @@ ${effectiveness}
 Explanation of settings:
 
 + The setting "default" refers the default BM25 settings of `k1=0.9`, `b=0.4`.
-+ The setting "tuned" refers to `k1=2.16`, `b=0.61`, tuned to optimize for recall@100 (i.e., for first-stage retrieval) on 2019/12.
++ The setting "tuned" refers to `k1=2.16`, `b=0.61`, tuned in 2020/12 using the MS MARCO document sparse judgments to optimize for recall@100 (i.e., for first-stage retrieval).
 
 In these runs, we are retrieving the top 1000 hits for each query and using `trec_eval` to evaluate all 1000 hits.
 Since we're in the passage condition, we fetch the 10000 passages and select the top 1000 documents using MaxP.
diff --git a/src/main/resources/docgen/templates/msmarco-doc.template b/src/main/resources/docgen/templates/msmarco-doc.template
index cbe363df22..6ae5c43c8d 100644
--- a/src/main/resources/docgen/templates/msmarco-doc.template
+++ b/src/main/resources/docgen/templates/msmarco-doc.template
@@ -62,8 +62,8 @@ ${effectiveness}
 Explanation of settings:
 
 + The setting "default" refers the default BM25 settings of `k1=0.9`, `b=0.4`.
-+ The setting "tuned" refers to `k1=3.44`, `b=0.87`, tuned on 2019/06 and used for TREC 2019 Deep Learning Track baseline runs.
-+ The setting "tuned2" refers to `k1=4.46`, `b=0.82`, tuned to optimize for recall@100 (i.e., for first-stage retrieval) on 2019/12; see [this page](experiments-msmarco-doc.md) additional details.
++ The setting "tuned" refers to `k1=3.44`, `b=0.87`, tuned in 2019/06 using the MS MARCO document sparse judgments to optimize for MAP and used for TREC 2019 Deep Learning Track baseline runs.
++ The setting "tuned2" refers to `k1=4.46`, `b=0.82`, tuned in 2020/12 using the MS MARCO document sparse judgments to optimize for recall@100 (i.e., for first-stage retrieval); see [this page](experiments-msmarco-doc.md) for additional details.
 
 See [this page](experiments-msmarco-doc.md) for more details on tuning.
 
diff --git a/src/main/resources/docgen/templates/msmarco-passage.template b/src/main/resources/docgen/templates/msmarco-passage.template
index 789695c89d..464181f447 100644
--- a/src/main/resources/docgen/templates/msmarco-passage.template
+++ b/src/main/resources/docgen/templates/msmarco-passage.template
@@ -53,7 +53,7 @@ ${effectiveness}
 Explanation of settings:
 
 + The setting "default" refers the default BM25 settings of `k1=0.9`, `b=0.4`.
-+ The setting "tuned" refers to `k1=0.82`, `b=0.68`, as described in [this page](experiments-msmarco-passage.md). Note that results here are slightly different from the above referenced page because those experiments used `SearchMsmarco` to generate runs in the MS MARCO format, and then converted them into the TREC format, which is slightly lossy (due to tie-breaking effects).
++ The setting "tuned" refers to `k1=0.82`, `b=0.68`, as described in [this page](experiments-msmarco-passage.md).
To generate runs corresponding to the submissions on the [MS MARCO Passage Ranking Leaderboard](https://microsoft.github.io/msmarco/), follow the instructions below: diff --git a/src/main/resources/regression/dl19-doc-docTTTTTquery.yaml b/src/main/resources/regression/dl19-doc-docTTTTTquery.yaml index 54789a3375..d9066e8931 100644 --- a/src/main/resources/regression/dl19-doc-docTTTTTquery.yaml +++ b/src/main/resources/regression/dl19-doc-docTTTTTquery.yaml @@ -76,6 +76,18 @@ models: - 0.4465 R@1000: - 0.7738 + - name: bm25-default+rocchio + display: +Rocchio + params: -bm25 -rocchio + results: + AP@100: + - 0.3092 + nDCG@10: + - 0.5956 + R@100: + - 0.4505 + R@1000: + - 0.7758 - name: bm25-tuned display: BM25 (tuned) params: -bm25 -bm25.k1 4.68 -bm25.b 0.87 @@ -100,3 +112,15 @@ models: - 0.4119 R@1000: - 0.7177 + - name: bm25-tuned+rocchio + display: +Rocchio + params: -bm25 -bm25.k1 4.68 -bm25.b 0.87 -rocchio + results: + AP@100: + - 0.2843 + nDCG@10: + - 0.6141 + R@100: + - 0.4246 + R@1000: + - 0.7276 diff --git a/src/main/resources/regression/dl19-doc-segmented-docTTTTTquery.yaml b/src/main/resources/regression/dl19-doc-segmented-docTTTTTquery.yaml index bf8ba2205c..919277a503 100644 --- a/src/main/resources/regression/dl19-doc-segmented-docTTTTTquery.yaml +++ b/src/main/resources/regression/dl19-doc-segmented-docTTTTTquery.yaml @@ -76,6 +76,18 @@ models: - 0.4392 R@1000: - 0.7481 + - name: bm25-default+rocchio + display: +Rocchio + params: -bm25 -rocchio -hits 10000 -selectMaxPassage -selectMaxPassage.delimiter "#" -selectMaxPassage.hits 1000 + results: + AP@100: + - 0.3074 + nDCG@10: + - 0.6295 + R@100: + - 0.4483 + R@1000: + - 0.7520 - name: bm25-tuned display: BM25 (tuned) params: -bm25 -bm25.k1 2.56 -bm25.b 0.59 -hits 10000 -selectMaxPassage -selectMaxPassage.delimiter "#" -selectMaxPassage.hits 1000 @@ -100,3 +112,15 @@ models: - 0.4237 R@1000: - 0.7066 + - name: bm25-tuned+rocchio + display: +Rocchio + params: -bm25 -bm25.k1 2.56 -bm25.b 0.59 -rocchio -hits 10000 -selectMaxPassage -selectMaxPassage.delimiter "#" -selectMaxPassage.hits 1000 + results: + AP@100: + - 0.2913 + nDCG@10: + - 0.6244 + R@100: + - 0.4271 + R@1000: + - 0.7189 diff --git a/src/main/resources/regression/dl19-doc.yaml b/src/main/resources/regression/dl19-doc.yaml index 0157cffcb5..273ad94126 100644 --- a/src/main/resources/regression/dl19-doc.yaml +++ b/src/main/resources/regression/dl19-doc.yaml @@ -196,3 +196,75 @@ models: - 0.4295 R@1000: - 0.7559 + - name: bm25-tuned2 + display: BM25 (tuned2) + params: -bm25 -bm25.k1 4.46 -bm25.b 0.82 + results: + AP@100: + - 0.2336 + nDCG@10: + - 0.5233 + R@100: + - 0.3849 + R@1000: + - 0.6757 + - name: bm25-tuned2+rm3 + display: +RM3 + params: -bm25 -bm25.k1 4.46 -bm25.b 0.82 -rm3 + results: + AP@100: + - 0.2643 + nDCG@10: + - 0.5526 + R@100: + - 0.4131 + R@1000: + - 0.7189 + - name: bm25-tuned2+rocchio + display: +Rocchio + params: -bm25 -bm25.k1 4.46 -bm25.b 0.82 -rocchio + results: + AP@100: + - 0.2657 + nDCG@10: + - 0.5584 + R@100: + - 0.4164 + R@1000: + - 0.7299 + - name: bm25-tuned2+rocchio-neg + display: +Rocchio* + params: -bm25 -bm25.k1 4.46 -bm25.b 0.82 -rocchio -rocchio.useNegative -rerankCutoff 1000 + results: + AP@100: + - 0.2670 + nDCG@10: + - 0.5567 + R@100: + - 0.4172 + R@1000: + - 0.7312 + - name: bm25-tuned2+ax + display: +Ax + params: -bm25 -bm25.k1 4.46 -bm25.b 0.82 -axiom -axiom.deterministic -rerankCutoff 20 + results: + AP@100: + - 0.2724 + nDCG@10: + - 0.5093 + R@100: + - 0.4332 + R@1000: + - 0.7474 + - name: bm25-tuned2+prf + display: +PRF + 
params: -bm25 -bm25.k1 4.46 -bm25.b 0.82 -bm25prf + results: + AP@100: + - 0.2815 + nDCG@10: + - 0.5360 + R@100: + - 0.4310 + R@1000: + - 0.7577 diff --git a/src/main/resources/regression/dl19-passage-docTTTTTquery.yaml b/src/main/resources/regression/dl19-passage-docTTTTTquery.yaml index 2654310fde..2534c73c53 100644 --- a/src/main/resources/regression/dl19-passage-docTTTTTquery.yaml +++ b/src/main/resources/regression/dl19-passage-docTTTTTquery.yaml @@ -76,6 +76,30 @@ models: - 0.6335 R@1000: - 0.8861 + - name: bm25-default+rocchio + display: +Rocchio + params: -bm25 -rocchio + results: + AP@1000: + - 0.4469 + nDCG@10: + - 0.6538 + R@100: + - 0.6338 + R@1000: + - 0.8855 + - name: bm25-default+rocchio-neg + display: +Rocchio* + params: -bm25 -rocchio -rocchio.useNegative -rerankCutoff 1000 + results: + AP@1000: + - 0.4441 + nDCG@10: + - 0.6444 + R@100: + - 0.6350 + R@1000: + - 0.8861 - name: bm25-tuned display: BM25 (tuned) params: -bm25 -bm25.k1 0.82 -bm25.b 0.68 @@ -100,6 +124,30 @@ models: - 0.6402 R@1000: - 0.8826 + - name: bm25-tuned+rocchio + display: +Rocchio + params: -bm25 -bm25.k1 0.82 -bm25.b 0.68 -rocchio + results: + AP@1000: + - 0.4525 + nDCG@10: + - 0.6617 + R@100: + - 0.6406 + R@1000: + - 0.8838 + - name: bm25-tuned+rocchio-neg + display: +Rocchio* + params: -bm25 -bm25.k1 0.82 -bm25.b 0.68 -rocchio -rocchio.useNegative -rerankCutoff 1000 + results: + AP@1000: + - 0.4511 + nDCG@10: + - 0.6591 + R@100: + - 0.6395 + R@1000: + - 0.8877 - name: bm25-tuned2 display: BM25 (tuned2) params: -bm25 -bm25.k1 2.18 -bm25.b 0.86 @@ -123,4 +171,28 @@ models: R@100: - 0.6046 R@1000: - - 0.8424 \ No newline at end of file + - 0.8424 + - name: bm25-tuned2+rocchio + display: +Rocchio + params: -bm25 -bm25.k1 2.18 -bm25.b 0.86 -rocchio + results: + AP@1000: + - 0.4339 + nDCG@10: + - 0.6559 + R@100: + - 0.6014 + R@1000: + - 0.8465 + - name: bm25-tuned2+rocchio-neg + display: +Rocchio* + params: -bm25 -bm25.k1 2.18 -bm25.b 0.86 -rocchio -rocchio.useNegative -rerankCutoff 1000 + results: + AP@1000: + - 0.4338 + nDCG@10: + - 0.6558 + R@100: + - 0.5981 + R@1000: + - 0.8488 diff --git a/src/main/resources/regression/dl20-doc-docTTTTTquery.yaml b/src/main/resources/regression/dl20-doc-docTTTTTquery.yaml index 6b0220ae86..554eb98f78 100644 --- a/src/main/resources/regression/dl20-doc-docTTTTTquery.yaml +++ b/src/main/resources/regression/dl20-doc-docTTTTTquery.yaml @@ -76,6 +76,18 @@ models: - 0.6555 R@1000: - 0.8596 + - name: bm25-default+rocchio + display: +Rocchio + params: -bm25 -rocchio + results: + AP@100: + - 0.4218 + nDCG@10: + - 0.5416 + R@100: + - 0.6627 + R@1000: + - 0.8641 - name: bm25-tuned display: BM25 (tuned) params: -bm25 -bm25.k1 4.68 -bm25.b 0.87 @@ -100,3 +112,15 @@ models: - 0.6127 R@1000: - 0.8240 + - name: bm25-tuned+rocchio + display: +Rocchio + params: -bm25 -bm25.k1 4.68 -bm25.b 0.87 -rocchio + results: + AP@100: + - 0.4151 + nDCG@10: + - 0.5733 + R@100: + - 0.6230 + R@1000: + - 0.8316 diff --git a/src/main/resources/regression/dl20-doc-segmented-docTTTTTquery.yaml b/src/main/resources/regression/dl20-doc-segmented-docTTTTTquery.yaml index 8b6572a671..0c61d523c7 100644 --- a/src/main/resources/regression/dl20-doc-segmented-docTTTTTquery.yaml +++ b/src/main/resources/regression/dl20-doc-segmented-docTTTTTquery.yaml @@ -76,6 +76,18 @@ models: - 0.6443 R@1000: - 0.8270 + - name: bm25-default+rocchio + display: +Rocchio + params: -bm25 -rocchio -hits 10000 -selectMaxPassage -selectMaxPassage.delimiter "#" -selectMaxPassage.hits 1000 + results: + AP@100: + - 0.4297 + nDCG@10: + - 
0.5873 + R@100: + - 0.6475 + R@1000: + - 0.8365 - name: bm25-tuned display: BM25 (tuned) params: -bm25 -bm25.k1 2.56 -bm25.b 0.59 -hits 10000 -selectMaxPassage -selectMaxPassage.delimiter "#" -selectMaxPassage.hits 1000 @@ -100,3 +112,15 @@ models: - 0.6394 R@1000: - 0.8172 + - name: bm25-tuned+rocchio + display: +Rocchio + params: -bm25 -bm25.k1 2.56 -bm25.b 0.59 -rocchio -hits 10000 -selectMaxPassage -selectMaxPassage.delimiter "#" -selectMaxPassage.hits 1000 + results: + AP@100: + - 0.4084 + nDCG@10: + - 0.5809 + R@100: + - 0.6432 + R@1000: + - 0.8233 diff --git a/src/main/resources/regression/dl20-doc.yaml b/src/main/resources/regression/dl20-doc.yaml index 1673877f04..89641c9792 100644 --- a/src/main/resources/regression/dl20-doc.yaml +++ b/src/main/resources/regression/dl20-doc.yaml @@ -100,6 +100,30 @@ models: - 0.6431 R@1000: - 0.8270 + - name: bm25-default+ax + display: +Ax + params: -bm25 -axiom -axiom.deterministic -rerankCutoff 20 + results: + AP@100: + - 0.3133 + nDCG@10: + - 0.4275 + R@100: + - 0.5714 + R@1000: + - 0.8063 + - name: bm25-default+prf + display: +PRF + params: -bm25 -bm25prf + results: + AP@100: + - 0.3515 + nDCG@10: + - 0.4680 + R@100: + - 0.6104 + R@1000: + - 0.8084 - name: bm25-tuned display: BM25 (tuned) params: -bm25 -bm25.k1 3.44 -bm25.b 0.87 @@ -148,6 +172,30 @@ models: - 0.6045 R@1000: - 0.8238 + - name: bm25-tuned+ax + display: +Ax + params: -bm25 -bm25.k1 3.44 -bm25.b 0.87 -axiom -axiom.deterministic -rerankCutoff 20 + results: + AP@100: + - 0.3462 + nDCG@10: + - 0.4941 + R@100: + - 0.6078 + R@1000: + - 0.8455 + - name: bm25-tuned+prf + display: +PRF + params: -bm25 -bm25.k1 3.44 -bm25.b 0.87 -bm25prf + results: + AP@100: + - 0.3561 + nDCG@10: + - 0.4785 + R@100: + - 0.6198 + R@1000: + - 0.8163 - name: bm25-tuned2 display: BM25 (tuned2) params: -bm25 -bm25.k1 4.46 -bm25.b 0.82 @@ -195,4 +243,28 @@ models: R@100: - 0.6013 R@1000: - - 0.8267 \ No newline at end of file + - 0.8267 + - name: bm25-tuned2+ax + display: +Ax + params: -bm25 -bm25.k1 4.46 -bm25.b 0.82 -axiom -axiom.deterministic -rerankCutoff 20 + results: + AP@100: + - 0.3498 + nDCG@10: + - 0.5106 + R@100: + - 0.5988 + R@1000: + - 0.8435 + - name: bm25-tuned2+prf + display: +PRF + params: -bm25 -bm25.k1 4.46 -bm25.b 0.82 -bm25prf + results: + AP@100: + - 0.3530 + nDCG@10: + - 0.4775 + R@100: + - 0.6171 + R@1000: + - 0.8210 diff --git a/src/main/resources/regression/dl20-passage-docTTTTTquery.yaml b/src/main/resources/regression/dl20-passage-docTTTTTquery.yaml index c1e92fb201..aa228fdbbb 100644 --- a/src/main/resources/regression/dl20-passage-docTTTTTquery.yaml +++ b/src/main/resources/regression/dl20-passage-docTTTTTquery.yaml @@ -76,6 +76,30 @@ models: - 0.7153 R@1000: - 0.8699 + - name: bm25-default+rocchio + display: +Rocchio + params: -bm25 -rocchio + results: + AP@1000: + - 0.4246 + nDCG@10: + - 0.6102 + R@100: + - 0.7239 + R@1000: + - 0.8675 + - name: bm25-default+rocchio-neg + display: +Rocchio* + params: -bm25 -rocchio -rocchio.useNegative -rerankCutoff 1000 + results: + AP@1000: + - 0.4272 + nDCG@10: + - 0.6147 + R@100: + - 0.7170 + R@1000: + - 0.8700 - name: bm25-tuned display: BM25 (tuned) params: -bm25 -bm25.k1 0.82 -bm25.b 0.68 @@ -100,6 +124,30 @@ models: - 0.7143 R@1000: - 0.8692 + - name: bm25-tuned+rocchio + display: +Rocchio + params: -bm25 -bm25.k1 0.82 -bm25.b 0.68 -rocchio + results: + AP@1000: + - 0.4269 + nDCG@10: + - 0.6152 + R@100: + - 0.7227 + R@1000: + - 0.8694 + - name: bm25-tuned+rocchio-neg + display: +Rocchio* + params: -bm25 -bm25.k1 0.82 -bm25.b 0.68 -rocchio 
-rocchio.useNegative -rerankCutoff 1000 + results: + AP@1000: + - 0.4279 + nDCG@10: + - 0.6173 + R@100: + - 0.7223 + R@1000: + - 0.8689 - name: bm25-tuned2 display: BM25 (tuned2) params: -bm25 -bm25.k1 2.18 -bm25.b 0.86 @@ -123,4 +171,28 @@ models: R@100: - 0.7109 R@1000: - - 0.8609 \ No newline at end of file + - 0.8609 + - name: bm25-tuned2+rocchio + display: +Rocchio + params: -bm25 -bm25.k1 2.18 -bm25.b 0.86 -rocchio + results: + AP@1000: + - 0.4376 + nDCG@10: + - 0.6224 + R@100: + - 0.7126 + R@1000: + - 0.8641 + - name: bm25-tuned2+rocchio-neg + display: +Rocchio* + params: -bm25 -bm25.k1 2.18 -bm25.b 0.86 -rocchio -rocchio.useNegative -rerankCutoff 1000 + results: + AP@1000: + - 0.4366 + nDCG@10: + - 0.6279 + R@100: + - 0.7125 + R@1000: + - 0.8657 diff --git a/src/main/resources/regression/msmarco-doc-docTTTTTquery.yaml b/src/main/resources/regression/msmarco-doc-docTTTTTquery.yaml index 746a221518..b66a21daa1 100644 --- a/src/main/resources/regression/msmarco-doc-docTTTTTquery.yaml +++ b/src/main/resources/regression/msmarco-doc-docTTTTTquery.yaml @@ -76,6 +76,18 @@ models: - 0.7420 R@1000: - 0.9128 + - name: bm25-default+rocchio + display: +Rocchio + params: -bm25 -rocchio + results: + AP@1000: + - 0.1841 + RR@100: + - 0.1833 + R@100: + - 0.7441 + R@1000: + - 0.9128 - name: bm25-tuned display: BM25 (tuned) params: -bm25 -bm25.k1 4.68 -bm25.b 0.87 @@ -100,3 +112,15 @@ models: - 0.8379 R@1000: - 0.9524 + - name: bm25-tuned+rocchio + display: +Rocchio + params: -bm25 -bm25.k1 4.68 -bm25.b 0.87 -rocchio + results: + AP@1000: + - 0.2647 + RR@100: + - 0.2642 + R@100: + - 0.8448 + R@1000: + - 0.9534 diff --git a/src/main/resources/regression/msmarco-doc-segmented-docTTTTTquery.yaml b/src/main/resources/regression/msmarco-doc-segmented-docTTTTTquery.yaml index 449e03dad2..544f8e4bb3 100644 --- a/src/main/resources/regression/msmarco-doc-segmented-docTTTTTquery.yaml +++ b/src/main/resources/regression/msmarco-doc-segmented-docTTTTTquery.yaml @@ -76,6 +76,18 @@ models: - 0.8479 R@1000: - 0.9547 + - name: bm25-default+rocchio + display: +Rocchio + params: -bm25 -rocchio -hits 10000 -selectMaxPassage -selectMaxPassage.delimiter "#" -selectMaxPassage.hits 1000 + results: + AP@1000: + - 0.2846 + RR@100: + - 0.2841 + R@100: + - 0.8479 + R@1000: + - 0.9551 - name: bm25-tuned display: BM25 (tuned) params: -bm25 -bm25.k1 2.56 -bm25.b 0.59 -hits 10000 -selectMaxPassage -selectMaxPassage.delimiter "#" -selectMaxPassage.hits 1000 @@ -100,3 +112,15 @@ models: - 0.8556 R@1000: - 0.9567 + - name: bm25-tuned+rocchio + display: +Rocchio + params: -bm25 -bm25.k1 2.56 -bm25.b 0.59 -rocchio -hits 10000 -selectMaxPassage -selectMaxPassage.delimiter "#" -selectMaxPassage.hits 1000 + results: + AP@1000: + - 0.2998 + RR@100: + - 0.2994 + R@100: + - 0.8600 + R@1000: + - 0.9571 diff --git a/src/main/resources/regression/msmarco-doc.yaml b/src/main/resources/regression/msmarco-doc.yaml index 7185b2a89e..1c629e8c9c 100644 --- a/src/main/resources/regression/msmarco-doc.yaml +++ b/src/main/resources/regression/msmarco-doc.yaml @@ -100,6 +100,30 @@ models: - 0.6792 R@1000: - 0.8808 + - name: bm25-default+ax + display: +Ax + params: -bm25 -axiom -axiom.deterministic -rerankCutoff 20 + results: + AP@1000: + - 0.1146 + RR@100: + - 0.1135 + R@100: + - 0.5754 + R@1000: + - 0.8373 + - name: bm25-default+prf + display: +PRF + params: -bm25 -bm25prf + results: + AP@1000: + - 0.1357 + RR@100: + - 0.1347 + R@100: + - 0.6374 + R@1000: + - 0.8471 - name: bm25-tuned display: BM25 (tuned) params: -bm25 -bm25.k1 3.44 -bm25.b 0.87 @@ 
-148,6 +172,30 @@ models: - 0.7922 R@1000: - 0.9326 + - name: bm25-tuned+ax + display: +Ax + params: -bm25 -bm25.k1 3.44 -bm25.b 0.87 -axiom -axiom.deterministic -rerankCutoff 20 + results: + AP@1000: + - 0.1888 + RR@100: + - 0.1880 + R@100: + - 0.7560 + R@1000: + - 0.9268 + - name: bm25-tuned+prf + display: +PRF + params: -bm25 -bm25.k1 3.44 -bm25.b 0.87 -bm25prf + results: + AP@1000: + - 0.1559 + RR@100: + - 0.1550 + R@100: + - 0.6852 + R@1000: + - 0.8760 - name: bm25-tuned2 display: BM25 (tuned2) params: -bm25 -bm25.k1 4.46 -bm25.b 0.82 @@ -195,4 +243,28 @@ models: R@100: - 0.7863 R@1000: - - 0.9316 \ No newline at end of file + - 0.9316 + - name: bm25-tuned2+ax + display: +Ax + params: -bm25 -bm25.k1 4.46 -bm25.b 0.82 -axiom -axiom.deterministic -rerankCutoff 20 + results: + AP@1000: + - 0.1886 + RR@100: + - 0.1878 + R@100: + - 0.7526 + R@1000: + - 0.9249 + - name: bm25-tuned2+prf + display: +PRF + params: -bm25 -bm25.k1 4.46 -bm25.b 0.82 -bm25prf + results: + AP@1000: + - 0.1530 + RR@100: + - 0.1521 + R@100: + - 0.6825 + R@1000: + - 0.8766 diff --git a/src/main/resources/regression/msmarco-passage-docTTTTTquery.yaml b/src/main/resources/regression/msmarco-passage-docTTTTTquery.yaml index ae148b4f08..e11185dcf8 100644 --- a/src/main/resources/regression/msmarco-passage-docTTTTTquery.yaml +++ b/src/main/resources/regression/msmarco-passage-docTTTTTquery.yaml @@ -90,16 +90,16 @@ models: - 0.9467 - name: bm25-default+rocchio-neg display: +Rocchio* - params: -bm25 -rocchio -rocchio.useNegative + params: -bm25 -rocchio -rocchio.useNegative -rerankCutoff 1000 results: AP@1000: - - 0.2255 + - 0.2234 RR@10: - - 0.2155 + - 0.2129 R@100: - - 0.8054 + - 0.8023 R@1000: - - 0.9474 + - 0.9475 - name: bm25-tuned display: BM25 (tuned) params: -bm25 -bm25.k1 0.82 -bm25.b 0.68 @@ -138,16 +138,16 @@ models: - 0.9496 - name: bm25-tuned+rocchio-neg display: +Rocchio* - params: -bm25 -bm25.k1 0.82 -bm25.b 0.68 -rocchio -rocchio.useNegative + params: -bm25 -bm25.k1 0.82 -bm25.b 0.68 -rocchio -rocchio.useNegative -rerankCutoff 1000 results: AP@1000: - - 0.2280 + - 0.2276 RR@10: - - 0.2176 + - 0.2173 R@100: - - 0.8079 + - 0.8071 R@1000: - - 0.9497 + - 0.9493 - name: bm25-tuned2 display: BM25 (tuned2) params: -bm25 -bm25.k1 2.18 -bm25.b 0.86 @@ -186,13 +186,13 @@ models: - 0.9535 - name: bm25-tuned2+rocchio-neg display: +Rocchio* - params: -bm25 -bm25.k1 2.18 -bm25.b 0.86 -rocchio -rocchio.useNegative + params: -bm25 -bm25.k1 2.18 -bm25.b 0.86 -rocchio -rocchio.useNegative -rerankCutoff 1000 results: AP@1000: - - 0.2492 + - 0.2467 RR@10: - - 0.2400 + - 0.2373 R@100: - - 0.8265 + - 0.8227 R@1000: - - 0.9534 \ No newline at end of file + - 0.9539
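
The evaluation commands added in this patch all repeat the same four `trec_eval` metrics per run. As a convenience sketch (assuming the corresponding run files have already been generated under `runs/`, as in the commands above), the new msmarco-doc conditions could be scored in a loop along the following lines; the same pattern carries over to the dl19/dl20 and docTTTTTquery runs by swapping in the appropriate qrels and run-file prefixes.

```bash
#!/bin/bash
# Convenience sketch: loop the four-metric trec_eval pattern used throughout this patch
# over the newly added msmarco-doc conditions. Assumes the run files already exist.

QRELS=src/main/resources/topics-and-qrels/qrels.msmarco-doc.dev.txt

for cond in bm25-default+ax bm25-default+prf \
            bm25-tuned+ax bm25-tuned+prf \
            bm25-tuned2+ax bm25-tuned2+prf; do
  run=runs/run.msmarco-doc.${cond}.topics.msmarco-doc.dev.txt
  echo "== ${cond} =="
  tools/eval/trec_eval.9.0.4/trec_eval -c -m map ${QRELS} ${run}
  tools/eval/trec_eval.9.0.4/trec_eval -c -M 100 -m recip_rank ${QRELS} ${run}
  tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.100 ${QRELS} ${run}
  tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.1000 ${QRELS} ${run}
done
```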