More regression refactoring (#1708)
More work on #1680
lintool committed Dec 18, 2021
1 parent c0fa772 commit 6500560
Showing 153 changed files with 4,436 additions and 5,095 deletions.
56 changes: 30 additions & 26 deletions docs/regressions-backgroundlinking18.md
@@ -9,11 +9,12 @@ Note that this page is automatically generated from [this template](../src/main/
 Typical indexing command:

 ```
-nohup sh target/appassembler/bin/IndexCollection -collection WashingtonPostCollection \
-  -input /path/to/wapo.v2 \
-  -index indexes/lucene-index.wapo.v2.pos+docvectors+raw \
-  -generator WashingtonPostGenerator \
-  -threads 1 -storePositions -storeDocvectors -storeRaw \
+target/appassembler/bin/IndexCollection \
+  -collection WashingtonPostCollection \
+  -input /path/to/wapo.v2 \
+  -index indexes/lucene-index.wapo.v2 \
+  -generator WashingtonPostGenerator \
+  -threads 1 -storePositions -storeDocvectors -storeRaw \
   >& logs/log.wapo.v2 &
 ```

@@ -32,42 +33,45 @@ Topics and qrels are stored in [`src/main/resources/topics-and-qrels/`](../src/m
 After indexing has completed, you should be able to perform retrieval as follows:

 ```
-nohup target/appassembler/bin/SearchCollection -index indexes/lucene-index.wapo.v2.pos+docvectors+raw \
-  -topicreader BackgroundLinking -topics src/main/resources/topics-and-qrels/topics.backgroundlinking18.txt \
-  -output runs/run.wapo.v2.bm25.topics.backgroundlinking18.txt \
-  -backgroundlinking -backgroundlinking.k 100 -bm25 -hits 100 &
-nohup target/appassembler/bin/SearchCollection -index indexes/lucene-index.wapo.v2.pos+docvectors+raw \
-  -topicreader BackgroundLinking -topics src/main/resources/topics-and-qrels/topics.backgroundlinking18.txt \
-  -output runs/run.wapo.v2.bm25+rm3.topics.backgroundlinking18.txt \
-  -backgroundlinking -backgroundlinking.k 100 -bm25 -rm3 -hits 100 &
-nohup target/appassembler/bin/SearchCollection -index indexes/lucene-index.wapo.v2.pos+docvectors+raw \
-  -topicreader BackgroundLinking -topics src/main/resources/topics-and-qrels/topics.backgroundlinking18.txt \
-  -output runs/run.wapo.v2.bm25+rm3+df.topics.backgroundlinking18.txt \
-  -backgroundlinking -backgroundlinking.datefilter -backgroundlinking.k 100 -bm25 -rm3 -hits 100 &
+target/appassembler/bin/SearchCollection \
+  -index indexes/lucene-index.wapo.v2 \
+  -topics src/main/resources/topics-and-qrels/topics.backgroundlinking18.txt -topicreader BackgroundLinking \
+  -output runs/run.wapo.v2.bm25.topics.backgroundlinking18.txt \
+  -backgroundlinking -backgroundlinking.k 100 -bm25 -hits 100 &
+target/appassembler/bin/SearchCollection \
+  -index indexes/lucene-index.wapo.v2 \
+  -topics src/main/resources/topics-and-qrels/topics.backgroundlinking18.txt -topicreader BackgroundLinking \
+  -output runs/run.wapo.v2.bm25+rm3.topics.backgroundlinking18.txt \
+  -backgroundlinking -backgroundlinking.k 100 -bm25 -rm3 -hits 100 &
+target/appassembler/bin/SearchCollection \
+  -index indexes/lucene-index.wapo.v2 \
+  -topics src/main/resources/topics-and-qrels/topics.backgroundlinking18.txt -topicreader BackgroundLinking \
+  -output runs/run.wapo.v2.bm25+rm3+df.topics.backgroundlinking18.txt \
+  -backgroundlinking -backgroundlinking.datefilter -backgroundlinking.k 100 -bm25 -rm3 -hits 100 &
 ```

 Evaluation can be performed using `trec_eval`:

 ```
-tools/eval/trec_eval.9.0.4/trec_eval -c -M1000 -m ndcg_cut.5 -c -M1000 -m map src/main/resources/topics-and-qrels/qrels.backgroundlinking18.txt runs/run.wapo.v2.bm25.topics.backgroundlinking18.txt
+tools/eval/trec_eval.9.0.4/trec_eval -c -M1000 -m map -c -M1000 -m ndcg_cut.5 src/main/resources/topics-and-qrels/qrels.backgroundlinking18.txt runs/run.wapo.v2.bm25.topics.backgroundlinking18.txt
-tools/eval/trec_eval.9.0.4/trec_eval -c -M1000 -m ndcg_cut.5 -c -M1000 -m map src/main/resources/topics-and-qrels/qrels.backgroundlinking18.txt runs/run.wapo.v2.bm25+rm3.topics.backgroundlinking18.txt
+tools/eval/trec_eval.9.0.4/trec_eval -c -M1000 -m map -c -M1000 -m ndcg_cut.5 src/main/resources/topics-and-qrels/qrels.backgroundlinking18.txt runs/run.wapo.v2.bm25+rm3.topics.backgroundlinking18.txt
-tools/eval/trec_eval.9.0.4/trec_eval -c -M1000 -m ndcg_cut.5 -c -M1000 -m map src/main/resources/topics-and-qrels/qrels.backgroundlinking18.txt runs/run.wapo.v2.bm25+rm3+df.topics.backgroundlinking18.txt
+tools/eval/trec_eval.9.0.4/trec_eval -c -M1000 -m map -c -M1000 -m ndcg_cut.5 src/main/resources/topics-and-qrels/qrels.backgroundlinking18.txt runs/run.wapo.v2.bm25+rm3+df.topics.backgroundlinking18.txt
 ```

 ## Effectiveness

 With the above commands, you should be able to reproduce the following results:

-NCDG@5 | BM25 | +RM3 | +RM3+DF |
+MAP | BM25 | +RM3 | +RM3+DF |
 :---------------------------------------|-----------|-----------|-----------|
-[TREC 2018 Topics](../src/main/resources/topics-and-qrels/topics.backgroundlinking18.txt)| 0.3293 | 0.3526 | 0.4171 |
+[TREC 2018 Topics](../src/main/resources/topics-and-qrels/topics.backgroundlinking18.txt)| 0.2490 | 0.2642 | 0.2692 |


-AP | BM25 | +RM3 | +RM3+DF |
+nDCG@5 | BM25 | +RM3 | +RM3+DF |
 :---------------------------------------|-----------|-----------|-----------|
-[TREC 2018 Topics](../src/main/resources/topics-and-qrels/topics.backgroundlinking18.txt)| 0.2490 | 0.2642 | 0.2692 |
+[TREC 2018 Topics](../src/main/resources/topics-and-qrels/topics.backgroundlinking18.txt)| 0.3293 | 0.3526 | 0.4171 |

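After this change, the retrieval and evaluation commands for the three topic sets differ only in three places: the topic year, the collection version (`wapo.v2` for 2018 and 2019, `wapo.v3` for 2020), and the model flags. The shell functions below are a rough sketch that prints those commands instead of running them. `wapo_version`, `search_cmd`, and `eval_cmd` are hypothetical helpers and are not part of Anserini. Every flag and path comes from the new side of the diff.

```shell
#!/usr/bin/env bash
# Sketch only: these helpers are hypothetical; flags/paths mirror the updated docs.

# Map a topic year (18, 19, 20) to the WaPo collection version used in the docs.
wapo_version() {
  case "$1" in
    18|19) echo "v2" ;;
    20)    echo "v3" ;;
    *)     echo "unknown year: $1" >&2; return 1 ;;
  esac
}

# Print the SearchCollection command for a year and a model tag
# (bm25, bm25+rm3, or bm25+rm3+df, matching the run-file names in the docs).
search_cmd() {
  local year="$1" model="$2" ver bl="-backgroundlinking" ranker
  ver=$(wapo_version "$year") || return 1
  case "$model" in
    bm25)        ranker="-bm25" ;;
    bm25+rm3)    ranker="-bm25 -rm3" ;;
    bm25+rm3+df) ranker="-bm25 -rm3"; bl="$bl -backgroundlinking.datefilter" ;;
    *)           echo "unknown model: $model" >&2; return 1 ;;
  esac
  echo "target/appassembler/bin/SearchCollection" \
    "-index indexes/lucene-index.wapo.$ver" \
    "-topics src/main/resources/topics-and-qrels/topics.backgroundlinking$year.txt -topicreader BackgroundLinking" \
    "-output runs/run.wapo.$ver.$model.topics.backgroundlinking$year.txt" \
    "$bl -backgroundlinking.k 100 $ranker -hits 100"
}

# Print the trec_eval command, with MAP listed before nDCG@5 as in the new docs.
eval_cmd() {
  local year="$1" model="$2" ver
  ver=$(wapo_version "$year") || return 1
  echo "tools/eval/trec_eval.9.0.4/trec_eval -c -M1000 -m map -c -M1000 -m ndcg_cut.5" \
    "src/main/resources/topics-and-qrels/qrels.backgroundlinking$year.txt" \
    "runs/run.wapo.$ver.$model.topics.backgroundlinking$year.txt"
}

# Usage: list every retrieval command for the 2019 topics.
# for model in bm25 bm25+rm3 bm25+rm3+df; do search_cmd 19 "$model"; done
```

Because the functions only print, you can try them outside an Anserini checkout. The printed commands omit the trailing `&` used in the docs, so if you pipe them to `sh` from the repository root they would run one after another rather than in the background.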
56 changes: 30 additions & 26 deletions docs/regressions-backgroundlinking19.md
@@ -9,11 +9,12 @@ Note that this page is automatically generated from [this template](../src/main/
 Typical indexing command:

 ```
-nohup sh target/appassembler/bin/IndexCollection -collection WashingtonPostCollection \
-  -input /path/to/wapo.v2 \
-  -index indexes/lucene-index.wapo.v2.pos+docvectors+raw \
-  -generator WashingtonPostGenerator \
-  -threads 1 -storePositions -storeDocvectors -storeRaw \
+target/appassembler/bin/IndexCollection \
+  -collection WashingtonPostCollection \
+  -input /path/to/wapo.v2 \
+  -index indexes/lucene-index.wapo.v2 \
+  -generator WashingtonPostGenerator \
+  -threads 1 -storePositions -storeDocvectors -storeRaw \
   >& logs/log.wapo.v2 &
 ```

@@ -32,42 +33,45 @@ Topics and qrels are stored in [`src/main/resources/topics-and-qrels/`](../src/m
 After indexing has completed, you should be able to perform retrieval as follows:

 ```
-nohup target/appassembler/bin/SearchCollection -index indexes/lucene-index.wapo.v2.pos+docvectors+raw \
-  -topicreader BackgroundLinking -topics src/main/resources/topics-and-qrels/topics.backgroundlinking19.txt \
-  -output runs/run.wapo.v2.bm25.topics.backgroundlinking19.txt \
-  -backgroundlinking -backgroundlinking.k 100 -bm25 -hits 100 &
-nohup target/appassembler/bin/SearchCollection -index indexes/lucene-index.wapo.v2.pos+docvectors+raw \
-  -topicreader BackgroundLinking -topics src/main/resources/topics-and-qrels/topics.backgroundlinking19.txt \
-  -output runs/run.wapo.v2.bm25+rm3.topics.backgroundlinking19.txt \
-  -backgroundlinking -backgroundlinking.k 100 -bm25 -rm3 -hits 100 &
-nohup target/appassembler/bin/SearchCollection -index indexes/lucene-index.wapo.v2.pos+docvectors+raw \
-  -topicreader BackgroundLinking -topics src/main/resources/topics-and-qrels/topics.backgroundlinking19.txt \
-  -output runs/run.wapo.v2.bm25+rm3+df.topics.backgroundlinking19.txt \
-  -backgroundlinking -backgroundlinking.datefilter -backgroundlinking.k 100 -bm25 -rm3 -hits 100 &
+target/appassembler/bin/SearchCollection \
+  -index indexes/lucene-index.wapo.v2 \
+  -topics src/main/resources/topics-and-qrels/topics.backgroundlinking19.txt -topicreader BackgroundLinking \
+  -output runs/run.wapo.v2.bm25.topics.backgroundlinking19.txt \
+  -backgroundlinking -backgroundlinking.k 100 -bm25 -hits 100 &
+target/appassembler/bin/SearchCollection \
+  -index indexes/lucene-index.wapo.v2 \
+  -topics src/main/resources/topics-and-qrels/topics.backgroundlinking19.txt -topicreader BackgroundLinking \
+  -output runs/run.wapo.v2.bm25+rm3.topics.backgroundlinking19.txt \
+  -backgroundlinking -backgroundlinking.k 100 -bm25 -rm3 -hits 100 &
+target/appassembler/bin/SearchCollection \
+  -index indexes/lucene-index.wapo.v2 \
+  -topics src/main/resources/topics-and-qrels/topics.backgroundlinking19.txt -topicreader BackgroundLinking \
+  -output runs/run.wapo.v2.bm25+rm3+df.topics.backgroundlinking19.txt \
+  -backgroundlinking -backgroundlinking.datefilter -backgroundlinking.k 100 -bm25 -rm3 -hits 100 &
 ```

 Evaluation can be performed using `trec_eval`:

 ```
-tools/eval/trec_eval.9.0.4/trec_eval -c -M1000 -m ndcg_cut.5 -c -M1000 -m map src/main/resources/topics-and-qrels/qrels.backgroundlinking19.txt runs/run.wapo.v2.bm25.topics.backgroundlinking19.txt
+tools/eval/trec_eval.9.0.4/trec_eval -c -M1000 -m map -c -M1000 -m ndcg_cut.5 src/main/resources/topics-and-qrels/qrels.backgroundlinking19.txt runs/run.wapo.v2.bm25.topics.backgroundlinking19.txt
-tools/eval/trec_eval.9.0.4/trec_eval -c -M1000 -m ndcg_cut.5 -c -M1000 -m map src/main/resources/topics-and-qrels/qrels.backgroundlinking19.txt runs/run.wapo.v2.bm25+rm3.topics.backgroundlinking19.txt
+tools/eval/trec_eval.9.0.4/trec_eval -c -M1000 -m map -c -M1000 -m ndcg_cut.5 src/main/resources/topics-and-qrels/qrels.backgroundlinking19.txt runs/run.wapo.v2.bm25+rm3.topics.backgroundlinking19.txt
-tools/eval/trec_eval.9.0.4/trec_eval -c -M1000 -m ndcg_cut.5 -c -M1000 -m map src/main/resources/topics-and-qrels/qrels.backgroundlinking19.txt runs/run.wapo.v2.bm25+rm3+df.topics.backgroundlinking19.txt
+tools/eval/trec_eval.9.0.4/trec_eval -c -M1000 -m map -c -M1000 -m ndcg_cut.5 src/main/resources/topics-and-qrels/qrels.backgroundlinking19.txt runs/run.wapo.v2.bm25+rm3+df.topics.backgroundlinking19.txt
 ```

 ## Effectiveness

 With the above commands, you should be able to reproduce the following results:

-NCDG@5 | BM25 | +RM3 | +RM3+DF |
+MAP | BM25 | +RM3 | +RM3+DF |
 :---------------------------------------|-----------|-----------|-----------|
-[TREC 2019 Topics](../src/main/resources/topics-and-qrels/topics.backgroundlinking19.txt)| 0.4785 | 0.5217 | 0.5051 |
+[TREC 2019 Topics](../src/main/resources/topics-and-qrels/topics.backgroundlinking19.txt)| 0.3029 | 0.3786 | 0.3154 |


-AP | BM25 | +RM3 | +RM3+DF |
+nDCG@5 | BM25 | +RM3 | +RM3+DF |
 :---------------------------------------|-----------|-----------|-----------|
-[TREC 2019 Topics](../src/main/resources/topics-and-qrels/topics.backgroundlinking19.txt)| 0.3029 | 0.3786 | 0.3154 |
+[TREC 2019 Topics](../src/main/resources/topics-and-qrels/topics.backgroundlinking19.txt)| 0.4785 | 0.5217 | 0.5051 |

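In the updated pages, MAP now comes before nDCG@5 in both the `trec_eval` invocation and the Effectiveness tables. Below is a minimal sketch for checking one run against a table value. It assumes `trec_eval`'s usual three-column output (measure name, `all`, value) and assumes that `-m ndcg_cut.5` is reported under the name `ndcg_cut_5`. `get_measure` and `check_score` are hypothetical names. The expected values in the commented usage are the BM25 figures for the TREC 2019 topics (MAP 0.3029, nDCG@5 0.4785) from the updated tables.

```shell
#!/usr/bin/env bash
# Sketch only: get_measure and check_score are hypothetical helpers, not Anserini tools.

# Print the aggregate ("all") value of one measure from trec_eval output on stdin.
# Assumes each line is: measure name, the query id "all", then the score.
get_measure() {
  awk -v m="$1" '$1 == m && $2 == "all" { print $3 }'
}

# Succeed if an observed score matches the expected table value within a tolerance.
# The default tolerance (0.00005) effectively requires agreement to four decimals.
check_score() {
  local name="$1" observed="$2" expected="$3" tol="${4:-0.00005}"
  if [ -n "$observed" ] && awk -v o="$observed" -v e="$expected" -v t="$tol" \
       'BEGIN { d = o - e; if (d < 0) d = -d; exit !(d <= t) }'; then
    echo "OK   $name: $observed (expected $expected)"
  else
    echo "FAIL $name: ${observed:-missing} (expected $expected)"
    return 1
  fi
}

# Usage (from the Anserini root, once the 2019 BM25 run file exists):
# out=$(tools/eval/trec_eval.9.0.4/trec_eval -c -M1000 -m map -c -M1000 -m ndcg_cut.5 \
#   src/main/resources/topics-and-qrels/qrels.backgroundlinking19.txt \
#   runs/run.wapo.v2.bm25.topics.backgroundlinking19.txt)
# check_score "MAP, BM25, 2019"    "$(printf '%s\n' "$out" | get_measure map)"        0.3029
# check_score "nDCG@5, BM25, 2019" "$(printf '%s\n' "$out" | get_measure ndcg_cut_5)" 0.4785
```

The exact-match default is deliberately strict. To allow for small run-to-run differences, pass a looser tolerance as the fourth argument.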
56 changes: 30 additions & 26 deletions docs/regressions-backgroundlinking20.md
@@ -9,11 +9,12 @@ Note that this page is automatically generated from [this template](../src/main/
 Typical indexing command:

 ```
-nohup sh target/appassembler/bin/IndexCollection -collection WashingtonPostCollection \
-  -input /path/to/wapo.v3 \
-  -index indexes/lucene-index.wapo.v3.pos+docvectors+raw \
-  -generator WashingtonPostGenerator \
-  -threads 1 -storePositions -storeDocvectors -storeRaw \
+target/appassembler/bin/IndexCollection \
+  -collection WashingtonPostCollection \
+  -input /path/to/wapo.v3 \
+  -index indexes/lucene-index.wapo.v3 \
+  -generator WashingtonPostGenerator \
+  -threads 1 -storePositions -storeDocvectors -storeRaw \
   >& logs/log.wapo.v3 &
 ```

@@ -32,42 +33,45 @@ Topics and qrels are stored in [`src/main/resources/topics-and-qrels/`](../src/m
 After indexing has completed, you should be able to perform retrieval as follows:

 ```
-nohup target/appassembler/bin/SearchCollection -index indexes/lucene-index.wapo.v3.pos+docvectors+raw \
-  -topicreader BackgroundLinking -topics src/main/resources/topics-and-qrels/topics.backgroundlinking20.txt \
-  -output runs/run.wapo.v3.bm25.topics.backgroundlinking20.txt \
-  -backgroundlinking -backgroundlinking.k 100 -bm25 -hits 100 &
-nohup target/appassembler/bin/SearchCollection -index indexes/lucene-index.wapo.v3.pos+docvectors+raw \
-  -topicreader BackgroundLinking -topics src/main/resources/topics-and-qrels/topics.backgroundlinking20.txt \
-  -output runs/run.wapo.v3.bm25+rm3.topics.backgroundlinking20.txt \
-  -backgroundlinking -backgroundlinking.k 100 -bm25 -rm3 -hits 100 &
-nohup target/appassembler/bin/SearchCollection -index indexes/lucene-index.wapo.v3.pos+docvectors+raw \
-  -topicreader BackgroundLinking -topics src/main/resources/topics-and-qrels/topics.backgroundlinking20.txt \
-  -output runs/run.wapo.v3.bm25+rm3+df.topics.backgroundlinking20.txt \
-  -backgroundlinking -backgroundlinking.datefilter -backgroundlinking.k 100 -bm25 -rm3 -hits 100 &
+target/appassembler/bin/SearchCollection \
+  -index indexes/lucene-index.wapo.v3 \
+  -topics src/main/resources/topics-and-qrels/topics.backgroundlinking20.txt -topicreader BackgroundLinking \
+  -output runs/run.wapo.v3.bm25.topics.backgroundlinking20.txt \
+  -backgroundlinking -backgroundlinking.k 100 -bm25 -hits 100 &
+target/appassembler/bin/SearchCollection \
+  -index indexes/lucene-index.wapo.v3 \
+  -topics src/main/resources/topics-and-qrels/topics.backgroundlinking20.txt -topicreader BackgroundLinking \
+  -output runs/run.wapo.v3.bm25+rm3.topics.backgroundlinking20.txt \
+  -backgroundlinking -backgroundlinking.k 100 -bm25 -rm3 -hits 100 &
+target/appassembler/bin/SearchCollection \
+  -index indexes/lucene-index.wapo.v3 \
+  -topics src/main/resources/topics-and-qrels/topics.backgroundlinking20.txt -topicreader BackgroundLinking \
+  -output runs/run.wapo.v3.bm25+rm3+df.topics.backgroundlinking20.txt \
+  -backgroundlinking -backgroundlinking.datefilter -backgroundlinking.k 100 -bm25 -rm3 -hits 100 &
 ```

 Evaluation can be performed using `trec_eval`:

 ```
-tools/eval/trec_eval.9.0.4/trec_eval -c -M1000 -m ndcg_cut.5 -c -M1000 -m map src/main/resources/topics-and-qrels/qrels.backgroundlinking20.txt runs/run.wapo.v3.bm25.topics.backgroundlinking20.txt
+tools/eval/trec_eval.9.0.4/trec_eval -c -M1000 -m map -c -M1000 -m ndcg_cut.5 src/main/resources/topics-and-qrels/qrels.backgroundlinking20.txt runs/run.wapo.v3.bm25.topics.backgroundlinking20.txt
-tools/eval/trec_eval.9.0.4/trec_eval -c -M1000 -m ndcg_cut.5 -c -M1000 -m map src/main/resources/topics-and-qrels/qrels.backgroundlinking20.txt runs/run.wapo.v3.bm25+rm3.topics.backgroundlinking20.txt
+tools/eval/trec_eval.9.0.4/trec_eval -c -M1000 -m map -c -M1000 -m ndcg_cut.5 src/main/resources/topics-and-qrels/qrels.backgroundlinking20.txt runs/run.wapo.v3.bm25+rm3.topics.backgroundlinking20.txt
-tools/eval/trec_eval.9.0.4/trec_eval -c -M1000 -m ndcg_cut.5 -c -M1000 -m map src/main/resources/topics-and-qrels/qrels.backgroundlinking20.txt runs/run.wapo.v3.bm25+rm3+df.topics.backgroundlinking20.txt
+tools/eval/trec_eval.9.0.4/trec_eval -c -M1000 -m map -c -M1000 -m ndcg_cut.5 src/main/resources/topics-and-qrels/qrels.backgroundlinking20.txt runs/run.wapo.v3.bm25+rm3+df.topics.backgroundlinking20.txt
 ```

 ## Effectiveness

 With the above commands, you should be able to reproduce the following results:

-NCDG@5 | BM25 | +RM3 | +RM3+DF |
+MAP | BM25 | +RM3 | +RM3+DF |
 :---------------------------------------|-----------|-----------|-----------|
-[TREC 2020 Topics](../src/main/resources/topics-and-qrels/topics.backgroundlinking20.txt)| 0.5231 | 0.5673 | 0.5316 |
+[TREC 2020 Topics](../src/main/resources/topics-and-qrels/topics.backgroundlinking20.txt)| 0.3286 | 0.4519 | 0.3438 |


-AP | BM25 | +RM3 | +RM3+DF |
+nDCG@5 | BM25 | +RM3 | +RM3+DF |
 :---------------------------------------|-----------|-----------|-----------|
-[TREC 2020 Topics](../src/main/resources/topics-and-qrels/topics.backgroundlinking20.txt)| 0.3286 | 0.4519 | 0.3438 |
+[TREC 2020 Topics](../src/main/resources/topics-and-qrels/topics.backgroundlinking20.txt)| 0.5231 | 0.5673 | 0.5316 |

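This commit renames the indexes from `lucene-index.wapo.v2.pos+docvectors+raw` and `lucene-index.wapo.v3.pos+docvectors+raw` to `lucene-index.wapo.v2` and `lucene-index.wapo.v3`. Both sides of the diff pass the same `-storePositions -storeDocvectors -storeRaw` flags, so an index built with the old command presumably holds the same data under the old name. One option, instead of reindexing, is to point the new name at the old directory with a symbolic link, as sketched below. This is untested: it assumes the old index is intact and that Anserini opens a symlinked index directory like any other. `link_legacy_index` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch only: reuse an index built with the pre-change command under the new name.
# Assumption: old and new commands use identical storage flags, so only the
# directory name differs. link_legacy_index is a hypothetical helper.
link_legacy_index() {
  local dir="$1" base="$2"                 # e.g. indexes lucene-index.wapo.v2
  local old="$base.pos+docvectors+raw"     # old directory name from the diff
  if [ -e "$dir/$base" ]; then
    echo "skip: $dir/$base already exists"
    return 0
  fi
  if [ ! -d "$dir/$old" ]; then
    echo "missing: $dir/$old" >&2
    return 1
  fi
  # Relative link target, resolved against $dir, so moving indexes/ keeps it valid.
  ln -s "$old" "$dir/$base"
  echo "linked: $dir/$base -> $old"
}

# Usage (from the Anserini root):
# link_legacy_index indexes lucene-index.wapo.v2
# link_legacy_index indexes lucene-index.wapo.v3
```

If the new name already exists, the helper leaves it alone. A freshly built index therefore always takes precedence over the link.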