
Commit

Simplify options for HNSW indexes (castorini#2533)
HNSW regressions for MS MARCO: revert to "default settings"
Continuation of castorini#2531 (which was for BEIR)
Will need to adjust score tolerance, but will circle back in a separate PR for that.
lintool authored Jun 24, 2024
1 parent afa6ab0 commit e92370a
Showing 146 changed files with 970 additions and 970 deletions.
@@ -36,15 +36,15 @@ Download the corpus and unpack into `collections/`:

```bash
wget https://rgw.cs.uwaterloo.ca/pyserini/data/msmarco-passage-bge-base-en-v1.5.tar -P collections/
-tar xvf collections/msmarco-passage.bge-base-en-v1.5.tar -C collections/
+tar xvf collections/msmarco-passage-bge-base-en-v1.5.tar -C collections/
```

-To confirm, `msmarco-passage.bge-base-en-v1.5.tar` is 59 GB and has MD5 checksum `353d2c9e72e858897ad479cca4ea0db1`.
+To confirm, `msmarco-passage-bge-base-en-v1.5.tar` is 59 GB and has MD5 checksum `353d2c9e72e858897ad479cca4ea0db1`.
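
The checksum stated above can be checked before unpacking. The following is a minimal sketch, assuming GNU `md5sum` is available (on macOS the equivalent tool is `md5`); `verify_md5` is a hypothetical helper and is not part of Anserini.

```bash
# Hypothetical helper (an assumption, not an Anserini script):
# compare a file's MD5 digest against an expected value.
verify_md5() {
  local file="$1" expected="$2" actual
  # md5sum prints "<digest>  <filename>"; keep only the digest.
  actual="$(md5sum "$file" 2>/dev/null | cut -d ' ' -f 1)"
  if [ "$actual" = "$expected" ]; then
    echo "OK: $file"
  else
    echo "MISMATCH: $file (got '${actual}')" >&2
    return 1
  fi
}

# Only check the tarball if it has actually been downloaded.
tarball="collections/msmarco-passage-bge-base-en-v1.5.tar"
if [ -f "$tarball" ]; then
  verify_md5 "$tarball" 353d2c9e72e858897ad479cca4ea0db1
fi
```
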
With the corpus downloaded, the following command will perform the remaining steps below:

```bash
python src/main/python/run_regression.py --index --verify --search --regression dl19-passage.bge-base-en-v1.5.flat-int8.cached \
---corpus-path collections/msmarco-passage.bge-base-en-v1.5
+--corpus-path collections/msmarco-passage-bge-base-en-v1.5
```

## Indexing
@@ -54,14 +54,14 @@ Sample indexing command, building quantized flat indexes:
```bash
bin/run.sh io.anserini.index.IndexCollection \
-collection JsonDenseVectorCollection \
--input /path/to/msmarco-passage.bge-base-en-v1.5 \
+-input /path/to/msmarco-passage-bge-base-en-v1.5 \
-generator DenseVectorDocumentGenerator \
-index indexes/lucene-flat-int8.msmarco-v1-passage.bge-base-en-v1.5/ \
-threads 16 -quantize.int8 \
->& logs/log.msmarco-passage.bge-base-en-v1.5 &
+>& logs/log.msmarco-passage-bge-base-en-v1.5 &
```

-The path `/path/to/msmarco-passage.bge-base-en-v1.5/` should point to the corpus downloaded above.
+The path `/path/to/msmarco-passage-bge-base-en-v1.5/` should point to the corpus downloaded above.
Upon completion, we should have an index with 8,841,823 documents.
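
One way to sanity-check the expected document count before indexing is to count the records in the input directory. This is only a sketch under an assumption the source does not confirm: that the corpus directory holds one JSON record per line in `*.jsonl` or `*.jsonl.gz` files. `count_jsonl_records` is a hypothetical helper, not an Anserini tool.

```bash
# Hypothetical helper (an assumption, not an Anserini tool):
# count lines across *.jsonl and *.jsonl.gz files in a directory,
# assuming one JSON record per line.
count_jsonl_records() {
  local dir="$1" total=0 n f
  for f in "$dir"/*.jsonl "$dir"/*.jsonl.gz; do
    # Skip unmatched glob patterns.
    [ -e "$f" ] || continue
    case "$f" in
      *.gz) n="$(gzip -cd "$f" | wc -l)" ;;
      *)    n="$(wc -l < "$f")" ;;
    esac
    total=$((total + n))
  done
  echo "$total"
}

# Compare the printed number against the 8,841,823 documents expected in the index.
count_jsonl_records collections/msmarco-passage-bge-base-en-v1.5
```

If the corpus files use a different layout, the glob patterns would need to be adjusted accordingly.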

## Retrieval
@@ -77,17 +77,17 @@ bin/run.sh io.anserini.search.SearchCollection \
-index indexes/lucene-flat-int8.msmarco-v1-passage.bge-base-en-v1.5/ \
-topics tools/topics-and-qrels/topics.dl19-passage.bge-base-en-v1.5.jsonl.gz \
-topicReader JsonIntVector \
--output runs/run.msmarco-passage.bge-base-en-v1.5.bge-flat-int8-cached.topics.dl19-passage.bge-base-en-v1.5.jsonl.txt \
+-output runs/run.msmarco-passage-bge-base-en-v1.5.bge-flat-int8-cached.topics.dl19-passage.bge-base-en-v1.5.jsonl.txt \
-generator VectorQueryGenerator -topicField vector -threads 16 -hits 1000 &
```

Evaluation can be performed using `trec_eval`:

```bash
-bin/trec_eval -m map -c -l 2 tools/topics-and-qrels/qrels.dl19-passage.txt runs/run.msmarco-passage.bge-base-en-v1.5.bge-flat-int8-cached.topics.dl19-passage.bge-base-en-v1.5.jsonl.txt
-bin/trec_eval -m ndcg_cut.10 -c tools/topics-and-qrels/qrels.dl19-passage.txt runs/run.msmarco-passage.bge-base-en-v1.5.bge-flat-int8-cached.topics.dl19-passage.bge-base-en-v1.5.jsonl.txt
-bin/trec_eval -m recall.100 -c -l 2 tools/topics-and-qrels/qrels.dl19-passage.txt runs/run.msmarco-passage.bge-base-en-v1.5.bge-flat-int8-cached.topics.dl19-passage.bge-base-en-v1.5.jsonl.txt
-bin/trec_eval -m recall.1000 -c -l 2 tools/topics-and-qrels/qrels.dl19-passage.txt runs/run.msmarco-passage.bge-base-en-v1.5.bge-flat-int8-cached.topics.dl19-passage.bge-base-en-v1.5.jsonl.txt
+bin/trec_eval -m map -c -l 2 tools/topics-and-qrels/qrels.dl19-passage.txt runs/run.msmarco-passage-bge-base-en-v1.5.bge-flat-int8-cached.topics.dl19-passage.bge-base-en-v1.5.jsonl.txt
+bin/trec_eval -m ndcg_cut.10 -c tools/topics-and-qrels/qrels.dl19-passage.txt runs/run.msmarco-passage-bge-base-en-v1.5.bge-flat-int8-cached.topics.dl19-passage.bge-base-en-v1.5.jsonl.txt
+bin/trec_eval -m recall.100 -c -l 2 tools/topics-and-qrels/qrels.dl19-passage.txt runs/run.msmarco-passage-bge-base-en-v1.5.bge-flat-int8-cached.topics.dl19-passage.bge-base-en-v1.5.jsonl.txt
+bin/trec_eval -m recall.1000 -c -l 2 tools/topics-and-qrels/qrels.dl19-passage.txt runs/run.msmarco-passage-bge-base-en-v1.5.bge-flat-int8-cached.topics.dl19-passage.bge-base-en-v1.5.jsonl.txt
```

## Effectiveness
@@ -36,15 +36,15 @@ Download the corpus and unpack into `collections/`:

```bash
wget https://rgw.cs.uwaterloo.ca/pyserini/data/msmarco-passage-bge-base-en-v1.5.tar -P collections/
-tar xvf collections/msmarco-passage.bge-base-en-v1.5.tar -C collections/
+tar xvf collections/msmarco-passage-bge-base-en-v1.5.tar -C collections/
```

-To confirm, `msmarco-passage.bge-base-en-v1.5.tar` is 59 GB and has MD5 checksum `353d2c9e72e858897ad479cca4ea0db1`.
+To confirm, `msmarco-passage-bge-base-en-v1.5.tar` is 59 GB and has MD5 checksum `353d2c9e72e858897ad479cca4ea0db1`.
With the corpus downloaded, the following command will perform the remaining steps below:

```bash
python src/main/python/run_regression.py --index --verify --search --regression dl19-passage.bge-base-en-v1.5.flat-int8.onnx \
---corpus-path collections/msmarco-passage.bge-base-en-v1.5
+--corpus-path collections/msmarco-passage-bge-base-en-v1.5
```

## Indexing
@@ -54,14 +54,14 @@ Sample indexing command, building quantized flat indexes:
```bash
bin/run.sh io.anserini.index.IndexCollection \
-collection JsonDenseVectorCollection \
--input /path/to/msmarco-passage.bge-base-en-v1.5 \
+-input /path/to/msmarco-passage-bge-base-en-v1.5 \
-generator DenseVectorDocumentGenerator \
-index indexes/lucene-flat-int8.msmarco-v1-passage.bge-base-en-v1.5/ \
-threads 16 -quantize.int8 \
->& logs/log.msmarco-passage.bge-base-en-v1.5 &
+>& logs/log.msmarco-passage-bge-base-en-v1.5 &
```

-The path `/path/to/msmarco-passage.bge-base-en-v1.5/` should point to the corpus downloaded above.
+The path `/path/to/msmarco-passage-bge-base-en-v1.5/` should point to the corpus downloaded above.
Upon completion, we should have an index with 8,841,823 documents.

## Retrieval
@@ -77,17 +77,17 @@ bin/run.sh io.anserini.search.SearchCollection \
-index indexes/lucene-flat-int8.msmarco-v1-passage.bge-base-en-v1.5/ \
-topics tools/topics-and-qrels/topics.dl19-passage.txt \
-topicReader TsvInt \
--output runs/run.msmarco-passage.bge-base-en-v1.5.bge-flat-int8-onnx.topics.dl19-passage.txt \
+-output runs/run.msmarco-passage-bge-base-en-v1.5.bge-flat-int8-onnx.topics.dl19-passage.txt \
-generator VectorQueryGenerator -topicField title -threads 16 -hits 1000 -encoder BgeBaseEn15 &
```

Evaluation can be performed using `trec_eval`:

```bash
-bin/trec_eval -m map -c -l 2 tools/topics-and-qrels/qrels.dl19-passage.txt runs/run.msmarco-passage.bge-base-en-v1.5.bge-flat-int8-onnx.topics.dl19-passage.txt
-bin/trec_eval -m ndcg_cut.10 -c tools/topics-and-qrels/qrels.dl19-passage.txt runs/run.msmarco-passage.bge-base-en-v1.5.bge-flat-int8-onnx.topics.dl19-passage.txt
-bin/trec_eval -m recall.100 -c -l 2 tools/topics-and-qrels/qrels.dl19-passage.txt runs/run.msmarco-passage.bge-base-en-v1.5.bge-flat-int8-onnx.topics.dl19-passage.txt
-bin/trec_eval -m recall.1000 -c -l 2 tools/topics-and-qrels/qrels.dl19-passage.txt runs/run.msmarco-passage.bge-base-en-v1.5.bge-flat-int8-onnx.topics.dl19-passage.txt
+bin/trec_eval -m map -c -l 2 tools/topics-and-qrels/qrels.dl19-passage.txt runs/run.msmarco-passage-bge-base-en-v1.5.bge-flat-int8-onnx.topics.dl19-passage.txt
+bin/trec_eval -m ndcg_cut.10 -c tools/topics-and-qrels/qrels.dl19-passage.txt runs/run.msmarco-passage-bge-base-en-v1.5.bge-flat-int8-onnx.topics.dl19-passage.txt
+bin/trec_eval -m recall.100 -c -l 2 tools/topics-and-qrels/qrels.dl19-passage.txt runs/run.msmarco-passage-bge-base-en-v1.5.bge-flat-int8-onnx.topics.dl19-passage.txt
+bin/trec_eval -m recall.1000 -c -l 2 tools/topics-and-qrels/qrels.dl19-passage.txt runs/run.msmarco-passage-bge-base-en-v1.5.bge-flat-int8-onnx.topics.dl19-passage.txt
```
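
The four evaluation commands above differ only in the metric and in whether `-l 2` is passed (nDCG@10 is the one metric evaluated without it). A sketch of a loop that generates them for a single run file is given here; `eval_dl19` and its `dry-run` mode are hypothetical conveniences, not part of Anserini or `trec_eval`.

```bash
# Hypothetical wrapper (an assumption, not an Anserini script): print or run
# the four trec_eval invocations used in this page for one run file.
eval_dl19() {
  local run="$1" mode="${2:-run}"
  local qrels="tools/topics-and-qrels/qrels.dl19-passage.txt"
  local metric flags
  for metric in map ndcg_cut.10 recall.100 recall.1000; do
    # Mirror the commands above: every metric except nDCG@10 uses -l 2.
    if [ "$metric" = "ndcg_cut.10" ]; then
      flags="-c"
    else
      flags="-c -l 2"
    fi
    if [ "$mode" = "dry-run" ]; then
      echo "bin/trec_eval -m $metric $flags $qrels $run"
    else
      # $flags is intentionally unquoted so it splits into separate arguments.
      bin/trec_eval -m "$metric" $flags "$qrels" "$run"
    fi
  done
}

# Print the commands without executing them.
eval_dl19 runs/run.msmarco-passage-bge-base-en-v1.5.bge-flat-int8-onnx.topics.dl19-passage.txt dry-run
```

Dropping the `dry-run` argument executes the commands instead, which requires `bin/trec_eval` and the run file to exist.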

## Effectiveness
@@ -36,15 +36,15 @@ Download the corpus and unpack into `collections/`:

```bash
wget https://rgw.cs.uwaterloo.ca/pyserini/data/msmarco-passage-bge-base-en-v1.5.tar -P collections/
-tar xvf collections/msmarco-passage.bge-base-en-v1.5.tar -C collections/
+tar xvf collections/msmarco-passage-bge-base-en-v1.5.tar -C collections/
```

-To confirm, `msmarco-passage.bge-base-en-v1.5.tar` is 59 GB and has MD5 checksum `353d2c9e72e858897ad479cca4ea0db1`.
+To confirm, `msmarco-passage-bge-base-en-v1.5.tar` is 59 GB and has MD5 checksum `353d2c9e72e858897ad479cca4ea0db1`.
With the corpus downloaded, the following command will perform the remaining steps below:

```bash
python src/main/python/run_regression.py --index --verify --search --regression dl19-passage.bge-base-en-v1.5.flat.cached \
---corpus-path collections/msmarco-passage.bge-base-en-v1.5
+--corpus-path collections/msmarco-passage-bge-base-en-v1.5
```

## Indexing
@@ -54,14 +54,14 @@ Sample indexing command, building flat indexes:
```bash
bin/run.sh io.anserini.index.IndexCollection \
-collection JsonDenseVectorCollection \
--input /path/to/msmarco-passage.bge-base-en-v1.5 \
+-input /path/to/msmarco-passage-bge-base-en-v1.5 \
-generator DenseVectorDocumentGenerator \
-index indexes/lucene-flat.msmarco-v1-passage.bge-base-en-v1.5/ \
-threads 16 \
->& logs/log.msmarco-passage.bge-base-en-v1.5 &
+>& logs/log.msmarco-passage-bge-base-en-v1.5 &
```

-The path `/path/to/msmarco-passage.bge-base-en-v1.5/` should point to the corpus downloaded above.
+The path `/path/to/msmarco-passage-bge-base-en-v1.5/` should point to the corpus downloaded above.
Upon completion, we should have an index with 8,841,823 documents.

## Retrieval
@@ -77,17 +77,17 @@ bin/run.sh io.anserini.search.SearchCollection \
-index indexes/lucene-flat.msmarco-v1-passage.bge-base-en-v1.5/ \
-topics tools/topics-and-qrels/topics.dl19-passage.bge-base-en-v1.5.jsonl.gz \
-topicReader JsonIntVector \
--output runs/run.msmarco-passage.bge-base-en-v1.5.bge-flat-cached.topics.dl19-passage.bge-base-en-v1.5.jsonl.txt \
+-output runs/run.msmarco-passage-bge-base-en-v1.5.bge-flat-cached.topics.dl19-passage.bge-base-en-v1.5.jsonl.txt \
-generator VectorQueryGenerator -topicField vector -threads 16 -hits 1000 &
```

Evaluation can be performed using `trec_eval`:

```bash
-bin/trec_eval -m map -c -l 2 tools/topics-and-qrels/qrels.dl19-passage.txt runs/run.msmarco-passage.bge-base-en-v1.5.bge-flat-cached.topics.dl19-passage.bge-base-en-v1.5.jsonl.txt
-bin/trec_eval -m ndcg_cut.10 -c tools/topics-and-qrels/qrels.dl19-passage.txt runs/run.msmarco-passage.bge-base-en-v1.5.bge-flat-cached.topics.dl19-passage.bge-base-en-v1.5.jsonl.txt
-bin/trec_eval -m recall.100 -c -l 2 tools/topics-and-qrels/qrels.dl19-passage.txt runs/run.msmarco-passage.bge-base-en-v1.5.bge-flat-cached.topics.dl19-passage.bge-base-en-v1.5.jsonl.txt
-bin/trec_eval -m recall.1000 -c -l 2 tools/topics-and-qrels/qrels.dl19-passage.txt runs/run.msmarco-passage.bge-base-en-v1.5.bge-flat-cached.topics.dl19-passage.bge-base-en-v1.5.jsonl.txt
+bin/trec_eval -m map -c -l 2 tools/topics-and-qrels/qrels.dl19-passage.txt runs/run.msmarco-passage-bge-base-en-v1.5.bge-flat-cached.topics.dl19-passage.bge-base-en-v1.5.jsonl.txt
+bin/trec_eval -m ndcg_cut.10 -c tools/topics-and-qrels/qrels.dl19-passage.txt runs/run.msmarco-passage-bge-base-en-v1.5.bge-flat-cached.topics.dl19-passage.bge-base-en-v1.5.jsonl.txt
+bin/trec_eval -m recall.100 -c -l 2 tools/topics-and-qrels/qrels.dl19-passage.txt runs/run.msmarco-passage-bge-base-en-v1.5.bge-flat-cached.topics.dl19-passage.bge-base-en-v1.5.jsonl.txt
+bin/trec_eval -m recall.1000 -c -l 2 tools/topics-and-qrels/qrels.dl19-passage.txt runs/run.msmarco-passage-bge-base-en-v1.5.bge-flat-cached.topics.dl19-passage.bge-base-en-v1.5.jsonl.txt
```

## Effectiveness
@@ -36,15 +36,15 @@ Download the corpus and unpack into `collections/`:

```bash
wget https://rgw.cs.uwaterloo.ca/pyserini/data/msmarco-passage-bge-base-en-v1.5.tar -P collections/
-tar xvf collections/msmarco-passage.bge-base-en-v1.5.tar -C collections/
+tar xvf collections/msmarco-passage-bge-base-en-v1.5.tar -C collections/
```

-To confirm, `msmarco-passage.bge-base-en-v1.5.tar` is 59 GB and has MD5 checksum `353d2c9e72e858897ad479cca4ea0db1`.
+To confirm, `msmarco-passage-bge-base-en-v1.5.tar` is 59 GB and has MD5 checksum `353d2c9e72e858897ad479cca4ea0db1`.
With the corpus downloaded, the following command will perform the remaining steps below:

```bash
python src/main/python/run_regression.py --index --verify --search --regression dl19-passage.bge-base-en-v1.5.flat.onnx \
---corpus-path collections/msmarco-passage.bge-base-en-v1.5
+--corpus-path collections/msmarco-passage-bge-base-en-v1.5
```

## Indexing
@@ -54,14 +54,14 @@ Sample indexing command, building flat indexes:
```bash
bin/run.sh io.anserini.index.IndexCollection \
-collection JsonDenseVectorCollection \
--input /path/to/msmarco-passage.bge-base-en-v1.5 \
+-input /path/to/msmarco-passage-bge-base-en-v1.5 \
-generator DenseVectorDocumentGenerator \
-index indexes/lucene-flat.msmarco-v1-passage.bge-base-en-v1.5/ \
-threads 16 \
->& logs/log.msmarco-passage.bge-base-en-v1.5 &
+>& logs/log.msmarco-passage-bge-base-en-v1.5 &
```

-The path `/path/to/msmarco-passage.bge-base-en-v1.5/` should point to the corpus downloaded above.
+The path `/path/to/msmarco-passage-bge-base-en-v1.5/` should point to the corpus downloaded above.
Upon completion, we should have an index with 8,841,823 documents.

## Retrieval
@@ -77,17 +77,17 @@ bin/run.sh io.anserini.search.SearchCollection \
-index indexes/lucene-flat.msmarco-v1-passage.bge-base-en-v1.5/ \
-topics tools/topics-and-qrels/topics.dl19-passage.txt \
-topicReader TsvInt \
--output runs/run.msmarco-passage.bge-base-en-v1.5.bge-flat-onnx.topics.dl19-passage.txt \
+-output runs/run.msmarco-passage-bge-base-en-v1.5.bge-flat-onnx.topics.dl19-passage.txt \
-generator VectorQueryGenerator -topicField title -threads 16 -hits 1000 -encoder BgeBaseEn15 &
```

Evaluation can be performed using `trec_eval`:

```bash
-bin/trec_eval -m map -c -l 2 tools/topics-and-qrels/qrels.dl19-passage.txt runs/run.msmarco-passage.bge-base-en-v1.5.bge-flat-onnx.topics.dl19-passage.txt
-bin/trec_eval -m ndcg_cut.10 -c tools/topics-and-qrels/qrels.dl19-passage.txt runs/run.msmarco-passage.bge-base-en-v1.5.bge-flat-onnx.topics.dl19-passage.txt
-bin/trec_eval -m recall.100 -c -l 2 tools/topics-and-qrels/qrels.dl19-passage.txt runs/run.msmarco-passage.bge-base-en-v1.5.bge-flat-onnx.topics.dl19-passage.txt
-bin/trec_eval -m recall.1000 -c -l 2 tools/topics-and-qrels/qrels.dl19-passage.txt runs/run.msmarco-passage.bge-base-en-v1.5.bge-flat-onnx.topics.dl19-passage.txt
+bin/trec_eval -m map -c -l 2 tools/topics-and-qrels/qrels.dl19-passage.txt runs/run.msmarco-passage-bge-base-en-v1.5.bge-flat-onnx.topics.dl19-passage.txt
+bin/trec_eval -m ndcg_cut.10 -c tools/topics-and-qrels/qrels.dl19-passage.txt runs/run.msmarco-passage-bge-base-en-v1.5.bge-flat-onnx.topics.dl19-passage.txt
+bin/trec_eval -m recall.100 -c -l 2 tools/topics-and-qrels/qrels.dl19-passage.txt runs/run.msmarco-passage-bge-base-en-v1.5.bge-flat-onnx.topics.dl19-passage.txt
+bin/trec_eval -m recall.1000 -c -l 2 tools/topics-and-qrels/qrels.dl19-passage.txt runs/run.msmarco-passage-bge-base-en-v1.5.bge-flat-onnx.topics.dl19-passage.txt
```

## Effectiveness
@@ -57,7 +57,7 @@ bin/run.sh io.anserini.index.IndexHnswDenseVectors \
-input /path/to/msmarco-passage-bge-base-en-v1.5 \
-generator DenseVectorDocumentGenerator \
-index indexes/lucene-hnsw-int8.msmarco-v1-passage.bge-base-en-v1.5/ \
--threads 16 -M 16 -efC 100 -memoryBuffer 65536 -noMerge -quantize.int8 \
+-threads 16 -M 16 -efC 100 -quantize.int8 \
>& logs/log.msmarco-passage-bge-base-en-v1.5 &
```

@@ -82,17 +82,17 @@ bin/run.sh io.anserini.search.SearchHnswDenseVectors \
-index indexes/lucene-hnsw-int8.msmarco-v1-passage.bge-base-en-v1.5/ \
-topics tools/topics-and-qrels/topics.dl19-passage.bge-base-en-v1.5.jsonl.gz \
-topicReader JsonIntVector \
--output runs/run.msmarco-passage-bge-base-en-v1.5.bge-hnsw-cached.topics.dl19-passage.bge-base-en-v1.5.jsonl.txt \
+-output runs/run.msmarco-passage-bge-base-en-v1.5.bge-hnsw-int8-cached.topics.dl19-passage.bge-base-en-v1.5.jsonl.txt \
-generator VectorQueryGenerator -topicField vector -threads 16 -hits 1000 -efSearch 1000 &
```

Evaluation can be performed using `trec_eval`:

```bash
-bin/trec_eval -m map -c -l 2 tools/topics-and-qrels/qrels.dl19-passage.txt runs/run.msmarco-passage-bge-base-en-v1.5.bge-hnsw-cached.topics.dl19-passage.bge-base-en-v1.5.jsonl.txt
-bin/trec_eval -m ndcg_cut.10 -c tools/topics-and-qrels/qrels.dl19-passage.txt runs/run.msmarco-passage-bge-base-en-v1.5.bge-hnsw-cached.topics.dl19-passage.bge-base-en-v1.5.jsonl.txt
-bin/trec_eval -m recall.100 -c -l 2 tools/topics-and-qrels/qrels.dl19-passage.txt runs/run.msmarco-passage-bge-base-en-v1.5.bge-hnsw-cached.topics.dl19-passage.bge-base-en-v1.5.jsonl.txt
-bin/trec_eval -m recall.1000 -c -l 2 tools/topics-and-qrels/qrels.dl19-passage.txt runs/run.msmarco-passage-bge-base-en-v1.5.bge-hnsw-cached.topics.dl19-passage.bge-base-en-v1.5.jsonl.txt
+bin/trec_eval -m map -c -l 2 tools/topics-and-qrels/qrels.dl19-passage.txt runs/run.msmarco-passage-bge-base-en-v1.5.bge-hnsw-int8-cached.topics.dl19-passage.bge-base-en-v1.5.jsonl.txt
+bin/trec_eval -m ndcg_cut.10 -c tools/topics-and-qrels/qrels.dl19-passage.txt runs/run.msmarco-passage-bge-base-en-v1.5.bge-hnsw-int8-cached.topics.dl19-passage.bge-base-en-v1.5.jsonl.txt
+bin/trec_eval -m recall.100 -c -l 2 tools/topics-and-qrels/qrels.dl19-passage.txt runs/run.msmarco-passage-bge-base-en-v1.5.bge-hnsw-int8-cached.topics.dl19-passage.bge-base-en-v1.5.jsonl.txt
+bin/trec_eval -m recall.1000 -c -l 2 tools/topics-and-qrels/qrels.dl19-passage.txt runs/run.msmarco-passage-bge-base-en-v1.5.bge-hnsw-int8-cached.topics.dl19-passage.bge-base-en-v1.5.jsonl.txt
```

## Effectiveness
@@ -57,7 +57,7 @@ bin/run.sh io.anserini.index.IndexHnswDenseVectors \
-input /path/to/msmarco-passage-bge-base-en-v1.5 \
-generator DenseVectorDocumentGenerator \
-index indexes/lucene-hnsw-int8.msmarco-v1-passage.bge-base-en-v1.5/ \
--threads 16 -M 16 -efC 100 -memoryBuffer 65536 -noMerge -quantize.int8 \
+-threads 16 -M 16 -efC 100 -quantize.int8 \
>& logs/log.msmarco-passage-bge-base-en-v1.5 &
```

@@ -82,17 +82,17 @@ bin/run.sh io.anserini.search.SearchHnswDenseVectors \
-index indexes/lucene-hnsw-int8.msmarco-v1-passage.bge-base-en-v1.5/ \
-topics tools/topics-and-qrels/topics.dl19-passage.txt \
-topicReader TsvInt \
--output runs/run.msmarco-passage-bge-base-en-v1.5.bge-hnsw-onnx.topics.dl19-passage.txt \
+-output runs/run.msmarco-passage-bge-base-en-v1.5.bge-hnsw-int8-onnx.topics.dl19-passage.txt \
-generator VectorQueryGenerator -topicField title -threads 16 -hits 1000 -efSearch 1000 -encoder BgeBaseEn15 &
```

Evaluation can be performed using `trec_eval`:

```bash
-bin/trec_eval -m map -c -l 2 tools/topics-and-qrels/qrels.dl19-passage.txt runs/run.msmarco-passage-bge-base-en-v1.5.bge-hnsw-onnx.topics.dl19-passage.txt
-bin/trec_eval -m ndcg_cut.10 -c tools/topics-and-qrels/qrels.dl19-passage.txt runs/run.msmarco-passage-bge-base-en-v1.5.bge-hnsw-onnx.topics.dl19-passage.txt
-bin/trec_eval -m recall.100 -c -l 2 tools/topics-and-qrels/qrels.dl19-passage.txt runs/run.msmarco-passage-bge-base-en-v1.5.bge-hnsw-onnx.topics.dl19-passage.txt
-bin/trec_eval -m recall.1000 -c -l 2 tools/topics-and-qrels/qrels.dl19-passage.txt runs/run.msmarco-passage-bge-base-en-v1.5.bge-hnsw-onnx.topics.dl19-passage.txt
+bin/trec_eval -m map -c -l 2 tools/topics-and-qrels/qrels.dl19-passage.txt runs/run.msmarco-passage-bge-base-en-v1.5.bge-hnsw-int8-onnx.topics.dl19-passage.txt
+bin/trec_eval -m ndcg_cut.10 -c tools/topics-and-qrels/qrels.dl19-passage.txt runs/run.msmarco-passage-bge-base-en-v1.5.bge-hnsw-int8-onnx.topics.dl19-passage.txt
+bin/trec_eval -m recall.100 -c -l 2 tools/topics-and-qrels/qrels.dl19-passage.txt runs/run.msmarco-passage-bge-base-en-v1.5.bge-hnsw-int8-onnx.topics.dl19-passage.txt
+bin/trec_eval -m recall.1000 -c -l 2 tools/topics-and-qrels/qrels.dl19-passage.txt runs/run.msmarco-passage-bge-base-en-v1.5.bge-hnsw-int8-onnx.topics.dl19-passage.txt
```

## Effectiveness