Update BEIR scores using BGE w/ ONNX (#2388)
+ Tweaked scores by averaging over 4 trials
+ Added documentation for Cohere DL19/20
lintool committed Feb 22, 2024
1 parent 2ab4619 commit f0b37dd
Showing 89 changed files with 160 additions and 151 deletions.
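The "averaging over 4 trials" mentioned above can be sketched as follows; this is a minimal illustration, and the trial scores here are hypothetical stand-ins, not actual regression output:

```python
# Hypothetical nDCG@10 scores from 4 independent HNSW indexing trials;
# HNSW graph construction is non-deterministic, so each run differs slightly.
trials = [0.620, 0.622, 0.621, 0.621]

# Average over the trials and round to the 3 decimals used in the docs.
avg = round(sum(trials) / len(trials), 3)
print(avg)  # 0.621
```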
4 changes: 4 additions & 0 deletions docs/regressions.md
@@ -95,6 +95,8 @@ nohup python src/main/python/run_regression.py --index --verify --search --regre
nohup python src/main/python/run_regression.py --index --verify --search --regression dl19-passage-bge-base-en-v1.5-hnsw >& logs/log.dl19-passage-bge-base-en-v1.5-hnsw &
nohup python src/main/python/run_regression.py --index --verify --search --regression dl19-passage-bge-base-en-v1.5-hnsw-int8 >& logs/log.dl19-passage-bge-base-en-v1.5-hnsw-int8 &
nohup python src/main/python/run_regression.py --index --verify --search --regression dl19-passage-openai-ada2 >& logs/log.dl19-passage-openai-ada2 &
+ nohup python src/main/python/run_regression.py --index --verify --search --regression dl19-passage-cohere-embed-english-v3-hnsw >& logs/log.dl19-passage-cohere-embed-english-v3-hnsw &
+ nohup python src/main/python/run_regression.py --index --verify --search --regression dl19-passage-cohere-embed-english-v3-hnsw-int8 >& logs/log.dl19-passage-cohere-embed-english-v3-hnsw-int8 &

nohup python src/main/python/run_regression.py --index --verify --search --regression dl19-passage-splade-pp-ed-onnx >& logs/log.dl19-passage-splade-pp-ed-onnx &
nohup python src/main/python/run_regression.py --index --verify --search --regression dl19-passage-splade-pp-sd-onnx >& logs/log.dl19-passage-splade-pp-sd-onnx &
@@ -132,6 +134,8 @@ nohup python src/main/python/run_regression.py --index --verify --search --regre
nohup python src/main/python/run_regression.py --index --verify --search --regression dl20-passage-bge-base-en-v1.5-hnsw >& logs/log.dl20-passage-bge-base-en-v1.5-hnsw &
nohup python src/main/python/run_regression.py --index --verify --search --regression dl20-passage-bge-base-en-v1.5-hnsw-int8 >& logs/log.dl20-passage-bge-base-en-v1.5-hnsw-int8 &
nohup python src/main/python/run_regression.py --index --verify --search --regression dl20-passage-openai-ada2 >& logs/log.dl20-passage-openai-ada2 &
+ nohup python src/main/python/run_regression.py --index --verify --search --regression dl20-passage-cohere-embed-english-v3-hnsw >& logs/log.dl20-passage-cohere-embed-english-v3-hnsw &
+ nohup python src/main/python/run_regression.py --index --verify --search --regression dl20-passage-cohere-embed-english-v3-hnsw-int8 >& logs/log.dl20-passage-cohere-embed-english-v3-hnsw-int8 &

nohup python src/main/python/run_regression.py --index --verify --search --regression dl20-passage-splade-pp-ed-onnx >& logs/log.dl20-passage-splade-pp-ed-onnx &
nohup python src/main/python/run_regression.py --index --verify --search --regression dl20-passage-splade-pp-sd-onnx >& logs/log.dl20-passage-splade-pp-sd-onnx &
@@ -74,11 +74,11 @@ With the above commands, you should be able to reproduce the following results:

| **nDCG@10** | **BGE-base-en-v1.5**|
|:-------------------------------------------------------------------------------------------------------------|-----------|
- | BEIR (v1.0.0): ArguAna                                                                                       | 0.635     |
+ | BEIR (v1.0.0): ArguAna                                                                                       | 0.621     |
| **R@100** | **BGE-base-en-v1.5**|
- | BEIR (v1.0.0): ArguAna                                                                                       | 0.991     |
+ | BEIR (v1.0.0): ArguAna                                                                                       | 0.971     |
| **R@1000** | **BGE-base-en-v1.5**|
- | BEIR (v1.0.0): ArguAna                                                                                       | 0.996     |
+ | BEIR (v1.0.0): ArguAna                                                                                       | 0.994     |

Note that due to the non-deterministic nature of HNSW indexing, results may differ slightly between each experimental run.
Nevertheless, scores are generally within 0.005 of the reference values recorded in [our YAML configuration file](../../src/main/resources/regression/beir-v1.0.0-arguana-bge-base-en-v1.5-hnsw-int8-onnx.yaml).
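The 0.005 tolerance described in the note above can be checked mechanically; a minimal sketch with illustrative score values (parsing the reference value out of the YAML file is omitted):

```python
def within_tolerance(observed: float, reference: float, tol: float = 0.005) -> bool:
    """Return True if an observed metric is within tol of the reference value."""
    return abs(observed - reference) <= tol

# Illustrative values: an observed nDCG@10 vs. a recorded reference score.
print(within_tolerance(0.623, 0.621))  # True
print(within_tolerance(0.640, 0.621))  # False
```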
@@ -74,11 +74,11 @@ With the above commands, you should be able to reproduce the following results:

| **nDCG@10** | **BGE-base-en-v1.5**|
|:-------------------------------------------------------------------------------------------------------------|-----------|
- | BEIR (v1.0.0): ArguAna                                                                                       | 0.636     |
+ | BEIR (v1.0.0): ArguAna                                                                                       | 0.623     |
| **R@100** | **BGE-base-en-v1.5**|
- | BEIR (v1.0.0): ArguAna                                                                                       | 0.992     |
+ | BEIR (v1.0.0): ArguAna                                                                                       | 0.972     |
| **R@1000** | **BGE-base-en-v1.5**|
- | BEIR (v1.0.0): ArguAna                                                                                       | 0.996     |
+ | BEIR (v1.0.0): ArguAna                                                                                       | 0.993     |

Note that due to the non-deterministic nature of HNSW indexing, results may differ slightly between each experimental run.
Nevertheless, scores are generally within 0.005 of the reference values recorded in [our YAML configuration file](../../src/main/resources/regression/beir-v1.0.0-arguana-bge-base-en-v1.5-hnsw-onnx.yaml).
@@ -74,11 +74,11 @@ With the above commands, you should be able to reproduce the following results:

| **nDCG@10** | **BGE-base-en-v1.5**|
|:-------------------------------------------------------------------------------------------------------------|-----------|
- | BEIR (v1.0.0): BioASQ                                                                                        | 0.407     |
+ | BEIR (v1.0.0): BioASQ                                                                                        | 0.408     |
| **R@100** | **BGE-base-en-v1.5**|
| BEIR (v1.0.0): BioASQ                                                                                        | 0.624     |
| **R@1000** | **BGE-base-en-v1.5**|
- | BEIR (v1.0.0): BioASQ                                                                                        | 0.795     |
+ | BEIR (v1.0.0): BioASQ                                                                                        | 0.797     |

Note that due to the non-deterministic nature of HNSW indexing, results may differ slightly between each experimental run.
Nevertheless, scores are generally within 0.005 of the reference values recorded in [our YAML configuration file](../../src/main/resources/regression/beir-v1.0.0-bioasq-bge-base-en-v1.5-hnsw-int8-onnx.yaml).
@@ -74,11 +74,11 @@ With the above commands, you should be able to reproduce the following results:

| **nDCG@10** | **BGE-base-en-v1.5**|
|:-------------------------------------------------------------------------------------------------------------|-----------|
- | BEIR (v1.0.0): BioASQ                                                                                        | 0.410     |
+ | BEIR (v1.0.0): BioASQ                                                                                        | 0.414     |
| **R@100** | **BGE-base-en-v1.5**|
- | BEIR (v1.0.0): BioASQ                                                                                        | 0.622     |
+ | BEIR (v1.0.0): BioASQ                                                                                        | 0.628     |
| **R@1000** | **BGE-base-en-v1.5**|
- | BEIR (v1.0.0): BioASQ                                                                                        | 0.794     |
+ | BEIR (v1.0.0): BioASQ                                                                                        | 0.802     |

Note that due to the non-deterministic nature of HNSW indexing, results may differ slightly between each experimental run.
Nevertheless, scores are generally within 0.005 of the reference values recorded in [our YAML configuration file](../../src/main/resources/regression/beir-v1.0.0-bioasq-bge-base-en-v1.5-hnsw-onnx.yaml).
@@ -74,7 +74,7 @@ With the above commands, you should be able to reproduce the following results:

| **nDCG@10** | **BGE-base-en-v1.5**|
|:-------------------------------------------------------------------------------------------------------------|-----------|
- | BEIR (v1.0.0): Climate-FEVER                                                                                 | 0.309     |
+ | BEIR (v1.0.0): Climate-FEVER                                                                                 | 0.308     |
| **R@100** | **BGE-base-en-v1.5**|
| BEIR (v1.0.0): Climate-FEVER | 0.633 |
| **R@1000** | **BGE-base-en-v1.5**|
@@ -76,9 +76,9 @@ With the above commands, you should be able to reproduce the following results:
|:-------------------------------------------------------------------------------------------------------------|-----------|
| BEIR (v1.0.0): Climate-FEVER | 0.312 |
| **R@100** | **BGE-base-en-v1.5**|
- | BEIR (v1.0.0): Climate-FEVER                                                                                 | 0.636     |
+ | BEIR (v1.0.0): Climate-FEVER                                                                                 | 0.635     |
| **R@1000** | **BGE-base-en-v1.5**|
- | BEIR (v1.0.0): Climate-FEVER                                                                                 | 0.829     |
+ | BEIR (v1.0.0): Climate-FEVER                                                                                 | 0.830     |

Note that due to the non-deterministic nature of HNSW indexing, results may differ slightly between each experimental run.
Nevertheless, scores are generally within 0.005 of the reference values recorded in [our YAML configuration file](../../src/main/resources/regression/beir-v1.0.0-climate-fever-bge-base-en-v1.5-hnsw-onnx.yaml).
@@ -76,7 +76,7 @@ With the above commands, you should be able to reproduce the following results:
|:-------------------------------------------------------------------------------------------------------------|-----------|
| BEIR (v1.0.0): CQADupStack-android | 0.509 |
| **R@100** | **BGE-base-en-v1.5**|
- | BEIR (v1.0.0): CQADupStack-android                                                                           | 0.844     |
+ | BEIR (v1.0.0): CQADupStack-android                                                                           | 0.843     |
| **R@1000** | **BGE-base-en-v1.5**|
| BEIR (v1.0.0): CQADupStack-android | 0.962 |

@@ -74,7 +74,7 @@ With the above commands, you should be able to reproduce the following results:

| **nDCG@10** | **BGE-base-en-v1.5**|
|:-------------------------------------------------------------------------------------------------------------|-----------|
- | BEIR (v1.0.0): CQADupStack-android                                                                           | 0.507     |
+ | BEIR (v1.0.0): CQADupStack-android                                                                           | 0.508     |
| **R@100** | **BGE-base-en-v1.5**|
| BEIR (v1.0.0): CQADupStack-android | 0.845 |
| **R@1000** | **BGE-base-en-v1.5**|
@@ -78,7 +78,7 @@ With the above commands, you should be able to reproduce the following results:
| **R@100** | **BGE-base-en-v1.5**|
| BEIR (v1.0.0): CQADupStack-english | 0.756 |
| **R@1000** | **BGE-base-en-v1.5**|
- | BEIR (v1.0.0): CQADupStack-english                                                                           | 0.883     |
+ | BEIR (v1.0.0): CQADupStack-english                                                                           | 0.882     |

Note that due to the non-deterministic nature of HNSW indexing, results may differ slightly between each experimental run.
Nevertheless, scores are generally within 0.005 of the reference values recorded in [our YAML configuration file](../../src/main/resources/regression/beir-v1.0.0-cqadupstack-english-bge-base-en-v1.5-hnsw-int8-onnx.yaml).
@@ -74,11 +74,11 @@ With the above commands, you should be able to reproduce the following results:

| **nDCG@10** | **BGE-base-en-v1.5**|
|:-------------------------------------------------------------------------------------------------------------|-----------|
- | BEIR (v1.0.0): CQADupStack-english                                                                           | 0.485     |
+ | BEIR (v1.0.0): CQADupStack-english                                                                           | 0.484     |
| **R@100** | **BGE-base-en-v1.5**|
- | BEIR (v1.0.0): CQADupStack-english                                                                           | 0.757     |
+ | BEIR (v1.0.0): CQADupStack-english                                                                           | 0.756     |
| **R@1000** | **BGE-base-en-v1.5**|
- | BEIR (v1.0.0): CQADupStack-english                                                                           | 0.882     |
+ | BEIR (v1.0.0): CQADupStack-english                                                                           | 0.881     |

Note that due to the non-deterministic nature of HNSW indexing, results may differ slightly between each experimental run.
Nevertheless, scores are generally within 0.005 of the reference values recorded in [our YAML configuration file](../../src/main/resources/regression/beir-v1.0.0-cqadupstack-english-bge-base-en-v1.5-hnsw-onnx.yaml).
@@ -74,7 +74,7 @@ With the above commands, you should be able to reproduce the following results:

| **nDCG@10** | **BGE-base-en-v1.5**|
|:-------------------------------------------------------------------------------------------------------------|-----------|
- | BEIR (v1.0.0): CQADupStack-gis                                                                               | 0.415     |
+ | BEIR (v1.0.0): CQADupStack-gis                                                                               | 0.416     |
| **R@100** | **BGE-base-en-v1.5**|
| BEIR (v1.0.0): CQADupStack-gis | 0.767 |
| **R@1000** | **BGE-base-en-v1.5**|
@@ -74,7 +74,7 @@ With the above commands, you should be able to reproduce the following results:

| **nDCG@10** | **BGE-base-en-v1.5**|
|:-------------------------------------------------------------------------------------------------------------|-----------|
- | BEIR (v1.0.0): CQADupStack-gis                                                                               | 0.412     |
+ | BEIR (v1.0.0): CQADupStack-gis                                                                               | 0.413     |
| **R@100** | **BGE-base-en-v1.5**|
| BEIR (v1.0.0): CQADupStack-gis | 0.767 |
| **R@1000** | **BGE-base-en-v1.5**|
@@ -74,7 +74,7 @@ With the above commands, you should be able to reproduce the following results:

| **nDCG@10** | **BGE-base-en-v1.5**|
|:-------------------------------------------------------------------------------------------------------------|-----------|
- | BEIR (v1.0.0): CQADupStack-physics                                                                           | 0.474     |
+ | BEIR (v1.0.0): CQADupStack-physics                                                                           | 0.473     |
| **R@100** | **BGE-base-en-v1.5**|
| BEIR (v1.0.0): CQADupStack-physics | 0.810 |
| **R@1000** | **BGE-base-en-v1.5**|