Commit

Addressed CR. added note of errors to openai int8 indexes.
lintool committed Dec 19, 2023
1 parent e6253e7 commit 2b6e14a
Showing 8 changed files with 14 additions and 2 deletions.
2 changes: 2 additions & 0 deletions docs/regressions/regressions-dl19-passage-openai-ada2-int8.md
@@ -1,5 +1,7 @@
# Anserini Regressions: TREC 2019 Deep Learning Track (Passage)

**NOTE:** We're currently having issues with this regression, which throws "Retried waiting for GCLocker too often" errors.

**Model**: OpenAI-ada2 embeddings (using pre-encoded queries) with HNSW indexes

This page describes regression experiments, integrated into Anserini's regression testing framework, using OpenAI-ada2 embeddings on the [TREC 2019 Deep Learning Track passage ranking task](https://trec.nist.gov/data/deep2019.html), as described in the following paper:
2 changes: 2 additions & 0 deletions docs/regressions/regressions-dl20-passage-openai-ada2-int8.md
@@ -1,5 +1,7 @@
# Anserini Regressions: TREC 2020 Deep Learning Track (Passage)

**NOTE:** We're currently having issues with this regression, which throws "Retried waiting for GCLocker too often" errors.

**Model**: OpenAI-ada2 embeddings (using pre-encoded queries) with HNSW indexes

This page describes regression experiments, integrated into Anserini's regression testing framework, using OpenAI-ada2 embeddings on the [TREC 2020 Deep Learning Track passage ranking task](https://trec.nist.gov/data/deep2019.html), as described in the following paper:
@@ -1,5 +1,7 @@
# Anserini Regressions: MS MARCO Passage Ranking

**NOTE:** We're currently having issues with this regression, which throws "Retried waiting for GCLocker too often" errors.

**Model**: OpenAI-ada2 embeddings (using pre-encoded queries) with HNSW indexes

This page describes regression experiments, integrated into Anserini's regression testing framework, using OpenAI-ada2 embeddings on the [MS MARCO passage ranking task](https://github.com/microsoft/MSMARCO-Passage-Ranking), as described in the following paper:
2 changes: 1 addition & 1 deletion src/main/java/io/anserini/index/IndexCollection.java
@@ -177,7 +177,7 @@ public IndexCollection(Args args) throws Exception {
}

final Directory dir = FSDirectory.open(Paths.get(args.index));
- final IndexWriterConfig config = new IndexWriterConfig(getAnalyzer()).setCodec(new Lucene99Codec());
+ final IndexWriterConfig config = new IndexWriterConfig(getAnalyzer());

if (args.bm25Accurate) {
// Necessary during indexing as the norm used in BM25 is already determined at index time.
@@ -100,7 +100,7 @@ public IndexInvertedDenseVectors(Args args) {

try {
final Directory dir = FSDirectory.open(Paths.get(args.index));
- final IndexWriterConfig config = new IndexWriterConfig(analyzer).setCodec(new Lucene99Codec());
+ final IndexWriterConfig config = new IndexWriterConfig(analyzer);
config.setOpenMode(IndexWriterConfig.OpenMode.CREATE);
config.setRAMBufferSizeMB(args.memoryBuffer);
config.setUseCompoundFile(false);
@@ -1,5 +1,7 @@
# Anserini Regressions: TREC 2019 Deep Learning Track (Passage)

**NOTE:** We're currently having issues with this regression, which throws "Retried waiting for GCLocker too often" errors.

**Model**: OpenAI-ada2 embeddings (using pre-encoded queries) with HNSW indexes

This page describes regression experiments, integrated into Anserini's regression testing framework, using OpenAI-ada2 embeddings on the [TREC 2019 Deep Learning Track passage ranking task](https://trec.nist.gov/data/deep2019.html), as described in the following paper:
@@ -1,5 +1,7 @@
# Anserini Regressions: TREC 2020 Deep Learning Track (Passage)

**NOTE:** We're currently having issues with this regression, which throws "Retried waiting for GCLocker too often" errors.

**Model**: OpenAI-ada2 embeddings (using pre-encoded queries) with HNSW indexes

This page describes regression experiments, integrated into Anserini's regression testing framework, using OpenAI-ada2 embeddings on the [TREC 2020 Deep Learning Track passage ranking task](https://trec.nist.gov/data/deep2019.html), as described in the following paper:
@@ -1,5 +1,7 @@
# Anserini Regressions: MS MARCO Passage Ranking

**NOTE:** We're currently having issues with this regression, which throws "Retried waiting for GCLocker too often" errors.

**Model**: OpenAI-ada2 embeddings (using pre-encoded queries) with HNSW indexes

This page describes regression experiments, integrated into Anserini's regression testing framework, using OpenAI-ada2 embeddings on the [MS MARCO passage ranking task](https://github.com/microsoft/MSMARCO-Passage-Ranking), as described in the following paper: