Extract common code paths in indexing pipeline (#2275)
Major refactoring of indexing pipeline (IndexCollection, IndexHnswDenseVectors, and IndexInvertedDenseVectors),
extracting common code paths into AbstractIndexer.
lintool committed Dec 13, 2023
1 parent 6df6c41 commit b6a7534
Showing 35 changed files with 749 additions and 1,020 deletions.
@@ -57,7 +57,7 @@ target/appassembler/bin/IndexHnswDenseVectors \
-input /path/to/msmarco-passage-cos-dpr-distil \
-generator HnswDenseVectorDocumentGenerator \
-index indexes/lucene-hnsw.msmarco-passage-cos-dpr-distil/ \
- -threads 16 -M 16 -efC 100 \
+ -threads 16 -M 16 -efC 100 -memoryBuffer 65536 \
>& logs/log.msmarco-passage-cos-dpr-distil &
```

@@ -57,7 +57,7 @@ target/appassembler/bin/IndexHnswDenseVectors \
-input /path/to/msmarco-passage-cos-dpr-distil \
-generator HnswDenseVectorDocumentGenerator \
-index indexes/lucene-hnsw.msmarco-passage-cos-dpr-distil/ \
- -threads 16 -M 16 -efC 100 \
+ -threads 16 -M 16 -efC 100 -memoryBuffer 65536 \
>& logs/log.msmarco-passage-cos-dpr-distil &
```

2 changes: 1 addition & 1 deletion docs/regressions/regressions-dl19-passage-openai-ada2.md
@@ -57,7 +57,7 @@ target/appassembler/bin/IndexHnswDenseVectors \
-input /path/to/msmarco-passage-openai-ada2 \
-generator HnswDenseVectorDocumentGenerator \
-index indexes/lucene-hnsw.msmarco-passage-openai-ada2/ \
- -threads 16 -M 16 -efC 100 -memorybuffer 65536 \
+ -threads 16 -M 16 -efC 100 -memoryBuffer 65536 \
>& logs/log.msmarco-passage-openai-ada2 &
```

@@ -57,7 +57,7 @@ target/appassembler/bin/IndexHnswDenseVectors \
-input /path/to/msmarco-passage-cos-dpr-distil \
-generator HnswDenseVectorDocumentGenerator \
-index indexes/lucene-hnsw.msmarco-passage-cos-dpr-distil/ \
- -threads 16 -M 16 -efC 100 \
+ -threads 16 -M 16 -efC 100 -memoryBuffer 65536 \
>& logs/log.msmarco-passage-cos-dpr-distil &
```

@@ -57,7 +57,7 @@ target/appassembler/bin/IndexHnswDenseVectors \
-input /path/to/msmarco-passage-cos-dpr-distil \
-generator HnswDenseVectorDocumentGenerator \
-index indexes/lucene-hnsw.msmarco-passage-cos-dpr-distil/ \
- -threads 16 -M 16 -efC 100 \
+ -threads 16 -M 16 -efC 100 -memoryBuffer 65536 \
>& logs/log.msmarco-passage-cos-dpr-distil &
```

2 changes: 1 addition & 1 deletion docs/regressions/regressions-dl20-passage-openai-ada2.md
@@ -57,7 +57,7 @@ target/appassembler/bin/IndexHnswDenseVectors \
-input /path/to/msmarco-passage-openai-ada2 \
-generator HnswDenseVectorDocumentGenerator \
-index indexes/lucene-hnsw.msmarco-passage-openai-ada2/ \
- -threads 16 -M 16 -efC 100 -memorybuffer 65536 \
+ -threads 16 -M 16 -efC 100 -memoryBuffer 65536 \
>& logs/log.msmarco-passage-openai-ada2 &
```

2 changes: 1 addition & 1 deletion docs/regressions/regressions-mb11.md
@@ -25,7 +25,7 @@ target/appassembler/bin/IndexCollection \
-input /path/to/mb11 \
-generator TweetGenerator \
-index indexes/lucene-index.mb11/ \
- -threads 44 -storePositions -storeDocvectors -storeRaw -uniqueDocid -tweet.keepUrls -tweet.stemming \
+ -threads 44 -storePositions -storeDocvectors -storeRaw -tweet.keepUrls -tweet.stemming \
>& logs/log.mb11 &
```

@@ -54,7 +54,7 @@ target/appassembler/bin/IndexHnswDenseVectors \
-input /path/to/msmarco-passage-cos-dpr-distil \
-generator HnswDenseVectorDocumentGenerator \
-index indexes/lucene-hnsw.msmarco-passage-cos-dpr-distil/ \
- -threads 16 -M 16 -efC 100 \
+ -threads 16 -M 16 -efC 100 -memoryBuffer 65536 \
>& logs/log.msmarco-passage-cos-dpr-distil &
```

@@ -54,7 +54,7 @@ target/appassembler/bin/IndexHnswDenseVectors \
-input /path/to/msmarco-passage-cos-dpr-distil \
-generator HnswDenseVectorDocumentGenerator \
-index indexes/lucene-hnsw.msmarco-passage-cos-dpr-distil/ \
- -threads 16 -M 16 -efC 100 \
+ -threads 16 -M 16 -efC 100 -memoryBuffer 65536 \
>& logs/log.msmarco-passage-cos-dpr-distil &
```

@@ -54,7 +54,7 @@ target/appassembler/bin/IndexHnswDenseVectors \
-input /path/to/msmarco-passage-openai-ada2 \
-generator HnswDenseVectorDocumentGenerator \
-index indexes/lucene-hnsw.msmarco-passage-openai-ada2/ \
- -threads 16 -M 16 -efC 100 -memorybuffer 65536 \
+ -threads 16 -M 16 -efC 100 -memoryBuffer 65536 \
>& logs/log.msmarco-passage-openai-ada2 &
```
