Colbert #456
Conversation
PR Summary
Here's a concise summary of the key changes in this PR that adds ColBERT model support:
Added support for multiple model replicas and device placement across GPUs (a usage sketch follows this list):
- Introduced a new --device-id CLI argument to specify GPU/MPS device placement (e.g. "0,1") for model distribution
- Added a LoadingStrategy class to manage device mapping, dtype selection, and quantization across replicas
- Modified BatchHandler to support multiple ModelWorker instances running parallel model replicas
- Updated AsyncEmbeddingEngine to handle multiple model replicas instead of a single instance
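As a rough illustration of the replica/device configuration this enables, here is a minimal Python sketch. It is not taken from the PR's diff: AsyncEmbeddingEngine, EngineArgs, and from_args are existing infinity_emb names, but the model id is a placeholder and the exact accepted values for device and device_id are assumptions based on the summary above.

from infinity_emb import AsyncEmbeddingEngine, EngineArgs

args = EngineArgs(
    model_name_or_path="BAAI/bge-small-en-v1.5",  # placeholder model id
    device="cuda",       # assumed to coerce to the Device enum
    device_id="0,1",     # new in this PR: one model replica per listed device
)
engine = AsyncEmbeddingEngine.from_args(args)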
Added ColBERT-specific functionality:
- Introduced mode_colbert flag in SentenceTransformerPatched for ColBERT model detection
- Modified embedding processing to handle ColBERT's token-level embeddings and attention masks
- Added normalize_embeddings flag to control embedding normalization behavior
- Added test coverage for ColBERT models with token-level embedding verification
The changes enable efficient parallel processing across devices while maintaining backward compatibility with existing models.
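To make the token-level path concrete, the following is a minimal sketch of how ColBERT-style post-processing generally works: keep one embedding per real token, using the attention mask to drop padding, and optionally L2-normalize. This illustrates the technique only; the tensor names, return type, and exact normalization behavior in the PR may differ.

import torch
import torch.nn.functional as F

def colbert_token_embeddings(token_embeddings: torch.Tensor,
                             attention_mask: torch.Tensor,
                             normalize: bool = True) -> list[torch.Tensor]:
    # token_embeddings: (batch, seq_len, dim); attention_mask: (batch, seq_len)
    out = []
    for emb, mask in zip(token_embeddings, attention_mask):
        kept = emb[mask.bool()]                  # drop padded positions
        if normalize:
            kept = F.normalize(kept, p=2, dim=-1)
        out.append(kept.to(torch.float32))       # one (num_tokens, dim) tensor per input
    return out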
25 file(s) reviewed, 9 comment(s)
@@ -53,6 +60,7 @@ class EngineArgs:
    model_warmup: bool = MANAGER.model_warmup[0]
    vector_disk_cache_path: str = ""
    device: Device = Device[MANAGER.device[0]]
    device_id: DeviceID = field(default_factory=lambda: DeviceID(MANAGER.device_id[0]))
logic: device_id is not included in from_env() method's zip_longest parameters, causing potential initialization issues
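For illustration, the failure mode being flagged looks roughly like this. This is not the real from_env() implementation; the function and structure are a hypothetical reconstruction of a zip_longest-based constructor, showing that a field left out of the zip silently keeps its default for every model.

from itertools import zip_longest

def build_args_sketch(model_ids, devices, device_ids):
    # One set of engine args per model, built from parallel env-derived lists.
    # If device_ids were omitted from the zip_longest call below, every model
    # would fall back to the default device_id regardless of the env value.
    return [
        {"model_name_or_path": m, "device": d, "device_id": di}
        for m, d, di in zip_longest(model_ids, devices, device_ids, fillvalue=None)
    ]

# build_args_sketch(["model-a", "model-b"], ["cuda", "cuda"], ["0", "1"])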
embeddings: "Tensor" = out_features.to(torch.float32)
if self.normalize_embeddings:
logic: redundant check for not self.mode_colbert since it's already in an if not self.mode_colbert block
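For context, here is a schematic reconstruction of the pattern being flagged (the surrounding class is not reproduced; names mirror the snippet above):

import torch
import torch.nn.functional as F

def postprocess_sketch(out_features: torch.Tensor, mode_colbert: bool, normalize_embeddings: bool):
    if not mode_colbert:
        embeddings = out_features.to(torch.float32)
        # "not mode_colbert" is already guaranteed by the outer branch, so the
        # second half of this condition is redundant and can be dropped:
        if normalize_embeddings and not mode_colbert:
            embeddings = F.normalize(embeddings, p=2, dim=-1)
        return embeddings
    return out_features  # ColBERT path: token-level embeddings handled separately

Reducing the inner condition to "if normalize_embeddings:" leaves the behavior unchanged.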
Resolved review threads:
- libs/infinity_emb/infinity_emb/transformer/embedder/sentence_transformer.py
- libs/infinity_emb/tests/end_to_end/test_sentence_transformers_colbert.py (three threads)
Codecov Report
Attention: Patch coverage is

@@            Coverage Diff             @@
##             main     #456      +/-   ##
==========================================
+ Coverage   79.12%   79.19%   +0.07%
==========================================
  Files          42       42
  Lines        3367     3379      +12
==========================================
+ Hits         2664     2676      +12
  Misses        703      703

☔ View full report in Codecov by Sentry.
Description
Please provide a clear and concise description of the changes in this PR.
Related Issue
If applicable, link the issue this PR addresses.
Types of Change
Checklist
Additional Notes
Add any other context about the PR here.
License
By submitting this PR, I confirm that my contribution is made under the terms of the MIT license.