-
Notifications
You must be signed in to change notification settings - Fork 119
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Model: Device-id and data-parallel inference in CLI and Torch #452
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PR Summary
Based on my analysis of the pull request, here is a concise summary of the key changes:
Added device-id and data-parallel inference capabilities to enable running models across multiple GPUs/devices:
- Added new
--device-id
CLI option that accepts comma-separated device IDs (e.g. "0,1") for model placement across multiple GPUs/devices - Introduced LoadingStrategy class to manage device mapping, dtype configuration, and quantization settings across different hardware
- Modified BatchHandler to support multiple model replicas running in parallel across specified devices
- Updated test suite with retry logic and adjusted tolerance parameters to handle numerical differences from parallel processing
- Added proper error handling for device validation and unavailable hardware configurations
The changes enable better scaling and performance through parallel inference while maintaining the existing API interface.
23 file(s) reviewed, 32 comment(s)
Edit PR Review Bot Settings | Greptile
│ --device-id TEXT device id defines the model │ | ||
│ placement. e.g. `0,1` will │ | ||
│ place the model on │ | ||
│ MPS/CUDA/GPU 0 and 1 each │ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: The phrase 'each' at the end of this line is ambiguous - does it mean the model is replicated on each device or split across devices?
def update_loading_stategy(self): | ||
"""Assign a device id to the EngineArgs object.""" | ||
from infinity_emb.inference import loading_strategy # type: ignore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
syntax: update_loading_stategy() has a typo in its name (should be 'strategy')
if self._loading_strategy is None: | ||
self.update_loading_stategy() | ||
elif isinstance(self._loading_strategy, dict): | ||
object.__setattr__(self, "_loading_strategy", LoadingStrategy(**self._loading_strategy)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: loading strategy initialization should happen before pydantic validation to ensure the complete object is validated
@@ -61,6 +67,8 @@ class EngineArgs: | |||
embedding_dtype: EmbeddingDtype = EmbeddingDtype[MANAGER.embedding_dtype[0]] | |||
served_model_name: str = MANAGER.served_model_name[0] | |||
|
|||
_loading_strategy: Optional[LoadingStrategy] = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: _loading_strategy should be documented with a type hint comment explaining its purpose and structure
Codecov ReportAttention: Patch coverage is
❗ Your organization needs to install the Codecov GitHub app to enable full functionality. Additional details and impacted files@@ Coverage Diff @@
## main #452 +/- ##
==========================================
- Coverage 79.18% 79.15% -0.04%
==========================================
Files 41 42 +1
Lines 3248 3363 +115
==========================================
+ Hits 2572 2662 +90
- Misses 676 701 +25 ☔ View full report in Codecov by Sentry. |
Description
Please provide a clear and concise description of the changes in this PR.
Related Issue
If applicable, link the issue this PR addresses.
Types of Change
Checklist
Additional Notes
Add any other context about the PR here.
License
By submitting this PR, I confirm that my contribution is made under the terms of the MIT license.