add dtype-based loading #461

Merged: 1 commit merged into main on Nov 13, 2024
Conversation

michaelfeil (Owner) commented:
This pull request improves the handling of loading strategies, device placement, and quantization in the infinity_emb library. The most important changes update the SentenceClassifier, CrossEncoder, and SentenceTransformer classes to incorporate the new loading strategies and device placement, and to handle different data types and quantization consistently.

Improvements to handling loading strategies and device placement:
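Roughly, the idea is to resolve the target torch dtype once, up front, and pass it when the model is loaded. A minimal sketch, assuming hypothetical names (`resolve_loading_dtype` and the string-to-dtype map below are illustrative, not the library's actual API):

```python
# Minimal sketch of dtype-based loading; all names are illustrative.
import torch

_DTYPES = {
    "float32": torch.float32,
    "float16": torch.float16,
    "bfloat16": torch.bfloat16,
}

def resolve_loading_dtype(dtype: str, device: str) -> torch.dtype:
    # Half precision only pays off on accelerators; on CPU, fall back
    # to float32 so the model stays numerically safe.
    if device == "cpu" and dtype in ("float16", "bfloat16"):
        return torch.float32
    return _DTYPES.get(dtype, torch.float32)
```

A model class can then pass the resolved dtype at load time (for Hugging Face models, e.g. `torch_dtype=resolve_loading_dtype(dtype, device)`) instead of converting with `.half()` after the weights are already loaded.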

License & CLA

By submitting this PR, I confirm that my contribution is made under the terms of the MIT license.

Related Issue

Checklist

  • I have read the CONTRIBUTING guidelines.
  • I have added tests to cover my changes.
  • I have updated the documentation (docs folder) accordingly.

Additional Notes


@greptile-apps bot (Contributor) left a comment

PR Summary

This PR implements dtype-based loading strategies and device placement across transformer models, replacing manual dtype/device management with a more consistent approach.

  • Added loading strategy support in /libs/infinity_emb/infinity_emb/transformer/embedder/sentence_transformer.py with loading_dtype parameter for model initialization
  • Integrated quantization interface via quant_interface in /libs/infinity_emb/infinity_emb/transformer/classifier/torch.py and /libs/infinity_emb/infinity_emb/transformer/crossencoder/torch.py
  • Added torch.compile support in classifier and crossencoder implementations
  • Standardized float32 numpy output in CrossEncoder's encode_post method
  • Removed manual half-precision conversion in favor of loading_dtype across transformer classes (see the sketch after this list)
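Taken together, the steps above might compose as in the following sketch. `finalize_model` is an invented name, and the `quant_interface` body here is a stand-in built on PyTorch dynamic quantization; the library's actual helper may work differently:

```python
# Hedged sketch of the post-load pipeline described in the bullets above.
import numpy as np
import torch

def quant_interface(model: torch.nn.Module) -> torch.nn.Module:
    # Stand-in for a shared quantization entry point: dynamically
    # quantize Linear layers to int8, one common CPU-side strategy.
    return torch.quantization.quantize_dynamic(
        model, {torch.nn.Linear}, dtype=torch.qint8
    )

def finalize_model(
    model: torch.nn.Module, *, quantize: bool, compile_model: bool
) -> torch.nn.Module:
    if quantize:
        model = quant_interface(model)
    if compile_model:
        # torch.compile (PyTorch >= 2.0); guarded so an environment that
        # cannot compile degrades to eager execution instead of failing.
        try:
            model = torch.compile(model, dynamic=True)
        except Exception:
            pass
    return model

def encode_post(scores: torch.Tensor) -> np.ndarray:
    # Standardize CrossEncoder output to float32 numpy, regardless of
    # the dtype the model computed in (e.g. float16 on GPU).
    return scores.detach().cpu().to(torch.float32).numpy()
```

A classifier or cross-encoder could call `finalize_model(model, quantize=..., compile_model=...)` once right after loading with the resolved dtype from the earlier sketch.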

3 file(s) reviewed, 4 comment(s)

codecov-commenter commented Nov 13, 2024

⚠️ Please install the Codecov GitHub app to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

Attention: Patch coverage is 87.09677% with 4 lines in your changes missing coverage. Please review.

Project coverage is 79.08%. Comparing base (4ab717b) to head (11fc52b).

Files with missing lines                                 Patch %   Lines
...y_emb/infinity_emb/transformer/classifier/torch.py    78.57%    3 Missing ⚠️
...emb/infinity_emb/transformer/crossencoder/torch.py    92.85%    1 Missing ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #461      +/-   ##
==========================================
+ Coverage   78.97%   79.08%   +0.10%     
==========================================
  Files          42       42              
  Lines        3392     3414      +22     
==========================================
+ Hits         2679     2700      +21     
- Misses        713      714       +1     

☔ View full report in Codecov by Sentry.

michaelfeil merged commit 0a688b6 into main on Nov 13, 2024
36 checks passed
michaelfeil deleted the dtype-based-loading branch on November 13, 2024 at 06:30