-
Notifications
You must be signed in to change notification settings - Fork 661
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[api] Add batch support to TextEmbeddingServingTranslator #3084
Conversation
Codecov ReportAttention: Patch coverage is
❗ Your organization needs to install the Codecov GitHub app to enable full functionality. Additional details and impacted files@@ Coverage Diff @@
## master #3084 +/- ##
=========================================
Coverage 72.28% 72.29%
- Complexity 7302 7324 +22
=========================================
Files 722 723 +1
Lines 32593 32677 +84
Branches 3410 3427 +17
=========================================
+ Hits 23561 23624 +63
- Misses 7407 7432 +25
+ Partials 1625 1621 -4 ☔ View full report in Codecov by Sentry. |
if (metrics != null) { | ||
waitToRead(list); | ||
long tmp = System.nanoTime(); | ||
long duration = (tmp - timestamp) / 1000; | ||
long duration = (tmp - timestamp) / 1000 / batchSize; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does it make sense to report the average latency per item as the preprocess latency? do we also report the batch size metric somewhere?
I feel like we should report the total time for the full batch, but if this is consistent with how we're doing it elsewhere, or what users typically expect, i'm ok with it. same foes for the other methods we report a similar metric for
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
try to align with TEI for batch case. We do report DynamicBatchSize in metrics. We have total inference time as well
Description
Brief description of what this PR is about