feat: include tokens usage for streamed output #4282

Merged
merged 3 commits into from
Nov 28, 2024

Conversation

mintyleaf
Contributor

Description

This PR fixes: I didn't find an existing issue for this.

I found out that the usage part of the response body is always zeroed when LocalAI is used through the WebUI, i.e. whenever the output is streamed.
The issue is that the gRPC server is supposed to return a stream of Reply structures, but for some reason they were only used to retrieve the message bytes of each chunk and nothing more.
However, the logic for attaching the (always empty) usage data to every chunk is already present, even though the real usage is only sent with the last closing chunk; that is why I had to use a hack with empty choices to retrieve just the usage data through the tokenCallback.

I don't really know how to test the tools part (and honestly I'm not sure what it is), but since it uses the usage data returned from the ComputeChoices function, I have verified that the function indeed returns a non-empty usage result.

At the same time, the plain Predict (non-streaming) path uses protocol.Reply all the way down through the layers of abstraction and correctly shows the token usage data.
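
To illustrate what this means on the wire (my reading of the description, not code from the PR): intermediate streamed chunks carry a delta choice with no usage, while the frame emitted just before `[DONE]` carries an empty choices array plus the real usage. The struct names below are simplified stand-ins, not LocalAI's actual schema types.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Simplified stand-ins for the real response schema types (illustrative only).
type Usage struct {
	PromptTokens     int `json:"prompt_tokens"`
	CompletionTokens int `json:"completion_tokens"`
	TotalTokens      int `json:"total_tokens"`
}

type Delta struct {
	Content string `json:"content,omitempty"`
}

type Choice struct {
	Delta Delta `json:"delta"`
}

type Chunk struct {
	Choices []Choice `json:"choices"`
	Usage   *Usage   `json:"usage,omitempty"`
}

func main() {
	// Intermediate chunk: one delta choice carrying the next token, no usage yet.
	token, _ := json.Marshal(Chunk{Choices: []Choice{{Delta: Delta{Content: "Hello"}}}})

	// Final frame before [DONE]: empty choices, accumulated usage.
	final, _ := json.Marshal(Chunk{
		Choices: []Choice{},
		Usage:   &Usage{PromptTokens: 12, CompletionTokens: 34, TotalTokens: 46},
	})

	fmt.Printf("data: %s\n\ndata: %s\n\ndata: [DONE]\n", token, final)
}
```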

…to get the proper usage data in reply streaming mode at the last [DONE] frame

netlify bot commented Nov 27, 2024

Deploy Preview for localai ready!

🔨 Latest commit: e459118
🔍 Latest deploy log: https://app.netlify.com/sites/localai/deploys/67482889efbc0f0008f0d32b
😎 Deploy Preview: https://deploy-preview-4282--localai.netlify.app

@@ -39,11 +39,15 @@ func ChatEndpoint(cl *config.BackendConfigLoader, ml *model.ModelLoader, startup
responses <- initialMessage

ComputeChoices(req, s, config, startupOptions, loader, func(s string, c *[]schema.Choice) {}, func(s string, usage backend.TokenUsage) bool {
choices := []schema.Choice{}
if s != "" {
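
For readers skimming the truncated hunk above, here is a minimal self-contained sketch of the callback pattern it hints at, assuming (as the description says) that an empty token string acts as a sentinel for "this call only carries usage". `TokenUsage`, `streamChunk` and `emit` below are illustrative names, not LocalAI's real API.

```go
package main

import "fmt"

// Illustrative stand-in for the backend usage counters.
type TokenUsage struct {
	Prompt     int
	Completion int
}

// streamChunk mirrors the shape of the callback in the diff: a non-empty token
// becomes a single choice, while an empty token flags a usage-only chunk.
func streamChunk(token string, usage TokenUsage, emit func(choices []string, usage TokenUsage)) bool {
	choices := []string{}
	if token != "" {
		choices = append(choices, token)
	}
	emit(choices, usage)
	return true
}

func main() {
	emit := func(choices []string, usage TokenUsage) {
		if len(choices) == 0 {
			// Empty choices: the usage-only frame sent right before [DONE].
			fmt.Printf("final usage: prompt=%d completion=%d\n", usage.Prompt, usage.Completion)
			return
		}
		fmt.Printf("token chunk: %q\n", choices[0])
	}

	streamChunk("Hello", TokenUsage{}, emit)                      // normal streamed token
	streamChunk("", TokenUsage{Prompt: 12, Completion: 34}, emit) // usage-only sentinel call
}
```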
mudler (Owner)

I'm not entirely sure about this, as I didn't check what OpenAI does in this case (and it is hard to replicate).

If the token is empty, does it send back an empty choice, or does it just skip it? It probably doesn't matter either way; I just want to make sure we keep an eye on not breaking any user flow.

@mintyleaf (Contributor, Author) commented Nov 28, 2024


I don't really know, but that token callback function is only used in streaming reply mode.

In that mode we chop the token into runes and, if there are any, we call that function, so before this commit the tokenCallback would never be called with an empty first argument, which is exactly what I check here.

The problem is that the real usage is sent by llama in the final reply, where the message field bytes are empty.

In that specific case I explicitly call the tokenCallback just to capture the usage part.

Next, about the choices: they were always strictly set to a slice with exactly one element before, so I used that as an opportunity to avoid rewriting much of the code and to flag that everything except the usage part should be ignored, so that the real final reply can be constructed after receiving it from the backend, depending on the tools handling etc. (see the sketch below).
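
A rough sketch of the consumer-side flow described here, under the assumption that the backend's closing reply carries empty message bytes but populated counters; `Reply` and its fields are placeholders, not the actual LocalAI gRPC proto.

```go
package main

import "fmt"

// Placeholder types: not the actual LocalAI gRPC proto or backend structs.
type Reply struct {
	Message          []byte
	PromptTokens     int
	CompletionTokens int
}

type TokenUsage struct {
	Prompt     int
	Completion int
}

// consumeStream forwards each non-empty token through tokenCallback; when the
// backend's closing reply arrives with empty message bytes but real counters,
// it calls the callback once more with an empty token just to deliver usage.
func consumeStream(replies []Reply, tokenCallback func(token string, usage TokenUsage) bool) {
	for _, reply := range replies {
		usage := TokenUsage{Prompt: reply.PromptTokens, Completion: reply.CompletionTokens}
		if len(reply.Message) > 0 {
			tokenCallback(string(reply.Message), usage)
			continue
		}
		tokenCallback("", usage) // usage-only sentinel, as described above
	}
}

func main() {
	replies := []Reply{
		{Message: []byte("Hel")},
		{Message: []byte("lo")},
		{PromptTokens: 12, CompletionTokens: 34}, // closing reply: empty message, real usage
	}
	consumeStream(replies, func(token string, usage TokenUsage) bool {
		if token == "" {
			fmt.Printf("usage: prompt=%d completion=%d\n", usage.Prompt, usage.Completion)
		} else {
			fmt.Printf("token: %q\n", token)
		}
		return true
	})
}
```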

@mudler mudler changed the title Fix OpenAI tokens usage for streamed output fix: include tokens usage for streamed output Nov 28, 2024
@mudler mudler changed the title fix: include tokens usage for streamed output feat: include tokens usage for streamed output Nov 28, 2024
@mudler mudler added the enhancement New feature or request label Nov 28, 2024
@mudler mudler merged commit 0d6c3a7 into mudler:master Nov 28, 2024
30 of 31 checks passed
mudler added a commit that referenced this pull request Dec 8, 2024
mudler added a commit that referenced this pull request Dec 8, 2024
mudler added a commit that referenced this pull request Dec 8, 2024
Revert "feat: include tokens usage for streamed output (#4282)"

This reverts commit 0d6c3a7.