feat(tools): support Tool calls in the API #1715
Conversation
✅ Deploy Preview for localai canceled.
Co-authored-by: Stephan Aßmus <stephan.assmus@sap.com>
As for the missing streaming support... Do you think it might be an idea to stream …
yes, that's exactly what I'm looking at now - at least for the moment we provide compatibility, and we can get back to this later on to try to stream the whole result from the LLM directly
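For context, this is a hedged sketch of the OpenAI-style SSE chunks a client expects when a tool call is streamed - the id, chunk boundaries, and argument split are illustrative, not LocalAI's actual output:

```
data: {"choices":[{"index":0,"delta":{"role":"assistant","tool_calls":[{"index":0,"id":"call_abc123","type":"function","function":{"name":"get_current_weather","arguments":""}}]},"finish_reason":null}]}

data: {"choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"{\"latitude\":52.35,\"longitude\":13.30}"}}]},"finish_reason":null}]}

data: {"choices":[{"index":0,"delta":{},"finish_reason":"tool_calls"}]}

data: [DONE]
```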
(force-pushed from 9024c5f to 03f802e)
I'm amazed by your quick progress! This is the project I'm working on: https://github.com/stippi/voice-assistant
that's super cool! I'd be happy to add it to the community section in the README :)
I didn't give this PR a shot myself yet, but I think most of the pieces should be in place by now - if you have some cycles to spend testing this out, that would be great! Thanks for the quick feedback @stippi2, that really helps me through, really appreciated!
Ok, I'll give it a shot. Maybe two questions.
This should be covered by dddd67d (non-streaming mode). For streaming mode we just reply with the tools format (as we didn't support streaming functions before, I would not port deprecated APIs to the new SSE feature).
Multiple tools at once are not supported (yet), see #1275, as it requires changes to the BNF grammar and would probably need a few rounds of tests first. Replies in case a tool is not selected are supported - however, at the moment I haven't wired that up with streaming responses, but that's easy to get after this bunch of changes.
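To illustrate why multiple calls need grammar work: the grammar constrains generation to a single JSON object. A minimal GBNF-style sketch (illustrative only, not the grammar this PR actually generates) could look like:

```
# force the model to emit exactly one JSON function call
root   ::= "{" space "\"function\":" space string "," space "\"arguments\":" space object "}"
string ::= "\"" [a-zA-Z0-9_]* "\""
object ::= "{" [^}]* "}"
space  ::= " "?
```

Emitting an array of such objects would require extending `root` with repetition, plus matching changes in the parsing downstream.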
This is an example messages array I see in the network tab of the Chrome dev tools:

```json
[
  {
    "role": "user",
    "content": "Play songs from Peter Fox, please."
  },
  {
    "role": "assistant",
    "content": null,
    "tool_calls": [
      {
        "id": "call_C8tn5mmpCnlWgaSBvIYXv85p",
        "type": "function",
        "function": {
          "name": "find_artists_and_play_top_songs_on_spotify",
          "arguments": "{\"queries\":[\"Peter Fox\"]}"
        }
      }
    ]
  },
  {
    "role": "tool",
    "name": "find_artists_and_play_top_songs_on_spotify",
    "tool_call_id": "call_C8tn5mmpCnlWgaSBvIYXv85p",
    "content": "{\"result\":\"playback started\"}"
  }
]
```
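For reference, the non-streaming completion that produces such a `tool_calls` entry has this OpenAI-format shape (the id is taken from the request above; the other fields are illustrative):

```json
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "model": "mistral",
  "choices": [
    {
      "index": 0,
      "finish_reason": "tool_calls",
      "message": {
        "role": "assistant",
        "content": null,
        "tool_calls": [
          {
            "id": "call_C8tn5mmpCnlWgaSBvIYXv85p",
            "type": "function",
            "function": {
              "name": "find_artists_and_play_top_songs_on_spotify",
              "arguments": "{\"queries\":[\"Peter Fox\"]}"
            }
          }
        ]
      }
    }
  ]
}
```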
🤦 I didn't read it right - I completely missed the reply to a tool result. Will try to have a look at it later/tomorrow. Update: well, thinking again, that should actually be covered already without any changes - just map the "tool" role in the model config. We should pass the name too, but a first implementation should already work.
To what string should it be mapped?
It would ideally depend on how the model is fine-tuned: if it hasn't seen any "tool" or function role, it might be problematic. The role is used when constructing the prompt that is fed back to the LLM.
Going to have a look soon
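As a hedged sketch of what such a mapping could look like in the model's YAML config (the exact strings are assumptions - the right values depend on what the model saw during fine-tuning):

```yaml
# hypothetical role mapping for a ChatML-style model;
# "tool" is the role that tool results arrive with in the API
roles:
  user: "user"
  assistant: "assistant"
  tool: "tool"
```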
(force-pushed from aaf2758 to 2510451)
okay, this should be working now at least - I didn't take a deep look at the API diffs, but I think we are much closer now
mm, that looks more like a model misconfiguration - or somehow the LLM output is not entirely JSON at the end. Did you try to set the stopwords on the model? Example:

```yaml
stopwords:
  - "<dummy32000>"
```
This is my config:
I think you should add the string you see at the end of the output as a stopword - it definitely should not be part of the response, as we force JSON, but it seems the output is not entirely JSON in your case.
Will try, thanks.
For reference:

```yaml
name: mistral
mmap: true
parameters:
  model: mistral-7b-openorca.Q6_K.gguf
  temperature: 0.2
  top_k: 40
  top_p: 0.95
template:
  chat_message: chatml
  chat: chatml-block
  completion: completion
context_size: 4096
f16: true
stopwords:
  - <|im_end|>
  - <dummy32000>
threads: 4
```
With that change, the function call works. Still waiting for the chat response after sending the tool result to the model. Note that I didn't add a new role mapping anywhere. Should I?
Hm. It keeps calling the tool. Are you sure the LLM is being forwarded the tool result? I'll try to add a mapping for the "tool" role, if I can find where it needs to be added.

```json
[
  {
    "role": "user",
    "content": "Hi. How are you?"
  },
  {
    "content": "I'm doing well, thank you for asking. How can I assist you today?\n\n",
    "role": "assistant"
  },
  {
    "role": "user",
    "content": "Can you tell me the current weather?"
  },
  {
    "role": "assistant",
    "content": null,
    "tool_calls": [
      {
        "id": "9618f662-741e-49b3-8a92-87b6d2b4567e",
        "type": "function",
        "function": {
          "name": "get_current_weather",
          "arguments": "{\"latitude\":52.3497672,\"longitude\":13.3008244}"
        }
      }
    ]
  },
  {
    "role": "tool",
    "name": "get_current_weather",
    "tool_call_id": "9618f662-741e-49b3-8a92-87b6d2b4567e",
    "content": "{\"result\":{\"weather\":\"light intensity shower rain\",\"temperature\":12,\"temperature_feels_like\":12,\"humidity\":93,\"wind_speed\":4.63,\"wind_direction\":270}}"
  },
  {
    "role": "assistant",
    "content": null,
    "tool_calls": [
      {
        "id": "9618f662-741e-49b3-8a92-87b6d2b4567e",
        "type": "function",
        "function": {
          "name": "get_current_weather",
          "arguments": "{\"latitude\":52.349785,\"longitude\":13.300821}"
        }
      }
    ]
  },
  {
    "role": "tool",
    "name": "get_current_weather",
    "tool_call_id": "9618f662-741e-49b3-8a92-87b6d2b4567e",
    "content": "{\"result\":{\"weather\":\"light intensity shower rain\",\"temperature\":12,\"temperature_feels_like\":12,\"humidity\":93,\"wind_speed\":4.63,\"wind_direction\":270}}"
  }
]
```
It does - if you turn off streaming, you should be able to see the prompt being passed to the model, for instance:
And the request is:
What you are seeing is the model's limited capacity to do follow-ups. For function calling, I can tell you need models bigger than 30b to perform "good enough". And I can tell from experience that if you want something meaningful, you need to aim at 70b models or leverage speculative sampling with even bigger ones. I've summed up some of my experience with it in https://github.com/mudler/LocalAGI - feel free to have a look in there; it basically forces the LLM to reason over the results, and it improves results even with smaller models (but still, you need at least a 30b model, 7b won't be enough). Edit: You can alternatively also play with the roles mapping to check how the prompt is formatted back to the LLM. It improves accuracy by a long shot if the model can recognize the results correctly. Proper templating helps with that, but my suggestion above still stands (use bigger models, 7b won't cut it).
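To make the templating point concrete, with the ChatML templates above the tail of the conversation would be fed back to the model roughly like this (a sketch assuming a direct role-to-tag mapping; the actual rendering depends on the configured templates):

```
<|im_start|>user
Can you tell me the current weather?<|im_end|>
<|im_start|>assistant
{"name":"get_current_weather","arguments":{"latitude":52.3497672,"longitude":13.3008244}}<|im_end|>
<|im_start|>tool
{"result":{"weather":"light intensity shower rain","temperature":12}}<|im_end|>
<|im_start|>assistant
```

If the model was never fine-tuned on a `tool` tag, it has no reliable way to treat that block as a function result, which would match the repeated-call behavior above.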
I'm going to merge this as it looks good from a functional perspective; LLM/model limitations are out of scope for this PR. I'll add tests as soon as I get a bit more time to test this from master images in my home setup.
Ok, thanks for the insight, and especially for implementing this so quickly!! Awesome!
….0 by renovate (#18546)

This PR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [docker.io/localai/localai](https://togithub.com/mudler/LocalAI) | minor | `v2.8.2-cublas-cuda11-ffmpeg-core` -> `v2.9.0-cublas-cuda11-ffmpeg-core` |
| [docker.io/localai/localai](https://togithub.com/mudler/LocalAI) | minor | `v2.8.2-cublas-cuda11-core` -> `v2.9.0-cublas-cuda11-core` |
| [docker.io/localai/localai](https://togithub.com/mudler/LocalAI) | minor | `v2.8.2-cublas-cuda12-ffmpeg-core` -> `v2.9.0-cublas-cuda12-ffmpeg-core` |
| [docker.io/localai/localai](https://togithub.com/mudler/LocalAI) | minor | `v2.8.2-cublas-cuda12-core` -> `v2.9.0-cublas-cuda12-core` |
| [docker.io/localai/localai](https://togithub.com/mudler/LocalAI) | minor | `v2.8.2-ffmpeg-core` -> `v2.9.0-ffmpeg-core` |
| [docker.io/localai/localai](https://togithub.com/mudler/LocalAI) | minor | `v2.8.2` -> `v2.9.0` |

> [!WARNING]
> Some dependencies could not be looked up. Check the Dependency Dashboard for more information.

### Release Notes

<details>
<summary>mudler/LocalAI (docker.io/localai/localai)</summary>

### [`v2.9.0`](https://togithub.com/mudler/LocalAI/releases/tag/v2.9.0)

[Compare Source](https://togithub.com/mudler/LocalAI/compare/v2.8.2...v2.9.0)

This release brings many enhancements, fixes, and a special thanks to the community for the amazing work and contributions!

We now have sycl images for Intel GPUs, ROCm images for AMD GPUs, and much more:

- You can find the AMD GPU image tags among the available container images - look for `hipblas`. For example, [master-hipblas-ffmpeg-core](https://quay.io/repository/go-skynet/local-ai/tag/master-hipblas-ffmpeg-core). Thanks to [@fenfir](https://togithub.com/fenfir) for this nice contribution!
- Intel GPU images are tagged with `sycl`. You can find images in two flavors, sycl-f16 and sycl-f32 respectively. For example, [master-sycl-f16](https://quay.io/repository/go-skynet/local-ai/tag/master-sycl-f16-core). Work is in progress to also support diffusers and transformers on Intel GPUs.
- Thanks to [@christ66](https://togithub.com/christ66), first efforts in supporting the Assistant API were made, and we are planning to support the Assistant API! Stay tuned for more!
- Now LocalAI supports the Tools API endpoint - it also supports the (now deprecated) functions API call as usual. We now also have support for SSE with function calling. See [https://github.com/mudler/LocalAI/pull/1726](https://togithub.com/mudler/LocalAI/pull/1726) for more.
- Support for Gemma models - did you hear? Google released OSS models and LocalAI supports them already!
- Thanks to [@dave-gray101](https://togithub.com/dave-gray101) in [https://github.com/mudler/LocalAI/pull/1728](https://togithub.com/mudler/LocalAI/pull/1728) for putting effort into refactoring parts of the code - we are soon going to support more ways to interface with LocalAI, and not only the REST API!

##### Support the project

First off, a massive thank you to each and every one of you who've chipped in to squash bugs and suggest cool new features for LocalAI. Your help, kind words, and brilliant ideas are truly appreciated - more than words can say!

And to those of you who've been heroes, giving up your own time to help out fellow users on Discord and in our repo, you're absolutely amazing. We couldn't have asked for a better community.

Just so you know, LocalAI doesn't have the luxury of big corporate sponsors behind it. It's all us, folks. So, if you've found value in what we're building together and want to keep the momentum going, consider showing your support. A little shoutout on your favorite social platforms using [@LocalAI_OSS](https://twitter.com/LocalAI_API) and [@mudler_it](https://twitter.com/mudler_it) or joining our sponsorship program can make a big difference.

Also, if you haven't yet joined our Discord, come on over! Here's the link: https://discord.gg/uJAeKSAGDy

Every bit of support, every mention, and every star adds up and helps us keep this ship sailing. Let's keep making LocalAI awesome together! Thanks a ton, and here's to more exciting times ahead with LocalAI! 🚀

##### What's Changed

##### Bug fixes 🐛

- Add TTS dependency for cuda based builds, fixes [#1727](https://togithub.com/mudler/LocalAI/issues/1727), by [@blob42](https://togithub.com/blob42) in [https://github.com/mudler/LocalAI/pull/1730](https://togithub.com/mudler/LocalAI/pull/1730)

##### Exciting New Features 🎉

- Build docker container for ROCm by [@fenfir](https://togithub.com/fenfir) in [https://github.com/mudler/LocalAI/pull/1595](https://togithub.com/mudler/LocalAI/pull/1595)
- feat(tools): support Tool calls in the API by [@mudler](https://togithub.com/mudler) in [https://github.com/mudler/LocalAI/pull/1715](https://togithub.com/mudler/LocalAI/pull/1715)
- Initial implementation of upload files api by [@christ66](https://togithub.com/christ66) in [https://github.com/mudler/LocalAI/pull/1703](https://togithub.com/mudler/LocalAI/pull/1703)
- feat(tools): Parallel function calling by [@mudler](https://togithub.com/mudler) in [https://github.com/mudler/LocalAI/pull/1726](https://togithub.com/mudler/LocalAI/pull/1726)
- refactor: move part of api packages to core by [@dave-gray101](https://togithub.com/dave-gray101) in [https://github.com/mudler/LocalAI/pull/1728](https://togithub.com/mudler/LocalAI/pull/1728)
- deps(llama.cpp): update, support Gemma models by [@mudler](https://togithub.com/mudler) in [https://github.com/mudler/LocalAI/pull/1734](https://togithub.com/mudler/LocalAI/pull/1734)

##### 👒 Dependencies

- deps(llama.cpp): update by [@mudler](https://togithub.com/mudler) in [https://github.com/mudler/LocalAI/pull/1714](https://togithub.com/mudler/LocalAI/pull/1714)
- ⬆️ Update ggerganov/llama.cpp by [@localai-bot](https://togithub.com/localai-bot) in [https://github.com/mudler/LocalAI/pull/1740](https://togithub.com/mudler/LocalAI/pull/1740)

##### Other Changes

- ⬆️ Update docs version mudler/LocalAI by [@localai-bot](https://togithub.com/localai-bot) in [https://github.com/mudler/LocalAI/pull/1718](https://togithub.com/mudler/LocalAI/pull/1718)
- ⬆️ Update ggerganov/llama.cpp by [@localai-bot](https://togithub.com/localai-bot) in [https://github.com/mudler/LocalAI/pull/1705](https://togithub.com/mudler/LocalAI/pull/1705)
- Update README.md by [@lunamidori5](https://togithub.com/lunamidori5) in [https://github.com/mudler/LocalAI/pull/1739](https://togithub.com/mudler/LocalAI/pull/1739)
- ⬆️ Update ggerganov/llama.cpp by [@localai-bot](https://togithub.com/localai-bot) in [https://github.com/mudler/LocalAI/pull/1750](https://togithub.com/mudler/LocalAI/pull/1750)

##### New Contributors

- [@fenfir](https://togithub.com/fenfir) made their first contribution in [https://github.com/mudler/LocalAI/pull/1595](https://togithub.com/mudler/LocalAI/pull/1595)
- [@christ66](https://togithub.com/christ66) made their first contribution in [https://github.com/mudler/LocalAI/pull/1703](https://togithub.com/mudler/LocalAI/pull/1703)
- [@blob42](https://togithub.com/blob42) made their first contribution in [https://github.com/mudler/LocalAI/pull/1730](https://togithub.com/mudler/LocalAI/pull/1730)

**Full Changelog**: mudler/LocalAI@v2.8.2...v2.9.0

</details>

### Configuration

📅 **Schedule**: Branch creation - "before 10pm on monday" in timezone Europe/Amsterdam, Automerge - At any time (no schedule defined).

🚦 **Automerge**: Enabled.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about these updates again.

- [ ] If you want to rebase/retry this PR, check this box

This PR has been generated by [Renovate Bot](https://togithub.com/renovatebot/renovate).
Description
Part of #1712
Notes for Reviewers
still missing stream support
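For reviewers who want to exercise the new endpoint, a request body against `/v1/chat/completions` could look roughly like this (the model name and tool schema are illustrative, following the OpenAI tools format):

```json
{
  "model": "mistral",
  "messages": [
    { "role": "user", "content": "What is the weather in Berlin?" }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_current_weather",
        "description": "Get the current weather for a location",
        "parameters": {
          "type": "object",
          "properties": {
            "latitude": { "type": "number" },
            "longitude": { "type": "number" }
          },
          "required": ["latitude", "longitude"]
        }
      }
    }
  ],
  "tool_choice": "auto"
}
```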