
[FEATURE] avoid second round trip for function call #656

Conversation

Grogdunn
Contributor

Refers to issue #652

For some use cases, after the local function is called there is no need to pass the results back to the LLM (e.g. data extraction from documents).

@Grogdunn Grogdunn marked this pull request as ready for review April 30, 2024 10:12
@Grogdunn Grogdunn force-pushed the avoid-second-roundtrip-in-function-calls branch from d57914f to 78e24fd Compare April 30, 2024 14:33
@tzolov tzolov self-assigned this May 4, 2024
@tzolov tzolov added this to the 1.0.0-M1 milestone May 4, 2024
Contributor

@tzolov tzolov left a comment


Thank you @Grogdunn,
I believe we are getting closer, and this will be a very useful function calling feature.

Still, I believe that the CompleteRoundTripBox and the runtime detection of needCompleteRoundTrip are unnecessary.
Since the FunctionCallbackContext already detects Void-returning (and Consumer) functions, it should be possible to add a new boolean noReturnFunction() method to the FunctionCallback interface.
Then, in AbstractFunctionCallSupport#handleFunctionCallOrReturn, you should be able to use it to decide how to handle the return.
This way you could implement everything inside the abstract support classes without changing the model implementations.
Maybe I'm missing some detail?
Let me know if you would be interested in exploring the suggested approach, or whether I should give it a try.
Thank you very much for the contribution!
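A rough sketch of the suggested direction, assuming the noReturnFunction() name from the comment above. This is illustrative only and not the actual Spring AI FunctionCallback interface (which has more methods); only the parts relevant to the suggestion are shown:

```java
// Hypothetical, trimmed-down version of the FunctionCallback interface,
// showing only what the suggestion above would touch.
public interface FunctionCallback {

    String getName();

    String call(String functionInput);

    // Suggested addition: FunctionCallbackContext already knows whether the
    // underlying function is Void-returning (or a Consumer), so it could set
    // this flag when building the callback. Defaulting to false keeps all
    // existing callbacks unchanged.
    default boolean noReturnFunction() {
        return false;
    }
}
```

AbstractFunctionCallSupport#handleFunctionCallOrReturn could then consult noReturnFunction() to decide whether to skip the second round trip, keeping the change out of the individual model implementations.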

@tzolov
Contributor

tzolov commented May 13, 2024

@Grogdunn I'm afraid you've committed many changes unrelated to this PR.
Please fix the PR or create a new one that includes only the related changes.

To prevent such situations, please don't reformat the project code, especially code not related to the PR.
Also, use rebase pulls (e.g. git pull -r upstream main) to rebase on top of main. Never merge with upstream main.

@Grogdunn Grogdunn force-pushed the avoid-second-roundtrip-in-function-calls branch from 5948013 to 87308e2 Compare May 13, 2024 09:32
@Grogdunn
Contributor Author

@tzolov I removed the commit with all the import * fixes and re-applied them only for the touched subproject.
Rebased on top of upstream master (as requested).

conflicts are gone! 🎉

@tzolov
Contributor

tzolov commented May 17, 2024

Thanks @Grogdunn.
Still, the majority of the changes in the PR (> 100 files) are code reformatting unrelated to the PR goal, for example the import reordering in AnthropicChatOptions.java, AnthropicApi.java, and so on.
I presume the actual changes would affect far fewer files.
Can you please undo those changes (I guess your IDE is applying them automatically) or copy only the related changes into a new PR?

@Grogdunn
Contributor Author

Sure! Today I'll try to reduce the PR changeset.

@Grogdunn Grogdunn force-pushed the avoid-second-roundtrip-in-function-calls branch from 87308e2 to 5edafc9 Compare May 17, 2024 08:28
@Grogdunn
Contributor Author

Ok, squashed, rebased onto main, cleaned up.

@tzolov
Contributor

tzolov commented May 19, 2024

Hi @Grogdunn ,
I've been reviewing the PR and making some improvements when I realised that this approach (and likely any similar approach) is incorrect.

First, I noticed that the PR doesn't support streaming function calling, and while reasoning about ways to provide such support I realised that we are on the wrong path.

Basically, we cannot abruptly stop the function calling message exchange and still leave the conversation in a safe state.

Simply put, we don't know what other actions the LLM might need to perform before it can safely answer the user prompt.
For example, if your prompt is "Cancel my flight and let me know what the weather is in Amsterdam" and the Function<FlightToCancel, Void> doesn't return a value, the PR will abruptly stop the conversation and will not allow the LLM to call the second WeatherInfo function and complete the conversation.
It gets even more complicated for the streaming case.

So IMO the only way this can ever work is if the LLM function calling APIs provide a protocol to send an EmptyResponse for a tool call. Any other heuristic will be incorrect.

Please let me know what you think.
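To make the concern concrete, here is a minimal illustration of the two tools from the example prompt. The types and values are made up for illustration; they are not taken from the PR:

```java
import java.util.function.Function;

public class TwoToolExample {

    record FlightToCancel(String bookingNumber) {}
    record WeatherRequest(String city) {}
    record WeatherInfo(double temperature, String sky) {}

    // Tool 1: a void-style function with nothing meaningful to send back.
    static final Function<FlightToCancel, Void> cancelFlight = request -> {
        // cancel the flight in the booking system ...
        return null;
    };

    // Tool 2: the model still needs this call to answer the second half of
    // "Cancel my flight and let me know what is the weather in Amsterdam".
    static final Function<WeatherRequest, WeatherInfo> weatherInfo =
            request -> new WeatherInfo(30, "sunny");

    // If the exchange is cut short right after cancelFlight because it returned
    // no value, the model never gets the chance to request weatherInfo, and the
    // weather part of the prompt is left unanswered.
}
```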

@Grogdunn
Contributor Author

Grogdunn commented May 20, 2024

🤔
Well, the PR was made before the other PR with function streaming was merged into main, so I can port the changes to function streaming.

I think that is possible because, before calling the local functions, we wait for the streaming-stop signal (I don't remember the exact term); then the stream stops and we call the local functions.
After that we have, with your example, two results: 1 - null and 2 - {"temperature": 30, "sky": "sunny", "blabla": "lorem ipsum"}.
So we evaluate all the local responses and complete the second round trip because we have answer number 2.

I'll try to spike that today.

@tzolov
Contributor

tzolov commented May 20, 2024

@Grogdunn, the problem is not specific to streaming; it affects the sync calls as well.
Simply put, the current LLM APIs demand that you always return a function call result message back to the LLM, even for void functions.
Perhaps a way to emulate void function call behaviour is to return an empty string or something like "DONE" in the case of void functions.

If you do not return a value, you break the conversation and prevent the LLM from making follow-up function calls.
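A minimal sketch of that workaround, using a plain java.util.function adapter (the VoidFunctionAdapter class is illustrative, not part of Spring AI): wrap the side-effect-only logic in a Function that always returns a placeholder, so a tool result message can still be sent back to the model.

```java
import java.util.function.Consumer;
import java.util.function.Function;

// Illustrative adapter: turns a side-effect-only Consumer into a Function that
// returns a fixed acknowledgement the LLM API can accept as the tool result.
public final class VoidFunctionAdapter {

    private VoidFunctionAdapter() {
    }

    public static <I> Function<I, String> toFunction(Consumer<I> sideEffect) {
        return input -> {
            sideEffect.accept(input);
            return "DONE"; // placeholder instead of breaking the message exchange
        };
    }
}
```

Registered this way, the model still receives a tool response for the void function and remains free to make any follow-up tool calls.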

@Grogdunn
Contributor Author

Well, in terms of conversation you are right. But you can use an LLM to build something other than a "chatbot". Some of my customers need to extract structured data from unstructured data: name, surname, and other information from emails and attachments, information from datasheets, and so on...
In this case we do not have a "conversation" but a single-shot request with a lot of context that ends with the function call.

A simple prompt can be, for instance:

U: Extract name, surname, emotional state, address, city, and summarize the following text, and call the function SAVEITNOW with extracted data:
--
Lorem ipsum dolor sit amet...[a lot of text]

The function is described as usual. No other interaction with this data will happen in the future.
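For illustration, the SAVEITNOW function from this prompt could be registered roughly like this; a FunctionCallbackWrapper-style builder is assumed (exact builder methods may differ between milestones), and the ExtractedData type and the save logic are made up:

```java
import java.util.function.Function;

import org.springframework.ai.model.function.FunctionCallbackWrapper;

public class SaveItNowExample {

    record ExtractedData(String name, String surname, String emotionalState,
            String address, String city, String summary) {}

    // The kind of void-returning tool this discussion is about: it only has a
    // side effect (persisting the data) and nothing useful to send back.
    static class SaveItNow implements Function<ExtractedData, Void> {
        @Override
        public Void apply(ExtractedData data) {
            // persist the extracted data somewhere ...
            return null;
        }
    }

    public static FunctionCallbackWrapper<ExtractedData, Void> saveItNowCallback() {
        return FunctionCallbackWrapper.builder(new SaveItNow())
            .withName("SAVEITNOW")
            .withDescription("Stores the data extracted from the provided text")
            .build();
    }
}
```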

@Grogdunn
Contributor Author

I'll add that, in my case, the second round trip is useless and doubles the cost of every interaction.

@tzolov
Contributor

tzolov commented May 20, 2024

I understand your use case and the goal of reducing the cost.
But the solution is sort of a hack that breaks the LLM APIs/protocols, so we would be reluctant to add such hacks to the general-purpose Spring AI.
Perhaps the Structured Output Converter can help return a one-shot structured response for your use case?
Another option is to think of some extension/customisation points that would allow one to hack the Spring AI function calling API in their own application.

In the meantime, I've reworked the PR to return "DONE" in the case of a void function definition, which is still a valid use case, and modified the test like this:
https://github.com/tzolov/spring-ai/blob/8f832e337c6d4fc7a244de68cd61c5bd8e0c5876/models/spring-ai-azure-openai/src/test/java/org/springframework/ai/azure/openai/function/AzureOpenAiChatClientFunctionCallIT.java#L123

This seems to work (most of the time) for both sync and streaming calls.
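For comparison, a rough sketch of the Structured Output Converter approach mentioned above, assuming Spring AI's BeanOutputConverter (package and method names may differ between milestones); the model call itself is left out and replaced by a hard-coded response string:

```java
import org.springframework.ai.converter.BeanOutputConverter;

public class StructuredExtractionExample {

    record ExtractedData(String name, String surname, String emotionalState,
            String address, String city, String summary) {}

    public static void main(String[] args) {
        var converter = new BeanOutputConverter<>(ExtractedData.class);

        // Instead of describing a tool to call, the expected JSON structure is
        // appended to the prompt as format instructions.
        String prompt = """
                Extract name, surname, emotional state, address, city,
                and summarize the following text.
                %s
                --
                Lorem ipsum dolor sit amet ... [a lot of text]
                """.formatted(converter.getFormat());
        System.out.println(prompt);

        // Stand-in for the model output; in a real application this would come
        // from a ChatModel/ChatClient call using the prompt above.
        String modelOutput = """
                {"name": "John", "surname": "Doe", "emotionalState": "calm",
                 "address": "Main St 1", "city": "Amsterdam", "summary": "..."}
                """;

        ExtractedData data = converter.convert(modelOutput);
        System.out.println(data);
    }
}
```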

@Grogdunn
Contributor Author

The "done" message is exactly how we have addressed the "second round trip" at the moment.

I think, if I've understood well, the Structured Output Converter is not enough because when you extract data and call the functions the LLM better respects the data structure, while the other way around (parse the text in output and grab the JSON provided) is more prone to hallucinations (I've used some time ago before functions/tools era).

So the better way is to think an extension point to leave the hack external to te spring-ai.

@markpollack
Member

While doing triage for the M1 release, we will move this issue to the M2 release. There is a PR that @tzolov will submit that takes this discussion further. Thanks for your patience.

@markpollack markpollack modified the milestones: 1.0.0-M1, 1.0.0-M2 May 24, 2024
@Grogdunn
Contributor Author

I've done some experiments with structured output, and with GPT-4o it simply works. But it remains prone to hallucinations.

Maybe this feature is not necessary anymore?

@markpollack
Member

I'm afraid I have to bump this again, apologies.

@tzolov
Contributor

tzolov commented Sep 23, 2024

@Grogdunn I think that 5017749 might help address this issue.
Please check OpenAiChatModelProxyToolCallsIT.java for examples of how to run the function calling entirely on the client side.
Let me know if you have further questions.

@Grogdunn
Contributor Author

@tzolov Thanks for the hint! I see the intent of this commit and, if I understand it correctly, it is exactly what I need! I'll try to use it as soon as possible.

@markpollack
Member

Going to close this issue. Please open a new one should the current feature set not work out. Thanks @Grogdunn
