Implement Response Streaming #726

Merged: 35 commits into microsoft:main on Dec 1, 2024

Conversation

@carlodek (Contributor) commented on Aug 1, 2024

Motivation and Context (Why the change? What's the scenario?)

Add option to stream Ask result tokens without waiting for the full answer to be ready.

High level description (Approach, Design)

  • New stream boolean option for the Ask API, false by default. When true, answer tokens are streamed as soon as they are generated by LLMs.
  • New MemoryAnswer.StreamState enum property: Error, Reset, Append, Last (sketched after this list).
  • If moderation is enabled, the content is validated at the end. In case of moderation failure, the service returns an answer with StreamState = Reset and the new content to show to the end user.
  • Streaming uses SSE message format.
  • By default, SSE streams end with a [DONE] token. This can be disabled via KM settings.
  • SSE payload is optimized, returning RelevantSources only in the first SSE message.
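
As a rough illustration of the StreamState values listed above, the enum could be declared along these lines; the type name, namespace, and serialization attributes in the actual codebase may differ:

// Sketch only: the StreamState values described above; the real declaration may differ.
public enum StreamStates
{
    Error,  // something went wrong; the message carries the error details
    Reset,  // discard everything received so far and replace it with this message's text
    Append, // append this message's text to the answer built so far
    Last    // final chunk of the stream
}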

Example request:

curl 'http://127.0.0.1:9001/ask' --header 'Content-Type: application/json' \
    --data '{"question": "which storage engines can I use with Kernel Memory?", "stream": true }'

Response:

data: {"streamState":"append","question":"which storage engines can I use with Kernel Memory?","noResult":false,"text":"","relevantSources":[... cut ...]}

data: {"streamState":"append","noResult":false,"text":"The"}

data: {"streamState":"append","noResult":false,"text":" storage"}

data: {"streamState":"append","noResult":false,"text":" engines"}

data: {"streamState":"append","noResult":false,"text":" that"}

[...]

data: {"streamState":"append","noResult":false,"text":"work"}

data: {"streamState":"append","noResult":false,"text":" in"}

data: {"streamState":"append","noResult":false,"text":" progress"}

data: {"streamState":"append","noResult":false,"text":")"}

data: [DONE]
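
For illustration, here is a minimal C# client sketch that consumes the SSE stream shown above. The endpoint URL and JSON field names follow the example; the anonymous payload type and the error handling are hypothetical and not part of this PR:

using System;
using System.IO;
using System.Net.Http;
using System.Net.Http.Json;
using System.Text;
using System.Text.Json;

// Hypothetical request payload, mirroring the curl example above.
var payload = new { question = "which storage engines can I use with Kernel Memory?", stream = true };

using var http = new HttpClient();
using var request = new HttpRequestMessage(HttpMethod.Post, "http://127.0.0.1:9001/ask")
{
    Content = JsonContent.Create(payload)
};

// Read the body as it arrives instead of buffering the full response.
using var response = await http.SendAsync(request, HttpCompletionOption.ResponseHeadersRead);
response.EnsureSuccessStatusCode();

using var reader = new StreamReader(await response.Content.ReadAsStreamAsync());
var answer = new StringBuilder();

while (await reader.ReadLineAsync() is { } line)
{
    if (!line.StartsWith("data: ")) { continue; }   // skip blank SSE separator lines
    var data = line["data: ".Length..];
    if (data == "[DONE]") { break; }                // optional end-of-stream token

    using var doc = JsonDocument.Parse(data);
    var state = doc.RootElement.GetProperty("streamState").GetString();
    var text = doc.RootElement.GetProperty("text").GetString() ?? string.Empty;

    switch (state)
    {
        case "append":
        case "last":
            answer.Append(text);                    // add the new token(s)
            break;
        case "reset":
            answer.Clear();                         // e.g. moderation replaced the answer
            answer.Append(text);
            break;
        case "error":
            throw new InvalidOperationException(text);
    }
}

Console.WriteLine(answer);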

You can now call Azure OpenAI and OpenAI with streaming.
@carlodek carlodek requested a review from dluc as a code owner August 1, 2024 14:06
@carlodek (Contributor, Author) commented on Aug 1, 2024 via email

@dluc (Collaborator) commented on Oct 16, 2024

Update: for this feature to be merged, there are a couple of things to do:

  • Check the similar PR Implement new streaming ask endpoint (WIP) #400 and decide which approach to take
  • Support content moderation. The stream of tokens needs to be validated while it is streamed, at a configurable frequency. If at any point the text moderation fails, the stream needs to be reset, e.g. by sending a special token or similar (see the sketch below).
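
As a hedged illustration of that second point, and not the PR's actual implementation, moderation at a configurable frequency during streaming could look roughly like this; the StreamChunk record, the moderation delegate, and the window size are hypothetical names introduced here for the sketch:

using System;
using System.Collections.Generic;
using System.Text;
using System.Threading.Tasks;

// Hypothetical chunk shape; the real MemoryAnswer type in Kernel Memory differs.
public sealed record StreamChunk(string StreamState, string Text);

public static class StreamingModerationSketch
{
    // Emits "append" chunks as tokens arrive, checks moderation every N tokens,
    // and emits a single "reset" chunk (then stops) if moderation fails.
    public static async IAsyncEnumerable<StreamChunk> ModerateWhileStreamingAsync(
        IAsyncEnumerable<string> tokens,
        Func<string, Task<bool>> isSafeAsync,   // hypothetical moderation callback
        int checkEveryNTokens = 20)             // the "configurable frequency"
    {
        var answerSoFar = new StringBuilder();
        var sinceLastCheck = 0;

        await foreach (var token in tokens)
        {
            answerSoFar.Append(token);
            yield return new StreamChunk("append", token);

            if (++sinceLastCheck < checkEveryNTokens) { continue; }
            sinceLastCheck = 0;

            if (!await isSafeAsync(answerSoFar.ToString()))
            {
                // Tell the client to discard everything received so far and
                // show this replacement text instead.
                yield return new StreamChunk("reset",
                    "The generated content was removed by content moderation.");
                yield break;
            }
        }

        yield return new StreamChunk("last", string.Empty);
    }
}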

@dluc added the "waiting for author" label on Oct 16, 2024
@nurkmez2
I have been waiting for this for a long time. If you could add the Ask Stream endpoint in the next version, it would be great.

@nurkmez2
Hi @carlodek,
It would be great if you could complete the ask_stream endpoint feature.
Thanks and regards

@roldengarm (Contributor)
Any updates on this please, @dluc? It has been a long time; hopefully we can get this sorted soon.

@carlodek (Contributor, Author)
Hello, I've noticed that this works like a charm with Ollama too! Streaming calls through the ask_stream endpoint also work with Ollama. @dluc, let me know if I need to do anything more.

@dluc (Collaborator) left a review comment

There are several out-of-scope and unnecessary changes, plus some that affect the solution's security. First, I would ask you to undo all of these changes, including the code-style ones, keeping the PR to the bare minimum. Thanks!

Review threads (outdated, resolved) on: .dockerignore, .gitignore, Directory.Packages.props, Dockerfile, service/Core/Core.csproj, service/Service/Program.cs
@dluc (Collaborator) commented on Nov 29, 2024

I'll make some changes to see if we can merge it:

  • move the [DONE] token to the end of the stream, separate from the stream chunks
  • merge stream behavior into the existing /ask endpoint, using an optional boolean flag to choose whether to stream or not
  • remove code duplication in SearchClient
  • revisit how content moderation is implemented

carlodek and others added 3 commits on November 29, 2024:
  • changed x in token
  • removed unnecessary blank line
@dluc force-pushed the stream_response branch 5 times, most recently from c7f558a to 817c90b on December 1, 2024
- reduce code duplication
- reduce stream payload size
- stream reset on moderation
- handle errors
- add streaming examples
@dluc removed the "waiting for author" label on Dec 1, 2024
@dluc changed the title from "added ask_stream endpoint" to "Implement Response Streaming" on Dec 1, 2024
@dluc merged commit 77fd7be into microsoft:main on Dec 1, 2024
6 checks passed