Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Dall-E 3 Image Generation #185

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open

Conversation

jdtoombs
Copy link

@jdtoombs jdtoombs commented Nov 14, 2024

appsettings.Development.json changes needed after merge

Small changes to appsettings will be needed when this is merged, refer to the helm changes. Only need to apply this to the eastus endpoint.

Motivation and Context

Allow the user to generate images while chatting with Q-Pilot. I am making a separate ticket for enabling this only for certain specializations to keep the PRs a bit more compact.

Description

Dall-E 3 has been added as a deployment; however, it is only available with our east-us services. Currently it just looks for trigger words when the user is chatting with the bot, I am going to further investigate if we can get the Semantic Kernel to automatically detect and decide which deployment to use.

A flag is set on the bot response for IsImage and this flag is interpreted by the frontend to display the source given from the dall-e-3 deployment in an <img>.

image

Contribution Checklist

@@ -207,6 +207,24 @@ QAzureOpenAIChatOptions.OpenAIDeploymentConnection connection in this._qAzureOpe
return chatCompletionDeployments;
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't actually use this yet... wondering if we are going to eventually select deployments like the chat completion model, or if it will always be one (i.e dalle-3)

@@ -971,11 +993,40 @@ private async Task<CopilotChatMessage> StreamResponseToClientAsync(
);
}
}
var chatHistory = prompt.MetaPromptTemplate;

// logic to get the last user message and extract relevant text
Copy link
Author

@jdtoombs jdtoombs Nov 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really like this (lines 998-1003), must be a better way. Throwing it up in the meantime to get some eyes on it. It essentially does some string manipulation to the prompt object to get the description text that dall-e-3 will be called with

@@ -905,6 +906,23 @@ private Dictionary<string, int> GetTokenUsages(KernelArguments kernelArguments,
return tokenUsageDict;
}

private bool IsImageRequest(string prompt)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally would have the semantic kernel interpret the prompt and use its built in capabilities to determine what the appropriate service is. For now we have custom logic to determine if it is an image request. Would like to iterate on this

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reason we cant have the semantic kernel interpret the prompt at the moment?

@@ -905,6 +906,23 @@ private Dictionary<string, int> GetTokenUsages(KernelArguments kernelArguments,
return tokenUsageDict;
}

private bool IsImageRequest(string prompt)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reason we cant have the semantic kernel interpret the prompt at the moment?

/// </summary>
public List<string> GetAllTextToImageDeployments()
{
var textToImageDeployments = new List<string>();
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see duplicate code between this and GetAllChatCompletionDeployments that could be shaved down

@@ -53,9 +53,18 @@ public class OpenAIDeploymentConnection
public string APIKey { get; set; } = string.Empty;
public IList<ChatCompletionDeployment> ChatCompletionDeployments { get; set; } =
new List<ChatCompletionDeployment>();
public IList<string> ImageGenerationDeployments { get; set; } = new List<string>();
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of IList<string> I think we should be doing something similar to the way ChatCompletionDeployments is declared

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants