Support autonomous vision input for Gemini✨ #302

uezo · 2024-06-19T12:42:57Z

Implement functionality that lets Gemini autonomously decide when to capture an image (e.g. from a camera) based on the user's request. This enhances the agent's ability to handle multimodal input for improved user interaction.

Also improve the handling of streaming chunks.

#GoogleForJapan

uezo · 2024-06-19T12:46:18Z

Add the SimpleCamera prefab to the scene and assign it to a member of your script (simpleCamera in this example code).

Include a system instruction like the one below:

## Using Vision

If you need an image to process a user's request, you can obtain it using the following methods:

- camera
- screenshot

If an image is needed to process the request, add an instruction like [vision:camera] to your response to request an image from the user.

By adding this instruction, the user will provide an image in their next utterance. No comments about the image itself are necessary.

Example:

user: Look! This is the picture I painted.
assistant: [vision:camera] Let me take a look.
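
For illustration only, here is a minimal sketch of how a response containing such a tag could be detected and split on the client side. `VisionTagParser` and `TryExtract` are hypothetical names that do not come from this PR, and the actual parsing inside GeminiService may work differently.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper (not part of this PR): finds a [vision:<source>] tag
// in a model response and separates it from the spoken text.
public static class VisionTagParser
{
    // Matches tags such as [vision:camera] or [vision:screenshot].
    private static readonly Regex VisionTag =
        new Regex(@"\[vision:(\w+)\]", RegexOptions.Compiled);

    // Returns true when a tag is present. 'source' receives the requested
    // image source and 'textWithoutTag' the response with the first tag removed.
    public static bool TryExtract(string response, out string source, out string textWithoutTag)
    {
        var match = VisionTag.Match(response ?? string.Empty);
        if (!match.Success)
        {
            source = null;
            textWithoutTag = response;
            return false;
        }

        source = match.Groups[1].Value;
        textWithoutTag = VisionTag.Replace(response, string.Empty, 1).Trim();
        return true;
    }
}
```

With the example response above, `TryExtract` would yield `camera` as the source and `Let me take a look.` as the remaining text.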

Then implement a handler for CaptureImage and assign it to GeminiService:

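// Handler invoked when an image is needed. `source` presumably carries the
// value from the [vision:...] tag (e.g. camera); this example ignores it
// and always captures from simpleCamera.
private async UniTask<byte[]> CaptureImageAsync(string source)
{
    if (simpleCamera != null)
    {
        try
        {
            return await simpleCamera.CaptureImageAsync();
        }
        catch (Exception ex)
        {
            Debug.LogError($"Error at CaptureImageAsync: {ex.Message}\n{ex.StackTrace}");
        }
    }

    // No camera is assigned, or the capture failed.
    return null;
}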

gameObject.GetComponent<GeminiService>().CaptureImage = CaptureImageAsync;
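
The handler above receives `source` but always uses the camera. If both sources listed in the system instruction (camera and screenshot) should be supported, one possible approach is to route on `source`. The sketch below is an assumption, not code from this PR: `CaptureRouter`, `Register`, and `CaptureAsync` are hypothetical names, and it uses plain `Task` and `Console.Error` so that it runs outside Unity. In a Unity project you would likely swap in `UniTask` and `Debug.LogError`.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical router (not part of this PR): maps a source name such as
// "camera" or "screenshot" to a capture delegate.
public class CaptureRouter
{
    private readonly Dictionary<string, Func<Task<byte[]>>> handlers =
        new Dictionary<string, Func<Task<byte[]>>>(StringComparer.OrdinalIgnoreCase);

    // Registers (or replaces) the capture delegate for a source name.
    public void Register(string source, Func<Task<byte[]>> handler)
    {
        handlers[source] = handler;
    }

    // Returns the captured bytes, or null when the source is unknown
    // or the capture delegate throws.
    public async Task<byte[]> CaptureAsync(string source)
    {
        if (source != null && handlers.TryGetValue(source, out var handler))
        {
            try
            {
                return await handler();
            }
            catch (Exception ex)
            {
                Console.Error.WriteLine($"Error at CaptureAsync: {ex.Message}");
            }
        }

        return null;
    }
}
```

Returning null for an unknown source or a failed capture mirrors the behavior of the original handler.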

uezo merged commit e4f011e into master Jun 19, 2024

uezo deleted the autonomous-vision-support-gemini branch October 5, 2024 17:10
