Support autonomous vision input for Gemini✨ #302

Merged
merged 1 commit into from
Jun 19, 2024

@uezo uezo commented Jun 19, 2024

Implement functionality that lets Gemini decide autonomously when to capture an image (e.g. from a camera) based on the user's request. This enhances the agent's ability to handle multimodal input and improves user interaction.

Also improve the handling of streaming chunks.

#GoogleForJapan

uezo commented Jun 19, 2024

Add the SimpleCamera prefab to the scene and assign it to a field of your script (simpleCamera in the example code below).

Include a system instruction like the following:

## Using Vision

If you need an image to process a user's request, you can obtain it using the following methods:

- camera
- screenshot

If an image is needed to process the request, add an instruction like [vision:camera] to your response to request an image from the user.

By adding this instruction, the user will provide an image in their next utterance. No comments about the image itself are necessary.

Example:

user: Look! This is the picture I painted.
assistant: [vision:camera] Let me take a look.
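
To make the tag convention above concrete, here is a minimal sketch of how a client could detect a [vision:...] instruction in a model response and strip it before the text is shown or spoken. This is plain C# with no Unity dependency. The class and method names (VisionTagParser, ExtractSource, StripTag) are hypothetical and chosen for illustration only; this PR does not show how GeminiService parses the tag internally, so treat this as one possible approach rather than the actual implementation.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of ChatdollKit: illustrates the tag convention only.
public static class VisionTagParser
{
    // Matches tags such as [vision:camera] or [vision:screenshot].
    private static readonly Regex VisionTag =
        new Regex(@"\[vision:(?<source>\w+)\]", RegexOptions.Compiled);

    // Returns the requested source ("camera", "screenshot", ...),
    // or null when the response contains no vision tag.
    public static string ExtractSource(string responseText)
    {
        if (string.IsNullOrEmpty(responseText))
        {
            return null;
        }

        var match = VisionTag.Match(responseText);
        return match.Success ? match.Groups["source"].Value : null;
    }

    // Removes every vision tag so it is not displayed or spoken to the user.
    public static string StripTag(string responseText)
    {
        return VisionTag.Replace(responseText ?? string.Empty, string.Empty).Trim();
    }
}
```

With the example response from the system instruction, ExtractSource would return the source name after the colon, and StripTag would leave only the sentence that follows the tag.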

Then implement an image capture method and assign it to the CaptureImage member of GeminiService:

private async UniTask<byte[]> CaptureImageAsync(string source)
{
    if (simpleCamera != null)
    {
        try
        {
            return await simpleCamera.CaptureImageAsync();
        }
        catch (Exception ex)
        {
            Debug.LogError($"Error at CaptureImageAsync: {ex.Message}\n{ex.StackTrace}");
        }
    }

    return null;
}

// Register the callback on the GeminiService component (for example, in Start()).
gameObject.GetComponent<GeminiService>().CaptureImage = CaptureImageAsync;
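
The callback above receives a source argument but always captures from the camera. Since the system instruction lists both camera and screenshot, one option is to route on that argument. The sketch below is an assumption-laden illustration, not code from this PR: ImageSourceRouter and Register are hypothetical names, and Task<byte[]> stands in for UniTask<byte[]> only to keep the example free of dependencies. In a Unity project the providers would return UniTask<byte[]> as in the code above.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical helper: routes a capture request to the provider registered for a source.
public class ImageSourceRouter
{
    private readonly Dictionary<string, Func<Task<byte[]>>> providers =
        new Dictionary<string, Func<Task<byte[]>>>(StringComparer.OrdinalIgnoreCase);

    // Associates a source name such as "camera" or "screenshot" with a capture function.
    public void Register(string source, Func<Task<byte[]>> provider)
    {
        providers[source] = provider;
    }

    // Same shape as the CaptureImageAsync callback, with Task in place of UniTask (assumption).
    public async Task<byte[]> CaptureImageAsync(string source)
    {
        if (source == null || !providers.TryGetValue(source, out var provider))
        {
            return null; // Unknown or missing source: no image is returned.
        }

        try
        {
            return await provider();
        }
        catch (Exception ex)
        {
            // Mirror the original behavior: log the error and return null.
            Console.Error.WriteLine($"Error at CaptureImageAsync: {ex.Message}");
            return null;
        }
    }
}
```

A design note: returning null for an unknown source or a failed capture keeps the behavior consistent with the original callback, so the caller only has to handle one "no image" case.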

@uezo uezo merged commit e4f011e into master Jun 19, 2024
@uezo uezo deleted the autonomous-vision-support-gemini branch October 5, 2024 17:10