Gemini AI Toolkit


Note

This toolkit supports Google's newest Gemini 1.5 Pro and 1.5 Flash stable and experimental models (as of October 3, 2024).

The Gemini AI Toolkit is the easiest way for developers to build with Google's Gemini AI models. It offers seamless integration for chat, text generation, and multimodal interactions, allowing you to process and analyze text, images, audio, video, code, and more, all in one comprehensive package with minimal dependencies.

🚀 Features

  • Multimodal Interaction: Effortlessly process and analyze a wide array of file types, including PDFs, images, videos, audio files, text documents, and code snippets, unlocking new dimensions of AI-assisted understanding.
  • Interactive Chat: Engage in dynamic, context-aware conversations with Gemini, enabling real-time dialogue that adapts to your needs.
  • Smart File Handling: Seamlessly upload and process files from local paths or URLs, with automatic temporary storage management to keep your workspace clutter-free.
  • Command Support: Utilize intuitive commands to control the toolkit's functionality, enhancing efficiency and user experience.
  • Customizable Parameters: Tailor your AI interactions by enabling structured JSON output for automated processing, using streaming responses for faster interactions, and adjusting temperature, token limits, safety thresholds, and more to suit your needs.
  • Lightweight Design: Enjoy a streamlined experience with minimal dependencies (primarily the requests package), making setup and deployment a breeze.

📋 Table of Contents

  • Features
  • Installation
  • Configuration
  • Usage
  • Special Commands
  • Advanced Configuration
  • Supported Models
  • Error Handling and Safety
  • Supported File Types
  • Caching and Cleanup
  • Contributing
  • Issues and Support
  • Feature Requests
  • Versioning and Changelog
  • Security
  • License

🛠 Installation

  1. Clone the repository:

    git clone https://github.com/RMNCLDYO/gemini-ai-toolkit.git
  2. Navigate to the repository folder:

    cd gemini-ai-toolkit
  3. Install the required dependencies:

    pip install -r requirements.txt

🔑 Configuration

  1. Obtain an API key from Google AI Studio.

  2. You have three options for managing your API key:

    • Setting it as an environment variable on your device (recommended for everyday use)

      • Navigate to your terminal.
      • Add your API key like so:
        export GEMINI_API_KEY=your_api_key

      This method allows the API key to be loaded automatically when using the wrapper or CLI.

    • Using a .env file (recommended for development):

      • Install python-dotenv if you haven't already: pip install python-dotenv.
      • Create a .env file in the project's root directory (or rename the provided example.env to .env).
      • Add your API key to the .env file like so:
        GEMINI_API_KEY=your_api_key

      This method allows the API key to be loaded automatically when using the wrapper or CLI, assuming you have python-dotenv installed and set up correctly.

    • Direct Input:

      • If you prefer not to use a .env file, you can directly pass your API key as an argument to the CLI or the wrapper functions.

        CLI

        --api_key "your_api_key"

        Wrapper

        api_key="your_api_key"

      This method requires manually passing your API key each time you initiate an API call, which offers flexibility across different deployment environments.
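
      As a minimal sketch of this direct-input approach with the wrapper (assuming, as the Advanced Configuration table below indicates, that the wrapper accepts an api_key keyword argument):

        import os

        from gemini import Chat

        # Use the GEMINI_API_KEY environment variable if it is set; otherwise fall
        # back to a placeholder that you would replace with your actual key.
        api_key = os.environ.get("GEMINI_API_KEY", "your_api_key")

        # Pass the key explicitly via the documented api_key parameter.
        Chat().run(api_key=api_key)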

💻 Usage

Multimodal Mode

For processing multiple input types, including audio, video, text, images, code, and a wide range of other files. This mode allows you to upload files (from local paths or URLs), chat with the AI about the content, and maintain a knowledge base throughout the conversation.

CLI

python cli.py --multimodal --prompt "Analyze both of these files and provide a summary of each, one by one. Don't overlook any details." --files file1.jpg https://example.com/file2.pdf

Wrapper

from gemini import Multimodal

Multimodal().run(prompt="Analyze both of these files and provide a summary of each, one by one. Don't overlook any details.", files=["file1.jpg", "https://example.com/file2.pdf"])

Chat Mode

For interactive conversations with the AI model.

CLI

python cli.py --chat

Wrapper

from gemini import Chat

Chat().run()

Text Mode

For generating text based on a prompt or a set of instructions.

CLI

python cli.py --text --prompt "Write a story about a magic backpack."

Wrapper

from gemini import Text

Text().run(prompt="Write a story about a magic backpack.")

🔧 Special Commands

During interaction with the toolkit, you can use the following special commands:

  • /exit or /quit: End the conversation and exit the program.
  • /clear: Clear the conversation history (useful for saving API credits).
  • /upload: Upload a file for multimodal processing.
    • Usage: /upload file_path_and_or_url [optional prompt]
    • Example: /upload file1.jpg https://example.com/file2.pdf Analyze the files and provide a summary of each

βš™οΈ Advanced Configuration

| Description | CLI Flags | CLI Usage | Wrapper Usage |
|---|---|---|---|
| Chat mode | -c, --chat | --chat | See mode usage above. |
| Text mode | -t, --text | --text | See mode usage above. |
| Multimodal mode | -m, --multimodal | --multimodal | See mode usage above. |
| User prompt | -p, --prompt | --prompt "Your prompt here" | prompt="Your prompt here" |
| File inputs | -f, --files | --files file1.jpg https://example.com/file2.pdf | files=["file1.jpg", "https://example.com/file2.pdf"] |
| Enable streaming | -s, --stream | --stream | stream=True |
| Enable JSON output | -js, --json | --json | json=True |
| API Key | -ak, --api_key | --api_key "your_api_key" | api_key="your_api_key" |
| Model name | -md, --model | --model "gemini-1.5-flash-8b" | model="gemini-1.5-flash-8b" |
| System prompt | -sp, --system_prompt | --system_prompt "Set custom instructions" | system_prompt="Set custom instructions" |
| Max tokens | -mt, --max_tokens | --max_tokens 1024 | max_tokens=1024 |
| Temperature | -tm, --temperature | --temperature 0.7 | temperature=0.7 |
| Top-p | -tp, --top_p | --top_p 0.9 | top_p=0.9 |
| Top-k | -tk, --top_k | --top_k 40 | top_k=40 |
| Candidate count | -cc, --candidate_count | --candidate_count 1 | candidate_count=1 |
| Stop sequences | -ss, --stop_sequences | --stop_sequences ["\n", "."] | stop_sequences=["\n", "."] |
| Safety categories | -sc, --safety_categories | --safety_categories ["HARM_CATEGORY_HARASSMENT"] | safety_categories=["HARM_CATEGORY_HARASSMENT"] |
| Safety thresholds | -st, --safety_thresholds | --safety_thresholds ["BLOCK_NONE"] | safety_thresholds=["BLOCK_NONE"] |
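
As a rough illustration of combining several of these parameters in a single wrapper call (a sketch assuming, as the table suggests, that they are all accepted as keyword arguments to run()):

from gemini import Text

# Sketch only: parameter names follow the Wrapper Usage column above.
Text().run(
    prompt="Write a story about a magic backpack.",
    model="gemini-1.5-flash-8b",                    # any supported model name
    system_prompt="You are a concise storyteller.",
    max_tokens=1024,
    temperature=0.7,
    top_p=0.9,
    stream=True,                                    # stream the response as it is generated
)

The CLI accepts the same options through the corresponding flags, e.g. --model, --max_tokens, and --stream.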

📊 Supported Models

Base Models

| Model | Inputs | Context Length |
|---|---|---|
| gemini-1.5-pro-002 (stable) | Text, images, audio, video | 8192 |
| gemini-1.5-pro | Text, images, audio, video | 8192 |
| gemini-1.5-flash-002 (stable) | Text, images, audio, video | 8192 |
| gemini-1.5-flash | Text, images, audio, video | 8192 |
| gemini-1.5-flash-8b | Text, images, audio, video | 8192 |
| gemini-1.0-pro | Text | 2048 |

Note

On October 3rd, Google released a new Gemini Flash model, gemini-1.5-flash-8b, which is now available for production usage. On September 24th, Google released two new stable Gemini models, gemini-1.5-pro-002 and gemini-1.5-flash-002. The gemini-1.5-pro and gemini-1.5-flash base models will default to the -002 versions automatically on October 8, 2024.

Experimental Models

| Model | Inputs | Context Length |
|---|---|---|
| gemini-1.5-pro-exp-0827 | Text, images, audio, video | 8192 |
| gemini-1.5-flash-exp-0827 | Text, images, audio, video | 8192 |
| gemini-1.5-flash-8b-exp-0827 | Text, images, audio, video | 8192 |

Note

The availability of specific models may be subject to change. Always refer to Google's official documentation for the most up-to-date information on model availability and capabilities. See base models docs here and experimental model docs here.

🔒 Error Handling and Safety

The Gemini AI Toolkit now includes robust error handling to help you diagnose and resolve issues quickly. Here are some common error codes and their solutions:

| HTTP Code | Status | Description | Solution |
|---|---|---|---|
| 400 | INVALID_ARGUMENT | Malformed request body | Check API reference for correct format and supported versions |
| 400 | FAILED_PRECONDITION | API not available in your country | Enable billing on your project in Google AI Studio |
| 403 | PERMISSION_DENIED | API key lacks permissions | Verify API key and access rights |
| 404 | NOT_FOUND | Resource not found | Check if all parameters are valid for your API version |
| 429 | RESOURCE_EXHAUSTED | Rate limit exceeded | Ensure you're within model rate limits or request a quota increase |
| 500 | INTERNAL | Unexpected error on Google's side | Retry after a short wait; report persistent issues |
| 503 | UNAVAILABLE | Service temporarily overloaded/down | Retry after a short wait; report persistent issues |

For rate limit errors (429), the toolkit will automatically pause for 15 seconds before retrying the request.
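
For reference, that retry behavior works roughly like the following sketch (illustrative only, not the toolkit's actual implementation; send_request is a hypothetical placeholder for any callable that performs the HTTP call and returns a response object):

import time

def request_with_retry(send_request, max_retries=3, pause_seconds=15):
    # Retry a request after a fixed pause whenever the API answers with
    # 429 RESOURCE_EXHAUSTED, mirroring the automatic behavior described above.
    for attempt in range(max_retries):
        response = send_request()
        if response.status_code != 429:
            return response
        time.sleep(pause_seconds)  # wait before retrying the rate-limited call
    return response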

πŸ“ Supported File Types

The Gemini AI Toolkit supports a wide range of file types for multimodal processing. Here are the supported file extensions:

| Category | File Extensions |
|---|---|
| Images | jpg, jpeg, png, webp, gif, heic, heif |
| Videos | mp4, mpeg, mpg, mov, avi, flv, webm, wmv, 3gp |
| Audio | wav, mp3, aiff, aac, ogg, flac |
| Text/Documents | txt, html, css, js, ts, csv, md, py, json, xml, rtf, pdf |

Note

Google's Files API lets you store up to 20 GB of files per project, with a per-file maximum size of 2 GB. Files are stored for 48 hours.
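
If you want to validate inputs yourself before handing them to the toolkit, a simple pre-check along these lines can help (a sketch; the toolkit performs its own validation, and the extension list mirrors the table above):

from pathlib import Path

# Extensions from the table above.
SUPPORTED_EXTENSIONS = {
    "jpg", "jpeg", "png", "webp", "gif", "heic", "heif",              # images
    "mp4", "mpeg", "mpg", "mov", "avi", "flv", "webm", "wmv", "3gp",  # videos
    "wav", "mp3", "aiff", "aac", "ogg", "flac",                       # audio
    "txt", "html", "css", "js", "ts", "csv", "md", "py", "json",
    "xml", "rtf", "pdf",                                              # text/documents
}

MAX_FILE_BYTES = 2 * 1024 ** 3  # 2 GB per-file limit of the Files API

def check_local_file(path: str) -> None:
    # Raise if a local file has an unsupported extension or exceeds the size limit.
    p = Path(path)
    if p.suffix.lower().lstrip(".") not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"Unsupported file type: {path}")
    if p.stat().st_size > MAX_FILE_BYTES:
        raise ValueError(f"File exceeds the 2 GB per-file limit: {path}")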

💾 Caching and Cleanup

The Gemini AI Toolkit implements a caching mechanism for downloaded files to improve performance and reduce unnecessary network requests. Here's how it works:

  1. When a file is downloaded from a URL, it's stored in a temporary cache folder (.gemini_ai_toolkit_cache).
  2. The downloaded copy is used to process the request; it is kept locally because Google's upload requirements call for a local file.
  3. The cache is automatically cleaned up at the end of each session to prevent accumulation of temporary files.

You don't need to manage this cache manually, but it's good to be aware of its existence, especially if you're processing large files or have limited storage space.
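
Conceptually, the download-and-cleanup cycle resembles the sketch below (illustrative only, not the toolkit's internal code; the folder name matches the cache directory described above):

import shutil
from pathlib import Path
from urllib.parse import urlsplit
from urllib.request import urlretrieve

CACHE_DIR = Path(".gemini_ai_toolkit_cache")

def download_to_cache(url: str) -> Path:
    # Download a remote file into the temporary cache folder and return its local path.
    CACHE_DIR.mkdir(exist_ok=True)
    filename = Path(urlsplit(url).path).name or "download"
    local_path = CACHE_DIR / filename
    urlretrieve(url, str(local_path))
    return local_path

def cleanup_cache() -> None:
    # Remove the cache folder and its contents at the end of a session.
    shutil.rmtree(CACHE_DIR, ignore_errors=True)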

🤝 Contributing

Contributions are welcome!

Please refer to CONTRIBUTING.md for detailed guidelines on how to contribute to this project.

πŸ› Issues and Support

Encountered a bug? We'd love to hear about it. Please follow these steps to report any issues:

  1. Check if the issue has already been reported.
  2. Use the Bug Report template to create a detailed report.
  3. Submit the report here.

Your report will help us make the project better for everyone.

💡 Feature Requests

Got an idea for a new feature? Feel free to suggest it. Here's how:

  1. Check if the feature has already been suggested or implemented.
  2. Use the Feature Request template to create a detailed request.
  3. Submit the request here.

Your suggestions for improvements are always welcome.

πŸ” Versioning and Changelog

Stay up-to-date with the latest changes and improvements in each version:

  • CHANGELOG.md provides detailed descriptions of each release.

πŸ” Security

Your security is important to us. If you discover a security vulnerability, please follow our responsible disclosure guidelines found in SECURITY.md, and refrain from disclosing it publicly until it has been reported and addressed.

📄 License

Licensed under the MIT License. See LICENSE for details.
