VisionVault is a wearable capture project for documenting moments as they happen. An ESP32-CAM mounted on goggles streams a live video feed, and a single keystroke saves an image from that feed. Each captured image is sent to the Gemini Vision API, which returns a detailed description, building up a virtual memory bank.
- Live Video Feed: Stream visuals from an ESP32-CAM in real time.
- Instant Image Capture: Press "c" to save an image from the feed.
- Image Description: Generate meaningful descriptions via the Gemini Vision API.
- Hands-Free Operation: Capture moments without holding a phone or camera.
Check out the demo video to see VisionVault in action.
- Personal memory archival
- Remote surveillance
- Assistive technology for visually impaired individuals
- Hardware Setup: Mount the ESP32-CAM on goggles or a suitable frame and connect it to a network.
- Live Feed Streaming: A Python script streams the live feed from the ESP32-CAM.
- Image Capture: Press "c" to capture an image from the live feed.
- Description Generation: The captured image is sent to the Gemini Vision API, which returns a detailed description.
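The streaming and capture steps above could look roughly like the sketch below. It is not the project's actual script. It assumes the board serves an MJPEG stream that OpenCV can open by URL (the address in `STREAM_URL` is a made-up placeholder), that captures go to a hypothetical `captures/` folder, and that "q" quits the loop, which the project does not specify.

```python
import time
from pathlib import Path

# Hypothetical stream address; replace with the URL your ESP32-CAM firmware prints.
STREAM_URL = "http://192.168.1.50:81/stream"


def key_action(key_code: int) -> str:
    """Translate a cv2.waitKey() return value into an action name."""
    key = key_code & 0xFF  # waitKey returns -1 when no key was pressed
    if key == ord("c"):
        return "capture"
    if key == ord("q"):  # assumption: "q" quits; not part of the original project
        return "quit"
    return "none"


def capture_path(out_dir: Path, timestamp: float) -> Path:
    """Build a timestamped file name such as captures/capture_20240101_120000.jpg."""
    stamp = time.strftime("%Y%m%d_%H%M%S", time.localtime(timestamp))
    return out_dir / f"capture_{stamp}.jpg"


def run(stream_url: str = STREAM_URL, out_dir: Path = Path("captures")) -> None:
    """Show the live feed and save a frame whenever "c" is pressed."""
    import cv2  # opencv-python; imported here so the helpers above work without it

    out_dir.mkdir(parents=True, exist_ok=True)
    cap = cv2.VideoCapture(stream_url)
    if not cap.isOpened():
        raise RuntimeError(f"Could not open stream: {stream_url}")
    try:
        while True:
            ok, frame = cap.read()
            if not ok:  # stream dropped or ended
                break
            cv2.imshow("VisionVault", frame)
            action = key_action(cv2.waitKey(1))
            if action == "capture":
                path = capture_path(out_dir, time.time())
                cv2.imwrite(str(path), frame)
                print(f"Saved {path}")
            elif action == "quit":
                break
    finally:
        cap.release()
        cv2.destroyAllWindows()
```

Calling `run()` starts the loop. Keeping the key handling and file naming in small functions makes them easy to check without a camera attached.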
- Hardware:
  - ESP32-CAM module
  - Goggles or a mountable frame
- Software:
  - Arduino IDE (for ESP32-CAM setup)
  - Python 3.7+
  - Required libraries:
    - opencv-python
    - requests
    - flask
- API: Access to the Gemini Vision API
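As a quick sanity check on the Python libraries listed above, a small helper like the following can report which ones are not importable. This is only an illustrative sketch: the `REQUIRED` mapping and the `missing_packages` name are made up for this example. The one detail worth knowing is that `opencv-python` is imported as `cv2`.

```python
from importlib.util import find_spec

# PyPI package name -> module name used in `import` statements.
REQUIRED = {"opencv-python": "cv2", "requests": "requests", "flask": "flask"}


def missing_packages(required=REQUIRED):
    """Return the package names whose modules cannot be found."""
    return [pkg for pkg, module in required.items() if find_spec(module) is None]
```

Calling `missing_packages()` returns an empty list when everything is installed; otherwise the returned names can be passed to `pip install`.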
- Clone the repository:
git clone https://github.com/your-username/VisionVault.git
cd VisionVault
- Install dependencies:
pip install -r requirements.txt
- Add your Gemini Vision API key to the .env file:
GENAI_API_KEY="your-gemini-api-key"
- Run the Python script:
python file_name.py
- View the live feed in the window that opens.
- Press "c" to capture an image and generate its description.
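For readers curious how the API key and the description step might fit together, here is a minimal sketch. It is not the project's own code. It assumes the key is read from `GENAI_API_KEY` (environment first, then `.env`), and that the description is requested through the Gemini REST `generateContent` endpoint; the model name in `API_URL`, the prompt text, and all function names are assumptions, so check the current Gemini API documentation before relying on them.

```python
import base64
import os
from pathlib import Path

# Assumed endpoint and model name; verify against the current Gemini API docs.
API_URL = (
    "https://generativelanguage.googleapis.com/v1beta/"
    "models/gemini-1.5-flash:generateContent"
)


def read_env_key(name="GENAI_API_KEY", env_path=".env"):
    """Return the key from the environment, falling back to a .env file."""
    value = os.environ.get(name)
    if value:
        return value
    path = Path(env_path)
    if path.exists():
        for raw in path.read_text(encoding="utf-8").splitlines():
            key, sep, val = raw.strip().partition("=")
            if sep and key.strip() == name:
                return val.strip().strip("\"'")  # drop surrounding quotes
    raise RuntimeError(f"{name} is not set in the environment or {env_path}")


def build_payload(jpeg_bytes, prompt="Describe this image in detail."):
    """Build a request body holding the prompt and the image as inline base64."""
    encoded = base64.b64encode(jpeg_bytes).decode("ascii")
    return {
        "contents": [
            {
                "parts": [
                    {"text": prompt},
                    {"inline_data": {"mime_type": "image/jpeg", "data": encoded}},
                ]
            }
        ]
    }


def describe_image(image_path, api_key):
    """Send a saved capture to the API and return the generated description."""
    import requests  # listed in the project's required libraries

    payload = build_payload(Path(image_path).read_bytes())
    response = requests.post(
        API_URL,
        headers={"x-goog-api-key": api_key},
        json=payload,
        timeout=30,
    )
    response.raise_for_status()
    return response.json()["candidates"][0]["content"]["parts"][0]["text"]
```

A call such as `describe_image("captures/example.jpg", read_env_key())` would then return the description text, where the file path is only an example.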
- Voice Commands: Integrate speech-to-text functionality for hands-free control.
- Mobile App: Develop a companion app for easier accessibility.
- API Expansion: Support additional APIs for varied use cases.
Contributions are welcome! Here's how you can contribute:
- Fork the repository
- Create a feature branch:
git checkout -b feature-name
- Commit your changes:
git commit -m "Description of feature"
- Push to the branch:
git push origin feature-name
- Submit a pull request
- The Gemini Vision API for image description generation
- The OpenCV community for their robust computer vision tools
- The Python community for supporting open-source projects