Logo image from: https://ideogram.ai/g/O8Jujvj3T5C4wftcDOgS8A/3
- Create Gemini-AI API key access as both a variable and a user input
- Create a multipage app: (1) Main app; (2) Documentation on how to use it; (3) About
- (1) Create main app [in progress]
  - Create tabs for input images
    - Make it iterative, so it can generate a story from a new image input plus the stories already generated
    - Input:
      - Image input
      - Writing style: e.g. Fantasy, Romance, Sci-fi, etc. (first initialization only)
      - Story theme: e.g. Good vs. Evil, beauty, loyalty, friendship, etc. (first initialization only)
      - Image type: e.g. character, backstory, moments, etc.
      - Number of paragraphs in the result: 1 to 3
      - Additional message (optional)
  - Create tabs for stories
  - Create tabs for chatting with stories
- (2) Create documentation
  - How to use it (quick tutorial)
  - Diagram of how it works
- (3) Create About
  - App description
  - Team
  - Contact the developer (if there's a bug)
- (1) Create main app [in progress]
  - Chat with story
  - Upload a document to continue the story
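The roadmap items for the API key, the image-input tab and iterative generation could be sketched roughly as follows. This is a minimal, stdlib-only illustration under assumptions, not the app's actual code: the environment variable name `GEMINI_API_KEY`, the `StorySession` class and the prompt wording are all hypothetical. Writing style and theme are fixed when the session is created (the "first initialization"), while image type, paragraph count (1 to 3) and the optional message change per image, and earlier output is fed back so each new image continues the same story.

```python
import os
from dataclasses import dataclass, field
from typing import List, Optional


def resolve_api_key(typed_key: Optional[str] = None,
                    env_var: str = "GEMINI_API_KEY") -> str:
    """Prefer a key typed into the app; otherwise fall back to a (hypothetical) env variable."""
    key = (typed_key or "").strip() or os.environ.get(env_var, "").strip()
    if not key:
        raise ValueError("No Gemini API key provided")
    return key


@dataclass
class StorySession:
    # Fixed once, at the first initialization of the story.
    writing_style: str
    theme: str
    # Every story part generated so far, oldest first.
    parts: List[str] = field(default_factory=list)

    def build_prompt(self, image_type: str, paragraphs: int,
                     extra: str = "") -> str:
        """Build the text prompt that accompanies the next uploaded image."""
        if not 1 <= paragraphs <= 3:
            raise ValueError("paragraphs must be between 1 and 3")
        lines = [
            f"Writing style: {self.writing_style}",
            f"Story theme: {self.theme}",
            f"Image type: {image_type}",
            f"Write exactly {paragraphs} paragraph(s).",
        ]
        if self.parts:
            # Feed earlier output back so the new part continues the story.
            lines.append("Continue this story:\n" + "\n\n".join(self.parts))
        else:
            lines.append("Start a new story inspired by the image.")
        if extra:
            lines.append(f"Additional message: {extra}")
        return "\n".join(lines)

    def add_part(self, text: str) -> None:
        """Store a generated part so the next prompt can build on it."""
        self.parts.append(text)
```

In a Streamlit app, the session object would presumably live in `st.session_state` so that it survives reruns between uploads.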
Hackathon Inspiration: This app was born from the challenges presented at the Google AI Hackathon. The aim was to create a cutting-edge app using Google's Generative AI tools, specifically focusing on pushing the boundaries of what Gen AI apps can do with Gemini.
Long-Held Passion: Since my teenage years, I've had a deep love for art, especially visual art. Although I'm not currently skilled in drawing, I've spent 2-3 years honing my basic art skills. While I can imagine stories, I prefer expressing them visually. With advancements in technology, like AI that can generate images from text prompts (text-to-image), I can now bring my creative narratives to life.
Embracing New Tech: The rise of multimodal models such as Gemini-AI opens up many new possibilities, letting me turn creative concepts into reality. The idea is to use a multimodal model to create stories inspired by images, removing the need to train a custom image-to-story model.
MI2S (Multimodal Image2Stories) is an innovative application designed to transform images into captivating narratives. This cutting-edge tool utilizes multimodal technology, combining visual and textual elements to generate short stories or even full-length novels based on input images. By leveraging Gemini-AI, MI2S analyzes the content, context, and emotions conveyed in the image to craft immersive and engaging storytelling experiences. Whether you're seeking to create compelling short stories or embark on novel-writing adventures, MI2S opens up endless possibilities for creative expression through the fusion of visual and literary arts.
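The core image-to-story step, sending an image together with text instructions to Gemini in a single request, might look something like the sketch below. It assumes the `google-generativeai` Python SDK (`genai.configure`, `genai.GenerativeModel`, `generate_content`); the instruction text and the default model name are assumptions rather than what MI2S actually uses. The model call can be swapped for any `generate` callable, which keeps the function testable without an API key.

```python
from typing import Any, Callable, List, Optional

# Hypothetical instruction text; the actual MI2S prompt is not published here.
INSTRUCTION = (
    "Describe the content, context and emotions in this image, "
    "then write a short story inspired by it."
)


def image_to_story(image: Any, api_key: str, user_prompt: str = "",
                   generate: Optional[Callable[[List[Any]], str]] = None,
                   model_name: str = "gemini-1.5-flash") -> str:
    """Send an instruction, optional user text and one image to a multimodal model."""
    parts: List[Any] = [INSTRUCTION]
    if user_prompt:
        parts.append(user_prompt)
    parts.append(image)  # e.g. a PIL.Image.Image opened from the uploaded file

    if generate is None:
        # Lazy import so the function can be tested without the SDK installed.
        import google.generativeai as genai

        genai.configure(api_key=api_key)
        model = genai.GenerativeModel(model_name)  # model name is an assumption

        def call_gemini(contents: List[Any]) -> str:
            return model.generate_content(contents).text

        generate = call_gemini

    return generate(parts)
```

Putting the instruction, the user's text and the image into one `contents` list lets a single multimodal model handle both "seeing" the image and writing the story, which matches the README's point about not needing a custom image-to-story model.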
- App: https://multimodal-image2story.streamlit.app/
- Video: https://youtu.be/DGMBAHWCLaA?si=_1W-oMRDJp1qfk27
- Devpost Project: https://devpost.com/software/project-cw5gkdms0it3
- Python
- Streamlit (Python framework and deployment)
- Gemini AI (accessed via API)
- String-to-document converter libraries (e.g. IO, docx, odt, pdf)
- seiweiqing - image2story (GitHub repo)
- Let Storytelling Tell Vivid Stories: An Expressive and Fluent Multimodal Storyteller (Journal)
- LLaVA: Large Language and Vision Assistant (Journal & GitHub repo)
- Photo story apps (Website apps)
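The string-to-document conversion in the stack list above could be handled by a helper like the one below. It is a sketch under assumptions: only plain text (via the standard `io` module) and `.docx` (via the third-party `python-docx` package, imported lazily) are covered, the odt and pdf formats are left out, and the function and table names are hypothetical.

```python
import io
from typing import Dict, List

# MIME types a download button would need for each supported format.
MIME_TYPES: Dict[str, str] = {
    "txt": "text/plain",
    "docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
}


def story_to_bytes(title: str, paragraphs: List[str], fmt: str = "txt") -> bytes:
    """Convert a generated story into file bytes in the requested format."""
    if fmt == "txt":
        buf = io.StringIO()
        buf.write(title + "\n\n")
        buf.write("\n\n".join(paragraphs))
        return buf.getvalue().encode("utf-8")
    if fmt == "docx":
        from docx import Document  # python-docx, installed separately

        doc = Document()
        doc.add_heading(title, level=1)
        for text in paragraphs:
            doc.add_paragraph(text)
        out = io.BytesIO()
        doc.save(out)  # save() accepts a file-like object
        return out.getvalue()
    raise ValueError(f"Unsupported format: {fmt}")
```

The resulting bytes could then be passed to Streamlit's `st.download_button` together with a file name and the matching entry from `MIME_TYPES`.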
My team:
- Muhammad Rizqi: me. I developed it alone, so I take responsibility for the messy code.
- Muhammad Azka Nabhan Sauqi: he helped me make a video about this app.
Damn, I did this all alone. I hope it works as expected. It's easier to finish a project when I handle it myself. I coded directly on GitHub because my laptop sucks, which is why you'll find a lot of commits in this repo.