Multimodal image2story app (MI2S / MITS)

Logo image from: https://ideogram.ai/g/O8Jujvj3T5C4wftcDOgS8A/3

To-Do List

Additional features that I would add later

Chat With Story
Upload document to continue story

Motivation Behind Creating this Application

Hackathon Inspiration: This app was born from the challenges presented at the Google AI Hackathon. The aim was to create a cutting-edge app using Google's Generative AI tools, specifically focusing on pushing the boundaries of what Gen AI apps can do with Gemini.
Long-Held Passion: Since my teenage years, I've had a deep love for art, especially visual art. Although I'm not currently skilled in drawing, I've spent 2-3 years honing my basic art skills. While I can imagine stories, I prefer expressing them visually. With advancements in technology, like AI that can generate images from text prompts (text-to-image), I can now bring my creative narratives to life.
Embracing New Tech: The rise of multimodal models such as Gemini AI opens up endless possibilities. It enables me to turn any creative concept into reality. The idea is to use a multimodal model to create stories inspired by images, removing the need to train a custom model for image-to-story creation.

App Description

MI2S (Multimodal Image2Stories) is an innovative application designed to transform images into captivating narratives. This cutting-edge tool utilizes multimodal technology, combining visual and textual elements to generate short stories or even full-length novels based on input images. By leveraging Gemini AI, MI2S analyzes the content, context, and emotions conveyed in the image to craft immersive and engaging storytelling experiences. Whether you're seeking to create compelling short stories or embark on novel-writing adventures, MI2S opens up endless possibilities for creative expression through the fusion of visual and literary arts.
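The image-to-story flow described above can be sketched roughly as follows. This is only an illustration, not the app's actual code: the function names `build_story_prompt` and `image_to_story`, the `genre` and `length` parameters, and the prompt wording are all assumptions. The model call is passed in as a plain callable so the sketch runs without an API key; in the real app that callable would wrap a Gemini API request.

```python
from typing import Callable


def build_story_prompt(genre: str = "fantasy", length: str = "short story") -> str:
    """Compose the text instruction that is sent alongside the image (hypothetical wording)."""
    return (
        f"Look at the attached image and write a {length} in the {genre} genre. "
        "Base the setting, characters, and mood on what the image shows."
    )


def image_to_story(
    image_bytes: bytes,
    model_call: Callable[[str, bytes], str],
    genre: str = "fantasy",
    length: str = "short story",
) -> str:
    """Send a prompt plus an image to a multimodal model and return the story text.

    model_call is a stand-in for the real Gemini request: it takes the prompt
    and the raw image bytes and returns the generated text.
    """
    if not image_bytes:
        raise ValueError("image_bytes is empty")
    prompt = build_story_prompt(genre, length)
    return model_call(prompt, image_bytes).strip()


def fake_model(prompt: str, image_bytes: bytes) -> str:
    """Offline stub used only for this example; it does not call any API."""
    return f"  [story for {len(image_bytes)} image bytes] {prompt}  "


if __name__ == "__main__":
    print(image_to_story(b"\x89PNG", fake_model, genre="mystery"))
```

Keeping the model call behind a callable is a design choice made for this sketch: it lets the prompt-building logic be tried out locally before any API key is configured.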

Technologies

Python
Streamlit (Python framework and deployment)
Gemini AI (access via API)
String-to-document conversion with libraries such as io, targeting formats like docx, odt, and pdf.
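As an illustration of the string-to-document step, here is a minimal sketch that uses only the standard-library `io` module to turn a generated story into an in-memory plain-text file. The function name `story_to_text_file` and the title-underline layout are assumptions made for this example; docx, odt, or pdf export would need additional libraries and is not shown.

```python
import io


def story_to_text_file(title: str, story: str) -> io.BytesIO:
    """Pack a story into an in-memory UTF-8 text file (hypothetical helper)."""
    buffer = io.BytesIO()
    # Title, an underline of the same length, a blank line, then the story body.
    content = f"{title}\n{'=' * len(title)}\n\n{story}\n"
    buffer.write(content.encode("utf-8"))
    buffer.seek(0)  # rewind so the next reader starts at the beginning
    return buffer


if __name__ == "__main__":
    file_obj = story_to_text_file("The Lighthouse", "Once upon a time...")
    print(file_obj.read().decode("utf-8"))
```

In a Streamlit page, a buffer like this could presumably be offered to the user with `st.download_button(label="Download story", data=file_obj, file_name="story.txt", mime="text/plain")`, though how the app actually wires up its downloads is not stated here.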

Journals/Repositories/App References & Inspiration

seiweiqing - image2story (GitHub Repo)
Let Storytelling Tell Vivid Stories: An Expressive and Fluent Multimodal Storyteller (Journal)
LLaVA: Large Language and Vision Assistant (Journal & GitHub Repo)
Photo story apps (Website apps)

My team:

Muhammad Rizqi: me. I developed it alone, and I take responsibility for the messy code.
Muhammad Azka Nabhan Sauqi: he helped me make a video about this app.

Damn, I did this all alone. I hope it works as expected. It's easier to finish a project if I handle it myself. I code directly on GitHub because my laptop sucks. That's why you'll find a lot of commits in this repo.

Kingki19/Multimodal-image2story-app
