Vinny0712/AiSL: ASL-to-text captions for sign language TikTok creators

AiSL

Sign Language Accessibility for TikTok Creators and Audience

2024 TikTok TechJam


👋🏻 Introducing `AiSL`

AiSL is an AI-powered tool that turns sign language videos into inclusive, engaging videos with auto-generated text captions, voice-over, and emoji captions.

With AiSL's AI-powered generation, Deaf TikTok creators can easily create and understand accessible, inclusive content in sign language.

🚀 Demo

Here is a quick demo of the app. We hope you enjoy it.

Liked it? Please give a ⭐️ to AiSL.

🔥 Features

AiSL comes with 3 key AI features:

Feature 1: Sign-Language-to-Text 📑

Sign-Language-to-Text converts sign language into text captions as the sign language appears in the video.
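The step from per-frame recognitions to timed captions is only summarized in the AI Architecture section as captions being "processed algorithmically". Below is a minimal sketch of one plausible approach, assuming the recognizer emits one `(timestamp_ms, label, score)` tuple per frame: skip low-confidence frames, merge consecutive frames with the same label into a single timestamped caption, and drop runs too short to trust. `Caption`, `frames_to_captions`, `min_score`, and `min_frames` are hypothetical names, not AiSL's code.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

# Hypothetical per-frame prediction: (timestamp in ms, predicted word or None, confidence).
Prediction = Tuple[int, Optional[str], float]


@dataclass
class Caption:
    start_ms: int  # timestamp of the first frame in the run
    end_ms: int    # timestamp of the last frame in the run
    text: str


def frames_to_captions(
    predictions: Sequence[Prediction],
    min_score: float = 0.5,
    min_frames: int = 3,
) -> List[Caption]:
    """Merge consecutive frames that predict the same word into one caption.

    Illustrative sketch, not AiSL's actual post-processing. Frames with no
    label or a score below `min_score` end the current run; runs shorter
    than `min_frames` frames are discarded as noise.
    """
    captions: List[Caption] = []
    run_label: Optional[str] = None
    run_start = run_end = run_len = 0

    def flush() -> None:
        # Emit the current run only if it lasted long enough to trust.
        if run_label is not None and run_len >= min_frames:
            captions.append(Caption(run_start, run_end, run_label))

    for ts, label, score in predictions:
        if label is None or score < min_score:
            flush()
            run_label, run_len = None, 0
        elif label == run_label:
            run_end, run_len = ts, run_len + 1  # extend the current run
        else:
            flush()
            run_label, run_start, run_end, run_len = label, ts, ts, 1
    flush()  # emit the final run, if any
    return captions
```

At 30 fps, `min_frames=3` means a sign must be recognized for about 0.1 s before it becomes a caption; raising either threshold reduces flicker at the risk of missing quick signs.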

Feature 2: Sign-Language-to-Speech 🔊

Sign-Language-to-Speech converts sign language into a voice-over that plays as the sign language appears in the video.
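Keeping the voice-over in step with the signing is mostly a timing problem. A minimal sketch, assuming each caption has already been synthesized into its own audio clip of known duration (how AiSL actually aligns its audio is not documented here): place each clip at its caption's start time, but delay it if the previous clip is still playing so phrases never overlap. `schedule_voiceover` and this greedy policy are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Caption:
    start_ms: int
    end_ms: int
    text: str


def schedule_voiceover(
    captions: Sequence[Caption], clip_durations_ms: Sequence[int]
) -> List[int]:
    """Return the start time (ms) of each spoken clip on the voice-over track.

    Hypothetical greedy policy: a clip starts when its caption appears, or
    as soon as the previous clip ends, whichever is later, so clips never
    overlap. Captions are assumed to be sorted by start time.
    """
    if len(captions) != len(clip_durations_ms):
        raise ValueError("expected exactly one clip duration per caption")
    starts: List[int] = []
    track_free_at = 0  # time at which the previous clip finishes playing
    for caption, duration in zip(captions, clip_durations_ms):
        start = max(caption.start_ms, track_free_at)
        starts.append(start)
        track_free_at = start + duration
    return starts
```

Delaying clips keeps every word audible but can drift behind the video when the speech is longer than the signing; speeding up the audio would be the opposite trade-off.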

Feature 3: Sign-Language-to-Emoji 👋🏻

Sign-Language-to-Emoji converts sign language to emoji text captions as the sign language appears in the video.
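The AI Architecture section describes this feature as retrieval-augmented generation: embed the caption, fetch the most similar emoji descriptions from a vector store, and pass them to Gemini Pro as context. A minimal, dependency-free sketch of the retrieval and prompt-building half, assuming cosine similarity for ranking and an injectable `embed` function. In AiSL the vectors come from sentence-transformers/all-MiniLM-L6-v2; here a hypothetical toy bag-of-words `toy_embed` stands in so the example runs offline, the prompt wording is invented, and the Gemini Pro call itself is not shown.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Embed = Callable[[str], Vector]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_index(docs: Sequence[str], embed: Embed) -> List[Tuple[str, Vector]]:
    """Embed every emoji document once, up front (a tiny in-memory vector store)."""
    return [(doc, embed(doc)) for doc in docs]


def retrieve(query: str, index: Sequence[Tuple[str, Vector]], embed: Embed, k: int = 3) -> List[str]:
    """Return the k documents whose embeddings are most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]


def build_prompt(caption: str, context: Sequence[str]) -> str:
    """Hypothetical prompt wording: retrieved documents become the LLM's context."""
    bullets = "\n".join(f"- {doc}" for doc in context)
    return (
        "Translate the caption into emojis, preferring emojis from the context.\n"
        f"Context:\n{bullets}\n"
        f"Caption: {caption}\n"
        "Emojis:"
    )


# Toy stand-in for all-MiniLM-L6-v2 so the sketch runs offline: word counts over a tiny vocabulary.
VOCAB = ["happy", "smile", "food", "eat", "sleep", "tired"]


def toy_embed(text: str) -> Vector:
    words = text.lower().split()
    return [float(words.count(w)) for w in VOCAB]


if __name__ == "__main__":
    index = build_index(["😀 happy smile", "🍔 food eat", "😴 sleep tired"], toy_embed)
    context = retrieve("I am so happy", index, toy_embed, k=1)
    print(build_prompt("I am so happy", context))
```

A production vector store would typically use an approximate nearest-neighbour index, but with roughly 500 emoji records a brute-force scan like this is already cheap.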

AI Architecture

The user's original video (.mp4) is passed to our fine-tuned MediaPipe Gesture Recognizer model, which outputs captions with the appropriate timestamps.
The timestamped captions are post-processed algorithmically before being passed to the next stage.
The captions are passed to the text-to-speech model (Google Translate text-to-speech API), which outputs an audio file.
In parallel, the captions are converted into embeddings with sentence-transformers/all-MiniLM-L6-v2, which are used for RAG retrieval from a vector store of emoji documents and their descriptions. The retrieved documents (context) and the original generated captions are passed together as a prompt to the Gemini Pro model, which translates the text into emojis.
The generated captions, audio file, and emoji captions are combined with the original video to produce the edited video using the Python cv2 (OpenCV) package.
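The final compositing step can be sketched with OpenCV: read the video frame by frame, look up which caption is visible at each frame's timestamp, draw it, and write the frame out. This is a minimal sketch, not AiSL's implementation; `caption_at`, `burn_captions`, the caption position, and the font settings are assumptions. Two limitations are worth knowing: `cv2.VideoWriter` writes video frames only, so the voice-over would have to be muxed in separately (for example with ffmpeg), and OpenCV's built-in Hershey fonts cannot draw emoji.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Caption:
    start_ms: int
    end_ms: int
    text: str


def caption_at(captions: Sequence[Caption], t_ms: float) -> Optional[str]:
    """Return the caption visible at time t_ms, or None if nothing is on screen."""
    for caption in captions:
        if caption.start_ms <= t_ms <= caption.end_ms:
            return caption.text
    return None


def burn_captions(src_path: str, dst_path: str, captions: Sequence[Caption]) -> None:
    """Hypothetical sketch: draw text captions onto every frame with OpenCV.

    Writes video only (no audio track); Hershey fonts cannot render emoji.
    """
    import cv2  # imported here so caption_at can be used without OpenCV installed

    cap = cv2.VideoCapture(src_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0  # assumed fallback if FPS metadata is missing
    width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    fourcc = cv2.VideoWriter_fourcc(*"mp4v")
    writer = cv2.VideoWriter(dst_path, fourcc, fps, (width, height))

    frame_idx = 0
    try:
        while True:
            ok, frame = cap.read()
            if not ok:  # end of video (or a read error)
                break
            text = caption_at(captions, frame_idx * 1000.0 / fps)
            if text:
                # Bottom-left, white text; position and size are arbitrary choices.
                cv2.putText(frame, text, (20, height - 40), cv2.FONT_HERSHEY_SIMPLEX,
                            1.2, (255, 255, 255), 2, cv2.LINE_AA)
            writer.write(frame)
            frame_idx += 1
    finally:
        cap.release()
        writer.release()
```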

💪🏻 Try Yourself

To get a copy of this repository, open your terminal and run:

git clone https://github.com/Vinny0712/AiSL.git

Frontend Setup Instructions

Install dependencies

In the frontend/ directory, run

yarn

Set up Environment Variables

Create a .env file in the frontend/ directory containing all the environment variables listed in .env.example.

# .env file with all your environment variables

NEXT_PUBLIC_PRODUCTION_SERVER_URL=

Start up the application

yarn dev

You are now ready to use the frontend! The web application runs at http://localhost:3000/.

Backend Setup Instructions

In the backend/ directory, create a Python virtual environment and activate it.

python -m venv .venv
. .venv/Scripts/activate # Windows (Git Bash); on macOS/Linux, run . .venv/bin/activate instead

Install the required packages.

pip install -r requirements.txt

Set up Environment Variables

Create a .env file in the backend/ directory containing all the environment variables listed in .env.example.

# .env file with all your environment variables

HUGGINGFACE_TOKEN=
GOOGLE_API_KEY=
PRODUCTION_CLIENT_URL=
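Without a check, a missing key would likely surface only when the first Hugging Face or Google call fails. A minimal sketch of a startup check, assuming the variables have already been loaded into the process environment (whether the backend reads `.env` itself, for example via python-dotenv's `load_dotenv()`, is not stated here). `missing_env_vars` and `require_env` are hypothetical helpers; only the three variable names come from the listing above.

```python
import os
from typing import List, Mapping, Optional, Sequence

# Variable names taken from the backend .env listing; the check itself is hypothetical.
REQUIRED_VARS = ("HUGGINGFACE_TOKEN", "GOOGLE_API_KEY", "PRODUCTION_CLIENT_URL")


def missing_env_vars(
    env: Optional[Mapping[str, str]] = None,
    required: Sequence[str] = REQUIRED_VARS,
) -> List[str]:
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


def require_env(env: Optional[Mapping[str, str]] = None) -> None:
    """Fail fast at startup instead of on the first API call."""
    missing = missing_env_vars(env)
    if missing:
        raise RuntimeError(
            "Missing environment variables: " + ", ".join(missing)
            + " (compare backend/.env with .env.example)"
        )
```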

Move into the app/ directory and start the application.

cd app
uvicorn main:app --reload

You are now ready to use the backend! The server runs at http://127.0.0.1:8000/.

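Quick-start script (run from the repository root after completing the setup above):

cd backend
. .venv/Scripts/activate # Windows (Git Bash); on macOS/Linux, run . .venv/bin/activate instead
cd app
uvicorn main:app --reload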

Congratulations, you have successfully created your own copy of AiSL.

🏗️ Tech Stack

Frontend

Next.js (Deployed on Vercel)

Backend

FastAPI (Deployed on Google Cloud Run)

Video Editing

OpenCV (cv2) Python package.

AI Models

Sign Language to Text
- Model: MediaPipe Gesture Recognizer (fine-tuned)
- Fine-tuning dataset: WLASL Video (https://www.kaggle.com/datasets/risangbaskoro/wlasl-processed)
Text to Speech
- Model: Google Translate text-to-speech API
Text to Emoji
- Vector store (RAG) with emoji.csv as the data source
- Embeddings: sentence-transformers/all-MiniLM-L6-v2
- LLM: gemini-pro

Datasets

Fine-tuning dataset: WLASL Video (https://www.kaggle.com/datasets/risangbaskoro/wlasl-processed)
RAG dataset: emoji.csv (~500 emoji records with descriptions generated by OpenAI ChatGPT)
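The layout of emoji.csv is not documented here. Assuming it has a header row with `emoji` and `description` columns (an assumption), a minimal sketch of turning each row into one text document for the vector store could look like this; `load_emoji_documents` is a hypothetical helper, and rows missing either value are skipped.

```python
import csv
from typing import Iterable, List


def load_emoji_documents(lines: Iterable[str]) -> List[str]:
    """Turn emoji.csv rows into '<emoji> <description>' text documents.

    Assumes (hypothetically) 'emoji' and 'description' header columns;
    rows missing either value are skipped.
    """
    docs: List[str] = []
    for row in csv.DictReader(lines):
        emoji = (row.get("emoji") or "").strip()
        description = (row.get("description") or "").strip()
        if emoji and description:
            docs.append(f"{emoji} {description}")
    return docs
```

Reading the real file would then be `with open("emoji.csv", newline="", encoding="utf-8") as f: docs = load_emoji_documents(f)`, where the path and encoding are assumptions.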

APIs used

Hugging Face API
- Embeddings: sentence-transformers/all-MiniLM-L6-v2
Google API
- LLM: gemini-pro
- Text to Speech: Google Translate text-to-speech API

✨ Contributors

💡 Contributing

Have an idea or improvement? Open an issue and submit a pull request!

