This web application lets users transcribe and translate their voice recordings or uploaded MP3 files. The app is built with React and styled using Tailwind CSS. Transcription uses OpenAI's Whisper model, and translation uses Xenova's NLLB-200-Distilled-600M model.
- Voice Recording Transcription: Record your voice and get the transcription in real time.
- MP3 Upload Transcription: Upload MP3 files and receive a transcription of the audio.
- Translation: Translate the transcribed text into various languages.
- English Transcription: Transcription currently supports English only; more languages are planned.
- Frontend: React, Tailwind CSS
- Transcription Model: OpenAI Whisper
- Translation Model: Xenova NLLB-200-Distilled-600M
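As a rough illustration of how the two models fit together, the sketch below chains a transcription step into a translation step. It is not taken from this repository: the `Transcriber` and `Translator` types, the `transcribeAndTranslate` function, and the choice of `eng_Latn` as the fixed source language are assumptions made for the example. The models are passed in as plain functions so the flow can be read and tested without downloading any weights.

```typescript
// Hypothetical shapes for the two model calls. They mirror what
// Transformers.js-style pipelines are assumed to return and are not
// copied from this repository.
type Transcriber = (audio: Float32Array) => Promise<{ text: string }>;
type Translator = (
  text: string,
  options: { src_lang: string; tgt_lang: string }
) => Promise<Array<{ translation_text: string }>>;

// NLLB-200 identifies languages with FLORES-200 codes such as "eng_Latn".
// The source language is fixed to English because transcription is
// English-only for now.
const ENGLISH = "eng_Latn";

// Transcribe the audio, then translate the transcript into targetLang.
// An empty transcript skips the translation call entirely.
async function transcribeAndTranslate(
  transcribe: Transcriber,
  translate: Translator,
  audio: Float32Array,
  targetLang: string
): Promise<{ transcript: string; translation: string }> {
  const { text } = await transcribe(audio);
  const transcript = text.trim();
  if (transcript === "") {
    return { transcript, translation: "" };
  }
  const results = await translate(transcript, {
    src_lang: ENGLISH,
    tgt_lang: targetLang,
  });
  const first = results[0];
  return { transcript, translation: first ? first.translation_text : "" };
}
```

In a browser build, the two callables could plausibly be created with `pipeline('automatic-speech-recognition', ...)` and `pipeline('translation', 'Xenova/nllb-200-distilled-600M')` from `@xenova/transformers`, wrapped in small adapters to match the types above. Which Whisper checkpoint the app loads is not stated here, so that model name is left open.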
https://voicescroll.vercel.app/
- Node.js (v14 or later)
- npm (v6 or later)
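If it helps to check these prerequisites from a script, a minimal sketch could compare the major version number against the minimums listed above. The helper name `meetsMinimumMajor` is hypothetical and not part of this project.

```typescript
// Return true when a dotted version string (for example "18.17.0" or
// "v18.17.0") has a major version of at least minMajor.
// Unparseable input is treated as not meeting the requirement.
function meetsMinimumMajor(version: string, minMajor: number): boolean {
  const firstPart = version.replace(/^v/, "").split(".")[0] ?? "";
  const major = parseInt(firstPart, 10);
  return !isNaN(major) && major >= minMajor;
}
```

For example, `meetsMinimumMajor(process.versions.node, 14)` checks the running Node.js version, and the output of `npm --version` can be checked the same way against 6.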
- Clone the repository:
git clone https://github.com/jericho909/voicescroll.git
cd voicescroll
- Install the dependencies:
npm install
To start the development server, run:
npm run dev
The app will be available at http://localhost:3000.
To create a production build, run:
npm run build
The production-ready files will be in the build directory.
- Add support for transcribing additional languages (transcription only works in English for now).
- Improve user interface and user experience.
- Implement user authentication and save transcriptions for logged-in users.
Contributions are welcome! Please open an issue or submit a pull request for any improvements or bug fixes.
This project is licensed under the MIT License. See the LICENSE file for details.
- OpenAI Whisper for the transcription model.
- Xenova NLLB-200-Distilled-600M for the translation model.