yohane

yohane takes a song and its lyrics, extracts the vocals, splits the lyrics into syllables, and computes a forced alignment to generate a karaoke as an Aegisub subtitle file (.ass).

Getting Started

Notebook

Open the notebook in Google Colab to use the GPU resources it offers.

The full pipeline completes in less than a minute in that environment.

Local environment

With uv

Requirements:

uvx --from git+https://github.com/Japan7/yohane.git[cli] --python 3.11 yohane --help

With pixi

Requirement: pixi

git clone https://github.com/Japan7/yohane.git
cd yohane/
pixi run yohane --help

Caveats

  • yohane's syllable splitting is currently optimized only for Japanese lyrics
  • Syllables at the end of lines are often shortened
  • Forced alignment cannot handle overlapping vocals
  • The output is not fully accurate: you should still check and edit the result!
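
To illustrate the first caveat, here is a minimal sketch of rule-based syllable splitting for romanized Japanese. It is not yohane's actual implementation: the pattern and the `split_syllables` name are assumptions made for illustration. Romaji mostly follows a consonant(s)+vowel shape plus a standalone "n", so a short regular expression goes a long way; the same rule gives poor results on languages with richer consonant clusters.

```python
import re

# Hypothetical rule set (an assumption, not yohane's code):
#   1. a standalone "n" that is not followed by a vowel or "y"
#   2. any run of consonants followed by exactly one vowel
#   3. leftover trailing consonants
_SYLLABLE = re.compile(r"n(?![aeiouy])|[^aeiou\s]*[aeiou]|[^aeiou\s]+")


def split_syllables(line: str) -> list[list[str]]:
    """Split each word of a romanized line into syllable-like chunks."""
    return [_SYLLABLE.findall(word) for word in line.lower().split()]


print(split_syllables("himitsu no kotoba"))
# [['hi', 'mi', 'tsu'], ['no'], ['ko', 'to', 'ba']]
```

Each inner list holds the chunks of one word, which is the granularity at which a karaoke line is later timed.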

Recommended workflow

  1. Get the song and its lyrics
  2. Use the yohane notebook or the CLI locally to generate the karaoke file

In Aegisub:

  1. Load the .ass and the video
  2. Replace the Default style with your own
  3. Normalization during processing lowercases the lines and removes special characters: use the original lines, kept as comments, to fix the timed lines
  4. Subtitle > Select Lines… > check Comments and Set selection > OK, then delete the selected lines
  5. Listen to each line and fix its End time
  6. Add a 1s karaoke lead-in to every line
  7. Iterate over each line in karaoke mode and merge/fix syllable timings

Sample

KAF, ZOOKARADERU - PV - Himitsu no Kotoba: Video, Output

References