Convert WebVTT to JSON, optionally removing duplicate lines
Install this tool using pip:
pip install webvtt-to-json
To output JSON for a WebVTT file:
webvtt-to-json subtitles.vtt
This writes the JSON to standard output. Use -o filename to write it to a file instead.
Subtitles can often include duplicate lines. Add -d or --dedupe to attempt to remove those duplicates from the output:
webvtt-to-json --dedupe subtitles.vtt
Use -s or --single to output single "line" keys instead of a "lines" array.
You can also use:
python -m webvtt_to_json ...
Standard output:
[
  {
    "start": "00:00:00.000",
    "end": "00:00:01.829",
    "lines": [
      " ",
      "my<00:00:00.160><c> career</c><00:00:00.480><c> in</c><00:00:00.640><c> side</c><00:00:00.880><c> projects</c><00:00:01.280><c> and</c><00:00:01.520><c> open</c>"
    ]
  }
]
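The default output keeps the inline WebVTT timing and class tags (such as <00:00:00.160> and <c>) inside each line. If you want plain text without using --dedupe, one option is to strip the tags yourself after loading the JSON. The following is a minimal sketch, not part of this tool; the strip_tags name and the regular expression are assumptions made for illustration.

```python
import re

# Matches any inline WebVTT tag, e.g. <00:00:00.160>, <c>, or </c>.
# This is an illustrative pattern, not the one used by webvtt-to-json.
TAG_RE = re.compile(r"<[^>]+>")


def strip_tags(line: str) -> str:
    """Return the line with inline cue tags removed and outer whitespace trimmed."""
    return TAG_RE.sub("", line).strip()


raw = "my<00:00:00.160><c> career</c><00:00:00.480><c> in</c>"
print(strip_tags(raw))  # my career in
```

Whitespace-only lines, like the " " entry in the example above, come back as empty strings and can be filtered out afterwards.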
--dedupe output:
[
  {
    "start": "00:00:01.829",
    "end": "00:00:01.839",
    "lines": ["my career in side projects and open"]
  }
]
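This README does not describe how --dedupe decides which lines are duplicates. To give a rough idea of what removing repeated lines can look like, here is a simplified sketch that works on already tag-free cues: it skips blank lines and any line identical to the one emitted just before it. The drop_repeated_lines function and the sample cues are hypothetical and may behave differently from the tool itself.

```python
def drop_repeated_lines(cues):
    """Drop blank lines and lines that repeat the previously kept line.

    Illustrative only: not the algorithm used by webvtt-to-json.
    """
    deduped = []
    last_line = None
    for cue in cues:
        kept = []
        for line in cue["lines"]:
            text = line.strip()
            if not text or text == last_line:
                continue  # blank, or a repeat of the previous kept line
            kept.append(text)
            last_line = text
        if kept:  # cues left with no lines are omitted entirely
            deduped.append({"start": cue["start"], "end": cue["end"], "lines": kept})
    return deduped


# Hypothetical sample data in the same shape as the default output.
cues = [
    {"start": "00:00:00.000", "end": "00:00:01.000", "lines": [" ", "hello there"]},
    {"start": "00:00:01.000", "end": "00:00:02.000", "lines": ["hello there", "general"]},
]
print(drop_repeated_lines(cues))
```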
--dedupe --single output:
[
  {
    "start": "00:00:01.829",
    "end": "00:00:01.839",
    "line": "my career in side projects and open"
  }
]
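The --dedupe --single shape is convenient to consume from other programs, since each cue is a flat object with "start", "end", and "line" keys. As a small sketch of downstream use, the snippet below parses that JSON and converts the HH:MM:SS.mmm timestamps to seconds. The timestamp_to_seconds helper is an assumption for illustration and is not provided by this tool; the sample string copies the output shown above.

```python
import json


def timestamp_to_seconds(ts: str) -> float:
    """Convert an HH:MM:SS.mmm timestamp to seconds."""
    hours, minutes, seconds = ts.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


# In practice this would be read from the file written with -o.
sample = """
[
  {
    "start": "00:00:01.829",
    "end": "00:00:01.839",
    "line": "my career in side projects and open"
  }
]
"""

cues = json.loads(sample)
for cue in cues:
    start = timestamp_to_seconds(cue["start"])
    print(f"{start:.3f}s {cue['line']}")
```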
To contribute to this tool, first check out the code. Then create a new virtual environment:
cd webvtt-to-json
python -m venv venv
source venv/bin/activate
Now install the dependencies and test dependencies:
pip install -e '.[test]'
To run the tests:
pytest