Better captioning #8

veltman · 2016-07-21T21:49:08Z

Have a mostly-working branch that allows for entering and positioning multiple captions, but the manual entry/interface is a real drag, especially for a long video. Worth exploring some improvements.

Forced aligners?

Using a forced aligner like Gentle to take a bulk transcript and automatically time it to the audio would help - then you could type in the whole thing (or paste from a transcript) and it could automatically break it into chunks.

Pros: Much faster if you have a full transcript already (paste the whole thing rather than pasting line-by-line and tweaking the timing).
Cons: Not much faster if you don't have a transcript. A lot more code complexity (all the OSS aligners seem to be Python). Would probably still need to tweak the captions into sensible breaks (e.g. avoid orphan words).
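
To make the "sensible breaks" step concrete, here is a minimal sketch. It assumes the aligner hands back one start/end time per word. The `Word` and `Caption` types, the `chunk_captions` function, and the `max_chars`/`min_words` defaults are all hypothetical names for illustration, not part of Gentle or Audiogram.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float    # seconds


@dataclass
class Caption:
    text: str
    start: float
    end: float


def chunk_captions(words: List[Word], max_chars: int = 40,
                   min_words: int = 2) -> List[Caption]:
    """Greedily pack aligned words into captions of at most max_chars characters."""
    groups: List[List[Word]] = []
    current: List[Word] = []
    for word in words:
        candidate = " ".join(w.text for w in current + [word])
        if current and len(candidate) > max_chars:
            # Adding this word would overflow the caption, so start a new one.
            groups.append(current)
            current = [word]
        else:
            current.append(word)
    if current:
        groups.append(current)

    # Avoid an orphan: if the final caption is too short, pull trailing words
    # back from the previous caption. This can push the final caption slightly
    # past max_chars when the moved words are long.
    if len(groups) > 1:
        while len(groups[-1]) < min_words and len(groups[-2]) > min_words:
            groups[-1].insert(0, groups[-2].pop())

    return [Caption(" ".join(w.text for w in g), g[0].start, g[-1].end)
            for g in groups]
```

Each caption takes its start time from its first word and its end time from its last, so manual tweaking afterwards only needs to move words between neighbouring captions.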

Auto transcribe

Use some sort of speech-to-text to take a first pass at transcribing the audio. In-browser options include PocketSphinx and the Web Speech API in certain browsers. Server-side options include normal Sphinx or the Watson API.

Pros: Great when it works.
Cons: Doesn't always work, especially for non-English languages or clips with music, background noise, etc. Still doesn't work out timing. If it's server-side, would require a second round-trip before the form submission. Could take a long time for long pieces of audio.

Parse timestamped transcripts?

Could allow people to upload an SRT or some other timecoded transcript format in the editor. The parsing wouldn't be that hard, but it's unclear how often audio orgs use these.
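
As a rough illustration of how small the parsing step is, here is a sketch of an SRT reader. It assumes the common layout of a numeric index line, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing line, then one or more text lines, with blank lines between cues. The `Cue` type and `parse_srt` function are hypothetical names, and malformed blocks are skipped rather than reported.

```python
import re
from typing import List, NamedTuple


class Cue(NamedTuple):
    index: int
    start: float  # seconds
    end: float    # seconds
    text: str


# Accept "," or "." before the milliseconds, since both appear in the wild.
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


def _seconds(h: str, m: str, s: str, ms: str) -> float:
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_srt(source: str) -> List[Cue]:
    """Parse SRT text into a list of cues with times in seconds."""
    cues: List[Cue] = []
    # Drop a leading byte-order mark and normalise Windows line endings.
    cleaned = source.lstrip("\ufeff").replace("\r\n", "\n").strip()
    for block in re.split(r"\n\s*\n", cleaned):
        lines = block.split("\n")
        if len(lines) < 2 or not lines[0].strip().isdigit():
            continue  # not a cue block
        match = TIMING.search(lines[1])
        if not match:
            continue  # skip malformed timing lines rather than failing
        parts = match.groups()
        cues.append(Cue(
            index=int(lines[0]),
            start=_seconds(*parts[:4]),
            end=_seconds(*parts[4:]),
            text="\n".join(lines[2:]).strip(),
        ))
    return cues
```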

veltman · 2016-07-30T20:05:36Z

Looks like the Web Speech API doesn't provide any way to connect it to a non-mic source, but PocketSphinx does (with some fiddling).

kookster · 2016-08-01T15:48:07Z

You could also use other APIs like Speechmatics (https://speechmatics.com/) or Google Cloud Speech (https://cloud.google.com/speech/)?

veltman · 2016-08-01T15:51:31Z

Yup, true - though I'm a little reluctant to rely on an external API rather than something that can be bundled (ditto Watson).

pietrop · 2016-11-30T19:00:28Z

Hey @veltman,
Gentle could be modified to generate a transcription when the text is not available. This already works in the REST API: see the curl example, where if you don't pass the text file it returns a transcription. But it doesn't work in the Python terminal command. The code would need to be modified accordingly, which is something I am looking into.

I also played around with PocketSphinx, packaging it as a Node module: https://github.com/OpenNewsLabs/offline_speech_to_text.
I extracted it from the video grep Electron app.

iankevinmcdonald · 2016-12-19T09:45:13Z

Considering that the effective maximum on social media is 30s, I think that expecting users to supply a transcript is absolutely fine.

It doesn't scale to generating complete videos from long-form shows, but I think that's acceptable - it's still a big benefit for most uses.

I'm a one-person band working on my own community/radio niche narrative history series, and I've used SRT, using a free online manual transcriber (called, originally enough, "Transcriber"). Though I'm about as unrepresentative as you can possibly get.

pietrop · 2017-01-10T16:57:36Z

For the SRT option, I've written an SRT parser and composer that is also on npm.

It can be used to parse the SRT into word-accurate JSON (the original code to make it word-accurate is from the popcorn.js SRT parsing module, also on GitHub). With that, it is possible to make a "hyper transcript" where the user can make word-accurate selections. I've done something similar in quickQuote (now refactored in Node and in autoEdit), inspired by the hyperaudio project.

pettarin · 2017-03-06T10:44:23Z

Shameless plug, I hope you find it informative.

I maintain a Python/C forced aligner called aeneas ( http://www.readbeyond.it/aeneas/ and https://github.com/readbeyond/aeneas/ ). Unlike Gentle and basically all other forced aligners out there, its approach is not based on speech recognition but on an older technique known as Dynamic Time Warping. It works decently well (and much faster) if you align text at sentence/phrase level, but it is worse at word level. Its real time factor (ratio between processing time and real audio length) is between 0.005 and 0.02, depending on the parameters and machine CPU, since all the computational parts are written in C.
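
To put the real time factor in concrete terms, here is a small worked example. The 30-minute audio length is an assumption picked for illustration, and `processing_seconds` is a hypothetical helper, not part of aeneas.

```python
def processing_seconds(audio_seconds: float, real_time_factor: float) -> float:
    """Estimated processing time = real time factor x audio length."""
    return audio_seconds * real_time_factor


audio = 30 * 60  # assumed example: a 30-minute piece of audio, in seconds
low = processing_seconds(audio, 0.005)
high = processing_seconds(audio, 0.02)
print(f"{low:.0f}s to {high:.0f}s")  # prints "9s to 36s"
```

In other words, at the quoted range a half-hour piece would take somewhere between roughly 9 and 36 seconds to align.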

(In theory, one can port the core of aeneas to C, and from there to JS, via emscripten. It is a huge task, but it would enable decently fast alignment in JS land. Unfortunately, I have not had time/resources to do it.)

BTW, I maintain a list of forced aligners here: https://github.com/pettarin/forced-alignment-tools

pietrop · 2018-09-28T13:32:59Z

In case anyone is still looking into this, it turns out that @martymcguire has done a write-up where he describes how he modified the BBC News Labs fork of Audiogram to work with Gentle Speech To Text Forced Aligner output. See his repo here.

veltman added the enhancement label Jul 21, 2016

veltman changed the title ~~Support SRT files for closed captioning~~ Better closed captioning Jul 29, 2016

veltman mentioned this issue Jul 29, 2016

Allow for a highlight color in captions #11

Open

veltman changed the title ~~Better closed captioning~~ Better captioning Aug 1, 2016

veltman mentioned this issue Aug 1, 2016

v1.0 #23

Closed

veltman added this to the v1.0 milestone Aug 1, 2016

veltman mentioned this issue Aug 17, 2016

Captions as subtitles? #41

Closed
