Using FAVE align
FAVE-align is built upon the Penn Phonetics Lab Forced Aligner, or p2fa. Its acoustic models were trained on 25 hours of hand-aligned US Supreme Court oral arguments. For more information, see:
Yuan, Jiahong, & Liberman, Mark. (2008). Speaker identification on the SCOTUS corpus. In Proceedings of Acoustics '08.
In using FAVE-align, there are two key steps.
- Checking the transcription for out-of-dictionary words, and providing necessary transcriptions.
- Aligning the transcription to the audio.
You must provide a transcription for FAVE-align to align to the audio. The transcription must follow a strict format, or FAVE-align will not be able to interpret it. There must be five tab-delimited columns:
- A brief code for the speaker
- The speaker's full name or pseudonym
- An onset time, in seconds
- An offset time, in seconds
- A transcription of the speech between the onset and offset times
It will look something like this:
JF Josef Fruehwald 4.151 4.586 Yeah?
JF Josef Fruehwald 13.594 14.279 (( ))
JF Josef Fruehwald 20.604 21.601 Yeah you do. You do.
JF Josef Fruehwald 22.283 23.607 Are -- are you working or --
JF Josef Fruehwald 24.494 25.614 Okay. Okay.
JF Josef Fruehwald 26.276 26.93 What's
JF Josef Fruehwald 28.093 28.524 here?
JF Josef Fruehwald 31.933 34.136 {LG} Well -- well no, it's just
JF Josef Fruehwald 34.196 38.706 Um I -- I -- I -- I have some questions, but -- but really just -- I'm really --
JF Josef Fruehwald 39.476 41.759 {LG} Well that's good.
JF Josef Fruehwald 42.028 42.57 That's good.
JF Josef Fruehwald 42.791 44.041 Well what -- how -- how old are you?
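Before running the aligner, it can help to confirm that every row really has five tab-separated fields. The following is a minimal sketch of such a check, not part of FAVE itself. The function name `check_transcription` is hypothetical, and the rule that the offset must be later than the onset is an assumption rather than something FAVE-align is documented to enforce.

```python
def check_transcription(lines):
    """Return a list of (line_number, problem) tuples for malformed rows."""
    problems = []
    for number, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 5:
            problems.append((number, f"expected 5 columns, found {len(fields)}"))
            continue
        _code, _name, onset, offset, _text = fields
        try:
            start, end = float(onset), float(offset)
        except ValueError:
            problems.append((number, "onset and offset must be numbers"))
            continue
        # Assumption: an annotation unit should end after it starts.
        if end <= start:
            problems.append((number, "offset must come after onset"))
    return problems


sample = [
    "JF\tJosef Fruehwald\t4.151\t4.586\tYeah?\n",
    "JF\tJosef Fruehwald\t13.594\t14.279\t(( ))\n",
    "JF Josef Fruehwald 20.604 21.601 Yeah you do.\n",  # spaces instead of tabs
]
print(check_transcription(sample))
```

The third row is reported because it uses spaces rather than tabs, so it splits into a single field.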
This transcription can be automatically exported from ELAN. We strongly recommend referring to the FAVE introduction to ELAN, and the FAVE transcription guidelines at the beginning of a research project.
FAVE-align looks up words from the orthographic transcription you provide in the CMU pronouncing dictionary. cmudict is rather large, but it will inevitably lack entries for many words that appear in even moderately long recordings. You'll have to provide transcriptions for these out-of-dictionary words yourself.
Move the transcription file to the FAVE-align directory, and run FAAValign.py with the dictionary check option -c [filename], where [filename] is the file to which FAAValign.py will write out-of-dictionary words.
python FAAValign.py -v -c unknown.txt speaker1.txt
FAAValign.py has now created a file called unknown.txt, which is a tab-delimited file of unknown words.
We recommend opening this file in a spreadsheet program.
It contains four columns.
- The unknown or truncated word
- A guess at the phonemic transcription
- A "clue word"
- The text of the annotation unit containing the unknown or truncated word
A "clue word" is a word beginning with a + which the transcriber has inserted after a truncated word, when reasonably sure that this is the word the speaker intended. It is meant as a clue to the person doing forced alignment, and will not itself be included in the alignment. For example, if someone paused while saying "huge", then continued their thought, you could transcribe it this way.
They had this hu- huge plot of land here
It would be a good idea to include a clue word here, though, producing this transcription, which is equivalent for the purpose of alignment.
They had this hu- +huge huge plot of land here.
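To make the role of the clue word concrete, here is a minimal sketch that separates +-prefixed tokens from the rest of an annotation. It only illustrates the convention described above and is not FAVE-align's own parsing code. The function name `split_clue_words` is hypothetical.

```python
def split_clue_words(text):
    """Separate clue words (tokens starting with '+') from the other tokens."""
    aligned, clues = [], []
    for token in text.split():
        if token.startswith("+") and len(token) > 1:
            clues.append(token[1:])  # keep the clue without its '+' marker
        else:
            aligned.append(token)
    return " ".join(aligned), clues


plain, clues = split_clue_words("They had this hu- +huge huge plot of land here.")
print(plain)  # They had this hu- huge plot of land here.
print(clues)  # ['huge']
```

With the clue word set aside, the remaining text matches the version transcribed without a clue word.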
There are a few different kinds of entries in the unknown words spreadsheet.
- If the suggested transcription is correct, nothing needs to be done, and you can delete the line.
- If the suggested transcription is not correct, or there is no suggested transcription, then enter the correct transcription in Arpabet into the second column. You can enter several transcription alternatives separated by commas. Make sure to indicate syllable stress on each vowel.
- If the entry is due to a spelling mistake, we recommend fixing the spelling mistake in the original transcription, regenerating the tab-delimited file, and re-running the dictionary check on the corrected version.
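Missing stress digits are easy to overlook when typing Arpabet by hand. The sketch below checks that every vowel in a transcription carries exactly one stress digit (0, 1, or 2) and that consonants carry none. It assumes uppercase phones separated by spaces, as in cmudict, with alternatives separated by commas. The function names are hypothetical, and this is not a check that FAVE-align is known to perform itself.

```python
VOWELS = {"AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER", "EY",
          "IH", "IY", "OW", "OY", "UH", "UW"}
CONSONANTS = {"B", "CH", "D", "DH", "F", "G", "HH", "JH", "K", "L", "M", "N",
              "NG", "P", "R", "S", "SH", "T", "TH", "V", "W", "Y", "Z", "ZH"}


def check_pronunciation(pron):
    """Return a list of problems in one space-separated Arpabet pronunciation."""
    problems = []
    for phone in pron.split():
        base = phone.rstrip("012")      # symbol without trailing stress digits
        stress = phone[len(base):]      # the trailing digits, if any
        if base in VOWELS:
            if len(stress) != 1:
                problems.append(f"{phone}: vowel needs one stress digit (0, 1, or 2)")
        elif base in CONSONANTS:
            if stress:
                problems.append(f"{phone}: consonant must not carry stress")
        else:
            problems.append(f"{phone}: not a known Arpabet symbol")
    return problems


def check_entry(cell):
    """Check every comma-separated alternative in a transcription cell."""
    return {alt.strip(): check_pronunciation(alt) for alt in cell.split(",")}


print(check_entry("HH Y UW1 JH, HH Y UW JH"))
```

In this example the first alternative passes, and the second is flagged because UW has no stress digit.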
Now, delete everything except the first and second columns, and save the file.
We recommend saving it under a new name, like input.txt.
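If you would rather not trim the columns by hand in a spreadsheet, the same step can be scripted. This is a minimal sketch that assumes the edited file is still tab-delimited and has no header row. It also skips rows whose transcription column is empty, which is an assumption about what you want in the output. The function name `keep_first_two_columns` is hypothetical.

```python
def keep_first_two_columns(src_path, dst_path):
    """Copy the word and transcription columns from src_path to dst_path.

    Returns the number of rows written.
    """
    kept = 0
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            fields = line.rstrip("\n").split("\t")
            # Assumption: a row without a transcription is of no use downstream.
            if len(fields) < 2 or not fields[1].strip():
                continue
            dst.write(fields[0] + "\t" + fields[1] + "\n")
            kept += 1
    return kept
```

For the files in this walkthrough, the call would be `keep_first_two_columns("unknown.txt", "input.txt")`.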
We also recommend re-running FAAValign.py one more time in dictionary check mode, this time importing the transcriptions you provided, just to double check that you've caught all out-of-dictionary words.
python FAAValign.py -v -i input.txt -c still_unknown.txt speaker1.txt
This time, still_unknown.txt should contain only truncated words.
The entries in input.txt will be written to the file added_dict_entries.txt.
Having checked for out-of-dictionary words, you can now run FAAValign.py in aligning mode.
python FAAValign.py -v -i input.txt speaker1.wav speaker1.txt
If your transcription file and your sound file have the same name, as they do in this example, you don't need to pass the transcription file to FAAValign.py explicitly.
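The three invocations used in this walkthrough can be collected in one place. The sketch below only builds the argument lists. It uses the -v, -c, and -i options exactly as shown in the commands above, and the helper name `fave_align_commands` and its default file names are hypothetical.

```python
def fave_align_commands(wav, transcript, unknown="unknown.txt",
                        custom_dict="input.txt", recheck="still_unknown.txt"):
    """Build the argument lists for the dictionary check, recheck, and alignment."""
    base = ["python", "FAAValign.py", "-v"]
    return [
        base + ["-c", unknown, transcript],                     # 1. dictionary check
        base + ["-i", custom_dict, "-c", recheck, transcript],  # 2. recheck with your entries
        base + ["-i", custom_dict, wav, transcript],            # 3. alignment
    ]


for command in fave_align_commands("speaker1.wav", "speaker1.txt"):
    print(" ".join(command))
```

Each list could be passed to `subprocess.run(command, check=True)`, but the steps should not be run back to back, since the unknown words file has to be edited by hand between the first and second steps.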
FAAValign.py will produce
- a Praat TextGrid file
- a .FAAVlog file
- an .errorlog file, if there were any errors in alignment