Josef Fruehwald edited this page Nov 25, 2013 · 2 revisions

FAVE-align is built upon the Penn Phonetics Lab Forced Aligner, or p2fa. Its acoustic models were trained on 25 hours of hand-aligned US Supreme Court oral arguments. For more information, see:

Yuan, Jiahong, & Liberman, Mark. (2008). Speaker identification on the SCOTUS corpus. In Proceedings of Acoustics '08.

In using FAVE-align, there are two key steps.

  1. Checking the transcription for out-of-dictionary words, and providing necessary transcriptions.
  2. Aligning the transcription to the audio.

The Transcription

You must provide FAVE-align with a transcription to align to the audio. The transcription must follow a strict format for FAVE-align to interpret it, consisting of five tab-delimited columns:

  1. A brief code for the speaker
  2. The speaker's full name or pseudonym
  3. An onset time, in seconds
  4. An offset time, in seconds
  5. A transcription of the speech between the onset and offset times

It will look something like this:

JF	Josef Fruehwald	4.151	4.586	Yeah?
JF	Josef Fruehwald	13.594	14.279	(( ))
JF	Josef Fruehwald	20.604	21.601	Yeah you do. You do.
JF	Josef Fruehwald	22.283	23.607	Are -- are you working or --
JF	Josef Fruehwald	24.494	25.614	Okay. Okay.
JF	Josef Fruehwald	26.276	26.93	What's
JF	Josef Fruehwald	28.093	28.524	here?
JF	Josef Fruehwald	31.933	34.136	{LG} Well -- well no, it's just
JF	Josef Fruehwald	34.196	38.706	Um I -- I -- I -- I have some questions, but -- but really just -- I'm really --
JF	Josef Fruehwald	39.476	41.759	{LG} Well that's good.
JF	Josef Fruehwald	42.028	42.57	That's good.
JF	Josef Fruehwald	42.791	44.041	Well what -- how -- how old are you?
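Before running the aligner, it can help to confirm that every line really has the five tab-delimited columns described above. The following is a minimal sketch, not part of FAVE-align: the function names are hypothetical, and the requirement that the onset precede the offset is an assumption rather than a documented rule.

```python
def check_transcription_line(line):
    """Return a list of problems found in one transcription line (empty if none)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 5:
        return ["expected 5 tab-delimited columns, found %d" % len(fields)]
    code, name, onset, offset, text = fields
    try:
        start, end = float(onset), float(offset)
    except ValueError:
        return ["onset and offset must be numbers (seconds)"]
    problems = []
    # Assumption: an annotation unit should start before it ends.
    if start >= end:
        problems.append("onset must be earlier than offset")
    if not code.strip() or not name.strip():
        problems.append("speaker code and name must not be empty")
    return problems


def check_transcription(lines):
    """Map 1-based line numbers to their problems, skipping clean lines."""
    report = {}
    for number, line in enumerate(lines, start=1):
        problems = check_transcription_line(line)
        if problems:
            report[number] = problems
    return report
```

Calling check_transcription(open("speaker1.txt")) returns an empty dictionary when every line passes these checks.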

This transcription can be automatically exported from ELAN. We strongly recommend consulting the FAVE introduction to ELAN and the FAVE transcription guidelines at the beginning of a research project.

Checking for out-of-dictionary words

FAVE-align looks up each word of the orthographic transcription you provide in the CMU pronouncing dictionary. cmudict is rather large, but it will inevitably lack entries for many words that appear in even moderately long recordings. You'll have to provide transcriptions for these out-of-dictionary words yourself.

Move the transcription file to the FAVE-align directory, and run FAAValign.py with the dictionary check option -c [filename], where [filename] is the file to which FAAValign.py will write the out-of-dictionary words.

python FAAValign.py -v -c unknown.txt speaker1.txt

FAAValign.py has now created a file called unknown.txt, which is a tab-delimited file of unknown words. We recommend opening this file in a spreadsheet program. It contains four columns:

  1. The unknown or truncated word
  2. A guess at the phonemic transcription
  3. A "clue word"
  4. The text of the annotation unit containing the unknown or truncated word

A "clue word" is a word beginning with a + which the transcriber inserts after a truncated word when reasonably sure that this is the word the speaker intended. It is meant as a clue to the person doing forced alignment, and will not itself be included in the alignment. For example, if someone paused while saying "huge" and then continued their thought, you could transcribe it this way.

They had this hu- huge plot of land here

It would be a good idea to include a clue word here, though, producing this transcription, which is equivalent for the purpose of alignment.

They had this hu- +huge huge plot of land here.
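To make the notation concrete, here is a minimal sketch of how clue words could be separated from the words that are actually aligned. It only illustrates the + convention; it is not FAVE-align's own parsing code, and the function name split_clue_words is hypothetical.

```python
def split_clue_words(text):
    """Separate clue words (tokens starting with '+') from the other tokens.

    Returns the transcription without clue words, plus the list of clue
    words with their leading '+' removed.
    """
    spoken, clues = [], []
    for token in text.split():
        if token.startswith("+") and len(token) > 1:
            clues.append(token[1:])
        else:
            spoken.append(token)
    return " ".join(spoken), clues
```

Applied to the example above, the function returns the sentence without +huge and the single clue word huge.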

Providing transcriptions

There are a few different kinds of entries in the unknown words spreadsheet.

[Image: dict_check]

The entry is a truncated word (Example 1)

  • If the suggested transcription is correct, nothing needs to be done, and you can delete the line.
  • If the suggested transcription is not correct, or there is no suggested transcription, then enter the correct transcription in Arpabet into the second column. You can enter several transcription alternatives separated by commas. Make sure to indicate syllable stress on each vowel.

The entry is due to a spelling mistake (Example 2)

  • If the entry is due to a spelling mistake, we recommend fixing the spelling mistake in the original transcription, regenerating the tab-delimited file, and re-running the dictionary check on the corrected version.

The entry is simply an unknown word (Example 3)

  • Enter the correct transcription in Arpabet into the second column. You can enter several transcription alternatives separated by commas. Make sure to indicate syllable stress on each vowel.
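A missing stress digit is easy to overlook, so a quick check of the second column can save a failed run. The sketch below is not part of FAVE-align. It assumes the cmudict conventions: uppercase phones separated by spaces, with a stress digit of 0, 1 or 2 on every vowel and none on consonants. The function name check_arpabet is hypothetical.

```python
ARPABET_VOWELS = {
    "AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER",
    "EY", "IH", "IY", "OW", "OY", "UH", "UW",
}
ARPABET_CONSONANTS = {
    "B", "CH", "D", "DH", "F", "G", "HH", "JH", "K", "L", "M", "N",
    "NG", "P", "R", "S", "SH", "T", "TH", "V", "W", "Y", "Z", "ZH",
}


def check_arpabet(entry):
    """Return a list of problems in a comma-separated list of transcriptions."""
    problems = []
    for alternative in entry.split(","):
        for phone in alternative.split():
            base = phone.rstrip("012")      # symbol without trailing stress digits
            stress = phone[len(base):]      # the trailing digits, if any
            if base in ARPABET_VOWELS:
                if len(stress) != 1:
                    problems.append("%s: vowel needs one stress digit (0, 1 or 2)" % phone)
            elif base in ARPABET_CONSONANTS:
                if stress:
                    problems.append("%s: consonants take no stress digit" % phone)
            else:
                problems.append("%s: not an Arpabet symbol" % phone)
    return problems
```

An empty list means every alternative in the entry passed these checks.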

Now, delete everything except the first and second columns, and save the file. We recommend saving it under a new name, like input.txt.
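If you would rather not trim the columns by hand, the step can be scripted once the transcriptions in the second column have been corrected. This is a minimal sketch under stated assumptions: the file is tab-delimited with no header row, and the function name keep_first_two_columns is hypothetical.

```python
def keep_first_two_columns(src_path, dst_path):
    """Copy the first two tab-delimited columns of src_path into dst_path.

    Returns the number of rows written.
    """
    kept = 0
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 2 or not fields[0].strip():
                continue  # skip blank or malformed rows
            dst.write(fields[0] + "\t" + fields[1] + "\n")
            kept += 1
    return kept
```

For example, keep_first_two_columns("unknown.txt", "input.txt") writes the word and its transcription for each row and leaves the original file untouched.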

We also recommend re-running FAAValign.py one more time in dictionary check mode, this time importing the transcriptions you provided, just to double check that you've caught all out-of-dictionary words.

python FAAValign.py -v -i input.txt -c still_unknown.txt speaker1.txt

This time, still_unknown.txt should contain only truncated words. The entries in input.txt will be written to the file added_dict_entries.txt.

Aligning

Having checked for out-of-dictionary words, you can now run FAAValign.py in aligning mode.

python FAAValign.py -v -i input.txt speaker1.wav speaker1.txt

If your transcription file and your sound file have the same name apart from the extension, as they do in this example, you don't need to pass the transcription file to FAAValign.py explicitly.
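The naming convention can be illustrated with a short sketch that derives the transcription file name expected for a given sound file. This only mirrors the rule described above and is not taken from FAAValign.py; the function name default_transcription_path is hypothetical.

```python
import os


def default_transcription_path(wav_path):
    """Return the .txt path that shares its base name with the sound file."""
    base, _extension = os.path.splitext(wav_path)
    return base + ".txt"
```

With the files from this example, default_transcription_path("speaker1.wav") gives "speaker1.txt".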

Output

FAAValign.py will produce:

  • a Praat TextGrid file
  • a .FAAVlog file
  • an .errorlog file, if there were any errors in alignment