This README is under construction
Newly updated syllable parser. See the old scripts here.
This set of scripts defines several functions (the foremost of which is syllabify()) that together can syllabify a phonetic transcription using principles taught to students in Phonology 1.
The following functionality has not yet been incorporated into the new version.
Some of the provided functions (namely, transcribe()) interface with a local copy of the Carnegie Mellon University (CMU) Pronouncing Dictionary, which provides phonetic transcriptions (using ARPABET sequences) for over 134,000 words in the English lexicon. Scripts are also provided that check the dictionary for updates, download the most recent version from the dictionary's website, and format it, including converting ARPABET codes to International Phonetic Alphabet (IPA) characters.
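To illustrate the ARPABET-to-IPA conversion mentioned above, here is a minimal sketch in R. The lookup table covers only a handful of codes, and the names arpabet_ipa and arpabet_to_ipa() are hypothetical: they are not functions from this repository, and the repository's own formatting scripts may work differently.

```r
# Partial ARPABET-to-IPA lookup table (illustrative subset only, an assumption
# of this sketch; the CMU dictionary uses more codes than are listed here).
arpabet_ipa <- c(
  AA = "ɑ", AE = "æ", AH = "ʌ", EH = "ɛ", IH = "ɪ", IY = "i",
  OW = "oʊ", UW = "u", AY = "aɪ", EY = "eɪ",
  HH = "h", K = "k", L = "l", M = "m", N = "n", S = "s", T = "t", Z = "z"
)

# Hypothetical helper: convert one space-separated ARPABET string to IPA.
arpabet_to_ipa <- function(arpabet) {
  codes <- strsplit(arpabet, " ", fixed = TRUE)[[1]]
  stress <- sub("^[A-Z]+", "", codes)  # trailing stress digit, or "" for consonants
  base <- sub("[0-9]$", "", codes)     # code with the stress digit removed

  unknown <- setdiff(base, names(arpabet_ipa))
  if (length(unknown) > 0) {
    stop("codes not in this partial table: ", paste(unknown, collapse = ", "))
  }

  ipa <- unname(arpabet_ipa[base])
  # Unstressed AH is conventionally rendered as schwa.
  ipa[base == "AH" & stress == "0"] <- "ə"
  paste(ipa, collapse = "")
}

arpabet_to_ipa("HH AH0 L OW1")
arpabet_to_ipa("K AE1 T")
```

Stress digits are dropped in this sketch apart from the schwa case; a fuller converter would probably carry them over as IPA stress marks.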
The parser follows a strategy taught to undergraduate phonology students for determining the syllabification of a word:
1. Identify syllable nuclei by looking for the vowels. If two vowels are adjacent, check whether together they form a known diphthong in the language. If they do, treat them as part of the same syllable nucleus. Each standalone vowel counts as the nucleus of its own syllable.
2. Identify syllable onsets (this happens before step 3, in accordance with the Onset Principle). Parse any consonants preceding the first vowel into the first syllable. Parse the consonant immediately preceding each of the other vowels into that vowel's syllable. For each remaining unparsed consonant that is followed by another consonant and does not come after the last vowel in the word, check whether the two consonants form a legal onset in the language (defined as a consonant cluster that can occur at the beginning of a word, as long as that word is not a borrowing). If they do, parse the consonant into the same syllable as the consonant that follows it. If they do not, leave it unparsed.
3. Identify syllable codas by parsing any remaining unparsed consonants into the syllables preceding them.
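The three steps above can be sketched in R as follows. This is a simplified illustration, not the repository's syllabify() implementation: the function name sketch_syllabify() and the small vowels, diphthongs, and legal_onsets inventories are assumptions made for this example, and the input is assumed to be a character vector with one IPA segment per element.

```r
# Illustrative inventories (assumptions for this sketch, far from complete).
vowels <- c("a", "e", "i", "o", "u", "ə", "ɪ", "ʊ", "ɛ", "æ", "ʌ", "ɑ", "ɔ")
diphthongs <- c("aɪ", "aʊ", "eɪ", "oʊ", "ɔɪ")
legal_onsets <- c("pl", "pr", "bl", "br", "tr", "dr", "kl", "kr", "gl", "gr",
                  "fl", "fr", "sp", "st", "sk", "sl", "sm", "sn", "sw")

# Hypothetical helper: segs is a character vector, one segment per element.
sketch_syllabify <- function(segs) {
  n <- length(segs)
  is_v <- segs %in% vowels
  syl <- rep(NA_integer_, n)  # syllable index per segment; NA means unparsed

  # Step 1: nuclei. Adjacent vowels share a nucleus only if they form a diphthong.
  count <- 0L
  i <- 1L
  while (i <= n) {
    if (is_v[i]) {
      count <- count + 1L
      syl[i] <- count
      if (i < n && is_v[i + 1] &&
          paste0(segs[i], segs[i + 1]) %in% diphthongs) {
        syl[i + 1] <- count
        i <- i + 1L
      }
    }
    i <- i + 1L
  }
  if (count == 0L) stop("no vowel found, so there is no syllable nucleus")

  # Step 2: onsets. Word-initial consonants join the first syllable.
  first_v <- which(is_v)[1]
  if (first_v > 1) syl[seq_len(first_v - 1)] <- 1L
  # Scan right to left so onset clusters grow leftwards from each nucleus.
  for (i in rev(seq_len(n - 1))) {
    if (!is.na(syl[i])) next  # vowels and already-parsed consonants
    nxt <- i + 1
    if (is_v[nxt]) {
      syl[i] <- syl[nxt]      # consonant immediately before a vowel
    } else if (!is.na(syl[nxt]) &&
               paste0(segs[i], segs[nxt]) %in% legal_onsets) {
      syl[i] <- syl[nxt]      # forms a legal onset with the next consonant
    }
  }

  # Step 3: codas. Leftover consonants join the preceding syllable.
  for (i in seq_len(n)) {
    if (is.na(syl[i])) syl[i] <- syl[i - 1]
  }

  # Join the segments of each syllable, then separate syllables with ".".
  paste(vapply(split(segs, syl), paste, character(1), collapse = ""),
        collapse = ".")
}

sketch_syllabify(c("ɛ", "k", "s", "t", "r", "ə"))
sketch_syllabify(c("æ", "t", "l", "ə", "s"))
```

Because step 2 runs before step 3, a cluster such as [str] is claimed as an onset one consonant at a time, and only what cannot be an onset (here the [k] of [ɛkstrə] or the [t] of [ætləs]) falls back to the preceding syllable as a coda. Note that this sketch only checks pairs of adjacent consonants, as the description above does, rather than validating whole clusters.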
- R programming language environment
- The following R packages:
- Optional packages:
- Clone this repository into a directory of your choosing:
git clone https://github.com/jakewvincent/R-syllable-parser.git
- Open an R terminal and set your working directory to the directory where you cloned this repository.
- Source master.R by running source(file = "master.R") in your R terminal.
- Download this repository as a zip file and unzip it into a directory of your choosing.
- Open an R terminal and set your working directory to the directory where you unzipped the zipped repository file.
- Source master.R by running source(file = "master.R") in your R terminal.
After sourcing 0_master.R as above, all of the functions defined by these scripts are available for use. Some of these functions (namely, cvify() and sonority()) are mainly used internally. Here is a description of each function:
- Re-incorporate CMU pronouncing dictionary
  - random_word() function
  - transcribe() function
- Make outputs when verbose = TRUE consistent and predictable
- Recognize syllabic consonants (e.g. words that end in [ɪzm] should have the [m] assigned to a nucleus)