fix(crg): fix various rule feeding and ordering bugs for Michif
Both Michif -> IPA mappings had issues due to incorrect rule ordering,
which meant that some words, e.g. "booñ" or "not", would not get g2p'd
correctly.

The as-written ordering is still required because apply-longest-first
would break the "g" rule, which does indeed need to be applied first for
both DV and TMD. But a rule whose input is a superstring of another
rule's input must come first, and that was not respected everywhere.
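
The superstring constraint can be illustrated with a minimal sketch. It assumes a simplified model in which each rule is a plain string replacement applied over the whole word, in list order. The function apply_as_written and the two-rule excerpts are hypothetical names for illustration, not g2p's actual implementation.

```python
def apply_as_written(rules, text):
    """Apply each (input, output) rule over the whole string, in list order."""
    for src, dst in rules:
        text = text.replace(src, dst)
    return text


# Hypothetical two-rule excerpts of the DV mapping, before and after the fix.
old_order = [("oñ", "ɔ̃ː"), ("ooñ", "ɔ̃ː")]
new_order = [("ooñ", "ɔ̃ː"), ("oñ", "ɔ̃ː")]

# With the old order, "oñ" consumes the end of "ooñ" and leaves a stray "o".
print(apply_as_written(old_order, "booñ"))
# With the new order, the longer input "ooñ" is matched as a unit.
print(apply_as_written(new_order, "booñ"))
```

In this model the shorter rule, when listed first, leaves nothing for the longer rule to match, which is why the longer input has to come earlier in the file.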

Note: in TMD, I know "n,,in|an|en|un|on|ɑːn|æn|ɛn|eːn|ɪn|iːn|oːn|uːn" is
super ugly. I was trying to express that context_before is VOWEL + "n",
but I don't know how to do it. None of these work: "[VOWEL]n", "(VOWEL)n",
"VOWELn".
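
Until there is a cleaner way to write this, the long alternation could at least be generated rather than typed by hand. A small sketch, assuming the vowel inventory is exactly the one spelled out in the rule above. VOWELS and vowel_plus_n are made-up names, and the result would still have to be pasted into the CSV.

```python
# Vowel inventory copied from the alternation in the TMD rule (an assumption:
# the mapping's own VOWEL abbreviation may list a different set).
VOWELS = ["i", "a", "e", "u", "o", "ɑː", "æ", "ɛ", "eː", "ɪ", "iː", "oː", "uː"]


def vowel_plus_n(vowels):
    """Join every vowel followed by "n" into one |-separated alternation."""
    return "|".join(v + "n" for v in vowels)


# Prints the string to use in the context_before column.
print(vowel_plus_n(VOWELS))
```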
joanise authored and dhdaines committed Sep 25, 2023
1 parent 6625e8a commit fa27730
Showing 4 changed files with 27 additions and 16 deletions.
16 changes: 8 additions & 8 deletions g2p/mappings/langs/crg/crg-dv-to-crg-ipa.csv
@@ -1,27 +1,27 @@
g,ŋ,ñ
aeñ,ɛ̃ː
ay,aj
aa,ɑː
ae,æ
ee,eː
oñ,ɔ̃ː
ooñ,ɔ̃ː
oñ,ɔ̃ː
hp,ʰp
ht,ʰt
hk,ʰk
sh,ʃ
zh,ʒ
ch,tʃ
hch,ʰtʃ
ch,tʃ
uu,uː
añ,ɑ̃ː
aañ,ɑ̃ː
añ,ɑ̃ː
iiñ,ĩː
ay,aj
aa,ɑː
ae,æ
ee,eː
oo,oː
ii,iː
i,ɪ
o,o
u,ʊ
y,j
j,dʒ
e,ɛ
e,ɛ
12 changes: 6 additions & 6 deletions g2p/mappings/langs/crg/crg-tmd-to-crg-ipa.csv
@@ -1,6 +1,9 @@
g,ŋ,n
nn,n,VOWEL
n,,in|an|en|un|on|ɑːn|æn|ɛn|eːn|ɪn|iːn|oːn|uːn
aen,ɛ̃ː
awn,ɑ̃ː
een,ĩː
oun,ɔ̃
oow,oaw
ow,aw
uy,aj
@@ -10,18 +13,15 @@ ae,æ
ee,iː
ay,eː
oo,uː
awn,ɑ̃ː
in,ĩ
een,ĩː
oun,ɔ̃
hp,ʰp
ht,ʰt
hk,ʰk
sh,ʃ
zh,ʒ
ch,tʃ
hch,ʰtʃ
ch,tʃ
e,ɛ
i,ɪ
j,dʒ
y,j
y,j
4 changes: 3 additions & 1 deletion g2p/mappings/langs/generated/crg-ipa_to_eng-ipa.json
@@ -14,7 +14,9 @@
{"in": "ɑ̃ː", "out": "ɑ̃", "context_before": "", "context_after": ""},
{"in": "", "out": "", "context_before": "", "context_after": ""},
{"in": "ĩː", "out": "", "context_before": "", "context_after": ""},
{"in": "ɔ̃ː", "out": "ɔ̃", "context_before": "", "context_after": ""},
{"in": "ɔ̃", "out": "ɔ̃", "context_before": "", "context_after": ""},
{"in": "o", "out": "ɔ", "context_before": "", "context_after": ""},
{"in": "ʰp", "out": "p", "context_before": "", "context_after": ""},
{"in": "ʰt", "out": "t", "context_before": "", "context_after": ""},
{"in": "ʰk", "out": "k", "context_before": "", "context_after": ""},
@@ -53,4 +55,4 @@
{"in": "ɪ", "out": "ɪ", "context_before": "", "context_after": ""},
{"in": "", "out": "", "context_before": "", "context_after": ""},
{"in": "j", "out": "j", "context_before": "", "context_after": ""}
]
]
11 changes: 10 additions & 1 deletion g2p/tests/public/data/crg.psv
@@ -1,7 +1,16 @@
crg-tmd|crg-ipa|kishchaymikawshoow|kɪʃtʃeːmɪkɑːʃoaw
crg-tmd|crg-ipa|smenn|smɛn
crg-tmd|crg-ipa|dayistaen|deːɪstɛ̃ː
crg-tmd|crg-ipa|baenn|bɛ̃ː
crg-dv|crg-ipa|kishcheemikaashoaw|kɪʃtʃeːmɪkɑːʃoaw
crg-dv|crg-ipa|deeistaeñ|deːɪstɛ̃ː
crg-dv|crg-ipa|lañg|lɑ̃ːŋ
crg-dv|crg-ipa|eede|eːdɛ
crg-dv|crg-ipa|eede|eːdɛ
crg-dv|crg-ipa|Booñ|bɔ̃ː
crg-dv|crg-ipa|Not|not
crg-dv|crg-ipa|mooñd|mɔ̃ːd
crg-dv|crg-ipa|maañzhii|mɑ̃ːʒiː
crg-dv|crg-ipa|Aeñ|ɛ̃ː
crg-dv|eng-arpabet|Booñ|B AO N
crg-dv|eng-arpabet|Not|N AO T
crg-dv|eng-arpabet|maañzhii|M AA N ZH IY
