incorrect pinyin: 卡 > qiǎ instead of 卡 > kǎ #25

Open
glowinthedark opened this issue Apr 9, 2020 · 2 comments


@glowinthedark

import pinyin

pinyin.get('这里不收信用卡。', delimiter=" ")
# Out[3]: 'zhè lǐ bù shōu xìn yòng qiǎ 。'

Expected output:

 'zhè lǐ bù shōu xìn yòng kǎ 。'
@casserlyprogramming

That is because https://github.com/lxyu/pinyin/blob/master/pinyin/pinyin.py#L19-L20 only takes the first option, and the entry

5361 QIA3 KA3

has two. In all cases with multiple options, what is the correct way to handle that? Should we return all possible sentences, or are there rules in Mandarin that dictate which pronunciation is correct? (My Mandarin is HSK 3 at best, so I don't know the answer to the last question.)
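
To make the behavior described above concrete, here is a minimal sketch. It assumes the data file maps a hexadecimal code point to space-separated readings, as in the `5361 QIA3 KA3` line. `parse_line` and `first_reading` are hypothetical helpers written for illustration; they are not the library's actual code.

```python
def parse_line(line):
    """Split a data line like '5361 QIA3 KA3' into (character, readings)."""
    code, *readings = line.split()
    # The first field is assumed to be a hex Unicode code point.
    return chr(int(code, 16)), readings


def first_reading(readings):
    # Mimics "only takes the first option": any later reading is dropped.
    return readings[0]


char, readings = parse_line("5361 QIA3 KA3")
print(char, readings, first_reading(readings))
```

With this entry, KA3 is never reachable because QIA3 happens to be listed first.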

@ri-aje

ri-aje commented Jul 21, 2022

> That is because https://github.com/lxyu/pinyin/blob/master/pinyin/pinyin.py#L19-L20 only takes the first option and
>
> 5361 QIA3 KA3
>
> has two. In all cases with multiple options what is the correct way to handle that? Should we return all possible sentences or are there rules in mandarin to dictate which pronunciation is correct? (My Mandarin is hsk3 at best so I don't know the answer to the last question).

Then essentially every character with multiple readings is reduced to the first reading listed, which is not necessarily the most common one; e.g., 5361 (卡) is pronounced KA3 more often than QIA3. Unless the code can do some semantic analysis to work out the correct reading from context, I would suggest having pinyin.get return a list or generator that enumerates all possible pronunciations, so that at least the caller gets a chance to decide which one to use. As a bonus, this would also cover cases where there isn't enough context to determine the correct reading; e.g., pinyin.get('卡') should return all possible readings, since there is no context to decide which one wins.
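
A rough sketch of what such an enumerating variant could look like, written as a standalone function rather than a change to pinyin.get. The `READINGS` table and the `candidates` function are hypothetical; only the 卡 entry is taken from the data line quoted above, and characters missing from the table are passed through unchanged.

```python
from itertools import product

# Hypothetical lookup table; a real one would be built from the data file.
READINGS = {"卡": ["qiǎ", "kǎ"]}


def candidates(text, delimiter=" "):
    """Yield every combination of readings for the characters in text."""
    # Characters without an entry pass through as a single "reading".
    options = [READINGS.get(ch, [ch]) for ch in text]
    for combo in product(*options):
        yield delimiter.join(combo)


print(list(candidates("卡")))
```

One caveat with this design: the number of candidates is the product of the reading counts per character, so long sentences with several multi-reading characters can produce many results. A generator at least lets the caller stop early.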
