Crawler for Cantonese pronunciation data on LSHK Jyutping Word List (香港語言學學會粵拼詞表)
See sanitized.txt for the final result.
lshk.py
: The crawler
result.txt
: Raw result output by the crawler
sanitize.py
: Sanitizer for the result
sanitized.txt
: Final result output by the sanitizer
sanitize_log.txt
: Sanitize log
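This README does not describe what the sanitizing step checks. The sketch below is a rough illustration of one plausible check: whether a space-separated Jyutping string is well-formed. It is hypothetical and is not taken from `sanitize.py`. It assumes each syllable is one or more lowercase Latin letters followed by a single tone digit from 1 to 6. The names `is_valid_jyutping` and `split_tone` are made up for this example.

```python
import re

# Hypothetical check, not taken from sanitize.py: a Jyutping syllable is
# assumed to be one or more lowercase letters followed by a tone digit 1-6.
SYLLABLE = re.compile(r"[a-z]+[1-6]")


def is_valid_jyutping(text: str) -> bool:
    """Return True if the string is non-empty and every space-separated
    token matches the assumed syllable pattern."""
    tokens = text.split()
    return bool(tokens) and all(SYLLABLE.fullmatch(t) for t in tokens)


def split_tone(syllable: str) -> tuple[str, int]:
    """Split a syllable such as 'gwong2' into its letters and tone number."""
    if not SYLLABLE.fullmatch(syllable):
        raise ValueError(f"not a Jyutping syllable: {syllable!r}")
    return syllable[:-1], int(syllable[-1])


if __name__ == "__main__":
    print(is_valid_jyutping("gwong2 dung1 waa2"))  # True
    print(is_valid_jyutping("gwong dung1"))        # False (missing tone)
    print(split_tone("jyut6"))                     # ('jyut', 6)
```

The pattern is deliberately loose. It accepts any letter sequence before the tone digit, so it flags missing or out-of-range tones but does not check that the initial and final form a real Jyutping syllable.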
According to the original terms, the dictionary data is distributed under CC BY 4.0.
Python code in this repository is distributed under the MIT License.
The link to the word list is now broken. For a more up-to-date word list, see rime/rime-cantonese.