This repository has been archived by the owner on Apr 27, 2021. It is now read-only.

lshk-word-list-crawler

Crawler for Cantonese pronunciation data from the LSHK Jyutping Word List (香港語言學學會粵拼詞表)

See sanitized.txt for the final result.

File structure

  • lshk.py: The crawler
  • result.txt: Raw result output by the crawler
  • sanitize.py: Sanitizer for the result
  • sanitized.txt: Final result output by the sanitizer
  • sanitize_log.txt: Sanitize log
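
As a rough illustration of the crawl-then-sanitize flow listed above, the following is a minimal sketch of what a sanitizing pass could look like. It is not the code in sanitize.py. The `word<TAB>reading` line format, the function names, and the simplified syllable rule (lowercase letters followed by a tone digit 1 to 6) are all assumptions made for this example; the actual format of result.txt is not described here.

```python
import re

# Assumption: a Jyutping syllable is lowercase ASCII letters plus one tone digit 1-6.
# This is a loose plausibility check, not a full Jyutping phonotactics validator.
SYLLABLE = re.compile(r"[a-z]+[1-6]")


def is_plausible_jyutping(reading: str) -> bool:
    """Return True if every space-separated syllable matches the loose pattern."""
    syllables = reading.split()
    return bool(syllables) and all(SYLLABLE.fullmatch(s) for s in syllables)


def sanitize(lines):
    """Split hypothetical 'word<TAB>reading' lines into kept entries and log messages."""
    kept, log = [], []
    for number, line in enumerate(lines, start=1):
        parts = line.rstrip("\n").split("\t")
        if len(parts) == 2 and parts[0] and is_plausible_jyutping(parts[1]):
            kept.append((parts[0], parts[1].strip()))
        else:
            # Rejected lines are recorded rather than silently dropped.
            log.append(f"line {number}: skipped {line!r}")
    return kept, log


if __name__ == "__main__":
    raw = ["香港\thoeng1 gong2\n", "bad line\n", "粵拼\tjyut6 ping3\n", "錯\tco7\n"]
    entries, messages = sanitize(raw)
    for word, reading in entries:
        print(word, reading)
    for message in messages:
        print(message)
```

Keeping rejected lines in a separate log mirrors the split between sanitized.txt and sanitize_log.txt, so discarded entries can be reviewed by hand.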

License

According to the original terms, the dictionary data is distributed under CC BY 4.0.

Python code in this repository is distributed under the MIT license.

Disclaimer

The link to the word list is now broken. If you are interested in a more up-to-date word list, see rime/rime-cantonese.