Audio-cmn aims to provide high-quality, easy-to-use audio recordings of Chinese words for modern web and mobile applications. Audio-cmn is at once:
- an original work, through the recording of Chinese syllables;
- a curation work, reusing pre-existing audio files from SWAC Recorder;
- a post-processing work, providing light, optimised `.mp3` files rather than the huge original `.flac` files. These audio files are thus suitable for mobile application development.
| # of items | Naming | Set | Authorship |
|---|---|---|---|
| 1707 | `cmn-zi4.mp3` | Syllables, v.2 | Chen Wang, CC-by-sa |
| 8,000+ | `cmn-名字.mp3` | HSK 2000 list (words and characters) | Yue Tan, CC-by-sa |
Type of data:
- `.../syllabs/cmn-{tonedPinyin}.mp3`: 1707 Chinese syllables (all of them)
- `.../hsk/cmn-{hanzi}.mp3`: 8,596 HSK 2000 words and characters
- `/96k/`: best audio quality; the improvement over 64k is not perceptible.
  - no `syllabs` folder
  - `/96k/hsk/`
- `/64k/`: optimal audio quality for voice recordings.
  - `/64k/syllabs/`
  - `/64k/hsk/`
- `/24k-abr/`: brutally optimized: ~2 times lighter than `/64k/`, for 80% of the audio quality.
  - `/24k-abr/syllabs/`
  - `/24k-abr/hsk/`
- `/18k-abr/`: brutally optimized: ~3 times lighter than `/64k/`, for 60% of the audio quality.
  - `/18k-abr/syllabs/`
  - `/18k-abr/hsk/`
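To illustrate the naming scheme, here is a minimal bash sketch that builds the relative path of one file from a bitrate folder, a set and a key (toned pinyin for `syllabs`, hanzi for `hsk`). The function name `audio_path` is hypothetical, and the sketch assumes the bitrate folders sit at the repository root exactly as listed above.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): print the relative path
# of one audio file.
#   $1 = bitrate folder: 96k, 64k, 24k-abr or 18k-abr
#   $2 = set: syllabs or hsk
#   $3 = key: toned pinyin (e.g. zi4) for syllabs, hanzi (e.g. 名字) for hsk
audio_path() {
  local bitrate="$1" kind="$2" key="$3"
  case "$bitrate" in
    96k|64k|24k-abr|18k-abr) ;;
    *) echo "unknown bitrate folder: $bitrate" >&2; return 1 ;;
  esac
  case "$kind" in
    syllabs|hsk) ;;
    *) echo "unknown set: $kind" >&2; return 1 ;;
  esac
  # The 96k folder ships without a syllabs subfolder.
  if [ "$bitrate" = "96k" ] && [ "$kind" = "syllabs" ]; then
    echo "no syllabs folder in /96k/" >&2
    return 1
  fi
  printf '%s/%s/cmn-%s.mp3\n' "$bitrate" "$kind" "$key"
}
```

For example, `audio_path 64k syllabs zi4` prints `64k/syllabs/cmn-zi4.mp3`, and `audio_path 18k-abr hsk 名字` prints `18k-abr/hsk/cmn-名字.mp3`. Rejecting `96k` plus `syllabs` up front avoids generating paths to files that do not exist.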
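Dependencies and source download:

```bash
sudo apt install libav-tools  # provides avconv; on recent Debian/Ubuntu releases, install ffmpeg instead
sudo apt install lame
curl -L -C - 'http://download.shtooka.net/cmn-caen-tan_flac.tar' -o ./cmn-caen-tan_flac.tar
# Extract the .flac files (if unrar rejects the archive, try: tar -xf ./cmn-caen-tan_flac.tar)
unrar e -o- './cmn-caen-tan_flac.tar' # '*.flac' ./flac/
```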
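The project's actual conversion commands are shared in compress-raw.bash, which is not reproduced here. As a rough, hypothetical sketch only, the step from `.flac` to the four output folders could map each folder to a LAME encoding mode: constant bitrate (`-b`) for `96k` and `64k`, and average bitrate (`--abr`) for the two `-abr` folders, as their names suggest. The functions `lame_flags` and `convert_one`, the CBR choice for `96k`/`64k`, and the decode step through `avconv` are all assumptions, not the project's script.

```bash
#!/usr/bin/env bash
# Hypothetical mapping from output folder to LAME options. This is an
# assumption inferred from the folder names, not taken from compress-raw.bash.
lame_flags() {
  case "$1" in
    96k)     echo "-b 96" ;;     # constant bitrate, 96 kbit/s
    64k)     echo "-b 64" ;;     # constant bitrate, 64 kbit/s
    24k-abr) echo "--abr 24" ;;  # average bitrate, ~24 kbit/s
    18k-abr) echo "--abr 18" ;;  # average bitrate, ~18 kbit/s
    *) echo "unknown folder: $1" >&2; return 1 ;;
  esac
}

# Hypothetical: convert one FLAC file by decoding it to a temporary WAV
# with avconv (swap in ffmpeg if libav-tools is unavailable; the options
# used here are the same), then encoding it with lame.
# Usage: convert_one 64k ./flac/cmn-zi4.flac ./64k/syllabs/cmn-zi4.mp3
convert_one() {
  local folder="$1" src="$2" dst="$3" flags tmp
  flags=$(lame_flags "$folder") || return 1
  tmp=$(mktemp --suffix=.wav) || return 1   # GNU mktemp
  avconv -y -loglevel error -i "$src" "$tmp" &&
    lame --quiet $flags "$tmp" "$dst"       # $flags unquoted on purpose: two words
  local status=$?
  rm -f "$tmp"
  return "$status"
}
```

Keeping the folder-to-flags mapping in one function means a single place to adjust if the real bitrates or modes differ from these guesses.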
The current HSK audio database was built upon the official HSK 2000 word list, published in 2000. The HSK 2000 is thus nearly fully covered (at least 8,596 out of ~8,800 items). A comparison with the more recent HSK 2012 word list is available and can be run via:
```bash
bash ./hsk-missing-audios.bash HSK2012_all.txt  # list missing audios, compared with the input list of words
bash ./missing-audios.bash --help               # tiny manual
```
Current difference: 582 HSK 2012 words lack human audio recordings. See the files in ./lists/.
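The comparison scripts are in the repository. Their core idea can be sketched as follows, assuming one word per line in the input list and the `cmn-{hanzi}.mp3` naming; the function `list_missing` and its arguments are hypothetical, and this is not the code of hsk-missing-audios.bash.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: print every word of a list that has no audio file.
#   $1 = word list (one word per line, UTF-8)
#   $2 = audio folder, e.g. 64k/hsk
list_missing() {
  local list="$1" dir="$2" word
  # "|| [ -n "$word" ]" also processes a last line without a trailing newline.
  while IFS= read -r word || [ -n "$word" ]; do
    word="${word%$'\r'}"          # tolerate Windows line endings
    [ -z "$word" ] && continue    # skip blank lines
    [ -f "$dir/cmn-$word.mp3" ] || printf '%s\n' "$word"
  done < "$list"
}
```

For example, `list_missing HSK2012_all.txt 64k/hsk | wc -l` would count the words of the list that have no recording in that folder.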
Credits:
- Speakers: see the table above
- Hugo Lopez, PLIDAM, INALCO: project management, repository, audio compression, file renaming
- Nicolas Vion: recording software and technical support
Changelog:
- v.0.1.0: clean up data by deleting the `cmn-*5.ext` items, since they were copies of `cmn-*1.mp3`.
- v.0.1.1: add ./18k-abr (<40MB), an optimized version of ./64kb with understandable sound quality.
- v.0.1.2: improve README.md; add ./lists/ and a script for comparison with the HSK 2012 list.
- v.2.0.0: [BREAKING CHANGE] merge the former /hskzi/ and /hsk/ folders back together. [Others]: fix a critical bug affecting some audio files; add the 24k and 96k versions; share the conversion commands via compress-raw.bash.
License:
- CC-by-sa. See the table above for authors.