Vietnamese text normalization
Vu Anh edited this page Jun 14, 2022
·
11 revisions
Use NFC normalization
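As a minimal sketch of what NFC normalization does, the snippet below uses Python's standard `unicodedata` module to compose a base letter and a combining accent into a single precomposed code point. The helper name `to_nfc` is hypothetical and only wraps the standard call.

```python
import unicodedata


def to_nfc(text: str) -> str:
    """Return text in Unicode Normalization Form C (canonical composition)."""
    return unicodedata.normalize("NFC", text)


# "ò" written as 'o' + U+0300 COMBINING GRAVE ACCENT (decomposed form).
decomposed = "ho\u0300a"
composed = to_nfc(decomposed)

print(len(decomposed), len(composed))  # 4 3
print(decomposed == composed)          # False: same glyphs, different code points
print(composed == "h\u00f2a")          # True: 'ò' is now the single code point U+00F2
```

Two strings that render identically can compare unequal until both are brought to the same form, so normalizing once at input time keeps later comparisons, dictionary lookups, and tokenization consistent.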
Diacritics (accent marks) normalization
- hoà -> hòa
- uý -> úy
- UPPER CASE
- lower case
- Title Case
- Short i vs. long y: đột quị -> đột quỵ
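The examples above can be sketched in Python under some loud assumptions. The function name `normalize_syllables` and both regular expressions are hypothetical and are not taken from any of the listed tools. The tone-mark rule covers only the open clusters oa, oe, and uy (skipping uy after q, as in quý). The i-to-y rule is an assumed generalization of the single đột quị example to syllables ending in "qu" + i. A real normalizer needs a fuller rule set.

```python
import re
import unicodedata

# Combining tone marks: grave, acute, tilde, hook above, dot below.
_TONES = "\u0300\u0301\u0303\u0309\u0323"
# The match must end the syllable: no letter or combining mark may follow.
_END = r"(?![\w\u0300-\u036f])"

# Tone mark sitting on the second vowel of a syllable-final oa / oe / uy.
_MISPLACED_TONE = re.compile(
    r"(?<!q)(o[ae]|uy)([" + _TONES + r"])" + _END, re.IGNORECASE
)
# Assumed rule: syllable-final 'i' after 'qu' is rewritten as 'y'.
_QU_I = re.compile(r"(?<=qu)(i)([" + _TONES + r"]?)" + _END, re.IGNORECASE)


def normalize_syllables(text: str) -> str:
    # Work on decomposed text so each tone mark is a separate code point.
    s = unicodedata.normalize("NFD", text)
    # Move the tone mark from the second vowel to the first.
    s = _MISPLACED_TONE.sub(
        lambda m: m.group(1)[0] + m.group(2) + m.group(1)[1], s
    )
    # Swap i for y, preserving the letter's case and its tone mark.
    s = _QU_I.sub(
        lambda m: ("Y" if m.group(1).isupper() else "y") + m.group(2), s
    )
    # Recompose so the result is in NFC.
    return unicodedata.normalize("NFC", s)


print(normalize_syllables("hoà"))      # hòa
print(normalize_syllables("uý"))       # úy
print(normalize_syllables("HOÀ"))      # HÒA
print(normalize_syllables("đột quị"))  # đột quỵ
print(normalize_syllables("hoàn"))     # hoàn (closed syllable, left unchanged)
```

Because the rules run on decomposed text with case-insensitive patterns, the same two substitutions handle UPPER CASE, lower case, and Title Case input without separate lookup tables.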
- https://nguyenvanhieu.vn/xu-ly-tieng-viet-trong-python/
- https://gist.github.com/enamoria/e11edd8ec32863e2d83652f120c450c6
- https://vi.wikipedia.org/wiki/Quy_t%E1%BA%AFc_%C4%91%E1%BA%B7t_d%E1%BA%A5u_thanh_trong_ch%E1%BB%AF_qu%E1%BB%91c_ng%E1%BB%AF
- https://rabiloo.com/vi/blog/chuan-hoa-tieng-viet-trong-xu-ly-ngon-ngu-tu-nhien
- https://github.com/langmaninternet/VietnameseTextNormalizer
- https://unicode-table.com/en/