Store only secondary weight in diacritic table and remove jamo tailoring bit #1978
…without ignoring default ignorables
Default ignorables are not ignored, because doing so would violate the normalizer's fundamental assumption that every input character produces non-empty output. The expectation is that real NFKC_CaseFold will be implemented by first filtering out default ignorables and then plugging the NFKD_CaseFold data into the upcoming `ComposingNormalizer` code that will turn NFD into NFC and NFKD into NFKC.
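The filter-first ordering described above can be sketched roughly as follows. This is only an illustration, not the ICU4X implementation: `is_default_ignorable_subset` and `filter_then_map` are hypothetical names, the ignorable set is a deliberately incomplete subset of Default_Ignorable_Code_Point, and the per-character mapping is passed in as a closure standing in for the NFKD_CaseFold data. The composing step is left out entirely.

```rust
/// Hypothetical helper: a deliberately incomplete subset of the
/// Default_Ignorable_Code_Point property, for illustration only.
fn is_default_ignorable_subset(c: char) -> bool {
    matches!(
        c,
        '\u{00AD}'                  // SOFT HYPHEN
        | '\u{034F}'                // COMBINING GRAPHEME JOINER
        | '\u{200B}'..='\u{200F}'   // ZWSP, ZWNJ, ZWJ, LRM, RLM
        | '\u{2060}'                // WORD JOINER
        | '\u{FE00}'..='\u{FE0F}'   // variation selectors 1-16
        | '\u{FEFF}'                // ZERO WIDTH NO-BREAK SPACE
    )
}

/// Hypothetical pipeline: drop ignorables first, so that the mapping
/// stage only ever sees characters for which it yields output.
pub fn filter_then_map<I, F, O>(input: I, map_one: F) -> String
where
    I: Iterator<Item = char>,
    F: FnMut(char) -> O,
    O: IntoIterator<Item = char>,
{
    input
        .filter(|c| !is_default_ignorable_subset(*c))
        .flat_map(map_one)
        .collect()
}
```

For example, calling `filter_then_map("A\u{00AD}B".chars(), |c| c.to_lowercase())` drops the soft hyphen before the mapping runs. Lowercasing is used here only as a stand-in mapping; it is not NFKD_CaseFold.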
Saves 7332 bytes in data size.
…and allow dynamic further shortening in tailorings
This makes the action of turning a value read from the table into a `CollationElement` super-simple (and branchless).
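As a rough sketch of why storing only the secondary weight makes the conversion branchless: assuming an ICU-style 64-bit collation element (primary weight in the top 32 bits, secondary in the next 16, tertiary in the low 16), a diacritic's element can be assembled with one shift and one OR. The type and function names, the `COMMON_TERTIARY` value, the `DIACRITICS_BASE` of U+0300, and the sample weights in the usage note are assumptions made for illustration, not taken from the PR.

```rust
/// Hypothetical 64-bit collation element: primary in bits 63..32,
/// secondary in bits 31..16, tertiary in bits 15..0 (assumed layout).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct CollationElement(u64);

/// Assumed common tertiary weight.
const COMMON_TERTIARY: u64 = 0x0500;

/// Assumed first code point covered by the diacritic table.
const DIACRITICS_BASE: u32 = 0x0300;

impl CollationElement {
    /// Branchless: the primary weight is implicitly zero and the
    /// tertiary weight is a constant, so only the secondary varies.
    pub fn from_secondary(secondary: u16) -> Self {
        CollationElement((u64::from(secondary) << 16) | COMMON_TERTIARY)
    }

    pub fn primary(self) -> u32 {
        (self.0 >> 32) as u32
    }

    pub fn secondary(self) -> u16 {
        (self.0 >> 16) as u16
    }

    pub fn tertiary(self) -> u16 {
        self.0 as u16
    }
}

/// Looks up a diacritic in a table whose length may vary (for example,
/// a tailoring might ship a shorter table). Returns `None` when `c` is
/// outside the table, so the caller can fall back to the general path.
/// The range check here is the only branching; the conversion is not.
pub fn diacritic_ce(table: &[u16], c: char) -> Option<CollationElement> {
    let offset = (c as u32).checked_sub(DIACRITICS_BASE)? as usize;
    table.get(offset).copied().map(CollationElement::from_secondary)
}
```

With a two-entry table such as `[0x8A00, 0x8B00]` (made-up weights), `diacritic_ce(&table, '\u{0301}')` yields an element whose secondary weight is `0x8B00`, while `'\u{0302}'` falls outside the shortened table and returns `None`.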
(I marked this as a draft only because the PR also contains the changesets for #1967. Once that lands, the Files changed view here becomes more useful.)
The
Rerun seems to work
LGTM. Although the part of the PR title about jamo tailoring doesn't seem affected by the PR code.
Thanks. The jamo tailoring stuff was e.g. the removal of the bit getter.
…ring bit (unicode-org#1978)
* Store only secondary weight in the diacritic table, make it shorter, and allow dynamic further shortening in tailorings
  This makes the action of turning a value read from the table into a `CollationElement` super-simple (and branchless).
* Remove unused jamo tailoring bit from metadata
* Fix clippy lints
This both simplifies the case that the table is designed for and makes it possible to have non-self-contained CE32 diacritic tailorings.