Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Store only secondary weight in diacritic table and remove jamo tailoring bit #1978

Merged
merged 15 commits into from
Jun 6, 2022

Conversation

hsivonen
Copy link
Member

@hsivonen hsivonen commented Jun 2, 2022

This both simplifies the case that the table is designed for and makes it possible to have non-self-contained CE32 diacritic tailorings.

hsivonen added 13 commits May 30, 2022 15:30
…without ignoring default ignorables

Default ignorables are not ignored, because doing so would violate the fundamental assumption
of the normalizes that every input character produces non-empty output.

The expectation is that real NFKC_CaseFold will be implemented by first filtering out default ignorables
and then plugging the NFKD_CaseFold data into the upcoming `ComposingNormalizer` code that will turn
NFD into NFC and NFKD into NFKC.
Saves 7332 bytes in data size.
…and allow dynamic further shortening in tailorings

This makes the action of turning a value read from the table into a `CollationElement` super-simple (and branchless).
@hsivonen hsivonen added C-collator Component: Collation, normalization S-medium Size: Less than a week (larger bug fix or enhancement) labels Jun 2, 2022
@hsivonen hsivonen self-assigned this Jun 2, 2022
@hsivonen hsivonen requested a review from echeran June 2, 2022 11:39
@hsivonen hsivonen changed the title Store only secondary weight in diacritic table Store only secondary weight in diacritic table and remove jamo tailoring bit Jun 2, 2022
@hsivonen
Copy link
Member Author

hsivonen commented Jun 2, 2022

(I marked this as a draft only because the PR also contains the changesets for #1967. Once that lands, the Files changed view here becomes more useful.)

@hsivonen hsivonen marked this pull request as ready for review June 3, 2022 05:27
@hsivonen hsivonen requested review from a team, sffc, robertbastian and Manishearth as code owners June 3, 2022 05:27
@hsivonen
Copy link
Member Author

hsivonen commented Jun 3, 2022

The macos-latest failure is a networking failure when CI tries to install nightly Rust.

@robertbastian
Copy link
Member

Rerun seems to work

Copy link
Contributor

@echeran echeran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Although the part of the PR title about jamo tailoring doesn't seem affected by the PR code.

@hsivonen hsivonen merged commit 02b9a3f into unicode-org:main Jun 6, 2022
@hsivonen
Copy link
Member Author

hsivonen commented Jun 6, 2022

Thanks. The jamo tailoring stuff was e.g. the removal of the bit getter.

@hsivonen hsivonen deleted the diacritics branch June 6, 2022 16:57
yzhang1994 pushed a commit to yzhang1994/icu4x that referenced this pull request Jun 8, 2022
…ring bit (unicode-org#1978)

* Store only secondary weight in the diacritic table, make it shorter, and allow dynamic further shortening in tailorings

This makes the action of turning a value read from the table into a `CollationElement` super-simple (and branchless).

* Remove unused jamo tailoring bit from metadata

* Fix clippy lints
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-collator Component: Collation, normalization S-medium Size: Less than a week (larger bug fix or enhancement)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants