Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[browser][non-icu] Revert HybridGlobalization normalization. #87007

Merged
merged 4 commits into from
Jun 5, 2023

Conversation

ilonatommy
Copy link
Member

@ilonatommy ilonatommy commented Jun 1, 2023

Solving #86041 in hybrid version is too costly. If we wanted to apply all the rules from https://www.unicode.org/Public/idna/13.0.0/IdnaMappingTable.txt we would use more space than we are saving from removing normalization filter from ICU (~59kB).

IDN working branch: https://github.com/ilonatommy/runtime/tree/idn-mapping.
The implementation is based on enhanced invariant IDN mapping. The problem is that it does not throw when the input is incorrect. In Unicode13 we have ~6k test cases. For function GetAscii() the code produces ~3.3k cases of false truth when the output is correct and ~729 cases of false truth when the output is incorrect. To fix it, IDN mapping table is needed. Loading mapping table separately and maintaining them with changing Unicode version does not give us size reduction.

From this reason, this PR is reverting normalization changes. Normalization and IDN mapping use the same chunk of ICU data.
Partially reverts the changes from #85510 (refactoring changes stay unreverted).

@ilonatommy ilonatommy self-assigned this Jun 1, 2023
@ghost
Copy link

ghost commented Jun 1, 2023

Tagging subscribers to 'arch-wasm': @lewing
See info in area-owners.md if you want to be subscribed.

Issue Details

Solving #86041 in hybrid version is too costly. If we wanted to apply all the rules from https://www.unicode.org/Public/idna/13.0.0/IdnaMappingTable.txt we would use more space than saving from removing normalization filter from ICU gives us (~59kB).

IDN working branch: https://github.com/ilonatommy/runtime/tree/idn-mapping.
The implementation is based on enhanced invariant IDN mapping. The problem is that it does not throw when the input is incorrect. In Unicode13 we have ~6k test cases. For function GetAscii() the code produces ~3.3k cases of false truth when the result is correct and ~729 cases of false truth when the result is incorrect. To fix it, IDN mapping table is needed. Loading mapping table separately and maintaining them with changing Unicode version does not give us size reduction.

From this reason, this PR is reverting normalization changes. Normalization and IDN mapping uses the same chunk of ICU data.
Partially reverts the changes from #85510 (refactoring changes stay unreverted).

Author: ilonatommy
Assignees: ilonatommy
Labels:

arch-wasm, area-System.Globalization

Milestone: -

@ilonatommy
Copy link
Member Author

Failures are not connected.

src/mono/sample/wasm/browser-bench/String.cs Outdated Show resolved Hide resolved
@ilonatommy ilonatommy merged commit 081e558 into dotnet:main Jun 5, 2023
@ghost ghost locked as resolved and limited conversation to collaborators Jul 5, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants