ICU-22368 Reduce ~200K langInfo.res size by encode LSR into 32bits int. #2458
Force-pushed from d8327b6 to 80d4070.
Notice: the branch changed across the force-push! ~ Your Friendly Jira-GitHub PR Checker Bot
This change reduces the size of icudt73l.dat from 32206240 to 31999424 bytes, a saving of 206816 bytes.
Force-pushed from 80d4070 to 0f3fb19.
Notice: the branch changed across the force-push! ~ Your Friendly Jira-GitHub PR Checker Bot
Please review. Thanks.
@pedberg-icu @markusicu could you take a look at this, or should I ask someone else?
...to-icu/src/main/java/org/unicode/icu/tool/cldrtoicu/localedistance/LocaleDistanceMapper.java
Outdated
// Do not have enough bits to store the all 1000 possible combination of \d{3}
// Only support what is needed in resource- "001", "143" and "419".
if (region.length() == 3) {
    assert region.equals("001") || region.equals("143") || region.equals("419");
See my comments about this approach in LocaleDistanceMapper.java. It is too fragile and inflexible. I think LocaleDistanceMapper.java should write an array of supported numeric region codes that this method can use for encoding.
Will do. Changing the code now. Stay tuned.
Done, PTAL.
Does not seem like this code got updated to use the m49 array?
Oops, you are right. I need to change it.
Needs a better way of handling numeric region codes, see comments (does not even support all currently used codes)
Force-pushed from 0f3fb19 to c010bc9.
Notice: the branch changed across the force-push! ~ Your Friendly Jira-GitHub PR Checker Bot
Peter, I modified the PR so that the CLDR-to-ICU tool now writes the M49 codes into the resources, and the decoding uses that array.
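For readers following the thread, here is a minimal sketch of the general idea under discussion: packing a language/script/region triple into one 32-bit int, with numeric region codes resolved through an array of supported M49 codes instead of hard-coded assertions. It is not the code from this PR. The class name `LsrPacker`, the bit layout (script index in the top 8 bits, language and region in base 27 in the low 24 bits), and the contents of the `SCRIPTS` and `M49` tables are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch, not ICU's implementation: packs (language, script, region) into an int. */
final class LsrPacker {
    // Assumed lookup tables; in a real build a data tool would generate them.
    static final List<String> SCRIPTS = Arrays.asList("", "Latn", "Cyrl", "Hans", "Hant", "Arab");
    static final List<String> M49 = Arrays.asList("001", "143", "419");

    private static final int BASE = 27;                       // digit 0 = no letter, 1..26 = letters
    private static final int LANG_SPAN = BASE * BASE * BASE;  // 19683 possible language values
    private static final int MIN_ALPHA_REGION = BASE + 1;     // "AA" encodes to 28

    // Encodes letters as base-27 digits, first letter in the lowest digit.
    static int encodeLetters(String s, char first) {
        int value = 0;
        for (int i = s.length() - 1; i >= 0; i--) {
            value = value * BASE + (s.charAt(i) - first + 1);
        }
        return value;
    }

    static String decodeLetters(int value, char first) {
        StringBuilder sb = new StringBuilder();
        while (value > 0) {
            sb.append((char) (first + value % BASE - 1));
            value /= BASE;
        }
        return sb.toString();
    }

    static int encodeRegion(String region) {
        if (region.isEmpty()) return 0;
        if (Character.isDigit(region.charAt(0))) {
            int index = M49.indexOf(region);
            if (index < 0 || index + 1 >= MIN_ALPHA_REGION) {
                throw new IllegalArgumentException("unsupported numeric region: " + region);
            }
            return index + 1; // 1..27 cannot collide with two-letter codes (>= 28)
        }
        return encodeLetters(region, 'A');
    }

    static String decodeRegion(int value) {
        if (value == 0) return "";
        return value < MIN_ALPHA_REGION ? M49.get(value - 1) : decodeLetters(value, 'A');
    }

    static int encode(String language, String script, String region) {
        if (language.length() > 3) throw new IllegalArgumentException("language too long");
        int scriptIndex = SCRIPTS.indexOf(script);
        if (scriptIndex < 0) throw new IllegalArgumentException("unknown script: " + script);
        // Largest low value is 19682 + 19683 * 728 = 14348906, which fits in 24 bits.
        int low = encodeLetters(language, 'a') + LANG_SPAN * encodeRegion(region);
        return (scriptIndex << 24) | low;
    }

    /** Returns {language, script, region}. */
    static String[] decode(int packed) {
        int low = packed & 0xFFFFFF;
        return new String[] {
            decodeLetters(low % LANG_SPAN, 'a'),
            SCRIPTS.get(packed >>> 24),
            decodeRegion(low / LANG_SPAN)
        };
    }

    public static void main(String[] args) {
        int packed = encode("es", "Latn", "419");
        System.out.println(Arrays.toString(decode(packed))); // prints [es, Latn, 419]
    }
}
```

In this sketch, supporting another numeric region only requires appending it to the `M49` table, and an unsupported code fails loudly at encoding time rather than relying on an `assert`.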
...to-icu/src/main/java/org/unicode/icu/tool/cldrtoicu/localedistance/LocaleDistanceMapper.java
Oh, I like this much better, thanks! But it looks like LSR.java is still not using the new M49 array...
It is not intended to support all codes used everywhere, only those encoded in the values of likelySubtag.xml.
Yes, sorry, I missed that. Will fix.
Looks good now, thanks for this!
Force-pushed from 2df23e4 to 4578999.
Hooray! The files in the branch are the same across the force-push. 😃 ~ Your Friendly Jira-GitHub PR Checker Bot
Checklist