-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Further optimize storage of icu locale data #45643
Conversation
Tagging subscribers to this area: @tarekgh, @safern, @krwq Issue DetailsShrink disk usage from 35kB to 16kB; skip .cctor initialization by switching to
|
0x4a << 16 | 2926 << 4 | 2, // te | ||
0x4b << 16 | 1824 << 4 | 2, // kn | ||
0x4c << 16 | 2061 << 4 | 2, // ml | ||
0x4d << 16 | 183 << 4 | 2, // as | ||
0x4e << 16 | 2110 << 4 | 2, // mr | ||
0x4f << 16 | 2540 << 4 | 2, // sa | ||
0x50 << 16 | 2066 << 4 | 2, // mn | ||
0x51 << 16 | 279 << 4 | 2, // bo | ||
0x52 << 16 | 392 << 4 | 2, // cy | ||
0x53 << 16 | 1819 << 4 | 2, // km | ||
0x54 << 16 | 1966 << 4 | 2, // lo | ||
0x55 << 16 | 2141 << 4 | 2, // my | ||
0x56 << 16 | 1511 << 4 | 2, // gl | ||
0x57 << 16 | 1839 << 4 | 3, // kok | ||
0x58 << 16 | 2098 << 4 | 3, // mni | ||
0x59 << 16 | 2563 << 4 | 2, // sd | ||
0x5a << 16 | 2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change to deal with the sort order prefix; they are grouped into sections (same order as before); however the sort is used for the lookup e.g. LCID lookup of de-de
vs de-de_phoneb
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LCIDToLocaleName
method; GitHub doesn't seem to be highlighting the correct lines 😢
Might be a bug :) |
src/libraries/System.Globalization/tests/CultureInfo/CultureInfoAll.cs
Outdated
Show resolved
Hide resolved
cc @lewing |
Use ReadOnlySpan<byte> to avoid initalizing a 1.7kb ushort[] in IcuLocaleData.cctor
From 5184 bytes to 2592 bytes (string to ReadOnlySpan<byte>)
Shrink from 30kb to 13.5kb and use ReadOnlySpan<byte> to avoid initalizing a large ushort[] in IcuLocaleData.cctor
Console.WriteLine("};"); | ||
} | ||
*/ | ||
|
||
private static ReadOnlySpan<byte> CultureNames => new byte[] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add comment where to find the generator
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If #45699 is some form is merged; could change it to an actual source generator from input files? 🤔
@@ -3814,8 +3781,8 @@ internal static int GetLocaleDataNumericPart(string cultureName, IcuLocaleDataPa | |||
return null; | |||
} | |||
|
|||
Debug.Assert(CulturesCount == (c_threeLetterWindowsLanguageName.Length / 3)); | |||
return c_threeLetterWindowsLanguageName.Substring(index * 3, 3); | |||
Debug.Assert(CulturesCount == (ThreeLetterWindowsLanguageName.Length / 3)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is redundant, compiler does the check in array initializer
|
||
// Format of the data is LCDI | index to CultureNames | culture name length | ||
private static readonly int[] s_lcidToCultureNameIndices = new int[] | ||
private static ReadOnlySpan<byte> NameIndexToNumericData => new byte[CulturesCount * NumericLocaleDataBytesPerRow] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'd be better to have more than just 1 huge table but grouped the data into logical sections so illinker can remove unused data
// every 3-characters entry is matching locale name entry in CultureNames | ||
private static ReadOnlySpan<byte> ThreeLetterWindowsLanguageName => new byte[CulturesCount * 3] | ||
{ | ||
(byte)'Z', (byte)'Z', (byte)'Z', // aa |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There seems to be a lot of duplicates it might be worth trying to index them as well
@benaadams is there any changes you need to do here? or @marek-safar comments can be addressed in other PR and we can merge this one? |
Is good from me; cna do it in follow up |
ok, merging this one and any further comments can be addressed in follow up PR's. |
Shrink disk usage from 35kB to 16kB; skip .cctor initialization by switching to
ReadOnlySpan<byte>
rather thanushort[]
andint[]
string
toReadOnlySpan<byte>
)NameIndexToNumericData
30kb to 13.5kb; by not using anint
for every column and only storing the needed bytes.Previous startup use:
No .cctor allocations in new path
Contributes to #45129
/cc @marek-safar @eerhardt