Complete the set of DateTimeFormat options #272
I changed you to assignee, @gregtatum :) Here are some of my thoughts on skeletons:

Predefined Common Skeletons

I would like to see fastpaths for the most common skeletons. Here is a good starting point from the Google Closure i18n library: https://github.com/google/closure-library/blob/master/closure/goog/i18n/datetimepatterns.js
I would also add month-day (date with no year/era), as I've seen that as a common feature request. What do I mean by fastpath? I think we can precompile these patterns as separate data provider keys. For example, "dates/fulldatepattern@1" or maybe "datepatterns/fulldate@1" will return the specified pattern directly, without having to resolve skeletons at runtime. I think this is important because:
* DTPG refers to DateTimePatternGenerator, the monolith of code in ICU (and soon ICU4X) that resolves datetime skeletons to datetime patterns.

ECMA-402 Style Skeletons

ICU4C/J use strings to represent skeletons. ECMA-402 uses option bags instead. I think the ECMA-402 approach is superior, especially for Rust, in large part because we can do more logic to figure out what data we might need. We can't resolve skeletons to patterns at compile time since they are locale-dependent, but we can at least tell what symbols* the skeleton might require. If the skeleton only requests date fields, then we don't need to include time symbols, for example. We might want to supplement the ECMA-402 bag-based skeleton with a proc-macro that compiles the string-based skeleton to a bag.

* See #257 for an explanation of symbols data.

Spec Compliance

The UTS 35 spec for skeleton resolution is currently in a little bit of flux. The biggest outstanding issue is CLDR-13627, but there are others you can find on Jira.

Do as much at compile time as possible

The ICU code for skeleton resolution (DTPG) is a bit of a mess, and it would be nice if we can clean it up. Explore ways to do as much logic at build time (in CldrJsonDataProvider) as possible. Make the data provider give you the most useful form of data possible, such that the code we ship in actual DateTimeFormat should be as simple and clean as possible.
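As a rough illustration of the option-bag point above, here is a minimal sketch of how a bag lets code decide, without any locale data, which groups of symbols a formatter might need. The `Bag`, `SymbolsNeeded`, and length enums are hypothetical names invented for this example, not the ICU4X API, and the date/time split is deliberately coarse.

```rust
/// Hypothetical field lengths, loosely modelled on ECMA-402 option values.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum NumericLength {
    Numeric,
    TwoDigit,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TextLength {
    Long,
    Short,
    Narrow,
}

/// Hypothetical, pared-down components bag: `None` means "field not requested".
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
pub struct Bag {
    pub weekday: Option<TextLength>,
    pub year: Option<NumericLength>,
    pub month: Option<TextLength>,
    pub day: Option<NumericLength>,
    pub hour: Option<NumericLength>,
    pub minute: Option<NumericLength>,
    pub second: Option<NumericLength>,
}

/// Which groups of symbols data the bag might require (assumed grouping).
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
pub struct SymbolsNeeded {
    pub date: bool,
    pub time: bool,
}

impl Bag {
    /// Conservative check: any date field may need date symbols,
    /// any time field may need time symbols.
    pub fn symbols_needed(&self) -> SymbolsNeeded {
        SymbolsNeeded {
            date: self.weekday.is_some()
                || self.year.is_some()
                || self.month.is_some()
                || self.day.is_some(),
            time: self.hour.is_some() || self.minute.is_some() || self.second.is_some(),
        }
    }
}
```

A string skeleton cannot be inspected this cheaply without parsing it first, which is where the proc-macro idea would come in: parse the string at compile time and emit a bag like the one above.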
@sffc - can you help me understand what you mean by "fastpaths for common skeletons"? Are you suggesting to have DataProvider fastpath for Option Bags that match certain skeletons? Do you also suggest that we don't provide
That's one way of doing it. What I had more in mind, though, is a third method of instantiating a DateTimeFormat, giving three in total: datetime style (current), arbitrary components (bag of fields), and predefined components (one selection out of an enum with 10-15 choices). However, we could still fastpath arbitrary skeletons into predefined skeletons if they match.
That is what I am putting on the table for further discussion.
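To make the three construction paths concrete, here is a small sketch under stated assumptions. Every name in it (`Selection`, `Predefined`, `Fields`, the three enum variants, and the data key strings) is hypothetical and chosen only for illustration; the key format merely imitates the "datepatterns/fulldate@1" idea floated earlier in the thread.

```rust
/// Datetime style, the path that already exists.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Style {
    Full,
    Long,
    Medium,
    Short,
}

/// Hypothetical bag of fields, reduced to booleans for brevity.
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
pub struct Fields {
    pub year: bool,
    pub month: bool,
    pub day: bool,
    pub hour: bool,
    pub minute: bool,
}

/// Hypothetical closed set of predefined skeletons (a real list might have 10-15).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Predefined {
    YearMonthDay,
    MonthDay,
    HourMinute,
}

impl Predefined {
    const ALL: [Predefined; 3] = [
        Predefined::YearMonthDay,
        Predefined::MonthDay,
        Predefined::HourMinute,
    ];

    /// The fields each predefined skeleton stands for.
    pub fn fields(self) -> Fields {
        let none = Fields::default();
        match self {
            Predefined::YearMonthDay => Fields { year: true, month: true, day: true, ..none },
            Predefined::MonthDay => Fields { month: true, day: true, ..none },
            Predefined::HourMinute => Fields { hour: true, minute: true, ..none },
        }
    }

    /// Made-up data provider key holding the precompiled pattern.
    pub fn data_key(self) -> &'static str {
        match self {
            Predefined::YearMonthDay => "datepatterns/yearmonthday@1",
            Predefined::MonthDay => "datepatterns/monthday@1",
            Predefined::HourMinute => "datepatterns/hourminute@1",
        }
    }
}

/// The three ways of instantiating a formatter discussed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Selection {
    Style(Style),
    Components(Fields),
    Predefined(Predefined),
}

impl Selection {
    /// Fastpath: rewrite an arbitrary bag into a predefined skeleton when it matches one.
    pub fn normalize(self) -> Selection {
        if let Selection::Components(fields) = self {
            if let Some(p) = Predefined::ALL.iter().copied().find(|p| p.fields() == fields) {
                return Selection::Predefined(p);
            }
        }
        self
    }
}
```

With this shape, only bags that fail to normalize would need runtime skeleton resolution; everything else could be served from a precompiled key.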
This comment is for my notes as I look into the prior art. I plan on editing it with my research.

DateTimePatternGenerator

Pattern vs Skeleton

Per DateTimePatternGenerator::staticGetSkeleton

ECMA 402

var date = new Date(Date.UTC(2012, 11, 20, 3, 0, 0, 200));
var options = { weekday: 'long', year: 'numeric', month: 'long', day: 'numeric' };
console.log(new Intl.DateTimeFormat('de-DE', options).format(date));

Components table: Table 6: Components of date and time formats
CLDR

Here is the dateTimeFormat information in the CLDR. It is broken into multiple sections. Already implemented in ICU4X is the "Style" format. e.g. for Date:
Then for DateTime, it references the Time and Date formats, where they are
The
The work here, as I understand it, is to generate a Rust representation of the skeleton, e.g. "Bh" or "Bhm", then find the best matching skeleton and return its pattern, e.g. "h B" or "h:mm B".

TODO - Figure out how this differs from the "Component" model.

Field Symbol Table

UTS 35 - Matching skeletons

https://unicode.org/reports/tr35/tr35-dates.html#Matching_Skeletons
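The lookup described above (requested skeleton in, best available pattern out) can be sketched as follows. This is a heavily simplified stand-in, not the UTS 35 matching algorithm: it only considers available skeletons with exactly the same set of field symbols and picks the one whose field lengths are closest. It never adjusts the returned pattern or fills in missing fields. The function names are hypothetical.

```rust
/// Split a skeleton such as "Bhm" into (symbol, run length) pairs.
fn fields(skeleton: &str) -> Vec<(char, usize)> {
    let mut out: Vec<(char, usize)> = Vec::new();
    for c in skeleton.chars() {
        if let Some(last) = out.last_mut() {
            if last.0 == c {
                last.1 += 1;
                continue;
            }
        }
        out.push((c, 1));
    }
    // Sort so that field order in the input does not matter.
    out.sort();
    out
}

/// Return the pattern of the closest available skeleton, if any has the same symbols.
/// `available` holds (skeleton, pattern) pairs for one locale.
pub fn best_pattern<'a>(requested: &str, available: &[(&str, &'a str)]) -> Option<&'a str> {
    let want = fields(requested);
    let mut best: Option<(usize, &'a str)> = None;
    for (skeleton, pattern) in available {
        let have = fields(skeleton);
        // Simplification: require exactly the same field symbols.
        let same_symbols = have.len() == want.len()
            && have.iter().zip(&want).all(|(h, w)| h.0 == w.0);
        if !same_symbols {
            continue;
        }
        // Distance is the total difference in field lengths.
        let distance: usize = have
            .iter()
            .zip(&want)
            .map(|(h, w)| h.1.abs_diff(w.1))
            .sum();
        if best.map_or(true, |(d, _)| distance < d) {
            best = Some((distance, *pattern));
        }
    }
    best.map(|(_, pattern)| pattern)
}
```

Given the pairs ("Bh", "h B") and ("Bhm", "h:mm B") from the example above, a request for "Bhm" resolves to "h:mm B", while a request with a field that no available skeleton carries resolves to nothing. A fuller implementation would handle that case by the rules in the UTS 35 section linked above.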
So here is an update on where I am with this: I've done a lot of the research on the prior art and on understanding the terminology. I'm currently working on some local prototypes of different pieces of the architecture. I've done a bit of prototyping around the serialization of the skeletons for use with the data providers. I'm not quite happy with what I have locally, so I'm going to explore the relationship between the components::Bag fields, skeleton representations, and pattern representations. Today I'm going to start prototyping some of the skeleton matching algorithm, as I think the serialization could be driven by the needs of this algorithm. I don't really have concrete work that's worth showing yet, but I hope to comment back on some of the API design discussion. I'd also like to make sure that #451 lands before writing any real code, but that's not blocking some early prototypes.
One thing I am discovering is that the
I also wrote a script that collects every skeleton available in the CLDR, and how many patterns are available in the locale for it. https://gist.github.com/gregtatum/1d76bbdb87132f71a969a10f0c1d2d9c#file-2-output-js
I think this issue needs better scoping, and should be broken out into separate issues. I added the discuss label to put it on the meeting agenda. If we don't have time to discuss this week, I'll add my thoughts here.
The C-API for ICU4C provides a list of common skeletons. I think this was interesting enough to document for future work: https://unicode-org.github.io/icu-docs/apidoc/dev/icu4c/udat_8h.html e.g.