Complete the set of DateTimeFormat options #272

Open
1 of 3 tasks
zbraniecki opened this issue Sep 26, 2020 · 9 comments
Labels
A-data Area: Data coverage or quality A-design Area: Architecture or design C-datetime Component: datetime, calendars, time zones S-epic Size: Major project (create smaller child issues) T-core Type: Required functionality

Comments

@zbraniecki
Member

zbraniecki commented Sep 26, 2020

@zbraniecki zbraniecki added T-core Type: Required functionality C-data-infra Component: provider, datagen, fallback, adapters A-design Area: Architecture or design C-datetime Component: datetime, calendars, time zones labels Sep 26, 2020
@zbraniecki zbraniecki added this to the ICU4X 0.2 milestone Oct 1, 2020
@sffc sffc self-assigned this Oct 1, 2020
@gregtatum
Member

I'm interested in working on this after #107, although I see that @sffc is currently assigned to it. I would like to start with filling out this issue with links to prior art and discussions on the API design.

@sffc sffc assigned gregtatum and unassigned sffc Jan 7, 2021
@sffc sffc added the discuss Discuss at a future ICU4X-SC meeting label Jan 7, 2021
@sffc
Member

sffc commented Jan 7, 2021

I changed you to assignee, @gregtatum :)

Here are some of my thoughts on skeletons:

Predefined Common Skeletons

I would like to see fastpaths for the most common skeletons. Here is a good starting point from the Google Closure i18n library:

https://github.com/google/closure-library/blob/master/closure/goog/i18n/datetimepatterns.js

  YEAR_FULL
  YEAR_FULL_WITH_ERA
  YEAR_MONTH_ABBR
  YEAR_MONTH_FULL
  YEAR_MONTH_SHORT
  MONTH_DAY_ABBR
  MONTH_DAY_FULL
  MONTH_DAY_SHORT
  MONTH_DAY_MEDIUM
  MONTH_DAY_YEAR_MEDIUM
  WEEKDAY_MONTH_DAY_MEDIUM
  WEEKDAY_MONTH_DAY_YEAR_MEDIUM
  DAY_ABBR
  MONTH_DAY_TIME_ZONE_SHORT

I would also add month-day (date with no year/era), as I've seen that as a common feature request.

What do I mean by fastpath? I think we can precompile these patterns as separate data provider keys. For example, "dates/fulldatepattern@1" or maybe "datepatterns/fulldate@1" will return the specified pattern directly, without having to resolve skeletons at runtime.

I think this is important because:

  1. We don't have to ship DTPG* code for these use cases
  2. Faster performance
  3. Easier to add new ICU4X clients

* DTPG refers to DateTimePatternGenerator, the monolith of code in ICU (and soon ICU4X) that resolves datetime skeletons to datetime patterns
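A minimal sketch of the fastpath idea, assuming a hypothetical `PredefinedSkeleton` enum and a hypothetical in-memory `PatternStore` standing in for a data provider. The key strings and type names are illustrative only and are not ICU4X API:

```rust
use std::collections::HashMap;

/// Hypothetical subset of predefined skeletons (names borrowed from the Closure list).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum PredefinedSkeleton {
    YearFull,
    YearMonthAbbr,
    MonthDayAbbr,
}

impl PredefinedSkeleton {
    /// Hypothetical data key for the precompiled pattern (naming is an assumption).
    pub fn data_key(self) -> &'static str {
        match self {
            Self::YearFull => "datepatterns/year_full@1",
            Self::YearMonthAbbr => "datepatterns/year_month_abbr@1",
            Self::MonthDayAbbr => "datepatterns/month_day_abbr@1",
        }
    }
}

/// Stand-in for a data provider: maps (data key, locale) directly to a pattern.
#[derive(Default)]
pub struct PatternStore {
    patterns: HashMap<(&'static str, String), String>,
}

impl PatternStore {
    pub fn new() -> Self {
        Self::default()
    }

    /// Would be filled at build time, e.g. by a CLDR transformer.
    pub fn insert(&mut self, skeleton: PredefinedSkeleton, locale: &str, pattern: &str) {
        self.patterns
            .insert((skeleton.data_key(), locale.to_string()), pattern.to_string());
    }

    /// Runtime lookup: a plain map access, with no skeleton resolution involved.
    pub fn load(&self, skeleton: PredefinedSkeleton, locale: &str) -> Option<&str> {
        self.patterns
            .get(&(skeleton.data_key(), locale.to_string()))
            .map(|p| p.as_str())
    }
}
```

With this shape, a client that only ever asks for predefined skeletons never touches skeleton-matching code at runtime.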

ECMA-402 Style Skeletons

ICU4C/J use strings to represent skeletons. ECMA-402 uses option bags instead.

I think the ECMA-402 approach is superior, especially for Rust, in large part because we can do more logic to figure out what data we might need. We can't resolve skeletons to patterns at compile time since they are locale-dependent, but we can at least tell what symbols* the skeleton might require. If the skeleton only requests date fields, then we don't need to include time symbols, for example.

We might want to supplement the ECMA-402 bag-based skeleton with a proc-macro that compiles the string-based skeleton to a bag.

* See #257 for an explanation of symbols data.
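To illustrate why a typed bag helps with data selection, here is a rough sketch. The `Bag`, `MonthStyle`, and `RequiredSymbols` types are hypothetical simplifications made up for this example, not the ICU4X types:

```rust
/// Hypothetical month styles, mirroring the ECMA-402 "month" values.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MonthStyle {
    Numeric,
    TwoDigit,
    Narrow,
    Short,
    Long,
}

/// Hypothetical, heavily simplified components bag.
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub struct Bag {
    pub era: bool,
    pub year: bool,
    pub month: Option<MonthStyle>,
    pub day: bool,
    pub weekday: bool,
    pub hour: bool,
    pub minute: bool,
}

/// Which groups of symbols data a formatter built from the bag might need.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct RequiredSymbols {
    pub eras: bool,
    pub months: bool,
    pub weekdays: bool,
    pub day_periods: bool,
}

/// Decide, without knowing the locale, which symbols *might* be required.
pub fn required_symbols(bag: &Bag) -> RequiredSymbols {
    RequiredSymbols {
        eras: bag.era,
        // Numeric months need no month names.
        months: matches!(
            bag.month,
            Some(MonthStyle::Narrow) | Some(MonthStyle::Short) | Some(MonthStyle::Long)
        ),
        weekdays: bag.weekday,
        // Whether AM/PM is used depends on the locale, so assume it may be needed.
        day_periods: bag.hour,
    }
}
```

A date-only bag reports `day_periods: false`, so time symbols could be left out of the data request entirely.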

Spec Compliance

The UTS 35 spec for skeleton resolution is currently in a little bit of flux. The biggest outstanding issue is CLDR-13627, but there are others you can find on Jira.

Do as much at compile time as possible

The ICU code for skeleton resolution (DTPG) is a bit of a mess, and it would be nice if we can clean it up. Explore ways to do as much logic at build time (in CldrJsonDataProvider) as possible. Make the data provider give you the most useful form of data possible, such that the code we ship in actual DateTimeFormat should be as simple and clean as possible.

@zbraniecki
Member Author

@sffc - can you help me understand what you mean by "fastpaths for common skeletons"?

Are you suggesting to have DataProvider fastpath for Option Bags that match certain skeletons?

Do you also suggest that we don't provide options::Skeleton side by side with options::Bag but instead provide a proc macro that takes skeleton!() and produces options::Bag?

@sffc
Member

sffc commented Jan 7, 2021

> @sffc - can you help me understand what you mean by "fastpaths for common skeletons"?
>
> Are you suggesting to have DataProvider fastpath for Option Bags that match certain skeletons?

That's one way of doing it. What I had more in mind, though, is a third method of instantiating a DateTimeFormat. The three methods would be: datetime style (current), arbitrary components (bag of fields), and predefined components (one selection out of an enum with 10-15 choices). However, we could still fastpath arbitrary skeletons into predefined skeletons if they match.
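As a rough sketch of how the three construction methods could sit side by side, assuming hypothetical `Options`, `ComponentsBag`, and `Predefined` types invented for illustration (the real bag would carry many more fields):

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Length {
    Full,
    Long,
    Medium,
    Short,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MonthWidth {
    Numeric,
    Abbreviated,
    Full,
}

/// Hypothetical, heavily simplified bag of fields.
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub struct ComponentsBag {
    pub year: bool,
    pub month: Option<MonthWidth>,
    pub day: bool,
}

/// Hypothetical predefined choices (a subset of the list discussed earlier).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Predefined {
    YearFull,
    YearMonthAbbr,
    YearMonthFull,
    MonthDayAbbr,
    MonthDayFull,
}

/// The three ways of instantiating a formatter.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Options {
    Style(Length),
    Components(ComponentsBag),
    Predefined(Predefined),
}

/// Map an arbitrary bag onto a predefined choice when it matches exactly.
pub fn fastpath(bag: ComponentsBag) -> Option<Predefined> {
    match (bag.year, bag.month, bag.day) {
        (true, None, false) => Some(Predefined::YearFull),
        (true, Some(MonthWidth::Abbreviated), false) => Some(Predefined::YearMonthAbbr),
        (true, Some(MonthWidth::Full), false) => Some(Predefined::YearMonthFull),
        (false, Some(MonthWidth::Abbreviated), true) => Some(Predefined::MonthDayAbbr),
        (false, Some(MonthWidth::Full), true) => Some(Predefined::MonthDayFull),
        _ => None,
    }
}

/// Rewrite a components request into a predefined one when possible.
pub fn normalize(options: Options) -> Options {
    match options {
        Options::Components(bag) => match fastpath(bag) {
            Some(predefined) => Options::Predefined(predefined),
            None => Options::Components(bag),
        },
        other => other,
    }
}
```

Only bags that fall through `fastpath` would need the full skeleton resolution code.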

> Do you also suggest that we don't provide options::Skeleton side by side with options::Bag but instead provide a proc macro that takes skeleton!() and produces options::Bag?

That is what I am putting on the table for further discussion.

@sffc sffc removed the discuss Discuss at a future ICU4X-SC meeting label Jan 7, 2021
@zbraniecki zbraniecki changed the title [datetime] Add support for skeletons [datetime] Add support for options bag Jan 8, 2021
@gregtatum
Member

gregtatum commented Jan 13, 2021

This comment is for my notes as I look into the prior art. I plan on editing it with my research.

DateTimePatternGenerator

Pattern vs Skeleton

Per DateTimePatternGenerator::staticGetSkeleton

  • "MMM-dd" and "dd/MMM" are both considered patterns.
  • "MMMdd" is considered a skeleton representation of both.
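The pattern-to-skeleton relationship can be sketched roughly as follows. This is not ICU's implementation: the `pattern_to_skeleton` function and the `FIELD_ORDER` constant are made up for illustration, and the field ordering is an assumption chosen to resemble the ids seen in CLDR data:

```rust
/// Assumed ordering of field symbols within a skeleton (illustrative only).
const FIELD_ORDER: &str = "GyYuUrQqMLlwWEecdDFgabBhHKkjJCmsSAzZOvVXx";

/// Reduce a pattern to a skeleton: drop literals, merge repeated symbols,
/// and emit the fields in a fixed order.
pub fn pattern_to_skeleton(pattern: &str) -> String {
    let mut fields: Vec<(char, usize)> = Vec::new();
    let mut in_quote = false;
    for ch in pattern.chars() {
        if ch == '\'' {
            // Quoted text is literal; a doubled quote toggles twice and cancels out.
            in_quote = !in_quote;
            continue;
        }
        if in_quote || !ch.is_ascii_alphabetic() {
            continue; // separators such as '-' or '/' are not fields
        }
        match fields.iter_mut().find(|f| f.0 == ch) {
            Some(field) => field.1 += 1,
            None => fields.push((ch, 1)),
        }
    }
    // Unknown letters sort last.
    fields.sort_by_key(|f| FIELD_ORDER.find(f.0).unwrap_or(usize::MAX));
    fields
        .iter()
        .map(|f| f.0.to_string().repeat(f.1))
        .collect()
}
```

Both example patterns reduce to the same skeleton because the separators are dropped and the field order is normalized.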

ECMA 402

MDN DateTimeFormat options

var date = new Date(Date.UTC(2012, 11, 20, 3, 0, 0, 200));
var options = { weekday: 'long', year: 'numeric', month: 'long', day: 'numeric' };
console.log(new Intl.DateTimeFormat('de-DE', options).format(date));

Components table: Table 6: Components of date and time formats

| Internal Slot | Property | Values |
| --- | --- | --- |
| [[Weekday]] | "weekday" | "narrow", "short", "long" |
| [[Era]] | "era" | "narrow", "short", "long" |
| [[Year]] | "year" | "2-digit", "numeric" |
| [[Month]] | "month" | "2-digit", "numeric", "narrow", "short", "long" |
| [[Day]] | "day" | "2-digit", "numeric" |
| [[Hour]] | "hour" | "2-digit", "numeric" |
| [[Minute]] | "minute" | "2-digit", "numeric" |
| [[Second]] | "second" | "2-digit", "numeric" |
| [[TimeZoneName]] | "timeZoneName" | "short", "long" |
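One way this table could translate into Rust types, along with a conversion from the bag to a skeleton string. This is only a sketch: the type names are hypothetical, `timeZoneName` is left out for brevity, the field order is an assumption, and `j` is used for the hour because UTS 35 reserves it in skeletons for the locale's preferred hour cycle:

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Text {
    Narrow,
    Short,
    Long,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Numeric {
    Numeric,
    TwoDigit,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Month {
    Numeric,
    TwoDigit,
    Narrow,
    Short,
    Long,
}

/// Hypothetical bag with one optional entry per row of the table.
#[derive(Debug, Clone, Default, PartialEq, Eq)]
pub struct Bag {
    pub weekday: Option<Text>,
    pub era: Option<Text>,
    pub year: Option<Numeric>,
    pub month: Option<Month>,
    pub day: Option<Numeric>,
    pub hour: Option<Numeric>,
    pub minute: Option<Numeric>,
    pub second: Option<Numeric>,
}

// Symbol counts per UTS 35: 3 letters = abbreviated, 4 = wide, 5 = narrow.
fn text_len(t: Text) -> usize {
    match t {
        Text::Short => 3,
        Text::Long => 4,
        Text::Narrow => 5,
    }
}

fn numeric_len(n: Numeric) -> usize {
    match n {
        Numeric::Numeric => 1,
        Numeric::TwoDigit => 2,
    }
}

fn month_len(m: Month) -> usize {
    match m {
        Month::Numeric => 1,
        Month::TwoDigit => 2,
        Month::Short => 3,
        Month::Long => 4,
        Month::Narrow => 5,
    }
}

/// Build a skeleton string from the bag, in an assumed fixed field order.
pub fn to_skeleton(bag: &Bag) -> String {
    let fields = [
        ('G', bag.era.map(text_len)),
        ('y', bag.year.map(numeric_len)),
        ('M', bag.month.map(month_len)),
        ('E', bag.weekday.map(text_len)),
        ('d', bag.day.map(numeric_len)),
        ('j', bag.hour.map(numeric_len)),
        ('m', bag.minute.map(numeric_len)),
        ('s', bag.second.map(numeric_len)),
    ];
    fields
        .iter()
        .filter_map(|(symbol, len)| len.map(|n| symbol.to_string().repeat(n)))
        .collect()
}
```

Under these assumptions, the options from the MDN example (long weekday, numeric year, long month, numeric day) produce the skeleton `yMMMMEEEEd`.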

Gecko implementation

CLDR

Here is the dateTimeFormat information in the CLDR. It is broken into multiple sections. The "Style" format is already implemented in ICU4X.

e.g. for Date:

            "dateFormats": {
              "full": "EEEE, MMMM d, y",
              "long": "MMMM d, y",
              "medium": "MMM d, y",
              "short": "M/d/yy"
            },

Then the DateTime formats reference the Time and Date formats, which appear in the pattern as {0} and {1} respectively.

            "dateTimeFormats": {
              "full": "{1} 'at' {0}",
              "long": "{1} 'at' {0}",
              "medium": "{1}, {0}",
              "short": "{1}, {0}",
              "availableFormats": { ... }
              }
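A small sketch of how such a glue pattern could be applied, assuming the date and time have already been formatted to strings. The `combine_date_time` function is hypothetical and written only for this example:

```rust
/// Substitute the formatted time for {0} and the formatted date for {1}.
/// Text inside single quotes is literal; '' stands for one apostrophe.
pub fn combine_date_time(glue: &str, time: &str, date: &str) -> String {
    let mut out = String::new();
    let mut chars = glue.chars().peekable();
    let mut in_quote = false;
    while let Some(ch) = chars.next() {
        match ch {
            '\'' => {
                if chars.peek() == Some(&'\'') {
                    // Doubled quote: emit a literal apostrophe.
                    out.push('\'');
                    chars.next();
                } else {
                    in_quote = !in_quote;
                }
            }
            '{' if !in_quote => {
                // Read the placeholder index; take_while also consumes the '}'.
                let index: String = chars.by_ref().take_while(|c| *c != '}').collect();
                match index.as_str() {
                    "0" => out.push_str(time),
                    "1" => out.push_str(date),
                    other => {
                        // Unknown placeholder: keep it as written.
                        out.push('{');
                        out.push_str(other);
                        out.push('}');
                    }
                }
            }
            _ => out.push(ch),
        }
    }
    out
}
```

For example, `combine_date_time("{1} 'at' {0}", "3:00 PM", "December 20, 2012")` yields `December 20, 2012 at 3:00 PM`.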

The "availableFormats" key then matches skeletons to patterns.

              {
                "Bh": "h B",
                "Bhm": "h:mm B",
                "Bhms": "h:mm:ss B",
                "d": "d",
                "E": "ccc",
                "EBhm": "E h:mm B",
                "EBhms": "E h:mm:ss B",
                ...
              }

The work here, as I understand it, is to generate a Rust representation of the skeleton, e.g. "Bh" or "Bhm", then find the best matching skeleton and return its pattern, e.g. "h B" or "h:mm B".
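A deliberately simplified sketch of that lookup is below. It is not the UTS 35 matching algorithm: it only considers candidates with exactly the same set of fields and picks the one whose field widths are closest, and it does not adjust the returned pattern. The function names are made up for illustration:

```rust
use std::collections::BTreeMap;

/// Count how many times each field symbol occurs, e.g. "Bhmm" -> {B:1, h:1, m:2}.
fn field_widths(skeleton: &str) -> BTreeMap<char, usize> {
    let mut widths = BTreeMap::new();
    for ch in skeleton.chars() {
        *widths.entry(ch).or_insert(0) += 1;
    }
    widths
}

/// None if the field sets differ, otherwise the summed width difference.
fn distance(requested: &str, candidate: &str) -> Option<usize> {
    let req = field_widths(requested);
    let cand = field_widths(candidate);
    if !req.keys().eq(cand.keys()) {
        return None;
    }
    Some(req.iter().map(|(k, w)| w.abs_diff(cand[k])).sum::<usize>())
}

/// Pick the pattern whose skeleton is closest to the requested one.
/// `available` holds (skeleton, pattern) pairs such as those in "availableFormats".
pub fn best_pattern<'a>(requested: &str, available: &[(&'a str, &'a str)]) -> Option<&'a str> {
    available
        .iter()
        .filter_map(|(skeleton, pattern)| distance(requested, skeleton).map(|d| (d, *pattern)))
        .min_by_key(|(d, _)| *d)
        .map(|(_, pattern)| pattern)
}
```

The actual algorithm additionally handles requests with missing or extra fields and expands field widths in the chosen pattern, so this covers only the simplest case.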

TODO - Figure out how this differs from the "Component" model.

Field Symbol Table

UTS 35 - Matching skeletons

https://unicode.org/reports/tr35/tr35-dates.html#Matching_Skeletons

@gregtatum
Member

So here is an update on where I am with this:

I've done a lot of the research on the prior art, and understanding the terminology. I'm currently working on some local prototypes of different pieces of the architecture.

I've done a bit of prototyping around the serialization of the skeletons for use with the data providers. I'm not quite happy with what I have locally, so I'm going to explore the relationship between the components::Bag fields, skeleton representations, and pattern representations.

Today I'm going to start prototyping some of the skeleton matching algorithm, as I think the serialization could be driven by the needs of this algorithm. I don't really have concrete work that's worth showing yet, but I'm hopeful to comment back on some of the API design discussion.

I'd also like to make sure that #451 lands before writing any real code, but that's not blocking some early prototypes.

@gregtatum
Member

gregtatum commented Jan 27, 2021

One thing I am discovering is that the components::Bag does not allow for every configuration of available date field symbols.

I also wrote a script that collects every skeleton available in the CLDR, and how many patterns are available in the locale for it.

https://gist.github.com/gregtatum/1d76bbdb87132f71a969a10f0c1d2d9c#file-2-output-js

@gregtatum gregtatum added the discuss Discuss at a future ICU4X-SC meeting label Jan 28, 2021
@gregtatum
Member

I think this issue needs better scoping, and should be broken out into separate issues. I added the discuss label to put it on the meeting agenda. If we don't have time to discuss this week, I'll add my thoughts here.

@sffc sffc removed the discuss Discuss at a future ICU4X-SC meeting label Jan 28, 2021
@gregtatum gregtatum changed the title [datetime] Add support for options bag Add support for various DateTimeFormatOptions Feb 8, 2021
@sffc sffc modified the milestones: ICU4X 0.2, ICU4X 0.3 Apr 1, 2021
@gregtatum gregtatum added the S-epic Size: Major project (create smaller child issues) label Apr 7, 2021
@gregtatum
Member

The C-API for ICU4C provides a list of common skeletons. I think this was interesting enough to document for future work: https://unicode-org.github.io/icu-docs/apidoc/dev/icu4c/udat_8h.html

e.g.

#define UDAT_YEAR "y"                   /* Constant for date skeleton with year. */
#define UDAT_QUARTER "QQQQ"             /* Constant for date skeleton with quarter. */
#define UDAT_ABBR_QUARTER "QQQ"         /* Constant for date skeleton with abbreviated quarter. */
#define UDAT_YEAR_QUARTER "yQQQQ"       /* Constant for date skeleton with year and quarter. */
#define UDAT_YEAR_ABBR_QUARTER "yQQQ"   /* Constant for date skeleton with year and abbreviated quarter. */
#define UDAT_MONTH "MMMM"               /* Constant for date skeleton with month. */

etc.

@sffc sffc added A-data Area: Data coverage or quality and removed C-data-infra Component: provider, datagen, fallback, adapters labels Jun 16, 2021
@sffc sffc changed the title Add support for various DateTimeFormatOptions Complete the set of DateTimeFormat options Oct 21, 2021
@sffc sffc modified the milestones: ICU4X 0.4, ICU4X 0.5 Oct 21, 2021
@gregtatum gregtatum removed their assignment Nov 18, 2021
@sffc sffc modified the milestones: ICU4X 0.5, ICU4X 1.0 Jan 27, 2022
@sapriyag sapriyag modified the milestones: ICU4X 1.0, ICU4X 1.1 May 25, 2022