Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Class hierarchy for DateTimeFormat (incl. runtime toggle switches) #380

Closed
sffc opened this issue Nov 5, 2020 · 14 comments · Fixed by #2133
Closed

Class hierarchy for DateTimeFormat (incl. runtime toggle switches) #380

sffc opened this issue Nov 5, 2020 · 14 comments · Fixed by #2133
Assignees
Labels
C-datetime Component: datetime, calendars, time zones S-medium Size: Less than a week (larger bug fix or enhancement) T-core Type: Required functionality

Comments

@sffc
Copy link
Member

sffc commented Nov 5, 2020

Right now, DateTimeFormat uses a model where a single pattern is prepped when the object is constructed. This is in line with ICU4C SimpleDateFormat. However, I think we should move away from that model.

In addition to being good i18n practice, ECMA-402 is also pushing us in this direction. Here are three reasons why DateTimeFormat, or at least the data backing up DateTimeFormat, may need to carry multiple patterns at once:

  1. The hour cycle could change from request to request within the same language. For example, one request might be for "en-u-hc-h12", and the next might be for "en-u-hc-h23". We won't be having data keys consider extensions, instead leaving that step up to the ICU4X calling code (if you disagree with that statement, open a new issue explaining why). Therefore, a request for time style "short" needs to return 2 patterns, one with day period and one without day period.
  2. The fields could change from format request to format request. In Temporal, the current proposal is to overload Intl.DateTimeFormat and allow it to format a YearMonth, MonthDay, Date, Time, etc., which each could require a different pattern in the DateTimeFormat. We could choose to ignore this problem in ICU4X DateTimeFormat, but we still need a reasonable pathway for ECMA-402 implementations with Temporal.
  3. The display of the era could change. For example, negative Gregorian years may be displayed as "2020 BC", but positive Gregorian years are generally displayed as simply "2020". See Option eraDisplay in Intl.DateTimeFormat() tc39/ecma402#426 and Negative Gregorian years do not render by default in DateTimeFormat tc39/ecma402#508.

Specific thoughts:

  1. I can see arguments around keeping a low-level type that handles one pattern at a time. Date formatting is complicated, and there is plenty of logic that can fit inside the scope of a low-level formatter. We could call that type DateTimePattern, for instance. This type would not be part of the ECMA-402 trait layer, but we could nonetheless consider exposing it publicly as an ICU4X add-on. Perhaps it would fit inside the utils/ namespace.
  2. The main type that we encourage people to use, which should probably be called DateTimeFormat, could hold a list of one or more DateTimePatterns, and use runtime logic to delegate formatting requests to one of them.

Thoughts?

CC @zbraniecki

@sffc sffc added discuss Discuss at a future ICU4X-SC meeting C-datetime Component: datetime, calendars, time zones labels Nov 5, 2020
@Louis-Aime
Copy link

I am not sure I understand all the consequences, but I support the idea of giving the choice to authors. DateTimePatterns may answer the question of ordering the "Parts" (in the sense of FormatToParts) in the right order with respect to the language and region, and keeping consistency while displaying elements e.g. "/" if date if numeric etc.

@mihnita
Copy link
Contributor

mihnita commented Nov 17, 2020

I think there is some good functionality here, but not in the use case described.
There was something here that didn't feel quite right, but I didn't manage to put my finger on it.

But now I think I do...

I think this is good to have / needed when the result depends on the input data, but not otherwise.

If I want to show a 12h time in one area, and a 24h time in another, I need 2 formatters. And that's OK.
Otherwise, where do we stop?
If I want to show a month-year in one area, month-day in another, full date in another, and short date+time in the last, does it mean that I will have one formatter only, and that one will "magically" create 4 different patterns?

We have good use cases, some of them implemented and working:

  • date-interval + skeleton "dMMMy" I get 3 patterns: "MMM d-d, y", "MMM d-MMM d, y", and "MMM d, y-MMM d, y"
  • for a currency formatter I might get "variations" depending on input: $5.20 $5.00 or (optional) $5.20 $5 (don't show decimals for integer values)
  • for number formatter I can have an option to show or not the + sign (-3.14 and 3.14 vs -3.14 and +3.14)

So I would summarize this as:

  • as a user I want one formatter per "UI visible field"
  • that formatter should work for all input values
  • give me the right "knobs" to tell the formatter what result I want
  • I don't care how many patterns the formatter needs to implement proper functionality

BUT: I changed my mind, and now I want other fields / hour style is not a valid use case.

So based on this kind of "rules of thumb" I think that from the initial entry the use case 1 and 2 are not valid, but 3 is.

@sffc
Copy link
Member Author

sffc commented Nov 17, 2020

Agreed about 12h/24h time. In the OP, I was conflating two related but separate problems: you should always have data for both the 12h and the 24h forms, but you only need one of them at a time in the formatter object.

I think this is good to have / needed when the result depends on the input data, but not otherwise.

I agree with this, and my pointers to tc39/ecma402#426 and tc39/ecma402#508 (use case 3) are examples of where the pattern does depend on the input data.

All I'm proposing in this thread is to generalize the problem of "pick pattern based on input data" by separating the public DateTimeFormat[ter] API from the DateTimePattern API. In some cases, DateTimeFormat[ter] might unconditionally use only one DateTimePattern, but in other cases, it might have a set of DateTimePatterns that it delegates to. That solves case 3, but it could also be extended in userland to solve cases 1 and 2.

@mihnita
Copy link
Contributor

mihnita commented Nov 19, 2020

Other than "have a set of DateTimePatterns that it delegates to", LGTM.

It might be just lingo. But I would not "delegate to" a pattern.
A pattern is just data, should do nothing.
The formatter retrieves one or more patterns, as needed, and does the formatting using the right pattern depending the input.

So "DateTimeFormat ... has a set of DateTimePatterns, and selects which one to use based on input" ?

@sffc
Copy link
Member Author

sffc commented Nov 20, 2020

Call it a DateTimeSingleFormat instead of a DateTimePattern, then. DateTimeFormat wraps a list of one or more DateTimeSingleFormats, and delegates to one of them.

I'm proposing one class that formats a datetime according to a pattern, and another class that figures out which one to use.

@sffc sffc changed the title Runtime toggle switches for DateTimeFormat (era, day period, others) Class hierarchy for DateTimeFormat (incl. runtime toggle switches) Feb 13, 2021
@sffc
Copy link
Member Author

sffc commented Feb 13, 2021

I would like to broaden the scope of this issue a bit to cover the overall class hierarchy of DateTimeFormat.

Here are specific pieces of DateTimeFormat which I would like to see spun off:

  • DateTimePatternInterpolator: constructed with a single, possibly pre-parsed, date pattern. References the required symbols. Has a format function that takes a Date[Time]Input and returns a FormattedDateTime.
  • DateTimeSkeletonMatcher: houses all logic involving the resolution of skeletons / component bags to patterns.
    • Bikeshed: DateTimeComponentMatcher? DateTimeFieldMatcher?
  • DateTimeStyleMatcher: resolves dateStyle/timeStyle with glue patterns
  • DateTimeInputProcessor: maps from various input types to DateTimeInput and LocalizedDateTimeInput
  • DateTimeSymbolLoader: resolves locale data loading

The higher-level type DateTimeFormat should look largely the same, but having these separate pieces will make it easier to reason about the code and the modularity of the individual pieces. They may also be useful to allow FFI to have the same level of modularity.

@sffc sffc added T-core Type: Required functionality and removed discuss Discuss at a future ICU4X-SC meeting labels Feb 18, 2021
@sffc sffc self-assigned this Feb 18, 2021
@sffc sffc added this to the 2021-Q2-m1 milestone Feb 18, 2021
@sffc sffc added the discuss Discuss at a future ICU4X-SC meeting label Apr 3, 2021
@sffc
Copy link
Member Author

sffc commented Apr 3, 2021

I'd like to bring this up for discussion again now that we've been making progress on skeletons and time zones. Here's what I see as actionable:

  • Split the existing DateTimePattern into smaller components as described above
  • Document the reasoning behind that decision somewhere

@sffc sffc removed their assignment Apr 3, 2021
@sffc
Copy link
Member Author

sffc commented Apr 8, 2021

In #598, @nordzilla introduced a standalone component TimeZoneFormat, and a new type ZonedDateTimeFormat that wraps the pair of DateTimeFormat with TimeZoneFormat. I agree with this approach.

Consider renaming DateTimeFormat to PlainDateTimeFormat for better parallelism with Temporal.

@sffc sffc added the S-medium Size: Less than a week (larger bug fix or enhancement) label Apr 19, 2021
@sffc sffc modified the milestones: 2021-Q2-m1, ICU4X 0.4 Apr 19, 2021
@sffc sffc added help wanted Issue needs an assignee and removed discuss Discuss at a future ICU4X-SC meeting labels Apr 22, 2021
@sffc sffc removed the help wanted Issue needs an assignee label Aug 12, 2021
@sffc
Copy link
Member Author

sffc commented Aug 12, 2021

@gregtatum interested in the design work. Implementation may be done by someone else. @zbraniecki to assist in some ways.

@sffc sffc modified the milestones: ICU4X 0.4, 2021 Q4 0.5 Sprint B Oct 21, 2021
@gregtatum
Copy link
Member

Design work is done, see #1317 for the plan.

@sffc
Copy link
Member Author

sffc commented Nov 21, 2021

The Ideal Components Bag is an important part of the class hierarchy, but the class hierarchy also includes how we reason about the modularity of time zones, calendars, etc. I'm going to move this issue to the 1.0 milestone.

@sffc sffc reopened this Nov 21, 2021
@sffc
Copy link
Member Author

sffc commented Apr 28, 2022

At a very basic level, I think we should split DateTimeFormat as follows:

  • DateTimeFormat: top-level, all-inclusive kitchen sink

    • Contains one or more DateTimePatternInterpolator and one or more TimeZoneFormat
    • Contains constructors for each of the types of DateTimeFormat we want to build
  • DateTimePatternInterpolator: formats a specific pattern

    • Contains a DataPayload for a single pattern being used (zero-alloc when possible)
    • Does not support time zones directly; needs a TimeZoneFormat
  • TimeZoneFormat: formats a time zone

    • (already done)

Internally, we will likely want additional classes to do things like skeleton resolution. This does not need to be public API. An argument could be made that DateTimePatternInterpolator should be internal as well.

@gregtatum gregtatum removed their assignment Apr 28, 2022
@sffc
Copy link
Member Author

sffc commented May 3, 2022

@sffc
Copy link
Member Author

sffc commented May 31, 2022

In terms of 1.0:

  • Split the keys into date and time
  • Refactor the API according to Option 3 in the doc linked above

@dminor is tentatively interested in this.

dminor added a commit to dminor/icu4x that referenced this issue Jun 21, 2022
This is a first step towards fixing unicode-org#380.
dminor added a commit to dminor/icu4x that referenced this issue Jun 21, 2022
This is a first step towards fixing unicode-org#380.
dminor added a commit to dminor/icu4x that referenced this issue Jun 21, 2022
This is a first step towards fixing unicode-org#380.
dminor added a commit that referenced this issue Jun 23, 2022
* Split date and time data keys.

This is a first step towards fixing #380.

* Revert accidental change to work_log.rs

* Update readme

* Address review feedback
dminor added a commit to dminor/icu4x that referenced this issue Jun 29, 2022
dminor added a commit that referenced this issue Jun 29, 2022
samchen61661 pushed a commit to samchen61661/icu4x that referenced this issue Jul 12, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-datetime Component: datetime, calendars, time zones S-medium Size: Less than a week (larger bug fix or enhancement) T-core Type: Required functionality
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants