From f3df1f147fa9ab8680e0b0fd8860892b901425e1 Mon Sep 17 00:00:00 2001
From: "Shane F. Carr"
Date: Wed, 11 Sep 2024 16:02:14 +0000
Subject: [PATCH] CLDR-17842 Add semantic datetime skeleton LDML tech preview

See #4031
---
 docs/ldml/tr35-dates.md | 326 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 326 insertions(+)

diff --git a/docs/ldml/tr35-dates.md b/docs/ldml/tr35-dates.md
index 77c8bf09b50..27cfb842727 100644
--- a/docs/ldml/tr35-dates.md
+++ b/docs/ldml/tr35-dates.md
@@ -2365,6 +2365,332 @@ The meaning of symbol fields should be easy to determine; the problem is determi
 * Look forward or backward in the current format string for a literal that matches the one most recently encountered. See if you can resynchronize from that point. Use the value of the numeric field to resynchronize as well, if possible (for example, a number larger than the largest month cannot be a month).
 * If that fails, use other format strings from the locale (including those in ``) to try to match the previous or next symbol or literal (again, using a loose match).
 
+## Semantic Skeletons
+
+When speaking about dates and times, not all combinations of fields are semantically valid. For example, it does not make sense to talk about a particular minute without knowing the hour, or a day-of-month and year without knowing the month. This section defines _semantic skeletons_, a mechanism for expressing the subset of date and time skeletons that are sufficient for almost all use cases.
+
+Libraries implementing UTS 35 may benefit from the use of semantic skeletons in their APIs. Software can optimize for the bounded set of datetime formats defined by semantic skeletons, delivering better performance to users.
+
+This section describes only the structures and enumerations for expressing a semantic skeleton. The section [Generating Patterns for Semantic Skeletons](#Generating_Patterns_for_Semantic_Skeletons) describes a mechanism to extract the actual pattern backing a semantic skeleton from CLDR data.
+
+Note: This document does not currently define a string form, but we may need one for MessageFormat.
+
+> [!IMPORTANT]
+> Semantic skeletons (this section) are a technical preview and should not be considered stable.
+
+### Parts of a Semantic Skeleton
+
+A semantic skeleton is composed of two parts:
+
+1. The _field set_: the minimal set of fields to be displayed. For example, "month and day."
+2. The _options_: configurations that impact the choice and style of fields. For example, "render the fields in a long format." Not all options modify the same fields.
+
+As a general rule, the field set determines _what is being displayed_, and the options determine _how to display it_.
+
+#### Semantic Field Sets
+
+This section defines four disjoint categories of field sets:
+
+1. [Date](#Semantic_Date_Field_Sets)
+2. [Calendar Period](#Semantic_Calendar_Period_Field_Sets)
+3. [Time](#Semantic_Time_Field_Sets)
+4. [Time Zone](#Semantic_Time_Zone_Field_Sets)
+
+Certain combinations of categories form [Composite Field Sets](#Semantic_Composite_Field_Sets).
+
+##### Date Field Sets
+
+A _date field set_ refers to a particular day in time. Higher-order fields, such as the month or year, could be omitted, but there must always be a reference to a particular day.
+
+The fields that may be included in a date field set are:
+
+1. **Year:** The year, possibly with an era and possibly with partial precision, depending on factors such as the length, locale, calendar system, and [Year Style](#Semantic_Skeleton_Year_Style). If the era is displayed, it may or may not be directly adjacent to the numeric year in the output string.
+2. **Month:** The month of a year. The year can be explicit or implied.
+3. **Day:** The day of the month. The month can be explicit or implied.
+4. **Weekday:** The day of the week. Often stands on its own or is used to clarify a Day.
+
+The valid date field sets are in the following table:
+
+| Field Set                         | Example                   |
+|-----------------------------------|---------------------------|
+| { Day }                           | The 1st                   |
+| { Weekday }                       | Saturday                  |
+| { Day, Weekday }                  | Saturday the 1st          |
+| { Month, Day }                    | January 1                 |
+| { Month, Day, Weekday }           | Saturday, January 1       |
+| { Year, Month, Day }              | January 1, 2000           |
+| { Year, Month, Day, Weekday }     | Saturday, January 1, 2000 |
+
+Note: Month and Year are not valid date field sets on their own because they do not refer to a specific day. Instead, they are considered calendar period field sets.
+
+Note: This table may be extended in the future to include additional fields, such as week and quarter.
+
+##### Calendar Period Field Sets
+
+A _calendar period field set_ refers to a span of time in a calendar system, _above_ the order of a day.
+
+The fields that are permissible in date field sets are also the ones permissible in calendar period field sets.
+
+The valid calendar period field sets are in the following table:
+
+| Field Set           | Example      |
+|---------------------|--------------|
+| { Month }           | January      |
+| { Year }            | 2000         |
+| { Year, Month }     | January 2000 |
+
+Note: This table may be extended in the future to include additional fields, such as week, quarter, or standalone era.
+
+Note: A _calendar period_ is distinct from a _date_ because it cannot be paired with time to form a composite field set.
+
+##### Time Field Sets
+
+A _time field set_ refers to a particular time of day. Low-order fields, such as the minute or second, could be omitted.
+
+The fields that may be included in a time field set are:
+
+1. **Hour:** The hour, possibly with a day period, depending on factors such as the length, locale, and hour cycle locale keyword.
+2. **Minute:** The minute within an hour.
+3. **Second:** The second within a minute, a decimal number that may include fractional digits. See the [Fractional Second Digits](#Semantic_Skeleton_Fractional_Second_Digits) option.
+
+The valid time field sets are in the following table:
+
+| Field Set                | Example (en-US) | Example (en-GB) |
+|--------------------------|-----------------|-----------------|
+| { Hour }                 | 4 pm            | 16h             |
+| { Hour, Minute }         | 4:03 pm         | 16:03           |
+| { Hour, Minute, Second } | 4:03:51 pm      | 16:03:51        |
+
+Note: Minute and Second are not valid time field sets on their own because they do not refer to a particular time of day. They must be interpreted in the context of an explicit hour.
+
+Note: Durations, such as "3 minutes and 12 seconds" (or 3:12), are not handled through the skeleton mechanisms.
+
+##### Time Zone Field Sets
+
+A _time zone field set_ refers to a particular time zone. There is only one time zone field and one time zone field set, but the rendering can be configured with the [Zone Style](#Semantic_Skeleton_Time_Zone_Style) option.
+
+| Field Set | Example                             |
+|-----------|-------------------------------------|
+| { Zone }  | PST / PT / Los Angeles Time / GMT-8 |
+
+##### Composite Field Sets
+
+Date, time, and time zone field sets can be combined in the ways shown in the following table. Calendar period field sets cannot be combined with other categories.
+
+| Categories              | Example Field Set          | Example Output       |
+|-------------------------|----------------------------|----------------------|
+| Date + Time             | { Month, Day, Hour }       | January 1 at 4 pm    |
+| Date + Time Zone        | { Month, Day, Zone }       | January 1, PT        |
+| Date + Time + Time Zone | { Month, Day, Hour, Zone } | January 1 at 4 pm PT |
+| Time + Time Zone        | { Hour, Zone }             | 4 pm PT              |
+
+Note: "Date + Time Zone" is a valid combination because it refers to a specific span of time. "January 1, PST" refers to the span of time starting at `01-01T00:00-0800` and ending before `01-02T00:00-0800` (with an implied year).
+
+Note: This table may be extended in the future to include additional combinations.
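The field set tables above can be read as a small validation routine. The following Python sketch is illustrative only and is not part of the specification: the function name `classify`, the category labels, and the use of plain strings for field names are all assumptions made for this example. It splits a requested field set into its date, time, and zone parts, checks each part against the tables, and rejects calendar periods that are combined with anything else.

```python
# Hypothetical sketch: names and string labels are assumptions, not a defined API.
DATE_FIELD_SETS = {
    frozenset({"Day"}),
    frozenset({"Weekday"}),
    frozenset({"Day", "Weekday"}),
    frozenset({"Month", "Day"}),
    frozenset({"Month", "Day", "Weekday"}),
    frozenset({"Year", "Month", "Day"}),
    frozenset({"Year", "Month", "Day", "Weekday"}),
}
CALENDAR_PERIOD_FIELD_SETS = {
    frozenset({"Month"}),
    frozenset({"Year"}),
    frozenset({"Year", "Month"}),
}
TIME_FIELD_SETS = {
    frozenset({"Hour"}),
    frozenset({"Hour", "Minute"}),
    frozenset({"Hour", "Minute", "Second"}),
}

CALENDAR_FIELDS = frozenset({"Year", "Month", "Day", "Weekday"})
TIME_FIELDS = frozenset({"Hour", "Minute", "Second"})
ZONE_FIELDS = frozenset({"Zone"})


def classify(field_set):
    """Return the categories a field set belongs to, or raise ValueError."""
    fields = frozenset(field_set)
    if not fields or fields - (CALENDAR_FIELDS | TIME_FIELDS | ZONE_FIELDS):
        raise ValueError(f"empty or unknown fields: {sorted(fields)}")

    calendar_part = fields & CALENDAR_FIELDS
    time_part = fields & TIME_FIELDS
    zone_part = fields & ZONE_FIELDS

    categories = []
    if calendar_part:
        if calendar_part in DATE_FIELD_SETS:
            categories.append("date")
        elif calendar_part in CALENDAR_PERIOD_FIELD_SETS and not (time_part or zone_part):
            # Calendar periods are only valid on their own.
            categories.append("calendar period")
        else:
            raise ValueError(f"invalid field set: {sorted(fields)}")
    if time_part:
        if time_part not in TIME_FIELD_SETS:
            raise ValueError(f"invalid time fields: {sorted(time_part)}")
        categories.append("time")
    if zone_part:
        categories.append("time zone")
    return tuple(categories)
```

For example, `classify({"Month", "Day", "Hour"})` yields `("date", "time")`, while `{"Year", "Day"}` and `{"Month", "Hour"}` are rejected. An implementation could just as well encode the valid sets as an enumeration so that invalid combinations cannot be constructed at all.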
+
+#### Semantic Skeleton Options
+
+A semantic skeleton associates fields with zero or more options, listed in this section. Options apply to specific fields, and they should not be specified if their respective fields are not in the field set. Some options have a default value.
+
+##### Length
+
+**Required Fields: Year, Month, Day, Weekday, Hour, or Zone**
+
+**Default Value: Medium**
+
+The _length_ determines how wide the fields should be rendered. There are three choices:
+
+1. **Long:** Much space is available. Fields are typically spelled-out. Examples:
+    - January 1, 2000
+    - Rabiʻ I 7, 1446 AH
+2. **Medium:** Space is limited, and spelled-out fields are desired. Examples:
+    - Jan. 1, 2000
+    - Rab. I 7, 1446 AH
+3. **Short:** Space is limited, and numeric fields are desired. Examples:
+    - 1/1/00
+    - 3/7/1446 AH
+
+Note: Unlike standard CLDR pattern and skeleton strings, there is only one length option for the whole semantic skeleton. This is based on the principle that developers ought to inform the library how much space is available and the context in which the date/time is being displayed, and translators ought to decide how to use that space. For example, it is possible for long month names and abbreviated weekday names to coexist, but that should be a translator decision, not a developer decision. However, this option may be extended in the future to allow hinting at lengths for individual fields.
+
+Note: The locale or calendar may coerce the month length to be different than the skeleton length. For example, there is no numeric representation of months in the Hebrew calendar in English, so spelled-out month names will be used in "en-u-ca-hebrew" even if the length is Short.
+
+Note: Additional lengths could be added in the future, such as "narrow" or "conversational".
+
+##### Alignment
+
+**Required Fields: Year, Month, Day, or Hour**
+
+**Default Value: Inline**
+
+The _alignment_ provides additional context that can be used for determining how to display certain fields, particularly numeric ones. There are two choices:
+
+1. **Inline:** The text will be displayed in a paragraph, label, heading, or similar context.
+2. **Column:** The text will be displayed vertically in a column-like layout or similar context where similar rendered widths are preferred.
+
+Note: The most common behavior with "column" alignment is for implementations to render a minimum of two digits on impacted fields. For example, an implementation might render "01/01/2000" instead of "1/1/2000" in US English.
+
+##### Year Style
+
+**Required Field: Year**
+
+**Default Value: Auto**
+
+The _year style_ defines the level of precision to use when displaying the year. There are three choices:
+
+1. **Auto:** Display the year with full or partial precision, and display the era if needed to disambiguate the year, depending on locale, calendar, and length.
+2. **Full:** Display the year with full precision, and display the era if needed to disambiguate the year, depending on locale and calendar.
+3. **With era:** Display the year with full precision, and always display the era.
+
+Going down the list, the three options can be seen as requiring additional context. "Auto" gives translators the most flexibility; "full" requires that the year be displayed with full precision; and "with era" additionally requires that the era field be displayed.
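As a rough illustration of how the three year styles differ, the Python sketch below applies the rewriting rules given under Year Style Skeleton Variations ("Full" replaces "yy" with "y"; "With era" does the same and adds "G" if no era field is present). The function name `apply_year_style` and the lowercase style labels are assumptions for this example. The specification does not say where the added "G" should go, so prepending it is also an assumption.

```python
import re


def apply_year_style(skeleton: str, year_style: str) -> str:
    """Rewrite the year/era part of a standard skeleton for a year style.

    Hypothetical helper: the name and the "auto" / "full" / "with_era"
    labels are illustrative, not defined by the specification.
    """
    if year_style == "auto":
        # Auto: no change from the datetimeSkeleton.
        return skeleton
    if year_style not in ("full", "with_era"):
        raise ValueError(f"unknown year style: {year_style!r}")

    # Full and With era: replace a two-letter year ("yy") with "y".
    # The lookarounds leave longer runs such as "yyyy" untouched.
    result = re.sub(r"(?<!y)yy(?!y)", "y", skeleton)

    # With era: add an era field only if there is not one already.
    # Assumption: the era is prepended; the spec leaves the position open.
    if year_style == "with_era" and "G" not in result:
        result = "G" + result
    return result
```

With the en-US short Gregorian skeleton "yyMd", this returns "yyMd" for Auto, "yMd" for Full, and "GyMd" for With era. The Japanese short skeleton "GGGGGyMd" is returned unchanged in all three cases because it already has a full-precision year and an era field.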
+
+Implementations could choose to use heuristics such as the following:
+
+- Gregorian years within 20 years of the current date: partial precision okay
+- Gregorian years after January 1, 1000: require full precision, but okay to hide era
+- Other Gregorian years: require full precision and the era
+- Non-Gregorian years: show era if not the default calendar system in the locale
+
+Examples in Gregorian:
+
+| Year Style | 2020 CE | 1500 CE | 750 CE | 500 BCE |
+|------------|---------|---------|--------|---------|
+| Auto       | ‘20     | 1500    | 750 AD | 500 BC  |
+| Full       | 2020    | 1500    | 750 AD | 500 BC  |
+| With era   | 2020 AD | 1500 AD | 750 AD | 500 BC  |
+
+Note: This algorithm and the list of choices are likely to evolve as CLDR learns more about era display customs in different regions and calendar systems, and they may become normative.
+
+##### Hour Cycle
+
+**Required Field: Hour**
+
+**Default Value: Auto**
+
+The _hour cycle_, which corresponds directly to the `-u-hc` Unicode Locale extension keyword, determines how hours should be numbered. It is always left up to the locale to determine how and whether day periods should be displayed.
+
+The choices are:
+
+1. **Auto:** Locale default
+2. **H11:** Display hours numbered from 0 through 11
+3. **H12:** Display hours numbered from 1 through 12 (the most common 12-hour clock)
+4. **H23:** Display hours numbered from 0 through 23 (the most common 24-hour clock)
+5. **H24:** Display hours numbered from 1 through 24
+
+Typically, locales will display a day period on H11 and H12, but the day period could be any of those allowed by CLDR, such as AM/PM (field "a"), noon/midnight (field "b"), or flexible day periods such as "in the afternoon" (field "B"). The choice could depend on locale, length, and calendar system.
+
+Note: An option could be added in the future to give the developer more control over how day periods are displayed or to disable day periods when there is sufficient context.
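The four explicit hour cycles differ only in how an hour of the day (0 through 23) is numbered. The following Python sketch shows that arithmetic. The function name `display_hour` and the lowercase cycle labels are assumptions for this example. "Auto" is deliberately not handled, because resolving it requires locale data, and day period selection is likewise out of scope.

```python
def display_hour(hour_of_day: int, hour_cycle: str) -> int:
    """Map an hour of day (0-23) to the number shown under an hour cycle.

    Hypothetical helper: "auto" is not handled because it depends on
    locale data that this sketch does not model.
    """
    if not 0 <= hour_of_day <= 23:
        raise ValueError(f"hour out of range: {hour_of_day}")
    if hour_cycle == "h11":
        return hour_of_day % 12        # 0 through 11
    if hour_cycle == "h12":
        return hour_of_day % 12 or 12  # 1 through 12
    if hour_cycle == "h23":
        return hour_of_day             # 0 through 23
    if hour_cycle == "h24":
        return hour_of_day or 24       # 1 through 24
    raise ValueError(f"unsupported hour cycle: {hour_cycle!r}")
```

Midnight is the case where the cycles disagree most visibly: it is 0 under H11 and H23, 12 under H12, and 24 under H24.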
+
+##### Fractional Second Digits
+
+**Required Field: Second**
+
+**Default Value: Auto**
+
+The _fractional second digits_ option defines how many fractional digits should be displayed in the second field. The choices are:
+
+1. **Auto:** Display fractional digits if they are provided in the input. Do not pad with trailing zeros.
+2. An integer from 0 to 9: display exactly this many fractional digits. Extra digits may be truncated (rounded toward zero), and trailing zeros may be added.
+
+Note: The finest level of precision is currently specified as nanoseconds, consistent with the requirements of many popular datetime libraries.
+
+##### Time Zone Style
+
+**Required Field: Zone**
+
+**Default Value: Auto**
+
+The _time zone style_ defines how to display the time zone. The choices are:
+
+1. **Auto:** Choose the best style based on the locale.
+2. **Specific:** A time zone that unambiguously maps the time of day to an instant, which can be understood independently of the location or time of year. This field could resolve to specific non-location (pattern symbols "z", "zzzz") or offset (pattern symbols "O", "OOOO"), depending on the locale, length, and time zone identity.
+3. **Generic:** A time zone based on the location of an event. This field could resolve to generic non-location (pattern symbols "v", "vvvv"), generic partial-location, or location (pattern symbol "VVVV"), depending on the locale, length, and time zone identity. Do not use this field if the location of the event is unknown from context, because doing so could lead to ambiguity.
+4. **Location:** A time zone based on the identity of the IANA time zone. This field always resolves to the location format (pattern symbol "VVVV").
+5. **Offset:** A time zone based on the time offset from UTC.
+
+Examples:
+
+| Style     | Example               |
+|-----------|-----------------------|
+| Specific  | Pacific Standard Time |
+| Generic   | Pacific Time          |
+| Location  | Los Angeles Time      |
+| Offset    | GMT-8                 |
+
+### Generating Patterns for Semantic Skeletons
+
+A semantic skeleton can be mapped to a standard skeleton, which in turn can be mapped to a pattern according to the procedure described in [Matching Skeletons](#Matching_Skeletons).
+
+#### Mapping to Standard Skeletons
+
+The selected fields in the field set should be converted to standard skeleton symbols according to the following table. "Standalone" should be used when the field set contains only one field. Hour is broken down by [hour cycle](#Semantic_Skeleton_Hour_Cycle), and Zone is broken down by [time zone style](#Semantic_Skeleton_Time_Zone_Style).
+
+| Semantic Field               | Long   | Medium | Short  |
+|------------------------------|--------|--------|--------|
+| Year                         | \*     | \*     | \*     |
+| Month                        | \*     | \*     | \*     |
+| Month (Standalone)           | LLLL   | LLL    | L      |
+| Day                          | \*     | \*     | \*     |
+| Weekday                      | EEEE   | EEE    | EEE    |
+| Weekday (Standalone)         | EEEE   | EEE    | EEEEE  |
+| Hour - Auto                  | C      | C      | C      |
+| Hour - H11, H12              | h      | h      | h      |
+| Hour - H23, H24              | H      | H      | H      |
+| Minute                       | m      | m      | m      |
+| Second                       | s      | s      | s      |
+| Zone - Generic               | v      | v      | v      |
+| Zone - Generic (Standalone)  | vvvv   | vvvv   | v      |
+| Zone - Specific              | z      | z      | z      |
+| Zone - Specific (Standalone) | zzzz   | zzzz   | z      |
+| Zone - Location              | VVVV   | VVVV   | VVVV   |
+| Zone - Offset                | O      | O      | O      |
+
+\* Lengths for Year, Month, and Day are taken from the [datetimeSkeleton](#dateFormats) in the Long, Medium, and Short variants. The era field, if present, should be included with the Year. For example, in en-US, CLDR 46, the datetimeSkeletons are:
+
+| Length | Calendar  | datetimeSkeleton |
+|--------|-----------|------------------|
+| Long   | Gregorian | yMMMMd           |
+| Medium | Gregorian | yMMMd            |
+| Short  | Gregorian | yyMd             |
+| Long   | Japanese  | GyMMMMd          |
+| Medium | Japanese  | GyMMMd           |
+| Short  | Japanese  | GGGGGyMd         |
+
+This means that the Year, Month, and Day semantic field mapping in en-US should be:
+
+| Semantic Field | Calendar  | Long   | Medium | Short  |
+|----------------|-----------|--------|--------|--------|
+| Year           | Gregorian | y      | y      | yy     |
+| Month          | Gregorian | MMMM   | MMM    | M      |
+| Day            | Gregorian | d      | d      | d      |
+| Year           | Japanese  | Gy     | Gy     | GGGGGy |
+| Month          | Japanese  | MMMM   | MMM    | M      |
+| Day            | Japanese  | d      | d      | d      |
+
+##### Year Style Skeleton Variations
+
+The [year style](#Semantic_Skeleton_Year_Style) should change the skeleton for all lengths as follows:
+
+- Auto: No change from datetimeSkeleton (note: could be "y", "yy", "yG", or another combination of year and era fields)
+- Full: Replace "yy" with "y"
+- With era: Replace "yy" with "y" and add "G" if there is not already an era field
+
+### Semantic Skeleton Conformance
+
+This specification describes at a high level the space of legal configurations for a semantic skeleton. The exact shape of the API or syntax is left to the implementation.
+
+Requirements for an implementation of semantic skeletons to be conformant with this specification:
+
+1. All field sets and options described by this specification must be fully implemented.
+2. Field sets other than the ones described by this specification must cause an error.
+3. If none of the required fields for an input option are in the field set, there must be an error.
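Requirement 3 can be checked mechanically from the "Required Field(s)" line of each option. The Python sketch below is one possible shape for that check. The option keys, the function name `check_options`, and the use of `ValueError` are assumptions for this example, since the specification leaves the API to the implementation. The sketch covers only requirement 3 and assumes the field set itself has already been validated separately (requirement 2).

```python
# Required fields per option, transcribed from the option descriptions.
# The snake_case option keys are hypothetical names chosen for this sketch.
OPTION_REQUIRED_FIELDS = {
    "length": {"Year", "Month", "Day", "Weekday", "Hour", "Zone"},
    "alignment": {"Year", "Month", "Day", "Hour"},
    "year_style": {"Year"},
    "hour_cycle": {"Hour"},
    "fractional_second_digits": {"Second"},
    "time_zone_style": {"Zone"},
}


def check_options(field_set, options):
    """Raise ValueError if an option has none of its required fields."""
    fields = set(field_set)
    for name in options:
        if name not in OPTION_REQUIRED_FIELDS:
            raise ValueError(f"unknown option: {name!r}")
        required = OPTION_REQUIRED_FIELDS[name]
        if not fields & required:
            raise ValueError(
                f"option {name!r} requires one of {sorted(required)}, "
                f"but the field set is {sorted(fields)}"
            )
```

Calling `check_options({"Month", "Day"}, {"length": "long", "year_style": "full"})` raises an error because Year Style needs the Year field, whereas the same options with `{"Year", "Month", "Day"}` pass.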
+
+For example, a conformant implementation must reject the following inputs:
+
+| Field Set      | Options                        | Rejection Reason               |
+|----------------|--------------------------------|--------------------------------|
+| { Year, Day }  | Length: Long                   | Invalid field set              |
+| { Month, Day } | Length: Long; Year Style: Full | Year Style requires Year field |
+
 * * *
 
 Copyright © 2001–2024 Unicode, Inc. All Rights Reserved. The Unicode Consortium makes no expressed or implied warranty of any kind, and assumes no liability for errors or omissions. No liability is assumed for incidental and consequential damages in connection with or arising out of the use of the information or programs contained or accompanying this technical report. The Unicode [Terms of Use](https://www.unicode.org/copyright.html) apply.