Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Neo date formatter: Options<R> vs R::Options and same for .format #5269

Open
sffc opened this issue Jul 19, 2024 · 15 comments
Open

Neo date formatter: Options<R> vs R::Options and same for .format #5269

sffc opened this issue Jul 19, 2024 · 15 comments
Assignees
Labels
C-datetime Component: datetime, calendars, time zones discuss-priority Discuss at the next ICU4X meeting

Comments

@sffc
Copy link
Member

sffc commented Jul 19, 2024

Currently in #5248 I implemented

/// Options bag for datetime formatting.
#[derive(Debug, Copy, Clone, PartialEq, Eq)]
#[non_exhaustive]
pub struct NeoOptions<R: DateTimeMarkers> {
    /// The desired length of the formatted string,
    /// if required for the chosen field set.
    ///
    /// See [`NeoSkeletonLength`].
    pub length: R::LengthOption,
}

where R::LengthOption is set according to the field set. It is usually NeoSkeletonLength but it is a zero-sized type for field sets that don't need the length.

I intend to add at least two more options:

  1. R::HourCycleOption for runtime hour cycle choices
  2. R::EraDisplayOption for runtime era display choices

Another approach would be to instead create separate options bags for each combinations of possible options. For example:

#[non_exhaustive]
pub struct DateWithYearOptions {
    pub length: Length,
    pub era_display: EraDisplay,
}

#[non_exhaustive]
pub struct DateWithoutYearOptions {
    pub length: Length,
}

#[non_exhaustive]
pub struct TimeOptions {
    pub length: Length,
    pub hour_cycle: HourCycle,
}

// note: same as DateWithoutYearOptions; could be combined
#[non_exhaustive]
pub struct ZoneWithRuntimeLengthOptions {
    pub length: Length,
}

#[non_exhaustive]
pub struct ZoneWithConstLengthOptions {
}

#[non_exhaustive]
pub struct DateTimeWithYearOptions {
    pub length: Length,
    pub era_display: EraDisplay,
    pub hour_cycle: HourCycle,
}

and then R::Options would choose the correct struct.

Pros and cons:

  • NeoOptions<R> makes it possible/easier to add more options in the future in a non-breaking way
  • NeoOptions<R> is arguably more consistent with the neo formatter design which has aimed at lots of markers/traits but very few concrete types
  • R::Options is probably more clear in documentation because you are constructing a real struct with specific fields, rather than a weird struct whose fields depend on R

I also wanted to raise the possibility of applying a similar treatment to the format<I>(&self, input: &I) function. Currently we have a trait that looks like this:

pub trait AllInputMarkers<R: DateTimeMarkers>:
    NeoGetField<<R::D as DateInputMarkers>::YearInput>
    + NeoGetField<<R::D as DateInputMarkers>::MonthInput>
    + NeoGetField<<R::D as DateInputMarkers>::DayOfMonthInput>
    + NeoGetField<<R::D as DateInputMarkers>::DayOfWeekInput>
    + NeoGetField<<R::D as DateInputMarkers>::DayOfYearInput>
    + NeoGetField<<R::D as DateInputMarkers>::AnyCalendarKindInput>
    + NeoGetField<<R::T as TimeMarkers>::HourInput>
    + NeoGetField<<R::T as TimeMarkers>::MinuteInput>
    + NeoGetField<<R::T as TimeMarkers>::SecondInput>
    + NeoGetField<<R::T as TimeMarkers>::NanoSecondInput>
    + NeoGetField<<R::Z as ZoneMarkers>::TimeZoneInput>
where
    R::D: DateInputMarkers,
    R::T: TimeMarkers,
    R::Z: ZoneMarkers,
{
}

which is blanked-implemented on all T satisfying its bounds. We could instead have

#[non_exhaustive]
pub struct Input<R> {
    pub year: R::YearInput,
    pub month: R::MonthInput,
    // ...
}

Pros and cons:

  • pub trait AllInputMarkers<R> allows users to pass foreign types directly into the format function so long as they implement the trait
  • pub struct Input<R> makes the format function less generic (instead of being generic in both R and I, now it is generic only in R), probably resulting in smaller code size
  • pub struct Input<R> allows clients to more easily drive the formatter even without using icu_calendar::Date or one of the other official date types

Thoughts?

@Manishearth @robertbastian @zbraniecki

@sffc sffc added the C-datetime Component: datetime, calendars, time zones label Jul 19, 2024
@sffc sffc added this to the ICU4X 2.0 ⟨P1⟩ milestone Jul 19, 2024
@Manishearth
Copy link
Member

Discussed with Shane yesterday. In general I'm in favor of having the intermediate type with the user calling .into() or whatever, or having a single trait.

Both of these add some indirection, but they're more easily documented over a pile of traits. IMO functions like format() should spend ink documenting how they are used, without having to go too deep into "how to make the traits work". Having an intermediate type or trait gets us over the finish line with a dedicated place to describe and untangle the trait mess.

Between having an intermediate type or trait, I find that having a type tends to be less messy, and probably better for monomorphization.

@sffc
Copy link
Member Author

sffc commented Jul 24, 2024

In addition to hour cycle and era display, I also need an option for fractional second digits, unless that gets modeled as part of the field set.

@sffc sffc added the needs-approval One or more stakeholders need to approve proposal label Jul 30, 2024
@sffc
Copy link
Member Author

sffc commented Jul 30, 2024

Would like comments from @robertbastian and @zbraniecki

@robertbastian
Copy link
Member

I'm in favour of R::Options for usability.

@sffc
Copy link
Member Author

sffc commented Aug 15, 2024

Discussion on constructors, also with implications on NeoOptions:

use icu::datetime::fieldsets::YMD;
use icu::datetime::DateComponents;

icu::datetime::Formatter::<YMD>::try_new(locale, options)
icu::datetime::Formatter::try_new_with_components(locale, DateComponents::YearMonth, options)

// Zibi prefers:
icu::datetime::Formatter::try_new(locale, YMD, options)

// Shane: It can also be written as:
icu::datetime::Formatter::<YMD>::try_new(locale, YMD, options)

// Robert: This would give gnarly errors
icu::datetime::Formatter::<YMDT>::try_new(locale, YMD, options)

// Robert: This actually works if the type can be inferred, i.e. it's declared in a field or argument
let formatter: Formatter<YMD> =
    icu::datetime::Formatter::try_new(locale, option)

// Shane: can we keep the old signature around?
icu::datetime::Formatter::<YMD>::try_new_bikeshed(locale, options)

// Shane: If we go with NeoOptions<R>, we could make it so that you write something like
icu::datetime::Formatter::try_new(locale, YMD.with_length(Length::Long) /* NeoOptions<YMD> */)
  • @robertbastian You see these generics a lot. You can hide it, but you can't hide it when you pass it to a function, for example. And you see the type a lot in error messages.
  • @zbraniecki For such a high-profile type, I don't think we necesarily want to force people to learn about how the generics work before they can even use it.
  • @sffc We could add another constructor signature?
  • @nekevss Could you put the field set into the options?
  • @sffc Yes, if we go with NeoOptions<R>
  • @robertbastian What do we use in docs?
  • @sffc I slightly prefer using the turbofish in docs but if people think YMD.with_length or similar is better, we can do it
  • @robertbastian I like fully specifying a type, but not a turbofish on function calls. I really don't like YMD.with_length because it goes out of the way to hide that the type has a generic.
  • @Manishearth I don't think users need to understand they are different types.
  • @sffc I think that Zibi's position about learnability is important. I think the type parameter is important for people to know about eventually, but not in their initial use. People are used to creating a DateTimeFormatter with fields specified as options: this is how Intl works, it is how ICU4C works and ICU4J and most other formatting libraries. So it makes sense to me that there should be a way to specify the field set as a parameter in the constructor.

Constructor signature proposal:

  • The options type is one of the following:
    • Options<FSet> with a field_set field
    • FSetOptions, one type per field set marker
  • All fieldsets implement Default and are inhabited ZSTs (have exactly 1 possible value)
  • fieldset types gain convenience methods like YMD.with_length(..) etc
  • We document three constructor call sites:
    • Formatter::<YMD>::try_new(locale, Options {...})
    • Formatter::try_new(locale, YMD.with_length(..))
    • Formatter::try_new(locale, Options { ..., fieldset: YMD }) (we don't need to display this in any docs, but people can do this if they want)
  • In our docs we can choose to show different techniques if needed. We pick the first two techniques to primarily use in docs: the type param by default, and the helpers where possible.
  • Auto-generated docs tests on marker types will use the first (type param) method.

LGTM: @Manishearth @sffc

New proposal:

  • Formatter::<YMD>::try_new(locale, Options {...}) is the primary constructor
  • If options design has one-type-per-fieldset (YMDOptions, or just have YMD be the options type), or a single options type generic over fieldsets (Options<YMD>)
    • We add YMD.with_length(..) convenience options builders which will produce an appropriately-constrained options type
    • We can also document Formatter::try_new(locale, YMD.with_length(..)).
    • We primarily still use the turbofish ctor, but use the with_length() convenience ones in some docs where possible.
  • Else, if we have five options types DateOptions etc, they gain a generic and a field_set field, so you can say DateOptions {...field_set: YMD} if you choose, but also it can be defaulted as usual.
    • Downside: now we have generics on these options types and really don't need them usually. However, the choice of fieldset is something that can be considered an option so it's not too weird. (Check if @robertbastian is ok with this constraint)

LGTM: @Manishearth @sffc

Need input from @robertbastian @zbraniecki

@Manishearth
Copy link
Member

The main thing that we discussed in the meeting after people left was the "else" block there, which proposes a way to decouple the ctor discussion from the options discussion, however it constrains the options discussion by giving DateOptions (etc) a generic in the "we have five options types" model

@sffc
Copy link
Member Author

sffc commented Aug 15, 2024

I want to explore the field set being the options type.

We would get call sites like this:

icu::datetime::Formatter::try_new(locale, YMD::with_length(Length::Medium))

The function signature is

impl Formatter<FSet> {
    pub fn try_new(Preferences, FSet)
}

And the field set would be defined like this (most likely expanded from a macro)

#[non_exhaustive]
pub struct YMD {
    pub length: Length,
    pub era_display: EraDisplay,
}

For runtime components, it would be

#[non_exhaustive]
pub struct DateSkeleton {
    pub field_set: DateFieldSet,
    pub length: Length,
    pub era_display: EraDisplay,
}

Assuming it works, I can agree to this.

@robertbastian
Copy link
Member

This would produce good documentation, so I'm onboard.

@sffc sffc removed the needs-approval One or more stakeholders need to approve proposal label Aug 16, 2024
@sffc sffc self-assigned this Aug 16, 2024
@sffc sffc added the discuss-priority Discuss at the next ICU4X meeting label Sep 17, 2024
@sffc
Copy link
Member Author

sffc commented Sep 17, 2024

We haven't reached consensus on the input types.

Currently, we have:

impl<C: CldrCalendar, R: DateTimeMarkers> TypedNeoFormatter<C, R>
where
    R::D: DateInputMarkers,
    R::T: TimeMarkers,
    R::Z: ZoneMarkers,
{
    pub fn format<I>(&self, input: &I) -> FormattedNeoDateTime
    where
        I: ?Sized + IsInCalendar<C> + AllInputMarkers<R>,
    {
        let input = ExtractedInput::extract_from_neo_input::<R::D, R::T, R::Z, I>(input);
        FormattedNeoDateTime {
            pattern: self.selection.select(&input),
            input,
            names: self.names.as_borrowed(),
        }
    }
}

pub trait AllInputMarkers<R: DateTimeMarkers>:
    NeoGetField<<R::D as DateInputMarkers>::YearInput>
    + NeoGetField<<R::D as DateInputMarkers>::MonthInput>
    + NeoGetField<<R::D as DateInputMarkers>::DayOfMonthInput>
    + NeoGetField<<R::D as DateInputMarkers>::DayOfWeekInput>
    + NeoGetField<<R::D as DateInputMarkers>::DayOfYearInput>
    + NeoGetField<<R::D as DateInputMarkers>::AnyCalendarKindInput>
    + NeoGetField<<R::T as TimeMarkers>::HourInput>
    + NeoGetField<<R::T as TimeMarkers>::MinuteInput>
    + NeoGetField<<R::T as TimeMarkers>::SecondInput>
    + NeoGetField<<R::T as TimeMarkers>::NanoSecondInput>
    + NeoGetField<<R::Z as ZoneMarkers>::TimeZoneOffsetInput>
    + NeoGetField<<R::Z as ZoneMarkers>::TimeZoneIdInput>
    + NeoGetField<<R::Z as ZoneMarkers>::TimeZoneMetazoneInput>
    + NeoGetField<<R::Z as ZoneMarkers>::TimeZoneVariantInput>
where
    R::D: DateInputMarkers,
    R::T: TimeMarkers,
    R::Z: ZoneMarkers,
{
}

Problems with this design:

  1. We are using traits because we thought that third-party libraries could implement them. However, as observed previously, most third-party libraries can't actually implement these traits, since they don't have useful calendar information, for example.
  2. More traits means more monomorphization.
  3. Traits means you can't .into(): you must explicitly name the type being formatted.

I would instead like to explore the following design:

pub struct NeoInput<R: DateTimeMarkers>
where
    R::D: DateInputMarkers,
    R::T: TimeMarkers,
    R::Z: ZoneMarkers,
{
    pub year: <R::D as DateInputMarkers>::YearInput,
    pub month: <R::D as DateInputMarkers>::MonthInput,
    // ...
}

This is reminiscent of NeoOptions.

Pros and cons:

  • Pro: Monomorphization depends only on R (the field set), not on the type being formatted
  • Pro: .into() works
  • Pro: Third-party types can implement functions to convert themselves into a NeoInput, which is a more straightforward task than figuring out all the GetInput impls
  • Pro: More consistent with the other terminals such as FixedDecimalFormatter and PluralRules that process types that have a bunch of conversion impls available
  • Con: Same reasons my original NeoOptions design didn't taste right to @robertbastian
  • Pro and Con: Fields are eagerly evaluated

I put "fields are eagerly evaluated" as a pro and con. An advantage is that it has a more clean separation of responsibilities: field resolution happens above the ICU4X formatter membrane. A disadvantage is that it means we can't in the future refactor the code to lazy-evaluate the inputs.

I want to emphasize that the actual fields being read are determined by the field set R, not by the pattern, and that we are already reading fields eagerly. I'm not convinced that we will ever want lazy extraction:

  1. If fields are read in .format: we need to iterate over the pattern in order to know which fields we need to read. But, we already iterate over the pattern in impl TryWriteable, so it seems wrong to do it again in .format. It is likely slower, because there's a lot more branching involved: iterating over the pattern requires reading all the fields and pattern items with big enum matches. But maybe it could work.
  2. If fields are read in impl TryWriteable via a type parameter on FormattedNeo: Bad because the monomorphization permeates much deeper, temporaries can't be passed into the format function, and it is a breaking change.
  3. If fields are read in impl TryWriteable via a trait object: Bad because trait objects are bad for code size, temporaries, and breaking change.

I guess it is possible that we could have lazy extraction in .format (but probably not TryWriteable). So, that's the one thing my proposal would preclude.

@robertbastian @Manishearth

@sffc
Copy link
Member Author

sffc commented Sep 17, 2024

I discussed this a bit with @robertbastian and @Manishearth today. I didn't take notes, but here were some of my takeaways:

Creating struct Input<FSet> makes the bounds on Formatter a bit simpler, but it just moves the "trait soup" somewhere else. Types that want to convert themselves into Input<FSet> still need to have all the input bounds on their conversion functions.

I noted above that third-party datetime types can't really in practice implement GetInput, but I didn't give concrete examples. YearInfo and MonthInfo have a lot of fields and internal invariants. For example:

#[non_exhaustive]
pub struct YearInfo {
    pub extended_year: i32,
    pub kind: YearKind,
}

#[non_exhaustive]
pub enum YearKind {
    Era(EraYear),
    Cyclic(CyclicYear),
}

#[non_exhaustive]
pub struct EraYear {
    pub formatting_era: Era,
    pub standard_era: Era,
    pub era_year: i32,
}

#[non_exhaustive]
pub struct CyclicYear {
    pub year: NonZero<u8>,
    pub related_iso: i32,
}

There are similar cross-field invariants between Time Zone and Metazone, between MonthInfo and DayOfMonth, between a future WeekOfYear and DayOfYear, etc.

A convenience function could be added to construct a YearInfo or MonthInfo from an ISO or Gregorian year or month number, and I consider it a trait invariant that the data returned from different GetInput calls be self-consistent. However, there's nothing enforcing it. @robertbastian pointed out that a custom type might work in English Gregorian but then break in some other locale and calendar system that wasn't tested ahead of time.

One possible resolution, which I resisted for a long time but am warming up to, is to simply accept the concrete icu_calendar types in the format functions. There are two ways we could do this, with their own unique pros and cons:

  1. Keep GetInput but make it sealed and only implemented on our blessed types
  2. Create different format_date, format_time, format_datetime, ... functions, taking the concrete types and enabled/disabled based on FSet bounds

In both cases, we lose the ability to format, for example, a MonthDay. GetInput currently allows one to create a custom type with only the subset of fields that FSet requires. If we find this to be a problem, we could just add more types to icu_calendar to meet these use cases.

Thoughts? @Manishearth @zbraniecki

@Manishearth
Copy link
Member

I'd rather still retain the ability to integrate with other crates, and the ability for people to make their own MonthDay types.

Keep GetInput but make it sealed and only implemented on our blessed typ

This is going to still be a trait mess since we only have one formatter type so this would need to be GetInput<FS>

@Manishearth
Copy link
Member

I'd rather still retain the ability to integrate with other crates, and the ability for people to make their own MonthDay types.

I think when we first designed this crate I thought the DateTimeInput trait was overkill and wanted to use our types. Since then I think I've moved on to liking it and thinking it helps keep the API small.

ICU4X Dates are not 100% cheap to construct, so I'm worried about other datetime libraries paying that cost. But I don't strongly oppose this, if people feel that we should simplify by sealing, I think that's ok for now.

@sffc
Copy link
Member Author

sffc commented Sep 18, 2024

ICU4X Dates are not 100% cheap to construct, so I'm worried about other datetime libraries paying that cost.

DateTime<Iso> is quite cheap to construct, and most third-party crates would only be able to support the Iso calendar anyway.

@robertbastian
Copy link
Member

  1. Keep GetInput but make it sealed and only implemented on our blessed types
  2. Create different format_date, format_time, format_datetime, ... functions, taking the concrete types and enabled/disabled based on FSet bounds

If we're restricting inputs to a fixed set, we might as well use different methods. That way we can document each one properly, and the client sees which concrete type they have to construct from the signature.

I don't see MonthDay as such a big issue. If we have MD skeleton, you can just format a Date with it. In your data model you are 99% going to have a Date, not a floating MonthDay.

@Manishearth
Copy link
Member

  • @sffc and @robertbastian bring the group up to speed with their current thinking
  • @Manishearth I don't really see third-party crates implementing the trait; I see them asking us to implement the trait. I could see new smaller crates implementing it. But I don't see large crates implementing it due to it being finnicky. But, I think we should have seamless integration. So if we have the traits, we should be prepared to implement them ourselves. And if we format a struct, we should be prepared to have From impls.
  • @Manishearth I don't really like having 5 different format functions. I guess, if we're moving toward sealing the traits, we should be clearer which types go in. That either means doing the traits in a way where it's abundantly clear which traits implement it, or doing the functions in which case it's abundantly clear which types go in. I guess the functions are nicer, but I don't like having a billion functions.
  • @robertbastian I think the motivation for these traits was that you could avoid duplicate computation. If Jiff creates a date from a timestamp, and then Jiff converts to an icu_calendar type, we might have to repeat the computation. But it's not clear that we trust Jiff to do the computations in the way we expect it to do. Since the traits make the API really confusing, I'm leaning toward separate functions.

List of formattable types:

  • icu::calendar::Date
  • icu::calendar::Time
  • icu::calendar::DateTime
  • icu::timezone::UtcOffset (proposed)
  • icu::timezone::ResolvedTimeZone
  • icu::timezone::OffsetDateTime (proposed)
  • icu::timezone::ResolvedZonedDateTime

More discussion:

  • @Manishearth Do we think that the use case for formatting all of these is equally likely?
  • @Manishearth We could potentially have functions for the main types, but keep the trait for the other things that could potentially be formatted. Explicitly mark format_generic() as a power user API and say that if you want to format Date, DateTime, ..., you should use the specific methods, and document other types that implement the Input trait on format_generic.
  • @sffc I don't feel that this list is unbounded. We may eventually add YearMonth and MonthDay. I don't see it getting bigger than ~10.
  • @Manishearth I don't want to add 9 methods and committing to a design that requires that we add more methods when we want new types.
  • @sffc Just noting, a Formatter<YMD> will disable its format_time function via another trait bound of some sort.
  • @Manishearth where FS: TimeSet + DateSet?
  • @nekevss I see cases where you have parts you want to format and you want to pass the parts.
  • @Manishearth If we seal the trait, we could unseal in the future when we have concrete use cases.
  • @sffc An advantage of the concrete functions is that type inference works better.
  • @Manishearth Also, with concrete functions, the docs are cleaner.
  • @robertbastian I don't think having many format functions is bad. Users will load the docs and see the 5-10 format_* functions and it will be clear all the possible things they can pass in.
  • @Manishearth The concrete functions will need to exist over FFI anyway

Options:

  1. Single format function with current signature but sealed trait
  2. format_* functions for all formattable concrete types in the icu_calendar and icu_timezone crates at the time of the ICU4X 2.0 release (likely implemented via private trait). If we add more types post-2.0, we will discuss whether adding them via trait functions or additional concrete format functions. There is a list above of the rough set of types that will be added, it will either be 5 types or 7 types.
  3. A small selection of format_* functions as well as a format_generic function with the sealed trait

Discussion on the specific options:

  • @hsivonen I don't have super informed opinions but in general lots of generics are maybe not great so format_* seems more straightforward
  • @nekevss I'm also out of context, but option 2 seemed to sound the best based on the little context I have.
  • @echeran I feel the same way as @nekevss

Decision: Option 2

LGTM: @robertbastian @sffc @Manishearth

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-datetime Component: datetime, calendars, time zones discuss-priority Discuss at the next ICU4X meeting
Projects
Status: Being worked on
Development

No branches or pull requests

3 participants