Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create an Ideal Components Bag / Skeleton for DateTimeFormat #1317

Open
7 tasks
Tracked by #272
gregtatum opened this issue Nov 18, 2021 · 19 comments
Open
7 tasks
Tracked by #272

Create an Ideal Components Bag / Skeleton for DateTimeFormat #1317

gregtatum opened this issue Nov 18, 2021 · 19 comments
Assignees
Labels
C-datetime Component: datetime, calendars, time zones help wanted Issue needs an assignee S-epic Size: Major project (create smaller child issues) T-core Type: Required functionality

Comments

@gregtatum
Copy link
Member

gregtatum commented Nov 18, 2021

This is a meta issue to track implementing the "ideal components bag" as laid out in the DateTimeFormat Deep Dive 2021-10-01 design document. Originally there was some discussion to have this replace the current components bag, but it is to be implemented alongside the existing components bag. A better name can be bikeshed if needed.

The following need to be completed.

@sffc
Copy link
Member

sffc commented Jan 27, 2022

@gregtatum will provide mentorship.

@zbraniecki
Copy link
Member

I'm spreading the word about this issue looking for candidates.

More details:

Description: Currently, DateTimeFormat has two ways to select the right format, both of them are imperfect. We believe we have a balanced novel solution that, once implemented, will become the foundational use of the DateTimeFormat.
Scope: We believe that the initial implementation should take one person several (2-3) months to implement. Hopefully in time for ICU4X 1.0.
Mentorship: This project is well staffed on the mentorship side with @gregtatum from Mozilla, @sffc from Google and @zbraniecki from Amazon willing to invest time to mentor the engineer who'll pick it.
How to start: If you are interested in the project, comment in this issue or join unicode-org.slack.com #icu4x and we'll get you on-ramped.

@ozghimire
Copy link
Contributor

ozghimire commented Feb 10, 2022

I'm interested to work on this issue.

@zbraniecki
Copy link
Member

@gregtatum are you still open to mentor?

@pdogr
Copy link
Contributor

pdogr commented Feb 11, 2022

If this issue is still open, I'm definitely interested to work on this.

@gregtatum
Copy link
Member Author

@ozghimire Great! How would you prefer to get started? There is a document linked above outlining the strategy which should discuss how to get things going. I would suggest starting with #1318. I will fill in more details on that issue.

@pdogr I think ozghimire is taking the first step on this to move it forward, and it's hard to parallelize this initial step, but there will probably be work to help out on around the issues. You could take another DateTimeFormat issue to get onboarded. I'm sure there will be opportunities to help in the short term. #1581 would be a good bug to onboard with if you wanted to take it.

@randomicon00
Copy link

Hello @gregtatum, are still looking for contributors?

@sffc sffc added discuss Discuss at a future ICU4X-SC meeting and removed v1 labels Apr 1, 2022
sffc added a commit that referenced this issue Aug 22, 2024
…FieldLength variants (#5392)

#1317

This makes it such that adding or removing fractional second digits
involves changing one FieldSymbol, rather than going and mutating the
choice of fields in the pattern. Simpler and less error-prone at
runtime.

The parsing of "ss.SSS" is moved into the parser, where it belongs,
rather than handling it at format time.
sffc added a commit that referenced this issue Aug 22, 2024
@sffc
Copy link
Member

sffc commented Aug 28, 2024

I'm going to delete the benches for the old code soon, but here is a snapshot of how they currently compare, running on the same data but with different APIs:

datetime/zoned_datetime_overview
                        time:   [32.527 µs 32.563 µs 32.616 µs]
Found 7 outliers among 100 measurements (7.00%)
  5 (5.00%) high mild
  2 (2.00%) high severe

datetime/neoneo/datetime_zoned_datetime_overview
                        time:   [32.503 µs 32.565 µs 32.635 µs]
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

datetime/datetime_components
                        time:   [365.04 µs 365.44 µs 365.89 µs]
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high severe

datetime/neoneo/datetime_components
                        time:   [322.48 µs 323.54 µs 325.27 µs]
Found 7 outliers among 100 measurements (7.00%)
  4 (4.00%) high mild
  3 (3.00%) high severe

datetime/datetime_lengths
                        time:   [36.500 µs 36.572 µs 36.669 µs]
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high severe

datetime/neoneo/datetime_lengths
                        time:   [38.260 µs 38.286 µs 38.312 µs]

In words, peformance on lengths and lengths_with_zones is basically identical, and performance on components is improved from 365 to 323 (a 12% improvement).

The primary metrics that my work has been focused on are memory use and binary/data size, so it's nice to see that we could achieve those while also getting a small improvement on benchmark performance as well.

@sffc
Copy link
Member

sffc commented Sep 4, 2024

Some quick binary size analysis after migrating tui to neo datetime:

Compilation: cargo +nightly-2024-07-23 build -Z build-std=std,panic_abort -Z build-std-features=panic_immediate_abort --target x86_64-unknown-linux-gnu --profile=release-opt-size --examples --workspace --features serde --exclude icu_provider_export --exclude icu_capi

Binary size breakdown of tui (including data):

1.x DateTime Neo DateTime
Total Binary Size 4,661,520 1,598,032
Code Size 104,401 92,683
Data Size (.rodata) 2,933,616 1,199,840
Data Size (.data.rel.ro) 1,030,024 146,344

How I calculated this: I opened both binaries in IDA Pro and measured the length of the segments.

What is .rodata vs .data.rel.ro: I gather that .rodata are the pure read-only binary data such as static strings (and I see some human-readable strings in that section), whereas .data.rel.ro are the stack frames of static variables, i.e. the baked data structs.

Why the numbers don't add up: the ELF contains some sections which are neither pure code nor pure data. The biggest section not included in the numbers above is used for Relocation.

The reduced data is because previously the binary was including data it didn't need for formatting, such as weekday names and certain types of time zone names, and because the new pattern layout is more efficient. I'm also pleased to see the reduced code size, which is probably due to less branching overall.

@sffc
Copy link
Member

sffc commented Sep 13, 2024

Discussion with @robertbastian (mostly obsolete given #5538):

  • @sffc I posted in Slack my plans here. I was planning to finish the Rust API migration (which includes year marker merging, renaming, and changing the signature of the NeoOptions to what we agreed), and then migrate FFI, and then finally delete the old code.
  • @robertbastian I think we should prioirtize deleting the old code. When I was working on time zones, it wasn't clear to me what was new and what was old. It's a lot of maintenance cost.
  • @sffc I'm okay with a short-term solution where I delete the old FFI, comment-out the FFI tests, and then I can delete the old Rust API. Then when I add the new FFI, I add back the tests.
  • @robertbastian Maybe make the FFI methods panic instead of deleting, so that we keep the API shape? And then you can still compile the C/C++ tests; you just don't run them.

Suggestion for 2.0 FFI:

enum DateFieldSets { }
enum CalendarPeriodFieldSets { }
enum TimeFieldSets { }
enum ZoneFieldSets { }

struct CompositeFieldSet {
    date: Option<DateFieldSets>,
    // ...
}

struct DateTimeOptionsV1 {
    length: Option<Length>,
    year_style: Option<YearStyle>,
    fractional_second_digits: Option<FractionalSecondDigits>,
    // ...
}

// ALL NEW!
opaque DateTimeFormatter {
    pub fn create(&Locale, CompositeFieldSet, DateTimeOptionsV1) -> Result<>
}

// ALL NEW!
opaque GregorianDateTimeFormatter {
    pub fn create(&Locale, CompositeFieldSet, DateTimeOptionsV1) -> Result<>
}

opaque DateFormatter { ... } // icu::datetime::Formatter<YMD>

opaque GregorianDateFormatter { ... } // icu::datetime::Formatter<YMD>

opaque TimeFormatter { ... } // icu::datetime::Formatter<TimeComponents>

Types removed:

  • ZonedDateTimeFormatter
  • GregorianZonedDateTimeFormatter

Note: I want to add more field-set-specific types, but @Manishearth suggests doing this after we have better namespacing post-2.0, and I agree.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-datetime Component: datetime, calendars, time zones help wanted Issue needs an assignee S-epic Size: Major project (create smaller child issues) T-core Type: Required functionality
Projects
None yet
Development

No branches or pull requests

10 participants