Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ICU4x number formatting #303

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

JasperDeSutter
Copy link
Contributor

Adds basic decimal formatting for FluentNumber.as_string(). Many features are still missing like currency symbols and percentage formatting. #269

Since data providers need to be managed explicitly, it needs to be handed to the memoizer to be able to construct a formatter. This required a new argument + associated type on Memoizable.

Something to figure out is where the data comes from. I've went with a simple FluentBundle.set_number_format_provider, which seems most flexible to me. I've added a test that uses icu_testdata for this, but other providers should work as well.
https://icu4x.unicode.org/doc/icu/#data-management

@gregtatum gregtatum self-requested a review December 19, 2022 16:02
Copy link
Member

@gregtatum gregtatum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thought: I'm not done reviewing this, but figured I would leave some early feedback. The implementation is looking pretty good in my initial review. I still need to complete a more thorough one.

This is going to take some careful consideration to merge since it can impact binary sizes. In addition this requires the users to have a data provider, which could be pretty significant. I'm wondering if this feature might need to start out as a trait.

Firefox is really sensitive to binary size increases, so I wouldn't feel comfortable merging until I knew we could manage the dependency changes and the additional data requirements. If Firefox has these issues, then it's possible other implementations will as well.

There are essentially 2 places new binary size can increase here, the dependency executable code, and in the data in the data provider. It would be good to measure both of these. I wonder if you just shipped with an en-001 locale (or equivalent) as a single locale, if it would be the equivalent behavior as what's going on now in Firefox and the fluent-rs project.

(I'm not the original author of this code, so I'd have to read through the code to fully know the answer here.)

Do you have any thoughts in this space @JasperDeSutter?

@gregtatum gregtatum self-requested a review December 20, 2022 22:09
@JasperDeSutter
Copy link
Contributor Author

Yeah I agree the increased binary size is the probably the main concern for incorporating ICU components.

The way I've implemented it currently allows for optionally providing the data as a user, falling back to basic number formatting when no provider is available. As such there shouldn't be a size increase from this part for consumers that don't want it.
This does come with a usability impact for users of fluent-rs though, since they now need to know about ICU4X and its data generation tooling.
We could opt for including the data behind a feature flag within fluent (the rust code generation solution looks quite nice for this), so that it becomes much easier for users to choose with or without.

The formatting code and supporting ICU libraries are unconditionally included though. I'm sure they have a sizable impact on binary size and compile time seeing the number of crates pulled in, I'll take a look into profiling binary sizes. We could use the same feature flag for conditionally depending on these libraries as for including the data.

Note that if we would want to also use ICU4X for plural rules or locale fallback, the story changes. Since these are required for the working of fluent message formatting we couldn't just put icu libraries behind a feature flag anymore.

@JasperDeSutter
Copy link
Contributor Author

I compared the average binary size of all examples in fluent-bundle: (stripped release build)

605KB - without decimal formatting (main branch): 100%
643KB - with number formatting code, no data: +6%
780KB - with decimal and fallback data for all locales: +29%
716KB - with icu_testdata (icu_decimal feature, only test locales): +18%

To me it looks like a good idea to not include data, but I'm not sure whether a feature flag is warranted for the formatting code.

Copy link
Member

@gregtatum gregtatum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me it looks like a good idea to not include data, but I'm not sure whether a feature flag is warranted for the formatting code.

I agree. 38kb is not too bad in terms of binary size increase. Thanks for quantifying it! That's really helpful. Firefox plans on eventually integrating ICU4X, but there's not a concrete written plan yet. The binary increase seems acceptable, and eventually Firefox will have its own data provider, and will control that data cost.

I'd like to get some feedback with other Mozillians in this space, and we're hitting the holidays where people are out.

I think the next steps here would be to add the trait in just for the data, and make sure the ICU4X bits are well documented with suggestions on the proper way to integrate the data. I might run this by for feedback from the ICU4X team after soliciting feedback from other Mozillians.

I'm marking request changes since there's actionable feedback already, feel free to re-request review if you would like a more thorough code review at this point before adding the trait in.

@gregtatum
Copy link
Member

I'd like to get some feedback with other Mozillians in this space, and we're hitting the holidays where people are out.

Also to be clear here, I'd like to wait until after the holidays at least before merging since people are probably out since this affects the architecture of this project and Firefox. I think any code updates and reviews can happen before then, but I just wanted to set expectations.

@gregtatum
Copy link
Member

I'm closing as stale since there were requested changes. Feel free re-open.

@JasperDeSutter JasperDeSutter force-pushed the icu-number-formatting branch from 43e798c to 55f6483 Compare May 9, 2024 13:45
@fee1-dead
Copy link

any chance we could get a review on this?

@alerque
Copy link
Collaborator

alerque commented May 18, 2024

@fee1-dead Sure, go ahead and review. Test results welcome. Only half jesting, if you want to help move things along the best way to do that is contribute by reviewing and testing yourself and reporting results, not nagging other volunteers.

Also be aware that because this introduces a bunch of new dependencies we'll probably have to hold off on merging until we can base it on or at least compare it with what is happening in #335. This work should jive pretty well with that, but it would be hard to give it a blank yes and merge before we have a solid idea how other ICU4X integration is going to work.

@JasperDeSutter JasperDeSutter force-pushed the icu-number-formatting branch from 15f5d13 to 5812fda Compare May 18, 2024 21:06
Copy link

@fee1-dead fee1-dead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@alerque
Copy link
Collaborator

alerque commented Aug 28, 2024

@fee1-dead Did you review this in context with other ICU4x related work such as in #335 and the discussions in a number of related issues. We could use help reviewing and sorting out all the moving pieces, but just saying this one PR "lgtm" in isolation doesn't move the ball forward much.

@fee1-dead
Copy link

if I actually do get time I can review the other PR, but at a first glance this PR is isolated enough (only uses icu4x for number formatting) I don't think it would be very probable for a conflict with that effort to use icu4x in more places.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants