Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handling of float inputs for PluralRules and FixedDecimal #191

Closed
zbraniecki opened this issue Jul 31, 2020 · 31 comments · Fixed by #2104
Closed

Handling of float inputs for PluralRules and FixedDecimal #191

zbraniecki opened this issue Jul 31, 2020 · 31 comments · Fixed by #2104
Assignees
Labels
A-design Area: Architecture or design A-performance Area: Performance (CPU, Memory) C-pluralrules Component: Plural rules T-enhancement Type: Nice-to-have but not required

Comments

@zbraniecki
Copy link
Member

In the original PR I first added and then removed handling of float inputs - f589e62

The rationale laid by @sffc is that float is a likely bad input for operands because it may loose fractional precision, and if someone has a float they should use FixedDecimal before they construct operands.

I'd like to see how integration of FixedDecimal goes and see what we'll end up with in terms of ergonomics and performance.

I'll test it on integration into fluent-rs where we have FluentNumber, plural rules etc.
The (very mocky) code we have now is here: https://github.com/projectfluent/fluent-rs/blob/master/fluent-bundle/src/types/number.rs#L123-L236

Once we have this all plugged in, I'd like to try to switch to icu4x and see how it works.

@zbraniecki zbraniecki added T-enhancement Type: Nice-to-have but not required A-performance Area: Performance (CPU, Memory) A-design Area: Architecture or design C-pluralrules Component: Plural rules labels Jul 31, 2020
@sffc sffc added the discuss Discuss at a future ICU4X-SC meeting label Aug 14, 2020
@sffc
Copy link
Member

sffc commented Aug 27, 2020

Plural rules selection on floating points needs to know the number of trailing zeros (minimumFractionDigits). That quantity could come from the user, or it could come from locale data (such as for currencies).

Let's revisit this in advance of v1 after NumberFormat is finished, and see if that covers all of the use cases.

@sffc sffc closed this as completed Aug 27, 2020
@sffc sffc removed the discuss Discuss at a future ICU4X-SC meeting label Sep 3, 2020
@sffc sffc reopened this Sep 4, 2020
@filmil
Copy link
Contributor

filmil commented Oct 13, 2020

Related: #340

@sffc sffc changed the title Handling of float inputs for PluralRules Handling of float inputs for PluralRules and FixedDecimal Oct 20, 2020
@nordzilla
Copy link
Member

@sffc

I want to touch base with you regarding my plans to benchmark PluralRules in SpiderMonkey, especially as it relates to #166. Part of my confusion with the usage of FixedDecimal is that we currently don't have a way to construct a FixedDecimal directly from a float.

It looks like the way to create a FixedDecimal is to turn your float into a string, and then create the FixedDecimal from a string.

As we discussed in #810 we need to support some form of PuralRules::select() on floats for ECMA-402 compatibility.

I am planning to set up three different benchmarks:

  1. Reinstate @zbraniecki's deleted PluralOperands::from(f64) from this commit, and benchmark SpiderMonkey with direct float to PluralOperands.

  2. Implement Select creating a string from a float, then creating a FixedDecimal from a string.

  3. Implement step 2) using ryu to do the float-to-string conversion.

I think these numbers should provide an interesting perspective into the performance of FixedDecimal.

I wanted to gather any other thoughts/advice you have about this plan.

@sffc
Copy link
Member

sffc commented Jun 22, 2021

We discuss extensively in #166 the many reasons why we don't support floats in ICU4X.

I would not be surprised if reinstating Zibi's PluralOperands::from(f64) would have some wins on benchmarks. But, performance is not the reason we removed it. My position is that we both do not want and do not need that function in order to support ECMA-402.

Aside from the concerns I've raised about floats' lack of ability to represent trailing zeros, ECMA-402 formats floats to formatted decimals before performing plural rule selection (e.g., by applying minimumFractionDigits and maximumFractionDigits). So, passing a float to PluralRules doesn't do what ECMA-402 does in the first place.

@nordzilla
Copy link
Member

I have no doubt that reinstating Zibi's PluralOperands::from(f64) would win on benchmarks. But, performance is not the reason we removed it. My position is that we both do not want and do not need that function in order to support ECMA-402.

I understand that. I'm not saying that if PluralOperands::from(f64) has the fastest benchmark, then that's what we need to do. More so, I think it would be good to know how much slower (if much at all) FixedDecimal is compared to that.

And importantly, (sorry to not be as clear about this) I'm interested in your opinion about the best ways to create FixedDecimal from float. Is float => ryu(float) -> string => string -> FixedDecimal the optimal path that we have? Or can we try creating a FixedDecimal directly from a float?

As you mentioned, I wasn't around for all of the early FixedDecimal conversations, so I'm trying to gather a bit of background knowledge here.

@sffc
Copy link
Member

sffc commented Jun 22, 2021

Once #166 is fixed, there will be two ways to create FixedDecimal from float:

  1. float => (your choice of float-to-string, either ryu or standard library) => FixedDecimal
  2. float => ryu_floating_decimal => FixedDecimal

You can do (2) now; for example code, see #724 (I expect to migrate that code to a documentation example). Of course, this comes at the cost of adding a dependency on ryu_floating_decimal which may or may not be acceptable for you, but that's a business decision your team can make.

Option (1) almost works, except our impl FromStr for FixedDecimal does not currently handle the exponent e field common in ryu and standard library string formats. Fixing #166 (which is "help wanted") will involve handling the e, as well as adding documentation for all of this.

@sffc
Copy link
Member

sffc commented Jun 22, 2021

Or can we try creating a FixedDecimal directly from a float?

That would be the holy grail. The problem is that it isn't actually possible, because floats are stored in base 2, but FixedDecimal is base 10 (and PluralRules needs base 10 as well), so we need to perform a base conversion. Converting to a string is the most common way to express a float in base 10, but crates like ryu_floating_decimal give alternative representations.

@nordzilla
Copy link
Member

Thanks for all of that background! It's very helpful.

@dminor
Copy link
Contributor

dminor commented Jun 23, 2021

We're also planning to benchmark NumberFormat in Gecko backed by FixedDecimalFormat. ICU4C NumberFormat supports formatting directly from a double, and that is the most common use case in Gecko, int64_t and string inputs are used for BigInt representations.

I get that what NumberFormat does is separate from the concerns here about PluralRules, but I think we'll eventually want to support NumberFormat from double for performance reasons.

@sffc
Copy link
Member

sffc commented Jun 23, 2021

To be clear: I support the idea of adding APIs for correctly supporting floats in principle, but there are technical challenges that are not yet resolved. We discussed those technical issues in our meeting last Thursday. See #166.

for performance reasons

Ultimately there should not be a performance difference in the general case if the double-to-decimal conversion happens in userland or in our library. There is simply not a way to "just format a double". The standing conclusion of #166 is to leave it up to the caller to choose how they want to perform the conversion, such that they can tune their size/speed tradeoffs, etc.

@dminor
Copy link
Contributor

dminor commented Jun 23, 2021

Sorry, I missed that ICU4C is first converting to an ascii format before doing anything locale specific, so you're right, it doesn't matter if ICU4x accepts a double.

@sffc
Copy link
Member

sffc commented Jun 23, 2021

Sorry, I missed that ICU4C is first converting to an ascii format before doing anything locale specific, so you're right, it doesn't matter if ICU4x accepts a double.

Right. More specifically, ICU4C uses https://github.com/google/double-conversion in the general case, which converts a double to an ASCII string, with a fast path for small doubles. I would consider the small double fast path to be in scope for ICU4X if we wanted it.

@nordzilla
Copy link
Member

nordzilla commented Jul 12, 2021

@sffc

It's still unclear to me what our final vision is for handling significant digits, fraction digits etc. using a Float => Ryu(String) => FixedDecimal => PluralRules::Select workflow.

Ryu doesn't look like it takes any inputs to format() that suggest you can change these properties of the formatted string.

Our current FFI doesn't exposes any functionality to change these properties. I don't see any documentations or discussions specifically about this (but sorry if I missed them). As you have brought up, we need to differentiate 1 from 1.0 in the en locale for example.

@sffc
Copy link
Member

sffc commented Jul 13, 2021

Let's try to discuss options for double-to-decimal in #166, and keep this thread focused on the problem as it relates specifically to PluralRules.

@zbraniecki
Copy link
Member Author

@sffc am I correct that the only way to get PluralOperands from f64 is

f64 -> some to string -> FixedDecimal -> PluralOperands.

We did not secure a route that wouldn't require stringification yet.
Do we want to conclude here? I feel like it is suboptimal and we should continue pursuing a solution that doesn't require stringification.

It may involve waiting for some intermittent represtation that ryu did not expose will show up, or it may involve forking ryu, but I hope we will keep looking for one.

Does that match your mental model?

@sffc
Copy link
Member

sffc commented Mar 19, 2022

#166 has most of the discussion on this, and it was fixed a few months ago by Manish. You can now do FixedDecimal::new_from_f64 without the explicit string step. Under the hood it still uses a string unfortunately because ryu does not want to add a non-string API. We do not currently have plans to fork ryu since Shadaj found the performance impact to be small (which surprised me), and the crate is already zero-alloc. Again, #166 is where to look for the discussion.

@zbraniecki
Copy link
Member Author

Ah, thank you!

I'm wondering why Manish chose new_from_f64 instead of from_f64 for the name (or even From<(f64, DoublePrecision)>).

I also think that https://unicode-org.github.io/icu4x-docs/doc/fixed_decimal/decimal/struct.FixedDecimal.html#method.new_from_f64 and https://unicode-org.github.io/icu4x-docs/doc/fixed_decimal/decimal/enum.DoublePrecision.html would benefit from examples that show different result than Maximum to help user decide between them (since it's likely going to be where they get introduced to the enum and the concept behind it).

@sffc
Copy link
Member

sffc commented Mar 19, 2022

Yeah, I'd like to go through and improve the docs.

I don't remember why new_from_f64 was chosen over the alternatives.

@zbraniecki
Copy link
Member Author

@Manishearth:

  • is there a reason DoublePrecision has no default (should Maximum be a default)?
  • could we have TryFrom<f64> and/or `TryFrom<(f64, DoublePrecision)>?
  • Why new_from_f64 rather than from_f64?

@Manishearth
Copy link
Member

  • is there a reason DoublePrecision has no default (should Maximum be a default)?

No reason

  • could we have TryFrom<f64> and/or `TryFrom<(f64, DoublePrecision)>?

@sffc, thoughts? I'm against conversion from a tuple, it's not very Rusty (just use a function). Somewhat okay with TryFrom but unsure if we should be picking that default for users. Plus all of this is behind ryu and it's less noticeable if a conversion trait impl is behind a feature.

  • Why new_from_f64 rather than from_f64?

No strong reason.

@sffc
Copy link
Member

sffc commented Mar 21, 2022

As expressed elsewhere on multiple occasions, I am strongly opposed to anything that involves making an implicit default value for DoublePrecision. I had thought that this discussion was settled.

@Manishearth
Copy link
Member

Ah, sorry, I hadn't recalled it. Personally I have no strong opinion there.

@zbraniecki
Copy link
Member Author

As expressed elsewhere on multiple occasions, I am strongly opposed to anything that involves making an implicit default value for DoublePrecision. I had thought that this discussion was settled.

Understood. This refutes my drive to TryFrom. I'd only suggest renaming from new_from_f64 to from_f64. and document more why DoublePrecision has no default and how developers should chose which one to use.

@sffc sffc added this to the ICU4X 1.0 milestone Apr 1, 2022
@sffc sffc removed backlog labels Apr 1, 2022
@sapriyag sapriyag added the discuss Discuss at a future ICU4X-SC meeting label May 25, 2022
@sffc sffc added discuss-priority Discuss at the next ICU4X meeting and removed discuss Discuss at a future ICU4X-SC meeting labels Jun 9, 2022
@sffc
Copy link
Member

sffc commented Jun 9, 2022

Given the emerging design for rounding in FixedDecimal, I'd like to propose a change to the try_from_f64 functions. I would like to not put the RoundingMode in these functions, because (1) it unnecisarilly complicates the API, including for FFI, and (2) it may have code size impact since the rounding mode code paths are different.

Currently, the DoublePrecision enum is:

pub enum DoublePrecision {
    Integer,
    Magnitude(i16, RoundingMode),
    SignificantDigits(u8, RoundingMode),
    Floating,
}

I'd like to change it to:

pub enum DoublePrecision {
    IntegerExact,
    Magnitude(i16),
    SignificantDigits(u8),
    Floating,
}

Semantics:

  • IntegerExact returns an error if the float is not integer-valued
  • Magnitude(i16) and SignificantDigits(u8) always use "halfTrunc" rounding (could be "halfExpand" or "halfEven" if people prefer), adding trailing zeros if necessary
  • Floating retains the full precision without trailing zeros

This covers my main concern, which is that the first choice for people using floats should be that they specify the precision of the float. However, if they want behavior that isn't in the prescribed list, they can use Floating, and we can document the impact of this choice.

@Manishearth
Copy link
Member

This works for me.

@zbraniecki
Copy link
Member Author

wfm too

@sffc
Copy link
Member

sffc commented Jun 9, 2022

To discuss: which rounding mode to use implicitly on the Magnitude and SignificantDigits options. Three choices:

The code size and performance difference between "halfTrunc" and "halfEven" is likely small, but I haven't measured it.

@sffc
Copy link
Member

sffc commented Jun 9, 2022

Another option to consider: we could split the function into four,

  • try_from_f64_integer_exact()
  • try_from_f64_with_magnitude()
  • try_from_f64_with_significant_digits()
  • try_from_f64_with_floating_precision()

An advantage here is that try_from_f64_integer_exact() can be exposed without ryu since it's fairly easy to implement manually. The other three require ryu.

@sffc
Copy link
Member

sffc commented Jun 9, 2022

An advantage here is that try_from_f64_integer_exact() can be exposed without ryu since it's fairly easy to implement manually. The other three require ryu.

Scratch that. There's a fast path for integer-valued floats that are less than MAX_SAFE_VALUE, but for integer-valued floats larger than that, we need ryu. The fast path is just casting the float to a u64, which clients can do externally.

The fast path that we use in ICU is for f64 with fewer than ~12 significant digits. We could in the future add a ryu-less function or variant that implements the fast path and fixes the f64 to ~12 digits.

@sffc
Copy link
Member

sffc commented Jun 17, 2022

  • @robertbastian - Could we make the number of significant digits a const parameter?
  • @younies - When users create FixedDecimal from f64, we are usually going to need a rounding mode. We should have a default one and allow users to modify it.
  • @sffc - If a user wants a non-default rounding mode, they can use the FixedDecimal like a builder-style API (which is a footgun in the general case, but OK if users read the docs first)
  • @Manishearth - That SGTM
  • @sffc - What is the default mode?
  • @Manishearth - I generally expect halfExpand
  • @younies - I think halfCeiling
  • @sffc - Is there a significant difference in complexity?
  • @younies - There's not much of a difference except for truncate
  • @sffc - I think I would prefer to use one of the ones in IEEE 754, which is halfExpand or halfEven. I prefer halfEven.
  • @Manishearth - I think it should be either truncate (which is what you do when you cast to an integer) or halfEven/halfCeiling (which is what normal people expect)
  • @sffc - Another advantage of halfEven is that it results in more mathematically sound distributions.
  • @younies - It only matters when you are at exactly 0.5. So the decision is fine.
  • @Manishearth - LGTM
  • @zbraniecki - LGTM

@sffc
Copy link
Member

sffc commented Jun 17, 2022

  • @sffc - How about multiple functions versus enum?
  • @Manishearth - We need multiple functions over FFI. SO it's only a Rust question.
  • @zbraniecki - I have a slight preference for single function, but I could be convinced if there are code size wins.
  • ??? - A feature could enable more enum variants.
  • @Manishearth - That's okay only if the enum is non_exhaustive.
  • @sffc - I think I formulated my preference. We should have separate functions because (1) closest to FFI, (2) some small code size wins, and (3) most consistent when we add the no-ryu function in the future
  • @younies okay.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-design Area: Architecture or design A-performance Area: Performance (CPU, Memory) C-pluralrules Component: Plural rules T-enhancement Type: Nice-to-have but not required
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants