Handling of float inputs for PluralRules and FixedDecimal #191

zbraniecki · 2020-07-31T20:07:09Z

In the original PR I first added and then removed handling of float inputs - f589e62

The rationale laid by @sffc is that float is a likely bad input for operands because it may loose fractional precision, and if someone has a float they should use FixedDecimal before they construct operands.

I'd like to see how integration of FixedDecimal goes and see what we'll end up with in terms of ergonomics and performance.

I'll test it on integration into fluent-rs where we have FluentNumber, plural rules etc.
The (very mocky) code we have now is here: https://github.com/projectfluent/fluent-rs/blob/master/fluent-bundle/src/types/number.rs#L123-L236

Once we have this all plugged in, I'd like to try to switch to icu4x and see how it works.

The text was updated successfully, but these errors were encountered:

sffc · 2020-08-27T18:26:41Z

Plural rules selection on floating points needs to know the number of trailing zeros (minimumFractionDigits). That quantity could come from the user, or it could come from locale data (such as for currencies).

Let's revisit this in advance of v1 after NumberFormat is finished, and see if that covers all of the use cases.

filmil · 2020-10-13T17:51:27Z

Related: #340

nordzilla · 2021-06-22T18:30:30Z

@sffc

I want to touch base with you regarding my plans to benchmark PluralRules in SpiderMonkey, especially as it relates to #166. Part of my confusion with the usage of FixedDecimal is that we currently don't have a way to construct a FixedDecimal directly from a float.

It looks like the way to create a FixedDecimal is to turn your float into a string, and then create the FixedDecimal from a string.

As we discussed in #810 we need to support some form of PuralRules::select() on floats for ECMA-402 compatibility.

I am planning to set up three different benchmarks:

Reinstate @zbraniecki's deleted PluralOperands::from(f64) from this commit, and benchmark SpiderMonkey with direct float to PluralOperands.
Implement Select creating a string from a float, then creating a FixedDecimal from a string.
Implement step 2) using ryu to do the float-to-string conversion.

I think these numbers should provide an interesting perspective into the performance of FixedDecimal.

I wanted to gather any other thoughts/advice you have about this plan.

sffc · 2021-06-22T19:46:14Z

We discuss extensively in #166 the many reasons why we don't support floats in ICU4X.

I would not be surprised if reinstating Zibi's PluralOperands::from(f64) would have some wins on benchmarks. But, performance is not the reason we removed it. My position is that we both do not want and do not need that function in order to support ECMA-402.

Aside from the concerns I've raised about floats' lack of ability to represent trailing zeros, ECMA-402 formats floats to formatted decimals before performing plural rule selection (e.g., by applying minimumFractionDigits and maximumFractionDigits). So, passing a float to PluralRules doesn't do what ECMA-402 does in the first place.

nordzilla · 2021-06-22T20:00:45Z

I have no doubt that reinstating Zibi's PluralOperands::from(f64) would win on benchmarks. But, performance is not the reason we removed it. My position is that we both do not want and do not need that function in order to support ECMA-402.

I understand that. I'm not saying that if PluralOperands::from(f64) has the fastest benchmark, then that's what we need to do. More so, I think it would be good to know how much slower (if much at all) FixedDecimal is compared to that.

And importantly, (sorry to not be as clear about this) I'm interested in your opinion about the best ways to create FixedDecimal from float. Is float => ryu(float) -> string => string -> FixedDecimal the optimal path that we have? Or can we try creating a FixedDecimal directly from a float?

As you mentioned, I wasn't around for all of the early FixedDecimal conversations, so I'm trying to gather a bit of background knowledge here.

sffc · 2021-06-22T21:06:47Z

Once #166 is fixed, there will be two ways to create FixedDecimal from float:

float => (your choice of float-to-string, either ryu or standard library) => FixedDecimal
float => ryu_floating_decimal => FixedDecimal

You can do (2) now; for example code, see #724 (I expect to migrate that code to a documentation example). Of course, this comes at the cost of adding a dependency on ryu_floating_decimal which may or may not be acceptable for you, but that's a business decision your team can make.

Option (1) almost works, except our impl FromStr for FixedDecimal does not currently handle the exponent e field common in ryu and standard library string formats. Fixing #166 (which is "help wanted") will involve handling the e, as well as adding documentation for all of this.

sffc · 2021-06-22T21:09:47Z

Or can we try creating a FixedDecimal directly from a float?

That would be the holy grail. The problem is that it isn't actually possible, because floats are stored in base 2, but FixedDecimal is base 10 (and PluralRules needs base 10 as well), so we need to perform a base conversion. Converting to a string is the most common way to express a float in base 10, but crates like ryu_floating_decimal give alternative representations.

nordzilla · 2021-06-22T21:13:25Z

Thanks for all of that background! It's very helpful.

dminor · 2021-06-23T11:30:27Z

We're also planning to benchmark NumberFormat in Gecko backed by FixedDecimalFormat. ICU4C NumberFormat supports formatting directly from a double, and that is the most common use case in Gecko, int64_t and string inputs are used for BigInt representations.

I get that what NumberFormat does is separate from the concerns here about PluralRules, but I think we'll eventually want to support NumberFormat from double for performance reasons.

sffc · 2021-06-23T16:18:10Z

To be clear: I support the idea of adding APIs for correctly supporting floats in principle, but there are technical challenges that are not yet resolved. We discussed those technical issues in our meeting last Thursday. See #166.

for performance reasons

Ultimately there should not be a performance difference in the general case if the double-to-decimal conversion happens in userland or in our library. There is simply not a way to "just format a double". The standing conclusion of #166 is to leave it up to the caller to choose how they want to perform the conversion, such that they can tune their size/speed tradeoffs, etc.

dminor · 2021-06-23T16:40:51Z

Sorry, I missed that ICU4C is first converting to an ascii format before doing anything locale specific, so you're right, it doesn't matter if ICU4x accepts a double.

sffc · 2021-06-23T17:05:51Z

Sorry, I missed that ICU4C is first converting to an ascii format before doing anything locale specific, so you're right, it doesn't matter if ICU4x accepts a double.

Right. More specifically, ICU4C uses https://github.com/google/double-conversion in the general case, which converts a double to an ASCII string, with a fast path for small doubles. I would consider the small double fast path to be in scope for ICU4X if we wanted it.

nordzilla · 2021-07-12T17:32:42Z

@sffc

It's still unclear to me what our final vision is for handling significant digits, fraction digits etc. using a Float => Ryu(String) => FixedDecimal => PluralRules::Select workflow.

Ryu doesn't look like it takes any inputs to format() that suggest you can change these properties of the formatted string.

Our current FFI doesn't exposes any functionality to change these properties. I don't see any documentations or discussions specifically about this (but sorry if I missed them). As you have brought up, we need to differentiate 1 from 1.0 in the en locale for example.

sffc · 2021-07-13T05:26:59Z

Let's try to discuss options for double-to-decimal in #166, and keep this thread focused on the problem as it relates specifically to PluralRules.

zbraniecki · 2022-03-19T01:12:48Z

@sffc am I correct that the only way to get PluralOperands from f64 is

f64 -> some to string -> FixedDecimal -> PluralOperands.

We did not secure a route that wouldn't require stringification yet.
Do we want to conclude here? I feel like it is suboptimal and we should continue pursuing a solution that doesn't require stringification.

It may involve waiting for some intermittent represtation that ryu did not expose will show up, or it may involve forking ryu, but I hope we will keep looking for one.

Does that match your mental model?

sffc · 2022-03-19T02:13:40Z

#166 has most of the discussion on this, and it was fixed a few months ago by Manish. You can now do FixedDecimal::new_from_f64 without the explicit string step. Under the hood it still uses a string unfortunately because ryu does not want to add a non-string API. We do not currently have plans to fork ryu since Shadaj found the performance impact to be small (which surprised me), and the crate is already zero-alloc. Again, #166 is where to look for the discussion.

zbraniecki · 2022-03-19T07:33:51Z

Ah, thank you!

I'm wondering why Manish chose new_from_f64 instead of from_f64 for the name (or even From<(f64, DoublePrecision)>).

I also think that https://unicode-org.github.io/icu4x-docs/doc/fixed_decimal/decimal/struct.FixedDecimal.html#method.new_from_f64 and https://unicode-org.github.io/icu4x-docs/doc/fixed_decimal/decimal/enum.DoublePrecision.html would benefit from examples that show different result than Maximum to help user decide between them (since it's likely going to be where they get introduced to the enum and the concept behind it).

sffc · 2022-03-19T15:32:17Z

Yeah, I'd like to go through and improve the docs.

I don't remember why new_from_f64 was chosen over the alternatives.

zbraniecki · 2022-03-21T21:36:57Z

@Manishearth:

is there a reason DoublePrecision has no default (should Maximum be a default)?
could we have TryFrom<f64> and/or `TryFrom<(f64, DoublePrecision)>?
Why new_from_f64 rather than from_f64?

Manishearth · 2022-03-21T21:59:35Z

is there a reason DoublePrecision has no default (should Maximum be a default)?

No reason

could we have TryFrom<f64> and/or `TryFrom<(f64, DoublePrecision)>?

@sffc, thoughts? I'm against conversion from a tuple, it's not very Rusty (just use a function). Somewhat okay with TryFrom but unsure if we should be picking that default for users. Plus all of this is behind ryu and it's less noticeable if a conversion trait impl is behind a feature.

Why new_from_f64 rather than from_f64?

No strong reason.

sffc · 2022-03-21T22:13:21Z

As expressed elsewhere on multiple occasions, I am strongly opposed to anything that involves making an implicit default value for DoublePrecision. I had thought that this discussion was settled.

Manishearth · 2022-03-21T22:47:10Z

Ah, sorry, I hadn't recalled it. Personally I have no strong opinion there.

zbraniecki · 2022-03-21T22:55:28Z

As expressed elsewhere on multiple occasions, I am strongly opposed to anything that involves making an implicit default value for DoublePrecision. I had thought that this discussion was settled.

Understood. This refutes my drive to TryFrom. I'd only suggest renaming from new_from_f64 to from_f64. and document more why DoublePrecision has no default and how developers should chose which one to use.

sffc · 2022-06-09T15:30:25Z

Given the emerging design for rounding in FixedDecimal, I'd like to propose a change to the try_from_f64 functions. I would like to not put the RoundingMode in these functions, because (1) it unnecisarilly complicates the API, including for FFI, and (2) it may have code size impact since the rounding mode code paths are different.

Currently, the DoublePrecision enum is:

pub enum DoublePrecision {
    Integer,
    Magnitude(i16, RoundingMode),
    SignificantDigits(u8, RoundingMode),
    Floating,
}

I'd like to change it to:

pub enum DoublePrecision {
    IntegerExact,
    Magnitude(i16),
    SignificantDigits(u8),
    Floating,
}

Semantics:

IntegerExact returns an error if the float is not integer-valued
Magnitude(i16) and SignificantDigits(u8) always use "halfTrunc" rounding (could be "halfExpand" or "halfEven" if people prefer), adding trailing zeros if necessary
Floating retains the full precision without trailing zeros

This covers my main concern, which is that the first choice for people using floats should be that they specify the precision of the float. However, if they want behavior that isn't in the prescribed list, they can use Floating, and we can document the impact of this choice.

Manishearth · 2022-06-09T15:32:24Z

This works for me.

zbraniecki · 2022-06-09T15:44:46Z

wfm too

sffc · 2022-06-09T15:51:31Z

To discuss: which rounding mode to use implicitly on the Magnitude and SignificantDigits options. Three choices:

"halfTrunc" is likely the smallest and fastest because it always removes digits, never adding digits
"halfEven" is generally recommended for numeric stability, and is listed as "the default for binary floating point" in https://en.wikipedia.org/wiki/IEEE_754#Rounding_rules
"halfExpand" is the ECMA-262 default and is listed as "an option for decimal floating point" in https://en.wikipedia.org/wiki/IEEE_754#Rounding_rules

The code size and performance difference between "halfTrunc" and "halfEven" is likely small, but I haven't measured it.

sffc · 2022-06-09T15:56:08Z

Another option to consider: we could split the function into four,

try_from_f64_integer_exact()
try_from_f64_with_magnitude()
try_from_f64_with_significant_digits()
try_from_f64_with_floating_precision()

An advantage here is that try_from_f64_integer_exact() can be exposed without ryu since it's fairly easy to implement manually. The other three require ryu.

sffc · 2022-06-09T18:53:18Z

An advantage here is that try_from_f64_integer_exact() can be exposed without ryu since it's fairly easy to implement manually. The other three require ryu.

Scratch that. There's a fast path for integer-valued floats that are less than MAX_SAFE_VALUE, but for integer-valued floats larger than that, we need ryu. The fast path is just casting the float to a u64, which clients can do externally.

The fast path that we use in ICU is for f64 with fewer than ~12 significant digits. We could in the future add a ryu-less function or variant that implements the fast path and fixes the f64 to ~12 digits.

sffc · 2022-06-17T18:31:20Z

@robertbastian - Could we make the number of significant digits a const parameter?
@younies - When users create FixedDecimal from f64, we are usually going to need a rounding mode. We should have a default one and allow users to modify it.
@sffc - If a user wants a non-default rounding mode, they can use the FixedDecimal like a builder-style API (which is a footgun in the general case, but OK if users read the docs first)
@Manishearth - That SGTM
@sffc - What is the default mode?
@Manishearth - I generally expect halfExpand
@younies - I think halfCeiling
@sffc - Is there a significant difference in complexity?
@younies - There's not much of a difference except for truncate
@sffc - I think I would prefer to use one of the ones in IEEE 754, which is halfExpand or halfEven. I prefer halfEven.
@Manishearth - I think it should be either truncate (which is what you do when you cast to an integer) or halfEven/halfCeiling (which is what normal people expect)
@sffc - Another advantage of halfEven is that it results in more mathematically sound distributions.
@younies - It only matters when you are at exactly 0.5. So the decision is fine.
@Manishearth - LGTM
@zbraniecki - LGTM

sffc · 2022-06-17T18:40:03Z

@sffc - How about multiple functions versus enum?
@Manishearth - We need multiple functions over FFI. SO it's only a Rust question.
@zbraniecki - I have a slight preference for single function, but I could be convinced if there are code size wins.
??? - A feature could enable more enum variants.
@Manishearth - That's okay only if the enum is non_exhaustive.
@sffc - I think I formulated my preference. We should have separate functions because (1) closest to FFI, (2) some small code size wins, and (3) most consistent when we add the no-ryu function in the future
@younies okay.

zbraniecki added T-enhancement Type: Nice-to-have but not required A-performance Area: Performance (CPU, Memory) A-design Area: Architecture or design C-pluralrules Component: Plural rules labels Jul 31, 2020

zbraniecki mentioned this issue Jul 31, 2020

Add PluralRules #119

Merged

sffc added the discuss Discuss at a future ICU4X-SC meeting label Aug 14, 2020

sffc assigned zbraniecki Aug 14, 2020

sffc added backlog labels Aug 27, 2020

sffc closed this as completed Aug 27, 2020

sffc removed the discuss Discuss at a future ICU4X-SC meeting label Sep 3, 2020

sffc reopened this Sep 4, 2020

sffc changed the title ~~Handling of float inputs for PluralRules~~ Handling of float inputs for PluralRules and FixedDecimal Oct 20, 2020

sffc mentioned this issue Mar 22, 2022

Improvements to FixedDecimal f64 APIs #1718

Merged

sffc added this to the ICU4X 1.0 milestone Apr 1, 2022

sffc removed backlog labels Apr 1, 2022

sapriyag added the discuss Discuss at a future ICU4X-SC meeting label May 25, 2022

sffc added discuss-priority Discuss at the next ICU4X meeting and removed discuss Discuss at a future ICU4X-SC meeting labels Jun 9, 2022

sffc removed the discuss-priority Discuss at the next ICU4X meeting label Jun 17, 2022

sffc assigned younies and unassigned zbraniecki Jun 17, 2022

sffc modified the milestones: ICU4X 1.0 Untriaged, ICU4X 1.0 (Polish), ICU4X 1.0 (Features) Jun 17, 2022

younies mentioned this issue Jun 24, 2022

Remove rounding mode and use half_even as a default rounding mode. #2104

Merged

younies closed this as completed in #2104 Jun 25, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Handling of float inputs for PluralRules and FixedDecimal #191

Handling of float inputs for PluralRules and FixedDecimal #191

zbraniecki commented Jul 31, 2020

sffc commented Aug 27, 2020

filmil commented Oct 13, 2020

nordzilla commented Jun 22, 2021

sffc commented Jun 22, 2021 •

edited

Loading

nordzilla commented Jun 22, 2021

sffc commented Jun 22, 2021 •

edited

Loading

sffc commented Jun 22, 2021 •

edited

Loading

nordzilla commented Jun 22, 2021

dminor commented Jun 23, 2021

sffc commented Jun 23, 2021 •

edited

Loading

dminor commented Jun 23, 2021

sffc commented Jun 23, 2021

nordzilla commented Jul 12, 2021 •

edited

Loading

sffc commented Jul 13, 2021

zbraniecki commented Mar 19, 2022

sffc commented Mar 19, 2022

zbraniecki commented Mar 19, 2022

sffc commented Mar 19, 2022

zbraniecki commented Mar 21, 2022

Manishearth commented Mar 21, 2022

sffc commented Mar 21, 2022

Manishearth commented Mar 21, 2022

zbraniecki commented Mar 21, 2022

sffc commented Jun 9, 2022

Manishearth commented Jun 9, 2022

zbraniecki commented Jun 9, 2022

sffc commented Jun 9, 2022

sffc commented Jun 9, 2022

sffc commented Jun 9, 2022

sffc commented Jun 17, 2022

sffc commented Jun 17, 2022 •

edited by robertbastian

Loading

Handling of float inputs for PluralRules and FixedDecimal #191

Handling of float inputs for PluralRules and FixedDecimal #191

Comments

zbraniecki commented Jul 31, 2020

sffc commented Aug 27, 2020

filmil commented Oct 13, 2020

nordzilla commented Jun 22, 2021

sffc commented Jun 22, 2021 • edited Loading

nordzilla commented Jun 22, 2021

sffc commented Jun 22, 2021 • edited Loading

sffc commented Jun 22, 2021 • edited Loading

nordzilla commented Jun 22, 2021

dminor commented Jun 23, 2021

sffc commented Jun 23, 2021 • edited Loading

dminor commented Jun 23, 2021

sffc commented Jun 23, 2021

nordzilla commented Jul 12, 2021 • edited Loading

sffc commented Jul 13, 2021

zbraniecki commented Mar 19, 2022

sffc commented Mar 19, 2022

zbraniecki commented Mar 19, 2022

sffc commented Mar 19, 2022

zbraniecki commented Mar 21, 2022

Manishearth commented Mar 21, 2022

sffc commented Mar 21, 2022

Manishearth commented Mar 21, 2022

zbraniecki commented Mar 21, 2022

sffc commented Jun 9, 2022

Manishearth commented Jun 9, 2022

zbraniecki commented Jun 9, 2022

sffc commented Jun 9, 2022

sffc commented Jun 9, 2022

sffc commented Jun 9, 2022

sffc commented Jun 17, 2022

sffc commented Jun 17, 2022 • edited by robertbastian Loading

sffc commented Jun 22, 2021 •

edited

Loading

sffc commented Jun 22, 2021 •

edited

Loading

sffc commented Jun 22, 2021 •

edited

Loading

sffc commented Jun 23, 2021 •

edited

Loading

nordzilla commented Jul 12, 2021 •

edited

Loading

sffc commented Jun 17, 2022 •

edited by robertbastian

Loading