Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for e oprand to PluralOperands #228

Closed
zbraniecki opened this issue Sep 4, 2020 · 7 comments · Fixed by #407
Closed

Add support for e oprand to PluralOperands #228

zbraniecki opened this issue Sep 4, 2020 · 7 comments · Fixed by #407
Assignees
Labels
A-scope Area: Project scope, feature coverage C-pluralrules Component: Plural rules T-enhancement Type: Nice-to-have but not required
Milestone

Comments

@zbraniecki
Copy link
Member

In https://unicode-org.atlassian.net/browse/CLDR-12010 CLDR adds e operand. We should implement it as well.

@zbraniecki zbraniecki added T-enhancement Type: Nice-to-have but not required A-scope Area: Project scope, feature coverage v1 C-pluralrules Component: Plural rules labels Sep 4, 2020
@sffc
Copy link
Member

sffc commented Sep 5, 2020

In FixedDecimal, I would like to add an exponent field, similar to how @echeran did it in ICU4J.

@sffc sffc added this to the 2020 Q4 milestone Sep 17, 2020
@zbraniecki
Copy link
Member Author

I started looking into this, because it'll block our upgrade to CLDR-38.

In 38, the only use of e I found is in French:

e = 0 and i != 0 and i % 1000000 = 0 and v = 0 or e != 0..5 @integer 1000000, 1e6, 2e6, 3e6, 4e6, 5e6, 6e6, … @decimal 1.0000001e6, 1.1e6, 2.0000001e6, 2.1e6, 3.0000001e6, 3.1e6, …

Can I get some help in understand what is the intended behavior here?

Should we calculate e for an input value 1000 to be 3? Or should it only be calculated for an input value of "1e3"? @echeran can you help?

@sffc
Copy link
Member

sffc commented Dec 4, 2020

@echeran @macchiati What is the status of "c" versus "e" for the new operand?

@echeran
Copy link
Contributor

echeran commented Dec 5, 2020

Yeah, the operand e is only being used in French. It represents the suppressed exponent in compact numbers, so yes, for input value 1000, e = 3 and the formatted output is 1K. And for input value 120000, if the compact number becomes 120K, then e is still 3.

It's not quite ready for general use because of a couple of issues. First, I implemented it for scientific notation as well, which means for 1000, then e = 3 for formatted output 1e3. But for input value 120000, e = 5 for formatted output 1.2e5.

Here are the unit tests for computing the e plural operand, as well as applying a FormattedNumber with the correct e values to the plural rules to obtain the proper plural keyword:

testCompactDecimalSuppresedExponent testCompactDecimalPluralKeyword
ICU4J link link
ICU4C link link

Rule data in CLDR have only been added for French because it's not quite ready for general use for a couple of reasons. The first reason is that I implemented the suppressed exponent computation for scientific notation as well. And it turns out that French actually treats plurals on compact numbers differently than it does plurals for scientific notation. So when we write out the plural samples, and try to parse them into a number type rich enough to support e, we need to further disambiguate a compact number from a scientific number when we see something like 1e6 and go to parse. So it was proposed to add a c operand to represent compact number suppressed exponent (ex: 1c6), and leave e for scientific notation suppressed exponent (1e6).

The other reason/issue on the ICU4C/J side is making sure we use a rich enough number type to hold e and c -- ex: FormattedNumber -- consistently across ICU and CLDR code that is also fair game to expose publicly. In other words, we shouldn't continue to use doubles in ICU4C (ex: in plural sample parsing unit tests), since it is obviously not capable of encoding this extra info we need.

For the implementation of plural operand c, I haven't started, but I am not blocked on that.

As for the change of how ICU parses plural sample strings into numbers, that will cause an API change, so @macchiati and I need to draft that before proceeding on that work.

@zbraniecki
Copy link
Member Author

@echeran am I correct to interpret your words and the tests that only string input (or FixedDecimal) can make e anything else than 0?

If the input (in Rust) is u64, isize, or f64, it will never need e and if it is 1000000 it will produce:

PluralOperands {
    n: 1000000,
    i: 1000000,
    v: 0,
    w: 0,
    f: 0,
    t: 0,
    e: 0
}

Am I correct?

In which case, the only operation plurals crate itself carries about is FromStr for PluralOperands and then separately FromStr for FixedDecimal.
Everything else sets e to 0, right?

@sffc
Copy link
Member

sffc commented Dec 5, 2020

Once the open questions are hammered out, FixedDecimal should be equipped with an e (or c) field. impl From<&FixedDecimal> for PluralOperands should not try to "guess" what e (or c) should be.

@echeran
Copy link
Contributor

echeran commented Dec 5, 2020

@echeran am I correct to interpret your words and the tests that only string input (or FixedDecimal) can make e anything else than 0?

Yep, that's right, strings/FixedDecimals/any other rich-enough-type are necessary to distinguish 120000.0 from 120K from 1.2e5 from 12e4, etc. If you start off with the knowledge of the differences in these representations, it's a lossy operation to use a u64/long/f64/double, which can only see all those representations as the same.

If the input (in Rust) is u64, isize, or f64, it will never need e

Yes, this is an assumption we must make when implementing the function. We must clearly publicize the distinction (warning? caveat?) in the doc strings for the function so it is clear to users what tradeoffs they are making by using it.

and if it is 1000000 it will produce:

PluralOperands {
    n: 1000000,
    i: 1000000,
    v: 0,
    w: 0,
    f: 0,
    t: 0,
    e: 0
}

Am I correct?

Yes, that's right. That's all that is safe for us to infer from i64/u64. And we will probably need a c operand, as well -- c will be the new e.

In which case, the only operation plurals crate itself carries about is FromStr for PluralOperands and then separately FromStr for FixedDecimal.
Everything else sets e to 0, right?

Yep! And perhaps you were already thinking about this, but if you already start with an implementation of the conversion FromStr for FixedDecimal, then FromStr for PluralOperands should be a simple composition of FromStr for FixedDecimal with From<&FixedDecimal> for PluralOperands.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-scope Area: Project scope, feature coverage C-pluralrules Component: Plural rules T-enhancement Type: Nice-to-have but not required
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants