Unified API for number formatting #215

Closed
sffc opened this issue Feb 12, 2018 · 17 comments
Labels: c: numbers (Component: numbers, currency, units); s: in progress (Status: the issue has an active proposal)

Comments

@sffc
Contributor

sffc commented Feb 12, 2018

For the past year and a half, I have done extensive design work on number formatting APIs in ICU. There are many things I have learned, which I believe could lead to a simpler way to integrate the many number formatting features that have been requested for ECMA 402.

For more background, please read my ICU design doc and review my presentation from the 2018 Internationalization and Unicode Conference.

The Eight Orthogonal Settings

Older APIs for number formatting, such as ICU DecimalFormat, are composed of many settings (over 40 of them) that often conflict with one another. For example, the API allows mixing specifications for fraction digits, significant digits, rounding increment, and currency rules, yielding behavior that is often ill-defined. Other settings are too broad: the "style" attribute in both ECMA 402 and ICU DecimalFormat supports standard, integer, percent, currency, and (in ICU only) scientific formatting, but does not allow combining them, such as scientific notation with percent. It's also confusing that even if you set a currency, you need to enable the currency "style" in order for it to be displayed.

I found eight new settings that have full coverage of the DecimalFormat functionality. They are "orthogonal", meaning that your choice for one does not conflict with your choice for any other. You can choose only one option for each of the eight settings.

  1. Notation: Scientific, compact, and simple (default).
  2. Unit: Currency, percent, or measure units. A "unit-width", such as narrow, short, or full-name, can affect how the unit is displayed.
  3. Rounding: For example, fraction digits, significant digits, increment, or currency rules.
  4. Grouping: Turn it on or off, as well as other locale-insensitive grouping strategies.
  5. Symbols: The set of locale symbols to use when formatting. This includes the numbering system (latin, native arabic, etc).
  6. Sign Display: This allows turning on the plus sign on positive numbers. It also includes accounting format for negative numbers.
  7. Integer Width: Zero-fill and truncation for digits before the decimal separator. Older APIs call this "minimum integer digits" and "maximum integer digits".
  8. Decimal Display: Whether to render the decimal separator on integers. This is the "smallest" setting.

You can combine these in any way you like. For example, you can use scientific notation with a measure unit, round to the nearest integer, enable grouping, use native Arabic digits, show the sign on positive numbers, pad with two zeros, and show the decimal separator.
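To make the "one choice per axis" idea concrete, here is a minimal sketch that models the eight settings as a plain object. Every name in it (`DEFAULT_SETTINGS`, `withSettings`, and the shape of each value) is hypothetical and chosen only for illustration; it is not an ICU or ECMA 402 API.

```javascript
// Hypothetical model of the eight orthogonal axes; all names and value
// shapes are assumptions made for illustration only.
const DEFAULT_SETTINGS = Object.freeze({
  notation: "simple",
  unit: "none",
  rounding: "default",
  grouping: "auto",
  symbols: "locale-default",
  signDisplay: "auto",
  integerWidth: "default",
  decimalDisplay: "auto",
});

// Returns a full settings object: exactly one value per axis.
// Unknown axes are rejected so that settings cannot overlap.
function withSettings(overrides = {}) {
  for (const axis of Object.keys(overrides)) {
    if (!(axis in DEFAULT_SETTINGS)) {
      throw new RangeError(`Unknown axis: ${axis}`);
    }
  }
  return { ...DEFAULT_SETTINGS, ...overrides };
}

// The combination described above, one choice on each axis.
const combined = withSettings({
  notation: "scientific",
  unit: { measure: "length-meter" },
  rounding: { maximumFractionDigits: 0 },
  grouping: "on",
  symbols: { numberingSystem: "arab" },
  signDisplay: "always",
  integerWidth: { zeroFillTo: 2 },
  decimalDisplay: "always",
});

console.log(Object.keys(combined).length); // 8
```

Because each axis is a separate key, changing one choice never requires touching another.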

What this can mean for ECMA 402

The Intl.NumberFormat API is clearly inspired by ICU DecimalFormat. Although it has some of the "bad" settings that I pointed out from DecimalFormat, the existing subset is small enough that we can expand Intl.NumberFormat to be more powerful without creating a whole new API and confusing users.

For reference, here are the existing settings that affect number formatting:

  • style: "decimal", "currency", and "percent"
  • currency: the ISO code
  • currencyDisplay: "symbol", "code", "name"
  • useGrouping: true or false
  • minimumIntegerDigits, minimumFractionDigits, maximumFractionDigits
  • minimumSignificantDigits, maximumSignificantDigits: the spec says that setting anything for these options overrides settings for integer and fraction digits.
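As a quick illustration of two of the existing behaviors listed above, the following sketch uses only the current Intl.NumberFormat options. It assumes an engine with English locale data available.

```javascript
// Without style: "currency", the currency option is accepted but not displayed.
const plain = new Intl.NumberFormat("en", { currency: "EUR" }).format(5);

// Significant-digit options take precedence over fraction-digit options.
const significant = new Intl.NumberFormat("en", {
  maximumFractionDigits: 0,
  maximumSignificantDigits: 3,
}).format(1.23456);

console.log(plain);       // "5"
console.log(significant); // "1.23"
```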

Conceptually, consider Intl.NumberFormat to internally be composed of the eight options listed above. The existing settings can control the eight options as follows:

  • style: can map to unit.
    • decimal: set UNIT = no unit.
    • currency: requires a currency to be specified; throw an exception otherwise. (This is the current behavior)
    • percent: set UNIT = percent.
  • currency: can also map to unit.
    • set UNIT = the given currency code with unitWidth = short (symbol).
    • A setting in currency should override a setting in style.
    • I suggest that you should no longer need to set style: "currency" in order to display the currency specified in the currency option. This would be a behavior change.
  • currencyDisplay: a modifier on the unit.
    • Process after the currency has been selected from the style or currency setting.
  • useGrouping: I really don't like that this is a boolean, but for now, we can say that
    • true: set GROUPING = AUTO
    • false: set GROUPING = OFF
    • A future proposal can offer more control over grouping strategies.
  • minimumIntegerDigits:
    • set INTEGER-WIDTH = zero-fill to minimumIntegerDigits.
    • Note that integer digits control the number of digits at the beginning of the number, whereas fraction and significant digits control the number of digits at the end. There is no reason significant digits need to override the integer width setting.
  • minimumFractionDigits, maximumFractionDigits:
    • set ROUNDING = fraction rounding with the specified settings.
  • minimumSignificantDigits, maximumSignificantDigits:
    • set ROUNDING = significant digit rounding with the specified settings. As the spec currently specifies, a setting here should override a setting in the fraction digits.
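The mapping above can be sketched as a small function. This is only an illustration under stated assumptions: `resolveAxes` and the shape of its return value are hypothetical, the numeric defaults are the usual ones for the decimal style, and the currency handling follows the behavior change suggested above rather than the current specification.

```javascript
// Hypothetical sketch: maps existing Intl.NumberFormat options onto four of
// the eight axes (UNIT, GROUPING, INTEGER-WIDTH, ROUNDING).
function resolveAxes(options = {}) {
  const {
    style = "decimal",
    currency,
    currencyDisplay = "symbol",
    useGrouping = true,
    minimumIntegerDigits = 1,
  } = options;

  // UNIT: a currency setting overrides style (the suggested behavior change).
  let unit;
  if (currency !== undefined) {
    unit = { type: "currency", code: currency, display: currencyDisplay };
  } else if (style === "currency") {
    throw new TypeError('style: "currency" requires a currency');
  } else if (style === "percent") {
    unit = { type: "percent" };
  } else {
    unit = { type: "none" };
  }

  // ROUNDING: significant digits override fraction digits.
  const hasSignificant =
    options.minimumSignificantDigits !== undefined ||
    options.maximumSignificantDigits !== undefined;
  const rounding = hasSignificant
    ? {
        type: "significant",
        min: options.minimumSignificantDigits ?? 1,
        max: options.maximumSignificantDigits ?? 21,
      }
    : {
        type: "fraction",
        min: options.minimumFractionDigits ?? 0,
        max: options.maximumFractionDigits ?? 3,
      };

  return {
    unit,
    grouping: useGrouping ? "auto" : "off",
    // INTEGER-WIDTH is independent of the rounding choice.
    integerWidth: { zeroFillTo: minimumIntegerDigits },
    rounding,
  };
}

console.log(resolveAxes({ currency: "EUR" }).unit.type); // "currency"
```

Note that integer width is resolved without looking at the rounding options, which is the decoupling argued for above.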

Now the fun part. With this change to how number formatting is performed conceptually, the following settings can be easily added.

  • measureUnit: a string or an object literal to customize the UNIT axis with units of measure, without needing to affect any of the other axes. Control over rounding, grouping, and everything else is already taken care of.
    • With a string: measureUnit: "length-meter" using the official CLDR name of the unit with the default display width.
    • With an object literal: measureUnit: { name: "length-meter", width: "narrow" } to customize the display width.
      • Alternative: put the unit width on the top level. {measureUnit: "length-meter", measureUnitWidth: "narrow" }. This clutters the top-level namespace but is closer to the precedent set by currencies.
    • Potential to add something like measureUnit: { usage: "length" } in the future to automatically select the appropriate unit for a locale (miles in the US, kilometers elsewhere).
    • The option could also be named simply unit instead of measureUnit.
  • measurePerUnit or perUnit: compound measure unit formatting. Can have same customizations as above.
    • Could also be nested under unit or perUnit
  • notation: choose between "engineering", "scientific", "compact-short", "compact-long", and "simple" (default). This option can be combined with all other options, even with units.
    • Scientific and engineering notation can be an object literal with more choices: notation: { type: "scientific", minimumExponentDigits: 2 }
    • I'm not cluttering this post with details on compact notation, but can elaborate more if needed.
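Since both measureUnit and notation are described as accepting either a string or an object literal, an implementation would likely normalize them to a single shape first. A minimal sketch follows; the function names are hypothetical, and the default width of "short" is an assumption, because the text above only says "the default display width".

```javascript
// Hypothetical helpers: accept the string shorthand or the object form and
// return one normalized shape.
function normalizeMeasureUnit(option) {
  const unit = typeof option === "string" ? { name: option } : { ...option };
  if (typeof unit.name !== "string") {
    throw new TypeError("measureUnit requires a unit name");
  }
  // Assumption: "short" stands in for the unspecified default width.
  return { name: unit.name, width: unit.width ?? "short" };
}

function normalizeNotation(option = "simple") {
  return typeof option === "string" ? { type: option } : { ...option };
}

console.log(normalizeMeasureUnit("length-meter"));
// { name: "length-meter", width: "short" }
console.log(normalizeNotation({ type: "scientific", minimumExponentDigits: 2 }));
// { type: "scientific", minimumExponentDigits: 2 }
```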

More can be added, but these two are a good place to start. You will be able to do this (Option 1a):

new Intl.NumberFormat("en", {
  measureUnit: "length-meter",
  measurePerUnit: "duration-second",
  measureUnitWidth: "full-name",
  notation: "engineering",
  maximumSignificantDigits: 2
}).format(0.0000876);  // => 88×10^-6 meters/second

Or, option 1b:

new Intl.NumberFormat("en", {
  unit: {
    name: "length-meter",
    perUnit: "duration-second",
    width: "full-name"
  },
  notation: "engineering",
  maximumSignificantDigits: 2
}).format(0.0000876);  // => 88×10^-6 meters/second
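
For readers unfamiliar with engineering notation, the exponent is restricted to multiples of three. The sketch below shows only that arithmetic; `toEngineering` is a hypothetical helper and not part of Intl or ICU. For 0.0000876 the exponent is -6, so two significant digits give a mantissa of 88.

```javascript
// Minimal sketch of engineering notation: pick the largest multiple of three
// that does not exceed the decimal exponent, then round the mantissa.
function toEngineering(value, maximumSignificantDigits) {
  if (value === 0) return { mantissa: 0, exponent: 0 };
  const abs = Math.abs(value);
  let exponent = 3 * Math.floor(Math.log10(abs) / 3);
  // Scale with a positive power of ten to limit floating-point noise.
  const scaled = exponent >= 0 ? abs / 10 ** exponent : abs * 10 ** -exponent;
  let mantissa = Number(scaled.toPrecision(maximumSignificantDigits));
  // Rounding can carry into the next group of three (e.g. 999.9 -> 1000).
  if (mantissa >= 1000) {
    mantissa /= 1000;
    exponent += 3;
  }
  return { mantissa: Math.sign(value) * mantissa, exponent };
}

const { mantissa, exponent } = toEngineering(0.0000876, 2);
console.log(`${mantissa}×10^${exponent}`); // "88×10^-6"
```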

Summary of changes

  1. Rewrite parts of the spec to express Intl.NumberFormat in terms of the eight axes. The goal here is to simplify the specification and allow easier expansion without introducing changes in behavior.
  2. Behavior change: setting a value in currency should not require that you also set style: "currency".
  3. Behavior change: minimumIntegerDigits should work even if significant digits is specified. For example, minInt=5 and maxSig=3 should format 1234 as "01230".
  4. Add the unit or measureUnit key as an option as described above.
  5. Add the perUnit or measurePerUnit key as an option as described above.
  6. Optionally add the unitWidth or measureUnitWidth key as described above.
  7. Add the notation key as an option, with choices for scientific and compact notation, as described above.
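
The minInt=5, maxSig=3 example in item 3 can be worked through by hand: round first, then zero-fill. The helper below is a hypothetical sketch that handles only non-negative integers and ignores grouping and locale symbols.

```javascript
// Hypothetical sketch: apply significant-digit rounding first, then zero-fill
// the integer part, so that the two settings do not override each other.
function formatIntegerSketch(value, { minimumIntegerDigits = 1, maximumSignificantDigits } = {}) {
  const rounded =
    maximumSignificantDigits === undefined
      ? value
      : Number(value.toPrecision(maximumSignificantDigits));
  return String(rounded).padStart(minimumIntegerDigits, "0");
}

console.log(
  formatIntegerSketch(1234, { minimumIntegerDigits: 5, maximumSignificantDigits: 3 })
); // "01230"
```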

Option 2: Fewer changes to existing behavior

Although it still seems counterintuitive to me that Intl.NumberFormat("en",{currency:"EUR"}) doesn't display the currency symbol, I understand that there is a desire to maintain backwards compatibility. Another option, which would be more consistent with the current design, would be to add "measure" as an option to the style key. The style key could be seen as a mandatory option for Ecma 402 to toggle between the four flavors of units that we support (no unit, percent, currency, and measure). That is,

// Option 1
new Intl.NumberFormat(locale, { currency: "USD" });
new Intl.NumberFormat(locale, { measureUnit: "distance-meter" });

// Option 2
new Intl.NumberFormat(locale, { style: "currency", currency: "USD" })
new Intl.NumberFormat(locale, { style: "measure", unit: "distance-meter" })
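
Under Option 2, style acts as the single switch for the UNIT axis, and the other keys only supply details. A minimal sketch of that resolution follows; `resolveUnitOption2` and its return shape are hypothetical, and "measure" is the style value proposed above, not an existing one.

```javascript
// Hypothetical sketch of Option 2: style selects the unit flavor, and the
// currency / unit keys are only consulted for the matching style.
function resolveUnitOption2({ style = "decimal", currency, unit } = {}) {
  switch (style) {
    case "decimal":
      return { type: "none" };
    case "percent":
      return { type: "percent" };
    case "currency":
      if (currency === undefined) {
        throw new TypeError('style: "currency" requires a currency');
      }
      return { type: "currency", code: currency };
    case "measure":
      if (unit === undefined) {
        throw new TypeError('style: "measure" requires a unit');
      }
      return { type: "measure", name: unit };
    default:
      throw new RangeError(`Unsupported style: ${style}`);
  }
}

// A currency without style: "currency" is ignored, as it is today.
console.log(resolveUnitOption2({ currency: "USD" }).type); // "none"
console.log(resolveUnitOption2({ style: "measure", unit: "length-meter" }).type); // "measure"
```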

Even with Option 2, I advocate decoupling minimumIntegerDigits from significant digits. That simply looks like a mistake to me in the original specification.

To-do items

  1. Any high-level comments?
  2. Pick between options 1a, 1b, and 2, or some combination.
  3. Hammer down the names for the unit widths. ICU MeasureFormat (older API) uses "wide", "short", and "narrow". ICU NumberFormatter (new API) uses "narrow", "short", and "full-name".
  4. Hammer down the notation style names. The names I proposed are the ones from ICU NumberFormatter API, but they don't necessarily need to be the same.
@ljharb
Member

ljharb commented Feb 12, 2018

What about use cases where i want my locale set to US-en, but I’m originally from Europe and prefer always seeing a , as the decimal point? What about languages where there’s a native numbering system that’s sometimes used, but the most common is using Arabic numerals?

I do very much love the idea that the common defaults are so easy and intuitive, but what about the use cases that require maximal configuration?

@sffc
Contributor Author

sffc commented Feb 12, 2018

I was thinking custom symbols and digits are material for another proposal. #175 adds a numberingSystem option, which addresses the ASCII vs. native Arabic numerals question. A full-service symbols API will need to be designed (although arguably if you don't want native digits or native symbols, you should be fine using a non-Intl library).

On the other hand, I'm happy to propose a single, larger proposal that adds support for symbols, sign display (#163), alternate currency symbols (#200), algorithmic numbering systems (#95), and additional grouping strategies all in one.

@caridy added the enhancement and s: in progress labels on Feb 12, 2018
@caridy
Contributor

caridy commented Feb 12, 2018

@sffc at first glance this looks very promising. As for changing the spec to align with those eight axes, it seems doable! @littledan can we have this in the agenda for this week?

@ljharb there are many things that you cannot do with Intl today. Many folks consider those "limitations", but others consider them "features". E.g.: should a developer be able to show month/day/year when formatting a date object with the en-CA locale? Probably not! I'm not sure whether those limitations were put in place intentionally in the first edition, but what I am sure of is that they help a lot to prevent people from doing the wrong thing and confusing their users. In other words, maximal configuration doesn't seem to be part of the Intl charter AFAICT.

@zbraniecki
Member

What about use cases where i want my locale set to US-en, but I’m originally from Europe and prefer always seeing a , as the decimal point?

This is no different, as @caridy pointed out, to other intl settings. We hope to start using unicode extension keys soon to allow for building locales with such (or similar) customizations.

I like this proposal a lot.

@anba, @jswalden, @rxaviers , @nciric, @jungshik ?

@littledan
Member

@caridy @sffc added this topic to the agenda for the Friday meeting.

Many of the open feature requests we have are about adding more options to formatters. I'm happy if we can solve these feature requests within a framework that's more cleaned-up and future-proof.

Would such a similar design exercise make sense for DateTimeFormat, which also has many feature requests for additional options?

@rxaviers
Member

Overall, I liked the proposal too! I liked the way you grouped the orthogonal settings. Although I (personally) don't like NumberFormat handling the unit group (because API gets too cluttered), currency formatting is already here, so your idea of bringing UnitFormat too has an alibi.

Some questions:

  1. Unit: … percent

Please, (a) why do you consider percent as a unit, but compact as a notation? (b) In your ICU proposal, unit=percent format 1 as "1%", I assume we are not going to change that in Intl and still format 1 as "100%" correct?

  • style: can map to unit.
    • currency: set UNIT = the default currency for the locale with unitWidth = short (symbol).

Setting a default currency is an anti-pattern and should be avoided. UTS#35 says "Note: Currency values should never be interchanged without a known currency code. You never want the number 3.5 interpreted as $3.50 by one user and €3.50 by another. Locale data contains localization information for currencies, not a currency value for a country. A currency amount logically consists of a numeric value, plus an accompanying currency code (or equivalent). The currency code may be implicit in a protocol, such as where USD is implicit. But if the raw numeric value is transmitted without any context, then it has no definitive interpretation."


Low-level question that perhaps could be saved and deferred for later...

  • minimumIntegerDigits:
    • ... significant digits control the number of digits at the end ...

Note "maximum significant digits" doesn't always mean "maximum digits", for example maxSig=2 formats 1234 as "1200" (4 digits). I believe this doesn't affect your overall proposal, but wanted to mention/check since it's emphasized in the proposal (Why did you rename Significant Digits to just Digits?).
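
The maxSig=2 case can be checked with the existing API. This small sketch turns grouping off so that the digit count is easy to see.

```javascript
// maximumSignificantDigits rounds to two significant digits but keeps the
// magnitude, so the result still has four digits.
const twoSignificant = new Intl.NumberFormat("en", {
  maximumSignificantDigits: 2,
  useGrouping: false,
});

console.log(twoSignificant.format(1234)); // "1200"
```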

  • minimumIntegerDigits:
    • ... There is no reason significant digits need to override settings to integer width.

I agree with you. They could live together, as in your example where minInt=5 and maxSig=3 should format 1234 as "01230".

  • notation: choose between "engineering", "scientific", "compact-short", "compact-long"

For notation, you suggest the style (compact) and form (short) in the same value, but for unit there are two options (measureUnit and measureUnitWidth). Should we have an additional option for form too?

new Intl.NumberFormat(locale, { measureUnit: "distance-meter" });

Please, what's the motivation to use measureUnit instead of the simpler term unit? Anything to do with distinguishing it from currency?

@sffc
Contributor Author

sffc commented Feb 14, 2018

(a) why do you consider percent as a unit, but compact as a notation?

A notation affects the way that the decimal stem is displayed; a unit affects the thing that is being measured. For example, one could reasonably say something like, "the inflation rate is 2K percent!" (unit = percent, unitWidth = full-name, notation = compact-short), but it makes no sense to combine percent with other units (currency/measure) or compact with other notations (scientific). CLDR also has an active suggestion to add percent as a measure unit.

In your ICU proposal, unit=percent format 1 as "1%", I assume we are not going to change that in Intl and still format 1 as "100%" correct?

If we were designing new API, I would suggest removing the x100 for reasons stated in the ICU design document, but I do not feel strongly enough to advocate changing the behavior of old API.

Setting a default currency is an anti-pattern and should be avoided. UTS#35 says "Note: Currency values should never be interchanged without a known currency code.

Good catch! I didn't realize that the old API throws an exception in this case. I will update the OP.
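
For reference, the existing behavior can be observed directly. In this sketch `tryCurrencyStyle` is a hypothetical helper, and the expected strings assume an engine with English locale data.

```javascript
// Returns the formatted string, or the error name if construction fails.
function tryCurrencyStyle(options) {
  try {
    return new Intl.NumberFormat("en", options).format(3.5);
  } catch (error) {
    return error.name;
  }
}

// style: "currency" without a currency code is rejected.
console.log(tryCurrencyStyle({ style: "currency" })); // "TypeError"

// With an explicit code there is no ambiguity about which currency is meant.
console.log(tryCurrencyStyle({ style: "currency", currency: "EUR" })); // "€3.50"
```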

Why did you rename Significant Digits to just Digits?

https://docs.google.com/document/u/2/d/e/2PACX-1vSebyOO1dNqZEPvZBn5qbaB-Zkn0IVLk6t0QgrmhDpzLybogT5JcuDz7xKUR8sfDYoDG5hC7TZAYnen/pub#h.fzl1a8i3dgnn

I don't see a reason to propose that for Ecma 402 though.

For notation, you suggest the style (compact) and form (short) in the same value, but for unit there are two options (measureUnit and measureUnitWidth). Should we have an additional option for form too?

It could be. I put them together because for compact, there are only two options, but for measure, the number of choices is large and will continually grow if combined; for example, it is clunky to say, "distance-meter-narrow", "distance-meter-short", "distance-meter-wide", "distance-inch-narrow", and so on. The width on measure units could also be seen as optional, whereas for compact notation, having two string keys requires that the user make the choice. However, I can be convinced otherwise.

Please, what's the motivation to use measureUnit instead of the simpler term unit? Anything to do with distinguishing it from currency?

Yes, distinguishing it from currency and percent is the motivation, but I can be convinced otherwise. I put both options in the OP (I called them 1a and 1b).

@sffc
Contributor Author

sffc commented Feb 16, 2018

Today there was discussion about whether ICU could add a C API to help implementers. Here is an ICU bug to track that discussion. Please comment:

http://bugs.icu-project.org/trac/ticket/13597

@sffc
Contributor Author

sffc commented Feb 28, 2018

What are the next steps on this proposal?

@zbraniecki
Member

@sffc if I recall from the meeting the plan was for you to start a repo and document your proposal in a README. We offered to help you from there to get the spec draft.

@sffc
Contributor Author

sffc commented Mar 1, 2018

How do I start the repo? Is there a template? Do I need access to the tc39 organization (which I don't seem to have)?

@ljharb
Member

ljharb commented Mar 1, 2018

https://github.com/tc39/template-for-proposals , you shouldn’t need any access.

@littledan
Member

@sffc Just create a repo under your own account; we can then go through the process of transferring to tc39 a little later.

@sffc
Contributor Author

sffc commented Mar 16, 2018

@zbraniecki
Member

https://github.com/sffc/proposal-unified-intl-numberformat

Seems like the right format for a repo. The next step is for you to put your proposal in there. It can be in the form of a .md file describing the proposed changes to the NumberFormat API. Once that's done, we can discuss the proposal, and one of us will translate it into a spec file as a PR in that repo.

@zbraniecki
Member

Intl.NumberFormat rev. 2 has landed in SpiderMonkey for Gecko 70 (behind the flag).

@anba
Contributor

anba commented Mar 17, 2020

#404 has been merged.

@sffc closed this as completed on Mar 17, 2020