Division of concerns and scientific notation in FixedDecimal #1267
How to represent sign display in FixedDecimal? Currently, FixedDecimal has a boolean field "is_negative". This should change to an enum Examples:
Meanwhile, the SignDisplay enum can still be present, but only as a setter on FixedDecimal. It is not possible to persist the SignDisplay setting, and doing so is not important. |
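To make the proposal above easier to discuss, here is a minimal sketch of what replacing the boolean with an enum might look like. The variant names of `Sign` and `SignDisplay` and the `resolve_sign` helper are assumptions made for illustration; they are not taken from the FixedDecimal source. The sketch shows the display policy being consumed by a setter-like function, so that only the resolved sign would be stored.

```rust
/// Hypothetical sign states stored on the decimal (names are assumptions).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Sign {
    /// No sign is shown, as in "100".
    None,
    /// A minus sign is shown, as in "-100".
    Negative,
    /// A plus sign is shown, as in "+100".
    Positive,
}

/// Hypothetical display policy; it is only an input and is never persisted.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SignDisplay {
    /// Show a sign only on negative values.
    Auto,
    /// Show a sign on every value.
    Always,
    /// Never show a sign.
    Never,
}

/// Resolve a display policy into the concrete sign that would be stored.
pub fn resolve_sign(is_negative: bool, display: SignDisplay) -> Sign {
    match (display, is_negative) {
        (SignDisplay::Never, _) => Sign::None,
        (_, true) => Sign::Negative,
        (SignDisplay::Always, false) => Sign::Positive,
        (SignDisplay::Auto, false) => Sign::None,
    }
}
```

With this shape, "100" and "+100" are distinguishable in the data model, while the policy that produced the sign is discarded after it is applied.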
I'm in favor of moving visible_exponent and signdisplay into FD itself; the above plan looks good to me. A thing that is important to me is that we have a clear distinction between which properties should belong on FD and which should be a part of FDF's options bag: this does move us closer to that world, which is really nice. |
I'm not clear on the solution to the problem space.
Given that, I draw the line in a different place - the notation has no impact on the objective value of the Decimal. It's a strong position weakly held, because I recognize that a similar argument can be made about "1.50" vs "1.5" having the same value. My argument is that
From that thinking come two components - Value and FormattingOptions - required to format that value. The That's not unlike... DateTimeFormat! Where the date is absolute, and then there are formatting options. Let's take an example of - display of Month - should Month be displayed as Encoding The counter argument brought up by Shane is that scientific notation, or compact notation, is not locale specific - neither must be spelled month name vs month number. The user may prefer that irrespective of the locale, just like they may prefer scientific or non-scientific. But because locale may be involved in deciding, we need those formatting options to be available in the Intl context, and the Intl context may provide some defaults. Another argument is that we need precision to select PluralRule. That problem seems similar to some hypothetical date formatter that needs to know the gender of the month to format the date properly, and the gender of the month depends on whether it is textual or numerical (hypothetical). Will we then have I'm torn, but I think based on this I'm leaning toward recognizing that Shane is right that precision is not unique and we need more formatting toggles, maybe even locale-independent ones, in the Decimal formatting. And maybe Decimal is a snowflake that is just more dominant as a value to justify If that's the case then I'd approve that model, but not without some hesitation about consistency of the architecture. |
I think what I'm trying to achieve with
It could, perhaps. In much the same way that the locale+currency combination can affect the number of trailing zeros. I anticipate that when we add Full Number Format, there will be mutations applied to the
One of the issues I've been grappling with is that we should really consider splitting this into two or even three different types of FixedDecimal: a "raw" number input, an "intermediate" that has been processed but not formatted, and an "output" that has been fully formatted. However, I've struggled to express that cleanly in either an API or a mental model. So my current proposed approach takes the position that To be clear, the complete, comprehensive list of what I currently foresee
Note that all of these except perhaps (4) are generally universally accepted in a decimal number string, like I see the following things as out-of-scope, to be implemented perhaps as a wrapper over
* For compact notation, I would like to do what ECMA-402 and ICU 60+ do here, which is to consider compact notation as a "human readable scientific notation". It could be the case that compact notation is not directly expressed on the FixedDecimal, but is instead a display option for scientific notation, such that formatting "1.2E3" in compact notation produces "1.2 thousand", for example. |
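As a rough illustration of the "compact as a display option for scientific notation" idea in the last bullet, the sketch below renders a mantissa and exponent such as "1.2E3" as "1.2 thousand". The function name `render_compact_en` and the hard-coded English words are assumptions for this example only; a real implementation would take its patterns from locale data and handle exponents that are not exact multiples of three.

```rust
/// Render a scientific-notation value (mantissa plus exponent) in a
/// human-readable compact form. English-only and hypothetical: real
/// patterns would come from locale data, not from a hard-coded table.
pub fn render_compact_en(mantissa: &str, exponent: u32) -> Option<String> {
    let word = match exponent {
        3 => "thousand",
        6 => "million",
        9 => "billion",
        // Other exponents are left unsupported in this sketch.
        _ => return None,
    };
    Some(format!("{} {}", mantissa, word))
}
```

Under these assumptions, `render_compact_en("1.2", 3)` yields "1.2 thousand", so the stored value stays "1.2E3" and the compact form is purely a rendering choice.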
CC @echeran. I added you to the approvers list. |
We need to define exactly what is a FixedDecimal and what is its lifecycle. My mental model has been that a FixedDecimal is a locale-agnostic, structured representation of the human-readable form of a decimal number.
With that in mind, I have long seen it as a goal of FixedDecimal to guarantee that it is well-specified as soon as it is constructed, as discussed in #166. Put another way, we should avoid having a "partially-constructed" FixedDecimal.
If we further add sign display and scientific notation to FixedDecimal, we make FixedDecimal diverge further from what the core numeric types are able to represent. We need to consider what this means for the lifecycle of a FixedDecimal. |
I agree with @sffc's comment above, but I wanted to elaborate on the point about Human-readable form above. The question from @zbraniecki is a good one -- what data is essential vs. what is derivative? When it comes to that question and date-time, I think about how Joda Time compares to ICU. Joda Time has been the go-to library in Java for making Dates and Times immutable and supporting basic DateTime arithmetic with time zones, etc., but in an ISO/Gregorian calendar (no calendars or other i18n things). It stored only two things: 1) the number of milliseconds since the Unix epoch and 2) the time zone. Everything else can be derived. However, the ICU notion of a DateTime is more inclusive, so it needs more than those two fields in order to hold onto all of the essential data needed to cover all of the supported functionality. So similarly, for some of the existing use cases in which On the possible question of whether we want to store that extra information (leading/trailing zeroes, exponent) in |
Discussion with @eggrobin and others:
|
Assuming we punt on FixedDecimal NaN and infinity, on which see #862 (comment), https://unicode-org.atlassian.net/browse/CLDR-15609 would align the UTS #35 source number with FixedDecimal (as proposed here), and thus make a FixedDecimal uniquely representable by a sampleValue string. |
Having discussed this with @sffc last week, we found a couple of issues with putting exponents (of both the scientific and the compact kind) in
Having distinct intermediate representations See the proposal (and rationale) in https://docs.google.com/document/d/1yjLPwM08Y_gf6-3FhDI9uaB8t_OCjliRtUSoMJ9_r98. I would like approval on that proposal from: |
For the purposes of 1.0, I think we should focus on landing the updated |
I added a question in the doc about how to model the relationship between |
@eggrobin Is there anything left on this ticket? |
It no longer blocks 1.0 though, as discussed. |
I think this is done, because CompactDecimal and ScientificDecimal have landed. |
FixedDecimal is defined as "a core API for representing numbers in a human-readable form appropriate for formatting and plural rule selection". Currently, we support decimal numbers with leading zeros and trailing zeros.
The following two numbers are the same quantity, but they differ in the way they are presented to humans: "1200" and "1.2E3". #1265 added scientific notation parsing support in FixedDecimal::from_str, but without storing that information in the data model of FixedDecimal. The subject of this issue is my assertion that we need to also add "visible exponent" to the data model of FixedDecimal in order to express this difference.
Isn't this a formatting concern? This is an important question. Here is how I draw the line: formatting concerns should be constrained to locale-specific rendering options that do not affect the meaning of the value in context.
Let's look at the other knobs we currently have:
Why do we need more info in the FixedDecimal data model? The primary reason is that this information can affect the plural rules and inflections of surrounding words in a sentence. For example, we retain trailing zeros because the plural forms of "1" and "1.0" are different. Likewise, the plural forms of "1200" and "1.2E3" could also be different. For sign display, it's probably not the case that the plural forms of "100" and "+100" are different, but they might result in different vowel sounds / inflections in a sentence, for example.
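To illustrate why the written form matters for plural selection, here is a small sketch that extracts two human-visible properties from a decimal string: the number of visible fraction digits (which is what distinguishes "1" from "1.0") and the visible exponent (which is what distinguishes "1200" from "1.2E3"). The `VisibleShape` struct and `visible_shape` function are hypothetical names for this example; this is not how FixedDecimal::from_str is implemented.

```rust
/// Hypothetical summary of the human-visible shape of a decimal string.
#[derive(Debug, PartialEq, Eq)]
pub struct VisibleShape {
    /// Digits shown after the decimal separator, trailing zeros included.
    pub visible_fraction_digits: usize,
    /// Exponent written in the string, or 0 when none is written.
    pub visible_exponent: i32,
}

/// Parse strings such as "1", "1.0", "1200", or "1.2E3".
/// Returns None when the input is not a plain ASCII decimal.
pub fn visible_shape(s: &str) -> Option<VisibleShape> {
    // Split off an optional exponent introduced by 'E' or 'e'.
    let (mantissa, exponent) = match s.find(|c: char| c == 'E' || c == 'e') {
        Some(i) => (&s[..i], s[i + 1..].parse::<i32>().ok()?),
        None => (s, 0),
    };
    // Ignore a leading sign; it does not change the two counts below.
    let unsigned = mantissa
        .strip_prefix(|c: char| c == '-' || c == '+')
        .unwrap_or(mantissa);
    // Separate the integer digits from the fraction digits.
    let (int_part, frac_part) = match unsigned.split_once('.') {
        Some((i, f)) => (i, f),
        None => (unsigned, ""),
    };
    let all_digits = |p: &str| p.bytes().all(|b| b.is_ascii_digit());
    if int_part.is_empty() && frac_part.is_empty() {
        return None;
    }
    if !all_digits(int_part) || !all_digits(frac_part) {
        return None;
    }
    Some(VisibleShape {
        visible_fraction_digits: frac_part.len(),
        visible_exponent: exponent,
    })
}
```

In this sketch, "1" and "1.0" produce different shapes, and so do "1200" and "1.2E3", even though each pair denotes the same quantity. A data model that drops either field cannot tell the members of a pair apart.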
How about compact notation? We need to be able to express compact notation in the data model as well. Compact notation is mutually exclusive with scientific notation, so we may be able to store this information in the same field. However, I would like to figure out how to extend to compact notation in a separate issue. We'll also need to think about the impact on currencies, units, etc., which I hope to do when we tackle kitchen sink number format (#275).
Will this make FixedDecimal or FixedDecimalFormat heavier? It's crucial to keep FixedDecimal and FixedDecimalFormat as lightweight as possible. What I've described here is consistent with that goal. We are adding very little business logic, and perhaps a few more symbols to the data file.
Concretely, I would like to do the following:
* Add visible_exponent, along with lots of documentation, APIs, etc.
* Update FixedDecimal::from_str to retain the visible exponent
Needs feedback from: