-
Notifications
You must be signed in to change notification settings - Fork 182
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
The actual formatting part of compact decimal formatting #2898
Changes from all commits
0f2b754
1debe59
f1760a1
d5657bf
8b0f103
4d9f95f
43fe38e
aa6dc5c
507e234
a5b63f6
a0db0c9
12bebe4
7e60fe1
0c25367
5f5db46
3dfd5bf
9537951
b9f1405
9ff37a2
22097a2
09cdad5
addb602
b9309e7
21a35cf
7373756
1c17a74
e9c7bb1
89c14cb
a980ac0
5265ca9
0f3ce45
040b22a
fb00543
dd8616c
0b693bc
8bbc3b2
15a7ccb
7127e5f
193e9b3
03afa3e
4115b83
e696837
8285b0a
3760ec5
63e855b
ae21492
b09a5e3
8d9f94d
4a39f6b
5fbd7c1
4c92089
35d0180
2b55276
64ca7a8
5b44901
376b3c6
80d051d
31d35b3
ad61326
c6e3d9d
3c6aa71
3e6086b
5b243bd
4305c73
ddb3bae
d2023b2
60b9843
1a78ce4
e94c2ce
e16100a
9d95dae
1ce1827
41d817f
e6ff2e8
55f4b82
b2ae152
b4336aa
0761f89
fa05f33
061953d
d64d299
9538823
a2df14d
42448d4
32b0e62
316f607
b1b5887
aace046
fdbf234
b49caed
9a9b7ae
6ba4c27
ca644e1
1479b06
a9fafdd
1258527
dbe1258
6b241f2
da5ec50
e64818c
c5184c9
ab62c18
287d8f7
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
Large diffs are not rendered by default.
Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
@@ -0,0 +1,74 @@ | ||||||||||||||||||||||||||||||||
// This file is part of ICU4X. For terms of use, please see the file | ||||||||||||||||||||||||||||||||
// called LICENSE at the top level of the ICU4X source tree | ||||||||||||||||||||||||||||||||
// (online at: https://github.com/unicode-org/icu4x/blob/main/LICENSE ). | ||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||
use alloc::borrow::Cow; | ||||||||||||||||||||||||||||||||
use fixed_decimal::{CompactDecimal, FixedDecimal}; | ||||||||||||||||||||||||||||||||
use writeable::Writeable; | ||||||||||||||||||||||||||||||||
use zerovec::maps::ZeroMap2dCursor; | ||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||
use crate::compactdecimal::CompactDecimalFormatter; | ||||||||||||||||||||||||||||||||
use crate::provider::{Count, PatternULE}; | ||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||
/// An intermediate structure returned by [`CompactDecimalFormatter`](crate::CompactDecimalFormatter). | ||||||||||||||||||||||||||||||||
/// Use [`Writeable`][Writeable] to render the formatted decimal to a string or buffer. | ||||||||||||||||||||||||||||||||
pub struct FormattedCompactDecimal<'l> { | ||||||||||||||||||||||||||||||||
pub(crate) formatter: &'l CompactDecimalFormatter, | ||||||||||||||||||||||||||||||||
pub(crate) value: Cow<'l, CompactDecimal>, | ||||||||||||||||||||||||||||||||
pub(crate) plural_map: Option<ZeroMap2dCursor<'l, 'l, i8, Count, PatternULE>>, | ||||||||||||||||||||||||||||||||
} | ||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||
impl<'l> Writeable for FormattedCompactDecimal<'l> { | ||||||||||||||||||||||||||||||||
fn write_to<W>(&self, sink: &mut W) -> core::result::Result<(), core::fmt::Error> | ||||||||||||||||||||||||||||||||
where | ||||||||||||||||||||||||||||||||
W: core::fmt::Write + ?Sized, | ||||||||||||||||||||||||||||||||
{ | ||||||||||||||||||||||||||||||||
if self.value.exponent() == 0 { | ||||||||||||||||||||||||||||||||
self.formatter | ||||||||||||||||||||||||||||||||
.fixed_decimal_format | ||||||||||||||||||||||||||||||||
.format(self.value.significand()) | ||||||||||||||||||||||||||||||||
.write_to(sink) | ||||||||||||||||||||||||||||||||
} else { | ||||||||||||||||||||||||||||||||
let plural_map = self.plural_map.as_ref().ok_or(core::fmt::Error)?; | ||||||||||||||||||||||||||||||||
let chosen_pattern = (|| { | ||||||||||||||||||||||||||||||||
if self.value.significand() == &FixedDecimal::from(1) { | ||||||||||||||||||||||||||||||||
if let Some(pattern) = plural_map.get1(&Count::Explicit1) { | ||||||||||||||||||||||||||||||||
return Some(pattern); | ||||||||||||||||||||||||||||||||
} | ||||||||||||||||||||||||||||||||
} | ||||||||||||||||||||||||||||||||
let plural_category = self | ||||||||||||||||||||||||||||||||
.formatter | ||||||||||||||||||||||||||||||||
.plural_rules | ||||||||||||||||||||||||||||||||
.category_for(self.value.significand()); | ||||||||||||||||||||||||||||||||
plural_map | ||||||||||||||||||||||||||||||||
.get1(&plural_category.into()) | ||||||||||||||||||||||||||||||||
.or_else(|| plural_map.get1(&Count::Other)) | ||||||||||||||||||||||||||||||||
})() | ||||||||||||||||||||||||||||||||
.ok_or(core::fmt::Error)?; | ||||||||||||||||||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. note: IIFEs aren't super idiomatic in Rust, but the alternative is definitely uglier. being able to There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. TIL a new acronym. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hmm, there's actually a nicer way to do this in Rust 1.65 |
||||||||||||||||||||||||||||||||
match chosen_pattern.index { | ||||||||||||||||||||||||||||||||
u8::MAX => sink.write_str(&chosen_pattern.literal_text), | ||||||||||||||||||||||||||||||||
_ => { | ||||||||||||||||||||||||||||||||
let i = usize::from(chosen_pattern.index); | ||||||||||||||||||||||||||||||||
sink.write_str( | ||||||||||||||||||||||||||||||||
chosen_pattern | ||||||||||||||||||||||||||||||||
.literal_text | ||||||||||||||||||||||||||||||||
.get(..i) | ||||||||||||||||||||||||||||||||
.ok_or(core::fmt::Error)?, | ||||||||||||||||||||||||||||||||
)?; | ||||||||||||||||||||||||||||||||
self.formatter | ||||||||||||||||||||||||||||||||
.fixed_decimal_format | ||||||||||||||||||||||||||||||||
.format(self.value.significand()) | ||||||||||||||||||||||||||||||||
.write_to(sink)?; | ||||||||||||||||||||||||||||||||
Comment on lines
+58
to
+61
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Issue: This might not work as expected with negative numbers. Please add some test cases for that (can be done after landing this PR). The way FixedDecimalFormatter deals with signs (plus and minus) is to have 3 versions of the pattern that are selected at runtime: no sign, plus sign, and minus sign. So, we don't actually render a sign at runtime; we just select the pre-rendered pattern. I don't know if that's how we want to handle it in CDF since we have a lot more patterns and would like to avoid bloating the data. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Added some examples in the docs tests; it works as I expect at least. French There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, but try this in a locale where the sign and the compact affix are on the same side. For example: We have a pattern
The correct output according to CLDR would be "+M3", but your code produces "M+3" There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. In that same file, there is
which demonstrates how CLDR does actually want us to consider these as proper decimal patterns that could have different signed and unsigned versions. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. (argh, too many threads.) Yes, but there is also See this in the datagen: icu4x/provider/datagen/src/transform/cldr/decimal/compact_decimal_pattern.rs Lines 51 to 65 in 356bfa8
I suspect we have bad data, but that having the sign next to the number is the better default, since many swahili patterns go out of their way to stick it back there. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I am unconvinced of that, given that it abbreviates something that seems to want to be Bear in mind that while compact decimal is used quite a bit more than scientific, its edge cases (including the use of negative numbers) quickly get uncharted (after all, the Romance languages are only just starting not to be a flagrantly ungrammatical mess, and that is for positive values). We will indeed need to deal with that for currencies (along with a number of other complexities). There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We have some evidence from the number range work that designers have special requirements when it comes to single-character symbols. They tend to prefer single characters to be "glued" to the number, with extra annotations (like range separators and signs) displayed outside of the single-character symbol. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It's unambiguous that the current UTS 35 requires "+M3" instead of "M+3". Perhaps that data is wrong, but even if it is, there will continually be more cases where we need to handle cases like this. Should we appeal to CLDR to get clarity? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Either way I need to open a CLDR ticket because it is clear that there is a lot of wrongness in the Swahili data, for instance If it turns out that something like There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
OK. Having only two valid places to put the sign (inside or outside the affix) is probably reasonable, but it's yet another invariant that we'll need to enforce in datagen. It does make things a bit easier and smaller. |
||||||||||||||||||||||||||||||||
sink.write_str( | ||||||||||||||||||||||||||||||||
chosen_pattern | ||||||||||||||||||||||||||||||||
.literal_text | ||||||||||||||||||||||||||||||||
.get(i..) | ||||||||||||||||||||||||||||||||
.ok_or(core::fmt::Error)?, | ||||||||||||||||||||||||||||||||
) | ||||||||||||||||||||||||||||||||
} | ||||||||||||||||||||||||||||||||
} | ||||||||||||||||||||||||||||||||
} | ||||||||||||||||||||||||||||||||
} | ||||||||||||||||||||||||||||||||
} | ||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||
writeable::impl_display_with_writeable!(FormattedCompactDecimal<'_>); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thought: ought we pre-resolve the plural rule selection?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i guess it's tricky since it's not always needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had the same thought. There's not much use pre-resolving it since FormattedCompactDecimal is short-lived and generally intended to only be converted to a string or to parts; in fact, CompactDecimalFormatter::format is already doing more work than either FixedDecimalFormatter or DateTimeFormatter. Most of the work should live in FormattedCompactDecimal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason for that is that the error checking of the fallible one (
format_compact_decimal
) needs to happen early enough that we can return a useful error (rather thancore::fmt::Error
).That work being done,
FormattedCompactDecimal
naturally takes the shape it has.