CLDR-17938 Add test data generator for date time formatting #4011
import java.io.IOException;
import java.util.Arrays;
import org.pcollections.HashPMap;
We try not to introduce too many dependencies on other libraries. It seems like this could just be done with regular collections or Guava collections.
I support that concern generally. The reason I thought to introduce a persistent collection library is that there are multiple dimensions that need to be iterated over (generate combinations for all calendar systems, for all selected dates, for all selected times, for all ways of specifying style/length (skeleton vs. style options), etc.). In particular, the fields for skeleton and the style options in a test case should be mutually exclusive -- if one has data, then the other should be empty.
Using immutable maps across the nested loops eliminates chances for unintended consequences of stateful mutation. I strongly prefer immutable maps for this reason, and thus do not want to use regular collections. But that also means I need to append to immutable collections.
I didn't immediately see it in the Guava APIs, but I guess it is possible, although doing so is clunky. Going this route would compel me to write helper methods to reduce that boilerplate, which would feel like I'm partially recreating the persistent collections library at that point.
On the other hand, persistent collections are designed for supporting immutability + append/remove/etc., and have the added benefit of code readability b/c of concision when you do perform those operations. The PCollections library has no dependencies and 75 KB of jar size, FWIW.
I'm happy to do what you suggest, here. I would prefer that it entail immutable data structures -- either via Guava or PCollections.
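For illustration, here is a minimal sketch of the kind of helper method mentioned above: "appending" to an immutable map by copying it. It uses only JDK collections as a stand-in for Guava or PCollections. The class name, the `plus` helper, and the `calendar`, `skeleton`, and `dateLength` keys and values are hypothetical and are not taken from the PR.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

final class ImmutableAppendSketch {
    /** Returns a new unmodifiable map holding base plus one extra entry; base is left untouched. */
    static <K, V> Map<K, V> plus(Map<K, V> base, K key, V value) {
        // Full copy on every append: this is the cloning cost a persistent map would avoid.
        Map<K, V> copy = new LinkedHashMap<>(base);
        copy.put(key, value);
        return Collections.unmodifiableMap(copy);
    }

    public static void main(String[] args) {
        Map<String, Object> empty = Collections.emptyMap();
        Map<String, Object> base = plus(empty, "calendar", "gregorian");

        // Two sibling test cases derived from the same base: one carries a skeleton,
        // the other a style option, so the two fields stay mutually exclusive.
        Map<String, Object> withSkeleton = plus(base, "skeleton", "yMMMd");
        Map<String, Object> withStyle = plus(base, "dateLength", "long");

        System.out.println(base);
        System.out.println(withSkeleton);
        System.out.println(withStyle);
    }
}
```

Because each call returns a fresh map, an inner loop cannot leak a `skeleton` entry into a sibling case that is supposed to carry only style options.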
We don't really have time to do an analysis of a new package to see if it introduces any new security / performance issues. If you are not modifying these structures once built, it is pretty easy to just build them and convert to immutable structures.
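A rough sketch of the build-then-convert approach suggested here, again using only JDK collections (`Map.copyOf` and `List.copyOf`, Java 10+) rather than Guava. The time zone IDs come from the snippet under review; the class name, the calendar selection, and the `calendar` / `timeZone` keys are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class BuildThenFreezeSketch {
    // Illustrative selection of calendar IDs (assumption, not from the PR).
    static final List<String> CALENDARS = List.of("gregorian", "japanese");
    // A subset of the time zone IDs listed in the reviewed code.
    static final List<String> TIME_ZONES =
            List.of("America/Los_Angeles", "Africa/Luanda", "Asia/Tehran");

    /** Generates one immutable test case per (calendar, time zone) combination. */
    static List<Map<String, String>> generateCases() {
        List<Map<String, String>> cases = new ArrayList<>();
        for (String calendar : CALENDARS) {
            for (String timeZone : TIME_ZONES) {
                // The mutable map lives only inside this iteration...
                Map<String, String> testCase = new LinkedHashMap<>();
                testCase.put("calendar", calendar);
                testCase.put("timeZone", timeZone);
                // ...and is frozen before it escapes the loop body.
                cases.add(Map.copyOf(testCase));
            }
        }
        return List.copyOf(cases);
    }

    public static void main(String[] args) {
        generateCases().forEach(System.out::println);
    }
}
```

Mutation is confined to a single loop iteration, so state cannot carry over between combinations, at the cost of one copy per test case.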
Okay, removed PCollections as a dep, using the existing Guava dep's collections instead.
Thanks!
PR title needs to start with |
import java.io.IOException;
import java.util.Arrays;
import org.pcollections.HashPMap;
- i don't see the argument for the pcollections
- as to iterating - that data should be pulled from the DTD anyway instead of hard coded.
emptyMapStringToObject.plus("timeZoneName", "longGeneric")
));

private static final PVector<String> CALENDARS =
duplicate of other supplemental data
private static final PVector<String> TIME_ZONES =
        TreePVector.from(Arrays.asList(
                "America/Los_Angeles",
                "Africa/Luanda",
                "Asia/Tehran",
                "Europe/Kiev",
                "Australia/Brisbane",
                "Pacific/Palau",
                "America/Montevideo"
        ));
other code uses ImmutableSet.of("…", "…", "…") - does this provide value beyond that?
I switched over to using Guava according to Mark's guidance, so it's a moot point now.
(There's a bit of cloning during runtime and verbosity of code that could be avoided if one were to use persistent data structures.)
private static final HashPMap<String,Object> emptyMapStringToObject = HashTreePMap.empty();

private static final PVector<PMap<String, Object>> SPEC_OPTIONS =
Couldn't this just be read from the DTD?
@macchiati @srl295 Do you know what the proper way to format a date within CLDR code is? @sffc and I found a few separate instances of date time formatting, such as in |
There is a class called ICUServiceBuilder, that patches ICU calls with CLDR data. (If you are doing something new we might need to add APIs to that.) |
a good source of example code would be the ExampleGenerator.java |
Thanks @macchiati and @srl295. @sffc and I took a look at ExampleGenerator.java and found that handleDateFormatItem finds the right xpaths and the values at those xpaths for the date/time patterns, and then calls icuServiceBuilder.getDateFormat(...) using those values at the call site. There's a lot of important and necessary code in that method that would be better off refactored into a reusable place rather than taking a copy-and-paste shortcut. The method itself has a lot of copy-and-pasted code 3x over for full/long/short date format patterns. We think it would be good if we also cleaned up & refactored the duplicated code as a part of creating the test data generator. WDYT? |
Sounds good; pulling useful methods out of ExampleGenerator (etc). There is a pending PR (from one of the interns) that refactors ExampleGenerator, so you should wait until that is merged to avoid conflicts.
|
@echeran don't forget to open a ticket and update the PR title and commit messages. |
Is it PR #3895? It looks like it is just stuck on a linter. @echeran pushed the lint fix to the branch. |
Still failing spotless, build — and is draft. Elango, do you want to try to fix this for v46, or punt to 47? |
yes thanks for merging #3895 |
Still trying to get this done ASAP in case it can fit for v46. When is the last call for v46? |
Any test data that is targeted at ICU should be in before their freeze date
(Thursday??).
|
Things are looking good and complete now. The only thing is the Jira ticket isn't accepted yet. The key files are |
The data currently looks like
A few suggestions for improvement:
|
Done. All suggestions were good. PTAL. |
@@ -513,11 +513,11 @@ private static ImmutableMap<Object, Object> getTestCaseForZonedDateTime(
dateTimeGluePatternFormatType,
icuServiceBuilder);

// "input_string" = the ISO 18601 UTC time zone formatted string of the zoned date time
optionsBuilder.put("inputString", zdt.toString());
// "input" = the ISO 18601 UTC time zone formatted string of the zoned date time
ISO - International Organization for Standardization
ISO 18601:2013 - Packaging and the environment — General requirements for the use of ...
// "input" = the ISO 18601 UTC time zone formatted string of the zoned date time | |
// "input" = the ISO 8601 UTC time zone formatted string of the zoned date time |
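As a side note on what `zdt.toString()` yields: the sketch below compares a few `java.time` renderings of the same zoned date time. The sample date and the class and method names are assumptions for illustration; the generator's actual inputs are not shown in this thread. Note that `ZonedDateTime.toString()` appends the zone ID in square brackets, which the `java.time` documentation describes as an extension of ISO 8601, and that it keeps the local offset rather than converting to UTC.

```java
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

final class Iso8601Sketch {
    /** A hypothetical sample instant in one of the time zones listed in the PR. */
    static ZonedDateTime sample() {
        return ZonedDateTime.of(2024, 3, 7, 0, 0, 1, 0, ZoneId.of("America/Los_Angeles"));
    }

    /** Local date-time, offset, and bracketed zone ID. */
    static String withZoneId(ZonedDateTime zdt) {
        return zdt.toString();
    }

    /** Local date-time and offset only (plain ISO 8601 offset form). */
    static String offsetOnly(ZonedDateTime zdt) {
        return zdt.format(DateTimeFormatter.ISO_OFFSET_DATE_TIME);
    }

    /** The same instant expressed in UTC. */
    static String utcInstant(ZonedDateTime zdt) {
        return zdt.toInstant().toString();
    }

    public static void main(String[] args) {
        ZonedDateTime zdt = sample();
        System.out.println(withZoneId(zdt)); // 2024-03-07T00:00:01-08:00[America/Los_Angeles]
        System.out.println(offsetOnly(zdt)); // 2024-03-07T00:00:01-08:00
        System.out.println(utcInstant(zdt)); // 2024-03-07T08:00:01Z
    }
}
```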
I can't review the JSON file right now, because it's too big for the browser. |
For the purposes of just quick reviewing, you can view the PR files including the large JSON file by replacing the https://github.dev/unicode-org/cldr/pull/4011 (If you had to do it at the command line, you would have to check out the branch via |
Thanks for the tip, Elango. Data looks ok to me.
(I was on a tablet, and so constrained in how I could review.)
(Longer term, for each testdata file I think we want to have a unit test in CLDR, and in the testdata file point to the generator and the unit test. That will make it easier to manage.)
Blocked on the Jira ticket being accepted. Everything else is ready to go. |
private static final ImmutableSet<String> NUMBERING_SYSTEMS =
        ImmutableSet.of("latn", "arab", "beng");

// Use underscores for locale id b/c of CLDR historical reasons, even though dash is preferable
could be in bcp47 and then convert them to CLDR using ULocale.forLanguageTag
OK to move forward (minor comments)
Force-pushed: f075050 to 0d14e89
Hooray! The files in the branch are the same across the force-push. 😃 ~ Your Friendly Jira-GitHub PR Checker Bot |
CLDR-17938
ALLOW_MANY_COMMITS=true