Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ICU-21254 Add plural rule parsing for exponent operand in C++ #1286

Merged
merged 1 commit into from
Sep 3, 2020

Conversation

echeran
Copy link
Contributor

@echeran echeran commented Sep 1, 2020

This PR adds the missing functionality in C++ for parsing plural rule strings as a part of supporting the exponent operand in plural rules. The previous exponent operand work was implemented in #938 (Java) and #972 (C++).

As a side note, after adding the logic for the plural rule parser for the 'e' operation in C++, I noticed that the equivalent code for the 'w' operand seems to be missing -- is that intentional?

Checklist

@echeran echeran requested review from sffc and pedberg-icu September 1, 2020 23:57
@echeran
Copy link
Contributor Author

echeran commented Sep 2, 2020

I wanted to get this PR out as soon as I could, so I figured it might be better to ask theses few smaller, basic issues that I had over the PR. Now that we have the PR:

  • Did I appropriately use the APIs and types in my test code correctly? I did my best to look around in other tests and the codebase to learn and make educated guesses on things that I wasn't 100% certain on.
  • What's the best way to break up long lines? Similarly, what's the best way to break up a long string in C++ across multiple lines?

pedberg-icu
pedberg-icu previously approved these changes Sep 2, 2020
@pedberg-icu
Copy link
Contributor

  • Did I appropriately use the APIs and types in my test code correctly? I did my best to look around in other tests and the codebase to learn and make educated guesses on things that I wasn't 100% certain on.

Looks good to me, Shane may have a more informed opinion.

  • What's the best way to break up long lines? Similarly, what's the best way to break up a long string in C++ across multiple lines?

Long lines seem to be fine in ICU source code. But you are looking to break up a long C string "aaabbbcccddd" you can always use string concatenation as in
"aaabbb"
"cccddd"

sffc
sffc previously approved these changes Sep 2, 2020
Locale locale = Locale::createFromName(localeName);

struct TestCase {
const char* skeleton;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional: It would be more correct to make this char16_t* (and u"" strings), because number skeletons are 16-bit strings. Currently you are performing an implicit codepage conversion in the NumberFormatter:forSkeleton function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@jira-pull-request-webhook
Copy link

Notice: the branch changed across the force-push!

  • icu4c/source/test/intltest/plurults.cpp is different
  • icu4c/source/test/intltest/plurults.h is different

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

@echeran
Copy link
Contributor Author

echeran commented Sep 2, 2020

Can I get a rubber stamp, of sorts? I forced-pushed after amending the branch commit in order to re-trigger the PR check on Jira ticket status and also to make the optional type change that Shane suggested (+ breaking up long lines in source code).

@echeran echeran requested a review from sffc September 2, 2020 17:08
@jefgen
Copy link
Member

jefgen commented Sep 2, 2020

Can I get a rubber stamp, of sorts?

FYI: The MSVC and GCC MSYS2 CI builds seem to be crashing in the new test (but only in Debug builds).

         testGetAllKeywordValues {
         } OK:   testGetAllKeywordValues 
         testCompactDecimalPluralKeyword {
make[2]: *** [Makefile:110: check-local] Segmentation fault
make[2]: Leaving directory '/d/a/1/s/icu4c/source/test/intltest'
make[1]: Making `check' in `iotest'
make[2]: Entering directory '/d/a/1/s/icu4c/source/test/iotest'
Detected MSYS version: 3

@echeran
Copy link
Contributor Author

echeran commented Sep 2, 2020

@jefgen Thanks for the heads up, so strange. I got ahead of myself after seeing tests pass locally. I see some MSVC tests pass, whereas these have a segfault. If you happen to have any guesses as to what I'm doing, let me know.

@jefgen
Copy link
Member

jefgen commented Sep 2, 2020

If you happen to have any guesses as to what I'm doing, let me know.

Sure, let me try to take a look into this in a bit. :)

@jefgen
Copy link
Member

jefgen commented Sep 2, 2020

Okay, so it looks like the crash is happening on the first test case:
{u"", 0, u"0", u"one"}

Here's the full stack-trace when the test fails:

 # Call Site
00 icuin68d!icu_68::PluralRules::select+0x35
01 icuin68d!icu_68::PluralRules::select+0xc6
02 intltest!PluralRulesTest::getPluralKeyword+0x268
03 intltest!PluralRulesTest::testCompactDecimalPluralKeyword+0x574
04 intltest!PluralRulesTest::runIndexedTest+0x366
05 intltest!IntlTest::runTestLoop+0x2a6
06 intltest!IntlTest::runTest+0x110
07 intltest!IntlTest::callTest+0x9c
08 intltest!IntlTestFormat::runIndexedTest+0x204b
09 intltest!IntlTest::runTestLoop+0x2a6
0a intltest!IntlTest::runTest+0x110
0b intltest!IntlTest::callTest+0x9c
0c intltest!MajorTestLevel::runIndexedTest+0x34e
0d intltest!IntlTest::runTestLoop+0x2a6
0e intltest!IntlTest::runTest+0x110
0f intltest!main+0xe8d

Debugging a bit, it looks like the problem is actually here:

LocalPointer<PluralRules> rules(PluralRules::createRules(
    "one: i = 0,1 @integer 0, 1 @decimal 0.0~1.5;  "
    "many: e = 0 and i % 1000000 = 0 and v = 0 or e != 0 .. 5;  "
    "other:  @integer 2~17, 100, 1000, 10000, 100000, 1000000,  "
    "  @decimal 2.0~3.5, 10.0, 100.0, 1000.0, 10000.0, 100000.0, 1000000.0, …", errorCode));

The creation of the PluralRules object actually fails and sets the errorCode to be U_UNEXPECTED_TOKEN | U_FMT_PARSE_ERROR_START.

But the test doesn't check the errorCode before using the rules object.

This later causes a null-pointer dereference as LocalPointer<PluralRules> is actually empty since the createRules didn't succeed and returned nullptr.

@jefgen
Copy link
Member

jefgen commented Sep 2, 2020

Ah, found it. -- The problem is the last character, the "…".
This is actually U+2026 (not "..."), which isn't in the default code page.

@sffc
Copy link
Member

sffc commented Sep 2, 2020

Good find!

It looks like PluralRules does expect a "…" (U_2026) rather than "...": static const UChar ELLIPSIS = ((UChar) 0x2026);

I think the correct fix is to use a 16-bit UnicodeString instead of an 8-bit string. Currently you have

    LocalPointer<PluralRules> rules(PluralRules::createRules(
        "one: i = 0,1 @integer 0, 1 @decimal 0.0~1.5;  "
        "many: e = 0 and i % 1000000 = 0 and v = 0 or e != 0 .. 5;  "
        "other:  @integer 2~17, 100, 1000, 10000, 100000, 1000000,  "
        "  @decimal 2.0~3.5, 10.0, 100.0, 1000.0, 10000.0, 100000.0, 1000000.0, …", errorCode));

However PluralRules::createRules takes a UnicodeString. So you should actually be doing

    LocalPointer<PluralRules> rules(PluralRules::createRules(
        u"one: i = 0,1 @integer 0, 1 @decimal 0.0~1.5;  "
        u"many: e = 0 and i % 1000000 = 0 and v = 0 or e != 0 .. 5;  "
        u"other:  @integer 2~17, 100, 1000, 10000, 100000, 1000000,  "
        u"  @decimal 2.0~3.5, 10.0, 100.0, 1000.0, 10000.0, 100000.0, 1000000.0, …", errorCode));

sffc
sffc previously approved these changes Sep 2, 2020
Copy link
Member

@sffc sffc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. The latest test failures appear to be unrelated; I'm fixing them in #1292.

@echeran
Copy link
Contributor Author

echeran commented Sep 2, 2020

Thanks for the detailed troubleshooting, @jefgen and @sffc ! Going to squash, rebase against latest in master, and push again. I'll wait for the latest build checks here and on master to finish and see what they say.

@jira-pull-request-webhook
Copy link

Hooray! The files in the branch are the same across the force-push. 😃

~ Your Friendly Jira-GitHub PR Checker Bot

@jira-pull-request-webhook
Copy link

Hooray! The files in the branch are the same across the force-push. 😃

~ Your Friendly Jira-GitHub PR Checker Bot

@echeran
Copy link
Contributor Author

echeran commented Sep 3, 2020

Looks like all the checks passed. Now is finally the time for a re-stamp.

@echeran echeran requested review from pedberg-icu and sffc September 3, 2020 14:55
@echeran
Copy link
Contributor Author

echeran commented Sep 3, 2020

I almost forgot, @pedberg-icu -- where is the logKnownIssue for ICU-21254? I can't find it in the ICU codebase. Let me know if I need to amend this PR.

@pedberg-icu
Copy link
Contributor

I almost forgot, @pedberg-icu -- where is the logKnownIssue for ICU-21254? I can't find it in the ICU codebase. Let me know if I need to amend this PR.

There is no longer a logKnownIsssue for ICU-21254; I realized that the tests I was originally skipping with it were actually broken due to https://unicode-org.atlassian.net/browse/ICU-21258, and changed the logKnownIssue to reference ICU-21258 instead.

@echeran echeran merged commit cb7f197 into unicode-org:master Sep 3, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants