Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement InitializeDateTimeFormat #2072

Closed

Conversation

NorbertGarfield
Copy link
Contributor

This Pull Request partially fixes #1562.

It changes the following:

  • Adds IsValidTimeZoneName
  • Adds CanonicalizeTimeZoneName
  • Implements BasicFormatMatcher
  • Implements BestFitFormatMatcher
  • Implements InitializeDateTimeFormat

@codecov
Copy link

codecov bot commented May 14, 2022

Codecov Report

Merging #2072 (90a7a25) into main (bad686d) will increase coverage by 0.61%.
The diff coverage is 72.96%.

@@            Coverage Diff             @@
##             main    #2072      +/-   ##
==========================================
+ Coverage   43.81%   44.43%   +0.61%     
==========================================
  Files         217      217              
  Lines       19560    19978     +418     
==========================================
+ Hits         8571     8877     +306     
- Misses      10989    11101     +112     
Impacted Files Coverage Δ
boa_engine/src/context/mod.rs 31.85% <ø> (+0.91%) ⬆️
boa_engine/src/object/mod.rs 20.43% <14.28%> (+0.32%) ⬆️
boa_engine/src/context/icu.rs 37.50% <20.00%> (-42.50%) ⬇️
boa_engine/src/builtins/intl/mod.rs 65.51% <53.84%> (+4.55%) ⬆️
boa_engine/src/builtins/intl/date_time_format.rs 78.11% <76.03%> (+21.86%) ⬆️
boa_engine/src/builtins/function/mod.rs 24.34% <0.00%> (-4.87%) ⬇️
...ngine/src/syntax/parser/statement/try_stm/catch.rs 43.20% <0.00%> (-2.47%) ⬇️
...oa_engine/src/syntax/parser/statement/block/mod.rs 41.46% <0.00%> (-2.44%) ⬇️
boa_engine/src/syntax/parser/mod.rs 36.76% <0.00%> (-2.30%) ⬇️
... and 8 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bad686d...90a7a25. Read the comment docs.

@raskad
Copy link
Member

raskad commented May 15, 2022

VM implementation

Test result main count PR count difference
Total 90,186 90,186 0
Passed 54,651 54,687 +36
Ignored 23,273 23,273 0
Failed 12,262 12,226 -36
Panics 0 0 0
Conformance 60.60% 60.64% +0.04%
Fixed tests (36):
test/intl402/DateTimeFormat/constructor-options-order-timedate-style.js [strict mode] (previously Failed)
test/intl402/DateTimeFormat/constructor-options-order-timedate-style.js (previously Failed)
test/intl402/DateTimeFormat/constructor-options-style-conflict.js [strict mode] (previously Failed)
test/intl402/DateTimeFormat/constructor-options-style-conflict.js (previously Failed)
test/intl402/DateTimeFormat/constructor-options-timeStyle-invalid.js [strict mode] (previously Failed)
test/intl402/DateTimeFormat/constructor-options-timeStyle-invalid.js (previously Failed)
test/intl402/DateTimeFormat/constructor-options-numberingSystem-invalid.js [strict mode] (previously Failed)
test/intl402/DateTimeFormat/constructor-options-numberingSystem-invalid.js (previously Failed)
test/intl402/DateTimeFormat/constructor-options-throwing-getters-fractionalSecondDigits.js [strict mode] (previously Failed)
test/intl402/DateTimeFormat/constructor-options-throwing-getters-fractionalSecondDigits.js (previously Failed)
test/intl402/DateTimeFormat/constructor-options-dateStyle-invalid.js [strict mode] (previously Failed)
test/intl402/DateTimeFormat/constructor-options-dateStyle-invalid.js (previously Failed)
test/intl402/DateTimeFormat/constructor-options-order.js [strict mode] (previously Failed)
test/intl402/DateTimeFormat/constructor-options-order.js (previously Failed)
test/intl402/DateTimeFormat/constructor-options-fractionalSecondDigits-invalid.js [strict mode] (previously Failed)
test/intl402/DateTimeFormat/constructor-options-fractionalSecondDigits-invalid.js (previously Failed)
test/intl402/DateTimeFormat/timezone-invalid.js [strict mode] (previously Failed)
test/intl402/DateTimeFormat/timezone-invalid.js (previously Failed)
test/intl402/DateTimeFormat/constructor-options-calendar-invalid.js [strict mode] (previously Failed)
test/intl402/DateTimeFormat/constructor-options-calendar-invalid.js (previously Failed)
test/intl402/DateTimeFormat/constructor-options-throwing-getters.js [strict mode] (previously Failed)
test/intl402/DateTimeFormat/constructor-options-throwing-getters.js (previously Failed)
test/intl402/DateTimeFormat/constructor-options-order-fractionalSecondDigits.js [strict mode] (previously Failed)
test/intl402/DateTimeFormat/constructor-options-order-fractionalSecondDigits.js (previously Failed)
test/intl402/DateTimeFormat/constructor-options-timeZoneName-invalid.js [strict mode] (previously Failed)
test/intl402/DateTimeFormat/constructor-options-timeZoneName-invalid.js (previously Failed)
test/intl402/DateTimeFormat/constructor-calendar-numberingSystem-order.js [strict mode] (previously Failed)
test/intl402/DateTimeFormat/constructor-calendar-numberingSystem-order.js (previously Failed)
test/intl402/DateTimeFormat/constructor-options-order-dayPeriod.js [strict mode] (previously Failed)
test/intl402/DateTimeFormat/constructor-options-order-dayPeriod.js (previously Failed)
test/intl402/DateTimeFormat/constructor-options-dayPeriod-invalid.js [strict mode] (previously Failed)
test/intl402/DateTimeFormat/constructor-options-dayPeriod-invalid.js (previously Failed)
test/intl402/DateTimeFormat/constructor-options-throwing-getters-dayPeriod.js [strict mode] (previously Failed)
test/intl402/DateTimeFormat/constructor-options-throwing-getters-dayPeriod.js (previously Failed)
test/intl402/DateTimeFormat/constructor-options-throwing-getters-timedate-style.js [strict mode] (previously Failed)
test/intl402/DateTimeFormat/constructor-options-throwing-getters-timedate-style.js (previously Failed)

@NorbertGarfield
Copy link
Contributor Author

@raskad, thanks for the update. I fixed these panicking tests locally, but I don't know how to gather that VM implementation report. My current method is to run failing script directly via cargo run -- /path/to/failing_test.js.

Also, Locale::from_bytes call panicked when parsing "i-klingon" locale, so I had to add a fallback to my previous implementation to separate extensions from the locale id. Please let me know if you have another idea.

@raskad
Copy link
Member

raskad commented May 15, 2022

@raskad, thanks for the update. I fixed these panicking tests locally, but I don't know how to gather that VM implementation report. My current method is to run failing script directly via cargo run -- /path/to/failing_test.js.

You are right, we should describe how to gather this report locally.
I always do it like this:

git checkout main
cargo run --release --bin boa_tester -- run -vv -o results/main
git checkout <your-branch>
cargo run --release --bin boa_tester -- run -vv -o results/<your-branch>
cargo run --release --bin boa_tester -- compare results/mainlatest.json results/<your-branch>/latest.json

You may want to redirect the output of the boa_tester run commands to some file, because printing ~200k lines to stdout can be really slow.

I will review the PR in a bit ;)

@jedel1043
Copy link
Member

jedel1043 commented May 15, 2022

@NorbertGarfield Do note, however, that if you're running boa_tester on Windows you'll probably have some additional panicking tests caused by #1570. You can just ignore them in the meantime 😅

boa_engine/src/builtins/intl/date_time_format.rs Outdated Show resolved Hide resolved
boa_engine/src/builtins/intl/date_time_format.rs Outdated Show resolved Hide resolved
boa_engine/src/builtins/intl/date_time_format.rs Outdated Show resolved Hide resolved
boa_engine/src/builtins/intl/date_time_format.rs Outdated Show resolved Hide resolved
boa_engine/src/builtins/intl/date_time_format.rs Outdated Show resolved Hide resolved
boa_engine/src/builtins/intl/date_time_format.rs Outdated Show resolved Hide resolved
@NorbertGarfield NorbertGarfield force-pushed the garfield/init-date-time branch 2 times, most recently from cb12567 to a6fbb8a Compare May 16, 2022 09:05
@NorbertGarfield NorbertGarfield requested review from Razican and raskad May 16, 2022 09:06
@NorbertGarfield NorbertGarfield force-pushed the garfield/init-date-time branch 4 times, most recently from 0dca742 to 6e5f442 Compare May 30, 2022 17:58
Copy link

@zbraniecki zbraniecki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi.

First of all, this is so exciting for us. We're building ICU4X particularly to target ECMA-402 needs well and so far we've mostly been testing it in context of SpiderMonkey use in Firefox.
Your work on boa+ecma-402 is a perfect fit and I'd love us to collaborate closer to supply your needs.

For this particular PR, I left some drive-by comments, and my overarching feedback, as a maintainer of locid component is - you should never have to work with strings. Any place where you substr or look for -u or anything like that feels like either icu_locid should provide you that, or you're doing something wrong. Generally speaking I strongly believe the whole implementation should always operate on a Locale type and only stringify/parse on output/input.
If something is missing or the API we have is suboptimal, let's fix it. It's a perfect timing as we're warming up to 1.0 stability but still have wiggle room left.

boa_engine/src/builtins/intl/mod.rs Outdated Show resolved Hide resolved
boa_engine/src/builtins/intl/mod.rs Outdated Show resolved Hide resolved
boa_engine/src/builtins/intl/tests.rs Outdated Show resolved Hide resolved
@jedel1043
Copy link
Member

jedel1043 commented May 31, 2022

@zbraniecki Thank you for your review! As you can see, our Intl implementation using icu4x is still very lacking, but I hope we can be of help by being early adopters of the library!

On another note, we had a small problem with Locale when we implemented the BestAvailableLocale operation. The algorithm described there requires the steps:

b. Let pos be the character index of the last occurrence of "-" (U+002D) within candidate. If that character does not occur, return undefined.
c. If pos ≥ 2 and the character "-" occurs at index pos - 2 of candidate, decrease pos by 2.
d. Let candidate be the substring of candidate from position 0, inclusive, to position pos, exclusive.

If we translate this to a high level operation, the spec requires removing the last word from the locale. Unfortunately, Locale doesn't expose its internal data structures, so we have to convert to a String and parse again if we want to reduce the Locale by one word. So, would it be too much to ask for a pop_last_word method in Locale?

If that's not possible, then a way to deconstruct the internals of each extension would be good, because that would allow us to do something like: mem::take -> deconstruct -> modify -> recreate modified using unchecked_*.

@zbraniecki
Copy link

Uuh, that's really unfortunate. This seems to want you to operate on a stringified version of Locale, which i think we should avoid. There are other issues with it (like, "sr-Cyrl" should not be shortened to "sr"), but also, it makes the unreasonable attempt to cut out subtags from extensions.

I think this operation should instead operate on [variants, region, script, language] array removing each one in order after removing all extensions.

I don't think we should be adding APIs to handle treating Locale as a string and make semantic operations based on positions of subtags. I think this is a change we should apply to ECMA-402 spec. CC @sffc for validation and we can file an issue there if he doesn't point out that I am missing something obvious.

@sffc
Copy link

sffc commented May 31, 2022

The spec text only requires that behavior be consistent with what would happen if you did the string operations on the locale string. You can implement it differently, so long as the result is the same. In this case, BCP47 guarantees that variant always comes after region which always comes after script which always comes after language, and you can build that assumption into your code.

@jedel1043
Copy link
Member

jedel1043 commented May 31, 2022

I don't think Language Identifiers are the problem per se; I can easily manipulate LanguageIdentifiers using the provided API, and a canonicalized Language Identifier cannot become invalid by following the spec algorithm. The problem are extensions, which could become invalid if that algorithm is followed as stated e.g. with hi-t-en-h0-hybrid it would try to match with hi-t-en-h0, which has an invalid t extension since the h0 key ought to have a value.

@NorbertGarfield NorbertGarfield force-pushed the garfield/init-date-time branch from 6e5f442 to 617c5f4 Compare May 31, 2022 14:46
@NorbertGarfield NorbertGarfield requested a review from jedel1043 May 31, 2022 18:22
@zbraniecki
Copy link

The problem are extensions, which could become invalid if that algorithm is followed as stated e.g. with hi-t-en-h0-hybrid it would try to match with hi-t-en-h0, which has an invalid t extension since the h0 key ought to have a value.

I claim with the authority given to me by the ultimate powers (Shane in this case) that you should, for all purposes, treat this as a LanguageIdentifier. It just that ICU4X is the first time we provide API surface to the spec that identifies LI as a separate thing from Locale.

I hereby recommend:

  1. In this code, take LI out of Locale and in the algo just take out subtag after subtag and compare.
  2. File an issue in ECMA-402 to request additional precision in the algo to cut out extensions

I'd particularly appreciate if you did (2) as I know it's easy to stop on (1) and move on :) Thank you!

@jedel1043
Copy link
Member

@zbraniecki I opened an issue in the spec and in the meantime we'll follow your suggestions and treat the input as a Language Identifier. Thank you for all your assistance!

Improve test semantics

Fetch default timezone from the system
Copy link
Member

@jedel1043 jedel1043 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the main problem with the approach this PR takes is the many possible ways to lose correctness because of the types used in several definitions throughout the implementation.

For example, the DateTimeFormat type is defined as:

pub struct DateTimeFormat {
    initialized_date_time_format: bool,
    locale: JsString,
    calendar: JsValue,
    numbering_system: JsValue,
    time_zone: JsString,
    weekday: JsValue,
    era: JsValue,
    year: JsValue,
    month: JsValue,
    day: JsValue,
    day_period: JsString,
    hour: JsValue,
    minute: JsValue,
    second: JsValue,
    fractional_second_digits: JsString,
    time_zone_name: JsValue,
    hour_cycle: JsValue,
    pattern: JsString,
    bound_format: JsString,
    date_style: JsValue,
    time_style: JsValue,
}

So, what would happen if someone creates an instance of that type as follows?

pub struct DateTimeFormat {
    initialized_date_time_format: true,
    locale: "en-US".into(),
    calendar: "gregori".into() // Notice the typo here,
    numbering_system: "arab".into(),
    ...
}

We can't really tell what would happen in this case without scanning the whole module, and that's a loss for the maintainability of the code.

In the next days I'll implement an API that conflates the requirements for each Intl spec object into a collection of traits and types that preserves the correctness of our code.

@zbraniecki
Copy link

you may want to track my work on icu_preferences which are targeting this problem space - unicode-org/icu4x#1833

I had a design brainstorm session with Shane last week and will push new revision soon, but I think it may simplify your logic quite a bit.

@jedel1043
Copy link
Member

There's a new Intl API in place for implementing Intl services more easily. This is waiting on the author to decide if they want to rebase on that or if we want to do it from scratch.

@jedel1043 jedel1043 added the waiting-on-author Waiting on PR changes from the author label Nov 29, 2023
@NorbertGarfield
Copy link
Contributor Author

@jedel1043, sorry for the late response. I'll try to implement it from scratch. I'm closing this PR.

@jedel1043 jedel1043 removed the waiting-on-author Waiting on PR changes from the author label Sep 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement missing internationalization related methods
6 participants