Only load symbols data if pattern requires them. #791

Merged
merged 1 commit into from
Jun 18, 2021

Conversation

zbraniecki
Member

An initial pattern analysis lets us skip loading symbols when the pattern doesn't use them.

The analyze_pattern signature will get more advanced once we split keys further and use it to decide which keys to load, but I think this is a good milestone to land already.

The common cases where this is useful are 24h times and numerical dates.
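
To illustrate the idea, here is a minimal sketch of such an analysis. It is not the code from this PR: `Field`, `Length`, `field_requires_symbols` and `pattern_requires_symbols` are hypothetical, simplified stand-ins, and the classification of which fields need localized names is an assumption made for the example.

```rust
// Hypothetical, simplified stand-ins for the pattern types in the datetime
// component. They are assumptions for illustration, not the real API.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Length {
    Numeric,
    Text,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum FieldSymbol {
    Year,
    Month,
    Day,
    Weekday,
    Hour24,
    Hour12,
    Minute,
    DayPeriod,
}

#[derive(Debug, Clone, Copy, PartialEq)]
struct Field {
    symbol: FieldSymbol,
    length: Length,
}

/// Returns true if writing this field needs localized names (symbols).
fn field_requires_symbols(field: &Field) -> bool {
    match field.symbol {
        // Weekday names and AM/PM markers are always text in this sketch.
        FieldSymbol::Weekday | FieldSymbol::DayPeriod => true,
        // Months need symbols only when spelled out.
        FieldSymbol::Month => field.length == Length::Text,
        // Everything else is written as digits.
        FieldSymbol::Year
        | FieldSymbol::Day
        | FieldSymbol::Hour24
        | FieldSymbol::Hour12
        | FieldSymbol::Minute => false,
    }
}

/// Returns true as soon as any field in the pattern needs symbols.
fn pattern_requires_symbols(fields: &[Field]) -> bool {
    fields.iter().any(field_requires_symbols)
}
```

With this shape, a 24h time pattern (hour and minute, both numeric) reports that no symbols are needed, so the symbols data load can be skipped entirely.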

A decent win on the overview benchmark:

datetime/datetime_overview                                                                            
                        time:   [612.45 us 613.10 us 613.71 us]
                        change: [-30.639% -30.511% -30.374%] (p = 0.00 < 0.05)
                        Performance has improved.
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.2s, enable flat sampling, or reduce sample count to 60.
datetime/zoned_datetime_overview
                        time:   [1.1963 ms 1.1996 ms 1.2030 ms]
                        change: [-26.643% -26.355% -26.056%] (p = 0.00 < 0.05)
                        Performance has improved.

@zbraniecki zbraniecki requested a review from sffc June 12, 2021 14:24
@zbraniecki
Member Author

@gregtatum - not flagging you for review, because you're on PTO, but pinging you so that you can skim through when you're back.

@zbraniecki zbraniecki requested a review from nordzilla June 12, 2021 14:25
@zbraniecki
Member Author

This fixes #603 for 0.3. I'm aware of #380, and I don't think this conflicts with it long term: analyze_pattern can still be used on multiple patterns if we were to store them, to determine the need for symbols if any of the patterns supported by a given instance requires them.

Additionally, as we further split symbols, we will use this function to identify the keys we want to load.
Also, in the future, icu_provider would store the parsed pattern, and maybe perform this analysis and return pattern(s) + symbols all in one call.

For 0.3, I think this is a nice perf boost with minimal complexity incurred.

@codecov-commenter

Codecov Report

Merging #791 (371ba1d) into main (feb1add) will increase coverage by 0.10%.
The diff coverage is 64.51%.


@@            Coverage Diff             @@
##             main     #791      +/-   ##
==========================================
+ Coverage   75.42%   75.53%   +0.10%     
==========================================
  Files         196      192       -4     
  Lines       12360    12316      -44     
==========================================
- Hits         9323     9303      -20     
+ Misses       3037     3013      -24     
Impacted Files Coverage Δ
components/datetime/src/format/datetime.rs 55.30% <64.51%> (+5.30%) ⬆️
provider/testdata/src/metadata.rs
provider/testdata/src/test_data_provider.rs
provider/testdata/src/lib.rs
provider/testdata/src/paths.rs

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update feb1add...371ba1d.

@coveralls

Pull Request Test Coverage Report for Build f8b7ab38541050e0133151c0610b138542df24e4-PR-791

  • 20 of 31 (64.52%) changed or added relevant lines in 1 file are covered.
  • 2 unchanged lines in 1 file lost coverage.
  • Overall coverage increased (+0.02%) to 75.605%

Changes Missing Coverage                      Covered Lines   Changed/Added Lines   %
components/datetime/src/format/datetime.rs    20              31                    64.52%

Files with Coverage Reduction                 New Missed Lines   %
components/datetime/src/format/datetime.rs    2                  55.3%

Totals Coverage Status
Change from base Build feb1add946d05c98b8e6d57c4d5bca7311ddd6bd: 0.02%
Covered Lines: 9431
Relevant Lines: 12474

💛 - Coveralls

Member

@sffc sffc left a comment

Thanks for doing this, but I'd like to discuss it first to make sure I'm on the same page.

            w.write_str(symbol)?
        }
        field @ FieldSymbol::TimeZone(_) => return Err(Error::UnsupportedField(field)),
    };
    Ok(())
}

// This function determines whether the struct will load symbols data.
// Keep it in sync with the `write_field` use of symbols.
Member

Issue: "Keep it in sync" is a noble comment, but the reality is that devs don't obey instructions in code comments unless there is a hard error. Is there a way to make this into more of a hard error? I guess test coverage, to make sure we hit your .expect(), but is that all we can do?

Member Author

I assume it is. I could try to build some more elaborate enum that has to be true here and in the analyze function together, but I think it's overengineering at this point, and I'd prefer to land it as is.
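
For reference, one way the "more elaborate enum" idea could look is sketched below. This is not what the PR does. `Requirement`, `requirement`, `analyze`, and this simplified `write_field` are hypothetical names for illustration. The point is that a single exhaustive `match` with no wildcard arm becomes the shared source of truth, so adding a new field variant fails to compile until it is classified.

```rust
use std::fmt::{self, Write};

// Hypothetical, reduced set of field symbols for the sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
enum FieldSymbol {
    Day,
    Hour,
    Weekday,
}

/// What a field needs in order to be written (hypothetical type).
#[derive(Debug, PartialEq)]
enum Requirement {
    NumericOnly,
    Symbols,
}

/// Single source of truth shared by analysis and writing.
/// There is deliberately no `_` arm: a new `FieldSymbol` variant
/// is a compile error until it is classified here.
fn requirement(symbol: FieldSymbol) -> Requirement {
    match symbol {
        FieldSymbol::Day | FieldSymbol::Hour => Requirement::NumericOnly,
        FieldSymbol::Weekday => Requirement::Symbols,
    }
}

/// Stand-in for loaded symbols data.
struct Symbols {
    weekdays: [&'static str; 7],
}

/// Analysis step: does any field in the pattern need symbols?
fn analyze(fields: &[FieldSymbol]) -> bool {
    fields
        .iter()
        .any(|&symbol| requirement(symbol) == Requirement::Symbols)
}

/// Writing step: consults the same classification as `analyze`.
fn write_field<W: Write>(
    w: &mut W,
    symbol: FieldSymbol,
    value: usize,
    symbols: Option<&Symbols>,
) -> fmt::Result {
    match requirement(symbol) {
        Requirement::NumericOnly => write!(w, "{}", value),
        Requirement::Symbols => {
            // Missing symbols here means the caller skipped the analysis.
            let symbols = symbols.ok_or(fmt::Error)?;
            w.write_str(symbols.weekdays[value % 7])
        }
    }
}
```

The trade-off is the one raised in the thread: this adds a type and an indirection, whereas test coverage of the `.expect()` path is cheaper for now.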

Comment on lines +273 to +283
if supports_time_zones {
    if requires_symbols {
        // If we require time zones, and symbols, we know all
        // we need to return already.
        break;
    }
} else if matches!(field.symbol, FieldSymbol::TimeZone(_)) {
    // If we don't support time zones, and encountered a time zone
    // field, error out.
    return Err(field);
}
Member

Suggestion: I don't see the point of this if/elseif. More specifically, I don't think checking supports_time_zones affects any behavior. I think you can remove the supports_time_zones argument and just unconditionally break whenever requires_symbols gets set (or, better, delete that variable and just return true).

Member Author

The value is that we also validate time zone availability here. If I were to apply the change you're suggesting, assuming I understand it, then we wouldn't error out when we have established that we need symbols but the pattern also uses a time zone field and was called from a non-time-zoned DTF.

Am I missing something?
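
To make the behavior concrete, here is a minimal sketch built around the quoted branch. The `Field` and `FieldSymbol` types are simplified stand-ins, and the `Result<bool, Field>` return type and the rule that only `Weekday` needs symbols are assumptions for the example, not the actual signature in this PR.

```rust
// Hypothetical, reduced types for the sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
enum FieldSymbol {
    Hour,
    Weekday,
    TimeZone,
}

#[derive(Debug, Clone, Copy, PartialEq)]
struct Field {
    symbol: FieldSymbol,
}

/// Returns Ok(true) if symbols must be loaded, Ok(false) if not, and
/// Err(field) if the pattern uses a time zone field that the caller
/// cannot format.
fn analyze_pattern(fields: &[Field], supports_time_zones: bool) -> Result<bool, Field> {
    let mut requires_symbols = false;
    for field in fields {
        if !requires_symbols {
            // Assumption for the sketch: only weekdays need symbols.
            requires_symbols = matches!(field.symbol, FieldSymbol::Weekday);
        }
        if supports_time_zones {
            if requires_symbols {
                // Nothing left to validate, so stop scanning early.
                break;
            }
        } else if matches!(field.symbol, FieldSymbol::TimeZone) {
            // Keep scanning even after symbols are known to be needed,
            // so a later time zone field is still rejected.
            return Err(*field);
        }
    }
    Ok(requires_symbols)
}
```

In this sketch, a pattern of weekday followed by time zone returns an error for a non-time-zoned caller even though `requires_symbols` was already set on the first field. An unconditional early return would miss that case.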

Member

ok, I didn't catch that you actually wanted to validate something. This is fine then.

@sffc sffc added and removed the discuss-priority label Jun 17, 2021
Member

@nordzilla nordzilla left a comment

Looks good to me.

@zbraniecki zbraniecki merged commit c5135af into unicode-org:main Jun 18, 2021