perf fix Time #142

Open
wants to merge 1 commit into
base: develop

mcakircali
Contributor

try to avoid std::regex

@mcakircali mcakircali requested review from Ozaq and danovaro October 11, 2024 12:23
@codecov-commenter

Codecov Report

Attention: Patch coverage is 94.44444% with 3 lines in your changes missing coverage. Please review.

Project coverage is 63.70%. Comparing base (19e0ec8) to head (6562aa6).

Files with missing lines Patch % Lines
src/eckit/types/Time.cc 94.44% 3 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff            @@
##           develop     #142   +/-   ##
========================================
  Coverage    63.69%   63.70%           
========================================
  Files         1065     1065           
  Lines        55142    55151    +9     
  Branches      4085     4091    +6     
========================================
+ Hits         35124    35133    +9     
  Misses       20018    20018           



@Ozaq Ozaq left a comment

While you are working on this class, I would also encourage you to add class-level documentation that describes the accepted format variants and the exceptions callers should expect.

Comment on lines +148 to +150
throw BadTime("Unkown format for time: " + s);
} else {
throw SeriousBug("Unhandled time format!");

Why is there a need for two different exceptions? I understand the intent of those exception types as follows:

  • BadTime is best described as a usage error, as in it was called as Time("ABCD")
  • SeriousBug represents an implementation error such as Time("20:11:24") not getting parsed properly.

Does that second throw provide any additional benefit?


// DIGITS: "^-?[0-9]+$"
// FLOAT: "^-?[0-9]*\\.[0-9]+$"
enum class TimeFormat { UNKOWN, OTHER, DIGITS, DECIMAL };

UNKNOWN is treated as erroneous input, so I would go stronger with the name and call it INVALID.

ss = sec;
} else if (format == TimeFormat::OTHER) {
std::smatch m;
if (std::regex_match(s, m, hhmmss_)) {

This is still using a regex.
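
For the colon-separated branch, a regex-free parse could look roughly like the sketch below. It assumes the accepted shape is [-]hh[:mm[:ss]] with minutes and seconds limited to 0..59; the actual pattern behind hhmmss_ is not shown in this thread, and parseColonTime is a hypothetical helper name, not an existing eckit function.

```cpp
#include <cassert>
#include <charconv>
#include <cstddef>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical helper (not part of eckit): parses "[-]hh[:mm[:ss]]" into signed
// seconds without std::regex. Returns std::nullopt on malformed input.
inline std::optional<long> parseColonTime(std::string_view s) {
    const bool negative = !s.empty() && s.front() == '-';
    if (negative) { s.remove_prefix(1); }

    long fields[3] = {0, 0, 0};  // hh, mm, ss
    int n = 0;
    while (true) {
        if (n == 3) { return std::nullopt; }  // more than three fields
        // std::from_chars accepts a leading '-', so insist on a digit first.
        if (s.empty() || s.front() < '0' || s.front() > '9') { return std::nullopt; }
        const char* first = s.data();
        const char* last  = s.data() + s.size();
        auto [ptr, ec] = std::from_chars(first, last, fields[n]);
        if (ec != std::errc{}) { return std::nullopt; }  // e.g. out of range
        ++n;
        if (ptr == last) { break; }                  // consumed the whole input
        if (*ptr != ':') { return std::nullopt; }    // only ':' may separate fields
        s.remove_prefix(static_cast<std::size_t>(ptr - first) + 1);
    }
    assert(n >= 1 && n <= 3);

    if (fields[1] > 59 || fields[2] > 59) { return std::nullopt; }
    const long total = fields[2] + 60 * (fields[1] + 60 * fields[0]);
    return negative ? -total : total;
}
```

A caller would map std::nullopt to BadTime, so the usage-error path stays in one place. The day/hour/minute/second form matched by ddhhmmss_ is deliberately left out of this sketch.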

ss += 60 * (mm + 60 * (hh + 24 * dd));
if (s[0] == '-') {
ss = -ss;
} else if (std::regex_match(s, m, ddhhmmss_)) {

This is still using a regex.

Comment on lines +40 to +49
for (auto i = start; i < time.length(); i++) {
if (time[i] == '.') {
if (hasDecimal || i == time.length() - 1) { return TimeFormat::UNKOWN; }
hasDecimal = true;
} else if (isdigit(time[i]) == 0) {
return TimeFormat::OTHER;
} else {
hasDigit = true;
}
}

I think this can be done with a single pass over the input string that returns the type ID of the variant found (pretty similar to what you are doing already) plus up to 5 tokens: 1 token for the sign and 4 for positive integers (day, hour, minute, second).

Subsequent code can then validate this result, e.g. check whether the extended flag was passed.

I am thinking along the lines of:

struct tokenized_time {
    FormatType type;
    std::string_view sign;
    std::string_view integers[4];
};
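
To make the idea concrete, here is a minimal single-pass sketch that fills a structure like the one above. Several things are assumptions for illustration only: the FormatType enumerators, the extra count member, the tokenize function name, and the restriction to plain digits, decimals, and colon-separated fields (the unit-suffixed day/hour/minute/second form is not handled here).

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string_view>

// Hypothetical names throughout; none of this is existing eckit API.
enum class FormatType { INVALID, DIGITS, DECIMAL, COLON };

struct tokenized_time {
    FormatType type = FormatType::INVALID;
    std::string_view sign;         // "" or "-"
    std::string_view integers[4];  // digit runs in order of appearance
    std::size_t count = 0;         // how many entries of `integers` are filled
};

// Single pass: split on '.' or ':' and classify by the separator that was seen.
// The returned views point into `s`, so the underlying string must outlive them.
inline tokenized_time tokenize(std::string_view s) {
    tokenized_time out;
    std::size_t pos = 0;
    if (!s.empty() && s.front() == '-') {
        out.sign = s.substr(0, 1);
        pos = 1;
    }
    char sep = '\0';          // separator kind seen so far
    std::size_t start = pos;  // start of the current digit run
    for (;; ++pos) {
        if (pos < s.size() && std::isdigit(static_cast<unsigned char>(s[pos])) != 0) {
            continue;  // still inside a digit run
        }
        const bool atEnd = (pos == s.size());
        const char c = atEnd ? '\0' : s[pos];
        if (!atEnd) {
            if (c != '.' && c != ':') { return {}; }               // unsupported character
            if (sep != '\0' && (c != sep || c == '.')) { return {}; }  // mixed separators or second '.'
            sep = c;
        }
        // An empty digit run is tolerated only directly before '.', as in ".5".
        if (pos == start && c != '.') { return {}; }
        if (out.count == 4) { return {}; }  // more tokens than the struct can hold
        out.integers[out.count++] = s.substr(start, pos - start);
        if (atEnd) { break; }
        start = pos + 1;
    }
    assert(out.count >= 1 && out.count <= 4);
    out.type = (sep == '.') ? FormatType::DECIMAL
             : (sep == ':') ? FormatType::COLON
                            : FormatType::DIGITS;
    return out;
}
```

The tokenizer only classifies and splits; range checks (for example minutes below 60, or at most three colon-separated fields) and the extended-flag check would live in the validation step that consumes the result.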
