-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize performance of initcap
function (~2x faster)
#13691
Conversation
Signed-off-by: Tai Le Manh <manhtai.lmt@gmail.com>
initcap
function (~2x faster)
I'm not sure if this is an intended implementation or not. But seems like it would be better if we return |
char_vector.push(c.to_ascii_lowercase()); | ||
fn initcap_string(input: Option<&str>) -> Option<String> { | ||
input.map(|s| { | ||
let mut result = String::with_capacity(s.len()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW we could likely make this function much faster still by using one of the StringArrayWriters directly (it would save this allocation entirely)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch thanks @tlm365
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍🏻
for c in s.chars() { | ||
let transformed = if prev_is_alphanumeric { | ||
c.to_ascii_lowercase() | ||
} else { | ||
char_vector.push(c.to_ascii_uppercase()); | ||
} | ||
previous_character_letter_or_number = | ||
c.is_ascii_uppercase() || c.is_ascii_lowercase() || c.is_ascii_digit(); | ||
c.to_ascii_uppercase() | ||
}; | ||
result.push(transformed); | ||
prev_is_alphanumeric = c.is_ascii_alphanumeric(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am curious which implementation is faster 🤔
if s.is_empty() {
return result;
} else {
let (upper, lower) = s.split_at(1);
result.push_str(&upper.to_ascii_uppercase());
result.push_str(&lower.to_ascii_lowercase());
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's not correct and/or possible with utf8?
Seems this answer (last snippet) https://stackoverflow.com/a/38406885 might be close?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's not correct and/or possible with utf8?
Seems this answer (last snippet) https://stackoverflow.com/a/38406885 might be close?
@Dandandan you're right. I tested it with some unit tests for special characters (unicode), this PR's implementation doesn't work and the old implementation doesn't work either. But I think it makes sense at the moment since the initcap
function is in datafusion::function::string
not datafusion::function::unicode
.
Perhaps, it would be better if we update the documentation that initcap
is not supported for unicode characters yet and try to handle it if necessary in another PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made a PR to update the docs: #13749
Thanks again @tlm365 @jayzhan211 @Dandandan @Weijun-H and @Dandandan (quite a distinguished list of contributors!) |
Which issue does this PR close?
Closes #.
Rationale for this change
Eliminate the intermediate
Vec<char>
and directly push into aString
.What changes are included in this PR?
Update the
initcap
function, benchmark included.Are these changes tested?
Yes, existing unit tests.
Are there any user-facing changes?
No.
*BENCHMARK RESULT