
Remove allocation in impl Display for path::Display #38879 #39023

Closed
wants to merge 1 commit

Conversation

martinhath
Contributor

@martinhath martinhath commented Jan 13, 2017

Fixes #38879

I wasn't too sure about testing this, but I threw in a test checking that non-Unicode sequences are converted to '\u{fffd}'.

@rust-highfive
Collaborator

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @aturon (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

@nagisa
Member

nagisa commented Jan 13, 2017

I’d much rather this do one allocation in the cold case (invalid Unicode) than have write_char called for every single codepoint every single time.

@martinhath
Contributor Author

Do you mean performance-wise? I think it would be interesting to see some benchmarks here.

@martinhath
Contributor Author

OK, so I hacked together a quick benchmark of the two options. On my system, with a path length of 8 and illegal Unicode bytes, the benchmarks reported about the same running time. With legal Unicode paths, the to_string_lossy version was always faster (a path length of 8 was the shortest I tested). The difference was between a factor of two and five, depending on the path length.

Now, allocation time is of course system dependent, so the exact numbers here aren't written in stone, but seeing what kind of numbers we're talking about, I'm not sure this is worth pursuing.

I guess the results are expected, so unless @jethrogb has some input here, I'll close this PR.

@jethrogb
Contributor

You can test whether str::from_utf8 returns an error or not before moving into the slow path. This is what to_string_lossy does.

@martinhath
Contributor Author

That is certainly possible, but seeing how small the gains seem to be, and that checking for illegal UTF-8 before choosing a path will slow down the function even further, I'm struggling to imagine a scenario where we'd end up on the plus side here.

@jethrogb
Contributor

If it's valid, you can just push the resulting &str through Formatter::write_str.
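A minimal sketch of the split being discussed here, combining nagisa's "one allocation in the cold case" with the up-front validity check (`write_lossy` is a hypothetical helper name, made generic over `fmt::Write` so it can be exercised against a `String`; this is not the actual PR code):

```rust
use std::fmt::{self, Write};

// Hypothetical sketch: one up-front validity check, then either the
// single-call fast path or a one-allocation lossy fallback.
fn write_lossy<W: Write>(out: &mut W, bytes: &[u8]) -> fmt::Result {
    match std::str::from_utf8(bytes) {
        // Fast path: valid UTF-8, push the whole slice through in one call.
        Ok(s) => out.write_str(s),
        // Cold path: invalid UTF-8, pay for a single allocation.
        Err(_) => out.write_str(&String::from_utf8_lossy(bytes)),
    }
}

fn main() {
    let mut buf = String::new();
    write_lossy(&mut buf, b"valid").unwrap();
    write_lossy(&mut buf, b" a\xFFb").unwrap();
    assert_eq!(buf, "valid a\u{fffd}b");
}
```

The valid case does exactly one `write_str` call; only genuinely malformed input touches the allocator.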

@hanna-kruppe
Contributor

The write_char in the loop means that the bytes are not only checked for UTF-8 validity (unavoidable) but the decoded code points are then re-encoded and pushed onto the buffer piecemeal. It could be much more efficient to decode, but write the valid UTF-8 runs of the source string directly rather than re-encoding and going char-by-char.

However, that is only economical if the nitty gritty part is already implemented, i.e., if there's existing code that validates the bytes and tells you where (if at all) the first non-UTF-8 byte is. I don't know if we have something like that.
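It turns out `Utf8Error::valid_up_to` does report where the first bad byte is, so a hedged sketch of the chunk-wise idea is possible (`write_lossy_chunks` is an illustrative name, not std code; it also uses `error_len`, which only stabilized later, in Rust 1.20):

```rust
use std::fmt::{self, Write};

// Sketch: write each valid UTF-8 run directly, emit one U+FFFD per
// invalid sequence, and never allocate. Relies on Utf8Error's
// valid_up_to()/error_len() to locate and skip the bad bytes.
fn write_lossy_chunks<W: Write>(out: &mut W, mut bytes: &[u8]) -> fmt::Result {
    loop {
        match std::str::from_utf8(bytes) {
            // Remainder is entirely valid: write it and stop.
            Ok(s) => return out.write_str(s),
            Err(e) => {
                let valid = e.valid_up_to();
                // The first `valid` bytes were just checked, so this
                // unwrap cannot fail (std would use from_utf8_unchecked).
                out.write_str(std::str::from_utf8(&bytes[..valid]).unwrap())?;
                out.write_char('\u{fffd}')?;
                // Skip the invalid sequence; error_len() is None for a
                // truncated sequence at the end of the input.
                let skip = valid + e.error_len().unwrap_or(bytes.len() - valid);
                bytes = &bytes[skip..];
            }
        }
    }
}

fn main() {
    let mut buf = String::new();
    write_lossy_chunks(&mut buf, b"hi\xFF\xFFyo").unwrap();
    assert_eq!(buf, "hi\u{fffd}\u{fffd}yo");
}
```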

@jethrogb
Contributor

jethrogb commented Jan 13, 2017

@rkruppe that's pretty much the to_string_lossy internals...

@hanna-kruppe
Contributor

Great! Can the relevant parts be accessed from the outside, or can the code easily be refactored to allow that?

@alexcrichton alexcrichton added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label Jan 19, 2017
@clarfonthey
Contributor

clarfonthey commented Jan 29, 2017

IMHO this should be a double check for invalid Unicode: check whether the slice is valid Unicode and, if so, write_str; otherwise, do the write_char loop.

This way, you get the proper optimisation on the average case (valid UTF-8), and the allocation in the uncommon case is replaced by two validity checks. Still better than before, but doesn't incur a performance penalty in the average case.

Perhaps in the long-term if we want to optimise this better on the fmt::Write side, we could offer a write_chars which takes in an iterator, and then specialise it depending on what's passed in.

Side note: It'd be nice if we had a no-allocate version of to_string_lossy so that people can use this algorithm without having to re-implement it every time they need it.
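For illustration only: modern Rust (1.79+) grew exactly that no-allocate primitive, `<[u8]>::utf8_chunks`, which did not exist when this PR was open. A sketch of the double-check shape using it (`display_lossy` is a hypothetical name):

```rust
use std::fmt::{self, Write};

// Sketch of the "double check": one validity check in the average case,
// and an allocation-free lossy second pass in the cold case, via the
// later-stabilized utf8_chunks (Rust 1.79+).
fn display_lossy<W: Write>(out: &mut W, bytes: &[u8]) -> fmt::Result {
    if let Ok(s) = std::str::from_utf8(bytes) {
        return out.write_str(s); // average case: one check, one write
    }
    // Cold case: second pass over the bytes, still no allocation.
    for chunk in bytes.utf8_chunks() {
        out.write_str(chunk.valid())?;
        if !chunk.invalid().is_empty() {
            out.write_char('\u{fffd}')?;
        }
    }
    Ok(())
}

fn main() {
    let mut buf = String::new();
    display_lossy(&mut buf, b"a\xFFb").unwrap();
    assert_eq!(buf, "a\u{fffd}b");
}
```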

@aturon
Member

aturon commented Jan 31, 2017

I'm going to close out this PR; as it currently stands, it's a net loss to performance and introduces a bit more implementation complexity. We could consider exposing the to_string_lossy internals a bit more, but we'd need to see some strong performance numbers showing a benefit.

Thanks for the PR!

@aturon aturon closed this Jan 31, 2017