Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize Wtf8Buf::into_string for the case where it contains UTF-8. #96869

Merged
merged 5 commits into from
Aug 24, 2022

Conversation

sunfishcode
Copy link
Member

Add a is_known_utf8 flag to Wtf8Buf, which tracks whether the
string is known to contain UTF-8. This is efficiently computed in many
common situations, such as when a Wtf8Buf is constructed from a String
or &str, or with Wtf8Buf::from_wide which is already doing UTF-16
decoding and already checking for surrogates.

This makes OsString::into_string O(1) rather than O(N) on Windows in
common cases.

And, it eliminates the need to scan through the string for surrogates in
Args::next and Vars::next, because the strings are already being
translated with Wtf8Buf::from_wide.

Many things on Windows construct OsStrings with Wtf8Buf::from_wide,
such as DirEntry::file_name and fs::read_link, so with this patch,
users of those functions can subsequently call .into_string() without
paying for an extra scan through the string for surrogates.

r? @ghost

@rustbot rustbot added the T-libs Relevant to the library team, which will review and decide on the PR/issue. label May 9, 2022
library/std/src/sys_common/wtf8.rs Outdated Show resolved Hide resolved
library/std/src/sys_common/wtf8.rs Show resolved Hide resolved
library/std/src/sys_common/wtf8.rs Show resolved Hide resolved
library/std/src/sys_common/wtf8.rs Outdated Show resolved Hide resolved
@sunfishcode sunfishcode marked this pull request as ready for review May 10, 2022 14:28
@sunfishcode
Copy link
Member Author

r? rust-lang/libs

@joshtriplett
Copy link
Member

@bors r+ rollup=never

@bors
Copy link
Contributor

bors commented May 17, 2022

📌 Commit f4c93bcbe837ab5220beaa2892fc249f49a9e740 has been approved by joshtriplett

@bors bors added the S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. label May 17, 2022
@bors
Copy link
Contributor

bors commented May 17, 2022

⌛ Testing commit f4c93bcbe837ab5220beaa2892fc249f49a9e740 with merge 7f13587a59ff3c5350b77bf9334994f91f05d9fc...

@bors
Copy link
Contributor

bors commented May 17, 2022

💔 Test failed - checks-actions

@bors bors added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels May 17, 2022
@rust-log-analyzer

This comment has been minimized.

@sunfishcode
Copy link
Member Author

src/librustdoc/html/render/context.rs had an assert that rustdoc's render context was a certain size. It's already not checked on most platforms, as it's just a guard against the context growing unexpectedly, so I've now added a patch to disable the check on Windows too.

@bors
Copy link
Contributor

bors commented May 28, 2022

☔ The latest upstream changes (presumably #97433) made this pull request unmergeable. Please resolve the merge conflicts.

@crlf0710
Copy link
Member

There's merge conflict now. Needs a rebase.

@crlf0710 crlf0710 added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels May 28, 2022
@sunfishcode
Copy link
Member Author

r? @joshtriplett

@sunfishcode
Copy link
Member Author

@rustbot label -S-waiting-on-author +S-waiting-on-review

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels May 31, 2022
@joshtriplett
Copy link
Member

@bors r+

@bors
Copy link
Contributor

bors commented Jun 23, 2022

📌 Commit e620b42644e3404377fa0fae0afd660c6bd77999 has been approved by joshtriplett

@bors
Copy link
Contributor

bors commented Jun 23, 2022

🌲 The tree is currently closed for pull requests below priority 1000. This pull request will be tested once the tree is reopened.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jun 23, 2022
@klensy
Copy link
Contributor

klensy commented Jun 23, 2022

Add a `is_known_utf8` flag to `Wtf8Buf`, which tracks whether the
string is known to contain UTF-8. This is efficiently computed in many
common situations, such as when a `Wtf8Buf` is constructed from a `String`
or `&str`, or with `Wtf8Buf::from_wide` which is already doing UTF-16
decoding and already checking for surrogates.

This makes `OsString::into_string` O(1) rather than O(N) on Windows in
common cases.

And, it eliminates the need to scan through the string for surrogates in
`Args::next` and `Vars::next`, because the strings are already being
translated with `Wtf8Buf::from_wide`.

Many things on Windows construct `OsString`s with `Wtf8Buf::from_wide`,
such as `DirEntry::file_name` and `fs::read_link`, so with this patch,
users of those functions can subsequently call `.into_string()` without
paying for an extra scan through the string for surrogates.
This assert is just making sure the size of `Context` doens't grow
unexpectedly, and it's already not being checked on every platform.

`PathBuf` now has a different size on Windows, so adjust this to
avoid checking the size on Windows.
@sunfishcode
Copy link
Member Author

I think I used the github merge-conflict UI and didn't realize that generated a merge commit. I've now fixed that.

r? @joshtriplett

@crlf0710 crlf0710 added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Jul 10, 2022
@JohnCSimon JohnCSimon added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Aug 13, 2022
@sunfishcode
Copy link
Member Author

Ping @joshtriplett; this is already r+ above, it just needs an approval because of the merge-commit issue which is now fixed.

@jyn514
Copy link
Member

jyn514 commented Aug 23, 2022

@bors r=joshtriplett

@bors
Copy link
Contributor

bors commented Aug 23, 2022

📌 Commit ddeb936 has been approved by joshtriplett

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Aug 23, 2022
@bors
Copy link
Contributor

bors commented Aug 24, 2022

⌛ Testing commit ddeb936 with merge 25ea5a3...

@bors
Copy link
Contributor

bors commented Aug 24, 2022

☀️ Test successful - checks-actions
Approved by: joshtriplett
Pushing 25ea5a3 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Aug 24, 2022
@bors bors merged commit 25ea5a3 into rust-lang:master Aug 24, 2022
@rustbot rustbot added this to the 1.65.0 milestone Aug 24, 2022
@sunfishcode sunfishcode deleted the main branch August 24, 2022 04:07
@rust-timer
Copy link
Collaborator

Finished benchmarking commit (25ea5a3): comparison URL.

Overall result: no relevant changes - no action needed

@rustbot label: -perf-regression

Instruction count

This benchmark run did not return any relevant results for this metric.

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean1 range count2
Regressions ❌
(primary)
2.3% [2.3%, 2.3%] 1
Regressions ❌
(secondary)
4.9% [2.6%, 7.2%] 2
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
-2.8% [-3.9%, -2.0%] 4
All ❌✅ (primary) 2.3% [2.3%, 2.3%] 1

Cycles

This benchmark run did not return any relevant results for this metric.

Footnotes

  1. the arithmetic mean of the percent change

  2. number of relevant changes

/// It is possible for `bytes` to have valid UTF-8 without this being
/// set, such as when we're concatenating `&Wtf8`'s and surrogates become
/// paired, as we don't bother to rescan the entire string.
is_known_utf8: bool,
Copy link
Member

@RalfJung RalfJung Apr 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding a field here introduced a subtle bug in PathBuf: #124409.

Privacy-breaking transmutes are "fun". ;)

bors added a commit to rust-lang-ci/rust that referenced this pull request Jun 12, 2024
…ng-utf8-invariant, r=<try>

Make PathBuf less Ok with adding UTF-16 then `into_string`

Fixes rust-lang#126291 which is, as far as I can tell, a regression introduced by rust-lang#96869.

try-job: x86_64-msvc
workingjubilee added a commit to workingjubilee/rustc that referenced this pull request Jun 12, 2024
…ring-utf8-invariant, r=joboet

Make PathBuf less Ok with adding UTF-16 then `into_string`

Fixes rust-lang#126291 which is, as far as I can tell, a regression introduced by rust-lang#96869.

try-job: x86_64-msvc
rust-timer added a commit to rust-lang-ci/rust that referenced this pull request Jun 12, 2024
Rollup merge of rust-lang#126305 - workingjubilee:fix-os-string-to-string-utf8-invariant, r=joboet

Make PathBuf less Ok with adding UTF-16 then `into_string`

Fixes rust-lang#126291 which is, as far as I can tell, a regression introduced by rust-lang#96869.

try-job: x86_64-msvc
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.