Remove unnecessary lseek syscall when using std::fs::read #106664

Merged
merged 2 commits into from
Jan 11, 2023

chenyukang
Member

Fixes #106597
r? @bjorn3

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Jan 10, 2023
@rustbot
Collaborator

rustbot commented Jan 10, 2023

Hey! It looks like you've submitted a new PR for the library teams!

If this PR contains changes to any rust-lang/rust public library APIs then please comment with @rustbot label +T-libs-api -T-libs to tag it appropriately. If this PR contains changes to any unstable APIs please edit the PR description to add a link to the relevant API Change Proposal or create one if you haven't already. If you're unsure where your change falls no worries, just leave it as is and the reviewer will take a look and make a decision to forward on if necessary.

Examples of T-libs-api changes:

  • Stabilizing library features
  • Introducing insta-stable changes such as new implementations of existing stable traits on existing stable types
  • Introducing new or changing existing unstable library APIs (excluding permanently unstable features / features without a tracking issue)
  • Changing public documentation in ways that create new stability guarantees
  • Changing observable runtime behavior of library APIs

@chenyukang
Member Author

The result from strace now is:

openat(AT_FDCWD, "./p/f.rs", O_RDONLY|O_CLOEXEC) = 3
statx(0, NULL, AT_STATX_SYNC_AS_STAT, STATX_ALL, NULL) = -1 EFAULT (Bad address)
statx(3, "", AT_STATX_SYNC_AS_STAT|AT_EMPTY_PATH, STATX_ALL, {stx_mask=STATX_ALL|0x1000, stx_attributes=0, stx_mode=S_IFREG|0644, stx_size=88, ...}) = 0
read(3, "fn main() {\n    let res = std::f"..., 88) = 88
read(3, "", 32)                         = 0
close(3)
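
For reference, a trace like the one above can be produced by a small program that calls std::fs::read and is run under strace on Linux. The following is only a sketch: the helper name `read_len` is hypothetical, and it is not the exact program from the linked issue.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper: reads a whole file with `std::fs::read` and returns
/// the number of bytes that were read.
pub fn read_len<P: AsRef<Path>>(path: P) -> io::Result<usize> {
    // `fs::read` opens the file, sizes the buffer from the file metadata,
    // reads until end-of-file, and closes the file when it returns.
    let bytes = fs::read(path)?;
    Ok(bytes.len())
}
```

Calling `read_len("./p/f.rs")` from a binary and tracing that binary should show the open, stat, read, and close calls; with this change, no lseek is expected between the statx and the first read.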

@ShawnZhong

buffer_capacity_required is also called by read_to_end.
If a file is already half-read, we don't really need to reserve the entire file size.
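
To make the half-read case concrete, here is a minimal sketch of a size hint that accounts for the current offset. The function `remaining_bytes_hint` is hypothetical and is not the std-internal buffer_capacity_required; it only illustrates why a general read_to_end needs to know the offset, which on Unix is presumably where the lseek comes from.

```rust
use std::fs::File;
use std::io::{self, Seek};

/// Hypothetical helper: estimates how many bytes remain between the current
/// file offset and the end of the file.
pub fn remaining_bytes_hint(file: &mut File) -> io::Result<usize> {
    // Total size as reported by the file metadata.
    let size = file.metadata()?.len();
    // Current offset; for a `File` this is a seek query (lseek on Unix).
    let pos = file.stream_position()?;
    // A file truncated after opening could be shorter than the offset,
    // so saturate at zero instead of underflowing.
    Ok(size.saturating_sub(pos) as usize)
}
```

For a 100-byte file, the hint is 100 right after opening and 50 after reading the first 50 bytes, so reserving the full size on a half-read file would over-allocate.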

@ShawnZhong

I'd suggest modifying std::fs::read to reserve the entire file size (since we just opened the file, its offset is guaranteed to be 0) and to call default_read_to_end there, instead of the current implementation, which calls file.read_to_end.
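
A rough sketch of the shape this suggestion describes, written as user code. The name `read_whole_file` is hypothetical. As far as I know, default_read_to_end is internal to the standard library, so this sketch falls back to the public read_to_end and therefore does not itself avoid the offset query; it only shows where the reservation would go.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

/// Hypothetical stand-in for the suggested shape of `std::fs::read`.
pub fn read_whole_file<P: AsRef<Path>>(path: P) -> io::Result<Vec<u8>> {
    let mut file = File::open(path)?;
    // The file was just opened, so its offset is 0 and the full size is the
    // right hint. If the metadata is unavailable, reserve nothing.
    let size = file.metadata().map(|m| m.len()).unwrap_or(0);
    let mut bytes = Vec::new();
    bytes.reserve(size as usize);
    // Assumption: inside std this would be the internal `default_read_to_end`;
    // outside std, `read_to_end` is the closest public call.
    file.read_to_end(&mut bytes)?;
    Ok(bytes)
}
```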

@chenyukang
Member Author

buffer_capacity_required is also called by read_to_end. If a file is already half-read, we don't really need to reserve the entire file size.

Makes sense.

@chenyukang chenyukang force-pushed the yukang/fix-106597-remove-lseek branch from eea7dfb to eae615d Compare January 10, 2023 07:35
@bjorn3
Member

bjorn3 commented Jan 10, 2023

r? rust-lang/libs

@rustbot rustbot assigned cuviper and unassigned bjorn3 Jan 10, 2023
@bjorn3
Member

bjorn3 commented Jan 10, 2023

Thanks! I don't think I am a suitable reviewer, so I re-assigned it to someone on the libs team.

Comment on lines 252 to 254
let mut bytes = Vec::new();
file.read_to_end(&mut bytes)?;
let size = file.metadata().map(|m| m.len()).unwrap_or(0);
bytes.reserve(size as usize);
Contributor

You can write let mut bytes = Vec::with_capacity(size as usize); instead of the two lines Vec::new() and reserve.

Member Author

Yes, I thought about this, but after digging into the implementations of with_capacity and reserve, it seems they don't simply share the same code path, so I'm not 100% sure whether there is any flaw in using with_capacity here.

I just ran a simple benchmark, and with_capacity seems to perform better:

cat@LAPTOP-V6U0QKD4:~/code/play/bench$ cargo bench
    Finished bench [optimized] target(s) in 0.00s
     Running unittests src/main.rs (target/release/deps/bench-b73bcfb176559323)

running 2 tests
test tests::bench_reserve_capacity ... bench:     677,683 ns/iter (+/- 828,000)
test tests::bench_with_capacity    ... bench:     469,924 ns/iter (+/- 319,682)

test result: ok. 0 passed; 0 failed; 0 ignored; 2 measured; 0 filtered out; finished in 10.56s

cat@LAPTOP-V6U0QKD4:~/code/play/bench$ cargo bench
    Finished bench [optimized] target(s) in 0.01s
     Running unittests src/main.rs (target/release/deps/bench-b73bcfb176559323)

running 2 tests
test tests::bench_reserve_capacity ... bench:     645,487 ns/iter (+/- 347,440)
test tests::bench_with_capacity    ... bench:     415,772 ns/iter (+/- 162,321)

test result: ok. 0 passed; 0 failed; 0 ignored; 2 measured; 0 filtered out; finished in 10.09s

Benchmark code:

#![feature(test)]
extern crate test;

pub fn test_with_capacity(size: usize) {
    let mut v: Vec<i32> = Vec::with_capacity(size);
    v.push(3);
    assert_eq!(v.len(), 1);
}

pub fn test_reserve_capacity(size: usize) {
    let mut v: Vec<i32> = Vec::new();
    v.reserve(size);
    v.push(3);
    assert_eq!(v.len(), 1);
}

#[cfg(test)]
mod tests {
    use super::*;
    use test::Bencher;

    #[bench]
    fn bench_with_capacity(b: &mut Bencher) {
        b.iter(|| {
            for s in 0..10000 {
                test_with_capacity(s);
            }
        });
    }

    #[bench]
    fn bench_reserve_capacity(b: &mut Bencher) {
        b.iter(|| {
            for s in 0..10000 {
                test_reserve_capacity(s);
            }
        });
    }
}
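
The harness above needs a nightly toolchain because of #![feature(test)]. As a rough cross-check on stable Rust, the same comparison could be timed by hand with std::time::Instant and std::hint::black_box. This is only a sketch with hypothetical function names; it is not what was run for this PR, and its numbers are not comparable to the cargo bench output above.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Allocates with `Vec::with_capacity`, pushes one element, returns the length.
pub fn push_with_capacity(size: usize) -> usize {
    let mut v: Vec<i32> = Vec::with_capacity(size);
    v.push(3);
    // `black_box` discourages the optimizer from removing the allocation.
    black_box(&v).len()
}

/// Allocates with `Vec::new` plus `reserve`, pushes one element, returns the length.
pub fn push_after_reserve(size: usize) -> usize {
    let mut v: Vec<i32> = Vec::new();
    v.reserve(size);
    v.push(3);
    black_box(&v).len()
}

/// Times `rounds` passes over the sizes 0..10000 for the given function.
pub fn time_rounds(f: fn(usize) -> usize, rounds: u32) -> Duration {
    let start = Instant::now();
    for _ in 0..rounds {
        for s in 0..10_000 {
            black_box(f(black_box(s)));
        }
    }
    start.elapsed()
}
```

Comparing `time_rounds(push_with_capacity, 100)` with `time_rounds(push_after_reserve, 100)` in a release build, over several runs, gives a coarse signal only. The large variance in the results above suggests treating any single measurement with caution.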

Comment on lines 292 to 295
file.read_to_string(&mut string)?;
let size = file.metadata().map(|m| m.len()).unwrap_or(0);
string.reserve(size as usize);
Contributor

Same here.

@cuviper
Member

cuviper commented Jan 10, 2023

@bors r+

@bors
Contributor

bors commented Jan 10, 2023

📌 Commit f7bc68b has been approved by cuviper

It is now in the queue for this repository.

@bors
Contributor

bors commented Jan 10, 2023

🌲 The tree is currently closed for pull requests below priority 999. This pull request will be tested once the tree is reopened.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jan 10, 2023
Noratrieb added a commit to Noratrieb/rust that referenced this pull request Jan 11, 2023
…e-lseek, r=cuviper

Remove unnecessary lseek syscall when using std::fs::read

Fixes rust-lang#106597
r? ``@bjorn3``
bors added a commit to rust-lang-ci/rust that referenced this pull request Jan 11, 2023
Rollup of 9 pull requests

Successful merges:

 - rust-lang#106321 (Collect and emit proper backtraces for `delay_span_bug`s)
 - rust-lang#106397 (Check `impl`'s `where` clauses in `consider_impl_candidate` in experimental solver)
 - rust-lang#106427 (Improve fluent error messages)
 - rust-lang#106570 (add tests for div_duration_* functions)
 - rust-lang#106648 (Polymorphization cleanup)
 - rust-lang#106664 (Remove unnecessary lseek syscall when using std::fs::read)
 - rust-lang#106709 (Disable "split dwarf inlining" by default.)
 - rust-lang#106715 (Autolabel and ping wg for changes to new solver)
 - rust-lang#106717 (fix typo LocalItemId -> ItemLocalId)

Failed merges:

r? `@ghost`
`@rustbot` modify labels: rollup
@bors bors merged commit 9a820f7 into rust-lang:master Jan 11, 2023
@rustbot rustbot added this to the 1.68.0 milestone Jan 11, 2023
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Development

Successfully merging this pull request may close these issues.

Unnecessary lseek syscall when using std::fs::read
7 participants