Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove PATH_MAX restriction from with_nix_path and further improve performance #1656

Merged
merged 1 commit into from
Feb 27, 2022
Merged

Remove PATH_MAX restriction from with_nix_path and further improve performance #1656

merged 1 commit into from
Feb 27, 2022

Conversation

SUPERCILEX
Copy link
Contributor

@SUPERCILEX SUPERCILEX commented Feb 3, 2022

This PR removes the PATH_MAX limitation since that's wrong anyway: https://insanecoding.blogspot.com/2007/11/pathmax-simply-isnt.html

A nice side effect is that this lets us further optimize with_nix_path by having the compiler not insert probe frames, saving us another fat pile of instructions:

2,289,930 ( 1.67%) 305,324 ( 0.99%) 152,662 ( 0.74%) 18 ( 0.15%)  .            .          .          .           .           => ???:__rust_probestack (152,662x)

New numbers:

19,257,593 (16.43%) 4,415,242 (15.75%) 2,593,635 (12.82%) 41 ( 0.27%) 47 ( 0.12%) 53 ( 0.26%) 7 ( 0.12%) 0          3 ( 0.02%) 2,404,730 (14.52%) 2,721 ( 0.53%) 762,780 (30.96%)    66 ( 0.90%)  => /home/asaveau/Desktop/nix/src/lib.rs:<[u8] as nix::NixPath>::with_nix_path (152,556x)
Ir                 Dr               Dw               I1mr        D1mr        D1mw       ILmr       DLmr       DLmw       Bc               Bcm        Bi               Bim            

1,067,892 ( 0.91%)       0          457,668 ( 2.26%)  2 ( 0.01%)  0          0          1 ( 0.02%) .          .                .          .                .              .               fn with_nix_path<T, F>(&self, f: F) -> Result<T>
        .                .                .           .           .          .          .          .          .                .          .                .              .               where
        .                .                .           .           .          .          .          .          .                .          .                .              .                   F: FnOnce(&CStr) -> T,
        .                .                .           .           .          .          .          .          .                .          .                .              .               {
        .                .                .           .           .          .          .          .          .                .          .                .              .                   // The real PATH_MAX is 4096, but it's statistically unlikely to have a path longer than
        .                .                .           .           .          .          .          .          .                .          .                .              .                   // ~300 bytes. See the appendix to get stats for your own machine.
        .                .                .           .           .          .          .          .          .                .          .                .              .                   //
        .                .                .           .           .          .          .          .          .                .          .                .              .                   // By being smaller than a memory page, we also avoid the compiler inserting a probe frame:
        .                .                .           .           .          .          .          .          .                .          .                .              .                   // https://docs.rs/compiler_builtins/latest/compiler_builtins/probestack/index.html
        .                .                .           .           .          .          .          .          .                .          .                .              .                   const MAX_STACK_ALLOCATION: usize = 512;
        .                .                .           .           .          .          .          .          .                .          .                .              .           
  305,112 ( 0.26%)       0                0           0           0          0          0          0          0          152,556 ( 0.92%) .                .              .                   if self.len() >= MAX_STACK_ALLOCATION {
        .                .                .           .           .          .          .          .          .                .          .                .              .                       return with_nix_path_allocating(self, f);
        .                .                .           .           .          .          .          .          .                .          .                .              .                   }
        .                .                .           .           .          .          .          .          .                .          .                .              .           
        .                .                .           .           .          .          .          .          .                .          .                .              .                   let mut buf = MaybeUninit::<[u8; MAX_STACK_ALLOCATION]>::uninit();
        .                .                .           .           .          .          .          .          .                .          .                .              .                   let buf_ptr = buf.as_mut_ptr() as *mut u8;
        .                .                .           .           .          .          .          .          .                .          .                .              .           
        .                .                .           .           .          .          .          .          .                .          .                .              .                   unsafe {
        .                .                .           .           .          .          .          .          .                .          .                .              .                       ptr::copy_nonoverlapping(self.as_ptr(), buf_ptr, self.len());
        .                .                .           .           .          .          .          .          .                .          .                .              .                       buf_ptr.add(self.len()).write(0);
        .                .                .           .           .          .          .          .          .                .          .                .              .                   }
        .                .                .           .           .          .          .          .          .                .          .                .              .           
1,067,892 ( 0.91%) 305,112 ( 1.09%) 152,556 ( 0.75%)  2 ( 0.01%)  7 ( 0.02%) 0          1 ( 0.02%) 0          0          305,112 ( 1.84%) 6 ( 0.00%) 152,556 ( 6.19%)     1 ( 0.01%)          match CStr::from_bytes_with_nul(unsafe { slice::from_raw_parts(buf_ptr, self.len() + 1) }) {
8,730,403 ( 7.45%) 1,211,383 ( 4.32%) 1,067,892 ( 5.28%) 18 ( 0.12%)  2 ( 0.00%) 0          2 ( 0.04%) 0          0          1,336,744 ( 8.07%) 609 ( 0.12%) 152,556 ( 6.19%)    62 ( 0.85%)  => /rustc/21b4a9cfdcbb1e76f4b36b5c3cfd64d627285093//library/std/src/ffi/c_str.rs:std::ffi::c_str::CStr::from_bytes_with_nul (152,556x)
  610,224 ( 0.52%) 610,224 ( 2.18%)       .           .           .          .          .          .          .                .          .                .              .                       Ok(s) => Ok(f(s)),
        .                .                .           .           .          .          .          .          .                .          .                .              .                       Err(_) => Err(Errno::EINVAL),
        .                .                .           .           .          .          .          .          .                .          .                .              .                   }
  762,780 ( 0.65%) 610,224 ( 2.18%)       .           .           .          .          .          .          .                .          .                .              .               }
        .                .                .           .           .          .          .          .          .                .          .                .              .           }

Appendix

use histogram::Histogram;
use std::env;
use std::fs::{canonicalize, read_dir};
use std::path::Path;

fn main() {
    let args = env::args().collect::<Vec<String>>();

    let mut histogram = Histogram::new();
    if args.len() == 2 {
        log(&mut histogram, canonicalize(&args[1]).unwrap());
    } else {
        log(&mut histogram, canonicalize(".").unwrap());
    }

    println!(
        "min={}\nmax={}\nmean={}\np50={}\np90={}\np99={}\np999={}\nstddev={}\nstdvar={}",
        histogram.minimum().unwrap(),
        histogram.maximum().unwrap(),
        histogram.mean().unwrap(),
        histogram.percentile(50.0).unwrap(),
        histogram.percentile(90.0).unwrap(),
        histogram.percentile(99.0).unwrap(),
        histogram.percentile(99.9).unwrap(),
        histogram.stddev().unwrap(),
        histogram.stdvar().unwrap(),
    )
}

fn log(histogram: &mut Histogram, path: impl AsRef<Path>) {
    for dir in read_dir(path.as_ref()).unwrap() {
        let entry = dir.unwrap();
        histogram
            .increment(entry.path().as_os_str().len() as u64)
            .unwrap();
        if entry.file_type().unwrap().is_dir() {
            log(histogram, entry.path());
        }
    }
}

@SUPERCILEX SUPERCILEX changed the title Run rustfmt on lib.rs and ignore .idea Remove PATH_MAX restriction from with_nix_path and further improve performance Feb 3, 2022
SUPERCILEX added a commit to SUPERCILEX/ftzz that referenced this pull request Feb 13, 2022
- Update nix with my PRs: nix-rust/nix#1656, nix-rust/nix#1655
- Use raw pointer manipulation to extract name cache values
- Use a faster rand implementation with better statistical guarantees
- Tune the task queue capacity and consumption rate and get rid of copies in drain (by using a VecDeque)

Signed-off-by: Alex Saveau <saveau.alexandre@gmail.com>
@SUPERCILEX
Copy link
Contributor Author

@rtzoeller Alrighty, this one should be ready for review.

src/lib.rs Outdated
if self.len() >= PATH_MAX as usize {
return Err(Errno::ENAMETOOLONG);
// The real PATH_MAX is 4096, but it's statistically unlikely to have a path longer than
// ~300 bytes. See the appendix to get stats for your own machine.
Copy link
Contributor Author

@SUPERCILEX SUPERCILEX Feb 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My numbers starting from ~:

min=17
max=368
mean=125
p50=118
p90=185
p99=244
p999=271
stddev=42
stdvar=1733

src/lib.rs Outdated Show resolved Hide resolved
src/lib.rs Outdated
//
// By being smaller than a memory page, we also avoid the compiler inserting a probe frame:
// https://docs.rs/compiler_builtins/latest/compiler_builtins/probestack/index.html
const MAX_STACK_ALLOCATION: usize = 512;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was there a particular reason you chose 512? Choosing the "best" value for MAX_STACK_ALLOCATION has been bothering me.

Considerations include things like:

  • Avoiding __rust_probestack for the "common" case.
    • Only supported on x86 and x86_64 today, so let's assume a page is at least 4096 bytes.
    • Our stack usage is dependent on the size of T, but excluding buf is typically fairly low.
  • Avoiding heap allocations where possible.
    • Only fall back to with_nix_path_allocating where it is necessary or otherwise justifiable.
  • Avoiding using significantly more stack space than previously.
    • PATH_MAX is 1024 on the BSDs, so indiscriminately picking a much larger value isn't appropriate.

Choosing 4032 (as I believe you had done previously?) seems to mostly alleviate the first two points, but is significantly higher than the current stack utilization on the BSDs. On the other hand, choosing 512 addresses the first and third points, but still leads to heap allocations on all platforms for valid path lengths.

IMO 1024 seems like a reasonable compromise. It should fully avoid __rust_probestack for sane values of T. It ensures that we don't get a heap allocation on the BSDs while not increasing the stack usage, and also ensures we won't get a heap allocation on other platforms for reasonable path lengths. We could optionally have different definitions per-platform, but that feels like we're getting into diminishing returns.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, 512 was pretty arbitrary. I bumped it to 1024, but I'm even happy to go to 2048 if you think that's worth it. Might be interesting to run that histogram script on your machine to see what your longest path is.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@asomers I'm curious for your thoughts on the maximum stack allocation size here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There really isn't any magic number for what's acceptable for the stack. I don't have any better idea that Supercilex does.

src/lib.rs Outdated Show resolved Hide resolved
@rtzoeller
Copy link
Collaborator

@SUPERCILEX can you add a changelog note indicating that the PATH_MAX restriction has been removed? Probably in the "Changed" section.

Signed-off-by: Alex Saveau <saveau.alexandre@gmail.com>
@SUPERCILEX
Copy link
Contributor Author

Sweet, done!

@rtzoeller
Copy link
Collaborator

bors r+

@bors bors bot merged commit 9312f1c into nix-rust:master Feb 27, 2022
@SUPERCILEX SUPERCILEX deleted the path_length branch February 27, 2022 21:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants