Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[TRY] perf: spawn (parallelize) io tasks related to tarball #190

Closed
wants to merge 18 commits into from

Conversation

kdy1
Copy link

@kdy1 kdy1 commented Nov 11, 2023

This PR uses the threadpool of rayon to perform actual blocking IO operations. It's done by using rayon::scope API.

image

Blocked by:

@kdy1
Copy link
Author

kdy1 commented Nov 11, 2023

Oops my commits are gone. I'll use git reflog to find it..

Copy link
Contributor

@KSXGitHub KSXGitHub left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your works.

I have a few change requests.

crates/tarball/Cargo.toml Outdated Show resolved Hide resolved
Comment on lines +241 to +243
let (file_path, file_hash) = store_dir
.write_cas_file(scope, buffer, file_is_executable)
.map_err(TarballError::WriteCasFile)?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be the only IO operation in the entire scope. And none of the operations that follow depend on this IO operation being done either. Meaning, we could potentially defer this operation after building the cas_paths (i.e. do the IO on the created cas_paths). See if this suggestion can improve the code quality and performance.

@KSXGitHub
Copy link
Contributor

KSXGitHub commented Nov 21, 2023

A new PR has been merged into main. It allows pacquet to work with a big lockfile. You may pull from main (or rebase) and benchmark with this lockfile: https://gist.github.com/KSXGitHub/ca8076a18b384b6cb6eea86d4ec7c675 (The --fixture-dir/-D flag would allow you to change lockfile without copy pasting)

@KSXGitHub KSXGitHub changed the title perf: Spawn (parallelize) io tasks related to tarball perf: spawn (parallelize) io tasks related to tarball Nov 22, 2023
Copy link

codecov bot commented Nov 22, 2023

Codecov Report

Attention: 9 lines in your changes are missing coverage. Please review.

Comparison is base (c62a631) 92.65% compared to head (4569c7f) 92.50%.

❗ Current head 4569c7f differs from pull request most recent head 339b83a. Consider uploading reports for the commit 339b83a to get more accurate results

Files Patch % Lines
crates/tarball/src/lib.rs 86.15% 9 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #190      +/-   ##
==========================================
- Coverage   92.65%   92.50%   -0.16%     
==========================================
  Files          57       57              
  Lines        2847     2881      +34     
==========================================
+ Hits         2638     2665      +27     
- Misses        209      216       +7     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@KSXGitHub
Copy link
Contributor

@kdy1 main is faster when run just integrated-benchmark -s frozen-lockfile main HEAD -V -D ./crates/testing-utils/src/fixtures/big/ on this branch:

benchmark result

@kdy1
Copy link
Author

kdy1 commented Nov 22, 2023

It's expected because you are using a beast machine and the lockfile is for a very small project.

The PR is about limiting the number of threads used for syscalls.

@KSXGitHub
Copy link
Contributor

the lockfile is for a very small project

I added -D ./crates/testing-utils/src/fixtures/big/ which contains a big lockfile.

The PR is about limiting the number of threads used for syscalls.

Is there a mechanism to scale the number of syscalls limit based on number of CPUs? Yesterday, there was a merged PR that scale the concurrent network requests limit based on number of CPUs.

Also, is the perf: in the title misleading?

@kdy1
Copy link
Author

kdy1 commented Nov 22, 2023

rayon has an issue with a beast machine. rayon-rs/rayon#394

Can you try limiting the number of rayon threads by configuring RAYON_NUM_THREADS? It's documented at https://docs.rs/rayon/latest/rayon/struct.ThreadPoolBuilder.html#method.num_threads

And the perf is intentional, as I expected it to improve the performance. (rayon is not a good fit for syscalls, but we should not mix rayon and spawn_blocking for syscalls anyway)

Co-authored-by: Khải <hvksmr1996@gmail.com>
@kdy1 kdy1 changed the title perf: spawn (parallelize) io tasks related to tarball [TRY] perf: spawn (parallelize) io tasks related to tarball Nov 22, 2023
@KSXGitHub
Copy link
Contributor

@kdy1 Should the RAYON_NUM_THREADS be low or high? Since I/O are blocking operations, which won't be benefited from having many parallel threads, it would be counterproductive to hog all CPUs for themselves. So counterintuitively, the more CPUs the machine has, the slower this code will be? Am I correct? If this is the case, then would it be more appropriate to have a dedicated thread reserved for only I/O operations?

@kdy1
Copy link
Author

kdy1 commented Nov 22, 2023

It should be lower.

So counterintuitively, the more CPUs the machine has, the slower this code will be

This is partially correct, but this is not related to IO but rather a bug of rayon.
But as it's about communication cost, it goes like

         .
      .    .  
   .          .
.                .


The performance graph will look like this because n is very small here. The communication cost is O(n^2), which is still small if the CPU core is in small range

@KSXGitHub
Copy link
Contributor

@kdy1 Is RAYON_NUM_THREADS=16 just integrated-benchmark -s frozen-lockfile main HEAD -V the correct way to run it? main was always 3 times faster than HEAD regardless of the number.

@kdy1
Copy link
Author

kdy1 commented Nov 22, 2023

Hmmm... Yeah, it's correct. So we should move syscalls to spawn_blocking, not rayon

@kdy1 kdy1 closed this Nov 22, 2023
@kdy1 kdy1 deleted the perf-3-spawn-io-tasks branch November 22, 2023 06:29
@KSXGitHub
Copy link
Contributor

@kdy1 Actually, my hypothesis regarding I/O operations is proven incorrect.

I have created a test repo which benchmark 4 cases. The result was parallel I/O + serial CPU is the fastest.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants