Minor optimizations to the codegen of TaskFnInputFunction #8304

bgw · 2024-06-05T00:22:10Z

What?

I noticed a number of small optimizations that could be applied to these hot and heavily-monomorphized functions:

- Using `with_context()` instead of `.context()`, so the error message is only built when an error actually occurs, not in the common case where it's unused. I also tried `concat!()` since this can be a static string, but the resulting binary is slightly larger, and we don't need to optimize for the unlikely error case.
- Extracted the parts of the monomorphized functions that didn't require the type parameters into separate non-generic functions. While the goal here is mostly to reduce binary size and compilation time, this optimization on its own seems to help with the runtime benchmarks too (though I didn't test it rigorously in isolation).
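To illustrate the first point without pulling in an external crate, here is a minimal sketch using the standard library's `Option::ok_or` and `Option::ok_or_else`, which have the same eager-versus-lazy split as `.context()` and `.with_context()`. The function names, the error type, and the call counter are all hypothetical and are not taken from the turbo-tasks code.

```rust
use std::cell::Cell;

// Hypothetical stand-in for an expensive `format!()` call. The counter
// records how many times the message is actually built.
fn describe(calls: &Cell<u32>, index: usize) -> String {
    calls.set(calls.get() + 1);
    format!("task is missing argument {index}")
}

// Eager: the message is built on every call, even when the argument exists.
fn get_eager(args: &[u32], index: usize, calls: &Cell<u32>) -> Result<u32, String> {
    args.get(index).copied().ok_or(describe(calls, index))
}

// Lazy: the closure only runs on the error path.
fn get_lazy(args: &[u32], index: usize, calls: &Cell<u32>) -> Result<u32, String> {
    args.get(index).copied().ok_or_else(|| describe(calls, index))
}

fn main() {
    let args = [10, 20];

    let calls = Cell::new(0);
    assert_eq!(get_eager(&args, 1, &calls), Ok(20));
    println!("eager built the message {} time(s)", calls.get());

    let calls = Cell::new(0);
    assert_eq!(get_lazy(&args, 1, &calls), Ok(20));
    println!("lazy built the message {} time(s)", calls.get());
}
```

On the success path the eager version still pays for the `format!()` call, while the lazy version does not.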

Here's a section from Rust for Rustaceans explaining this "non-generic function" trick:
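As a rough sketch of that trick (the function names and the parsing logic below are hypothetical, not the actual `TaskFnInputFunction` code): the generic function is reduced to a thin shim that converts its argument, and the real work lives in a non-generic inner function that the compiler only has to generate once.

```rust
// Generic outer shim: one copy is generated per concrete `T`, but each copy
// only performs the conversion and a call.
fn parse_arg<T: AsRef<str>>(input: T) -> Result<u32, String> {
    // Non-generic body: compiled once and shared by every instantiation.
    fn inner(input: &str) -> Result<u32, String> {
        input
            .trim()
            .parse::<u32>()
            .map_err(|e| format!("invalid argument {input:?}: {e}"))
    }
    inner(input.as_ref())
}

fn main() {
    // Two different instantiations (`&str` and `String`) share `inner`.
    println!("{:?}", parse_arg("42"));
    println!("{:?}", parse_arg(String::from(" 7 ")));
    println!("{:?}", parse_arg("not a number"));
}
```

The more code that moves into the inner function, the less is duplicated across monomorphized instances.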

Binary Size?

Slightly smaller, at least for stripped debug builds. The packed tarball shrinks by 727,040 bytes (about 0.43%):

time pnpm pack-next

-rw-r--r-- 1 bgw bgw 167895040 Jun  4 17:20 next-swc.after.tar
-rw-r--r-- 1 bgw bgw 168622080 Jun  4 15:37 next-swc.before.tar

Runtime Performance?

Using https://github.com/bgw/benchmark-scripts/

Microbenchmark (`turbo_tasks_memory_stress/fibonacci/200`)

$ TURBOPACK_BENCH_STRESS=yes cargo bench -p turbo-tasks-memory -- fibonacci/200
   Compiling turbo-tasks v0.1.0 (/home/bgw/turbo/crates/turbo-tasks)
   Compiling turbo-tasks-memory v0.1.0 (/home/bgw/turbo/crates/turbo-tasks-memory)
   Compiling turbo-tasks-testing v0.1.0 (/home/bgw/turbo/crates/turbo-tasks-testing)
    Finished `bench` profile [optimized] target(s) in 10.79s
     Running benches/mod.rs (target/release/deps/mod-8c0f970371f8713d)
turbo_tasks_memory_stress/fibonacci/200
                        time:   [64.420 ms 64.683 ms 64.941 ms]
                        thrpt:  [309.53 Kelem/s 310.76 Kelem/s 312.03 Kelem/s]
                 change:
                        time:   [-2.2828% -1.7587% -1.2206%] (p = 0.00 < 0.05)
                        thrpt:  [+1.2357% +1.7902% +2.3361%]
                        Performance has improved.
Found 1 outliers among 20 measurements (5.00%)
  1 (5.00%) low mild

"Realistic" Benchmark (`bench_startup/Turbopack CSR/1000 modules`)

The difference is small, so I patched the benchmark to raise the sample size from 10 to 100 and the measurement time from 60 to 600 seconds, which was enough to get a statistically significant result.

diff --git a/crates/turbopack-bench/src/lib.rs b/crates/turbopack-bench/src/lib.rs
index 4e3df12db0..d950d76071 100644
--- a/crates/turbopack-bench/src/lib.rs
+++ b/crates/turbopack-bench/src/lib.rs
@@ -35,8 +35,8 @@ pub mod util;

 pub fn bench_startup(c: &mut Criterion, bundlers: &[Box<dyn Bundler>]) {
     let mut g = c.benchmark_group("bench_startup");
-    g.sample_size(10);
-    g.measurement_time(Duration::from_secs(60));
+    g.sample_size(100);
+    g.measurement_time(Duration::from_secs(600));

     bench_startup_internal(g, false, bundlers);
 }
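As a back-of-the-envelope illustration of why a larger sample size helps: for a plain mean, the standard error shrinks with the square root of the number of samples. This is only a sketch. Criterion's own analysis is more involved than this, and the spread used below is a made-up number, not a measurement from this benchmark.

```rust
// Standard error of the mean for `samples` measurements that each have
// a standard deviation of `std_dev`.
fn standard_error(std_dev: f64, samples: u32) -> f64 {
    std_dev / f64::from(samples).sqrt()
}

fn main() {
    // Hypothetical per-sample spread in seconds (an assumption, not measured).
    let std_dev = 0.02;
    let before = standard_error(std_dev, 10);
    let after = standard_error(std_dev, 100);
    println!(
        "n=10: {before:.4} s, n=100: {after:.4} s, ratio {:.2}",
        before / after
    );
}
```

Under that simplified model, ten times as many samples narrows the uncertainty by a factor of about 3.16, which is what makes a change of under 2% distinguishable from noise.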

cargo bench -p turbopack-cli -- bench_startup

    Finished `bench` profile [optimized] target(s) in 1.30s
     Running benches/mod.rs (target/release/deps/mod-2681e324dfd90da1)
bench_startup/Turbopack CSR/1000 modules
                        time:   [2.2684 s 2.2717 s 2.2750 s]
                        change: [-1.9365% -1.7387% -1.5602%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild

Build Speed?

Not enough of a difference to measure. A clean build differs by about 1.4 seconds out of more than 10 minutes:

rm -rf target/ && time cargo build -p turbopack-cli

Before:

real    10m42.174s

After:

real    10m40.735s

vercel · 2024-06-05T00:22:10Z

The latest updates on your projects.

| Name | Status | Updated (UTC) |
| --- | --- | --- |
| examples-basic-web | 🔄 Building | Jun 5, 2024 0:24am |
| examples-gatsby-web | 🔄 Building | Jun 5, 2024 0:24am |
| examples-kitchensink-blog | 🔄 Building | Jun 5, 2024 0:24am |
| examples-native-web | 🔄 Building | Jun 5, 2024 0:24am |
| examples-nonmonorepo | ✅ Ready | Jun 5, 2024 0:24am |
| rust-docs | ✅ Ready | Jun 5, 2024 0:24am |

4 Ignored Deployments

| Name | Status | Updated (UTC) |
| --- | --- | --- |
| examples-designsystem-docs | ⬜️ Ignored | Jun 5, 2024 0:24am |
| examples-svelte-web | ⬜️ Ignored | Jun 5, 2024 0:24am |
| examples-tailwind-web | ⬜️ Ignored | Jun 5, 2024 0:24am |
| examples-vite-web | ⬜️ Ignored | Jun 5, 2024 0:24am |

bgw · 2024-06-05T00:22:25Z

Minor optimizations to the codegen of TaskFnInputFunction #8304 👈
main

This stack of pull requests is managed by Graphite.

github-actions · 2024-06-05T00:24:15Z

🟢 Turbopack Benchmark CI successful 🟢

Thanks

github-actions · 2024-06-05T00:25:06Z

✅ This change can build next-swc

github-actions · 2024-06-05T00:29:19Z

⚠️ CI failed ⚠️

The following steps have failed in CI:

Turbopack Rust tests (mac/win, non-blocking)

See workflow summary for details

# Turbopack

* vercel/turborepo#8272
* vercel/turborepo#8262
* vercel/turborepo#8174
* vercel/turborepo#7674
* vercel/turborepo#8287
* vercel/turborepo#8037
* vercel/turborepo#8293
* vercel/turborepo#8239
* vercel/turborepo#8304
* vercel/turborepo#8221

### What?

I tried using `Arc<String>` in vercel/turborepo#7772, but a team member suggested creating a new type so we can replace underlying implementation easily in the future.

### Why?

To reduce memory usage.

### How?

Closes PACK-2776

bgw added 3 commits June 4, 2024 14:59

0bbc096 Use static strings for .context() calls in TaskFnInputFunction impls
d3d777d Continue using format!() but use with_context and move into non-generic helper function
bfbf39b Move argument extraction into non-generic functions to improve code sharing across monomorphized instances

turbo-orchestrator bot added created-by: turbopack labels Jun 5, 2024

vercel bot deployed to Preview – examples-nonmonorepo June 5, 2024 00:22 View deployment

vercel bot deployed to Preview – rust-docs June 5, 2024 00:24 View deployment

bgw changed the title ~~Use static strings for .context() calls in TaskFnInputFunction impls~~ Minor optimizations to the codegen of TaskFnInputFunction Jun 5, 2024

bgw marked this pull request as ready for review June 5, 2024 03:29

bgw requested a review from a team as a code owner June 5, 2024 03:29

kdy1 approved these changes Jun 5, 2024

bgw merged commit e65d1e7 into main Jun 5, 2024
56 of 58 checks passed

bgw deleted the bgw/task-fn-errors branch June 5, 2024 05:02

kdy1 mentioned this pull request Jun 5, 2024

feat(turbopack): Introduce RcStr vercel/next.js#66262

Merged

bgw mentioned this pull request Jul 15, 2024

Refactor task arguments to be a single one #8736

Merged
