feat(perf): Flamegraphs for test program execution benchmarks (#6253)
# Description

## Problem\*

Resolves #6186

## Summary\*

* Breaks up `nargo_cli/benches/criterion.rs` into two steps: 1) a compilation step, which happens only once, and 2) the execution of the already compiled program.
* Limits the code in the benchmark loop to the circuit execution alone, excluding the parsing of the program and its inputs and the saving of results, which the `execute` command would otherwise do.

These are the timings of just the circuit execution:

```text
eddsa_execute           time:   [111.13 ms 111.19 ms 111.25 ms]
regression_execute      time:   [151.27 µs 151.84 µs 152.66 µs]
struct_execute          time:   [602.39 ns 606.05 ns 609.77 ns]
```

The flame graphs look as follows (you can right-click and use "Save Linked File As..." to download one and open it in a browser to make it interactive):

* `eddsa`: ![eddsa](https://github.com/user-attachments/assets/c9f35961-65d9-4ac9-b2a6-1d50d14a9adc)
* `regression`: ![regression](https://github.com/user-attachments/assets/5664ce3a-eb6e-4fe8-a832-0ae539c99881)
* `struct`: ![struct](https://github.com/user-attachments/assets/15ebab47-1d52-4152-8d32-88f124fda525)

To generate them, run the following commands:

```shell
./scripts/benchmark_start.sh
cargo bench -p nargo_cli --bench criterion -- --profile-time=30
./scripts/benchmark_stop.sh
```

## Additional Context

The problem with the previous `nargo_cli/benches/criterion.rs` was that it executed the `nargo execute --program-dir ... --force` command as a subprocess, and the profiler that creates the flame graph only sampled the criterion executor itself, not what the subprocess was doing.

My first idea for getting the flame graphs to include the actual execution was to turn `nargo_cli` into a library _and_ a binary (af19dfc), so that we could import the commands into the benchmark and run them in the same process.
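Schematically, the compile-once/execute-many split can be illustrated with the following std-only sketch. The `compile_program` and `execute_program` stand-ins here are hypothetical placeholders, not nargo's real APIs; in the actual benchmark, criterion's timing loop plays the role of the manually timed region:

```rust
use std::time::Instant;

// Hypothetical stand-in: pretend "compilation" turns each source byte into an opcode.
fn compile_program(source: &str) -> Vec<u64> {
    source.bytes().map(|b| b as u64).collect()
}

// Hypothetical stand-in: pretend "execution" folds the opcodes into a result.
fn execute_program(program: &[u64]) -> u64 {
    program.iter().sum()
}

fn main() {
    // 1) Compile once, outside the measured region (in the real benchmark this
    //    happens before criterion's `bench_function` loop).
    let program = compile_program("abc");

    // 2) Measure only the execution, as the reworked benchmark loop does:
    //    no parsing of inputs, no writing of results.
    let start = Instant::now();
    let result = execute_program(&program);
    let elapsed = start.elapsed();

    println!("result = {result}, executed in {elapsed:?}");
}
```

Keeping the expensive one-time setup outside the timed region is what lets the sampled profile (and hence the flame graph) show the execution itself rather than setup overhead.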
That library-and-binary approach resulted in the following flame graphs:

* `eddsa`: ![eddsa](https://github.com/user-attachments/assets/e214653e-c6e3-4614-b763-b35694eeaec8)
* `regression`: ![regression](https://github.com/user-attachments/assets/ade1ef1a-1a1b-4ca4-9c09-62693551a8b0)
* `struct`: ![struct](https://github.com/user-attachments/assets/72838e1c-7c6b-4a0d-8040-acd335007463)

These include the entire `ExecuteCommand::run` call, which unfortunately leaves the flame graph dominated by parsing logic, reading inputs and writing outputs to files. In fact, in all but the `eddsa` example, `execute_circuit` was so fast that I couldn't even find it on the flame graph.

These are the timings for the command execution per test program:

```text
eddsa_execute           time:   [984.12 ms 988.74 ms 993.95 ms]
regression_execute      time:   [71.240 ms 71.625 ms 71.957 ms]
struct_execute          time:   [68.447 ms 69.414 ms 70.438 ms]
```

For this reason I rolled back the library+binary change and broke up the execution further, parsing the program into a `CompiledProgram` once and calling `execute_program` in the benchmark loop without saving the results. I copied two utility functions to read the program artefact and the input map; I'm not sure these are worth moving around, as even the errors they raise are CLI-specific.

## Documentation\*

Check one:

- [ ] No documentation needed.
- [x] Documentation included in this PR.
- [ ] **[For Experimental Features]** Documentation to be submitted in a separate PR.

# PR Checklist\*

- [x] I have tested the changes locally.
- [ ] I have formatted the changes with [Prettier](https://prettier.io/) and/or `cargo fmt` on default settings.