feat(perf): Flamegraphs for test program execution benchmarks (#6253)
# Description

## Problem\*

Resolves #6186

## Summary\*

* Breaks up `nargo_cli/benches/criterion.rs` into two steps: 1) a compilation step, which happens only once, and 2) the execution of the already compiled program.
* Limits the code in the benchmark loop to the circuit execution alone, excluding the parsing of the program and its inputs and the saving of results, which the `execute` command would otherwise do.

These are the timings of just the circuit execution:

```text
eddsa_execute           time:   [111.13 ms 111.19 ms 111.25 ms]
regression_execute      time:   [151.27 µs 151.84 µs 152.66 µs]
struct_execute          time:   [602.39 ns 606.05 ns 609.77 ns]
```

The flame graphs look as follows (you can right-click and use "Save Linked File As..." to download one and open it in a browser to make it interactive):

* `eddsa`: ![eddsa](https://github.com/user-attachments/assets/c9f35961-65d9-4ac9-b2a6-1d50d14a9adc)
* `regression`: ![regression](https://github.com/user-attachments/assets/5664ce3a-eb6e-4fe8-a832-0ae539c99881)
* `struct`: ![struct](https://github.com/user-attachments/assets/15ebab47-1d52-4152-8d32-88f124fda525)

To generate them, run the following commands:

```shell
./scripts/benchmark_start.sh
cargo bench -p nargo_cli --bench criterion -- --profile-time=30
./scripts/benchmark_stop.sh
```

## Additional Context

The problem with the previous `nargo_cli/benches/criterion.rs` was that it executed the `nargo execute --program-dir ... --force` command as a subprocess, and the profiler that creates the flame graph only sampled the criterion executor itself, not what the subprocess was doing.

My first idea for getting the flame graphs to include the actual execution was to turn `nargo_cli` into a library _and_ a binary (af19dfc), so that we could import the commands into the benchmark and run them in the same process.
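Schematically, the compile-once/execute-many split can be illustrated with the following std-only sketch. The `compile_program` and `execute_program` stand-ins here are hypothetical placeholders, not nargo's real APIs; in the actual benchmark, criterion's timing loop plays the role of the manually timed region:

```rust
use std::time::Instant;

// Hypothetical stand-in: pretend "compilation" turns each source byte into an opcode.
fn compile_program(source: &str) -> Vec<u64> {
    source.bytes().map(|b| b as u64).collect()
}

// Hypothetical stand-in: pretend "execution" folds the opcodes into a result.
fn execute_program(program: &[u64]) -> u64 {
    program.iter().sum()
}

fn main() {
    // 1) Compile once, outside the measured region (in the real benchmark this
    //    happens before criterion's `bench_function` loop).
    let program = compile_program("abc");

    // 2) Measure only the execution, as the reworked benchmark loop does:
    //    no parsing of inputs, no writing of results.
    let start = Instant::now();
    let result = execute_program(&program);
    let elapsed = start.elapsed();

    println!("result = {result}, executed in {elapsed:?}");
}
```

Keeping the expensive one-time setup outside the timed region is what lets the sampled profile (and hence the flame graph) show the execution itself rather than setup overhead.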
That library-and-binary approach resulted in the following flame graphs:

* `eddsa`: ![eddsa](https://github.com/user-attachments/assets/e214653e-c6e3-4614-b763-b35694eeaec8)
* `regression`: ![regression](https://github.com/user-attachments/assets/ade1ef1a-1a1b-4ca4-9c09-62693551a8b0)
* `struct`: ![struct](https://github.com/user-attachments/assets/72838e1c-7c6b-4a0d-8040-acd335007463)

These include the entire `ExecuteCommand::run` call, which unfortunately leaves the flame graph dominated by parsing logic, reading inputs and writing outputs to files. In fact, in all but the `eddsa` example, `execute_circuit` was so fast that I couldn't even find it on the flame graph.

These are the timings for the command execution per test program:

```text
eddsa_execute           time:   [984.12 ms 988.74 ms 993.95 ms]
regression_execute      time:   [71.240 ms 71.625 ms 71.957 ms]
struct_execute          time:   [68.447 ms 69.414 ms 70.438 ms]
```

For this reason I rolled back the library+binary change and broke up the execution further, parsing the program into a `CompiledProgram` once and calling `execute_program` in the benchmark loop without saving the results. I copied two utility functions to read the program artefact and the input map; I'm not sure these are worth moving around, as even the errors they raise are CLI-specific.

## Documentation\*

Check one:

- [ ] No documentation needed.
- [x] Documentation included in this PR.
- [ ] **[For Experimental Features]** Documentation to be submitted in a separate PR.

# PR Checklist\*

- [x] I have tested the changes locally.
- [ ] I have formatted the changes with [Prettier](https://prettier.io/) and/or `cargo fmt` on default settings.