Add -o option to all e2e benches #5658

jaylmiller · 2023-03-20T20:59:00Z

Which issue does this PR close?

Part of #5561

Rationale for this change

For the e2e benchmarks, the TPC-H bin has an option to output a machine-readable file, which can then be consumed by the script from PR #5655. It would be nice to be able to reuse this script for all bins in the e2e benches.

What changes are included in this PR?

This PR pulls out the existing logic from tpch.rs that (optionally) writes the run data to a machine-readable JSON file. That logic is then used in all the other benchmarks, adding a -o option to every bin in the e2e benchmarks directory.
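A minimal, dependency-free sketch of the shared pattern described above, assuming a summary type that each bin fills in and writes out only when `-o` is given. `RunSummary`, `QueryRun`, `to_json`, `maybe_write_json`, and `output_path_from_args` are hypothetical names for illustration; the actual bins parse their flags with their own CLI layer and serialize with serde (note the `#[derive(Debug, Serialize)]` in lib.rs) rather than hand-built JSON.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::time::Duration;

/// Hypothetical record of one query: its name and per-iteration timings.
#[derive(Debug)]
pub struct QueryRun {
    pub query: String,
    pub iterations: Vec<Duration>,
}

/// Hypothetical summary type shared by every e2e bin.
#[derive(Debug, Default)]
pub struct RunSummary {
    pub benchmark: String,
    pub queries: Vec<QueryRun>,
}

impl RunSummary {
    /// Hand-built JSON to stay dependency-free (the real code uses serde).
    /// Assumes names contain no characters that need JSON escaping.
    pub fn to_json(&self) -> String {
        let queries: Vec<String> = self
            .queries
            .iter()
            .map(|q| {
                let ms: Vec<String> = q
                    .iterations
                    .iter()
                    .map(|d| (d.as_secs_f64() * 1000.0).to_string())
                    .collect();
                format!("{{\"query\":\"{}\",\"elapsed_ms\":[{}]}}", q.query, ms.join(","))
            })
            .collect();
        format!(
            "{{\"benchmark\":\"{}\",\"queries\":[{}]}}",
            self.benchmark,
            queries.join(",")
        )
    }

    /// Writes the JSON file only when an output path was supplied.
    /// Returns `Ok(true)` if a file was written, `Ok(false)` otherwise.
    pub fn maybe_write_json(&self, output: Option<&Path>) -> io::Result<bool> {
        match output {
            Some(path) => {
                fs::write(path, self.to_json())?;
                Ok(true)
            }
            None => Ok(false),
        }
    }
}

/// Hypothetical stand-in for the bins' CLI parsing: the value after `-o`.
pub fn output_path_from_args(args: &[String]) -> Option<PathBuf> {
    args.iter()
        .position(|a| a == "-o")
        .and_then(|i| args.get(i + 1))
        .map(PathBuf::from)
}
```

In a bin, `summary.maybe_write_json(output_path_from_args(&args).as_deref())` would write the file only when `-o` was passed, so the default console-only behavior is unchanged.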

Are these changes tested?

Are there any user-facing changes?


comphead · 2023-03-21T15:29:54Z

benchmarks/src/bin/h2o.rs

-    let elapsed = start.elapsed().as_millis();
-
+    let elapsed = start.elapsed().as_secs_f64() * 1000.0;
+    let numrows = batches.iter().map(|b| b.num_rows()).sum::<usize>();


Thanks @jaylmiller for looking into this.

Noticed that for the other test cases you calculate numrows before elapsed, perhaps to prevent the numrows runtime from being part of the benchmark runtime.

Yes! Good catch, thank you... that was a mistake on my part.

Another thing I'm wondering: could calculating num rows trigger some system cache and make the benchmark run faster? Although that would be unexpected.

I think num_rows is pretty fast (it doesn't actually do any work; it just returns a field's value): https://docs.rs/arrow-array/35.0.0/src/arrow_array/record_batch.rs.html#278
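A minimal sketch of the ordering settled on in this thread (commit 26a73eb, "compute row count after timer ends"): stop the timer first, then sum the row counts. `Batch` is a hypothetical stand-in for arrow's `RecordBatch` so the example needs no dependencies; like the real `num_rows()`, it only returns a stored value. `run_timed` is an illustrative helper name, not part of the benchmarks crate.

```rust
use std::time::{Duration, Instant};

/// Hypothetical stand-in for arrow's `RecordBatch`.
pub struct Batch {
    pub rows: usize,
}

impl Batch {
    /// Like `RecordBatch::num_rows`, this only reads a stored field.
    pub fn num_rows(&self) -> usize {
        self.rows
    }
}

/// Runs `query`, stops the clock, and only then counts rows, so the
/// row count is excluded from the measured time.
pub fn run_timed<F>(query: F) -> (Duration, usize)
where
    F: FnOnce() -> Vec<Batch>,
{
    let start = Instant::now();
    let batches = query();
    let elapsed = start.elapsed(); // timer stops here
    let numrows = batches.iter().map(|b| b.num_rows()).sum::<usize>();
    (elapsed, numrows)
}
```

Since counting rows is a cheap field read, the ordering matters little for accuracy, but taking `elapsed` first keeps the measured span limited to the query itself.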

alamb

Looks like a definite improvement to me 🚀 -- thank you @jaylmiller

I had some suggestions about improving code ergonomics, but I don't think they are required to merge this PR if you would prefer not to do them.

alamb · 2023-03-21T19:13:48Z

benchmarks/src/bin/parquet.rs

-        disjunction([
+        ("Selective-ish filter", col("request_method").eq(lit("GET"))),
+        (
+            "Non-selective filter",


It is nice to add these details to the output file.


alamb · 2023-03-21T19:21:50Z

benchmarks/src/lib.rs

+/// A single iteration of a benchmark query
+#[derive(Debug, Serialize)]
+struct QueryIter {
+    elapsed: f64,


Can you please add some documentation about what unit this is in (I think it is milliseconds?)

Relatedly, I wonder if we could make this API easier to use by storing a Duration (https://doc.rust-lang.org/std/time/struct.Duration.html), calculated with SystemTime::now() - start.

I've changed elapsed to be a Duration and am using a custom serializer to make it appear as Unix seconds in the output JSON.
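A dependency-free sketch of the `Duration`-based `QueryIter` discussed above. With serde, a custom serializer like this would typically be attached to the field with `#[serde(serialize_with = "...")]`; here a hand-written `to_json` stands in for it. The `row_count` field and the `duration_as_secs` helper are assumptions for illustration, and rendering the duration as fractional seconds is one reading of "Unix seconds".

```rust
use std::time::Duration;

/// A single iteration of a benchmark query. Storing a `Duration` means the
/// unit lives in the type rather than in a comment on a bare `f64`.
#[derive(Debug)]
pub struct QueryIter {
    pub elapsed: Duration,
    /// Hypothetical extra field for illustration.
    pub row_count: usize,
}

/// Stand-in for a custom serializer: render a duration as fractional seconds.
pub fn duration_as_secs(d: &Duration) -> String {
    d.as_secs_f64().to_string()
}

impl QueryIter {
    /// Hand-built JSON object for one iteration.
    pub fn to_json(&self) -> String {
        format!(
            "{{\"elapsed\":{},\"row_count\":{}}}",
            duration_as_secs(&self.elapsed),
            self.row_count
        )
    }
}
```

Keeping `Duration` in the struct lets callers write `start.elapsed()` directly and defer the unit choice to serialization time.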

alamb · 2023-03-21T19:22:57Z

benchmarks/src/bin/h2o.rs

-    let elapsed = start.elapsed().as_millis();
-
+    let elapsed = start.elapsed().as_secs_f64() * 1000.0;
+    let numrows = batches.iter().map(|b| b.num_rows()).sum::<usize>();


I think num_rows is pretty fast (it doesn't actually do any work; it just returns a field's value): https://docs.rs/arrow-array/35.0.0/src/arrow_array/record_batch.rs.html#278

alamb

This looks really great -- thank you @jaylmiller

alamb · 2023-03-21T21:19:36Z

benchmarks/src/bin/h2o.rs

+    println!(
+        "h2o groupby query {} took {} ms",
+        opt.query,
+        elapsed.as_secs_f64() * 1000.0


jaylmiller added 5 commits March 17, 2023 11:41

add machine readable output option to all e2e benches

a009293

Merge branch 'apache:main' into benchmarks-e2e

86fb427

clippy

efc739c

Merge branch 'benchmarks-e2e' of https://github.com/jaylmiller/arrow-…
…datafusion into benchmarks-e2e

f45fe2c

clippy

7b3c07e

jaylmiller force-pushed the benchmarks-e2e branch from 39db9cc to 7b3c07e March 21, 2023 01:21

jaylmiller marked this pull request as ready for review March 21, 2023 01:52

jaylmiller mentioned this pull request Mar 21, 2023

Report and compare benchmark runs against two branches #5561 (closed)

comphead reviewed Mar 21, 2023


compute row count after timer ends

26a73eb

alamb approved these changes Mar 21, 2023


ergonomics changes suggested by alamb

fb79179

alamb approved these changes Mar 21, 2023


benchmarks/src/bin/h2o.rs


alamb Mar 21, 2023


👍

alamb merged commit b9964d6 into apache:main Mar 22, 2023
