Add bench.sh script to automate benchmarking DataFusion against itself #6131

Merged Apr 30, 2023 (2 commits)

@alamb alamb commented Apr 26, 2023

Which issue does this PR close?

Closes #6127

Rationale for this change

TL;DR: make it easier to run the benchmarks included with DataFusion using a standard set of scenarios.

See #6127

What changes are included in this PR?

  • Add a bench.sh script that orchestrates creating the data files and running the benchmarks
  • Update benchmark documentation
  • Remove outdated tpch_dbgen.sh script

This script currently supports two benchmarks as shown in the usage instructions.

(arrow_dev) alamb@MacBook-Pro-8:~/Software/arrow-datafusion$ ./benchmarks/bench.sh
Error: unknown command:

Orchestrates running benchmarks against DataFusion checkouts

Usage:
./benchmarks/bench.sh data [benchmark]
./benchmarks/bench.sh run [benchmark]
./benchmarks/bench.sh compare <branch1> <branch2>

**********
Examples:
**********
# Create the datasets for all benchmarks in /Users/alamb/Software/arrow-datafusion/benchmarks/data
./bench.sh data

# Run the 'tpch' benchmark on the datafusion checkout in /source/arrow-datafusion
DATAFUSION_DIR=/source/arrow-datafusion ./bench.sh run tpch

**********
* Commands
**********
data:         Generates data needed for benchmarking
run:          Runs the named benchmark
compare:      Compares results from benchmark runs

**********
* Benchmarks
**********
all(default): Data/Run/Compare for all benchmarks
tpch:         TPCH inspired benchmark on Scale Factor (SF) 1 (~1GB), single parquet file per table
tpch_mem:     TPCH inspired benchmark on Scale Factor (SF) 1 (~1GB), query from memory

**********
* Supported Configuration (Environment Variables)
**********
DATA_DIR        directory to store datasets
CARGO_COMMAND   command that runs the benchmark binary
DATAFUSION_DIR  directory to use (default /Users/alamb/Software/arrow-datafusion/benchmarks/..)

Are these changes tested?

I tested them manually on an x86 mac and a Linux x86 machine.

Are there any user-facing changes?

No, it is just development scripts

@alamb alamb added the development-process Related to development process of DataFusion label Apr 26, 2023
@github-actions github-actions bot removed the development-process Related to development process of DataFusion label Apr 26, 2023
@alamb alamb changed the title Add bench script to benchmark datafusion against itself Add bench.sh script to benchmark DataFusion against itself Apr 26, 2023

alamb commented Apr 26, 2023

Some interesting results already: I ran a quick experiment to see how much link-time optimization (`lto`) helps. The answer is "quite a bit".

alamb@aal-dev:~/arrow-datafusion2/benchmarks$ python compare.py results/alamb_bench/tpch.json results/alamb_bench_compare/tpch.json
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃ /home/alamb… ┃ /home/alamb… ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │    1269.30ms │    1097.60ms │ +1.16x faster │
│ QQuery 2     │     418.17ms │     309.01ms │ +1.35x faster │
│ QQuery 3     │     393.15ms │     365.30ms │ +1.08x faster │
│ QQuery 4     │     212.83ms │     214.36ms │     no change │
│ QQuery 5     │     534.56ms │     531.17ms │     no change │
│ QQuery 6     │     209.39ms │     184.41ms │ +1.14x faster │
│ QQuery 7     │    1037.82ms │     981.81ms │ +1.06x faster │
│ QQuery 8     │     550.99ms │     540.95ms │     no change │
│ QQuery 9     │     982.00ms │     984.53ms │     no change │
│ QQuery 10    │     613.48ms │     560.14ms │ +1.10x faster │
│ QQuery 11    │     272.45ms │     231.46ms │ +1.18x faster │
│ QQuery 12    │     319.91ms │     320.54ms │     no change │
│ QQuery 13    │    1127.46ms │    1087.70ms │     no change │
│ QQuery 14    │     286.89ms │     263.16ms │ +1.09x faster │
│ QQuery 15    │     255.63ms │     233.42ms │ +1.10x faster │
│ QQuery 16    │     302.94ms │     309.25ms │     no change │
│ QQuery 17    │    2891.05ms │    2628.59ms │ +1.10x faster │
│ QQuery 18    │    3123.23ms │    3154.47ms │     no change │
│ QQuery 19    │     511.97ms │     472.75ms │ +1.08x faster │
│ QQuery 20    │    1042.76ms │     938.75ms │ +1.11x faster │
│ QQuery 21    │    1567.78ms │    1611.91ms │     no change │
│ QQuery 22    │     182.56ms │     171.52ms │ +1.06x faster │
└──────────────┴──────────────┴──────────────┴───────────────┘
alamb@aal-dev:~/arrow-datafusion2/benchmarks$ python compare.py results/alamb_bench/tpch_mem.json results/alamb_bench_compare/tpch_mem.json
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃           -o ┃           -o ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │     876.15ms │     796.57ms │ +1.10x faster │
│ QQuery 2     │     265.87ms │     267.19ms │     no change │
│ QQuery 3     │     169.43ms │     164.82ms │     no change │
│ QQuery 4     │     110.07ms │     116.02ms │  1.05x slower │
│ QQuery 5     │     462.34ms │     449.41ms │     no change │
│ QQuery 6     │      44.40ms │      40.54ms │ +1.10x faster │
│ QQuery 7     │    1099.47ms │    1077.84ms │     no change │
│ QQuery 8     │     241.97ms │     247.20ms │     no change │
│ QQuery 9     │     584.01ms │     606.74ms │     no change │
│ QQuery 10    │     301.95ms │     299.14ms │     no change │
│ QQuery 11    │     239.17ms │     221.07ms │ +1.08x faster │
│ QQuery 12    │     153.73ms │     139.95ms │ +1.10x faster │
│ QQuery 13    │     793.76ms │     753.25ms │ +1.05x faster │
│ QQuery 14    │      59.50ms │      49.38ms │ +1.20x faster │
│ QQuery 15    │     103.03ms │      89.82ms │ +1.15x faster │
│ QQuery 16    │     216.38ms │     213.85ms │     no change │
│ QQuery 17    │    3356.99ms │    2866.85ms │ +1.17x faster │
│ QQuery 18    │    3017.82ms │    2910.50ms │     no change │
│ QQuery 19    │     161.19ms │     137.81ms │ +1.17x faster │
│ QQuery 20    │     924.38ms │     855.74ms │ +1.08x faster │
│ QQuery 21    │    1502.52ms │    1460.54ms │     no change │
│ QQuery 22    │     133.10ms │     128.34ms │     no change │
└──────────────┴──────────────┴──────────────┴───────────────┘
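For readers wondering how the "Change" column relates to the two timing columns, here is a minimal sketch that reproduces the labels in the tables above. It is inferred from the printed output only and is not taken from compare.py: the function name `describe_change` and the 5% noise band are assumptions that happen to match every row shown.

```python
def describe_change(baseline_ms, comparison_ms, noise=0.05):
    """Label a timing change the way the tables above appear to.

    Assumption: changes within +/- `noise` (5%) of the baseline are
    reported as "no change"; this band is inferred, not documented.
    """
    ratio = comparison_ms / baseline_ms
    if 1.0 - noise <= ratio <= 1.0 + noise:
        return "no change"
    if ratio < 1.0:
        # The comparison run took less time: report the speedup factor.
        return f"+{1.0 / ratio:.2f}x faster"
    # The comparison run took more time: report the slowdown factor.
    return f"{ratio:.2f}x slower"


if __name__ == "__main__":
    # Timings copied from rows of the tables above.
    for base, comp in [(1269.30, 1097.60), (110.07, 116.02), (212.83, 214.36)]:
        print(f"{base:>9.2f}ms {comp:>9.2f}ms {describe_change(base, comp)}")
```

With a noise band like this, small differences such as Query 4 in the first table (212.83ms vs 214.36ms) are treated as measurement noise rather than a regression.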

@alamb alamb changed the title Add bench.sh script to benchmark DataFusion against itself Add bench.sh script to automate benchmarking DataFusion against itself Apr 28, 2023

# Benchmark Descriptions:

## `tpch` Benchmark derived from TPC-H

These benchmarks are derived from the [TPC-H][1] benchmark. And we use this repo as the source of tpch-gen and answers:

Next, I hope / plan to review the other benchmarks and consolidate them, along with their data generation and runner scripts, into the bench.sh framework

@alamb alamb marked this pull request as ready for review April 28, 2023 17:58
@alamb alamb requested a review from andygrove April 28, 2023 20:46

@yjshen yjshen left a comment

Thanks @alamb!

# Gather baseline data for tpch benchmark
./benchmarks/bench.sh run tpch

# Switch to the branch named mybranch and gather data

👍 I was curious before about what's the magic for comparing branches

Thanks for the review @yjshen -- I am trying to reduce the amount of magic involved.

I am going to merge this in and we can continue to iterate (next I would like to increase the number of different tests supported)
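To make the branch comparison less magical, the workflow discussed in this thread can be written down as an ordered list of commands. The sketch below only builds and prints that list; it does not run anything. The helper name `comparison_plan` and the baseline branch name `main` are illustrative assumptions, while the `run` and `compare` subcommands and the `mybranch` name come from the usage text and README excerpt quoted in this PR.

```python
import shlex


def comparison_plan(baseline, candidate, benchmark="tpch",
                    script="./benchmarks/bench.sh"):
    """Return the commands, in order, for comparing two branches.

    Hypothetical helper: it mirrors the documented steps (check out a
    branch, run the benchmark, repeat, then compare) without executing them.
    """
    plan = []
    for branch in (baseline, candidate):
        plan.append(["git", "checkout", branch])
        plan.append([script, "run", benchmark])
    # Finally, compare the results gathered for the two branches.
    plan.append([script, "compare", baseline, candidate])
    return plan


if __name__ == "__main__":
    # "main" is an assumed baseline branch name.
    for command in comparison_plan("main", "mybranch"):
        print(shlex.join(command))
```

Keeping the plan as data makes it easy to review the exact steps before handing them to a shell.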

Development

Successfully merging this pull request may close these issues.

Easy DataFusion / DataFusion Benchmarking
2 participants