Add "first impression" benchmark to bench.sh #6156

Open
alamb opened this issue Apr 28, 2023 · 2 comments
Labels
enhancement New feature or request

@alamb
Contributor

alamb commented Apr 28, 2023

Is your feature request related to a problem or challenge?

Many people trying DataFusion for the first time query a single CSV file, so we may want to benchmark that scenario (via bench.sh; see #6131).

Describe the solution you'd like

I would like to add a "first impression" performance benchmark, perhaps based on querying a subset of the ClickBench dataset (#6128) in CSV format.

Describe alternatives you've considered

No response

Additional context

Suggested by @andygrove here #6127 (comment)

@alamb alamb added the enhancement New feature or request label Apr 28, 2023
@alamb
Contributor Author

alamb commented May 1, 2023

@alamb
Contributor Author

alamb commented May 9, 2023

Related: #6287

I have thought a lot about this. I plan to start relatively simply:

  1. Only SQL queries (as I think that is the dominant way people evaluate DataFusion initially)
  2. A basic TPCH aggregate query (likely Q1 or a derivative)
  3. Uses all cores (not restricted at all)
  4. On three different input types: CSV, Parquet, and (eventually) JSON

I think the ideal first impression of DataFusion is when:

  1. It can read files directly from disk, using as many cores as possible
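The plan above (SQL only, a TPCH Q1-style aggregate, all cores, CSV/Parquet/JSON inputs) could be laid out as a small benchmark matrix. This is a minimal std-only Rust sketch, not how bench.sh actually works: `Case`, `build_cases`, `time_runs`, and the simplified `Q1_LIKE` query text are all hypothetical, `target_partitions` is only named to echo DataFusion's setting of the same name, and the spot where DataFusion would register the file and run the query is left as a placeholder.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Input formats from the plan; JSON is the "eventually" one.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Format {
    Csv,
    Parquet,
    Json,
}

/// Hypothetical benchmark case: one SQL query against one input format.
#[derive(Debug)]
struct Case {
    name: &'static str,
    format: Format,
    sql: &'static str,
    target_partitions: usize,
}

/// Hypothetical, simplified Q1-style aggregate over TPC-H `lineitem`
/// (not the official TPC-H Q1 text).
const Q1_LIKE: &str = "SELECT l_returnflag, l_linestatus, SUM(l_quantity) AS sum_qty \
     FROM lineitem \
     GROUP BY l_returnflag, l_linestatus \
     ORDER BY l_returnflag, l_linestatus";

/// Builds one case per input format, using every available core.
fn build_cases() -> Vec<Case> {
    // "Uses all cores": size parallelism to the machine instead of capping it.
    let cores = thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(1);
    [Format::Csv, Format::Parquet, Format::Json]
        .iter()
        .map(|&format| Case {
            name: "q1_like",
            format,
            sql: Q1_LIKE,
            target_partitions: cores,
        })
        .collect()
}

/// Runs `f` `iterations` times and returns every elapsed time,
/// so the first (cold) run stays visible next to the later ones.
fn time_runs<F: FnMut()>(iterations: usize, mut f: F) -> Vec<Duration> {
    (0..iterations)
        .map(|_| {
            let start = Instant::now();
            f();
            start.elapsed()
        })
        .collect()
}

fn main() {
    for case in build_cases() {
        // Placeholder: a real harness would register the input file and
        // execute `case.sql` with DataFusion inside this closure.
        let runs = time_runs(3, || {});
        println!(
            "{} format={:?} target_partitions={} runs={:?}\n  sql: {}",
            case.name, case.format, case.target_partitions, runs, case.sql
        );
    }
}
```

Keeping every run rather than only an average is a deliberate choice here: for a "first impression" number, the first cold run is arguably what a new user actually sees.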
