Ballista: Finish implementing shuffle mechanism #707

Closed
andygrove opened this issue Jul 11, 2021 · 1 comment · Fixed by #750
Labels: bug, enhancement, performance

andygrove commented Jul 11, 2021

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

The shuffle mechanism in Ballista is not yet working, and some design issues need to be resolved before it can be fixed.

When running TPC-H query 12, I see query plans like this being executed:

ShuffleWriterExec: None
  ParquetExec: ...

The output from each stage is not read by the next stage. Instead, the next stage redundantly re-executes the previous stages.
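
To illustrate the difference between the behavior described above and what a working shuffle should do, here is a minimal toy model in Rust. It is only a sketch: `Plan`, `Executor`, `ShuffleRead`, and the helper functions are hypothetical names made up for this example and are not Ballista's actual types or API. The scan stands in for `ParquetExec`, and `ShuffleWrite` stands in for `ShuffleWriterExec`.

```rust
use std::collections::HashMap;

// Hypothetical toy plan, NOT Ballista's real operators.
#[derive(Debug, Clone)]
enum Plan {
    /// Leaf scan (stands in for ParquetExec).
    Scan,
    /// Any single-input operator; here it sums its input rows.
    Aggregate(Box<Plan>),
    /// Materializes the output of a stage (stands in for ShuffleWriterExec).
    ShuffleWrite { stage_id: usize, input: Box<Plan> },
    /// Reads the materialized output of an earlier stage.
    ShuffleRead { stage_id: usize },
}

struct Executor {
    /// Stage id -> rows written by that stage's shuffle writer.
    shuffle_output: HashMap<usize, Vec<i64>>,
    /// How many times a scan was actually executed.
    scans_run: usize,
}

impl Executor {
    fn new() -> Self {
        Executor { shuffle_output: HashMap::new(), scans_run: 0 }
    }

    fn execute(&mut self, plan: &Plan) -> Vec<i64> {
        match plan {
            Plan::Scan => {
                self.scans_run += 1;
                vec![1, 2, 3]
            }
            Plan::Aggregate(input) => {
                let rows = self.execute(input);
                let total: i64 = rows.iter().sum();
                vec![total]
            }
            Plan::ShuffleWrite { stage_id, input } => {
                let rows = self.execute(input);
                self.shuffle_output.insert(*stage_id, rows.clone());
                rows
            }
            Plan::ShuffleRead { stage_id } => self
                .shuffle_output
                .get(stage_id)
                .cloned()
                .expect("stage output must be written before it is read"),
        }
    }
}

fn stage1() -> Plan {
    Plan::ShuffleWrite { stage_id: 1, input: Box::new(Plan::Scan) }
}

/// Stage 2 embeds stage 1's plan, so running it re-executes the scan.
fn stage2_redundant() -> Plan {
    Plan::ShuffleWrite {
        stage_id: 2,
        input: Box::new(Plan::Aggregate(Box::new(stage1()))),
    }
}

/// Stage 2 reads what stage 1 already wrote.
fn stage2_reading_shuffle() -> Plan {
    Plan::ShuffleWrite {
        stage_id: 2,
        input: Box::new(Plan::Aggregate(Box::new(Plan::ShuffleRead { stage_id: 1 }))),
    }
}

/// Runs both stages in order; returns (scans executed, stage 2 output).
fn run_stages(first: &Plan, second: &Plan) -> (usize, Vec<i64>) {
    let mut executor = Executor::new();
    executor.execute(first);
    let output = executor.execute(second);
    (executor.scans_run, output)
}

fn main() {
    let (scans, out) = run_stages(&stage1(), &stage2_redundant());
    println!("redundant re-execution: {} scans, output {:?}", scans, out);

    let (scans, out) = run_stages(&stage1(), &stage2_reading_shuffle());
    println!("reading shuffle output: {} scans, output {:?}", scans, out);
}
```

In this toy model both variants produce the same result, but the first one runs the scan twice while the second runs it once. That doubled work is the redundancy described above.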

Describe the solution you'd like
Get shuffles working.

Describe alternatives you've considered
None

Additional context
None

andygrove added the enhancement, ballista, and performance labels Jul 11, 2021
andygrove self-assigned this Jul 11, 2021
andygrove added the bug label Jul 11, 2021
andygrove commented

It turns out there is a fundamental bug here. I am working on fixing this.

andygrove changed the title from "Ballista: Avoid pointless shuffle write of Parquet partitions" to "Ballista: Finish implementing shuffle mechanism" Jul 17, 2021