ARROW-12441: [Rust][DataFusion] Support cartesian join #10092

Closed
wants to merge 14 commits into from

Dandandan
Contributor

@Dandandan Dandandan commented Apr 18, 2021

This is a first (naive, but probably not that bad) implementation of the cartesian join and CROSS JOIN syntax.

The left side is loaded fully into memory, and the right side is streamed: each incoming right batch is combined with every row of the buffered left side.
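The build/stream strategy can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: it assumes a toy row representation (`Vec<i64>` per row) instead of Arrow `RecordBatch`es, and the names `Batch` and `cross_join` are hypothetical.

```rust
/// Hypothetical stand-in for an Arrow RecordBatch: a list of rows,
/// each row being a vector of column values.
type Batch = Vec<Vec<i64>>;

/// Naive cross join: buffer the whole left input, then stream the right
/// input and emit one output batch per right batch containing every
/// (left row, right row) pair.
fn cross_join<L, R>(left: L, right: R) -> impl Iterator<Item = Batch>
where
    L: IntoIterator<Item = Batch>,
    R: IntoIterator<Item = Batch>,
{
    // Build phase: load every left row into memory.
    let left_rows: Vec<Vec<i64>> = left.into_iter().flatten().collect();

    // Stream phase: combine each right batch with all buffered left rows.
    right.into_iter().map(move |right_batch| {
        let mut out = Vec::with_capacity(left_rows.len() * right_batch.len());
        for l in &left_rows {
            for r in &right_batch {
                // Output row = left columns followed by right columns.
                let mut row = l.clone();
                row.extend_from_slice(r);
                out.push(row);
            }
        }
        out
    })
}
```

Each output batch here holds `left_rows.len() * right_batch.len()` rows, which is exactly the memory concern raised below when both sides are large.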

Memory consumption could be improved: the current implementation produces large batches when both sides are big. This could be solved by keeping a "cursor" into the left side and producing output batches one by one, instead of concatenating the result of the full cartesian product.
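A cursor-based variant of that idea might look like the sketch below. It is an assumption-laden illustration, not DataFusion code: it reuses a toy `Batch` type (rows as `Vec<i64>`), and `CursorCrossJoin` / `cursor_cross_join` are hypothetical names. Instead of materializing the full product for a right batch, it emits one output batch per (left row, right batch) pair, so each batch has at most as many rows as the right batch.

```rust
/// Hypothetical stand-in for an Arrow RecordBatch: rows of column values.
type Batch = Vec<Vec<i64>>;

/// Cross join that walks a cursor over the buffered left rows and yields
/// small batches one at a time instead of one large concatenated batch.
struct CursorCrossJoin<R: Iterator<Item = Batch>> {
    left_rows: Vec<Vec<i64>>,
    right: R,
    current_right: Option<Batch>,
    left_cursor: usize,
}

impl<R: Iterator<Item = Batch>> Iterator for CursorCrossJoin<R> {
    type Item = Batch;

    fn next(&mut self) -> Option<Batch> {
        // A cross join with an empty left side produces no rows.
        if self.left_rows.is_empty() {
            return None;
        }
        // Pull the next right batch once every left row has been paired
        // with the current one (or on the very first call).
        if self.current_right.is_none() || self.left_cursor == self.left_rows.len() {
            self.current_right = Some(self.right.next()?);
            self.left_cursor = 0;
        }
        let left_row = &self.left_rows[self.left_cursor];
        self.left_cursor += 1;
        let right_batch = self.current_right.as_ref().unwrap();
        // Output: this single left row combined with every row of the right batch.
        Some(
            right_batch
                .iter()
                .map(|r| {
                    let mut row = left_row.clone();
                    row.extend_from_slice(r);
                    row
                })
                .collect(),
        )
    }
}

fn cursor_cross_join<L, R>(left: L, right: R) -> CursorCrossJoin<R::IntoIter>
where
    L: IntoIterator<Item = Batch>,
    R: IntoIterator<Item = Batch>,
{
    CursorCrossJoin {
        left_rows: left.into_iter().flatten().collect(),
        right: right.into_iter(),
        current_right: None,
        left_cursor: 0,
    }
}
```

The trade-off is more, smaller batches; a real implementation would likely pair a slice of several left rows at a time to keep batch sizes near a target.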

FYI @andygrove @alamb @jorgecarleitao

This also makes TPC-H query 9 run in DataFusion (though performance is not yet acceptable; I believe that is not related to the cross join itself but is caused by another issue).

@github-actions

Thanks for opening a pull request!

If this is not a minor PR, could you open an issue for this pull request on JIRA? https://issues.apache.org/jira/browse/ARROW

Opening JIRAs ahead of time contributes to the Openness of the Apache Arrow project.

Then could you also rename the pull request title in the following format?

ARROW-${JIRA_ID}: [${COMPONENT}] ${SUMMARY}

or

MINOR: [${COMPONENT}] ${SUMMARY}

@codecov-commenter

codecov-commenter commented Apr 18, 2021

Codecov Report

Merging #10092 (edcedb0) into master (9a4ef46) will increase coverage by 0.00%.
The diff coverage is 77.22%.

❗ Current head edcedb0 differs from pull request most recent head f93a7d3. Consider uploading reports for the commit f93a7d3 to get more accurate results.

@@           Coverage Diff            @@
##           master   #10092    +/-   ##
========================================
  Coverage   78.87%   78.87%            
========================================
  Files         286      287     +1     
  Lines       64808    64974   +166     
========================================
+ Hits        51119    51250   +131     
- Misses      13689    13724    +35     
Impacted Files Coverage Δ
...lista/rust/core/src/serde/logical_plan/to_proto.rs 0.00% <ø> (ø)
rust/datafusion/src/optimizer/constant_folding.rs 91.95% <0.00%> (-0.36%) ⬇️
...t/datafusion/src/optimizer/projection_push_down.rs 98.66% <ø> (ø)
rust/datafusion/src/physical_plan/hash_utils.rs 100.00% <ø> (+2.89%) ⬆️
rust/datafusion/src/physical_plan/mod.rs 87.09% <ø> (ø)
...datafusion/src/optimizer/hash_build_probe_order.rs 55.65% <55.55%> (-0.02%) ⬇️
rust/datafusion/src/logical_plan/plan.rs 79.83% <61.53%> (-0.56%) ⬇️
...ust/datafusion/src/physical_plan/cartesian_join.rs 75.47% <75.47%> (ø)
rust/benchmarks/src/bin/tpch.rs 38.86% <100.00%> (+0.52%) ⬆️
rust/datafusion/src/logical_plan/builder.rs 89.33% <100.00%> (+0.44%) ⬆️
... and 7 more

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9a4ef46...f93a7d3.

@Dandandan Dandandan changed the title WIP cartesian join ARROW-12441: Support cartesian join Apr 18, 2021
@Dandandan Dandandan changed the title ARROW-12441: Support cartesian join ARROW-12441: [Rust][DataFusion] Support cartesian join Apr 18, 2021
@Dandandan
Contributor Author

Moved to apache/datafusion#11

@Dandandan Dandandan closed this Apr 19, 2021