RepartitionExec should not error if output has hung up #576

Merged
merged 2 commits into from
Jun 18, 2021


@alamb alamb commented Jun 16, 2021

Which issue does this PR close?

Closes #575

Rationale for this change

See #575 for the gory details, but the idea is that an output hanging up is not an error (it happens, for example, when a LIMIT stops reading early). However, RepartitionExec should also stop reading input whose results will never be consumed.

What changes are included in this PR?

  1. Ignore errors when sending to output that has hung up
  2. Stop pulling from input if all outputs have hung up.
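
The two changes above can be sketched roughly as follows. This is a minimal, hedged illustration and not the code from this PR: `repartition_round_robin` and `demo_all_outputs_hung_up` are hypothetical names, `std::sync::mpsc` stands in for the async channels RepartitionExec uses, and plain `u32` values stand in for record batches.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Sender};

/// Hypothetical sketch: distribute `input` round-robin across `senders`.
/// Returns how many items were pulled from `input`.
fn repartition_round_robin<I>(input: I, senders: Vec<Sender<u32>>) -> usize
where
    I: IntoIterator<Item = u32>,
{
    let num_outputs = senders.len();
    // Map of output partition -> sender; entries are removed on hang-up.
    let mut txs: HashMap<usize, Sender<u32>> =
        senders.into_iter().enumerate().collect();
    let mut input = input.into_iter();
    let mut pulled = 0usize;

    // Change 2: stop pulling from the input once every output has hung up.
    while !txs.is_empty() {
        let item = match input.next() {
            Some(item) => item,
            None => break, // input exhausted
        };
        let partition = pulled % num_outputs;
        pulled += 1;

        // Change 1: a failed send only means the receiver was dropped,
        // so it is not reported as an error.
        let hung_up = match txs.get(&partition) {
            Some(tx) => tx.send(item).is_err(),
            None => false, // this output hung up earlier; the item is discarded
        };
        if hung_up {
            txs.remove(&partition);
        }
    }
    pulled
}

/// Usage: both receivers are dropped before any data is sent.
fn demo_all_outputs_hung_up() -> usize {
    let (tx0, rx0) = channel::<u32>();
    let (tx1, rx1) = channel::<u32>();
    drop(rx0);
    drop(rx1);
    repartition_round_robin(0..1_000, vec![tx0, tx1])
}
```

In this sketch `demo_all_outputs_hung_up()` returns 2: one failed send per output is enough to detect the hang-up, after which the remaining 998 input items are never pulled.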

Are there any user-facing changes?

Avoids intermittent errors (which probably started appearing after #521, since runtime errors were previously ignored)

@@ -723,4 +743,105 @@ mod tests {

assert_batches_sorted_eq!(&expected, &batches);
}

#[tokio::test]
@alamb (Contributor, Author) commented on this line:

I did not write a test verifying that early shutdown stops consuming input (because RepartitionExec uses unbounded channels, it buffers indefinitely, so the timing control would have to be very precise)

🤔 mostly that sounds like an excuse
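
To illustrate why unbounded channels make such a test timing-sensitive, here is a small hedged sketch. It uses `std::sync::mpsc::channel` (which is also unbounded) as a stand-in for the channels in RepartitionExec, and `unbounded_buffering_demo` is a hypothetical name. The producer can run arbitrarily far ahead of the consumer, so how much input gets pulled before a hang-up is noticed depends entirely on scheduling.

```rust
use std::sync::mpsc::channel;

/// Hypothetical demo. Returns (number of sends accepted while the receiver
/// read nothing, whether a send fails once the receiver has hung up).
fn unbounded_buffering_demo() -> (usize, bool) {
    let (tx, rx) = channel::<u32>();

    // Every send succeeds immediately: the channel simply buffers.
    let mut accepted = 0usize;
    for i in 0..1_000 {
        if tx.send(i).is_ok() {
            accepted += 1;
        }
    }

    // Only after the receiver is dropped does the sender observe a failure.
    drop(rx);
    let failed_after_hangup = tx.send(0).is_err();

    (accepted, failed_after_hangup)
}
```

Here all 1,000 sends are accepted before the consumer reads anything, and the first send after `drop(rx)` fails.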

@codecov-commenter

Codecov Report

Merging #576 (33f07f8) into master (51e5445) will increase coverage by 0.02%.
The diff coverage is 90.00%.

❗ Current head 33f07f8 differs from pull request most recent head e45efcf. Consider uploading reports for the commit e45efcf to get more accurate results

@@            Coverage Diff             @@
##           master     #576      +/-   ##
==========================================
+ Coverage   76.02%   76.05%   +0.02%     
==========================================
  Files         156      156              
  Lines       27063    27126      +63     
==========================================
+ Hits        20575    20631      +56     
- Misses       6488     6495       +7     
| Impacted Files | Coverage Δ |
|---|---|
| datafusion/src/test/exec.rs | 74.73% <82.75%> (+3.52%) ⬆️ |
| datafusion/src/physical_plan/repartition.rs | 87.02% <95.12%> (+0.85%) ⬆️ |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 51e5445...e45efcf.

@alamb

alamb commented Jun 18, 2021

@Dandandan / @andygrove would you be ok if I merged this bug fix (it is causing us trouble in IOx)?


@Dandandan Dandandan left a comment


Makes sense 👍

@alamb alamb merged commit 4a55364 into apache:master Jun 18, 2021
@houqp houqp added the bug (Something isn't working) and datafusion (Changes in the datafusion crate) labels Jul 30, 2021
@alamb alamb deleted the alamb/repartition_error branch October 6, 2022 18:30

Successfully merging this pull request may close these issues.

Bug: RepartitionExec sometimes incorrectly reports "Error" when output is not completely consumed
4 participants