TPC-H queries are failing on main branch #1058
Comments
We hit a similar problem with joins in our fork of Ballista; we traced it down to apache/datafusion#9236 and the
Can you confirm it is "solved" by removing the line here: datafusion-ballista/ballista/scheduler/src/state/execution_graph/execution_stage.rs Line 353 in e39a7e6
Ref: apache/datafusion#12491.
Thanks @Dandandan. I can confirm that removing that optimization does resolve the issue.
I also had to register a couple more functions in
Describe the bug
When running the TPC-H queries in distributed mode (ballista-cli pointing at ballista-scheduler, with one ballista-scheduler and one ballista-executor), some queries pass and some fail.
Passed Queries - q1, q3, q4, q5, q6, q11, q12, q13, q16, q17, q19, q20, q21
Failed Queries - q2, q7, q8, q9, q10, q14, q15, q18, q22
The failed queries all give a similar error. For example, this is the error for query 2:
ballista_scheduler::scheduler_server::query_stage_scheduler] Failed to update 1 task statuses for Executor 167eb7c2-fc0f-4232-a279-47aa0d0f70e7: DataFusionError(Internal("PhysicalExpr Column references column 's_acctbal' at index 9 (zero-based) but input schema only has 9 columns: [\"s_name\", \"s_address\", \"s_nationkey\", \"s_phone\", \"s_acctbal\", \"s_comment\", \"p_partkey\", \"p_mfgr\", \"ps_supplycost\"]"))
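To make the error easier to read, here is a minimal sketch of the kind of bounds check that produces it. It uses only the standard library; `ColumnRef`, `check_bounds`, and `index_by_name` are hypothetical names for illustration, not DataFusion's or Ballista's actual API. It also shows that, by name, `s_acctbal` is present in the reported schema at index 4, so the expression appears to carry a stale positional index rather than a missing column. Whether that is the actual root cause is an assumption.

```rust
// Hypothetical stand-in for a physical column expression: a column name
// plus a positional index into the input schema. Not DataFusion's type.
#[derive(Debug, Clone, PartialEq)]
struct ColumnRef {
    name: String,
    index: usize,
}

/// Returns an error when the column's positional index falls outside the
/// schema, similar in spirit to the check behind the message above.
fn check_bounds(col: &ColumnRef, schema: &[&str]) -> Result<(), String> {
    if col.index < schema.len() {
        Ok(())
    } else {
        Err(format!(
            "Column references column '{}' at index {} (zero-based) but input schema only has {} columns",
            col.name,
            col.index,
            schema.len()
        ))
    }
}

/// Looks the column up by name in the given schema, ignoring its stored index.
fn index_by_name(col: &ColumnRef, schema: &[&str]) -> Option<usize> {
    schema.iter().position(|field| *field == col.name.as_str())
}

fn main() {
    // Schema and column taken from the error message for query 2.
    let schema = [
        "s_name", "s_address", "s_nationkey", "s_phone", "s_acctbal",
        "s_comment", "p_partkey", "p_mfgr", "ps_supplycost",
    ];
    let col = ColumnRef { name: "s_acctbal".to_string(), index: 9 };

    // The stored index (9) is out of range for a 9-column schema.
    if let Err(msg) = check_bounds(&col, &schema) {
        println!("error: {msg}");
    }

    // By name, the column is still present in the schema.
    if let Some(i) = index_by_name(&col, &schema) {
        println!("'{}' is present by name at index {}", col.name, i);
    }
}
```

An expression whose index no longer matches the schema it is evaluated against would fail this way even though the column itself exists, which is consistent with the failures being tied to a plan rewrite rather than to the data.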
Note: this issue started appearing after commit 3b6964b.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
A clear and concise description of what you expected to happen.
Additional context
Everything works with DataFusion 35.0.0. As soon as we upgrade DataFusion to 39.0.0, the TPC-H queries start failing.