TPC-H queries are failing on main branch #1058

Closed
kaushik-pankaj opened this issue Sep 20, 2024 · 5 comments · Fixed by #1060
Labels
bug Something isn't working

Comments

@kaushik-pankaj

kaushik-pankaj commented Sep 20, 2024

Describe the bug
While running the TPC-H queries in distributed mode (ballista-cli pointing to ballista-scheduler, with ballista-scheduler and one ballista-executor running), some queries fail and others pass.
Passed Queries - q1, q3, q4, q5, q6, q11, q12, q13, q16, q17, q19, q20, q21
Failed Queries - q2, q7, q8, q9, q10, q14, q15, q18, q22

The failed queries all give a similar error. For example, here is the one for query 2.

    ballista_scheduler::scheduler_server::query_stage_scheduler] Failed to update 1 task statuses for Executor 167eb7c2-fc0f-4232-a279-47aa0d0f70e7: DataFusionError(Internal("PhysicalExpr Column references column 's_acctbal' at index 9 (zero-based) but input schema only has 9 columns: [\"s_name\", \"s_address\", \"s_nationkey\", \"s_phone\", \"s_acctbal\", \"s_comment\", \"p_partkey\", \"p_mfgr\", \"ps_supplycost\"]"))
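To make the error easier to read, here is a minimal sketch of the kind of bounds check that produces it. The `Column` struct and both helper functions are hypothetical stand-ins written for this illustration, not DataFusion's actual types. The schema is copied from the log above. It shows that `s_acctbal` is present in the schema at index 4, while the expression carries index 9, which is out of range for 9 columns.

```rust
// Toy model of the bounds check behind the error above. `Column`,
// `check_bounds`, and `index_by_name` are hypothetical, not DataFusion APIs.
#[derive(Debug)]
struct Column {
    name: String,
    index: usize,
}

/// Returns an error when the column's stored index does not fit the schema.
fn check_bounds(col: &Column, schema: &[&str]) -> Result<(), String> {
    if col.index >= schema.len() {
        return Err(format!(
            "column '{}' at index {} (zero-based) but input schema only has {} columns",
            col.name,
            col.index,
            schema.len()
        ));
    }
    Ok(())
}

/// Looks the column up by name, to show where it actually sits in the schema.
fn index_by_name(name: &str, schema: &[&str]) -> Option<usize> {
    schema.iter().position(|field| *field == name)
}

fn main() {
    // Schema copied from the error message in this issue.
    let schema = [
        "s_name", "s_address", "s_nationkey", "s_phone", "s_acctbal",
        "s_comment", "p_partkey", "p_mfgr", "ps_supplycost",
    ];
    let stale = Column { name: "s_acctbal".to_string(), index: 9 };

    // The stored index is out of range for a 9-column schema.
    println!("{:?}", check_bounds(&stale, &schema));
    // The name itself is present, at a different position.
    println!("{:?}", index_by_name(&stale.name, &schema));
}
```

The mismatch between the stored index and the position of the name suggests the expression was built against a different schema than the one it was later evaluated on.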

Note: this issue started appearing after commit 3b6964b.

To Reproduce
Steps to reproduce the behavior:

  1. Check out the main branch.
  2. Run cargo build to build the project.
  3. Run the scheduler and executor.
  4. Connect ballista-cli to the scheduler.
  5. Run the TPC-H queries in ballista-cli (https://github.com/apache/datafusion-ballista/tree/main/benchmarks/queries).

Expected behavior
A clear and concise description of what you expected to happen.

Additional context
Everything works with datafusion version 35.0.0. As soon as we upgrade datafusion to version 39.0.0, the TPC-H queries start failing.

@kaushik-pankaj kaushik-pankaj added the bug Something isn't working label Sep 20, 2024
@Dandandan
Contributor

We hit a similar problem with joins in our fork of ballista. We traced it down to apache/datafusion#9236 and the JoinSelection rule when creating stages, which doesn't support projections yet.

@Dandandan
Contributor

Can you confirm it is "solved" by removing the line here:

?

@my-vegetable-has-exploded

my-vegetable-has-exploded commented Sep 20, 2024

See apache/datafusion#12491.
Sorry about that.

@andygrove
Member

Thanks @Dandandan. I can confirm that removing that optimization does resolve the issue.

@andygrove
Member

I also had to register a couple more functions in execution_loop.rs to get all queries working again:

    // TODO which other functions need adding here?
    task_scalar_functions.insert("date_part".to_string(), date_part());
    task_scalar_functions.insert("substr".to_string(), substr());
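As a rough illustration of why registration matters, the sketch below models a name-keyed function map with the standard library only. It assumes, without confirmation from this thread, that a task referencing a function name absent from the map cannot be resolved. `ScalarFn`, `substr_first_two`, and `resolve` are hypothetical names for this example; the real map in execution_loop.rs holds DataFusion function objects rather than plain function pointers.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for a scalar function implementation.
type ScalarFn = fn(&str) -> String;

// Toy function used only to have something to register.
fn substr_first_two(input: &str) -> String {
    input.chars().take(2).collect()
}

/// Resolves a function by name, failing if it was never registered.
fn resolve(registry: &HashMap<String, ScalarFn>, name: &str) -> Result<ScalarFn, String> {
    registry
        .get(name)
        .copied()
        .ok_or_else(|| format!("function '{}' is not registered", name))
}

fn main() {
    let mut task_scalar_functions: HashMap<String, ScalarFn> = HashMap::new();
    task_scalar_functions.insert("substr".to_string(), substr_first_two);

    // Registered: the lookup succeeds and the function can be called.
    let f = resolve(&task_scalar_functions, "substr").unwrap();
    println!("{}", f("13-555"));

    // Not registered: the lookup returns an error instead of a function.
    println!("{:?}", resolve(&task_scalar_functions, "date_part").is_err());
}
```

In this toy model, adding an entry for `date_part` is the only change needed for the second lookup to succeed, which mirrors the shape of the two inserts above.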
