[TPCH][VL] tpch has some query execution error logs but queries could finish and the result is correct #1090
Comments
I encountered the same problem before, which seems to be caused by constructing |
Hi @deshanxiao, we have TPCH jobs (Spark 3.2.2) running nightly and have not run into these issues. Based on the logs, it seems you are testing with Spark 3.2.4-SNAPSHOT, which is not released yet. Thanks, -yuan |
Thank you for your reply. Yes, I tested it with https://github.com/apache/spark/tree/branch-3.2, without any modifications to Spark. Let me rebuild with Spark 3.2.2 and test it again. Thank you @zhouyuan |
Hi @zhouyuan, did you run those TPCH jobs with parquet? From the above, this issue should be related to the native parquet reader of velox. |
Yes, the issue is related to filter pushdown as you suggested, and those two lines of code have not been touched for some time. The nightly jobs run with the parquet datasource as described here: https://github.com/oap-project/gluten/tree/main/backends-velox/workload/tpch The GitHub Actions CI also runs checks with small TPCH/TPCDS datasets: https://github.com/oap-project/gluten/actions/runs/4362916625/jobs/7628376329 So the issue here is a bit surprising to me. I'm trying to build a new Spark to reproduce it (our tests use Spark 3.2.2 and 3.2.1; something may have changed in the 3.2 branch). Thanks, -yuan |
Hi @zhouyuan, I clicked on these tabs and see: |
Thanks for pointing out the logs, I get your point now. So the issue is: the TPCH queries report exceptions during the run, but the queries finish and the results are correct. Initially I misunderstood the issue; I thought you meant "the query will fail due to the exceptions".
and in some validation functions, Velox will dump out some error logs, but these should be handled properly in the following catch block in Gluten. Thanks, -yuan |
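The pattern described in this comment (a validation step that logs an error and throws, with the caller catching the exception so the query still completes) can be illustrated with a minimal sketch. This is not Gluten or Velox code: `native_validate`, `is_supported`, `evaluate`, and `ValidationError` are hypothetical names, and the "unsupported" check is an assumption made purely for illustration.

```python
import logging

logger = logging.getLogger("validation_sketch")


class ValidationError(Exception):
    """Hypothetical stand-in for an error raised by a native validation function."""


def native_validate(expr):
    # Hypothetical check: log an error line and raise for an unsupported expression.
    if expr.startswith("unsupported"):
        logger.error("validation failed for expression: %s", expr)
        raise ValidationError(expr)


def is_supported(expr):
    # The caller catches the error, so the failure never reaches the query itself.
    try:
        native_validate(expr)
        return True
    except ValidationError:
        return False


def evaluate(expr, rows):
    # Both paths compute the same result; only the chosen execution path differs.
    path = "native" if is_supported(expr) else "alternate"
    return path, [r for r in rows if r > 0]
```

In this sketch an error line still appears in the log when validation fails, yet `evaluate` returns the same rows on either path, which mirrors the behaviour reported in this issue: error messages in the executor logs, correct query results.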
Got it, so these errors are expected, right? In fact, from the log I uploaded, the query does get the result eventually. Please forgive my imprecise wording. |
@deshanxiao no worries. Yes, the error messages are expected. Thanks for pointing this out. Usually we only verify the results and ignore the "error messages" shown in the executor logs. I can add an explanation to the doc first; a better fix would be improving the try/catch code to make the messages clearer. |
@zhouyuan Thanks for your explanation, and sorry for the late response. Previously I used an old commit of Gluten, and this Velox issue caused a core dump (which confused me for a while). Now I have tried the latest commit, and everything works fine. I'll follow the community more closely! |
BTW, I've seen that the validation & fallback mechanism ensures the query is executed properly, but for this issue, it will still cause an unnecessary fallback of the FileScan operator, am I right? |
Not really. For this issue, the filter like |
Closing this issue. Please re-open it if needed. |
Describe the bug
Errors are logged when executing TPCH queries q4, q12, q13, q16, and q21. The detailed error log is attached below:
error log.txt
To Reproduce
Following the doc: https://github.com/oap-project/gluten/tree/main/backends-velox/workload/tpch
···
bash tpch_parquet.sh
···
Spark version: 3.2.4
Expected behavior
No error log
Additional context