Aggregate queries produce different results between runs #658
Comments
@alamb @jorgecarleitao @Dandandan FYI, I am going to take a look at this one tomorrow, but let me know if you have any educated guesses as to the root cause. I think I will start by adding more metrics to see the number of output rows from each operator and see if that provides any clues.
We had some problems before where errors during execution caused empty results to be returned, because the errors were ignored elsewhere. This might be something similar: an error happens somewhere during execution, so only part of the data is produced. In that case, the randomness could come from the streams of batches being produced in parallel.
The issue seems to be in HashJoinExec. Here are results from two runs.
I created #664 to add metrics to help with this. I am out of time for today.
@andygrove I am wondering if this is still the case on the latest master. #827 included a fix for an issue that could trigger something like this for big datasets.
@Dandandan This does seem to be resolved now. I ran it half a dozen times just now and got consistent results:
It is also worth noting that this query is now almost 2x as fast as when I filed this issue 🚀 I am closing this issue.
I ran the same query with Apache Spark and the results are consistent.
Describe the bug
I ran TPC-H query 12 several times with DataFusion and got different results each time:
I see the same behavior with Ballista.
To Reproduce
Expected behavior
Results should be the same on each run.
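One way to check this expectation is to compare the rows from two runs as multisets, so that differences in row order alone do not count as a mismatch. The following is only a minimal sketch in plain Python: `same_rows`, `diff_rows`, and the sample rows are hypothetical and are not actual output from DataFusion, Ballista, or Spark.

```python
from collections import Counter


def same_rows(run_a, run_b):
    """Return True if both runs contain the same rows, ignoring row order."""
    return Counter(map(tuple, run_a)) == Counter(map(tuple, run_b))


def diff_rows(run_a, run_b):
    """Return (rows only in run_a, rows only in run_b), each sorted."""
    a = Counter(map(tuple, run_a))
    b = Counter(map(tuple, run_b))
    return sorted((a - b).elements()), sorted((b - a).elements())


# Hypothetical result rows, made up for illustration only.
run_1 = [("MAIL", 10, 20), ("SHIP", 11, 21)]
run_2 = [("SHIP", 11, 21), ("MAIL", 10, 20)]  # same rows, different order
run_3 = [("MAIL", 10, 19), ("SHIP", 11, 21)]  # one aggregate value differs

print(same_rows(run_1, run_2))
print(same_rows(run_1, run_3))
print(diff_rows(run_1, run_3))
```

A multiset comparison (rather than a set) also catches duplicated or dropped rows, which matters if the root cause is partial output from one of the parallel streams.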
Additional context
Physical plan: