Query65 is producing indeterministic results #35

Open
wjxiz1992 opened this issue Jun 27, 2022 · 3 comments

Comments

@wjxiz1992
Collaborator

Both CPU and GPU runs of query65 produce nondeterministic results; see #7 (comment) for details. We need to track this and work out how to fix it.

@abellina
Collaborator

@wjxiz1992 any progress on this or do you need someone else to take a look?

@wjxiz1992
Collaborator Author

I haven't had time to look into this yet. I'll spend some time on it today and post some basic information here.

@wjxiz1992
Collaborator Author

wjxiz1992 commented Jul 18, 2022

@abellina
In addition to #7 (comment), I ran the following test:

  1. Filter out the null rows from the item and store tables:
...
df.filter("i_item_desc is not null")
...
df.filter("s_store_name is not null")
...
# then save them

so that the ORDER BY will not see any null values.

  2. Re-run query65 to get new outputs.
  3. Compare the outputs of the two runs.
  4. The results still do not match:
Row 0:
['able', 'A', Decimal('0.00'), Decimal('9.55'), Decimal('1.74'), 'edu packnameless #2']
['able', 'A', Decimal('0.55'), Decimal('2.57'), Decimal('1.64'), 'edu packnameless #10']

Row 1:
['able', 'A', Decimal('0.38'), Decimal('8.62'), Decimal('1.84'), 'importoscholar #2']
['able', 'A', Decimal('0.57'), Decimal('1.47'), Decimal('7.31'), 'exportiexporti #2']

Row 2:
['able', 'A', Decimal('0.63'), Decimal('1.89'), Decimal('1.33'), 'namelessunivamalg #4']
['able', 'A', Decimal('0.86'), Decimal('4.62'), Decimal('3.14'), 'exportiedu pack #2']

Row 3:
['able', 'A', Decimal('0.71'), Decimal('3.78'), Decimal('1.89'), 'importoexporti #2']
['able', 'A', Decimal('1.11'), Decimal('8.78'), Decimal('6.76'), 'amalgedu pack #2']

Row 4:
['able', 'A', Decimal('0.74'), Decimal('1.47'), Decimal('7.31'), 'exportiexporti #2']
['able', 'A', Decimal('1.54'), Decimal('13.66'), Decimal('50.29'), 'importoamalg #2']

Row 5:
['able', 'A', Decimal('0.86'), Decimal('4.62'), Decimal('3.14'), 'exportiedu pack #2']
['able', 'A', Decimal('1.75'), Decimal('2.54'), Decimal('0.28'), 'amalgedu pack #2']

Row 6:
['able', 'A', Decimal('1.46'), Decimal('3.78'), Decimal('1.89'), 'importoexporti #2']
['able', 'A', Decimal('2.22'), Decimal('8.78'), Decimal('6.76'), 'amalgedu pack #2']

Row 7:
['able', 'A', Decimal('1.54'), Decimal('13.66'), Decimal('50.29'), 'importoamalg #2']
['able', 'A', Decimal('2.64'), Decimal('61.36'), Decimal('32.52'), 'importoimporto #2']

Row 8:
['able', 'A', Decimal('1.59'), Decimal('2.57'), Decimal('1.64'), 'edu packnameless #10']
['able', 'A', Decimal('2.84'), Decimal('9.61'), Decimal('1.13'), 'importonameless #2']

Row 9:
['able', 'A', Decimal('1.65'), Decimal('3.52'), Decimal('2.78'), 'edu packnameless #6']
['able', 'A', Decimal('3.01'), Decimal('9.61'), Decimal('1.13'), 'importonameless #2']
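
For reference, a row-by-row comparison like the one above can be complemented by an order-insensitive one, which helps separate "same rows, different order" from "different rows". This is only a minimal sketch, not the script used for the output above: the function name `compare_runs` and the sample rows are hypothetical, and it assumes each run's output has already been collected into a list of rows.

```python
from collections import Counter
from decimal import Decimal


def compare_runs(run_a, run_b):
    """Compare two query outputs (hypothetical helper, not from the benchmark code).

    Returns (positional, only_in_a, only_in_b):
      positional - indices where the rows at the same position differ
      only_in_a  - rows that appear in run_a more often than in run_b
      only_in_b  - rows that appear in run_b more often than in run_a
    """
    # Positional check; zip stops at the shorter run, so compare lengths separately.
    positional = [i for i, (a, b) in enumerate(zip(run_a, run_b)) if tuple(a) != tuple(b)]
    # Multiset check: ignores row order but still counts duplicate rows.
    count_a = Counter(tuple(r) for r in run_a)
    count_b = Counter(tuple(r) for r in run_b)
    only_in_a = sorted((count_a - count_b).elements())
    only_in_b = sorted((count_b - count_a).elements())
    return positional, only_in_a, only_in_b


# Hypothetical sample rows, shortened to three columns for readability.
run_a = [("able", "A", Decimal("0.10")), ("able", "A", Decimal("0.20"))]
run_b = [("able", "A", Decimal("0.20")), ("able", "A", Decimal("0.30"))]

positional, only_in_a, only_in_b = compare_runs(run_a, run_b)
print("positions that differ:", positional)
print("rows only in run A:", only_in_a)
print("rows only in run B:", only_in_b)
```

If `only_in_a` and `only_in_b` both come back empty while `positional` does not, the two runs returned the same rows in a different order; non-empty lists mean the runs returned different rows.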

This happens on CPU runs as well, which makes me think the SQL itself is producing nondeterministic results.
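
One way the SQL itself could behave like this, offered as an assumption rather than a confirmed diagnosis: every row in the output shares the same first two columns ('able', 'A'). If the query sorts only on those two columns and then applies a LIMIT, the rows that survive the LIMIT depend on the order in which tied rows reach the sort. The sketch below imitates that in plain Python; `top_n`, the column layout, and the sample rows are all hypothetical, and Python's stable sort stands in for whatever order the engine happens to deliver tied rows in.

```python
from decimal import Decimal


def top_n(rows, key_columns, n):
    """Sort rows on the given column indices and keep the first n rows,
    mimicking ORDER BY <key_columns> LIMIT n (hypothetical helper)."""
    return sorted(rows, key=lambda r: tuple(r[c] for c in key_columns))[:n]


# Hypothetical rows: (store name, item description, some numeric value).
# All rows tie on the first two columns.
rows = [
    ("able", "A", Decimal("0.55")),
    ("able", "A", Decimal("0.00")),
    ("able", "A", Decimal("0.38")),
]

# Two different arrival orders, standing in for two runs of the same query.
arrival_1 = list(rows)
arrival_2 = list(reversed(rows))

# Sorting on the tied columns only: the surviving rows depend on arrival order.
ties_only_1 = top_n(arrival_1, (0, 1), 2)
ties_only_2 = top_n(arrival_2, (0, 1), 2)
print("sort on tied columns, runs match:", ties_only_1 == ties_only_2)

# Adding the remaining column as a tiebreaker makes the result independent of arrival order.
full_key_1 = top_n(arrival_1, (0, 1, 2), 2)
full_key_2 = top_n(arrival_2, (0, 1, 2), 2)
print("sort on all columns, runs match:", full_key_1 == full_key_2)
```

If this assumption holds for query65, a mismatch between runs would be expected on both CPU and GPU, and validation would need either a fully ordered result or an order-insensitive comparison that tolerates ties at the LIMIT boundary.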
