
Remove the hasNans config and no_nan data gen in test code. #6502

Closed

Conversation

@HaoYang670 (Collaborator) commented Sep 5, 2022:

Signed-off-by: remzi <13716567376yh@gmail.com>
Closes #6487.

@HaoYang670 HaoYang670 self-assigned this Sep 5, 2022
@HaoYang670 HaoYang670 added the "test (Only impacts tests)" label Sep 5, 2022
@HaoYang670 (Collaborator, Author) commented:
build

Review comment on integration_tests/src/main/python/hash_aggregate_test.py (outdated, resolved):
```diff
@@ -954,7 +891,7 @@ def test_hash_multiple_mode_query_avg_distincts(data_gen, conf):
 @approximate_float
 @ignore_order
 @incompat
-@pytest.mark.parametrize('data_gen', _init_list_no_nans, ids=idfn)
+@pytest.mark.parametrize('data_gen', _init_list, ids=idfn)
```
@jlowe (Member) commented:

Note that just because we're removing the hasNans config does not mean we want to avoid testing values without NaNs. One concern with allowing NaNs into the mix is that it can cause some operations to effectively always return NaN because the chance of generating at least one NaN in the input could be high, and summing/averaging with a NaN produces a NaN. If too many NaNs are generated then we're not really testing whether we're getting the non-NaN case right because the NaNs "eclipse" the results.
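The eclipsing behavior described here follows directly from IEEE 754 NaN propagation; a minimal standalone Python sketch (not code from this PR) shows how a single NaN swamps a sum or average:

```python
import math

# A single NaN in the input forces both the sum and the mean to NaN,
# so the non-NaN aggregation path is never actually exercised.
values = [1.0, 2.0, float("nan"), 4.0]

total = sum(values)         # NaN propagates through addition
mean = total / len(values)  # ...and therefore through the average

print(math.isnan(total))  # True
print(math.isnan(mean))   # True
```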

If we're convinced that NaN-eclipsing isn't going to be a problem then we're fine, but for a full reduction on a column it seems likely at least one NaN would be generated. Ideally we want to test both cases, with and without NaNs, to make sure we're doing both correctly.

Co-authored-by: Jason Lowe <jlowe@nvidia.com>
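One way to cover both cases, as suggested above, is to parametrize the test over a NaN-free generator and a NaN-permitting generator. The following is a hypothetical sketch — `make_float_gen` and `test_sum_matches_reference` are placeholder names, not the plugin's real helpers:

```python
import math
import random
import pytest

def make_float_gen(allow_nans):
    """Return a simple data generator; seeds the RNG for reproducibility."""
    rng = random.Random(0)
    def gen(n):
        out = [rng.uniform(-100.0, 100.0) for _ in range(n)]
        if allow_nans:
            # Guarantee at least one NaN in the NaN-permitting case.
            out[rng.randrange(n)] = float("nan")
        return out
    return gen

@pytest.mark.parametrize(
    "data_gen",
    [make_float_gen(False), make_float_gen(True)],
    ids=["no_nans", "with_nans"],
)
def test_sum_matches_reference(data_gen):
    data = data_gen(100)
    total = sum(data)
    # With NaNs present the expected result is NaN; without, it is finite.
    if any(math.isnan(x) for x in data):
        assert math.isnan(total)
    else:
        assert math.isfinite(total)
```

Running both `ids` variants exercises the NaN-propagation path and the finite-arithmetic path separately, which is the coverage the comment above asks for.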
@HaoYang670 (Collaborator, Author) commented Sep 14, 2022:

Hi @jlowe, thank you for the explanation.
I will close this PR and just clean up the hasNans config in the test code (I will do this in #6512).

@HaoYang670 HaoYang670 closed this Sep 14, 2022
@HaoYang670 HaoYang670 deleted the 6487_clean_nan_tests branch September 15, 2022 01:17
Labels
test Only impacts tests
Successfully merging this pull request may close these issues.

[FEA] Remove the hasNans config from the test code.
2 participants