[BUG] test_sortmerge_join_struct_mixed_key_with_null_filter LeftSemi/LeftAnti fails #3429

Closed
jlowe opened this issue Sep 9, 2021 · 0 comments · Fixed by #3431
Labels: bug (Something isn't working), P0 (Must have for release)

jlowe commented Sep 9, 2021

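test_sortmerge_join_struct_mixed_key_with_null_filter is failing for the LeftSemi and LeftAnti join types. In both cases the GPU returns a different number of rows than the CPU (LeftSemi: CPU 450, GPU 500; LeftAnti: CPU 50, GPU 0):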

LeftSemi:

14:53:32  _ test_sortmerge_join_struct_mixed_key_with_null_filter[LeftSemi-Struct(['child0', String(not_null)],['child1', Byte(not_null)],['child2', Short(not_null)],['child3', Integer(not_null)],['child4', Long(not_null)],['child5', Boolean(not_null)],['child6', Date(not_null)],['child7', Timestamp(not_null)])] _
14:53:32  [gw2] linux -- Python 3.8.11 /usr/bin/python
14:53:32  
14:53:32  data_gen = Struct(['child0', String(not_null)],['child1', Byte(not_null)],['child2', Short(not_null)],['child3', Integer(not_null)],['child4', Long(not_null)],['child5', Boolean(not_null)],['child6', Date(not_null)],['child7', Timestamp(not_null)])
14:53:32  join_type = 'LeftSemi'
14:53:32  
14:53:32      @ignore_order(local=True)
14:53:32      @pytest.mark.parametrize('data_gen', struct_gens, ids=idfn)
14:53:32      @pytest.mark.parametrize('join_type', ['Inner', 'Left', 'Right', 'Cross', 'LeftSemi', 'LeftAnti'], ids=idfn)
14:53:32      def test_sortmerge_join_struct_mixed_key_with_null_filter(data_gen, join_type):
14:53:32          def do_join(spark):
14:53:32              left = two_col_df(spark, data_gen, int_gen, length=500)
14:53:32              right = two_col_df(spark, data_gen, int_gen, length=500)
14:53:32              return left.join(right, (left.a == right.a) & (left.b == right.b), join_type)
14:53:32          # Disable constraintPropagation to test null filter on built table with nullable structures.
14:53:32          conf = {'spark.sql.constraintPropagation.enabled': 'false', **_sortmerge_join_conf}
14:53:32  >       assert_gpu_and_cpu_are_equal_collect(do_join, conf=conf)
14:53:32  
14:53:32  ../../src/main/python/join_test.py:623: 
14:53:32  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
14:53:32  ../../src/main/python/asserts.py:440: in assert_gpu_and_cpu_are_equal_collect
14:53:32      _assert_gpu_and_cpu_are_equal(func, 'COLLECT', conf=conf, is_cpu_first=is_cpu_first)
14:53:32  ../../src/main/python/asserts.py:432: in _assert_gpu_and_cpu_are_equal
14:53:32      assert_equal(from_cpu, from_gpu)
14:53:32  ../../src/main/python/asserts.py:101: in assert_equal
14:53:32      _assert_equal(cpu, gpu, float_check=get_float_check(), path=[])
14:53:32  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
14:53:32  
14:53:32  cpu = [Row(a=Row(child0='\x00\x08y®\x96\x0269', child1=-70, child2=-32768, child3=-1322782629, child4=-1690857710608059544, ...alse, child6=datetime.date(2735, 1, 11), child7=datetime.datetime(319, 4, 24, 18, 37, 44, 718000)), b=-504146606), ...]
14:53:32  gpu = [Row(a=None, b=None), Row(a=None, b=-1938242823), Row(a=None, b=-1902567188), Row(a=None, b=-1839266743), Row(a=None, b=-1771431272), Row(a=None, b=-1709949039), ...]
14:53:32  float_check = <function get_float_check.<locals>.<lambda> at 0x7f06ed78e940>
14:53:32  path = []
14:53:32  
14:53:32      def _assert_equal(cpu, gpu, float_check, path):
14:53:32          t = type(cpu)
14:53:32          if (t is Row):
14:53:32              assert len(cpu) == len(gpu), "CPU and GPU row have different lengths at {} CPU: {} GPU: {}".format(path, len(cpu), len(gpu))
14:53:32              if hasattr(cpu, "__fields__") and hasattr(gpu, "__fields__"):
14:53:32                  assert cpu.__fields__ == gpu.__fields__, "CPU and GPU row have different fields at {} CPU: {} GPU: {}".format(path, cpu.__fields__, gpu.__fields__)
14:53:32                  for field in cpu.__fields__:
14:53:32                      _assert_equal(cpu[field], gpu[field], float_check, path + [field])
14:53:32              else:
14:53:32                  for index in range(len(cpu)):
14:53:32                      _assert_equal(cpu[index], gpu[index], float_check, path + [index])
14:53:32          elif (t is list):
14:53:32  >           assert len(cpu) == len(gpu), "CPU and GPU list have different lengths at {} CPU: {} GPU: {}".format(path, len(cpu), len(gpu))
14:53:32  E           AssertionError: CPU and GPU list have different lengths at [] CPU: 450 GPU: 500
14:53:32  
14:53:32  ../../src/main/python/asserts.py:40: AssertionError

LeftAnti:

14:53:32  _ test_sortmerge_join_struct_mixed_key_with_null_filter[LeftAnti-Struct(['child0', String(not_null)],['child1', Byte(not_null)],['child2', Short(not_null)],['child3', Integer(not_null)],['child4', Long(not_null)],['child5', Boolean(not_null)],['child6', Date(not_null)],['child7', Timestamp(not_null)])] _
14:53:32  [gw2] linux -- Python 3.8.11 /usr/bin/python
14:53:32  
14:53:32  data_gen = Struct(['child0', String(not_null)],['child1', Byte(not_null)],['child2', Short(not_null)],['child3', Integer(not_null)],['child4', Long(not_null)],['child5', Boolean(not_null)],['child6', Date(not_null)],['child7', Timestamp(not_null)])
14:53:32  join_type = 'LeftAnti'
14:53:32  
14:53:32      @ignore_order(local=True)
14:53:32      @pytest.mark.parametrize('data_gen', struct_gens, ids=idfn)
14:53:32      @pytest.mark.parametrize('join_type', ['Inner', 'Left', 'Right', 'Cross', 'LeftSemi', 'LeftAnti'], ids=idfn)
14:53:32      def test_sortmerge_join_struct_mixed_key_with_null_filter(data_gen, join_type):
14:53:32          def do_join(spark):
14:53:32              left = two_col_df(spark, data_gen, int_gen, length=500)
14:53:32              right = two_col_df(spark, data_gen, int_gen, length=500)
14:53:32              return left.join(right, (left.a == right.a) & (left.b == right.b), join_type)
14:53:32          # Disable constraintPropagation to test null filter on built table with nullable structures.
14:53:32          conf = {'spark.sql.constraintPropagation.enabled': 'false', **_sortmerge_join_conf}
14:53:32  >       assert_gpu_and_cpu_are_equal_collect(do_join, conf=conf)
14:53:32  
14:53:32  ../../src/main/python/join_test.py:623: 
14:53:32  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
14:53:32  ../../src/main/python/asserts.py:440: in assert_gpu_and_cpu_are_equal_collect
14:53:32      _assert_gpu_and_cpu_are_equal(func, 'COLLECT', conf=conf, is_cpu_first=is_cpu_first)
14:53:32  ../../src/main/python/asserts.py:432: in _assert_gpu_and_cpu_are_equal
14:53:32      assert_equal(from_cpu, from_gpu)
14:53:32  ../../src/main/python/asserts.py:101: in assert_equal
14:53:32      _assert_equal(cpu, gpu, float_check=get_float_check(), path=[])
14:53:32  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
14:53:32  
14:53:32  cpu = [Row(a=None, b=None), Row(a=None, b=-1938242823), Row(a=None, b=-1902567188), Row(a=None, b=-1839266743), Row(a=None, b=-1771431272), Row(a=None, b=-1709949039), ...]
14:53:32  gpu = []
14:53:32  float_check = <function get_float_check.<locals>.<lambda> at 0x7f06ef182b80>
14:53:32  path = []
14:53:32  
14:53:32      def _assert_equal(cpu, gpu, float_check, path):
14:53:32          t = type(cpu)
14:53:32          if (t is Row):
14:53:32              assert len(cpu) == len(gpu), "CPU and GPU row have different lengths at {} CPU: {} GPU: {}".format(path, len(cpu), len(gpu))
14:53:32              if hasattr(cpu, "__fields__") and hasattr(gpu, "__fields__"):
14:53:32                  assert cpu.__fields__ == gpu.__fields__, "CPU and GPU row have different fields at {} CPU: {} GPU: {}".format(path, cpu.__fields__, gpu.__fields__)
14:53:32                  for field in cpu.__fields__:
14:53:32                      _assert_equal(cpu[field], gpu[field], float_check, path + [field])
14:53:32              else:
14:53:32                  for index in range(len(cpu)):
14:53:32                      _assert_equal(cpu[index], gpu[index], float_check, path + [index])
14:53:32          elif (t is list):
14:53:32  >           assert len(cpu) == len(gpu), "CPU and GPU list have different lengths at {} CPU: {} GPU: {}".format(path, len(cpu), len(gpu))
14:53:32  E           AssertionError: CPU and GPU list have different lengths at [] CPU: 50 GPU: 0
14:53:32  
14:53:32  ../../src/main/python/asserts.py:40: AssertionError
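
For reference, here is a minimal pure-Python sketch of the row sets a left semi and left anti join should produce when the join condition uses `==` under SQL null semantics, where a null key never compares equal to anything. It is not the plugin's or Spark's implementation; the helper names `keys_match`, `left_semi`, and `left_anti` are hypothetical, and tuples stand in for the struct column `a`.

```python
from typing import List, Optional, Tuple

# A row is (a, b): `a` stands in for the struct key (a tuple, or None when the
# struct itself is null) and `b` for the nullable int key.
Row = Tuple[Optional[tuple], Optional[int]]


def keys_match(left: Row, right: Row) -> bool:
    """SQL-style equality on every key column: a null on either side never matches."""
    return all(
        l is not None and r is not None and l == r
        for l, r in zip(left, right)
    )


def left_semi(left: List[Row], right: List[Row]) -> List[Row]:
    """Keep left rows that have at least one match on the right."""
    return [l for l in left if any(keys_match(l, r) for r in right)]


def left_anti(left: List[Row], right: List[Row]) -> List[Row]:
    """Keep left rows that have no match on the right (includes null-key rows)."""
    return [l for l in left if not any(keys_match(l, r) for r in right)]


# Hypothetical data: one fully non-null row and three rows with a null key.
rows: List[Row] = [(("s", 1), 10), (None, None), (None, -5), (("t", 2), None)]
semi = left_semi(rows, rows)
anti = left_anti(rows, rows)
```

Under this model, rows such as `Row(a=None, b=-1938242823)` belong in the LeftAnti result and never in the LeftSemi result, which is what the CPU output above shows. The GPU output has those same rows in LeftSemi and nothing in LeftAnti.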

jlowe added the bug, ? - Needs Triage, and P0 labels on Sep 9, 2021
jlowe self-assigned this on Sep 9, 2021
jlowe linked a pull request on Sep 9, 2021 that will close this issue
Salonijain27 removed the ? - Needs Triage label on Sep 14, 2021