A nightly integration test run failed with a "table already exists" exception:
[2022-02-22T21:59:38.650Z] ______________ test_dpp_via_aggregate_subquery_aqe_off[2-parquet] ______________
[2022-02-22T21:59:38.650Z]
[2022-02-22T21:59:38.650Z] store_format = 'parquet', s_index = 2
[2022-02-22T21:59:38.650Z] spark_tmp_table_factory = <conftest.TmpTableFactory object at 0x7f0151c0a9d0>
[2022-02-22T21:59:38.650Z]
[2022-02-22T21:59:38.650Z] @ignore_order
[2022-02-22T21:59:38.650Z] @pytest.mark.parametrize('store_format', ['parquet', 'orc'], ids=idfn)
[2022-02-22T21:59:38.651Z] @pytest.mark.parametrize('s_index', list(range(len(_statements))), ids=idfn)
[2022-02-22T21:59:38.651Z] @pytest.mark.skipif(is_databricks_runtime(), reason="DPP can not cooperate with rapids plugin on Databricks runtime")
[2022-02-22T21:59:38.651Z] def test_dpp_via_aggregate_subquery_aqe_off(store_format, s_index, spark_tmp_table_factory):
[2022-02-22T21:59:38.651Z] > __dpp_via_aggregate_subquery(store_format, s_index, spark_tmp_table_factory, 'false')
[2022-02-22T21:59:38.651Z]
[2022-02-22T21:59:38.651Z] ../../src/main/python/dpp_test.py:227:
[2022-02-22T21:59:38.651Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
[2022-02-22T21:59:38.651Z] ../../src/main/python/dpp_test.py:212: in __dpp_via_aggregate_subquery
[2022-02-22T21:59:38.651Z] create_fact_table(fact_table, store_format)
[2022-02-22T21:59:38.651Z] ../../src/main/python/dpp_test.py:55: in create_fact_table
[2022-02-22T21:59:38.651Z] with_cpu_session(fn)
[2022-02-22T21:59:38.651Z] ../../src/main/python/spark_session.py:92: in with_cpu_session
[2022-02-22T21:59:38.651Z] return with_spark_session(func, conf=copy)
[2022-02-22T21:59:38.651Z] ../../src/main/python/spark_session.py:76: in with_spark_session
[2022-02-22T21:59:38.651Z] ret = func(_spark)
[2022-02-22T21:59:38.651Z] ../../src/main/python/dpp_test.py:51: in fn
[2022-02-22T21:59:38.651Z] df.write.format(table_format) \
[2022-02-22T21:59:38.651Z] /home/jenkins/agent/workspace/jenkins-rapids_it-3.0.x-SNAPSHOT-dev-github-304/jars/spark-3.0.4-SNAPSHOT-bin-hadoop3.2/python/lib/pyspark.zip/pyspark/sql/readwriter.py:871: in saveAsTable
[2022-02-22T21:59:38.651Z] self._jwrite.saveAsTable(name)
[2022-02-22T21:59:38.651Z] /home/jenkins/agent/workspace/jenkins-rapids_it-3.0.x-SNAPSHOT-dev-github-304/jars/spark-3.0.4-SNAPSHOT-bin-hadoop3.2/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py:1304: in __call__
[2022-02-22T21:59:38.651Z] return_value = get_return_value(
[2022-02-22T21:59:38.651Z] /home/jenkins/agent/workspace/jenkins-rapids_it-3.0.x-SNAPSHOT-dev-github-304/jars/spark-3.0.4-SNAPSHOT-bin-hadoop3.2/python/lib/pyspark.zip/pyspark/sql/utils.py:134: in deco
[2022-02-22T21:59:38.651Z] raise_from(converted)
[2022-02-22T21:59:38.651Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
[2022-02-22T21:59:38.651Z]
[2022-02-22T21:59:38.651Z] e = AnalysisException("Can not create the managed table('`tmp_table_981165_0`'). The associated location('file:/home/jenki...:79)\n\tat py4j.GatewayConnection.run(GatewayConnection.java:238)\n\tat java.lang.Thread.run(Thread.java:750)\n", None)
[2022-02-22T21:59:38.651Z]
[2022-02-22T21:59:38.651Z] > ???
[2022-02-22T21:59:38.651Z] E pyspark.sql.utils.AnalysisException: Can not create the managed table('`tmp_table_981165_0`'). The associated location('file:/home/jenkins/agent/workspace/jenkins-rapids_it-3.0.x-SNAPSHOT-dev-github-304/jars/integration_tests/target/run_dir_dpp_test/spark-warehouse/tmp_table_981165_0') already exists.;
Looks like we need to improve our table name random number generation. Either two threads can end up with the same random number sequence, depending on how `random.randint` is seeded, or we hit the one-in-a-million chance that two threads picked the same number. If it's the latter, maybe we need to make it one-in-a-billion.
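As a sketch of the direction suggested above (not the project's actual `TmpTableFactory` implementation, and the class/method names here are hypothetical), one way to make collisions practically impossible is to stop relying on the random number alone: combine the process id, a per-factory monotonic counter, and a much wider random suffix. The pid separates concurrent test processes, the counter separates threads sharing a factory, and the 64-bit suffix guards against name reuse across runs that share a warehouse directory.

```python
import os
import random
import threading


class TmpTableFactory:
    """Sketch of a collision-resistant temp-table name factory.

    Even if two threads' RNGs were seeded identically, their names
    still differ because of the pid/counter components.
    """

    def __init__(self, base="tmp_table"):
        self.base = base
        self.counter = 0
        self.lock = threading.Lock()

    def get(self):
        # The lock makes the counter increment safe across threads.
        with self.lock:
            seq = self.counter
            self.counter += 1
        return "{}_{}_{}_{}".format(
            self.base, os.getpid(), seq, random.getrandbits(64))
```

Two calls on the same factory always yield distinct names (the counter differs), so the random suffix is only a last line of defense rather than the sole source of uniqueness.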