
[BUG] test_dpp_via_aggregate_subquery_aqe_off failed with table already exists #4840

Closed
jlowe opened this issue Feb 22, 2022 · 1 comment · Fixed by #4851
Labels
bug Something isn't working test Only impacts tests

Comments


jlowe commented Feb 22, 2022

A nightly integration test run failed with a table already exists exception:

[2022-02-22T21:59:38.650Z] ______________ test_dpp_via_aggregate_subquery_aqe_off[2-parquet] ______________
[2022-02-22T21:59:38.650Z] 
[2022-02-22T21:59:38.650Z] store_format = 'parquet', s_index = 2
[2022-02-22T21:59:38.650Z] spark_tmp_table_factory = <conftest.TmpTableFactory object at 0x7f0151c0a9d0>
[2022-02-22T21:59:38.650Z] 
[2022-02-22T21:59:38.650Z]     @ignore_order
[2022-02-22T21:59:38.650Z]     @pytest.mark.parametrize('store_format', ['parquet', 'orc'], ids=idfn)
[2022-02-22T21:59:38.651Z]     @pytest.mark.parametrize('s_index', list(range(len(_statements))), ids=idfn)
[2022-02-22T21:59:38.651Z]     @pytest.mark.skipif(is_databricks_runtime(), reason="DPP can not cooperate with rapids plugin on Databricks runtime")
[2022-02-22T21:59:38.651Z]     def test_dpp_via_aggregate_subquery_aqe_off(store_format, s_index, spark_tmp_table_factory):
[2022-02-22T21:59:38.651Z] >       __dpp_via_aggregate_subquery(store_format, s_index, spark_tmp_table_factory, 'false')
[2022-02-22T21:59:38.651Z] 
[2022-02-22T21:59:38.651Z] ../../src/main/python/dpp_test.py:227: 
[2022-02-22T21:59:38.651Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
[2022-02-22T21:59:38.651Z] ../../src/main/python/dpp_test.py:212: in __dpp_via_aggregate_subquery
[2022-02-22T21:59:38.651Z]     create_fact_table(fact_table, store_format)
[2022-02-22T21:59:38.651Z] ../../src/main/python/dpp_test.py:55: in create_fact_table
[2022-02-22T21:59:38.651Z]     with_cpu_session(fn)
[2022-02-22T21:59:38.651Z] ../../src/main/python/spark_session.py:92: in with_cpu_session
[2022-02-22T21:59:38.651Z]     return with_spark_session(func, conf=copy)
[2022-02-22T21:59:38.651Z] ../../src/main/python/spark_session.py:76: in with_spark_session
[2022-02-22T21:59:38.651Z]     ret = func(_spark)
[2022-02-22T21:59:38.651Z] ../../src/main/python/dpp_test.py:51: in fn
[2022-02-22T21:59:38.651Z]     df.write.format(table_format) \
[2022-02-22T21:59:38.651Z] /home/jenkins/agent/workspace/jenkins-rapids_it-3.0.x-SNAPSHOT-dev-github-304/jars/spark-3.0.4-SNAPSHOT-bin-hadoop3.2/python/lib/pyspark.zip/pyspark/sql/readwriter.py:871: in saveAsTable
[2022-02-22T21:59:38.651Z]     self._jwrite.saveAsTable(name)
[2022-02-22T21:59:38.651Z] /home/jenkins/agent/workspace/jenkins-rapids_it-3.0.x-SNAPSHOT-dev-github-304/jars/spark-3.0.4-SNAPSHOT-bin-hadoop3.2/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py:1304: in __call__
[2022-02-22T21:59:38.651Z]     return_value = get_return_value(
[2022-02-22T21:59:38.651Z] /home/jenkins/agent/workspace/jenkins-rapids_it-3.0.x-SNAPSHOT-dev-github-304/jars/spark-3.0.4-SNAPSHOT-bin-hadoop3.2/python/lib/pyspark.zip/pyspark/sql/utils.py:134: in deco
[2022-02-22T21:59:38.651Z]     raise_from(converted)
[2022-02-22T21:59:38.651Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
[2022-02-22T21:59:38.651Z] 
[2022-02-22T21:59:38.651Z] e = AnalysisException("Can not create the managed table('`tmp_table_981165_0`'). The associated location('file:/home/jenki...:79)\n\tat py4j.GatewayConnection.run(GatewayConnection.java:238)\n\tat java.lang.Thread.run(Thread.java:750)\n", None)
[2022-02-22T21:59:38.651Z] 
[2022-02-22T21:59:38.651Z] >   ???
[2022-02-22T21:59:38.651Z] E   pyspark.sql.utils.AnalysisException: Can not create the managed table('`tmp_table_981165_0`'). The associated location('file:/home/jenkins/agent/workspace/jenkins-rapids_it-3.0.x-SNAPSHOT-dev-github-304/jars/integration_tests/target/run_dir_dpp_test/spark-warehouse/tmp_table_981165_0') already exists.;

Looks like we need to improve our table name random number generation. Either two threads ended up with the same random number sequence (based on how random.randint is seeded), or we hit the one-in-a-million chance that two threads independently picked the same number. If it's the latter, maybe we need to make it one-in-a-billion.
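Editor's note: a back-of-envelope birthday-bound calculation shows why "one in a million" is not as safe as it sounds when many tests draw names concurrently. This is an illustrative sketch, not code from the repository; the draw count and space size are hypothetical.

```python
def collision_probability(n_draws, space):
    """Probability that at least two of n_draws uniform random draws
    from a space of `space` values collide (birthday problem)."""
    p_no_collision = 1.0
    for i in range(n_draws):
        p_no_collision *= (space - i) / space
    return 1.0 - p_no_collision

# e.g. 100 concurrent tests drawing suffixes from a million-value space
# already gives roughly a 0.5% chance of a collision per run.
print(collision_probability(100, 10**6))
```

With nightly runs, even a fraction-of-a-percent per-run collision rate surfaces eventually, which is consistent with the failure above.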

@jlowe jlowe added bug Something isn't working ? - Needs Triage Need team to review and classify labels Feb 22, 2022

revans2 commented Feb 23, 2022

The concurrency is not per thread; it is per process. So we could use the PID again, like we did with the tmp directory.
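Editor's note: the PID-based approach could be sketched as follows. This is a hypothetical illustration, not the actual conftest.TmpTableFactory code: combining the process ID with a per-process counter makes names collision-proof across concurrent processes, unlike a purely random suffix.

```python
import itertools
import os

# Per-process monotonically increasing counter; itertools.count is a
# simple way to get a fresh integer on each call.
_counter = itertools.count()

def unique_table_name(prefix="tmp_table"):
    # The PID disambiguates between concurrent test processes; the
    # counter disambiguates names within a single process.
    return "{}_{}_{}".format(prefix, os.getpid(), next(_counter))
```

No randomness is involved, so there is no collision probability to reason about at all.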

@jlowe jlowe self-assigned this Feb 23, 2022
@jlowe jlowe added test Only impacts tests and removed ? - Needs Triage Need team to review and classify labels Feb 23, 2022