Import the right pandas from conda #4419

NvTimLiu · 2021-12-22T09:04:21Z

Spark v3.2.0 and later ship a pandas package in the binary distribution at python/pyspark/pandas.
The udf-cudf test includes this directory via PYTHONPATH, which causes the test failure below:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/jars/spark-3.2.0-bin-hadoop3.2/python/pyspark/pandas/__init__.py", line 31, in <module>
    require_minimum_pandas_version()
  File "/jars/spark-3.2.0-bin-hadoop3.2/python/pyspark/sql/pandas/utils.py", line 35, in require_minimum_pandas_version
    if LooseVersion(pandas.__version__) < LooseVersion(minimum_pandas_version):
AttributeError: partially initialized module 'pandas' has no attribute '__version__' (most likely due to a circular import)
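
The failure mode in the traceback can be reproduced in isolation. The following is a minimal sketch, not the actual pyspark code: it assumes a hypothetical package named `demo_pandas` whose `__init__.py` imports its own name and reads `__version__` before it is defined, which is roughly what happens when `import pandas` resolves to a shadowing package instead of the real library.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for a shadowing package: while it is still being
# initialized, it imports its own top-level name and reads an attribute
# that has not been defined yet.
SHADOWING_INIT = (
    "import demo_pandas  # resolves to this half-initialized package\n"
    "demo_pandas.__version__  # attribute does not exist yet\n"
)


def import_shadowing_package() -> str:
    """Import the self-referencing package and return the resulting error text."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / "demo_pandas"
        pkg.mkdir()
        (pkg / "__init__.py").write_text(SHADOWING_INIT)

        saved_path = list(sys.path)
        sys.path.insert(0, tmp)  # the shadowing directory is searched first
        importlib.invalidate_caches()
        try:
            importlib.import_module("demo_pandas")
        except AttributeError as exc:
            return f"{type(exc).__name__}: {exc}"
        finally:
            # Restore interpreter state so the demo has no side effects.
            sys.path[:] = saved_path
            sys.modules.pop("demo_pandas", None)
    return "import succeeded"


error_text = import_shadowing_package()
print(error_text)
```

The exact wording of the message may vary between Python versions, but the error is an AttributeError on `__version__`, as in the traceback above.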

To fix this, put the conda package path ahead of the existing PYTHONPATH, so that pandas is
imported from conda rather than from the Spark 3.2.0+ binary path.

Signed-off-by: Tim Liu <timl@nvidia.com>
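
Why the ordering matters can be sketched as follows. This is an illustration only, not the change made in jenkins/spark-tests.sh: the package name `demo_pandas` and the directory names `spark_python` and `conda_site_packages` are hypothetical stand-ins. Python searches the entries of `sys.path` (which PYTHONPATH feeds into) in order, so whichever directory comes first supplies the module.

```python
import importlib
import sys
import tempfile
from pathlib import Path
from typing import List


def make_package(root: Path, name: str, origin: str) -> Path:
    """Create root/name/__init__.py that records which directory it lives in."""
    pkg = root / name
    pkg.mkdir(parents=True)
    (pkg / "__init__.py").write_text(f"ORIGIN = {origin!r}\n")
    return root


def import_with_path_order(name: str, search_dirs: List[str]) -> str:
    """Import `name` with search_dirs placed first on sys.path; return its ORIGIN."""
    saved_path = list(sys.path)
    sys.modules.pop(name, None)  # force a fresh import
    importlib.invalidate_caches()
    try:
        sys.path[:0] = search_dirs
        return importlib.import_module(name).ORIGIN
    finally:
        # Restore interpreter state so each call starts clean.
        sys.path[:] = saved_path
        sys.modules.pop(name, None)


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-ins for the Spark binary dir and the conda package dir.
    spark_dir = make_package(Path(tmp) / "spark_python", "demo_pandas", "spark")
    conda_dir = make_package(Path(tmp) / "conda_site_packages", "demo_pandas", "conda")

    # Spark directory first: the shadowing package wins.
    before_fix = import_with_path_order("demo_pandas", [str(spark_dir), str(conda_dir)])
    # conda directory first: the intended package wins.
    after_fix = import_with_path_order("demo_pandas", [str(conda_dir), str(spark_dir)])

print(before_fix, after_fix)
```

With the Spark directory first the import resolves to the shadowing package; with the conda directory first it resolves to the intended one, which is the effect the fix relies on.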

NvTimLiu · 2021-12-22T09:04:39Z

build

NvTimLiu · 2021-12-22T09:32:57Z

To fix #4378

jenkins/spark-tests.sh

Signed-off-by: Tim Liu <timl@nvidia.com>

jlowe · 2021-12-22T16:03:58Z

build

GaryShen2008 · 2021-12-23T04:55:14Z

build

Based on PR NVIDIA#4419: expose the PYTHONPATH environment variable, because pytest checks all dependencies, including pandas, during the test collection stage, even when the cudf-udf tests are not being run. Signed-off-by: Tim Liu <timl@nvidia.com>

NvTimLiu added the build label (Related to CI / CD or cleanly building) Dec 22, 2021

NvTimLiu requested review from jlowe, abellina, revans2, tgravescs, pxLi and GaryShen2008 December 22, 2021 09:04

NvTimLiu self-assigned this Dec 22, 2021

GaryShen2008 requested a review from zhanga5 December 22, 2021 10:14

jlowe reviewed Dec 22, 2021

jenkins/spark-tests.sh (outdated)

conda version of pandas works across all Spark versions

3afbab2

Signed-off-by: Tim Liu <timl@nvidia.com>

jlowe approved these changes Dec 22, 2021

GaryShen2008 merged commit 0debb11 into NVIDIA:branch-22.02 Dec 23, 2021

NvTimLiu linked an issue Dec 23, 2021 that may be closed by this pull request

[BUG] udf_test udf_cudf_test failed require_minimum_pandas_version check in spark 320+ #4378

Closed

NvTimLiu mentioned this pull request Dec 23, 2021

[BUG] udf_test udf_cudf_test failed require_minimum_pandas_version check in spark 320+ #4378

Closed

NvTimLiu mentioned this pull request Dec 28, 2021

Import the right pandas from conda [skip ci] #4433

Merged
