Import the right pandas from conda #4419

Merged on Dec 23, 2021 (2 commits)


NvTimLiu
Collaborator

Spark v3.2.0 and later ship a pandas package in the binary distribution at python/pyspark/pandas.
The udf-cudf test puts this directory on PYTHONPATH, so importing pandas picks up the Spark copy instead of the real pandas, and the test fails as shown below:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/jars/spark-3.2.0-bin-hadoop3.2/python/pyspark/pandas/__init__.py", line 31, in <module>
    require_minimum_pandas_version()
  File "/jars/spark-3.2.0-bin-hadoop3.2/python/pyspark/sql/pandas/utils.py", line 35, in require_minimum_pandas_version
    if LooseVersion(pandas.__version__) < LooseVersion(minimum_pandas_version):
AttributeError: partially initialized module 'pandas' has no attribute '__version__' (most likely due to a circular import)
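
The failure mode in the traceback can be reproduced in isolation. The following is a minimal sketch, not the project's test code: it builds a throwaway package (the name `shadow_demo_pkg` is hypothetical) whose `__init__.py` imports a module under its own name and then reads `__version__` from it, which is roughly what happens when `pyspark/pandas` is found under the name `pandas`. The exact wording of the error may vary between Python versions.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for pyspark/pandas; not a real library.
PACKAGE_NAME = "shadow_demo_pkg"


def reproduce_shadowing_error() -> str:
    """Import a package that imports its own name; return the error text."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg_dir = Path(tmp) / PACKAGE_NAME
        pkg_dir.mkdir()
        # The package body looks up its own name while it is still being
        # initialized, so the attribute it expects does not exist yet.
        (pkg_dir / "__init__.py").write_text(
            f"import {PACKAGE_NAME}\n"
            f"print({PACKAGE_NAME}.__version__)\n"
        )
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            importlib.import_module(PACKAGE_NAME)
        except AttributeError as err:
            return str(err)
        finally:
            # Undo the changes so the sketch leaves no trace behind.
            sys.path.remove(tmp)
            sys.modules.pop(PACKAGE_NAME, None)
    return ""


if __name__ == "__main__":
    print(reproduce_shadowing_error())
```

The returned message names the missing `__version__` attribute, matching the shape of the error above.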

To fix this, put the conda package path ahead of the existing PYTHONPATH entries, so that pandas is imported
from conda rather than from the Spark 3.2.0 or later binary path.

Signed-off-by: Tim Liu <timl@nvidia.com>
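
The effect of reordering PYTHONPATH can be sketched in Python. This is an illustration under stated assumptions, not the change made to jenkins/spark-tests.sh: the directories `spark_like` and `conda_like`, the module name `resolve_order_demo`, and the helper names are all hypothetical. Two directories each hold a module with the same name, and a fresh interpreter reports which copy it imports for a given ordering.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path

MODULE = "resolve_order_demo"  # hypothetical module name for this sketch


def prepend_path(first: str, existing: str) -> str:
    """Return a PYTHONPATH value with `first` ahead of the existing entries."""
    return first if not existing else first + os.pathsep + existing


def which_copy_wins(pythonpath: str) -> str:
    """Start a fresh interpreter with this PYTHONPATH; report the imported copy."""
    env = dict(os.environ, PYTHONPATH=pythonpath)
    result = subprocess.run(
        [sys.executable, "-c", f"import {MODULE}; print({MODULE}.ORIGIN)"],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def demo():
    """Return the winning copy with the Spark-like dir first, then conda-like first."""
    with tempfile.TemporaryDirectory() as tmp:
        spark_dir = Path(tmp) / "spark_like"
        conda_dir = Path(tmp) / "conda_like"
        for directory, origin in ((spark_dir, "spark"), (conda_dir, "conda")):
            directory.mkdir()
            (directory / f"{MODULE}.py").write_text(f"ORIGIN = {origin!r}\n")
        spark_first = prepend_path(str(spark_dir), str(conda_dir))
        conda_first = prepend_path(str(conda_dir), str(spark_dir))
        return which_copy_wins(spark_first), which_copy_wins(conda_first)


if __name__ == "__main__":
    print(demo())
```

Python searches the PYTHONPATH entries in order, so whichever directory comes first supplies the module. That ordering is what the fix relies on.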

NvTimLiu added the "build" label (Related to CI / CD or cleanly building) on Dec 22, 2021
NvTimLiu self-assigned this on Dec 22, 2021
@NvTimLiu
Collaborator Author

build

@NvTimLiu
Collaborator Author

To fix #4378

GaryShen2008 requested a review from zhanga5 on December 22, 2021 10:14
Review thread on jenkins/spark-tests.sh (outdated, resolved)
Signed-off-by: Tim Liu <timl@nvidia.com>
@jlowe
Member

jlowe commented Dec 22, 2021

build

@GaryShen2008
Collaborator

build

GaryShen2008 merged commit 0debb11 into NVIDIA:branch-22.02 on Dec 23, 2021
NvTimLiu added a commit to NvTimLiu/spark-rapids that referenced this pull request Dec 28, 2021
Based on PR: NVIDIA#4419

Expose the PYTHONPATH environment variable, because pytest checks all dependencies, including pandas, during the test collection stage, even when the cudf-udf tests are not being run.

Signed-off-by: Tim Liu <timl@nvidia.com>
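
Since the import happens as early as test collection, it can help to check where `pandas` would resolve from before running pytest. The following is a hedged sketch of such a pre-flight check; the helper names are hypothetical and it is not part of the project's scripts. It uses `importlib.util.find_spec`, which locates a top-level module without importing it.

```python
import importlib.util
from pathlib import PurePosixPath
from typing import Optional


def module_origin(name: str) -> Optional[str]:
    """Return the file a top-level module would load from, without importing it."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


def is_pyspark_pandas(origin: Optional[str]) -> bool:
    """True if the origin looks like .../pyspark/pandas/<file>."""
    if origin is None:
        return False
    # Normalize separators so the check behaves the same on any platform.
    parts = PurePosixPath(origin.replace("\\", "/")).parts
    return len(parts) >= 3 and parts[-3:-1] == ("pyspark", "pandas")


if __name__ == "__main__":
    origin = module_origin("pandas")
    if is_pyspark_pandas(origin):
        print(f"pandas resolves to the Spark copy: {origin}")
    else:
        print(f"pandas resolves to: {origin}")
```

If the check reports a path under pyspark/pandas, the Spark directory is ahead of the conda packages in the search path.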
GaryShen2008 pushed a commit that referenced this pull request Dec 30, 2021
Linked issue:
[BUG] udf_test udf_cudf_test failed require_minimum_pandas_version check in spark 320+