[SPARK-46874][PYTHON] Remove pyspark.pandas dependency from `assertDataFrameEqual`

### What changes were proposed in this pull request?

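This PR proposes to remove the `pyspark.pandas` dependency from `assertDataFrameEqual` by importing `pyspark.pandas` and `PandasOnSparkTestUtils` only when both pandas and PyArrow are available.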

### Why are the changes needed?

To allow `assertDataFrameEqual` to work when pandas is not installed.
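
The guard this change relies on can be sketched in isolation. This is a minimal illustration of the optional-import pattern, not the code in `python/pyspark/testing/utils.py`; the helper names `optional_import` and `comparison_path` and the returned labels are hypothetical.

```python
import importlib
from types import ModuleType
from typing import Optional, Sequence


def optional_import(name: str) -> Optional[ModuleType]:
    """Return the imported module, or None if it is not installed.

    Hypothetical helper for illustration; not part of PySpark.
    """
    try:
        return importlib.import_module(name)
    except ImportError:
        # ModuleNotFoundError is a subclass of ImportError, so a missing
        # package lands here instead of propagating to the caller.
        return None


def comparison_path(module_names: Sequence[str] = ("pandas", "pyarrow")) -> str:
    """Pick a code path based on which optional modules can be imported.

    Hypothetical helper: the pandas-dependent branch is chosen only when
    every listed module is importable; otherwise fall back to the branch
    that needs no optional dependencies.
    """
    if all(optional_import(name) is not None for name in module_names):
        return "pandas-on-spark"
    return "spark-only"
```

Calling `comparison_path()` with its defaults reports which branch the current environment would take. The point of the pattern is that heavier imports are deferred until every prerequisite has been confirmed, so a missing optional package never raises at call time.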

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

CI should pass. The change was also tested manually.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#44899 from itholic/remove_deps_from_assertDataFrameEqual.

Authored-by: Haejoon Lee <haejoon.lee@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
itholic authored and dongjoon-hyun committed Jan 29, 2024
1 parent f078998 commit bb21955
Showing 1 changed file with 12 additions and 3 deletions.
15 changes: 12 additions & 3 deletions python/pyspark/testing/utils.py
@@ -758,16 +758,25 @@ def assertDataFrameEqual(
     has_pandas = False
     try:
         # If pandas dependencies are available, allow pandas or pandas-on-Spark DataFrame
-        import pyspark.pandas as ps
         import pandas as pd
-        from pyspark.testing.pandasutils import PandasOnSparkTestUtils

         has_pandas = True
     except ImportError:
         # no pandas, so we won't call pandasutils functions
         pass

-    if has_pandas:
+    has_arrow = False
+    try:
+        import pyarrow
+
+        has_arrow = True
+    except ImportError:
+        pass
+
+    if has_pandas and has_arrow:
+        import pyspark.pandas as ps
+        from pyspark.testing.pandasutils import PandasOnSparkTestUtils
+
         if (
             isinstance(actual, pd.DataFrame)
             or isinstance(expected, pd.DataFrame)