test_df_pivot tests failed at the system_prerelease session #341

Open
chelsea-lin opened this issue Jan 23, 2024 · 0 comments
Labels
api: bigquery Issues related to the googleapis/python-bigquery-dataframes API.

Comments

@chelsea-lin
Contributor

After fixing the failed tests mentioned in #337, other tests still fail:

FAILED tests/system/small/test_dataframe.py::test_dataframe_bool_aggregates[all_axis0] - AssertionError: Series.index are different
FAILED tests/system/small/test_dataframe.py::test_dataframe_bool_aggregates[any_axis0] - AssertionError: Series.index are different
FAILED tests/system/small/test_dataframe.py::test_df_pivot[values2-int64_too-columns2] - AssertionError: DataFrame.iloc[:, 0] (column name="('int64_col', <NA>)") are different
FAILED tests/system/small/test_groupby.py::test_dataframe_groupby_analytic[cumprod] - AssertionError: DataFrame.iloc[:, 1] (column name="int64_col") are different
FAILED tests/system/small/test_dataframe.py::test_to_pandas_downsampling_option_override - assert 1.3427486419677734 == 1 ± 3.0e-01
FAILED tests/system/small/test_series.py::test_series_add_prefix - AssertionError: Series.index are different
FAILED tests/system/small/test_series.py::test_series_add_suffix - AssertionError: Series.index are different
FAILED tests/system/small/test_series.py::test_groupby_window_ops[cumprod] - AssertionError: Series are different
FAILED tests/system/small/test_series.py::test_string_astype_int - AssertionError: Series.index are different

@tswast mentioned that the failures caused by the RangeIndex vs. Int64Index distinction could be unblocked by setting check_index_type=False: https://pandas.pydata.org/docs/reference/api/pandas.testing.assert_series_equal.html#pandas.testing.assert_series_equal
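For reference, a minimal sketch of what check_index_type=False changes. The exact index types that differ in the system_prerelease session are not shown in the log above, so this uses an int64 index vs. a float64 index as a stand-in (an assumption, chosen because it behaves the same across pandas versions). The data and the helper `strict_check_fails` are made up for illustration.

```python
import pandas as pd

# Same values and same index labels, but the index dtypes differ.
# (Stand-in for the RangeIndex vs. Int64Index mismatch; hypothetical data.)
left = pd.Series([10, 20, 30], index=pd.Index([0, 1, 2], dtype="int64"))
right = pd.Series([10, 20, 30], index=pd.Index([0.0, 1.0, 2.0], dtype="float64"))


def strict_check_fails() -> bool:
    """Return True if the default comparison rejects the differing index types."""
    try:
        pd.testing.assert_series_equal(left, right)
    except AssertionError:
        return True
    return False


strict_fails = strict_check_fails()

# With check_index_type=False only the index values are compared,
# so this call does not raise.
pd.testing.assert_series_equal(left, right, check_index_type=False)
```

This only relaxes the index class/dtype check; the index values and the Series values are still compared.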

However, the iloc errors may be real issues. The call stack is shown below:

=================================== FAILURES ===================================
__________________ test_df_pivot[values2-int64_too-columns2] ___________________

scalars_dfs = (          bool_col                                          bytes_col  \
rowindex                                    ......  2038-01-19 03:14:17.999999+00:00
8            False  ...                              <NA>

[9 rows x 13 columns])
values = ['int64_col', 'float64_col'], index = 'int64_too'
columns = ['string_col']

    @pytest.mark.parametrize(
        ("values", "index", "columns"),
        [
            ("int64_col", "int64_too", ["string_col"]),
            (["int64_col"], "int64_too", ["string_col"]),
            (["int64_col", "float64_col"], "int64_too", ["string_col"]),
        ],
    )
    def test_df_pivot(scalars_dfs, values, index, columns):
        scalars_df, scalars_pandas_df = scalars_dfs
    
        bf_result = scalars_df.pivot(
            values=values, index=index, columns=columns
        ).to_pandas()
        pd_result = scalars_pandas_df.pivot(values=values, index=index, columns=columns)
    
        # Pandas produces NaN, where bq dataframes produces pd.NA
>       pd.testing.assert_frame_equal(bf_result, pd_result, check_dtype=False)

tests/system/small/test_dataframe.py:2294: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

left = array([nan, nan, nan, nan])
right = array([nan, nan, <NA>, nan], dtype=object), err_msg = None

    def _raise(left, right, err_msg) -> NoReturn:
        if err_msg is None:
            if left.shape != right.shape:
                raise_assert_detail(
                    obj, f"{obj} shapes are different", left.shape, right.shape
                )
    
            diff = 0
            for left_arr, right_arr in zip(left, right):
                # count up differences
                if not array_equivalent(left_arr, right_arr, strict_nan=strict_nan):
                    diff += 1
    
            diff = diff * 100.0 / left.size
            msg = f"{obj} values are different ({np.round(diff, 5)} %)"
>           raise_assert_detail(obj, msg, left, right, index_values=index_values)
E           AssertionError: DataFrame.iloc[:, 0] (column name="('int64_col', <NA>)") are different
E           
E           DataFrame.iloc[:, 0] (column name="('int64_col', <NA>)") values are different (25.0 %)
E           [index]: [-2345, 0, 1, 2]
E           [left]:  [nan, nan, nan, nan]
E           [right]: [nan, nan, <NA>, nan]

.nox/system_prerelease/lib/python3.11/site-packages/pandas/_testing/asserters.py:684: AssertionError
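To make the `[left]` / `[right]` lines above easier to read, here is a small sketch that rebuilds the two arrays from the error message and locates the differing element. It is only an illustration of the NaN vs. pd.NA distinction, not a proposed fix; the variable names are made up, and which side corresponds to BigQuery DataFrames is taken from the argument order in the test.

```python
import numpy as np
import pandas as pd

# Arrays copied from the [left] / [right] lines of the assertion message.
left = np.array([np.nan, np.nan, np.nan, np.nan])
right = np.array([np.nan, np.nan, pd.NA, np.nan], dtype=object)

# pd.isna treats both NaN and pd.NA as missing, so every position is "missing"
# on both sides.
both_missing = pd.isna(left) & pd.isna(right)

# The strict comparison still distinguishes the pd.NA singleton from a float NaN.
na_positions = [i for i, v in enumerate(right) if v is pd.NA]
mismatch_pct = 100.0 * len(na_positions) / right.size

# Mapping pd.NA to NaN makes the two arrays compare equal
# (np.testing.assert_array_equal treats NaNs in the same position as equal).
normalized = np.array([np.nan if v is pd.NA else v for v in right], dtype="float64")
np.testing.assert_array_equal(left, normalized)
```

Here `na_positions` is `[2]` and `mismatch_pct` is `25.0`, matching the "values are different (25.0 %)" line: one of the four elements is pd.NA where the other side has NaN.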
@product-auto-label product-auto-label bot added the api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. label Jan 23, 2024