Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Address latest pandas-related upstream test failures #9081

Merged
merged 8 commits into from
Jun 10, 2024

Conversation

spencerkclark
Copy link
Member

@spencerkclark spencerkclark commented Jun 9, 2024

This PR addresses the upstream failures described in #8844 (comment) with a few minor changes to ensure that, for the time being, nanosecond precision times continue to be used in xarray. These failures stem from pandas-dev/pandas#55901, which causes pandas.to_datetime to infer the precision to use based its input instead of always using nanosecond precision.

See the review comments for an explanation of the changes.

@spencerkclark spencerkclark added the run-upstream Run upstream CI label Jun 9, 2024
@spencerkclark spencerkclark force-pushed the upstream-failures-2024-06-09 branch 2 times, most recently from ed23adc to ad164b7 Compare June 9, 2024 17:30
@spencerkclark spencerkclark force-pushed the upstream-failures-2024-06-09 branch from ad164b7 to bd875d3 Compare June 9, 2024 17:39
Comment on lines +2948 to +2949
(datetime(2000, 1, 1), has_pandas_3),
(np.array([datetime(2000, 1, 1)]), has_pandas_3),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With pandas 3, pd.Series(datetime.datetime(...)) will produce a Series with np.datetime64[us] values instead of np.datetime64[ns] values, so this conversion now warns.

dd = times.to_pydatetime()
reference_dates = [dd[0], dd[2]]
reference_dates = [times[0], times[2]]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I can tell, whether reference_dates started as datetime.datetime objects or np.datetime64[ns] values was not material to this test, so I removed the conversion to datetime.datetime to avoid the conversion warning under pandas 3 (the times would previously get converted back to datetime64[ns] values in the DataArray constructor).

roundtripped = DataArray.from_dict(da.to_dict())
with warnings.catch_warnings():
warnings.filterwarnings("ignore", message="Converting non-nanosecond")
roundtripped = DataArray.from_dict(da.to_dict())
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

da.to_dict() produces datetime.datetime objects, which under pandas 3 lead to a conversion warning in the DataArray constructor.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we have this pattern in multiple modules, it might be worth adding the code as a special context manager to xarray.tests.__init__. Something like this might work (I didn't check):

from contextlib import contextmanager
import warnings

@contextmanager
def ignore_warnings(category=None, pattern=None):
    if category is None and pattern is None:
        raise ValueError("need at least one of category and pattern")

    try:
        with warnings.catch_warnings():
            warnings.filterwarnings("ignore", message=pattern, category=category)
            yield
    finally:
        pass

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks—I ended up switching to marking these tests with @pytest.mark.filterwarnings("ignore:Converting non-nanosecond"), since that is a pattern we use elsewhere in the tests already.

darray = DataArray(data, dims=["time"])
darray.coords["time"] = np.array([datetime(2017, m, 1) for m in month])
times = pd.date_range(start="2017-01-01", freq="ME", periods=12)
darray = DataArray(data, dims=["time"], coords=[times])
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use of datetime.datetime objects was immaterial to this test, so we use pd.date_range to produce the dates instead to avoid the non-nanosecond conversion warning.

Comment on lines 260 to 262
with warnings.catch_warnings():
warnings.filterwarnings("ignore", message="Converting non-nanosecond")
expected = self.cls("t", dates)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is needed since dates sometimes consists of datetime.datetime objects, which leads to a conversion warning under pandas 3.

@@ -529,7 +529,7 @@ def test_roundtrip_string_encoded_characters(self) -> None:
assert actual["x"].encoding["_Encoding"] == "ascii"

def test_roundtrip_numpy_datetime_data(self) -> None:
times = pd.to_datetime(["2000-01-01", "2000-01-02", "NaT"])
times = pd.to_datetime(["2000-01-01", "2000-01-02", "NaT"], unit="ns")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pandas.to_datetime will infer the precision from the input in pandas 3, so we explicitly specify the desired precision now.

xarray/tests/test_combine.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@keewis keewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the quick fixes! This looks good to me.

I know I have not been doing that either for the numpy>=2 changes, but I wonder if we should add a whats-new entry (internal changes)?

@dcherian
Copy link
Contributor

Thanks @spencerkclark

@dcherian dcherian merged commit ef709df into pydata:main Jun 10, 2024
26 of 28 checks passed
@spencerkclark spencerkclark deleted the upstream-failures-2024-06-09 branch June 11, 2024 01:10
andersy005 pushed a commit that referenced this pull request Jun 14, 2024
* Address pandas-related upstream test failures

* Address more warnings

* Don't lose coverage for pandas < 3

* Address one more warning

* Fix accidental change from MS to ME

* Use datetime64[ns] arrays

* Switch to @pytest.mark.filterwarnings
keewis added a commit to keewis/xarray that referenced this pull request Jun 19, 2024
dcherian added a commit that referenced this pull request Jul 11, 2024
* don't remove `netcdf4` from the upstream-dev environment

* also stop removing `h5py` and `hdf5`

* hard-code the precision (I believe this was missed in #9081)

* don't remove `h5py` either

* use on-diks _FillValue as standrd expects, use view instead of cast to prevent OverflowError.

* whats-new

* unpin `numpy`

* rework UnsignedCoder

* add test

* Update xarray/coding/variables.py

Co-authored-by: Justus Magin <keewis@users.noreply.github.com>

---------

Co-authored-by: Kai Mühlbauer <kai.muehlbauer@uni-bonn.de>
Co-authored-by: Kai Mühlbauer <kmuehlbauer@wradlib.org>
Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
run-upstream Run upstream CI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants