Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test_roundtrip failing builds #237

Closed
abkfenris opened this issue Oct 12, 2023 · 4 comments · Fixed by #238
Closed

test_roundtrip failing builds #237

abkfenris opened this issue Oct 12, 2023 · 4 comments · Fixed by #238

Comments

@abkfenris
Copy link
Member

def test_roundtrip(start, end, freq, nlats, nlons, var_const, calendar, use_cftime):
ds = create_dataset(
start=start,
end=end,
freq=freq,
nlats=nlats,
nlons=nlons,
var_const=var_const,
use_cftime=use_cftime,
calendar=calendar,
)
ds = ds.chunk(ds.dims)
mapper = TestMapper(SingleDatasetRest(ds).app)
actual = xr.open_zarr(mapper, consolidated=True)
xr.testing.assert_identical(actual, ds)

Is causing builds with unrelated changes to fail recently.

It looks like the issue is due to the time encoding.

FAILED tests/test_zarr_compat.py::test_roundtrip[2018-01-01-2021-01-01-6H-180-360-True-gregorian-False] - AssertionError: Left and right Dataset objects are not identical

Differing coordinates:
L * time         (time) datetime64[ns] 2018-01-01 2018-01-01 ... 2020-12-31
    bounds: time_bounds
R * time         (time) datetime64[ns] 2018-01-01 ... 2020-12-31T12:00:00
    bounds: time_bounds
Differing data variables:
L   tmin         (time, lat, lon) float32 dask.array<chunksize=(4384, 180, 360), meta=np.ndarray>
R   tmin         (time, lat, lon) float32 dask.array<chunksize=(4384, 180, 360), meta=np.ndarray>
L   time_bounds  (time, d2) datetime64[ns] dask.array<chunksize=(4384, 2), meta=np.ndarray>
R   time_bounds  (time, d2) datetime64[ns] dask.array<chunksize=(4384, 2), meta=np.ndarray>
L   tmax         (time, lat, lon) float32 dask.array<chunksize=(4384, 180, 360), meta=np.ndarray>
R   tmax         (time, lat, lon) float32 dask.array<chunksize=(4384, 180, 360), meta=np.ndarray>

A warning is getting thrown earlier

  /home/runner/work/xpublish/xpublish/xpublish/utils/zarr.py:144: UserWarning: Times can't be serialized faithfully to int64 with requested units 'days since 1980-01-01'. Resolution of 'hours' needed. Serializing times to floating point instead. Set encoding['dtype'] to integer dtype to serialize to int64. Set encoding['dtype'] to floating point dtype to silence this warning.
    encoded_da = encode_zarr_variable(dvar, name=key)

Which is from here:

def create_zmetadata(dataset: xr.Dataset) -> dict:
"""Helper function to create a consolidated zmetadata dictionary."""
zmeta = {
'zarr_consolidated_format': ZARR_CONSOLIDATED_FORMAT,
'metadata': {},
}
zmeta['metadata'][group_meta_key] = {'zarr_format': ZARR_FORMAT}
zmeta['metadata'][attrs_key] = _extract_dataset_zattrs(dataset)
for key, dvar in dataset.variables.items():
da = dataset[key]
encoded_da = encode_zarr_variable(dvar, name=key)
encoding = extract_zarr_variable_encoding(dvar)
zattrs = _extract_dataarray_zattrs(encoded_da)
zattrs = _extract_dataarray_coords(da, zattrs)
zmeta['metadata'][f'{key}/{attrs_key}'] = zattrs
zmeta['metadata'][f'{key}/{array_meta_key}'] = _extract_zarray(
encoded_da,
encoding,
encoded_da.dtype,
)
return zmeta

But is really coming from the Xarray Zarr backend.

https://github.com/pydata/xarray/blob/25e6e084aa18c49d92934db298a3efff9c712766/xarray/backends/zarr.py#L288-L318

@abkfenris
Copy link
Member Author

Nothing has changed in encode_zarr_variable in the last few years, so it may be deeper.

image

@abkfenris
Copy link
Member Author

Digging deeper. conventions.encode_cf_variable calls times.CFDatetimeCoder() which then calls encode_cf_datetime() which has had some recent changes.

https://github.com/pydata/xarray/blob/25e6e084aa18c49d92934db298a3efff9c712766/xarray/conventions.py#L154-L195

https://github.com/pydata/xarray/blob/25e6e084aa18c49d92934db298a3efff9c712766/xarray/coding/times.py#L803-L823

https://github.com/pydata/xarray/blob/25e6e084aa18c49d92934db298a3efff9c712766/xarray/coding/times.py#L670-L747

pydata/xarray#8201 landed in Xarray v2023.09.0 which then caused a new bug that looks similar pydata/xarray#8271 which should be fixed by pydata/xarray#8272

If I anti-pin Xarray to resolve to another version (xarray!=v2023.09.0), tests at least seem to pass locally.

@abkfenris
Copy link
Member Author

abkfenris commented Oct 12, 2023

If I test locally against upstream (git+https://github.com/pydata/xarray.git in dev-requirements.txt) that also works as it looks like pydata/xarray#8272 does the trick.

For now I'm going to anti-pin Xarray in requirements.

abkfenris added a commit to abkfenris/xpublish that referenced this issue Oct 12, 2023
v2023.09.0 of Xarray caused our Zarr roundtrip tests to fail as times no longer matched.

For full details see xpublish-community#237

The fix looks to have landed in pydata/xarray#8272 and tests against the latest commit works, so we'll go with setting an pin to skip just v2023.09.0.

Closes xpublish-community#237
@abkfenris
Copy link
Member Author

Which explains why the dev builds worked.

image

abkfenris added a commit that referenced this issue Oct 14, 2023
* Anti-pin the latest version of Xarray to

v2023.09.0 of Xarray caused our Zarr roundtrip tests to fail as times no longer matched.

For full details see #237

The fix looks to have landed in pydata/xarray#8272 and tests against the latest commit works, so we'll go with setting an pin to skip just v2023.09.0.

Closes #237

* Dev requirements needs to mirror requirements due to --no-deps
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant