Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use cftime for temporal averaging operations #302

Merged
merged 5 commits into from
Aug 10, 2022

Conversation

pochedls
Copy link
Collaborator

@pochedls pochedls commented Aug 4, 2022

Description

This is an attempted fix for #301. The PR uses cftime.datetime objects in the _convert_df_to_dt function (instead of datetime / cfdatetime objects depending on the dataset attributes).

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • My changes generate no new warnings
  • Any dependent changes have been merged and published in downstream modules

If applicable:

  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass with my changes (locally and CI/CD build)
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have noted that this is a breaking change for a major release (fix or feature that would cause existing functionality to not work as expected)

@codecov-commenter
Copy link

codecov-commenter commented Aug 4, 2022

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 100.00%. Comparing base (1d242e5) to head (e7a928f).
Report is 121 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff            @@
##              main      #302   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           14        14           
  Lines         1169      1170    +1     
=========================================
+ Hits          1169      1170    +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@tomvothecoder tomvothecoder added type: bug Inconsistencies or issues which will cause an issue or problem for users or implementors. Priority: Critical labels Aug 4, 2022
@tomvothecoder
Copy link
Collaborator

Thanks for this PR. I already started taking a look, but will be able to review more closely next week.

We should have this merged before #300.

@tomvothecoder tomvothecoder force-pushed the bugfix/301-OutOfBoundsTemporal branch 2 times, most recently from 0581d14 to d3e1dd4 Compare August 8, 2022 17:44
Copy link
Collaborator

@tomvothecoder tomvothecoder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @pochedls, I made some updates for review.

The diff (d3e1dd4) compares your commit to mine. Let me know how it looks.

Comment on lines +44 to 55
# NOTE: With `decode_times=True`, the "calendar" and "units" attributes are
# stored in `.encoding`.
time_cf.encoding["calendar"] = "standard"
time_cf.encoding["units"] = "days since 2000-01-01"


# NOTE: With `decode_times=False`, the "calendar" and "units" attributes are
# stored in `.attrs`.
time_non_cf = xr.DataArray(
data=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14],
dims=["time"],
attrs={
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just bringing this to attention.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this makes sense. This is a separate issue when I open a dataset with xarray.open_mfdataset and decode_times=True, I don't see anything populated in ds.time.encoding (but I do see calendar and units information stored here when using xarray.open_dataset).

Copy link
Collaborator

@tomvothecoder tomvothecoder Aug 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch! I found an open xarray issue related to xr.open_mfdataset() dropping encoding attributes.

pydata/xarray#2436
pydata/xarray#2436 (comment)

Somebody mentions this about the issue here roocs/clisops#88 (comment)

However, there is another problem related to xarray.open_mfdataset:
The encoding dictionary gets lost somewhere during the merging operation of the datasets of the respective files (pydata/xarray#2436).

This leads to problems for example with cf-xarray when trying to detect coordinates or bounds, but also leads to problems related to the time axis encoding apparently (as seen in the linked issue). I managed at least to avoid the problems for cf-xarray bounds and coordinates detection by using the decode functionality of xarray only after the datasets have been read in (leaving however the unnecessary time dimension in place ...):

ds = xarray.open_mfdataset("/path/to/files/*.nc")
ds = xarray.decode_cf(ds, decode_coords="all")


I think we might need to use the preprocess function to save the .encoding dict, or some other workaround from one of the comments.

Copy link
Collaborator

@tomvothecoder tomvothecoder Aug 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can open up a separate issue for this. I'm glad you're finding all of these nonobvious behaviors/bugs in xarray and xCDAT.

#308

cftime.datetime(2001, 2, 1),
cftime.datetime(2001, 3, 1),
cftime.DatetimeGregorian(
2000, 1, 1, 0, 0, 0, 0, has_year_zero=False
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of the cftime.datetime objects in our tests are now updated to cftime.<OBJECT_TYPE>.

tests/test_temporal.py Outdated Show resolved Hide resolved
xcdat/dataset.py Outdated
Comment on lines 324 to 333
time_attrs = time.attrs

# NOTE: When opening datasets with `decode_times=False`, the "calendar" and
# "units" attributes are stored in `.encoding` (unlike `decode_times=True`
# which stores them in `.attrs`). Since xCDAT manually decodes non-CF
# compliant time coordinates by first setting `decode_times=False`, the
# "calendar" and "units" attrs are popped from the `.attrs` dict and stored
# in the `.encoding` dict to mimic xarray's behavior.
calendar = time_attrs.pop("calendar", None)
units_attr = time_attrs.pop("units", None)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In decode_non_cf_time(), the "calendar" and "units" attributes were being saved in both the .attrs and .encoding dicts.

However, they should only be in the .encoding dict after decoding time coordinates. To achieve this behavior, we pop these attrs from the .attrs dict and later add them to .encoding.

Comment on lines +153 to +162
try:
self.calendar = self._dataset[self._dim].encoding["calendar"]
self.date_type = get_date_type(self.calendar)
except KeyError:
raise KeyError(
"This dataset's time coordinates do not have a 'calendar' encoding "
"attribute set, which might indicate that the time coordinates were not "
"decoded to datetime objects. Ensure that the time coordinates are "
"decoded before performing temporal averaging operations."
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We now consider the "calendar" attribute and the associated cftime date type object for temporal averaging operations.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should "were not decoded to datetime objects" be "were not decoded to cftime.datetime objects"? Or are they sufficiently interchangeable?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Temporal averaging accepts both datetime.datetime and cftime.datetime objects, but the end result will always be cftime.datetime.

I think it is sufficiently interchangeable to reference both as just datetime.

xcdat/temporal.py Show resolved Hide resolved
xcdat/dataset.py Outdated Show resolved Hide resolved
@pochedls
Copy link
Collaborator Author

pochedls commented Aug 9, 2022

@tomvothecoder - I had a couple minor questions, but this looks good to me.

pochedls and others added 3 commits August 10, 2022 13:49
- Update format of `decode_non_cf_time()` conditional statement
- Update `time_cf` with encoding dict attributes (xarray behavior)
- Update `test_dataset.py` and `test_temporal.py` with specific `cftime` date_type objects
- Add `xesmf` to `dev.yml`
- Remove `Jupyter` setting from VS Code workspace
Copy link
Collaborator

@tomvothecoder tomvothecoder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR should be ready to merge if you are satisfied with it.

I will continue work with #308 and #285.

@pochedls pochedls merged commit 4c51879 into main Aug 10, 2022
@pochedls
Copy link
Collaborator Author

This PR should be ready to merge if you are satisfied with it.

I will continue work with #308 and #285.

Thanks for the help and getting this done so quickly.

@pochedls pochedls deleted the bugfix/301-OutOfBoundsTemporal branch August 10, 2022 21:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: bug Inconsistencies or issues which will cause an issue or problem for users or implementors.
Projects
No open projects
Status: Done
Development

Successfully merging this pull request may close these issues.

[Bug]: Out of bounds timestamp for temporal averaging
3 participants