Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: TypeError: unsupported operand type(s) for +: 'cftime._cftime.DatetimeNoLeap' and 'NoneType' on Linux #328

Closed
tomvothecoder opened this issue Apr 24, 2024 · 13 comments

Comments

@tomvothecoder
Copy link

Hello, an xCDAT user found a bug with cftime on Linux. The time coordinates of the dataset contain nan values. When the user tries to convert the time coordinates to cftime, it breaks.

We verified the MVCE below works on Mac (M2) and not on Linux. Any ideas on why this happening? Thanks!

Minimal Complete Verifiable Example (MVCE)

Example dataset: subset.nc.zip

# imports
import xarray as xr
import cftime

# I/0
fn = 'subset.nc'
ds = xr.open_dataset(fn, decode_times=False)
cftime.num2date(ds.time.values, ds.time.units, 'standard', only_use_cftime_datetimes=True)

The ds file looks like:

<xarray.Dataset>
Dimensions:     (trajectory: 92678, obs: 920)
Coordinates:
  * obs         (obs) int32 0 1 2 3 4 5 6 7 ... 912 913 914 915 916 917 918 919
    time        (trajectory, obs) float64 3.478e+08 3.477e+08 ... nan nan
  * trajectory  (trajectory) int64 146163 146164 146165 ... 59605 60362 65431
Data variables:
    S           (trajectory, obs) float32 ...
    T           (trajectory, obs) float32 ...
    age         (trajectory, obs) float32 ...
    lat         (trajectory, obs) float32 ...
    lon         (trajectory, obs) float32 ...
    transport   (trajectory) float32 ...
    z           (trajectory, obs) float32 ...
Attributes:
    Conventions:            CF-1.6/CF-1.7
    feature_type:           trajectory
    ncei_template_version:  NCEI_NetCDF_Trajectory_Template_v2.0
    parcels_kernels:        SampleParticleAdvectionRK4_3DSample_T_SCheckPreBo...
    parcels_mesh:           spherical
    parcels_version:        3.0.2

and the time axis:

(trajectory, obs)
float64
3.478e+08 3.477e+08 ... nan nan
axis : time
calendar : noleap
long_name : time
standard_name :time
units : seconds since 0488-02-01 00:00:00

Relevant log output

---------------------------------------------------------------------------

File src/cftime/_cftime.pyx:630, in cftime._cftime.num2date()

File src/cftime/_cftime.pyx:499, in cftime._cftime.decode_dates_from_array()

TypeError: unsupported operand type(s) for +: 'cftime._cftime.DatetimeGregorian' and 'NoneType'

Environment

xarray.__version__
'2024.3.0' 

cftime.__version__
'1.6.3'
@jswhit
Copy link
Collaborator

jswhit commented Apr 28, 2024

what is the expected behavior here? PR #330 converts the input to a masked array if a nan or inf is present, so that a masked array of dates is returned.

@tomvothecoder
Copy link
Author

@jswhit Thanks for the PR! I'm not sure what the expected behavior is besides a successful output with an array of cftime objects. I would need to re-run this code on a non-Linux machine to see what the output looks like.

@jswhit
Copy link
Collaborator

jswhit commented Apr 30, 2024

@jswhit Thanks for the PR! I'm not sure what the expected behavior is besides a successful output with an array of cftime objects. I would need to re-run this code on a non-Linux machine to see what the output looks like.

Yes, that would be interesting. I don't see how this code would work on any platform without the fix in PR #330.

@tomvothecoder
Copy link
Author

tomvothecoder commented Apr 30, 2024

Minimum example (no xarray)

import cftime
import numpy as np

time_encoded = np.array(
    [3.477600e08, 3.476736e08, 3.475872e08, 2.685312e08, np.nan, np.nan]
)
time_units = "seconds since 0488-02-01 00:00:00"

time_decoded = cftime.num2date(
    time_encoded, time_units, "standard", only_use_cftime_datetimes=True
)
print(time_decoded)

Output (on MacOS)

Same code breaks on Linux with the bug described in the title.

<ipython-input-23-983092b57b88>:6: RuntimeWarning: invalid value encountered in cast
  time_decoded = cftime.num2date(

[cftime.DatetimeGregorian(499, 2, 8, 0, 0, 0, 0, has_year_zero=False)
 cftime.DatetimeGregorian(499, 2, 7, 0, 0, 0, 0, has_year_zero=False)
 cftime.DatetimeGregorian(499, 2, 6, 0, 0, 0, 0, has_year_zero=False)
 cftime.DatetimeGregorian(496, 8, 5, 0, 0, 0, 0, has_year_zero=False)
 cftime.DatetimeGregorian(488, 2, 1, 0, 0, 0, 0, has_year_zero=False)
 cftime.DatetimeGregorian(488, 2, 1, 0, 0, 0, 0, has_year_zero=False)]

The last two time coordinates are np.nan. After decoding, those values are replaced by the units attribute ("seconds since 0488-02-01 00:00:00") and become cftime.DatetimeGregorian(488, 2, 1, 0, 0, 0, 0, has_year_zero=False).

Also notice RuntimeWarning: invalid value encountered in cast.

Expected Behavior

Should they still be np.nan?

@jswhit
Copy link
Collaborator

jswhit commented Apr 30, 2024

I suspect that the problem is that the operations with NaNs can give compiler-dependent results (hence the difference with gcc (linux) and clang (macosx)). I think the way to avoid this is to either raise an exception if nans or infs are found, or treat them as missing values (as PR #330 does).

@tomvothecoder
Copy link
Author

I think PR #330 sounds good to me. Any thoughts @pochedls?

@tomvothecoder
Copy link
Author

@jswhit Actually, would it make sense to maintain the nan and/or inf values instead of raising an exception or using a masked array like in PR #330? That way the data structure doesn't change from np.array to np.masked_array, which might get interpreted differently when using something like Xarray (which represents missing values with np.nan).

@jswhit
Copy link
Collaborator

jswhit commented May 1, 2024

@tomvothecoder num2date returns cftime.datetime instances, so I don't think returning nan makes much sense in that context.

@jswhit
Copy link
Collaborator

jswhit commented May 1, 2024

if you do want to convert the missing values back to nans, you can just do if np.ma.isMA(dates): dates.filled(fill_value=np.nan)

@tomvothecoder
Copy link
Author

@tomvothecoder num2date returns cftime.datetime instances, so I don't think returning nan makes much sense in that context.

I see what you're saying. In this context, the output must be an array containing all cftime.datetime objects. If we use a np.ma, we can represent those missing values as cftime.datetime instances represented by units attribute as the default value.

For the rare cases where time coordinates might have np.nan values and wants to maintain np.nan, the user can manually convert them using the code in your comment above.

I wonder how Xarray will handle the possible np.ma output from num2date.

@pochedls
Copy link

pochedls commented May 1, 2024

Thank you for investigating this issue!

What date will be returned when the input value is a NaN? On Mac, cftime is currently returning 488-02-01 00:00:00 for units of seconds since 0488-02-01 00:00:00 (basically treating the value as a zero).

As a user, I like NaN infilling because it helps to ensure that I don't accidentally interpret the NaN value as 488-02-01 00:00:00.

@jswhit – is there a downside to returning a NaN value (to ensure the user doesn't misinterpret the incorrect cftime.datetime object that is returned - this could happen if they don't realize the array is a masked array)?

@jswhit
Copy link
Collaborator

jswhit commented May 2, 2024

@pochedia a numpy masked array is returned with dtype=object. The default fill_value is '?' (a string), so if the users code converts the masked array to a regular ndarray that's the data value they would see. (but as I mentioned above it's easy to manually change this to NaN)

jswhit added a commit that referenced this issue May 3, 2024
@jswhit
Copy link
Collaborator

jswhit commented May 3, 2024

closed by PR #330. Feel free to re-open if this solution does not meet your needs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants