
Differences on datetime values appears after writing reindexed variable on netCDF file #1064

Closed · Scheibs opened this issue Oct 27, 2016 · 12 comments · Fixed by #7827 or #8201
@Scheibs commented Oct 27, 2016

In my Dataset I've got a time series coordinate that begins like this:

<xarray.DataArray 'time' (time: 10)>
array(['2014-02-15T00:00:00.000000000+0100',
       '2014-02-15T18:10:00.000000000+0100',
       '2014-02-16T18:10:00.000000000+0100',
       '2014-02-17T18:10:00.000000000+0100',
       '2014-02-18T18:10:00.000000000+0100',
       '2014-02-19T18:10:00.000000000+0100',
       '2014-02-20T18:10:00.000000000+0100',
       '2014-02-21T18:10:00.000000000+0100',
       '2014-02-22T00:00:00.000000000+0100',
       '2014-02-23T00:00:00.000000000+0100'], dtype='datetime64[ns]')
Coordinates:
  * time     (time) datetime64[ns] 2014-02-14T23:00:00 2014-02-15T17:10:00 ...

And everything is OK when I write and re-open the netCDF file.

Then I try to add a reindexed variable to this dataset, like this:

da["MeanRainfallHeigh"] = rain.reindex(time=da.time).fillna(0)

The write still succeeds, but when I reopen the netCDF file, the minutes part of the time values has changed:

<xarray.DataArray 'time' (time: 10)>
array(['2014-02-15T00:00:00.000000000+0100',
       '2014-02-15T18:00:00.000000000+0100',
       '2014-02-16T18:00:00.000000000+0100',
       '2014-02-17T18:00:00.000000000+0100',
       '2014-02-18T18:00:00.000000000+0100',
       '2014-02-19T18:00:00.000000000+0100',
       '2014-02-20T18:00:00.000000000+0100',
       '2014-02-21T18:00:00.000000000+0100',
       '2014-02-22T00:00:00.000000000+0100',
       '2014-02-23T00:00:00.000000000+0100'], dtype='datetime64[ns]')
Coordinates:
  * time     (time) datetime64[ns] 2014-02-14T23:00:00 2014-02-15T17:00:00 ...

Thanks!

@jhamman (Member) commented Oct 28, 2016

@Scheibs - thanks for the report. Can you provide a simple, minimum working example (MWE)?

@Scheibs (Author) commented Nov 16, 2016

This is the warning I got when I wrote my file with to_netcdf():
xarray\conventions.py:1060: RuntimeWarning: saving variable time with floating point data as an integer dtype without any _FillValue to use for NaNs
for k, v in iteritems(variables))

@jhamman It seems that the error appears only with a variable "rain" that comes from a previously created netCDF file, but I will try to provide you with an example. Thanks!

@Scheibs (Author) commented Nov 16, 2016

@jhamman Here is my example file:
ftp://ftp.irsn.fr/argon/Example

@NotSqrt (Contributor) commented Jan 17, 2018

I faced this issue when switching from a concat to a merge.

The first merged dataset had a time dimension whose encoding was {'calendar': 'proleptic_gregorian', 'dtype': dtype('int64'), 'units': 'minutes since 2017-08-20 00:00:00'}, which meant that the data from the second merged dataset could not be stored with a finer resolution than minutes.

If I try to store values like '2017-08-20 00:00:30', I get the warning xarray\conventions.py:1092: RuntimeWarning: saving variable time with floating point data as an integer dtype without any _FillValue to use for NaNs.

Maybe it is similar in your case: the netCDF file stored the data as 'hours since XXXX', so you lose the minutes.
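
A minimal sketch of how to check for this (the file name "example.nc" is hypothetical): inspect the encoding that xarray attached to the time coordinate when it opened the file.

import xarray

ds = xarray.open_dataset("example.nc")
# the encoding is populated from the file on open, e.g.
# {'units': 'hours since 2014-02-14 23:00:00', 'dtype': dtype('int64'), ...}
print(ds.time.encoding)

# clearing it lets xarray choose fresh units and dtype on the next write
ds.time.encoding = {}
ds.to_netcdf("example_rewritten.nc")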

@shoyer (Member) commented Jan 17, 2018

@NotSqrt can you make a minimum working example for this? e.g., a netCDF file with problematic data, and associated code that writes a netCDF file with lost time resolution. That would really help us diagnose and solve this problem.

@NotSqrt (Contributor) commented Jan 18, 2018

There you go!

import numpy
import pandas
import tempfile
import warnings
import xarray


array1 = xarray.DataArray(
    numpy.random.rand(5),
    dims=['time'],
    coords={'time': pandas.to_datetime(['2018-01-01', '2018-01-01 00:01', '2018-01-01 00:02', '2018-01-01 00:03', '2018-01-01 00:04'])},
    name='foo'
)

array2 = xarray.DataArray(
    numpy.random.rand(5),
    dims=['time'],
    coords={'time': pandas.to_datetime(['2018-01-01 00:05', '2018-01-01 00:05:10', '2018-01-01 00:05:20', '2018-01-01 00:05:30', '2018-01-01 00:05:40'])},
    name='foo'
)

with tempfile.NamedTemporaryFile() as tmp:
    # save first array
    array1.to_netcdf(tmp.name)
    # reload it
    array1_reloaded = xarray.open_dataarray(tmp.name)

    # the time encoding stores minutes as int, so seconds won't be allowed at the next call of to_netcdf
    assert array1_reloaded.time.encoding['dtype'] == numpy.int64
    assert array1_reloaded.time.encoding['units'] == 'minutes since 2018-01-01 00:00:00'

    merged = xarray.merge([array1_reloaded, array2])
    array1_reloaded.close()

    with warnings.catch_warnings():
        warnings.filterwarnings('error', category=RuntimeWarning)
        merged.to_netcdf(tmp.name)

@NotSqrt (Contributor) commented Jan 23, 2018

FYI, setting merged.time.encoding = {} before calling to_netcdf seems to avoid the RuntimeWarning.
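
In the context of the example above, that looks like this (a sketch reusing the names from the previous snippet):

merged = xarray.merge([array1_reloaded, array2])
# drop the encoding inherited from array1's file so that to_netcdf
# picks units and a dtype that can represent the seconds
merged.time.encoding = {}
merged.to_netcdf(tmp.name)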

@kmuehlbauer (Contributor) commented

@NotSqrt If you are still working on this, I'd appreciate it if you could test against #7827.

That PR adds another warning with some more detail about what's going on. The issue remains that the requested encoding in minutes does not work with the actual data, hence the second warning. But maybe we can find a way to also check dtypes.

@NotSqrt (Contributor) commented Sep 18, 2023

I've run the example I gave above.

import numpy
import pandas
import tempfile
import warnings
import xarray


array1 = xarray.DataArray(
    numpy.random.rand(5),
    dims=['time'],
    coords={'time': pandas.to_datetime(['2018-01-01', '2018-01-01 00:01', '2018-01-01 00:02', '2018-01-01 00:03', '2018-01-01 00:04'], format='ISO8601')},
    name='foo'
)

array2 = xarray.DataArray(
    numpy.random.rand(5),
    dims=['time'],
    coords={'time': pandas.to_datetime(['2018-01-01 00:05', '2018-01-01 00:05:10', '2018-01-01 00:05:20', '2018-01-01 00:05:30', '2018-01-01 00:05:40'], format='ISO8601')},
    name='foo'
)

with tempfile.NamedTemporaryFile() as tmp:
    # save first array
    array1.to_netcdf(tmp.name)
    # reload it
    array1_reloaded = xarray.open_dataarray(tmp.name)

    # the time encoding stores minutes as int, so seconds won't be allowed at the next call of to_netcdf
    assert array1_reloaded.time.encoding['dtype'] == numpy.int64
    assert array1_reloaded.time.encoding['units'] == 'minutes since 2018-01-01 00:00:00'

    merged = xarray.merge([array1_reloaded, array2])
    array1_reloaded.close()

    # this line avoids losing precision and removes both warnings
    #merged.time.encoding = {}
    
    # this line removes the conversion to ints, which solves the resolution loss and removes the second warning
    #merged.time.encoding.pop('dtype')

    merged.to_netcdf(tmp.name)
    merged_reloaded = xarray.open_dataarray(tmp.name)
    numpy.testing.assert_array_equal(
        numpy.concatenate([array1.time, array2.time]), 
        merged_reloaded.time.values
    )

I see that now the warnings are:

  • UserWarning: Times can't be serialized faithfully with requested units 'minutes since 2018-01-01'. Resolution of 'seconds' needed. Serializing timeseries to floating point.
  • SerializationWarning: saving variable time with floating point data as an integer dtype without any _FillValue to use for NaNs

And since the last code statement still shows that the seconds are lost, we still have to use merged.time.encoding = {} or merged.time.encoding.pop('dtype') to be sure not to lose precision.
I guess the serialization to floating point is overwritten by the integer dtype determined after the first save, which means the floating-point values were not helpful without also changing the dtype encoding.

If the resolution loss can't be fixed automatically, it would be nice if the warning included a link to, or a summary of, what the user has to do to avoid the resolution loss!

Thanks !

@kmuehlbauer reopened this Sep 18, 2023

@kmuehlbauer (Contributor) commented Sep 18, 2023

Thanks @NotSqrt for the detailed test and reasoning.

The issue is, as you already wrote, with encoding: only the encoding of the first dataset survives the merge. If you switch the order of the objects, your code runs successfully.
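
A sketch of that order swap, reusing the names from the example above: the merged time coordinate then carries array2's (empty) encoding, so xarray picks defaults that can represent the seconds.

merged = xarray.merge([array2, array1_reloaded])
merged.to_netcdf(tmp.name)  # round-trips without losing the seconds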

As we do not update encoding and want to get rid of it soon (see the discussion in #6323), there is not much to be done.

I very much agree that the user should get as much information as possible out of any warnings/errors, so they can follow up easily.

There are at least the following three possible actions:

  1. Suggest using .reset_encoding on the merged dataset. As this might have unwanted side effects on other variables, it might be better to apply it only where necessary (e.g. the time variable); see the sketch at the end of this comment.
  2. Automatically change the encoding dtype to float64 in those cases.
  3. a. Special-case times/timedeltas in NonStringCoder to prevent the conversion to int.
    b. Remove dtype in CFDatetimeCoder / CFTimedeltaCoder.

From my perspective the least intrusive action would be 3b. For your example this would print just the first warning (which provides the needed information), and the seconds would be preserved.
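
A sketch of option 1 (assuming an xarray version that provides .reset_encoding; later releases rename it to .drop_encoding):

# variant (a): clear the encoding of every variable in the merged dataset
merged = merged.reset_encoding()

# variant (b): clear it only where necessary, i.e. on the time coordinate,
# to avoid side effects on the other variables' encodings
merged.time.encoding = {}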

@kmuehlbauer (Contributor) commented

@NotSqrt #8201 is not yet fully ready, but you might already give it a try. Thanks!

@kmuehlbauer (Contributor) commented

#8201 will take care of this issue as follows:

It issues this warning:

* `UserWarning: Times can't be serialized faithfully with requested units 'minutes since 2018-01-01'. Resolution of 'seconds' needed. Serializing timeseries to floating point.`

> If the resolution loss can't be fixed automatically, what would be nice in the warning is a link or a summary of what the user has to do to solve the resolution loss!

And it automatically drops dtype from the encoding when the times need to be encoded as float64. That prevents the recast to int64 and, with it, the precision loss.
