Xarray v2024.5.0 Errors & Fixes #1

Open
jhaskinsPhD opened this issue Jun 3, 2024 · 1 comment
jhaskinsPhD commented Jun 3, 2024

Hi! I was trying to figure out how to make my own Obspack file for GEOS-Chem. I had a version working previously in v13.3.4, but I have been getting an error since v14, and then I found your code. I figured I'd use it to generate a proper Obspack file that I could compare against mine to see why my files aren't working anymore. However, I ran into a lot of xarray/dask errors when using it with the latest version of xarray (2024.5.0), and I needed to make the following edits to your code to get it to work. My updated process_obspack.py is also attached as a .txt file.
process_obspack.txt

Error 1:
All of the "where" masks used in filter_obspack() and in save_day() now need .compute() called on them. Without it, xarray raises: "Indexing with a boolean dask array is not allowed. This will result in a dask array of unknown shape. Such arrays are unsupported by Xarray. Please compute the indexer first using .compute()". See this discussion for details on why: hainegroup/oceanspy#332

My fix is to modify those masking lines as follows:

Within filter_obspack():

    # Subset for time and location
    data = data.where(
        (data['time'] >= config['start_time']).compute() &
        (data['time'] <= config['end_time']+pd.Timedelta('1D')).compute(),
        #data['time'].dt.year.isin(keepyears),
        drop=True
    )

    data = data.where(
        (data['latitude'] >= config['lat_min']).compute() & 
        (data['latitude'] <= config['lat_max']).compute(),
        drop=True
    )
    
    data = data.where(
        (data['longitude'] >= config['lon_min']).compute() & 
        (data['longitude'] <= config['lon_max']).compute(),
        drop=True
    )

Within saveday():

    daily = ds.where(
        (
            (ds['time'] >= mydate).compute() & 
            (ds['time'] < mydate+pd.Timedelta('1D')).compute() &
            (ds['CT_sampling_strategy'].isin([1,2,3,4])).compute()
        ),
        drop=True
    )
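For anyone who wants to reproduce the pattern outside of process_obspack.py, here is a minimal sketch. It assumes dask is installed, and the helper name `subset_by_range` and the toy dataset are made up for illustration; they are not part of the original code. It combines the two comparisons lazily and calls .compute() once on the combined mask, which should behave the same as computing each comparison separately as in the fix above.

```python
import numpy as np
import xarray as xr


def subset_by_range(ds, name, lower, upper):
    """Keep entries where lower <= ds[name] <= upper (hypothetical helper)."""
    # Build the mask lazily, then load it into memory so that
    # where(..., drop=True) receives a concrete boolean array.
    mask = ((ds[name] >= lower) & (ds[name] <= upper)).compute()
    return ds.where(mask, drop=True)


# Toy dataset: six daily observations along an "obs" dimension,
# chunked so that the variables are dask-backed.
times = np.arange("2024-01-01", "2024-01-07", dtype="datetime64[D]")
ds = xr.Dataset(
    {
        "time": ("obs", times.astype("datetime64[ns]")),
        "value": ("obs", np.arange(6.0)),
    }
).chunk({"obs": 3})

subset = subset_by_range(
    ds, "time", np.datetime64("2024-01-02"), np.datetime64("2024-01-04")
)
```

Computing the mask loads only the boolean indexer into memory, not the full dataset, so the remaining variables stay lazy until they are written or explicitly computed.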

Error 2:
Additionally, saving the netcdf files in saveday() failed because the time encoding specified units without a dtype: "ValueError: When encoding chunked arrays of datetime values, both the units and dtype must be prescribed or both must be unprescribed. Prescribing only one or the other is not currently supported. Got a units encoding of seconds since 1970-01-01 00:00:00 UTC and a dtype encoding of None." I fixed this by adding a dtype to the time encoding (see: pydata/xarray#3739):

    # Otherwise, save out
    print(f'Saving {mydate.strftime("%Y-%m-%d")}',flush=True)
    outpath=f'{config["outdir"]}/{config["outfile_name_stem"]}'
    daily.to_netcdf(
        mydate.strftime(outpath),
        unlimited_dims=['obs'],
        encoding = {
            **{
                v:{'complevel':1,'zlib':True}
                for v in daily.data_vars
            },
            'time':{'units':tunits,'calendar':'proleptic_gregorian', 'dtype':np.int64}
        }
    )
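The encoding dictionary can also be built separately, which makes it easy to check that units and dtype are always set together. This is only a sketch: `build_encoding` and its parameters are hypothetical names, not something from the original script. One difference from the snippet above: if "time" is itself one of the data variables, this version merges the compression settings into the time entry instead of replacing them.

```python
import numpy as np


def build_encoding(data_vars, time_units, time_name="time"):
    """Return a to_netcdf encoding dict (hypothetical helper)."""
    # Light zlib compression for every data variable.
    encoding = {str(v): {"zlib": True, "complevel": 1} for v in data_vars}
    # Prescribe units and dtype together for the time variable,
    # as the ValueError above requires for chunked datetime arrays.
    encoding.setdefault(time_name, {}).update(
        {
            "units": time_units,
            "calendar": "proleptic_gregorian",
            "dtype": np.int64,
        }
    )
    return encoding


enc = build_encoding(["value", "time"], "seconds since 1970-01-01 00:00:00 UTC")
```

The result would then be passed as `encoding=enc` in the `daily.to_netcdf(...)` call, with `daily.data_vars` and `tunits` as the arguments.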

INSTALLED VERSIONS:

commit: None
python: 3.12.3 | packaged by conda-forge | (main, Apr 15 2024, 18:38:13) [GCC 12.3.0]
python-bits: 64
OS: Linux
OS-release: 4.18.0-477.15.1.el8_8.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.3
libnetcdf: 4.9.2

xarray: 2024.5.0
pandas: 2.1.4
numpy: 1.26.4
scipy: 1.13.0
netCDF4: 1.6.5
pydap: None
h5netcdf: None
h5py: 3.11.0
zarr: None
cftime: 1.6.3
nc_time_axis: None
iris: None
bottleneck: 1.3.7
dask: 2024.5.1
distributed: 2024.5.1
matplotlib: 3.8.4
cartopy: None
seaborn: None
numbagg: None
fsspec: 2024.5.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 69.5.1
pip: 24.0
conda: None
pytest: None
mypy: None
IPython: 8.24.0
sphinx: 7.3.7

@eastjames eastjames self-assigned this Jun 6, 2024
@eastjames
Owner

Hi! Thank you for opening the issue and letting me know! Thanks also for sharing your updated file. The version of xarray I used is older than yours and the current version. Feel free to make a pull request, otherwise I will eventually implement the fixes.
