Hi! I was trying to figure out how to make my own Obspack file for GEOS-Chem (I had a version working previously in v13.3.4), but I have been getting an error since v14 and found your code. I figured I'd use it to generate a proper Obspack file that I could compare against mine to see why my files aren't working anymore. However, I ran into a lot of xarray/dask errors when using it with the latest version of xarray (2024.5.0) and needed to make the following edits to your code to get it to work. My updated process_obspack.py is also attached as a .txt file: process_obspack.txt
Error 1:
All of the "where" masks used in filter_obspack() and in save_day() now need .compute() called on them to avoid the error: "Indexing with a boolean dask array is not allowed. This will result in a dask array of unknown shape. Such arrays are unsupported by Xarray. Please compute the indexer first using .compute()". See this discussion for details on why: hainegroup/oceanspy#332
My fix is to modify those masking lines as follows:
Within filter_obspack():
    # Subset for time and location
    data = data.where(
        (data['time'] >= config['start_time']).compute() &
        (data['time'] <= config['end_time']+pd.Timedelta('1D')).compute(),
        #data['time'].dt.year.isin(keepyears),
        drop=True
    )
    data = data.where(
        (data['latitude'] >= config['lat_min']).compute() &
        (data['latitude'] <= config['lat_max']).compute(),
        drop=True
    )
    data = data.where(
        (data['longitude'] >= config['lon_min']).compute() &
        (data['longitude'] <= config['lon_max']).compute(),
        drop=True
    )
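For anyone who wants to try the masking pattern outside of process_obspack.py, here is a minimal sketch on a toy dataset. The dataset, the variable names, and the helper `subset_between` are made up for illustration and are not part of the original script; it assumes xarray and dask are installed.

```python
import numpy as np
import xarray as xr

# Toy stand-in for an Obspack-like dataset: 1-D along 'obs', backed by dask chunks.
# (Hypothetical data; the real files have many more variables.)
ds = xr.Dataset(
    {
        "latitude": ("obs", np.array([-10.0, 5.0, 20.0, 35.0, 50.0, 65.0])),
        "value": ("obs", np.arange(6.0)),
    }
).chunk({"obs": 3})


def subset_between(data, name, lower, upper):
    """Keep only the observations where lower <= data[name] <= upper.

    The boolean mask is built lazily and then computed once, so that
    where(..., drop=True) receives an in-memory mask, not a dask array.
    """
    mask = ((data[name] >= lower) & (data[name] <= upper)).compute()
    return data.where(mask, drop=True)


subset = subset_between(ds, "latitude", 0.0, 55.0)
print(subset["latitude"].values)
```

Combining the two comparisons before calling .compute() triggers one computation per mask instead of two; the result should be the same as computing each side separately, as in the fix above.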
Error 2:
Additionally, I got an error saying that if you specify the time units when saving the netCDF files in saveday(), you must also specify the data type for time: "ValueError: When encoding chunked arrays of datetime values, both the units and dtype must be prescribed or both must be unprescribed. Prescribing only one or the other is not currently supported. Got a units encoding of seconds since 1970-01-01 00:00:00 UTC and a dtype encoding of None." I fixed this by adding the dtype to the time encoding (see: pydata/xarray#3739):
    # Otherwise, save out
    print(f'Saving {mydate.strftime("%Y-%m-%d")}', flush=True)
    outpath = f'{config["outdir"]}/{config["outfile_name_stem"]}'
    daily.to_netcdf(
        mydate.strftime(outpath),
        unlimited_dims=['obs'],
        encoding={
            **{
                v: {'complevel': 1, 'zlib': True}
                for v in daily.data_vars
            },
            'time': {'units': tunits, 'calendar': 'proleptic_gregorian', 'dtype': np.int64}
        }
    )
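The encoding fix can also be checked in isolation with a small round trip. This is only a sketch under some assumptions: the toy dataset, the file name, and the `units` string (written here without the "UTC" suffix) are illustrative and not taken from the script, and it needs xarray, dask, and a netCDF backend installed.

```python
import os
import tempfile

import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical chunked dataset with a datetime 'time' variable along 'obs'.
times = pd.date_range("2020-01-01", periods=4, freq="D")
ds = xr.Dataset(
    {"value": ("obs", np.arange(4.0))},
    coords={"time": ("obs", times)},
).chunk({"obs": 2})

# Assumed units string for this example only.
units = "seconds since 1970-01-01 00:00:00"

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "example.nc")
    ds.to_netcdf(
        path,
        unlimited_dims=["obs"],
        encoding={
            # Give both units and dtype for the chunked datetime variable.
            "time": {
                "units": units,
                "calendar": "proleptic_gregorian",
                "dtype": np.int64,
            }
        },
    )
    # Reopen and load the decoded times before the temporary file is removed.
    with xr.open_dataset(path) as reopened:
        roundtrip = reopened["time"].values.copy()

print(roundtrip)
```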
Hi! Thank you for opening the issue and letting me know! Thanks also for sharing your updated file. The version of xarray I used is older than yours and the current version. Feel free to make a pull request, otherwise I will eventually implement the fixes.
INSTALLED VERSIONS:
commit: None
python: 3.12.3 | packaged by conda-forge | (main, Apr 15 2024, 18:38:13) [GCC 12.3.0]
python-bits: 64
OS: Linux
OS-release: 4.18.0-477.15.1.el8_8.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.3
libnetcdf: 4.9.2
xarray: 2024.5.0
pandas: 2.1.4
numpy: 1.26.4
scipy: 1.13.0
netCDF4: 1.6.5
pydap: None
h5netcdf: None
h5py: 3.11.0
zarr: None
cftime: 1.6.3
nc_time_axis: None
iris: None
bottleneck: 1.3.7
dask: 2024.5.1
distributed: 2024.5.1
matplotlib: 3.8.4
cartopy: None
seaborn: None
numbagg: None
fsspec: 2024.5.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 69.5.1
pip: 24.0
conda: None
pytest: None
mypy: None
IPython: 8.24.0
sphinx: 7.3.7