Serving data with 2D coordinate variables using the opendap plugin. #246
Replies: 5 comments 2 replies
-
To hopefully simplify the discussion, here is an example using the xarray tutorial data set 'rasm'. If I serve this data set with xpublish-opendap out of the box, the resulting structure looks like the image below when reading the opendap URL with xarray. The data variable Tair has coordinate variables x and y which contain index values from 0...204, 0...274 based on the size of the corresponding netCDF dimension. This structure is not correct. In fact, the data arrays that contain the lat and lon data are not even part of the opendap data structures. If I modify xpublish_opendap/dap_array.py to add those arrays for the dimensions that are part of a 2D coordinate, I get an opendap data structure much closer to what one gets when opening the data set directly (without opendap).
In fact, the structure is so close that some legacy software can read the opendap URL and correctly interpret the CF conventions in spite of the bogus x and y dimensions. However, I've been unable to come up with logic in xpublish_opendap/dap_array.py that creates the structure one sees when opening the data set directly with xarray.
-
Ok, I've nudged a few folks, and this may be due to how Xarray handles attributes when a dataset is loaded. The attributes that help determine how Xarray interprets a dataset get shuffled into separate places. Compare what your kerchunk-backed dataset reports for the attrs and the encoding of temp:
In [16]: ds['temp'].attrs
Out[16]:
{'field': 'temperature, scalar, series',
'long_name': 'time-averaged potential temperature',
'time': 'ocean_time',
'units': 'Celsius'}
In [17]: ds['temp'].encoding
Out[17]:
{'chunks': (1, 2, 258, 182),
'preferred_chunks': {'ocean_time': 1,
's_rho': 2,
'eta_rho': 258,
'xi_rho': 182},
'compressor': None,
'filters': None,
'_FillValue': 1e+37,
'dtype': dtype('float32'),
'coordinates': 'lon_rho lat_rho s_rho ocean_time'}
I think we may want to cherry-pick keys from encoding to serve out as OpenDAP attributes.
In [19]: ds_opendap
Out[19]:
<xarray.Dataset> Size: 4MB
Dimensions: (s_rho: 2, ocean_time: 4, eta_rho: 258, xi_rho: 182)
Coordinates:
* s_rho (s_rho) float64 16B -0.05 -0.01667
* ocean_time (ocean_time) datetime64[ns] 32B 1974-12-15T12:00:00 ... 1975-...
* eta_rho (eta_rho) float64 2kB 0.0 1.0 2.0 3.0 ... 255.0 256.0 257.0
* xi_rho (xi_rho) float64 1kB 0.0 1.0 2.0 3.0 ... 178.0 179.0 180.0 181.0
Data variables:
Cs_r (s_rho) float64 16B ...
Jel (ocean_time, s_rho, eta_rho, xi_rho) float32 2MB ...
hc float64 8B ...
temp (ocean_time, s_rho, eta_rho, xi_rho) float32 2MB ...
zeta (ocean_time, eta_rho, xi_rho) float32 751kB ...
Attributes: (12/45)
CPP_options: NEP5, ADD_FSOBC, ADD_M2OBC, ANA_BIOLOGY, ANA_...
Conventions: CF-1.0
NCO: netCDF Operators version 5.1.8 (Homepage = ht...
ana_file: ROMS/Functionals/ana_btflux.h, /gscratch/bumb...
avg_base: ../../bering10k/output/hindcasts/npz_201904_d...
bio_file: ROMS/Nonlinear/bestnpz.h
... ...
svn_url:
tiling: 007x020
title: Bering Sea 10 km Grid
type: ROMS/TOMS averages file
var_info: /gscratch/bumblereem/bering10k/input/var/vari...
_xpublish_id: example_temp
In [20]: ds
Out[20]:
<xarray.Dataset> Size: 5MB
Dimensions: (s_rho: 2, ocean_time: 4, eta_rho: 258, xi_rho: 182)
Coordinates:
lat_rho (eta_rho, xi_rho) float64 376kB ...
lon_rho (eta_rho, xi_rho) float64 376kB ...
* ocean_time (ocean_time) datetime64[ns] 32B 1974-12-15T12:00:00 ... 1975-...
* s_rho (s_rho) float64 16B -0.05 -0.01667
Dimensions without coordinates: eta_rho, xi_rho
Data variables:
Cs_r (s_rho) float64 16B ...
Jel (ocean_time, s_rho, eta_rho, xi_rho) float32 2MB ...
hc float64 8B ...
temp (ocean_time, s_rho, eta_rho, xi_rho) float32 2MB ...
zeta (ocean_time, eta_rho, xi_rho) float32 751kB ...
Attributes: (12/44)
CPP_options: NEP5, ADD_FSOBC, ADD_M2OBC, ANA_BIOLOGY, ANA_...
Conventions: CF-1.0
NCO: netCDF Operators version 5.1.8 (Homepage = ht...
ana_file: ROMS/Functionals/ana_btflux.h, /gscratch/bumb...
avg_base: ../../bering10k/output/hindcasts/npz_201904_d...
bio_file: ROMS/Nonlinear/bestnpz.h
... ...
svn_rev: Unversioned directory
svn_url:
tiling: 007x020
title: Bering Sea 10 km Grid
type: ROMS/TOMS averages file
var_info: /gscratch/bumblereem/bering10k/input/var/vari...
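The cherry-picking idea mentioned above (serving selected encoding keys as OpenDAP attributes) might look roughly like the sketch below. The helper name `attrs_for_opendap` and the default key list are assumptions made for illustration, not part of xpublish-opendap; the dictionaries are abbreviated copies of the `attrs` and `encoding` output shown earlier in this comment.

```python
def attrs_for_opendap(attrs, encoding, keys=("coordinates",)):
    """Return a copy of attrs with selected encoding keys merged in.

    Hypothetical helper: only the keys listed in `keys` are copied, and an
    existing attribute of the same name is never overwritten.
    """
    served = dict(attrs)  # copy, so the dataset's own attrs stay untouched
    for key in keys:
        if key in encoding and key not in served:
            served[key] = encoding[key]
    return served


# Abbreviated copies of ds['temp'].attrs and ds['temp'].encoding from above.
temp_attrs = {
    "field": "temperature, scalar, series",
    "long_name": "time-averaged potential temperature",
    "time": "ocean_time",
    "units": "Celsius",
}
temp_encoding = {
    "chunks": (1, 2, 258, 182),
    "_FillValue": 1e37,
    "coordinates": "lon_rho lat_rho s_rho ocean_time",
}

served = attrs_for_opendap(temp_attrs, temp_encoding)
print(served["coordinates"])
```

Copying only an explicit list of keys keeps storage details such as `chunks` out of the served attributes, while the CF `coordinates` string that clients rely on is passed through.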
-
Well, with the code in the pull request the coordinates are represented differently than when I first tried it, but the result still does not match what you get when you read using zarr directly.
-
Thanks for working on this some more. Since the client uses the attributes in the .das response to interpret the conventions, I thought I'd have a look at those. One of the critical attributes is "coordinates". In the case of the THREDDS server, the attributes look like:
In the case of the xpublish pull request code, which I tested most recently, the attributes for variables look like:
I don't know what happens when the same attribute is sent twice, but it does make me wonder if removing the duplicate attribute would make a difference.
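One way to test the duplicate-attribute question would be to collapse repeated names before the attributes are turned into a .das response. The sketch below is only that: the function names are hypothetical, the attribute list is illustrative (which attribute is actually duplicated is an assumption here), and it does not touch any real xpublish or opendap-protocol API.

```python
from collections import Counter


def duplicated_names(pairs):
    """Return the attribute names that occur more than once, sorted."""
    counts = Counter(name for name, _ in pairs)
    return sorted(name for name, count in counts.items() if count > 1)


def dedupe_attributes(pairs):
    """Keep the first value seen for each attribute name, preserving order."""
    seen = {}
    for name, value in pairs:
        if name not in seen:
            seen[name] = value
    return list(seen.items())


# Illustrative attribute list for one variable; the repeated entry is assumed.
temp_pairs = [
    ("long_name", "time-averaged potential temperature"),
    ("coordinates", "lon_rho lat_rho s_rho ocean_time"),
    ("units", "Celsius"),
    ("coordinates", "lon_rho lat_rho s_rho ocean_time"),
]

print(duplicated_names(temp_pairs))
for name, value in dedupe_attributes(temp_pairs):
    print(name, "=", value)
```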
-
The .dds is fundamentally different as well. Now I'm more lost than ever because without the right structure of lon_rho and lat_rho it doesn't matter what the attributes are:
Should look more like:
-
For reference, here is some discussion about such data variables in the xarray documentation.
I have such a data set which I am serving with xpublish. The xpublish server is reading kerchunk JSON files which have been derived from the netCDF data files (as in this example data set). I have also installed the opendap plug-in for xpublish.
When reading the data set using xarray as the client via zarr everything gets interpreted correctly.
eta_rho and xi_rho are "logical coordinates", to use the phrase in the xarray documentation referenced above, and lat_rho and lon_rho are the physical coordinates for the data variables salt and temp.
However, when reading the same data set via the opendap interface, the xarray data set looks like this:
In this case, eta_rho and xi_rho are not logical coordinates; they have data associated with them (just generated integer index values). I tried various ways to build the opendap objects, but I couldn't come up with a representation that worked.
The only example in the underlying opendap-protocol library shows building dimensions, data arrays and attributes.
I'd like to chat with y'all about this to see if you can help me figure out how to build a representation of a data set with multi-dimensional coordinate variables.
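To make the distinction above concrete, here is a minimal pure-Python sketch that separates dimensions backed by a 1D coordinate variable from purely logical ones, and picks out the multi-dimensional coordinates named in a CF `coordinates` attribute. The function names and the `variables` mapping are hypothetical; the dimension tuples are taken from the data set described in this thread.

```python
def classify_dimensions(variables):
    """Split dimension names into (indexed, logical).

    `variables` maps a variable name to its tuple of dimension names.
    A dimension is "indexed" if a 1D variable of the same name exists,
    otherwise it is a logical dimension with no coordinate values.
    """
    all_dims = []
    for dims in variables.values():
        for dim in dims:
            if dim not in all_dims:
                all_dims.append(dim)
    indexed = [d for d in all_dims if variables.get(d) == (d,)]
    logical = [d for d in all_dims if d not in indexed]
    return indexed, logical


def multidim_coordinates(coordinates_attr, variables):
    """Names from a CF `coordinates` string that have more than one dimension."""
    return [
        name
        for name in coordinates_attr.split()
        if len(variables.get(name, ())) > 1
    ]


variables = {
    "lat_rho": ("eta_rho", "xi_rho"),
    "lon_rho": ("eta_rho", "xi_rho"),
    "ocean_time": ("ocean_time",),
    "s_rho": ("s_rho",),
    "temp": ("ocean_time", "s_rho", "eta_rho", "xi_rho"),
}

indexed, logical = classify_dimensions(variables)
aux = multidim_coordinates("lon_rho lat_rho s_rho ocean_time", variables)
print("indexed:", indexed)
print("logical:", logical)
print("2D coordinates:", aux)
```

Under this reading, a server would only emit coordinate arrays for the indexed dimensions and would serve lat_rho and lon_rho as ordinary 2D arrays, rather than synthesizing integer values for eta_rho and xi_rho.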