Add support for Discrete Sampling Geometry datasets. #122

Open
malmans2 opened this issue Jan 7, 2021 · 6 comments
Labels
opinion wanted User input requested

Comments

@malmans2
Member

malmans2 commented Jan 7, 2021

I ran into this while trying to set up a dataset with a collection of vertical profiles (i.e., a transect).

I think we should consider adding a new axis named "discrete": http://cfconventions.org/cf-conventions/cf-conventions.html#discrete-axis
The dimensions of the discrete axis would then be defined by an attribute named "instance_dimension": http://cfconventions.org/cf-conventions/cf-conventions.html#collections-instances-elements
The "instance_dimension" attribute is assigned to all "index_variables", which are 1D coordinates.

This is an example where I've extracted a transect from a C-grid model. After extraction I've removed the "axis" attributes from all X and Y variables, and I've added the attribute da.attrs["instance_dimension"] = "station" to all 1D variables with dimension "station".

[Image: "discrete"]

In this scenario, I think the axes should be "Z", "T", and "discrete", where ds.axes["discrete"] = ["station"]. Then we should probably also add ds.cf.index_variables, which returns all index variables (e.g., lon, lat, label, indexes on the original grid, ...).
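To make the proposal concrete, here is a minimal sketch of how the "discrete" axis and the index variables could be derived from the "instance_dimension" attributes. It is library-independent: plain dicts stand in for xarray variables, and the function names index_variables and discrete_axis are hypothetical, not existing cf_xarray API.

```python
# Hypothetical sketch: plain dicts stand in for xarray variables.
# Neither function exists in cf_xarray; the names are illustrative only.


def index_variables(variables):
    """Return names of 1D variables that carry an ``instance_dimension`` attribute."""
    return sorted(
        name
        for name, var in variables.items()
        if len(var["dims"]) == 1 and "instance_dimension" in var["attrs"]
    )


def discrete_axis(variables):
    """Return the dimension names referenced by ``instance_dimension`` attributes."""
    dims = {
        variables[name]["attrs"]["instance_dimension"]
        for name in index_variables(variables)
    }
    return sorted(dims)


# A transect: 1D station coordinates plus a (time, Z, station) data variable.
variables = {
    "lon": {"dims": ("station",), "attrs": {"instance_dimension": "station"}},
    "lat": {"dims": ("station",), "attrs": {"instance_dimension": "station"}},
    "Z": {"dims": ("Z",), "attrs": {"axis": "Z"}},
    "time": {"dims": ("time",), "attrs": {"axis": "T"}},
    "temp": {"dims": ("time", "Z", "station"), "attrs": {}},
}

print(index_variables(variables))  # ['lat', 'lon']
print(discrete_axis(variables))    # ['station']
```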

There is a global attribute named "featureType": http://cfconventions.org/cf-conventions/cf-conventions.html#_features_and_feature_types
I'm not sure whether it would be preferable to add ds.cf["discrete"] and ds.cf.index_variables only if that attribute is present. Maybe axes ["X", "Y"] and "discrete" should also be mutually exclusive?
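One possible answer to both questions, sketched below under stated assumptions: only expose "discrete" when "featureType" holds one of the CF feature types, and drop "X" and "Y" whenever "discrete" is kept. The function names is_dsg and resolve_axes are hypothetical, and the axes mapping is a plain dict rather than anything cf_xarray returns today.

```python
# The six feature types listed in the CF conventions.
CF_FEATURE_TYPES = {
    "point",
    "timeSeries",
    "trajectory",
    "profile",
    "timeSeriesProfile",
    "trajectoryProfile",
}


def is_dsg(global_attrs):
    """True if the global ``featureType`` attribute names a CF feature type."""
    feature_type = global_attrs.get("featureType", "")
    # Compare case-insensitively so "timeseries" and "timeSeries" both match.
    return feature_type.lower() in {ft.lower() for ft in CF_FEATURE_TYPES}


def resolve_axes(global_attrs, axes):
    """Keep "discrete" only for DSG datasets; drop X/Y when "discrete" is kept.

    This encodes one option from the discussion, not a settled design.
    """
    axes = dict(axes)  # do not mutate the caller's mapping
    if not is_dsg(global_attrs):
        axes.pop("discrete", None)
    elif "discrete" in axes:
        axes.pop("X", None)
        axes.pop("Y", None)
    return axes


candidate = {"X": ["lon"], "Z": ["Z"], "T": ["time"], "discrete": ["station"]}
print(resolve_axes({"featureType": "timeSeriesProfile"}, candidate))
print(resolve_axes({}, candidate))
```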

@dcherian
Contributor

dcherian commented Jan 7, 2021

Yes, we should support this "discrete sampling geometry" stuff.

I am confused about how these are represented, however. The CF CDL examples (e.g., http://cfconventions.org/cf-conventions/cf-conventions.html#_indexed_ragged_array_representation_of_trajectories) don't use the discrete attribute, but instance_dimension is used. cf_role also seems important.

@ocefpaf Can you point us to a "nice" dataset that uses these attributes?

@dcherian dcherian changed the title Add support for discrete axis, instance dimensions, and index variables? Add support for Discrete Sampling Geometry datasets. Jan 8, 2021
@ocefpaf
Contributor

ocefpaf commented Jan 8, 2021

> @ocefpaf Can you point us to a "nice" dataset that uses these attributes?

I believe we have some "gold standards" somewhere. Let me check and get back to you.

@rsignell-usgs

@dcherian and @ocefpaf, we have USGS oceanographic data in CF-1.6-compliant format, with both featureType: timeSeries and featureType: timeSeriesProfile data, on our THREDDS server, where you can download the data as NetCDF or access it via OPeNDAP.

For example, all of the data from this experiment in Grand Bay
(thanks to @dnowacki-usgs):

Specific Examples:

@dcherian
Contributor

dcherian commented Jan 9, 2021

Thanks @rsignell-usgs, those datasets are a lot more straightforward.

instance_dimension is still confusing to me, but it is used to represent ragged arrays. For xarray, we could decode this to either a MultiIndexed dataset or a sparse array dataset.
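As a rough illustration of the two decoding options, the sketch below takes an indexed ragged array and produces (a) one (instance, position) pair per observation, which is the information a MultiIndex would carry, and (b) a padded 2D layout, which is what a sparse array would store. It assumes the index variable holds zero-based indices into the instance dimension; the helper names and the sample values are made up, and plain lists stand in for xarray objects.

```python
def ragged_to_multiindex(instance_index):
    """Pair each observation with (instance, position within that instance)."""
    counts = {}
    pairs = []
    for inst in instance_index:
        pos = counts.get(inst, 0)
        pairs.append((inst, pos))
        counts[inst] = pos + 1
    return pairs


def ragged_to_padded(values, instance_index, n_instances, fill=None):
    """Scatter observations into an (instance, position) grid padded with ``fill``."""
    pairs = ragged_to_multiindex(instance_index)
    width = max((pos for _, pos in pairs), default=-1) + 1
    padded = [[fill] * width for _ in range(n_instances)]
    for (inst, pos), value in zip(pairs, values):
        padded[inst][pos] = value
    return padded


# Six observations belonging to three trajectories (made-up values).
trajectory_index = [0, 1, 0, 2, 1, 0]
temperature = [10.0, 20.0, 11.0, 30.0, 21.0, 12.0]

pairs = ragged_to_multiindex(trajectory_index)
padded = ragged_to_padded(temperature, trajectory_index, n_instances=3)

print(pairs)   # [(0, 0), (1, 0), (0, 1), (2, 0), (1, 1), (0, 2)]
print(padded)  # [[10.0, 11.0, 12.0], [20.0, 21.0, None], [30.0, None, None]]
```

The MultiIndex form keeps the data 1D and only relabels it, while the padded form makes per-instance slicing trivial at the cost of storing fill values.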

OTOH, cf_role-tagged variables provide a unique identifier for a "trajectory", so if you were concatenating multiple trajectory files, you would create a new coordinate for this cf_role variable and concatenate along that. I think we can support indexing by cf_role. The only valid keys are timeseries_id, profile_id, and trajectory_id.
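Indexing by cf_role could look roughly like the sketch below: reject anything other than the three valid keys, then return the variables whose "cf_role" attribute matches. The function name variables_with_role is hypothetical, the variable names are invented, and a plain dict of attributes stands in for a dataset.

```python
VALID_CF_ROLES = ("timeseries_id", "profile_id", "trajectory_id")


def variables_with_role(variable_attrs, role):
    """Return names of variables whose ``cf_role`` attribute equals ``role``.

    ``variable_attrs`` maps variable names to their attribute dicts.
    """
    if role not in VALID_CF_ROLES:
        raise KeyError(f"{role!r} is not a valid cf_role; expected one of {VALID_CF_ROLES}")
    return sorted(
        name for name, attrs in variable_attrs.items() if attrs.get("cf_role") == role
    )


# Made-up variables for illustration.
variable_attrs = {
    "trajectory": {"cf_role": "trajectory_id"},
    "profile": {"cf_role": "profile_id"},
    "temp": {"units": "degC"},
}

print(variables_with_role(variable_attrs, "trajectory_id"))  # ['trajectory']
print(variables_with_role(variable_attrs, "timeseries_id"))  # []
```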

@dcherian
Contributor

dcherian commented Jan 9, 2021

Also related: https://ncas-cms.github.io/cfdm/tutorial.html#discrete-sampling-geometries
and pydata/xarray#1077 (comment), where we are confused about which of these representations maps more cleanly to a sparse DataArray and which to a MultiIndexed DataArray.

@dcherian dcherian added the opinion wanted User input requested label Jan 26, 2021
@dcherian
Contributor

The NCEI netCDF templates look useful (but I haven't looked closely): https://www.ncei.noaa.gov/data/oceans/ncei/formats/netcdf/v2.0/index.html
