This repository has been archived by the owner on Dec 1, 2022. It is now read-only.

CovJSON, CF-JSON and NCO-JSON #86

Open
jonblower opened this issue Jul 27, 2017 · 6 comments

@jonblower

jonblower commented Jul 27, 2017

There has been a discussion on the Climate and Forecast mailing list about different JSON formats for recording NetCDF data. Until this discussion I wasn't aware that there are a couple of other initiatives going on, namely CF-JSON and NCO-JSON.

The discussion revealed that these two initiatives are quite similar in aim to each other. They both aim to translate the NetCDF(-3) [edit/correction - NCO also supports NetCDF4] data model into JSON and apply the CF metadata conventions directly.

CovJSON does not have quite the same aim: it operates at a higher level of abstraction and does not mimic any particular existing format. Here are a few comparison points between CovJSON, CF-JSON and NCO, intended to stimulate discussion. I'm going to make a simplifying assumption that CF-JSON and NCO are very similar in respect of the points made here:

  1. CF-JSON and NCO are likely to be more familiar to users who are already comfortable with NetCDF and the CF conventions.

  2. Conversely, users who are unfamiliar with CF/NetCDF may find CovJSON easier to understand (at least, that's our intention...). CovJSON does not assume that the data are "born in NetCDF format".

  3. CovJSON borrows concepts from ISO and OGC standards, and may be conceptually more familiar to folk from those communities. It's intended to provide a "bridge" between what we might loosely call the "NetCDF community" and the "GIS community".

  4. CovJSON cannot (yet) encode all possibilities afforded by CF/NetCDF. For example, cell methods and climatological time are not yet supported in CovJSON. So if entirely "lossless" encoding of CF-NetCDF in JSON is required, CF-JSON and NCO may be more appropriate choices.

  5. The NetCDF(-3) data model struggles to accommodate certain types of data structures. It is quite a "flat" structure, and the mechanisms required to link relevant data together in a NetCDF file can be quite hard to understand. (Coordinate reference systems, and their links to dimensions and variables are one example. Encoding geometries is another.) I assume that JSON formats based directly on NetCDF will suffer from similar issues, forcing clients to implement some of the more complex parts of the CF conventions in order to piece the information back together. By contrast, CovJSON aims to repartition the same information in a way that is (hopefully) easier for clients to deal with, using the possibilities afforded by JSON.

  6. By virtue of the above, I would argue that non-gridded data (e.g. observations from points or moving platforms), which often require the recording of geometries, trajectories and other "composite" coordinate types, are easier to encode and understand in CovJSON than in NetCDF. (Concretely, CovJSON provides the facility for "tuple" and "polygon" axis types: https://covjson.org/spec/#axis-objects. These require some gymnastics to encode in NetCDF.)

  7. CovJSON provides mechanisms to partition large datasets among different files (e.g. holding range objects in separate files, the tiling scheme). This is done for "web-friendliness", i.e. avoiding large monolithic files. I'm not aware that CF-JSON and NCO have this facility, although I may be wrong.

  8. On a more minor point of implementation, CovJSON encodes data values as flat, 1-D arrays (the reason why is explained here). CF-JSON and NCO use nested arrays.
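
To make point 8 concrete, here is a minimal Python sketch of the two layouts and how a client can move between them. It assumes row-major ordering for the flat array (last axis varying fastest); the function names and sample values are purely illustrative and are not taken from any of the three specifications.

```python
from math import prod

def flat_index(indices, shape):
    """Row-major offset of a multi-dimensional index into a flat array."""
    offset = 0
    for i, n in zip(indices, shape):
        if not 0 <= i < n:
            raise IndexError(f"index {i} out of range for axis of size {n}")
        offset = offset * n + i
    return offset

def nest(values, shape):
    """Rebuild nested lists from a flat row-major array."""
    if prod(shape) != len(values):
        raise ValueError("shape does not match number of values")
    if len(shape) <= 1:
        return list(values)
    step = prod(shape[1:])  # number of values under each outer index
    return [nest(values[k * step:(k + 1) * step], shape[1:])
            for k in range(shape[0])]

shape = [2, 3]                    # e.g. axes ("y", "x")
flat = [10, 11, 12, 20, 21, 22]   # flat, 1-D encoding
nested = nest(flat, shape)        # nested-array encoding of the same data
value = flat[flat_index([1, 2], shape)]
```

With the flat form, the shape and axis order must travel alongside the values; with the nested form, the shape is implicit in the nesting.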

Discussion of the above points (and addition of new ones!) is most welcome. My intention is not to evangelise for CovJSON, but to point out points of similarity and departure (philosophically and structurally) between CovJSON, CF-JSON and NCO. If we can understand these points we'll be in a better place to discuss whether we should look at merging these initiatives.

@ethanrd

ethanrd commented Jul 27, 2017

My understanding is that NCO JSON can represent the full netCDF-4 data model.

@czender

czender commented Jul 27, 2017

NCO produces JSON for classic (netCDF3) and extended (netCDF4) data. Sample input and output files are here:

http://dust.ess.uci.edu/tmp/in*

The in_grp* files are hierarchical (netCDF4) and the in.* files are flat/classic (netCDF3). NCO defines three options that trade off increasing JSON complexity for binary reproducibility. It's all documented in the NCO manual.

We welcome any discussion/feedback to improve/extend the format!

@BobSimons

NCO's JSON options and covJSON have very different goals and approaches.
NCO's JSON can create a JSON file with a one-to-one representation of the information that is in any CF/NetCDF file.
covJSON is an encoding of specific feature types in a specific way in a json file.
Each will have different users and uses.
There is no reason to try to merge them. There doesn't have to be just one JSON format.
I personally hope that no effort is made to merge them.

CF-JSON and NCO's JSON are very similar in goals and approaches.
They can create a JSON file with a representation of the information that is in a CF/NetCDF file.
NCO's lvl=2 "pedantic" variant preserves all of the information from any CF/NetCDF file -- a one-to-one reformatting.
It is nice that NCO's format is already supported by a major CF/NetCDF tool (NCO!).
I plan to add to ERDDAP the ability to write JSON files with NCO's lvl=2 "pedantic" variant.
Since the CF-JSON proposal lacks a few features found in the NCO options (e.g., data types), I personally hope that the CF-JSON group revises their format to match one or all of the NCO options.
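
The data-type point can be illustrated with a short Python sketch. JSON has a single number type, so a plain value cannot record whether the original netCDF attribute was a float, a double or a short. The `{"type": ..., "data": ...}` wrapper below is a hypothetical shape chosen for illustration only; consult the NCO manual for the syntax NCO actually emits.

```python
import json

# A plain JSON value: the original storage type is not recorded.
plain = json.loads(json.dumps({"scale_factor": 0.5}))

# Hypothetical typed wrapper (an assumption, not NCO's documented syntax).
typed = json.loads(json.dumps(
    {"scale_factor": {"type": "float", "data": 0.5}}
))

def attribute_type(value, default="unknown"):
    """Return the declared type of an attribute, if the encoding carries one."""
    if isinstance(value, dict) and "type" in value:
        return value["type"]
    return default

plain_type = attribute_type(plain["scale_factor"])
typed_type = attribute_type(typed["scale_factor"])
```

A reader of the plain form has to guess the type when writing the data back to netCDF, which is why a one-to-one round trip needs something like the typed form.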

@ChrisBarker-NOAA

I do think covJSON DOES overlap with CF, and where it does, it would be nice to have it be compatible with the netcdf-cf json representations.

In fact, there might be room to go the other way -- under CF, there is a discussion of how to model geometries in netcdf-CF -- maybe it could be informed by covJSON.

@jonblower

Thanks everyone for your comments. @ChrisBarker-NOAA - if you have any suggestions for any constructs that CovJSON could adopt/reuse from CF, please feel free to raise an issue on this site.

Just to expand on @BobSimons' point:

"covJSON is an encoding of specific feature types in a specific way in a json file."

The "core" CovJSON spec does not include specific feature types. It defines a general structure that can be used to encode a very wide range of features. The encoding of all features is structurally the same - the only thing that changes is the form of the domain. (The parameters and range objects don't need to know anything about feature types.)

However, we recognise that it's useful for clients to be able to quickly detect what kind of feature is being encoded (timeseries, grid, vertical profile etc). Hence a Coverage can contain a domainType property which specifies this. Each domain type comes with its own "rules", which help to simplify clients. More details on this can be found here. Strictly, the domain types specification is separate from the core CovJSON spec.

Users can create their own domain types if they want - this is part of the extensibility mechanism. Or they don't have to use domain types at all - but that makes general-purpose clients harder to write.
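
As a rough illustration of how a general-purpose client might use domainType, here is a minimal Python sketch. The coverage dictionary is a heavily simplified stand-in for a real CovJSON document, and the handler names are hypothetical; check the field layout against the spec before relying on it.

```python
def describe_grid(domain):
    """Handler that relies on the rules of a known domain type."""
    return "grid with axes " + ", ".join(sorted(domain["axes"]))

def describe_generic(domain):
    """Fallback for unknown, custom, or absent domain types."""
    count = len(domain.get("axes", {}))
    return "domain with %d axes (no domain-type rules applied)" % count

# Hypothetical registry: domain type name -> handler.
HANDLERS = {"Grid": describe_grid}

def describe(coverage):
    domain = coverage["domain"]
    # Assumption: the type may be given on the coverage or on its domain.
    domain_type = coverage.get("domainType") or domain.get("domainType")
    handler = HANDLERS.get(domain_type, describe_generic)
    return handler(domain)

coverage = {
    "type": "Coverage",
    "domain": {
        "type": "Domain",
        "domainType": "Grid",
        "axes": {
            "x": {"values": [0, 1]},
            "y": {"values": [50, 51]},
            "t": {"values": ["2017-07-27T00:00:00Z"]},
        },
    },
    "parameters": {},
    "ranges": {},
}
summary = describe(coverage)
custom = describe({"domain": {"domainType": "MyCustomType", "axes": {"a": {}}}})
```

A known domain type lets the client take a shortcut; anything else falls back to inspecting the domain structure directly, which is the harder path described above.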

@ChrisBarker-NOAA

@jonblower:

Honestly, I'm not much of a GIS guy, so I'm still a bit confused about what exactly a "coverage" is.

(I think I know what features are...)

But there has always been a bit of a disconnect between the data models for scientific data (and model results) that I deal with and the "standard" GIS data model. And it seems that CovJSON is trying to close this gap a bit. So:

CF was designed for scientific data and netcdf. But while CF was designed for netcdf, it does, in fact, impose a data model that can be used and adapted to other file formats or programming environments.

And netcdf also defines both a specific file format, and a data model. So:

The CF data model can be mapped to other file formats
The netcdf data model can be mapped to other formats as well.

So these can be done more or less orthogonally, so I suggest that:

CF-JSON and NCO-JSON be focused on mapping netcdf to JSON in a standard way. Once that is done, then CF itself becomes a metadata standard that can simply be applied to JSON the same way as netcdf.
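
A minimal Python sketch of that idea: once the netcdf structure is available as JSON, a CF-style check only needs to look at variable attributes, regardless of the container format. The JSON layout below is a hypothetical simplification rather than the CF-JSON or NCO-JSON schema, and the two attributes checked are just examples (CF does not require them on every variable).

```python
import json

# Hypothetical netcdf-like JSON document (layout is an assumption).
doc = json.loads("""
{
  "variables": {
    "sst": {
      "dimensions": ["time", "lat", "lon"],
      "attributes": {"standard_name": "sea_surface_temperature", "units": "K"}
    },
    "flag": {"dimensions": ["time"], "attributes": {}}
  }
}
""")

def missing_cf_attributes(variable, required=("standard_name", "units")):
    """List the example CF attributes that a variable does not carry."""
    attrs = variable.get("attributes", {})
    return [name for name in required if name not in attrs]

report = {name: missing_cf_attributes(var)
          for name, var in doc["variables"].items()}
```

The same function would work on attributes read from a netcdf file, since it depends only on the attribute names and not on the file format.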

So CovJSON can then take a similar tack -- for those "things" that overlap between CF and what you want CovJSON to cover, the CF standards are used, and expressed in JSON in a way compatible with CF-JSON/NCO-JSON (maybe call it nc-JSON?).

Maybe we need a coverage CF spec? Note that additional specs can be "added on" to CF if they are done in a compatible way.

Also note that there is an effort afoot to add GIS-like geometry specs to CF:

https://github.com/twhiteaker/netCDF-CF-simple-geometry

and

cf-convention/cf-conventions#115

(I've lost track of the "official" status or even where the latest discussion lives...)

But I expect there is room for collaboration/alignment there, too.

-CHB
