Use CoverageJSON for serializing model (and observational) data? #8

Closed
benbovy opened this issue Sep 14, 2020 · 9 comments · Fixed by #11

Comments

@benbovy
Member

benbovy commented Sep 14, 2020

CoverageJSON is "a format for publishing geotemporal data to the web". I think it is very relevant for our case. We should probably just use it for much easier integration between the backend and the frontend, rather than using "plain" GeoJSON and defining our own specifications on top of it (which would require many iterations).

Advantages of CoverageJSON:

  • The specifications are already very detailed and are inspired by both the netCDF data model and some OGC geospatial data models.
  • It is listed among the encodings recommended by OGC API - Coverages.
  • It supports most (if not all) of our use cases, i.e., grid, trajectory, profile, point collection, etc. It also supports multiple dimensions, including time. See the examples in https://covjson.org/playground/.
  • There are some JavaScript tools available (https://covjson.org/tools/) that might help on the frontend side (although those tools don't seem to have been updated for quite a while).
  • I don't think it would require much work on the backend side to support it (that's "just" data/metadata formatting). We could have a look at pycovjson and pygeoapi, which both create CoverageJSON-formatted data from netCDF/xarray datasets.
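As a rough illustration of the "just data/metadata formatting" point, here is a minimal Python sketch (standard library only) that assembles a CoverageJSON `Grid` coverage with a time axis from plain lists. The helper `grid_coverage`, the `"sst"` variable and its values are hypothetical; the object layout (a `domain` with `axes` and `referencing`, `parameters`, and `NdArray` ranges whose `values` are flattened in row-major order of `axisNames`) is modeled on the examples in the draft spec at https://covjson.org and should be checked against it before relying on it.

```python
import json


def grid_coverage(name, unit, xs, ys, ts, data):
    """Build a CoverageJSON Grid coverage (hypothetical helper).

    ``data`` is a nested list indexed as data[t][y][x]; missing values
    are given as None and serialized as JSON null.
    """
    # NdArray values are a flat list in row-major order of axisNames (t, y, x).
    flat = [v for plane in data for row in plane for v in row]
    shape = [len(ts), len(ys), len(xs)]
    if len(flat) != shape[0] * shape[1] * shape[2]:
        raise ValueError("data does not match the axis lengths")
    return {
        "type": "Coverage",
        "domain": {
            "type": "Domain",
            "domainType": "Grid",
            "axes": {
                "x": {"values": list(xs)},
                "y": {"values": list(ys)},
                "t": {"values": list(ts)},
            },
            "referencing": [
                {
                    "coordinates": ["x", "y"],
                    "system": {
                        "type": "GeographicCRS",
                        "id": "http://www.opengis.net/def/crs/OGC/1.3/CRS84",
                    },
                },
                {
                    "coordinates": ["t"],
                    "system": {"type": "TemporalRS", "calendar": "Gregorian"},
                },
            ],
        },
        "parameters": {
            name: {
                "type": "Parameter",
                "unit": {"symbol": unit},
                "observedProperty": {"label": {"en": name}},
            }
        },
        "ranges": {
            name: {
                "type": "NdArray",
                "dataType": "float",
                "axisNames": ["t", "y", "x"],
                "shape": shape,
                "values": flat,
            }
        },
    }


if __name__ == "__main__":
    # Example: one time step on a 2 x 3 grid, with one missing value.
    cov = grid_coverage(
        "sst", "Cel",
        xs=[-10.0, -5.0, 0.0], ys=[40.0, 50.0],
        ts=["2020-09-14T00:00:00Z"],
        data=[[[12.1, 12.4, None], [10.2, 10.0, 9.8]]],
    )
    print(json.dumps(cov, indent=2))
```

Missing values (`None`) end up as JSON `null`, which is how `NdArray` values represent missing data.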

GeoJSON might still be useful in some places, though, especially for API query/post parameters.

Any thoughts @alirezamdv @willirath @koldunovn @suvarchal ?

@willirath
Member

👍

@koldunovn

Looks good! The only thing that worries me a bit is that it's still at the draft stage and hasn't been updated since 2015 :)

@benbovy
Member Author

benbovy commented Sep 14, 2020

The only thing that worries me a bit is that it's still at the draft stage and hasn't been updated since 2015 :)

Yes, that's a good point. It's not a big issue, though, if we just use the specs without relying too much on third-party tools implementing them. The fact that an active project like pygeoapi uses it by default (for its xarray and rasterio providers) also makes me a bit less worried.

@alirezamdv

@benbovy I have no experience with it, but it looks easy to use with Leaflet.

@alirezamdv

@benbovy this format is only accessible through a third-party library, which is only available for Leaflet. I couldn't find one for OpenLayers, and it's not easy to style...
In my opinion GeoJSON would be better, and maybe TopoJSON for large data:
https://github.com/topojson/topojson

@benbovy
Member Author

benbovy commented Sep 16, 2020

Yes, I agree there aren't many tools currently available (and/or well maintained) to handle the CoverageJSON format in a convenient way, and that's a bit unfortunate. That said, I'm still (quite strongly) in favor of using that format.

As far as I'm familiar with the GeoJSON format, it is too generic for our use cases IMO. How should we deal with the temporal dimension in model outputs? How should we encode model field metadata like units, variable names, etc.? How should we encode tiled data? Those are issues that have already been solved by CoverageJSON. Ocean models like NEMO or FESOM2 usually store their outputs using the netCDF data model. As I understand it, CoverageJSON has been designed specifically as a bridge between the netCDF data model and the "OGC-like" formats commonly used to handle geospatial data in frontend applications, which seems to be exactly what we need.
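To make the first two questions more concrete, the sketch below (Python, standard library only) shows how netCDF/CF variable metadata could map onto CoverageJSON: numeric CF times ("hours since ...") are decoded into the ISO 8601 strings used for a `t` axis, and the `units`, `long_name`, `standard_name` and `comment` attributes are mapped to a `Parameter` object. Both helpers are hypothetical. The sketch only handles the standard (Gregorian) calendar and ISO-formatted reference dates, whereas model outputs often use calendars like `noleap` or `360_day` that would need a library such as cftime. It also passes CF (UDUNITS) unit strings through unchanged, although the spec's examples use UCUM symbols, and building the `observedProperty` id from the NERC vocabulary server is an assumption modeled on those examples.

```python
from datetime import datetime, timedelta, timezone

# CF time units handled by this sketch, mapped to timedelta keyword names.
_STEPS = {
    "day": "days", "days": "days",
    "hour": "hours", "hours": "hours",
    "minute": "minutes", "minutes": "minutes",
    "second": "seconds", "seconds": "seconds",
}


def cf_times_to_iso(values, units):
    """Decode CF numeric times ("<unit> since <date>") to ISO 8601 strings.

    Hypothetical helper: standard (Gregorian) calendar and ISO-formatted
    reference dates only; non-standard calendars would need e.g. cftime.
    """
    step, sep, ref = units.partition(" since ")
    if not sep:
        raise ValueError(f"not a CF time unit: {units!r}")
    key = _STEPS[step.strip().lower()]
    # Accept "2020-09-14", "2020-09-14 00:00:00", "... UTC" or "...Z".
    ref = ref.strip().replace(" UTC", "").rstrip("Z")
    origin = datetime.fromisoformat(ref)
    if origin.tzinfo is None:
        origin = origin.replace(tzinfo=timezone.utc)
    out = []
    for v in values:
        t = (origin + timedelta(**{key: float(v)})).astimezone(timezone.utc)
        out.append(t.strftime("%Y-%m-%dT%H:%M:%SZ"))
    return out


def parameter_from_attrs(name, attrs):
    """Map CF-style variable attributes to a CoverageJSON Parameter object.

    Hypothetical helper; which attributes to carry over is our own choice.
    """
    observed = {"label": {"en": attrs.get("long_name", name)}}
    if "standard_name" in attrs:
        # Assumption: identify the quantity through the NERC vocabulary server,
        # as the examples in the CoverageJSON draft spec do.
        observed["id"] = (
            "http://vocab.nerc.ac.uk/standard_name/" + attrs["standard_name"] + "/"
        )
    param = {"type": "Parameter", "observedProperty": observed}
    if "units" in attrs:
        # CF units are UDUNITS strings; passed through unchanged here.
        param["unit"] = {"symbol": attrs["units"]}
    if "comment" in attrs:
        param["description"] = {"en": attrs["comment"]}
    return param
```

For example, `cf_times_to_iso([0, 6], "hours since 2020-09-14 00:00:00")` yields `["2020-09-14T00:00:00Z", "2020-09-14T06:00:00Z"]`, ready to be used as the values of a `t` axis.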

I'm not sure TopoJSON would work any better. As with GeoJSON, we would still need a good amount of work to adapt the netCDF data model to it.

In #11 I don't rely on any third-party library; instead, I've implemented the part of the CoverageJSON specs that we need. It didn't take me much effort. Maybe on the frontend side you could implement some helper functions to deal with the format (e.g., CoverageJSON / GeoJSON converters)?
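A possible shape for such a converter, sketched in Python to match the backend code (the logic should port line by line to JavaScript): it turns a CoverageJSON `Trajectory` coverage into a GeoJSON `FeatureCollection` with one Point feature per vertex, carrying the time stamp and every parameter value in `properties` so that Leaflet or OpenLayers can style each vertex. The function name `trajectory_to_geojson` is hypothetical, and it assumes inline `NdArray` ranges along the `composite` axis (not ranges referenced by URL, nor tiled ranges).

```python
def trajectory_to_geojson(coverage):
    """Convert a CoverageJSON Trajectory coverage to a GeoJSON FeatureCollection.

    Hypothetical helper: each trajectory vertex becomes a Point feature whose
    properties hold its time stamp (if any) and the value of every parameter.
    """
    domain = coverage["domain"]
    if domain.get("domainType") != "Trajectory":
        raise ValueError("expected a Trajectory coverage")
    composite = domain["axes"]["composite"]
    names = composite["coordinates"]  # e.g. ["t", "x", "y"]
    ix, iy = names.index("x"), names.index("y")
    it = names.index("t") if "t" in names else None
    features = []
    for i, tup in enumerate(composite["values"]):
        props = {} if it is None else {"time": tup[it]}
        # Assumes inline NdArray ranges indexed along the composite axis.
        for pname, rng in coverage["ranges"].items():
            props[pname] = rng["values"][i]
        features.append({
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [tup[ix], tup[iy]]},
            "properties": props,
        })
    return {"type": "FeatureCollection", "features": features}
```

A single `LineString` feature could be added alongside the points if the track itself should be drawn as a line.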

As there aren't many tools for CoverageJSON, some implementation effort is needed on both the backend and frontend sides, but at least we don't have to settle on a data model design, which would require a lot of work and discussion anyway IMO.

@benbovy
Member Author

benbovy commented Sep 17, 2020

A few more thoughts after some more research and self-education (to be honest, my experience with handling geospatial formats in web applications is pretty limited).

My understanding is that the GeoJSON specs allow custom properties to be defined at the feature level but not at the geometry level. This means that, in theory, we cannot use "multi" geometries like LineString, MultiLineString or MultiPoint to store information like time or model data (one or more fields) at the vertices of those complex geometries (e.g., ship track vertices, a set of station points). So we would have to create a Point feature for every vertex instead, wrapped in a FeatureCollection. While this is probably fine with just a few dozen vertices, for bigger queries we might quickly hit serious performance issues.
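To get a feel for the overhead, the following Python sketch (standard library only) encodes the same synthetic ship track both as a GeoJSON FeatureCollection of Point features and as a CoverageJSON `Trajectory` coverage, and prints the serialized size of each. The track, the `sst` field and all helper names are made up for illustration. The GeoJSON version grows faster per vertex because every vertex repeats the `type`, `geometry` and `properties` keys; note that these are uncompressed sizes, and HTTP compression would reduce the gap.

```python
import json
from datetime import datetime, timedelta, timezone


def ship_track(n):
    """Synthetic ship track: n hourly vertices with one model field (sst)."""
    t0 = datetime(2020, 9, 14, tzinfo=timezone.utc)
    times = [(t0 + timedelta(hours=i)).strftime("%Y-%m-%dT%H:%M:%SZ") for i in range(n)]
    xs = [round(7.0 + 0.01 * i, 4) for i in range(n)]
    ys = [round(54.0 + 0.005 * i, 4) for i in range(n)]
    sst = [round(15.0 - 0.001 * i, 3) for i in range(n)]
    return times, xs, ys, sst


def as_geojson(times, xs, ys, sst):
    # Properties can only be attached to features, so one Point feature per vertex.
    return {
        "type": "FeatureCollection",
        "features": [
            {
                "type": "Feature",
                "geometry": {"type": "Point", "coordinates": [x, y]},
                "properties": {"time": t, "sst": v},
            }
            for t, x, y, v in zip(times, xs, ys, sst)
        ],
    }


def as_covjson(times, xs, ys, sst):
    # Coordinates go into a single tuple axis and values into a single array.
    return {
        "type": "Coverage",
        "domain": {
            "type": "Domain",
            "domainType": "Trajectory",
            "axes": {
                "composite": {
                    "dataType": "tuple",
                    "coordinates": ["t", "x", "y"],
                    "values": [[t, x, y] for t, x, y in zip(times, xs, ys)],
                }
            },
            "referencing": [
                {"coordinates": ["x", "y"],
                 "system": {"type": "GeographicCRS",
                            "id": "http://www.opengis.net/def/crs/OGC/1.3/CRS84"}},
                {"coordinates": ["t"],
                 "system": {"type": "TemporalRS", "calendar": "Gregorian"}},
            ],
        },
        "parameters": {
            "sst": {"type": "Parameter",
                    "observedProperty": {"label": {"en": "Sea surface temperature"}}}
        },
        "ranges": {
            "sst": {"type": "NdArray", "dataType": "float",
                    "axisNames": ["composite"], "shape": [len(sst)], "values": sst}
        },
    }


if __name__ == "__main__":
    for n in (10, 1_000, 10_000):
        track = ship_track(n)
        gj = len(json.dumps(as_geojson(*track)))
        cj = len(json.dumps(as_covjson(*track)))
        print(f"{n:>6} vertices: GeoJSON {gj:>8} chars, CoverageJSON {cj:>8} chars")
```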

There have been some proposals to extend GeoJSON to deal with those specific issues, like here and here, but they seem stale.

TopoJSON is more compact than GeoJSON for some cases (e.g., many features with complex geometries that share the same arcs), but I'm not sure if that applies to our case since we would have to deal with many point features anyway.

I guess most people currently use ad-hoc solutions to deal with those issues? That's probably why some folks tried to come up with new, better-adapted formats like CoverageJSON.

Alternative candidates that I've found so far:

  1. cf-json
  2. netcdf-ld
  3. coverage-json

1 and 2 are still drafts and are too close to the (CF-)netCDF data model IMO; they won't play nicely with OGC-friendly frontend tools. 3 is approved by OGC, but to my understanding it is limited to grid/mesh domains only (i.e., it doesn't provide solutions for data extracted along sections, trajectories, profiles, etc.). This leaves us with CoverageJSON, which seems the best adapted to our problem among those (tentative) standards. According to the activity in https://github.com/opengeospatial/ogc_api_coverages, its adoption as an OGC standard is still being actively discussed.

To summarize, sadly there's still no easy way (widely adopted standard and/or mature tools) to handle geotemporal model outputs in web applications. While I'm advocating here for CoverageJSON, I'm really open to discussion and to any alternative solution, "standard" or ad-hoc, that would better suit our needs and constraints from a full-stack perspective!

Some useful links can be found in the discussion here: pangeo-data/pangeo-datastore#3

@alirezamdv

Thank you for your detailed description, let's stay with CoverageJSON... Maybe, as you said, we should fit our individual solution to it and turn it into our adaptable data structure... I'm going to try to modify and extract some pieces of code from this extended Leaflet library to create a module to work with in the frontend, and I will try to find out how to style it properly.

@benbovy
Member Author

benbovy commented Sep 17, 2020

I've just had a deeper look at the covjson-reader and leaflet-coverage libraries, and although I'm not an expert in JavaScript, I now realize that handling the format on the frontend represents more work than what I've done in #11.

Depending on how the experiments on your side go, we could still go back to a home-made GeoJSON + custom JSON solution and see how it works out. At this stage, I think we could manage maintaining the two approaches on the backend side.

4 participants