Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Layers appear out of sync in WMS and WCS if band math is used in styling #1041

Open
christophfriedrich opened this issue Jul 31, 2024 · 6 comments
Assignees

Comments

@christophfriedrich
Copy link
Contributor

christophfriedrich commented Jul 31, 2024

Description

The standard way to serve calculated products (e.g. NDVI) via WMS seems to be to load the raw bands and do the band math accordingly, e.g. like in this example from the documentation:

ndvi_bidirection_cfg = {
    "index_function": {
        "function": "datacube_ows.band_utils.norm_diff",
        "mapped_bands": True,
        "kwargs": {"band1": "nir", "band2": "red"},
    },
    "mpl_ramp": "RdYlGn",
    "range": [-1.0, 1.0]
}

Seeing this all over the docs, I set up a system that uses somewhat complex index functions to calculate various products (see e.g. https://github.com/eo2cube/ows_config/blob/4fe21710cc278e9c2ee4b46a742aaaa9f343899d/s2_vi/formulas.py#L16).

Now we need the raw values of those calculations elsewhere and thought "Perfect, we've got the WCS, we can just export them from there!" But as the WCS serves the "raw data" and the calculations are technically happening in the styling part, the only thing I got were the very raw imagery values, not the calculated product values. (To stay with the example: I got the Red and NIR values, not the NDVI calculated from them.)

I browsed through the existing issues and this has been touched upon before at #510 (comment) with Paul stating:

We feel that the purpose of WCS is to provide access to the raw data and that anybody interested in WCS would be more than capable of doing trivial bandmath on the data themselves after downloading.

I agree that I'm perfectly capable of doing the same bandmath again elsewhere, but (1) it's not me who wants to use those values and (2) I've got everything so nicely configured in OWS already that it would be super sweet to just download the stuff from there straight away. I also agree that WCS should provide the raw data, but the question is how raw should it be?

I would also like to raise the point of consistency of the same product across the various services. Which leads me to formulate the expected behaviour: WMS and WCS export "the same thing". As in: Taking the values from the WCS and applying the same colour ramp as in the styling config results in the same image that the WMS exports. Or worded vice versa: 'Un-applying' the colour ramp from the WMS image yields the WCS values.

Or is there another way to get values out of the WCS that have the index function already applied?

Steps to Reproduce

Compare the result of this WMS request:

https://ows.eo2cube.org/wms?SERVICE=WMS&VERSION=1.3.0&REQUEST=GetMap&LAYERS=s2_vi_ndvi_diff_winter_wheat&STYLES=&WIDTH=512&HEIGHT=512&FORMAT=image%2Fpng&CRS=EPSG%3A3857&DPI=96&MAP_RESOLUTION=96&FORMAT_OPTIONS=dpi%3A96&TRANSPARENT=TRUE&TIME=2020-01-02T00%3A00%3A00Z&BBOX=1389247.83251150674186647,7142683.10669049434363842,1392738.67198907374404371,7146173.94616806134581566

with what you get from this corresponding WCS request:

https://ows.eo2cube.org/wcs?SERVICE=WCS&VERSION=1.0.0&REQUEST=GetCoverage&FORMAT=GeoTIFF&COVERAGE=s2_vi_ndvi_diff_winter_wheat&WIDTH=512&HEIGHT=512&CRS=EPSG:3857&RESPONSE_CRS=EPSG:3857&TIME=2020-01-02T00%3A00%3A00Z&BBOX=1389247.83251150674186647,7142683.10669049434363842,1392738.67198907374404371,7146173.94616806134581566

(same TIME and BBOX and LAYERS/COVERAGE)

Context (Environment)

datacube-ows version

datacube-ows-update --version
Open Data Cube Open Web Services (datacube-ows) version 1.8.36.dev0+g67003f3.d20240617

datacube-ows config

https://github.com/eo2cube/ows_config/

datacube product metadata

datacube product show s2_c1_l2a

Result
description: The Sentinel-2 program provides global imagery in thirteen spectral bands
    at 10m-60m resolution and a revisit time of approximately five days. This dataset
    represents the global Sentinel-2 archive, from 2016 to the present, processed
    to L2A (bottom-of-atmosphere) using Sen2Cor.
measurements:
-   aliases:
    - B01
    dtype: uint16
    name: coastal
    nodata: 0
    units: '1'
-   aliases:
    - B02
    dtype: uint16
    name: blue
    nodata: 0
    units: '1'
-   aliases:
    - B03
    dtype: uint16
    name: green
    nodata: 0
    units: '1'
-   aliases:
    - B04
    dtype: uint16
    name: red
    nodata: 0
    units: '1'
-   aliases:
    - B05
    dtype: uint16
    name: rededge1
    nodata: 0
    units: '1'
-   aliases:
    - B06
    dtype: uint16
    name: rededge2
    nodata: 0
    units: '1'
-   aliases:
    - B07
    dtype: uint16
    name: rededge3
    nodata: 0
    units: '1'
-   aliases:
    - B08
    dtype: uint16
    name: nir
    nodata: 0
    units: '1'
-   aliases:
    - B8A
    dtype: uint16
    name: nir08
    nodata: 0
    units: '1'
-   aliases:
    - B09
    dtype: uint16
    name: nir09
    nodata: 0
    units: '1'
-   aliases:
    - B11
    dtype: uint16
    name: swir16
    nodata: 0
    units: '1'
-   aliases:
    - B12
    dtype: uint16
    name: swir22
    nodata: 0
    units: '1'
-   aliases:
    - SCL
    dtype: uint8
    flags_definition:
        sca:
            bits:
            - 0
            - 1
            - 2
            - 3
            - 4
            - 5
            - 6
            - 7
            description: Sen2Cor Scene Classification
            values:
                '1': saturated or defective
                '10': thin cirrus
                '11': snow or ice
                '2': dark area pixels
                '3': cloud shadows
                '4': vegetation
                '5': bare soils
                '6': water
                '7': unclassified
                '8': cloud medium probability
                '9': cloud high probability
    name: scl
    nodata: 0
    units: '1'
-   aliases:
    - aerosol_optical_thickness
    - AOT
    dtype: uint16
    name: aot
    nodata: 0
    units: '1'
-   aliases:
    - scene_average_water_vapour
    - WVP
    dtype: uint16
    name: wvp
    nodata: 0
    units: '1'
metadata:
    product:
        name: s2_c1_l2a
metadata_type: eo3
name: s2_c1_l2a
@SpacemanPaul
Copy link
Contributor

SpacemanPaul commented Aug 1, 2024

Thanks Christoph. As it happens we've had some very similar conversations around WMS/WCS interoperation internally at DEA recently.

The problem is that WMS and WCS are very different protocols with very different data models and intended use cases. OWS does blur those distinctions somewhat in an effort to promote consistency between them, but there's only so far we can go while still being standards compliant for both.

The main issues complicating consistency are:

  1. WMS has the concept of "styles" within a layer and leaves implementations fairly unconstrained in how they interpret those concepts. The original intent appears to be supplying different ways of rendering underlying vector data into map tiles, and the standard allows for selecting multiple styles simultaneously, which doesn't make much sense for OWS's pure raster approach.
  2. WCS has no concept of "styles", but rather thinks in terms of ranges (i.e. bands/measurements).
  3. The WCS spec is much more proscriptive across the board, and doesn't really have any conceptual areas where the implementation is unconstrained in re-interpreting them for particular use cases, as is the case in WMS.
  4. The fact that WCSv1 and WCSv2 are almost completely different protocols further complicates matters.
  5. When working with multiple dates, WMS always returns a 2D map tile where WCS can return a 3D (x,y,t) raster file.

With that mind however, it should be possible to collect the index styles offered by a given layer, and advertise those indexes as additional "raw bands" on the corresponding coverage in WCS. That way your WCS query above (which doesn't specify measurements and therefore gets all available measurements) would get the raw bands AND the calculated style indexes.

But multi-date handler indexes in WCS (as previously requested by @robbibt ) are probably unachievable without breaking WCS's request model.

@SpacemanPaul SpacemanPaul self-assigned this Aug 1, 2024
@christophfriedrich
Copy link
Contributor Author

I haven't really used WCS before, so your considerations are probably a lot more profound, but they are also going a lot further and are a lot more fundamental than what I actually meant. I don't need multi-date stuff or more than one style per layer, I just want a 2D raster of the values that I see colour-coded in the WMS.

And that's my main point: Currently, the output of the two services is fundamentally different when an index function is involved. When I see an NDVI map in the WMS, and specify COVERAGE=s2_vi_ndvi in the WCS, I expect a raster with NDVI values between -1 and 1, not two rasters with raw Sentinel-2 values in the four-digit range plus the categorical layer used to mask the whole thing. Because semantically that's not what I requested.

The problems you mentioned are valid if one wanted to fully map every feature of each standard to the other. But I don't see how they would block fixing the problem I described?


it should be possible to collect the index styles offered by a given layer [... to get] the calculated style indexes.

Because I needed this now, I plunged forward and constructed a hacky tweak that works for our use case. For this I replaced the standard WCS TIFF handler in

    "wcs": {
        "formats": {
            "GeoTIFF": {
                "renderers": {
                    "1": "datacube_ows.wcs1_utils.get_tiff"

with a self-written function

                    "1": "ows_refactored.s2_vi.wcs.as_tiff_wcs1",

that does essentially this:

from datacube_ows.wcs1_utils import get_tiff

class FakeBandIndex:
    def band_label(self, band):
        return "index_function"
    def nodata_val(self, band):
        return 0

def as_tiff_wcs1(req, data):
    dataarray = req.product.styles[0].apply_index(data)
    dataset = dataarray.to_dataset()
    req.product.band_idx = FakeBandIndex()  # very hacky...
    return get_tiff(req, dataset)

See eo2cube/ows_config@47d122f (the code there has comments)

So I just apply the index function, which of course fundamentally changes the data structure, createing several issues further down, which I just silence by injecting that FakeBandIndex that simply returns the hard-coded values I need in this case.

It works beautifully, you can try it by requesting the WCS link from my initial post.

But of course it's absolutely not generic...


The only current problem for me is that masking is not applied. I thought I could sneak that functionality too, but it turns out that the mask (1) is not built encapsulatedly, but in the middle of get_map, and (2) is not applied at data level, but directly in the visualisation: _write_png calls transform_data, which first does apply_index but then also immediately color_ramp.apply, before apply_mask_to_image is called, which already expects an RGB(A) array (into which it burns the mask by setting the alpha channel to transparent where needed).

Do you have an idea how this could be solved currently (without replicating a lot of code)?

And what do you think about this approach? Would a more generic version be of interest to be included into datacube-ows?

@christophfriedrich
Copy link
Contributor Author

Okay, so it turns out that I don't even need all the logic inside get_map to construct an extent_mask, and that the only thing apply_mask_to_image actually does is assuming that there is a red band and adding an alpha one, so I did this:

fake_rgb = dataset.rename({"index_function": "red"})
masked = req.product.styles[0].apply_mask_to_image(fake_rgb, mask, 1, 1)
result = masked["red"].where(masked["alpha"]).to_dataset().rename({"red": "index_function"})

Until I realised that I don't need the logic in apply_mask_to_image either and can literally just do:

mask = req.product.styles[0].to_mask(data)
result = dataset.where(mask)

See eo2cube/ows_config@bc4b795


But it appears that the original get_tiff in wcs1_utils.py doesn't expect nans to be present, because as soon as I had some in there, the TIFF statistics were all nan, causing the automatic scaling to fail when one drops the resulting TIFF into QGIS. So in the with memfile.open(driver="GTiff", ... section, I changed these lines:

if cfg.wcs_tiff_statistics:
dst.update_tags(idx, STATISTICS_MINIMUM=data[band].values.min())
dst.update_tags(idx, STATISTICS_MAXIMUM=data[band].values.max())
dst.update_tags(idx, STATISTICS_MEAN=data[band].values.mean())
dst.update_tags(idx, STATISTICS_STDDEV=data[band].values.std())

to the nan-aware variants:

if cfg.wcs_tiff_statistics:
    dst.update_tags(idx, STATISTICS_MINIMUM=numpy.nanmin(data[band].values))
    dst.update_tags(idx, STATISTICS_MAXIMUM=numpy.nanmax(data[band].values))
    dst.update_tags(idx, STATISTICS_MEAN=numpy.nanmean(data[band].values))
    dst.update_tags(idx, STATISTICS_STDDEV=numpy.nanstd(data[band].values))

That's the only modification in datacube-ows code I needed so far.

@SpacemanPaul
Copy link
Contributor

Nice - yes that's one (slightly hacky) way of doing it! I would like to handle this more generically at some point by treating the index functions of all defined colour ramp styles as pseudo-measurements within WCS.

The stats fix is a bug that could affect all floating-point bands though - if you want to do a quick PR just for that patch, I'll happily merge it.

@christophfriedrich
Copy link
Contributor Author

I would still argue that the index function(s) are the only 'proper' measurement in this case, but I get your point and if we just add more layers (instead of replacing), everyone gets what they want :)

I would highly appreciate if you were to implement this more generically! With my understanding of the architecture I don't feel like I can do it at this point, but experimenting with a few more requests against my current solution revealed several cases that don't seem to work just yet, so it really is a hacky workaround that is somewhat okay for now but shouldn't stay around for too long, desiring to be replaced by something proper.

PR's there :) I applied the same fix in WCS2 too.

@robbibt
Copy link
Contributor

robbibt commented Aug 9, 2024

Hi all, have been listening in to this in the background - just want to confirm that this below very much aligns with what I would think of as "expected functionality" too. Getting back the raw spectral band layers when I select "export" from a specific index function is pretty suprising, and I suspect pretty confusing to our users who just want to export the data that's actually being shown/styled on the map.

And that's my main point: Currently, the output of the two services is fundamentally different when an index function is involved. When I see an NDVI map in the WMS, and specify COVERAGE=s2_vi_ndvi in the WCS, I expect a raster with NDVI values between -1 and 1, not two rasters with raw Sentinel-2 values in the four-digit range plus the categorical layer used to mask the whole thing. Because semantically that's not what I requested.

(totally recognise the complexities to actually achieving this, but just emphasising that the current functionality isn't ideal from a usability perspective).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants