Integer-valued ranges and missing values #78

jonblower · 2016-07-25T14:00:38Z

In JSON there's nothing wrong with having an array of integers with missing values:

{
      "type" : "NdArray",
      "dataType": "integer",
      "axisNames": ["t","z","y","x"],
      "shape": [1, 1, 2, 3],
      "values" : [ 5, 6, 4, 6, null, 2 ]
}

But in many programming languages (e.g. numpy in Python) this can cause an issue as there is no way to record a "missing value" in an array of integers. (With floating point numbers one can use NaN for missing values.)
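A minimal sketch of the problem, assuming NumPy is installed and that the JSON `null` has already been parsed into Python `None` (the variable names are illustrative only):

```python
import json
import numpy as np

values = json.loads("[5, 6, 4, 6, null, 2]")  # JSON null becomes Python None

# Forcing an integer dtype fails: None has no integer representation.
try:
    np.array(values, dtype=np.int64)
    int_conversion_failed = False
except (TypeError, ValueError):
    int_conversion_failed = True

# Without an explicit dtype, NumPy falls back to an array of Python objects.
as_objects = np.array(values)

# A float array can carry the gap as NaN, at the cost of changing the data type.
as_floats = np.array([np.nan if v is None else v for v in values], dtype=np.float64)

print(int_conversion_failed, as_objects.dtype, as_floats)
```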

The workarounds would include:

Use a masked array (i.e. a parallel array of flags to indicate missing values), which adds inefficiency.
Use an array of objects (which can be nulled), instead of an array of primitives, which also adds inefficiency.
Use a "special" integer value (e.g. -999) to denote missing values, and make sure this is taken into account in calculations. This requires extra metadata on the NdArray to advertise the special value, and usually translates into the creation of a masked array anyway.
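The three workarounds can be sketched in NumPy roughly as follows. This is an illustration rather than a recommended implementation: the variable names are made up, and -999 is an arbitrary sentinel that is assumed not to occur in the real data.

```python
import numpy as np
import numpy.ma as ma

raw = [5, 6, 4, 6, None, 2]   # parsed "values" from the NdArray example
shape = (1, 1, 2, 3)          # "shape" from the NdArray example

# 1. Masked array: integer data plus a parallel boolean mask.
mask = np.array([v is None for v in raw])
filled = np.array([0 if v is None else v for v in raw], dtype=np.int64)
masked = ma.masked_array(filled, mask=mask).reshape(shape)

# 2. Object array: every element is a Python object, so None is allowed,
#    but vectorised integer arithmetic is lost.
objects = np.array(raw, dtype=object).reshape(shape)

# 3. Sentinel value: FILL is an assumption and must be excluded by hand
#    from every calculation.
FILL = -999
sentinel = np.array([FILL if v is None else v for v in raw], dtype=np.int64).reshape(shape)
valid_mean = sentinel[sentinel != FILL].mean()

print(masked.mean(), objects[0, 0, 1, 1], valid_mean)
```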

So there are two possible courses of action:

Consider the presence of "null" in an integer array to be an error, and disallow it in the spec.
Allow the use of "null", but advise data providers about the difficulties it may cause for clients. (If the data are categorical, then assigning one category to "missing data" may be preferable to using nulls.)

letmaik · 2016-07-26T23:03:32Z

On 25/07/2016 15:00, Jon Blower wrote:

> Consider the presence of "null" in an integer array to be an
> error, and disallow it in the spec
>
> Allow the use of "null", but provide advice to data providers of
> the difficulties it may cause for clients. (If the data are
> categorical, then assigning one category to "missing data" may be
> preferable to using nulls.)

Hm, I think from a semantic point of view I don't like "missing data"
categories that much. For example, think about an observed property
"land cover" with categories grassland, urban, ... "Missing data"
doesn't fit into that collection in my opinion. And this also makes
rendering more complicated as there is no easy way anymore to detect
missing data and to skip rendering of those pixels etc.

I think this issue should be solved in the software by reading the
integer data into an array and replacing nulls with an unused integer
(which would be fairly easy to find, e.g. by first scanning the input
array for the maximum value and taking the next integer). This unused
integer is then marked as the missing value, either with abstractions
like numpy's masked arrays or otherwise. I don't think the performance
impact would be noticeable.
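One way this could look in NumPy is sketched below. The helper name `load_int_values` is hypothetical and not part of any existing library; the sketch assumes the values fit in a 64-bit integer with room for maximum + 1.

```python
import numpy as np
import numpy.ma as ma

def load_int_values(values):
    """Hypothetical helper: turn a list of ints and None into a masked int array."""
    present = [v for v in values if v is not None]
    # One more than the maximum cannot occur in the data, so it is free to use
    # as the fill value (assumes max + 1 does not overflow int64).
    fill = max(present) + 1 if present else 0
    data = np.array([fill if v is None else v for v in values], dtype=np.int64)
    # masked_equal masks every element equal to fill and records it as fill_value.
    return ma.masked_equal(data, fill)

arr = load_int_values([5, 6, 4, 6, None, 2])
print(arr, arr.fill_value, arr.sum())
```

The result stays an integer array, and aggregations such as `arr.sum()` skip the masked element.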

jonblower · 2016-07-27T07:26:14Z

I think you're right that that's the only realistic solution. A bit of a pain though.

With numpy arrays of floating point numbers, does it matter if we use NaNs for missing values, or are masked arrays better? Do they give different results, or perform differently?

letmaik · 2016-07-27T17:03:13Z

If efficiency is important, I would use NaNs because masked arrays are
still slower (but maybe not enough to notice for our purposes!). If you
use NaNs, then you have to be careful with aggregation operations:
the plain versions propagate NaN, and there are special versions like
np.nanmean which ignore NaNs. For consistency though, I would probably
use masked arrays everywhere (for float and integer arrays).
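A small sketch of the difference in aggregation behaviour, using made-up float data; it makes no claim about relative performance:

```python
import numpy as np
import numpy.ma as ma

floats = np.array([5.0, 6.0, 4.0, 6.0, np.nan, 2.0])

# The plain mean propagates the NaN, so the result is itself NaN.
plain_mean = np.mean(floats)

# np.nanmean ignores NaN entries.
nan_mean = np.nanmean(floats)

# A masked array built from the same data gives the same answer with plain .mean().
masked = ma.masked_invalid(floats)
masked_mean = masked.mean()

print(plain_mean, nan_mean, masked_mean)
```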

letmaik · 2022-02-18T21:48:07Z

Closing this as it's not a real issue. The libraries we've created can deal with it, and yes, it's a bit annoying, but it's also not too bad. If anything, this may be picked up in the future with a new range type that supports both missing value encoding and offset/scaling.

chris-little mentioned this issue Feb 17, 2022

How should we process the 22 outstanding CoverageJSON issues opengeospatial/CoverageJSON#11

Closed

letmaik closed this as completed Feb 18, 2022
