Basic statistics for dimensions #5

wonder-sk · 2023-03-02T11:51:16Z

It would be useful to have an optional attribute with basic stats defined by this extension:

- min/max for all dimensions
- enumeration of distinct values (with counts) for some dimensions - e.g. classification, number of returns, return number, point source ID, edge of flight line, scan direction, scanner channel

For software that visualizes point clouds, this is quite important for initializing renderer settings. Without stats, one has to sample the data to extract them on load. This would be especially useful when working with a collection of items, to quickly get aggregate stats for the whole collection rather than having to touch every asset of the individual items.

wonder-sk · 2023-03-02T12:22:06Z

A relevant discussion from some time ago when COPC was being designed: copcio/copcio.github.io#19
TL;DR: we could also add more detailed stats (mean, variance, histogram), but in the end those may not be needed by clients, or clients may have more specific needs that make the additional stats useless (e.g. picking a good bucket size for GpsTime can be tricky).

To kick off some discussion, I would propose a new pc:stats attribute with this kind of content:

{
  "Intensity" : {
    "minimum": 0,
    "maximum": 12345
  },
  "GpsTime": {
    "minimum": 123456.78,
    "maximum": 123999.99
  },
  "Classification": {
    "minimum": 0,
    "maximum": 7,
    "class-count": {
      "0": 1000,
      "1": 2000,
      "3": 4000,
      "7": 8000
    }
  },
  "ReturnNumber": {
    "minimum": 1,
    "maximum": 3,
    "class-count": {
      "1": 9000,
      "2": 4000,
      "3": 2000
    }
  },
  ...
}
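
To illustrate the collection-level use case, here is a minimal Python sketch of how per-item stats in the shape proposed above might be merged into aggregate stats. The function name `merge_stats` is hypothetical, and `pc:stats` / `class-count` are only the proposal from this comment, not fields defined by the extension.

```python
from collections import Counter


def merge_stats(items_stats):
    """Merge per-item stats dicts (in the proposed pc:stats shape) into one.

    Assumes every dimension entry carries "minimum" and "maximum";
    "class-count" is optional and is summed key by key when present.
    """
    merged = {}
    for stats in items_stats:
        for dim, s in stats.items():
            # First time we see a dimension, seed it with this item's range.
            m = merged.setdefault(
                dim, {"minimum": s["minimum"], "maximum": s["maximum"]}
            )
            m["minimum"] = min(m["minimum"], s["minimum"])
            m["maximum"] = max(m["maximum"], s["maximum"])
            if "class-count" in s:
                counts = Counter(m.get("class-count", {}))
                counts.update(s["class-count"])  # adds counts per class key
                m["class-count"] = dict(counts)
    return merged
```

Min/max and raw counts merge exactly this way without reading any point data, which is not true of mean or variance unless the per-item point counts are also known.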

wonder-sk · 2023-03-02T12:22:30Z

cc @hobu

wonder-sk · 2023-03-03T15:49:42Z

Oops, only now have I realized that there is already a Stats object defined in the extension 🤦‍♂️
It just does not include support for classes and their counts.

Other notes on the existing Stats object:

- there is stddev and variance which are essentially the same thing - worth dropping one of those
- count I assume is the same for all dimensions and the same value as pc:count - probably not worth including it?
- position does not seem relevant to statistics at all
- average + stddev (or variance) - IMHO they are not that useful and could be dropped, but no problem to keep them either

raelwaed · 2023-03-05T19:49:35Z

Great post @wonder-sk - I was planning a similar post just this week. My concern is the stats object is just a dump of PDAL information without considering the value to STAC - i.e. what do people want to search for?

Many of the example stats objects provide little value, e.g. ScanDirectionFlag, EdgeOfFlightLine, Classification, UserData, etc. And within those stats objects fields like count and position are questionable.

stddev and variance are just one square root away from each other - but I think they can be left as optional.

I was planning to add a pc:classification field as a [string] of Classifications so you would know what is in a point cloud, but I prefer your proposal, since it lets you quantify how much of a particular classification exists.

Would you consider changing these raw counts to percentages?
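
As a rough sketch of what that conversion could look like, assuming the `class-count` shape from the proposal above (the helper name `counts_to_percentages` is hypothetical):

```python
def counts_to_percentages(class_count, ndigits=2):
    """Convert raw per-class point counts into percentages of the total."""
    total = sum(class_count.values())
    if total == 0:
        # Avoid division by zero for an empty or all-zero count table.
        return {key: 0.0 for key in class_count}
    return {
        key: round(100.0 * n / total, ndigits)
        for key, n in class_count.items()
    }
```

One trade-off to keep in mind: a client can always derive percentages from raw counts like this, but going the other way requires the total point count as well.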

The number of returns is valuable, but we have a lot of metadata that gives more context to the returns themselves that I would like to capture - e.g. "First and Last" would mean we have just two returns and ignored all intermediate returns, or "4 Returns (1st, 2nd, 3rd, last)".

m-mohr · 2023-03-06T12:10:00Z

Maybe you can align with or use the Classification extension? https://github.com/stac-extensions/classification

hobu · 2023-03-06T13:03:16Z

> My concern is the stats object is just a dump of PDAL information

Indeed this was the case, and the intention was to see if we could attract usage and attention to improve the extension. Maybe now is the time. I don't think we have found the stats particularly helpful for searching, but we haven't ditched the schema stuff. That said, I think the schema stuff would probably be better expressed in arrow or regular flatbuf for reusability in other contexts.

You can see an Item collection example we write for the USGS 3DEP lidar collection at https://usgs-lidar-stac.s3-us-west-2.amazonaws.com/ept/item_collection.json

If you visit https://viewer.copc.io you can also bring any of those in and view them by clicking on the USGS 3DEP LiDAR link and then double-clicking on any name that looks interesting, or by filtering with a simple regex.

mccarthyryanc · 2023-08-30T20:12:58Z

I like @wonder-sk's suggestions on updating the stats object. Since they are all optional, perhaps it is enough to add another optional class-count?

@m-mohr, I think that extension (correct me if I'm wrong, I just learned about it) describes all possible classes. In this case we just want to summarize the classes present in a single item. So if you were working with LAZ 1.4 data, you'd put the ASPRS Class definitions into the schema, not the statistics.

To simplify this for searching, I usually just want to know if a pointcloud has any building-classified points (I don't really care how many points there are). So maybe modify @wonder-sk's suggestion into something like:

    "unique-classes": {
        "title": "unique list of classifications",
        "type": "array",
        "minItems": 1,
        "items": {
            "title": "point classifications present in pointcloud",
            "type": "integer"
        }
    }

And then add unique-classes as an optional field in the stats object?
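
A small Python sketch of how a client might use such a field, and how it could be derived from the class-count proposal earlier in the thread. The helper names are hypothetical, and the stats dict here is just an illustrative single stats entry, not the extension's actual layout; class code 6 is the ASPRS standard code for buildings.

```python
BUILDING = 6  # ASPRS standard classification code for "Building"


def unique_classes(class_count):
    """Derive a sorted unique-classes list from a class-count mapping."""
    return sorted(int(code) for code in class_count)


def has_class(stats, code):
    """Return True if the (proposed) unique-classes list contains the code."""
    return code in stats.get("unique-classes", [])


# Example: a stats entry for the Classification dimension
classification_stats = {"unique-classes": unique_classes({"2": 500, "6": 120})}
print(has_class(classification_stats, BUILDING))
```

Since unique-classes can be computed from class-count, an item carrying both would be redundant; the list form is mainly a convenience for search filters.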
