Raster Field in Features #50

Open
flippmoke opened this issue Nov 19, 2015 · 18 comments

@flippmoke
Member

I know that @pnorman, @pramsey, and @ShibaBandit all commented in #39 about the experimental state of rasters. Here is the best clarification I can offer:

A decoder using v2 can simply ignore the raster field. We realize that a more detailed specification for rasters is needed. We currently aren't sure how rasters should best be put into this field, and some research and experimentation is required. We did want to include the field, however, as we will be working in the mapnik-vector-tile repository to expand the capabilities for rasters in vector tiles. Additionally, @springmeyer has told me that some groups have already started to experiment with this as a feature. The goal is to have this as an experimental field for version 2, laying the groundwork for the community to work out specifically how the raster data will be stored.

@springmeyer

Given things are still experimental and not well described, I propose we keep rasters out of v2 and plan to give them attention for a future spec.

@flippmoke
Member Author

👍 removing from milestone, tagging for future releases.

@flippmoke flippmoke removed this from the v2.0 milestone Nov 20, 2015
@mojodna

mojodna commented Nov 21, 2015

The raster support is the main thing I have an investment in, particularly making it clear to use while retaining the ability to use it in creative and unanticipated ways. I'll keep an eye out for discussion once it's time.

@markeberhart

With "video" map tiles on the horizon, I'd recommend a "raster" format that is more akin to small FLV or SWF-like files for enhanced support of time-based motion.

@JesseCrocker

It would be nice to be able to include a raster as an attribute on a feature. I imagine using this to put icons used to render point features in tiles, rather than in a sprite sheet.

@e-n-f
Contributor

e-n-f commented Mar 26, 2018

I want to think through some ideas here about what rasters are for and how they should be represented.

  • @JesseCrocker points out the use case for an icon type of raster: associated with a point, and with a fixed size, rather than scaled as the tile is zoomed
  • There is the common situation of image tiles. Usually these are the same size as the tile, but there is no reason they would have to cover the entire tile.
  • Digital elevation models are like images, but the pixels are numbers, not colors
  • Tile-count is something like a DEM, but the values are densities, not heights.
  • Another variation on the tile-count theme tracks things that are even more abstract, like a 3-tuple of (x offset, y offset, count) to log the average vector of travel from a given point.

For both of the image cases, I think we want an attribute type that is a binary blob that can contain JPEG or PNG data, and which can be used as part of compound attributes.

The icon attribute could be a compound attribute on a Point feature, something like

"type": "Point", "properties": { "icon": { "x_px": 16, "y_px": 16, "png": BLOB(…) } }

The tiled bitmap could be an attribute on a Polygon feature, just

"type": "Polygon", "properties": { "png": BLOB(…) }

which also generalizes to quadrilateral but not necessarily rectangular texture maps.

The downside of this representation of tiled bitmaps is that it requires some magic knowledge in the encoder during tiling, to know that if the feature as specified in the original dataset has a bitmap attribute, that bitmap should be split up when the feature is clipped into multiple tiles. This might be an argument that there should be a special grids section of the Feature that is explicitly known to be clippable:

"type": "Polygon", "grids": { "png": BLOB(…) }, "properties": { … }

instead of allowing the bitmap to be one of the properties, which are normally not modified during tiling.

For the DEM and count cases, compound attributes already give a natural notation with no other extensions:

"type": "Polygon", "properties": { "dem": [ [ 1, 2, 3 ], [ 4, 5, 6 ], [ 7, 8, 9 ] ] }

If it turns out that this compresses badly, we could consider a special attribute representation for multidimensional arrays of numbers that compresses better, which any other numeric-array attribute would also benefit from.

For the same clipping reason as for images above, we might want to consider putting DEM-type data in a grids section as well, so it can be clipped during tiling.

"type": "Polygon", "grids": { "dem": [ [ 1, 2, 3 ], [ 4, 5, 6 ], [ 7, 8, 9 ] ] }, "properties": { … }

The other thing to think about during tiling is whether there should be downsampling on these images by zoom level. For icons there shouldn't be, but for bitmaps and DEMs there probably should be. It probably makes sense to downsample using the tile resolution as the unit within the bitmap as well: if a tile is 4096x4096 and a bitmap feature spans half of the tile in each dimension and has a higher resolution than 2048x2048 after clipping, the clipped data should be downsampled to 2048x2048.
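
As a rough sketch of that rule (the function name and parameters are hypothetical, not part of any spec or of Tippecanoe), the target resolution could be capped at one sample per tile unit along each axis:

```python
def downsample_target(grid_w, grid_h, feat_w_units, feat_h_units):
    """Cap a clipped grid's resolution at one sample per tile unit.

    grid_w, grid_h: resolution of the clipped bitmap or DEM.
    feat_w_units, feat_h_units: size of the clipped feature in tile
    coordinate units (hypothetical inputs, e.g. 2048 in a 4096 tile).
    """
    target_w = min(grid_w, max(1, feat_w_units))
    target_h = min(grid_h, max(1, feat_h_units))
    return target_w, target_h


# A feature spanning 2048x2048 tile units with more data than that
# is reduced; one with less data is left alone.
print(downsample_target(3000, 3000, 2048, 2048))
print(downsample_target(1000, 1000, 2048, 2048))
```

Icons would bypass this step entirely, since they are not meant to scale with the tile.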

With PNGs and JPEGs there is probably the expectation of bilinear or bicubic interpolation during downsampling. With DEM data this would probably also be appropriate, but there may be other cases where the encoder doesn't know the ultimate interpretation of the data, and can reasonably do nearest-neighbor samples from it but not do any other interpolation. Maybe we could establish a convention that raw numbers in the 2-dimensional array can be interpolated, but if the elements are nested one level deeper or contain strings:

"type": "Polygon", "grids": { "dem": [ [ [1], [2], [3] ], [ [4], [5], [6] ], [ [7], [8], [9] ] ] }, "properties": { … }
"type": "Polygon", "grids": { "dem": [ [ "1", "2", "3" ], [ "4", "5", "6" ], [ "7", "8", "9" ] ] }, "properties": { … }

then the grid is opaque and can be sampled but not interpolated. Alternately it could be in another special section of the feature to clarify what operations are legitimate.
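
A minimal sketch of how an encoder might apply that convention, assuming the grid arrives as nested Python lists parsed from JSON. The function name is hypothetical, and letting null cells pass is an additional assumption:

```python
from numbers import Real


def is_interpolable(grid):
    """Return True if every cell is a bare number (or None).

    Cells nested one level deeper, or cells that are strings, mark the
    grid as opaque: it may be sampled but not interpolated.
    """
    for row in grid:
        for cell in row:
            if cell is None:
                continue  # assumption: nulls do not make a grid opaque
            # bool is a subclass of int in Python, so reject it explicitly
            if isinstance(cell, bool) or not isinstance(cell, Real):
                return False
    return True


print(is_interpolable([[1, 2, 3], [4, 5, 6]]))
print(is_interpolable([[[1], [2]], [[3], [4]]]))
print(is_interpolable([["1", "2"], ["3", "4"]]))
```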

@mojodna

mojodna commented Mar 26, 2018

@jenningsanderson and I were discussing (ab)using the raster for storage of additional PBF-encoded data that doesn't necessarily make sense in the existing fields, so I'm in favor of either generalizing it (perhaps with a content-type for consumers) or including an alternative field so that opaque, application-specific blobs can be included in tiles.

@e-n-f
Contributor

e-n-f commented Mar 26, 2018

Although it's not part of the vector tile spec itself, this also raises the question of how to talk about blobs in the JSON textual representation of a feature. The main options I see are:

  • Invent a JSON syntax for blobs (or use someone else's if there is a precedent)
  • Treat them as strings, but strings of bytes instead of UTF-8 strings
  • Treat them as strings that can only contain the characters U+0000 through U+00FF
  • Treat them as arrays of numbers
  • Use Node's <Buffer 88 13 70 17> form
  • Use Node's Buffer.from([1, 2, 3]) form

Additional option suggested by @sgillies:

  • Treat them as base64 strings
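
To compare a few of these options side by side, here is a small Python sketch. The "png" key and the four bytes are only placeholders borrowed from the examples above, not real image data:

```python
import base64
import json

# Placeholder bytes, the same ones shown in the <Buffer 88 13 70 17> form
blob = bytes([0x88, 0x13, 0x70, 0x17])

# Option: array of numbers
as_numbers = json.dumps({"png": list(blob)})

# Option: base64 string
as_base64 = json.dumps({"png": base64.b64encode(blob).decode("ascii")})

# Option: string restricted to U+0000 through U+00FF, one character per byte
as_latin1 = json.dumps({"png": blob.decode("latin-1")})

for text in (as_numbers, as_base64, as_latin1):
    print(len(text), text)
```

All three round-trip back to the original bytes. Only the base64 and array forms are unambiguous to a reader who doesn't already know the value is a blob rather than ordinary text.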

@e-n-f
Contributor

e-n-f commented Mar 26, 2018

@mojodna I absolutely agree that there should be a namespaced way to store arbitrary binary data within a feature or layer, with the convention that any tool that doesn't know what a message means should pass it through unchanged.

@mojodna

mojodna commented Mar 26, 2018

ArrayBuffers don't appear to serialize in JavaScript ("{}" is the output in Node), but Buffers do:

> JSON.stringify(Buffer.from([1,2,3]))
'{"type":"Buffer","data":[1,2,3]}'

I think my preference is for arrays of numbers, for whatever that's worth.

@e-n-f
Contributor

e-n-f commented Apr 2, 2018

One other thing that I realized this morning: there are two types of gridded data:

  • Data on the grid intersections
  • Data on the pixels in between the intersections

What I mean is that, for instance, SRTM elevation data has a 3601x3601 grid for each of its 1-degree squares, not 3600x3600, because the elevation data is for points on the grid. The rightmost column and bottommost row of each grid are duplicated in the next adjacent tile, because, just like with vector tiles, a point that is right on the tile boundary is considered to be in both tiles.

In contrast, the worldwide 30-minute dataset for the Gridded Population of the World is 720 half-degrees wide, not 721, because the data is a sum for the interior of each grid cell, not an instantaneous measurement on each edge of it. Like raster tiles, the edges are not duplicated between adjacent tiles.

So we need to be able to clip and downsample both types of grids. Maybe grid_edges vs grid_cells? Is there a standard name for these?
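
To make the difference concrete, here is a sketch of where the samples fall along one axis under each interpretation. The function and the "point"/"area" labels are hypothetical names used only for illustration:

```python
def sample_positions(n_samples, start, span, registration):
    """Return the coordinate of each sample along one axis.

    "point": samples sit on grid intersections, so the first and last
             samples lie exactly on the edges (SRTM style).
    "area":  samples describe cell interiors, so each one is placed at
             its cell center (Gridded Population of the World style).
    """
    if registration == "point":
        if n_samples < 2:
            raise ValueError("point grids need at least two samples")
        step = span / (n_samples - 1)
        return [start + i * step for i in range(n_samples)]
    if registration == "area":
        step = span / n_samples
        return [start + (i + 0.5) * step for i in range(n_samples)]
    raise ValueError("registration must be 'point' or 'area'")


# One SRTM tile: 3601 samples across 1 degree, edges included
srtm = sample_positions(3601, -123.0, 1.0, "point")

# 30-minute global grid: 720 cells across 360 degrees, edges not included
gpw = sample_positions(720, -180.0, 360.0, "area")

print(srtm[0], srtm[-1], gpw[0], gpw[-1])
```

Clipping a "point" grid has to duplicate the boundary row and column in both neighbors, while clipping an "area" grid does not.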

@JesseCrocker

@ericfischer In GeoTIFF metadata that value is usually stored as AREA_OR_POINT

@e-n-f
Contributor

e-n-f commented Apr 3, 2018

Representing a grid as a 2-D array feature attribute compresses reasonably well: 5.02 MB for one 3601x3601 SRTM tile (N37W123.hgt), compared to 4.89 MB for the original binary data, gzipped. Surprisingly, the stringified form is hardly any worse: 5.03 MB in the tileset.

(My implementation in Tippecanoe turns out to be very slow and memory-hungry for JSON data of this size, though.)

@e-n-f
Contributor

e-n-f commented Apr 3, 2018

We should explicitly allow null elements in the grid to handle unknown data or excluded cells.
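
One possible way for nulls to interact with downsampling, sketched here as a plain 2x2 box average. This is an illustrative choice, not what Tippecanoe does: unknown cells are skipped, and a block with no known cells stays null.

```python
def downsample_2x(grid):
    """Halve a grid by averaging each 2x2 block, ignoring None cells.

    A block made up entirely of None yields None. A trailing odd row
    or column is dropped in this simplified sketch.
    """
    out = []
    for r in range(0, len(grid) - 1, 2):
        row = []
        for c in range(0, len(grid[r]) - 1, 2):
            block = [grid[r][c], grid[r][c + 1],
                     grid[r + 1][c], grid[r + 1][c + 1]]
            known = [v for v in block if v is not None]
            row.append(sum(known) / len(known) if known else None)
        out.append(row)
    return out


grid = [
    [1, None, None, None],
    [3, None, None, None],
    [5, 5, 2, 4],
    [5, 5, 6, 8],
]
print(downsample_2x(grid))
```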

@e-n-f
Contributor

e-n-f commented Apr 9, 2018

After further reflection, these probably really should be separate grid_value, area_value, and image_value attribute types, even if the former two have the same representation as list_value, just to clarify that they are potentially downsampled from the original, and so the input and output JSON representations can be consistent.

If the image is in PNG or JPEG format, rather than an array of pixel values, an image_value type would be especially important.

@e-n-f
Contributor

e-n-f commented Apr 10, 2018

I have started experimenting with gridded data in mapbox/tippecanoe#557

The main additional thing that has come up so far is that this is going to be challenging to integrate with merging clipped sections of features across tile boundaries (#104).

The tricky part is that when you clip a gridded feature, the edges of the clipped feature must be snapped to the data grid of the original feature, to keep the grid from falling out of alignment across tile boundaries. So at the very least, the feature merging will have to expect up to one grid cell of overlap between features in the buffer of adjacent tiles, and know how to merge the grids.
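
The snapping step could look something like the following along a single axis. This is a simplified sketch with hypothetical names, not the code in mapbox/tippecanoe#557:

```python
import math


def snap_clip_to_grid(feat_min, feat_max, n_cells, clip_min, clip_max):
    """Expand a clip interval outward to whole grid cells on one axis.

    Returns (first_cell, last_cell_exclusive, snapped_min, snapped_max),
    or None if the clip interval misses the feature entirely.
    """
    cell = (feat_max - feat_min) / n_cells
    first = max(0, math.floor((clip_min - feat_min) / cell))
    last = min(n_cells, math.ceil((clip_max - feat_min) / cell))
    if first >= last:
        return None
    return first, last, feat_min + first * cell, feat_min + last * cell


# A feature 100 units wide with 10 cells, clipped by two adjacent tiles
# whose shared boundary (55) falls in the middle of cell 5.
left = snap_clip_to_grid(0, 100, 10, 25, 55)
right = snap_clip_to_grid(0, 100, 10, 55, 85)
print(left, right)
```

In this example both clipped pieces keep cell 5, which is the one-cell overlap that feature merging would need to recognize and collapse.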

The extra tricky part is that at high zooms, a grid cell can be a substantial multiple of the tile extent. So the tiling must be careful not to make the geometry big enough to hold the whole original grid cell (or it will run into integer overflow problems in rendering), while still setting it up to have the visible bounds at the right place if the tile contains the edge of the geometry or of a grid cell. I think it will always be possible to put the edges in the right place visually, but this further compounds the divergence from what feature merging would usually expect.

@andrewharvey

Sorry for being a bit naive, but what's the advantage of supporting DEM or satellite/imagery rasters inside an MVT as opposed to serving raw png/whatever tiles with a bit of help from https://github.com/mapbox/tilejson-spec to make sense of it?

@e-n-f
Contributor

e-n-f commented Apr 16, 2018

My perspective is that it is reasonable to continue to use image tiles for things that are images, but that there is still an advantage to treating them as feature attributes or at least as layers in that it makes it possible to give names to the image layers and to describe their content.

On the other hand I think encoding DEMs as PNGs is a hacky workaround for the lack of a proper representation, not something that should be perpetuated. If the data can be represented in its native units, on its native grid, in a self-describing way, that seems superior to me to representing it indirectly and approximately.

7 participants