Invalid geom in some tiles (GeoJSON) #698

Closed
nvkelso opened this issue Apr 11, 2016 · 13 comments

@nvkelso
Member

nvkelso commented Apr 11, 2016

We've seen this a few times, and while Tangram mostly handles it okay, other tools barf on invalid geometries.

Seems like the data is being transformed during vector tile creation (I suspect it's fine in PostGIS) and that transform results in self-intersecting geoms.

@wboykinm reported this on Twitter last week. He's been testing Mapzen vector tiles with Turf.js and found recurring examples of geometry errors for many metro areas. Superficially, he suspects feature simplification is causing some of this.

Here's an example:

More about the app:

Bill is trying to erase US census geometry with the Mapzen water geom using turf.js, see https://github.com/wboykinm/tribes/blob/mapzen/processing/water/piranha.js.

PostGIS has ST_Makevalid(), but Turf.js has no equivalent function, so even simple errors like self-intersecting polygons and non-noded intersections throw unrecoverable errors that result in data loss.

Looks like Mapnik does some OGC validation in their tiles:

To find more invalid examples, Bill built a tool that tosses out self-intersecting polygons: https://www.npmjs.com/package/turf-bathwater. It also logs the features that get thrown out.

nvkelso added this to the v0.10.0 milestone Apr 11, 2016
nvkelso added the ready label Apr 11, 2016
@rmarianski
Member

It turns out that this is due to precision loss when we write the json out. @zerebubuth: what do you think we should do here?

@wboykinm

@rmarianski Are you guys writing to a standardized decimal precision or is it zoom-dependent?

@zerebubuth I see you folks have already been down this road.

@zerebubuth
Member

The precision depends on the format and the zoom level. For GeoJSON (.json tiles), we change the number of digits of precision by zoom level. For TopoJSON, MVT and OpenScienceMap, the coordinates within the tile are integers, so are inherently bounded by zoom level.
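As a rough illustration of how zoom-dependent precision can degrade a geometry, here is a minimal Python sketch. The `digits_for_zoom` rule (keep enough decimal places to resolve about one pixel of a 256-pixel tile) is an assumption made up for this example, not the rule the tile server actually uses, and `round_ring` and `sliver` are likewise hypothetical names.

```python
import math


def digits_for_zoom(zoom, tile_pixels=256):
    """Hypothetical rule: decimal places needed to resolve ~1 pixel at this zoom."""
    degrees_per_pixel = 360.0 / (tile_pixels * 2 ** zoom)
    return max(0, math.ceil(-math.log10(degrees_per_pixel)))


def round_ring(ring, digits):
    """Round every coordinate of a ring to a fixed number of decimal places."""
    return [(round(x, digits), round(y, digits)) for x, y in ring]


# A thin but valid rectangle, 0.0004 degrees tall.
sliver = [(0.0, 0.0), (1.0, 0.0), (1.0, 0.0004), (0.0, 0.0004), (0.0, 0.0)]

for zoom in (10, 16):
    digits = digits_for_zoom(zoom)
    distinct = len(set(round_ring(sliver, digits)))
    print(f"zoom {zoom}: {digits} digits, {distinct} distinct vertices")
```

Under this assumed rule the sliver keeps its four distinct vertices at zoom 16, but at zoom 10 it collapses to two, which can no longer describe a polygon.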

This is, indeed, a well travelled path, and I'm not entirely sure what the best option might be - or whether one exists.

If we want only valid features in the tiles (which seems like a sensible thing to want), then we must do the validity checks after any encoding steps which would simplify or truncate precision (e.g. rounding to integers, truncating decimal places, dropping points due to movement tolerances). However, these steps are different for each output format, so we might end up dropping different features for each format, which could be confusing. Alternatively, we could encode all features in all formats and only output those features which are present in all formats - but this seems like a great deal of effort.
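A minimal Python sketch of "validate after encoding", assuming a per-format `encode` function is passed in. All names here (`ring_self_intersects`, `encode_then_filter`, `to_integers`) are hypothetical, and the check only detects edges that properly cross; it is not a full OGC validity test (touching or overlapping edges are not caught).

```python
from itertools import combinations


def _orient(p, q, r):
    """Signed area of triangle p, q, r (positive if counter-clockwise)."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])


def _properly_cross(a, b, c, d):
    """True if segment a-b and segment c-d cross at an interior point of both."""
    return (_orient(a, b, c) * _orient(a, b, d) < 0
            and _orient(c, d, a) * _orient(c, d, b) < 0)


def ring_self_intersects(ring):
    """Check every pair of edges in a closed ring for a proper crossing."""
    edges = list(zip(ring, ring[1:]))
    return any(_properly_cross(a, b, c, d)
               for (a, b), (c, d) in combinations(edges, 2))


def encode_then_filter(features, encode):
    """Encode each ring first, then keep only rings that are still simple."""
    kept, dropped = [], []
    for feature_id, ring in features:
        encoded = encode(ring)
        target = dropped if ring_self_intersects(encoded) else kept
        target.append((feature_id, encoded))
    return kept, dropped


def to_integers(ring):
    """Stand-in for an integer-coordinate format such as MVT."""
    return [(round(x), round(y)) for x, y in ring]


# Valid at full precision: the notch vertex (4, 0.45) sits just above the bottom edge.
notched = [(0, 0), (10, 1), (10, 10), (4, 0.45), (0, 10), (0, 0)]
square = [(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)]

kept, dropped = encode_then_filter(
    [("notched", notched), ("square", square)], to_integers)
print("kept:", [fid for fid, _ in kept])
print("dropped:", [fid for fid, _ in dropped])
```

Because the check runs on the encoded ring, a different `encode` function (say, one that rounds to decimal places instead of integers) may drop a different set of features, which is the per-format inconsistency described above.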

@nvkelso
Member Author

nvkelso commented Apr 12, 2016

@wboykinm

#missioncreep

@rmarianski
Member

Based on a previous discussion, it sounds like we are willing to accept that different formats can contain differences in terms of content.

In this particular case of geojson encoding though, do we want to:

  1. encode with truncated precisions based off zoom, make a decoding pass, and re-encode the invalid geometries with full precision?
  2. encode with truncated precisions based off zoom, make a decoding pass, and drop the invalid geometries?
  3. encode with truncated precisions based off zoom, try a decoding pass, and re-encode invalid geometries with buffer(0)?
  4. always encode all features with full precision? (I'm assuming this option will never generate invalid features. But could it still, with default JSON encoding?)
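To make option 1 concrete, here is a minimal Python sketch: encode with truncated precision, decode the result, and fall back to full precision if the decoded ring fails a validity check. The function names are hypothetical, and `has_area` is only a stand-in predicate (non-zero shoelace area); a real implementation would presumably use a proper geometry library for the check.

```python
import json


def shoelace_area(ring):
    """Signed area of a closed ring via the shoelace formula."""
    return 0.5 * sum(x1 * y2 - x2 * y1
                     for (x1, y1), (x2, y2) in zip(ring, ring[1:]))


def has_area(ring):
    """Stand-in validity check: the ring must not have collapsed to zero area."""
    return shoelace_area(ring) != 0


def encode_ring(ring, digits, is_valid=has_area):
    """Return (json_text, mode), where mode is 'truncated' or 'full'."""
    truncated = json.dumps([[round(x, digits), round(y, digits)] for x, y in ring])
    # Decoding pass: validate what a client would actually parse.
    if is_valid(json.loads(truncated)):
        return truncated, "truncated"
    # Option 1: re-encode the original coordinates at full precision.
    return json.dumps([list(point) for point in ring]), "full"


sliver = [(0.0, 0.0), (1.0, 0.0), (1.0, 0.0004), (0.0, 0.0004), (0.0, 0.0)]
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0), (0.0, 0.0)]

for name, ring in (("sliver", sliver), ("square", square)):
    text, mode = encode_ring(ring, digits=3)
    print(name, mode, text)
```

Returning nothing instead of the full-precision fallback would turn this into option 2. On the question in option 4: Python 3's `json` module writes floats using their shortest round-trip representation, so the full-precision path should decode to the same values in Python, though other encoders and client parsers may behave differently.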

@wboykinm

@rmarianski In hacking away at these geoms, I haven't actually had any luck with buffer(0) on the client side (turf.buffer, and the topology remains invalid). Do you mean to try that with PostGIS? And if so, is ST_Makevalid() too slow?

More broadly, I would encourage you folks to consider whether you even want to support a javascript, heavy-geoprocessing use case with your vector tiles. I'm within a few thrown errors of giving up on turf and moving my processing to PostGIS anyway, and you may not encounter this line of inquiry again for months/years.

@wboykinm

@rmarianski UPDATE: I caved and tried using PostGIS to fix topology on this particularly problematic water polygon from this Mapzen VT, but neither ST_Makevalid() nor ST_Buffer(0) (or 100m for that matter) resolved the feature into a usable geometry. Workflow:

  • Request tile as geojson
  • Import to postgis w/ OGR2OGR:
      ogr2ogr -t_srs "EPSG:4326" -f "PostgreSQL" PG:"host=localhost dbname=communities" osm_water.geojson -nln osm_water -nlt PROMOTE_TO_MULTI -lco PRECISION=NO
  • Verify the import by rendering in QGIS
  • Run these variations:
      CREATE TABLE valid_water AS (SELECT ST_Buffer(wkb_geometry,0) AS the_geom FROM osm_water);
      CREATE TABLE valid_water AS (SELECT ST_Buffer(wkb_geometry,0.001) AS the_geom FROM osm_water);
      CREATE TABLE valid_water AS (SELECT ST_Makevalid(wkb_geometry) AS the_geom FROM osm_water);
  • In each case, the above feature (and a few others) are removed from the output. Even PostGIS can't handle this topology.

@wboykinm

wboykinm commented Aug 5, 2016

@nvkelso Does the merged PR above address the original problem? I can try running turf over it again if it'd be helpful.

@rmarianski
Member

It should address it, yes. Running turf on it again would tell us whether it actually helped.

@nvkelso
Member Author

nvkelso commented Aug 6, 2016

To set expectations, the fix will probably address many geom validity problems (and hopefully yours!) but not all geom problems. That is a Hard Problem (tm).


@nvkelso
Member Author

nvkelso commented Sep 22, 2016

@rmarianski Anything more to test with this one before taking it to prod?

@rmarianski
Member

Should be good to go here.
