Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support to parse some or all MVT feature properties as JSON #715

Merged
merged 6 commits into from
May 30, 2019

Conversation

bcamper
Copy link
Member

@bcamper bcamper commented Apr 21, 2019

Sometimes source data includes properties with a richer object format than just strings or numbers -- e.g. arrays, nested objects, etc. The MVT spec doesn't prescribe explicit behavior for these cases, but notes that common tools such as Tippecanoe and Mapnik will encode these properties as stringified JSON.

This requires client-side support for parsing these fields. This PR adds support for this on MVT data sources through a new parse_json property:

  • If parse_json is true, then each property will be checked to see if it "looks like" stringified JSON (defined as a string with first character being { or [); if so, it is parsed as JSON (with JSON.parse).
  • If parse_json is an array of property names, only those specific properties are checked for JSON parsing. This is preferred to the above, because it limits the parsing impact to only the fields that need it.
  • If parse_json is undefined/null/false, no special parsing of properties is done (e.g. the current behavior).

Example usage:

sources:
  tiles:
    type: MVT
    url: https://...
    parse_json: [prop_a, prop_b] # treat feature properties 'prop_a' and 'prop_b' as stringified JSON

These parsed properties will now be properly returned as full JS objects through functions such as scene.getFeatureAt() or scene.queryFeatures().

To make use of these properties in filtering and styling, you will need to write a JS function to access the complex properties, e.g. if prop_a is an array of values, you could filter on it containing the value x like so:

filter: function() { return feature.prop_a.indexOf('x') > -1; }

@bcamper
Copy link
Member Author

bcamper commented Apr 21, 2019

@nvkelso I think we also had some values similar to this in TIlezen for awhile (for multiple shields?), but removed them (for unrelated reasons).

@bcamper bcamper added this to the v0.19.0 milestone Apr 21, 2019
@nvkelso
Copy link
Member

nvkelso commented Apr 22, 2019

Tilezen still exports the array properties in GeoJSON and TopoJSON, but currently strips out in MVT. We can start exporting them in MVT when Tangram grows support. Primarily these are multishields (road, transit, walk, bike).

@bcamper
Copy link
Member Author

bcamper commented Apr 22, 2019

Tilezen still exports the array properties in GeoJSON and TopoJSON, but currently strips out in MVT. We can start exporting them in MVT when Tangram grows support. Primarily these are multishields (road, transit, walk, bike).

Oh great! Bring it on :)

@bcamper
Copy link
Member Author

bcamper commented Apr 22, 2019

@matteblair thoughts on this on the ES side?

@matteblair
Copy link
Member

This would be a bigger change for Tangram ES since we currently model feature properties explicitly as a mapping of strings to scalar values. However I can see the value in supporting more structured data for properties.

This is also a topic of discussion for the next version of the MVT spec (mapbox/vector-tile-spec#75). It does not seem like the proposed solution in MVT would be mutually exclusive with the one here, but it would make this one redundant. The MVT proposal also has the advantage that it's a much more compact representation of structured data: values can be de-duplicated in MVT but in stringified JSON they must be duplicated.

@bcamper
Copy link
Member Author

bcamper commented Apr 23, 2019

This would be a bigger change for Tangram ES since we currently model feature properties explicitly as a mapping of strings to scalar values. However I can see the value in supporting more structured data for properties.

This is also a topic of discussion for the next version of the MVT spec (mapbox/vector-tile-spec#75). It does not seem like the proposed solution in MVT would be mutually exclusive with the one here, but it would make this one redundant. The MVT proposal also has the advantage that it's a much more compact representation of structured data: values can be de-duplicated in MVT but in stringified JSON they must be duplicated.

I agree the eventual MVT v3 solution will be better, however after some time (that ticket was opened in 2016...) I decided to finally move ahead with this because: 1) there are popular tools that implement this JSON stringify behavior, particularly Tippecanoe, so it's a reality for a lot of existing data, and 2) MVT v3 is going to be a substantial upgrade (tons of other things in the spec) that will require widespread updates to both encoding and decoding libs and I think a lot of MVT v2 data will persist.

@matteblair
Copy link
Member

That makes sense. We probably shouldn't count on the MVT v3 spec being finished any time soon, and even with that approach we would have to update the feature data model in Tangram ES. This feature sounds good to me.

@matteblair
Copy link
Member

I'm also interested in finding ways to make these properties available to filters in a "native" way, since using a JS function in a filter makes it an order of magnitude slower in Tangram ES. Having built-in accesors for structured properties could also implicitly define which ones need to be parsed, and allow us to do it lazily.

@bcamper
Copy link
Member Author

bcamper commented Apr 23, 2019

Agree on finding adding support for more native syntax. I'm mostly concerned about how to do it in a way that doesn't impact performance, since the assumption of a 1:1 mapping from YAML string to feature property key is fast. But I'm sure we can use some clues from data source definition and/or caching strategy to mitigate this.

@bcamper
Copy link
Member Author

bcamper commented May 1, 2019

OK @matteblair inspired me to take this further. I've added support both for accessing nested properties with native syntax in filters, and for doing array-to-array filter operations. I'm not 100% sure of the syntax/naming decisions here, so feedback/alternatives welcome.

Nested properties in filters

Dot notation with . can be used to access nested feature properties (these properties could have been encoded in a GeoJSON/TopoJSON source, or parsed from MVT stringified JSON properties as covered by this PR):

Given a feature property a: { b: { c: 'test' } }, this filter will match:

filter: { a.b.c: test }

Feature property names that include a . can be escaped with \., e.g. a feature property named 'd.e.f': 'escaped' will match with:

filter: { d\.e\.f: escaped }

These could be mixed, e.g. a property { 'a.b': { c: 'mixed' } would match with:

filter: { a\.b.c: mixed }

Array-to-array filters

We have existing syntax for checking if a single value matches one of a list of values, e.g.:

Does value x match a, b, or c?

filter: { x: [a, b, c] }

As part of this new support for "complex" property values that may be arrays, we need to expand this syntax. This branch adds two keywords for this, by extending the same filter object pattern we use for min/max range filter syntax:

To check if an array a contains one or more of the values p, q, r:

filter: { a: { includes: [p, q, r] } }

Note: while it would be possible to check the type value of a to see if it's an array, and apply this logic without additional syntax, I was weary of introducing an additional run-time check that can occur on many many features (vs. the other additions in this branch, which only occur once per filter, lazily when the filter is first parsed/built), especially for a case that is only needed for a small subset of feature properties. On the other hand, jsperf says my browser can perform 500 million array type checks per second, so this may be unfounded.

There are also cases where you'd like to test that an array contains all of a list of values. An includes_all variant was added for this. For instance, to check if an array a contains the values p, q, AND r:

filter: { a: { includes_all: [p, q, r] } }

One could check for multiple array values with multiple filters, but supporting direct syntax for them would be much more efficient than this, which will execute 3 separate array tests, e.g.:

filter:
  all:
    - a: { includes: p }
    - a: { includes: q }
    - a: { includes: r }

Note: includes (implicit "or"/"one of", like existing single-value-to-array filters) vs. includes_all doesn't feel entirely right, but I couldn't come up with something I liked better, suggestions welcome!

@bcamper bcamper requested review from matteblair and nvkelso May 1, 2019 15:18
@matteblair
Copy link
Member

I like the property accesors! Should be both more performant and easier to use for authoring filters. I'd suggest includes_any instead of includes just to make the meaning more obvious.

@matteblair
Copy link
Member

An alternative way of accessing nested properties in YAML would be to use arrays for the filter keys, like:

filter: { [a, b, c]: test }

But I don't think this works in the JS YAML model - plus it's not very natural to read.

The \. escaping just seems a little awkward so I'm trying to think of other syntax options. Hopefully it should be very rare that we encounter data with property keys including ., so it's probably not worth worrying about.

@nvkelso
Copy link
Member

nvkelso commented May 2, 2019

I dig it, and @matteblair suggestion for includes_any.

filter: { d.e.f: escaped }

For the complicated property names, could we allow that be be quoted like "d.e.f", too?

@nvkelso
Copy link
Member

nvkelso commented May 2, 2019

@bcamper can you add an example on how to access the multishields in the all_shields properties in Tilezen's GeoJSON output, which are arrays (so first element in the array, or for each element in the array)?

Will Tangram abstract that GeoJSON and MVT store it in different ways, but I think Tangram will expand the MVT shorthand to the GeoJSON internally?

@bcamper
Copy link
Member Author

bcamper commented May 2, 2019

Agree on includes_any, will update.

Yep, property names can be quoted or unquoted (e.g. d.e.f or "d.e.f"), as per standard YAML syntax (it's just personal preference).

@matteblair you're following my thinking too -- I considered an array syntax, but you're correct JS YAML parser doesn't support it, and then it seemed just as complicated or more to parse it out manually from a string key. And this type of dot notation would be valid JS. Agree \. is a bit awkward, but I concluded it was common enough as a string escaping pattern in other cases. It could also use JS-style bracket notation like a[b][c], but this similarly felt a little more involved to parse and still requires some form of escaping (to support [ and ]) if you're really trying to be complete.

@bcamper
Copy link
Member Author

bcamper commented May 2, 2019

@nvkelso can you provide an example tile/location where that property shows up? Not finding it in a quick search.

@bcamper bcamper force-pushed the mvt-json-decode branch from 0a5cbd2 to cc378cb Compare May 3, 2019 17:13
@nvkelso
Copy link
Member

nvkelso commented May 6, 2019

"all_networks":["US:US","US:CA"],"all_shield_texts":["101","128"]

image

@bcamper
Copy link
Member Author

bcamper commented May 7, 2019

@nvkelso

"all_networks":["US:US","US:CA"],"all_shield_texts":["101","128"]

Can you elaborate on how you think we would want to filter on and use these array fields? For example, are the index positions 0, 1, etc. going to reliably reflect a fixed network "level" (national highway, state highway, etc.)? It seems like we couldn't necessarily guarantee that for all data?

It's easy enough to match on the values, like "does all_networks contain US:CA", but I don't have an immediate answer for how we would then match that index to all_shield_texts to get the desired shield text out (this could be done in a JS function, but we are first seeing what/if we can create native syntax support). We could put the whole object into one array, e.g. something like:

[{ network: 'US:US', text: '101' }, { network: 'US:CA', text: '128' }]

But that also doesn't fit simple filter matching syntax (it's more like the complex destructuring matching offered JS and other languages... which is cool but likely overkill here!); so we'd probably want a JS filter function for it as currently stored.

Let's figure out how we want to style them, then determine what additional scene syntax and/or data format changes are needed (since they aren't in use anyway, right?).

@matteblair other thoughts?

@nvkelso
Copy link
Member

nvkelso commented May 7, 2019

Mostly I'm interested in multishield support, for roads, walking paths, bicycle, and transit. As long as this gets us part way there, and it's possible to step all the way there with an extension then I'm good.

@matteblair
Copy link
Member

I don't know the full range of data for shields, but for the case above it we might be able to use an object property like:

"networks": { "US:US": "101", "US:CA": "128" }

Then do filtering and styling like:

filter: { networks.US:US : true }
draw:
  text:
    text_source: networks.US:US

This implies that property accessors work in styling expressions as well as filtering expressions - not sure if that was planned or not.
Encoding this way is also not very compact if we're stringifying the whole JSON object :\

@nvkelso
Copy link
Member

nvkelso commented May 7, 2019

😂 I think this is taking a complex turn when I'm just interested in multi-shields ;)

@bcamper
Copy link
Member Author

bcamper commented May 7, 2019 via email

@matteblair
Copy link
Member

@nvkelso I don't think I understand what you mean when you say "multi-shields". Can you explain?

@nvkelso
Copy link
Member

nvkelso commented May 8, 2019 via email

@bcamper bcamper merged commit ef88117 into master May 30, 2019
@bcamper bcamper deleted the mvt-json-decode branch May 30, 2019 12:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants