Add draft proposal for property expressions spec #4754

Merged
anandthakker merged 5 commits into feature/expressions from expressions-spec
Jun 2, 2017

Conversation

anandthakker
Contributor

@anandthakker anandthakker commented May 25, 2017

Proposed plan for introducing arbitrary data-driven expressions into style functions:

  • deprecate zoom, property, and zoom+property functions
  • keep stop-based zoom functions
  • allow data-driven expressions (a) as style property values, and (b) as output values for zoom function stops. (I.e., (a) replaces "property functions", and (b) replaces zoom+property functions.)
  • replace all stop-based functions with expression-based functions:
    • property functions become { 'expression': ... }
    • zoom functions become { 'expression': ['curve', 'interp', ['zoom'], ... ] }
    • zoom+property functions become { 'expression': ['curve', 'interp', ['zoom'], 0, some, 10, another_expression, ... ] }

See docs/style-spec/expressions.md for the spec details.

@mapbox/gl @mapbox/studio @mapbox/cartography-cats @mapbox/support

@anandthakker
Contributor Author

Note that this is against the feature/expressions branch, which is intended to be used as a base branch for the arbitrary expressions project (at least the spec & JS parts of it) and to stay in sync w/ studio and carto as they start to work in parallel.

The purpose of this PR is for focused review of the design that was developed over in #4715.

@jfirebaugh
Contributor

jfirebaugh commented May 25, 2017

If I was designing style property values totally from scratch, I'd want one unified representation: property-name: property-value-expression, where the expression grammar includes:

  • Numeric literals
  • String literals
  • Array and object literals, which because of the JSON embedding, require a quote expression: ["quote", object-literal], ["quote", array-literal]. Strictly speaking, if we have a uniform expression syntax -- always a JSON array, or always a JSON object with a single key-value pair -- then only the opposite kind of literal need be quoted. But it might be a good idea to preserve our ability to grow the language later -- e.g. maybe we decide we need keyword arguments using JSON objects -- and require both array and object literals to be quoted.
  • Curves -- a generalization of piecewise-defined interval/exponential functions. ["curve", interpolation, x, y_1, n_1, ..., y_m, n_m, y_last]. interpolation is one of:
    • ["step"] - equivalent to existing "interval" function behavior
    • ["exponential", base] - equivalent to existing "exponential" function behavior
    • ["linear"] - equivalent to ["exponential", 1]
    • ["cubic-bezier", x1, y1, x2, y2] - define your own interpolation
    • ["lab", interpolation] - interpolate using interpolation, but in LAB color space
    • ["hsl", interpolation] - in HSL color space
    • future extensibility
    • I'm borrowing here from D3 curves, ease, and interpolate
  • ["case", x, ...] - replaces "categorical" functions. Borrow semantics from Scheme. Can be compiled to map/hash lookup.
  • ["data", key]
  • ["zoom"] - but in order to preserve the ability to do shader-based interpolation along the zoom axis, we limit the contexts in which ["zoom"] can be used to top-level curve expressions.

In order to make this a superset of existing functions, we also need a way to express the existing default functionality. Still musing on that.
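
To make the "curve" idea concrete, here is a minimal sketch of how such an expression might be evaluated. It is not the proposed implementation: the flat (input, output) stop layout, the clamping outside the stop range, the restriction to numeric outputs, and the name `evaluateCurve` are all assumptions made for illustration. The exponential formula is the usual one for a base-weighted interpolation factor, reducing to linear when the base is 1.

```javascript
// Hypothetical sketch: evaluate ["curve", interpolation, input, in_1, out_1, ..., in_n, out_n]
// for an already-evaluated numeric input x. The stop layout is an assumption.
function evaluateCurve(expr, x) {
  const [name, interpolation, , ...flat] = expr;
  if (name !== "curve") throw new Error("not a curve expression");

  // Group the flat argument list into [input, output] stops.
  const stops = [];
  for (let i = 0; i < flat.length; i += 2) stops.push([flat[i], flat[i + 1]]);

  // Clamp to the first or last output outside the stop range (assumed behavior).
  if (x <= stops[0][0]) return stops[0][1];
  const last = stops[stops.length - 1];
  if (x >= last[0]) return last[1];

  // Find the pair of stops surrounding x.
  let i = 0;
  while (x >= stops[i + 1][0]) i++;
  const [x0, y0] = stops[i];
  const [x1, y1] = stops[i + 1];

  const [kind, base = 1] = interpolation;
  if (kind === "step") return y0;
  if (kind !== "linear" && kind !== "exponential") {
    throw new Error(`unknown interpolation: ${kind}`);
  }
  // ["linear"] is treated as ["exponential", 1].
  const t = base === 1
    ? (x - x0) / (x1 - x0)
    : (Math.pow(base, x - x0) - 1) / (Math.pow(base, x1 - x0) - 1);
  return y0 + t * (y1 - y0);
}
```

Because the interpolation is just a tagged array in second position, new kinds could be added to the dispatch without changing the stop handling.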

@andrewharvey
Collaborator

deprecate zoom+property functions
deprecate property functions
allow data-driven expressions

I assume the top two deprecations are replaced by the new data-driven expressions feature. I just wanted to say that the current zoom, property and zoom+property functions in Mapbox Studio are awesome and make working in Studio much nicer, so I hope this is an improvement upon the current state.

@anandthakker
Contributor Author

anandthakker commented May 26, 2017

Array and object literals, which because of the JSON embedding, require a quote expression: ["quote", object-literal], ["quote", array-literal].

👍 I was thinking ["object", object-literal], ["array", array-literal]

Curves -- a generalization of piecewise-defined interval/exponential functions. ["curve", interpolation, x, y_1, n_1, ..., y_m, n_m, y_last].

This is a great abstraction & syntax for unifying linear, exponential, step, color, etc. functions. I like that it affords a way to express zoom-dependent functions without a redundant concept while retaining the constraints we need.

@jfirebaugh where do basic arithmetic functions (+ - * / ^) fit in here? While I think having higher level curves is useful, there are use cases that they either don't address or would make cumbersome. E.g.:

  • circle radius is proportional to the square root of population * mean income
  • convert property that's in feet to meters (['curve', ['linear'], ['numeric_data', 'height'], 0, 0, 0, 1000, 304.8] vs. [ '*', ['numeric_data', 'height'], 0.3048 ])

We also probably still want concat, upcase/downcase, comparison ops.
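
As a rough illustration of the two use cases above, a tiny evaluator for binary arithmetic expressions could look like the following. The operator table, the function name `evaluateArithmetic`, and the sample property names are assumptions for this sketch; `["numeric_data", key]` is used as written in the comment above.

```javascript
// Hypothetical mini-evaluator for binary arithmetic over feature properties.
const arithmeticOperators = {
  "+": (a, b) => a + b,
  "-": (a, b) => a - b,
  "*": (a, b) => a * b,
  "/": (a, b) => a / b,
  "^": (a, b) => Math.pow(a, b),
};

function evaluateArithmetic(expr, properties) {
  if (!Array.isArray(expr)) return expr; // numeric literal
  const [op, ...args] = expr;
  if (op === "numeric_data") return properties[args[0]];
  const fn = arithmeticOperators[op];
  if (!fn) throw new Error(`unknown operator: ${op}`);
  const [a, b] = args.map((arg) => evaluateArithmetic(arg, properties));
  return fn(a, b);
}

// Feet to meters, as in the second bullet above.
const feetToMeters = ["*", ["numeric_data", "height"], 0.3048];

// Radius proportional to the square root of population * income (property names assumed).
const radius = ["^", ["*", ["numeric_data", "population"], ["numeric_data", "income"]], 0.5];
```

Evaluating `feetToMeters` against `{ height: 1000 }` multiplies directly, with no need to encode the conversion as a two-stop curve.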

@jfirebaugh
Contributor

Oh, yeah, that wasn't a complete list of expression types. I think we want most of the other things listed in the PR too.

Other things I'm thinking about:

  • Not sure about "number_data", "string_data" etc. Remind me the rationale vs. "data" and dynamic types? If we need explicit coercion, wouldn't a separate, composable ["coerce", type, val] expression or family of expressions be preferable?
  • Array indexing operator, length, slicing, ...?
  • Are we actually adding objects to the type system? If so: ["data", key] ⇢ ["member", key, obj], where obj can be the expression ["properties"], to get the feature properties, or any other expression with object type.
  • Are we going to provide access to feature geometry? Then it would make sense to have
    • ["member", ["geometry"], "type"]
    • ["member", ["geometry"], "coordinates"]
      (Seems like a dubious idea. Are we going to unproject vector tile coordinates during evaluation? On the other hand, I want this for my hack day project!)
  • Can we unify the notions of truthiness with handling absent/inappropriate values? Along these lines or similar:
    • Define "undefined" as one of the inhabitants of the type system
    • Define the result of such expressions as a nonexistent "member", an invalid "curve" input, etc. (all the places where we currently use the default value) as "undefined"
    • Define false and "undefined" as falsy, everything else as truthy
    • Define "||" to return the value of the first truthy argument, making it useful as a coalescing operator: ["||", ["data", "x"], ["data", "y"], ["data", "z"]]
    • What about null?
    • OTOH, it's probably a bad idea to conflate logical || and coalescing. We could just provide ["coalesce", exprs...]
  • ["apply", name, arg-array] ???
  • ["map", ...] or other higher-order functions ???
  • Variable binding, e.g. ["let", [[name, expr], ...], body] ???
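
The difference between a truthiness-based "||" and a dedicated "coalesce" can be sketched as below. This is only an illustration of the semantics floated in the list above: JavaScript's `undefined` stands in for the proposed "undefined" value, both function names are hypothetical, and the arguments are already evaluated, so lazy evaluation is not modeled.

```javascript
// Proposed falsy set: only false and "undefined"; everything else (0, "", null) is truthy.
const isFalsy = (value) => value === false || value === undefined;

// ["||", ...]: returns the first truthy argument, so a legitimate `false` is skipped.
function logicalOr(...values) {
  for (const value of values) {
    if (!isFalsy(value)) return value;
  }
  return values[values.length - 1];
}

// ["coalesce", ...]: returns the first defined argument, so `false` is preserved.
function coalesce(...values) {
  for (const value of values) {
    if (value !== undefined) return value;
  }
  return undefined;
}
```

With the inputs `(undefined, false, "z")`, `logicalOr` yields `"z"` while `coalesce` yields `false`, which is the conflation risk noted above.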

@anandthakker
Contributor Author

Not sure about "number_data", "string_data" etc. Remind me the rationale vs. "data" and dynamic types?

The issue wasn't with "data"/dynamic types as such, but with automatic coercion: having the arithmetic operators coerce strings to numbers seemed to lead to a place where there was no meaningful distinction between number and string types... which felt like less type checking than we wanted.

Are we actually adding objects to the type system? If so: ["data", key] ⇢ ["member", key, obj], where obj can be the expression ["properties"], to get the feature properties, or any other expression with object type.

I strongly suspect we'll end up adding them later even if we don't now. 👍 to general syntax... maybe "get" instead of "member"?

Are we going to provide access to feature geometry?

For expressions to subsume existing filters, we at least need to expose the geometry type. Coordinates... heh probably not 😱

@jfirebaugh
Contributor

I was thinking ["object", object-literal], ["array", array-literal]

How about ["literal", any JSON], i.e. allow strings, numbers, true, false, and null as well, even though they technically don't need to be quoted?

@jfirebaugh
Contributor

"get"

Sounds good. Could be used for both object member and array indexing.

@ansis
Contributor

ansis commented May 26, 2017

If I was designing style property values totally from scratch, I'd want one unified representation: property-name: property-value-expression, where the expression grammar includes:

@jfirebaugh that looks great. This looks like it covers everything we currently support with functions. This seems like a good subset to dig more deeply into to work out typing and error handling.

I'm not quite clear on how the easing/interpolation works. What do ["step"], ["exponential", base], ["lab", interpolation], etc evaluate to? A literal? A function? With what signature?

What are your thoughts on the typing?

  • strictly typed?
  • generic polymorphism? ad hoc polymorphism? no polymorphism?
  • constant expressions (for the y_n values in curves)? zoom-constant expressions to capture the ["zoom"] restrictions?
  • variants?

How are errors handled? for example, failed coercions

  • exceptions? can exceptions be caught or do they just bail out of the entire expression?
  • return values?
  • somehow avoid the possibility of runtime errors at all?

@jfirebaugh
Contributor

Variable binding

Thinking ahead... if we do support this, we'll need to distinguish string literals from variable identifiers. Most languages use quotation marks as the lexical signifier for string literals, but ours does not have that luxury because we're using JSON strings as both identifiers (when in first position) and string constants (in other positions). But we can reverse the typical arrangement, and have a signifier for variable identifiers, e.g. ["var", name], evaluating to the value of the variable name.
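
A small sketch of how an explicit ["var", name] form keeps string literals and identifiers apart. The evaluator name, the use of "concat" as the only other operator, and the choice to evaluate bindings in the outer scope (as in Scheme's `let`) are assumptions for illustration.

```javascript
// Hypothetical evaluator: strings in argument position are always literals;
// only ["var", name] looks a name up in the current scope.
function evaluateWithBindings(expr, scope = new Map()) {
  if (!Array.isArray(expr)) return expr;
  const [op, ...args] = expr;
  switch (op) {
    case "var": {
      const [name] = args;
      if (!scope.has(name)) throw new Error(`unbound variable: ${name}`);
      return scope.get(name);
    }
    case "let": {
      // ["let", [[name, expr], ...], body]; bindings see only the outer scope.
      const [bindings, body] = args;
      const inner = new Map(scope);
      for (const [name, valueExpr] of bindings) {
        inner.set(name, evaluateWithBindings(valueExpr, scope));
      }
      return evaluateWithBindings(body, inner);
    }
    case "concat":
      return args.map((arg) => evaluateWithBindings(arg, scope)).join("");
    default:
      throw new Error(`unknown expression: ${op}`);
  }
}

// "name" is a literal in one position and an identifier in the other.
const labelExpr = ["let", [["name", "Main St"]], ["concat", "name", "=", ["var", "name"]]];
```

Evaluating `labelExpr` yields `"name=Main St"`: the bare string stays a literal and only the `["var", "name"]` form is resolved.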

@jfirebaugh
Contributor

I'm not quite clear on how the easing/interpolation works. What do ["step"], ["exponential", base], ["lab", interpolation], etc evaluate to? A literal? A function? With what signature?

Good question. Thinking about this more, ["lab", interpolation] is out of place -- the color interpolation property should be a member of *-color-transition (and should be calculated in the shader for data-driven properties, as has been noted elsewhere). Could we drop alternate color space support until we re-do it in that way?

There are probably several possibilities for defining the semantics of interpolator as a function. I'm not sure it matters too much which one we pick until/unless we add support for first-class functions. Until then, ["step"], etc are just special expressions that can only be used in the second position of "curve".

@anandthakker
Contributor Author

How about ["literal", any JSON], i.e. allow strings, numbers, true, false, and null as well, even though they technically don't need to be quoted?

@jfirebaugh what's the advantage to quoting primitives? I think doing this would add quite a bit of verbosity that it would be nice to avoid if possible.

 * Introduce `curve` expression, use it to cover zoom function `stops`.
 * Introduce object and array literals
 * Remove explicitly typed property lookup
@jfirebaugh
Contributor

More proposals:

Type system

General principles:

  • The language is statically typed.
  • Expression types are usually inferred. In certain cases, explicit type annotations are required on expressions that access dynamically typed feature values.

The inhabitants of the type system are:

  • Null, the unit type
  • Boolean
  • Number
  • String
  • Color
  • Various enumerated types
    • One for each style spec enum
    • One for geometry types
    • One for Value types (see below)
  • Array<T, N>, a homogeneous, statically-sized array, for *-translate, icon-text-fit-padding, etc.
  • Vector<T>, a homogeneous, dynamically-sized array, for text-font
  • Value, a variant type representing JSON values, permitting Null, Boolean, Number, String, JSONObject, and JSONArray. The latter two are not used elsewhere in the type system, i.e. there are no expressions typed as JSONObject, only expressions typed as Value.
  • Error, a bottom type, i.e. a subtype of all other types. Used wherever an expression is unable to return the appropriate supertype. Carries diagnostic information such as an explanatory message and the expression location where the Error value was generated.

We want type inference to proceed primarily via recursive AST descent, i.e. to infer the types of expression arguments given the expression form and an expected result type. At the top of the tree, the expected result type is determined by the style property being evaluated. Looking at the expression forms proposed so far, we'll need to make some adjustments:

  • Split "get" into "get" (object member) and "at" (array element), so that the type of the key argument can be inferred as String or Number, respectively. (The type of the other argument is Value in both cases.)
  • Replace "has" with ["error?", ["get"/"at", value, key]], where "error?" is the Error-checking predicate. (Naming up for debate. I believe "error?" is the most fundamental Error-checking expression. "coalesce" can be written in terms of "error?" and "if"/"case"/"not".)
  • "typeof" becomes a unary expression with argument type Value and the enumerated type mentioned above as the result type.
  • "==", ">", etc. are where it gets tricky. E.g. consider ["==", ["get", ["properties"], "foo"], ["get", ["properties"], "bar"]]. Possibilities:
    • Simply allow "=="'s argument types to be Value. This seems undesirable; we'd like "get", "at" and "typeof" to be the only forms where an argument need be typed as Value, because then they are the only places where runtime types are necessary, and runtime errors due to unexpected Values can always be traced to a single specific "get"/"at". Also, we'd then be forced to define and implement semantics for relational operators on JSONArray/JSONObject Values.
    • Separate equality expressions for each type: "number==", "string==", "color==", etc. Doesn't seem very ergonomic.
    • Require type annotations on every "get"/"at" (as discussed before). Seems annoying to have to do this in situations where the type can obviously be inferred.
    • Require type annotations, but only when they can't be inferred. E.g. ["==", ["get", ["properties"], "foo", "number"], ["get", ["properties"], "bar", "number"]]. This seems like a winner, and I've written the rest of the proposal assuming this solution.

To minimize explicit annotations (e.g. infer Number in ["==", ["get", ["properties"], "foo"], 1]), we'd need to go beyond strictly recursive descent inference. Maybe not full Hindley–Milner though?
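
One way to picture "annotations only when they can't be inferred" is the checker sketched below. It is a deliberately small model, not the proposed algorithm: it covers only literals, "get", "+", and "==", and the names `inferType`, `checkType`, and `ANNOTATIONS` are hypothetical. The "==" case goes one step beyond pure descent by borrowing the type of whichever operand is already known.

```javascript
// Maps the optional annotation argument of "get" to a type name (assumed spelling).
const ANNOTATIONS = { number: "Number", string: "String", boolean: "Boolean" };

// Returns the type of `expr`, or undefined when it cannot be determined
// without a hint from the parent expression.
function inferType(expr, expected) {
  if (typeof expr === "number") return "Number";
  if (typeof expr === "string") return "String";
  if (typeof expr === "boolean") return "Boolean";
  const [op, ...args] = expr;
  if (op === "get") {
    // ["get", object, key, annotation?]: the annotation wins, then the parent's expectation.
    const annotation = args[2];
    return annotation ? ANNOTATIONS[annotation] : expected;
  }
  if (op === "+") {
    args.forEach((arg) => checkType(arg, "Number"));
    return "Number";
  }
  if (op === "==") {
    const known = args.map((arg) => inferType(arg)).find((type) => type !== undefined);
    if (known === undefined) {
      throw new Error('cannot infer operand types of "=="; annotate a "get"');
    }
    args.forEach((arg) => checkType(arg, known));
    return "Boolean";
  }
  throw new Error(`unknown expression: ${op}`);
}

// Checks `expr` against the type its parent requires.
function checkType(expr, expected) {
  const actual = inferType(expr, expected);
  if (actual !== expected) {
    throw new Error(`expected ${expected} but found ${actual}`);
  }
}
```

Under this sketch, comparing an unannotated "get" with the literal 1 type-checks, while comparing two unannotated "get" expressions is rejected until at least one carries an annotation.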

Error handling

General principles:

  • No implicit coercion. Or perhaps very limited, e.g. implicit boolean conversions only.
  • Every expression propagates Error in an eagerly evaluated argument to its result. The exceptions are "error?" (obviously) and lazily evaluated arguments to conditionals and boolean operators.
  • "get"/"at" obviously can produce type-related Errors. Other expressions that may produce Errors include:
    • Explicit type-casting/coercion expressions, e.g. ["rgb", "blueee"], ["number", "11!one"]
    • The case/match expression, if the else branch is optional
    • ...? I presume Numbers include infinities and NaN so that things like ["/", 1, 0] aren't Errors. At least not immediately; there are certainly contexts where infinities and NaN are themselves an error condition. They shouldn't be allowed as values for numeric style properties for instance.
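
The propagation rule can be sketched with Error as an ordinary value. Everything here is an assumption made for illustration: the `ExprError` class, the simplified `["get", key]` and three-argument `["if", cond, then, else]` forms, and the use of JavaScript's `Number()` for the "number" coercion.

```javascript
// Hypothetical Error value carrying a diagnostic message.
class ExprError {
  constructor(message) {
    this.message = message;
  }
}
const isError = (value) => value instanceof ExprError;

function evaluateStrict(expr, properties) {
  if (!Array.isArray(expr)) return expr; // literal
  const [op, ...args] = expr;

  // "error?" inspects its argument instead of propagating the Error.
  if (op === "error?") {
    return isError(evaluateStrict(args[0], properties));
  }
  // Conditionals evaluate only the selected branch.
  if (op === "if") {
    const condition = evaluateStrict(args[0], properties);
    if (isError(condition)) return condition;
    return evaluateStrict(condition ? args[1] : args[2], properties);
  }

  // Everything else evaluates its arguments eagerly and forwards the first Error.
  const values = args.map((arg) => evaluateStrict(arg, properties));
  const failed = values.find(isError);
  if (failed) return failed;

  switch (op) {
    case "get":
      return Object.prototype.hasOwnProperty.call(properties, values[0])
        ? properties[values[0]]
        : new ExprError(`missing property "${values[0]}"`);
    case "number": {
      const n = Number(values[0]);
      return Number.isNaN(n) ? new ExprError(`cannot convert "${values[0]}" to a number`) : n;
    }
    case "+":
      return values[0] + values[1];
    case "/":
      return values[0] / values[1];
    default:
      return new ExprError(`unknown expression: ${op}`);
  }
}
```

In this model `["/", 1, 0]` evaluates to Infinity rather than an Error, matching the presumption above; rejecting it would be left to the style property that consumes the value.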

@jfirebaugh
Contributor

jfirebaugh commented May 27, 2017

Null, the unit type

Actually, I'm not sure we need or want this. We want ["==", ["typeof", ["get", ["properties"], "foo"]], "null"], sure, but is there a use for null outside of Value? The only possible use I can think of is if we want to allow expressions to conditionally express "use the spec-default value of the style property". But then it would be another bottom type, not a unit type, and it would complicate static typing rules.

@ansis
Contributor

ansis commented May 27, 2017

@jfirebaugh that looks great!

Require type annotations, but only when they can't be inferred. E.g. ["==", ["get", ["properties"], "foo", "number"], ["get", ["properties"], "bar", "number"]].

Sounds good. Using an enum argument to determine the return signature seems like a bit of an exception. Two other possibilities, each with their own drawbacks:

  • ["get_number", ... ]
  • ["number", ["get", ... ]] where "number" is a typed identity function

Both get and at would produce indistinguishable errors in three different cases:

  • missing key, out of bounds index: ["get", ["properties"], "missing_key"]
  • provided Value is wrong type: ["get", json_array, "key"]
  • a return type mismatch: ["get", ["properties"], "key_for_value_with_wrong_type"]

Is this a problem? I think it should be fine, but wasn't completely sure.
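
For reference, the three cases can be told apart at least in diagnostic output if the error carries a message. The sketch below assumes a hypothetical `typedGet` helper that returns either `{ value }` or `{ error }`; the result shape, the helper names, and the message wording are illustrative only.

```javascript
// Names the runtime type of a JSON value, separating null and arrays from plain objects.
function describeType(value) {
  if (value === null) return "null";
  if (Array.isArray(value)) return "array";
  return typeof value; // "number", "string", "boolean", or "object"
}

// Hypothetical lookup that reports which of the three failure cases occurred.
function typedGet(container, key, expectedType) {
  // Case 2: the provided Value is not an object.
  if (describeType(container) !== "object") {
    return { error: `expected an object but found ${describeType(container)}` };
  }
  // Case 1: the key is missing.
  if (!Object.prototype.hasOwnProperty.call(container, key)) {
    return { error: `missing key "${key}"` };
  }
  // Case 3: the value exists but does not match the annotated type.
  const value = container[key];
  if (expectedType && describeType(value) !== expectedType) {
    return {
      error: `expected the value of property "${key}" to be a ${expectedType}, but it was ${describeType(value)}`,
    };
  }
  return { value };
}
```

Within the expression language itself all three would still collapse into one Error value unless a predicate exposes the distinction.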


What are your thoughts on adding a way to specify arguments that need to be determinable at compile time? For example the keys in case so that we can compile them to a map. Could something similar be used to impose the zoom restrictions?

@anandthakker
Contributor Author

anandthakker commented May 27, 2017

@jfirebaugh this looks excellent!

[JSONObject, and JSONArray] are not used elsewhere in the type system, i.e. there are no expressions typed as JSONObject, only expressions typed as Value

You're saying that no expression's "output" type would be defined as JSONObject or JSONArray, right? During inference for, e.g., [get, e1, e2], we would want to type e1: JSONObject (rather than e1: Value), or else [ get, 1, 'foo'] would type check.

What are your thoughts on adding a way to specify arguments that need to be determinable at compile time?

@ansis I have been thinking of this as a syntax rule rather than a type rule, though we could also represent it with something like a Literal<T> type that only matches literals of type T.

Simply allow "=="'s argument types to be Value.

Let's play this out.... Intuitively, my hypothesis is that realistically, the cases where a == expression would lead to subexpressions whose types couldn't be inferred are ones where punting to runtime checking wouldn't be a big deal anyway.

Cases like:

[==,
  [
    case, (some condition), [get, [properties], 'a'], [else], [get, [properties], 'b']
  ],
  [
    curve, [step], 42, 0, [get, [properties], 'x'], 10, [get, [properties], 'y']
  ]
]

(let's say for the sake of argument that step curves can output strings)

Would requiring annotations here provide any meaningful/helpful compile-time help to the user? Are there other plausible examples that are worse than this one (especially if we go a bit further than just recursive descent in our inference algorithm)?

@jfirebaugh
Contributor

["number", ["get", ... ]] where "number" is a typed identity function

👍

Both get and at would produce indistinguishable errors in three different cases:

Indistinguishable in the sense that there's no way to distinguish these cases within the expression language, or in the sense that, when encountered, they'd produce identical diagnostic output? If the latter, I'm imagining the Error type to carry a string error message that can distinguish them. If the former, you can distinguish the second two cases from the first with a preflight typeof.

What are your thoughts on adding a way to specify arguments that need to be determinable at compile time?

Agree with @anandthakker, I think this is best as a syntax rule rather than a type checking rule.

You're saying that no expression's "output" type would be defined as JSONObject or JSONArray, right? During inference for, e.g., [get, e1, e2], we would want to type e1: JSONObject (rather than e1: Value), or else [ get, 1, 'foo'] would type check.

Yeah, you're right, get is best typed more strictly as (JSONObject, String) ⇢ Value. So all the Value "variant" types need to be top level types as well. Let's try to limit the set of expression types that accept them though -- ideally just get, at, and typeof.

Would requiring annotations here provide any meaningful/helpful compile-time help to the user?

I think the benefit is mostly in runtime help. In the example you gave, required annotations could for example turn a mysteriously invisible layer into an actionable error message:

When evaluating the expression ["get", ["properties"], "a", "string"] in filter for layer "foo": expected the value of property "a" to be a string, but it was null.

@kkaefer
Member

kkaefer commented May 31, 2017

We should also add a "string length" function, which could be useful for selecting the correct highway shield icons based on the text.

@ChrisLoer
Contributor

We should also add a "string length" function, which could be useful for selecting the correct highway shield icons based on the text.

By "string length" did you mean "number of characters" or something like "rendered width"? At layout time we should have the information necessary for returning the second.

@anandthakker
Contributor Author

I think the benefit is mostly in runtime help.

@jfirebaugh okay I’m almost sold here, with one lingering question: the premise behind only requiring annotations when they can’t be inferred--rather than always requiring them--is that the latter would make authoring expressions too cumbersome (right?). Is this really true? My assumption is that users will only very rarely author expressions via the syntax tree, instead creating them using Studio or a text-based syntax. Studio would be able to generate annotations, so that's not an issue. How bad would it be to require them for the text-based case?

@anandthakker
Contributor Author

@jfirebaugh Another thought following up on #4754 (comment): so far I've been imagining that the text-based syntax(es) would be totally isomorphic to the JSON AST structure... but we could say that every [get in the AST must be type-annotated, but relax that requirement for the text syntax and provide type inference as part of the text-to-AST parsing step.

- `[ "id" ]` returns the value of `feature.id`.

**Decision:**
- `["case", cond1, result1, cond2, result2, ..., ["else"], result_m]`
Contributor

Is an explicit "else" strictly necessary, given that "else if" is implied? Seems like we could simply interpret a dangling result_m as the else result. If so, then the "if" function becomes truly redundant to "case".

Contributor Author

It's not necessary, but I personally think it (a) doesn't do much harm, (b) makes parsing slightly easier, (c) makes mistakes easier to spot. Guessing I'll be outvoted on this though.
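
To make the trade-off concrete, here is a rough sketch of parsing "case" with a dangling final result instead of an explicit ["else"]. The function name `parseCase` and the returned shape are assumptions, not part of the spec draft.

```javascript
// Hypothetical parser for ["case", cond1, result1, ..., resultElse?].
function parseCase(expr) {
  const [op, ...args] = expr;
  if (op !== "case") throw new Error("not a case expression");

  // Consume (condition, result) pairs while two arguments remain.
  const branches = [];
  let i = 0;
  for (; i + 1 < args.length; i += 2) {
    branches.push({ condition: args[i], result: args[i + 1] });
  }

  // An odd argument count leaves one trailing argument: treat it as the else result.
  const otherwise = i < args.length ? args[i] : undefined;
  return { branches, otherwise };
}
```

The parsing stays simple either way, but note that with this rule an author who forgets a result (for example `["case", c1, r1, c2]`) silently turns `c2` into the else result, which is the kind of mistake an explicit marker would surface.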

@kkaefer
Member

kkaefer commented Jun 1, 2017

By "string length" did you mean "number of characters" or something like "rendered width"? At layout time we should have the information necessary for returning the second.

"rendered width" would be cool to have, but it means that we'd need the glyph + shaping information available when evaluating the expression. Additionally it creates a circular dependency when you use it in a text-size expression, as @anandthakker notes. Overall, I agree that a platform-specific string length would be best here.

@nickidlugash

Is adding evaluation for the existence of sprite values still up for discussion? (#4715 (comment)). While an expression may be more cumbersome than implementing an image stack, it seems like expressions offer more flexibility.

@anandthakker
Contributor Author

@nickidlugash Yeah, I think that's still an open discussion, but my feeling is that we should take that on as a follow-up, since, unless I'm misunderstanding, it would be an additive change.

@nickidlugash

@anandthakker that sounds good. I wanted to confirm whether this was even logically and technically compatible with the arbitrary expressions schema. It would be great if we could look into adding this as a follow-up.

By only allowing one curve value per zoom function, this seems to have one of the same limitations as our current zoom functions: if you have more than 2 stops you are forced to define the interpolation behavior between the pairs of stops in the same way even though that's usually not what you want.

Contributor Author

@nickidlugash good point. Could you describe a couple/few use cases where a single interpolation base is non-ideal?

@anandthakker anandthakker merged commit 1d7f344 into feature/expressions Jun 2, 2017
@anandthakker anandthakker deleted the expressions-spec branch June 2, 2017 19:29
anandthakker added a commit that referenced this pull request Jun 28, 2017
* Add draft proposal for property expressions spec

Developed in #4715

* Update spec proposal

 * Introduce `curve` expression, use it to cover zoom function `stops`.
 * Introduce object and array literals
 * Remove explicitly typed property lookup

* Describe type system

* Remove ["else"]

* Minor edits
anandthakker added a commit that referenced this pull request Jun 30, 2017
anandthakker added a commit that referenced this pull request Jul 24, 2017
anandthakker added a commit that referenced this pull request Aug 3, 2017
anandthakker added a commit that referenced this pull request Aug 4, 2017
anandthakker added a commit that referenced this pull request Aug 10, 2017
anandthakker added a commit that referenced this pull request Aug 18, 2017
anandthakker added a commit that referenced this pull request Aug 29, 2017
anandthakker added a commit that referenced this pull request Aug 30, 2017
8 participants