Introduce clusterProperties option for aggregated cluster properties #7584

Merged
mourner merged 7 commits into master from cluster-map-reduce on Jan 10, 2019

Conversation

mourner (Member) commented Nov 12, 2018

Closes #2412. An alternative to #7004.

Finally got my hands on exploring this feature, building on an extremely helpful implementation by @redbmk (sorry for not jumping in earlier!). Here I'm proposing a different API, which is very terse and easy to use, although it might look weird at first:

"clusterProperties": {
  "max": ["max", 0, ["get", "scalerank"]],
  "sum": ["+", 0, ["get", "scalerank"]],
  "has_island": ["any", false, ["==", ["get", "featureclass"], "island"]]
}

The syntax is property: [operator, initialExpression, mapExpression]. The array is itself a valid expression, though using it for cluster properties adds additional limitations.

  • Initial value is calculated by evaluating initialExpression
  • The value for a single point is calculated with mapExpression (which can reference feature properties)
  • Reduction is basically accumulated = evaluate([operator, accumulated, value]).
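
For context, here is a minimal usage sketch of the proposed option; the source id, data URL, and label layer are illustrative, not taken from this PR:

map.addSource('points', {
    type: 'geojson',
    data: 'points.geojson', // hypothetical data URL
    cluster: true,
    clusterRadius: 50,
    clusterProperties: {
        // accumulate the sum of "scalerank" over all points in each cluster
        'sum': ['+', 0, ['get', 'scalerank']]
    }
});

map.addLayer({
    id: 'cluster-labels',
    type: 'symbol',
    source: 'points',
    filter: ['has', 'point_count'],
    layout: {
        // the aggregated value is exposed like any other cluster property
        'text-field': ['to-string', ['get', 'sum']]
    }
});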

To implement this, I had to introduce an internal-use only ["accumulated"] expression. An alternative would be to expose it in the syntax, making it more verbose but also maybe more self-descriptive:

"max": ["max", ["accumulated", 0], ["get", "scalerank"]]

Regarding Infinity and -Infinity, a point raised in #7004 (they produce null because the JSON format doesn't support Infinity) — I think it would make sense to introduce an ["infinity"] expression, similar to how we have ["pi"] and ["e"].
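
To illustrate the idea (hypothetical syntax, since no ["infinity"] expression exists at this point), a "min" aggregation could then seed its initial value with it:

"min": ["min", ["infinity"], ["get", "scalerank"]]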

Regarding access to ["zoom"] in the accumulating expression — I think this wouldn't be useful because of how map/reduce works: imagine that out of 10 points, 5 merge into one cluster on z16, and the remaining 5 get merged into that cluster on z15. The accumulated value in such a cluster would depend on multiple zoom values (depending on where it acquired new points), which doesn't make much semantic sense.

Opening up the PR to start a discussion, but it still needs some work before a proper review:

TODO

  • Proper Flow types
  • Proper error handling
  • Validation of cluster property expressions (specific to this use case)
  • Documentation

Launch Checklist

  • briefly describe the changes in this PR
  • write tests for all new functionality
  • document any changes to public APIs
  • post benchmark scores (we don't benchmark clustering)
  • manually test the debug page
  • tagged @mapbox/studio if this PR includes style spec changes

sheerun commented Nov 28, 2018

Looks awesome, I hope you'll have time to push this feature through :)

behuda commented Dec 4, 2018

Maybe off topic, but can I change the geojson clusterRadius property without creating a new geojson source? I want to change clusterRadius through an input slider, but currently the old geojson source has to be destroyed and the corresponding layer removed. If the same geojson source could be updated while retaining the source id, the error "source cannot be removed since layers are using it" could be avoided.

mourner force-pushed the cluster-map-reduce branch 3 times, most recently from 7c55489 to f1fc585 on December 10, 2018
mourner force-pushed the cluster-map-reduce branch from f1fc585 to 6032e41 on January 4, 2019
asheemmamoowala (Contributor) left a comment

@mourner - In the event that one of the map/reduce expressions is invalid, it is no longer possible to report that in style validation. I think it would be useful to move the validation from the geojson worker to the style-spec module and make it part of geojson source validation.


const initialExpressionParsed = createExpression(initialExpression);
const mapExpressionParsed = createExpression(mapExpression);
const reduceExpressionParsed = createExpression([operator, ['accumulated'], ['get', key]]);
Contributor commented:

Do you think the map/reduce expressions need to support the feature-state operator as well? Given that these expressions are evaluated just once, it probably doesn't make sense to support that operator, since feature state can change through interactivity on the map. An additional isStateConstant check would be required.

If we want to support feature-state, it would require re-computing the map/reduce on every change to a feature's state, or at least for the clusters affected by that feature. Not sure that there is a need for it right now.

mourner (Member Author) replied:

Good point. I think we shouldn't support feature-state, at least initially, because it would add a lot of complexity for a very obscure use case. I'll add the isStateConstant check in the validation.
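
A rough sketch of the kind of check being discussed follows; the import paths, helper name, and error handling are assumptions for illustration, not the PR's actual code:

import {createExpression} from '../style-spec/expression';
import {isStateConstant} from '../style-spec/expression/is_constant';

// Validate one clusterProperties entry, rejecting anything that references ["feature-state", ...].
function validateClusterProperty(key, [operator, initialExpression, mapExpression]) {
    const reduceExpression = [operator, ['accumulated'], ['get', key]];
    for (const expression of [initialExpression, mapExpression, reduceExpression]) {
        const parsed = createExpression(expression);
        if (parsed.result === 'error') {
            return parsed.value; // array of parsing errors
        }
        if (!isStateConstant(parsed.value.expression)) {
            return [new Error(`"${key}": feature-state expressions are not supported as cluster properties`)];
        }
    }
    return [];
}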

Comment:

@asheemmamoowala @mourner I'm just wondering if you're looking to support feature-state on clusterProperties in the near future?

I've got a use case where I need to highlight the marker cluster for a given feature when a reference of it is selected in a list view outside the map canvas.

This works nicely for non-clustered points, but I need a workaround for this situation:

map.addSource('foos', {
  type: 'geojson',
  data,
  cluster: true,
  clusterMaxZoom: 14,
  clusterRadius: 50,
  clusterProperties: {
    'has_hovered_foo': ['==', ['feature-state', 'hover'], true]
  }
})

map.addLayer({
  id: 'clusters',
  type: 'circle',
  source: 'foos',
  filter: ['has', 'point_count'],
  paint: {
    'circle-color': [
      'case',
      ['get', 'has_hovered_foo'],
      '#000',
      '#FFF'
    ]
  }
})

"clusterProperties": {
"type": "*",
"doc": "An object defining custom properties on the generated clusters if clustering is enabled, aggregating values from clustered points. Has the form `{\"property_name\": [operator, initial_expression, map_expression]}`. `operator` is any expression function that accepts at least 2 operands (e.g. `\"+\"` or `\"max\"`) — it accumulates the property value from clusters/points the cluster contains; `initial_expression` evaluates the initial value of the property before accummulating other points/clusters; `map_expression` produces the value of a single point.\n\nExample: `{\"sum\": [\"+\", 0, [\"get\", \"scalerank\"]]}`."
},
Contributor commented:

Do you think it would be useful to add an expression support block for each of the expressions (initial, map, reduce)? Something similar to what is in the layer properties.

"expression": {
"interpolated": true,
"parameters": [
"zoom",
"feature",
"feature-state"
]
},
"property-type": "data-driven"

mourner (Member Author) replied:

I'm not sure how to separate those into separate properties. There are two expressions (map, initial) and an operator that accepts them as parameters. The map expression's parameters would be just ["feature"], and the initial expression's would be [].

Given the non-standard syntax, and that the support block is dictated more by property aggregation semantics than by technical limitations (as is the case for many layer properties), perhaps we should leave that to validation, and maybe add a clarifying note in the doc string.

    return properties;
};
superclusterOptions.map = (pointProperties) => {
    feature.properties = pointProperties;
redbmk commented Jan 4, 2019:

Why is this (and reduce below) using a shared feature and overwriting its properties instead of defining a new const feature = { properties: pointProperties } here?

mourner (Member Author) replied:

The map function and expression evaluation are synchronous, meaning there are no side effects from using mutable state here (which is reset at the beginning). At the same time, we avoid constantly creating new feature objects, improving performance and avoiding lots of garbage collection passes.
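
A simplified sketch of the pattern described above; names such as mapExpressions, propertyNames, and globals approximate the worker code rather than quote it:

// One feature object is allocated up front and reused for every evaluation,
// instead of creating a fresh object per point.
const feature = {type: 'Feature', geometry: null, properties: {}};
const globals = {};

superclusterOptions.map = (pointProperties) => {
    feature.properties = pointProperties; // reuse the shared object
    const properties = {};
    for (const key of propertyNames) {
        // evaluation is synchronous, so the shared object can't leak between calls
        properties[key] = mapExpressions[key].evaluate(globals, feature);
    }
    return properties;
};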

Reply:

Ah, OK that makes sense. I thought it might have something to do with performance optimization.

superclusterOptions.reduce = (accumulated, clusterProperties) => {
    feature.properties = clusterProperties;
    for (const key of propertyNames) {
        globals.accumulated = accumulated[key];
Comment:

Does this mean a reducer could access the accumulated property if it wanted to, and do something weird like the following?

"total": [
  ["case",
    [">", ["get", "accumulated"], 100],
    "too many",
    ["+", ["get", "accumulated"], ["get", "total"]
  ],
  0,
  ["get", "total"]
]

mourner (Member Author) replied:

Currently not, but I think this could be added (check whether operator is a string, and if it's not, treat it as a reduce expression). It would add quite a bit of complexity to the docs, but some flexibility as well. Not sure which way to go here.

Reply:

Ah ok, I see - the reduce expression has to be something that fits into [operator, ['accumulated'], ['get', key]]. So that's why max and any work.
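
Concretely, for a property defined as "max": ["max", 0, ["get", "scalerank"]], the reduce expression assembled from the operator (following the [operator, ['accumulated'], ['get', key]] pattern above) is:

["max", ["accumulated"], ["get", "max"]]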

👍

redbmk commented Jan 4, 2019

Thanks so much for taking this on! I'm looking forward to having native mapbox support for cluster aggregation! I left a couple of comments regarding some of the code, but overall it looks great, and I think this is easier to read than what I had come up with in #7004.

redbmk left a comment

Looks good to me! I'm looking forward to seeing this in release!

mourner (Member Author) commented Jan 9, 2019

@asheemmamoowala moved the validation logic to style-spec and added the feature-state check — ready for another review.

@redbmk also added support for custom expressions (referencing ["accumulated"]) in place of the operator. The example above won't work because of type mismatches, but I guess you could still do the same with something more verbose.
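
For reference, the documented shape of such a custom reduce expression (also quoted further down in this thread) is:

"clusterProperties": {
    "sum": [["+", ["accumulated"], ["get", "sum"]], ["get", "scalerank"]]
}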

@@ -199,6 +199,7 @@ for (const name in CompoundExpression.definitions) {
}

delete types['error'];
delete types['accumulated']; // skip documenting `accumulated` since it is internal use only
Contributor commented:

Should this still be deleted, given that it is now usable as part of a custom reduce expression?

mourner (Member Author) replied:

Right, just readded it.

mourner merged commit 47925a6 into master on Jan 10, 2019
mourner deleted the cluster-map-reduce branch on January 10, 2019
mourner (Member Author) commented Jan 15, 2019

I'm thinking that maybe I made a mistake in requiring an initial value, since for most cases we could make it optional — see mapbox/supercluster#114. If that's merged, we could simplify the minimal syntax to:

"clusterProperties": {
  "max": ["max", ["get", "scalerank"]],
  "sum": ["+", ["get", "scalerank"]],
  "has_island": ["any", ["==", ["get", "featureclass"], "island"]]
}

And make the initial value an optional third argument. We could still make that change if we hurry to get it in before the next GL JS beta.
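
Under that suggestion (a sketch of the proposed syntax, not what was shipped in this PR), a property that still needed a non-default initial value would read:

"max": ["max", ["get", "scalerank"], 0]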

mourner (Member Author) commented Jan 15, 2019

Just realized that it would mean we can define or omit initial values on a per-property basis, which doesn't fit the implementation in Supercluster (which is all or nothing). So our options are to either leave things as is, or ditch initial values completely (which I'm increasingly leaning towards).

redbmk commented Jan 16, 2019

I'm inclined to leave initial as an optional param after the map expression. I shared an example in mapbox/supercluster#114 (comment) of using an initial value that wouldn't work by just using the first mapped value. However, I think in most cases you wouldn't need initial, so it makes sense to have it be optional.

mourner (Member Author) commented Jan 16, 2019

My motivation behind removing it completely is reducing complexity as much as possible. Given the technical details of how Supercluster implements map/reduce, we can either always require the initial argument for each of the specified properties, or remove that argument completely — we can't allow the syntax to accept initial for some of the properties but not others. So given that the need for initial is pretty marginal (which is my take at mapbox/supercluster#114 (comment)), I'm still inclined to remove it. Curious to hear other opinions — tough decision when it's 1-1.

behuda commented Mar 3, 2019

"clusterProperties": { 
    "my_point_count" : [["+", ["accumulated"], 1], 1]
}

When I compare it with point_count, the my_point_count property seems to be different on some clusters.

For more advanced use cases, in place of operator, you can use a custom reduce expression that references a special ["accumulated"] value, e.g.: {"sum": [["+", ["accumulated"], ["get", "sum"]], ["get", "scalerank"]]}

why ["get", "sum"], "accumulated" need more information on this, an analogy with javascript reduce fn would be great, why not
{"sum": [["+", ["accumulated"], ["get", "scalerank"]], ["get", "scalerank"]]}

Also, how would I calculate an average of a field, let's say salary?

Empty2k12 commented:

Sorry to revive this old thread, but I have another interesting reduce problem where the reduce syntax does not work the way I expected.

Basically, all my markers have a custom property which is a string. Let's say the property is ethnicity and the values all come from the set [Hispanic, White, Black, Asian]. Let's also assume this set is dynamic, so I don't know the types of ethnicities in my data ahead of time, but that I do know all possible variations once I'm working with the data. I would like to aggregate the types of ethnicities contained in each cluster.

What I am doing right now is the following:

clusterProperties: {
    ethnicities: [
        [
            "concat",
            ["accumulated"],
            [
                "case",
                [
                    "!",
                    [
                        "in",
                        ["get", "ethnicity"],
                        ["accumulated"]
                    ]
                ],
                ["concat", "-", ["get", "ethnicity"]],
                ""
            ]
        ],
        ["get", "ethnicity"]
    ]
}

Basically, in pseudocode:

concat to "accumulated" (if "ethnicity" is not a substring of "accumulated", use "-" + "ethnicity", else an empty string)

Basically, add all the ethnicities in the cluster into a string, but do not repeat an ethnicity if it is already in the string. I want to use this to generate a fitting icon for the cluster.

However, it appears that only the first ethnicity is appended to the string, and for each member of the cluster only the minus is appended, not -${ethnicity}. Additionally, I'd only expect as many minuses as there are ethnicities in the cluster, not one for every cluster member.

How can I reduce to properties without repeating?
