Add geo_line aggregation #41612
Pinging @elastic/es-analytics-geo
Left a few comments about potential optimizations and such :)
Also a thought... I wonder if we could/should implement this as a pipeline agg? It shares many similarities to metrics ordered by time. A user could define a date_histogram on the time field, some kind of aggregating "metric" to collapse multiple geo_points in one bucket to a single point (average them?) and then the geo_line pipeline agg strings the multiple buckets into a single linestring.
It gives you all the sorting stuff for free, and sorta gives you line simplification out of the box, in that larger date_histo interval automatically gives you a less-granular line string. Perhaps not "smart" in that it's simplifying by time and not line complexity, but might be an ok start?
Dunno, just a thought I had while looking over how it works.
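To make the pipeline idea above concrete, here is a minimal standalone sketch. It is not Elasticsearch code: the class `BucketedLineSketch`, the method `lineFromBuckets`, and the choice of a plain arithmetic mean as the collapsing "metric" are all assumptions made for illustration. It groups points into fixed time intervals the way a date_histogram would, averages each bucket to a single point, and strings the buckets together in time order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical illustration only; none of these names exist in Elasticsearch.
final class BucketedLineSketch {

    /**
     * timestamps[i] belongs to lonLat[i], where lonLat[i] = {lon, lat}.
     * Returns one averaged {lon, lat} per non-empty time bucket, in time order.
     */
    static List<double[]> lineFromBuckets(long[] timestamps, double[][] lonLat, long intervalMillis) {
        // TreeMap keeps bucket keys sorted, so the time ordering comes for free.
        // Each value accumulates {sumLon, sumLat, count}.
        TreeMap<Long, double[]> buckets = new TreeMap<>();
        for (int i = 0; i < timestamps.length; i++) {
            long key = Math.floorDiv(timestamps[i], intervalMillis);
            double[] acc = buckets.get(key);
            if (acc == null) {
                acc = new double[3];
                buckets.put(key, acc);
            }
            acc[0] += lonLat[i][0];
            acc[1] += lonLat[i][1];
            acc[2] += 1;
        }
        List<double[]> line = new ArrayList<>();
        for (double[] acc : buckets.values()) {
            // Assumption: a plain mean is good enough. It misbehaves for
            // buckets that straddle the antimeridian.
            line.add(new double[] { acc[0] / acc[2], acc[1] / acc[2] });
        }
        return line;
    }
}
```

A larger `intervalMillis` yields fewer buckets and therefore a coarser line, which is the "simplification by time" effect described above.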
Thanks, this is super useful! The return-type is now a Geojson-Feature. Is this an issue from an aesthetic standpoint for Elasticsearch-API? (I don't know of any parallels where ES is hacking another data-format as an agg-response). So just for some context: from a user-perspective (and specifically Maps), the idea to return valid GeoJson is mainly useful because it allows Maps to reuse the coordinates-array by-reference, without any post-processing (except for parsing of the JSON-response, which is a native browser function). If it feels "odd" to wrap the
Basically, as long as the coordinates-array is there in valid GeoJson, it's a win for Maps. The additional metadata about the individual coordinates (ie. Just a thought, because if most users would exclude the sort_values from their agg, it might feel odd to just have dangling empty properties there. Would it be useful to make it explicit in the response whether the line-string is complete or not? Right now, we can compare the doc-count of the bucket with the point-count in the line, but this will no longer work once simplification is introduced.
heya @thomasneirynck. thanks for the comments/concerns/suggestions!
it does not feel odd to me, as this is a property of the geometry, just happens to be a multi-dimensional property :) there is an
the properties will not be dangling. There is one other property that exists. the
yes, the
these changes include usage of BucketedSort and ability to order the lines by both ascending and descending time/sort order.
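As a rough illustration of the ordering behaviour described here, the sketch below sorts points by their sort value in either direction and keeps at most `size` of them. It uses a plain array sort, not BucketedSort, and the class `SortedLineSketch`, the method `orderedLine`, and the rule of keeping the first `size` points in the requested order are assumptions for this example.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical illustration only; the PR itself uses BucketedSort.
final class SortedLineSketch {

    /**
     * sortValues[i] belongs to lonLat[i], where lonLat[i] = {lon, lat}.
     * Returns up to `size` coordinates ordered by sort value.
     */
    static double[][] orderedLine(double[] sortValues, double[][] lonLat, boolean ascending, int size) {
        Integer[] order = new Integer[sortValues.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        Comparator<Integer> cmp = (a, b) -> Double.compare(sortValues[a], sortValues[b]);
        if (!ascending) {
            cmp = cmp.reversed();
        }
        Arrays.sort(order, cmp);

        // Assumption: truncation keeps the first `size` points in the requested order.
        int kept = Math.min(Math.max(size, 0), order.length);
        double[][] line = new double[kept][];
        for (int i = 0; i < kept; i++) {
            line[i] = lonLat[order[i]];
        }
        return line;
    }
}
```

The returned array has the same `[lon, lat]` layout as a GeoJSON `coordinates` array, so it could be serialized directly.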
run elasticsearch-ci/2
lgtm. Just left a comment, but not sure if it is possible.
A metric aggregation that aggregates a set of points as
a GeoJSON LineString ordered by some sort parameter.

specifics

A `geo_line` aggregation request specifies a `geo_point` field as well as a `sort` field.
The `geo_point` field supplies the values used in the LineString, while the `sort` values
define the total ordering of the points. The `sort` field supports any numeric field, including date.

sample usage

```
{
  "query": {
    "bool": {
      "must": [
        { "term": { "person": "004" } },
        { "term": { "trajectory": "20090131002206.plt" } }
      ]
    }
  },
  "aggs": {
    "make_line": {
      "geo_line": {
        "point": {"field": "location"},
        "sort": { "field": "timestamp" },
        "include_sort": true,
        "sort_order": "desc",
        "size": 15
      }
    }
  }
}
```

sample response

```
{
  "took": 21,
  "timed_out": false,
  "_shards": {...},
  "hits": {...},
  "aggregations": {
    "make_line": {
      "type": "LineString",
      "coordinates": [
        [ 121.52926194481552, 38.92878997139633 ],
        [ 121.52922699227929, 38.92876998055726 ]
      ]
    }
  }
}
```

limitations

Due to the cardinality of points, an initial max of 10k points
will be used. This should support many use-cases.

One way to overcome this limitation is to keep a PriorityQueue of
points and simplify the line once it reaches this max. If simplifying
makes sense, it may be a nice option in general, with a parameter
that specifies how aggressively to simplify. This parameter could be
the number of points. Example algorithm one could use with a PriorityQueue:
https://bost.ocks.org/mike/simplify/. This would still require O(m) space, where m
is the number of points returned. It would also require keeping a heap of triangles
ordered by their areas, where each heap operation costs O(log(m)). Since sorting is done
anyway, simplifying would still be an O(n log(m)) operation, where n is the total number
of points to filter. Something to explore.
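For reference, a minimal sketch of the triangle-area simplification from the linked page (Visvalingam's algorithm), driven by a `java.util.PriorityQueue`. This is a batch variant that holds all n input points in memory, so it runs in O(n log n) rather than the streaming O(n log(m)) idea above. The class `LineSimplifySketch`, the method `simplify`, and the use of planar area on raw lon/lat values are assumptions for illustration, not what this PR implements.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration only; not part of this PR.
final class LineSimplifySketch {

    // Planar area of the triangle (a, b, c); each point is {lon, lat}.
    private static double area(double[] a, double[] b, double[] c) {
        return Math.abs((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])) / 2.0;
    }

    /** Repeatedly drops the interior point with the smallest triangle until maxPoints remain. */
    static List<double[]> simplify(List<double[]> pts, int maxPoints) {
        if (maxPoints < 2) {
            throw new IllegalArgumentException("maxPoints must be at least 2");
        }
        int n = pts.size();
        if (n <= maxPoints) {
            return new ArrayList<>(pts);
        }
        int[] prev = new int[n];
        int[] next = new int[n];
        int[] version = new int[n]; // bumped whenever a point's area is recomputed
        boolean[] removed = new boolean[n];
        // Heap entries are {area, index, version}; stale entries are skipped on poll.
        PriorityQueue<double[]> heap = new PriorityQueue<>((x, y) -> Double.compare(x[0], y[0]));
        for (int i = 0; i < n; i++) {
            prev[i] = i - 1;
            next[i] = i + 1;
        }
        for (int i = 1; i < n - 1; i++) {
            heap.add(new double[] { area(pts.get(i - 1), pts.get(i), pts.get(i + 1)), i, 0 });
        }
        int remaining = n;
        while (remaining > maxPoints && !heap.isEmpty()) {
            double[] entry = heap.poll();
            int i = (int) entry[1];
            if (removed[i] || (int) entry[2] != version[i]) {
                continue; // outdated entry
            }
            removed[i] = true;
            remaining--;
            int p = prev[i];
            int q = next[i];
            next[p] = q;
            prev[q] = p;
            // Re-score the two neighbours; the endpoints are never removed.
            for (int j : new int[] { p, q }) {
                if (j > 0 && j < n - 1) {
                    version[j]++;
                    heap.add(new double[] {
                        area(pts.get(prev[j]), pts.get(j), pts.get(next[j])), j, version[j] });
                }
            }
        }
        List<double[]> out = new ArrayList<>();
        for (int i = 0; i < n; i = next[i]) {
            out.add(pts.get(i));
        }
        return out;
    }
}
```

Stale heap entries are invalidated lazily through the `version` counter because `PriorityQueue` has no efficient decrease-key operation; a dedicated indexed heap would avoid the extra entries.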
closes #41649