-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
geotile_grid implementation #37842
geotile_grid implementation #37842
Conversation
Pinging @elastic/es-analytics-geo |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I know this is labeled WIP. I was just passing by and noticed something I wanted to leave a comment on, then left a few more. Feel free to ignore if you're still iterating on things :)
} | ||
|
||
private static double tile2lon(final double x, final double tiles) { | ||
return x / tiles * 360.0 - 180; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tiny nit, should we put parenthesis around the components here just so the order of operation is immediately clear?
And not knowing anything about the algo, is there a chance tiles
could ever be zero (e.g. divide by zero scenario)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I refactored to put all logic in a single method. Tiles is a result of (4^N), so cannot be 0
Math.tan(Math.toRadians(lat)) + 1 / Math.cos(Math.toRadians(lat)) | ||
) / Math.PI) / 2 * tiles); | ||
if (xTile < 0) | ||
xTile = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Style nit: I think we try to avoid conditionals without braces, too easy to accidentally introduce bugs if you're not paying attention when touching the code.
Could replace with xTile = Math.min(xTile, 0)
, if you're looking for a concise alternative. Probably similar for the rest too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thx, refactored.
*/ | ||
public class InternalQuadkeyGrid extends InternalGeoGrid<InternalQuadkeyGridBucket> { | ||
|
||
private static final String NAME = "geohash_grid"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Think this name needs to change? And ditto for javadoc on the class
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thx, good catch. The NAME shouldn't have even been here - its a dup. FIxed here and in InternalGeoHashGrid.
* @param parser {@link XContentParser} to parse the value from | ||
* @return int representing precision | ||
*/ | ||
public static int parsePrecision(XContentParser parser) throws IOException, ElasticsearchParseException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two questions:
- Should we support unit-less precision levels? It looks like it adds a lot of autodetect magic below, and we've consistently been moving towards requiring units everywhere in ES. I don't know enough about the algo to say, but is there something tangible that we gain by supporting unit-less precision over just
10km
, etc? - Is there a reason we fall back to the "old" style of xcontent parsing instead of using the newer static parsers here too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- for quadkey, most of the time it will be unitless - e.g .map visualizations will use
current_zoom + 2
as the precision to get 16 dots per visualization tile. - I tried to make as few changes as possible to the existing code to make reviewing easier. This code is an exact copy of the
GeoUtils.parsePrecision()
, adjusted for different validation and parsing. We might consider rewriting them, but I think we should do them at the same time and possibly consolidate it for reuse.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@polyfractal, per @talevy 's suggestion, I removed non-integer precision - it is not needed by our use cases, and we do not know if they will be needed by other users. This also simplifies precision parsing.
This change allows other grid aggregations to reuse the same tests. The change mostly just moves code to the base classes, trying to keep changes to a bare minimum.
Implements support for the quadkey geo aggregation.
* Create a new {@link InternalQuadkeyGrid} aggregation with the given name. | ||
*/ | ||
public static QuadkeyGridAggregationBuilder quadkeyGrid(String name) { | ||
return new QuadkeyGridAggregationBuilder(name); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can replace this one and the previous one with AggregationBuilders
's method. Otherwise it is not tested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@imotov I added more tests to use this function. Let me know if anything else is needed. Also, there were some unrelated CI issues, hopefully master merge will fix them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. But since I didn't sleep for the last 27 hours, it might make sense for @polyfractal to have another look.
server/src/main/java/org/elasticsearch/search/aggregations/bucket/geogrid/GeoTileUtils.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good! left a few questions
server/src/main/java/org/elasticsearch/search/aggregations/bucket/geogrid/GeoTileUtils.java
Outdated
Show resolved
Hide resolved
|
||
int xTile = (int) Math.floor((normalizeLon(longitude) + 180) / 360 * tiles); | ||
|
||
double latSin = Math.sin(Math.toRadians(normalizeLat(latitude))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are we concerned that the latitude is not clipped here to ensure that the projection is square?
latitude = Math.min(Math.max(lat, -85.05112878), 85.05112878);
this is regarding:
The latitude and longitude are assumed to be on the WGS 84 datum. Even though Bing Maps uses a spherical projection, it’s important to convert all geographic coordinates into a common datum, and WGS 84 was chosen to be that datum. The longitude is assumed to range from -180 to +180 degrees, and the latitude must be clipped to range from -85.05112878 to 85.05112878. This avoids a singularity at the poles, and it causes the projected map to be square. [1]
and
Because the Mercator projects the poles at infinity, Google Maps cannot show the poles. Instead it cuts off coverage at 85.051129° north and south. This is not considered a limitation, given the purpose of the service. The value 85.051129° is the latitude at which the full map becomes a square...[2]
[1] https://docs.microsoft.com/en-us/bingmaps/articles/bing-maps-tile-system
[2] https://en.wikipedia.org/wiki/Web_Mercator_projection
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also, should there be tests for points in this region?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already test for points in that region, e.g. assertEquals(0x77FFFF4580000000L, longEncode(179.999, 89.999, 29));
, but i will add a few more. The mathematics automatically takes care of those values, assigning geopoints in that region with the min/max Y value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a few minor questions/comments, but LGTM otherwise. Will defer to @talevy on algo review :)
// Zoom value is placed in front of all the bits used for the geotile | ||
// e.g. when max zoom is 29, the largest index would use 58 bits (57th..0th), | ||
// leaving 5 bits unused for zoom. See MAX_ZOOM comment above. | ||
return ((long) precision << ZOOM_SHIFT) | ((long) xTile << MAX_ZOOM) | ((long) yTile); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we just make all the relevant variables/static defaults long
instead of int
, to save on all the casting? Not because it matters for performance, but just for readability.
I don't think we're relying on int overflows anywhere, so it should be safe to just use longs?
I don't care strongly, so feel free to ignore this (or if I missed somewhere else that needs them as ints). Was just something I noticed in passing :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thx, changed xTile & yTile to long.
if (parts.length == 3) { | ||
return zxyToGeoPoint(Integer.parseInt(parts[0]), Integer.parseInt(parts[1]), Integer.parseInt(parts[2])); | ||
} | ||
} catch (IllegalArgumentException e) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious about this arrangement... what's the reason for just catching IllegalArgumentException
and then rethrowing it? Should we just catch any Exception
? I'm assuming this is here to make the exception message a little nicer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it was for better exception formatting. The new approach suggested by @talevy is to split checks into two groups, and (sadly) throw the same exception in two places.
} | ||
|
||
static ParsedGeoTileGridBucket fromXContent(XContentParser parser) throws IOException { | ||
return parseXContent(parser, false, ParsedGeoTileGridBucket::new, (p, bucket) -> bucket.hashAsString = p.textOrNull()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can the hash be null? (e.g. curious about p.textOrNull()
. Won't that cause keyToGeoPoint()
to NPE later?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thx, changed it to p.text()
instead
|
||
package org.elasticsearch.search.aggregations.bucket.geogrid; | ||
|
||
public class GeoTileGridAggregatorTests extends GeoGridAggregatorTestCase<InternalGeoTileGridBucket> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we add a few tests for invalid construction (expectThrows()
on bad precision, bad zoom, etc) to this suite? I know it's indirectly tested in GeoTileGridParserTests
, but that's done through JSON parsing and not the builder itself.
❤️
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added an extra test to GeoTileGridAggregatorTests
to test precision. Also, added a few more tests to the utils class to check for invalid input vals.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM if CI is happy.
thanks for all the iterations Yuri, I like where it landed!
Implements
geotile_grid
aggregation, refactoring previous implementation #30240This code uses the same base classes as
geohash_grid
agg, but uses a different hashingalgorithm to allow zoom consistency. Each grid bucket is aligned to Web Mercator tiles.