Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

geotile_grid implementation #37842

Merged
merged 40 commits into from
Feb 1, 2019
Merged

geotile_grid implementation #37842

merged 40 commits into from
Feb 1, 2019

Conversation

nyurik
Copy link
Contributor

@nyurik nyurik commented Jan 25, 2019

Implements geotile_grid aggregation, refactoring previous implementation #30240

This code uses the same base classes as geohash_grid agg, but uses a different hashing
algorithm to allow zoom consistency. Each grid bucket is aligned to Web Mercator tiles.

@nyurik nyurik requested a review from talevy January 25, 2019 00:45
@nyurik nyurik added WIP :Analytics/Geo Indexing, search aggregations of geo points and shapes labels Jan 25, 2019
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-analytics-geo

Copy link
Contributor

@polyfractal polyfractal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I know this is labeled WIP. I was just passing by and noticed something I wanted to leave a comment on, then left a few more. Feel free to ignore if you're still iterating on things :)

}

private static double tile2lon(final double x, final double tiles) {
return x / tiles * 360.0 - 180;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tiny nit, should we put parenthesis around the components here just so the order of operation is immediately clear?

And not knowing anything about the algo, is there a chance tiles could ever be zero (e.g. divide by zero scenario)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I refactored to put all logic in a single method. Tiles is a result of (4^N), so cannot be 0

Math.tan(Math.toRadians(lat)) + 1 / Math.cos(Math.toRadians(lat))
) / Math.PI) / 2 * tiles);
if (xTile < 0)
xTile = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style nit: I think we try to avoid conditionals without braces, too easy to accidentally introduce bugs if you're not paying attention when touching the code.

Could replace with xTile = Math.min(xTile, 0), if you're looking for a concise alternative. Probably similar for the rest too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thx, refactored.

*/
public class InternalQuadkeyGrid extends InternalGeoGrid<InternalQuadkeyGridBucket> {

private static final String NAME = "geohash_grid";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think this name needs to change? And ditto for javadoc on the class

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thx, good catch. The NAME shouldn't have even been here - its a dup. FIxed here and in InternalGeoHashGrid.

* @param parser {@link XContentParser} to parse the value from
* @return int representing precision
*/
public static int parsePrecision(XContentParser parser) throws IOException, ElasticsearchParseException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two questions:

  1. Should we support unit-less precision levels? It looks like it adds a lot of autodetect magic below, and we've consistently been moving towards requiring units everywhere in ES. I don't know enough about the algo to say, but is there something tangible that we gain by supporting unit-less precision over just 10km, etc?
  2. Is there a reason we fall back to the "old" style of xcontent parsing instead of using the newer static parsers here too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. for quadkey, most of the time it will be unitless - e.g .map visualizations will use current_zoom + 2 as the precision to get 16 dots per visualization tile.
  2. I tried to make as few changes as possible to the existing code to make reviewing easier. This code is an exact copy of the GeoUtils.parsePrecision(), adjusted for different validation and parsing. We might consider rewriting them, but I think we should do them at the same time and possibly consolidate it for reuse.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@polyfractal, per @talevy 's suggestion, I removed non-integer precision - it is not needed by our use cases, and we do not know if they will be needed by other users. This also simplifies precision parsing.

This change allows other grid aggregations to reuse the same tests.

The change mostly just moves code to the base classes, trying to
keep changes to a bare minimum.
Implements support for the quadkey geo aggregation.
@nyurik nyurik changed the title (WIP) Quadkey implementation Quadkey implementation Jan 25, 2019
@nyurik nyurik changed the title Quadkey implementation geotile_grid implementation Jan 26, 2019
@nyurik nyurik removed the WIP label Jan 26, 2019
* Create a new {@link InternalQuadkeyGrid} aggregation with the given name.
*/
public static QuadkeyGridAggregationBuilder quadkeyGrid(String name) {
return new QuadkeyGridAggregationBuilder(name);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can replace this one and the previous one with AggregationBuilders's method. Otherwise it is not tested.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@imotov I added more tests to use this function. Let me know if anything else is needed. Also, there were some unrelated CI issues, hopefully master merge will fix them.

@nyurik nyurik requested review from imotov and polyfractal January 31, 2019 05:44
Copy link
Contributor

@imotov imotov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. But since I didn't sleep for the last 27 hours, it might make sense for @polyfractal to have another look.

Copy link
Contributor

@talevy talevy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good! left a few questions


int xTile = (int) Math.floor((normalizeLon(longitude) + 180) / 360 * tiles);

double latSin = Math.sin(Math.toRadians(normalizeLat(latitude)));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are we concerned that the latitude is not clipped here to ensure that the projection is square?

latitude = Math.min(Math.max(lat,  -85.05112878),  85.05112878);

this is regarding:

The latitude and longitude are assumed to be on the WGS 84 datum. Even though Bing Maps uses a spherical projection, it’s important to convert all geographic coordinates into a common datum, and WGS 84 was chosen to be that datum. The longitude is assumed to range from -180 to +180 degrees, and the latitude must be clipped to range from -85.05112878 to 85.05112878. This avoids a singularity at the poles, and it causes the projected map to be square. [1]

and

Because the Mercator projects the poles at infinity, Google Maps cannot show the poles. Instead it cuts off coverage at 85.051129° north and south. This is not considered a limitation, given the purpose of the service. The value 85.051129° is the latitude at which the full map becomes a square...[2]

[1] https://docs.microsoft.com/en-us/bingmaps/articles/bing-maps-tile-system
[2] https://en.wikipedia.org/wiki/Web_Mercator_projection

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, should there be tests for points in this region?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already test for points in that region, e.g. assertEquals(0x77FFFF4580000000L, longEncode(179.999, 89.999, 29));, but i will add a few more. The mathematics automatically takes care of those values, assigning geopoints in that region with the min/max Y value.

Copy link
Contributor

@polyfractal polyfractal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few minor questions/comments, but LGTM otherwise. Will defer to @talevy on algo review :)

// Zoom value is placed in front of all the bits used for the geotile
// e.g. when max zoom is 29, the largest index would use 58 bits (57th..0th),
// leaving 5 bits unused for zoom. See MAX_ZOOM comment above.
return ((long) precision << ZOOM_SHIFT) | ((long) xTile << MAX_ZOOM) | ((long) yTile);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we just make all the relevant variables/static defaults long instead of int, to save on all the casting? Not because it matters for performance, but just for readability.

I don't think we're relying on int overflows anywhere, so it should be safe to just use longs?

I don't care strongly, so feel free to ignore this (or if I missed somewhere else that needs them as ints). Was just something I noticed in passing :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thx, changed xTile & yTile to long.

if (parts.length == 3) {
return zxyToGeoPoint(Integer.parseInt(parts[0]), Integer.parseInt(parts[1]), Integer.parseInt(parts[2]));
}
} catch (IllegalArgumentException e) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious about this arrangement... what's the reason for just catching IllegalArgumentException and then rethrowing it? Should we just catch any Exception? I'm assuming this is here to make the exception message a little nicer?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it was for better exception formatting. The new approach suggested by @talevy is to split checks into two groups, and (sadly) throw the same exception in two places.

}

static ParsedGeoTileGridBucket fromXContent(XContentParser parser) throws IOException {
return parseXContent(parser, false, ParsedGeoTileGridBucket::new, (p, bucket) -> bucket.hashAsString = p.textOrNull());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the hash be null? (e.g. curious about p.textOrNull(). Won't that cause keyToGeoPoint() to NPE later?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thx, changed it to p.text() instead


package org.elasticsearch.search.aggregations.bucket.geogrid;

public class GeoTileGridAggregatorTests extends GeoGridAggregatorTestCase<InternalGeoTileGridBucket> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we add a few tests for invalid construction (expectThrows() on bad precision, bad zoom, etc) to this suite? I know it's indirectly tested in GeoTileGridParserTests, but that's done through JSON parsing and not the builder itself.

❤️

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added an extra test to GeoTileGridAggregatorTests to test precision. Also, added a few more tests to the utils class to check for invalid input vals.

@nyurik nyurik requested a review from talevy January 31, 2019 20:27
Copy link
Contributor

@talevy talevy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM if CI is happy.

thanks for all the iterations Yuri, I like where it landed!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/Geo Indexing, search aggregations of geo points and shapes >feature v7.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants