Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ESQL: GEO_POINT and CARTESIAN_POINT type support #102177

Merged
merged 5 commits into from
Nov 30, 2023

Conversation

craigtaverner
Copy link
Contributor

@craigtaverner craigtaverner commented Nov 14, 2023

This is a port of a subset of the earlier prototype at #98845

The goal is support for geo_point and point in ES|QL.

At this point we support:

  • Support displaying geo_point columns in ESQL output tables
  • Support to_geopoint(field) function for encoded long and WKT keyword fields
  • Support for geo_point as argument to to_string(field) (renders as WKT)
  • Support format=csv with WKT output (defaults to GeoPoint.toString() which is "lat,lon")
  • Support for point type by duplicating the geo_point work
  • Disable mv_min/mv_max for spatial types (removing concept of sortability)
  • Support for geo_point and point with full precision (from source instead of doc-values)

This last option, using source and full precision requires a new Block type, or a new encoding to the BytesRef block type, and as such is being done in a followup PR at #103698

Fixes #103585

@craigtaverner craigtaverner added >feature Team:QL (Deprecated) Meta label for query languages team :Analytics/ES|QL AKA ESQL labels Nov 14, 2023
@craigtaverner craigtaverner changed the title WIP Initial support for geo_point ESQL: GEO_POINT type support Nov 14, 2023
@elasticsearchmachine
Copy link
Collaborator

Hi @craigtaverner, I've created a changelog YAML for you.

@craigtaverner craigtaverner marked this pull request as ready for review November 23, 2023 18:02
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-ql (Team:QL)

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/elasticsearch-esql (:Query Languages/ES|QL)

Copy link
Contributor

@luigidellaquila luigidellaquila left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much @craigtaverner!
I had just a very quick look (I'll need more time for a proper review) but there is one thing that is worth discussing at high level IMHO: I see you are returning geo_point values as WKT, but I wonder if it would be better to have plain JSON instead; it would be the same as in _search, more compact and easier to manipulate for clients

@@ -44,6 +44,7 @@ public static Iterable<Object[]> parameters() {
(size, values) -> equalTo(NumericUtils.asLongUnsigned(values.reduce(BigInteger::max).get()))
);
dateTimes(cases, "mv_max", "MvMax", (size, values) -> equalTo(values.max().getAsLong()));
geoPoints(cases, "mv_max", "MvMax", (size, values) -> equalTo(values.max().getAsLong()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that makes a lot of sense, do we need to support it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. It was simplest to code this, rather than explicitly reject these, but we can discuss alternatives. I think, however, that a deterministic ordering of all points is something that might be necessary anyway, and in that case ordering by the doc-value is as good a choice as any (order by lat first then lon second).

@@ -44,6 +44,7 @@ public static Iterable<Object[]> parameters() {
(size, values) -> equalTo(NumericUtils.asLongUnsigned(values.reduce(BigInteger::min).get()))
);
dateTimes(cases, "mv_min", "MvMin", (size, values) -> equalTo(values.min().getAsLong()));
geoPoints(cases, "mv_min", "MvMin", (size, values) -> equalTo(values.min().getAsLong()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same as above, I don't think that makes a lot of sense, do we need to support it?

@@ -175,25 +189,27 @@ unsupported:
- match: { values.0.5: null }
- match: { values.0.6: null }
- match: { values.0.7: null }
- match: { values.0.8: null }
- match: { values.0.9: null }
- match: { values.0.8: "POINT (9.999999990686774 11.999999997206032)" }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's the problem with this approach, using doc values for reading points is loosy.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, and in fact my earlier experiment with getting from source started by fixing this test. We now have that planned as a followup PR.

@@ -44,6 +44,7 @@ public final class EsqlDataTypes {

public static final DataType DATE_PERIOD = new DataType("DATE_PERIOD", null, 3 * Integer.BYTES, false, false, false);
public static final DataType TIME_DURATION = new DataType("TIME_DURATION", null, Integer.BYTES + Long.BYTES, false, false, false);
public static final DataType GEO_POINT = new DataType("geo_point", Double.BYTES * 2, false, false, false);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why the size of the field here is 2 doubles and we treat it everywhere as a long, what is the relationship? how is this information used?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I spotted that a while back and planned to fix it. I'll look again and see if I can get clarity on this.

Copy link
Contributor

@iverase iverase left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the PR is very clean and easy to follow, just have a couple of questions.

My only concern is that using doc values to return the stored value to the user might be confusing as doc values are loosy as they encode the provided values as integers, therefore the transformation back to double yield a double that is different to the provided one (although they are less that 1cm away on the surface of the earth).

If we are to change this, so in a future release we might want to read values from source so when projecting the value to the user, would we need to suffer of backwards compatibility issues?

@craigtaverner
Copy link
Contributor Author

If we are to change this, so in a future release we might want to read values from source so when projecting the value to the user, would we need to suffer of backwards compatibility issues?

The idea is to release as tech-preview and make no commitment to backwards compatibility for this type. Ideally we get the support for reading full precision from source into 8.12, but even if that only makes it into 8.13, since we do not need to commit to BWC, we only have to support this internally. We also don't plan for GA in 8.13, so we can continue to change things in 8.14 (and perhaps even a release after that).

@@ -0,0 +1,58 @@
/*
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should not be in the ql module as it is only used in esql. In addition the class name is not useful, since it is specific to coordinate encoding. Consider renaming and moving to the esql plugin.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved this code to the ESQL module, but then server tests failed on classloader errors, so I've moved it back until we figure out what is going on with classloading. All other changes like renaming the class, and making it an enum in preparation for Cartesian support are retained. The only unexpected requirement is that this class must be in the QL module for some reason.

* Based on the earlier prototype at elastic#98845
* The spatial types support changes between 8.11 and 8.12, so mixed clusters of these two versions should claim these types as unsupported, while pure 8.12 clusters should claim the latest type support.
* The spatial types support changes between 8.11 and 8.12, so mixed clusters of these two versions should not run the spatial tests.
* The test assumed data arrives in the original ingest order, and started with a LIMIT 10 after which predicates were applied. But in many situations this will not work, as the data is not in that order and the LIMIT 10 will result in different documents. So the solution is to create a query that does not depend on original order.
* Fix format=csv output for ESQL geo_point
* Fix format=csv|tsv output for ESQL geo_point
* Refactor SpatialUtils and abstract away GeoPoint
* GeoPoint implements SpatialPoint and the geo-specific knowledge was already in the SpatialUtils class, so we could increase the level of abstraction by using SpatialPoint everywhere.
* This is in preparation for supporting CartesianPoint, and in future perhaps other systems.
* Moved the SpatialCoordinateTypes back to QL module
* For some reason the ESQL QA tests could not load it from ESQL, only from QL. Until we figure out what is going wrong with the classloader, we will return these classes to their previous module.
* Primarily the difference is only that mapping to doc-values puts x as high 4 bytes, and y as low 4 bytes, while geo_point does the opposite.
* As long as we only support doc-values for the compute engine, this is the only difference we need to worry about.
Even though geo_point and cartesian_point are backed by LONG, making them implicitly sortable, we exclude them from sorting because the sort order is not particularly useful.
@craigtaverner craigtaverner changed the title ESQL: GEO_POINT type support ESQL: GEO_POINT and CARTESIAN_POINT type support Nov 29, 2023
@@ -44,6 +44,8 @@ public final class EsqlDataTypes {

public static final DataType DATE_PERIOD = new DataType("DATE_PERIOD", null, 3 * Integer.BYTES, false, false, false);
public static final DataType TIME_DURATION = new DataType("TIME_DURATION", null, Integer.BYTES + Long.BYTES, false, false, false);
public static final DataType GEO_POINT = new DataType("geo_point", Double.BYTES * 2, false, false, false);
public static final DataType CARTESIAN_POINT = new DataType("cartesian_point", Double.BYTES * 2, false, false, false);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be Float.Bytes?

Copy link
Contributor

@iverase iverase left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM from the geo side.

I would still like to understand what are the size of the esqltype and what is used for. I think someone with more knowledge on the engine should review it.

@craigtaverner
Copy link
Contributor Author

I would still like to understand what are the size of the esqltype and what is used for. I think someone with more knowledge on the engine should review it.

This was discussed with the team, and even though the current implementation would more accurately be Long.BYTES instead of Double.BYTES * 2, the consequences are not material and relate only to estimating page size allocations. Since the plan is to move towards reading from source and using two doubles in the compute engine in the next PR, I felt it was not worth changing it here, only to revert in the next PR.

return new BlockSourceReader.LongsBlockLoader(
valueFetcher(blContext.sourcePaths(name()), nullValue, GeometryFormatterFactory.WKT)
);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

This is plenty so long as we don't support synthetic _source without doc values. If we do that then we need more complexity here ala text or keyword. But this is fine.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually - does the WKT thing load everything that _source does?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually - does the WKT thing load everything that _source does?

What is the 'WKT thing'? Do you mean the fact that we display stuff as WKT in ESQL? Right now we generate the WKT from doc-values, but will instead generate it from _source in the next PR. Even if the _source was itself WKT, the final display will be generated, because the intermediate format will be binary block values. This differs from _search where the results are never generated, but always _source.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right right - you'll build WKT when you load from _source. You just aren't doing that. I suppose my question was "is the WKT thing here a parser config or can you still parse all supported point types?" Because I'm ok delaying any questions about parsing from source into the lossless render until later.

return new GeoPoint(randomDoubleBetween(-90, 90, true), randomDoubleBetween(-180, 180, true));
}

public static SpatialPoint randomCartesianPoint() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These should have javadoc just because they are visible to like 100 people.

return new GeoPoint(point.getY(), point.getX());
}
},
Cartesian {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/Cartesian/CARTESIAN/?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And I presume you want s/Geo/GEO/ too?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah

protected XContentBuilder valueToXContent(XContentBuilder builder, ToXContent.Params params, int valueIndex)
throws IOException {
// TODO Perhaps this is just a long for geo_point? And for more advanced types we need a new block type
long encoded = (block instanceof LongVectorBlock)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you cast to LongBlock?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can here.

@craigtaverner craigtaverner merged commit 1ea66d7 into elastic:main Nov 30, 2023
15 checks passed
@stratoula stratoula added the ES|QL-ui Impacts ES|QL UI label Dec 7, 2023
dej611 added a commit to elastic/kibana that referenced this pull request Dec 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/ES|QL AKA ESQL ES|QL-ui Impacts ES|QL UI >feature Team:QL (Deprecated) Meta label for query languages team v8.12.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support geo_point in ES|QL
6 participants