-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ESQL: GEO_POINT and CARTESIAN_POINT type support #102177
Conversation
Hi @craigtaverner, I've created a changelog YAML for you. |
f2be5eb
to
7d0e78b
Compare
7d0e78b
to
03ac699
Compare
Pinging @elastic/es-ql (Team:QL) |
Pinging @elastic/elasticsearch-esql (:Query Languages/ES|QL) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you very much @craigtaverner!
I had just a very quick look (I'll need more time for a proper review) but there is one thing that is worth discussing at high level IMHO: I see you are returning geo_point values as WKT, but I wonder if it would be better to have plain JSON instead; it would be the same as in _search, more compact and easier to manipulate for clients
@@ -44,6 +44,7 @@ public static Iterable<Object[]> parameters() { | |||
(size, values) -> equalTo(NumericUtils.asLongUnsigned(values.reduce(BigInteger::max).get())) | |||
); | |||
dateTimes(cases, "mv_max", "MvMax", (size, values) -> equalTo(values.max().getAsLong())); | |||
geoPoints(cases, "mv_max", "MvMax", (size, values) -> equalTo(values.max().getAsLong())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that makes a lot of sense, do we need to support it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question. It was simplest to code this, rather than explicitly reject these, but we can discuss alternatives. I think, however, that a deterministic ordering of all points is something that might be necessary anyway, and in that case ordering by the doc-value is as good a choice as any (order by lat first then lon second).
@@ -44,6 +44,7 @@ public static Iterable<Object[]> parameters() { | |||
(size, values) -> equalTo(NumericUtils.asLongUnsigned(values.reduce(BigInteger::min).get())) | |||
); | |||
dateTimes(cases, "mv_min", "MvMin", (size, values) -> equalTo(values.min().getAsLong())); | |||
geoPoints(cases, "mv_min", "MvMin", (size, values) -> equalTo(values.min().getAsLong())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same as above, I don't think that makes a lot of sense, do we need to support it?
@@ -175,25 +189,27 @@ unsupported: | |||
- match: { values.0.5: null } | |||
- match: { values.0.6: null } | |||
- match: { values.0.7: null } | |||
- match: { values.0.8: null } | |||
- match: { values.0.9: null } | |||
- match: { values.0.8: "POINT (9.999999990686774 11.999999997206032)" } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's the problem with this approach, using doc values for reading points is loosy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, and in fact my earlier experiment with getting from source started by fixing this test. We now have that planned as a followup PR.
@@ -44,6 +44,7 @@ public final class EsqlDataTypes { | |||
|
|||
public static final DataType DATE_PERIOD = new DataType("DATE_PERIOD", null, 3 * Integer.BYTES, false, false, false); | |||
public static final DataType TIME_DURATION = new DataType("TIME_DURATION", null, Integer.BYTES + Long.BYTES, false, false, false); | |||
public static final DataType GEO_POINT = new DataType("geo_point", Double.BYTES * 2, false, false, false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand why the size of the field here is 2 doubles and we treat it everywhere as a long, what is the relationship? how is this information used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I spotted that a while back and planned to fix it. I'll look again and see if I can get clarity on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the PR is very clean and easy to follow, just have a couple of questions.
My only concern is that using doc values to return the stored value to the user might be confusing as doc values are loosy as they encode the provided values as integers, therefore the transformation back to double yield a double that is different to the provided one (although they are less that 1cm away on the surface of the earth).
If we are to change this, so in a future release we might want to read values from source so when projecting the value to the user, would we need to suffer of backwards compatibility issues?
The idea is to release as tech-preview and make no commitment to backwards compatibility for this type. Ideally we get the support for reading full precision from source into 8.12, but even if that only makes it into 8.13, since we do not need to commit to BWC, we only have to support this internally. We also don't plan for GA in 8.13, so we can continue to change things in 8.14 (and perhaps even a release after that). |
@@ -0,0 +1,58 @@ | |||
/* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should not be in the ql
module as it is only used in esql
. In addition the class name is not useful, since it is specific to coordinate encoding. Consider renaming and moving to the esql
plugin.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved this code to the ESQL module, but then server tests failed on classloader errors, so I've moved it back until we figure out what is going on with classloading. All other changes like renaming the class, and making it an enum in preparation for Cartesian support are retained. The only unexpected requirement is that this class must be in the QL module for some reason.
* Based on the earlier prototype at elastic#98845 * The spatial types support changes between 8.11 and 8.12, so mixed clusters of these two versions should claim these types as unsupported, while pure 8.12 clusters should claim the latest type support. * The spatial types support changes between 8.11 and 8.12, so mixed clusters of these two versions should not run the spatial tests. * The test assumed data arrives in the original ingest order, and started with a LIMIT 10 after which predicates were applied. But in many situations this will not work, as the data is not in that order and the LIMIT 10 will result in different documents. So the solution is to create a query that does not depend on original order. * Fix format=csv output for ESQL geo_point * Fix format=csv|tsv output for ESQL geo_point * Refactor SpatialUtils and abstract away GeoPoint * GeoPoint implements SpatialPoint and the geo-specific knowledge was already in the SpatialUtils class, so we could increase the level of abstraction by using SpatialPoint everywhere. * This is in preparation for supporting CartesianPoint, and in future perhaps other systems. * Moved the SpatialCoordinateTypes back to QL module * For some reason the ESQL QA tests could not load it from ESQL, only from QL. Until we figure out what is going wrong with the classloader, we will return these classes to their previous module.
* Primarily the difference is only that mapping to doc-values puts x as high 4 bytes, and y as low 4 bytes, while geo_point does the opposite. * As long as we only support doc-values for the compute engine, this is the only difference we need to worry about.
Even though geo_point and cartesian_point are backed by LONG, making them implicitly sortable, we exclude them from sorting because the sort order is not particularly useful.
cb3520d
to
532838e
Compare
@@ -44,6 +44,8 @@ public final class EsqlDataTypes { | |||
|
|||
public static final DataType DATE_PERIOD = new DataType("DATE_PERIOD", null, 3 * Integer.BYTES, false, false, false); | |||
public static final DataType TIME_DURATION = new DataType("TIME_DURATION", null, Integer.BYTES + Long.BYTES, false, false, false); | |||
public static final DataType GEO_POINT = new DataType("geo_point", Double.BYTES * 2, false, false, false); | |||
public static final DataType CARTESIAN_POINT = new DataType("cartesian_point", Double.BYTES * 2, false, false, false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be Float.Bytes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM from the geo side.
I would still like to understand what are the size of the esqltype
and what is used for. I think someone with more knowledge on the engine should review it.
This was discussed with the team, and even though the current implementation would more accurately be |
return new BlockSourceReader.LongsBlockLoader( | ||
valueFetcher(blContext.sourcePaths(name()), nullValue, GeometryFormatterFactory.WKT) | ||
); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
This is plenty so long as we don't support synthetic _source without doc values. If we do that then we need more complexity here ala text or keyword. But this is fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually - does the WKT thing load everything that _source does?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually - does the WKT thing load everything that _source does?
What is the 'WKT thing'? Do you mean the fact that we display stuff as WKT in ESQL? Right now we generate the WKT from doc-values, but will instead generate it from _source in the next PR. Even if the _source was itself WKT, the final display will be generated, because the intermediate format will be binary block values. This differs from _search where the results are never generated, but always _source.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right right - you'll build WKT when you load from _source. You just aren't doing that. I suppose my question was "is the WKT thing here a parser config or can you still parse all supported point types?" Because I'm ok delaying any questions about parsing from source into the lossless render until later.
return new GeoPoint(randomDoubleBetween(-90, 90, true), randomDoubleBetween(-180, 180, true)); | ||
} | ||
|
||
public static SpatialPoint randomCartesianPoint() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These should have javadoc just because they are visible to like 100 people.
return new GeoPoint(point.getY(), point.getX()); | ||
} | ||
}, | ||
Cartesian { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/Cartesian/CARTESIAN/?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And I presume you want s/Geo/GEO/ too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah
protected XContentBuilder valueToXContent(XContentBuilder builder, ToXContent.Params params, int valueIndex) | ||
throws IOException { | ||
// TODO Perhaps this is just a long for geo_point? And for more advanced types we need a new block type | ||
long encoded = (block instanceof LongVectorBlock) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you cast to LongBlock
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can here.
## Summary Add support to `to_geopoint` as in elastic/elasticsearch#102177
This is a port of a subset of the earlier prototype at #98845
The goal is support for geo_point and point in ES|QL.
At this point we support:
geo_point
columns in ESQL output tablesto_geopoint(field)
function for encoded long and WKT keyword fieldsgeo_point
as argument toto_string(field)
(renders as WKT)format=csv
with WKT output (defaults to GeoPoint.toString() which is "lat,lon")point
type by duplicating thegeo_point
workSupport for geo_point and point with full precision (from source instead of doc-values)This last option, using source and full precision requires a new Block type, or a new encoding to the BytesRef block type, and as such is being done in a followup PR at #103698
Fixes #103585