ST_Distance doesn't return the minimum distance. #979

Closed
cuteDen-ECNU opened this issue Oct 31, 2023 · 9 comments

cuteDen-ECNU commented Oct 31, 2023

Consider this statement:

SELECT ST_Distance(a2, a1), ST_Distance(a1, a2)
FROM ST_GeomFromText('MULTIPOINT((-2 0), EMPTY)') As a1
,ST_GeomFromText(' GEOMETRYCOLLECTION(POINT(1 0),LINESTRING(0 0,1 0))') As a2;
--actual{3,2}
--expected{2,2}

According to the following definition, ST_Distance returns the minimum distance between two geometries:

returns the minimum 2D Cartesian (planar) distance between two geometries

The minimum distance between a1 and a2 is 2, not 3, but ST_Distance(a2, a1) returns 3, which is not the minimum distance.
Moreover, the result of ST_Distance(a2, a1) differs from the result of ST_Distance(a1, a2).
So I believe this is a functional bug that the current test suites have not covered.
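The expected value can be checked with a brute-force reference. The following is a minimal sketch, not the GEOS or PostGIS implementation: it assumes geometries are modelled as Python lists of components, each component a list of (x, y) tuples, with `None` standing in for EMPTY, and it assumes empty components are skipped. All function names are hypothetical.

```python
import math

def point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b (a == b is allowed)."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return math.dist(p, a)
    # Project p onto the segment and clamp to its endpoints.
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return math.dist(p, (a[0] + t * dx, a[1] + t * dy))

def orient(a, b, c):
    """Sign of the turn a -> b -> c (cross product)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

def segment_distance(a, b, c, d):
    # Segments that properly cross are at distance 0.
    if orient(a, b, c) * orient(a, b, d) < 0 and orient(c, d, a) * orient(c, d, b) < 0:
        return 0.0
    # Otherwise the minimum is attained at an endpoint of one segment.
    return min(point_segment_distance(a, c, d), point_segment_distance(b, c, d),
               point_segment_distance(c, a, b), point_segment_distance(d, a, b))

def segments(component):
    """Segments of a coordinate list; a single point is a degenerate segment."""
    if len(component) == 1:
        return [(component[0], component[0])]
    return list(zip(component, component[1:]))

def distance(g1, g2):
    """Minimum distance over all pairs of non-empty components (None = EMPTY)."""
    return min(
        (segment_distance(a, b, c, d)
         for c1 in g1 if c1 is not None
         for c2 in g2 if c2 is not None
         for a, b in segments(c1)
         for c, d in segments(c2)),
        default=None,  # assumption: no non-empty pair means "no answer"
    )

# MULTIPOINT((-2 0), EMPTY)
a1 = [[(-2.0, 0.0)], None]
# GEOMETRYCOLLECTION(POINT(1 0), LINESTRING(0 0, 1 0))
a2 = [[(1.0, 0.0)], [(0.0, 0.0), (1.0, 0.0)]]

print(distance(a2, a1), distance(a1, a2))
```

Under these assumptions both argument orders give 2, since the nearest pair is the point (-2 0) and the start of the linestring at (0 0).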

Given the crash in #977, it seems the functions that handle a GEOMETRYCOLLECTION with EMPTY have not been exercised by the existing test suites?

The version is the latest on GitHub:
POSTGIS="3.5.0dev 3.4.0rc1-705-g5c3ec8392" [EXTENSION] PGSQL="170" GEOS="3.13.0dev-CAPI-1.18.0" PROJ="8.2.1 NETWORK_ENABLED=OFF URL_ENDPOINT=https://cdn.proj.org/ USER_WRITABLE_DIRECTORY=/tmp/proj DATABASE_PATH=/usr/share/proj/proj.db" LIBXML="2.9.13"

Member

pramsey commented Oct 31, 2023

Distance in PostGIS is not delegated to GEOS, so this issue should properly be ticketed there. That said, I will add a test case here before closing, just in case it's also an issue in GEOS.

Member

pramsey commented Oct 31, 2023

Well, your particular example crashes GEOS, and in general the question of "what is the distance to EMPTY" seems a little fraught. Distance between POINT EMPTY and LINESTRING EMPTY is zero. Distance between POINT EMPTY and 'POINT(1 1)' is zero. Distance between GEOMETRYCOLLECTION(LINESTRING EMPTY, POINT(1 1)) and POINT(2 1) is ... one.

https://trac.osgeo.org/postgis/wiki/DevWikiEmptyGeometry

The case of distance between geometry A, a collection with an empty component and a non-empty component and geometry B, a non-empty geometry, seems generally unclear. Probably the "least bad" answer is to ignore the empty components?

Meanwhile, over in PostGIS land, distance between EMPTY and EMPTY is NULL.
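The "ignore the empty components" option can be sketched as follows. This is an illustration only, restricted to point components for brevity: `EMPTY`, `non_empty_points` and `distance_ignoring_empty` are hypothetical names, nested Python lists stand in for collections, and returning `None` when one side has nothing left is an assumption that mirrors the NULL mentioned for PostGIS.

```python
import math

EMPTY = None  # hypothetical stand-in for any empty geometry

def non_empty_points(geom):
    """Flatten a (possibly nested) collection into its points, dropping EMPTY parts."""
    if geom is EMPTY:
        return []
    if isinstance(geom, tuple):      # a single point (x, y)
        return [geom]
    points = []
    for part in geom:                # a list is a collection
        points.extend(non_empty_points(part))
    return points

def distance_ignoring_empty(g1, g2):
    """Minimum point-to-point distance after discarding empty components."""
    p1, p2 = non_empty_points(g1), non_empty_points(g2)
    if not p1 or not p2:
        return None                  # assumption: nothing left to measure
    return min(math.dist(a, b) for a in p1 for b in p2)

# GEOMETRYCOLLECTION(LINESTRING EMPTY, POINT(1 1)) versus POINT(2 1)
print(distance_ignoring_empty([EMPTY, (1.0, 1.0)], (2.0, 1.0)))
# EMPTY versus EMPTY
print(distance_ignoring_empty(EMPTY, [EMPTY, [EMPTY]]))
```

With this policy the collection case gives one, as in the example above, and a pair of geometries with no non-empty parts gives no numeric answer at all.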

Contributor

dr-jts commented Oct 31, 2023

> Probably the "least bad" answer is to ignore the empty components?

Agreed. And this should be very simple to implement - it's always easy to ignore things.

Member

pramsey commented Oct 31, 2023

It's just a little odd that distance of POINT EMPTY and POINT(1 0) is zero, while distance of GEOMETRYCOLLECTION(POINT(0 0), POINT EMPTY) and POINT(1 0) is one. And makes things a little fiddly, implementation-wise, since one cannot iterate and take the minimum distance between components.
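The implementation wrinkle can be made concrete with a small sketch. It assumes point components only, with `None` as a hypothetical stand-in for POINT EMPTY; the function names are illustrative and do not correspond to GEOS code.

```python
import math

EMPTY = None  # hypothetical stand-in for POINT EMPTY

def pairwise_distance_zero_for_empty(a, b):
    """Pairwise rule described above: any distance involving an empty is 0."""
    if a is EMPTY or b is EMPTY:
        return 0.0
    return math.dist(a, b)

def naive_collection_distance(collection, point):
    """Iterate over components and take the minimum of the pairwise results."""
    return min(pairwise_distance_zero_for_empty(c, point) for c in collection)

def skipping_collection_distance(collection, point):
    """Drop empty components first, then take the minimum."""
    return min(math.dist(c, point) for c in collection if c is not EMPTY)

# GEOMETRYCOLLECTION(POINT(0 0), POINT EMPTY) versus POINT(1 0)
gc = [(0.0, 0.0), EMPTY]
target = (1.0, 0.0)

print(naive_collection_distance(gc, target))     # the empty's 0 wins the minimum
print(skipping_collection_distance(gc, target))  # only POINT(0 0) contributes
```

Because the zero contributed by the empty component always wins the minimum, the component loop needs an explicit emptiness check rather than reusing the pairwise rule unchanged.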

Contributor

dr-jts commented Oct 31, 2023

> It's just a little odd that distance of POINT EMPTY and POINT(1 0) is zero, while distance of GEOMETRYCOLLECTION(POINT(0 0), POINT EMPTY) and POINT(1 0) is one. And makes things a little fiddly, implementation-wise, since one cannot iterate and take the minimum distance between components.

Agreed, this is an inconsistency. One way to resolve it is to return Infinite for the distance to empty geoms (as per strk's suggestion). But I think this might lead to difficulties elsewhere.
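A minimal sketch of the Infinite option, assuming point components and `None` as a hypothetical stand-in for an empty geometry. It only illustrates the arithmetic and is not how GEOS behaves.

```python
import math

EMPTY = None  # hypothetical stand-in for an empty geometry

def distance_inf_for_empty(a, b):
    """Alternative rule: any distance involving an empty is +infinity."""
    if a is EMPTY or b is EMPTY:
        return math.inf
    return math.dist(a, b)

def collection_distance(collection, point):
    """Plain minimum over components; infinity never wins against a real value."""
    return min(distance_inf_for_empty(c, point) for c in collection)

# GEOMETRYCOLLECTION(POINT(0 0), POINT EMPTY) versus POINT(1 0)
print(collection_distance([(0.0, 0.0), EMPTY], (1.0, 0.0)))

# One place where it gets awkward: an empty geometry measured against itself.
print(distance_inf_for_empty(EMPTY, EMPTY))
```

Infinity is the identity for `min`, so iterating over components works without a special case. The price shows up elsewhere: an empty geometry is infinitely far from itself, and any sum that includes such a distance becomes infinite.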

Member

pramsey commented Oct 31, 2023

Yeah, ordinarily A.Distance(A) would be expected to return 0 by definition... but not for empty? It's a very pick your poison situation.

latot commented Nov 1, 2023

Hi, I just wanted to write down what I said on Matrix, to clarify.

Some distance operations are still not well defined. I think we all agree on what the distance between two geometries is when both exist, but the story changes in edge cases like this one.

I think it is best to first check conceptually what an empty geometry represents. We know it is the absence of geometry, but to get a better idea of how distance should behave, let's think it through.

Here are some examples of empty geometries:

  • The intersection of two surfaces that do not intersect: it is empty because no part of the surfaces intersects
  • The shortest path between two parallel lines: it is empty because no geometry exists for it
  • Any path between two nodes in a graph where the nodes are in different, disconnected parts: the path does not exist

A geometry is basically a set of rules: the points where something is, a space that follows those rules. An empty geometry means that no geometry exists that satisfies all the rules.
It is basically the equivalent of a system of equations with no solution.

Here is the question: what is the distance from an object that does not exist to anything? I think the right answer is... "does not exist", not infinite.

Let's check the difference: if the distance is infinite, that at least means a path exists between the two points. You would travel an infinite distance, but the route exists.

For example, the trip from -pi/2 (positive side) to pi/2 (negative side) is an existing path that is infinitely long.

The distance between an empty geometry and something else should be "does not exist", because it should be clear that the path does not exist either.

Let's keep these concepts distinct from NA and NULL. NA usually means "you don't know", and NULL has several meanings: in a SQL join it means "doesn't exist", while as a pointer it is used as "a pointer value that can be defined". These are quite different from the empty geometry case.

From my point of view, numerically there should be distinct values to represent NA/NaN (don't know), Infinite, Indeterminate, and "Not Exists".

Maybe there are some cases where we need new special values to represent them correctly?

Just because I like to write this table:

Real Value + Real Value = Real Value
Real Value + Infinity = Infinity
Real Value + Not Exists = Not Exists
Real Value + Indeterminate = Indeterminate
Infinity + Infinity = Infinity
Infinity - Infinity = Indeterminate
Indeterminate + Infinity = Indeterminate
Not Exists + Infinity = Not Exists
Not Exists + Indeterminate = Not Exists

In other words, the priority when adding things is Not Exists > Indeterminate > Infinity > Real Value
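The table can be written down as a small addition function. This is only a sketch of the proposed rules: `Special` and `add` are hypothetical names, real values and infinities are ordinary Python floats, and the two extra values are modelled as enum members because no floating-point value represents them.

```python
import math
from enum import Enum

class Special(Enum):
    INDETERMINATE = "Indeterminate"
    NOT_EXISTS = "Not Exists"

def add(x, y):
    """Add two values following the priority
    Not Exists > Indeterminate > Infinity > Real Value."""
    if x is Special.NOT_EXISTS or y is Special.NOT_EXISTS:
        return Special.NOT_EXISTS
    if x is Special.INDETERMINATE or y is Special.INDETERMINATE:
        return Special.INDETERMINATE
    total = x + y                      # floats, possibly +/- infinity
    if math.isnan(total):              # Infinity - Infinity
        return Special.INDETERMINATE
    return total

print(add(1.0, 2.0))                          # Real Value
print(add(1.0, math.inf))                     # Infinity
print(add(math.inf, -math.inf))               # Indeterminate
print(add(Special.INDETERMINATE, math.inf))   # Indeterminate
print(add(Special.NOT_EXISTS, math.inf))      # Not Exists
```

Checking Not Exists first and Indeterminate second is what encodes the priority order; plain float addition already handles the Infinity and Real Value rows.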

Member

pramsey commented Nov 1, 2023

Well, we aren't going to be able to add another value to the FP zoo, we're just a little geometry project.
I was thinking about behaviour of empty and distance within the context of other geometries. Like, if I sum up the distances between a collection of geometries and one happens to be empty, what's the answer? I think ideally the empty case just disappears. This is similar to how we handle distances between collections that have a mix of real and empty components. The empty part is just ignored. Which implies that, in distance at least, zero is close to the right answer. The trouble is that for real geometries distance = 0 implies intersects = true. Which we already have hard behaviour on: empties don't intersect. But I don't think distance = nan, because then in the summing case, we end up with a sum of nan, which seems wrong. It's almost like the distance between empties is -0.0 (representable! but not zero!). Not that I'm going there ;0
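The floating-point behaviour referred to here can be checked directly. The sketch below uses made-up distance values and a hypothetical helper `is_negative_zero`; it shows standard IEEE 754 behaviour as exposed by Python floats and is not a proposal for GEOS.

```python
import math

distances = [1.5, 2.0]  # illustrative distances between real geometries

# If the distance to an empty geometry were NaN, it would poison the sum.
sum_with_nan = sum(distances + [math.nan])

# Negative zero leaves the sum untouched, like ordinary zero.
sum_with_negative_zero = sum(distances + [-0.0])

def is_negative_zero(x):
    """True only for -0.0: it compares equal to 0.0 but carries a sign bit."""
    return x == 0.0 and math.copysign(1.0, x) < 0.0

print(math.isnan(sum_with_nan))
print(sum_with_negative_zero)
print(-0.0 == 0.0, is_negative_zero(-0.0), is_negative_zero(0.0))
```

So -0.0 vanishes in sums and compares equal to zero, yet remains distinguishable through its sign bit, which is the property hinted at above.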

latot commented Nov 1, 2023

Hi. All these values in math, like Not Exists and Indeterminate, exist for a good reason: lacking one of them causes problems and broken relations. As you said, if you choose Infinity some relations will be broken, and if you choose 0 others will be broken.

This happens because "Not Exists" has different properties from any value you might use to replace it, and those differences will break the geometry relations.

There is a middle ground if there is no intention to add these values:

NaN: value not known
Zoo: Indeterminate or Not Exists value

Using those definitions, we can to some extent match and handle some of the properties. Some will still be broken, but I think it is the best way to keep everything as coherent as possible.

As long as the right values don't exist, there will be no right answer here. It is a matter of choosing your poison; I recommend not the most intuitive option, but the one closest to the right answers.

With that said, if we only have NaN and Zoo, I think the best choice is for the distance to be Zoo. I know it is weird: some geometry collections contain empty geometries, and the result would be Zoo if any geometry is empty. But that would be the most correct answer. It probably seems wrong because we are not used to working with empty geometries; until now I hadn't put much thought into how to interpret an empty geometry. We should learn what we are using in order to know how to use it. Then we will understand why the result of the distance is "Not Exists", and therefore ends up as Zoo.

I still think creating the right values would simplify all these relations and help with improvements in the future :)
