Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Common type wrappers for gemoetry/crs formats #7

Open
rafaqz opened this issue Nov 13, 2019 · 16 comments
Open

Common type wrappers for gemoetry/crs formats #7

rafaqz opened this issue Nov 13, 2019 · 16 comments

Comments

@rafaqz
Copy link
Member

rafaqz commented Nov 13, 2019

Dealing with formats for geometry and crs data has been annoying me for a while. We end up with a lot of importWKT/toEPSG methods that mean that calling method/package needs to know about all the formats instead of using generic approaches. We also need to somehow track what format a string is holding we are passing it between packages.

Using the type system can resolve this pretty neatly by wrapping a format once when it is loaded, avoiding any future validation or tracking. We can then leverage multiple dispatch to handle everything after that.

In GeoFormatTypes.jl I'm defining WellKnownText, ProjString, EPGScode and other wrappers that classify geometries and metadata, see yeesian/ArchGDAL.jl#97 for an example use case.

Once formats are wrapped we can define pretty powerful generic tools in a few lines - like this reproject method that transforms coordinates between projections in any format such as WellKnownText-> EPSGcode(4326) without the calling method/package having to know about any of the formats.

Any thoughts? It would be good to move GeoFormatTypes.jl to JuliaGeo and have it adopted in the ecosystem.

@meggart
Copy link
Member

meggart commented Nov 13, 2019

A big +1 from my side for moving GeoFormatTypes to JuliaGeo. One candidate to use the interface would be Shapefiles.jl as discussed here. So your plan would be the the crs function from GeoInterface.jl would return a GeoFormat? Sounds good to me.

@visr
Copy link
Member

visr commented Nov 13, 2019

Thanks for starting the discussion. I agree it would be nice to improve our integration regarding coordinate reference systems. And I guess we would not want that to depend on Proj4.jl for the case we just want to carry it but not reproject. Although for converting between different GeoFormat types, I'm guessing we will need Proj4.jl in almost every case. For example, say we want to support writing the .prj files in Shapefile.jl by defining the EPSG code. We cannot do this without Proj4.jl, since we have to look up the EPSG code in the proj.db.

Why is this issue both about geometry and crs formats? Isn't geometry already addressed as part of GeoInterface? Different geometry formats of course sometimes have different crs formats, but I'd consider them separately.

@meggart regarding the crs function of GeoInterface, it is currently not part of https://github.com/yeesian/GeoInterfaceRFC.jl. I guess it could be added there, haven't thought about it much.

Some more quick comments. Perhaps we can generalize the EPSG to SRID, which holds both an authority, such as EPSG, as well as an identifier. I see GeoFormatTypes also has GeoJSON and KML. The latest specifications for both define these formats always to be EPSG:4326, so I think we can leave them out. I'm not so familiar with GML.

So SRID's are nice and should be used when possible, but of course they can only be used if the projection you use is defined in a database. So we still need generic formats like PROJ string and WKT. Using PROJ strings is now discouraged by PROJ itself (https://proj.org/faq.html#what-is-the-best-format-for-describing-coordinate-reference-systems). Although we probably need some support there to be able to handle Shapefiles for instance. For WKT it is important to track which version we mean, see https://proj.org/development/reference/cpp/cpp_general.html?#_CPPv44WKT2. Ideally I think we'd want to have a struct for WKT2, such that we can serialize it to different representations such as https://proj.org/usage/projjson.html.

@rafaqz
Copy link
Member Author

rafaqz commented Nov 13, 2019

Yes @visr i should have mentioned there are two issues here: geometry formats and CRS formats. It would be nice if they were separate things entirely but unfortunately they aren't... That's one reason I changed the package name. Well known text and GML both contain geometries and CRS, which can be in the same string. We could use the type system to distinguish CRS only, CRS/Geometry, and Geometry only formats, but that's something that will need to be worked through with implementation problems.

The fact that KML is always 4326 could also be encoded in this package so no-one else has to specify that. So crs(::KML) = EPSGcode(4326) or something, then we can automated conversions to other projecitons.

The other reason for including geometric only formats is they also suffer from the problem of method permutations. GDAL/ArchGDAL does conversion between geometries in different formats and each conversion has it's own named method each way. That means the calling code has to know all the format names and methods, which is what I'm trying to avoid. GeoInterface.jl defines the spatial types but not the formats they come in, so doesn't really help here.

@meggart I'll move GeoFormatTypes.jl here if someone wants to make me an owner of JuliaGeo. Your use case and questions about WKT in the other thread were similar to mine. If we wrap the types the conversions can at least be easy and automated, and eventually we can define an all julia type-system driven format for internal use, and some day julia-based parsers. And yes GeoInterface.jl should return a wrapped GeoFormat instead of a Dict for CRS.

@rafaqz
Copy link
Member Author

rafaqz commented Nov 14, 2019

@visr Those are good points about well known text.

I was thinking we could define AbstractWellKnownText and have various specific versions that inherit from it. WellKnownText2 <: AbstractWellKnownText,ESRIWellKnownText <: AbstractWellKnownText, etc (or whatever they are).

Can you also explain a little more what you mean by having both an authority and an identifier for SRID? How would we use method dispatch on this general type?

I would also like more details on why we don't need wrappers for string geometry formats? They can definately simplify some of the methods in ArchGDAL.

@c42f
Copy link
Member

c42f commented Nov 14, 2019

We debated how to handle SRIDs and CRS a lot at my previous job (without coming to a definitive answer mind you).

Can you also explain a little more what you mean by having both an authority and an identifier for SRID?

You may find the following documentation somewhat helpful for terminology: https://github.com/JuliaGeo/Geodesy.jl#coordinate-reference-systems-and-spatial-reference-identifiers (disclaimer: I wrote this, but I am not a Geodecist)

@rafaqz
Copy link
Member Author

rafaqz commented Nov 14, 2019

I mean how do we implement that generalisation in the context of dispatch and a type hierarchy. Having EPSGcode(1234) lets us easily use dispatch to send to methods that accept epsg and look them up in the epsg database.

I was imagining other authorities that matter would get their own types, so you could do reproject(x, EPSGcode(4326), SomeAuthority(5678)) and would dispatch to methods that can do the conversion.

@c42f
Copy link
Member

c42f commented Nov 14, 2019

Ah I misunderstood. Might it make sense to have SRID{:Authority}(code) as the generalization of EPSGcode? It seems likely that the typical julia function would manipulate SRIDs from only one authority at a time. The code is not really independent from the authority (ultimately it's an index into the authority's database), so I'd put them together into a composite type by preference.

@visr
Copy link
Member

visr commented Nov 14, 2019

geometry formats and CRS formats. It would be nice if they were separate things entirely but unfortunately they aren't...

They are different concepts and I believe we should treat them seperately in our APIs. I see a CRS as a property of any geospatial data (vector/raster/mesh/...) that may be present, at different levels. Since you often want to store the CRS with your data, they are encoded with that data in different forms. But instead of having a CRS type for each format, I'd rather just have the basic CRS types such as WKT, SRID, PROJ string. So Shapefile.jl would as you suggest return a PROJ string type for GeoInterface.crs (or define an interface on it, if we can). And when writing Shapefile.jl has the logic of how to encode a PROJ string to make a valid Shapefile (simply writing it to a .prj in this case).

The fact that KML is always 4326 could also be encoded

Agree, good suggestion. Though to me this would be most natural in a potential KML package rather than in this package.

I was thinking we could define AbstractWellKnownText and have various specific versions that inherit from it.

Something like that could work. Haven't looked much into the actual WKT format or the differences between versions, so it's hard to say much at this point.

I would also like more details on why we don't need wrappers for string geometry formats? They can definately simplify some of the methods in ArchGDAL.

I can see how it would simplify this in the case of GDAL and its API that just has strings going in and out, so you currently need to select the right method for the format. I wouldn't say we don't need it, or that we cannot improve the interoperability here. So if I understand correctly, this concerns specifically geometries serialized to strings in different formats, and not the parsed versions, which already would be a ArchGDAL.Polygon or Shapefile.Polygon, both implementing the GeoInterface? I do think I see value about having interoperable wrapped strings around the ecosystem, but I'm not yet sure about the benefit to confusion ratio of having this as a second wing of interoperability next to the GeoInterface. Say if I have a GeoJSON string, I'd want to parse it with GeoJSON.jl or ArchGDAL.jl for it to become useful to me. Or if I just want to pass it along, it can stay a string. What would wrapping it in a GeoJSON string wrapper mean? I already knew it was a GeoJSON string, so GeoJSON.read works and actually ArchGDAL.read is smart enough to figure out it is GeoJSON as well (if not, we can specify the format). I guess I need a concrete use case :)

Might it make sense to have SRID{:Authority}(code) as the generalization of EPSGcode?

Yeah indeed, I was thinking exactly that. We could always add a constructor EPSG(::Integer) such that EPSG(4326) -> SRID{:EPSG}(4326).

@rafaqz
Copy link
Member Author

rafaqz commented Nov 15, 2019

So if I understand correctly, this concerns specifically geometries serialized to strings in different formats, and not the parsed versions,

Yeah mostly everything here concerns handling strings or numbers representing some kind of geospatial data that is otherwise unidentifiable or requires parsing to do so, and for whatever reason isn't parsed immediately (e.g. not required for current use case or just lazy loading). With one caveat that if something is parsed then put in a Dict it would also need a wrapper... because we can't reliably dispatch on Dict or know what it contains, as was discussed in the other thread by @meggart.

Or if I just want to pass it along, it can stay a string

This is really the problem. Some intermediate package, or just higher-level methods in the receiving package, will still need to know that it's a GeoJSON string and send it to a specific method if it wants to convert it. And what if it was KML or WellKnownText? we will need lists of custom methods like in ArchGDAL (where they are actually needed to interface with the C++ methods) and that requirement will propagate through the ecosystem.

If the strings are wrapped the method permutations are only necessary at the point of conversion - say in Base.convert(Geometry, x::GML) or some other method we define. Outside that, things can be general.

@yeesian
Copy link
Member

yeesian commented Nov 16, 2019

At some point, I'll like to have an interface for projections and reference systems in GeoInterface, and I'm not entirely convinced it needs to be outside of GeoInterface.jl just based on yeesian/ArchGDAL.jl#95 (comment). Nonetheless,

  1. I've been slow to participate of late since I'm still ramping up in my full-time job, so things on GeoInterfaceRFC has been slower moving,
  2. There is a lot of activity in Geodesy.jl that's worth considering (c.f., Common type wrappers for gemoetry/crs formats #7 (comment)), and
  3. The proposal in https://github.com/yeesian/GeoInterfaceRFC.jl is for GeoInterface.jl to move from a type-based dispatch mechanism to a trait-based dispatch mechanism. Given that GeoFormatTypes.jl is a type-based system, some thought needs to be had about how that should work out.

So I'm okay with GeoInterface being slower in the pace of development (w.r.t. GeoFormatTypes), and for GeoFormatTypes.jl to move to JuliaGeo and for packages to start adopting it (given the endorsement in #7 (comment)).


Say if I have a GeoJSON string, I'd want to parse it with GeoJSON.jl or ArchGDAL.jl for it to become useful to me. Or if I just want to pass it along, it can stay a string. What would wrapping it in a GeoJSON string wrapper mean? I already knew it was a GeoJSON string, so GeoJSON.read works and actually ArchGDAL.read is smart enough to figure out it is GeoJSON as well (if not, we can specify the format). I guess I need a concrete use case :)

Sorry I've been slow to the discussion. I also agree that I prefer to operate from understanding concrete use-cases. From what I understood in yeesian/ArchGDAL.jl#95, it allows us to have functions like

transform(sourceproj::WellKnownText, targetproj::EPSGcode, val::Geometry)

rather than having the type in the function names like what ArchGDAL currently does:

julia> ArchGDAL.from
fromGML  fromJSON  fromWKB   fromWKT

julia> ArchGDAL.import
importEPSG   importEPSGA!  importPROJ4   importURL!    importXML
importEPSG!  importESRI    importPROJ4!  importWKT     importXML!
importEPSGA  importESRI!   importURL     importWKT!

julia> ArchGDAL.to
toColorTable toISOWKT      toMICoordSys  toWKB
toGML        toJSON        toPROJ4       toWKT
toISOWKB     toKML         toRGBA        toXML

@rafaqz
Copy link
Member Author

rafaqz commented Nov 21, 2019

@yeesian, to reply to some of your points:

  1. There is a lot of activity in Geodesy.jl that's worth considering (c.f., Common type wrappers for gemoetry/crs formats #7 (comment)), and

As far as I can tell the datums etc in Geodesy.jl are mostly orthogonal to this package?

  1. The proposal in https://github.com/yeesian/GeoInterfaceRFC.jl is for GeoInterface.jl to move from a type-based dispatch mechanism to a trait-based dispatch mechanism. Given that GeoFormatTypes.jl is a type-based system, some thought needs to be had about how that should work out.

These wrappers can't be trait-based as the objects are mostly just String - we have nothing to dispatch traits on.

And yes being able to write that transform function and other things like it is the reason for this package.

GeoFormatTypes is pretty much finished for now:
https://github.com/rafaqz/GeoFormatTypes.jl

I'm keen to register it as it's one of the last things blocking GeoData.jl from having coherent load/save between NetCDF/tiff/grd/hdf5 files, which will be great.

I would like it to be part of JuliaGeo, but I'll need to work on it further so I need to be an owner here to want to transfer it here. My pull request at DimensionalArrayTraits is sitting dormant because I can't merge it even though @meggart reviewed it, so I'm hesitant to base my infrastructure on packages here.

@yeesian
Copy link
Member

yeesian commented Nov 21, 2019

I have made you an owner of JuliaGeo, sorry I took so long to get around to it -- I like how you continue to push through friction where you encounter it, and escalate it if you feel things are not moving for you.

Friction does exist in JuliaGeo, not always without reason, and should not be taken as a passive-aggressive display of people's intentions. (Volunteers work on different schedules, and with varying levels of commitment, etc. There have been multi-year initiatives and discussions that did not pan out, but they often provide learning points for the next round of packages and initiatives.)

@rafaqz
Copy link
Member Author

rafaqz commented Nov 21, 2019

Thanks! And no problem at all. I have a bunch of other projects I barely respond to at all due to priorities, and people here are comparatively pretty responsive to ideas. These current changes are just central to my paid work and other research moving forward right now, so there is a tension between wanting to work collectively and integrate (which is always better in the long run), and getting things done now. Apologies if that ever appears pushy or rushed.

@c42f
Copy link
Member

c42f commented Nov 22, 2019

there is a tension between wanting to work collectively and integrate (which is always better in the long run), and getting things done now

Well said! Collaborating with mixed priorities is always difficult.

I skimmed through GeoFormatTypes quickly, it looks useful. A few thoughts which may or may not be helpful:

  • The CRS is a combination of coordinate system (could be a map projection or something else) and the datum. (Talking about datum is as confusing as it is hard to escape!) Using the term projection to refer to both of these — as has historically been done by proj.4 — can be very confusing IMO. coordinate_reference_system or some shortening of that would be more transparent.
  • The CRS for some geoformats may not be embedded within them. Eg, you can have WKT without the CRS. At least this was valid last I knew about it. So projection is not well defined for WKT without a full WKT parser, I would think.
  • If an add-on package was to provide a parser for GML it could implement projection(::GML) but this would override the definition at https://github.com/JuliaGeo/GeoFormatTypes.jl/blob/285866a5a4d75df7df3d516bf49d51b59a5c07e2/src/GeoFormatTypes.jl#L157. May be good to just have a generic fallback for one of the abstract geo format types which returns some kind of "no parser exists for this GeoFromat" so that individual parser packages can define the parsers on demand.

As far as I can tell the datums etc in Geodesy.jl are mostly orthogonal to this package?

Unfortunately, not really! Datum is inescapable (at least for high accuracy ~decimeter level measurements). When you measure a point using a GNSS receiver, what are you really measuring with respect to? It's not the satellite constellation! The satellites are just a mechanism for transferring the reference frame defined by an ensemble of ground stations to the handset position. That is, the satellite positions themselves are defined relative to this ensemble of ground stations. If you choose different ground stations, you have a different datum. WGS84 is the datum defined by a set of ground stations maintained by the US department of defense (eg, see here: https://www.ga.gov.au/scientific-topics/positioning-navigation/wgs84 ). The ITRF datums (and regional refinements like ETRS) are defined by a different set of ground stations.

Of course, these are just some thoughts from a quick read and may not be applicable to you right now. So feel free to do with them what you will.

@rafaqz
Copy link
Member Author

rafaqz commented Nov 22, 2019

By orthogonal I mean the datum is a property of the data, and this package simply labels the format of the data and knows very little about what is contained in it. It's just explicitly categorising format standards to make it easy to write combinatorial method dispatch, it doesn't actually do anything :)

I think the main conflict is that I included the method projection and probably shouldn't have!!

The CRS for some geoformats may not be embedded within them. Eg, you can have WKT without the CRS. At least this was valid last I knew about it. So projection is not well defined for WKT without a full WKT parser, I would think.

Absolutely. Same goes for GML as you say - that's how I imagined it would work. This kind of organised type pyracy seems not that uncommon in Base type packages, especially for Julia Base methods dispatching on shared types.

But I think having the projection method might be overstepping the purpose of this package, it just seems like a very cheap bonus to having things explicitly wrapped, but maybe should be deleted. I was also throwing up wether to use projection or crs and swapped crs for projection because crs breaks the "avoid vague acronyms" rule. I actually didn't know what it meant the first time I read it in a package, but projection is obvious, though as you say somewhat erroneous.

I kind of assumed someone would pull me up on it too, so projection can totally go back to crs, or just get deleted, especially seeing it's already complicated discussion of what is otherwise a dead simple package.

Edit: the acronym thing is a larger question here too. I used KML because I don't think people know or care that the K stands for Keyhole, and they know what kml is. It also follows the name of XML so people know roughly what it is by similarity with a widely known standard. But WKT isn't universally known to be Well Known Text so I avoided the acronym. But it's all somewhat arbitrary.

@c42f
Copy link
Member

c42f commented Nov 22, 2019

This kind of organised type pyracy seems not that uncommon in Base type packages, especially for Julia Base methods dispatching on shared types.

Yeah, I don't mean to suggest that this form of type piracy is bad. It's a very reasonable way of splitting code into a lightweight core where the detail can be implemented elsewhere. All I mean is that implementing some foo(::GML) for concrete type GML should only be done in one package. If you want a fallback which says "not implemented", that's good to implement on an abstract type. That way, when loading a package which does implement foo(::GML), it adds to the method table rather than deleting and replacing an existing method.

Edit: the acronym thing is a larger question here too. I used KML because I don't think people know or care that the K stands for Keyhole, and they know what kml is.

Agreed. The larger goal of the "avoid acronyms" thing is to improve clarity and reduce unnecessary jargon so it's definitely subjective. But people will only learn that the company Keyhole existed via kml, not the other way around. At that stage, kml has become the proper name for this thing.

(Side note, I've always thought "well known text" was a really terrible name for a format. The expansion says virtually nothing about the format or the domain in which it's used.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants