Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PROJ 6 drops +datum= specification; transform degraded in workflows #1146

Closed
rsbivand opened this issue Sep 11, 2019 · 39 comments
Closed

PROJ 6 drops +datum= specification; transform degraded in workflows #1146

rsbivand opened this issue Sep 11, 2019 · 39 comments

Comments

@rsbivand
Copy link
Member

rsbivand commented Sep 11, 2019

This is linked to r-spatial/discuss#28.

This zip contains a simple reprex, the Broad Street pump location in a shapefile used in ASDAR (https://asdar-book.org/), includes a +datum= specification. There are also two R scripts each for sf and rgdal, one pair just using +towgs84=, the other pair using +nadgrids=. The first pair were run with PROJ 4.9.3/GDAL 2.4.2 and PROJ 6.2.0/GDAL 3.0.1 (output Rout files with *4* and *6*), the grid*.R files only under PROJ 6.2.0/GDAL 3.0.1 to use the new PROJ search path extension mechanism tested in test.cpp.

reprex.zip

Basic findings: for PROJ 4.9.3/GDAL 2.4.2, the +datum= spec gives a reasonable transformation of the input to EPSG:4326. For PROJ 6.2.0/GDAL 3.0.1, the +datum= spec is dropped on reading in sf and rgdal, and variants of +towgs84= specs need to be added manually to the CRS to get a reasonable transformation (not off by ~100m).

Side finding: writing the object with a +towgs84= with the ESRI Shapefile driver in both sf and rgdal drops the +towgs84=, but preserves it for the GPKG driver.

Only for PROJ 6: using the NTv2 grid in the Europe datumgrid zip archive indicated by projinfo gives a good transformation with both sf and rgdal using a CRS modified by adding +nadgrids= and extending the search path to include the current directory (where the *.gsb file is located, not included in zipfile, see projinfo output for download link).

Conclusion: Moving to PROJ 6 in software is feasible, but typical workflow input may suffer invisible degradation during processing for rgdal and sf workflows and their reverse dependencies. @tim-salabim (re: mapview) @mtennekes (re: tmap) @Nowosad (re. spData).

@Nowosad
Copy link
Contributor

Nowosad commented Sep 12, 2019

@rsbivand - thank you for your work in this regard. Do you have any advice on how to proceed with those changes? How can I manually check which dataset in spData could be problematic, and then how to decide on the correct +towgs84 values?

@rsbivand
Copy link
Member Author

I'm thinking that sf and rgdal and any other packages with transform functions may need to include a test on startup to see whether current PROJ diverges from PROJ 4.9.3. So far, CRAN Windows binaries are 4., so the effects are seen only where the packages are installed from source against PROJ 6.. If a package only offers files, like spData, the consequences will only occur once the objects have been read in, so *.rda and native GIS format behaviour will probably differ (the *.rda freeze the CRS representation, but we don't know how a CRS with a +datum= string will be handled - yet). One way is to mapview::mapview or tmap in view mode in the two settings and check visually; another to add the equivalent transformation to EPSG:4326 and check for equivalence of a sample point across the two settings. The QGIS approach is to ask the user to choose the transformation path manually, which is not appropriate in workflows and scripts. For geocompr, I guess adding (hidden) code to trap changes is feasible, say on ch 6. Maybe a blog / addendum to ch 6 might be a goal?

@mtennekes
Copy link
Contributor

Trying to link the concepts ellipsoid/spheroid, datum, grid. As far I as understand it:

  • the spheroid (+ellps) defines the global shape of the Earth with the distances from the center to the pole and from the center to the equator.
  • the datum (+datum) is then just a modification (i.e. transformation, rotation and scaling) of this spheroid
  • in PROJ.4, the +towgs84 tag is always needed, since transformations between CRS's always go via wgs84?
  • What is a grid exactly? What I can imagine, is that it is regular grid of lat long coordinates, with as values the difference in elevation between the surface defined by the spheroid. What is the granularity of these grids? I haven't had time to download them myself.
  • Last week in Muester, I also thought about combining the two different worlds: one of the projections/datums, and one of the grid systems like Google S2 and Uber H3. Maybe I am too naive, but it sounds like when combining one of these grid systems with the delta-elevation grid I had in mind in my previous point, we don't need to have all these different complex projections. Depending on the level of detail that is needed for the application, different versions of the delta-elevation grid are required.

@rsbivand
Copy link
Member Author

@edzer teaches using Datums and Map Projections by Iliffe and Lott https://www.whittlespublishing.com/Datums_and_Map_Projections. Maybe we need his course online?

@edzer
Copy link
Member

edzer commented Sep 13, 2019

+towgs84 gives 3 or 7 parameters for a parametric shift, datum grids give the shift for every location up to some resolution, but can then be interpolated. Reading the book Roger mentioned will make you realize that datum transformation will always be there, and will never be simple.

@rsbivand
Copy link
Member Author

I'm begining to suspect that the GDAL3/PROJ6 combination is unkind to the shapefile driver. I'll try later to add a shapefile without a +datum= but with an equivalent +towgs84= copied in manually from GDAL 2.4.2 data/datum:shift.csv (446.448 | -125.157 | 542.06 | 0.15 | 0.247 | 0.842 | -20.489 I think). We'll need to check a range of drivers to see which handle input CRS differently too.

@edzer
Copy link
Member

edzer commented Sep 15, 2019

I just submitted sf 0.8-0 to CRAN addressing the PROJ 6.2.0 issues seen on debian testing, and those caused by the changed tidyr 1.0-0 API.

@Nowosad
Copy link
Contributor

Nowosad commented Oct 31, 2019

Dear @rsbivand, @edzer and the rest of #rspatial developers,
I still try to wrap my head around the future changes related to the PROJ library (PROJ6 and after). What is the future of the proj4string definitions? Are there plans to (1) keep it, (2) drop it from #rspatial, (3) add some other way of defining projections (e.g., WKT), or (4) do something different?
(https://gdal.org/api/ogr_srs_api.html#_CPPv416OSRExportToProj420OGRSpatialReferenceHPPc).

@edzer
Copy link
Member

edzer commented Oct 31, 2019

Good question - what do you suggest?

@rsbivand
Copy link
Member Author

As of now, rgdal on R-Forge (1.5.1, very unstable, rev 862) when built with GDAL >= 3 and PROJ >= 6 will issue warnings when a DATUM is in the input vector SRS but discarded in the PROJ string. Next step to extend this to raster (see http://fwarmerdam.blogspot.com/2010/03/in-last-few-weeks-i-believe-i-have-made.html).

See thread on gdal-dev.

New list_coordOps() takes two CRS strings and returns coordinate operation pipelines. The easiest CRS strings are such as "EPSG:4326". For sf, users are going to have to insert EPSG codes, but there is somewhere to put them in a "crs" object.

In sp and dependencies, "CRS" is S4, so no free slot. I suggest the same kludge as for rings and holes, that is to add a comment() to "CRS" objects to contain the CRS string now needed. This needs to update the "projargs" slot, and needs checking too. Updating it would largely be manual, and users would have to be alerted to the need to do so.

The CRS string could also be added to sf "crs" objects, checking for NULL (as with a NULL comment), to provide somewhere to put WKT, URN, PROJJSON or other string representations. There still isn't clarity about where to put the EPOCH if known.

Feedback on rgdal very welcome. Most ideas should propagate to sf without extra difficulty. Comment now if this matters for you, @mdsumner made useful comments on the gdal-dev thread.

Notes on r-spatial/discuss#28 about packages depending on sf and rgdal for coordinate operations entered last week.

@rsbivand
Copy link
Member Author

rsbivand commented Oct 31, 2019

Maybe also extend proj string degradation testing to towgs84 and nadgrids; datum was the most obvious problem, but towgs84 is also checked in the current code, and the warning qualified if a towgs84 makes it through OK (helps with -DACCEPT_USE_OF_DEPRECATED_PROJ_API_H pj_transform()), but won't help with PROJ 6 transformation I think (these need +type=crs too anyway).

@rsbivand
Copy link
Member Author

@Nowosad : could you see whether current rgdal tickles any bugs with the spData vector files?

@edzer
Copy link
Member

edzer commented Oct 31, 2019

Link to the gdal-dev thread.

@rsbivand
Copy link
Member Author

R-Forge rgdal rev 864 adds raster read warnings (we'll see if it builds on old PROJ/GDAL). Next checks for discarded PROJ string keys/tags on write for vector and raster.

@Nowosad
Copy link
Contributor

Nowosad commented Nov 2, 2019

@rsbivand should I check spData on dev rgdal using PROJ5 or PROJ6?

@rsbivand
Copy link
Member Author

rsbivand commented Nov 2, 2019 via email

@rsbivand
Copy link
Member Author

rsbivand commented Nov 12, 2019

This is the WIP-vignette from R-Forge/rgdal 1.5-1 rev 888. Comments welcome!
PROJ6_GDAL3.zip

RFC now on http://rgdal.r-forge.r-project.org/articles/PROJ6_GDAL3.html, no need to unzip.

edzer added a commit that referenced this issue Nov 29, 2019
edzer added a commit that referenced this issue Nov 29, 2019
edzer added a commit that referenced this issue Nov 29, 2019
edzer added a commit that referenced this issue Nov 29, 2019
@edzer
Copy link
Member

edzer commented Nov 29, 2019

I started working on a wkt2 branch, anticipating GDAL3/PROJ6, and assuming @rsbivand 's sp fork will be merged into master sp. I assume that GDAL3 implies PROJ6. This branch:

  1. adds a field wkt2 to crs objects, which is only filled if GDAL >= 3.0.0, and is otherwise NA_character_
  2. derives wkt2 from the dataset read by OGR (dominant use case I assume), or from an EPSG set (as in st_crs(4326)), or else from the proj4string (as in st_crs("+proj=longlat"))
  3. uses, if wkt2 is not NA, always the wkt2 field to set a coordinate reference system on a dataset e.g. in st_write or st_transform and the target CRS in st_transform
  4. uses comment(x) for conversion from CRS (sp) to crs (sf) in case x is of class CRS (sp), if the comment is not NULL
  5. adds a proper S4 conversion from crs to CRS, which sets the comment field too.
  6. prints crs objects with their wkt2 field in case it is not NA

I now see warnings of this kind:

< Warning message:
< In showSRID(uprojargs, format = "PROJ", multiline = "NO") :
<   Discarded datum Amersfoort in CRS definition

emitted by (Roger's forked) sp, and am not sure sf should do that too (and when).

Still todo:

  • In the process of leaning more on wkt2 than proj4strings, I think that the print methods for sf and sfc should have a one-line summary of the crs that takes from the wkt rather than the proj4.
  • run / reproduce Roger's experiments here
  • offer the possibility to show more than one st_transform path, and the ability to choose a secondary option

@edzer
Copy link
Member

edzer commented Dec 17, 2019

We now have this, which reproduces Roger's example:

library(sf)
# Linking to GEOS 3.8.0, GDAL 3.0.2, PROJ 6.2.1
p = read_sf(system.file("gpkg/b_pump.gpkg", package="sf"))
st_transform(p, 3857)$geom[[1]]
# POINT (-15218.79 6712598)
# on Broadway St.: try mapview::mapview(p)
crs = st_crs(p)
crs$wkt2 = NULL # remove
st_crs(p) = crs
# Warning message:
# st_crs<- : replacing crs does not reproject data; use st_transform for that 
st_transform(p, 3857)$geom[[1]]
# POINT (-15040.05 6712507)

@rsbivand
Copy link
Member Author

Good. I guess we take things in two steps, leaving the CDN and grid deployment for the next step?

@edzer
Copy link
Member

edzer commented Dec 17, 2019

Yes. And thanks for the video. I watched it entirely. Fascinating! "Why are they doing this to me before I retire"

@rsbivand
Copy link
Member Author

"They"'ll think of something juicy for you ... convert everything to s2?

@rsbivand
Copy link
Member Author

rsbivand commented Dec 17, 2019

Aha, I pushed commits to my fork of sp to relax the need to have rgdal >= 1.5-1 installed first, but I have a check NOTE that I haven't solved for rgdal < 1.5-1 - for obvious reasons functions introduced in 1.5-1 are not used but:

* checking dependencies in R code ... NOTE
Missing or unexported objects:
  ‘rgdal::checkCRSArgs_ng’ ‘rgdal::new_proj_and_gdal’

I can't see how to use globalVariables() or suppressForeignCheck() in this context.

Earlier the conditions were run together, so an < 1.5-1 instance just failed, so this is much better, and maybe CRAN would accept an sp with a NOTE given the circumstances.

@edzer
Copy link
Member

edzer commented Dec 17, 2019

I suggest we try the latter option. It will go away when we update rgdal too, right?

@rsbivand
Copy link
Member Author

Yes, it does.

@edzer
Copy link
Member

edzer commented Dec 17, 2019

Here is the modified printing of headers of sf and sfc objects. It

  • mentions geographic/projected
  • if not NA, uses the srs->GetName() from GDAL, unless this is "unkown"
  • in case of unknown falls back to printing the proj4string
  • if NA, prints only a single line
library(sf)
# Linking to GEOS 3.8.0, GDAL 3.0.2, PROJ 6.2.1
p = st_read(system.file("gpkg/b_pump.gpkg", package="sf"))
# Reading layer `b_pump' from data source `/home/edzer/R/x86_64-pc-linux-gnu-library/3.6/sf/gpkg/b_pump.gpkg' using driver `GPKG'
# Simple feature collection with 1 feature and 1 field
# geometry type:  POINT
# dimension:      XY
# bbox:           xmin: 529393.5 ymin: 181020.6 xmax: 529393.5 ymax: 181020.6
# projected CRS:  Transverse_Mercator
# on Broadway St.: try mapview::mapview(p)
st_transform(p, 3857)
# Simple feature collection with 1 feature and 1 field
# geometry type:  POINT
# dimension:      XY
# bbox:           xmin: -15218.79 ymin: 6712598 xmax: -15218.79 ymax: 6712598
# projected CRS:  WGS 84 / Pseudo-Mercator
# epsg (SRID):    3857
#   cat                      geom
# 1   1 POINT (-15218.79 6712598)
st_transform(p, 4326)
# Simple feature collection with 1 feature and 1 field
# geometry type:  POINT
# dimension:      XY
# bbox:           xmin: -0.1367127 ymin: 51.5133 xmax: -0.1367127 ymax: 51.5133
# geographic CRS: WGS 84
# epsg (SRID):    4326
#   cat                       geom
# 1   1 POINT (-0.1367127 51.5133)
st_transform(p, "+proj=laea")
# Simple feature collection with 1 feature and 1 field
# geometry type:  POINT
# dimension:      XY
# bbox:           xmin: -10539.23 ymin: 5518341 xmax: -10539.23 ymax: 5518341
# proj4string:    +proj=laea +lat_0=0 +lon_0=0 +x_0=0 +y_0=0 +datum=WGS84 +units=m +no_defs
#   cat                      geom
# 1   1 POINT (-10539.23 5518341)
suppressWarnings(st_crs(p) <- NA)
p
# Simple feature collection with 1 feature and 1 field
# geometry type:  POINT
# dimension:      XY
# bbox:           xmin: 529393.5 ymin: 181020.6 xmax: 529393.5 ymax: 181020.6
# CRS:            NA
#   cat                      geom
# 1   1 POINT (529393.5 181020.6)

@edzer
Copy link
Member

edzer commented Dec 17, 2019

This leaves only one TODO, but I don't think the last TODO is essential before releasing this, it's a new feature really (though a nice one). I also want to allow giving a +pipe ... transformation, directly to PROJ, such that e.g. one can swap x and y coordinates.

First priority now would be writing an sf vignette and/or r-spatial blog explaining the changes so far, as they come with GDAL3 and PROJ6.

@Nowosad
Copy link
Contributor

Nowosad commented Feb 16, 2020

@edzer I am unsure if you seen it, but maybe it could be useful as a source of inspiration/disussion - https://jorisvandenbossche.github.io/blog/2020/02/11/geopandas-pyproj-crs/

@edzer
Copy link
Member

edzer commented Feb 17, 2020

Thanks @Nowosad , helpful! I like their output, it is more human-readable than WKT.

@rsbivand
Copy link
Member Author

I think the formatting is somewhere here: https://github.com/pyproj4/pyproj/blob/master/pyproj/_crs.pyx, or nearby.

@rsbivand
Copy link
Member Author

Now +towgs84=: #1328

@edzer
Copy link
Member

edzer commented Jul 8, 2020

If anyone is interested in taking up WKT formatting/parsing as a project, I'd be interested to hear!

@edzer edzer closed this as completed Jul 8, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants