Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up and enhance reading from (and writing to) a GRASS GIS database #93

Merged
merged 34 commits into from
Jun 18, 2024

Conversation

florisvdh
Copy link
Collaborator

@florisvdh florisvdh commented Jun 16, 2024

This PR attempts to address the following:

  • read_RAST(), read_VECT(): implement the standalone GDAL-GRASS drivers by default. This avoids exporting a GRASS GIS layer to a temporary file before reading into R. As a result one will get higher reading rates; this effect will increase with the size of the layer. Notes:
    • In read_RAST() this has only been implemented for the terra case, as it is the default.
    • Some issues still exist with the driver, such as a missing EPSG code. This is referred in the documentation. However the data are successfully returned.
    • The v.out.ogr and r.out.gdal pathways (writing a temporary file) remain in place when the necessary GDAL-GRASS driver is missing or if the use_gdal_grass_driver argument is set as FALSE.
    • in case the terra object already refers to a GRASS GIS database layer (i.e. has been created using the GDAL-GRASS driver), write_RAST() and write_VECT() will prevent calls to GRASS GIS to import such a source, as this would not succeed.
    • this closes Experimenting with reading straight from the GRASS database #75.
  • read_VECT(): provide support for the SpatVectorProxy class by adding a proxy argument (see terra::query() to use SpatVectorProxy in R).
  • arguments ignore.stderr and Sys_ignore.stdout: implement them in the default execGRASS() approach (uses system2()) – they were only implemented in the legacy approach – and make sure both arguments are exposed by read_RAST() and read_VECT().
  • write_VECT(): when the SpatVector object refers to a source file, a temporary GeoPackage is no longer written and the file is used directly by v.in.ogr. A similar shortcut was already implemented for write_RAST().

Further testing, especially with Windows and macOS are most welcome!

A small benchmark is shown below.

Code to create the benchmark
library(dplyr, warn.conflicts = FALSE)
library(rgrass)
#> GRASS GIS interface loaded with GRASS version: (GRASS not running)
grass_conf <- initGRASS(
  gisBase = "/usr/lib/grass83",
  gisDbase = "/home/floris/grassdata",
  location = "nc_basic_spm_grass7",
  mapset = "PERMANENT",
  override = TRUE
)
vec_without_gdalgrass <- bench::mark(
  vec_without_gdalgrass = read_VECT(
    "streets",
    use_gdal_grass_driver = FALSE,
    ignore.stderr = TRUE
  ),
  iterations = 4
)
suppressWarnings({
  vec_with_gdalgrass <- bench::mark(
    vec_with_gdalgrass = read_VECT(
      "streets",
      use_gdal_grass_driver = TRUE
    ),
    iterations = 4
  )
})
rast_without_gdalgrass <- bench::mark(
  rast_without_gdalgrass = read_RAST(
    "elevation_shade",
    use_gdal_grass_driver = FALSE,
    ignore.stderr = TRUE
  ),
  iterations = 10
)
rast_with_gdalgrass <- bench::mark(
  rast_with_gdalgrass = read_RAST(
    "elevation_shade",
    use_gdal_grass_driver = TRUE
  ),
  iterations = 10
)

bind_rows(
  vec_without_gdalgrass,
  vec_with_gdalgrass,
  rast_without_gdalgrass,
  rast_with_gdalgrass
) |> 
  select(expression, min, median, iterations = n_itr) |> 
  knitr::kable()
expression min median iterations
vec_without_gdalgrass 12.92s 13.09s 3
vec_with_gdalgrass 4.13s 4.18s 4
rast_without_gdalgrass 476.39ms 484.12ms 9
rast_with_gdalgrass 143.08ms 146.35ms 9

Created on 2024-06-16 with reprex v2.1.0

Some system specs:

$ ogrinfo --version
GDAL 3.8.4, released 2024/02/08
$ 
$ apt policy libgdal-grass
libgdal-grass:
  Installed: 1:1.0.2-7+jammy3
  Candidate: 1:1.0.2-7+jammy3
  Version table:
 *** 1:1.0.2-7+jammy3 500
        500 http://ppa.launchpad.net/ubuntugis/ubuntugis-unstable/ubuntu jammy/main amd64 Packages
        100 /var/lib/dpkg/status
     3.4.1-3 500
        500 http://ftp.belnet.be/ubuntu jammy/universe amd64 Packages
$ 
$ inxi -Sxxx
System:
  Host: xxx Kernel: 5.15.0-112-generic x86_64 bits: 64 compiler: gcc
    v: 11.4.0 Desktop: Cinnamon 6.0.4 tk: GTK 3.24.33 wm: muffin vt: 7
    dm: LightDM 1.30.0 Distro: Linux Mint 21.3 Virginia
    base: Ubuntu 22.04 jammy

vect() supports a missing layer argument, so supporting this too for GDAL-GRASS driver.

Also, for efficiency reasons, this commit only requests layer names in case
of polygons, to work around OSGeo/gdal-grass#46.
Copy link
Owner

@rsbivand rsbivand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am deeply sceptical about this for the following reasons:

  1. terra must be built against a GDAL installation that knows about the plugin for its GDAL version and platform. GDAL cannot be built in this way unless all the GRASS libraries it needs also being built. This may work for source installs on Linux/Unix, but cannot work for MXE/Rtools for Windows or for macOS/recipes, because using terra or sf or other R packages building against GDAL do not need that large extra burden. Using OSGEO4W will not work as MXE/Rtools build static for the chosen compilers using UCRT, but OSGEO4W has a mess of CRTs.
  2. Previous experience even with the plugin was a nightmare, as the GDAL+plugin constantly got out of synch with other components. Since the plugin was dropped by GDAL, I haven't even tried to touch it. Any modifications would need very careful prior attention to all the build scenarios. Saying that the user has to sort these out is not going to work.
  3. terra doesn't read raster data at all, but rather sets up an environment in C++ code pointing to the data source layer, unless forced to the in.memory setting. I really don't see any difference in the sense that when the "spatRaster" is in.memory, terra shadows the memory by a temporary file anyway, so instead of r.out.gdal then rast setting up pointers to the file created by r.out.gdal, terra would likely set up pointers to the single GRASS raster, but create a shadow file itself should the data be needed in the R workspace.
  4. The vect case is slightly different, because terra always reads into a "spatVector" object, but I'm unsure whether there is always a shadow file.
  5. If 1 and 2 can be shown to be feasible within reasonable developer time constraints. careful tests would be needed to show that the inclusion of the unproven (read only) driver is robust and scales.
  6. Given more time, I could find even more reasons for not pursuing the plugin; data transfer speed really isn't important - the chosen raster format "RRASTER" is not compressed and is native for terra, and was upgraded by me to support WKT2-2019.

@hellik
Copy link
Collaborator

hellik commented Jun 16, 2024

Point 1 by @rsbivand is an important one regarding winGRASS. Different GDAL versions built by different compilers may interfere. That's a packaging issue.

Though I find it an interesting approach worth to be followed.

The fallbacks are the methods currently used?

@florisvdh
Copy link
Collaborator Author

florisvdh commented Jun 17, 2024

The fallbacks are the methods currently used?

@hellik yes, if GDAL is not aware of the GDAL-GRASS driver (so indeed it needs to be set up correctly), the current implementation is used. In that case GRASS GIS is called with r.out.gdal or v.out.ogr to first write a temporary file either with the RRASTER (raster) or the GPKG (vector) GDAL driver, respectively.

For a binary driver installation that works out of the box on Ubuntu-based systems, one can use the ubuntugis-unstable PPA, which is where the official GRASS GIS release lives, and install libgdal-grass. It is on par with its GRASS GIS and GDAL dependencies in the same repo. Still, to be user-friendly and work out of the box on all systems, it would indeed be great to have it natively included in GDAL.

These are the driver issues encountered a few days ago and posted at the driver repo, with help of @neteler: OSGeo/gdal-grass#46, OSGeo/gdal-grass#47, OSGeo/gdal-grass#48.

@rsbivand
Copy link
Owner

So also this is bringing back code for example at: https://r-forge.r-project.org/scm/viewvc.php/pkg/rgrass7/R/bin_link.R?root=spgrass#l330, dropped on transfer to github. It was dropped because effectively nobody used the plugin, and because it was read-only. Please try some comparisons writing objects from R to GRASS using the plugin, as I think you suggest that it can write. I understand that this will only work for users with terra built from source against a GDAL built with the GRASS and OGR_GRASS drivers, so is effectively a reversion to how spgrass6 and rgrass7 used to work. It will not be available on platforms where terra is not built with driver-enabled GDAL.

@rsbivand
Copy link
Owner

Running rgrass built from the gdal_grass branch fails CMD check with this log:
00check.log

@florisvdh
Copy link
Collaborator Author

suggest that it can write

No, it's only relevant to read_RAST() and read_VECT(), since indeed it's a read-only driver. So in this PR the driver functionality is only added in read_RAST() and read_VECT().

In relation to the driver, write_*() functions were only updated to refuse objects that already refer to a GRASS data source.

Apart from that, this PR also adds some read_*() and write_*() functionality that is agnostic to which approach is being used (see PR description).

It will not be available on platforms where terra is not built with driver-enabled GDAL.

GDAL needs no specific configuration to use the driver; it uses its autoload mechanism to detect the driver and make it available. See https://github.com/OSGeo/gdal-grass/blob/main/README.md:

With this driver package you configure and install GDAL normally, then build and install GRASS normally and finally build and install this driver in GDAL's "autoload" directory.

On my system it was sufficient for {terra} to have been compiled against the GDAL version that was in place before the additional driver was installed. Also no additional GDAL_DRIVER_PATH environment variable (as documented at https://github.com/OSGeo/gdal-grass/blob/main/README.md) was needed for the driver to be detected, since the binary driver from the ubuntugis-unstable PPA is installed in an expected path:

$ locate gdalplugins | grep GRASS
/usr/lib/x86_64-linux-gnu/gdalplugins/gdal_GRASS.so
/usr/lib/x86_64-linux-gnu/gdalplugins/ogr_GRASS.so

@florisvdh
Copy link
Collaborator Author

Running rgrass built from the gdal_grass branch fails CMD check with this log: 00check.log

Thanks. It's introduced by 682c7fc, 'needed' in order to not get the same error on my system, where there's no rsb mapset.

So since this error now pops up on your side I guess on your machine the rsb mapset will be the active mapset, so that write_RAST() writes the layers there, invoking the error.

The solution will be to be explicit about the mapset where new layers will be written. I suggest that we create a custom mapset explicitly in the examples, so that written layers end up there, not in PERMANENT. Do you agree @rsbivand?

@rsbivand
Copy link
Owner

Yes, @florisvdh creating a mapset to write into is a good idea in the examples; will you implement it?

Copy link
Collaborator

@veroandreo veroandreo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested the PR and works nicely, thanks @florisvdh! I just needed to create an .Renviron file with the GDAL_DRIVER_PATH="/usr/lib/gdalplugins" variable in $HOME and update terra to the latest version.

In the course of this fiddling with mapsets, it was seen (by accident) that the mapset
region settings are not respected by the GDAL driver.

This only affects raster.

This is a no-go for the GDAL-GRASS raster driver as long as it doesn't expose an option to
respect the mapset's current region, or at least to set a region.

A feature request will be made at the driver's repo, but as long as it's not implemented,
we cannot consider using the driver for rasters since results with and without using the
driver must of course have the same extent and resolution.

For sake of completeness, this commit takes control of the region too, in order to
guarantee that things always work. But this will be dropped from the examples in a next
commit, followed by dropping driver-related code from read_RAST() and write_RAST().
It was seen (by accident) that the mapset region settings are not respected by the GDAL
driver.

This only affects raster.

This is a no-go for the GDAL-GRASS raster driver as long as it doesn't expose an option to
respect the mapset's current region, or at least to set a region.

A feature request will be made at the driver's repo, but as long as it's not implemented,
we cannot consider using the driver for rasters since results with and without using the
driver must of course have the same extent and resolution.

So next is dropping driver-related code from read_RAST() and write_RAST().
, 4389249) *

In the course of this fiddling with mapsets, it was seen (by accident) that the mapset
region settings are not respected by the GDAL driver.

This only affects raster.

This is a no-go for the GDAL-GRASS raster driver as long as it doesn't expose an option to
respect the mapset's current region, or at least to set a region.

A feature request has been made at the driver's repo
(OSGeo/gdal-grass#49). As long as it's not implemented, we
cannot consider using the driver for rasters since results with and without using the
driver must of course have the same extent and resolution.
@florisvdh
Copy link
Collaborator Author

Thanks for testing @veroandreo !

@florisvdh
Copy link
Collaborator Author

florisvdh commented Jun 18, 2024

@rsbivand Setting a mapset in the examples of this PR has now been taken care of (in readRAST.Rd and readVECT.Rd).

I discovered that the vect2neigh() example in readVECT.Rd still somehow modifies the PERMANENT mapset by writing 3 extra layers into it (based on the census dataset). This happens even though the custom mapset is active. Since I won't soon have time to dig into this function (already returning from the GRASS GIS meeting), I have outcommented its example so that PERMANENT remains untouched.

@florisvdh
Copy link
Collaborator Author

While fiddling with mapsets and regions yesterday, it was discovered that the GDAL-GRASS driver does not respect a mapset's region. It just reads out a raster layer as it is stored in the GRASS GIS database. This is an important difference with r.out.gdal, which applies the mapset's region settings (extent & resolution).

As this causes inconsistency between the r.out.gdal approach (i.e. writing a temporary file for use by {terra}) and the GDAL-GRASS driver approach (direct reading), we cannot consider uptake of the GDAL-GRASS raster driver already.

Consequently I have reverted the use of the GDAL-GRASS raster driver in this PR (a828acf).

A corresponding driver feature request has been posted at OSGeo/gdal-grass#49, so at least awaiting a solution for that one before reconsideration. For the driver it makes sense to take into account the region, since this GDAL driver is tailored for GRASS GIS.

Thanks @neteler and @veroandreo for the discussion about this yesterday.

Note that for vector datasets the region is not considered in GRASS GIS, so only the raster driver approach has been reverted.

@rsbivand
Copy link
Owner

@florisvdh looks OK now, would you like to merge?

@florisvdh
Copy link
Collaborator Author

@florisvdh looks OK now, would you like to merge?

OK, let's do that. I'll add updates to NEWS.md and the pkgdown website before merging.

@florisvdh florisvdh merged commit 934b528 into main Jun 18, 2024
@florisvdh florisvdh deleted the gdal_grass branch June 18, 2024 17:21
@florisvdh
Copy link
Collaborator Author

@rsbivand you can of course rephrase NEWS.md if needed. Thanks for reviewing this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Experimenting with reading straight from the GRASS database
4 participants