Soilmap: early exploration + creation of soilmap_simple #34

Merged: 11 commits, Apr 10, 2020

florisvdh
Member

Currently, the soilmap branch holds commits from Sep 2019 - Jan 2020:

  • early exploratory work on the raw soilmap data source, in preparation for n2khab::read_soilmap()
  • a bookdown project to create the first version of the soilmap_simple data source, which was created on Jan 22 and released on March 30: https://doi.org/10.5281/zenodo.3732904

New commits will follow to create a successor to the soilmap_simple GeoPackage, as discussed at inbo/n2khab#29. It will have a bsm_converted variable instead of the bsm_ge_coastalplain variable (see the linked issue), and it will store the levels of the _explan variables in a separate, non-spatial table (avoiding an extra 30 MB).

Storing the levels of the explan vars in a separate non-spatial table,
rather than keeping the explan vars as columns, saves about 30 MB.
Following this, experiments were carried out (and discarded!)
to convert all factors to integers, to store even the 'double'-type var
'bsm_poly_id' as an integer, and to store the corresponding levels in
non-spatial tables. However, this saved only about 4 MB, meaning that
storage as 'REAL' and as 'TEXT' in SQLite does not take up much more
space (for 270550 rows) than storage as 'MEDIUMINTEGER'.
Hence, these extra steps were discarded to ease writing and reading.
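
The idea of keeping category explanations in a separate non-spatial table can be sketched as follows. This is only an illustration in Python's standard sqlite3 module, not the R code used in this PR; the table and column names (polygons, texture_code, texture_explan) and the values are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for a GeoPackage file (assumption).
con = sqlite3.connect(":memory:")
con.executescript("""
    -- Main table: one row per polygon, holding only the short code.
    CREATE TABLE polygons (poly_id REAL, texture_code TEXT);
    -- Lookup table: the explanation of each code is stored once.
    CREATE TABLE texture_explan (
        texture_code TEXT PRIMARY KEY,
        explanation  TEXT
    );
""")
con.executemany(
    "INSERT INTO polygons VALUES (?, ?)",
    [(1.0, "Z"), (2.0, "L"), (3.0, "Z")],
)
con.executemany(
    "INSERT INTO texture_explan VALUES (?, ?)",
    [("Z", "sand"), ("L", "loam")],
)

# The explanations are joined back in only when a reader asks for them.
rows = con.execute("""
    SELECT p.poly_id, p.texture_code, e.explanation
    FROM polygons AS p
    LEFT JOIN texture_explan AS e
           ON p.texture_code = e.texture_code
    ORDER BY p.poly_id
""").fetchall()
print(rows)
```

With many rows and long explanation strings, the repeated text is stored once per level instead of once per row, which is where the space saving comes from.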
Non-spatial tables need to be registered in the GeoPackage
in order to conform to the GPKG standard.
For now, this is done manually, even though GDAL is able
to take care of this for GPKG. I found no way to use
this ability from R, so I implemented the easy, though obsolete,
approach of GeoPackage 1.0: registering non-spatial tables as
data_type "aspatial" in the gpkg_contents table; see:
https://gdal.org/drivers/vector/aspatial.html#vector-aspatial

Question on implementation in R, see:
r-spatial/sf#1345

The current way (GPKG 1.2) to handle this (data_type "attributes")
is supported by GDAL, see:
https://gdal.org/drivers/vector/gpkg.html#layer-creation-options .
But doing this manually needs more work, so
I made a GPKG 1.0 compliant version for now.
See https://www.ogc.org/standards/geopackage .
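
A minimal sketch of what manual registration amounts to, assuming only that a row has to be added to gpkg_contents with one of the two data_type values named above. It uses Python's standard sqlite3 module rather than R, the helper register_nonspatial and the explanations table are hypothetical, and the gpkg_contents definition is a simplified stand-in (a real GeoPackage already contains this table). The GDAL and OGC pages linked above describe the complete requirements, which this sketch does not cover.

```python
import sqlite3


def register_nonspatial(con, table_name, data_type="aspatial"):
    """Add a gpkg_contents row for an existing non-spatial table.

    Hypothetical helper: 'aspatial' is the older GDAL extension value,
    'attributes' the value used by newer GeoPackage versions.
    """
    if data_type not in ("aspatial", "attributes"):
        raise ValueError("data_type must be 'aspatial' or 'attributes'")
    con.execute(
        "INSERT INTO gpkg_contents (table_name, data_type, identifier) "
        "VALUES (?, ?, ?)",
        (table_name, data_type, table_name),
    )
    con.commit()


con = sqlite3.connect(":memory:")
con.executescript("""
    -- Simplified stand-in for the gpkg_contents table of a real GeoPackage.
    CREATE TABLE gpkg_contents (
        table_name  TEXT NOT NULL PRIMARY KEY,
        data_type   TEXT NOT NULL,
        identifier  TEXT UNIQUE,
        description TEXT DEFAULT '',
        last_change DATETIME NOT NULL
            DEFAULT (strftime('%Y-%m-%dT%H:%M:%fZ', 'now')),
        min_x DOUBLE, min_y DOUBLE, max_x DOUBLE, max_y DOUBLE,
        srs_id INTEGER
    );
    -- The non-spatial table that needs registering.
    CREATE TABLE explanations (code TEXT PRIMARY KEY, explanation TEXT);
""")

register_nonspatial(con, "explanations")
registered = con.execute(
    "SELECT table_name, data_type FROM gpkg_contents"
).fetchall()
print(registered)
```

The spatial extent columns and srs_id are left NULL here because the table has no geometry.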

Note that GDAL is lenient when reading non-compliant
gpkg files, as it 'will also, by default, list non spatial tables
that are not registered', which is why reading the file with
st_read() already worked before the tables were registered.
See https://gdal.org/drivers/vector/gpkg.html#non-spatial-tables .
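
To find tables that a reader would have to pick up without registration, one can compare the tables present in the file with the rows of gpkg_contents. The sketch below is a hypothetical check written with Python's standard sqlite3 module; it is not GDAL's implementation, the name unregistered_tables and the prefix filter are assumptions, and gpkg_contents is reduced to two columns for brevity.

```python
import sqlite3


def unregistered_tables(con):
    """Return user tables that have no row in gpkg_contents."""
    all_tables = [
        row[0]
        for row in con.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    registered = {
        row[0] for row in con.execute("SELECT table_name FROM gpkg_contents")
    }
    # Assumed naming convention for metadata and index tables to skip.
    system_prefixes = ("gpkg_", "sqlite_", "rtree_")
    return [
        name
        for name in all_tables
        if name not in registered and not name.startswith(system_prefixes)
    ]


con = sqlite3.connect(":memory:")
con.executescript("""
    -- Reduced stand-in for gpkg_contents (a real one has more columns).
    CREATE TABLE gpkg_contents (
        table_name TEXT NOT NULL PRIMARY KEY,
        data_type  TEXT NOT NULL
    );
    CREATE TABLE soil (poly_id REAL, code TEXT);
    CREATE TABLE explanations (code TEXT, explanation TEXT);
    -- Only the layer table is registered.
    INSERT INTO gpkg_contents VALUES ('soil', 'features');
""")

missing = unregistered_tables(con)
print(missing)
```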
Member Author

florisvdh commented Apr 10, 2020

Version soilmap_simple_v2 of the soilmap_simple data source has been released at Zenodo 🎈

See there for a summary of implemented changes.

A copy was also made on our internal 'Q' network drive.

Special thanks to @hansvancalster and @DriesAdriaens for their idea to include the category explanations (optionally provided by read_soilmap(), see vignette 022 in n2khab), to @DriesAdriaens for his findings on the (now obsolete) bsm_ge_coastalplain variable and to @w-jan who recently discovered a cosmetic CRS flaw in our older GPKG files (discussed at r-spatial/sf#1293).

Concluding this PR here. Further testing is welcomed. Issues with the data source can go into https://github.com/inbo/n2khab-preprocessing/issues, issues with n2khab-functionality can go into https://github.com/inbo/n2khab/issues.

@florisvdh florisvdh merged commit 805cfa1 into master Apr 10, 2020
@florisvdh florisvdh deleted the soilmap branch April 10, 2020 13:47