-
Notifications
You must be signed in to change notification settings - Fork 0
/
README.Rmd
216 lines (178 loc) · 8.56 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
---
title: "ABMI Soil layers processing"
author: "Tomislav Hengl (EnvirometriX.nl)"
date: '`r Sys.Date()`'
output: github_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
```{r, include=FALSE, message=FALSE, results='hide'}
ls <- c("rgdal", "raster", "terra", "ranger", "sf")
new.packages <- ls[!(ls %in% installed.packages()[,"Package"])]
if(length(new.packages)) install.packages(new.packages)
lapply(ls, require, character.only = TRUE)
```
## ABMI layers
The Agricultural Region of Alberta Soil Inventory Database (AGRASID) contains
various GIS layers that are available publicly via the **[ABMI project page](https://abmi.ca/home/data-analytics/da-top/da-product-overview/Other-Geospatial-Land-Surface-Data/ABMI-Soil-Layers.html)**.
The code below shows how to process the ABMI’s soil layer and gap-fill them so
they can be used for spatial modeling. The dataset consists of two parts:
- a polygon geodatabase with 2 feature classes: vector 1 sq-km grid and corresponding centroids. At the Alberta border the cells are not 1 sq-km.
- a soil summary file.
A common reference column `GRID_LABEL` can be used to join the soil summary with
a feature class.
Note: the gap-filling has been implemented using modeling and hence comes with
potentially significant prediction errors / especially in the extrapolation areas.
**Use at own risk**.
Soil types codes can be accessed from:
```{r}
library(knitr)
abmi.leg = read.csv("./abmi_1km/20230119_soil/ABMI_legend.csv")
kable(head(abmi.leg[,c(3,2,1)]), format = "markdown")
```
## Rasterizing of polygons
We first need to rasterize polygons (664,767 cells) and convert them to GeoTIFFs where each
GeoTIFF represents one soil types.
```{r}
library(rgdal)
library(terra)
library(raster)
library(sf)
library(tidyverse)
if(!exists("g1km.r")){
g1km = sf::read_sf("./abmi_1km/20230119_soil/grid_1sqkm.gpkg")
g1km.xy = as.data.frame(st_centroid(g1km))
## 664,767 cells
## read the soil summary table:
val = vroom::vroom("./abmi_1km/20230119_soil/soil_summary.csv")
g1km.xy = plyr::join(g1km.xy, val)
g1km.xy$X = st_coordinates(g1km.xy$geom)[,1]
g1km.xy$Y = st_coordinates(g1km.xy$geom)[,2]
## generate gridded object:
g1km.r = SpatialPixelsDataFrame(as.matrix(g1km.xy[,c("X","Y")]),
data=g1km.xy[,names(val)], proj4string = CRS("EPSG:3400"), tolerance = 0.820279)
}
```
Some ABMI units are relatively rare and probably not suitable for gap-filling.
We focus instead on the following soil types:
```{r}
t.l = c("BdL", "BlO", "CS", "Cy", "Gr", "LenA", "LenS", "LenSP", "LenT", "Li", "Lo",
"LtcC", "LtcD", "LtcH", "LtcS", "Ov", "Sa", "Sb", "SL", "SwG", "Sy", "TB")
```
Next, we need to remove non-soil map and then can write each ABMI unit as a
separate image with values in 0--100%:
```{r}
rs = rowSums(g1km.r@data[,t.l])
sel.r = which(!rs == 0)
if(!any(file.exists(paste0("./abmi_1km/", t.l, "_1km.tif")))){
x = lapply(t.l, function(i){
r <- round(raster::raster(g1km.r[sel.r,i])/1e6*100);
r = raster::focal(r, w=matrix(1,3,3), fun=mean, na.rm=TRUE, NAonly=TRUE);
raster::writeRaster(r, paste0("./abmi_1km/", i, "_1km.tif"), datatype='INT1U', NAflag=255, options="COMPRESS=DEFLATE")
})
}
```
We can next resample the temporary images to some standard grid covering whole of
Alberta:
```{r, eval=FALSE, echo=FALSE}
x = parallel::mclapply(t.l, function(i){ system(paste0('gdalwarp ./abmi_1km/', i,
'_1km.tif -tr 1000 1000 -te 170000 5426000 866000 6660000 -r \"average\" ./tmp/', i,
'_1km.tif -co \"COMPRESS=DEFLATE\" -srcnodata 255')) }, mc.cores = length(t.l))
```
## Gap filling
Because the AGRASID covers only agricultural region of Alberta, about 30-40% of pixels
have missing values. One way to fill-in those gaps is to use correlation with
soil-forming factors and then use the modeling results to predict all missing pixels.
We can attach number of covariate layers representing relief (elevation, slope, TWI),
climate (e.g. snow probability, [CHELSA Bioclimatic variables](https://chelsa-climate.org/bioclim/)),
[cropland intensity](https://doi.org/10.5194/essd-13-5403-2021), long-term annual EVI based on MOD13Q1 product.
We can load all layers:
```{r}
if(!exists("f1km")){
f1km = raster::stack(c(paste0('./tmp/', t.l, '_1km.tif'),
"./grids1km/2020/lcv.cropland_bowen.et.al_m_1km_s_20200101_20201231_ab_epsg.3402_v20221130.tif",
paste0("./grids1km/static/", c(
#"wetland.grade_abmi_m_1km_s_2000_2014_ab_epsg.3402_v20230120.tif",
"dtm.slope_geomorphon90m_m_1km_s_2018_2019_ab_epsg.3402_v20221201.tif",
"dtm.twi_merit.dem_m_1km_s_2018_2019_ab_epsg.3402_v20221201.tif",
"dtm.elev_glo90_m_1km_s_2019_2020_ab_epsg.3402_v20221201.tif",
"bioclim.1_chelsa.clim_m_1km_s_1981_2010_ab_epsg.3402_v20221201.tif",
"bioclim.12_chelsa.clim_m_1km_s_1981_2010_ab_epsg.3402_v20221201.tif",
"snow.prob_esacci.apr_p.90_1km_s_2000_2012_ab_epsg.3402_v20221201.tif",
"annual.evi_mod13q1.v061_p50_1km_s_20150101_20211231_ab_epsg.3402_v20221201.tif",
"annual.evi_mod13q1.v061_p95_1km_s_20150101_20211231_ab_epsg.3402_v20221201.tif"))))
f1km = as(f1km, "SpatialGridDataFrame")
}
sel.m = rowSums(f1km@data[,paste0(t.l, "_1km")]) > 0
## 262,370
```
Next we need to derive the most probable class per pixel that we can then use for
model training:
```{r}
maxm <- sapply(data.frame(t(as.matrix(f1km@data[which(sel.m),paste0(t.l, "_1km")]))),
FUN=function(x){max(x, na.rm=TRUE)})
## class having the highest membership
cout <- rep(NA, length(which(sel.m)))
for(c in t.l){
cout[which(f1km@data[which(sel.m),paste0(c, "_1km")] == maxm)] <- c
}
f1km$soil = NA
f1km@data[which(sel.m),"soil"] <- cout
f1km$soil = as.factor(f1km$soil)
summary(f1km$soil)
```
This produce the following map:
```{r soil-type-gaps, echo=TRUE, fig.width=8, out.width="60%", fig.cap="Soil type map for Alberta based on most frequent component."}
plot(raster(f1km["soil"]))
```
Now that we have prepared target variable (ABMI units) and all covariates, we can fit a
random forest model to try to explain distribution of soil types using soil-forming
factors (climate, relief, vegetation etc). For this we use the ranger package:
```{r}
library(ranger)
fm.n = names(f1km)[grep("epsg.3402", names(f1km))]
fm.soil = as.formula(paste(" soil ~ ", paste(fm.n, collapse = "+")))
sel = complete.cases(f1km@data[,all.vars(fm.soil)]) & sel.m
if(!exists("m.soil")){
m.soil = ranger::ranger(fm.soil, data=f1km@data[sel,], mtry=5,
num.trees=150, importance="impurity", probability = TRUE)
m.soil
}
```
The model seems to be relatively accurate with OOB error of about 0.12. This means
that the model can be indeed used to gap fill missing pixels.
Variable importance shows that the most important variable is elevation:
```{r}
varImp = data.frame(m.soil$variable.importance)
varImp
```
## Final maps
Now that we have fitted a model to gap-fill missing values, we can bind the
existing and predicted values per soil type. We run this per ABMI unit and then
save the ouput as a probability / fraction image with values 0-100%:
```{r}
new.r = which(is.na(sel.m) & complete.cases(f1km@data[,fm.n]))
pred.soil = predict(m.soil, data = f1km@data[new.r, m.soil$forest$independent.variable.names])
#str(pred.soil)
## write final rasters:
t.i = c("BdL", "BlO", "CS", "Cy", "Gr", "LenA", "LenSP", "Li", "Lo", "LtcC", "LtcD", "Ov", "Sa", "Sb", "SL", "SwG", "Sy", "TB")
for(c in t.i){
out.tif = paste0("./grids1km/static/soiltype.", tolower(c), "_abmi_p_1km_s_2000_2014_ab_epsg.3402_v20230120.tif")
f1km$out = NULL
f1km@data[,"out"] = f1km@data[,paste0(c, "_1km")]
f1km@data[new.r,"out"] = round(100*pred.soil$predictions[,which(c == attr(m.soil$predictions, "dimnames")[[2]])])
rgdal::writeGDAL(f1km["out"], out.tif, drivername = "GTiff", type="Byte", mvFlag=255, options=c("-co COMPRESS=LZW"))
}
```
The whole process can be summarize with the following animation:
```{r comparison, echo=FALSE, fig.width=6, fig.cap="Gap-filling illustration: red color shades indicate fractions (0-100%).", out.width="80%"}
knitr::include_graphics("./img/before_after_gapfilling.gif")
```
Have in mind that the predicted ABMI units might be of poor accuracy, especially
in mountain areas and similar that are significantly different than the training data.
In summary: the gap-filled maps should be used for testing purposes only.
If you discover a bug or have an idea how to advance the analysis, please commit a code + a merge request.
## Acknowledgments
This work has been implemented within the **Alberta Background Soil Quality System**
lead by [Innotech Alberta](https://innotechalberta.ca/).