spatialStats_zinc.rmd

---
##-- arkriger: October 2023
title: "Spatial Statistics with R"
output:
  html_document:
    df_print: paged
  html_notebook: default
---
<style> 
p.comment {
background-color: #DBDBDB;
padding: 10px;
border: 1px solid black;
margin-left: 25px;
border-radius: 5px;
font-style: italic;
}

</style>
<div class="alert alert-info"> <strong>NAME: </strong> _[write your name.surname between the brackets (like that name.surname)]_

</div>  

Welcome to this exercise which explores **(Geo)statistical analysis with R**. 

This assignment is an opportunity for you to dive into the `R` statistical package for spatial interpolation and in particular, to start to learn some of the capabilities of the `gstat` package (_and the various commands and some of the graphical outputs which can be produced)_.

<div class="alert alert-block alert-success"> **Our focus is understanding concepts, the analysis, potential and application of these methods.** </div>

We do not cover all the details of the `gstat`. If you are unsure what the parameters for a particular function are or if you wish to explore other arguments related to a function, please consult the `help` files or the relevant package vignettes.

In this assignment, you will work through the two major _**interpolation**_ categories. These are: 

&nbsp;&nbsp;&nbsp;**1. Deterministic Methods**  
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;a. Thiessen polygons  
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;c. Linear Regression  
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;b. Inverse Distance Weighting  
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;d. Ordinary Least Squares  
&nbsp;&nbsp;&nbsp;**2. Stochastic Methods**  
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;a. Variograms and Kriging  
&nbsp;&nbsp;&nbsp;**3.** we also briefly highlight technics to **interrogate the quality of an interpolation** with  
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;a. $n$-fold cross Validation and  
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;b. Residual Mean Squared Error (rmse).

<div class="alert alert-danger">
  <strong>REQUIRED!</strong> 
  
You are required to insert your outputs and any comment into this document. The document you submit should therefore contain the existing text in addition to:

 - Plots and other outputs from executing the code chunks
 - Discussion of your plots and other outputs as well as conclusions reached.
 - This should also include any hypotheses and assumptions made as well as factors that may affect your conclusions.
</div>

To help you  interpret and understand your results, please consult the following resources:

|||
|-|:---|
|**Chapter 15:**|Introduction to Geographic Information Systems [9th ed.] Kang-tsung Chan.|
|**Chapter 8:**|Applied Spatial Data Analysis with R by Bivand et. al. (2008).|

A copy of these are available on [Amathuba](https://amathuba.uct.ac.za/d2l/login?sessionExpired=0&target=%2fd2l%2fle%2flessons%2f30111%2ffolders%2f1057404) inside the `Resources` folder, Study Material & Books

----

```{r install }
options(prompt="> ", continue="+ ", digits=3, width=70,  show.signif.stars=F, repr.plot.width=7, repr.plot.height=7)
rm(list=ls())

# Install necessary packages: You only need to run this part once
##- install.packages(c("sf", "gstat", "lattice", "automap"))))

#library(sp)
library(sf) # 'simple features' representations of spatial objects
library(gstat) # geostatistics
library(ggplot2) # plotting
#library(lattice)
#library(automap)
```

```{r help }
#- help
help(package="gstat")
```

```{r data }
#data(package="sp")
```

<div class="alert alert-success">
  <strong>THE DATASET: This assignment is based on the meuse data set.</strong> 

The data set which can be found in the `gstat` package consists of 155 samples of top soil heavy metal concentrations (ppm), along with a number of soil and landscape variables. The samples were collected in a flood plain of the river Meuse, near the village Stein in the Netherlands. Historic metal mining has caused the widespread dispersal of lead, zinc, copper and cadmium in the alluvial soil. 
</div>

<div class="alert alert-warning"> <strong>QUESTION!</strong> </div>

- **Question 1.**  What would be the purpose of producing a map of zinc or cadmium deposits? Who would use a map of this nature? 

<p class="comment">
[ Answer 1. click in this cell and type your answer here. your answer must be between the outer [] brackets ]
</p>


```{r load-data }
data("meuse", package = "sp")
```

```{r descr-data }
#- describe data
meuse
```

```{r class }
class(meuse)
```

```{r summary }
summary(meuse)
```

Notice the spatial aspect of the data (x and y) are fields. We need to explicitly define the data as a spatial object. We do this through identifying the coordinates.

```{r to-sf }
meuse.sf <- st_as_sf(meuse, coords = c("x","y"))
class(meuse.sf)
```

```{r structure}
str(meuse.sf)
```

<div class="alert alert-warning"> <strong>QUESTION!</strong> </div>

- **Question 2.** Which field refers to the spatial object’s geometry? What is its data type?

<p class="comment">
[ Answer 2. click in this cell and type your answer here. your answer must be between the outer [] brackets ]
</p>

Although we have coordinate values we have yet to define a coordinate reference system (crs)

```{r crs }
st_crs(meuse.sf) <- 28992
print(st_crs(meuse.sf))
```

-----
#### Explore the dataset

<div class="alert alert-warning"> <strong>QUESTION!</strong> </div>

- **Question 3.** Execute the following 5 `blocks (chunks)` of `code` and briefly describe **ALL** the outputs

```{r cadmium-summary }
# Summary Statistics
summary(meuse.sf$zinc) 
summary(log(meuse.sf$zinc))
```

```{r plot-stem-leaf }
# Steam and leaf plot
stem(meuse.sf$zinc) 
stem(log(meuse.sf$zinc))
```

```{r plot-hist }
# Histogram and Q-Q Plots
par(mfrow=c(1,2)) # arranged in 1 row and 2 columns
hist(meuse.sf$zinc, n=20, main = "Histogram of Zinc (ppm)")
abline(v=mean(meuse.sf$zinc), col="red")-
abline(v=median(meuse.sf$zinc), col="blue")

hist(log(meuse.sf$zinc), n=20, main = "Histogram of Log-Zinc (ppm)")
abline(v=mean(log(meuse.sf$zinc)), col="red")
abline(v=median(log(meuse.sf$zinc)), col="blue", main = NULL, sub = NULL)
```
Mean (red) and Median (blue) have been highlighted.

```{r plot-box }
par(mfrow=c(1,2))
boxplot(meuse.sf$zinc, main = "Boxplot of Zinc (ppm)")
boxplot(log(meuse.sf$zinc), main = "Boxplot of Log-Zinc (ppm)")
```

```{r plot-qq }
par(mfrow=c(1,2))
qqnorm(meuse$zinc,  main = "Q-Q Plot of Zinc  (ppm)")
qqnorm(log(meuse$zinc),  main = "Q-Q Plot of Log-Zinc (ppm)")
par(mfrow=c(1,1)) # Reset to default plotting of 1 figure per page
```

<div class="alert alert-info"> <strong>HINT</strong> Describe the original and log transformed variables with respect to data distribution _(symmetry, modality, distribution, variance [or homoescedascity] presence of outliers, etc.)_. Also indicate if the transformation was appropriate for the data.</div>

<p class="comment">
[ Answer 3. click in this cell and type your answers here. your answers must be between the outer [] brackets  
- the summary:  
- the histogram:  
- the boxplot:  
- the qq plot:  
- the mean, median, distribution, etc.
      ]</p>

To add context; the dataset includes a river object.

```{r river }
data(meuse.riv, package="sp")
class(meuse.riv)
```

```{r river-crs }
meuse.riv.sf <- st_sfc(st_linestring(meuse.riv, dim="XY"), crs = st_crs(meuse.sf))
class(meuse.riv.sf)
```

```{r river-summary }
summary(meuse.riv.sf)
```

<div class="alert alert-warning"> <strong>QUESTION!</strong> </div>

- **Question 4.** Execute the following `code chunk` and briefly describe what you observe with respect to the spatial distribution of Cadmium.

```{r points-plot }
#basic plot
plot(meuse.sf['zinc'], asp = 1, pch = 1) 
```

</style>
<div class="alert alert-info"> <strong>HINT:</strong> How are the points distributed over the study area? For example, are they evenly-spaced (gridded)? Random? Denser in some areas?</div> 

<p class="comment">
[ Answer 4. click in this cell and type your answer here. your answer must be between the outer [] brackets ]
</p>

We can add context with the river

```{r points-plot02 }
plot(meuse.sf["zinc"], reset = FALSE, nbreaks = 64, pch = 20, 
     cex=4*meuse.sf$zinc/max(meuse.sf$zinc), 
     #col=rainbow(100),
     main = "Zn concentration [ppm]") 
plot(meuse.riv.sf, add = TRUE)
```

<div class="alert alert-warning"> <strong>QUESTION!</strong> </div>

- **Question 5.** Talk about positions of the data in relation to the bounds of the region, the locations of high and low values and the presence of possible outliers.

<p class="comment">
[ Answer 5. click in this cell and type your answer here. your answer must be between the outer [] brackets ]
</p>

Or we could `plot()` the dataset differently

```{r points-plot03 }
ggplot(data = meuse.sf) + 
  geom_sf(mapping = aes(size=zinc, color=dist.m)) + 
  labs(x = "Longitude", y = "Latitude", title = "Meuse River (NL)", 
       color="Distance to river [m]", size = "Zn concentration, ppm")
```

This specific dataset comes with a 40*40-m interpolation grid (a set of regularly-spaced points that covers the study area and specify locations were predictions will be done)

```{r grid }
data(meuse.grid, package="sp")
class(meuse.grid)
```

```{r grid-names }
names(meuse.grid)
```

Like the sample points; the grid needs to be explicitly defined

```{r grid-crs }
meuse.grid.sf <- st_as_sf(meuse.grid, coords = c("x", "y"))
st_crs(meuse.grid.sf) <- st_crs(meuse.sf)
summary(meuse.grid.sf)
```

If you explore the grid object you'll notice it contains several attributes

```{r grid-summary }
summary(meuse.grid.sf)
```

Its even possible to plot some of the attributes

```{r ffreq-plot }
plot(meuse.grid.sf["ffreq"], pch = 15,
main = "Meuse River, flooding frequency classes")
```

<div class="alert alert-warning"> <strong>QUESTION!</strong> </div>

- **Question 6.** What is the meaning of flood frequency class 1? What is its spatial distribution? 

</style>
<div class="alert alert-info"> <strong>HINT:</strong> Consider using the `help` function.</div> 

<p class="comment">
[ Answer 6. click in this cell and type your answer here. your answer must be between the outer [] brackets ]
</p>

___

<div class="alert alert-success">
  <strong>Why do we need to interpolate?</strong> 

Generally we want to _interpolate_ because **we either do not have the resources _(money and time)_ or its simply not feasible** to collect data everywhere. 

Our data will have gaps. When these cases arrive, and they often do, we need a method to _predict_ values where we have none. 
</div>

## 1. Deterministic (non-geostatistical) Methods 

**These are models where arbitrary** _(random)_ **or empirical** _(observed)_ **parameters are used with no estimates of the model error and no assumptions about the variability of a feature.** 

In this exercise we will cover four.

### 1. a. Thiessen polygons

Also called a voronoi diagram these methods are characterized by abrupt edge changes with only one point for each prediction.

```{r voronoi-vector }
#- Voronoi tesselation
pnts <- st_union(meuse.sf)
voronoi_grid <- st_voronoi(pnts)
voronoi_sf <- st_sf(geometry = st_collection_extract(voronoi_grid, "POLYGON")) |>
  st_join(meuse.sf[,"zinc"])
```

```{r plot-voronoi-vector }
#-- plot
plot(voronoi_sf[,"zinc"],
     xlim=c(178605,181390),
     ylim=c(329714,333611), main="Thiessen Polygons")
#title("Tesselated Voronoi Surface")
```
Or we could create a tessellated surface directly

```{r voronoi }
# Voronoi tesselation
thiessen = krige(log10(zinc) ~ 1, meuse.sf, meuse.grid.sf, nmax = 1)
```

```{r plot-voronoi }
#-- plot
pts.s <- list("sp.points", meuse.sf, col="white",pch=20)
plot(thiessen["var1.pred"], pch=15, main="Thiessen Polygons")
```
Notice we have **not _interpolated_**. We have assigned a value to an area.

___

## 1. b. Regression Modelling 

A common non-spatial (feature-space) approach to **_approximation_** is to model one variables’ distribution (the dependent or response variable) by one or more other variables (the independent or predictor variables). This is sometimes commonly called **_'regression modelling'_**.

<div class="alert alert-success"> <strong>CONCEPT CHECK: Regression Modeling and Interpolation</strong> 

While this might seem excessive; the words we use here are relevant and their application is determined by the challenge. 

When we **interpolate** we look for a function that fits the values of a dataset **exactly**. Here we say: given $n$ $(x_i, y_i)$ points we look for a function $F$ that satisfies $F(x_i, y_i) = z_i$. 

Very often, within the spatial domain what this mean is; we generally have a coordinate value $(x, y)$ and want to predict a $z$. Additionally $F$ is typically a polynomial (1st, 2nd, etc) or a Spline.

**Regression** refers to fitting a function to minimise some cost; typically **_sum of squares error_**. What this means is given a dataset of $n$ points regression looks to **_find the line of best fit_**. 

Regression can approximate while Interpolation can predict. Interpolation goes further than regression because it typically uses a regression model to predict values **between** _(at upsampled locations)_ a given dataset. 

We start with Regression and follow on with Interpolation.

</div>

<div class="alert alert-warning"> <strong>QUESTION!</strong> </div>

- **Question 7.**  Use the following 8 `code chunks` to perform the various analyses required for a regression model with distance to river as an independent variable.

```{r plot-dist }
# Map of distance to river: 
meuse.grid.sf$sqrtdist <- sqrt(meuse.grid.sf$dist)
plot(meuse.grid.sf["sqrtdist"], pch = 15, 
     #sp.layout = list("sp.points", meuse.sf, col = 3, cex=.5), 
     main = "Distance to river")
```

```{r plot-scatter }
# Map of distance to river: 
plot(log(zinc)~sqrt(dist), as.data.frame(meuse.sf),
     main="Scatterplot of log(Zn) vs. Squ. dist.")
abline(lm(log(zinc)~sqrt(dist), meuse.sf))#, col="blue")
```

```{r fit-regression }
# Fit the regression model
zinc.lm <- lm(log(zinc)~sqrt(dist), meuse.sf)
```

```{r summary-reg }
# Get a summary of the regression model
summary(zinc.lm)
```

```{r plot-diagnostic }
# Get diagnostic plots
layout(matrix(1:4, ncol=2))
plot(zinc.lm, add.smooth = FALSE)
layout(1)
```

<!-- # ```{r plot-diagnostic02 } -->
<!-- # # Get diagnostic plots -->
<!-- # layout(matrix(1:4, ncol=2)) -->
<!-- # plot(zinc.lm, add.smooth = FALSE) -->
<!-- # layout(1) -->
<!-- # ``` -->

```{r predict-std-error }
# Get Predicted Values and Standard Error of Fit for all locations on the grid
meuse.grid.sf$lzn.fit <- predict(zinc.lm, meuse.grid.sf)
meuse.grid.sf$se.fit <- predict(zinc.lm, meuse.grid.sf, se.fit=TRUE)$se.fit
```

```{r plot-preds }
# Plot the predicted values
plot(meuse.grid.sf['lzn.fit'], pch = 15, #"lzn.fit", #sp.layout = meuse.lt,
     main = "Log(Zinc) - ppm: Regression model values")
```

```{r plot-std-fit }
# Plot the Standard Error of fit
plot(meuse.grid.sf["se.fit"], pch = 15,
     main = "Log(Zinc) - ppm: Regression model Standard Error of fit")
```

<div class="alert alert-info"> <strong>HINT</strong> You should include a discussion on the nature of relationship between the dependent and independent variable, the model fit: model summary and regression diagnostics [say which, (if any), of the assumptions underlying linear regression are violated and the quality of interpolation with respect to the distribution and density of controls points etc. </div>

<p class="comment">
[ Answer 7. click in this cell and type your answers here. your answers must be between the outer [] brackets  
- the distance to river:  
- the scatter plot:  
- the regression summary:  
- the diagnostic plots:  
- the standard error, prediction and standard fit:
      ] </p>

___

### 1. b. Inverse Distance Weighting (IDW)

With IDW we start **_interpolating_**. This means we convert point data, of numerical values, into a continuous surface and visualize how the data may be distributed across space.

IDW _interpolates_ point data by using a weighted average of a variable from nearby points to predict the value of that variable for each location. The weighting of the points is determined by their inverse distances; drawing on [Tobler’s first law of geography](https://en.wikipedia.org/wiki/Tobler%27s_first_law_of_geography) which states: 

<div align="center">
_'everything is related to everything else, but near things are more related than distant things'_. </div>

____
<div class="alert alert-success"> <strong>CONCEPT CHECK:</strong> This is a concept we will come back to when we introduce spatial correlation.
</div>

<div class="alert alert-warning"> <strong>QUESTION!</strong> </div>

- **Question 8.**  Use the following `code chunk` to generate interpolated surfaces of log (zinc) concentrations obtained using Inverse-Distance Weighting (IDW) with different power functions (`idp`). Discuss all outputs and draw conclusions. Give reasons for your conclusions.

```{r idw }
# Run the IDW interpolation. check the effect of power (p) on the IDW interpolation by changing the value of "idp"

meuse.grid.sf$idwp05 = idw(log(zinc)  ~ 1, meuse.sf, meuse.grid.sf, idp = 0.5)$var1.pred
meuse.grid.sf$idwp25 = idw(log(zinc)  ~ 1, meuse.sf, meuse.grid.sf, idp = 2.5)$var1.pred
meuse.grid.sf$idwp5 = idw(log(zinc)  ~ 1, meuse.sf, meuse.grid.sf, idp = 5)$var1.pred
meuse.grid.sf$idwp10 = idw(log(zinc)  ~ 1, meuse.sf, meuse.grid.sf, idp = 10)$var1.pred

plot(meuse.grid.sf[c("idwp05",  "idwp25", "idwp5", "idwp10")], pch=15,
     main = "Log(Cadmium) - ppm , IDW Interpolation")
```

<div class="alert alert-info"> <strong>HINT</strong> You should say something about the role of the power function and the quality/behaviour of the predictions with respect to the distribution and density of controls points.</div>

<p class="comment">
[ Answer 8. click in this cell and type your answer here. your answer must be between the outer [] brackets ] </p>

___

## 1. c. Trend Surface Analysis with Ordinary Least Squares (OLS)

We are now firmly in the realm of _Trend Surface Analysis_. What we mean by a **_trend surface_** is: a variable (say $z$, zinc deposits, elevation, etc.) can be expressed as some smooth function of coordinates ($x_i, y_i$). A polynomial function of the coordinates similar to:

$$
    Z = 𝜷_0 + 𝜷_1x + 𝜷_2y + ϵ
$$

where: $Z$ is the is the elevation of the surface at $(x,y)$; $𝜷$ are possible coefficients representing the average elevation _(and possibly slope and curvature)_ in the $x$ and $y$ directions and $ε$ is the error term accounting for small fluctuations and deviations from the average.

In other words we; _predict_ values at upsampled locations (the `meuse grid`) with a 1st-, 2nd- or 3rd-order polynomial. The higher the degree of the polynomial the more the surface matches the original data. We should however take care.

We want realistic results within our dataset. **The higher the degree the more extreme the extrapolations.**

<div class="alert alert-warning"> <strong>QUESTION!</strong> </div>

- **Question 9.**  Run the following `code chunk` to perform trend surface interpolation. Compare the three interpolation maps and discuss the results. Which one gives a better fit? Why?

```{r ols }
#-- trend surface up to degree 3
#- Note: tr1, tr2 and tr3 are names for Trend surface of order 1, 2 and 3 respectively

meuse.grid.sf$tr1 = krige(log(zinc) ~ 1, meuse.sf, meuse.grid.sf, degree = 1)$var1.pred
meuse.grid.sf$tr2 = krige(log(zinc) ~ 1, meuse.sf, meuse.grid.sf, degree = 2)$var1.pred
meuse.grid.sf$tr3 = krige(log(zinc) ~ 1, meuse.sf, meuse.grid.sf, degree = 3)$var1.pred
plot(meuse.grid.sf[c("tr1", "tr2", "tr3")], pch=15,
     main = "Log(zinc) - ppm Trend Surface Interpolation") #main="IDW") #"idwp1",

```

<div class="alert alert-info"> <strong>HINT</strong> Talk about the appropriateness of trend orders with respect to the nature of the interpolated surface, the quality of interpolation with respect to the distribution and density of controls points.</div>

<p class="comment">
[ Answer 9. click in this cell and type your answer here. your answer must be between the outer [] brackets ] </p>

___

### 2. Variogram and Kriging

While Ordinary Least Squares (OLS) predicts with a polynomial and applies a general error term we can in fact go further. 

One of the principles of Spatial Data Science is; **_data is spatially correlated_**. Patterns and particularly residuals (errors) have a spatial structure / are connected.

We can model the spatial structure of residuals with a **variogram**; and then use this model to refine a trend surface with Kriging; a powerful geostatistical interpolation method.

First; we investigate the variance of the dataset _(how the observed values change)_ then model and later fit this model to a special form of linear interpolation: Kriging. 


<div class="alert alert-success"> <strong>CONCEPT CHECK: Think about what this means.</strong> 

We determine the structure (spatial correlation) of the dataset. And we use this knowledge to refine our prediction.  
</div>

### - The Variogram

In geostatistics, **spatial correlation** is modeled with a **variogram**. What it does is explain the degree to which nearby locations have similar values using _semi-variance_. 

<div class="alert alert-warning"> <strong>QUESTION!</strong> </div>

- **Question 10.**  Use the next set of commands to estimate, model and fit the variogram and comment on the output, plots and results.

```{r h-scatter }
###################################################
### 8.4 Estimating Spatial Correlation: The Variogram - Bivand et al. (2008)
### 8.4.1 Exploratory Variogram Analysis
###################################################

### h-scatterplots/ lagged scatterplots
hscat(log(zinc)~1, meuse.sf, (0:8)*100)
```

```{r v-cloud }
### Variogram cloud
plot(variogram(log(zinc) ~ 1, meuse.sf, cloud = TRUE))
```

```{r binned-variogram }
###################################################
### Sample variogram (binned variogram) plot of (8.4)
###################################################
plot(variogram(log(zinc) ~ 1, meuse.sf))
```

```{r variogram-anistropy }
###################################################
### Variograms in four different angles
###################################################
plot(variogram(log(zinc) ~ 1, meuse.sf, alpha = c(0, 45, 90, 135)))
```

```{r variogram-cut }
###################################################
### Override the default cutoff and interval width values ###  See Bivand et al. (2008)
###################################################
plot(variogram(log(zinc) ~ 1, meuse.sf, cutoff = 1000, width = 50))
```

```{r variogram-distance }
###################################################
### Specifying interval for the distance vector  - See Bivand et al. (2008)
###################################################
variogram(log(zinc) ~ 1, meuse.sf, boundaries = c(0,50,100,seq(250,1500,250)))
```

```{r variogram }
######################################### ##########
### Variogram in Fig. 8.6 - Bivand et al. (2008)
###################################################
v <- variogram(log(zinc) ~ 1, meuse.sf)
plot(v)
```

```{r variogram-initial }
###################################################
### Initial values for the variogram fit 
###################################################
v.fit <- fit.variogram(v, vgm(1, "Sph", 800, 1))
plot(v, pl = T, model = v.fit)
```

```{r variogram-partial }
###################################################
### Partial fitting of variogram coefficients : PAGE 204 - Bivand et al. (2008)
###################################################
fit.variogram(v, vgm(1, "Sph", 800, 0.06), fit.sills = c(FALSE, TRUE))
```

```{r variogram-anistropy-plot }
###################################################
### 8.4.4 Anisotropy  - Bivand et al. (2008)
###################################################
v.dir <- variogram(log(zinc)~1, meuse.sf, alpha=(0:3)*45) 
v.anis <- vgm(.6, "Sph", 1600, .05, anis=c(45, 0.3))

###################################################
### Fig. 8.7 - Bivand et al. (2008)
###################################################
plot(v.dir, v.anis)
```

```{r variogram-map }
###################################################
### variogram map - Bivand et al. (2008)
###################################################
plot(variogram(log(zinc)~1, meuse.sf, map=TRUE, cutoff=1000, width=100))
```

<div class="alert alert-info"> <strong>HINT</strong> Page numbers or plot numbers from sections with similar outputs from Bivand et al. (2008) have been indicated in the comments attached to the codes. Please refer to those sections/plots for guidance. In general, discuss all outputs (text and plots). What they are, what they do, what assumptions have been made etc.</div>

<p class="comment">
[ Answer 10. click in this cell and type your answer here. your answer must be between the outer [] brackets  
- the lagged scatterplots:  
- the variogram cloud and sample variogram:  
- variogram in four directions and what these indicate:  
- the variogram model and fit. its sill range and nugget:  
      ]</p>

### - Kriging

Now that we have modelled the structure of our dataset _(the spatial correlation / variance)_, with a variogram, we use this knowledge and refine our prediction. We now predict with Kriging.

Kriging allows variance to be non-constant.If you recall the general form of a Least Squares solution from earlier. Notice the error term: $ϵ$. One value applied to the entire dataset. 

$$
    Z = 𝜷_0 + 𝜷_1x + 𝜷_2y + ϵ
$$
The error term applied with Kriging is dependent on the distance between the points; as modeled by the variogram. It are the spatially correlated residuals that we modeled in the previous step that allows us to refine our _prediction_.

<div class="alert alert-warning"> <strong>QUESTION!</strong> </div>

- **Question 11.** Use the next `Chunck` to perform Kriging interpolation and comment on the output, plot and results.


```{r osk-ok-kriging }
###################################################
### 8.5.1 Simple Kriging and Ordinary Kriging  - Bivand et al. (2008)
###################################################
lz.sk <- krige(log(zinc)~1, meuse.sf, meuse.grid.sf, v.fit, beta = 5.9)
lz.ok <- krige(log(zinc)~1, meuse.sf, meuse.grid.sf, v.fit)

par(mfrow=c(1,2)) # arranged in 1 row and 2 columns
plot(lz.sk['var1.pred'], pch=15, nbreaks = 64, main ="Simple Kriging [log10(Zn ppm)]",
     reset = FALSE)
plot(lz.ok['var1.pred'], pch=15, nbreaks = 64, main = "Ordinary Kriging [log10(Zn ppm)]")
par(mfrow=c(1,1)) # Reset to default plotting of 1 figure per page
```

<p class="comment">
[ Answer 11. click in this cell and type your answer here. your answer must be between the outer [] brackets ] </p>


<div class="alert alert-warning"> <strong>QUESTION!</strong> </div>

- **Question 11.**  What other output(s) would you generate to investigate the quality of the Simple and Ordinary Kriging Interpolation?

<div class="alert alert-info"> <strong>HINT</strong> How would you judge the quality of the interpolation? </div>

<p class="comment">
[ Answer 11. click in this cell and type your answer here. your answer must be between the outer [] brackets ] </p>

- **Question 12.**  Do a visual comparison of **_ALL_** the interpolation results from the previous methods and discuss your observations

<div class="alert alert-info"> <strong>HINT</strong> What differences do you see from the interpolated maps you produced in terms of the behaviour of the predictions with respect to the distribution and density of controls points etc. </div>

<p class="comment">
[ Answer 12. click in this cell and type your answer here. your answer must be between the outer [] brackets ] </p>

- **Question 13.**  How else would you compare the results from the various interpolation methods?

<div class="alert alert-info"> <strong>HINT</strong> Summarise section 15.5 of [9th ed.] Kang-tsung Chan. </div>

<p class="comment">
[ Answer 13. click in this cell and type your answer here. your answer must be between the outer [] brackets ] </p>

### 3. Validation

While we have _**looked**_ at, and performed a **_qualitative analysis_** of, how the different interpolations differ we now perform a statistical analysis and execute a **_quantitative assessment_**.

We evaluate the quality of our predictions with two measures.

- $N$-fold cross validation and
- Residual Mean Squared Error (RMSE)


```{r cross }
idw.cv.a <- krige.cv(log(zinc) ~ 1, meuse.sf, nmax = 7, nfold=5, set = list(idp = 2.5))
ols.cv.a <- krige.cv(log(zinc) ~ 1, meuse.sf, degree = 2, nmax = 40, nfold=5)
ok.cv.a  <- krige.cv(log(zinc)~1, meuse.sf, v.fit, nmax = 40, nfold=5)
```

```{r idw-cv }
#- idw-2.5 cv at 5-fold
idw.cv.a[1:5,]
```

```{r ols-cv }
#- 2nd-ols cv at 5-fold
ols.cv.a[1:5,]
```

```{r iok-cv }
#- ok cv at 5-fold
ok.cv.a[1:5,]
```
<div class="alert alert-warning"> <strong>QUESTION!</strong> </div>

- **Question 14.** Explain what we mean with $N$-fold validation and comment on the other validation method offered by the `krige.cv()` function of the `gstat` package.

<p class="comment">
[ Answer 14. click in this cell and type your answer here. your answer must be between the outer [] brackets ] </p>

```{r cv-correlation}
par(mfrow=c(1,3))
plot(var1.pred ~ observed, idw.cv.a, main="IDW 2.5 ", ylab="IDW pred.")
plot(var1.pred ~ observed, ols.cv.a, main="2nd-order OLS", ylab="OLS pred.")
plot(var1.pred ~ observed, ok.cv.a, main="OK", ylab="OK pred.")
par(mfrow=c(1,1)) # Reset to default plotting of 1 figure per page
```
<div class="alert alert-warning"> <strong>QUESTION!</strong> </div>

- **Question 14.** Discuss the correlation between the predicted and the observed values. 

<div class="alert alert-info"> <strong>HINT</strong> Imagine a _**line-of-best-fit**_ running through the correlation. </div>

<p class="comment">
[ Answer 14. click in this cell and type your answer here. your answer must be between the outer [] brackets ] </p>

#### Examine the distribution _(and magnitude)_ of the residuals .

```{r cv-bubble-idw }
plot(idw.cv.a["residual"], main = "log(zinc): IDW 2.5 5-fold CV residuals", #reset = FALSE,
     cex=4*idw.cv.a$residual/max(idw.cv.a$residual), reset = FALSE)
plot(meuse.riv.sf, add = TRUE)
```

```{r cv-bubble-ols }
plot(ols.cv.a["residual"], main = "log(zinc): 2nd-order OLS 5-fold CV residuals",
          cex=4*ols.cv.a$residual/max(ols.cv.a$residual), reset = FALSE)
plot(meuse.riv.sf, add = TRUE)
```
```{r cv-bubble-ok }
plot(ok.cv.a["residual"], main = "log(zinc): OK 5-fold CV residuals",
          cex=4*ok.cv.a$residual/max(ok.cv.a$residual), reset = FALSE)
plot(meuse.riv.sf, add = TRUE)
```
```{r cv-box }
boxplot(idw.cv.a$residual, ols.cv.a$residual, ok.cv.a$residual, main="IDW 2.5 , 2nd-order OLS, OK")
```

```{r rmse-idw}
#- idw 2.5 rmse
sqrt(sum(idw.cv.a$residual^2)/length(idw.cv.a$residual))
```
```{r rmse-ols}
#- 2nd-order ols rmse
sqrt(sum(ols.cv.a$residual^2)/length(ols.cv.a$residual))
```

```{r rmse-iok}
#- ok rmse
sqrt(sum(ok.cv.a$residual^2)/length(ok.cv.a$residual))
```
<div class="alert alert-warning"> <strong>QUESTION!</strong> </div>

- **Question 15.** Explain the residuals. Comment on their magnitude, location and distribution.

<p class="comment">
[ Answer 15. click in this cell and type your answer here. your answer must be between the outer [] brackets ] </p>

<div class="alert alert-block alert-success"> <strong>DISCUSSION!</strong> </div>

- **Question 16.** Having gone through a variety of interpolation methods along with both a **qualitative and quantitative evaluation**; which method would you choose? Motivate your choice. Your answer must be between 100 and 150 words.

<p class="comment">
[ Answer 16. click in this cell and type your answer here. your answer must be between the outer [] brackets ] </p>