01_preparing-climate-data.Rmd

---
editor_options: 
  chunk_output_type: console
---

# Preparing historical climate data and geographical covariates

The overall aim is to prepare historical climate data associated with specific locations, and to link the data to geoghraphical covariates.
This will prepare the data for a semi-spatial statistical model that will be used to predict historical climatic conditions across the study area.

## Load libraries
```{r}
library(tidyverse)
library(lubridate)
library(terra)
library(sf)
```

## Reading raw historical climate data

We first read in historical climate data from the Western Ghats, and convert them to a format suitable for further processing. This data was obtained from IIT Guwahati and the Indian Meteorological Department. Please contact the lead author for a copy of this data. We will add this to Zenodo as a release upon publication.

The climate data are in the form of delimited text files with four columns for the daily total precipitation, minimum temperature, maximum temperature, and wind speed. Each file represents readings associated with a particular location, with coordinates indicated by the file name. We read the coordinates from the file names to assign them to each dataset. The number of rows represents the number of days of data, with a start date of 1st of January 1870, and an end date of 31st December 2018.

We further classify each daily reading by the year and month, and assign each day to one of two seasons: rainy (June through November) or dry (December through May).
We combine the data and filter for the southern Western Ghats as our area of interest based on the latitudinal range \[9, 12\], and write them to file.

```{r}
# List the climate data files and read them in, assigning column names
clim_files <- list.files("data/climate/historical", full.names = TRUE)
data <- map(clim_files, read_delim,
  delim = " ", col_names = c("ppt", "tmin", "tmax", "wind")
)

# Prepare a sequence of dates
seq_date <- seq(
  ymd("1870-01-01"),
  ymd("2018-12-31"),
  by = "days"
)

# Read the coordinates from each file name
# for each date, add the year and month, as well as the season
data <- map2(
  data, clim_files,
  function(df, clf) {
    # get the coordinates
    coords <- str_extract_all(clf, "\\d+\\.\\d+")[[1]]
    mutate(
      df,
      # add the date, year, and month, and season
      # dry season is december to may 12 - 5
      date = seq_date,
      year = year(date),
      month = month(date),
      season = if_else(between(month, 6, 11), "rainy", "dry"),

      # convert coordinates
      x = as.numeric(coords[2]),
      y = as.numeric(coords[1])
    )
  }
)

# combine all data into a single dataframe
data <- bind_rows(data)

# Check that the number of unique coordinates corresponds to the number of
# data files (since one data file per location)
assertthat::assert_that(
  nrow(
    distinct(data, x, y)
  ) == length(clim_files),
  msg = "error in some weather station coordinates"
)

# Filter for southern western ghats
# y < 12 and y > 9
data <- filter(
  data,
  between(y, 9, 12)
)

# save data
write_csv(
  data,
  file = "results/climate/data_clim_daily.csv"
)
```

## Summarising historical climate data

We summarise the historical climate data for each location by calculating the mean daily temperature from the minimum and maximum temperature, and then calculating the monthly total precipitation, mean temperature, and the standard deviation of the mean temperature. The coordinates, year, month, and season are preserved as identifying variables.

```{r}
# Get daily mean temperature
data <- mutate(
  data,
  tmean = (tmax + tmin) / 2
)

# Group data by coordinate and select precipitation and mean daily temp.
data <- group_by(
  data,
  x, y, month, year, season,
) |>
  select(
    ppt, tmean
  )

# Summarise monthly total precip. and mean and SD of temp. mean
data <- summarise(
  data,
  ppt = sum(ppt),
  t_mean = mean(tmean),
  t_sd = sd(tmean)
)

# save summary
write_csv(
  data,
  "results/climate/data_monthly_climate.csv"
)
```

## Prepare study area spatial extent

We prepare a spatial polygon for the extent of our area of interest. All operations are with reference to the subset of historical climate data from the southern Western Ghats.

First collecting the unique locations associated with the data, we prepare a bounding box that covers all points, and then add a 25km buffer (transforming coordinates as appropriate into the UTM 43N coordinate reference system for buffer creation to ensure accurate buffer distances).

```{r}
# Get unique coordinates and create an `sf` points object
coords <- ungroup(data) |>
  distinct(x, y) |>
  mutate_all(
    as.numeric
  )

# Create `sf` spatial object and assign WGS 84 CRS as standard
coord_sf <- st_as_sf(
  coords,
  coords = c("x", "y"),
  crs = 4326
)

# Create a bounding box, transform to UTM 43N CRS, and add a 25km buffer
ext <- st_bbox(coord_sf) |>
  st_as_sfc() |>
  st_transform(32643) |>
  st_buffer(25000) |>
  st_transform(4326)
```

## Prepare spatial data for the study area

We use the buffered spatial extent of the study area to subset (or crop) raster spatial data. We save these cropped data for further use as they are smaller and easier to work with than the full datasets.

## Elevation data

We read in elevation data and crop the raster to the buffered study area.

```{r masking_spatial_rasters}
# Load elevation raster and crop by buffered study area extent
elevation <- terra::rast("data/elevation/alt")
elevation_hills <- terra::crop(elevation, as(ext, "Spatial"))

# Save the smaller cropped elevation data in spatial TIF format
terra::writeRaster(
  elevation_hills,
  "results/elevation/raster_elevation.tif",
  overwrite = TRUE
)
```

## Distance to coast data

We prepare a synthetic raster layer for the distance of points in the study area from the coastline.
Reading in vector spatial data for the boundary of India, we 

```{r}
# Load coastline and re-load elevation
coast <- st_read("data/landcover/India_Boundary.shp")
coast_line <- st_cast(coast, "MULTILINESTRING")

elev <- terra::rast("results/elevation/raster_elevation.tif")
elev_utm <- terra::project(elev, "epsg:32643")
```

We resample the elevation raster from 30m to 600m as this is the resolution of interest (converting to a UTM CRS when passing the resolution in metres). The resampled raster is also in UTM 43N projection, and we covert that WGS 84 for future use.

```{r}
# Resample the elevation raster to 600 m from 30m
elev_utm_600 <- terra::rast(ext(elev_utm), resolution = 600)
elev_utm_600 <- terra::resample(
  elev_utm, elev_utm_600
)
crs(elev_utm_600) <- "epsg:32643"

# recast to 4326
elev_600 <- terra::project(elev_utm_600, "epsg:4326")
```

We prepare the distance-to-coast raster by calculating the distance between each cell in the 600m resolution elevation raster and the cost vector data; we refer to this as the coast raster on occasion.
We mask (set to `NA`) all cells in the coast raster that are not on land.
**Note that** this operation takes a long time to run!

```{r}
# Coast raster is a distance raster to the India land boundary vector
# masked by the same vector; this sets all cells in the sea to zero

# NOTE: this code takes a very long time to run!
coast_raster <- terra::distance(elev_600, terra::vect(coast_line))
coast_raster <- terra::mask(coast_raster, terra::vect(coast))
```

## Latitude data

We prepare a raster containing the latitude of each cell of the 600m resolution elevation raster in order to use this as a latitude predictor.

**Note** the use of `terra::deepcopy()` to make a copy of the elevation raster; this is to prevent the values of the elevation raster chaning when the copy's values are re-assigned. See the help for `terra::deepcopy()` for more.

```{r}
# Make a deep copy of the elevation data, and assign the Y coordinate
# of each cell as the latitude
# NOTE: using terra::deepcopy() is a MUST to prevent reassignment of values
# in elev_600
lat_raster <- terra::deepcopy(elev_600)
terra::values(lat_raster) <- terra::yFromCell(
  lat_raster,
  cell = seq_along(values(lat_raster))
)
```

We combine the three data rasters into layers of a stack, and save them for later use.

```{r}
# Stack each data layer into a single stack
stack <- c(coast_raster, elev_600, lat_raster)
names(stack) <- c("coast", "elev", "lat")

terra::writeRaster(
  stack,
  "results/climate/raster_gam_stack.tif",
  overwrite = TRUE
)
```

## Link predictors to climate data

Here we link the geographical predictors --- elevation, distance from the coast, and latitude --- to the monthly weather data locations, in order to model the relationship between them.

This is to help us get a statistical model that can be used to predict historical climate over the entire study area, in different historical periods.

```{r}
# Extract the elevation, distance from the coast, and latitude
# at each historical climate location
coords <- mutate(
  coords,

  # extract elevation
  elev = terra::extract(
    elevation_hills,
    st_coordinates(coord_sf)
  )$alt,

  # get distance to coastline
  coast = as.numeric(
    st_distance(coord_sf, coast_line)
  ),

  # get latitude
  lat = y
)

# Link by coordinates to the monthly climate data summary
data <- left_join(
  data,
  coords
)

# Bin data into 10 year intervals
data <- mutate(
  data,
  year_bin = floor(year / 10) * 10
)

# Save data
write_csv(
  data,
  "results/climate/data_for_gam.csv"
)
```