-
Notifications
You must be signed in to change notification settings - Fork 1
/
01_preparing-climate-data.Rmd
293 lines (233 loc) · 9.25 KB
/
01_preparing-climate-data.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
---
editor_options:
chunk_output_type: console
---
# Preparing historical climate data and geographical covariates
The overall aim is to prepare historical climate data associated with specific locations, and to link the data to geoghraphical covariates.
This will prepare the data for a semi-spatial statistical model that will be used to predict historical climatic conditions across the study area.
## Load libraries
```{r}
library(tidyverse)
library(lubridate)
library(terra)
library(sf)
```
## Reading raw historical climate data
We first read in historical climate data from the Western Ghats, and convert them to a format suitable for further processing. This data was obtained from IIT Guwahati and the Indian Meteorological Department. Please contact the lead author for a copy of this data. We will add this to Zenodo as a release upon publication.
The climate data are in the form of delimited text files with four columns for the daily total precipitation, minimum temperature, maximum temperature, and wind speed. Each file represents readings associated with a particular location, with coordinates indicated by the file name. We read the coordinates from the file names to assign them to each dataset. The number of rows represents the number of days of data, with a start date of 1st of January 1870, and an end date of 31st December 2018.
We further classify each daily reading by the year and month, and assign each day to one of two seasons: rainy (June through November) or dry (December through May).
We combine the data and filter for the southern Western Ghats as our area of interest based on the latitudinal range \[9, 12\], and write them to file.
```{r}
# List the climate data files and read them in, assigning column names
clim_files <- list.files("data/climate/historical", full.names = TRUE)
data <- map(clim_files, read_delim,
delim = " ", col_names = c("ppt", "tmin", "tmax", "wind")
)
# Prepare a sequence of dates
seq_date <- seq(
ymd("1870-01-01"),
ymd("2018-12-31"),
by = "days"
)
# Read the coordinates from each file name
# for each date, add the year and month, as well as the season
data <- map2(
data, clim_files,
function(df, clf) {
# get the coordinates
coords <- str_extract_all(clf, "\\d+\\.\\d+")[[1]]
mutate(
df,
# add the date, year, and month, and season
# dry season is december to may 12 - 5
date = seq_date,
year = year(date),
month = month(date),
season = if_else(between(month, 6, 11), "rainy", "dry"),
# convert coordinates
x = as.numeric(coords[2]),
y = as.numeric(coords[1])
)
}
)
# combine all data into a single dataframe
data <- bind_rows(data)
# Check that the number of unique coordinates corresponds to the number of
# data files (since one data file per location)
assertthat::assert_that(
nrow(
distinct(data, x, y)
) == length(clim_files),
msg = "error in some weather station coordinates"
)
# Filter for southern western ghats
# y < 12 and y > 9
data <- filter(
data,
between(y, 9, 12)
)
# save data
write_csv(
data,
file = "results/climate/data_clim_daily.csv"
)
```
## Summarising historical climate data
We summarise the historical climate data for each location by calculating the mean daily temperature from the minimum and maximum temperature, and then calculating the monthly total precipitation, mean temperature, and the standard deviation of the mean temperature. The coordinates, year, month, and season are preserved as identifying variables.
```{r}
# Get daily mean temperature
data <- mutate(
data,
tmean = (tmax + tmin) / 2
)
# Group data by coordinate and select precipitation and mean daily temp.
data <- group_by(
data,
x, y, month, year, season,
) |>
select(
ppt, tmean
)
# Summarise monthly total precip. and mean and SD of temp. mean
data <- summarise(
data,
ppt = sum(ppt),
t_mean = mean(tmean),
t_sd = sd(tmean)
)
# save summary
write_csv(
data,
"results/climate/data_monthly_climate.csv"
)
```
## Prepare study area spatial extent
We prepare a spatial polygon for the extent of our area of interest. All operations are with reference to the subset of historical climate data from the southern Western Ghats.
First collecting the unique locations associated with the data, we prepare a bounding box that covers all points, and then add a 25km buffer (transforming coordinates as appropriate into the UTM 43N coordinate reference system for buffer creation to ensure accurate buffer distances).
```{r}
# Get unique coordinates and create an `sf` points object
coords <- ungroup(data) |>
distinct(x, y) |>
mutate_all(
as.numeric
)
# Create `sf` spatial object and assign WGS 84 CRS as standard
coord_sf <- st_as_sf(
coords,
coords = c("x", "y"),
crs = 4326
)
# Create a bounding box, transform to UTM 43N CRS, and add a 25km buffer
ext <- st_bbox(coord_sf) |>
st_as_sfc() |>
st_transform(32643) |>
st_buffer(25000) |>
st_transform(4326)
```
## Prepare spatial data for the study area
We use the buffered spatial extent of the study area to subset (or crop) raster spatial data. We save these cropped data for further use as they are smaller and easier to work with than the full datasets.
## Elevation data
We read in elevation data and crop the raster to the buffered study area.
```{r masking_spatial_rasters}
# Load elevation raster and crop by buffered study area extent
elevation <- terra::rast("data/elevation/alt")
elevation_hills <- terra::crop(elevation, as(ext, "Spatial"))
# Save the smaller cropped elevation data in spatial TIF format
terra::writeRaster(
elevation_hills,
"results/elevation/raster_elevation.tif",
overwrite = TRUE
)
```
## Distance to coast data
We prepare a synthetic raster layer for the distance of points in the study area from the coastline.
Reading in vector spatial data for the boundary of India, we
```{r}
# Load coastline and re-load elevation
coast <- st_read("data/landcover/India_Boundary.shp")
coast_line <- st_cast(coast, "MULTILINESTRING")
elev <- terra::rast("results/elevation/raster_elevation.tif")
elev_utm <- terra::project(elev, "epsg:32643")
```
We resample the elevation raster from 30m to 600m as this is the resolution of interest (converting to a UTM CRS when passing the resolution in metres). The resampled raster is also in UTM 43N projection, and we covert that WGS 84 for future use.
```{r}
# Resample the elevation raster to 600 m from 30m
elev_utm_600 <- terra::rast(ext(elev_utm), resolution = 600)
elev_utm_600 <- terra::resample(
elev_utm, elev_utm_600
)
crs(elev_utm_600) <- "epsg:32643"
# recast to 4326
elev_600 <- terra::project(elev_utm_600, "epsg:4326")
```
We prepare the distance-to-coast raster by calculating the distance between each cell in the 600m resolution elevation raster and the cost vector data; we refer to this as the coast raster on occasion.
We mask (set to `NA`) all cells in the coast raster that are not on land.
**Note that** this operation takes a long time to run!
```{r}
# Coast raster is a distance raster to the India land boundary vector
# masked by the same vector; this sets all cells in the sea to zero
# NOTE: this code takes a very long time to run!
coast_raster <- terra::distance(elev_600, terra::vect(coast_line))
coast_raster <- terra::mask(coast_raster, terra::vect(coast))
```
## Latitude data
We prepare a raster containing the latitude of each cell of the 600m resolution elevation raster in order to use this as a latitude predictor.
**Note** the use of `terra::deepcopy()` to make a copy of the elevation raster; this is to prevent the values of the elevation raster chaning when the copy's values are re-assigned. See the help for `terra::deepcopy()` for more.
```{r}
# Make a deep copy of the elevation data, and assign the Y coordinate
# of each cell as the latitude
# NOTE: using terra::deepcopy() is a MUST to prevent reassignment of values
# in elev_600
lat_raster <- terra::deepcopy(elev_600)
terra::values(lat_raster) <- terra::yFromCell(
lat_raster,
cell = seq_along(values(lat_raster))
)
```
We combine the three data rasters into layers of a stack, and save them for later use.
```{r}
# Stack each data layer into a single stack
stack <- c(coast_raster, elev_600, lat_raster)
names(stack) <- c("coast", "elev", "lat")
terra::writeRaster(
stack,
"results/climate/raster_gam_stack.tif",
overwrite = TRUE
)
```
## Link predictors to climate data
Here we link the geographical predictors --- elevation, distance from the coast, and latitude --- to the monthly weather data locations, in order to model the relationship between them.
This is to help us get a statistical model that can be used to predict historical climate over the entire study area, in different historical periods.
```{r}
# Extract the elevation, distance from the coast, and latitude
# at each historical climate location
coords <- mutate(
coords,
# extract elevation
elev = terra::extract(
elevation_hills,
st_coordinates(coord_sf)
)$alt,
# get distance to coastline
coast = as.numeric(
st_distance(coord_sf, coast_line)
),
# get latitude
lat = y
)
# Link by coordinates to the monthly climate data summary
data <- left_join(
data,
coords
)
# Bin data into 10 year intervals
data <- mutate(
data,
year_bin = floor(year / 10) * 10
)
# Save data
write_csv(
data,
"results/climate/data_for_gam.csv"
)
```