notesAnalysisPNG.Rmd

---
title: "Notes on data processing and analysis of available Papua New Guinea datasets"
author: "Ernest Guevarra"
date: "16/08/2018"
output: pdf_document
geometry: margin=2cm
classoption: a4paper
fontsize: 12pt
highlight: tango
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
#
# Require maptools, rgeos, rgdal, raster
#
if(!require(maptools, quietly = TRUE)) install.packages("maptools") # If maptools required but not installed, install
if(!require(rgeos, quietly = TRUE)) install.packages("rgeos")       # If rgeos required but not installed, install
if(!require(rgdal, quietly = TRUE)) install.packages("rgdal")       # If rgdal required but not installed, install
if(!require(raster, quietly = TRUE)) install.packages("raster")     # If raster required but not installed, install
if(!require(readxl)) install.packages("readxl")                     # to read Excel files
if(!require(stringr)) install.packages("stringr")                   # to manipulate strings
if(!require(ggplot2)) install.packages("ggplot2")                   # for plotting
if(!require(zoo)) install.packages("zoo")                           # for dates
if(!require(dplyr)) install.packages("dplyr")                       # for data manipulation
if(!require(tidyr)) install.packages("tidyr")                       # for data manipulation
if(!require(classInt)) install.packages("classInt")                 # for classifying for mapping
if(!require(kableExtra)) install.packages("kableExtra")             # for styling tables
#
# install papuanewguinea R package from OMNeoHealth Github; devtools
#
if(!require(devtools, quietly = TRUE)) install.packages("devtools") # If devtools required but not installed, install
install_github("OMNeoHealth/papuanewguinea")                        # Install OMNeoHealth/papuanewguinea from GitHub
library(papuanewguinea)                                             # Load papuanewguinea package
```


# 1. Processing and preparing data for analysis and mapping

Given Microsoft Excel files containing data from the Papua New Guinea NHIS per month for 2015 and 2016, we would like to read each of these files and then concatenate them into a single dataset. This can be done in R as follows:

&nbsp;

```{r, echo = TRUE, eval = TRUE}
#
# Get the filenames of all .xlsx files in folder named "data"
#
fileNames <- list.files(path = "data/", pattern = "\\.xlsx$")
#
# Create a concatenating object
#
png_maternal <- NULL
#
# Loop through each of the XLSX files in data and read them
#
for(i in fileNames) {
  #
  # Use read_xlsx() to read current filename
  #
  temp <- read_xlsx(path = paste("data/", i, sep = ""), 
                    col_names = FALSE, 
                    skip = 3)
  #
  # extract month of current data
  #
  month <- str_split(string = i, pattern = " ")[[1]][1]
  #
  # extract year of current data
  #
  year <- str_split(string = str_split(string = i, pattern = " ")[[1]][2],
                    pattern = ".xlsx")[[1]][1]
  #
  # Add month variable to temp dataset
  #
  temp$month <- month
  #
  # Add year variable to temp dataset
  #
  temp$year <- year
  #
  # concatenate current dataset with png_maternal
  #
  png_maternal <- data.frame(rbind(png_maternal, temp))
}
```

&nbsp;

This results in a data frame object called `png_maternal` with `r ncol(png_maternal)` columns and `r nrow(png_maternal)` rows. The resulting data frame is as follows (first 60 rows):

&nbsp;

```{r, echo = FALSE, eval = TRUE}
knitr::kable(x = png_maternal[1:60, ],
             booktabs = TRUE,                   
             format = "latex") %>%
  kableExtra::kable_styling(latex_options = c("HOLD_position", "striped", "scale_down"),
                            position = "center") %>%
  kableExtra::landscape()
```
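
The month and year extraction used in the loop above can be checked in isolation. The following is a minimal sketch using base R only; the filenames are hypothetical and are assumed to follow a `<month> <year>.xlsx` pattern, which is what the splitting logic in the loop implies.

```{r, echo = TRUE, eval = FALSE}
#
# Hypothetical filenames assumed to follow the "<month> <year>.xlsx" pattern
#
exampleFiles <- c("jan 2015.xlsx", "feb 2016.xlsx")
#
# Drop the file extension, then split the remainder on the space
#
parts <- strsplit(x = tools::file_path_sans_ext(exampleFiles),
                  split = " ", fixed = TRUE)
#
# The first element is the month, the second element is the year
#
exampleMonth <- vapply(parts, FUN = function(p) p[1], FUN.VALUE = character(1))
exampleYear  <- vapply(parts, FUN = function(p) p[2], FUN.VALUE = character(1))
```

Here `exampleMonth` holds `"jan"` and `"feb"` and `exampleYear` holds `"2015"` and `"2016"`, both as character vectors.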

The object `png_maternal` lacks meaningful column names. It would also be useful to derive province and district codes from the facility code in the first column of `png_maternal`. Both can be done in R as follows:

&nbsp;

```{r, echo = TRUE, eval = TRUE}
################################################################################
#
# Get province, district and facility codes
#
################################################################################
#
# Extract first two digits from code
#
png_maternal$pcode <- floor(png_maternal$X__1 / 10000)
#
# pad the pcode with a 0 at the start
#
png_maternal$pcode <- str_pad(string = png_maternal$pcode, 
                              width = 2, side = "left", pad = "0")
#
# Extract first 4 digits from code
#
png_maternal$dcode <- floor(png_maternal$X__1 / 100)
#
# pad the dcode with a 0 at the start
#
png_maternal$dcode <- str_pad(string = png_maternal$dcode, 
                              width = 4, side = "left", pad = "0")
#
# pad the code with a 0 at the start
#
png_maternal$X__1 <- str_pad(string = png_maternal$X__1, 
                             width = 6, side = "left", pad = "0")

################################################################################
#
# Create codebook for PNG maternal mortality data
#
################################################################################

longName  <- c("Five to six-digit facility code",
               "Name of facility",
               "Report received? 1 = YES; 2 = NO",
               "New attendance breastfeeding pills",
               "New attendance combined pills",
               "New attendance injection",
               "Unknown Number 1",
               "Permanent vasectomy",
               "New attendance IUD",
               "New attendance ovulation",
               "New attendance condom",
               "Re-attendance breastfeeding pills",
               "Re-attendance combined pills",
               "Re-attendance injection",
               "Re-attendance IUD",
               "Re-attendance ovulation",
               "Re-attendance condom",
               "Antenatal first visit",
               "Antenatal fourth visit",
               "Antenatal other",
               "Antenatal TT1",
               "Antenatal TT2",
               "Antenatal booster",
               "Unknown Number 2",
               "Deliveries in health facility",
               "Maternal deaths in facility",
               "Birthweight less than 2500 grams",
               "Stillbirths",
               "Village births supervised",
               "Village births complications",
               "Born before arrival",
               "Delivery complications",
               "Maternal deaths not in facility",
               "Transferred to hospital",
               "Month", 
               "Year",
               "Province code",
               "District code")

shortName <- c("code", "facility", "report",
               "bfpills1", "combpills1", "inj1", "uno1", "vasectomy", "iud1", 
               "ovulation1", "condom1", "bfpills2", "combpills2", "inj2", 
               "iud2", "ovulation2", "condom2", "anc1", "anc4", "ancother", 
               "tt1", "tt2", "ttbooster", "uno2", "delhf", "deadhf", "lbw", 
               "still", "vbsup", "vbcomp", "bba", "delcomp", "deadnothf", 
               "transhop", "month", "year", "pcode", "dcode")

names(png_maternal) <- shortName
```

&nbsp;

Checking the `png_maternal` object again, we see that the columns have been labelled with more meaningful names and that the corresponding province and district codes have been added.

```{r, echo = FALSE, eval = TRUE}
knitr::kable(x = png_maternal[1:60, ],
             booktabs = TRUE,
             format = "latex") %>%
  kableExtra::kable_styling(latex_options = c("HOLD_position", "striped", "scale_down"),
                            position = "center") %>%
  kableExtra::landscape()
```
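
To make the code derivation concrete, here is a small sketch on two hypothetical facility codes. It uses base R `sprintf()` in place of `str_pad()`, and the example codes are made up for illustration.

```{r, echo = TRUE, eval = FALSE}
#
# Hypothetical five-digit and six-digit facility codes
#
exampleCode <- c(10203, 120304)
#
# Province code: drop the last four digits, pad to two characters
#
examplePcode <- sprintf("%02d", as.integer(exampleCode %/% 10000))
#
# District code: drop the last two digits, pad to four characters
#
exampleDcode <- sprintf("%04d", as.integer(exampleCode %/% 100))
#
# Facility code: pad to six characters
#
exampleFcode <- sprintf("%06d", as.integer(exampleCode))
```

The five-digit code `10203` becomes province `"01"`, district `"0102"` and facility `"010203"`, whilst the six-digit code `120304` becomes province `"12"`, district `"1203"` and facility `"120304"`.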

## a. Processing `png_maternal` to create datasets that can be mapped

Further processing of `png_maternal` can be done to allow for mapping of the data at province and district level.

Province level data can be produced from `png_maternal` as follows:

&nbsp;

```{r, echo = TRUE, eval = TRUE}
#
# Aggregate data by province and per year
#
provincedata <- aggregate(
  cbind(bfpills1, combpills1, inj1, uno1, vasectomy, iud1, ovulation1, 
        condom1, bfpills2, combpills2, inj2, iud2, ovulation2, condom2,
        anc1, anc4, ancother, tt1, tt2, ttbooster, uno2,
        delhf, deadhf, lbw, still, vbsup, vbcomp, bba, 
        delcomp, deadnothf, transhop) ~ pcode + year, 
  data = png_maternal, FUN = sum)
```

&nbsp;

This produces a data frame object named `provincedata`.

```{r, echo = FALSE, eval = TRUE}
knitr::kable(x = provincedata[1:60, ],
             booktabs = TRUE,
             format = "latex") %>%
  kableExtra::kable_styling(latex_options = c("HOLD_position", "striped", "scale_down"),
                            position = "center") %>%
  kableExtra::landscape()
```

District level data can be produced from `png_maternal` as follows:

&nbsp;

```{r, echo = TRUE, eval = TRUE}
#
# Aggregate data by district and per year
#
districtdata <- aggregate(
  cbind(bfpills1, combpills1, inj1, uno1, vasectomy, iud1, ovulation1, 
        condom1, bfpills2, combpills2, inj2, iud2, ovulation2, condom2,
        anc1, anc4, ancother, tt1, tt2, ttbooster, uno2,
        delhf, deadhf, lbw, still, vbsup, vbcomp, bba, 
        delcomp, deadnothf, transhop) ~ dcode + year, 
  data = png_maternal, FUN = sum)
```

&nbsp;

This produces a data frame object named `districtdata`.

```{r, echo = FALSE, eval = TRUE}
knitr::kable(x = districtdata[1:60, ],
             booktabs = TRUE,
             format = "latex") %>%
  kableExtra::kable_styling(latex_options = c("HOLD_position", "striped", "scale_down"),
                            position = "center") %>%
  kableExtra::landscape()
```
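
One point to keep in mind with the formula interface of `aggregate()` used above is that its default `na.action` is `na.omit`. A row with a missing value in any one of the variables inside `cbind()` is dropped from the sums of all the variables. The sketch below illustrates this on made-up data; whether the NHIS data has such missing values is not something we have checked here.

```{r, echo = TRUE, eval = FALSE}
#
# Hypothetical facility-level counts with one missing anc4 value
#
toy <- data.frame(pcode = c("01", "01", "02", "02"),
                  year  = c("2015", "2015", "2015", "2015"),
                  anc1  = c(10, 20, 5, 7),
                  anc4  = c(4, NA, 2, 3),
                  stringsAsFactors = FALSE)
#
# Default behaviour: the second row is dropped entirely, so anc1 for
# province 01 excludes that row
#
aggDefault <- aggregate(cbind(anc1, anc4) ~ pcode + year,
                        data = toy, FUN = sum)
#
# Keep all rows and ignore missing values within each column instead
#
aggKeep <- aggregate(cbind(anc1, anc4) ~ pcode + year,
                     data = toy, FUN = sum,
                     na.rm = TRUE, na.action = na.pass)
```

In `aggDefault` the `anc1` total for province `01` only counts the first row, whereas in `aggKeep` it counts both rows.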

To compare values in `provincedata` and `districtdata` across areas, we need to standardise the raw counts. The usual approach is to calculate rates, typically expressed per 10,000 or per 100,000 of a specific population. However, data on the specific populations required for the indicators of interest are not available.

As an alternative, we can use available population data per province and per district as a standardising factor so that the raw counts can be compared with each other. It should be made clear that the resulting values are not standard rates and are therefore not comparable to them. They do, however, allow comparison across provinces and districts to show general trends rather than specific absolute values.

Population data at province and district level of Papua New Guinea is available via the `papuanewguinea` R package. Province population data can be accessed in R via a call to `pop_adm1`. District population data can be accessed in R via a call to `pop_adm2`.

The data frame `pop_adm1` is as follows:

```{r, echo = FALSE, eval = TRUE}
knitr::kable(x = pop_adm1,
             booktabs = TRUE,
             format = "latex") %>%
  kableExtra::kable_styling(latex_options = c("HOLD_position", "striped", "scale_down"),
                            position = "center")
```

\newpage

The data frame `pop_adm2` is as follows:

```{r, echo = FALSE, eval = TRUE}
knitr::kable(x = pop_adm2,
             booktabs = TRUE,
             format = "latex") %>%
  kableExtra::kable_styling(latex_options = c("HOLD_position", "striped", "scale_down"),
                            position = "center")
```

We will need to extract the appropriate information from these population datasets to use for standardising the raw counts. For the type of indicators we will be looking at, the data on population of women of reproductive age would be the most appropriate. This is the data identified as `WRA` in the population datasets.

The most efficient way to work with this population data and the province and district data we produced earlier is to extract the population of women of reproductive age and then attach it to the province and district data accordingly.

To do this, we will first need to organise the population data in such a way that it can be merged with the province data.

First, we need to extract the data columns that we need from the population data. These will be the province name, the province code and the women of reproductive age population.

For the province population, this can be done as follows:

&nbsp;

```{r, echo = TRUE, eval = TRUE}
wra_adm1 <- pop_adm1[ , c("ADM1_PCODE", "ADM1_EN", "WRA")]
```

&nbsp;

This produces the following data frame object:

```{r, echo = FALSE, eval = TRUE}
knitr::kable(x = wra_adm1,
             booktabs = TRUE,
             format = "latex") %>%
  kableExtra::kable_styling(latex_options = c("HOLD_position", "striped"),
                            position = "center")
```

For the district population, this can be done as follows:

&nbsp;

```{r, echo = TRUE, eval = TRUE}
wra_adm2 <- pop_adm2[ , c("ADM2_PCODE", "ADM2_EN", "ADM1_PCODE", "ADM1_EN", "WRA")]
```

&nbsp;

This produces the following data frame object:

```{r, echo = FALSE, eval = TRUE}
knitr::kable(x = wra_adm2[1:40, ],
             booktabs = TRUE,
             format = "latex") %>%
  kableExtra::kable_styling(latex_options = c("HOLD_position", "striped", "scale_down"),
                            position = "center")
```

Then, we need to organise the population datasets such that the rows of data are in the same sequence as that of the province and district data. This means, the population datasets will have to be ordered in such a way that the province code and district code are sequential. We see that the population datasets are not sequential and as such will need to be re-ordered. This can be done in R as follows:

&nbsp;

```{r, echo = TRUE, eval = TRUE}
wra_adm1 <- wra_adm1[order(wra_adm1$ADM1_PCODE), ]
wra_adm2 <- wra_adm2[order(wra_adm2$ADM2_PCODE), ]
```

&nbsp;

We now need to adjust the admin codes to match the admin codes in the province and district data. We notice that the population admin codes start with `PG` whilst the province and district data don't have this. So, we should adjust the population codes by removing the `PG` prefix. This can be done as follows:

&nbsp;

```{r, echo = TRUE, eval = TRUE}
wra_adm1$ADM1_PCODE <- as.numeric(str_replace(wra_adm1$ADM1_PCODE, "PG", ""))
wra_adm2$ADM2_PCODE <- as.numeric(str_replace(wra_adm2$ADM2_PCODE, "PG", ""))
wra_adm2$ADM1_PCODE <- as.numeric(str_replace(wra_adm2$ADM1_PCODE, "PG", ""))
```
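
As a quick check of the prefix removal, here is a minimal sketch using base R `sub()` with an anchored pattern, so that only a leading `PG` is removed. The example codes are hypothetical and are assumed to have the form `PG` followed by digits.

```{r, echo = TRUE, eval = FALSE}
#
# Hypothetical admin codes assumed to have the form "PG" followed by digits
#
examplePcodes <- c("PG01", "PG0401", "PG12")
#
# Remove the leading "PG" only, then convert to numeric
#
stripped <- as.numeric(sub(pattern = "^PG", replacement = "", x = examplePcodes))
```

The conversion to numeric also drops any leading zeros, so `"PG01"` becomes `1` and `"PG0401"` becomes `401`.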

&nbsp;

Once the admin codes have been adjusted, we should now calculate a standardising factor which we will call `sf`. Using the population size for women of reproductive age (WRA), we divide this by 100,000 to get a standardising factor that will give an indicator value that is per 100,000 WRA population. This can be done as follows:

&nbsp;

```{r, echo = TRUE, eval = TRUE}
wra_adm1$sf <- wra_adm1$WRA / 100000
wra_adm2$sf <- wra_adm2$WRA / 100000
```
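
A worked example with made-up numbers may help show what the standardising factor does. The populations and counts below are hypothetical.

```{r, echo = TRUE, eval = FALSE}
#
# Hypothetical WRA populations and raw counts for two areas
#
exampleWRA   <- c(250000, 50000)
exampleCount <- c(5000, 1200)
#
# Standardising factor: WRA population in units of 100,000
#
exampleSf <- exampleWRA / 100000
#
# Standardised value: count per 100,000 WRA
#
exampleStd <- exampleCount / exampleSf
```

The first area has the larger raw count (5,000 against 1,200) but the smaller standardised value (2,000 against 2,400 per 100,000 WRA), which is the kind of comparison the raw counts alone would not show.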

&nbsp;

Once the standardising factor (`sf`) is calculated, the population data can now be merged with the province and district data respectively. 

For the province data, this can be done in R as follows:

&nbsp;

```{r, echo = TRUE, eval = TRUE}
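# Convert zero-padded character pcode to numeric so that it matches ADM1_PCODE
provincedata$pcode <- as.numeric(provincedata$pcode)
provincedata <- merge(wra_adm1, 
                      provincedata, 
                      by.x = "ADM1_PCODE", 
                      by.y = "pcode")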
```
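
When merging on admin codes, the key columns need to hold the same type of value. The sketch below, on made-up data, shows that a zero-padded character code such as `"01"` does not match the numeric code `1`, and how a simple check for unmatched codes might look. The object names are hypothetical.

```{r, echo = TRUE, eval = FALSE}
#
# Hypothetical population table with numeric codes and count table with
# zero-padded character codes
#
examplePop    <- data.frame(ADM1_PCODE = c(1, 2), sf = c(2.5, 0.5))
exampleCounts <- data.frame(pcode = c("01", "02"), anc1 = c(5000, 1200),
                            stringsAsFactors = FALSE)
#
# Character "01" does not match numeric 1, so no rows are returned
#
mismatched <- merge(examplePop, exampleCounts,
                    by.x = "ADM1_PCODE", by.y = "pcode")
#
# Converting the key to numeric first restores the match
#
exampleCounts$pcode <- as.numeric(exampleCounts$pcode)
matched <- merge(examplePop, exampleCounts,
                 by.x = "ADM1_PCODE", by.y = "pcode")
#
# Codes present in the counts but absent from the population table
#
unmatched <- setdiff(exampleCounts$pcode, examplePop$ADM1_PCODE)
```

Comparing `nrow()` before and after a merge, or inspecting `unmatched`, is a cheap way to confirm that no areas were silently lost.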

&nbsp;

For the district data, some further processing is needed because the NHIS data lists three districts for the National Capital District (codes 401, 402 and 403) whilst the population data and the map data have only one. We can resolve this by collapsing the three National Capital District rows into a single district. This can be done as follows:

&nbsp;

```{r, echo = TRUE, eval = TRUE}
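# dcode and year are character; convert to numeric so they can be matched and summed
districtdata$dcode <- as.numeric(districtdata$dcode)
districtdata$year  <- as.numeric(districtdata$year)

x <- colSums(districtdata[districtdata$dcode %in% c(401, 402, 403) & 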
                            districtdata$year == 2015, ])
y <- colSums(districtdata[districtdata$dcode %in% c(401, 402, 403) & 
                            districtdata$year == 2016, ])

xy <- rbind(x, y)

xy[1,1] <- 401
xy[2,1] <- 401

xy[1,2] <- 2015
xy[2,2] <- 2016

districtdata <- data.frame(rbind(
  districtdata[!districtdata$dcode %in% c(401, 402, 403), ], xy))
```
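
An alternative way to think about this step is to recode first and then aggregate again. The sketch below is not the approach used above but an equivalent illustration on made-up data with a single indicator column.

```{r, echo = TRUE, eval = FALSE}
#
# Hypothetical district data with three National Capital District codes
#
exampleDistrict <- data.frame(
  dcode = c(401, 402, 403, 501, 401, 402, 403, 501),
  year  = rep(c(2015, 2016), each = 4),
  anc1  = c(10, 20, 30, 40, 11, 21, 31, 41)
)
#
# Recode the three codes to a single code
#
exampleDistrict$dcode[exampleDistrict$dcode %in% c(401, 402, 403)] <- 401
#
# Re-aggregate so that each district and year has one row
#
collapsed <- aggregate(anc1 ~ dcode + year, data = exampleDistrict, FUN = sum)
```

After collapsing, district `401` has one row per year holding the sum of the three original rows (60 for 2015 and 63 for 2016), and district `501` is unchanged.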

&nbsp;

Once the district data has been adjusted, we can now merge the district data with the district population data. This can be done in R as follows:

&nbsp;

```{r, echo = TRUE, eval = TRUE}
districtdata <- merge(wra_adm2, 
                      districtdata, 
                      by.x = "ADM2_PCODE", 
                      by.y = "dcode")
```

&nbsp;

We now have processed province and district datasets that have all the information needed to produce various analysis outputs.

## b. Processing `png_maternal` to create datasets that can be used for time series analysis

Further processing of `png_maternal` can be done to allow for time series analysis of the data at monthly intervals for year 2015 and 2016.

Monthly province level data can be produced from `png_maternal` as follows:

&nbsp;

```{r, echo = TRUE, eval = TRUE}
#
# Aggregate data by month, province and year
#
mProvince <- aggregate(
  cbind(bfpills1, combpills1, inj1, uno1, vasectomy, iud1, ovulation1, 
        condom1, bfpills2, combpills2, inj2, iud2, ovulation2, condom2,
        anc1, anc4, ancother, tt1, tt2, ttbooster, uno2,
        delhf, deadhf, lbw, still, vbsup, vbcomp, bba, 
        delcomp, deadnothf, transhop) ~ month + pcode + year, 
  data = png_maternal, FUN = sum)
```

&nbsp;

This produces a data frame object named `mProvince`.

```{r, echo = FALSE, eval = TRUE}
knitr::kable(x = mProvince[1:60, ],
             booktabs = TRUE,
             format = "latex") %>%
  kableExtra::kable_styling(latex_options = c("HOLD_position", "striped", "scale_down"),
                            position = "center") %>%
  kableExtra::landscape()
```

Monthly district level data can be produced from `png_maternal` as follows:

&nbsp;

```{r, echo = TRUE, eval = TRUE}
#
# Aggregate data by month, district and year
#
mDistrict <- aggregate(
  cbind(bfpills1, combpills1, inj1, uno1, vasectomy, iud1, ovulation1, 
        condom1, bfpills2, combpills2, inj2, iud2, ovulation2, condom2,
        anc1, anc4, ancother, tt1, tt2, ttbooster, uno2,
        delhf, deadhf, lbw, still, vbsup, vbcomp, bba, 
        delcomp, deadnothf, transhop) ~ month + dcode + year, 
  data = png_maternal, FUN = sum)
```

&nbsp;

This produces a data frame object named `mDistrict`.

```{r, echo = FALSE, eval = TRUE}
knitr::kable(x = mDistrict[1:60, ],
             booktabs = TRUE,
             format = "latex") %>%
  kableExtra::kable_styling(latex_options = c("HOLD_position", "striped", "scale_down"),
                            position = "center") %>%
  kableExtra::landscape()
```

We can then merge these datasets with the population data that contains the standardising factor. This can be done as follows:

&nbsp;

```{r, echo = FALSE, eval = TRUE}
mProvince$pcode <- as.numeric(mProvince$pcode)
```

```{r, echo = TRUE, eval = TRUE}
mProvince <- merge(wra_adm1, mProvince, by.x = "ADM1_PCODE", by.y = "pcode")
```

&nbsp;

For the monthly district data, we will need to make the same adjustment for the National Capital District that we made to the annual district data. This can be done as follows:

&nbsp;

```{r, echo = FALSE, eval = TRUE}
mDistrict$dcode <- as.numeric(mDistrict$dcode)
mDistrict$year <- as.numeric(mDistrict$year)
```

```{r, echo = TRUE, eval = TRUE}
x <- aggregate(
       cbind(dcode, year, bfpills1, combpills1, inj1, uno1, vasectomy, iud1, 
             ovulation1, condom1, bfpills2, combpills2, inj2, iud2, ovulation2, 
             condom2, anc1, anc4, ancother, tt1, tt2, ttbooster, uno2, 
             delhf, deadhf, lbw, still, vbsup, vbcomp, bba, delcomp, deadnothf, 
             transhop) ~ month, 
       data = mDistrict[mDistrict$dcode %in% c(401, 402, 403) & 
                          mDistrict$year == 2015, ], 
       FUN = sum)

y <- aggregate(
       cbind(dcode, year, bfpills1, combpills1, inj1, uno1, vasectomy, iud1, 
             ovulation1, condom1, bfpills2, combpills2, inj2, iud2, ovulation2, 
             condom2, anc1, anc4, ancother, tt1, tt2, ttbooster, uno2, 
             delhf, deadhf, lbw, still, vbsup, vbcomp, bba, delcomp, deadnothf, 
             transhop) ~ month, 
       data = mDistrict[mDistrict$dcode %in% c(401, 402, 403) &
                          mDistrict$year == 2016, ], 
       FUN = sum)

xy <- rbind(x, y)

xy$dcode <- 401

xy[ 1:12, 3] <- 2015
xy[13:24, 3] <- 2016

mDistrict <- data.frame(rbind(
  mDistrict[!mDistrict$dcode %in% c(401, 402, 403), ], xy))
```

&nbsp;

The resulting `mDistrict` data can now be merged with the district population data as follows:

&nbsp;

```{r, echo = TRUE, eval = TRUE}
mDistrict <- merge(wra_adm2, mDistrict, by.x = "ADM2_PCODE", by.y = "dcode")
```

&nbsp;

We now have processed time series datasets for province and district data that have all the information needed to produce various analysis outputs.

# 3. Indicators

Given the Papua New Guinea NHIS data, the following indicators can be calculated:

* Number of pregnant women who have had at least one antenatal care (ANC) visit with a trained health worker per 100,000 women of reproductive age

&nbsp;

$$\begin{aligned}
n_{anc1} & ~ \div ~ \frac{n_{WRA}}{100000}\\
\\
where: & \\
\\
n_{anc1} ~ = ~ & \text{Number of pregnant women who have had at least 1 ANC visit} \\
& \text{with a trained health worker} \\
n_{WRA} ~ = ~ & \text{Number of women of reproductive age}
\end{aligned}$$

&nbsp;

* Number of pregnant women who have had at least four antenatal care (ANC) visits with any service provider per 100,000 women of reproductive age

&nbsp;

$$\begin{aligned}
n_{anc4} & ~ \div ~ \frac{n_{WRA}}{100000}\\
\\
where: & \\
\\
n_{anc4} ~ = ~ & \text{Number of pregnant women who have had at least 4 ANC visits} \\
& \text{with any service provider} \\
n_{WRA} ~ = ~ & \text{Number of women of reproductive age}
\end{aligned}$$

&nbsp;

* Number of pregnant women who received first tetanus toxoid vaccination per 100,000 women of reproductive age

&nbsp;

$$\begin{aligned}
n_{tt1} & ~ \div ~ \frac{n_{WRA}}{100000}\\
\\
where: & \\
\\
n_{tt1} ~ = ~ & \text{Number of pregnant women who received} \\
& \text{first tetanus toxoid vaccination} \\
n_{WRA} ~ = ~ & \text{Number of women of reproductive age}
\end{aligned}$$

&nbsp;

* Number of pregnant women who received second tetanus toxoid vaccination per 100,000 women of reproductive age

&nbsp;

$$\begin{aligned}
n_{tt2} & ~ \div ~ \frac{n_{WRA}}{100000}\\
\\
where: & \\
\\
n_{tt2} ~ = ~ & \text{Number of pregnant women who received} \\
& \text{second tetanus toxoid vaccination} \\
n_{WRA} ~ = ~ & \text{Number of women of reproductive age}
\end{aligned}$$

&nbsp;

* Number of pregnant women who received tetanus toxoid booster vaccination per 100,000 women of reproductive age

&nbsp;

$$\begin{aligned}
n_{ttbooster} & ~ \div ~ \frac{n_{WRA}}{100000}\\
\\
where: & \\
\\
n_{ttbooster} ~ = ~ & \text{Number of pregnant women who received} \\
& \text{tetanus toxoid booster vaccination} \\
n_{WRA} ~ = ~ & \text{Number of women of reproductive age}
\end{aligned}$$

&nbsp;

* Number of pregnant women who delivered in a health facility per 100,000 women of reproductive age

&nbsp;

$$\begin{aligned}
n_{delhf} & ~ \div ~ \frac{n_{WRA}}{100000}\\
\\
where: & \\
\\
n_{delhf} ~ = ~ & \text{Number of pregnant women who delivered} \\
& \text{in a health facility} \\
n_{WRA} ~ = ~ & \text{Number of women of reproductive age}
\end{aligned}$$

&nbsp;

* Number of pregnant women who delivered a low birth weight child per 100,000 women of reproductive age

&nbsp;

$$\begin{aligned}
n_{lbw} & ~ \div ~ \frac{n_{WRA}}{100000}\\
\\
where: & \\
\\
n_{lbw} ~ = ~ & \text{Number of pregnant women who delivered} \\
& \text{a low birth weight child} \\
n_{WRA} ~ = ~ & \text{Number of women of reproductive age}
\end{aligned}$$

&nbsp;

* Number of pregnant women who delivered a stillbirth per 100,000 women of reproductive age

&nbsp;

$$\begin{aligned}
n_{still} & ~ \div ~ \frac{n_{WRA}}{100000}\\
\\
where: & \\
\\
n_{still} ~ = ~ & \text{Number of pregnant women who delivered} \\
& \text{a stillbirth} \\
n_{WRA} ~ = ~ & \text{Number of women of reproductive age}
\end{aligned}$$

&nbsp;

* Number of pregnant women who died during childbirth per 100,000 women of reproductive age

&nbsp;

$$\begin{aligned}
\left( n_{deadhf} ~ + ~ n_{deadnothf} \right) & ~ \div ~ \frac{n_{WRA}}{100000}\\
\\
where: & \\
\\
n_{deadhf} ~ = ~ & \text{Number of pregnant women who died} \\
& \text{during childbirth at health facility} \\
n_{deadnothf} ~ = ~ & \text{Number of pregnant women who died} \\
& \text{during childbirth outside of health facility} \\
n_{WRA} ~ = ~ & \text{Number of women of reproductive age}
\end{aligned}$$
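
The indicator definitions above all share the same form, so they can be sketched as a single helper function. The function name `perWRA()` and the data below are hypothetical and only illustrate the arithmetic. Note that for the maternal deaths indicator the two counts are summed before dividing.

```{r, echo = TRUE, eval = FALSE}
#
# Hypothetical province-level counts and WRA populations
#
exampleProvince <- data.frame(
  ADM1_EN   = c("Province A", "Province B"),
  WRA       = c(250000, 50000),
  anc1      = c(5000, 1200),
  deadhf    = c(4, 1),
  deadnothf = c(6, 2),
  stringsAsFactors = FALSE
)
#
# Count per 100,000 women of reproductive age (hypothetical helper)
#
perWRA <- function(n, wra, per = 100000) {
  n / (wra / per)
}
#
# ANC1 indicator
#
exampleProvince$anc1Std <- perWRA(n = exampleProvince$anc1,
                                  wra = exampleProvince$WRA)
#
# Maternal deaths indicator: sum deaths in and outside of facility first
#
exampleProvince$deadStd <- perWRA(
  n = exampleProvince$deadhf + exampleProvince$deadnothf,
  wra = exampleProvince$WRA
)
```

With these made-up numbers, Province A has 10 deaths among 250,000 WRA, which gives 4 per 100,000 WRA, and Province B has 3 deaths among 50,000 WRA, which gives 6 per 100,000 WRA.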

&nbsp;

# 2. Time-series analysis of monthly NHIS data

Using the data frame objects `mProvince` and `mDistrict`, we can now produce time-series analysis of the indicators specified above.

## At least one antenatal care visit with a trained health worker

We first work with the province data.

We will work with the data columns labelled `anc1` and `sf`.

```{r, echo = TRUE, eval = TRUE}
temp1 <- aggregate(anc1 ~ ADM1_PCODE + ADM1_EN + month + year, 
                   data = mProvince, 
                   FUN = sum)
temp2 <- aggregate(sf ~ ADM1_PCODE + ADM1_EN + month + year, 
                   data = mProvince, 
                   FUN = unique)

temp1$anc1Std <- temp1$anc1 / temp2$sf

temp1$month <- as.character(temp1$month)

# Convert three-letter month abbreviations ("jan" to "dec") to month numbers
temp1$month <- match(temp1$month, tolower(month.abb))

temp1$date <- paste(temp1$year, temp1$month, sep = "-")

temp1$date <- zoo::as.yearmon(temp1$date)

temp1 <- temp1[order(temp1$date), ]

temp1 <- temp1 %>%
  group_by(ADM1_EN) %>%
  mutate(anc1Sm = rollmean(x = anc1Std, k = 3, fill = NA))

temp1long <- gather(data = temp1, 
                    key = "anc1", 
                    value = "value", 
                    anc1Std, anc1Sm, 
                    factor_key = TRUE)
```
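
The date construction and smoothing steps above can be sketched on a small made-up series using base R only. This sketch swaps in `stats::filter()` for `zoo::rollmean()` and plain `Date` values for `yearmon`, so it is an illustration of the idea rather than the exact method used above. The month labels are assumed to be lowercase three-letter abbreviations.

```{r, echo = TRUE, eval = FALSE}
#
# Hypothetical monthly standardised values for one province
#
exampleSeries <- data.frame(
  month   = c("jan", "feb", "mar", "apr"),
  year    = c("2015", "2015", "2015", "2015"),
  anc1Std = c(3, 6, 9, 12),
  stringsAsFactors = FALSE
)
#
# Month abbreviation to month number via lookup in month.abb
#
exampleSeries$monthNum <- match(exampleSeries$month, tolower(month.abb))
#
# Build a date for the first day of each month and sort by it
#
exampleSeries$date <- as.Date(sprintf("%s-%02d-01", 
                                      exampleSeries$year, 
                                      exampleSeries$monthNum))
exampleSeries <- exampleSeries[order(exampleSeries$date), ]
#
# Centred 3-month moving average; the first and last values are NA
#
exampleSeries$anc1Sm <- as.numeric(
  stats::filter(exampleSeries$anc1Std, filter = rep(1 / 3, 3), sides = 2)
)
```

The call is written as `stats::filter()` because `dplyr` also exports a function named `filter()`, which would otherwise mask it once `dplyr` is loaded.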

```{r, echo = TRUE, eval = TRUE}
themeSettings <- theme_bw() + 
                   theme(panel.grid.major = element_line(linetype = 1, 
                                                         size = 0.2, 
                                                         colour = "gray80"),
                         panel.grid.minor = element_line(linetype = 0),
                         axis.text.x = element_text(size = 6, angle = 90),
                         legend.key = element_rect(linetype = 0), 
                         legend.key.size = unit(1, "cm"),
                         legend.position = "top")
```

\newpage

```{r, echo = TRUE, eval = TRUE, fig.width = 12, fig.height = 12, fig.align = "center", fig.pos = "H", fig.retina = 1}
ggplot(temp1, aes(as.Date(date), anc1Std)) + 
  geom_line(colour = "#08519c", size = 1) + 
  scale_x_date(name = "Month", 
               date_breaks = "1 month", 
               date_labels = "%b %y") +
  scale_y_continuous(name = "ANC1", 
                     breaks = seq(from = 0, 
                                  to = max(temp1$anc1Std), 
                                  by = 500)) + 
  facet_wrap(vars(ADM1_EN)) +
  themeSettings
```

\newpage

We now work with the district data.

&nbsp;

```{r, echo = TRUE, eval = TRUE}
dist1 <- aggregate(anc1 ~ ADM2_PCODE + ADM2_EN + ADM1_PCODE + ADM1_EN + month + year, 
                   data = mDistrict, 
                   FUN = sum)
dist2 <- aggregate(sf ~ ADM2_PCODE + ADM2_EN + ADM1_PCODE + ADM1_EN + month + year, 
                   data = mDistrict, 
                   FUN = unique)

dist1$anc1Std <- dist1$anc1 / dist2$sf

dist1$month <- as.character(dist1$month)

# Convert three-letter month abbreviations ("jan" to "dec") to month numbers
dist1$month <- match(dist1$month, tolower(month.abb))

dist1$date <- paste(dist1$year, dist1$month, sep = "-")

dist1$date <- zoo::as.yearmon(dist1$date)
```

\newpage

```{r, echo = TRUE, eval = TRUE, fig.width = 12, fig.height = 4, fig.align = "center", fig.pos = "H", fig.retina = 1}
ggplot(dist1[dist1$ADM1_PCODE == 1, ], aes(as.Date(date), anc1Std)) + 
  geom_line(colour = "#08519c", size = 1) + 
  scale_x_date(name = "Month", date_breaks = "1 month", date_labels = "%b %y") +
  scale_y_continuous(name = "ANC1", 
                     breaks = seq(from = 0, 
                                  to = max(dist1$anc1Std), 
                                  by = 500)) + 
  facet_grid(ADM1_EN ~ ADM2_EN) +
  themeSettings
```
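
The district plots in this section repeat the same code for each province code. As a hedged sketch, the repeated code could be wrapped in a helper function. The function name `plotDistrictSeries()` and the data below are hypothetical, the y-axis breaks here use the maximum of the selected province only, and `theme_bw()` stands in for the theme settings defined earlier.

```{r, echo = TRUE, eval = FALSE}
library(ggplot2)
#
# Hypothetical helper: plot district series for one province code
#
plotDistrictSeries <- function(data, pcode, by = 100) {
  subsetData <- data[data$ADM1_PCODE == pcode, ]
  ggplot(subsetData, aes(x = date, y = anc1Std)) + 
    geom_line(colour = "#08519c") + 
    scale_x_date(name = "Month", date_breaks = "1 month", date_labels = "%b %y") +
    scale_y_continuous(name = "ANC1", 
                       breaks = seq(from = 0, 
                                    to = max(subsetData$anc1Std), 
                                    by = by)) + 
    facet_grid(ADM1_EN ~ ADM2_EN) +
    theme_bw()
}
#
# Made-up data for one province with two districts
#
exampleDist <- data.frame(
  ADM1_PCODE = 1,
  ADM1_EN    = "Province A",
  ADM2_EN    = rep(c("District X", "District Y"), each = 3),
  date       = rep(as.Date(c("2015-01-01", "2015-02-01", "2015-03-01")), times = 2),
  anc1Std    = c(120, 150, 180, 90, 110, 130),
  stringsAsFactors = FALSE
)
examplePlot <- plotDistrictSeries(data = exampleDist, pcode = 1, by = 50)
```

Each per-province plot would then reduce to a single call with the province code and the spacing of the y-axis breaks.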

```{r, echo = TRUE, eval = TRUE, fig.width = 12, fig.height = 4, fig.align = "center", fig.pos = "H", fig.retina = 1}
ggplot(dist1[dist1$ADM1_PCODE == 2, ], aes(as.Date(date), anc1Std)) + 
  geom_line(colour = "#08519c", size = 1) + 
  scale_x_date(name = "Month", date_breaks = "1 month", date_labels = "%b %y") +
  scale_y_continuous(name = "ANC1", 
                     breaks = seq(from = 0, 
                                  to = max(dist1$anc1Std), 
                                  by = 100)) + 
  facet_grid(ADM1_EN ~ ADM2_EN) +
  themeSettings
```

```{r, echo = TRUE, eval = TRUE, fig.width = 12, fig.height = 4, fig.align = "center", fig.pos = "H", fig.retina = 1}
ggplot(dist1[dist1$ADM1_PCODE == 3, ], aes(as.Date(date), anc1Std)) + 
  geom_line(colour = "#08519c", size = 1) + 
  scale_x_date(name = "Month", date_breaks = "1 month", date_labels = "%b %y") +
  scale_y_continuous(name = "ANC1", 
                     breaks = seq(from = 0, 
                                  to = max(dist1$anc1Std), 
                                  by = 100)) + 
  facet_grid(ADM1_EN ~ ADM2_EN) +
  themeSettings
```

```{r, echo = TRUE, eval = TRUE, fig.width = 12, fig.height = 4, fig.align = "center", fig.pos = "H", fig.retina = 1}
ggplot(dist1[dist1$ADM1_PCODE == 4, ], aes(as.Date(date), anc1Std)) + 
  geom_line(colour = "#08519c", size = 1) + 
  scale_x_date(name = "Month", date_breaks = "1 month", date_labels = "%b %y") +
  scale_y_continuous(name = "ANC1", 
                     breaks = seq(from = 0, 
                                  to = max(dist1$anc1Std), 
                                  by = 100)) + 
  facet_grid(ADM1_EN ~ ADM2_EN) +
  themeSettings
```

```{r, echo = TRUE, eval = TRUE, fig.width = 12, fig.height = 4, fig.align = "center", fig.pos = "H", fig.retina = 1}
ggplot(dist1[dist1$ADM1_PCODE == 5, ], aes(as.Date(date), anc1Std)) + 
  geom_line(colour = "#08519c", size = 1) + 
  scale_x_date(name = "Month", date_breaks = "1 month", date_labels = "%b %y") +
  scale_y_continuous(name = "ANC1", 
                     breaks = seq(from = 0, 
                                  to = max(dist1$anc1Std), 
                                  by = 100)) + 
  facet_grid(ADM1_EN ~ ADM2_EN) +
  themeSettings
```

```{r, echo = TRUE, eval = TRUE, fig.width = 12, fig.height = 4, fig.align = "center", fig.pos = "H", fig.retina = 1}
ggplot(dist1[dist1$ADM1_PCODE == 6, ], aes(as.Date(date), anc1Std)) + 
  geom_line(colour = "#08519c", size = 1) + 
  scale_x_date(name = "Month", date_breaks = "1 month", date_labels = "%b %y") +
  scale_y_continuous(name = "ANC1", 
                     breaks = seq(from = 0, 
                                  to = max(dist1$anc1Std), 
                                  by = 100)) + 
  facet_grid(ADM1_EN ~ ADM2_EN) +
  themeSettings
```

```{r, echo = TRUE, eval = TRUE, fig.width = 12, fig.height = 4, fig.align = "center", fig.pos = "H", fig.retina = 1}
ggplot(dist1[dist1$ADM1_PCODE == 7, ], aes(as.Date(date), anc1Std)) + 
  geom_line(colour = "#08519c", size = 1) + 
  scale_x_date(name = "Month", date_breaks = "1 month", date_labels = "%b %y") +
  scale_y_continuous(name = "ANC1", 
                     breaks = seq(from = 0, 
                                  to = max(dist1$anc1Std), 
                                  by = 100)) + 
  facet_grid(ADM1_EN ~ ADM2_EN) +
  themeSettings
```

```{r, echo = TRUE, eval = TRUE, fig.width = 12, fig.height = 4, fig.align = "center", fig.pos = "H", fig.retina = 1}
ggplot(dist1[dist1$ADM1_PCODE == 8, ], aes(as.Date(date), anc1Std)) + 
  geom_line(colour = "#08519c", size = 1) + 
  scale_x_date(name = "Month", date_breaks = "1 month", date_labels = "%b %y") +
  scale_y_continuous(name = "ANC1", 
                     breaks = seq(from = 0, 
                                  to = max(dist1$anc1Std), 
                                  by = 100)) + 
  facet_grid(ADM1_EN ~ ADM2_EN) +
  themeSettings
```

```{r, echo = TRUE, eval = TRUE, fig.width = 12, fig.height = 4, fig.align = "center", fig.pos = "H", fig.retina = 1}
ggplot(dist1[dist1$ADM1_PCODE == 9, ], aes(as.Date(date), anc1Std)) + 
  geom_line(colour = "#08519c", size = 1) + 
  scale_x_date(name = "Month", date_breaks = "1 month", date_labels = "%b %y") +
  scale_y_continuous(name = "ANC1", 
                     breaks = seq(from = 0, 
                                  to = max(dist1$anc1Std), 
                                  by = 100)) + 
  facet_grid(ADM1_EN ~ ADM2_EN) +
  themeSettings
```

```{r, echo = TRUE, eval = TRUE, fig.width = 12, fig.height = 4, fig.align = "center", fig.pos = "H", fig.retina = 1}
ggplot(dist1[dist1$ADM1_PCODE == 10, ], aes(as.Date(date), anc1Std)) + 
  geom_line(colour = "#08519c", size = 1) + 
  scale_x_date(name = "Month", date_breaks = "1 month", date_labels = "%b %y") +
  scale_y_continuous(name = "ANC1", 
                     breaks = seq(from = 0, 
                                  to = max(dist1$anc1Std), 
                                  by = 100)) + 
  facet_grid(ADM1_EN ~ ADM2_EN) +
  themeSettings
```

```{r, echo = TRUE, eval = TRUE, fig.width = 12, fig.height = 4, fig.align = "center", fig.pos = "H", fig.retina = 1}
ggplot(dist1[dist1$ADM1_PCODE == 11, ], aes(as.Date(date), anc1Std)) + 
  geom_line(colour = "#08519c", size = 1) + 
  scale_x_date(name = "Month", date_breaks = "1 month", date_labels = "%b %y") +
  scale_y_continuous(name = "ANC1", 
                     breaks = seq(from = 0, 
                                  to = max(dist1$anc1Std), 
                                  by = 100)) + 
  facet_grid(ADM1_EN ~ ADM2_EN) +
  themeSettings
```

```{r, echo = TRUE, eval = TRUE, fig.width = 12, fig.height = 4, fig.align = "center", fig.pos = "H", fig.retina = 1}
ggplot(dist1[dist1$ADM1_PCODE == 12, ], aes(as.Date(date), anc1Std)) + 
  geom_line(colour = "#08519c", size = 1) + 
  scale_x_date(name = "Month", date_breaks = "1 month", date_labels = "%b %y") +
  scale_y_continuous(name = "ANC1", 
                     breaks = seq(from = 0, 
                                  to = max(dist1$anc1Std), 
                                  by = 100)) + 
  facet_grid(ADM1_EN ~ ADM2_EN) +
  themeSettings
```

```{r, echo = TRUE, eval = TRUE, fig.width = 12, fig.height = 4, fig.align = "center", fig.pos = "H", fig.retina = 1}
ggplot(dist1[dist1$ADM1_PCODE == 13, ], aes(as.Date(date), anc1Std)) + 
  geom_line(colour = "#08519c", size = 1) + 
  scale_x_date(name = "Month", date_breaks = "1 month", date_labels = "%b %y") +
  scale_y_continuous(name = "ANC1", 
                     breaks = seq(from = 0, 
                                  to = max(dist1$anc1Std), 
                                  by = 100)) + 
  facet_grid(ADM1_EN ~ ADM2_EN) +
  themeSettings
```

```{r, echo = TRUE, eval = TRUE, fig.width = 12, fig.height = 4, fig.align = "center", fig.pos = "H", fig.retina = 1}
ggplot(dist1[dist1$ADM1_PCODE == 14, ], aes(as.Date(date), anc1Std)) + 
  geom_line(colour = "#08519c", size = 1) + 
  scale_x_date(name = "Month", date_breaks = "1 month", date_labels = "%b %y") +
  scale_y_continuous(name = "ANC1", 
                     breaks = seq(from = 0, 
                                  to = max(dist1$anc1Std), 
                                  by = 100)) + 
  facet_grid(ADM1_EN ~ ADM2_EN) +
  themeSettings
```

```{r, echo = TRUE, eval = TRUE, fig.width = 12, fig.height = 4, fig.align = "center", fig.pos = "H", fig.retina = 1}
ggplot(dist1[dist1$ADM1_PCODE == 15, ], aes(as.Date(date), anc1Std)) + 
  geom_line(colour = "#08519c", size = 1) + 
  scale_x_date(name = "Month", date_breaks = "1 month", date_labels = "%b %y") +
  scale_y_continuous(name = "ANC1", 
                     breaks = seq(from = 0, 
                                  to = max(dist1$anc1Std), 
                                  by = 1000)) + 
  facet_grid(ADM1_EN ~ ADM2_EN) +
  themeSettings
```

```{r, echo = TRUE, eval = TRUE, fig.width = 12, fig.height = 4, fig.align = "center", fig.pos = "H", fig.retina = 1}
ggplot(dist1[dist1$ADM1_PCODE == 16, ], aes(as.Date(date), anc1Std)) + 
  geom_line(colour = "#08519c", size = 1) + 
  scale_x_date(name = "Month", date_breaks = "1 month", date_labels = "%b %y") +
  scale_y_continuous(name = "ANC1", 
                     breaks = seq(from = 0, 
                                  to = max(dist1$anc1Std), 
                                  by = 100)) + 
  facet_grid(ADM1_EN ~ ADM2_EN) +
  themeSettings
```

```{r, echo = TRUE, eval = TRUE, fig.width = 12, fig.height = 4, fig.align = "center", fig.pos = "H", fig.retina = 1}
ggplot(dist1[dist1$ADM1_PCODE == 17, ], aes(as.Date(date), anc1Std)) + 
  geom_line(colour = "#08519c", size = 1) + 
  scale_x_date(name = "Month", date_breaks = "1 month", date_labels = "%b %y") +
  scale_y_continuous(name = "ANC1", 
                     breaks = seq(from = 0, 
                                  to = max(dist1$anc1Std), 
                                  by = 100)) + 
  facet_grid(ADM1_EN ~ ADM2_EN) +
  themeSettings
```

```{r, echo = TRUE, eval = TRUE, fig.width = 12, fig.height = 4, fig.align = "center", fig.pos = "H", fig.retina = 1}
ggplot(dist1[dist1$ADM1_PCODE == 18, ], aes(as.Date(date), anc1Std)) + 
  geom_line(colour = "#08519c", size = 1) + 
  scale_x_date(name = "Month", date_breaks = "1 month", date_labels = "%b %y") +
  scale_y_continuous(name = "ANC1", 
                     breaks = seq(from = 0, 
                                  to = max(dist1$anc1Std), 
                                  by = 100)) + 
  facet_grid(ADM1_EN ~ ADM2_EN) +
  themeSettings
```

```{r, echo = TRUE, eval = TRUE, fig.width = 12, fig.height = 4, fig.align = "center", fig.pos = "H", fig.retina = 1}
ggplot(dist1[dist1$ADM1_PCODE == 19, ], aes(as.Date(date), anc1Std)) + 
  geom_line(colour = "#08519c", size = 1) + 
  scale_x_date(name = "Month", date_breaks = "1 month", date_labels = "%b %y") +
  scale_y_continuous(name = "ANC1", 
                     breaks = seq(from = 0, 
                                  to = max(dist1$anc1Std), 
                                  by = 100)) + 
  facet_grid(ADM1_EN ~ ADM2_EN) +
  themeSettings
```

```{r, echo = TRUE, eval = TRUE, fig.width = 12, fig.height = 4, fig.align = "center", fig.pos = "H", fig.retina = 1}
ggplot(dist1[dist1$ADM1_PCODE == 20, ], aes(as.Date(date), anc1Std)) + 
  geom_line(colour = "#08519c", size = 1) + 
  scale_x_date(name = "Month", date_breaks = "1 month", date_labels = "%b %y") +
  scale_y_continuous(name = "ANC1", 
                     breaks = seq(from = 0, 
                                  to = max(dist1$anc1Std), 
                                  by = 100)) + 
  facet_grid(ADM1_EN ~ ADM2_EN) +
  themeSettings
```

```{r, echo = TRUE, eval = TRUE, fig.width = 12, fig.height = 4, fig.align = "center", fig.pos = "H", fig.retina = 1}
ggplot(dist1[dist1$ADM1_PCODE == 21, ], aes(as.Date(date), anc1Std)) + 
  geom_line(colour = "#08519c", size = 1) + 
  scale_x_date(name = "Month", date_breaks = "1 month", date_labels = "%b %y") +
  scale_y_continuous(name = "ANC1", 
                     breaks = seq(from = 0, 
                                  to = max(dist1$anc1Std), 
                                  by = 100)) + 
  facet_grid(ADM1_EN ~ ADM2_EN) +
  themeSettings
```

```{r, echo = TRUE, eval = TRUE, fig.width = 12, fig.height = 4, fig.align = "center", fig.pos = "H", fig.retina = 1}
ggplot(dist1[dist1$ADM1_PCODE == 22, ], aes(as.Date(date), anc1Std)) + 
  geom_line(colour = "#08519c", size = 1) + 
  scale_x_date(name = "Month", date_breaks = "1 month", date_labels = "%b %y") +
  scale_y_continuous(name = "ANC1", 
                     breaks = seq(from = 0, 
                                  to = max(dist1$anc1Std), 
                                  by = 100)) + 
  facet_grid(ADM1_EN ~ ADM2_EN) +
  themeSettings
```

### Smoothing time-series data

In the line charts above, the trend of the indicator over time is not easy to see. Smoothing the time-series data addresses this, and is usually done with a rolling (running) average. In R, this can be done with the function `rollmean()` from the `zoo` package, as shown below:

&nbsp;

```{r, echo = TRUE, eval = TRUE}
temp1 <- temp1[order(temp1$date), ]

temp1 <- temp1 %>%
  group_by(ADM1_EN) %>%
  mutate(anc1Sm = rollmean(x = anc1Std, k = 3, fill = NA)) # fill = NA replaces the deprecated na.pad = TRUE

temp1long <- gather(data = temp1, 
                    key = "anc1", 
                    value = "value", 
                    anc1Std, anc1Sm, 
                    factor_key = TRUE)
```
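
As a minimal sketch of what a centred three-month rolling mean computes, the example below applies one to a short made-up series. It uses base R's `stats::filter()` instead of `zoo::rollmean()` so that it runs without extra packages; the vector `x` and the helper `ma3()` are hypothetical and not part of the analysis.

```r
# Hypothetical monthly counts (not taken from the NHIS data)
x <- c(10, 20, 60, 40, 50, 30)

# Centred moving average over a window of 3 months (hypothetical helper)
ma3 <- function(v) {
  as.numeric(stats::filter(v, filter = rep(1 / 3, 3), sides = 2))
}

sm <- ma3(x)
sm
```

The first and last months have no complete three-month window, so their smoothed values are `NA`, which mirrors the `NA` padding requested in the `rollmean()` call above.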

&nbsp;

The smoothed data can then be plotted alongside the raw data as follows (for province data):

&nbsp;

```{r, echo = TRUE, eval = TRUE, warning = FALSE, fig.width = 12, fig.height = 12, fig.align = "center", fig.pos = "H", fig.retina = 1}
ggplot(temp1long, aes(as.Date(date), value, colour = anc1)) + 
  geom_line(size = 1) + 
  scale_colour_manual(labels = c("raw", "smooth"), 
                      values = c("#e41a1c", alpha("#377eb8", 0.3))) +
  scale_x_date(name = "Month", 
               date_breaks = "1 month", 
               date_labels = "%b %y") +
  scale_y_continuous(name = "ANC1", 
                     breaks = seq(from = 0, 
                                  to = max(temp1$anc1Std), 
                                  by = 500)) + 
  facet_wrap(vars(ADM1_EN)) +
  themeSettings
```

&nbsp;

We can do the same for the district data.

&nbsp;

```{r, echo = TRUE, eval = TRUE}
dist1 <- dist1[order(dist1$date), ]

dist1 <- dist1 %>%
  group_by(ADM2_EN) %>%
  mutate(anc1Sm = rollmean(x = anc1Std, k = 3, fill = NA)) # fill = NA replaces the deprecated na.pad = TRUE

dist1long <- gather(data = dist1, 
                    key = "anc1", 
                    value = "value", 
                    anc1Std, anc1Sm, 
                    factor_key = TRUE)
```
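
The grouping step matters because the rolling window should never span two districts. The sketch below illustrates this on made-up data, using base R's `ave()` and `stats::filter()` rather than `dplyr` and `zoo`; the data frame `df` and the helper `ma3()` are hypothetical.

```r
# Hypothetical data: two districts with four months each
df <- data.frame(
  district = rep(c("A", "B"), each = 4),
  month    = rep(1:4, times = 2),
  value    = c(1, 2, 3, 4, 10, 20, 30, 40)
)

# Sort by district and month so each window covers consecutive months
df <- df[order(df$district, df$month), ]

# Centred 3-month moving average (hypothetical helper)
ma3 <- function(v) as.numeric(stats::filter(v, rep(1 / 3, 3), sides = 2))

# Smooth within each district separately
df$smooth <- ave(df$value, df$district, FUN = ma3)

# For comparison: smoothing the whole column lets windows cross districts
df$ungrouped <- ma3(df$value)
df
```

In the grouped result, the last month of district A is `NA`. In the ungrouped result, the same row averages two values from district A with the first value from district B, which is not meaningful.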

\newpage

```{r, echo = TRUE, eval = TRUE, warning = FALSE, fig.width = 12, fig.height = 4, fig.align = "center", fig.pos = "H", fig.retina = 1}
ggplot(dist1long[dist1long$ADM1_PCODE == 1, ], 
       aes(as.Date(date), value, colour = anc1)) + 
  geom_line(size = 1) +
  scale_colour_manual(labels = c("raw", "smooth"),
                      values = c("#e41a1c", alpha("#377eb8", 0.3))) +
  scale_x_date(name = "Month", 
               date_breaks = "1 month", 
               date_labels = "%b %y") +
  scale_y_continuous(name = "ANC1", 
                     breaks = seq(from = 0, 
                                  to = max(dist1$anc1Std), 
                                  by = 500)) + 
  facet_grid(ADM1_EN ~ ADM2_EN) +
  themeSettings
```

```{r, echo = TRUE, eval = TRUE, warning = FALSE, fig.width = 12, fig.height = 4, fig.align = "center", fig.pos = "H", fig.retina = 1}
ggplot(dist1long[dist1long$ADM1_PCODE == 2, ], 
       aes(as.Date(date), value, colour = anc1)) + 
  geom_line(size = 1) +
  scale_colour_manual(labels = c("raw", "smooth"),
                      values = c("#e41a1c", alpha("#377eb8", 0.3))) +
  scale_x_date(name = "Month", 
               date_breaks = "1 month", 
               date_labels = "%b %y") +
  scale_y_continuous(name = "ANC1", 
                     breaks = seq(from = 0, 
                                  to = max(dist1$anc1Std), 
                                  by = 100)) + 
  facet_grid(ADM1_EN ~ ADM2_EN) +
  themeSettings
```

```{r, echo = TRUE, eval = TRUE, warning = FALSE, fig.width = 12, fig.height = 4, fig.align = "center", fig.pos = "H", fig.retina = 1}
ggplot(dist1long[dist1long$ADM1_PCODE == 3, ], 
       aes(as.Date(date), value, colour = anc1)) + 
  geom_line(size = 1) +
  scale_colour_manual(labels = c("raw", "smooth"),
                      values = c("#e41a1c", alpha("#377eb8", 0.3))) +
  scale_x_date(name = "Month", 
               date_breaks = "1 month", 
               date_labels = "%b %y") +
  scale_y_continuous(name = "ANC1", 
                     breaks = seq(from = 0, 
                                  to = max(dist1$anc1Std), 
                                  by = 100)) + 
  facet_grid(ADM1_EN ~ ADM2_EN) +
  themeSettings
```

```{r, echo = TRUE, eval = TRUE, warning = FALSE, fig.width = 12, fig.height = 4, fig.align = "center", fig.pos = "H", fig.retina = 1}
ggplot(dist1long[dist1long$ADM1_PCODE == 4, ], 
       aes(as.Date(date), value, colour = anc1)) + 
  geom_line(size = 1) +
  scale_colour_manual(labels = c("raw", "smooth"),
                      values = c("#e41a1c", alpha("#377eb8", 0.3))) +
  scale_x_date(name = "Month", 
               date_breaks = "1 month", 
               date_labels = "%b %y") +
  scale_y_continuous(name = "ANC1", 
                     breaks = seq(from = 0, 
                                  to = max(dist1$anc1Std), 
                                  by = 100)) + 
  facet_grid(ADM1_EN ~ ADM2_EN) +
  themeSettings
```

```{r, echo = TRUE, eval = TRUE, warning = FALSE, fig.width = 12, fig.height = 4, fig.align = "center", fig.pos = "H", fig.retina = 1}
ggplot(dist1long[dist1long$ADM1_PCODE == 5, ], 
       aes(as.Date(date), value, colour = anc1)) + 
  geom_line(size = 1) +
  scale_colour_manual(labels = c("raw", "smooth"),
                      values = c("#e41a1c", alpha("#377eb8", 0.3))) +
  scale_x_date(name = "Month", 
               date_breaks = "1 month", 
               date_labels = "%b %y") +
  scale_y_continuous(name = "ANC1", 
                     breaks = seq(from = 0, 
                                  to = max(dist1$anc1Std), 
                                  by = 100)) + 
  facet_grid(ADM1_EN ~ ADM2_EN) +
  themeSettings
```

```{r, echo = TRUE, eval = TRUE, warning = FALSE, fig.width = 12, fig.height = 4, fig.align = "center", fig.pos = "H", fig.retina = 1}
ggplot(dist1long[dist1long$ADM1_PCODE == 6, ], 
       aes(as.Date(date), value, colour = anc1)) + 
  geom_line(size = 1) +
  scale_colour_manual(labels = c("raw", "smooth"),
                      values = c("#e41a1c", alpha("#377eb8", 0.3))) +
  scale_x_date(name = "Month", 
               date_breaks = "1 month", 
               date_labels = "%b %y") +
  scale_y_continuous(name = "ANC1", 
                     breaks = seq(from = 0, 
                                  to = max(dist1$anc1Std), 
                                  by = 100)) + 
  facet_grid(ADM1_EN ~ ADM2_EN) +
  themeSettings
```

```{r, echo = TRUE, eval = TRUE, warning = FALSE, fig.width = 12, fig.height = 4, fig.align = "center", fig.pos = "H", fig.retina = 1}
ggplot(dist1long[dist1long$ADM1_PCODE == 7, ], 
       aes(as.Date(date), value, colour = anc1)) + 
  geom_line(size = 1) +
  scale_colour_manual(labels = c("raw", "smooth"),
                      values = c("#e41a1c", alpha("#377eb8", 0.3))) +
  scale_x_date(name = "Month", 
               date_breaks = "1 month", 
               date_labels = "%b %y") +
  scale_y_continuous(name = "ANC1", 
                     breaks = seq(from = 0, 
                                  to = max(dist1$anc1Std), 
                                  by = 100)) + 
  facet_grid(ADM1_EN ~ ADM2_EN) +
  themeSettings
```

```{r, echo = TRUE, eval = TRUE, warning = FALSE, fig.width = 12, fig.height = 4, fig.align = "center", fig.pos = "H", fig.retina = 1}
ggplot(dist1long[dist1long$ADM1_PCODE == 8, ], 
       aes(as.Date(date), value, colour = anc1)) + 
  geom_line(size = 1) +
  scale_colour_manual(labels = c("raw", "smooth"),
                      values = c("#e41a1c", alpha("#377eb8", 0.3))) +
  scale_x_date(name = "Month", 
               date_breaks = "1 month", 
               date_labels = "%b %y") +
  scale_y_continuous(name = "ANC1", 
                     breaks = seq(from = 0, 
                                  to = max(dist1$anc1Std), 
                                  by = 100)) + 
  facet_grid(ADM1_EN ~ ADM2_EN) +
  themeSettings
```

```{r, echo = TRUE, eval = TRUE, warning = FALSE, fig.width = 12, fig.height = 4, fig.align = "center", fig.pos = "H", fig.retina = 1}
ggplot(dist1long[dist1long$ADM1_PCODE == 9, ], 
       aes(as.Date(date), value, colour = anc1)) + 
  geom_line(size = 1) +
  scale_colour_manual(labels = c("raw", "smooth"),
                      values = c("#e41a1c", alpha("#377eb8", 0.3))) +
  scale_x_date(name = "Month", 
               date_breaks = "1 month", 
               date_labels = "%b %y") +
  scale_y_continuous(name = "ANC1", 
                     breaks = seq(from = 0, 
                                  to = max(dist1$anc1Std), 
                                  by = 100)) + 
  facet_grid(ADM1_EN ~ ADM2_EN) +
  themeSettings
```

```{r, echo = TRUE, eval = TRUE, warning = FALSE, fig.width = 12, fig.height = 4, fig.align = "center", fig.pos = "H", fig.retina = 1}
ggplot(dist1long[dist1long$ADM1_PCODE == 10, ], 
       aes(as.Date(date), value, colour = anc1)) + 
  geom_line(size = 1) +
  scale_colour_manual(labels = c("raw", "smooth"),
                      values = c("#e41a1c", alpha("#377eb8", 0.3))) +
  scale_x_date(name = "Month", 
               date_breaks = "1 month", 
               date_labels = "%b %y") +
  scale_y_continuous(name = "ANC1", 
                     breaks = seq(from = 0, 
                                  to = max(dist1$anc1Std), 
                                  by = 100)) + 
  facet_grid(ADM1_EN ~ ADM2_EN) +
  themeSettings
```

```{r, echo = TRUE, eval = TRUE, warning = FALSE, fig.width = 12, fig.height = 4, fig.align = "center", fig.pos = "H", fig.retina = 1}
ggplot(dist1long[dist1long$ADM1_PCODE == 11, ], 
       aes(as.Date(date), value, colour = anc1)) + 
  geom_line(size = 1) +
  scale_colour_manual(labels = c("raw", "smooth"),
                      values = c("#e41a1c", alpha("#377eb8", 0.3))) +
  scale_x_date(name = "Month", 
               date_breaks = "1 month", 
               date_labels = "%b %y") +
  scale_y_continuous(name = "ANC1", 
                     breaks = seq(from = 0, 
                                  to = max(dist1$anc1Std), 
                                  by = 100)) + 
  facet_grid(ADM1_EN ~ ADM2_EN) +
  themeSettings
```

```{r, echo = TRUE, eval = TRUE, warning = FALSE, fig.width = 12, fig.height = 4, fig.align = "center", fig.pos = "H", fig.retina = 1}
ggplot(dist1long[dist1long$ADM1_PCODE == 12, ], 
       aes(as.Date(date), value, colour = anc1)) + 
  geom_line(size = 1) +
  scale_colour_manual(labels = c("raw", "smooth"),
                      values = c("#e41a1c", alpha("#377eb8", 0.3))) +
  scale_x_date(name = "Month", 
               date_breaks = "1 month", 
               date_labels = "%b %y") +
  scale_y_continuous(name = "ANC1", 
                     breaks = seq(from = 0, 
                                  to = max(dist1$anc1Std), 
                                  by = 100)) + 
  facet_grid(ADM1_EN ~ ADM2_EN) +
  themeSettings
```

```{r, echo = TRUE, eval = TRUE, warning = FALSE, fig.width = 12, fig.height = 4, fig.align = "center", fig.pos = "H", fig.retina = 1}
ggplot(dist1long[dist1long$ADM1_PCODE == 13, ], 
       aes(as.Date(date), value, colour = anc1)) + 
  geom_line(size = 1) +
  scale_colour_manual(labels = c("raw", "smooth"),
                      values = c("#e41a1c", alpha("#377eb8", 0.3))) +
  scale_x_date(name = "Month", 
               date_breaks = "1 month", 
               date_labels = "%b %y") +
  scale_y_continuous(name = "ANC1", 
                     breaks = seq(from = 0, 
                                  to = max(dist1$anc1Std), 
                                  by = 100)) + 
  facet_grid(ADM1_EN ~ ADM2_EN) +
  themeSettings
```

```{r, echo = TRUE, eval = TRUE, warning = FALSE, fig.width = 12, fig.height = 4, fig.align = "center", fig.pos = "H", fig.retina = 1}
ggplot(dist1long[dist1long$ADM1_PCODE == 14, ], 
       aes(as.Date(date), value, colour = anc1)) + 
  geom_line(size = 1) +
  scale_colour_manual(labels = c("raw", "smooth"),
                      values = c("#e41a1c", alpha("#377eb8", 0.3))) +
  scale_x_date(name = "Month", 
               date_breaks = "1 month", 
               date_labels = "%b %y") +
  scale_y_continuous(name = "ANC1", 
                     breaks = seq(from = 0, 
                                  to = max(dist1$anc1Std), 
                                  by = 100)) + 
  facet_grid(ADM1_EN ~ ADM2_EN) +
  themeSettings
```

```{r, echo = TRUE, eval = TRUE, warning = FALSE, fig.width = 12, fig.height = 4, fig.align = "center", fig.pos = "H", fig.retina = 1}
ggplot(dist1long[dist1long$ADM1_PCODE == 15, ], 
       aes(as.Date(date), value, colour = anc1)) + 
  geom_line(size = 1) +
  scale_colour_manual(labels = c("raw", "smooth"),
                      values = c("#e41a1c", alpha("#377eb8", 0.3))) +
  scale_x_date(name = "Month", 
               date_breaks = "1 month", 
               date_labels = "%b %y") +
  scale_y_continuous(name = "ANC1", 
                     breaks = seq(from = 0, 
                                  to = max(dist1$anc1Std), 
                                  by = 1000)) + 
  facet_grid(ADM1_EN ~ ADM2_EN) +
  themeSettings
```

```{r, echo = TRUE, eval = TRUE, warning = FALSE, fig.width = 12, fig.height = 4, fig.align = "center", fig.pos = "H", fig.retina = 1}
ggplot(dist1long[dist1long$ADM1_PCODE == 16, ], 
       aes(as.Date(date), value, colour = anc1)) + 
  geom_line(size = 1) +
  scale_colour_manual(labels = c("raw", "smooth"),
                      values = c("#e41a1c", alpha("#377eb8", 0.3))) +
  scale_x_date(name = "Month", 
               date_breaks = "1 month", 
               date_labels = "%b %y") +
  scale_y_continuous(name = "ANC1", 
                     breaks = seq(from = 0, 
                                  to = max(dist1$anc1Std), 
                                  by = 100)) + 
  facet_grid(ADM1_EN ~ ADM2_EN) +
  themeSettings
```

```{r, echo = TRUE, eval = TRUE, warning = FALSE, fig.width = 12, fig.height = 4, fig.align = "center", fig.pos = "H", fig.retina = 1}
ggplot(dist1long[dist1long$ADM1_PCODE == 17, ], 
       aes(as.Date(date), value, colour = anc1)) + 
  geom_line(size = 1) +
  scale_colour_manual(labels = c("raw", "smooth"),
                      values = c("#e41a1c", alpha("#377eb8", 0.3))) +
  scale_x_date(name = "Month", 
               date_breaks = "1 month", 
               date_labels = "%b %y") +
  scale_y_continuous(name = "ANC1", 
                     breaks = seq(from = 0, 
                                  to = max(dist1$anc1Std), 
                                  by = 100)) + 
  facet_grid(ADM1_EN ~ ADM2_EN) +
  themeSettings
```

```{r, echo = TRUE, eval = TRUE, warning = FALSE, fig.width = 12, fig.height = 4, fig.align = "center", fig.pos = "H", fig.retina = 1}
ggplot(dist1long[dist1long$ADM1_PCODE == 18, ], 
       aes(as.Date(date), value, colour = anc1)) + 
  geom_line(size = 1) +
  scale_colour_manual(labels = c("raw", "smooth"),
                      values = c("#e41a1c", alpha("#377eb8", 0.3))) +
  scale_x_date(name = "Month", 
               date_breaks = "1 month", 
               date_labels = "%b %y") +
  scale_y_continuous(name = "ANC1", 
                     breaks = seq(from = 0, 
                                  to = max(dist1$anc1Std), 
                                  by = 100)) + 
  facet_grid(ADM1_EN ~ ADM2_EN) +
  themeSettings
```

```{r, echo = TRUE, eval = TRUE, warning = FALSE, fig.width = 12, fig.height = 4, fig.align = "center", fig.pos = "H", fig.retina = 1}
ggplot(dist1long[dist1long$ADM1_PCODE == 19, ], 
       aes(as.Date(date), value, colour = anc1)) + 
  geom_line(size = 1) +
  scale_colour_manual(labels = c("raw", "smooth"),
                      values = c("#e41a1c", alpha("#377eb8", 0.3))) +
  scale_x_date(name = "Month", 
               date_breaks = "1 month", 
               date_labels = "%b %y") +
  scale_y_continuous(name = "ANC1", 
                     breaks = seq(from = 0, 
                                  to = max(dist1$anc1Std), 
                                  by = 100)) + 
  facet_grid(ADM1_EN ~ ADM2_EN) +
  themeSettings
```

```{r, echo = TRUE, eval = TRUE, warning = FALSE, fig.width = 12, fig.height = 4, fig.align = "center", fig.pos = "H", fig.retina = 1}
ggplot(dist1long[dist1long$ADM1_PCODE == 20, ], 
       aes(as.Date(date), value, colour = anc1)) + 
  geom_line(size = 1) +
  scale_colour_manual(labels = c("raw", "smooth"),
                      values = c("#e41a1c", alpha("#377eb8", 0.3))) +
  scale_x_date(name = "Month", 
               date_breaks = "1 month", 
               date_labels = "%b %y") +
  scale_y_continuous(name = "ANC1", 
                     breaks = seq(from = 0, 
                                  to = max(dist1$anc1Std), 
                                  by = 100)) + 
  facet_grid(ADM1_EN ~ ADM2_EN) +
  themeSettings
```

```{r, echo = TRUE, eval = TRUE, warning = FALSE, fig.width = 12, fig.height = 4, fig.align = "center", fig.pos = "H", fig.retina = 1}
ggplot(dist1long[dist1long$ADM1_PCODE == 21, ], 
       aes(as.Date(date), value, colour = anc1)) + 
  geom_line(size = 1) +
  scale_colour_manual(labels = c("raw", "smooth"),
                      values = c("#e41a1c", alpha("#377eb8", 0.3))) +
  scale_x_date(name = "Month", 
               date_breaks = "1 month", 
               date_labels = "%b %y") +
  scale_y_continuous(name = "ANC1", 
                     breaks = seq(from = 0, 
                                  to = max(dist1$anc1Std), 
                                  by = 100)) + 
  facet_grid(ADM1_EN ~ ADM2_EN) +
  themeSettings
```

```{r, echo = TRUE, eval = TRUE, warning = FALSE, fig.width = 12, fig.height = 4, fig.align = "center", fig.pos = "H", fig.retina = 1}
ggplot(dist1long[dist1long$ADM1_PCODE == 22, ], 
       aes(as.Date(date), value, colour = anc1)) + 
  geom_line(size = 1) +
  scale_colour_manual(labels = c("raw", "smooth"),
                      values = c("#e41a1c", alpha("#377eb8", 0.3))) +
  scale_x_date(name = "Month", 
               date_breaks = "1 month", 
               date_labels = "%b %y") +
  scale_y_continuous(name = "ANC1", 
                     breaks = seq(from = 0, 
                                  to = max(dist1$anc1Std), 
                                  by = 100)) + 
  facet_grid(ADM1_EN ~ ADM2_EN) +
  themeSettings
```

&nbsp;

These approaches can be used for all the indicators.

# 3. Mapping of spatial distribution of indicators

The `provincedata` and `districtdata` data frame objects are used for mapping. In addition, the `province` and `district` map data from the `papuanewguinea` package are used for plotting the boundaries.

First, let us inspect the `province` and the `district` map objects.

&nbsp;

```{r, echo = FALSE, eval = TRUE}
knitr::kable(x = province@data,
             booktabs = TRUE,
             format = "latex") %>%
  kableExtra::kable_styling(latex_options = c("HOLD_position", "striped", "scale_down"),
                            position = "center")
```

&nbsp;

We notice that the provinces are not ordered sequentially by administrative code. We need to re-order the map data so that the admin codes are sequential. This can be done as follows:

&nbsp;

```{r, echo = TRUE, eval = TRUE}
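# Re-order the whole map object so that polygons and attribute rows stay matched;
# sorting only the @data slot would leave the polygons in their original order
province <- province[order(province@data$ADM1_PCODE), ]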
```

&nbsp;

We can now map the `provincedata`. For this, we will use `anc1` as our index indicator.

The first thing we need to do is to standardise the `anc1` indicator in the same way we did earlier in the time-series analysis. This can be done as follows:

&nbsp;

```{r, echo = TRUE, eval = TRUE}
anc1Province <- provincedata[ , c("ADM1_PCODE", "ADM1_EN", "WRA", 
                                  "sf", "year", "anc1")]

anc1Province$anc1Std <- anc1Province$anc1 / anc1Province$sf
```
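
As a worked example of the standardisation, the sketch below uses made-up numbers for a single province. It assumes that `sf` is the population of women of reproductive age (`WRA`) expressed in units of 100,000, which is how `sf` is constructed elsewhere in these notes; the values themselves are hypothetical.

```r
# Hypothetical province: 50,000 women of reproductive age and an ANC1 count of 1,200
WRA  <- 50000
anc1 <- 1200

sf      <- WRA / 100000   # scaling factor: population in units of 100,000
anc1Std <- anc1 / sf      # ANC1 count per 100,000 women of reproductive age
anc1Std
```

Dividing by `sf` puts provinces with very different population sizes on a common scale, so that their values can be compared on the same map.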

&nbsp;

We then need to classify the standardised `anc1` values into classes so that we can colour each province according to its class. A useful approach is to base the classes on quantiles. For this, we can use the `classIntervals()` function from the `classInt` package. The standardised `anc1` values can be classified as follows:

&nbsp;

```{r, echo = TRUE, eval = TRUE}
anc1Province$class <- cut(x = anc1Province$anc1Std,
                          breaks = classIntervals(var = anc1Province$anc1Std, 
                                                  n = 5, 
                                                  style = "quantile")$brks,
                          labels = FALSE)

anc1Province$class <- ifelse(is.na(anc1Province$class), 0, anc1Province$class)
```
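
To show how this classification behaves, here is a minimal sketch on made-up values. Base R's `quantile()` stands in for `classIntervals()` so that the example runs without `classInt`; the break values the two functions return may differ on real data, and the vector `x` is hypothetical.

```r
# Made-up standardised values, including a zero
x <- c(0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50)

# Quintile breaks (stand-in for classIntervals(..., n = 5, style = "quantile")$brks)
brks <- quantile(x, probs = c(0, 0.2, 0.4, 0.6, 0.8, 1), names = FALSE)

# cut() intervals are open on the left, so the minimum falls outside the first class
cls <- cut(x, breaks = brks, labels = FALSE)

# Values that received no class (NA) are recoded as class 0
cls <- ifelse(is.na(cls), 0, cls)
cls
```

Note that the smallest value is equal to the lowest break and therefore receives `NA` from `cut()`, which is then recoded to class 0. If the minimum should instead belong to the first class, `cut()` accepts `include.lowest = TRUE`.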

&nbsp;

We can now map the `anc1` indicator for year 2015 and year 2016 as follows:

&nbsp;

```{r, echo = TRUE, eval = TRUE, fig.width = 12, fig.height = 6, fig.align = "center", fig.pos = "H", fig.retina = 1}
colourscheme <- c("#eff3ff", "#c6dbef", "#9ecae1", 
                  "#6baed6", "#3182bd", "#08519c")

par(mar = c(0, 0, 0, 0), mfrow = c(1, 2))

plot(province,
     col = colourscheme[anc1Province$class[anc1Province$year == 2015] + 1],
     border = "gray90",
     lwd = 0.5)

plot(province,
     col = colourscheme[anc1Province$class[anc1Province$year == 2016] + 1],
     border = "gray90",
     lwd = 0.5)
```
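
The `+ 1` in the colour lookup is needed because classes run from 0 to 5 while R vectors are indexed from 1. A small sketch with hypothetical class values (the vector `cls` is made up) shows the lookup on its own:

```r
colourscheme <- c("#eff3ff", "#c6dbef", "#9ecae1",
                  "#6baed6", "#3182bd", "#08519c")

# Hypothetical classes for four areas, in the same order as the map polygons
cls <- c(0, 3, 5, 1)

# Class 0 picks the first (lightest) colour, class 5 the last (darkest)
cols <- colourscheme[cls + 1]
cols
```

Because the colours are assigned by position, the rows of the data must be in the same order as the polygons of the map for each area to receive its own colour.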

&nbsp;

Now, it will be useful to add a title to each plot to identify which map is for which year and to add a legend to show what the colours refer to. This can be done as follows:

&nbsp;

```{r, echo = TRUE, eval = TRUE, message = FALSE, fig.width = 12, fig.height = 6, fig.align = "center", fig.pos = "H", fig.retina = 1}
par(mar = c(0, 0, 0, 0), mfrow = c(1, 2))

plot(province,
     col = colourscheme[anc1Province$class[anc1Province$year == 2015] + 1],
     border = "gray90",
     lwd = 0.5)
title(main = "At least one antenatal care visit in 2015", line = -1, adj = 1)

plot(province,
     col = colourscheme[anc1Province$class[anc1Province$year == 2016] + 1],
     border = "gray90",
     lwd = 0.5)
title(main = "At least one antenatal care visit in 2016", line = -1, adj = 1)
legend(x = "bottomright",
       inset = 0.1,
       y.intersp = 1.2,
       legend = c("0", names(print(classIntervals(anc1Province$anc1Std,
                                                  n = 5,
                                                  style = "quantile",
                                                  dataPrecision = 0), 
                                   between = "to", 
                                   cutlabels = FALSE))),
       pch = 15, pt.cex = 2,
       col = colourscheme)
```

&nbsp;

We can now map the `districtdata`. For this, we will use `anc1` as our index indicator.

We will need to reorder the `district` map sequentially based on administrative code. This can be done as follows:

&nbsp;

```{r, echo = TRUE, eval = TRUE}
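# Re-order the whole map object so that polygons and attribute rows stay matched;
# sorting only the @data slot would leave the polygons in their original order
district <- district[order(district@data$ADM2_PCODE), ]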
```

&nbsp;

We now need to standardise the `anc1` indicator in the same way we did earlier in the time-series analysis. This can be done as follows:

```{r, echo = TRUE, eval = TRUE}
anc1District <- districtdata[ , c("ADM1_PCODE", "ADM1_EN", 
                                  "ADM2_PCODE", "ADM2_EN", "WRA", 
                                  "sf", "year", "anc1")]

anc1District$anc1Std <- anc1District$anc1 / anc1District$sf
```
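
As a small worked example of this standardisation, the sketch below uses made-up numbers (the `toy` data frame is hypothetical). It assumes, as in the code later in this section, that the scaling factor `sf` is the population of women of reproductive age (`WRA`) divided by 100,000, so that the standardised value is a count per 100,000 WRA.

```r
# Made-up district counts and populations (hypothetical; not NHIS data)
toy <- data.frame(district = c("A", "B"),
                  WRA      = c(50000, 200000),
                  anc1     = c(1000, 3000))

# Scaling factor: WRA expressed in units of 100,000 women
toy$sf <- toy$WRA / 100000

# Standardised count: anc1 visits per 100,000 WRA
toy$anc1Std <- toy$anc1 / toy$sf
toy
```

District B records more visits in absolute terms, but district A has the higher value once population is taken into account (2,000 against 1,500 per 100,000 WRA).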

&nbsp;

We then need to classify the standardised `anc1` values into classes so that each district can be coloured according to its class. A useful approach is to group the values by quantiles. For this, we can use the `classIntervals()` function from the `classInt` R package. The standardised `anc1` values can be classified as follows:

&nbsp;

```{r, echo = TRUE, eval = TRUE}
anc1District$class <- cut(x = anc1District$anc1Std,
                          breaks = classIntervals(var = anc1District$anc1Std, 
                                                  n = 5, 
                                                  style = "quantile")$brks,
                          labels = FALSE)

anc1District$class <- ifelse(is.na(anc1District$class), 0, anc1District$class)
```
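
To see what `cut()` does with these break points, the sketch below classifies a made-up vector (`toyStd` is hypothetical) using base R `quantile()` in place of `classIntervals()`; whether the two give identical breaks for a given dataset is an assumption here. With the default arguments of `cut()`, the intervals are open on the left, so the smallest value falls outside the first interval, comes back as `NA`, and ends up in class 0 after the `ifelse()` step.

```r
# Made-up standardised values (hypothetical; not NHIS data)
toyStd <- c(0, 10, 20, 30, 40, 50)

# Six quintile break points delimiting five classes
brks <- quantile(toyStd, probs = c(0, 0.2, 0.4, 0.6, 0.8, 1), names = FALSE)

# Class index 1 to 5; values outside every interval are returned as NA
toyClass <- cut(x = toyStd, breaks = brks, labels = FALSE)

# Recode NA to class 0, as done for anc1District$class
toyClass <- ifelse(is.na(toyClass), 0, toyClass)

# Adding 1 turns classes 0 to 5 into positions 1 to 6 of a colour vector
toyClass + 1
```

Passing `include.lowest = TRUE` to `cut()` would instead place the minimum value in class 1.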

&nbsp;

We can now make a final check of whether the districts in the map data correspond to the districts in `districtdata`.

Checking the number of districts, we note that the map data has one more district than `districtdata`: the district with administrative code 0905 is not included in `districtdata`. To be able to map, we can create additional rows for this district and fill the indicator values with `NA`. This can be done as follows:

&nbsp;

```{r, echo = FALSE, eval = TRUE}
anc1District$ADM2_PCODE <- as.numeric(anc1District$ADM2_PCODE)
```

```{r, echo = TRUE, eval = TRUE}
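# Use the first two rows as a template for the two rows to add; columns not
# overwritten below (including year) are inherited from these template rows
rowdata <- anc1District[1:2, ]
rowdata$ADM2_PCODE <- c(905, 905)
rowdata$ADM2_EN <- c("Mul/Baiyer District", "Mul/Baiyer District")
# WRA population for district PG0905, repeated for both rows
rowdata$WRA <- rep(as.numeric(pop_adm2[pop_adm2$ADM2_PCODE == "PG0905", 
                                       "WRA"]), 
                   2)
rowdata$sf <- rowdata$WRA / 100000
# No indicator data are available for this district
rowdata$anc1 <- NA
rowdata$anc1Std <- NA
rowdata$class <- NA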

anc1District <- data.frame(rbind(anc1District, rowdata))

anc1District <- anc1District[order(anc1District$ADM2_PCODE), ]
```
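
The same idea can be written more generally: compare the codes in the map with those in the data, then add one `NA` row per year for every code that is missing. The sketch below does this on toy data; `mapCodes`, `toyData`, `missingCodes` and `filler` are hypothetical names, and only the code 905 is taken from the text above.

```r
# Toy stand-ins for the map codes and the indicator data (hypothetical)
mapCodes <- c(901, 902, 905)
toyData  <- data.frame(ADM2_PCODE = c(901, 901, 902, 902),
                       year       = c(2015, 2016, 2015, 2016),
                       anc1Std    = c(1200, 1350, 800, 950))

# Codes that are drawn on the map but have no rows in the data
missingCodes <- setdiff(mapCodes, toyData$ADM2_PCODE)

# One placeholder row per missing code and year, with NA for the indicator
filler <- expand.grid(ADM2_PCODE = missingCodes,
                      year       = unique(toyData$year))
filler$anc1Std <- rep(NA_real_, nrow(filler))

# Append and sort by code, then year, so each year's rows follow the map order
toyData <- rbind(toyData, filler)
toyData <- toyData[order(toyData$ADM2_PCODE, toyData$year), ]

# Check: within one year, the data codes match the map codes one-to-one
stopifnot(identical(toyData$ADM2_PCODE[toyData$year == 2015], mapCodes))
toyData
```

Setting the year explicitly for each placeholder row, rather than inheriting it from template rows, guarantees one row for each year that is mapped.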

&nbsp;

We can now map the `anc1` indicator for year 2015 and year 2016 as follows:

&nbsp;

```{r, echo = TRUE, eval = TRUE, fig.width = 12, fig.height = 6, fig.align = "center", fig.pos = "H", fig.retina = 1}
colourscheme <- c("#eff3ff", "#c6dbef", "#9ecae1", 
                  "#6baed6", "#3182bd", "#08519c")

par(mar = c(0, 0, 0, 0), mfrow = c(1, 2))

plot(district,
     col = colourscheme[anc1District$class[anc1District$year == 2015] + 1],
     border = "gray90",
     lwd = 0.5)

plot(district,
     col = colourscheme[anc1District$class[anc1District$year == 2016] + 1],
     border = "gray90",
     lwd = 0.5)
```
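
The `col` argument in these calls relies on a simple lookup: class values 0 to 5 are shifted by one so that they index the six entries of `colourscheme`. A minimal sketch with made-up class values (`toyClass` and `toyColours` are hypothetical) shows the lookup, including what happens to the `NA` class assigned to the district without data. The expectation that such a polygon is then drawn without a fill is an assumption about how the plotting method handles `NA` colours.

```r
# Same six-colour palette as used for the maps above
colourscheme <- c("#eff3ff", "#c6dbef", "#9ecae1",
                  "#6baed6", "#3182bd", "#08519c")

# Made-up classes for four districts; NA marks a district with no data
toyClass <- c(0, 3, 5, NA)

# Shift by one: class 0 picks the first colour, class 5 the sixth
toyColours <- colourscheme[toyClass + 1]
toyColours
```

The last element of `toyColours` is `NA`, so the district without data receives no colour from the palette.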