HDI/hdi_tv.Rmd

---
title: "Regression and Other Stories: Human Development Index"
author: "Andrew Gelman, Jennifer Hill, Aki Vehtari"
date: "`r Sys.Date()`"
output:
  github_document:
    toc: true
---
Tidyverse version by Bill Behrman.

Human Development Index - Looking at data in different ways. See
Chapter 2 in Regression and Other Stories.

-------------

```{r, message=FALSE}
# Packages
library(tidyverse)
library(haven)
library(sf)

# Parameters
  # Human Development Index data
file_hdi <- here::here("HDI/data/hdi.dat")
  # Data that includes state income
file_income <- here::here("HDI/data/state vote and income, 68-00.dta")
  # Common code
file_common <- here::here("_common.R")

#===============================================================================

# Run common code
source(file_common)
```

# 2 Data and measurement

## 2.1 Examining where data come from

Data

Human Development Index data.

```{r}
hdi <- 
  file_hdi %>% 
  read.table(header = TRUE) %>% 
  as_tibble() %>% 
  select(state, hdi, hdi_rank = rank, canada_dist = canada.dist)

hdi
```

State income data for 2000.

```{r}
income <- 
  file_income %>% 
  read_dta() %>% 
  filter(st_year == 2000) %>% 
  select(
    state = st_state,
    state_abbr = st_stateabb,
    income_2000 = st_income
  ) %>% 
  mutate(income_2000_rank = min_rank(-income_2000))

income
```

`hdi` has 51 rows and `income` has 50. Let's verify that the difference is the District of Columbia and that we can otherwise join on the states.

```{r}
all(income$state %in% hdi$state)

setdiff(hdi$state, income$state)
```

```{r}
hdi_income <- 
  hdi %>% 
  full_join(income, by = "state")

hdi_income
```

Human Development Index vs. average state income.

```{r, fig.asp=0.75}
hdi_income %>% 
  drop_na(income_2000) %>% 
  ggplot(aes(income_2000, hdi)) +
  geom_point() +
  ggrepel::geom_text_repel(aes(label = state_abbr)) +
  labs(
    title = "Human Development Index vs. average state income",
    x = "Average state income in 2000",
    y = "Human Development Index"
  )
```

Human Development Index rank vs. state income rank.

```{r, fig.asp=0.75}
hdi_income %>% 
  drop_na(income_2000_rank) %>%
  ggplot(aes(income_2000_rank, hdi_rank)) +
  geom_point() +
  ggrepel::geom_text_repel(aes(label = state_abbr)) +
  scale_x_reverse() +
  scale_y_reverse() +
  labs(
    title = "Human Development Index rank vs. state income rank",
    x = "State income rank in 2000",
    y = "Human Development Index rank"
  )
```

Boundaries for U.S. states using [ussf package](https://github.com/dcl-docs/ussf).

```{r}
us <- ussf::boundaries(geography = "state")
```

```{r}
setdiff(us$NAME, hdi$state)
setdiff(hdi$state, us$NAME)
```

Join boundaries to Human Development Index data.

```{r}
us_hdi <- 
  us %>% 
  left_join(
    hdi %>% 
      mutate(
        state = str_replace(state, "Washington, D.C.", "District of Columbia")
      ),
    by = c("NAME" = "state")
  )
```

Human Development Index by state.

```{r}
us_hdi %>% 
  ggplot() +
  geom_sf(aes(fill = hdi)) +
  scale_fill_distiller(palette = "YlGnBu", direction = 1) +
  theme_void() +
  theme(plot.title = element_text(hjust = 0.5)) +
  labs(
    title = "Human Development Index by state",
    fill = "HDI"
  )
```

Number of states you need to drive through to reach the Canadian border.

```{r}
us_hdi %>% 
  ggplot() +
  geom_sf(aes(fill = factor(canada_dist))) +
  scale_fill_brewer(palette = "YlGnBu", direction = -1) +
  theme_void() +
  theme(plot.title = element_text(hjust = 0.5)) +
  labs(
    title = 
      "Number of states you need to drive through to reach the Canadian border",
    fill = NULL
  )
```