churn.Rmd

# Customer churn and deep learning {#churn}

```{r, message = FALSE, warning = FALSE, echo = FALSE}
library(drake)
library(glue)
library(purrr)
library(rlang)
library(tidyverse)
tmp <- suppressWarnings(drake_plan(x = 1, y = 2))
```


```{r, include = FALSE}
library(drake)
library(keras)
library(tidyverse)
library(rsample)
library(recipes)
library(yardstick)
options(
  drake_make_menu = FALSE,
  drake_clean_menu = FALSE,
  warnPartialMatchArgs = FALSE,
  crayon.enabled = FALSE,
  readr.show_progress = FALSE,
  tidyverse.quiet = TRUE
)
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
knitr::opts_knit$set(root.dir = fs::dir_create(tempfile()))
```

```{r, include = FALSE}
invisible(drake_example("customer-churn", overwrite = TRUE))
invisible(file.copy("customer-churn/data/customer_churn.csv", ".", overwrite = TRUE))
```

[`drake`](https://github.com/ropensci/drake) is designed for workflows with long runtimes, and a major use case is deep learning. This chapter demonstrates how to leverage [`drake`](https://github.com/ropensci/drake) to manage a deep learning workflow. The original example comes from a [blog post by Matt Dancho](https://blogs.rstudio.com/tensorflow/posts/2018-01-11-keras-customer-churn/), and the chapter's content itself comes directly from [this R notebook](https://github.com/sol-eng/tensorflow-w-r/blob/master/workflow/tensorflow-drake.Rmd), part of an [RStudio Solutions Engineering example demonstrating TensorFlow in R](https://github.com/sol-eng/tensorflow-w-r). The notebook is modified and redistributed under the terms of the [Apache 2.0 license](https://github.com/sol-eng/tensorflow-w-r/blob/master/LICENSE), copyright RStudio ([details here](https://github.com/ropensci-books/drake/blob/main/COPYRIGHT)).

## Churn packages

First, we load our packages into a fresh R session.

```{r, keraspkgs}
library(drake)
library(keras)
library(tidyverse)
library(rsample)
library(recipes)
library(yardstick)
```

## Churn functions

[`drake`](https://github.com/ropensci/drake) is R-focused and function-oriented. We create functions to [preprocess the data](https://github.com/tidymodels/recipes),

```{r}
prepare_recipe <- function(data) {
  data %>%
    training() %>%
    recipe(Churn ~ .) %>%
    step_rm(customerID) %>%
    step_naomit(all_outcomes(), all_predictors()) %>%
    step_discretize(tenure, options = list(cuts = 6)) %>%
    step_log(TotalCharges) %>%
    step_mutate(Churn = ifelse(Churn == "Yes", 1, 0)) %>%
    step_dummy(all_nominal(), -all_outcomes()) %>%
    step_center(all_predictors(), -all_outcomes()) %>%
    step_scale(all_predictors(), -all_outcomes()) %>%
    prep()
}
```

define a [`keras`](https://github.com/rstudio/keras) model, exposing arguments to set the dimensionality and activation functions of the layers,

```{r, kerasmodel}
define_model <- function(rec, units1, units2, act1, act2, act3) {
  input_shape <- ncol(
    juice(rec, all_predictors(), composition = "matrix")
  )
  keras_model_sequential() %>%
    layer_dense(
      units = units1,
      kernel_initializer = "uniform",
      activation = act1,
      input_shape = input_shape
    ) %>%
    layer_dropout(rate = 0.1) %>%
    layer_dense(
      units = units2,
      kernel_initializer = "uniform",
      activation = act2
    ) %>%
    layer_dropout(rate = 0.1) %>%
    layer_dense(
      units = 1,
      kernel_initializer = "uniform",
      activation = act3
    )
}
```

train a model,

```{r, kerastrain}
train_model <- function(
  rec,
  units1 = 16,
  units2 = 16,
  act1 = "relu",
  act2 = "relu",
  act3 = "sigmoid"
) {
  model <- define_model(
    rec = rec,
    units1 = units1,
    units2 = units2,
    act1 = act1,
    act2 = act2,
    act3 = act3
  )
  compile(
    model,
    optimizer = "adam",
    loss = "binary_crossentropy",
    metrics = c("accuracy")
  )
  x_train_tbl <- juice(
    rec,
    all_predictors(),
    composition = "matrix"
  )
  y_train_vec <- juice(rec, all_outcomes()) %>%
    pull()
  fit(
    object = model,
    x = x_train_tbl,
    y = y_train_vec,
    batch_size = 32,
    epochs = 32,
    validation_split = 0.3,
    verbose = 0
  )
  model
}
```

compare predictions against reality,

```{r, kerasconf}
confusion_matrix <- function(data, rec, model) {
  testing_data <- bake(rec, testing(data))
  x_test_tbl <- testing_data %>%
    select(-Churn) %>%
    as.matrix()
  y_test_vec <- testing_data %>%
    select(Churn) %>%
    pull()
  yhat_keras_class_vec <- model %>%
    predict_classes(x_test_tbl) %>%
    as.factor() %>%
    fct_recode(yes = "1", no = "0")
  yhat_keras_prob_vec <-
    model %>%
    predict_proba(x_test_tbl) %>%
    as.vector()
  test_truth <- y_test_vec %>%
    as.factor() %>%
    fct_recode(yes = "1", no = "0")
  estimates_keras_tbl <- tibble(
    truth = test_truth,
    estimate = yhat_keras_class_vec,
    class_prob = yhat_keras_prob_vec
  )
  estimates_keras_tbl %>%
    conf_mat(truth, estimate)
}
```

and compare the performance of multiple models. 

```{r, kerascompare}
compare_models <- function(...) {
  name <- match.call()[-1] %>%
    as.character()
  df <- map_df(list(...), summary) %>%
    filter(.metric %in% c("accuracy", "sens", "spec")) %>%
    mutate(name = rep(name, each = n() / length(name))) %>%
    rename(metric = .metric, estimate = .estimate)
  ggplot(df) +
    geom_line(aes(x = metric, y = estimate, color = name, group = name)) +
    theme_gray(24)
}
```

## Churn plan

Next, we define our workflow in a [`drake` plan](https://books.ropensci.org/drake/plans.html). We will prepare the data, train different models with different activation functions, and compare the models in terms of performance. 

```{r, kerasplan1}
activations <- c("relu", "sigmoid")

plan <- drake_plan(
  data = read_csv(file_in("customer_churn.csv"), col_types = cols()) %>%
    initial_split(prop = 0.3),
  rec = prepare_recipe(data),
  model = target(
    train_model(rec, act1 = act),
    format = "keras", # Supported in drake > 7.5.2 to store models properly.
    transform = map(act = !!activations)
  ),
  conf = target(
    confusion_matrix(data, rec, model),
    transform = map(model, .id = act)
  ),
  metrics = target(
    compare_models(conf),
    transform = combine(conf)
  )
)
```

The plan is a data frame with the steps we are going to do.

```{r, paged.print = FALSE, warning = FALSE}
plan
```

## Churn dependency graph

The graph visualizes the dependency relationships among the steps of the workflow.

```{r, message = FALSE}
vis_drake_graph(plan)
```

## Run the Keras models

Call [`make()`](https://docs.ropensci.org/drake/reference/make.html) to actually run the workflow.

```{r}
make(plan)
```

## Inspect the Keras results

The two models performed about the same.

```{r}
readd(metrics) # see also loadd()
```

## Add Keras models

Let's try the softmax activation function.

```{r}
activations <- c("relu", "sigmoid", "softmax")

plan <- drake_plan(
  data = read_csv(file_in("customer_churn.csv"), col_types = cols()) %>%
    initial_split(prop = 0.3),
  rec = prepare_recipe(data),
  model = target(
    train_model(rec, act1 = act),
    format = "keras", # Supported in drake > 7.5.2 to store models properly.
    transform = map(act = !!activations)
  ),
  conf = target(
    confusion_matrix(data, rec, model),
    transform = map(model, .id = act)
  ),
  metrics = target(
    compare_models(conf),
    transform = combine(conf)
  )
)
```

```{r, message = FALSE}
vis_drake_graph(plan) # see also outdated() and predict_runtime()
```

[`make()`](https://docs.ropensci.org/drake/reference/make.html) skips the relu and sigmoid models because they are already up to date. (Their dependencies did not change.) Only the softmax model needs to run.

```{r}
make(plan)
```

## Inspect the Churn results again

```{r}
readd(metrics) # see also loadd()
```

## Update the Churn code

If you change upstream functions, even nested ones, `drake` automatically refits the affected models. Let's increase dropout in both layers.

```{r}
define_model <- function(rec, units1, units2, act1, act2, act3) {
  input_shape <- ncol(
    juice(rec, all_predictors(), composition = "matrix")
  )
  keras_model_sequential() %>%
    layer_dense(
      units = units1,
      kernel_initializer = "uniform",
      activation = act1,
      input_shape = input_shape
    ) %>%
    layer_dropout(rate = 0.15) %>% # Changed from 0.1 to 0.15.
    layer_dense(
      units = units2,
      kernel_initializer = "uniform",
      activation = act2
    ) %>%
    layer_dropout(rate = 0.15) %>% # Changed from 0.1 to 0.15.
    layer_dense(
      units = 1,
      kernel_initializer = "uniform",
      activation = act3
    )
}
```

All the models and downstream results are affected.

```{r}
make(plan)
```

## Churn history and provenance

`drake` tracks history and provenance. You can see which models you ran, when you ran them, how long they took, and which settings you tried (i.e. named arguments to function calls in your commands).

```{r}
history <- drake_history()
history
```

And as long as you did not run `clean(garbage_collection = TRUE)`, you can get the old data back. Let's find the oldest run of the relu model.

```{r}
hash <- history %>%
  filter(act1 == "relu") %>%
  pull(hash) %>%
  head(n = 1)
drake_cache()$get_value(hash)
```

```{r, echo = FALSE}
clean(destroy = TRUE)
```