merge pr #83: add alt text

tidymodels · May 20, 2021 · 7db8d82
2 parents 35cb85c + 728f89b
commit 7db8d82

Showing 3 changed files with 54 additions and 30 deletions.
diff --git a/README.Rmd b/README.Rmd
@@ -51,25 +51,25 @@ Rather than diving right into the implementation, we'll focus here on how the pi
 
 At the highest level, ensembles are formed from _model definitions_. In this package, model definitions are an instance of a minimal [workflow](https://workflows.tidymodels.org/), containing a _model specification_ (as defined in the [parsnip](https://parsnip.tidymodels.org/) package) and, optionally, a _preprocessor_ (as defined in the [recipes](https://recipes.tidymodels.org/) package). Model definitions specify the form of candidate ensemble members.
 
-![](man/figures/model_defs.png)
+![A diagram representing "model definitions," which specify the form of candidate ensemble members. Three colored boxes represent three different model types; a K-nearest neighbors model (in salmon), a linear regression model (in yellow), and a support vector machine model (in green).](man/figures/model_defs.png)
 
 To be used in the same ensemble, each of these model definitions must share the same _resample_. This [rsample](https://rsample.tidymodels.org/) `rset` object, when paired with the model definitions, can be used to generate the tuning/fitting results objects for the candidate _ensemble members_ with tune.
 
-![](man/figures/candidates.png)
+![A diagram representing "candidate members" generated from each model definition. Four salmon-colored boxes labeled "KNN" represent K-nearest neighbors models trained on the resamples with differing hyperparameters. Similarly, the linear regression model generates one candidate member, and the support vector machine model generates six.](man/figures/candidates.png)
 
 Candidate members first come together in a `data_stack` object through the `add_candidates()` function. Principally, these objects are just [tibble](https://tibble.tidyverse.org/)s, where the first column gives the true outcome in the assessment set (the portion of the training set used for model validation), and the remaining columns give the predictions from each candidate ensemble member. (When the outcome is numeric, there's only one column per candidate ensemble member. Classification requires as many columns per candidate as there are levels in the outcome variable.) They also bring along a few extra attributes to keep track of model definitions.
 
-![](man/figures/data_stack.png)
+![A diagram representing a "data stack," a specific kind of data frame. Colored "columns" depict, in white, the true value of the outcome variable in the validation set, followed by four columns (in salmon) representing the predictions from the K-nearest neighbors model, one column (in tan) representing the linear regression model, and six (in green) representing the support vector machine model.](man/figures/data_stack.png)
 
 Then, the data stack can be evaluated using `blend_predictions()` to determine to how best to combine the outputs from each of the candidate members.  In the stacking literature, this process is commonly called _metalearning_.
 
 The outputs of each member are likely highly correlated. Thus, depending on the degree of regularization you choose, the coefficients for the inputs of (possibly) many of the members will zero out—their predictions will have no influence on the final output, and those terms will thus be thrown out.  
 
-![](man/figures/coefs.png)
+![A diagram representing "stacking coefficients," the coefficients of the linear model combining each of the candidate member predictions to generate the ensemble's ultimate prediction. Boxes for each of the candidate members are placed besides each other, filled in with color if the coefficient for the associated candidate member is nonzero.](man/figures/coefs.png)
 
 These stacking coefficients determine which candidate ensemble members will become ensemble members. Candidates with non-zero stacking coefficients are then fitted on the whole training set, altogether making up a `model_stack` object. 
 
-![](man/figures/class_model_stack.png)
+![A diagram representing the "model stack" class, which collates the stacking coefficients and members (candidate members with nonzero stacking coefficients that are trained on the full training set). The representation of the stacking coefficients is as before, where the members (shown next to their associated stacking coefficients) are colored-in pentagons. Model stacks are a list subclass.](man/figures/class_model_stack.png)
 
 This model stack object, outputted from `fit_members()`, is ready to predict on new data! The trained ensemble members are often referred to as _base models_ in the stacking literature.
 

diff --git a/README.md b/README.md
@@ -48,35 +48,33 @@ remotes::install_github("tidymodels/stacks", ref = "main")
 
 stacks is generalized with respect to:
 
-  - Model type: Any model type implemented in
+-   Model type: Any model type implemented in
     [parsnip](https://parsnip.tidymodels.org/) or adjacent packages is
     fair game to add to a stacks model stack.
     [Here](https://www.tidymodels.org/find/parsnip/)’s a table of many
     of the implemented model types in the tidymodels core, with a link
     there to an article about implementing your own model classes as
     well.
-  - Cross-validation scheme: Any resampling algorithm implemented in
+-   Cross-validation scheme: Any resampling algorithm implemented in
     [rsample](https://rsample.tidymodels.org/) or adjacent packages is
     fair game for resampling data for use in training a model stack.
-  - Error metric: Any metric function implemented in
+-   Error metric: Any metric function implemented in
     [yardstick](https://yardstick.tidymodels.org/) or adjacent packages
     is fair game for evaluating model stacks and their members. That
     package provides some infrastructure for creating your own metric
-    functions as well\!
+    functions as well!
 
 stacks uses a regularized linear model to combine predictions from
 ensemble members, though this model type is only one of many possible
 learning algorithms that could be used to fit a stacked ensemble model.
 For implementations of additional ensemble learning algorithms, check
 out
 [h2o](http://docs.h2o.ai/h2o/latest-stable/h2o-r/docs/reference/h2o.stackedEnsemble.html)
-and
-[SuperLearner](https://CRAN.R-project.org/package=SuperLearner).
+and [SuperLearner](https://CRAN.R-project.org/package=SuperLearner).
 
 Rather than diving right into the implementation, we’ll focus here on
 how the pieces fit together, conceptually, in building an ensemble with
-`stacks`. See the `basics` vignette for an example of the API in
-action\!
+`stacks`. See the `basics` vignette for an example of the API in action!
 
 ## a grammar
 
@@ -89,15 +87,24 @@ specification* (as defined in the
 [recipes](https://recipes.tidymodels.org/) package). Model definitions
 specify the form of candidate ensemble members.
 
-![](man/figures/model_defs.png)
+![A diagram representing “model definitions,” which specify the form of
+candidate ensemble members. Three colored boxes represent three
+different model types; a K-nearest neighbors model (in salmon), a linear
+regression model (in yellow), and a support vector machine model (in
+green).](man/figures/model_defs.png)
 
 To be used in the same ensemble, each of these model definitions must
 share the same *resample*. This
 [rsample](https://rsample.tidymodels.org/) `rset` object, when paired
 with the model definitions, can be used to generate the tuning/fitting
 results objects for the candidate *ensemble members* with tune.
 
-![](man/figures/candidates.png)
+![A diagram representing “candidate members” generated from each model
+definition. Four salmon-colored boxes labeled “KNN” represent K-nearest
+neighbors models trained on the resamples with differing
+hyperparameters. Similarly, the linear regression model generates one
+candidate member, and the support vector machine model generates
+six.](man/figures/candidates.png)
 
 Candidate members first come together in a `data_stack` object through
 the `add_candidates()` function. Principally, these objects are just
@@ -110,7 +117,13 @@ Classification requires as many columns per candidate as there are
 levels in the outcome variable.) They also bring along a few extra
 attributes to keep track of model definitions.
 
-![](man/figures/data_stack.png)
+![A diagram representing a “data stack,” a specific kind of data frame.
+Colored “columns” depict, in white, the true value of the outcome
+variable in the validation set, followed by four columns (in salmon)
+representing the predictions from the K-nearest neighbors model, one
+column (in tan) representing the linear regression model, and six (in
+green) representing the support vector machine
+model.](man/figures/data_stack.png)
 
 Then, the data stack can be evaluated using `blend_predictions()` to
 determine to how best to combine the outputs from each of the candidate
@@ -123,43 +136,54 @@ inputs of (possibly) many of the members will zero out—their predictions
 will have no influence on the final output, and those terms will thus be
 thrown out.
 
-![](man/figures/coefs.png)
+![A diagram representing “stacking coefficients,” the coefficients of
+the linear model combining each of the candidate member predictions to
+generate the ensemble’s ultimate prediction. Boxes for each of the
+candidate members are placed besides each other, filled in with color if
+the coefficient for the associated candidate member is
+nonzero.](man/figures/coefs.png)
 
 These stacking coefficients determine which candidate ensemble members
 will become ensemble members. Candidates with non-zero stacking
 coefficients are then fitted on the whole training set, altogether
 making up a `model_stack` object.
 
-![](man/figures/class_model_stack.png)
+![A diagram representing the “model stack” class, which collates the
+stacking coefficients and members (candidate members with nonzero
+stacking coefficients that are trained on the full training set). The
+representation of the stacking coefficients is as before, where the
+members (shown next to their associated stacking coefficients) are
+colored-in pentagons. Model stacks are a list
+subclass.](man/figures/class_model_stack.png)
 
 This model stack object, outputted from `fit_members()`, is ready to
-predict on new data\! The trained ensemble members are often referred to
+predict on new data! The trained ensemble members are often referred to
 as *base models* in the stacking literature.
 
 The full visual outline for these steps can be found
 [here](https://github.com/tidymodels/stacks/blob/main/inst/figs/outline.png).
 The API for the package closely mirrors these ideas. See the `basics`
-vignette for an example of how this grammar is implemented\!
+vignette for an example of how this grammar is implemented!
 
 ## contributing
 
 This project is released with a [Contributor Code of
 Conduct](https://github.com/tidymodels/stacks/blob/main/CODE_OF_CONDUCT.md).
 By contributing to this project, you agree to abide by its terms.
 
-  - For questions and discussions about tidymodels packages, modeling,
+-   For questions and discussions about tidymodels packages, modeling,
     and machine learning, please [post on RStudio
     Community](https://community.rstudio.com/new-topic?category_id=15&tags=tidymodels,question).
 
-  - If you think you have encountered a bug, please [submit an
+-   If you think you have encountered a bug, please [submit an
     issue](https://github.com/tidymodels/stacks/issues).
 
-  - Either way, learn how to create and share a
+-   Either way, learn how to create and share a
     [reprex](https://reprex.tidyverse.org/articles/articles/learn-reprex.html)
     (a minimal, reproducible example), to clearly communicate about your
     code.
 
-  - Check out further details on [contributing guidelines for tidymodels
+-   Check out further details on [contributing guidelines for tidymodels
     packages](https://www.tidymodels.org/contribute/) and [how to get
     help](https://www.tidymodels.org/help/).
 

diff --git a/vignettes/basics.Rmd b/vignettes/basics.Rmd
@@ -75,7 +75,7 @@ Let's give this a go!
 
 At the highest level, ensembles are formed from _model definitions_. In this package, model definitions are an instance of a minimal [`workflow`](https://workflows.tidymodels.org/), containing a _model specification_ (as defined in the [`parsnip`](https://parsnip.tidymodels.org/) package) and, optionally, a _preprocessor_ (as defined in the [`recipes`](https://recipes.tidymodels.org/) package). Model definitions specify the form of candidate ensemble members. 
 
-```{r, echo = FALSE}
+```{r, echo = FALSE, fig.alt = "A diagram representing 'model definitions,' which specify the form of candidate ensemble members. Three colored boxes represent three different model types; a K-nearest neighbors model (in salmon), a linear regression model (in yellow), and a support vector machine model (in green)."}
 knitr::include_graphics("https://raw.githubusercontent.com/tidymodels/stacks/main/man/figures/model_defs.png")
 ```
 
@@ -252,7 +252,7 @@ svm_res
 
 Altogether, we've created three model definitions, where the K-nearest neighbors model definition specifies 4 model configurations, the linear regression specifies 1, and the support vector machine specifies 6.
 
-```{r, echo = FALSE}
+```{r, echo = FALSE, fig.alt = "A diagram representing 'candidate members' generated from each model definition. Four salmon-colored boxes labeled 'KNN' represent K-nearest neighbors models trained on the resamples with differing hyperparameters. Similarly, the linear regression (LM) model generates one candidate member, and the support vector machine (SVM) model generates six."}
 knitr::include_graphics("https://raw.githubusercontent.com/tidymodels/stacks/main/man/figures/candidates.png")
 ```
 
@@ -262,7 +262,7 @@ With these three model definitions fully specified, we are ready to begin stacki
 
 The first step to building an ensemble with stacks is to create a `data_stack` object—in this package, data stacks are tibbles (with some extra attributes) that contain the assessment set predictions for each candidate ensemble member.
 
-```{r, echo = FALSE}
+```{r, echo = FALSE, fig.alt = "A diagram representing a 'data stack,' a specific kind of data frame. Colored 'columns' depict, in white, the true value of the outcome variable in the validation set, followed by four columns (in salmon) representing the predictions from the K-nearest neighbors model, one column (in tan) representing the linear regression model, and six (in green) representing the support vector machine model."}
 knitr::include_graphics("https://raw.githubusercontent.com/tidymodels/stacks/main/man/figures/data_stack.png")
 ```
 
@@ -308,7 +308,7 @@ tree_frogs_model_st <-
 
 The `blend_predictions` function determines how member model output will ultimately be combined in the final prediction by fitting a LASSO model on the data stack, predicting the true assessment set outcome using the predictions from each of the candidate members. Candidates with nonzero stacking coefficients become members. 
 
-```{r, echo = FALSE}
+```{r, echo = FALSE, fig.alt = "A diagram representing 'stacking coefficients,' the coefficients of the linear model combining each of the candidate member predictions to generate the ensemble's ultimate prediction. Boxes for each of the candidate members are placed besides each other, filled in with color if the coefficient for the associated candidate member is nonzero."}
 knitr::include_graphics("https://raw.githubusercontent.com/tidymodels/stacks/main/man/figures/coefs.png")
 ```
 
@@ -339,13 +339,13 @@ tree_frogs_model_st <-
   fit_members()
 ```
 
-```{r, echo = FALSE}
+```{r, echo = FALSE, fig.alt = "A diagram representing the ensemble members, where each are pentagons labeled and colored-in according to the candidate members they arose from."}
 knitr::include_graphics("https://raw.githubusercontent.com/tidymodels/stacks/main/man/figures/members.png")
 ```
 
 Model stacks can be thought of as a group of fitted member models and a set of instructions on how to combine their predictions.
 
-```{r, echo = FALSE}
+```{r, echo = FALSE, fig.alt = "A diagram representing the 'model stack' class, which collates the stacking coefficients and members (candidate members with nonzero stacking coefficients that are trained on the full training set). The representation of the stacking coefficients and members is as before. Model stacks are a list subclass."}
 knitr::include_graphics("https://raw.githubusercontent.com/tidymodels/stacks/main/man/figures/class_model_stack.png")
 ```