-
Notifications
You must be signed in to change notification settings - Fork 1
/
README.Rmd
205 lines (162 loc) · 8.02 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
---
output: github_document
editor_options:
chunk_output_type: console
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, echo = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "images/"
)
version <- as.vector(read.dcf('DESCRIPTION')[, 'Version'])
version <- gsub('-', '.', version)
##version <- "0.0.900X"
```
# gnumaker
Version: `r version`
## Overview
**gnumaker** makes it easy to create and use GNU Makefiles to aid a reproducible work flow for data analysis projects.
GNU Make is the defacto standard for efficiently rerunning appropriate steps in the data analysis or reporting process if a particular file is changed. Only the necessary steps are rerun.
Rather than creating a new system for setting up and building output
from statistical software syntax files, **gnumaker** leverages off
existing GNU Make rules. These rules, for `R`, `Sweave`, `R Markdown`,
`Stata`, `SAS`, `PSPP`, `Python`, `Perl` and other syntax files are
available at [r-makefile-definitions on
Github](https://github.com/petebaker/r-makefile-definitions). These
are described in P Baker (2020) Using `GNU Make` to Manage the Workflow
of Data Analysis Projects, _Journal of Statistical Software
(Accepted)_.
For those not familiar with `GNU Make`, **gnumaker** allows simple
dependencies between files to be specified to produce a working
`Makefile` and the associated directed acyclic graph (DAG). I'd
welcome `Github` issues containing error reports or feature requests.
Alternatively, you can email the package maintainer at `drpetebaker at
gmail dot com`.
## Installation
<!--
Install the latest CRAN version of **gnumaker** with:
```{r cran-installation, eval = FALSE}
##install.packages("gnumaker")
```
## Note that three dependencies are in BioConductor so use BiocManager
NB: parked here in case biocViews: line in DESCRIPTION does not work
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
devtools::install_github("petebaker/gnumaker", repos = BiocManager::repositories())
-->
You can install the development version of **gnumaker** from `GitHub` with:
```{r gh-installation, eval = FALSE}
## if you don't have devtools installed, automatically install it from CRAN
if (!requireNamespace("devtools", quietly = TRUE))
install.packages("devtools")
devtools::install_github("petebaker/gnumaker")
```
## Usage
There are currently four key functions in **gnumaker**. These are:
* `create_makefile()` creates a `gnu_makefile` object given the
specified dependencies between syntax, data and output files,
* `write_makefile()` writes a `gnu_makefile` object to a `Makfile` on disk,
* `info_rules()` provides information about data analysis `GNU Make` rules for various target and dependency filename extensions, and
* `plot()` plots the DAG of a `gnu_makefile` object.
## Example
Suppose we have a data file `simple.csv` and use `read.R` to read and
clean the data. After storing the cleaned data in a .RData file, we
then employ `linmod.R` to plot and analyse the data. Next, using the
stored results, two reports `report1.pdf` and `report2.docx` are
produced from `report1.Rmd` and `report2.Rmd`. The workflow may be
encapsulated in a `Makefile` which is then employed to manage the
process and generate or regenerate any intermediate files when the
data or syntax changes.
Rather than writing `GNU Make` commands, we can employ the
**gnumaker** package to create an appropriate `Makefile` called
`Makefile.demo` with:
```{r, simple-demo0}
library(gnumaker)
gm1 <-
create_makefile(targets = list(read = c("read.R", "simple.csv"),
linmod = c("linmod.R", "read"),
rep1 = c("report1.Rmd", "linmod"),
rep2 = c("report2.Rmd", "linmod")),
target.all = c("rep1", "rep2"),
all.exts = list(rep1 = "pdf", rep2 = "docx"))
write_makefile(gm1, file = "Makefile.demo")
```
To plot the corresponding DAG, which shows the relationships between
target files (wheat coloured circles) which we can (re)generate from a
minimal set of syntax and data files (green rectangles) using `GNU Make`, use:
```{r, simple-dag, fig.cap = "DAG of `Makefile` for simple example. The DAG of the `gnu_makefile` object can be produced with `plot(gm1)`. Using the minimal set of files (shown in green rectangles), then `GNU Make` allows us to (re)generate all other files shown as wheat coloured circles)"}
plot(gm1)
```
### Details
Using the **gnumaker** package we simply need to provide a list of
targets to the the `create_makefile` function where the components
specify a target as a name and dependency file(s) as a character
vector. The package uses the `GNU Make` pattern rules in `r-rules.mk`
to choose file names for targets but we can override the defaults.
For instance, in this example the first two dependency files are
`simple.csv` and `read.R` so we provide the first target as the first
component of the list as `read = c("read.R", "simple.csv")`, where the
name `read` can be anything we like. Note that the target filename for
`read` will be derived from the first filename in the list `read.R`.
The second target depends on the `read` target and `linmod.R` and so
we specify this with `linmod = c("linmod.R", "read")` and so on.
Using the function `create_makefile` to create a `gnu_makefile`
object, target file names are substituted using defaults and the
appropriate `Make` commands are rearranged using the DAG of the
relationships. For instance, the default target file for the first
dependency in the `read` component, which is `read.R`, becomes
`read.Rout` but we can change the default target file extension for
all `.R` files using the `default.exts` argument and specify say a
HTML target file with `default.exts = list(R = "html")`.
Finally we specify the first target (usually `all`) as two reports
`report1.pdf` and `report2.docx` using `target.all = c("rep1","rep2")`
which by default would be `report1.html` and `report2.html` but which
we specify as `report1.pdf` and `report2.docx` by specifying the
option `all.exts = list(rep1 = "pdf", rep2 = "docx")`.
Once we have constructed a suitable `gnu_makefile` object then we
write it to disk with `write_makefile`. To run all `R` script files
and analyses in order we simply type `make` in a terminal or set up
`RStudio` or our IDE to use GNU Make as the build mechanism which
allows us to (re)run analyses by pressing the appropriate Build
button.
The `Makefile`, with optional comment for `linmod` target, is
specified using:
```{r, simple-demo1}
library(gnumaker)
gm1 <-
create_makefile(targets = list(read = c("read.R", "simple.csv"),
linmod = c("linmod.R", "read"),
rep1 = c("report1.Rmd", "linmod"),
rep2 = c("report2.Rmd", "linmod")),
target.all = c("rep1", "rep2"),
all.exts = list(rep1 = "pdf", rep2 = "docx"),
comments = list(linmod = "plots and analysis using 'linmod.R'"))
```
A Makefile `Makefile.demo` is produced with `write_makefile(gm1)`
```{r, simple-makefile1}
write_makefile(gm1, file = "Makefile.demo")
```
```{bash, simple-makefile-gm1, comment="", echo = FALSE}
cat Makefile.demo
```
<!-- The DAG of the `gnu_makefile` object can be produced with `plot(gm1)`. -->
<!-- ```{r, simple-dag2, fig.cap = "DAG of Makefile for simple example. The DAG of the `gnu_makefile` object can be produced with `plot(gm1)`. Using the minimal set of files (shown in green rectangles), then GNU Make allows us to (re)generate all other files shown as wheat coloured circles)"} -->
<!-- plot(gm1) -->
<!-- ``` -->
We can use the function `info_rules` to determine the possible target
files for dependency files. For instance, what target files have
`Makefile` rules for an `.R` `R` syntax file?
```{r, inforulesr}
info_rules("R")
```
For `.Rmd` R Markdown files, use
```{r, inforulesrmd}
info_rules("Rmd")
```
For more examples, see the **gnumaker** vignette (under construction).
## Note
The **gnumaker** `R` package is under construction and could change
(and improve) rapidly at various times but this depends on work/life
balance.