Version: 0.0.0.9009
gnumaker makes it easy to create and use GNU Makefiles to aid a reproducible work flow for data analysis projects.
GNU Make is the defacto standard for efficiently rerunning appropriate steps in the data analysis or reporting process if a particular file is changed. Only the necessary steps are rerun.
Rather than creating a new system for setting up and building output
from statistical software syntax files, gnumaker leverages off
existing GNU Make rules. These rules, for R
, Sweave
, R Markdown
,
Stata
, SAS
, PSPP
, Python
, Perl
and other syntax files are
available at r-makefile-definitions on
Github. These are
described in P Baker (2020) Using GNU Make
to Manage the Workflow of
Data Analysis Projects, Journal of Statistical Software (Accepted).
For those not familiar with GNU Make
, gnumaker allows simple
dependencies between files to be specified to produce a working
Makefile
and the associated directed acyclic graph (DAG). I’d welcome
Github
issues containing error reports or feature requests.
Alternatively, you can email the package maintainer at drpetebaker at gmail dot com
.
You can install the development version of gnumaker from GitHub
with:
## if you don't have devtools installed, automatically install it from CRAN
if (!requireNamespace("devtools", quietly = TRUE))
install.packages("devtools")
devtools::install_github("petebaker/gnumaker")
There are currently four key functions in gnumaker. These are:
create_makefile()
creates agnu_makefile
object given the specified dependencies between syntax, data and output files,write_makefile()
writes agnu_makefile
object to aMakfile
on disk,info_rules()
provides information about data analysisGNU Make
rules for various target and dependency filename extensions, andplot()
plots the DAG of agnu_makefile
object.
Suppose we have a data file simple.csv
and use read.R
to read and
clean the data. After storing the cleaned data in a .RData file, we then
employ linmod.R
to plot and analyse the data. Next, using the stored
results, two reports report1.pdf
and report2.docx
are produced from
report1.Rmd
and report2.Rmd
. The workflow may be encapsulated in a
Makefile
which is then employed to manage the process and generate or
regenerate any intermediate files when the data or syntax changes.
Rather than writing GNU Make
commands, we can employ the gnumaker
package to create an appropriate Makefile
called Makefile.demo
with:
library(gnumaker)
gm1 <-
create_makefile(targets = list(read = c("read.R", "simple.csv"),
linmod = c("linmod.R", "read"),
rep1 = c("report1.Rmd", "linmod"),
rep2 = c("report2.Rmd", "linmod")),
target.all = c("rep1", "rep2"),
all.exts = list(rep1 = "pdf", rep2 = "docx"))
write_makefile(gm1, file = "Makefile.demo")
#> File: Makefile.demo written at Mon Jul 20 00:44:21 2020
To plot the corresponding DAG, which shows the relationships between
target files (wheat coloured circles) which we can (re)generate from a
minimal set of syntax and data files (green rectangles) using GNU Make
, use:
plot(gm1)
Using the gnumaker package we simply need to provide a list of
targets to the the create_makefile
function where the components
specify a target as a name and dependency file(s) as a character vector.
The package uses the GNU Make
pattern rules in r-rules.mk
to choose
file names for targets but we can override the defaults.
For instance, in this example the first two dependency files are
simple.csv
and read.R
so we provide the first target as the first
component of the list as read = c("read.R", "simple.csv")
, where the
name read
can be anything we like. Note that the target filename for
read
will be derived from the first filename in the list read.R
. The
second target depends on the read
target and linmod.R
and so we
specify this with linmod = c("linmod.R", "read")
and so on.
Using the function create_makefile
to create a gnu_makefile
object,
target file names are substituted using defaults and the appropriate
Make
commands are rearranged using the DAG of the relationships. For
instance, the default target file for the first dependency in the read
component, which is read.R
, becomes read.Rout
but we can change the
default target file extension for all .R
files using the
default.exts
argument and specify say a HTML target file with
default.exts = list(R = "html")
.
Finally we specify the first target (usually all
) as two reports
report1.pdf
and report2.docx
using target.all = c("rep1","rep2")
which by default would be report1.html
and report2.html
but which we
specify as report1.pdf
and report2.docx
by specifying the option
all.exts = list(rep1 = "pdf", rep2 = "docx")
.
Once we have constructed a suitable gnu_makefile
object then we write
it to disk with write_makefile
. To run all R
script files and
analyses in order we simply type make
in a terminal or set up
RStudio
or our IDE to use GNU Make as the build mechanism which allows
us to (re)run analyses by pressing the appropriate Build button.
The Makefile
, with optional comment for linmod
target, is specified
using:
library(gnumaker)
gm1 <-
create_makefile(targets = list(read = c("read.R", "simple.csv"),
linmod = c("linmod.R", "read"),
rep1 = c("report1.Rmd", "linmod"),
rep2 = c("report2.Rmd", "linmod")),
target.all = c("rep1", "rep2"),
all.exts = list(rep1 = "pdf", rep2 = "docx"),
comments = list(linmod = "plots and analysis using 'linmod.R'"))
A Makefile Makefile.demo
is produced with write_makefile(gm1)
write_makefile(gm1, file = "Makefile.demo")
#> File: Makefile.demo written at Mon Jul 20 00:44:21 2020
# File: Makefile.demo
# Created at: Mon Jul 20 00:44:21 2020
# Produced by gnumaker: 0.0.0.9009 on R version 4.0.2 (2020-06-22)
# Before running make, please check file and edit if necessary
# .PHONY all target: run when 'make' is invoked without a target
.PHONY: all
all: report1.pdf report2.docx
# report1.pdf depends on report1.Rmd, linmod.Rout
report1.pdf: report1.Rmd linmod.Rout
# report2.docx depends on report2.Rmd, linmod.Rout
report2.docx: report2.Rmd linmod.Rout
# plots and analysis using 'linmod.R'
linmod.Rout: linmod.R read.Rout
# read.Rout depends on read.R, simple.csv
read.Rout: read.R simple.csv
# include GNU Makfile rules. Most recent version available at
# https://github.com/petebaker/r-makefile-definitions
include ~/lib/r-rules.mk
# remove all target, output and extraneous files
.PHONY: cleanall
cleanall:
rm -f *~ *.Rout *.RData *.docx *.pdf *.html *-syntax.R *.RData
We can use the function info_rules
to determine the possible target
files for dependency files. For instance, what target files have
Makefile
rules for an .R
R
syntax file?
info_rules("R")
#> Possible filename extensions for 'R':
#> [1] "docx" "html" "odt" "pdf" "Rout" "rtf"
#>
#> Default: 'Rout'
#>
#> Example rules:
#> example1.Rout: example1.R dep_file2 dep_file3
#> or
#> example1.Rout: {@:.Rout=.R} dep_file2 dep_file3
#>
#> NB: For further help on Makefile rules, type 'make help' in a terminal once
#> an appropriate 'Makefile' is present in the current directory
For .Rmd
R Markdown files, use
info_rules("Rmd")
#> Possible filename extensions for 'Rmd':
#> [1] "_beamer-handout.Rmd" "_beamer.pdf" "_ioslides.html"
#> [4] "_slidy.html" "_tufte.pdf" "-syntax.R"
#> [7] "docx" "html" "odt"
#> [10] "pdf" "pptx" "rtf"
#>
#> Default: 'html'
#>
#> Example rules:
#> example1.html: example1.Rmd dep_file2 dep_file3
#> or
#> example1.html: {@:.html=.Rmd} dep_file2 dep_file3
#>
#> Other options are available for R Markdown files, such as:
#>
#> example1_ioslides.html: example1.Rmd dep_file2 dep_file3
#> example1_beamer.pdf: example1.Rmd dep_file2 dep_file3
#>
#> to produce ioslide and beamer presentation formats.
#>
#> An R syntax file can be produced with
#> make example1-syntax.R
#> and a similar rule can be specified if required with
#> example1-syntax.R: example1.Rmd dep_file2 dep_file3
#>
#> NB: For further help on Makefile rules, type 'make help' in a terminal once
#> an appropriate 'Makefile' is present in the current directory
For more examples, see the gnumaker vignette (under construction).
The gnumaker R
package is under construction and could change (and
improve) rapidly at various times but this depends on work/life balance.