forked from hadley/adv-r
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Documenting-functions.rmd
441 lines (310 loc) · 22.1 KB
/
Documenting-functions.rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
---
title: Documenting functions
layout: default
---
# Documenting functions
Documentation is one of the most important aspects of good code. Without it, users won't know how to use your package, and are unlikely to do so. Documentation is also useful for you in the future (so you remember what the heck you were thinking!), and for other developers working on your package.
The standard way to write R documentation is to create `.Rd` files in the `man/` directory. These files describe each object (function, data set, class, generic or method) in your package, documenting how it works and giving the user examples of how to use it. Rather than writing Rd files by hand, we're going to use the [roxygen2](http://roxygen.org/) package. With roxygen, you write the documentation in comments next to each function, and then run an R script to create the main files.
This workflow has a number of advantages over writing `.Rd` files by hand:
* Code and documentation are adjacent so when you modify your code, it's easy
to remember that you need to update the documentation.
* Roxygen2 dynamically inspects the objects that are documenting, so it can
figure out a lot of information `.Rd` files need by itself. For example, it
can automatically add links to super and subclasses.
* It takes care of other files that are fiddly or downright painful to
maintain by hand: the [namespace](`NAMESPACE`), collate order in
description, and the demos index.
* Abstracts over the differences in documenting S3 and S4 methods, generics
and classes so that they behave basically the same.
**Note**: this chapter currently describes the use of the in-development [roxygen3](http://github.com/hadley/roxygen3) package. This will be merged back into roxygen2 in the near future.
This chapter will proceed as follows. First, we'll discuss the basic documentation process, what each step does, and how to see the output at every stage. Next we'll show you, at a high-level, how to document all the different types of R objects (functions, datasets, s3 methods, s4 methods, s4 classes, r5 classes, and packages). After that you'll learn how to format text within the documentation.
## Help
You've probably used help a lot, but you might not be aware of the more advanced features:
* `package?lubridate`
* `class?myclass`
* `methods?`
* `method?`
* `method?combo("numeric", "numeric")`
* `?combo(1:10, letters)`
Roxygen automatically takes care of generating the special aliases needed to make these lookups work.
How does help work? It finds the matching Rd file, and compiles it to either text or HTML output. It's complicated by the fact that binary packages don't include individual Rd files, they actually included a pre-parsed database of Rd files.
## Roxygen process
There's a three step process to go from the R comments in the source files to Rd files to human readable documentation. The process starts with special comments starting with `#'` that indicate a comment is a roxygen comment:
#' Order a data frame by its columns.
#'
#' This function completes the subsetting, transforming and ordering triad
#' with a function that works in a similar way to \code{\link{subset}} and
#' \code{\link{transform}} but for reordering a data frame by its columns.
#' This saves a lot of typing!
#'
#' @param df data frame to reorder
#' @param ... expressions evaluated in the context of \code{df} and
#' then fed to \code{\link{order}}
#' @keywords manip
#' @export
#' @examples
#' mtcars[with(mtcars, order(cyl, disp)), ]
#' arrange(mtcars, cyl, disp)
#' arrange(mtcars, cyl, desc(disp))
arrange <- function(df, ...) {
ord <- eval(substitute(order(...)), df, parent.frame())
unrowname(df[ord, ])
}
To convert roxygen comments to the official `.Rd` files, we'll use `devtools::document()`. Like the other devtools functions you've seen so far, it takes a package directory as its first argument, and if you omit it, it will use the current working directory.
This produces an `.Rd` file in the `man/` directory:
\name{arrange}
\alias{arrange}
\title{Order a data frame by its columns.}
\usage{arrange(df, ...)}
\description{
This function completes the subsetting, transforming and
ordering triad with a function that works in a similar
way to \code{\link{subset}} and \code{\link{transform}}
but for reordering a data frame by its columns. This
saves a lot of typing!
}
\keyword{manip}
\arguments{
\item{df}{data frame to reorder}
\item{...}{expressions evaluated in the context of
\code{df} and then fed to \code{\link{order}}}
}
\examples{mtcars[with(mtcars, order(cyl, disp)), ]
arrange(mtcars, cyl, disp)
arrange(mtcars, cyl, desc(disp))}
Rd files are a special file format loosely based on LaTeX. You can read more about the Rd format in the [R extensions](http://cran.r-project.org/doc/manuals/R-exts.html#Rd-format) manual. We'll avoid discussing Rd files as much as possible, focussing instead on what you need to know about roxygen.
When you request documentation, R then converts this file into the display you're probably used to seeing:
![HTML rendering of arrange documentation](arrange-html.png)
When you use `help` or `example` it looks for the Rd files in the _installed_ package. This isn't very useful for package development, because we want to retrieve the `.Rd` files from the _source_ package. `devtools` provides two functions to do this `dev_help` and `dev_example` - these behave similarly to `help` and `example`, but look in source packages you have loaded with `load_all`, rather than installed packages you've loaded with `library`.
## Common documentation tags
As you've seen roxygen comments have a special format. They start with `#'` and they include tags of the form `@tagname` that break the documentation up into pieces. The content of a tag extends from the tag name to the next tag, and they can span multiple lines.
If you want to put a literal `@` in the documentation, use `@@`.
Each documentation block starts with a text description. The first sentence/line becomes the title of the documentation. That's what you see when you look at `help(package = mypackage)` and is shown at the top of the documentation for each package. It should be succinct. The second paragraph is the description: this is the first thing shown in the documentation and should briefly describe what the function does. The third and subsequence paragraphs go in the details: this is a (often long) section that comes after the description of the parameters. If you want to override the default behaviour, you can use the `@title`, `@description` and `@details` tags.
Here's an example showing what the documentation for `sum` might look like if it was written using roxygen:
#' Sum of Vector Elements
#'
#' \code{sum} returns the sum of all the values present in its arguments.
#'
#' This is a generic function: methods can be defined for it directly
#' or via the \code{Summary} group generic. For this to work properly,
#' the arguments \code{...} should be unnamed, and dispatch is on the
#' first argument.'
sum <- function(..., na.rm = TRUE) {}
The following documentation is equivalent, but uses explicit tags. This is not necessary unless you want to have a multiple paragraph description or title, or want to omit the description (in which case roxygen will replace with the title).
#' @title Sum of Vector Elements
#'
#' @description
#' \code{sum} returns the sum of all the values present in its arguments.
#'
#' @details
#' This is a generic function: methods can be defined for it directly
#' or via the \code{Summary} group generic. For this to work properly,
#' the arguments \code{...} should be unnamed, and dispatch is on the
#' first argument.'
sum <- function(..., na.rm = TRUE) {}
If you use explicit tags, you can put them in any order and still get the same output. The following documentation block would produce exactly the same Rd file and final documentation as the previous two.
#' @details
#' This is a generic function: methods can be defined for it directly
#' or via the \code{Summary} group generic. For this to work properly,
#' the arguments \code{...} should be unnamed, and dispatch is on the
#' first argument.'
#'
#' @title Sum of Vector Elements
#'
#' @description
#' \code{sum} returns the sum of all the values present in its arguments.
#'
sum <- function(..., na.rm = TRUE) {}
You can also opt to document multiple functions in one `.Rd` file. If you want to do so, use the `@rdname` tag to manully specify the name of the rd file (if you omit this it's done automatically in a way that every object gets its own documentation). You should be careful about documenting too many functions in one file because it gets confusing (which parameter belongs to which function?), but it can be useful if the functions are tightly connected.
You can get documentation on a given tag with `?TagName`. For example, if you wanted to get help on the param tag, you'd do `?TagParam`. (Unfortunately you can't do `?@param` because of a technical limitation in Rd files.)
## Documenting functions (and methods)
When documenting a function, the first decision you need to make is whether you want to export it or not. Exporting is described more in [[namespaces]], but basically when you export a function you are saying it is ready for use and you are making a commitment to keep it around in a similar form for the near future. All exported functions need to be documented. You don't have to document internal functions, but it can be a good idea if they're particularly complicated. Few users will read the documentation for internal functions, but it will be useful for you so when you come back to them in the future you don't need to struggle to recall what the inputs and outputs are.
The following code shows the arrange function and its documentation from the
plyr package.
#' Order a data frame by its columns.
#'
#' This function completes the subsetting, transforming and ordering triad
#' with a function that works in a similar way to \code{\link{subset}} and
#' \code{\link{transform}} but for reordering a data frame by its columns.
#' This saves a lot of typing!
#'
#' @param df data frame to reorder
#' @param ... expressions evaluated in the context of \code{df} and
#' then fed to \code{\link{order}}
#' @keywords manip
#' @export
#' @examples
#' mtcars[with(mtcars, order(cyl, disp)), ]
#' arrange(mtcars, cyl, disp)
#' arrange(mtcars, cyl, desc(disp))
arrange <- function(df, ...) {
ord <- eval(substitute(order(...)), df, parent.frame())
unrowname(df[ord, ])
}
The tags used here are:
* `@param arg description` - a description for each function argument. This
can span multiple lines (or even paragraphs) if necessary.
* `@export` - a flag indicating that this function should be exported for use
by others. Described in more detail in [[namespaces]].
* `@examples` - examples of the function in use. In my opinion, this is the
most important part of the documentation - these are what most people will
look at first to figure out how to use the function. I always put this last.
Other tags that you might find useful are:
* `@return` - the type of object that the function returns
The following tags apply to all types of documentation
* `@author`, in the form `Firstname Lastname <email@@address.com>`. Use this if
some of the functions are written by people other than the author of the
package.
* `@seealso` - to provide pointers to other related topics
* `@references` - references to scientific literature on this topic
* `@aliases` - a list of additional topic names that will be mapped to this
documentation when the user looks them up from the command line.
* `@family` - a family name. All functions that have the same family tag will be linked
in the documentation.
## Documenting S3 generic functions and methods
S3 generics are documented in the same way as functions - there is typically no reason for a user to know (or care) that a function is a generic. (Although you may want to mention it in the docs so that other developers know that they can extend it with their own methods).
If you `@export` an S3 method, all methods defined in your package will also get exported appropriately (using the `S3method()` namespace directive): they will be available when you call the generic, but users can not call the methods directly. For example, the `wday` generic in `lubridate` is exported:
library(lubridate)
wday(Sys.Date(), label = TRUE)
wday(1, label = TRUE)
But you can't call any of the methods directly:
wday.default(Sys.Date(), label = TRUE)
wday.numeric(1, label = TRUE)
lubridate:::wday.default(Sys.Date(), label = TRUE)
lubridate:::wday.numeric(1, label = TRUE)
This is good practice because it hides the internal implementation details of your functions. Users should not rely on specific behaviour (for example, in this case we need to call `wday.default` for Date objects). This makes it easier for you to change how the functions work in the future - for example, you could change from S3 to S4 dispatch.
You only need to explicitly `@export` a method if it's for a generic defined in another package. This is one of the easiest things to forget, and it creates subtle bugs that `R CMD check` doesn't report and are hard to track down. Future versions of roxygen should hopefully make this easier.
Most of the time you don't need to document S3 methods - particularly if they are for simple generics like `print`. However, if your method is more complicated, you should document it. In base R, you can find documentation for more complex methods like `predict.lm`, `predict.glm`, `anova.glm` and so on.
## Documenting S4 generic and methods
The same rules apply when documenting S4 generics and methods:
* Document the generic like you would document any function. Document methods that are more complex or have significant differences in behaviours.
* You don't need to export methods if they're for a generic you defined (just export the generic). Export all methods for generics defined in other packages.
* Use `@genericMethods` in the generic documentation if you want an automated listing of all methods implemented for the generic.
## Documenting R5 methods
Currently there is no way to use roxygen to document R5 methods. Use the standard docstring format. Alternatively, use roxygen to document an object with `@name` named by convention and assign it to `NULL`. For example, if object `obj` has method `getName`, then document the method as follows, using the convention to replace the `$` with `_`:
#' @title Get the name of the object
#' ...
#' @name obj_getName
NULL
obj$methods(getName = function() return("name"))
And the help for `getName` can be accessed by `?obj_getName`
## Documenting classes
### Documenting a S3 class
Since S3 classes have no formal structure, you should document the constructor function.
### Documenting an S4 class
Typically you should document the `setClass` call, and if present, the constructor function. Unless you want to separate out internal documentation for the class and the public interface (for the constructor), I recommend using `@rdname` to document both in the same file. This will mean that `?classname` and `class?classname` go to the same place.
Export the class if you want other developers to be able to write subclasses of your class. Export the constructor if you want users to be able to use it.
### Documenting an R5 class
## Documenting datasets
The following documentation excerpt comes from the diamonds dataset in ggplot2. We can't document the object directly (because it lives in the data directory), so we document `NULL` and provide the `@name` of the object we really want to document.
#' Prices of 50,000 round cut diamonds.
#'
#' A dataset containing the prices and other attributes of almost 54,000
#' diamonds. The variables are as follows:
#'
#' \itemize{
#' \item price. price in US dollars (\$326--\$18,823)
#' \item carat. weight of the diamond (0.2--5.01)
#' ...
#' }
#'
#' @docType data
#' @keywords datasets
#' @format A data frame with 53940 rows and 10 variables
#' @name diamonds
NULL
There are a few new tags:
* `@docType data`, which indicates that this is documentation for a dataset.
* `@format`, which gives an overview of the structure of the dataset. If you omit this, roxygen will automatically add something based on the first line of `str` output
* `@source` where you got the data form, often a `\url{}`.
## Documenting packages
As well [[package level documentation|documenting-packages]] resources, every package should also have its own documentation page. I usually put this documentation in a file with the same name as the package.
This documentation topic should contain an overview documentation topic that describes the overall purpose of the package, and points to the most important functions. This topic should have `@docType package` and be aliased to `package-pkgname` and `pkgname` (unless there is already a function by that name) so that you can get an overview of the package by doing `?pkgname` or `package?pkgname`.
There are still relatively few packages that provide package documentation, but it's an extremely useful tool for users, because instead of just listing functions like `help(package = pkgname)` it organises them and shows the user where to get started.
The example below shows the basic structure, as taken from the documentation for the `lubridate` package:
#' Dates and times made easy with lubridate.
#'
#' Lubridate provides tools that make it easier to parse and
#' manipulate dates. These tools are grouped below by common
#' purpose. More information about each function can be found in
#' its help documentation.
#'
#' Parsing dates
#'
#' Lubridate's parsing functions read strings into R as POSIXct
#' date-time objects. Users should choose the function whose name
#' models the order in which the year ('y'), month ('m') and day
#' ('d') elements appear the string to be parsed:
#' \code{\link{dmy}}, \code{\link{myd}}, \code{\link{ymd}},
#' \code{\link{ydm}}, \code{\link{dym}}, \code{\link{mdy}},
#' \code{\link{ymd_hms}}).
#'
#' ...
#'
#' @references Garrett Grolemund, Hadley Wickham (2011). Dates and Times
#' Made Easy with lubridate. Journal of Statistical Software, 40(3),
#' 1-25. \url{http://www.jstatsoft.org/v40/i03/}.
#' @import plyr stringr
#' @docType package
#' @name lubridate
NULL
Important components are:
* The general overview of the package: what it does, and what are the
important pieces.
* Like for data sets, there isn't a object that we can document directly so
document `NULL` and use `@name` to say what we're actually documenting
* `@docType package` to indicate that it's documenting a package. This will automatically add the corect aliases so that `package?yourpackage` works.
* `@references` point to any published material about the package that users
might find help.
The package documentation should not contain a verbatim list of functions or copy of `DESCRIPTION`. This file is for human reading, so pick the most important elements of your package.
## Text formatting
Within roxygen text, you use the usual R documentation formatting rules, as summarised below. A fuller description is available in the [R extensions](http://cran.r-project.org/doc/manuals/R-exts.html#Sectioning) manual.
Sections are indicated by the @section tag, followed by the title (which should be in sentence case) and a colon. Subsections are indicated by a LaTeX-style command; the first argument is the subsection title and the second argument contains the subsection content. The following example illustrates how to add an arbitrary section:
@section Warning:
You must not call this function unless ...
\subsection{Exceptions}{
Apart from the following special cases...
}
The section ends when another tag is declared or when the roxygen2 block ends.
### Lists
* Ordered (numbered) lists:
\enumerate{
\item First item
\item Second item
}
* Unordered (bulleted) lists
\itemize{
\item First item
\item Second item
}
* Definition (named) lists
\describe{
\item{One}{First item}
\item{Two}{Second item}
}
### Tables
Tables are created with the tabular command, which has two arguments:
* Column alignment: a letter for each column. `l` = left alignment, `c` = centre, `r` = right.
* Table contents. Columns separated by `\tab`, rows separated by `\cr`.
The following function will turn an R data frame into the correct format. It ignores column and row names, but should get you started.
tabular <- function(df) {
stopifnot(is.data.frame(df))
align <- function(x) if (is.numeric(x)) "r" else "l"
col_align <- vapply(df, align, character(1))
contents <- do.call("paste", c(lapply(df, format),
list(sep = " \\tab ", collapse = "\\cr\n")))
mat <- matrix(unlist(lapply(df, format)), ncol = ncol(df))
paste("\\tabular{", paste(col_align, collapse = ""), "}{\n",
contents, "}\n", sep = "")
}
### Mathematics
`\eqn` for inline, `\deqn` for display. Standard LaTeX (no extensions).
### Character formatting
* `\emph{text}`: emphasised text, usually displayed as _italics_
* `\strong{text}`: strong text, usually displayed in __bold__
* `\code{text}`, `\pkg{package_name}`, `\file{file_name}`
* External links: `\email{email_address}`, `\url{url}`, `\href{url}{text}`
* `\link[package]{function}` - the first argument can be omitted if the link
is in the current package, or the base package. Will usually be wrapped inside `\code`: `\code{\link{fun}}`
## Dynamic help
Since R.XX, help has been...