Skip to content

Commit

Permalink
Improve documentation about on-the-fly expressions, fixes #126
Browse files Browse the repository at this point in the history
  • Loading branch information
gergness committed Sep 28, 2021
1 parent 27ef9a4 commit 94d7294
Show file tree
Hide file tree
Showing 3 changed files with 76 additions and 9 deletions.
20 changes: 19 additions & 1 deletion R/summarise.r
Original file line number Diff line number Diff line change
Expand Up @@ -84,7 +84,7 @@ summarise_.grouped_svy <- function(.data, ..., .dots) {
#'
#'
#' @param .data tbl A \code{tbl_svy} object
#' @param ... Name-value pairs of summary functions
#' @param ... Name-value pairs of summarizing expressions, see details
#' @param .groups Defaults to "drop_last" in srvyr meaning that the last group is peeled
#' off, but if there are more groups they will be preserved. Other options are "drop", which
#' drops all groups, "keep" which keeps all of them and "rowwise" which converts the object
Expand Down Expand Up @@ -130,6 +130,13 @@ summarise_.grouped_svy <- function(.data, ..., .dots) {
#' Calculate an unweighted estimate as you would on a regular \code{tbl_df}.
#' Based on dplyr's \code{\link[dplyr]{summarise}}.}
#'}
#'
#' You can use expressions both in the \code{...} of \code{summarize} and also
#' in the arguments to the summarizing functions. Though this is valid syntactically
#' it can also allow you to calculate incorrect results (for example if you multiply
#' the mean by 100, the standard error is also multipled by 100, but the variance
#' is not).
#'
#' @examples
#' data(api, package = "survey")
#'
Expand All @@ -149,6 +156,17 @@ summarise_.grouped_svy <- function(.data, ..., .dots) {
#' api00_mn = survey_mean(api00),
#' api_diff = survey_mean(api00 - api99))
#'
#' # Expressions are allowed in summarize arguments & inside functions
#' # Here we can calculate binary variable on the fly and also multiply by 100 to
#' # get percentages
#' dstrata %>%
#' summarize(api99_over_700_pct = 100 * survey_mean(api99 > 700)
#'
#' # But be careful, the variance doesn't scale the same way, so this is wrong!
#' dstrata %>%
#' summarize(api99_over_700_pct = 100 * survey_mean(api99 > 700, vartype = "var")
#' # Wrong variance!
#'
#' @name summarise
#' @export
#' @importFrom dplyr summarise
Expand Down
19 changes: 18 additions & 1 deletion man/summarise.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

46 changes: 39 additions & 7 deletions vignettes/srvyr-vs-survey.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -209,12 +209,6 @@ ggplot(data = out, aes(x = stype, y = api_diff, group = hs_grad_pct, fill = hs_g
```


# CAUTION! Difference in default degrees of freedom
Though it should be possible to get the exact same results using `srvyr` as you would
if you used the `survey` package, s


# Grab Bag

## Using `survey` functions on `srvyr` objects
Expand All @@ -229,11 +223,49 @@ summary(glm)
```


## Using expressions to create variables on the fly
Like `dplyr`, `srvyr` allows you to use expressions in the arguments,
allowing you to create variables in a single step. For example, you can
use expressions:

1) as the arguments inside the survey statistic functions like `survey_mean`
```{r, message = FALSE}
strat_design %>%
summarize(prop_api99_over_700 = survey_mean(api99 > 700))
```

2) as an argument to `summarize`
```{r, message = FALSE}
strat_design %>%
group_by(awards) %>%
summarize(percentage = 100 * survey_mean())
```

3) and you can even create varables inside of `group_by`
```{r, message = FALSE}
strat_design %>%
group_by(api99_above_700 = api99 > 700) %>%
summarize(api00_mn = survey_mean(api00))
```

Though on-the-fly expressions are syntactically valid, it is possible to make statistically
invalid numbers from them. For example, though the standard error and confidence intervals
can be multiplied by a scalar (like 100), the variance does not scale the same way, so the
following is invalid:

```{r, message = FALSE}
# BAD DON'T DO THIS!
strat_design %>%
group_by(awards) %>%
summarize(percentage = 100 * survey_mean(vartype = "var"))
# VARIANCE IS WRONG
```

## Non-Standard evaluation

Srvyr supports the non-standard evaluation conventions that dplyr uses.
If you'd like to use a function programmatically, you can use the functions from
rlang like the `{{` operator (aka "curly curly") from `rlang`.
rlang like the `{{` operator (aka "curly curly") from `rlang`.

Here's a quick example, but please see the dplyr vignette
[`vignette("programming", package = "dplyr")`](https://dplyr.tidyverse.org/articles/programming.html)
Expand Down

0 comments on commit 94d7294

Please sign in to comment.