diff --git a/docs/404.html b/docs/404.html index d64b2bf..de719b2 100644 --- a/docs/404.html +++ b/docs/404.html @@ -72,7 +72,7 @@
diff --git a/docs/CODE_OF_CONDUCT.html b/docs/CODE_OF_CONDUCT.html index 0a01385..df82af3 100644 --- a/docs/CODE_OF_CONDUCT.html +++ b/docs/CODE_OF_CONDUCT.html @@ -72,7 +72,7 @@ @@ -137,12 +137,29 @@As contributors and maintainers of this project, we pledge to respect all people who contribute through reporting issues, posting feature requests, updating documentation, submitting pull requests or patches, and other activities.
-We are committed to making participation in this project a harassment-free experience for everyone, regardless of level of experience, gender, gender identity and expression, sexual orientation, disability, personal appearance, body size, race, ethnicity, age, or religion.
-Examples of unacceptable behavior by participants include the use of sexual language or imagery, derogatory comments or personal attacks, trolling, public or private harassment, insults, or other unprofessional conduct.
-Project maintainers have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct. Project maintainers who do not follow the Code of Conduct may be removed from the project team.
-Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by opening an issue or contacting one or more of the project maintainers.
-This Code of Conduct is adapted from the Contributor Covenant (http://contributor-covenant.org), version 1.0.0, available at http://contributor-covenant.org/version/1/0/0/
+As contributors and maintainers of this project, we pledge to respect +all people who contribute through reporting issues, posting feature +requests, updating documentation, submitting pull requests or patches, +and other activities.
+We are committed to making participation in this project a +harassment-free experience for everyone, regardless of level of +experience, gender, gender identity and expression, sexual orientation, +disability, personal appearance, body size, race, ethnicity, age, or +religion.
+Examples of unacceptable behavior by participants include the use of +sexual language or imagery, derogatory comments or personal attacks, +trolling, public or private harassment, insults, or other unprofessional +conduct.
+Project maintainers have the right and responsibility to remove, +edit, or reject comments, commits, code, wiki edits, issues, and other +contributions that are not aligned to this Code of Conduct. Project +maintainers who do not follow the Code of Conduct may be removed from +the project team.
+Instances of abusive, harassing, or otherwise unacceptable behavior +may be reported by opening an issue or contacting one or more of the +project maintainers.
+This Code of Conduct is adapted from the Contributor Covenant (http://contributor-covenant.org), version 1.0.0, +available at http://contributor-covenant.org/version/1/0/0/
vignettes/extending-srvyr.Rmd
extending-srvyr.Rmd
## Loading required package: convey
## Loading required package: laeken
-I don’t expect this vignette to be helpful for most srvyr users; it is instead intended for other package developers. An exciting new feature that is easier now that I have reworked srvyr’s non-standard evaluation to match dplyr 0.7+ is that it is now possible for non-srvyr functions to be called from within summarize
. This vignette describes some of the inner-workings of summarize so that others can extend srvyr. This is kind of a fiddly part of srvyr, and I don’t expect that many people will want or need to understand it, so this guide is mostly aimed at package authors who already have an understanding of how survey objects work. If you’d like more explanation, please let me know on github!
This guide has also been rewritten for srvyr 1.0, as I had to rework summarize and was unable to maintain backwards compatibility.
+I don’t expect this vignette to be helpful for most srvyr users; it is
+instead intended for other package developers. An exciting new feature
+that is easier now that I have reworked srvyr’s non-standard evaluation
+to match dplyr 0.7+ is that it is now possible for non-srvyr functions
+to be called from within summarize
. This vignette describes
+some of the inner-workings of summarize so that others can extend srvyr.
+This is kind of a fiddly part of srvyr, and I don’t expect that many
+people will want or need to understand it, so this guide is mostly aimed
+at package authors who already have an understanding of how survey
+objects work. If you’d like more explanation, please let me know on github!
This guide has also been rewritten for srvyr 1.0, as I had to rework +summarize and was unable to maintain backwards compatibility.
srvyr implements the “survey statistics” functions from the survey package. Some examples are svymean, svytotal, svyciprop, svyquantile and svyratio, which all return a svystat
object, which usually prints out the estimate and its standard error; other estimates of the variance can be calculated from it. In srvyr, these estimates are created inside of a summarize call and the variance estimates are specified at the same time.
The combination of srvyr’s group_by and summarize is analogous to the svyby
function, which applies one of the survey statistic functions to multiple groups. However, as of srvyr 1.0, srvyr no longer uses svyby
, instead the survey object is split into each group’s
srvyr implements the “survey statistics” functions from the survey
+package. Some examples are svymean, svytotal, svyciprop, svyquantile
+and svyratio, which all return a svystat
object, which usually
+prints out the estimate and its standard error; other estimates of
+the variance can be calculated from it. In srvyr, these estimates are
+created inside of a summarize call and the variance estimates are
+specified at the same time.
The combination of srvyr’s group_by and summarize is analogous to the
+svyby
function, which applies one of the survey statistic
+functions to multiple groups. However, as of srvyr 1.0,
+srvyr no longer uses svyby
, instead the survey object is
+split into each group’s
srvyr’s summarize expects that the survey statistics functions will return objects that are formatted in a particular way. Below, I’ll explain some of the functions that will help create these objects for you in most cases, but the return should be:
+srvyr’s summarize expects that the survey statistics functions will +return objects that are formatted in a particular way. Below, I’ll +explain some of the functions that will help create these objects for +you in most cases, but the return should be:
srvyr_result_df
object (which is just a wrapper around a data.frame
)srvyr_result_df
object (which is just a wrapper
+around a data.frame
)srvyr now exports several functions that can help convert functions designed for the survey package to this format.
+srvyr now exports several functions that can help convert functions +designed for the survey package to this format.
cur_svy()
- This function, modeled after dplyr::current_vars()
, is a hidden way to send the survey object to the function (by hidden, I mean that the user doesn’t have to specify the survey in the arguments of their function call). To use it, you can now directly call cur_svy()
from inside your function. This survey includes only the current group’s survey data.cur_svy()
- This function, modeled after
+dplyr::current_vars()
, is a hidden way to send the survey
+object to the function (by hidden, I mean that the user doesn’t have to
+specify the survey in the arguments of their function call). To use it,
+you can now directly call cur_svy()
from inside your
+function. This survey includes only the current group’s survey
+data.
cur_svy_full()
- Like cur_svy()
, but includes the full survey data instead of just the current group’s data.cur_svy_full()
- Like cur_svy()
, but
+includes the full survey data instead of just the current group’s
+data.
cur_svy_wts()
- This helper function provides access to the full-sample weights for the current group’s data.cur_svy_wts()
- This helper function provides access to
+the full-sample weights for the current group’s data.
set_survey_vars()
- Many survey functions have limited support for both supplying a formula indicating the variables to calculate a statistic on as well as a vector. However, oftentimes the vector version is less well supported than the formula version. Since srvyr uses dplyr semantics, it ends up returning the values as vectors. This function will add on the variable to the survey, defaulting to having the name “__SRVYR_TEMP_VAR__”.set_survey_vars()
- Many survey functions have limited
+support for both supplying a formula indicating the variables to
+calculate a statistic on as well as a vector. However, oftentimes the
+vector version is less well supported than the formula version. Since
+srvyr uses dplyr semantics, it ends up returning the values as vectors.
+This function will add on the variable to the survey, defaulting to
+having the name “__SRVYR_TEMP_VAR__”.
get_var_est()
- A helper function that calculates variance estimates like standard error (se), confidence interval (ci), variance (var), or coefficient of variation (cv). For functions that support it, there is a separate argument for design effects (to match survey’s conventions).get_var_est()
- A helper function that calculates
+variance estimates like standard error (se), confidence interval (ci),
+variance (var), or coefficient of variation (cv). For functions that
+support it, there is a separate argument for design effects (to match
+survey’s conventions).
as_srvyr_result_df()
- A helper function that adds the srvyr_result_df
class to a data.frame
+as_srvyr_result_df()
- A helper function that adds the
+srvyr_result_df
class to a data.frame
Note that these functions may not work in all cases. In srvyr, I’ve actually had to write multiple versions of get_var_est()
because of minor differences in the way survey objects are returned. Hopefully they will help in most situations, or at least give you a good place to start.
Note that these functions may not work in all cases. In srvyr, I’ve
+actually had to write multiple versions of get_var_est()
+because of minor differences in the way survey objects are returned.
+Hopefully they will help in most situations, or at least give you a good
+place to start.
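As a rough sketch of how the helpers listed above can fit together, here is a wrapper around `survey::svytotal()`. The function name `my_survey_total` and the use of the `apistrat` example data are assumptions made for illustration; this is not srvyr's own implementation, and it only handles the simplest case.

```r
library(survey)
library(srvyr)

# Hypothetical wrapper, named `my_survey_total` purely for illustration.
my_survey_total <- function(x, na.rm = FALSE,
                            vartype = c("se", "ci", "var", "cv")) {
  if (missing(vartype)) vartype <- "se"
  vartype <- match.arg(vartype, several.ok = TRUE)

  # Attach the vector `x` to the current group's survey under the
  # default temporary name "__SRVYR_TEMP_VAR__".
  .svy <- set_survey_vars(cur_svy(), x)

  # Call the survey-package function through its formula interface.
  stat <- survey::svytotal(~`__SRVYR_TEMP_VAR__`, design = .svy, na.rm = na.rm)

  # Convert the svystat object into the format summarize() expects.
  out <- get_var_est(stat, vartype)
  as_srvyr_result_df(out)
}

data(api, package = "survey")
res <- apistrat %>%
  as_survey_design(strata = stype, fpc = fpc, weights = pw) %>%
  group_by(stype) %>%
  summarize(enroll_total = my_survey_total(enroll))
```

Because `cur_svy()` returns only the current group's survey, the wrapper itself never needs to know about the grouping.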
Two less important conventions that srvyr functions follow are:
That was just a lot of text, but I think it’s probably easiest just to provide an example. The convey package provides several methods for analysis of inequality using survey data. The svygini function calculates the gini coefficient. Here, we’ll write functions that make a srvyr version survey_gini
.
That was just a lot of text, but I think it’s probably easiest just
+to provide an example. The convey package provides several methods for
+analysis of inequality using survey data. The svygini function
+calculates the gini coefficient. Here, we’ll write functions that make a
+srvyr version survey_gini
.
# S3 generic function
survey_gini <- function(
diff --git a/docs/articles/index.html b/docs/articles/index.html
index 132e6ca..33aa362 100644
--- a/docs/articles/index.html
+++ b/docs/articles/index.html
@@ -72,7 +72,7 @@
vignettes/srvyr-database.Rmd
srvyr-database.Rmd
## Loading required package: RSQLite
-Srvyr 0.3 has a completely rewritten database backend. Using databases that are already stored as dplyr’s tbl_lazy
objects is now just as easy as working with data stored in regular R data.frames as you don’t need to have a unique identifier. Additionally, it now works more similarly to the survey package’s database code and so shouldn’t be any slower.
During development, I have tested using SQLite (and the now defunct MonetDBLite) databases, but in theory other database backends should work as well.
-This vignette shows the basics of how to use srvyr with databases. It is based on analysis from the wonderful resource asdfree ( website and github ). Many thanks to ajdamico and collaborators. Specifically, I have adapted code from American Community Survey - 2011 single year analysis and the associated data preparation scripts.
+Srvyr 0.3 has a completely rewritten database backend. Using
+databases that are already stored as dplyr’s tbl_lazy
objects
+is now just as easy as working with data stored in regular R data.frames
+as you don’t need to have a unique identifier. Additionally, it now
+works more similarly to the survey package’s database code and so
+shouldn’t be any slower.
During development, I have tested using SQLite (and the now defunct +MonetDBLite) databases, but in theory other database backends should +work as well.
+This vignette shows the basics of how to use srvyr with databases. It +is based on analysis from the wonderful resource asdfree ( website and github ). Many thanks to +ajdamico and collaborators. +Specifically, I have adapted code from American +Community Survey - 2011 single year analysis and the associated data +preparation scripts.
In order to focus on srvyr and databases, we start with a prepared dataset. The full code is available on Github, and the high level description of what it does is:
+In order to focus on srvyr and databases, we start with a prepared +dataset. The full code is available on +Github, and the high level description of what it does is:
Download data from acs website (currently only Alaska and Hawaii to save time, though it would be easy to adapt to download to all 50 states and Puerto Rico).
Merges the household and person datasets so that we can look at the variables related to each person including those at the household level
Selects only a few variables that will be used in this analysis to save space, but again it could easily be adapted to keep all of the variables.
Download data from acs website (currently only Alaska and Hawaii +to save time, though it would be easy to adapt to download to all 50 +states and Puerto Rico).
Merges the household and person datasets so that we can look at +the variables related to each person including those at the household +level
Selects only a few variables that will be used in this analysis +to save space, but again it could easily be adapted to keep all of the +variables.
For more information on the specifics of the American Community Survey, see the asdfree site. Now, our code loads this prepared dataset, initiates a SQLite database, and puts the data into the database.
+For more information on the specifics of the American Community +Survey, see the asdfree site. Now, our code loads this prepared dataset, +initiates a SQLite database, and puts the data into the database.
suppressMessages({
library(survey)
@@ -139,7 +161,12 @@
# Or, if the data was already stored in the database, you could do this
# acs_m_data <- tbl(db, sql("SELECT * FROM acs_m"))
Now that we have the data in the database, we can interact with the database directly using sql commands, or we can use dplyr’s functionality to treat it mostly the same as a local data.frame
. However, the data is not stored in memory, so we could work with much larger datasets (though in this case, the data is too small for this to be a problem).
Now that we have the data in the database, we can interact with the
+database directly using sql commands, or we can use dplyr’s
+functionality to treat it mostly the same as a local
+data.frame
. However, the data is not stored in memory, so
+we could work with much larger datasets (though in this case, the data
+is too small for this to be a problem).
# Same results
acs_m %>%
@@ -154,10 +181,10 @@
acs_m_db %>%
group_by(sex) %>%
summarize(hicov = mean(hicov))
## Warning: Missing values are always removed in SQL.
-## Use `mean(x, na.rm = TRUE)` to silence this warning
-## This warning is displayed only once per session.
-## # Source: lazy query [?? x 2]
+## Warning: Missing values are always removed in SQL aggregation functions.
+## Use `na.rm = TRUE` to silence this warning
+## This warning is displayed once every 8 hours.
+## # Source: SQL [2 x 2]
## # Database: sqlite 3.37.0 [:memory:]
## sex hicov
## <int> <dbl>
@@ -169,8 +196,13 @@
## 7777312 bytes
object.size(acs_m_db)
-## 10544 bytes
-
Note that though many commands behave exactly the same whether on a local data.frame or database, sometimes more advanced / complicated syntax around variable modification allowed in dplyr does not work on a particular database and so it is better to be more explicit. For example, creating a variable inside of a summarize call does not work in some databases.
+## 10880 bytes
+Note that though many commands behave exactly the same whether on a
+local data.frame or database, sometimes more advanced / complicated
+syntax around variable modification allowed in dplyr does not work on a
+particular database and so it is better to be more explicit. For
+example, creating a variable inside of a summarize call does not work in
+some databases.
acs_m %>%
group_by(sex) %>%
@@ -195,14 +227,21 @@
group_by(sex) %>%
mutate(hicov = ifelse(hicov == 1, 1L, 0L)) %>%
summarize(hicov = mean(hicov))
-## # Source: lazy query [?? x 2]
+## # Source: SQL [2 x 2]
## # Database: sqlite 3.37.0 [:memory:]
## sex hicov
## <int> <dbl>
## 1 1 0.858
## 2 2 0.895
-Further, sometimes working with variable types can get difficult if you are used to working in R. Notice how in the above, instead of hicov = (hicov == 1)
, I wrote out the ifelse statement. If I hadn’t, RSQLite would be unable to calculate the mean of the boolean variable created.
-Finally, a major difference when transitioning from dplyr on local data.frames is that not all R functions are translated to SQL. For example, cut()
isn’t implemented in SQL, so you can’t create a new variable in the data.frame using it.
+Further, sometimes working with variable types can get difficult if
+you are used to working in R. Notice how in the above, instead of
+hicov = (hicov == 1)
, I wrote out the ifelse statement. If
+I hadn’t, RSQLite would be unable to calculate the mean of the boolean
+variable created.
+Finally, a major difference when transitioning from dplyr on local
+data.frames is that not all R functions are translated to SQL. For
+example, cut()
isn’t implemented in SQL, so you can’t
+create a new variable in the data.frame using it.
acs_m %>%
group_by(agecat = cut(agep, c(0, 19, 35, 50, 65, 200))) %>%
@@ -233,7 +272,7 @@
ifelse(agep >= 65, "65+", NA)))))) %>%
group_by(agecat) %>%
summarize(hicov = mean(hicov))
-## # Source: lazy query [?? x 2]
+## # Source: SQL [5 x 2]
## # Database: sqlite 3.37.0 [:memory:]
## agecat hicov
## <chr> <dbl>
@@ -242,12 +281,17 @@
## 3 35-49 1.17
## 4 50-64 1.13
## 5 65+ 1.01
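The nested `ifelse()` calls above can get hard to read. As a sketch of an alternative, `dplyr::case_when()` is translated by dbplyr into a SQL `CASE WHEN` expression. The toy table, its values, and the age labels below are made up for illustration and are not the ACS data used in this vignette.

```r
library(dplyr)
library(DBI)

# Toy in-memory database standing in for the ACS table (made-up values).
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "people",
             data.frame(agep = c(5, 25, 40, 60, 80),
                        hicov = c(1, 2, 1, 1, 1)))

# case_when() is evaluated top to bottom, so each condition only needs
# the upper bound of its age band.
age_tbl <- tbl(con, "people") %>%
  mutate(agecat = case_when(
    agep < 20 ~ "0-19",
    agep < 35 ~ "20-34",
    agep < 50 ~ "35-49",
    agep < 65 ~ "50-64",
    TRUE ~ "65+"
  ))

# Inspect the SQL that will be sent to the database.
show_query(age_tbl)

result <- age_tbl %>%
  group_by(agecat) %>%
  summarize(n = n()) %>%
  collect()

dbDisconnect(con)
```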
-For more information on the specifics of databases with dplyr, see vignette("database", package = "dplyr")
, the DBI
package or the specific database packages, like RSQLite
.
+For more information on the specifics of databases with dplyr, see
+vignette("database", package = "dplyr")
, the
+DBI
package or the specific database packages, like
+RSQLite
.
Srvyr commands are nearly identical to old. The only difference for setup is that you need a variable that uniquely identifies each row in the database (uid).
+Srvyr commands are nearly identical to old. The only difference for +setup is that you need a variable that uniquely identifies each row in +the database (uid).
acs_m_db_svy <- acs_m_db %>%
as_survey_rep(
@@ -284,16 +328,24 @@
## (int), pwgtp73 (int), pwgtp74 (int), pwgtp75 (int), pwgtp76 (int), pwgtp77
## (int), pwgtp78 (int), pwgtp79 (int), pwgtp80 (int), agep (int), hicov (int),
## sex (int), st (chr), rt (chr)
-Because srvyr stores the survey variables locally, the srvyr object takes up much more memory than the dplyr one. However, this object would not grow in size if you added more data variables to your survey, so if your survey is very wide, it will save a lot of space.
+Because srvyr stores the survey variables locally, the srvyr object +takes up much more memory than the dplyr one. However, this object would +not grow in size if you added more data variables to your survey, so if +your survey is very wide, it will save a lot of space.
object.size(acs_m_db_svy)
## 8391576 bytes
+## 8391912 bytes
Analysis commands from srvyr are also similar to ones that work on local data.frames. The main differences come from the issues discussed above about explicitly creating variables, difficulties in translating R commands, and variable types.
-The following analysis is based on the asdfree analysis and shows some basic analysis on the total population, insurance coverage, age and sex.
+Analysis commands from srvyr are also similar to ones that work on +local data.frames. The main differences come from the issues discussed +above about explicitly creating variables, difficulties in translating R +commands, and variable types.
+The following analysis is based on the asdfree analysis and shows +some basic analysis on the total population, insurance coverage, age and +sex.
# You can calculate the population of the united states #
# by state
@@ -344,7 +396,7 @@
mutate(hicov = as.character(hicov)) %>%
group_by(st, hicov) %>%
summarize(pct = survey_mean(na.rm = TRUE))
## Adding missing grouping variables: `st`, `hicov`
+## Adding missing grouping variables: `st` and `hicov`
## # A tibble: 4 × 4
## # Groups: st [2]
## st hicov pct pct_se
@@ -414,7 +466,10 @@
Running survey commands with collect
-If you’d like to run a command from the survey package, you’ll need to collect the data locally first. You can select only the variables you’ll need for the analysis so that you don’t have to store the whole dataset in memory.
+If you’d like to run a command from the survey package, you’ll need
+to collect the data locally first. You can select only the variables
+you’ll need for the analysis so that you don’t have to store the whole
+dataset in memory.
acs_m_db_svy %>%
select(agep, hicov, sex) %>%
@@ -494,7 +549,11 @@
Write Access
-Note that srvyr does not require write access to perform calculations, the database created in this vignette was set to read-only at the beginning. This can be important when you want to make sure that your original data is not altered accidentally, or if you don’t have write access to a database.
+Note that srvyr does not require write access to perform
+calculations, the database created in this vignette was set to read-only
+at the beginning. This can be important when you want to make sure that
+your original data is not altered accidentally, or if you don’t have
+write access to a database.
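As a sketch of one way to get a read-only connection with RSQLite (not necessarily how this vignette set it up), the `flags` argument of `RSQLite::SQLite()` connections accepts `RSQLite::SQLITE_RO`. The temporary file and toy table are assumptions for illustration.

```r
library(DBI)

# Create a small database file with a normal read-write connection.
db_path <- tempfile(fileext = ".sqlite")
con_rw <- dbConnect(RSQLite::SQLite(), db_path)
dbWriteTable(con_rw, "toy", data.frame(id = 1:3, wt = c(1.5, 2, 2.5)))
dbDisconnect(con_rw)

# Reopen the same file read-only.
con_ro <- dbConnect(RSQLite::SQLite(), db_path, flags = RSQLite::SQLITE_RO)

# Reads work as usual.
total_wt <- dbGetQuery(con_ro, "SELECT SUM(wt) AS total FROM toy")$total

# Writes fail, so the original data cannot be altered by accident.
write_attempt <- try(
  dbWriteTable(con_ro, "toy2", data.frame(id = 1L)),
  silent = TRUE
)

dbDisconnect(con_ro)
```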
diff --git a/docs/articles/srvyr-vs-survey.html b/docs/articles/srvyr-vs-survey.html
index a68f154..8f8fef4 100644
--- a/docs/articles/srvyr-vs-survey.html
+++ b/docs/articles/srvyr-vs-survey.html
@@ -32,7 +32,7 @@
srvyr
compared to the survey
packagesrvyr
compared to the
+survey
package
vignettes/srvyr-vs-survey.Rmd
srvyr-vs-survey.Rmd
The srvyr
package adds dplyr
like syntax to the survey
package. This vignette focuses on how srvyr
compares to the survey
package, for more information about survey design and analysis, check out the vignettes in the survey
package, or Thomas Lumley’s book, Complex Surveys: A Guide to Analysis Using R. (Also see the bottom of this document for some more resources).
Everything that srvyr
can do, can also be done in survey
. In fact, behind the scenes the survey
package is doing all of the hard work for srvyr
. srvyr
strives to make your code simpler and more easily readable to you, especially if you are already used to the dplyr
package.
The srvyr
package adds dplyr
like syntax to
+the survey
package. This vignette focuses on how
+srvyr
compares to the survey
package, for more
+information about survey design and analysis, check out the vignettes in
+the survey
package, or Thomas Lumley’s book, Complex
+Surveys: A Guide to Analysis Using R. (Also see the bottom of
+this document for some more resources).
Everything that srvyr
can do, can also be done in
+survey
. In fact, behind the scenes the survey
+package is doing all of the hard work for srvyr
.
+srvyr
strives to make your code simpler and more easily
+readable to you, especially if you are already used to the
+dplyr
package.
The dplyr
package has made it easy to write code to summarize data. For example, if we wanted to check how the year-to-year change in academic progress indicator score varied by school level and percent of parents were high school graduates, we can do this:
The dplyr
package has made it easy to write code to
+summarize data. For example, if we wanted to check how the year-to-year
+change in academic progress indicator score varied by school level and
+percent of parents were high school graduates, we can do this:
library(survey)
library(ggplot2)
@@ -130,12 +145,22 @@
geom_text(aes(y = 0, label = n), position = position_dodge(width = 0.9), vjust = -1)
## Warning: Ignoring unknown parameters: stat
-However, if we wanted to add error bars to the graph to capture the uncertainty due to sampling variation, we have to completely rewrite the dplyr
code for the survey
package. srvyr
allows a more direct translation.
However, if we wanted to add error bars to the graph to capture the
+uncertainty due to sampling variation, we have to completely rewrite the
+dplyr
code for the survey
package.
+srvyr
allows a more direct translation.
as_survey_design()
, as_survey_rep()
and as_survey_twophase()
are analogous to survey::svydesign()
, survey::svrepdesign()
and survey::twophase()
respectively. Because they are designed to match dplyr
’s style of non-standard evaluation, they accept bare column names instead of formulas (~). They also move the data argument first, so that it is easier to use magrittr
pipes (%>%
).
as_survey_design()
, as_survey_rep()
and
+as_survey_twophase()
are analogous to
+survey::svydesign()
, survey::svrepdesign()
and
+survey::twophase()
respectively. Because they are designed
+to match dplyr
’s style of non-standard evaluation, they
+accept bare column names instead of formulas (~). They also move the
+data argument first, so that it is easier to use magrittr
+pipes (%>%
).
library(srvyr)
@@ -143,7 +168,10 @@
srs_design_srvyr <- apisrs %>% as_survey_design(ids = 1, fpc = fpc)
srs_design_survey <- svydesign(ids = ~1, fpc = ~fpc, data = apisrs)
The srvyr
functions also accept dplyr::select()
’s special selection functions (such as starts_with()
, one_of()
, etc.), so these functions are analogous:
The srvyr
functions also accept
+dplyr::select()
’s special selection functions (such as
+starts_with()
, one_of()
, etc.), so these
+functions are analogous:
# selecting variables to keep in the survey object (stratified example)
strat_design_srvyr <- apistrat %>%
@@ -153,7 +181,9 @@
strat_design_survey <- svydesign(~1, strata = ~stype, fpc = ~fpc,
variables = ~stype + api99 + api00 + api.stu,
weight = ~pw, data = apistrat)
The function as_survey()
will automatically choose between the three as_survey_*
functions based on the arguments, so you can save a few keystrokes.
The function as_survey()
will automatically choose
+between the three as_survey_*
functions based on the
+arguments, so you can save a few keystrokes.
# simple random sample (again)
srs_design_srvyr2 <- apisrs %>% as_survey(ids = 1, fpc = fpc)
Once you’ve set up your survey data, you can use dplyr
verbs such as mutate()
, select()
, filter()
and rename()
.
Once you’ve set up your survey data, you can use dplyr
+verbs such as mutate()
, select()
,
+filter()
and rename()
.
strat_design_srvyr <- strat_design_srvyr %>%
mutate(api_diff = api00 - api99) %>%
@@ -170,7 +202,14 @@
strat_design_survey$variables$api_diff <- strat_design_survey$variables$api00 -
strat_design_survey$variables$api99
names(strat_design_survey$variables)[names(strat_design_survey$variables) == "api.stu"] <- "api_students"
Note that arrange()
is not available, because the srvyr
object expects to stay in the same order. Nor are two-table verbs such as full_join()
, bind_rows()
, etc. available to srvyr
objects either because they may have implications on the survey design. If you need to use these functions, you should use them earlier in your analysis pipeline, when the objects are still stored as data.frame
s.
Note that arrange()
is not available, because the
+srvyr
object expects to stay in the same order. Nor are
+two-table verbs such as full_join()
,
+bind_rows()
, etc. available to srvyr
objects
+either because they may have implications on the survey design. If you
+need to use these functions, you should use them earlier in your
+analysis pipeline, when the objects are still stored as
+data.frame
s.
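A minimal sketch of that ordering, joining a lookup table while the data is still a plain data.frame and only then creating the survey object. The lookup table `stype_labels` and its labels are hypothetical.

```r
library(dplyr)
library(srvyr)

data(api, package = "survey")

# Hypothetical lookup table; stype is built as a factor with the same
# levels as apistrat$stype so the join key types match.
stype_labels <- data.frame(
  stype = factor(c("E", "M", "H"), levels = levels(apistrat$stype)),
  level = c("Elementary", "Middle", "High")
)

# Do the two-table work first, on the data.frame...
joined_design <- apistrat %>%
  left_join(stype_labels, by = "stype") %>%
  # ...and only then declare the survey design.
  as_survey_design(strata = stype, fpc = fpc, weights = pw)

by_level <- joined_design %>%
  group_by(level) %>%
  summarize(api00 = survey_mean(api00))
```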
srvyr
also provides summarize()
and several survey-specific functions that calculate summary statistics on numeric variables: survey_mean()
, survey_total()
, survey_quantile()
and survey_ratio()
. These functions differ from their counterparts in survey
because they always return a data.frame in a consistent format. As such, they do not return the variance-covariance matrix, and so are not as flexible.
srvyr
also provides summarize()
and several
+survey-specific functions that calculate summary statistics on numeric
+variables: survey_mean()
, survey_total()
,
+survey_quantile()
and survey_ratio()
. These
+functions differ from their counterparts in survey
because
+they always return a data.frame in a consistent format. As such, they do
+not return the variance-covariance matrix, and so are not as
+flexible.
# Using srvyr
out <- strat_design_srvyr %>%
@@ -204,7 +250,8 @@
By group
-srvyr
also allows you to calculate statistics on numeric variables by group, using group_by()
.
+srvyr
also allows you to calculate statistics on numeric
+variables by group, using group_by()
.
# Using srvyr
strat_design_srvyr %>%
@@ -232,7 +279,9 @@
Proportions by group
-You can also calculate the proportion or count in each group of a factor or character variable by leaving x empty in survey_mean()
or survey_total()
.
+You can also calculate the proportion or count in each group of a
+factor or character variable by leaving x empty in
+survey_mean()
or survey_total()
.
# Using srvyr
srs_design_srvyr %>%
@@ -259,7 +308,8 @@
Unweighted calculations
-Finally, the unweighted()
function can act as an escape hatch to calculate unweighted calculations on the dataset.
+Finally, the unweighted()
function can act as an escape
+hatch to calculate unweighted calculations on the dataset.
# Using srvyr
strat_design_srvyr %>%
@@ -283,7 +333,11 @@
Back to the example
-So now, we have all the tools needed to create the first graph and add error bounds. Notice that the data manipulation code is nearly identical to the dplyr
code, with a little extra set up, and replacing weighted.mean()
with survey_mean
.
+So now, we have all the tools needed to create the first graph and
+add error bounds. Notice that the data manipulation code is nearly
+identical to the dplyr
code, with a little extra set up,
+and replacing weighted.mean()
with
+survey_mean
.
strat_design <- apistrat %>%
as_survey_design(strata = stype, fpc = fpc, weight = pw)
@@ -306,8 +360,14 @@
Comparison to the survey package (Degrees of freedom)
-For the most part, srvyr
tries to be a drop-in replacement for the survey package, only changing the syntax that you wrote. However, the way degrees of freedom are calculated for confidence intervals is different.
-srvyr
assumes that you want to use the true degrees of freedom by default, but the survey
package uses Inf
as the default. You can use the argument df
to get the same result as the survey package.
+For the most part, srvyr
tries to be a drop-in
+replacement for the survey package, only changing the syntax that you
+wrote. However, the way degrees of freedom are calculated for
+confidence intervals is different.
+srvyr
assumes that you want to use the true degrees of
+freedom by default, but the survey
package uses
+Inf
as the default. You can use the argument
+df
to get the same result as the survey package.
# Set pillar print methods so tibble has more decimal places
old_sigfig <- options("pillar.sigfig")
@@ -349,8 +409,13 @@
Grab Bag
-Using survey
functions on srvyr
objects
-Because srvyr
objects are just survey
objects with some extra structure, all of the functions from survey
will still work with them. If you need to calculate something beyond simple summary statistics, you can use survey
functions.
+Using survey
functions on srvyr
+objects
+Because srvyr
objects are just survey
+objects with some extra structure, all of the functions from
+survey
will still work with them. If you need to calculate
+something beyond simple summary statistics, you can use
+survey
functions.
@@ -377,9 +442,12 @@
Using expressions to create variables on the fly
-Like dplyr
, srvyr
allows you to use expressions in the arguments, allowing you to create variables in a single step. For example, you can use expressions:
+Like dplyr
, srvyr
allows you to use
+expressions in the arguments, allowing you to create variables in a
+single step. For example, you can use expressions:
-- as the arguments inside the survey statistic functions like
survey_mean
+ - as the arguments inside the survey statistic functions like
+
survey_mean
@@ -403,7 +471,8 @@
## 1 No 36.1 3.44
## 2 Yes 63.9 3.44
-- and you can even create variables inside of
group_by
+ - and you can even create variables inside of
+
group_by
@@ -415,7 +484,11 @@
## <lgl> <dbl> <dbl>
## 1 FALSE 599. 7.88
## 2 TRUE 805. 7.15
-Though on-the-fly expressions are syntactically valid, it is possible to make statistically invalid numbers from them. For example, though the standard error and confidence intervals can be multiplied by a scalar (like 100), the variance does not scale the same way, so the following is invalid:
+Though on-the-fly expressions are syntactically valid, it is possible
+to make statistically invalid numbers from them. For example, though the
+standard error and confidence intervals can be multiplied by a scalar
+(like 100), the variance does not scale the same way, so the following
+is invalid:
# BAD DON'T DO THIS!
strat_design %>%
@@ -426,8 +499,12 @@
Non-Standard evaluation
-Srvyr supports the non-standard evaluation conventions that dplyr uses. If you’d like to use a function programmatically, you can use the functions from rlang like the {{
operator (aka “curly curly”) from rlang
.
-Here’s a quick example, but please see the dplyr vignette vignette("programming", package = "dplyr")
for more details.
+Srvyr supports the non-standard evaluation conventions that dplyr
+uses. If you’d like to use a function programmatically, you can use the
+functions from rlang like the {{
operator (aka “curly
+curly”) from rlang
.
+Here’s a quick example, but please see the dplyr vignette vignette("programming", package = "dplyr")
+for more details.
mean_with_ci <- function(.data, var) {
summarize(.data, mean = survey_mean({{var}}, vartype = "ci"))
@@ -440,12 +517,19 @@
## mean mean_low mean_upp
## <dbl> <dbl> <dbl>
## 1 625. 606. 643.
-Srvyr will also follow dplyr’s lead on deprecating the old methods of NSE, such as rlang::quo
, and !!
, in addition to the so-called “underscore functions” (like summarize_
). Currently, they are soft-deprecated and may be removed altogether in some future version of srvyr.
+Srvyr will also follow dplyr’s lead on deprecating the old methods of
+NSE, such as rlang::quo
, and !!
, in addition
+to the so-called “underscore functions” (like summarize_
).
+Currently, they are soft-deprecated and may be removed
+altogether in some future version of srvyr.
Working column-wise
-As of version 1.0 of srvyr, it supports dplyr’s across function, so when you want to calculate a statistic on more than one variable, it is easy to do so. See vignette("colwise", package = "dplyr")
for more details, but here is another quick example:
+As of version 1.0 of srvyr, it supports dplyr’s across function, so
+when you want to calculate a statistic on more than one variable, it is
+easy to do so. See vignette("colwise", package = "dplyr")
+for more details, but here is another quick example:
# Calculate survey mean for all variables that have names starting with "api"
strat_design %>%
@@ -454,12 +538,22 @@
## api00 api00_se api99 api99_se api.stu api.stu_se
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 662. 9.41 629. 9.96 498. 16.1
-Srvyr also supports older methods of working column-wise, the “scoped variants”, such as summarize_at
, summarize_if
, summarize_all
and summarize_each
. Again, these are maintained for backwards compatibility, matching what the tidyverse team has done, but may be removed from a future version.
+Srvyr also supports older methods of working column-wise, the “scoped
+variants”, such as summarize_at
, summarize_if
,
+summarize_all
and summarize_each
. Again, these
+are maintained for backwards compatibility, matching what the tidyverse
+team has done, but may be removed from a future version.
Calculating proportions in groups
-You can calculate the weighted proportion that falls into a group using the survey_prop()
function (or the survey_mean()
function with no x
argument). The proportion is calculated by “unpeeling” the last variable used in group_by()
and then calculating the proportion within the other groups that fall into the last group (so that the proportion within each group that was unpeeled sums to 100%).
+You can calculate the weighted proportion that falls into a group
+using the survey_prop()
function (or the
+survey_mean()
function with no x
argument).
+The proportion is calculated by “unpeeling” the last variable used in
+group_by()
and then calculating the proportion within the
+other groups that fall into the last group (so that the proportion
+within each group that was unpeeled sums to 100%).
# Calculate the proportion that falls into each category of `awards` per `stype`
strat_design %>%
@@ -475,7 +569,11 @@
## 4 H Yes 0.32 0.0644
## 5 M No 0.52 0.0696
## 6 M Yes 0.48 0.0696
-If you want to calculate the proportion for groups from multiple variables at the same time that add up to 100%, the interact
function can help. The interact
function creates a variable that is automatically split apart so that more than one variable can be unpeeled.
+If you want to calculate the proportion for groups from multiple
+variables at the same time that add up to 100%, the
+interact
function can help. The interact
+function creates a variable that is automatically split apart so that
+more than one variable can be unpeeled.
# Calculate the proportion that falls into each category of both `awards` and `stype`
strat_design %>%
@@ -495,43 +593,73 @@
Learning More
-Here are some free resources put together by the community about srvyr:
+Here are some free resources put together by the community about
+srvyr:
-
“How-to”s & examples of using srvyr
-- Stephanie Zimmer & Rebecca Powell’s 2021 AAPOR Workshop “Tidy Survey Analysis in R using the srvyr Package”
+
- Stephanie Zimmer & Rebecca Powell’s 2021 AAPOR
+Workshop “Tidy Survey Analysis in R using the srvyr Package”
-- “The Epidemiologist R Handbook”, by Neale Batra et al. has a chapter on survey analysis with srvyr and survey package examples
-- Kieran Healy’s book “Data Visualization: A Practical Introduction” has a section on using srvyr to visualize the ESS.
-- The IPUMS PMA team’s blog had a series showing examples of using the PMA COVID survey panel with weights
+
- “The Epidemiologist R Handbook”, by Neale Batra et al. has a chapter on survey analysis with
+srvyr and survey package examples
+- Kieran Healy’s book “Data
+Visualization: A Practical Introduction” has a section on using
+srvyr to visualize the ESS.
+- The IPUMS PMA team’s blog had a series showing examples of using the
+PMA COVID
+survey panel with weights
-
-“Open Case Studies: Vaping Behaviors in American Youth” by Carrie Wright, Michael Ontiveros, Leah Jager, Margaret Taub, and Stephanie Hicks is a detailed case study that includes using srvyr to analyze the National Youth Tobacco Survey.
+“Open
+Case Studies: Vaping Behaviors in American Youth” by Carrie Wright,
+Michael Ontiveros, Leah Jager, Margaret Taub, and Stephanie Hicks is a
+detailed case study that includes using srvyr to analyze the National
+Youth Tobacco Survey.
-
-“How to plot Likert scales with a weighted survey in a dplyr friendly way” by Francisco Suárez Salas
-- The tidycensus package vignette “Working with Census microdata” includes information about using the weights from the ACS retrieved from the census API.
+“How
+to plot Likert scales with a weighted survey in a dplyr friendly
+way” by Francisco Suárez Salas
+- The tidycensus package vignette “Working
+with Census microdata” includes information about using the weights
+from the ACS retrieved from the census API.
-
-“The Joy of Calculating the Direct Standard Error for PUMS Estimates” by GitHub user @ldaly
+“The Joy of
+Calculating the Direct Standard Error for PUMS Estimates” by GitHub
+user @ldaly
About survey statistics
-- Thomas Lumley’s book “Complex Surveys: a guide to analysis using R”
+
- Thomas Lumley’s book “Complex Surveys:
+a guide to analysis using R”
-- Chris Skinner. Jon Wakefield. “Introduction to the Design and Analysis of Complex Survey Data.” Statist. Sci. 32 (2) 165 - 175, May 2017. 10.1214/17-STS614
-- Sharon Lohr’s textbook “Sampling: Design and Analysis”. Second or Third Editions
-- “Survey weighting is a mess” is the opening to Andrew Gelman’s “Struggles with Survey Weighting and Regression Modeling”
+
- Chris
+Skinner. Jon Wakefield. “Introduction to the Design and Analysis of
+Complex Survey Data.” Statist. Sci. 32 (2) 165 - 175, May 2017.
+10.1214/17-STS614
+- Sharon Lohr’s textbook “Sampling: Design and Analysis”. Second
+or Third
+Editions
+- “Survey weighting is a mess” is the opening to Andrew Gelman’s “Struggles
+with Survey Weighting and Regression Modeling”
-- Anthony Damico’s website “Analyze Survey Data for Free” has the weight specifications for a wide variety of public use survey datasets.
+- Anthony Damico’s website “Analyze
+Survey Data for Free” has the weight specifications for a wide
+variety of public use survey datasets.
-Working programmatically and/or on multiple columns at once (eg dplyr::across
and rlang
’s “curly curly” {{}}
)
+Working programmatically and/or on multiple columns at once
+(eg dplyr::across
and rlang
’s “curly curly”
+{{}}
)
-- dplyr’s included package vignettes “Column-wise operations” & “Programming with dplyr”
+
- dplyr’s included package vignettes “Column-wise
+operations” & “Programming
+with dplyr”
@@ -539,28 +667,48 @@
Non-English resources
-
-Em português: “Análise de Dados Amostrais Complexos” by Djalma Pessoa and Pedro Nascimento Silva
+Em português: “Análise de Dados Amostrais
+Complexos” by Djalma Pessoa and Pedro Nascimento Silva
-
-En español: “Usando R para jugar con los microdatos del INEGI” by Claudio Daniel Pacheco Castro
+En español: “Usando
+R para jugar con los microdatos del INEGI” by Claudio Daniel Pacheco
+Castro
-
-Tiếng Việt: “Dịch tễ học ứng dụng và y tế công cộng với R”
+Tiếng Việt: “Dịch tễ học ứng
+dụng và y tế công cộng với R”
Other cool stuff that uses srvyr
-- A (free) graphical interface allowing exploratory data analysis of survey data without writing code: iNZight (and survey data instructions)
+- A (free) graphical interface allowing exploratory data analysis of
+survey data without writing code: iNZight (and survey data
+instructions)
-
-“serosurvey: Serological Survey Analysis For Prevalence Estimation Under Misclassification” by Andree Valle Campos
-- Several packages on CRAN depend on srvyr; you can see them by looking at the reverse Imports/Suggestions on CRAN.
+“serosurvey:
+Serological Survey Analysis For Prevalence Estimation Under
+Misclassification” by Andree Valle Campos
+Several packages on CRAN depend on srvyr; you can see them by
+looking at the reverse
+Imports/Suggestions on CRAN.
Still need help?
-I think the best way to get help is to form a specific question and ask it in some place like rstudio’s community website (known for its friendly community) or stackoverflow.com (maybe not known for being quite as friendly, but probably has more people). If you think you’ve found a bug in srvyr’s code, please file an issue on GitHub, but note that I’m not a great resource for helping with specific issues, both because I have limited capacity and because I do not consider myself an expert in the statistical methods behind survey analysis.
+I think the best way to get help is to form a specific question and
+ask it in some place like rstudio’s community website
+(known for its friendly community) or stackoverflow.com (maybe not known
+for being quite as friendly, but probably has more people). If you think
+you’ve found a bug in srvyr’s code, please file an issue on GitHub,
+but note that I’m not a great resource for helping with specific issues, both
+because I have limited capacity and because I do not consider
+myself an expert in the statistical methods behind survey analysis.
Have something to add?
-These resources were mostly found via vanity searches on twitter & github. If you know of anything I missed, or have written something yourself, please let me know in this GitHub issue!
+These resources were mostly found via vanity searches on twitter
+& github. If you know of anything I missed, or have written
+something yourself, please let me know
+in this GitHub issue!
diff --git a/docs/authors.html b/docs/authors.html
index e5caf8e..47d0465 100644
--- a/docs/authors.html
+++ b/docs/authors.html
@@ -72,7 +72,7 @@
diff --git a/docs/index.html b/docs/index.html
index c355006..0dbd84e 100644
--- a/docs/index.html
+++ b/docs/index.html
@@ -34,7 +34,7 @@
@@ -95,12 +95,19 @@
-srvyr brings parts of dplyr’s syntax to survey analysis, using the survey package.
-srvyr focuses on calculating summary statistics from survey data, such as the mean, total or quantile. It allows for the use of many dplyr verbs, such as summarize
, group_by
, and mutate
, the convenience of pipe-able functions, rlang’s style of non-standard evaluation and more consistent return types than the survey package.
+srvyr brings parts of dplyr’s syntax to survey
+analysis, using the survey package.
+srvyr focuses on calculating summary statistics from survey data,
+such as the mean, total or quantile. It allows for the use of many dplyr
+verbs, such as summarize
, group_by
, and
+mutate
, the convenience of pipe-able functions, rlang’s
+style of non-standard evaluation and more consistent return types than
+the survey package.
You can try it out:
install.packages("srvyr")
@@ -109,7 +116,11 @@
Example usage
-First, describe the variables that define the survey’s structure with the function as_survey()
with the bare column names of the names that you would use in functions from the survey package like survey::svydesign()
, survey::svrepdesign()
or survey::twophase()
.
+First, describe the variables that define the survey’s structure with
+the function as_survey()
with the bare column names of the
+names that you would use in functions from the survey package like
+survey::svydesign()
, survey::svrepdesign()
or
+survey::twophase()
.
library(srvyr, warn.conflicts = FALSE)
data(api, package = "survey")
@@ -126,7 +137,8 @@
mutate(api_diff = api00 - api99)
-
-
summarise()
calculates summary statistics such as mean, total, quantile or ratio.
+summarise()
calculates summary statistics such as mean,
+total, quantile or ratio.
dstrata %>%
@@ -137,7 +149,8 @@
#> 1 32.9 28.8 37.0
-
-
group_by()
and then summarise()
creates summaries by groups.
+group_by()
and then summarise()
creates
+summaries by groups.
dstrata %>%
@@ -177,44 +190,76 @@
Learning more
-Here are some free resources put together by the community about srvyr:
+Here are some free resources put together by the community about
+srvyr:
-
“How-to”s & examples of using srvyr
-- srvyr’s included vignette “srvyr vs survey” and the rest of the pkgdown website
+
- srvyr’s included vignette “srvyr vs
+survey” and the rest of the pkgdown
+website
-- Stephanie Zimmer & Rebecca Powell’s 2021 AAPOR Workshop “Tidy Survey Analysis in R using the srvyr Package”
+
- Stephanie Zimmer & Rebecca Powell’s 2021 AAPOR
+Workshop “Tidy Survey Analysis in R using the srvyr Package”
-- “The Epidemiologist R Handbook”, by Neale Batra et al. has a chapter on survey analysis with srvyr and survey package examples
-- Kieran Healy’s book “Data Visualization: A Practical Introduction” has a section on using srvyr to visualize the ESS.
-- The IPUMS PMA team’s blog had a series showing examples of using the PMA COVID survey panel with weights
+
- “The Epidemiologist R Handbook”, by Neale Batra et al. has a chapter on survey analysis with
+srvyr and survey package examples
+- Kieran Healy’s book “Data
+Visualization: A Practical Introduction” has a section on using
+srvyr to visualize the ESS.
+- The IPUMS PMA team’s blog had a series showing examples of using the
+PMA COVID
+survey panel with weights
-
-“Open Case Studies: Vaping Behaviors in American Youth” by Carrie Wright, Michael Ontiveros, Leah Jager, Margaret Taub, and Stephanie Hicks is a detailed case study that includes using srvyr to analyze the National Youth Tobacco Survey.
+“Open
+Case Studies: Vaping Behaviors in American Youth” by Carrie Wright,
+Michael Ontiveros, Leah Jager, Margaret Taub, and Stephanie Hicks is a
+detailed case study that includes using srvyr to analyze the National
+Youth Tobacco Survey.
-
-“How to plot Likert scales with a weighted survey in a dplyr friendly way” by Francisco Suárez Salas
-- The tidycensus package vignette “Working with Census microdata” includes information about using the weights from the ACS retrieved from the census API.
+“How
+to plot Likert scales with a weighted survey in a dplyr friendly
+way” by Francisco Suárez Salas
+- The tidycensus package vignette “Working
+with Census microdata” includes information about using the weights
+from the ACS retrieved from the census API.
-
-“The Joy of Calculating the Direct Standard Error for PUMS Estimates” by GitHub user @ldaly
+“The Joy of
+Calculating the Direct Standard Error for PUMS Estimates” by GitHub
+user @ldaly
About survey statistics
-- Thomas Lumley’s book “Complex Surveys: a guide to analysis using R”
+
- Thomas Lumley’s book “Complex Surveys:
+a guide to analysis using R”
-- Chris Skinner. Jon Wakefield. “Introduction to the Design and Analysis of Complex Survey Data.” Statist. Sci. 32 (2) 165 - 175, May 2017. 10.1214/17-STS614
-- Sharon Lohr’s textbook “Sampling: Design and Analysis”. Second or Third Editions
-- “Survey weighting is a mess” is the opening to Andrew Gelman’s “Struggles with Survey Weighting and Regression Modeling”
+
- Chris
+Skinner. Jon Wakefield. “Introduction to the Design and Analysis of
+Complex Survey Data.” Statist. Sci. 32 (2) 165 - 175, May 2017.
+10.1214/17-STS614
+- Sharon Lohr’s textbook “Sampling: Design and Analysis”. Second
+or Third
+Editions
+- “Survey weighting is a mess” is the opening to Andrew Gelman’s “Struggles
+with Survey Weighting and Regression Modeling”
-- Anthony Damico’s website “Analyze Survey Data for Free” has the weight specifications for a wide variety of public use survey datasets.
+- Anthony Damico’s website “Analyze
+Survey Data for Free” has the weight specifications for a wide
+variety of public use survey datasets.
-Working programmatically and/or on multiple columns at once (eg dplyr::across
and rlang
’s “curly curly” {{}}
)
+Working programmatically and/or on multiple columns at once
+(eg dplyr::across
and rlang
’s “curly curly”
+{{}}
)
-- dplyr’s included package vignettes “Column-wise operations” & “Programming with dplyr”
+
- dplyr’s included package vignettes “Column-wise
+operations” & “Programming
+with dplyr”
@@ -222,52 +267,86 @@
Non-English resources
-
-Em português: “Análise de Dados Amostrais Complexos” by Djalma Pessoa and Pedro Nascimento Silva
+Em português: “Análise de Dados Amostrais
+Complexos” by Djalma Pessoa and Pedro Nascimento Silva
-
-En español: “Usando R para jugar con los microdatos del INEGI” by Claudio Daniel Pacheco Castro
+En español: “Usando
+R para jugar con los microdatos del INEGI” by Claudio Daniel Pacheco
+Castro
-
-Tiếng Việt: “Dịch tễ học ứng dụng và y tế công cộng với R”
+Tiếng Việt: “Dịch tễ học ứng
+dụng và y tế công cộng với R”
Other cool stuff that uses srvyr
-- A (free) graphical interface allowing exploratory data analysis of survey data without writing code: iNZight (and survey data instructions)
+- A (free) graphical interface allowing exploratory data analysis of
+survey data without writing code: iNZight (and survey data
+instructions)
-
-“serosurvey: Serological Survey Analysis For Prevalence Estimation Under Misclassification” by Andree Valle Campos
-- Several packages on CRAN depend on srvyr; you can see them by looking at the reverse Imports/Suggestions on CRAN.
+“serosurvey:
+Serological Survey Analysis For Prevalence Estimation Under
+Misclassification” by Andree Valle Campos
+Several packages on CRAN depend on srvyr; you can see them by
+looking at the reverse
+Imports/Suggestions on CRAN.
Still need help?
-I think the best way to get help is to form a specific question and ask it in some place like rstudio’s community website (known for its friendly community) or stackoverflow.com (maybe not known for being quite as friendly, but probably has more people). If you think you’ve found a bug in srvyr’s code, please file an issue on GitHub, but note that I’m not a great resource for helping with specific issues, both because I have limited capacity and because I do not consider myself an expert in the statistical methods behind survey analysis.
+I think the best way to get help is to form a specific question and
+ask it in some place like rstudio’s community website
+(known for its friendly community) or stackoverflow.com (maybe not known
+for being quite as friendly, but probably has more people). If you think
+you’ve found a bug in srvyr’s code, please file an issue on GitHub,
+but note that I’m not a great resource for helping with specific issues, both
+because I have limited capacity and because I do not consider
+myself an expert in the statistical methods behind survey analysis.
Have something to add?
-These resources were mostly found via vanity searches on twitter & github. If you know of anything I missed, or have written something yourself, please let me know in this GitHub issue!
+These resources were mostly found via vanity searches on twitter
+& github. If you know of anything I missed, or have written
+something yourself, please let me know
+in this GitHub issue!
What people are saying about srvyr
-minimal changes to my #r #dplyr script to incorporate survey weights, thanks to the amazing #srvyr and #survey packages. Thanks to @gregfreedman & @tslumley. Integrates soooo nicely into tidyverse
-–Brian Guay (@BrianMGuay on Jun 16, 2021)
+minimal changes to my #r #dplyr script to incorporate survey weights,
+thanks to the amazing #srvyr and #survey packages. Thanks to
+@gregfreedman & @tslumley. Integrates soooo nicely into
+tidyverse
+–Brian Guay (@BrianMGuay
+on Jun 16, 2021)
-Spending my afternoon using srvyr
for tidy analysis of weighted survey data in #rstats and it’s so elegant. Vignette here: https://CRAN.R-project.org/package=srvyr/vignettes/srvyr-vs-survey.html
-–Chris Skovron (@cskovron on Nov 20, 2018)
+Spending my afternoon using srvyr
for tidy analysis of
+weighted survey data in #rstats and it’s so elegant. Vignette here: https://CRAN.R-project.org/package=srvyr/vignettes/srvyr-vs-survey.html
+–Chris Skovron (@cskovron
+on Nov 20, 2018)
- Yay!
-–Thomas Lumley, in the Biased and Inefficient blog
+–Thomas Lumley, in
+the Biased and Inefficient blog
Contributing
-I do appreciate bug reports, suggestions and pull requests! I started this as a way to learn about R package development, and am still learning, so you’ll have to bear with me. Please review the Contributor Code of Conduct, as all participants are required to abide by its terms.
-If you’re unfamiliar with contributing to an R package, I recommend the guides provided by Rstudio’s tidyverse team, such as Jim Hester’s blog post or Hadley Wickham’s R packages book.
+I do appreciate bug reports, suggestions and pull requests! I started
+this as a way to learn about R package development, and am still
+learning, so you’ll have to bear with me. Please review the Contributor
+Code of Conduct, as all participants are required to abide by its
+terms.
+If you’re unfamiliar with contributing to an R package, I recommend
+the guides provided by Rstudio’s tidyverse team, such as Jim Hester’s blog
+post or Hadley Wickham’s R packages
+book.
diff --git a/docs/news/index.html b/docs/news/index.html
index 918f8e2..7636a81 100644
--- a/docs/news/index.html
+++ b/docs/news/index.html
@@ -72,7 +72,7 @@
@@ -136,15 +136,28 @@ Changelog
Source: NEWS.md
+
-srvyr 1.1.1 Unreleased
+srvyr 1.1.1 2022-02-20
-- Add function
cur_svy_wts()
to access the survey weights (#136, #139, thanks @ray-p144 and @bschneidr)
-- Allow access to survey context functions like
cur_svy()
and cur_svy_wts()
in mutate
and filter
(#138, #139, thanks @ray-p144 and @bschneidr)
-- Improve behavior of
interact()
when using cascade()
(#133, thanks @szimmer)
-- Fix a bug with non-standard names of grouping variables (like
1234
) in cascade (#132, thanks @szimmer)
+- Add function
cur_svy_wts()
to access the survey weights
+(#136, #139, thanks @ray-p144 and @bschneidr)
+- Allow access to survey context functions like
cur_svy()
+and cur_svy_wts()
in mutate
and
+filter
(#138, #139, thanks @ray-p144 and @bschneidr)
+- Improve behavior of
interact()
when using
+cascade()
(#133, thanks @szimmer)
+- Fix a bug with non-standard names of grouping variables (like
+
1234
) in cascade (#132, thanks @szimmer)
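As a rough illustration of the `cur_svy_wts()` entry above, the sketch below sums the design weights inside `summarize()`. It assumes the `apistrat` data that ships with the survey package; the object names `dstrata` and `wts` and the column name `sum_of_weights` are illustrative only.

```r
library(srvyr, warn.conflicts = FALSE)
data(api, package = "survey")

# Stratified design on the example data (illustrative object name)
dstrata <- apistrat %>%
  as_survey_design(strata = stype, weights = pw)

# cur_svy_wts() returns the weights of the current survey context,
# so ordinary functions such as sum() can be applied to them
wts <- dstrata %>%
  summarize(sum_of_weights = sum(cur_svy_wts()))

wts
```

The sum should match the sum of the `pw` column, since those are the weights supplied to the design.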
@@ -152,11 +165,19 @@
srvyr 1.1.0 2021-09-29
-- Uses the new quantile functions provided in version 4.1 of the survey package. The old survey quantile functions can be accessed with
survey_old_quantile()
and survey_old_median()
+ - Uses the new quantile functions provided in version 4.1 of the
+survey package. The old survey quantile functions can be accessed with
+
survey_old_quantile()
and
+survey_old_median()
-- Adds a new function
interact
that makes it easier to calculate proportions among interacted groups
-- “Filtering joins” (
anti_join
and semi_join
) are now available for srvyr objects. You must put the tbl_svy
object first. (#65, #120, @bschneidr)
-- Auto-unpacking of data.frames works even inside of a named data.frame column (like one created by
dplyr::across
). (#129)
+- Adds a new function
interact
that makes it easier to
+calculate proportions among interacted groups
+- “Filtering joins” (
anti_join
and semi_join
)
+are now available for srvyr objects. You must put the
+tbl_svy
object first. (#65, #120, @bschneidr)
+- Auto-unpacking of data.frames works even inside of a named
+data.frame column (like one created by
dplyr::across
).
+(#129)
- Miscellaneous documentation improvements (#119, #126, #127)
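A minimal sketch of the filtering-join entry above, assuming the `apistrat` example data from the survey package. The lookup table `keep_types` and the other object names are hypothetical and exist only for this illustration. The join is called as `dplyr::semi_join()` so the example does not depend on which verbs srvyr re-exports.

```r
library(srvyr, warn.conflicts = FALSE)
data(api, package = "survey")

dstrata <- apistrat %>%
  as_survey_design(strata = stype, weights = pw)

# Hypothetical lookup table listing the school types to keep
keep_types <- data.frame(
  stype = factor("E", levels = levels(apistrat$stype))
)

# The tbl_svy object goes first; the plain data frame goes second
elementary <- dplyr::semi_join(dstrata, keep_types, by = "stype")

# Point estimate only, for the schools that survived the join
elem_mean <- elementary %>%
  summarize(api00 = survey_mean(api00, vartype = NULL))

elem_mean
```

Because the join acts as a filter on the survey object, the estimate is a domain mean for elementary schools rather than a mean over a rebuilt design.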
@@ -166,7 +187,8 @@
-
-
survey_mean()
with no x
no longer errors when there are no grouping variables (#117)
+survey_mean()
with no x
no longer errors
+when there are no grouping variables (#117)
@@ -181,26 +203,43 @@
-
dplyr::across()
now works within it
-- dplyr functions like
dplyr::cur_group()
, dplyr::cur_group_id()
, dplyr::cur_data()
work in it (as well as the new analogous srvyr-specific functions cur_svy()
and cur_svy_full()
)
+- dplyr functions like
dplyr::cur_group()
,
+dplyr::cur_group_id()
, dplyr::cur_data()
work
+in it (as well as the new analogous srvyr-specific functions
+cur_svy()
and cur_svy_full()
)
The only known breaking change is:
-
-
objects in the summarize
will refer to the output of summarize
before the input. Meaning code that looks like this:
+objects in the summarize
will refer to the output of
+summarize
before the input. Meaning code that looks like
+this:
dstrata %>% summarize(api99 = survey_mean(api99), api_diff = survey_mean(api00 - api99))
-will now error because it calculates the mean of api99
before using it inside of the calculation for api_diff
. This behavior better matches dplyr
’s so will likely be kept.
+will now error because it calculates the mean of api99
+before using it inside of the calculation for api_diff
.
+This behavior better matches dplyr
’s so will likely be
+kept.
-Support for group_map()
/group_walk()
/group_map_dfr()
, group_split()
, group_nest()
and nest_by()
were added for tbl_svy
objects.
+Support for
+group_map()
/group_walk()
/group_map_dfr()
,
+group_split()
, group_nest()
and
+nest_by()
were added for tbl_svy
+objects.
Support drop_na
from tidyr (#107).
-as_survey()
and as_survey_()
are now idempotent: given a srvyr
survey object (a tbl_svy
), they return it unchanged. If extra arguments are provided, they are ignored with a warning (#97, thanks @krivit).
-rename_with()
now works with surveys (#96, thanks @krivit).
+as_survey()
and as_survey_()
are now
+idempotent: given a srvyr
survey object (a
+tbl_svy
), they return it unchanged. If extra arguments are
+provided, they are ignored with a warning (#97, thanks
+@krivit).
+rename_with()
now works with surveys (#96, thanks
+@krivit).
@@ -208,9 +247,19 @@
srvyr 0.4.0 2020-07-30
-Fix to ensure that ordered factors can be used as grouping variables or as inputs to survey_count
and survey_tally
(#92, thanks for reporting @szimmer & @walkerke & for fixing @bschneidr).
-Fix to ensure that numeric values can be used in grouping variables (#78 & #74, thanks for reporting @tzoltak & fix @bschneidr)
-Some improvements for dplyr 1.0 (#79) transmute()
now works (thanks for reporting @caayala), summarise()
’s .groups
argument is respected, and multi-row returns to summarise()
work. (Unfortunately the new across()
function isn’t quite supported in summarise()
yet, it will hopefully come soon)
+Fix to ensure that ordered factors can be used as grouping
+variables or as inputs to survey_count
and
+survey_tally
(#92, thanks for reporting @szimmer &
+@walkerke & for fixing @bschneidr).
+Fix to ensure that numeric values can be used in grouping
+variables (#78 & #74, thanks for reporting @tzoltak & fix
+@bschneidr)
+Some improvements for dplyr 1.0 (#79) transmute()
+now works (thanks for reporting @caayala), summarise()
’s
+.groups
argument is respected, and multi-row returns to
+summarise()
work. (Unfortunately the new
+across()
function isn’t quite supported in
+summarise()
yet, it will hopefully come soon)
@@ -235,8 +284,11 @@
srvyr 0.3.8 2020-03-07
-unweighted
now evaluates in the right context and so will provide the correct error when an incorrectly interpolated function is used (#70, thanks for reporting @tlmcmurry)
-filter_at
works now (#57, thanks for reporting @dcaseykc & helping @bschneidr).
+unweighted
now evaluates in the right context and so
+will provide the correct error when an incorrectly interpolated function is
+used (#70, thanks for reporting @tlmcmurry)
+filter_at
works now (#57, thanks for reporting
+@dcaseykc & helping @bschneidr).
Fix for upcoming version of tibble (#72).
@@ -245,10 +297,13 @@
srvyr 0.3.7 2020-01-17
-filter
ing on grouped survey designs now works correctly (#54, thanks for reporting @dcaseykc)
+filter
ing on grouped survey designs now works
+correctly (#54, thanks for reporting @dcaseykc)
-df
parameter now set to be degrees of freedom of survey for quantiles and variance to match other functions.
-Updated tests to work with upcoming version of survey (#66).
+df
parameter now set to be degrees of freedom of
+survey for quantiles and variance to match other functions.
+Updated tests to work with upcoming version of survey
+(#66).
@@ -256,8 +311,11 @@
srvyr 0.3.6 2019-10-05
-Small update to quasiquotation syntax inside unweighted to improve consistency with recent rlang updates (#54).
-Added functions survey_tally() and survey_count() (#53)
+Small update to quasiquotation syntax inside
+unweighted to improve consistency with recent rlang updates
+(#54).
+Added functions survey_tally() and
+survey_count() (#53)
@@ -265,11 +323,18 @@
srvyr 0.3.5 2019-07-09
-New functions survey_var and survey_sd to calculate population variance and standard deviation.
-Computation of standard errors in all survey_ functions can be suppressed by setting vartype=NULL (#45, thanks @tzoltak).
-Fixed an issue where you’d get an error when summarize components returned different lengths of data - usually when factor levels were not present in the data (#49).
-Removed references to MonetDBLite since it has been removed from CRAN.
-Small updates to replace soft-deprecated dplyr functions with their tibble and tidyselect equivalents (#52, thanks @bschneidr).
+New functions survey_var and survey_sd to calculate population
+variance and standard deviation.
+Computation of standard errors in all survey_ functions can be
+suppressed by setting vartype=NULL (#45, thanks @tzoltak).
+Fixed an issue where you’d get an error when summarize components
+returned different lengths of data - usually when factor levels were not
+present in the data (#49).
+Removed references to MonetDBLite since it has been removed from
+CRAN.
+Small updates to replace soft-deprecated dplyr functions with
+their tibble and tidyselect equivalents (#52, thanks
+@bschneidr).
@@ -286,8 +352,11 @@
srvyr 0.3.3 2018-05-22
-Add warning to explain that design effects cannot be calculated on proportions. (#39, thanks @mlaviolet)
-Remove dependency on stringr in tests and add DBI to suggests so that test dependencies are correctly specified (#40, thanks CRAN!)
+Add warning to explain that design effects cannot be calculated
+on proportions. (#39, thanks @mlaviolet)
+Remove dependency on stringr in tests and add DBI to suggests so
+that test dependencies are correctly specified (#40, thanks
+CRAN!)
@@ -295,7 +364,8 @@
srvyr 0.3.2 2018-05-04
-- Bug fix for calculating multiple quantiles on grouped data (#38, thanks @iantperry)
+- Bug fix for calculating multiple quantiles on grouped data (#38,
+thanks @iantperry)
@@ -303,8 +373,12 @@
srvyr 0.3.1 2018-03-10
-When converting from a survey db-backed survey to a srvyr one srvyr now tries to capture the updates you’ve already sent. If dbplyr can convert the function, then it will bring the update. If it can’t it will warn you (#35).
-Small bug fixes, mostly having to do with CRAN checks, running on CI services, or for upstream rev dep checks.
+When converting from a survey db-backed survey to a srvyr one
+srvyr now tries to capture the updates you’ve already sent. If dbplyr
+can convert the function, then it will bring the update. If it can’t it
+will warn you (#35).
+Small bug fixes, mostly having to do with CRAN checks, running on
+CI services, or for upstream rev dep checks.
@@ -312,8 +386,15 @@
srvyr 0.3.0 2018-01-24
-srvyr now uses tidy evaluation from rlang. The “underscore” functions have been soft deprecated in favor of quosure splicing. See dplyr’s vignette “programming” for more details. In almost all cases, the old syntax will still work, with one exception: the standard evaluation function as_survey_twophase_() had to be changed slightly so that the entire list is inside quotation.
-Database support has been rewritten. It should be faster now and doesn’t require a unique identifier. You also can now convert survey db-backed surveys to srvyr with as_survey.
+srvyr now uses tidy evaluation from rlang. The “underscore”
+functions have been soft deprecated in favor of quosure splicing. See
+dplyr’s vignette “programming” for more details. In almost all cases,
+the old syntax will still work, with one exception: the standard
+evaluation function as_survey_twophase_() had to be changed
+slightly so that the entire list is inside quotation.
+Database support has been rewritten. It should be faster now and
+doesn’t require a unique identifier. You also can now convert survey
+db-backed surveys to srvyr with as_survey.
srvyr now has a pkgdown site, check it out at http://gdfe.co/srvyr/
@@ -330,8 +411,12 @@
srvyr 0.2.1 2017-04-26
-Added support for dplyr mutate_at/_if/_all and summarize_at/_if/_all for srvyr surveys.
-Fixed a few bugs introduced with dplyr 0.6. This version of srvyr will work with both old versions of dplyr and 0.6, but may be full of warnings if you update dplyr. Full support for the new dplyr is coming soon.
+Added support for dplyr mutate_at/_if/_all and
+summarize_at/_if/_all for srvyr surveys.
+Fixed a few bugs introduced with dplyr 0.6. This version of srvyr
+will work with both old versions of dplyr and 0.6, but may be full of
+warnings if you update dplyr. Full support for the new dplyr is coming
+soon.
@@ -339,7 +424,9 @@
srvyr 0.2.0 2016-09-26
-- Added support for database backed surveys, using dplyr’s handling of DBI. Because of problems interacting with the survey package twophase designs do not work.
+- Added support for database backed surveys, using dplyr’s handling of
+DBI. Because of problems interacting with the survey package twophase
+designs do not work.
@@ -347,9 +434,15 @@
srvyr 0.1.2 2016-06-28
-Fixed a problem with confidence levels not being passed into quantiles
-Added deff parameter to survey_mean(), survey_total() and survey_median(), and a df parameter to those functions and survey_quantile() / survey_median().
-summarize and mutate match dplyr’s behavior when arguments aren’t named (uses dplyr::auto_name())
+Fixed a problem with confidence levels not being passed into
+quantiles
+Added deff parameter to survey_mean(),
+survey_total() and survey_median(), and a df
+parameter to those functions and survey_quantile() /
+survey_median().
+summarize and mutate match dplyr’s
+behavior when arguments aren’t named (uses
+dplyr::auto_name())
@@ -357,8 +450,10 @@
srvyr 0.1.1 2016-04-03
-New function cascade summarizes groups, and cascades to create summary statistics of groups of groups.
-Fixed a bug for confidence intervals for survey_total() on groups.
+New function cascade summarizes groups, and cascades
+to create summary statistics of groups of groups.
+Fixed a bug for confidence intervals for
+survey_total() on groups.
Fixed some issues with the upcoming version of dplyr.
diff --git a/docs/pkgdown.yml b/docs/pkgdown.yml
index b4c0c71..1ce9d20 100644
--- a/docs/pkgdown.yml
+++ b/docs/pkgdown.yml
@@ -1,9 +1,9 @@
-pandoc: 2.11.2
+pandoc: 2.17.1.1
pkgdown: 1.6.1
pkgdown_sha: ~
articles:
extending-srvyr: extending-srvyr.html
srvyr-database: srvyr-database.html
srvyr-vs-survey: srvyr-vs-survey.html
-last_built: 2022-02-20T17:17Z
+last_built: 2022-10-05T21:14Z
diff --git a/docs/reference/as_srvyr_result_df.html b/docs/reference/as_srvyr_result_df.html
index c22c76c..3c04ea4 100644
--- a/docs/reference/as_srvyr_result_df.html
+++ b/docs/reference/as_srvyr_result_df.html
@@ -76,7 +76,7 @@
diff --git a/docs/reference/as_survey.html b/docs/reference/as_survey.html
index 3856210..1527470 100644
--- a/docs/reference/as_survey.html
+++ b/docs/reference/as_survey.html
@@ -76,7 +76,7 @@
diff --git a/docs/reference/as_survey_design.html b/docs/reference/as_survey_design.html
index 605ffce..89cedac 100644
--- a/docs/reference/as_survey_design.html
+++ b/docs/reference/as_survey_design.html
@@ -73,7 +73,7 @@
diff --git a/docs/reference/as_survey_rep.html b/docs/reference/as_survey_rep.html
index c019fd6..b602962 100644
--- a/docs/reference/as_survey_rep.html
+++ b/docs/reference/as_survey_rep.html
@@ -73,7 +73,7 @@
diff --git a/docs/reference/as_survey_twophase.html b/docs/reference/as_survey_twophase.html
index c515912..a67f1b9 100644
--- a/docs/reference/as_survey_twophase.html
+++ b/docs/reference/as_survey_twophase.html
@@ -77,7 +77,7 @@
diff --git a/docs/reference/as_tibble.html b/docs/reference/as_tibble.html
index e3fbb25..3537a2f 100644
--- a/docs/reference/as_tibble.html
+++ b/docs/reference/as_tibble.html
@@ -73,7 +73,7 @@
diff --git a/docs/reference/cascade.html b/docs/reference/cascade.html
index fbaa637..4157f94 100644
--- a/docs/reference/cascade.html
+++ b/docs/reference/cascade.html
@@ -78,7 +78,7 @@
diff --git a/docs/reference/collect.html b/docs/reference/collect.html
index 05d7b3e..832a02b 100644
--- a/docs/reference/collect.html
+++ b/docs/reference/collect.html
@@ -77,7 +77,7 @@
diff --git a/docs/reference/cur_svy.html b/docs/reference/cur_svy.html
index 645816a..54ab961 100644
--- a/docs/reference/cur_svy.html
+++ b/docs/reference/cur_svy.html
@@ -79,7 +79,7 @@
diff --git a/docs/reference/cur_svy_wts.html b/docs/reference/cur_svy_wts.html
index 5049439..992bf6e 100644
--- a/docs/reference/cur_svy_wts.html
+++ b/docs/reference/cur_svy_wts.html
@@ -75,7 +75,7 @@
diff --git a/docs/reference/current_svy.html b/docs/reference/current_svy.html
index 2a4f138..29ab86d 100644
--- a/docs/reference/current_svy.html
+++ b/docs/reference/current_svy.html
@@ -80,6 +80,7 @@
+
srvyr
0.4.0
diff --git a/docs/reference/dplyr_filter_joins.html b/docs/reference/dplyr_filter_joins.html
index fb0b027..73fded0 100644
--- a/docs/reference/dplyr_filter_joins.html
+++ b/docs/reference/dplyr_filter_joins.html
@@ -74,7 +74,7 @@
diff --git a/docs/reference/dplyr_single.html b/docs/reference/dplyr_single.html
index 18688ac..92333dd 100644
--- a/docs/reference/dplyr_single.html
+++ b/docs/reference/dplyr_single.html
@@ -73,7 +73,7 @@
diff --git a/docs/reference/get_var_est.html b/docs/reference/get_var_est.html
index 6f1e156..5a6713c 100644
--- a/docs/reference/get_var_est.html
+++ b/docs/reference/get_var_est.html
@@ -78,7 +78,7 @@
diff --git a/docs/reference/group_by.html b/docs/reference/group_by.html
index f2ecf7d..3e46432 100644
--- a/docs/reference/group_by.html
+++ b/docs/reference/group_by.html
@@ -76,7 +76,7 @@
diff --git a/docs/reference/group_map_dfr.html b/docs/reference/group_map_dfr.html
index 62105f0..c9eaf73 100644
--- a/docs/reference/group_map_dfr.html
+++ b/docs/reference/group_map_dfr.html
@@ -76,7 +76,7 @@
Functions for calculating summary measures taking into account complex survey design
+Functions for calculating summary measures taking into account
+complex survey design
diff --git a/docs/reference/interact.html b/docs/reference/interact.html
index be29336..c535b34 100644
--- a/docs/reference/interact.html
+++ b/docs/reference/interact.html
@@ -75,7 +75,7 @@