survey_mean produces incorrect standard errors when an expression is used in summarize #126
Comments
Oh bummer, I hadn't really thought about that. The earliest versions actually did only allow Plus, I've seen users take advantage of this, which is nice:
as_survey(apiclus2) %>% group_by(awards) %>% summarize(aw=100*survey_mean(vartype = "ci"))
#> # A tibble: 2 × 4
#> awards aw aw_low aw_upp
#> <fct> <dbl> <dbl> <dbl>
#> 1 No 34.1 25.7 42.5
#> 2 Yes 65.9 57.5 74.3
(It's been a while since my stats training, can you remind me if the standard error, variance, and coefficient of variation can also be multiplied by a scalar like this? I feel like at least one is wrong, but can't remember which one.)
I think I'll probably just add a note in documentation, but I'll think about it. Thanks for reporting!
Also, not sure if you already know this, but you can get the correct variance for
svy %>% summarize(v1 = survey_mean(2 - awards12), v2 = survey_mean(awards12 == 1))
#> v1 v1_se v2 v2_se
#> 1 0.6587302 0.04240799 0.6587302 0.04240799
oops, and also expressions are allowed in
svy %>% group_by(awards = 2 - awards12) %>% summarize(aw = survey_mean())
#> # A tibble: 2 × 3
#> awards aw aw_se
#> <dbl> <dbl> <dbl>
#> 1 0 0.341 0.0424
#> 2 1 0.659 0.0424
(definitely not as nice, but I think it's consistent with the general tidyverse philosophy that sometimes you've gotta tidy your data before you get clean code)
Right, the "times 100" functionality is nice to have. (Standard errors are multiplicative like that; the variance has to be multiplied by the square of the factor, and the CV is unchanged by a pure rescaling, so it should not be multiplied at all.) I would probably trust that whatever is inside
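For reference, the scaling rules discussed here follow from the variance of a linear transformation. This is only a sketch: $\hat\theta$ stands for the estimate returned by a function such as survey_mean(), and the constants $a$ and $b$ are notation introduced here for illustration, not anything from the package.

```latex
\begin{align}
\operatorname{Var}(a + b\,\hat\theta) &= b^{2}\,\operatorname{Var}(\hat\theta) \\
\operatorname{SE}(a + b\,\hat\theta)  &= |b|\,\operatorname{SE}(\hat\theta)
\end{align}
% Case b = 100, a = 0 (the "times 100" usage):
%   SE is multiplied by 100, the variance by 100^2 = 10^4.
% Case a = 2, b = -1 (the "2 - survey_mean(...)" usage):
%   SE(2 - \hat\theta) = SE(\hat\theta), i.e. the standard error is unchanged.
```

So multiplying every output column by a positive constant gives the right standard error and confidence limits, while an additive constant or a sign change must not be applied to the standard error.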
I had to work with Yes/No variables coded as integer 1/2, so I thought it would be a good idea to put down summarize(prop_yes=2-survey_mean(badly_coded_variable)). The syntax worked, but the side effect of the expression inside summarize() was that the standard errors were also affected by that expression.
Output:
I don't know what the right fix for this is. You cannot possibly parse the expressions inside summarize() to make sense like "Oh, this is a linear combination so the resulting standard error is a quadratic form" -- that would be annoying. I think a safe conservative fix is to forbid expressions under summarize(), and only allow the RHS to be the survey_whatever() functions, so that no smart asses would try anything weird, but I don't know the implementation details to gauge if that's technically possible.
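The problem described above can be reproduced without any survey package. The sketch below is base R only and makes several assumptions that are not in the original report: the data are simulated, the sample is treated as a simple random sample (no weights, clusters, or finite population correction), and se_mean() is a hypothetical helper rather than a srvyr function. It only illustrates why applying `2 - ...` to a standard error gives a meaningless number, whereas transforming the variable first gives a sensible one.

```r
# Hypothetical data: a Yes/No item coded 1 = Yes, 2 = No.
set.seed(1)
x <- sample(c(1, 2), size = 200, replace = TRUE, prob = c(0.66, 0.34))

# Hypothetical helper: standard error of a mean under simple random sampling.
se_mean <- function(v) sd(v) / sqrt(length(v))

est_coded <- mean(x)      # mean of the 1/2 coding
se_coded  <- se_mean(x)   # its standard error

# Applying the same arithmetic to the estimate and to its standard error,
# which is the effect described in the report:
naive_est <- 2 - est_coded  # fine: this is the proportion of "Yes"
naive_se  <- 2 - se_coded   # not a standard error of anything

# Transforming the data first, then estimating:
prop_yes    <- mean(2 - x)
prop_yes_se <- se_mean(2 - x)  # equals se_coded up to rounding

print(c(naive_est = naive_est, naive_se = naive_se))
print(c(prop_yes = prop_yes, prop_yes_se = prop_yes_se))
```

The point estimates agree either way; only the standard error differs, which is why the issue is easy to miss.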