
as_tibble.egor(), as_alters_df(), and as_aaties_df() should include design information when the ego has a design. #53

Closed
krivit opened this issue Sep 10, 2020 · 5 comments


@krivit
Collaborator

krivit commented Sep 10, 2020

At the moment, the design information is thrown away. Unfortunately, srvyr does not support joins and similar operations, but it does support indexing. Also, as far as I can tell, the variables are stored separately from the design information, so it should in principle be possible to handle, say, alter table extraction as follows:

```r
library(egor)
library(dplyr)

# Create a weighted dataset.
e <- make_egor(8, 32) %>%
  mutate(weights = sample(c(0.5, 1, 1.5), n(), replace = TRUE))
ego_design(e) <- list(weights = "weights")
ego_design(e)

# Obtain a mapping to "join" egos to alters:
eamap <- match(e$alter$.egoID, e$ego$variables$.egoID)
# Augment the ego survey design to have an ego for each alter (creating cluster samples):
ed <- e$ego[eamap]
# Replace the variables (previously those of ego) with those of the alters.
ed$variables <- e$alter
# This is now a valid `tbl_svy` object preserving the ego design but containing the appropriate alter variables:
ed
```

Any thoughts?

@tilltnet
Owner

Yes, this would be a consistent way to handle egor objects with ego designs when they are converted to stand-alone representations.

I ran the code and it works fine. With the match() line in there, it will also work when the .egoID values in the ego and alter objects are not in the same order. Seems solid to me and should also work for aaties?!
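Why the match() step makes this order-independent can be seen on plain data frames. The following is a minimal base-R sketch with made-up toy tables (`egos`, `alters`, and their values are hypothetical, not egor output): match() returns, for each alter, the row position of its ego, so indexing ego rows by that vector lines ego attributes up with alters however either table is sorted.

```r
# Hypothetical toy tables; the egos are deliberately NOT sorted by .egoID.
egos <- data.frame(.egoID = c(3, 1, 2), weights = c(0.5, 1, 1.5))
alters <- data.frame(.egoID = c(1, 1, 2, 3, 3, 3),
                     age = c(20, 35, 41, 19, 52, 60))

# For each alter, the row of `egos` holding that alter's ego.
eamap <- match(alters$.egoID, egos$.egoID)
print(eamap)          # 2 2 3 1 1 1

# Indexing an ego column by eamap gives one ego value per alter,
# analogous to what e$ego[eamap] does to the rows of the design.
alter_weights <- egos$weights[eamap]
print(alter_weights)  # 1.0 1.0 1.5 0.5 0.5 0.5
```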

A side note on print.tbl_svy(): for me it does not print the variables themselves, just the design info. I'd prefer it to print the variables first, as a tibble, and then the design info. If I am not the only one bothered by this, we could file an issue at the srvyr repository?!

@krivit
Collaborator Author

krivit commented Sep 13, 2020

I am thinking of implementing left_join and full_join methods for tbl_svy using indexing, which we can then use in the backends. What do you think?
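As a rough illustration of what a join "via indexing" means, here is a sketch of a left join on plain data frames built only from match() and row indexing. The helper name `index_left_join()` is hypothetical; it assumes the key is unique in `y` (as .egoID is in the ego table), and it does not operate on tbl_svy objects, where one would index the design object instead of a data frame.

```r
# Hypothetical helper (not part of egor or srvyr): a left join built from
# match() and row indexing. Assumes `by` is a single key, unique in `y`.
index_left_join <- function(x, y, by) {
  idx <- match(x[[by]], y[[by]])  # row of y for each row of x; NA if no match
  y_extra <- y[idx, setdiff(names(y), by), drop = FALSE]
  rownames(y_extra) <- NULL       # drop the duplicated/NA row names from indexing
  cbind(x, y_extra)               # keeps every row of x, in x's order
}

# Usage with made-up data: the alter with .egoID 4 has no ego row,
# so its weight comes out as NA, as in a left join.
alters <- data.frame(.egoID = c(1, 1, 2, 4), age = c(20, 35, 41, 30))
egos <- data.frame(.egoID = c(2, 1), weights = c(1.5, 1))
joined <- index_left_join(alters, egos, by = ".egoID")
print(joined)
```

A full_join along the same lines would additionally have to append the rows of `y` whose keys never occur in `x`.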

@krivit
Collaborator Author

krivit commented Sep 13, 2020

Actually, there's an even simpler solution: I had forgotten that we've had a workaround for *_join.egor and other dplyr verbs all along. I'll just use it here.

@krivit
Collaborator Author

krivit commented Sep 13, 2020

On further thought, as_tibble.egor() should always return a tibble, as should the *_df functions. However, as_survey.egor() should always return a tbl_svy, as should as_alters_survey(). This should hold regardless of whether the original egor object has an ego design.

In particular, a user might start out with a simple random sample (SRS) of egos and so not specify a design, but the alter list would then still be a cluster sample, with the egos as the clusters. If they want that information, they can use as_alters_survey().
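To make the cluster-sample point concrete, here is a small base-R sketch with hypothetical data: under an SRS of egos every ego carries the same weight, yet the alters still arrive grouped by ego in clusters of unequal size, so alter-level analyses should treat .egoID as the cluster identifier. Handing such a table to the survey package would look something like `survey::svydesign(ids = ~.egoID, weights = ~weights, data = alters)`; that call is only an illustration and is not necessarily what as_alters_survey() does internally.

```r
# Hypothetical toy data: an SRS of three egos, so all ego weights are equal.
egos <- data.frame(.egoID = c(1, 2, 3), weights = c(1, 1, 1))
alters <- data.frame(.egoID = c(1, 1, 1, 2, 3, 3),
                     kin = c(TRUE, FALSE, TRUE, TRUE, FALSE, FALSE))

# Each alter inherits its ego's weight; its ego is its cluster.
alters$weights <- egos$weights[match(alters$.egoID, egos$.egoID)]

# The alters form clusters of unequal size, one cluster per ego.
cluster_sizes <- table(alters$.egoID)
print(cluster_sizes)

# Weighted share of kin alters. The point estimate ignores clustering,
# but a standard error for it would have to account for the clusters.
kin_share <- weighted.mean(alters$kin, alters$weights)
```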

krivit added a commit that referenced this issue Sep 13, 2020
* as_survey_design.egor() is gone.
* as_survey.egor() behaves the same as as_tibble.egor(), but returns a tbl_svy object (regardless of whether the egor has an ego design).
* as_egos_df() has been added for consistency.
* as_(egos|alters|aaties)_survey() have been added, analogous to as_*_df() but returning a tbl_svy object.
* as_tibble.egor() has been moved to conversions.R.
* as_tibble.egor() and as_survey.egor() are now documented in the same file as as_*_df() and as_*_survey().

References #14. Fixes #33, #53.
krivit closed this as completed in d52c371 Sep 13, 2020
@krivit
Collaborator Author

krivit commented Sep 13, 2020

OK, I've implemented these changes. They pass the checks, and I hope this makes sense for the larger project. (If not, we can always revert.)
