-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
tidypivot feedback #3
Comments
P.s. counter to how I wrote my example, would probably have |
Here's an alternative version of # Inspiration for using `rlang::quo_is_null()` came from here:
# https://stackoverflow.com/a/66580483/9059865
pivot_table <- function(data, rows = NULL, cols = NULL){
cols_quo <- enquo(cols)
tidy <- data %>%
group_by(across(c({{rows}}, {{cols}}))) %>%
summarize(value = n()) %>%
ungroup()
if(rlang::quo_is_null(cols_quo)) return(tidy)
tidy %>%
pivot_wider(names_from = {{cols}})
} |
where you can pass multiple vars to rows, cols - I really like this approach!
yes! where possible, i lean towards buildable code w/ pipes over functions w/ many arguments |
Writing pipeable packages...?@shannonpileggi do you know any good example packages or tutorials on building pipeable packages? I feel like the tidymodels packages may be a place to look (e.g. {infer}, {recipes} etc) but the thought of going through the source code there is a bit intimidating (even if maybe less daunting than going through dplyr's). I imagine those packages are doing it the "right" way though with setting-up specifications, delaying evaluation etc. and I feel like that may be important if taking a more pipe-based API approach to tidypivot. The couple toy pipe-based packages I've written are pretty rough and I just actively do the transformations in each step... though don't actually think this is the "right" way to go about it. e.g. animatrixr: But pretty sure isn't "right" way. More notes on interfaceCould also have it be optional, i.e. a user could either do
OR
(Similar to how tidymodels let's you do things in a number of different ways.) Aggregation "values" may be better handled by pipe-based approach too, you might also want to have special functions to handle "show-values as" e.g.
For something like... But don't know... Also don't know how to handle all the options for totals/subtotal that are commonly used in pivot tables (e.g. |
I don't know any tutorials, unfortunately. One package that I think models this functionality well is gtsummary, and now that I think about it, I wonder if there is some overlap here. |
Adding a couple of links to related work: |
And package here. https://github.com/EvaMaeRey/tidypivot |
@EvaMaeRey on an R4DS community thread someone posted asking for how to do pivot tables in R. Listing links below that were shared there (note R4DS community threads are deleted every few months)
|
Per Twitter DM conversation, here are my initial notes on your investigation into a pivot table like interface (tables)
On the whole, sounds like a really good idea -- am a bit surprised someone hasn't done a tidyverse <--> pivot-table like interface (which seems could be useful for analysts coming over to R from excel).
Notes through step 0
Did not know about
uncount()
, thanks!base R's
table()
is a bit more pivot table like than thetidyr::pivot_*
functions, but it is limited to just the 2-d case.Notes step 1
I see how you have y and y0, I'm wondering if the equivalent for x's may be limited by tibble/data.frame structure. It may work building this on top of data.frames but also makes me wonder if a new class would be better that allowed for multiple column headers and multiple row labels... or something... (but trying to do this would obviously add a lot of work... I do know tidyverse has documentation on building new classes on top of dataframes but haven't ever looked at.)
Notes on question at end
I don't know if it does or doesn't. I feel like there is a lot of #rstats stuff out there that attempts to replicate bits of excel functionality so wouldn't be surprised if something like it did, but I don't know of it. You might ping David Gohel and see if he's aware of anything.
I like the API you describe here better (allowing each to take multiple arguments), I also think I like arguments of
rows
andcols
better.I feel like rather than having a
y0
argument, this should be handled by multiple variables passed into a vector or tidyselection specification.For example rather than:
Would have:
(Which has the same output as your example.)
This works the other way too but has the issue mentioned above of only one column header (so levels just get concatenated).
Like idea of
cols
androws
both be able to take either a vector (as shwon above) or any tidyselection (e.g.all_of("Class", "Survived")
).Also think would be nice to have a
.fns
argument that could take in things likelist(n = n, sum = sum)
, etc. (similar to howdplyr::across()
takes in functions). If you do this, could have a more generic name likepivot_table()
and don't need to have separate functions for each aggregation type (and gives more flexibility to user-- there's a lot could do here in terms of interface, e.g. if wanted to make it possible for aggregations to specify variables inside of them...).Similarly, I'm wondering if there's some way you wouldn't have to make a separate function like
two_way_prop()
and what kind of API make sense for getting across the "Show Values As" options in excel. Maybe awithin
-like argument, where this is just a grouping specification, or maybe there are different functions that get across the logic for the "Show Values As" logic...Alternative approach
One thing about the approach described above is that while it supports tidyselection it's not a traditional lego-block/pipe-based approach that is characteristic of tidyverse style syntax -- an alternative more pipe-based API would be something where you build up a pivot table piece by piece (more like a ggplot2 or tidymodels workflow object), e.g.
This mirrors the way pivot tables are created step by step (though am not sure if this is actually necessary).
One potential advantage with this approach is maybe you could have each addition have a
within
argument that was set to eitherTRUE
orFALSE
and would specify whether the specification was used to set the grouping level for how the values are displayed so as to cover the "Show Values As" functionality. May want to use a different argument thanwithin
... maybe not though, again though handling proportions and an interface for the options in "Show values as" in excel may take more thought.Other questions
Regarding 3, would probably take a little more thought to get really down, two good resources are rlang Tidy evaluation tab (which was updated late last year): https://rlang.r-lib.org/index.html along with "programming with dplyr" vignette in dplyr. You also might take a look at {dplyover} (though I know author plans to refactor code, so don't know if internals are that great, but I think the interface for variable selection is good -- is pretty much the same as how is done in my package {pwiser}, of which I actually didn't write the internals).
For 4. I'm not sure on best approach there -- was a discussion on dplyr about such a feature and why they don't include it in dplyr: tidyverse/dplyr#3463 , the {janitor} package has
janitor::adorn_totals()
that maybe would be useful. Some of these parts are tricky because pivot tables are kind of 1/2 analysis 1/2 reporting/aesthetics (which you kind of suggest with your use of {gt}).Hope notes are helpful!
The text was updated successfully, but these errors were encountered: