Support type-based column selectors #3034
Comments
Yes - I just need to think if there are any corner cases that would lead to problems. We could even potentially allow |
OK - now I remember why we do not have this. Except the So adding the requested functionality would require a significant redesign. This is of course doable. @nalimilan - what do you think? |
I agree it would be nice to be able to do |
@nalimilan - I can do it. The only issue is that the PR might end up being 1000 lines and touch many files so it will be hard to review (not sure yet - maybe it will be easier). Essentially we need to drop using In other words the original design of DataFrames.jl assumes such functionality will not be needed ( |
@nalimilan - let us make a decision if we:
I would like to finalize the scope of 1.4 release so that we can have it before JuliaCon. |
I am moving it to the 1.5 release for a decision. |
I was thinking about it. The issue is that DataFrames.jl/src/other/index.jl Line 1 in b240458
so it - by design - only supports name lookup. Now the issue is that to create a In summary this means that it is a major redesign of @nalimilan - the question is if we want to do it. An alternative would be to special case such selector before passing it to index, but this will lead to ugly design (in many places we will have to apply a patch that is hard to maintain). |
After more thinking I am giving it a 1.x milestone. Maybe we will add it at some point, but it is not likely we will do it fast. For now users need to use |
In this issue let us track all requests for basing column selection on column values (as column element type is just a special case). In this post I discuss the choice in more detail. If you feel we should add this functionality please vote up: 👍. Thank you! |
My two cents about why I wouldn't recommend adding a new method:
|
I appreciate the thoughtful examples in the blog post! With the examples you’ve given there, I think I should be able to wrap this functionality within TidierData.jl. The only piece I’m concerned about is making sure I escape the data frame in the right place since I have a bunch of functions that parse and modify the expression along the way. Will let you know if I run into roadblocks. |
Looks like an interesting feature. If there is no performance benefit of
over
perhaps it might as well be done by a macro in DataFramesMeta? |
When working with PCA or cor(Matrix), it is better to keep only columns of a Number type. Using Pipe and Tidier:
df = load_csv("airbnb_nyc_2019", false)
type_df = @pipe describe(df) |> select(_, [:variable, :eltype])
int_df = @chain type_df begin
    @filter(isa(eltype, Union{Type{Int64}, Type{Float64}}))
end
|
Hi @math4mad, Thanks for the question. Just to clarify, are you asking:
Or all of the above? That may help with tailoring the reply a bit better. Thanks! |
Just select the columns whose element type has Number as a supertype.
Do you mean to select all columns (denoted
(I am listing the four most common cases you might want to select.) |
For now I think it would be option |
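For the simplest reading of the request above (keep the columns whose element type is numeric), a minimal sketch using `names` with a type argument might look like the following. The data frame and its column names are made up for illustration. Note that a column allowing `missing` has element type `Union{Missing, Int64}`, which is not a subtype of `Number`, so it is matched only when `Missing` is included in the type passed to `names`.

```julia
using DataFrames

# Hypothetical data; the column names are illustrative only.
df = DataFrame(id = 1:3,
               name = ["a", "b", "c"],
               score = [1.5, 2.5, 3.5],
               age = [20, missing, 40])

# Columns whose element type is a subtype of Number (no missing allowed).
strict = names(df, Number)

# Columns whose element type is a subtype of Union{Missing, Number},
# so numeric columns that allow missing values are included as well.
lenient = names(df, Union{Missing, Number})

# Keep only the selected columns.
numeric_df = select(df, lenient)
```

Which of the two variants is appropriate depends on whether columns with missing values should count as numeric for the analysis at hand.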
Currently, when applying a transformation to all columns of a specific type (or subtypes of an abstract type), a pattern such as
transform(df, names(df, Number) .=> f)
is used. Ideally, this could be achieved with a column selector, e.g.
transform(df, Cols(Number) .=> f)
While a minor convenience feature, this may make the column-selector API (even) more consistent, and users would not have to repeat the name of the DataFrame multiple times.
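A small sketch of the currently available pattern, with a concrete function standing in for `f`, could look like this. The data frame and the helper `double` are hypothetical and only serve as an illustration; the requested `Cols(Number)` form is not shown because, per this issue, it is not supported.

```julia
using DataFrames

# Hypothetical data; the column names are illustrative only.
df = DataFrame(a = [1, 2, 3], b = [1.0, 2.0, 3.0], c = ["x", "y", "z"])

# Hypothetical stand-in for `f`: receives a whole column and doubles it.
double(col) = 2 .* col

# names(df, Number) returns the names of columns whose element type
# is a subtype of Number; broadcasting => builds one pair per column.
out = transform(df, names(df, Number) .=> double)

# With renamecols=false the results overwrite the source columns
# instead of being stored under new "<name>_double" columns.
inplace = transform(df, names(df, Number) .=> double; renamecols=false)
```

The data frame name `df` appears twice in each call, which is the repetition that the proposed selector is meant to avoid.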