Assignment to SubDataFrame #2785
Comments
Sounds good. I agree it's probably better to start with the more essential features, and evaluate in a second phase whether replacing columns should be allowed.
What extra rules are you referring to? (Ref. #2467 (comment))
OK. Then I will add only the essential set of methods allowing for adding columns. By "extra rules" I mean the rules I describe in this issue. If I think we do not need any The reason is that if you create a Is this clear? When I make a PR I will add
Something that I had not realized is that having e.g. Maybe it would be worth thinking about the use cases though. For example we could introduce an
How are you getting this? On the PR I have
I agree that defaulting to
I assume you both meant
Regarding
Because of these ambiguities in the PR I have not implemented any of them now. Indeed each of them can potentially be useful. In
if Though maybe having "new column filling omitted rows with values from old column" as the only option provided is good enough? (the reason it is better than in-place is that new values might not have types matching the
Yes indeed I meant
Yeah, probably. The fact that the result of
Right. Also in-place as a default would be weird as it would be completely different from what happens with
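The "new column filling omitted rows with values from the old column" option discussed above can be sketched as follows. This is only an illustration in Julia, assuming a DataFrames.jl release in which `sdf[!, col] = v` on an existing column behaves this way; the data frame and its values are made up.

```julia
using DataFrames

df = DataFrame(a = [1, 2, 3, 4])

# Row-filtered view that keeps all columns via the `:` selector.
sdf = view(df, [1, 3], :)

# Replace column `a` through the view with values of a different element type.
# Assumption: a new column is allocated in the parent, rows 2 and 4 keep their
# old values, and the element type is widened to hold both old and new values.
sdf[!, :a] = [1.5, 3.5]

df.a          # rows 1 and 3 updated, rows 2 and 4 carried over
eltype(df.a)  # widened, since 1.5 could not be stored in an Int column in place
```

An in-place update would instead have to convert `1.5` to the existing element type, which is the type-mismatch concern raised in the comment above.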
I was thinking about it, and I think it is OK. The reasoning is that we essentially will say:
So, in a sense, non-existing columns would work "as if" they contained virtual @pdeffebach - please comment if we are in agreement here. If yes, I will continue with #2794 (adding what we discuss here + responding to @nalimilan's suggestions). As a side note - adding this functionality to
Yes, this seems good. I think Stata's rules are easy to understand and it's good to emulate them. One thing to note, though, is that this doesn't solve problems with
So we will still have work to do on the API after this.
Could you please clarify what you mean here? I agree that requiring first to create a view, and then operating on it is a bit heavy, but I feel that you mean something else.
As a next step we could consider
If I have a data frame with 1000 columns and you do something like
Am I going to pay a cost of creating a view for all 1000 columns, or do I just pay the cost of making a view of
Only
PS. Preferably I propose not to use DataFramesMeta.jl syntax in DataFrames.jl discussions (in particular DataFramesMeta.jl does not have a finalized syntax yet).
Ah okay, I will remember that in the future. I think this is exactly correct. I'm fine with
Closing as this is now implemented.
@pdeffebach following our discussion on `WhereDataFrame` I propose the following extra new rules for assignment to `SubDataFrame`. They add complexity unfortunately, but the benefit is that they do what the user would probably want (I hope).
The crucial decision is the following: `data.table` DO NOT allow replacing existing columns; however, if we want we can (and my proposal assumes this); the point is that with the `df[:, col] = v` and `df[!, col] = v` distinction we can support both functionalities cleanly. I thought users might find it useful to replace existing columns with filtered-out values rather than only being able to create a new column this way. But maybe you will feel that this is too flexible. Then we can focus only on adding functionalities for adding columns `select!`
In summary - this is a "full option" proposal. I am OK to limit its scope to only add things that we find really useful in practice (but I thought it is better to have at this stage all things laid out on a table and decide what we allow and what we disallow).
The extra rules are only for a `SubDataFrame` that does not subset columns (uses `:` as column selector - the point is to use `:` as selector, not to select all columns in their original order, as this is a less strict requirement). This is easily identified using the `AbstractIndex` in the type. Example:

Functions to cover:
- `insertcols!` - natural (it does not allow column overwriting already)
- `select!` and `transform!`: columns that are not transformed at all (i.e. are passed as bare `:a` or `r"a"`, or column renaming, or `identity` function) are reused; all other operations create columns with `missing` in filtered-out rows
- `sdf[!, col] = v`, `sdf[!, cols] = v`, `sdf.col = v`, `sdf[!, col] .= v`, `sdf[!, cols] .= v`, and `sdf.col .= v` are allowed and replace columns, putting `missing` in filtered-out rows (all other rules are the same as for `DataFrame`)
- `sdf[:, col] = v` and `sdf[:, col] .= v` if `col` is not present in a data frame are allowed and create the column, putting `missing` in filtered-out rows

(if I have missed any of the functions that add/remove columns from a data frame in-place please comment)
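The rules for adding columns can be illustrated with a short sketch. It assumes a DataFrames.jl release in which adding columns through a `SubDataFrame` created with `:` is supported; the data frame, column names, and values are made up, and `DataFrames.index` is an internal helper used here only to show the index type.

```julia
using DataFrames

df = DataFrame(id = 1:4, x = [10, 20, 30, 40])

# Row-filtered view that keeps every column via the `:` selector.
sdf = view(df, [1, 3], :)

# A view that also subsets columns; the extra rules would not apply to it.
sdf_cols = view(df, [1, 3], [:id])

# The column index type distinguishes the two kinds of view.
DataFrames.index(sdf) isa DataFrames.Index          # created with `:`
DataFrames.index(sdf_cols) isa DataFrames.SubIndex  # created with a column subset

# Add a new column through the view; only rows 1 and 3 receive values.
sdf[!, :y] = [100, 300]

# `insertcols!` adds a column and never overwrites an existing one.
insertcols!(sdf, :z => ["a", "c"])

# In the parent, the filtered-out rows (2 and 4) hold `missing`.
df
```

Because the view shares the parent's column index, the new columns `y` and `z` are visible both in `sdf` and in `df`.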
CC @nalimilan @matthieugomez