Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Views of DataFrame design issue #3272

Closed
bkamins opened this issue Jan 24, 2023 · 3 comments · Fixed by #3273
Closed

Views of DataFrame design issue #3272

bkamins opened this issue Jan 24, 2023 · 3 comments · Fixed by #3273
Labels
Milestone

Comments

@bkamins
Copy link
Member

bkamins commented Jan 24, 2023

@nalimilan we have the following related problems in the core design of DataFrames.jl. What do you think we should do about it?

Problem 1. view with no columns

julia> df = DataFrame(a=1:3, b=4:6)
3×2 DataFrame
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     1      4
   2 │     2      5
   3 │     3      6

julia> sdf = @view df[:, 2:1]
0×0 SubDataFrame

julia> sdf[[true, true, true], :]
0×0 DataFrame

julia> sdf[Bool[], :]
ERROR: BoundsError: attempt to access 3-element Base.OneTo{Int64} at index [0-element Vector{Bool}]

The issue: view allows to set its :rows field to whatever parent allows, but with 0 columns it should have 0 rows.

Problem 2. dropping columns with view

julia> df = DataFrame(a=1:3, b=4:6)
3×2 DataFrame
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     1      4
   2 │     2      5
   3 │     3      6

julia> sdf = @view df[:, :]
3×2 SubDataFrame
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     1      4
   2 │     2      5
   3 │     3      6

julia> dfr = df[1, :]
DataFrameRow
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     1      4

julia> select!(df)
0×0 DataFrame

julia> sdf[:, :]
ERROR: BoundsError: attempt to access 0×0 DataFrame at index [1:3, 1:0]

julia> dfr
Error showing value of type DataFrameRow{DataFrame, DataFrames.Index}:
ERROR: BoundsError: attempt to access 0×0 DataFrame at index [[1], 1:0]

The issue here is that if we use : as column selector it is a valid operation to drop/add columns to a data frame.
However, if we move past 0-columns case we might even change number of rows in a data frame.

If something is not clear please let me know. In general I was not sure how to best handle it (there are several possible options, but in general it shows a hole in the design so I was not sure what is best).

@bkamins bkamins added the bug label Jan 24, 2023
@bkamins bkamins added this to the patch milestone Jan 24, 2023
@bkamins
Copy link
Member Author

bkamins commented Jan 24, 2023

Results of some additional thinking:

Regarding the second issue (changing set of columns in a parent of a view with : as column selector). While it is irritating, this is strictly speaking not an issue. If we drop all columns in the parent we change its number of rows - and this is disallowed when working with views. So we can leave things as is

Regarding the first issue (0-column view of a data frame with some rows). I was thinking of two solutions:

  • changing the rows argument when sub data frame is created to be empty;
  • changing the implementation of view and getindex so that they work correctly in this case.

The first (fixing rows) is easier to do, but it will give an incorrect parentindices result (i.e. you cannot recover the row indices used when creating a SubDataFrame. Though it is not strictly required that parentindices should allow for this. Also fixing rows would be technically breaking.

So I am leaning to changing view and getindex implementations, but keeping rows field untouched. What do you think?

@nalimilan
Copy link
Member

Makes sense. Is there any drawback to that solution?

@bkamins
Copy link
Member Author

bkamins commented Jan 24, 2023

I do not see any (except for complexity). See #3273 for an implementation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants