Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

target setindex implementation #1899

Merged
merged 31 commits into from
Sep 2, 2019
Merged
Show file tree
Hide file tree
Changes from 16 commits
Commits
Show all changes
31 commits
Select commit Hold shift + click to select a range
81a22dc
part 1: target setindex
bkamins Jul 26, 2019
5acea57
Apply suggestions from code review
bkamins Aug 4, 2019
bcad2e6
fixes after the review
bkamins Aug 4, 2019
898dcca
Merge branch 'master' into target_setindex_rules
bkamins Aug 13, 2019
5a1f12b
adding functionality part 2
bkamins Aug 13, 2019
db28574
improve placement of deprecations
bkamins Aug 13, 2019
6c3de3d
start rewriting tests; add Logging
bkamins Aug 13, 2019
49b0526
fix typo
bkamins Aug 14, 2019
d41d61d
handle logging warnings in unstack
bkamins Aug 14, 2019
6ba679c
fix method ambiguities
bkamins Aug 14, 2019
865fbf6
disable logging for test/deprecated.jl
bkamins Aug 14, 2019
69377c1
switch to @eval for defining methods
bkamins Aug 14, 2019
1e9d8af
clean up deprecated methods
bkamins Aug 14, 2019
9ce08a5
DataFrameRow assignment tests
bkamins Aug 14, 2019
17c1db3
broadcasting tests
bkamins Aug 14, 2019
374360c
additional tests
bkamins Aug 14, 2019
af66b6f
Apply suggestions from code review
bkamins Aug 24, 2019
34abbf6
fixes after the code review
bkamins Aug 24, 2019
b4d44c3
documentation update
bkamins Aug 24, 2019
3ae4924
fix type inference problem
bkamins Aug 24, 2019
bf8eccb
fix syntax error
bkamins Aug 24, 2019
ee62210
fix bug in broadcasting code
bkamins Aug 24, 2019
6c8201e
fix tests
bkamins Aug 25, 2019
8c3ed72
Apply suggestions from code review
bkamins Sep 1, 2019
867e348
stricter DataFrameRow setindex! rules
bkamins Sep 2, 2019
35c88ee
fix more tests
bkamins Sep 2, 2019
44a5867
fix more tests
bkamins Sep 2, 2019
b3b0ddd
fix bad test
bkamins Sep 2, 2019
e476e08
setting 1-row DataFrame is deprecated but still works
bkamins Sep 2, 2019
eb4340d
add missing line
bkamins Sep 2, 2019
99f669a
fix another test
bkamins Sep 2, 2019
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 2 additions & 1 deletion Project.toml
Original file line number Diff line number Diff line change
Expand Up @@ -24,11 +24,12 @@ DataStructures = "864edb3b-99cc-5e75-8d2d-829cb0a9cfe8"
DataValues = "e7dc6d0d-1eca-5fa6-8ad6-5aecde8b7ea5"
Dates = "ade2ca70-3891-5945-98fb-dc099432e06a"
LaTeXStrings = "b964fa9f-0449-5b57-a5c2-d3ea65f4040f"
Logging = "56ddb016-857b-54e1-b83d-db4d58db5568"
Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40"

[targets]
test = ["DataStructures", "DataValues", "Dates", "LaTeXStrings", "Random", "Test"]
test = ["DataStructures", "DataValues", "Dates", "LaTeXStrings", "Logging", "Random", "Test"]

[compat]
julia = "1"
Expand Down
24 changes: 15 additions & 9 deletions docs/src/lib/indexing.md
Original file line number Diff line number Diff line change
Expand Up @@ -110,15 +110,13 @@ then view points to selected columns by their number at the moment of creation o

## `setindex!`

The following list specifies the **target** behavior of `setindex!` operations depending on argument types.

In the current release of DataFrames.jl we are in the transition period when an old, undocumented, behavior
of `setindex!` is still supported, but throws deprecation warnings.

The behavior described below will be fully implemented in the next major release of DataFrames.jl.
The following list specifies the behavior of `setindex!` operations depending on argument types.

In particular a description explicitly mentions if the assignment is *in-place*.

Note that if `setindex!` operation throws an error the target data frame may be partially changed
bkamins marked this conversation as resolved.
Show resolved Hide resolved
so it is unsafe to use it afterwards.

`setindex!` on `DataFrame`:
* `df[row, col] = v` -> set value of `col` in row `row` to `v` in-place;
* `df[CartesianIndex(row, col)] = v` -> the same as `df[row, col] = v`;
Expand All @@ -131,7 +129,9 @@ In particular a description explicitly mentions if the assignment is *in-place*.
also if `col` is a `Symbol` that is not present in `df` then a new column in `df` is created and holds `v`;
equivalent to `df.col = v` if `col` is a valid identifier;
this is allowed if `ncol(df) == 0 || length(v) == nrow(df)`;
* `df[!, cols] = v` -> is currently disallowed, but is planned to be supported in the future;
* `df[!, cols] = v` -> replaces existing columns `cols` in data frame `df` with copying;
`v` must be an `AbstractMatrix` or an `AbstractDataFrame`
(in this case column names must match);
bkamins marked this conversation as resolved.
Show resolved Hide resolved

Note that only `df[!, col] = v` and `df.col = v` can be used to add a new column to a `DataFrame`.
In particular as `df[:, col] = v` is an in-place operation it does not add a column `v` to a `DataFrame` if `col` is missing
Expand All @@ -151,7 +151,9 @@ Note that `sdf[!, col] = v`, `sdf[!, cols] = v` and `sdf.col = v` are not allowe
* `dfr[col] = v` -> set value of `col` in row `row` to `v` in-place;
equivalent to `dfr.col = v` if `col` is a valid identifier;
* `dfr[cols] = v` -> set values of entries in columns `cols` in `dfr` by elements of `v` in place;
`v` can be an `AbstractVector` or `v` can be a `NamedTuple` or `DataFrameRow` when column names must match;
`v` can be: 1) an iterable in which case it must have a number of elements equal to `length(dfr)`,
bkamins marked this conversation as resolved.
Show resolved Hide resolved
2) an `AbstractDict`, when column names must match,
bkamins marked this conversation as resolved.
Show resolved Hide resolved
3) a `NamedTuple` or `DataFrameRow` when column names and order must match;
bkamins marked this conversation as resolved.
Show resolved Hide resolved

## Broadcasting

Expand All @@ -163,6 +165,10 @@ The following broadcasting rules apply to `AbstractDataFrame` objects:
of dimensionality higher than two.
* If multiple `AbstractDataFrame` objects take part in broadcasting then they have to have identical column names.

Note that if broadcasting assignment operation throws an error the target data frame may be partially changed
so it is unsafe to use it afterwards.


Broadcasting `DataFrameRow` is currently not allowed (which is consistent with `NamedTuple`).

It is possible to assign a value to `AbstractDataFrame` and `DataFrameRow` objects using the `.=` operator.
Expand All @@ -178,7 +184,7 @@ Additional rules:
* in the `df[row, cols] .= v` syntaxes the assignment to `df` is performed in-place;
* in the `df[rows, col] .= v` and `df[rows, cols] .= v` syntaxes the assignment to `df` is performed in-place;
* in the `df[!, col] .= v` syntax column `col` is replaced by a freshly allocated vector; if `col` is `Symbol` and it is missing from `df` then a new column is added; the length of the column is always the value of `nrow(df)` before the assignment takes place;
* `df[!, cols] = v` syntax is currently disallowed, but is planned to be supported in the future;
* the `df[!, cols] .= v` replaces existing columns `cols` in data frame `df` with freshly allocated vectors;
bkamins marked this conversation as resolved.
Show resolved Hide resolved
* `df.col .= v` syntax is allowed and performs in-place assignment to an existing vector `df.col`.
* in the `sdf[CartesianIndex(row, col)] .= v`, `sdf[row, col] .= v` and `sdf[row, cols] .= v` syntaxes the assignment to `sdf` is performed in-place;
* in the `sdf[rows, col] .= v` and `sdf[rows, cols] .= v` syntaxes the assignment to `sdf` is performed in-place;
Expand Down
129 changes: 96 additions & 33 deletions src/dataframe/dataframe.jl
Original file line number Diff line number Diff line change
Expand Up @@ -498,51 +498,114 @@ function Base.setindex!(df::DataFrame, v::Any, row_ind::Integer, col_ind::Column
return df
end

# df[SingleRowIndex, MultiColumnIndex] = value
# the method for value of type DataFrameRow, AbstractDict and NamedTuple
# is defined in dataframerow.jl

for T in (:AbstractVector, :Regex, :Not, :Between, :All, :Colon)
@eval function Base.setindex!(df::DataFrame,
v,
row_ind::Integer,
col_inds::$(T))
bkamins marked this conversation as resolved.
Show resolved Hide resolved
idxs = index(df)[col_inds]
if length(v) != length(idxs)
throw(DimensionMismatch("$(length(idxs)) columns were selected and the assigned" *
bkamins marked this conversation as resolved.
Show resolved Hide resolved
" value contains $(length(v)) elements"))
bkamins marked this conversation as resolved.
Show resolved Hide resolved
end
for (i, x) in enumerate(v)
df[row_ind, i] = x
end
return df
end
end

# df[MultiRowIndex, SingleColumnIndex] = AbstractVector
function Base.setindex!(df::DataFrame,
v::AbstractVector,
row_inds::Union{AbstractVector, Not}, # add Colon after deprecation
col_ind::ColumnIndex)
x = df[!, col_ind]
try
x[row_inds] = v
catch
Base.depwarn("implicit vector broadcasting in setindex! is deprecated; " *
"write `df[row_inds, col_ind] .= v` instead", :setindex!)
insert_multiple_entries!(df, v, row_inds, col_ind)
for T in (:AbstractVector, :Not, :Colon)
@eval function Base.setindex!(df::DataFrame,
v::AbstractVector,
row_inds::$(T),
col_ind::ColumnIndex)
x = df[!, col_ind]
try
x[row_inds] = v
catch
insert_multiple_entries!(df, v, axes(df, 1)[row_inds], col_ind)
Base.depwarn("implicit vector broadcasting in setindex! is deprecated; " *
"write `df[row_inds, col_ind] .= v` instead", :setindex!)
end
return df
end
return df
end

# df[MultiRowIndex, MultiColumnIndex] = AbstractDataFrame
function Base.setindex!(df::DataFrame,
new_df::AbstractDataFrame,
row_inds::Union{AbstractVector, Not}, # add Colon after deprecation
col_inds::Union{AbstractVector, Regex, Not, Between, All}) # add Colon after deprecation
idxs = index(df)[col_inds]
if view(_names(df), idxs) != _names(new_df)
Base.depwarn("in the future column names in source and target will have to match", :setindex!)
for T1 in (:AbstractVector, :Not, :Colon),
T2 in (:AbstractVector, :Regex, :Not, :Between, :All, :Colon)
@eval function Base.setindex!(df::DataFrame,
new_df::AbstractDataFrame,
row_inds::$(T1),
col_inds::$(T2))
idxs = index(df)[col_inds]
for (j, col) in enumerate(idxs)
df[row_inds, col] = new_df[!, j]
end
if view(_names(df), idxs) != _names(new_df)
Base.depwarn("in the future column names in source and target will have to match", :setindex!)
end
return df
end
for (j, col) in enumerate(idxs)
df[row_inds, col] = new_df[!, j]
end

for T in (:AbstractVector, :Regex, :Not, :Between, :All, :Colon)
@eval function Base.setindex!(df::DataFrame,
new_df::AbstractDataFrame,
row_inds::typeof(!),
col_inds::$(T))
idxs = index(df)[col_inds]
if view(_names(df), idxs) != _names(new_df)
throw(ArgumentError("Column names in source and target data frames do not match"))
end
for (j, col) in enumerate(idxs)
# make sure we make a copy on assignment
df[!, col] = new_df[:, j]
end
return df
end
return df
end

# df[MultiRowIndex, MultiColumnIndex] = AbstractMatrix
function Base.setindex!(df::DataFrame,
mx::AbstractMatrix,
row_inds::Union{AbstractVector, Not}, # add Colon after deprecation
col_inds::Union{AbstractVector, Regex, Not, Between, All}) # add Colon after deprecation
idxs = index(df)[col_inds]
if size(mx, 2) != length(idxs)
throw(DimensionMismatch("number of selected columns ($(length(idxs))) and number of columns in" *
" matrix ($(size(mx, 2))) do not match"))
for T1 in (:AbstractVector, :Not, :Colon),
T2 in (:AbstractVector, :Regex, :Not, :Between, :All, :Colon)
@eval function Base.setindex!(df::DataFrame,
mx::AbstractMatrix,
row_inds::$(T1),
col_inds::$(T2))
idxs = index(df)[col_inds]
if size(mx, 2) != length(idxs)
throw(DimensionMismatch("number of selected columns ($(length(idxs))) and number of columns in" *
" matrix ($(size(mx, 2))) do not match"))
end
for (j, col) in enumerate(idxs)
df[row_inds, col] = view(mx, :, j)
end
return df
end
for (j, col) in enumerate(idxs)
df[row_inds, col] = view(mx, :, j)
end

for T in (:AbstractVector, :Regex, :Not, :Between, :All, :Colon)
@eval function Base.setindex!(df::DataFrame,
mx::AbstractMatrix,
row_inds::typeof(!),
col_inds::$(T))
idxs = index(df)[col_inds]
if size(mx, 2) != length(idxs)
throw(DimensionMismatch("number of selected columns ($(length(idxs))) and number of columns in" *
" matrix ($(size(mx, 2))) do not match"))
end
for (j, col) in enumerate(idxs)
df[!, col] = mx[:, j]
bkamins marked this conversation as resolved.
Show resolved Hide resolved
end
return df
end
return df
end

##############################################################################
Expand Down
35 changes: 30 additions & 5 deletions src/dataframerow/dataframerow.jl
Original file line number Diff line number Diff line change
Expand Up @@ -98,14 +98,39 @@ Base.@propagate_inbounds Base.getindex(r::DataFrameRow, idxs::Union{AbstractVect
DataFrameRow(parent(r), row(r), parentcols(index(r), idxs))
Base.@propagate_inbounds Base.getindex(r::DataFrameRow, ::Colon) = r

Base.@propagate_inbounds function Base.setindex!(r::DataFrameRow, value::Any, idx)
col = parentcols(index(r), idx)
if !(col isa Int)
Base.depwarn("implicit broadcasting in DataFrameRow assignment is deprecated", :setindex!)
for T in (:AbstractVector, :Regex, :Not, :Between, :All, :Colon)
@eval function Base.setindex!(df::DataFrame,
v::Union{DataFrameRow, NamedTuple, AbstractDict},
row_ind::Integer,
col_inds::$(T))
idxs = index(df)[col_inds]
if length(v) != length(idxs)
throw(DimensionMismatch("$(length(idxs)) columns were selected but the assigned" *
" value contains $(length(v)) elements"))
bkamins marked this conversation as resolved.
Show resolved Hide resolved
end

if !(v isa AbstractDict || all(((a, b),) -> a == b, zip(view(_names(df), idxs), keys(v))))
bkamins marked this conversation as resolved.
Show resolved Hide resolved
mismatched = findall(view(_names(df), idxs) .!= collect(keys(v)))
throw(ArgumentError("Selected column names do not match the names in assigned value in" *
" positions $(join(mismatched, ", ", " and "))"))
end
if v isa AbstractDict
for n in view(_names(df), idxs)
if !haskey(v, n)
throw(ArgumentError("Column $n not found in source dictionary"))
bkamins marked this conversation as resolved.
Show resolved Hide resolved
end
end
end
for (col, val) in pairs(v)
df[row_ind, col] = val
end
return df
end
setindex!(parent(r), value, row(r), col)
end

Base.@propagate_inbounds Base.setindex!(r::DataFrameRow, value, idx) =
setindex!(parent(r), value, row(r), parentcols(index(r), idx))

index(r::DataFrameRow) = getfield(r, :colindex)

Base.names(r::DataFrameRow) = _names(parent(r))[parentcols(index(r), :)]
Expand Down
Loading