make sure getindex on DataFrameRows does not alias passed selector #3192

Merged: 4 commits, Oct 6, 2022


bkamins
Member

@bkamins bkamins commented Oct 5, 2022

Fixes #3191

@bkamins bkamins added the bug label Oct 5, 2022
@bkamins bkamins added this to the patch milestone Oct 5, 2022
@bkamins bkamins requested a review from nalimilan October 5, 2022 08:27
@bkamins
Member Author

bkamins commented Oct 5, 2022

This works around the fact that creating a view does not copy idx, so the returned object aliases the selector the caller passed in. Unfortunately, copying is not always possible, which makes the fix a bit tricky.

The problem with filter in Base Julia is as follows:

function filter(f, a::AbstractArray)
    (IndexStyle(a) != IndexLinear()) && return a[map(f, a)::AbstractArray{Bool}]

    j = 1
    idxs = Vector{Int}(undef, length(a))
    for idx in eachindex(a)
        @inbounds idxs[j] = idx
        ai = @inbounds a[idx]
        j = ifelse(f(ai), j+1, j)
    end
    resize!(idxs, j-1)
    res = a[idxs]
    empty!(idxs) # I do not know why this line is needed
    sizehint!(idxs, 0) # I do not know why this line is needed
    return res
end
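
To make the failure mode concrete, here is a minimal sketch that does not use DataFrames.jl at all. `RowsView`, `aliasing_getindex`, and `filter_steps` are hypothetical names introduced only for illustration; the sketch assumes an indexing operation that keeps the selector it was given instead of copying it, and then replays the same steps as the filter code above.

```julia
# Hypothetical stand-in for a row view: it stores the selector it was given.
struct RowsView{T} <: AbstractVector{T}
    parent::Vector{T}
    rows::Vector{Int}
end
Base.size(v::RowsView) = (length(v.rows),)
Base.IndexStyle(::Type{<:RowsView}) = IndexLinear()
Base.getindex(v::RowsView, i::Int) = v.parent[v.rows[i]]

# Assumption: indexing builds a view and keeps `sel` without copying it.
aliasing_getindex(data::Vector, sel::Vector{Int}) = RowsView(data, sel)

# Replays the linear-indexing branch of the filter shown above.
function filter_steps(f, data::Vector)
    j = 1
    idxs = Vector{Int}(undef, length(data))
    for idx in eachindex(data)
        idxs[j] = idx
        j = ifelse(f(data[idx]), j + 1, j)
    end
    resize!(idxs, j - 1)
    res = aliasing_getindex(data, idxs)
    kept_before = length(res)  # rows visible right after indexing
    empty!(idxs)               # the scratch buffer is released ...
    return res, kept_before    # ... and `res` shrinks with it
end

res, kept_before = filter_steps(isodd, [1, 2, 3, 4, 5])
println(kept_before)  # 3
println(length(res))  # 0
```

Because `res.rows` and `idxs` are the same vector, emptying `idxs` also empties the result, which matches the symptom reported in the linked issue.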

@nalimilan
Member

Whoops. This was introduced by https://github.com/JuliaLang/julia/pull/31929/files#r987877494. Sounds like a good fix, at least in the short term. Though it's a bit annoying to have to make all getindex calls copy the input index unless that's really required by the AbstractArray interface. If it isn't, we could define our own filter implementation.

@bkamins
Member Author

bkamins commented Oct 5, 2022

> Though it's a bit annoying to have to make all getindex calls copy the input index

The point is that Base Julia assumes (correctly, IMO) that getindex produces an object that is independent of idx. Internally, for performance reasons, we create a view, so I think it is our responsibility to de-alias (note that for objects other than AbstractVector{Int}, de-aliasing is already ensured by the SubDataFrame logic).
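
A minimal sketch of the de-aliasing idea, again without DataFrames.jl. `SafeRowsView` and `dealiasing_getindex` are hypothetical names used only for illustration, and this is not the code merged in this PR; it only shows that copying a `Vector{Int}` selector before storing it makes the result independent of later mutation by the caller.

```julia
# Hypothetical stand-in for a row view backed by its own selector.
struct SafeRowsView{T} <: AbstractVector{T}
    parent::Vector{T}
    rows::Vector{Int}
end
Base.size(v::SafeRowsView) = (length(v.rows),)
Base.IndexStyle(::Type{<:SafeRowsView}) = IndexLinear()
Base.getindex(v::SafeRowsView, i::Int) = v.parent[v.rows[i]]

# Assumption: copying the selector is the de-aliasing step; the parent
# data is still shared, so the result remains a cheap view of the rows.
dealiasing_getindex(data::Vector, sel::Vector{Int}) =
    SafeRowsView(data, copy(sel))

data = [10, 20, 30, 40, 50]
idxs = [1, 3, 5]
safe = dealiasing_getindex(data, idxs)

empty!(idxs)            # the caller reuses or clears its buffer
println(length(safe))   # 3
println(collect(safe))  # [10, 30, 50]
```

The cost is one extra allocation of the index vector per call, which is the trade-off discussed in this thread.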

@bkamins
Member Author

bkamins commented Oct 5, 2022

@nalimilan - even if something is changed in Base Julia, I think we still need this fix for older Julia versions.

Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
@bkamins bkamins merged commit 8f726a6 into main Oct 6, 2022
@bkamins bkamins deleted the bk/fix_eachrow branch October 6, 2022 10:06
@bkamins
Member Author

bkamins commented Oct 6, 2022

Thank you!


Successfully merging this pull request may close these issues.

Filtering of eachrow(df) not working in 1.4.0
2 participants