Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resizing sparse vectors #11

Open
gdalle opened this issue Nov 3, 2021 · 11 comments
Open

Resizing sparse vectors #11

gdalle opened this issue Nov 3, 2021 · 11 comments

Comments

@gdalle
Copy link

gdalle commented Nov 3, 2021

Hi there!

I was trying to use sparse columns in a DataFrame and I found out that some methods were missing for the SparseVector type, such as deleteat!. Here's a MWE:

julia> using DataFrames, DataFramesMeta, SparseArrays

julia> df = DataFrame(x = sparse([0, 1, 0, 1]))
4×1 DataFrame
 Row │ x     
     │ Int64 
─────┼───────
   10
   21
   30
   41

julia> @subset!(df, :x .== 1)
ERROR: MethodError: no method matching deleteat!(::SparseVector{Int64, Int64}, ::Vector{Int64})
Closest candidates are:
  deleteat!(::Vector{T} where T, ::AbstractVector{T} where T) at array.jl:1381
  deleteat!(::Vector{T} where T, ::Any) at array.jl:1380
  deleteat!(::BitVector, ::Any) at bitarray.jl:981
  ...
Stacktrace:
 [1] _delete!_helper(df::DataFrame, drop::Vector{Int64})
   @ DataFrames ~/.julia/packages/DataFrames/vuMM8/src/dataframe/dataframe.jl:999
 [2] delete!(df::DataFrame, inds::Vector{Int64})
   @ DataFrames ~/.julia/packages/DataFrames/vuMM8/src/dataframe/dataframe.jl:976
 [3] subset!(df::DataFrame, args::Any; skipmissing::Bool)
   @ DataFrames ~/.julia/packages/DataFrames/vuMM8/src/abstractdataframe/subset.jl:292
 [4] top-level scope
   @ ~/.julia/packages/DataFramesMeta/yzaoq/src/macros.jl:830

Is there a specific reason why this doesn't exist, or is it just a case of "no one has gotten around to it yet"? In that scenario, I wouldn't mind taking a crack at it if the difficulty is reasonable enough!

cc @pdeffebach @bkamins following our discussion in JuliaData/CategoricalArrays.jl#374

@fredrikekre
Copy link
Member

@nalimilan
Copy link
Contributor

Is there any particular advantage to n being fixed and to sparse vectors being immutable?

@bkamins
Copy link

bkamins commented Nov 3, 2021

Is there any particular advantage to n being fixed and to sparse vectors being immutable?

probably bounds checking is a bit faster.

@fredrikekre
Copy link
Member

n needs to be stored (it is not just the length of the vector of indices).

@bkamins
Copy link

bkamins commented Nov 3, 2021

Yes - it has to be stored, but it could be made mutable.

@gdalle
Copy link
Author

gdalle commented Nov 3, 2021

However common knowledge has it that mutable structs are slower. I haven't managed to find an official source for this, but maybe this could be worked around by storing n as a reference? The idea is that we don't need (or want) to make both vectors in the SparseVector struct mutable as well

@vtjnash
Copy link
Contributor

vtjnash commented Nov 3, 2021

Common knowledge sounds wrong then

@KristofferC
Copy link
Member

Common knowledge sounds wrong then

Ref JuliaLang/julia#9755

@gdalle
Copy link
Author

gdalle commented Nov 3, 2021

So could we keep the struct immutable and wrap n in a Ref{Ti} instead? How much of a performance overhead should we expect then? And how devastating would it be for the sparse codebase 😅 ?

@gdalle
Copy link
Author

gdalle commented Nov 3, 2021

See also JuliaLang/julia#15668

@KristofferC KristofferC transferred this issue from JuliaLang/julia Jan 14, 2022
@mind6
Copy link

mind6 commented Apr 30, 2023

sounds like you need a SparseCateogrical specificly for use in dataframes, where resizing is needed over immutability.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants