-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Interest for repeat(x, lengths) function? #16443
Comments
This is a really good question. I've never felt that there's a great solution, but I would say that |
Should I interpret the small number of reactions as a sign of lack of support? :-) |
I just ran into this problem and this function would be nice to have :) |
+1 for |
Bump - I came to the repo to open this issue. I quite often have a use for this - one example is to interpret range-length-encoded Vectors. |
Maybe make a pull request to get the ball rolling? Since |
Yeah, the question is to design how it would interact with |
Possibly a clearer suggestion than a new keyword would be to allow the |
Yeah, why not. |
OK I'll take a look |
This just came up on Slack. Any reason the PR was never merged? Also what's the current best workaround? |
Seems like the PR author might have stopped working on it before it was finished. Do you want to take a stab at updating it? |
mmm... I'm not sure how I would proceed, but since the Slack question was about sparse arrays, I just noticed that this could use some improvement (on Julia v1.6.0): julia> using SparseArrays, BenchmarkTools
julia> n = 1000 ;
julia> S = sprand(n,n,1/n) ; # Let's (inner) repeat this sparse array
julia> function repeat_inner(X, t) # new function to (inner) repeat by repeating the axes first
idxs = [repeat(axes(X,d), inner=c) for (d,c) in enumerate(t)]
X[idxs...]
end
repeat_inner (generic function with 1 method)
julia> repeat_inner(S, (2,3)) == repeat(S, inner=(2,3)) # check
true
julia> @btime repeat($S, inner=(20,30)) ;
8.254 s (24 allocations: 19.82 MiB)
julia> @btime repeat_inner($S, (20,30)) ;
1.245 ms (15 allocations: 9.81 MiB) This poorman's benchmark gives a factor ~2 reduction in memory allocation and a factor ~5'000 improvement in speed! So before I (potentially) dive into these internals, could someone help me understand why |
Are you sure we're talking about the same function? This issue is about passing a |
I showed the benchmark because if I was starting a PR I would use the same approach, except I would use the function rep(x::AbstractVector, lengths::AbstractVector{T}) where {T <: Integer}
if length(x) != length(lengths)
throw(DimensionMismatch("vector lengths must match"))
end
res = similar(x, sum(lengths))
i = 1
for idx in 1:length(x)
tmp = x[idx]
for kdx in 1:lengths[idx]
res[i] = tmp
i += 1
end
end
return res
end
function repeat_inner(X, t) # new function to (inner) repeat by repeating the axes first
idxs = [rep(axes(X,d), v) for (d,v) in enumerate(t)]
X[idxs...]
end which would give julia> A = [1 2; 3 4]
2×2 Matrix{Int64}:
1 2
3 4
julia> repeat_inner(A, ([1,2],[3, 4]))
3×7 Matrix{Int64}:
1 1 1 2 2 2 2
3 3 3 4 4 4 4
3 3 3 4 4 4 4 Now this works on that example, but I'm guessing there is a reason that approach was not used for So yes, I am aware that this was a different feature. Since Julia developers/contributors generally know these things better than me, I thought I should post that benchmark first to ask if there's anything wrong with my approach (and hopefully it would help my understanding of the current Base approach). |
DataArrays.jl currently offers a
rep
function modelled after R's, and which is mostly redundant withrepeat
(at least after #14082 is merged). So I'd like to deprecate it.There's one missing feature, though:
rep(x::AbstractVector, lengths::AbstractVector)
returns a vector with each element repeatedlengths[i]
times. This is used once in DataFrames.jl, so it looks like it might be of more general use. Should we include this functionality in Base?If yes, I'm not sure where it should go. It would be logical to add it to
repeat
via another keyword argument, but it would completely conflict with the existing ones, which means it would essentially be a separate function. We could addrepvec
to parallelrepmat
and support it there (with the same signature asrep
). OTOH, it could be nice to be able to deprecaterepmat
when type inference is fixed forrepeat
.For reference, the definition of
rep
:The text was updated successfully, but these errors were encountered: