Interest for repeat(x, lengths) function? #16443

nalimilan · 2016-05-19T08:44:04Z

DataArrays.jl currently offers a rep function modelled after R's, and which is mostly redundant with repeat (at least after #14082 is merged). So I'd like to deprecate it.

There's one missing feature, though: rep(x::AbstractVector, lengths::AbstractVector) returns a vector with each element repeated lengths[i] times. This is used once in DataFrames.jl, so it looks like it might be of more general use. Should we include this functionality in Base?

If yes, I'm not sure where it should go. It would be logical to add it to repeat via another keyword argument, but it would completely conflict with the existing ones, which means it would essentially be a separate function. We could add repvec to parallel repmat and support it there (with the same signature as rep). OTOH, it could be nice to be able to deprecate repmat when type inference is fixed for repeat.

For reference, the definition of rep:

function rep{T <: Integer}(x::AbstractVector, lengths::AbstractVector{T})
    if length(x) != length(lengths)
        throw(DimensionMismatch("vector lengths must match"))
    end
    res = similar(x, sum(lengths))
    i = 1
    for idx in 1:length(x)
        tmp = x[idx]
        for kdx in 1:lengths[idx]
            res[i] = tmp
            i += 1
        end
    end
    return res
end

The text was updated successfully, but these errors were encountered:

johnmyleswhite · 2016-05-19T17:40:22Z

This is a really good question. I've never felt that there's a great solution, but I would say that repvec is the least contentious.

nalimilan · 2016-10-06T09:06:34Z

Should I interpret the small number of reactions as a sign of lack of support? :-)

tlnagy · 2016-12-22T23:10:53Z

I just ran into this problem and this function would be nice to have :)

timofeytt · 2017-04-17T10:56:15Z

+1 for repvec

Gnimuc · 2017-04-17T11:24:52Z

X-ref: http://stackoverflow.com/questions/41732818/make-a-new-array-by-repeating-some-rows-of-an-old-array-in-julia

mkborregaard · 2018-08-30T10:52:24Z

Bump - I came to the repo to open this issue. I quite often have a use for this - one example is to interpret range-length-encoded Vectors.
In fact, RLEVectors.jl has an efficient implementation of this (but it's often a small atomic data householding operation that this is used for, and rarely worth adding a dependency, especially in package code).

nalimilan · 2018-08-30T11:27:09Z

Maybe make a pull request to get the ball rolling? Since repmat no longer exists, this should probably be implemented via a keyword argument to repeat.

mkborregaard · 2018-08-30T19:50:12Z

Yeah, the question is to design how it would interact with inner and outer.

mkborregaard · 2018-08-30T19:54:05Z

Possibly a clearer suggestion than a new keyword would be to allow the inner keyword to take a Vector of the same length as the input, for the Vector case, and not support higher Array dimensions?

nalimilan · 2018-08-30T20:56:23Z

Yeah, why not.

mkborregaard · 2018-09-03T14:54:40Z

OK I'll take a look

briochemc · 2021-04-19T23:45:23Z

This just came up on Slack. Any reason the PR was never merged? Also what's the current best workaround?

vtjnash · 2021-04-20T00:08:16Z

Seems like the PR author might have stopped working on it before it was finished. Do you want to take a stab at updating it?

briochemc · 2021-04-20T01:20:55Z

mmm... I'm not sure how I would proceed, but since the Slack question was about sparse arrays, I just noticed that this could use some improvement (on Julia v1.6.0):

julia> using SparseArrays, BenchmarkTools

julia> n = 1000 ;

julia> S = sprand(n,n,1/n) ; # Let's (inner) repeat this sparse array

julia> function repeat_inner(X, t) # new function to (inner) repeat by repeating the axes first
           idxs = [repeat(axes(X,d), inner=c) for (d,c) in enumerate(t)]
           X[idxs...]
       end
repeat_inner (generic function with 1 method)

julia> repeat_inner(S, (2,3)) == repeat(S, inner=(2,3)) # check
true

julia> @btime repeat($S, inner=(20,30)) ;
  8.254 s (24 allocations: 19.82 MiB)

julia> @btime repeat_inner($S, (20,30)) ;
  1.245 ms (15 allocations: 9.81 MiB)

This poorman's benchmark gives a factor ~2 reduction in memory allocation and a factor ~5'000 improvement in speed! So before I (potentially) dive into these internals, could someone help me understand why repeat does not take repeated indices first and then just getindex on those?

nalimilan · 2021-04-20T07:45:14Z

Are you sure we're talking about the same function? This issue is about passing a lengths vector with one entry for each entry in x indicating how many times to repeat it. You seem to be looking for an optimized method for the currently supported repeat behavior, which is a completely different issue.

briochemc · 2021-04-20T08:58:00Z

I showed the benchmark because if I was starting a PR I would use the same approach, except I would use the rep function (from the original post) to repeat the indices inside the body of the function. Specifically, I was thinking of

function rep(x::AbstractVector, lengths::AbstractVector{T}) where {T <: Integer}
    if length(x) != length(lengths)
        throw(DimensionMismatch("vector lengths must match"))
    end
    res = similar(x, sum(lengths))
    i = 1
    for idx in 1:length(x)
        tmp = x[idx]
        for kdx in 1:lengths[idx]
            res[i] = tmp
            i += 1
        end
    end
    return res
end
function repeat_inner(X, t) # new function to (inner) repeat by repeating the axes first
    idxs = [rep(axes(X,d), v) for (d,v) in enumerate(t)]
    X[idxs...]
end

which would give

julia> A = [1 2; 3 4]
2×2 Matrix{Int64}:
 1  2
 3  4

julia> repeat_inner(A, ([1,2],[3, 4]))
3×7 Matrix{Int64}:
 1  1  1  2  2  2  2
 3  3  3  4  4  4  4
 3  3  3  4  4  4  4

Now this works on that example, but I'm guessing there is a reason that approach was not used for repeat in Base, right? This is why I posted the benchmark.

So yes, I am aware that this was a different feature. Since Julia developers/contributors generally know these things better than me, I thought I should post that benchmark first to ask if there's anything wrong with my approach (and hopefully it would help my understanding of the current Base approach).

This was referenced Nov 3, 2017

WIP: expand data by FrequencyWeights JuliaStats/StatsBase.jl#314

Closed

modify weighted quantile (aweights + fweights) JuliaStats/StatsBase.jl#316

Merged

mkborregaard mentioned this issue Oct 8, 2018

Pass vector of lengths to repeat; close #16443 #29560

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Interest for repeat(x, lengths) function? #16443

Interest for repeat(x, lengths) function? #16443

nalimilan commented May 19, 2016 •

edited

Loading

johnmyleswhite commented May 19, 2016

nalimilan commented Oct 6, 2016

tlnagy commented Dec 22, 2016

timofeytt commented Apr 17, 2017

Gnimuc commented Apr 17, 2017 •

edited

Loading

mkborregaard commented Aug 30, 2018

nalimilan commented Aug 30, 2018

mkborregaard commented Aug 30, 2018

mkborregaard commented Aug 30, 2018

nalimilan commented Aug 30, 2018

mkborregaard commented Sep 3, 2018

briochemc commented Apr 19, 2021

vtjnash commented Apr 20, 2021

briochemc commented Apr 20, 2021

nalimilan commented Apr 20, 2021

briochemc commented Apr 20, 2021

Interest for repeat(x, lengths) function? #16443

Interest for repeat(x, lengths) function? #16443

Comments

nalimilan commented May 19, 2016 • edited Loading

johnmyleswhite commented May 19, 2016

nalimilan commented Oct 6, 2016

tlnagy commented Dec 22, 2016

timofeytt commented Apr 17, 2017

Gnimuc commented Apr 17, 2017 • edited Loading

mkborregaard commented Aug 30, 2018

nalimilan commented Aug 30, 2018

mkborregaard commented Aug 30, 2018

mkborregaard commented Aug 30, 2018

nalimilan commented Aug 30, 2018

mkborregaard commented Sep 3, 2018

briochemc commented Apr 19, 2021

vtjnash commented Apr 20, 2021

briochemc commented Apr 20, 2021

nalimilan commented Apr 20, 2021

briochemc commented Apr 20, 2021

nalimilan commented May 19, 2016 •

edited

Loading

Gnimuc commented Apr 17, 2017 •

edited

Loading