speed up filter of vector #31929

Merged: 12 commits into JuliaLang:master, Jul 5, 2019

Conversation

@chethega (Contributor) commented May 5, 2019

Use branch-free code for filtering vectors. Part of the saved time is then spent resizing the result's buffer, so that memory consumption is reduced if the result happens to be long-lived.

I observe a 4x speedup for in-place filtering and 7x for out-of-place filtering in the 50% case. If desired, we can obtain an additional 30% speedup by skipping the realloc of the buffer, i.e. the sizehint!; this will, however, increase memory consumption. Since the result may be long-lived, I tend to be conservative about eating user memory. 1 ms corresponds to about 2 cycles / element (1 ms over 10^6 elements is 1 ns per element, i.e. roughly two cycles on a ~2 GHz core).

Abstract assumptions (updated):

  1. AbstractVector and AbstractArray with IndexLinear have indices that survive a round-trip to Int
  2. AbstractVector and AbstractArray with IndexLinear support getindex(a, idxs::Vector{Int})
  3. AbstractArray without IndexLinear supports logical indexing (the old code already made this assumption).

Old algorithms: branchy in-place filtering of AbstractVector; out of place, logical indexing for AbstractArray and a push!-based loop for Vector.

New algorithms: branchless in-place filtering for AbstractVector; branchless out-of-place filtering for Array; Vector{Int} indexing for AbstractArray with IndexLinear (using branchless construction of the index vector; this temporary is 8 times larger than the logical mask used by the old code, so we immediately free its buffer via empty!(tmp); sizehint!(tmp, 0) to relieve GC pressure). For AbstractArray without IndexLinear, we continue to use the old logical-indexing algorithm.

Memory consumption: We have a larger temporary to store the selected indices. For in-place filtering of a Vector, we now shrink the underlying buffer after filtering. For out-of-place filtering of a Vector, we now allocate a potentially larger buffer and shrink it afterwards, instead of relying on the push!-resize logic. This can be either a gain or a loss with respect to peak memory consumption, and is almost always a win with respect to sustained memory consumption for long-lived results (the old push!-based code leads to up to 2x overallocation of the resulting object, and it did not truncate the resulting buffer). This should be a gain with respect to GC pressure.
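
A sketch of the branchless kernel (illustrative only; filter_branchless is a throwaway name, the actual methods are in the monkey-patch below):

function filter_branchless(f, a::Vector{T}) where {T}
    b = Vector{T}(undef, length(a))   # over-allocate to the input length
    j = 1
    for ai in a
        @inbounds b[j] = ai           # unconditionally write the element
        j = ifelse(f(ai), j + 1, j)   # advance the cursor only when it is kept
    end
    resize!(b, j - 1)                 # truncate to the number of kept elements
    sizehint!(b, length(b))           # shrink the underlying buffer as well
    return b
end

filter_branchless(x -> x <= 0.5, rand(10^6))   # keeps roughly half the elements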

Updated (2) Microbenchmark:

julia> using BenchmarkTools, Test
julia> struct isle{T} <:Function
       elem::T
       end
julia> (c::isle)(x) = x <= c.elem
julia> Base.deleteat!(a::Test.GenericArray, ran) = begin deleteat!(a.a, ran); a end

julia> begin
       N=10^6
       println("N=$N, copy")
       @btime copy(arr)  evals=1 setup = (arr = rand(N))
       for p in [0.5, 0.15, 0.85, 0.05, 0.95, 0.01, 0.99]
       println("\np=$p, N=$N")
       global arr
       println("filter!(pred, a::Vector)")
       @btime filter!($(isle(p)), arr) evals=1 setup = (arr = rand(N))
       println("filter!(pred, a::Test.GenericVector)")
       @btime filter!($(isle(p)), arr) evals=1 setup = (arr = Test.GenericArray(rand(N)))
       println("filter(pred, a::Vector)")
       @btime filter($(isle(p)), arr) evals=1 setup = (arr = rand(N))
       println("filter(pred, a::Matrix)")
       @btime filter($(isle(p)), arr) evals=1 setup = (arr = rand(N, 1))
       println("filter(pred, a::Test.GenericVector)")
       @btime filter($(isle(p)), arr) evals=1 setup = (arr = Test.GenericArray(rand(N)))
       println("filter(pred, a::AbstractMatrix) with linear indexing")
       @btime filter($(isle(p)), arr) evals=1 setup = (arr = view(rand(N), :, 1:1))
       println("filter(pred, a::AbstractMatrix) with cartesian indexing")
       @btime filter($(isle(p)), arr) evals=1 setup = (arr = view(rand(N), :, [1]))
       end
       end

Before:

N=1000000, copy
  888.068 μs (2 allocations: 7.63 MiB)

p=0.5, N=1000000
filter!(pred, a::Vector)
  6.872 ms (0 allocations: 0 bytes)
filter!(pred, a::Test.GenericVector)
  6.962 ms (0 allocations: 0 bytes)
filter(pred, a::Vector)
  12.613 ms (19 allocations: 5.00 MiB)
filter(pred, a::Matrix)
  7.305 ms (9 allocations: 4.76 MiB)
filter(pred, a::Test.GenericVector)
  7.755 ms (9 allocations: 4.76 MiB)
filter(pred, a::AbstractMatrix) with linear indexing
  8.056 ms (9 allocations: 4.76 MiB)
filter(pred, a::AbstractMatrix) with cartesian indexing
  13.467 ms (9 allocations: 4.76 MiB)

p=0.15, N=1000000
filter!(pred, a::Vector)
  3.296 ms (0 allocations: 0 bytes)
filter!(pred, a::Test.GenericVector)
  3.225 ms (0 allocations: 0 bytes)
filter(pred, a::Vector)
  5.020 ms (18 allocations: 3.00 MiB)
filter(pred, a::Matrix)
  4.042 ms (9 allocations: 2.09 MiB)
filter(pred, a::Test.GenericVector)
  4.304 ms (9 allocations: 2.09 MiB)
filter(pred, a::AbstractMatrix) with linear indexing
  4.701 ms (9 allocations: 2.09 MiB)
filter(pred, a::AbstractMatrix) with cartesian indexing
  9.767 ms (9 allocations: 2.09 MiB)

p=0.85, N=1000000
filter!(pred, a::Vector)
  3.207 ms (0 allocations: 0 bytes)
filter!(pred, a::Test.GenericVector)
  3.513 ms (0 allocations: 0 bytes)
filter(pred, a::Vector)
  10.723 ms (20 allocations: 9.00 MiB)
filter(pred, a::Matrix)
  4.515 ms (9 allocations: 7.43 MiB)
filter(pred, a::Test.GenericVector)
  4.810 ms (9 allocations: 7.43 MiB)
filter(pred, a::AbstractMatrix) with linear indexing
  5.156 ms (9 allocations: 7.43 MiB)
filter(pred, a::AbstractMatrix) with cartesian indexing
  9.533 ms (9 allocations: 7.43 MiB)

p=0.05, N=1000000
filter!(pred, a::Vector)
  1.704 ms (0 allocations: 0 bytes)
filter!(pred, a::Test.GenericVector)
  1.703 ms (0 allocations: 0 bytes)
filter(pred, a::Vector)
  2.553 ms (16 allocations: 1.00 MiB)
filter(pred, a::Matrix)
  3.044 ms (9 allocations: 1.33 MiB)
filter(pred, a::Test.GenericVector)
  3.533 ms (9 allocations: 1.33 MiB)
filter(pred, a::AbstractMatrix) with linear indexing
  3.743 ms (9 allocations: 1.33 MiB)
filter(pred, a::AbstractMatrix) with cartesian indexing
  9.399 ms (9 allocations: 1.33 MiB)

p=0.95, N=1000000
filter!(pred, a::Vector)
  1.804 ms (0 allocations: 0 bytes)
filter!(pred, a::Test.GenericVector)
  2.177 ms (0 allocations: 0 bytes)
filter(pred, a::Vector)
  9.752 ms (20 allocations: 9.00 MiB)
filter(pred, a::Matrix)
  3.576 ms (9 allocations: 8.20 MiB)
filter(pred, a::Test.GenericVector)
  3.869 ms (9 allocations: 8.20 MiB)
filter(pred, a::AbstractMatrix) with linear indexing
  4.252 ms (9 allocations: 8.20 MiB)
filter(pred, a::AbstractMatrix) with cartesian indexing
  8.296 ms (9 allocations: 8.20 MiB)

p=0.01, N=1000000
filter!(pred, a::Vector)
  1.086 ms (0 allocations: 0 bytes)
filter!(pred, a::Test.GenericVector)
  1.149 ms (0 allocations: 0 bytes)
filter(pred, a::Vector)
  1.638 ms (14 allocations: 256.64 KiB)
filter(pred, a::Matrix)
  2.390 ms (9 allocations: 1.03 MiB)
filter(pred, a::Test.GenericVector)
  2.905 ms (9 allocations: 1.03 MiB)
filter(pred, a::AbstractMatrix) with linear indexing
  3.060 ms (9 allocations: 1.03 MiB)
filter(pred, a::AbstractMatrix) with cartesian indexing
  8.558 ms (9 allocations: 1.03 MiB)

p=0.99, N=1000000
filter!(pred, a::Vector)
  1.276 ms (0 allocations: 0 bytes)
filter!(pred, a::Test.GenericVector)
  1.612 ms (0 allocations: 0 bytes)
filter(pred, a::Vector)
  9.319 ms (20 allocations: 9.00 MiB)
filter(pred, a::Matrix)
  3.206 ms (9 allocations: 8.50 MiB)
filter(pred, a::Test.GenericVector)
  3.485 ms (9 allocations: 8.50 MiB)
filter(pred, a::AbstractMatrix) with linear indexing
  3.856 ms (9 allocations: 8.50 MiB)
filter(pred, a::AbstractMatrix) with cartesian indexing
  7.823 ms (9 allocations: 8.51 MiB)

After:

N=1000000, copy
  871.499 μs (2 allocations: 7.63 MiB)

p=0.5, N=1000000
filter!(pred, a::Vector)
  1.721 ms (1 allocation: 0 bytes)
filter!(pred, a::Test.GenericVector)
  1.674 ms (0 allocations: 0 bytes)
filter(pred, a::Vector)
  1.747 ms (3 allocations: 7.63 MiB)
filter(pred, a::Matrix)
  1.743 ms (3 allocations: 7.63 MiB)
filter(pred, a::Test.GenericVector)
  3.383 ms (6 allocations: 11.43 MiB)
filter(pred, a::AbstractMatrix) with linear indexing
  3.171 ms (6 allocations: 11.43 MiB)
filter(pred, a::AbstractMatrix) with cartesian indexing
  13.479 ms (7 allocations: 4.76 MiB)

p=0.15, N=1000000
filter!(pred, a::Vector)
  1.321 ms (1 allocation: 0 bytes)
filter!(pred, a::Test.GenericVector)
  1.636 ms (0 allocations: 0 bytes)
filter(pred, a::Vector)
  1.348 ms (3 allocations: 7.63 MiB)
filter(pred, a::Matrix)
  1.342 ms (3 allocations: 7.63 MiB)
filter(pred, a::Test.GenericVector)
  2.483 ms (6 allocations: 8.76 MiB)
filter(pred, a::AbstractMatrix) with linear indexing
  2.312 ms (6 allocations: 8.76 MiB)
filter(pred, a::AbstractMatrix) with cartesian indexing
  9.728 ms (7 allocations: 2.09 MiB)

p=0.85, N=1000000
filter!(pred, a::Vector)
  2.065 ms (1 allocation: 0 bytes)
filter!(pred, a::Test.GenericVector)
  1.699 ms (0 allocations: 0 bytes)
filter(pred, a::Vector)
  2.224 ms (3 allocations: 7.63 MiB)
filter(pred, a::Matrix)
  2.223 ms (3 allocations: 7.63 MiB)
filter(pred, a::Test.GenericVector)
  4.222 ms (6 allocations: 14.11 MiB)
filter(pred, a::AbstractMatrix) with linear indexing
  4.028 ms (6 allocations: 14.11 MiB)
filter(pred, a::AbstractMatrix) with cartesian indexing
  9.560 ms (7 allocations: 7.43 MiB)

p=0.05, N=1000000
filter!(pred, a::Vector)
  1.181 ms (1 allocation: 0 bytes)
filter!(pred, a::Test.GenericVector)
  1.618 ms (0 allocations: 0 bytes)
filter(pred, a::Vector)
  1.191 ms (3 allocations: 7.63 MiB)
filter(pred, a::Matrix)
  1.189 ms (3 allocations: 7.63 MiB)
filter(pred, a::Test.GenericVector)
  1.991 ms (6 allocations: 8.01 MiB)
filter(pred, a::AbstractMatrix) with linear indexing
  1.822 ms (6 allocations: 8.01 MiB)
filter(pred, a::AbstractMatrix) with cartesian indexing
  9.343 ms (7 allocations: 1.33 MiB)

p=0.95, N=1000000
filter!(pred, a::Vector)
  1.264 ms (0 allocations: 0 bytes)
filter!(pred, a::Test.GenericVector)
  1.687 ms (0 allocations: 0 bytes)
filter(pred, a::Vector)
  1.506 ms (2 allocations: 7.63 MiB)
filter(pred, a::Matrix)
  1.502 ms (2 allocations: 7.63 MiB)
filter(pred, a::Test.GenericVector)
  5.005 ms (6 allocations: 14.87 MiB)
filter(pred, a::AbstractMatrix) with linear indexing
  4.846 ms (6 allocations: 14.87 MiB)
filter(pred, a::AbstractMatrix) with cartesian indexing
  8.307 ms (7 allocations: 8.20 MiB)

p=0.01, N=1000000
filter!(pred, a::Vector)
  1.121 ms (1 allocation: 0 bytes)
filter!(pred, a::Test.GenericVector)
  1.621 ms (0 allocations: 0 bytes)
filter(pred, a::Vector)
  1.131 ms (3 allocations: 7.63 MiB)
filter(pred, a::Matrix)
  1.125 ms (3 allocations: 7.63 MiB)
filter(pred, a::Test.GenericVector)
  1.677 ms (6 allocations: 7.70 MiB)
filter(pred, a::AbstractMatrix) with linear indexing
  1.504 ms (6 allocations: 7.70 MiB)
filter(pred, a::AbstractMatrix) with cartesian indexing
  8.522 ms (7 allocations: 1.03 MiB)

p=0.99, N=1000000
filter!(pred, a::Vector)
  1.233 ms (0 allocations: 0 bytes)
filter!(pred, a::Test.GenericVector)
  1.685 ms (0 allocations: 0 bytes)
filter(pred, a::Vector)
  1.507 ms (2 allocations: 7.63 MiB)
filter(pred, a::Matrix)
  1.524 ms (2 allocations: 7.63 MiB)
filter(pred, a::Test.GenericVector)
  5.155 ms (6 allocations: 15.18 MiB)
filter(pred, a::AbstractMatrix) with linear indexing
  4.941 ms (6 allocations: 15.18 MiB)
filter(pred, a::AbstractMatrix) with cartesian indexing
  7.816 ms (7 allocations: 8.51 MiB)

Monkey-patch:

julia> @eval Base begin

       filter(f::F, a::Vector) where F = invoke(filter, Tuple{F, Array}, f, a)

       # Out-of-place filter for AbstractArray: for IndexLinear arrays, build a
       # branch-free Vector{Int} of selected indices; logical indexing otherwise.
       function filter(f, a::AbstractArray)
           (IndexStyle(a) != IndexLinear()) && return a[map(f, a)::AbstractArray{Bool}]
           
           j = 1
           idxs = Vector{Int}(undef, length(a))
           for idx in eachindex(a)
               @inbounds idxs[j] = idx
               ai = @inbounds a[idx]
               j = ifelse(f(ai), j+1, j)
           end
           resize!(idxs, j-1)
           res = a[idxs]
           empty!(idxs)        # free the temporary's buffer immediately ...
           sizehint!(idxs, 0)  # ... to relieve GC pressure
           return res
       end

       # Out-of-place filter for Array: write every element branch-free into an
       # over-allocated buffer, then truncate and shrink it to the kept length.
       function filter(f, a::Array{T, N}) where {T, N}
           j = 1
           b = Vector{T}(undef, length(a))
           for ai in a
               @inbounds b[j] = ai
               j = ifelse(f(ai), j+1, j)
           end
           resize!(b, j-1)
           sizehint!(b, length(b))
           b
       end

       # In-place filter for AbstractVector: compact kept elements to the front
       # branch-free, then drop the tail (resize! for Vector, deleteat! otherwise).
       function filter!(f, a::AbstractVector)
           j = firstindex(a)
           for ai in a
               @inbounds a[j] = ai
               j = ifelse(f(ai), nextind(a, j), j)
           end
           j > lastindex(a) && return a
           if a isa Vector
               resize!(a, j-1)
               sizehint!(a, j-1)
           else
               deleteat!(a, j:lastindex(a))
           end
           return a
       end

       end

@ararslan added the "arrays" and "performance" labels May 5, 2019
@ararslan requested a review from JeffBezanson May 5, 2019 21:58
@chethega (Contributor Author) commented May 6, 2019

I am not entirely sure whether we can give filter!(f, a::AbstractVector) the same treatment.

For this, we would do

function filter!(f, a::AbstractVector)
    j = firstindex(a)
    for ai in a
        @inbounds a[j] = ai
        j = ifelse(f(ai), j+1, j)
    end
    deleteat!(a, j:lastindex(a))
    sizehint!(a, length(a))
    a
end

This would work for simple offset vectors. Are there valid abstract vectors where this would fail?

Are we OK with returning a Vector on filter(pred, a::AbstractVector)? If not, what is the right incantation to create an undef variant of a (same offsets, indices, type)?

@chethega (Contributor Author) commented May 7, 2019

I think this could be ready to go: All dispatches now rely on the same interface functions as before (getindex for out-of-place filtering of abstract vectors, deleteat! for inplace filtering of abstract vectors, unchanged code for higher dimensional arrays, known existing interface functions for built-in vectors). I'll squash the commits when ready.

The test failures look unrelated to me. Can we re-run the tests? Can we get a nanosoldier to catch unexpected perf regressions and otherwise delight in smaller numbers?

Then there is the question of time-memory tradeoff: The realloc after filtering is kinda expensive. With the expensive realloc (currently conservatively included), this PR is a strict improvement in both space and time, unless the branches are predicted perfectly (e.g. take every second element).

Any votes for speeding this up even further, at the cost of potentially increased memory consumption? I tend to be against: The overhead of filter is now a small factor of copy, which should be fast enough in case of cheap predicates and is irrelevant in case of expensive predicates.

@KristofferC (Member)

All CI seems to fail with the same tests, are you sure they are unrelated? I would expect they are not but we can easily rerun the tests.

@KristofferC closed this May 7, 2019
@KristofferC reopened this May 7, 2019
@chethega (Contributor Author) commented May 7, 2019

are you sure they are unrelated?

You are correct and I failed at reading comprehension of the test results. Sorry!

base/array.jl Outdated
j = ifelse(f(ai), j+1, j)
end
resize!(idxs, j-1)
return a[idxs]
@JeffBezanson (Member) commented May 7, 2019

I don't think this is generally the best algorithm. In some cases the extra index vector might be faster, but not enough to justify using twice the memory in many cases.

@chethega (Contributor Author) replied:

Well, it is using up to 8x the temporary memory (Int instead of Bool), same as a view.

However, the temp is very temporary. Should we manually free its buffer immediately via resize!(idxs, 0); sizehint!(idxs, 0) in order to limit GC pressure? I am not seeing much precedent for that in Base, but I like the idea.
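
i.e. something along these lines (stand-in data, just for illustration):

idxs = collect(1:10^6)   # stand-in for the temporary index vector
res  = rand(10^6)[idxs]  # the materialized result holds no reference to idxs
resize!(idxs, 0)         # logically empty the temporary ...
sizehint!(idxs, 0)       # ... and release its underlying buffer right away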

Or is the issue that out-of-place filtering of giant vectors of tiny objects can cause oom errors?

b = Vector{T}(undef, length(a))
for ai in a
@inbounds b[j] = ai
j = ifelse(f(ai), j+1, j)
A Member commented:

I would guess LLVM is smart enough to eliminate the branch if you write it like this?

Suggested change
j = ifelse(f(ai), j+1, j)
j = f(ai) ? j+1 : j

Another Member replied:

LLVM should. You can also use the rules of arithmetic to write it as:

Suggested change
j = ifelse(f(ai), j+1, j)
j += f(ai)::Bool

@chethega (Contributor Author) replied:

Regarding the j += f(ai)::Bool variant: That works, but appears to be no faster and sometimes slightly slower. My naive guess would be that f(ai) is typically encoded in a flag that can be used directly for a conditional move, while converting the flag to a 0/1 integer register plus the addition takes longer (even though LLVM should know that both are semantically the same and should select the fastest version regardless of what I wrote). Also, there used to be cases where explicit typeasserts confused the compiler, even if it can correctly infer that f(ai)::Bool. But I may well be hallucinating issues where there are none; should I still change it?

Regarding the j = f(ai) ? j+1 : j spelling: That compiles to the same code as ifelse(f(ai), j+1, j). I just thought that the ifelse was more informative to the reader, since we explicitly want a conditional move instead of a branch. With this explanation, do you still prefer the ?: spelling? Then I'll change it.
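
For reference, the three spellings side by side (wrapped in throwaway helpers, just for comparison):

step_ifelse(f, ai, j)  = ifelse(f(ai), j+1, j)  # explicit select: we want a cmov, not a branch
step_ternary(f, ai, j) = f(ai) ? j+1 : j        # compiles to the same code as the ifelse
step_bool(f, ai, j)    = j + f(ai)::Bool        # branch-free via Bool arithmetic; no faster in my tests

step_ifelse(iseven, 4, 1) == step_ternary(iseven, 4, 1) == step_bool(iseven, 4, 1) == 2  # true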

@dkarrasch (Member)

What's the status of this? How would the examples in #32303 run with these changes?

@chethega closed this Jun 14, 2019
@chethega reopened this Jun 14, 2019
@chethega (Contributor Author)

I think this should be ready, and the test failure looks unrelated. Rerunning tests to see.

@dkarrasch (Member)

Could you quickly run the examples in #32303? Is it true that with this PR the AbstractVector specialization yields faster code than when artificially wrapping the vector in an array?

@chethega (Contributor Author)

Before:

julia> begin
       haystack = repeat(["ab","cd","ef","gh"], 250);
       needles = ["ab","cd","gh"];
       @btime filter($(in(needles)), $haystack)
       @btime getindex($haystack, map($(in(needles)), $haystack))
       haystack = rand(100_000);
       pred = x->x<0.5
       @btime filter($pred, $haystack)
       @btime getindex($haystack, map($pred, $haystack))
       haystack = reshape(rand(100_000), 1,100000)
       @btime filter($pred, $haystack) 
       @btime getindex($haystack, map($pred, $haystack))
       ;
       end;
  26.094 μs (10 allocations: 16.39 KiB)
  20.837 μs (5 allocations: 7.14 KiB)
  1.219 ms (16 allocations: 1.00 MiB)
  684.314 μs (7 allocations: 488.91 KiB)
  691.666 μs (8 allocations: 488.52 KiB)
  690.217 μs (7 allocations: 488.47 KiB)

After:

  19.933 μs (1 allocation: 7.94 KiB)
  20.842 μs (5 allocations: 7.14 KiB)
  148.827 μs (3 allocations: 781.33 KiB)
  695.398 μs (7 allocations: 492.72 KiB)
  159.846 μs (3 allocations: 781.33 KiB)
  682.754 μs (7 allocations: 489.59 KiB)

@chethega (Contributor Author)

Is it true that with this PR the AbstractVector specialization yields faster code than when artificially wrapping the vector in an array?

To answer your question, look at the benchmarks from the top:

p=0.5, N=1000000
filter!(pred, a::Vector)
  6.872 ms (0 allocations: 0 bytes) #before
  1.721 ms (1 allocation: 0 bytes) #after
filter!(pred, a::Test.GenericVector)
  6.962 ms (0 allocations: 0 bytes)
  1.674 ms (0 allocations: 0 bytes)
filter(pred, a::Vector)
  12.613 ms (19 allocations: 5.00 MiB)
  1.747 ms (3 allocations: 7.63 MiB)
filter(pred, a::Matrix)
  7.305 ms (9 allocations: 4.76 MiB)
  1.743 ms (3 allocations: 7.63 MiB)
filter(pred, a::Test.GenericVector)
  7.755 ms (9 allocations: 4.76 MiB)
  3.383 ms (6 allocations: 11.43 MiB)
filter(pred, a::AbstractMatrix) with linear indexing
  8.056 ms (9 allocations: 4.76 MiB)
  3.171 ms (6 allocations: 11.43 MiB)
filter(pred, a::AbstractMatrix) with cartesian indexing
  13.467 ms (9 allocations: 4.76 MiB)
  13.479 ms (7 allocations: 4.76 MiB)

You see that all the Array cases now have the same performance, which is about 4x faster than the fastest old implementation and 8x faster than the old Vector implementation (when half of the entries pass the predicate and the predicate is super cheap).

AbstractArray with linear indexing (especially AbstractVector) uses a slower algorithm, because we cannot assume that its indices are 1:length(v) (see discussion above) and I failed at figuring out a branch-free allocation-free version that uses the eachindex / axes protocol.

@dkarrasch (Member)

This is amazing work! If I read the original benchmarks correctly, then this PR even improves the case when the success rate is very small? That's when the current mapfilter approach would push! very rarely and hence have little overhead.

@chethega (Contributor Author)

Yes. The only case that should currently slow down is when you have a very cheap predicate and filter a very small subset (size k) out of a giant Array (size n) of non-isbits type.

In that case, the new algorithm allocates a new array of length n, fills it and afterwards truncates it to size k. That could be done almost free with appropriate syscalls, but is currently expensive (need to zero 8*n bytes twice, once in the kernel pagefault handler and once in the julia initializer for the array; optimally we would only write O(k) bytes to memory). The current mood appears to be against mucking with OS internals, but I hope that this will get fixed eventually.
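
Concretely, the affected pattern looks something like this (illustrative numbers):

a = [string(i) for i in 1:10^6]             # huge non-isbits array, n = 10^6
few = filter(s -> endswith(s, "99999"), a)  # keeps only a handful of elements, k = 10
# the new code allocates (and zeroes) an n-element buffer up front, then truncates it to k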

@nickrobinson251 (Contributor)

Fixes #32303

@chethega (Contributor Author)

Single test fail looks unrelated ("Error: Download failed: curl: (6) Could not resolve host: frippery.org").

Any change requests? Rebase & squash & merge? Or let it wait some more time?

@ViralBShah (Member)

I think we should merge.

@StefanKarpinski (Member)

frippery.org is a super sketchy sounding domain to be downloading things from.

@nickrobinson251 (Contributor)

Bump? :)

@StefanKarpinski merged commit cd79fe6 into JuliaLang:master Jul 5, 2019
maleadt added commits to JuliaGPU/GPUArrays.jl that referenced this pull request Sep 5, 2019
Comment on lines +2352 to +2353
empty!(idxs)
sizehint!(idxs, 0)
A Member commented:

Does this really improve performance in the end? This breaks with AbstractArrays for which getindex returns a view which uses idxs to store its indices. It's not clear whether the AbstractArray interface allows that or not. See JuliaData/DataFrames.jl#3192.

Another Member replied:

@nalimilan, maybe open an issue so this doesn't get lost?

@nalimilan (Member) replied:

Sure: #47078
