Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Type instability in collect #10961

Merged
merged 1 commit into from
Apr 24, 2015
Merged

Type instability in collect #10961

merged 1 commit into from
Apr 24, 2015

Conversation

swt30
Copy link
Contributor

@swt30 swt30 commented Apr 23, 2015

julia> function collect_range()
           collect(1:2)
           @time collect(1:1_000_000)
       end
collect_range (generic function with 1 method)

julia> function collect_array()
           collect([1,2])
           arr = [i for i=1:1_000_000]
           @time collect(arr)
       end
collect_array (generic function with 1 method)

julia> collect_range();
elapsed time: 0.010505716 seconds (7 MB allocated)

julia> collect_array();
elapsed time: 0.085192319 seconds (68 MB allocated, 3.53% gc time in 3 pauses with 0 full sweep)

There's no point in looping over the array in the second case, since it's already "an array of all items in a collection". So does making collect(v::Vector) just return v seem reasonable? The only thing I can think of is that people might expect collect() to return a copy.

@swt30 swt30 changed the title Make collect(v::Vector) just return v Collecting a vector shouldn't need to loop over the vector Apr 23, 2015
@mbauman
Copy link
Sponsor Member

mbauman commented Apr 23, 2015

Thanks for the contribution! And very good find — collect is indeed performing quite poorly. It looks like the reason is due to a type instability! To prevent a combinatorial explosion of methods, Julia only specializes methods for a specific type if you explicitly parameterize ::Type{T}, which isn't being done in the current definition at array.jl:255. This affects all iterables that have a length method, not just Vector. Would you be willing to make that change instead?

The only thing I can think of is that people might expect collect() to return a copy.

This worries me somewhat - it's a subtle change in semantics. Let's grab the low-hanging fruit first.

@garborg
Copy link
Contributor

garborg commented Apr 23, 2015

I've been wishing we could have both behaviors to choose from. Either behavior alone means running into choices I don't like making -- between writing generic methods and either extra allocation or safety, as fits the situation.

I guess if there was a collect that tried not to copy, it'd be trivial to wrap in your own function that checked ===, but it seems like the wrapper is safer to export for general use.

@StefanKarpinski
Copy link
Sponsor Member

I think that collect should definitely make a copy.

@swt30
Copy link
Contributor Author

swt30 commented Apr 23, 2015

So if I've got this right, the type needs to be parameterised like this?
function collect{T}(::Type{T}, itr)

@swt30
Copy link
Contributor Author

swt30 commented Apr 23, 2015

As a little background, I found this because I was making modifications to deal with the fact that the linspace function now returns a LinSpace, which is a Range. However, I need to pass Arrays to a particular multidimensional interpolation procedure, so I was running collect on each argument beforehand. I suppose an alternative is to do multiple dispatch and only collect the Ranges, but I don't currently know how to do this without writing out a separate method for every possible call signature, where each argument could be either a Range (needing collecting) or a Vector.

@mbauman
Copy link
Sponsor Member

mbauman commented Apr 24, 2015

function collect{T}(::Type{T}, itr)

Yup, exactly! I think this should be about the same as (or slightly faster than) the range method. Yes, for your use-case, I think you could use something like

as_vector{T}(A::AbstractVector{T}) = convert(Vector{T}, A)

(But I'm on a phone and it's untested)

@mbauman
Copy link
Sponsor Member

mbauman commented Apr 24, 2015

Perfect! Timings are much more reasonable now:

julia> collect_range();
elapsed time: 0.00368071 seconds (7 MB allocated)

julia> collect_array();
elapsed time: 0.004951789 seconds (7 MB allocated)

For reference, the fastest that this could absolutely go is bounded by copy (but all these timings are pretty variable on my machine):

julia> arr = collect(1:1_000_000)
       @time copy(arr);
elapsed time: 0.004127298 seconds (7 MB allocated)

Thanks!

mbauman added a commit that referenced this pull request Apr 24, 2015
Collecting a vector shouldn't need to loop over the vector
@mbauman mbauman merged commit 54e9ed1 into JuliaLang:master Apr 24, 2015
@swt30
Copy link
Contributor Author

swt30 commented Apr 24, 2015

Great! Wasn't sure if it would be slower than just copying, but I'm pleased to see that it's not really.

@swt30
Copy link
Contributor Author

swt30 commented Apr 27, 2015

By the way, thanks for the hint about as_vector. I had also tried to convert directly from a LinSpace{Float64} to a Vector{Float64} by just calling convert(Vector, ...) on it. Or equivalently in 0.4, Vector(...). But it doesn't work without explicitly including the type parameter.

@swt30 swt30 changed the title Collecting a vector shouldn't need to loop over the vector Type instability in collect May 1, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants