Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix the performance of reinterpretarray with simultaneous reshaping #37559

Merged
merged 3 commits into from
Sep 29, 2020

Conversation

timholy
Copy link
Member

@timholy timholy commented Sep 13, 2020

This introduces reinterpret(reshape, T, A). This adds a dimension when reinterpreting to a smaller element size (e.g., Tuple{Int,Int}->Int), removes a dimension when reinterpreting to a bigger element size, and keeps the dimensionality constant when the element size is unchanged. Here's a demo:

julia> a = [(1,2), (3,4)]
2-element Vector{Tuple{Int64, Int64}}:
 (1, 2)
 (3, 4)

julia> reinterpret(reshape, Int, a)       # bigger elsize to smaller elsize, adds a dimension
2×2 reinterpret(reshape, Int64, ::Vector{Tuple{Int64, Int64}}):
 1  3
 2  4

julia> b = [1 2; 3 4]
2×2 Matrix{Int64}:
 1  2
 3  4

julia> reinterpret(reshape, Tuple{Int,Int}, b)    # smaller elsize to bigger elsize, removes a dimension
2-element reinterpret(reshape, Tuple{Int64, Int64}, ::Matrix{Int64}):
 (1, 3)
 (2, 4)

julia> c = [1, -1]
2-element Vector{Int64}:
  1
 -1

julia> reinterpret(reshape, UInt, c) # same elsize, no change in dimensionality
2-element reinterpret(reshape, UInt64, ::Vector{Int64}):
 0x0000000000000001
 0xffffffffffffffff

Obviously, the first dimension must have size commensurate with the difference in element size.

This scenario is widely used:

  • it's used when transitioning between a Vector of SVectors and a matrix where each vector is on a column
  • it's used in splitting and combining color channels in image processing

Unfortunately, since Julia 1.0 the performance of ReinterpretArray has been quite poor in these circumstances. For those who want benchmarks, see JuliaImages/ImageCore.jl#142; the short answer is that the improvements are dramatic, and come close to erasing any difference between the performance of a copy and the view itself (in some cases, even providing performance improvements). The trick here is that LLVM knows the size of the first dimension and so can often unroll the inner loop.

This replaces the strategy I was aiming for in #37290 by reshaping the array. I definitely think this is a better solution (thanks as always @mbauman, who has a gift for brief but incredibly useful suggestions). This is split into two commits: the first is pretty straightforward and, I think, relatively risk-free. The second is much scarier: it attempts to retain the performance advantages of linear indexing when the parent array supports it, but introduces a new AbstractCartesianIndex subtype. I think I've prevented packages from getting borked by not supporting this index type (see the new WrappedArray tests), but there's no doubt that this is the part that deserves the closest scrutiny.

Fixes #28980
Closes #37290

@timholy
Copy link
Member Author

timholy commented Sep 13, 2020

CC @johnnychen94, @RalphAS

end
IndexStyle(::IndexSCartesian2{K,N}, ::IndexSCartesian2{K,N}) where {K,N} = IndexSCartesian2{K,N}()

struct SCartesianIndex2{K,N} <: AbstractCartesianIndex{N}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm unsure about the N parameter here. It felt slightly weird to have it be a subtype of generic AbstractCartesianIndex, and this does make it clear how many dimensions the particular SCartesianIndex2 is standing in for. The downside is that this will force a lot of completely useless specialization.

@timholy
Copy link
Member Author

timholy commented Sep 25, 2020

I'll probably merge this Monday. Any thoughts? I'm especially interested in feedback on #37559 (comment).

This addresses longstanding performance problems with `reinterpret`
when `sizeof(eltype(a))` is an integer multiple of `sizeof(T)`.
By reshaping the array to have an extra "channel dimension,"
LLVM can unroll the inner loop thanks to static size information.
Conversely, this consumes the initial "channel dimension" if
`sizeof(T)` is an integer multiple of `sizeof(eltype(a))`.
@timholy timholy force-pushed the teh/reinterpret_reshape branch from dfca7db to 797944c Compare September 29, 2020 12:39
@timholy
Copy link
Member Author

timholy commented Sep 29, 2020

I decided to get rid of the non-functional type parameter. I'll merge this once it gets through CI.

In order to make this a subtype of `AbstractCartesianIndex`, formerly
this added a useless type parameter `N` to indicate the dimensionality
of the array for which this index was constructed. But since none of the
code depends on `N`, and it would have forced useless specialization,
it seems better to not have it.
@timholy timholy force-pushed the teh/reinterpret_reshape branch from 797944c to 7143444 Compare September 29, 2020 14:07
@timholy
Copy link
Member Author

timholy commented Sep 29, 2020

I decided to get rid of the non-functional type parameter. When we merge, though, let's not squash, so we can simply revert the 3rd commit if folks think that was a bad idea.

That's a lot of failures, but as far as I can tell they have nothing to do with this PR since several of my PRs are showing somewhat similar errors. OK to merge? EDIT: looks related to #37731 and with overlaps in other failures I see elsewhere.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Performance of ReinterpretArray, continued
1 participant