Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multiplication of a SharedArray and an Array #19238

Closed
krcools opened this issue Nov 6, 2016 · 0 comments · Fixed by #23750
Closed

Multiplication of a SharedArray and an Array #19238

krcools opened this issue Nov 6, 2016 · 0 comments · Fixed by #23750
Labels
parallelism Parallel or distributed computation

Comments

@krcools
Copy link

krcools commented Nov 6, 2016

An Array{Float64} can be multiplied with a SharedArray{Complex128} but the reverse results in the following exception being thrown:

julia> A = zeros(Float64,2,2)
2×2 Array{Float64,2}:
 0.0  0.0
 0.0  0.0

julia> S = SharedArray{Complex128}(2,2);

julia> fill!(S,0)
2×2 SharedArray{Complex{Float64},2}:
 0.0+0.0im  0.0+0.0im
 0.0+0.0im  0.0+0.0im

julia> S*A
ERROR: MethodError: no method matching reinterpret(::Type{Float64}, ::SharedArray{Complex{Float64},2}, ::Tuple{Int64,Int64})
Closest candidates are:
  reinterpret{T,S,N}(::Type{T}, ::Array{S,N}, ::Tuple{Vararg{Int64,N}}) at array.jl:86
  reinterpret{T}(::Type{T}, ::Base.ReshapedArray{T,N,P<:AbstractArray,MI<:Tuple{Vararg{Base.MultiplicativeInverses.SignedMultiplicativeInverse{Int64},N}}}, ::Tuple{Vararg{Int64,N}}) at reshapedarray.jl:102
  reinterpret{T,Tv,Ti,N}(::Type{T}, ::SparseMatrixCSC{Tv,Ti}, ::Tuple{Vararg{Int64,N}}) at sparse\sparsematrix.jl:167
  ...
 in A_mul_B!(::Array{Complex{Float64},2}, ::SharedArray{Complex{Float64},2}, ::Array{Float64,2}) at .\linalg\matmul.jl:135
 in *(::SharedArray{Complex{Float64},2}, ::Array{Float64,2}) at .\linalg\matmul.jl:129

julia> A*S
2×2 Array{Complex{Float64},2}:
 0.0+0.0im  0.0+0.0im
 0.0+0.0im  0.0+0.0im

If the eltype of the arrays match both products are executed without errors.

This sample was run on:

Julia Version 0.5.0
Commit 3c9d753 (2016-09-19 18:14 UTC)
Platform Info:
  System: NT (x86_64-w64-mingw32)
  CPU: Intel(R) Xeon(R) CPU E5-1620 v3 @ 3.50GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.7.1 (ORCJIT, haswell)

and the nightly developers build:

Julia Version 0.6.0-dev.1201
Commit 05ed40b (2016-11-06 03:50 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Xeon(R) CPU E5-1620 v3 @ 3.50GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.7.1 (ORCJIT, haswell)
@amitmurthy amitmurthy added the parallelism Parallel or distributed computation label Nov 8, 2016
Keno added a commit that referenced this issue Sep 20, 2017
This redoes `reinterpret` in julia rather than punning the memory
of the actual array. The motivation for this is to avoid the API
limitations of the current reinterpret implementation (Array only,
preventing strong TBAA, alignment problems). The surface API
essentially unchanged, though the shape argument to reinterpret
is removed, since those concepts are now orthogonal. The return
type from `reinterpret` is now `ReinterpretArray`, which implements
the AbstractArray interface and does the reinterpreting lazily on
demand. The compiler is able to fold away the abstraction and
generate very tight IR:

```
julia> ar = reinterpret(Complex{Int64}, rand(Int64, 1000));

julia> typeof(ar)
Base.ReinterpretArray{Complex{Int64},Int64,1,Array{Int64,1}}

julia> f(ar) = @inbounds return ar[1]
f (generic function with 1 method)

julia> @code_llvm f(ar)

; Function f
; Location: REPL[2]
define void @julia_f_63575({ i64, i64 } addrspace(11)* noalias nocapture sret, %jl_value_t addrspace(10)* dereferenceable(8)) #0 {
top:
; Location: REPL[2]:1
; Function getindex; {
; Location: reinterpretarray.jl:31
  %2 = addrspacecast %jl_value_t addrspace(10)* %1 to %jl_value_t addrspace(11)*
  %3 = bitcast %jl_value_t addrspace(11)* %2 to %jl_value_t addrspace(10)* addrspace(11)*
  %4 = load %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)* addrspace(11)* %3, align 8
  %5 = addrspacecast %jl_value_t addrspace(10)* %4 to %jl_value_t addrspace(11)*
  %6 = bitcast %jl_value_t addrspace(11)* %5 to i64* addrspace(11)*
  %7 = load i64*, i64* addrspace(11)* %6, align 8
  %8 = load i64, i64* %7, align 8
  %9 = getelementptr i64, i64* %7, i64 1
  %10 = load i64, i64* %9, align 8
  %.sroa.0.0..sroa_idx = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 0
  store i64 %8, i64 addrspace(11)* %.sroa.0.0..sroa_idx, align 8
  %.sroa.3.0..sroa_idx13 = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 1
  store i64 %10, i64 addrspace(11)* %.sroa.3.0..sroa_idx13, align 8
;}
  ret void
}

julia> g(a) = @inbounds return reinterpret(Complex{Int64}, a)[1]
g (generic function with 1 method)

julia> @code_llvm g(randn(1000))

; Function g
; Location: REPL[4]
define void @julia_g_63642({ i64, i64 } addrspace(11)* noalias nocapture sret, %jl_value_t addrspace(10)* dereferenceable(40)) #0 {
top:
; Location: REPL[4]:1
; Function getindex; {
; Location: reinterpretarray.jl:31
  %2 = addrspacecast %jl_value_t addrspace(10)* %1 to %jl_value_t addrspace(11)*
  %3 = bitcast %jl_value_t addrspace(11)* %2 to double* addrspace(11)*
  %4 = load double*, double* addrspace(11)* %3, align 8
  %5 = bitcast double* %4 to i64*
  %6 = load i64, i64* %5, align 8
  %7 = getelementptr double, double* %4, i64 1
  %8 = bitcast double* %7 to i64*
  %9 = load i64, i64* %8, align 8
  %.sroa.0.0..sroa_idx = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 0
  store i64 %6, i64 addrspace(11)* %.sroa.0.0..sroa_idx, align 8
  %.sroa.3.0..sroa_idx13 = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 1
  store i64 %9, i64 addrspace(11)* %.sroa.3.0..sroa_idx13, align 8
;}
  ret void
}
```

In addition, the new `reinterpret` implementation is able to handle any AbstractArray
(whether useful or not is a separate decision):

```
invoke(reinterpret, Tuple{Type{Complex{Float64}}, AbstractArray}, Complex{Float64}, speye(10))
5×10 Base.ReinterpretArray{Complex{Float64},Float64,2,SparseMatrixCSC{Float64,Int64}}:
 1.0+0.0im  0.0+1.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im
 0.0+0.0im  0.0+0.0im  1.0+0.0im  0.0+1.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im
 0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  1.0+0.0im  0.0+1.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im
 0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  1.0+0.0im  0.0+1.0im  0.0+0.0im  0.0+0.0im
 0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  1.0+0.0im  0.0+1.0im
```

The remaining todo is to audit the uses of reinterpret in base. I've fixed up the uses themselves, but there's
code deeper in the array code that needs to be broadened to allow ReinterpretArray.

Fixes #22849
Fixes #19238
Keno added a commit that referenced this issue Sep 20, 2017
This redoes `reinterpret` in julia rather than punning the memory
of the actual array. The motivation for this is to avoid the API
limitations of the current reinterpret implementation (Array only,
preventing strong TBAA, alignment problems). The surface API
essentially unchanged, though the shape argument to reinterpret
is removed, since those concepts are now orthogonal. The return
type from `reinterpret` is now `ReinterpretArray`, which implements
the AbstractArray interface and does the reinterpreting lazily on
demand. The compiler is able to fold away the abstraction and
generate very tight IR:

```
julia> ar = reinterpret(Complex{Int64}, rand(Int64, 1000));

julia> typeof(ar)
Base.ReinterpretArray{Complex{Int64},Int64,1,Array{Int64,1}}

julia> f(ar) = @inbounds return ar[1]
f (generic function with 1 method)

julia> @code_llvm f(ar)

; Function f
; Location: REPL[2]
define void @julia_f_63575({ i64, i64 } addrspace(11)* noalias nocapture sret, %jl_value_t addrspace(10)* dereferenceable(8)) #0 {
top:
; Location: REPL[2]:1
; Function getindex; {
; Location: reinterpretarray.jl:31
  %2 = addrspacecast %jl_value_t addrspace(10)* %1 to %jl_value_t addrspace(11)*
  %3 = bitcast %jl_value_t addrspace(11)* %2 to %jl_value_t addrspace(10)* addrspace(11)*
  %4 = load %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)* addrspace(11)* %3, align 8
  %5 = addrspacecast %jl_value_t addrspace(10)* %4 to %jl_value_t addrspace(11)*
  %6 = bitcast %jl_value_t addrspace(11)* %5 to i64* addrspace(11)*
  %7 = load i64*, i64* addrspace(11)* %6, align 8
  %8 = load i64, i64* %7, align 8
  %9 = getelementptr i64, i64* %7, i64 1
  %10 = load i64, i64* %9, align 8
  %.sroa.0.0..sroa_idx = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 0
  store i64 %8, i64 addrspace(11)* %.sroa.0.0..sroa_idx, align 8
  %.sroa.3.0..sroa_idx13 = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 1
  store i64 %10, i64 addrspace(11)* %.sroa.3.0..sroa_idx13, align 8
;}
  ret void
}

julia> g(a) = @inbounds return reinterpret(Complex{Int64}, a)[1]
g (generic function with 1 method)

julia> @code_llvm g(randn(1000))

; Function g
; Location: REPL[4]
define void @julia_g_63642({ i64, i64 } addrspace(11)* noalias nocapture sret, %jl_value_t addrspace(10)* dereferenceable(40)) #0 {
top:
; Location: REPL[4]:1
; Function getindex; {
; Location: reinterpretarray.jl:31
  %2 = addrspacecast %jl_value_t addrspace(10)* %1 to %jl_value_t addrspace(11)*
  %3 = bitcast %jl_value_t addrspace(11)* %2 to double* addrspace(11)*
  %4 = load double*, double* addrspace(11)* %3, align 8
  %5 = bitcast double* %4 to i64*
  %6 = load i64, i64* %5, align 8
  %7 = getelementptr double, double* %4, i64 1
  %8 = bitcast double* %7 to i64*
  %9 = load i64, i64* %8, align 8
  %.sroa.0.0..sroa_idx = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 0
  store i64 %6, i64 addrspace(11)* %.sroa.0.0..sroa_idx, align 8
  %.sroa.3.0..sroa_idx13 = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 1
  store i64 %9, i64 addrspace(11)* %.sroa.3.0..sroa_idx13, align 8
;}
  ret void
}
```

In addition, the new `reinterpret` implementation is able to handle any AbstractArray
(whether useful or not is a separate decision):

```
invoke(reinterpret, Tuple{Type{Complex{Float64}}, AbstractArray}, Complex{Float64}, speye(10))
5×10 Base.ReinterpretArray{Complex{Float64},Float64,2,SparseMatrixCSC{Float64,Int64}}:
 1.0+0.0im  0.0+1.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im
 0.0+0.0im  0.0+0.0im  1.0+0.0im  0.0+1.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im
 0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  1.0+0.0im  0.0+1.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im
 0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  1.0+0.0im  0.0+1.0im  0.0+0.0im  0.0+0.0im
 0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  1.0+0.0im  0.0+1.0im
```

The remaining todo is to audit the uses of reinterpret in base. I've fixed up the uses themselves, but there's
code deeper in the array code that needs to be broadened to allow ReinterpretArray.

Fixes #22849
Fixes #19238
Keno added a commit that referenced this issue Sep 22, 2017
This redoes `reinterpret` in julia rather than punning the memory
of the actual array. The motivation for this is to avoid the API
limitations of the current reinterpret implementation (Array only,
preventing strong TBAA, alignment problems). The surface API
essentially unchanged, though the shape argument to reinterpret
is removed, since those concepts are now orthogonal. The return
type from `reinterpret` is now `ReinterpretArray`, which implements
the AbstractArray interface and does the reinterpreting lazily on
demand. The compiler is able to fold away the abstraction and
generate very tight IR:

```
julia> ar = reinterpret(Complex{Int64}, rand(Int64, 1000));

julia> typeof(ar)
Base.ReinterpretArray{Complex{Int64},Int64,1,Array{Int64,1}}

julia> f(ar) = @inbounds return ar[1]
f (generic function with 1 method)

julia> @code_llvm f(ar)

; Function f
; Location: REPL[2]
define void @julia_f_63575({ i64, i64 } addrspace(11)* noalias nocapture sret, %jl_value_t addrspace(10)* dereferenceable(8)) #0 {
top:
; Location: REPL[2]:1
; Function getindex; {
; Location: reinterpretarray.jl:31
  %2 = addrspacecast %jl_value_t addrspace(10)* %1 to %jl_value_t addrspace(11)*
  %3 = bitcast %jl_value_t addrspace(11)* %2 to %jl_value_t addrspace(10)* addrspace(11)*
  %4 = load %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)* addrspace(11)* %3, align 8
  %5 = addrspacecast %jl_value_t addrspace(10)* %4 to %jl_value_t addrspace(11)*
  %6 = bitcast %jl_value_t addrspace(11)* %5 to i64* addrspace(11)*
  %7 = load i64*, i64* addrspace(11)* %6, align 8
  %8 = load i64, i64* %7, align 8
  %9 = getelementptr i64, i64* %7, i64 1
  %10 = load i64, i64* %9, align 8
  %.sroa.0.0..sroa_idx = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 0
  store i64 %8, i64 addrspace(11)* %.sroa.0.0..sroa_idx, align 8
  %.sroa.3.0..sroa_idx13 = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 1
  store i64 %10, i64 addrspace(11)* %.sroa.3.0..sroa_idx13, align 8
;}
  ret void
}

julia> g(a) = @inbounds return reinterpret(Complex{Int64}, a)[1]
g (generic function with 1 method)

julia> @code_llvm g(randn(1000))

; Function g
; Location: REPL[4]
define void @julia_g_63642({ i64, i64 } addrspace(11)* noalias nocapture sret, %jl_value_t addrspace(10)* dereferenceable(40)) #0 {
top:
; Location: REPL[4]:1
; Function getindex; {
; Location: reinterpretarray.jl:31
  %2 = addrspacecast %jl_value_t addrspace(10)* %1 to %jl_value_t addrspace(11)*
  %3 = bitcast %jl_value_t addrspace(11)* %2 to double* addrspace(11)*
  %4 = load double*, double* addrspace(11)* %3, align 8
  %5 = bitcast double* %4 to i64*
  %6 = load i64, i64* %5, align 8
  %7 = getelementptr double, double* %4, i64 1
  %8 = bitcast double* %7 to i64*
  %9 = load i64, i64* %8, align 8
  %.sroa.0.0..sroa_idx = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 0
  store i64 %6, i64 addrspace(11)* %.sroa.0.0..sroa_idx, align 8
  %.sroa.3.0..sroa_idx13 = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 1
  store i64 %9, i64 addrspace(11)* %.sroa.3.0..sroa_idx13, align 8
;}
  ret void
}
```

In addition, the new `reinterpret` implementation is able to handle any AbstractArray
(whether useful or not is a separate decision):

```
invoke(reinterpret, Tuple{Type{Complex{Float64}}, AbstractArray}, Complex{Float64}, speye(10))
5×10 Base.ReinterpretArray{Complex{Float64},Float64,2,SparseMatrixCSC{Float64,Int64}}:
 1.0+0.0im  0.0+1.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im
 0.0+0.0im  0.0+0.0im  1.0+0.0im  0.0+1.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im
 0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  1.0+0.0im  0.0+1.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im
 0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  1.0+0.0im  0.0+1.0im  0.0+0.0im  0.0+0.0im
 0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  1.0+0.0im  0.0+1.0im
```

The remaining todo is to audit the uses of reinterpret in base. I've fixed up the uses themselves, but there's
code deeper in the array code that needs to be broadened to allow ReinterpretArray.

Fixes #22849
Fixes #19238
Keno added a commit that referenced this issue Sep 27, 2017
This redoes `reinterpret` in julia rather than punning the memory
of the actual array. The motivation for this is to avoid the API
limitations of the current reinterpret implementation (Array only,
preventing strong TBAA, alignment problems). The surface API
essentially unchanged, though the shape argument to reinterpret
is removed, since those concepts are now orthogonal. The return
type from `reinterpret` is now `ReinterpretArray`, which implements
the AbstractArray interface and does the reinterpreting lazily on
demand. The compiler is able to fold away the abstraction and
generate very tight IR:

```
julia> ar = reinterpret(Complex{Int64}, rand(Int64, 1000));

julia> typeof(ar)
Base.ReinterpretArray{Complex{Int64},Int64,1,Array{Int64,1}}

julia> f(ar) = @inbounds return ar[1]
f (generic function with 1 method)

julia> @code_llvm f(ar)

; Function f
; Location: REPL[2]
define void @julia_f_63575({ i64, i64 } addrspace(11)* noalias nocapture sret, %jl_value_t addrspace(10)* dereferenceable(8)) #0 {
top:
; Location: REPL[2]:1
; Function getindex; {
; Location: reinterpretarray.jl:31
  %2 = addrspacecast %jl_value_t addrspace(10)* %1 to %jl_value_t addrspace(11)*
  %3 = bitcast %jl_value_t addrspace(11)* %2 to %jl_value_t addrspace(10)* addrspace(11)*
  %4 = load %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)* addrspace(11)* %3, align 8
  %5 = addrspacecast %jl_value_t addrspace(10)* %4 to %jl_value_t addrspace(11)*
  %6 = bitcast %jl_value_t addrspace(11)* %5 to i64* addrspace(11)*
  %7 = load i64*, i64* addrspace(11)* %6, align 8
  %8 = load i64, i64* %7, align 8
  %9 = getelementptr i64, i64* %7, i64 1
  %10 = load i64, i64* %9, align 8
  %.sroa.0.0..sroa_idx = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 0
  store i64 %8, i64 addrspace(11)* %.sroa.0.0..sroa_idx, align 8
  %.sroa.3.0..sroa_idx13 = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 1
  store i64 %10, i64 addrspace(11)* %.sroa.3.0..sroa_idx13, align 8
;}
  ret void
}

julia> g(a) = @inbounds return reinterpret(Complex{Int64}, a)[1]
g (generic function with 1 method)

julia> @code_llvm g(randn(1000))

; Function g
; Location: REPL[4]
define void @julia_g_63642({ i64, i64 } addrspace(11)* noalias nocapture sret, %jl_value_t addrspace(10)* dereferenceable(40)) #0 {
top:
; Location: REPL[4]:1
; Function getindex; {
; Location: reinterpretarray.jl:31
  %2 = addrspacecast %jl_value_t addrspace(10)* %1 to %jl_value_t addrspace(11)*
  %3 = bitcast %jl_value_t addrspace(11)* %2 to double* addrspace(11)*
  %4 = load double*, double* addrspace(11)* %3, align 8
  %5 = bitcast double* %4 to i64*
  %6 = load i64, i64* %5, align 8
  %7 = getelementptr double, double* %4, i64 1
  %8 = bitcast double* %7 to i64*
  %9 = load i64, i64* %8, align 8
  %.sroa.0.0..sroa_idx = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 0
  store i64 %6, i64 addrspace(11)* %.sroa.0.0..sroa_idx, align 8
  %.sroa.3.0..sroa_idx13 = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 1
  store i64 %9, i64 addrspace(11)* %.sroa.3.0..sroa_idx13, align 8
;}
  ret void
}
```

In addition, the new `reinterpret` implementation is able to handle any AbstractArray
(whether useful or not is a separate decision):

```
invoke(reinterpret, Tuple{Type{Complex{Float64}}, AbstractArray}, Complex{Float64}, speye(10))
5×10 Base.ReinterpretArray{Complex{Float64},Float64,2,SparseMatrixCSC{Float64,Int64}}:
 1.0+0.0im  0.0+1.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im
 0.0+0.0im  0.0+0.0im  1.0+0.0im  0.0+1.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im
 0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  1.0+0.0im  0.0+1.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im
 0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  1.0+0.0im  0.0+1.0im  0.0+0.0im  0.0+0.0im
 0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  1.0+0.0im  0.0+1.0im
```

The remaining todo is to audit the uses of reinterpret in base. I've fixed up the uses themselves, but there's
code deeper in the array code that needs to be broadened to allow ReinterpretArray.

Fixes #22849
Fixes #19238
Keno added a commit that referenced this issue Sep 27, 2017
This redoes `reinterpret` in julia rather than punning the memory
of the actual array. The motivation for this is to avoid the API
limitations of the current reinterpret implementation (Array only,
preventing strong TBAA, alignment problems). The surface API
essentially unchanged, though the shape argument to reinterpret
is removed, since those concepts are now orthogonal. The return
type from `reinterpret` is now `ReinterpretArray`, which implements
the AbstractArray interface and does the reinterpreting lazily on
demand. The compiler is able to fold away the abstraction and
generate very tight IR:

```
julia> ar = reinterpret(Complex{Int64}, rand(Int64, 1000));

julia> typeof(ar)
Base.ReinterpretArray{Complex{Int64},Int64,1,Array{Int64,1}}

julia> f(ar) = @inbounds return ar[1]
f (generic function with 1 method)

julia> @code_llvm f(ar)

; Function f
; Location: REPL[2]
define void @julia_f_63575({ i64, i64 } addrspace(11)* noalias nocapture sret, %jl_value_t addrspace(10)* dereferenceable(8)) #0 {
top:
; Location: REPL[2]:1
; Function getindex; {
; Location: reinterpretarray.jl:31
  %2 = addrspacecast %jl_value_t addrspace(10)* %1 to %jl_value_t addrspace(11)*
  %3 = bitcast %jl_value_t addrspace(11)* %2 to %jl_value_t addrspace(10)* addrspace(11)*
  %4 = load %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)* addrspace(11)* %3, align 8
  %5 = addrspacecast %jl_value_t addrspace(10)* %4 to %jl_value_t addrspace(11)*
  %6 = bitcast %jl_value_t addrspace(11)* %5 to i64* addrspace(11)*
  %7 = load i64*, i64* addrspace(11)* %6, align 8
  %8 = load i64, i64* %7, align 8
  %9 = getelementptr i64, i64* %7, i64 1
  %10 = load i64, i64* %9, align 8
  %.sroa.0.0..sroa_idx = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 0
  store i64 %8, i64 addrspace(11)* %.sroa.0.0..sroa_idx, align 8
  %.sroa.3.0..sroa_idx13 = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 1
  store i64 %10, i64 addrspace(11)* %.sroa.3.0..sroa_idx13, align 8
;}
  ret void
}

julia> g(a) = @inbounds return reinterpret(Complex{Int64}, a)[1]
g (generic function with 1 method)

julia> @code_llvm g(randn(1000))

; Function g
; Location: REPL[4]
define void @julia_g_63642({ i64, i64 } addrspace(11)* noalias nocapture sret, %jl_value_t addrspace(10)* dereferenceable(40)) #0 {
top:
; Location: REPL[4]:1
; Function getindex; {
; Location: reinterpretarray.jl:31
  %2 = addrspacecast %jl_value_t addrspace(10)* %1 to %jl_value_t addrspace(11)*
  %3 = bitcast %jl_value_t addrspace(11)* %2 to double* addrspace(11)*
  %4 = load double*, double* addrspace(11)* %3, align 8
  %5 = bitcast double* %4 to i64*
  %6 = load i64, i64* %5, align 8
  %7 = getelementptr double, double* %4, i64 1
  %8 = bitcast double* %7 to i64*
  %9 = load i64, i64* %8, align 8
  %.sroa.0.0..sroa_idx = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 0
  store i64 %6, i64 addrspace(11)* %.sroa.0.0..sroa_idx, align 8
  %.sroa.3.0..sroa_idx13 = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 1
  store i64 %9, i64 addrspace(11)* %.sroa.3.0..sroa_idx13, align 8
;}
  ret void
}
```

In addition, the new `reinterpret` implementation is able to handle any AbstractArray
(whether useful or not is a separate decision):

```
invoke(reinterpret, Tuple{Type{Complex{Float64}}, AbstractArray}, Complex{Float64}, speye(10))
5×10 Base.ReinterpretArray{Complex{Float64},Float64,2,SparseMatrixCSC{Float64,Int64}}:
 1.0+0.0im  0.0+1.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im
 0.0+0.0im  0.0+0.0im  1.0+0.0im  0.0+1.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im
 0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  1.0+0.0im  0.0+1.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im
 0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  1.0+0.0im  0.0+1.0im  0.0+0.0im  0.0+0.0im
 0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  1.0+0.0im  0.0+1.0im
```

The remaining todo is to audit the uses of reinterpret in base. I've fixed up the uses themselves, but there's
code deeper in the array code that needs to be broadened to allow ReinterpretArray.

Fixes #22849
Fixes #19238
Keno added a commit that referenced this issue Sep 27, 2017
This redoes `reinterpret` in julia rather than punning the memory
of the actual array. The motivation for this is to avoid the API
limitations of the current reinterpret implementation (Array only,
preventing strong TBAA, alignment problems). The surface API
essentially unchanged, though the shape argument to reinterpret
is removed, since those concepts are now orthogonal. The return
type from `reinterpret` is now `ReinterpretArray`, which implements
the AbstractArray interface and does the reinterpreting lazily on
demand. The compiler is able to fold away the abstraction and
generate very tight IR:

```
julia> ar = reinterpret(Complex{Int64}, rand(Int64, 1000));

julia> typeof(ar)
Base.ReinterpretArray{Complex{Int64},Int64,1,Array{Int64,1}}

julia> f(ar) = @inbounds return ar[1]
f (generic function with 1 method)

julia> @code_llvm f(ar)

; Function f
; Location: REPL[2]
define void @julia_f_63575({ i64, i64 } addrspace(11)* noalias nocapture sret, %jl_value_t addrspace(10)* dereferenceable(8)) #0 {
top:
; Location: REPL[2]:1
; Function getindex; {
; Location: reinterpretarray.jl:31
  %2 = addrspacecast %jl_value_t addrspace(10)* %1 to %jl_value_t addrspace(11)*
  %3 = bitcast %jl_value_t addrspace(11)* %2 to %jl_value_t addrspace(10)* addrspace(11)*
  %4 = load %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)* addrspace(11)* %3, align 8
  %5 = addrspacecast %jl_value_t addrspace(10)* %4 to %jl_value_t addrspace(11)*
  %6 = bitcast %jl_value_t addrspace(11)* %5 to i64* addrspace(11)*
  %7 = load i64*, i64* addrspace(11)* %6, align 8
  %8 = load i64, i64* %7, align 8
  %9 = getelementptr i64, i64* %7, i64 1
  %10 = load i64, i64* %9, align 8
  %.sroa.0.0..sroa_idx = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 0
  store i64 %8, i64 addrspace(11)* %.sroa.0.0..sroa_idx, align 8
  %.sroa.3.0..sroa_idx13 = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 1
  store i64 %10, i64 addrspace(11)* %.sroa.3.0..sroa_idx13, align 8
;}
  ret void
}

julia> g(a) = @inbounds return reinterpret(Complex{Int64}, a)[1]
g (generic function with 1 method)

julia> @code_llvm g(randn(1000))

; Function g
; Location: REPL[4]
define void @julia_g_63642({ i64, i64 } addrspace(11)* noalias nocapture sret, %jl_value_t addrspace(10)* dereferenceable(40)) #0 {
top:
; Location: REPL[4]:1
; Function getindex; {
; Location: reinterpretarray.jl:31
  %2 = addrspacecast %jl_value_t addrspace(10)* %1 to %jl_value_t addrspace(11)*
  %3 = bitcast %jl_value_t addrspace(11)* %2 to double* addrspace(11)*
  %4 = load double*, double* addrspace(11)* %3, align 8
  %5 = bitcast double* %4 to i64*
  %6 = load i64, i64* %5, align 8
  %7 = getelementptr double, double* %4, i64 1
  %8 = bitcast double* %7 to i64*
  %9 = load i64, i64* %8, align 8
  %.sroa.0.0..sroa_idx = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 0
  store i64 %6, i64 addrspace(11)* %.sroa.0.0..sroa_idx, align 8
  %.sroa.3.0..sroa_idx13 = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 1
  store i64 %9, i64 addrspace(11)* %.sroa.3.0..sroa_idx13, align 8
;}
  ret void
}
```

In addition, the new `reinterpret` implementation is able to handle any AbstractArray
(whether useful or not is a separate decision):

```
invoke(reinterpret, Tuple{Type{Complex{Float64}}, AbstractArray}, Complex{Float64}, speye(10))
5×10 Base.ReinterpretArray{Complex{Float64},Float64,2,SparseMatrixCSC{Float64,Int64}}:
 1.0+0.0im  0.0+1.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im
 0.0+0.0im  0.0+0.0im  1.0+0.0im  0.0+1.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im
 0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  1.0+0.0im  0.0+1.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im
 0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  1.0+0.0im  0.0+1.0im  0.0+0.0im  0.0+0.0im
 0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  1.0+0.0im  0.0+1.0im
```

The remaining todo is to audit the uses of reinterpret in base. I've fixed up the uses themselves, but there's
code deeper in the array code that needs to be broadened to allow ReinterpretArray.

Fixes #22849
Fixes #19238
Keno added a commit that referenced this issue Sep 27, 2017
This redoes `reinterpret` in julia rather than punning the memory
of the actual array. The motivation for this is to avoid the API
limitations of the current reinterpret implementation (Array only,
preventing strong TBAA, alignment problems). The surface API
essentially unchanged, though the shape argument to reinterpret
is removed, since those concepts are now orthogonal. The return
type from `reinterpret` is now `ReinterpretArray`, which implements
the AbstractArray interface and does the reinterpreting lazily on
demand. The compiler is able to fold away the abstraction and
generate very tight IR:

```
julia> ar = reinterpret(Complex{Int64}, rand(Int64, 1000));

julia> typeof(ar)
Base.ReinterpretArray{Complex{Int64},Int64,1,Array{Int64,1}}

julia> f(ar) = @inbounds return ar[1]
f (generic function with 1 method)

julia> @code_llvm f(ar)

; Function f
; Location: REPL[2]
define void @julia_f_63575({ i64, i64 } addrspace(11)* noalias nocapture sret, %jl_value_t addrspace(10)* dereferenceable(8)) #0 {
top:
; Location: REPL[2]:1
; Function getindex; {
; Location: reinterpretarray.jl:31
  %2 = addrspacecast %jl_value_t addrspace(10)* %1 to %jl_value_t addrspace(11)*
  %3 = bitcast %jl_value_t addrspace(11)* %2 to %jl_value_t addrspace(10)* addrspace(11)*
  %4 = load %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)* addrspace(11)* %3, align 8
  %5 = addrspacecast %jl_value_t addrspace(10)* %4 to %jl_value_t addrspace(11)*
  %6 = bitcast %jl_value_t addrspace(11)* %5 to i64* addrspace(11)*
  %7 = load i64*, i64* addrspace(11)* %6, align 8
  %8 = load i64, i64* %7, align 8
  %9 = getelementptr i64, i64* %7, i64 1
  %10 = load i64, i64* %9, align 8
  %.sroa.0.0..sroa_idx = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 0
  store i64 %8, i64 addrspace(11)* %.sroa.0.0..sroa_idx, align 8
  %.sroa.3.0..sroa_idx13 = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 1
  store i64 %10, i64 addrspace(11)* %.sroa.3.0..sroa_idx13, align 8
;}
  ret void
}

julia> g(a) = @inbounds return reinterpret(Complex{Int64}, a)[1]
g (generic function with 1 method)

julia> @code_llvm g(randn(1000))

; Function g
; Location: REPL[4]
define void @julia_g_63642({ i64, i64 } addrspace(11)* noalias nocapture sret, %jl_value_t addrspace(10)* dereferenceable(40)) #0 {
top:
; Location: REPL[4]:1
; Function getindex; {
; Location: reinterpretarray.jl:31
  %2 = addrspacecast %jl_value_t addrspace(10)* %1 to %jl_value_t addrspace(11)*
  %3 = bitcast %jl_value_t addrspace(11)* %2 to double* addrspace(11)*
  %4 = load double*, double* addrspace(11)* %3, align 8
  %5 = bitcast double* %4 to i64*
  %6 = load i64, i64* %5, align 8
  %7 = getelementptr double, double* %4, i64 1
  %8 = bitcast double* %7 to i64*
  %9 = load i64, i64* %8, align 8
  %.sroa.0.0..sroa_idx = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 0
  store i64 %6, i64 addrspace(11)* %.sroa.0.0..sroa_idx, align 8
  %.sroa.3.0..sroa_idx13 = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 1
  store i64 %9, i64 addrspace(11)* %.sroa.3.0..sroa_idx13, align 8
;}
  ret void
}
```

In addition, the new `reinterpret` implementation is able to handle any AbstractArray
(whether useful or not is a separate decision):

```
invoke(reinterpret, Tuple{Type{Complex{Float64}}, AbstractArray}, Complex{Float64}, speye(10))
5×10 Base.ReinterpretArray{Complex{Float64},Float64,2,SparseMatrixCSC{Float64,Int64}}:
 1.0+0.0im  0.0+1.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im
 0.0+0.0im  0.0+0.0im  1.0+0.0im  0.0+1.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im
 0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  1.0+0.0im  0.0+1.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im
 0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  1.0+0.0im  0.0+1.0im  0.0+0.0im  0.0+0.0im
 0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  1.0+0.0im  0.0+1.0im
```

The remaining todo is to audit the uses of reinterpret in base. I've fixed up the uses themselves, but there's
code deeper in the array code that needs to be broadened to allow ReinterpretArray.

Fixes #22849
Fixes #19238
Keno added a commit that referenced this issue Sep 27, 2017
This redoes `reinterpret` in julia rather than punning the memory
of the actual array. The motivation for this is to avoid the API
limitations of the current reinterpret implementation (Array only,
preventing strong TBAA, alignment problems). The surface API
essentially unchanged, though the shape argument to reinterpret
is removed, since those concepts are now orthogonal. The return
type from `reinterpret` is now `ReinterpretArray`, which implements
the AbstractArray interface and does the reinterpreting lazily on
demand. The compiler is able to fold away the abstraction and
generate very tight IR:

```
julia> ar = reinterpret(Complex{Int64}, rand(Int64, 1000));

julia> typeof(ar)
Base.ReinterpretArray{Complex{Int64},Int64,1,Array{Int64,1}}

julia> f(ar) = @inbounds return ar[1]
f (generic function with 1 method)

julia> @code_llvm f(ar)

; Function f
; Location: REPL[2]
define void @julia_f_63575({ i64, i64 } addrspace(11)* noalias nocapture sret, %jl_value_t addrspace(10)* dereferenceable(8)) #0 {
top:
; Location: REPL[2]:1
; Function getindex; {
; Location: reinterpretarray.jl:31
  %2 = addrspacecast %jl_value_t addrspace(10)* %1 to %jl_value_t addrspace(11)*
  %3 = bitcast %jl_value_t addrspace(11)* %2 to %jl_value_t addrspace(10)* addrspace(11)*
  %4 = load %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)* addrspace(11)* %3, align 8
  %5 = addrspacecast %jl_value_t addrspace(10)* %4 to %jl_value_t addrspace(11)*
  %6 = bitcast %jl_value_t addrspace(11)* %5 to i64* addrspace(11)*
  %7 = load i64*, i64* addrspace(11)* %6, align 8
  %8 = load i64, i64* %7, align 8
  %9 = getelementptr i64, i64* %7, i64 1
  %10 = load i64, i64* %9, align 8
  %.sroa.0.0..sroa_idx = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 0
  store i64 %8, i64 addrspace(11)* %.sroa.0.0..sroa_idx, align 8
  %.sroa.3.0..sroa_idx13 = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 1
  store i64 %10, i64 addrspace(11)* %.sroa.3.0..sroa_idx13, align 8
;}
  ret void
}

julia> g(a) = @inbounds return reinterpret(Complex{Int64}, a)[1]
g (generic function with 1 method)

julia> @code_llvm g(randn(1000))

; Function g
; Location: REPL[4]
define void @julia_g_63642({ i64, i64 } addrspace(11)* noalias nocapture sret, %jl_value_t addrspace(10)* dereferenceable(40)) #0 {
top:
; Location: REPL[4]:1
; Function getindex; {
; Location: reinterpretarray.jl:31
  %2 = addrspacecast %jl_value_t addrspace(10)* %1 to %jl_value_t addrspace(11)*
  %3 = bitcast %jl_value_t addrspace(11)* %2 to double* addrspace(11)*
  %4 = load double*, double* addrspace(11)* %3, align 8
  %5 = bitcast double* %4 to i64*
  %6 = load i64, i64* %5, align 8
  %7 = getelementptr double, double* %4, i64 1
  %8 = bitcast double* %7 to i64*
  %9 = load i64, i64* %8, align 8
  %.sroa.0.0..sroa_idx = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 0
  store i64 %6, i64 addrspace(11)* %.sroa.0.0..sroa_idx, align 8
  %.sroa.3.0..sroa_idx13 = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 1
  store i64 %9, i64 addrspace(11)* %.sroa.3.0..sroa_idx13, align 8
;}
  ret void
}
```

In addition, the new `reinterpret` implementation is able to handle any AbstractArray
(whether useful or not is a separate decision):

```
invoke(reinterpret, Tuple{Type{Complex{Float64}}, AbstractArray}, Complex{Float64}, speye(10))
5×10 Base.ReinterpretArray{Complex{Float64},Float64,2,SparseMatrixCSC{Float64,Int64}}:
 1.0+0.0im  0.0+1.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im
 0.0+0.0im  0.0+0.0im  1.0+0.0im  0.0+1.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im
 0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  1.0+0.0im  0.0+1.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im
 0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  1.0+0.0im  0.0+1.0im  0.0+0.0im  0.0+0.0im
 0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  1.0+0.0im  0.0+1.0im
```

The remaining todo is to audit the uses of reinterpret in base. I've fixed up the uses themselves, but there's
code deeper in the array code that needs to be broadened to allow ReinterpretArray.

Fixes #22849
Fixes #19238
Keno added a commit that referenced this issue Sep 28, 2017
This redoes `reinterpret` in julia rather than punning the memory
of the actual array. The motivation for this is to avoid the API
limitations of the current reinterpret implementation (Array only,
preventing strong TBAA, alignment problems). The surface API
essentially unchanged, though the shape argument to reinterpret
is removed, since those concepts are now orthogonal. The return
type from `reinterpret` is now `ReinterpretArray`, which implements
the AbstractArray interface and does the reinterpreting lazily on
demand. The compiler is able to fold away the abstraction and
generate very tight IR:

```
julia> ar = reinterpret(Complex{Int64}, rand(Int64, 1000));

julia> typeof(ar)
Base.ReinterpretArray{Complex{Int64},Int64,1,Array{Int64,1}}

julia> f(ar) = @inbounds return ar[1]
f (generic function with 1 method)

julia> @code_llvm f(ar)

; Function f
; Location: REPL[2]
define void @julia_f_63575({ i64, i64 } addrspace(11)* noalias nocapture sret, %jl_value_t addrspace(10)* dereferenceable(8)) #0 {
top:
; Location: REPL[2]:1
; Function getindex; {
; Location: reinterpretarray.jl:31
  %2 = addrspacecast %jl_value_t addrspace(10)* %1 to %jl_value_t addrspace(11)*
  %3 = bitcast %jl_value_t addrspace(11)* %2 to %jl_value_t addrspace(10)* addrspace(11)*
  %4 = load %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)* addrspace(11)* %3, align 8
  %5 = addrspacecast %jl_value_t addrspace(10)* %4 to %jl_value_t addrspace(11)*
  %6 = bitcast %jl_value_t addrspace(11)* %5 to i64* addrspace(11)*
  %7 = load i64*, i64* addrspace(11)* %6, align 8
  %8 = load i64, i64* %7, align 8
  %9 = getelementptr i64, i64* %7, i64 1
  %10 = load i64, i64* %9, align 8
  %.sroa.0.0..sroa_idx = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 0
  store i64 %8, i64 addrspace(11)* %.sroa.0.0..sroa_idx, align 8
  %.sroa.3.0..sroa_idx13 = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 1
  store i64 %10, i64 addrspace(11)* %.sroa.3.0..sroa_idx13, align 8
;}
  ret void
}

julia> g(a) = @inbounds return reinterpret(Complex{Int64}, a)[1]
g (generic function with 1 method)

julia> @code_llvm g(randn(1000))

; Function g
; Location: REPL[4]
define void @julia_g_63642({ i64, i64 } addrspace(11)* noalias nocapture sret, %jl_value_t addrspace(10)* dereferenceable(40)) #0 {
top:
; Location: REPL[4]:1
; Function getindex; {
; Location: reinterpretarray.jl:31
  %2 = addrspacecast %jl_value_t addrspace(10)* %1 to %jl_value_t addrspace(11)*
  %3 = bitcast %jl_value_t addrspace(11)* %2 to double* addrspace(11)*
  %4 = load double*, double* addrspace(11)* %3, align 8
  %5 = bitcast double* %4 to i64*
  %6 = load i64, i64* %5, align 8
  %7 = getelementptr double, double* %4, i64 1
  %8 = bitcast double* %7 to i64*
  %9 = load i64, i64* %8, align 8
  %.sroa.0.0..sroa_idx = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 0
  store i64 %6, i64 addrspace(11)* %.sroa.0.0..sroa_idx, align 8
  %.sroa.3.0..sroa_idx13 = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 1
  store i64 %9, i64 addrspace(11)* %.sroa.3.0..sroa_idx13, align 8
;}
  ret void
}
```

In addition, the new `reinterpret` implementation is able to handle any AbstractArray
(whether useful or not is a separate decision):

```
invoke(reinterpret, Tuple{Type{Complex{Float64}}, AbstractArray}, Complex{Float64}, speye(10))
5×10 Base.ReinterpretArray{Complex{Float64},Float64,2,SparseMatrixCSC{Float64,Int64}}:
 1.0+0.0im  0.0+1.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im
 0.0+0.0im  0.0+0.0im  1.0+0.0im  0.0+1.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im
 0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  1.0+0.0im  0.0+1.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im
 0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  1.0+0.0im  0.0+1.0im  0.0+0.0im  0.0+0.0im
 0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  1.0+0.0im  0.0+1.0im
```

The remaining todo is to audit the uses of reinterpret in base. I've fixed up the uses themselves, but there's
code deeper in the array code that needs to be broadened to allow ReinterpretArray.

Fixes #22849
Fixes #19238
Keno added a commit that referenced this issue Sep 29, 2017
This redoes `reinterpret` in julia rather than punning the memory
of the actual array. The motivation for this is to avoid the API
limitations of the current reinterpret implementation (Array only,
preventing strong TBAA, alignment problems). The surface API
essentially unchanged, though the shape argument to reinterpret
is removed, since those concepts are now orthogonal. The return
type from `reinterpret` is now `ReinterpretArray`, which implements
the AbstractArray interface and does the reinterpreting lazily on
demand. The compiler is able to fold away the abstraction and
generate very tight IR:

```
julia> ar = reinterpret(Complex{Int64}, rand(Int64, 1000));

julia> typeof(ar)
Base.ReinterpretArray{Complex{Int64},Int64,1,Array{Int64,1}}

julia> f(ar) = @inbounds return ar[1]
f (generic function with 1 method)

julia> @code_llvm f(ar)

; Function f
; Location: REPL[2]
define void @julia_f_63575({ i64, i64 } addrspace(11)* noalias nocapture sret, %jl_value_t addrspace(10)* dereferenceable(8)) #0 {
top:
; Location: REPL[2]:1
; Function getindex; {
; Location: reinterpretarray.jl:31
  %2 = addrspacecast %jl_value_t addrspace(10)* %1 to %jl_value_t addrspace(11)*
  %3 = bitcast %jl_value_t addrspace(11)* %2 to %jl_value_t addrspace(10)* addrspace(11)*
  %4 = load %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)* addrspace(11)* %3, align 8
  %5 = addrspacecast %jl_value_t addrspace(10)* %4 to %jl_value_t addrspace(11)*
  %6 = bitcast %jl_value_t addrspace(11)* %5 to i64* addrspace(11)*
  %7 = load i64*, i64* addrspace(11)* %6, align 8
  %8 = load i64, i64* %7, align 8
  %9 = getelementptr i64, i64* %7, i64 1
  %10 = load i64, i64* %9, align 8
  %.sroa.0.0..sroa_idx = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 0
  store i64 %8, i64 addrspace(11)* %.sroa.0.0..sroa_idx, align 8
  %.sroa.3.0..sroa_idx13 = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 1
  store i64 %10, i64 addrspace(11)* %.sroa.3.0..sroa_idx13, align 8
;}
  ret void
}

julia> g(a) = @inbounds return reinterpret(Complex{Int64}, a)[1]
g (generic function with 1 method)

julia> @code_llvm g(randn(1000))

; Function g
; Location: REPL[4]
define void @julia_g_63642({ i64, i64 } addrspace(11)* noalias nocapture sret, %jl_value_t addrspace(10)* dereferenceable(40)) #0 {
top:
; Location: REPL[4]:1
; Function getindex; {
; Location: reinterpretarray.jl:31
  %2 = addrspacecast %jl_value_t addrspace(10)* %1 to %jl_value_t addrspace(11)*
  %3 = bitcast %jl_value_t addrspace(11)* %2 to double* addrspace(11)*
  %4 = load double*, double* addrspace(11)* %3, align 8
  %5 = bitcast double* %4 to i64*
  %6 = load i64, i64* %5, align 8
  %7 = getelementptr double, double* %4, i64 1
  %8 = bitcast double* %7 to i64*
  %9 = load i64, i64* %8, align 8
  %.sroa.0.0..sroa_idx = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 0
  store i64 %6, i64 addrspace(11)* %.sroa.0.0..sroa_idx, align 8
  %.sroa.3.0..sroa_idx13 = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 1
  store i64 %9, i64 addrspace(11)* %.sroa.3.0..sroa_idx13, align 8
;}
  ret void
}
```

In addition, the new `reinterpret` implementation is able to handle any AbstractArray
(whether useful or not is a separate decision):

```
invoke(reinterpret, Tuple{Type{Complex{Float64}}, AbstractArray}, Complex{Float64}, speye(10))
5×10 Base.ReinterpretArray{Complex{Float64},Float64,2,SparseMatrixCSC{Float64,Int64}}:
 1.0+0.0im  0.0+1.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im
 0.0+0.0im  0.0+0.0im  1.0+0.0im  0.0+1.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im
 0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  1.0+0.0im  0.0+1.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im
 0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  1.0+0.0im  0.0+1.0im  0.0+0.0im  0.0+0.0im
 0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  1.0+0.0im  0.0+1.0im
```

The remaining todo is to audit the uses of reinterpret in base. I've fixed up the uses themselves, but there's
code deeper in the array code that needs to be broadened to allow ReinterpretArray.

Fixes #22849
Fixes #19238
Keno added a commit that referenced this issue Sep 29, 2017
This redoes `reinterpret` in julia rather than punning the memory
of the actual array. The motivation for this is to avoid the API
limitations of the current reinterpret implementation (Array only,
preventing strong TBAA, alignment problems). The surface API
essentially unchanged, though the shape argument to reinterpret
is removed, since those concepts are now orthogonal. The return
type from `reinterpret` is now `ReinterpretArray`, which implements
the AbstractArray interface and does the reinterpreting lazily on
demand. The compiler is able to fold away the abstraction and
generate very tight IR:

```
julia> ar = reinterpret(Complex{Int64}, rand(Int64, 1000));

julia> typeof(ar)
Base.ReinterpretArray{Complex{Int64},Int64,1,Array{Int64,1}}

julia> f(ar) = @inbounds return ar[1]
f (generic function with 1 method)

julia> @code_llvm f(ar)

; Function f
; Location: REPL[2]
define void @julia_f_63575({ i64, i64 } addrspace(11)* noalias nocapture sret, %jl_value_t addrspace(10)* dereferenceable(8)) #0 {
top:
; Location: REPL[2]:1
; Function getindex; {
; Location: reinterpretarray.jl:31
  %2 = addrspacecast %jl_value_t addrspace(10)* %1 to %jl_value_t addrspace(11)*
  %3 = bitcast %jl_value_t addrspace(11)* %2 to %jl_value_t addrspace(10)* addrspace(11)*
  %4 = load %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)* addrspace(11)* %3, align 8
  %5 = addrspacecast %jl_value_t addrspace(10)* %4 to %jl_value_t addrspace(11)*
  %6 = bitcast %jl_value_t addrspace(11)* %5 to i64* addrspace(11)*
  %7 = load i64*, i64* addrspace(11)* %6, align 8
  %8 = load i64, i64* %7, align 8
  %9 = getelementptr i64, i64* %7, i64 1
  %10 = load i64, i64* %9, align 8
  %.sroa.0.0..sroa_idx = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 0
  store i64 %8, i64 addrspace(11)* %.sroa.0.0..sroa_idx, align 8
  %.sroa.3.0..sroa_idx13 = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 1
  store i64 %10, i64 addrspace(11)* %.sroa.3.0..sroa_idx13, align 8
;}
  ret void
}

julia> g(a) = @inbounds return reinterpret(Complex{Int64}, a)[1]
g (generic function with 1 method)

julia> @code_llvm g(randn(1000))

; Function g
; Location: REPL[4]
define void @julia_g_63642({ i64, i64 } addrspace(11)* noalias nocapture sret, %jl_value_t addrspace(10)* dereferenceable(40)) #0 {
top:
; Location: REPL[4]:1
; Function getindex; {
; Location: reinterpretarray.jl:31
  %2 = addrspacecast %jl_value_t addrspace(10)* %1 to %jl_value_t addrspace(11)*
  %3 = bitcast %jl_value_t addrspace(11)* %2 to double* addrspace(11)*
  %4 = load double*, double* addrspace(11)* %3, align 8
  %5 = bitcast double* %4 to i64*
  %6 = load i64, i64* %5, align 8
  %7 = getelementptr double, double* %4, i64 1
  %8 = bitcast double* %7 to i64*
  %9 = load i64, i64* %8, align 8
  %.sroa.0.0..sroa_idx = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 0
  store i64 %6, i64 addrspace(11)* %.sroa.0.0..sroa_idx, align 8
  %.sroa.3.0..sroa_idx13 = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 1
  store i64 %9, i64 addrspace(11)* %.sroa.3.0..sroa_idx13, align 8
;}
  ret void
}
```

In addition, the new `reinterpret` implementation is able to handle any AbstractArray
(whether useful or not is a separate decision):

```
invoke(reinterpret, Tuple{Type{Complex{Float64}}, AbstractArray}, Complex{Float64}, speye(10))
5×10 Base.ReinterpretArray{Complex{Float64},Float64,2,SparseMatrixCSC{Float64,Int64}}:
 1.0+0.0im  0.0+1.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im
 0.0+0.0im  0.0+0.0im  1.0+0.0im  0.0+1.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im
 0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  1.0+0.0im  0.0+1.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im
 0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  1.0+0.0im  0.0+1.0im  0.0+0.0im  0.0+0.0im
 0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  1.0+0.0im  0.0+1.0im
```

The remaining todo is to audit the uses of reinterpret in base. I've fixed up the uses themselves, but there's
code deeper in the array code that needs to be broadened to allow ReinterpretArray.

Fixes #22849
Fixes #19238
Keno added a commit that referenced this issue Sep 29, 2017
This redoes `reinterpret` in julia rather than punning the memory
of the actual array. The motivation for this is to avoid the API
limitations of the current reinterpret implementation (Array only,
preventing strong TBAA, alignment problems). The surface API
essentially unchanged, though the shape argument to reinterpret
is removed, since those concepts are now orthogonal. The return
type from `reinterpret` is now `ReinterpretArray`, which implements
the AbstractArray interface and does the reinterpreting lazily on
demand. The compiler is able to fold away the abstraction and
generate very tight IR:

```
julia> ar = reinterpret(Complex{Int64}, rand(Int64, 1000));

julia> typeof(ar)
Base.ReinterpretArray{Complex{Int64},Int64,1,Array{Int64,1}}

julia> f(ar) = @inbounds return ar[1]
f (generic function with 1 method)

julia> @code_llvm f(ar)

; Function f
; Location: REPL[2]
define void @julia_f_63575({ i64, i64 } addrspace(11)* noalias nocapture sret, %jl_value_t addrspace(10)* dereferenceable(8)) #0 {
top:
; Location: REPL[2]:1
; Function getindex; {
; Location: reinterpretarray.jl:31
  %2 = addrspacecast %jl_value_t addrspace(10)* %1 to %jl_value_t addrspace(11)*
  %3 = bitcast %jl_value_t addrspace(11)* %2 to %jl_value_t addrspace(10)* addrspace(11)*
  %4 = load %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)* addrspace(11)* %3, align 8
  %5 = addrspacecast %jl_value_t addrspace(10)* %4 to %jl_value_t addrspace(11)*
  %6 = bitcast %jl_value_t addrspace(11)* %5 to i64* addrspace(11)*
  %7 = load i64*, i64* addrspace(11)* %6, align 8
  %8 = load i64, i64* %7, align 8
  %9 = getelementptr i64, i64* %7, i64 1
  %10 = load i64, i64* %9, align 8
  %.sroa.0.0..sroa_idx = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 0
  store i64 %8, i64 addrspace(11)* %.sroa.0.0..sroa_idx, align 8
  %.sroa.3.0..sroa_idx13 = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 1
  store i64 %10, i64 addrspace(11)* %.sroa.3.0..sroa_idx13, align 8
;}
  ret void
}

julia> g(a) = @inbounds return reinterpret(Complex{Int64}, a)[1]
g (generic function with 1 method)

julia> @code_llvm g(randn(1000))

; Function g
; Location: REPL[4]
define void @julia_g_63642({ i64, i64 } addrspace(11)* noalias nocapture sret, %jl_value_t addrspace(10)* dereferenceable(40)) #0 {
top:
; Location: REPL[4]:1
; Function getindex; {
; Location: reinterpretarray.jl:31
  %2 = addrspacecast %jl_value_t addrspace(10)* %1 to %jl_value_t addrspace(11)*
  %3 = bitcast %jl_value_t addrspace(11)* %2 to double* addrspace(11)*
  %4 = load double*, double* addrspace(11)* %3, align 8
  %5 = bitcast double* %4 to i64*
  %6 = load i64, i64* %5, align 8
  %7 = getelementptr double, double* %4, i64 1
  %8 = bitcast double* %7 to i64*
  %9 = load i64, i64* %8, align 8
  %.sroa.0.0..sroa_idx = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 0
  store i64 %6, i64 addrspace(11)* %.sroa.0.0..sroa_idx, align 8
  %.sroa.3.0..sroa_idx13 = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 1
  store i64 %9, i64 addrspace(11)* %.sroa.3.0..sroa_idx13, align 8
;}
  ret void
}
```

In addition, the new `reinterpret` implementation is able to handle any AbstractArray
(whether useful or not is a separate decision):

```
invoke(reinterpret, Tuple{Type{Complex{Float64}}, AbstractArray}, Complex{Float64}, speye(10))
5×10 Base.ReinterpretArray{Complex{Float64},Float64,2,SparseMatrixCSC{Float64,Int64}}:
 1.0+0.0im  0.0+1.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im
 0.0+0.0im  0.0+0.0im  1.0+0.0im  0.0+1.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im
 0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  1.0+0.0im  0.0+1.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im
 0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  1.0+0.0im  0.0+1.0im  0.0+0.0im  0.0+0.0im
 0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  1.0+0.0im  0.0+1.0im
```

The remaining todo is to audit the uses of reinterpret in base. I've fixed up the uses themselves, but there's
code deeper in the array code that needs to be broadened to allow ReinterpretArray.

Fixes #22849
Fixes #19238
Keno added a commit that referenced this issue Sep 29, 2017
This redoes `reinterpret` in julia rather than punning the memory
of the actual array. The motivation for this is to avoid the API
limitations of the current reinterpret implementation (Array only,
preventing strong TBAA, alignment problems). The surface API
essentially unchanged, though the shape argument to reinterpret
is removed, since those concepts are now orthogonal. The return
type from `reinterpret` is now `ReinterpretArray`, which implements
the AbstractArray interface and does the reinterpreting lazily on
demand. The compiler is able to fold away the abstraction and
generate very tight IR:

```
julia> ar = reinterpret(Complex{Int64}, rand(Int64, 1000));

julia> typeof(ar)
Base.ReinterpretArray{Complex{Int64},Int64,1,Array{Int64,1}}

julia> f(ar) = @inbounds return ar[1]
f (generic function with 1 method)

julia> @code_llvm f(ar)

; Function f
; Location: REPL[2]
define void @julia_f_63575({ i64, i64 } addrspace(11)* noalias nocapture sret, %jl_value_t addrspace(10)* dereferenceable(8)) #0 {
top:
; Location: REPL[2]:1
; Function getindex; {
; Location: reinterpretarray.jl:31
  %2 = addrspacecast %jl_value_t addrspace(10)* %1 to %jl_value_t addrspace(11)*
  %3 = bitcast %jl_value_t addrspace(11)* %2 to %jl_value_t addrspace(10)* addrspace(11)*
  %4 = load %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)* addrspace(11)* %3, align 8
  %5 = addrspacecast %jl_value_t addrspace(10)* %4 to %jl_value_t addrspace(11)*
  %6 = bitcast %jl_value_t addrspace(11)* %5 to i64* addrspace(11)*
  %7 = load i64*, i64* addrspace(11)* %6, align 8
  %8 = load i64, i64* %7, align 8
  %9 = getelementptr i64, i64* %7, i64 1
  %10 = load i64, i64* %9, align 8
  %.sroa.0.0..sroa_idx = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 0
  store i64 %8, i64 addrspace(11)* %.sroa.0.0..sroa_idx, align 8
  %.sroa.3.0..sroa_idx13 = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 1
  store i64 %10, i64 addrspace(11)* %.sroa.3.0..sroa_idx13, align 8
;}
  ret void
}

julia> g(a) = @inbounds return reinterpret(Complex{Int64}, a)[1]
g (generic function with 1 method)

julia> @code_llvm g(randn(1000))

; Function g
; Location: REPL[4]
define void @julia_g_63642({ i64, i64 } addrspace(11)* noalias nocapture sret, %jl_value_t addrspace(10)* dereferenceable(40)) #0 {
top:
; Location: REPL[4]:1
; Function getindex; {
; Location: reinterpretarray.jl:31
  %2 = addrspacecast %jl_value_t addrspace(10)* %1 to %jl_value_t addrspace(11)*
  %3 = bitcast %jl_value_t addrspace(11)* %2 to double* addrspace(11)*
  %4 = load double*, double* addrspace(11)* %3, align 8
  %5 = bitcast double* %4 to i64*
  %6 = load i64, i64* %5, align 8
  %7 = getelementptr double, double* %4, i64 1
  %8 = bitcast double* %7 to i64*
  %9 = load i64, i64* %8, align 8
  %.sroa.0.0..sroa_idx = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 0
  store i64 %6, i64 addrspace(11)* %.sroa.0.0..sroa_idx, align 8
  %.sroa.3.0..sroa_idx13 = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 1
  store i64 %9, i64 addrspace(11)* %.sroa.3.0..sroa_idx13, align 8
;}
  ret void
}
```

In addition, the new `reinterpret` implementation is able to handle any AbstractArray
(whether useful or not is a separate decision):

```
invoke(reinterpret, Tuple{Type{Complex{Float64}}, AbstractArray}, Complex{Float64}, speye(10))
5×10 Base.ReinterpretArray{Complex{Float64},Float64,2,SparseMatrixCSC{Float64,Int64}}:
 1.0+0.0im  0.0+1.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im
 0.0+0.0im  0.0+0.0im  1.0+0.0im  0.0+1.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im
 0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  1.0+0.0im  0.0+1.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im
 0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  1.0+0.0im  0.0+1.0im  0.0+0.0im  0.0+0.0im
 0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  1.0+0.0im  0.0+1.0im
```

The remaining todo is to audit the uses of reinterpret in base. I've fixed up the uses themselves, but there's
code deeper in the array code that needs to be broadened to allow ReinterpretArray.

Fixes #22849
Fixes #19238
Keno added a commit that referenced this issue Oct 2, 2017
This redoes `reinterpret` in julia rather than punning the memory
of the actual array. The motivation for this is to avoid the API
limitations of the current reinterpret implementation (Array only,
preventing strong TBAA, alignment problems). The surface API
essentially unchanged, though the shape argument to reinterpret
is removed, since those concepts are now orthogonal. The return
type from `reinterpret` is now `ReinterpretArray`, which implements
the AbstractArray interface and does the reinterpreting lazily on
demand. The compiler is able to fold away the abstraction and
generate very tight IR:

```
julia> ar = reinterpret(Complex{Int64}, rand(Int64, 1000));

julia> typeof(ar)
Base.ReinterpretArray{Complex{Int64},Int64,1,Array{Int64,1}}

julia> f(ar) = @inbounds return ar[1]
f (generic function with 1 method)

julia> @code_llvm f(ar)

; Function f
; Location: REPL[2]
define void @julia_f_63575({ i64, i64 } addrspace(11)* noalias nocapture sret, %jl_value_t addrspace(10)* dereferenceable(8)) #0 {
top:
; Location: REPL[2]:1
; Function getindex; {
; Location: reinterpretarray.jl:31
  %2 = addrspacecast %jl_value_t addrspace(10)* %1 to %jl_value_t addrspace(11)*
  %3 = bitcast %jl_value_t addrspace(11)* %2 to %jl_value_t addrspace(10)* addrspace(11)*
  %4 = load %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)* addrspace(11)* %3, align 8
  %5 = addrspacecast %jl_value_t addrspace(10)* %4 to %jl_value_t addrspace(11)*
  %6 = bitcast %jl_value_t addrspace(11)* %5 to i64* addrspace(11)*
  %7 = load i64*, i64* addrspace(11)* %6, align 8
  %8 = load i64, i64* %7, align 8
  %9 = getelementptr i64, i64* %7, i64 1
  %10 = load i64, i64* %9, align 8
  %.sroa.0.0..sroa_idx = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 0
  store i64 %8, i64 addrspace(11)* %.sroa.0.0..sroa_idx, align 8
  %.sroa.3.0..sroa_idx13 = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 1
  store i64 %10, i64 addrspace(11)* %.sroa.3.0..sroa_idx13, align 8
;}
  ret void
}

julia> g(a) = @inbounds return reinterpret(Complex{Int64}, a)[1]
g (generic function with 1 method)

julia> @code_llvm g(randn(1000))

; Function g
; Location: REPL[4]
define void @julia_g_63642({ i64, i64 } addrspace(11)* noalias nocapture sret, %jl_value_t addrspace(10)* dereferenceable(40)) #0 {
top:
; Location: REPL[4]:1
; Function getindex; {
; Location: reinterpretarray.jl:31
  %2 = addrspacecast %jl_value_t addrspace(10)* %1 to %jl_value_t addrspace(11)*
  %3 = bitcast %jl_value_t addrspace(11)* %2 to double* addrspace(11)*
  %4 = load double*, double* addrspace(11)* %3, align 8
  %5 = bitcast double* %4 to i64*
  %6 = load i64, i64* %5, align 8
  %7 = getelementptr double, double* %4, i64 1
  %8 = bitcast double* %7 to i64*
  %9 = load i64, i64* %8, align 8
  %.sroa.0.0..sroa_idx = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 0
  store i64 %6, i64 addrspace(11)* %.sroa.0.0..sroa_idx, align 8
  %.sroa.3.0..sroa_idx13 = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 1
  store i64 %9, i64 addrspace(11)* %.sroa.3.0..sroa_idx13, align 8
;}
  ret void
}
```

In addition, the new `reinterpret` implementation is able to handle any AbstractArray
(whether useful or not is a separate decision):

```
invoke(reinterpret, Tuple{Type{Complex{Float64}}, AbstractArray}, Complex{Float64}, speye(10))
5×10 Base.ReinterpretArray{Complex{Float64},Float64,2,SparseMatrixCSC{Float64,Int64}}:
 1.0+0.0im  0.0+1.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im
 0.0+0.0im  0.0+0.0im  1.0+0.0im  0.0+1.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im
 0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  1.0+0.0im  0.0+1.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im
 0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  1.0+0.0im  0.0+1.0im  0.0+0.0im  0.0+0.0im
 0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  1.0+0.0im  0.0+1.0im
```

The remaining todo is to audit the uses of reinterpret in base. I've fixed up the uses themselves, but there's
code deeper in the array code that needs to be broadened to allow ReinterpretArray.

Fixes #22849
Fixes #19238
Keno added a commit that referenced this issue Oct 3, 2017
This redoes `reinterpret` in julia rather than punning the memory
of the actual array. The motivation for this is to avoid the API
limitations of the current reinterpret implementation (Array only,
preventing strong TBAA, alignment problems). The surface API
essentially unchanged, though the shape argument to reinterpret
is removed, since those concepts are now orthogonal. The return
type from `reinterpret` is now `ReinterpretArray`, which implements
the AbstractArray interface and does the reinterpreting lazily on
demand. The compiler is able to fold away the abstraction and
generate very tight IR:

```
julia> ar = reinterpret(Complex{Int64}, rand(Int64, 1000));

julia> typeof(ar)
Base.ReinterpretArray{Complex{Int64},Int64,1,Array{Int64,1}}

julia> f(ar) = @inbounds return ar[1]
f (generic function with 1 method)

julia> @code_llvm f(ar)

; Function f
; Location: REPL[2]
define void @julia_f_63575({ i64, i64 } addrspace(11)* noalias nocapture sret, %jl_value_t addrspace(10)* dereferenceable(8)) #0 {
top:
; Location: REPL[2]:1
; Function getindex; {
; Location: reinterpretarray.jl:31
  %2 = addrspacecast %jl_value_t addrspace(10)* %1 to %jl_value_t addrspace(11)*
  %3 = bitcast %jl_value_t addrspace(11)* %2 to %jl_value_t addrspace(10)* addrspace(11)*
  %4 = load %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)* addrspace(11)* %3, align 8
  %5 = addrspacecast %jl_value_t addrspace(10)* %4 to %jl_value_t addrspace(11)*
  %6 = bitcast %jl_value_t addrspace(11)* %5 to i64* addrspace(11)*
  %7 = load i64*, i64* addrspace(11)* %6, align 8
  %8 = load i64, i64* %7, align 8
  %9 = getelementptr i64, i64* %7, i64 1
  %10 = load i64, i64* %9, align 8
  %.sroa.0.0..sroa_idx = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 0
  store i64 %8, i64 addrspace(11)* %.sroa.0.0..sroa_idx, align 8
  %.sroa.3.0..sroa_idx13 = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 1
  store i64 %10, i64 addrspace(11)* %.sroa.3.0..sroa_idx13, align 8
;}
  ret void
}

julia> g(a) = @inbounds return reinterpret(Complex{Int64}, a)[1]
g (generic function with 1 method)

julia> @code_llvm g(randn(1000))

; Function g
; Location: REPL[4]
define void @julia_g_63642({ i64, i64 } addrspace(11)* noalias nocapture sret, %jl_value_t addrspace(10)* dereferenceable(40)) #0 {
top:
; Location: REPL[4]:1
; Function getindex; {
; Location: reinterpretarray.jl:31
  %2 = addrspacecast %jl_value_t addrspace(10)* %1 to %jl_value_t addrspace(11)*
  %3 = bitcast %jl_value_t addrspace(11)* %2 to double* addrspace(11)*
  %4 = load double*, double* addrspace(11)* %3, align 8
  %5 = bitcast double* %4 to i64*
  %6 = load i64, i64* %5, align 8
  %7 = getelementptr double, double* %4, i64 1
  %8 = bitcast double* %7 to i64*
  %9 = load i64, i64* %8, align 8
  %.sroa.0.0..sroa_idx = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 0
  store i64 %6, i64 addrspace(11)* %.sroa.0.0..sroa_idx, align 8
  %.sroa.3.0..sroa_idx13 = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 1
  store i64 %9, i64 addrspace(11)* %.sroa.3.0..sroa_idx13, align 8
;}
  ret void
}
```

In addition, the new `reinterpret` implementation is able to handle any AbstractArray
(whether useful or not is a separate decision):

```
invoke(reinterpret, Tuple{Type{Complex{Float64}}, AbstractArray}, Complex{Float64}, speye(10))
5×10 Base.ReinterpretArray{Complex{Float64},Float64,2,SparseMatrixCSC{Float64,Int64}}:
 1.0+0.0im  0.0+1.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im
 0.0+0.0im  0.0+0.0im  1.0+0.0im  0.0+1.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im
 0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  1.0+0.0im  0.0+1.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im
 0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  1.0+0.0im  0.0+1.0im  0.0+0.0im  0.0+0.0im
 0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  1.0+0.0im  0.0+1.0im
```

The remaining todo is to audit the uses of reinterpret in base. I've fixed up the uses themselves, but there's
code deeper in the array code that needs to be broadened to allow ReinterpretArray.

Fixes #22849
Fixes #19238
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
parallelism Parallel or distributed computation
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants