Get rid of reinterpret in its current form #22849

Keno · 2017-07-17T20:50:41Z

I don't like the way reinterpret is currently implemented. It puns on the notion of an array. One problem we recently had to deal with is that alignment guarantees are different between different types (those are now disallowed). Another is that it prevents us from doing more strict TBAA on array element types. I think we should get rid of reinterpret completely and replace it with a ReinterpretArray type, whose getindex method performs the appropriate load from the original array. That way we never have an Array with incorrectly typed storage, but can retain the convenience (and once again allow reinterpret for types with mismatched alignment).

vtjnash · 2017-07-17T21:22:21Z

I think the issue we have is that we allow reinterpret to cast between types that a method like ReinterpretArray couldn't. For example, between structs and bytes (which allows it to observe padding bytes and alignment).

However, do we still need that capability in general? In the parts of the IO system that used to use this, I've largely switched them to using Ref and doing unsafe_ byte IO (by casting to a Ptr{UInt8}) and recommending marking them noinline (for both the gc-root and as a TBAA barrier).
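The Ref-plus-`Ptr{UInt8}` pattern described above can be sketched roughly as follows. This is a minimal illustration assuming Julia 1.x (where `GC.@preserve` exists); the helper name `bytes_of` is hypothetical and not a Base function, and the `@noinline` annotation simply mirrors the recommendation in the comment.

```julia
# Hypothetical helper (not part of Base): copy the raw bytes of an isbits
# value out through a Ref, instead of reinterpreting an Array.
@noinline function bytes_of(x::T) where {T}
    isbitstype(T) || throw(ArgumentError("expected an isbits type"))
    r = Ref{T}(x)                          # box holding a copy of x
    out = Vector{UInt8}(undef, sizeof(T))
    GC.@preserve r begin                   # keep r rooted while a raw pointer is live
        p = convert(Ptr{UInt8}, Base.unsafe_convert(Ptr{T}, r))
        for i in 1:sizeof(T)
            out[i] = unsafe_load(p, i)     # byte-wise load, so no alignment concerns
        end
    end
    return out
end

bytes = bytes_of(0x01020304)               # a UInt32; byte order depends on the host
```

For struct types with padding, the padding bytes returned this way have unspecified contents, which is exactly the kind of observation the comment refers to.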

Keno · 2017-07-17T21:26:20Z

I'm fine with implementing ReinterpretArray's getindex by copying out enough elements to on-stack storage and then reinterpreting there. LLVM will fold any unnecessary copies easily enough. My only problem is with Array here.
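A rough sketch of that idea, assuming Julia 1.x, isbits element types with nonzero size, and a 1-based parent vector. The type name `LazyReinterpret` is hypothetical and chosen to avoid confusion with Base; the actual Base implementation differs in detail.

```julia
# Hypothetical illustration only; not Base's ReinterpretArray.
struct LazyReinterpret{T,S,A<:AbstractVector{S}} <: AbstractVector{T}
    parent::A
end

# Outer constructor: LazyReinterpret{T}(a) infers S and A from the parent.
function LazyReinterpret{T}(a::A) where {T,S,A<:AbstractVector{S}}
    isbitstype(T) && isbitstype(S) || throw(ArgumentError("isbits types only"))
    sizeof(S) > 0 && sizeof(T) % sizeof(S) == 0 ||
        throw(ArgumentError("sizeof(T) must be a multiple of sizeof(S)"))
    (length(a) * sizeof(S)) % sizeof(T) == 0 ||
        throw(ArgumentError("parent length does not divide evenly"))
    return LazyReinterpret{T,S,A}(a)
end

Base.size(r::LazyReinterpret{T,S}) where {T,S} =
    (length(r.parent) * sizeof(S) ÷ sizeof(T),)

function Base.getindex(r::LazyReinterpret{T,S}, i::Int) where {T,S}
    @boundscheck checkbounds(r, i)
    n = sizeof(T) ÷ sizeof(S)              # parent elements per output element
    buf = Ref{T}()                         # scratch storage typed as T
    GC.@preserve buf begin
        p = convert(Ptr{S}, Base.unsafe_convert(Ptr{T}, buf))
        for k in 1:n
            # Ordinary, correctly typed loads from the parent; only the
            # scratch buffer is written through a differently typed pointer.
            unsafe_store!(p, r.parent[(i - 1) * n + k], k)
        end
    end
    return buf[]
end

a = Int64[1, 2, 3, 4]
r = LazyReinterpret{Complex{Int64}}(a)
first_elem = r[1]                          # built from a[1] and a[2]
a[3] = 30                                  # writes to the parent are visible lazily
second_elem = r[2]                         # built from a[3] and a[4]
```

The parent array keeps its own element type throughout, so no `Array` ever holds incorrectly typed storage; the reinterpretation happens only in the temporary.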

vtjnash · 2017-07-17T21:41:23Z

Right, but how would you express that? Currently we don't allow any sort of struct-type-punning in reinterpret for values, just for Arrays (as a special case).

If we copy the elements to the stack, wouldn't that still imply that the source and destination TBAA are different, inhibiting optimization around it (since LLVM doesn't have a TBAA union type)?

Keno · 2017-07-18T06:58:14Z

The fact that source and destination TBAA can't be different is a limitation of the current memcpy implementation, not of the underlying semantics. I plan to fix that. That's an optimization however. The reinterpret changes would be breaking and would have to be done pre-1.0.

lobingera · 2017-07-18T07:38:22Z

So do you mean to change the implementation of reinterpret, or the syntax?

Keno · 2017-07-18T07:41:28Z

Implementation. But the return type would change, so it's breaking.

lobingera · 2017-07-18T07:44:57Z

Then I'd like to see a better motivation than "I don't like the way reinterpret is currently implemented."

Keno · 2017-07-18T07:50:36Z

It's in the original post. It violates alignment guarantees (so you now get errors unless you know about alignment issues) and it prevents us from strengthening TBAA.

iamed2 · 2017-07-26T17:53:51Z

I would love to see this. As a user I appreciate this interface.

vtjnash · 2017-08-07T15:05:58Z

Cross-ref additional issues with reinterpret: #16652

StefanKarpinski · 2017-08-20T19:27:19Z

What about this API:

- you can declare an alignment for an array when you create it
- alignment can be computed from a given set of element types
- you can reinterpret to any element type for which an array's alignment is valid

This requires a little forethought but it should be doable.

Another possible API for when that's not possible is to extract elements of a different type from an array without reinterpreting it, as if it had a different element type. That would potentially be slow if the alignment disagrees, but it wouldn't be wrong at least. We can have both as they're somewhat complementary.

yuyichao · 2017-08-20T19:32:00Z

> you can declare an alignment for an array when you create it

Will this be in the type or in the value?

StefanKarpinski · 2017-08-20T19:42:14Z

One should also be able to query the maximal alignment of an array.

yuyichao · 2017-08-20T19:51:27Z

> you can declare an alignment for an array when you create it

> alignment can be computed from a given set of element types

I'm still confused by what you mean. Do you mean the alignment is determined by the array or the eltype?

StefanKarpinski · 2017-08-21T02:22:05Z

The alignment would have to be a property of the array. The original element type would determine the minimal alignment that an array must have, but the idea is that you could request coarser alignment if you know it's going to be needed. Something like this:

a = Array{UInt8}(n)
b = reinterpret(UInt128, a) # illegal

a = Array{UInt8}(2^20, alignment=128)
b = reinterpret(UInt128, a) # legal

For the second suggestion, I had something like this in mind:

reinterpret(UInt128, a, i) # reinterpret(UInt128, a)[i] but legal

Keno's idea of having a ReinterpretArray that can deal with misalignment issues is essentially the same thing, done via a wrapper type instead. Perhaps that's better anyway. We are increasingly moving away from arrays invisibly sharing memory and toward them explicitly sharing memory by having one array type wrap another. Of course reshape on Arrays still does the invisible memory sharing thing. But we have Base.ReshapedArray for cases where that can't work.

vtjnash · 2017-08-21T04:13:05Z

I should perhaps clarify that Yichao is "confused" because it is utterly useless to have alignment be part of the data. We can already compute the information for free from the pointer itself. However, only alignments that are guaranteed at compile time (for various combinations of register types and hardware) are performant. Everything else (especially testing for runtime alignment properties) is potentially a substantial performance penalty.
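As an illustration of computing alignment from the pointer itself, here is a small sketch assuming Julia 1.x. The helper names `pointer_alignment` and `is_aligned_for` are hypothetical; `Base.datatype_alignment` is an unexported Base function, and the values it reports depend on the platform and Julia version.

```julia
# Largest power of two that divides the address of the array's first element.
# (Assumes a non-empty array with a non-null data pointer.)
pointer_alignment(a::Array) = 1 << trailing_zeros(UInt(pointer(a)))

# Whether the storage of `a` happens to satisfy the alignment Julia reports
# for T. This is a runtime property of this particular allocation.
is_aligned_for(::Type{T}, a::Array) where {T} =
    UInt(pointer(a)) % UInt(Base.datatype_alignment(T)) == 0

a = Vector{UInt8}(undef, 64)
align = pointer_alignment(a)
ok128 = is_aligned_for(UInt128, a)   # may differ between allocations and platforms
```

A check like this costs almost nothing in isolation, but branching on it at every element access is the kind of runtime alignment test the comment warns against.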

yuyichao · 2017-08-21T04:50:20Z

It's not really useless, since the concern with allowing an accidentally aligned array to be reinterpreted is that it'll hide the errors until someone gets unlucky. If the property has to be explicitly specified this won't be an issue anymore.

That said, such an object-specific property will make shift!/unshift! much more expensive (for aligned arrays), and it seems that everything it can do is already covered by a ReinterpretedArray backed by reinterpret(T, a, i), so it doesn't really seem necessary.

StefanKarpinski · 2017-08-21T14:01:31Z

The point is to make it possible to allocate an array of one size in a way that can be safely reinterpreted later to a larger size – which we can currently only do by allocating in the bigger size first and then reinterpreting the other way. There are no runtime checks on element access since you never allow an array to be reinterpreted to a size that's not safe and pre-requested. There is a runtime check on reinterpret, but that doesn't matter.

The ReinterpretedArray approach seems fine if one only cares about element access performance for Arrays but not for ReinterpretedArrays – which may not always be the case. Of course, maybe that's good enough, since you can always do copy(reinterpret(T, a)). But that forces copying the entire array. We might want a way to realign an array in place, which can be done by shifting the array's data in memory, after which it can be efficiently accessed via two different sizes until someone does a shift! or unshift! operation in the smaller size.
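The `copy(reinterpret(T, a))` escape hatch can be sketched as follows, assuming the wrapper-returning `reinterpret` of Julia 0.7/1.x (the design the thread eventually settled on). The variable names are illustrative only.

```julia
bytes = UInt8[0x01, 0x00, 0x00, 0x00, 0x02, 0x00, 0x00, 0x00]

view32 = reinterpret(UInt32, bytes)   # lazy wrapper sharing memory with `bytes`
dense32 = copy(view32)                # independent, natively typed Vector{UInt32}

bytes[1] = 0xff
# view32 now reflects the write; dense32 keeps the values from the time of the copy.
```

The copy pays a one-time cost proportional to the array size, in exchange for a plain `Array` whose element accesses no longer go through the wrapper.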

Keno · 2017-08-21T15:01:26Z

Accessing a non-reinterpreted array should in all cases be preferred to the reinterpreted case, performance tradeoff wise. LLVM will do a fine job folding everything to give decent performance with the ReinterpretedArray approach as well.

StefanKarpinski · 2017-08-22T13:56:06Z

> LLVM will do a fine job folding everything to give decent performance with the ReinterpretedArray approach as well.

How is that possible? You're still accessing an array with the wrong alignment? If this were no slower why would we bother insisting on alignment in the first place? Or by "decent performance" do you mean, not optimal, but as good as one can get with potentially misaligned storage?

Keno · 2017-08-22T13:57:11Z

> Or by "decent performance" do you mean, not optimal, but as good as one can get with potentially misaligned storage?

Yes, which is really the best you can hope for without pessimizing everything else.

StefanKarpinski · 2017-08-31T18:08:43Z

Decided: we're going to do the ReinterpretedArray thing.

Keno · 2017-09-01T22:14:29Z

Note to self: get rid of this as part of this: https://github.com/JuliaLang/julia/blob/master/base/bitarray.jl#L292

This redoes `reinterpret` in julia rather than punning the memory of the actual array. The motivation for this is to avoid the API limitations of the current reinterpret implementation (Array only, preventing strong TBAA, alignment problems). The surface API essentially unchanged, though the shape argument to reinterpret is removed, since those concepts are now orthogonal. The return type from `reinterpret` is now `ReinterpretArray`, which implements the AbstractArray interface and does the reinterpreting lazily on demand. The compiler is able to fold away the abstraction and generate very tight IR: ``` julia> ar = reinterpret(Complex{Int64}, rand(Int64, 1000)); julia> typeof(ar) Base.ReinterpretArray{Complex{Int64},Int64,1,Array{Int64,1}} julia> f(ar) = @inbounds return ar[1] f (generic function with 1 method) julia> @code_llvm f(ar) ; Function f ; Location: REPL[2] define void @julia_f_63575({ i64, i64 } addrspace(11)* noalias nocapture sret, %jl_value_t addrspace(10)* dereferenceable(8)) #0 { top: ; Location: REPL[2]:1 ; Function getindex; { ; Location: reinterpretarray.jl:31 %2 = addrspacecast %jl_value_t addrspace(10)* %1 to %jl_value_t addrspace(11)* %3 = bitcast %jl_value_t addrspace(11)* %2 to %jl_value_t addrspace(10)* addrspace(11)* %4 = load %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)* addrspace(11)* %3, align 8 %5 = addrspacecast %jl_value_t addrspace(10)* %4 to %jl_value_t addrspace(11)* %6 = bitcast %jl_value_t addrspace(11)* %5 to i64* addrspace(11)* %7 = load i64*, i64* addrspace(11)* %6, align 8 %8 = load i64, i64* %7, align 8 %9 = getelementptr i64, i64* %7, i64 1 %10 = load i64, i64* %9, align 8 %.sroa.0.0..sroa_idx = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 0 store i64 %8, i64 addrspace(11)* %.sroa.0.0..sroa_idx, align 8 %.sroa.3.0..sroa_idx13 = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 1 store i64 %10, i64 addrspace(11)* %.sroa.3.0..sroa_idx13, align 8 ;} ret 
void } julia> g(a) = @inbounds return reinterpret(Complex{Int64}, a)[1] g (generic function with 1 method) julia> @code_llvm g(randn(1000)) ; Function g ; Location: REPL[4] define void @julia_g_63642({ i64, i64 } addrspace(11)* noalias nocapture sret, %jl_value_t addrspace(10)* dereferenceable(40)) #0 { top: ; Location: REPL[4]:1 ; Function getindex; { ; Location: reinterpretarray.jl:31 %2 = addrspacecast %jl_value_t addrspace(10)* %1 to %jl_value_t addrspace(11)* %3 = bitcast %jl_value_t addrspace(11)* %2 to double* addrspace(11)* %4 = load double*, double* addrspace(11)* %3, align 8 %5 = bitcast double* %4 to i64* %6 = load i64, i64* %5, align 8 %7 = getelementptr double, double* %4, i64 1 %8 = bitcast double* %7 to i64* %9 = load i64, i64* %8, align 8 %.sroa.0.0..sroa_idx = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 0 store i64 %6, i64 addrspace(11)* %.sroa.0.0..sroa_idx, align 8 %.sroa.3.0..sroa_idx13 = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 1 store i64 %9, i64 addrspace(11)* %.sroa.3.0..sroa_idx13, align 8 ;} ret void } ``` In addition, the new `reinterpret` implementation is able to handle any AbstractArray (whether useful or not is a separate decision): ``` invoke(reinterpret, Tuple{Type{Complex{Float64}}, AbstractArray}, Complex{Float64}, speye(10)) 5×10 Base.ReinterpretArray{Complex{Float64},Float64,2,SparseMatrixCSC{Float64,Int64}}: 1.0+0.0im 0.0+1.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 1.0+0.0im 0.0+1.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 1.0+0.0im 0.0+1.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 1.0+0.0im 0.0+1.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 1.0+0.0im 0.0+1.0im ``` The remaining todo is to audit the uses of 
reinterpret in base. I've fixed up the uses themselves, but there's code deeper in the array code that needs to be broadened to allow ReinterpretArray. Fixes #22849

@inbounds

This redoes `reinterpret` in julia rather than punning the memory of the actual array. The motivation for this is to avoid the API limitations of the current reinterpret implementation (Array only, preventing strong TBAA, alignment problems). The surface API essentially unchanged, though the shape argument to reinterpret is removed, since those concepts are now orthogonal. The return type from `reinterpret` is now `ReinterpretArray`, which implements the AbstractArray interface and does the reinterpreting lazily on demand. The compiler is able to fold away the abstraction and generate very tight IR: ``` julia> ar = reinterpret(Complex{Int64}, rand(Int64, 1000)); julia> typeof(ar) Base.ReinterpretArray{Complex{Int64},Int64,1,Array{Int64,1}} julia> f(ar) = @inbounds return ar[1] f (generic function with 1 method) julia> @code_llvm f(ar) ; Function f ; Location: REPL[2] define void @julia_f_63575({ i64, i64 } addrspace(11)* noalias nocapture sret, %jl_value_t addrspace(10)* dereferenceable(8)) #0 { top: ; Location: REPL[2]:1 ; Function getindex; { ; Location: reinterpretarray.jl:31 %2 = addrspacecast %jl_value_t addrspace(10)* %1 to %jl_value_t addrspace(11)* %3 = bitcast %jl_value_t addrspace(11)* %2 to %jl_value_t addrspace(10)* addrspace(11)* %4 = load %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)* addrspace(11)* %3, align 8 %5 = addrspacecast %jl_value_t addrspace(10)* %4 to %jl_value_t addrspace(11)* %6 = bitcast %jl_value_t addrspace(11)* %5 to i64* addrspace(11)* %7 = load i64*, i64* addrspace(11)* %6, align 8 %8 = load i64, i64* %7, align 8 %9 = getelementptr i64, i64* %7, i64 1 %10 = load i64, i64* %9, align 8 %.sroa.0.0..sroa_idx = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 0 store i64 %8, i64 addrspace(11)* %.sroa.0.0..sroa_idx, align 8 %.sroa.3.0..sroa_idx13 = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 1 store i64 %10, i64 addrspace(11)* %.sroa.3.0..sroa_idx13, align 8 ;} ret 
void } julia> g(a) = @inbounds return reinterpret(Complex{Int64}, a)[1] g (generic function with 1 method) julia> @code_llvm g(randn(1000)) ; Function g ; Location: REPL[4] define void @julia_g_63642({ i64, i64 } addrspace(11)* noalias nocapture sret, %jl_value_t addrspace(10)* dereferenceable(40)) #0 { top: ; Location: REPL[4]:1 ; Function getindex; { ; Location: reinterpretarray.jl:31 %2 = addrspacecast %jl_value_t addrspace(10)* %1 to %jl_value_t addrspace(11)* %3 = bitcast %jl_value_t addrspace(11)* %2 to double* addrspace(11)* %4 = load double*, double* addrspace(11)* %3, align 8 %5 = bitcast double* %4 to i64* %6 = load i64, i64* %5, align 8 %7 = getelementptr double, double* %4, i64 1 %8 = bitcast double* %7 to i64* %9 = load i64, i64* %8, align 8 %.sroa.0.0..sroa_idx = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 0 store i64 %6, i64 addrspace(11)* %.sroa.0.0..sroa_idx, align 8 %.sroa.3.0..sroa_idx13 = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 1 store i64 %9, i64 addrspace(11)* %.sroa.3.0..sroa_idx13, align 8 ;} ret void } ``` In addition, the new `reinterpret` implementation is able to handle any AbstractArray (whether useful or not is a separate decision): ``` invoke(reinterpret, Tuple{Type{Complex{Float64}}, AbstractArray}, Complex{Float64}, speye(10)) 5×10 Base.ReinterpretArray{Complex{Float64},Float64,2,SparseMatrixCSC{Float64,Int64}}: 1.0+0.0im 0.0+1.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 1.0+0.0im 0.0+1.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 1.0+0.0im 0.0+1.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 1.0+0.0im 0.0+1.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 1.0+0.0im 0.0+1.0im ``` The remaining todo is to audit the uses of 
reinterpret in base. I've fixed up the uses themselves, but there's code deeper in the array code that needs to be broadened to allow ReinterpretArray. Fixes #22849 Fixes #19238

@inbounds

This redoes `reinterpret` in julia rather than punning the memory of the actual array. The motivation for this is to avoid the API limitations of the current reinterpret implementation (Array only, preventing strong TBAA, alignment problems). The surface API essentially unchanged, though the shape argument to reinterpret is removed, since those concepts are now orthogonal. The return type from `reinterpret` is now `ReinterpretArray`, which implements the AbstractArray interface and does the reinterpreting lazily on demand. The compiler is able to fold away the abstraction and generate very tight IR: ``` julia> ar = reinterpret(Complex{Int64}, rand(Int64, 1000)); julia> typeof(ar) Base.ReinterpretArray{Complex{Int64},Int64,1,Array{Int64,1}} julia> f(ar) = @inbounds return ar[1] f (generic function with 1 method) julia> @code_llvm f(ar) ; Function f ; Location: REPL[2] define void @julia_f_63575({ i64, i64 } addrspace(11)* noalias nocapture sret, %jl_value_t addrspace(10)* dereferenceable(8)) #0 { top: ; Location: REPL[2]:1 ; Function getindex; { ; Location: reinterpretarray.jl:31 %2 = addrspacecast %jl_value_t addrspace(10)* %1 to %jl_value_t addrspace(11)* %3 = bitcast %jl_value_t addrspace(11)* %2 to %jl_value_t addrspace(10)* addrspace(11)* %4 = load %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)* addrspace(11)* %3, align 8 %5 = addrspacecast %jl_value_t addrspace(10)* %4 to %jl_value_t addrspace(11)* %6 = bitcast %jl_value_t addrspace(11)* %5 to i64* addrspace(11)* %7 = load i64*, i64* addrspace(11)* %6, align 8 %8 = load i64, i64* %7, align 8 %9 = getelementptr i64, i64* %7, i64 1 %10 = load i64, i64* %9, align 8 %.sroa.0.0..sroa_idx = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 0 store i64 %8, i64 addrspace(11)* %.sroa.0.0..sroa_idx, align 8 %.sroa.3.0..sroa_idx13 = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 1 store i64 %10, i64 addrspace(11)* %.sroa.3.0..sroa_idx13, align 8 ;} ret 
void } julia> g(a) = @inbounds return reinterpret(Complex{Int64}, a)[1] g (generic function with 1 method) julia> @code_llvm g(randn(1000)) ; Function g ; Location: REPL[4] define void @julia_g_63642({ i64, i64 } addrspace(11)* noalias nocapture sret, %jl_value_t addrspace(10)* dereferenceable(40)) #0 { top: ; Location: REPL[4]:1 ; Function getindex; { ; Location: reinterpretarray.jl:31 %2 = addrspacecast %jl_value_t addrspace(10)* %1 to %jl_value_t addrspace(11)* %3 = bitcast %jl_value_t addrspace(11)* %2 to double* addrspace(11)* %4 = load double*, double* addrspace(11)* %3, align 8 %5 = bitcast double* %4 to i64* %6 = load i64, i64* %5, align 8 %7 = getelementptr double, double* %4, i64 1 %8 = bitcast double* %7 to i64* %9 = load i64, i64* %8, align 8 %.sroa.0.0..sroa_idx = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 0 store i64 %6, i64 addrspace(11)* %.sroa.0.0..sroa_idx, align 8 %.sroa.3.0..sroa_idx13 = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 1 store i64 %9, i64 addrspace(11)* %.sroa.3.0..sroa_idx13, align 8 ;} ret void } ``` In addition, the new `reinterpret` implementation is able to handle any AbstractArray (whether useful or not is a separate decision): ``` invoke(reinterpret, Tuple{Type{Complex{Float64}}, AbstractArray}, Complex{Float64}, speye(10)) 5×10 Base.ReinterpretArray{Complex{Float64},Float64,2,SparseMatrixCSC{Float64,Int64}}: 1.0+0.0im 0.0+1.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 1.0+0.0im 0.0+1.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 1.0+0.0im 0.0+1.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 1.0+0.0im 0.0+1.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 1.0+0.0im 0.0+1.0im ``` The remaining todo is to audit the uses of 
reinterpret in base. I've fixed up the uses themselves, but there's code deeper in the array code that needs to be broadened to allow ReinterpretArray. Fixes #22849 Fixes #19238

@inbounds

This redoes `reinterpret` in julia rather than punning the memory of the actual array. The motivation for this is to avoid the API limitations of the current reinterpret implementation (Array only, preventing strong TBAA, alignment problems). The surface API essentially unchanged, though the shape argument to reinterpret is removed, since those concepts are now orthogonal. The return type from `reinterpret` is now `ReinterpretArray`, which implements the AbstractArray interface and does the reinterpreting lazily on demand. The compiler is able to fold away the abstraction and generate very tight IR: ``` julia> ar = reinterpret(Complex{Int64}, rand(Int64, 1000)); julia> typeof(ar) Base.ReinterpretArray{Complex{Int64},Int64,1,Array{Int64,1}} julia> f(ar) = @inbounds return ar[1] f (generic function with 1 method) julia> @code_llvm f(ar) ; Function f ; Location: REPL[2] define void @julia_f_63575({ i64, i64 } addrspace(11)* noalias nocapture sret, %jl_value_t addrspace(10)* dereferenceable(8)) #0 { top: ; Location: REPL[2]:1 ; Function getindex; { ; Location: reinterpretarray.jl:31 %2 = addrspacecast %jl_value_t addrspace(10)* %1 to %jl_value_t addrspace(11)* %3 = bitcast %jl_value_t addrspace(11)* %2 to %jl_value_t addrspace(10)* addrspace(11)* %4 = load %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)* addrspace(11)* %3, align 8 %5 = addrspacecast %jl_value_t addrspace(10)* %4 to %jl_value_t addrspace(11)* %6 = bitcast %jl_value_t addrspace(11)* %5 to i64* addrspace(11)* %7 = load i64*, i64* addrspace(11)* %6, align 8 %8 = load i64, i64* %7, align 8 %9 = getelementptr i64, i64* %7, i64 1 %10 = load i64, i64* %9, align 8 %.sroa.0.0..sroa_idx = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 0 store i64 %8, i64 addrspace(11)* %.sroa.0.0..sroa_idx, align 8 %.sroa.3.0..sroa_idx13 = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 1 store i64 %10, i64 addrspace(11)* %.sroa.3.0..sroa_idx13, align 8 ;} ret 
void } julia> g(a) = @inbounds return reinterpret(Complex{Int64}, a)[1] g (generic function with 1 method) julia> @code_llvm g(randn(1000)) ; Function g ; Location: REPL[4] define void @julia_g_63642({ i64, i64 } addrspace(11)* noalias nocapture sret, %jl_value_t addrspace(10)* dereferenceable(40)) #0 { top: ; Location: REPL[4]:1 ; Function getindex; { ; Location: reinterpretarray.jl:31 %2 = addrspacecast %jl_value_t addrspace(10)* %1 to %jl_value_t addrspace(11)* %3 = bitcast %jl_value_t addrspace(11)* %2 to double* addrspace(11)* %4 = load double*, double* addrspace(11)* %3, align 8 %5 = bitcast double* %4 to i64* %6 = load i64, i64* %5, align 8 %7 = getelementptr double, double* %4, i64 1 %8 = bitcast double* %7 to i64* %9 = load i64, i64* %8, align 8 %.sroa.0.0..sroa_idx = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 0 store i64 %6, i64 addrspace(11)* %.sroa.0.0..sroa_idx, align 8 %.sroa.3.0..sroa_idx13 = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 1 store i64 %9, i64 addrspace(11)* %.sroa.3.0..sroa_idx13, align 8 ;} ret void } ``` In addition, the new `reinterpret` implementation is able to handle any AbstractArray (whether useful or not is a separate decision): ``` invoke(reinterpret, Tuple{Type{Complex{Float64}}, AbstractArray}, Complex{Float64}, speye(10)) 5×10 Base.ReinterpretArray{Complex{Float64},Float64,2,SparseMatrixCSC{Float64,Int64}}: 1.0+0.0im 0.0+1.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 1.0+0.0im 0.0+1.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 1.0+0.0im 0.0+1.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 1.0+0.0im 0.0+1.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 1.0+0.0im 0.0+1.0im ``` The remaining todo is to audit the uses of 
reinterpret in base. I've fixed up the uses themselves, but there's code deeper in the array code that needs to be broadened to allow ReinterpretArray. Fixes #22849 Fixes #19238

@inbounds

This redoes `reinterpret` in julia rather than punning the memory of the actual array. The motivation for this is to avoid the API limitations of the current reinterpret implementation (Array only, preventing strong TBAA, alignment problems). The surface API essentially unchanged, though the shape argument to reinterpret is removed, since those concepts are now orthogonal. The return type from `reinterpret` is now `ReinterpretArray`, which implements the AbstractArray interface and does the reinterpreting lazily on demand. The compiler is able to fold away the abstraction and generate very tight IR: ``` julia> ar = reinterpret(Complex{Int64}, rand(Int64, 1000)); julia> typeof(ar) Base.ReinterpretArray{Complex{Int64},Int64,1,Array{Int64,1}} julia> f(ar) = @inbounds return ar[1] f (generic function with 1 method) julia> @code_llvm f(ar) ; Function f ; Location: REPL[2] define void @julia_f_63575({ i64, i64 } addrspace(11)* noalias nocapture sret, %jl_value_t addrspace(10)* dereferenceable(8)) #0 { top: ; Location: REPL[2]:1 ; Function getindex; { ; Location: reinterpretarray.jl:31 %2 = addrspacecast %jl_value_t addrspace(10)* %1 to %jl_value_t addrspace(11)* %3 = bitcast %jl_value_t addrspace(11)* %2 to %jl_value_t addrspace(10)* addrspace(11)* %4 = load %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)* addrspace(11)* %3, align 8 %5 = addrspacecast %jl_value_t addrspace(10)* %4 to %jl_value_t addrspace(11)* %6 = bitcast %jl_value_t addrspace(11)* %5 to i64* addrspace(11)* %7 = load i64*, i64* addrspace(11)* %6, align 8 %8 = load i64, i64* %7, align 8 %9 = getelementptr i64, i64* %7, i64 1 %10 = load i64, i64* %9, align 8 %.sroa.0.0..sroa_idx = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 0 store i64 %8, i64 addrspace(11)* %.sroa.0.0..sroa_idx, align 8 %.sroa.3.0..sroa_idx13 = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 1 store i64 %10, i64 addrspace(11)* %.sroa.3.0..sroa_idx13, align 8 ;} ret 
void } julia> g(a) = @inbounds return reinterpret(Complex{Int64}, a)[1] g (generic function with 1 method) julia> @code_llvm g(randn(1000)) ; Function g ; Location: REPL[4] define void @julia_g_63642({ i64, i64 } addrspace(11)* noalias nocapture sret, %jl_value_t addrspace(10)* dereferenceable(40)) #0 { top: ; Location: REPL[4]:1 ; Function getindex; { ; Location: reinterpretarray.jl:31 %2 = addrspacecast %jl_value_t addrspace(10)* %1 to %jl_value_t addrspace(11)* %3 = bitcast %jl_value_t addrspace(11)* %2 to double* addrspace(11)* %4 = load double*, double* addrspace(11)* %3, align 8 %5 = bitcast double* %4 to i64* %6 = load i64, i64* %5, align 8 %7 = getelementptr double, double* %4, i64 1 %8 = bitcast double* %7 to i64* %9 = load i64, i64* %8, align 8 %.sroa.0.0..sroa_idx = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 0 store i64 %6, i64 addrspace(11)* %.sroa.0.0..sroa_idx, align 8 %.sroa.3.0..sroa_idx13 = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 1 store i64 %9, i64 addrspace(11)* %.sroa.3.0..sroa_idx13, align 8 ;} ret void } ``` In addition, the new `reinterpret` implementation is able to handle any AbstractArray (whether useful or not is a separate decision): ``` invoke(reinterpret, Tuple{Type{Complex{Float64}}, AbstractArray}, Complex{Float64}, speye(10)) 5×10 Base.ReinterpretArray{Complex{Float64},Float64,2,SparseMatrixCSC{Float64,Int64}}: 1.0+0.0im 0.0+1.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 1.0+0.0im 0.0+1.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 1.0+0.0im 0.0+1.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 1.0+0.0im 0.0+1.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 1.0+0.0im 0.0+1.0im ``` The remaining todo is to audit the uses of 
reinterpret in base. I've fixed up the uses themselves, but there's code deeper in the array code that needs to be broadened to allow ReinterpretArray. Fixes #22849 Fixes #19238

@inbounds

This redoes `reinterpret` in Julia rather than punning the memory of the actual array. The motivation is to avoid the limitations of the current `reinterpret` implementation (it works on `Array` only, prevents strong TBAA, and has alignment problems). The surface API is essentially unchanged, though the shape argument to `reinterpret` is removed, since reshaping and reinterpreting are now orthogonal concepts. The return type of `reinterpret` is now `ReinterpretArray`, which implements the AbstractArray interface and does the reinterpreting lazily, on demand. The compiler is able to fold away the abstraction and generate very tight IR:

```
julia> ar = reinterpret(Complex{Int64}, rand(Int64, 1000));

julia> typeof(ar)
Base.ReinterpretArray{Complex{Int64},Int64,1,Array{Int64,1}}

julia> f(ar) = @inbounds return ar[1]
f (generic function with 1 method)

julia> @code_llvm f(ar)

; Function f
; Location: REPL[2]
define void @julia_f_63575({ i64, i64 } addrspace(11)* noalias nocapture sret, %jl_value_t addrspace(10)* dereferenceable(8)) #0 {
top:
; Location: REPL[2]:1
; Function getindex; {
; Location: reinterpretarray.jl:31
  %2 = addrspacecast %jl_value_t addrspace(10)* %1 to %jl_value_t addrspace(11)*
  %3 = bitcast %jl_value_t addrspace(11)* %2 to %jl_value_t addrspace(10)* addrspace(11)*
  %4 = load %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)* addrspace(11)* %3, align 8
  %5 = addrspacecast %jl_value_t addrspace(10)* %4 to %jl_value_t addrspace(11)*
  %6 = bitcast %jl_value_t addrspace(11)* %5 to i64* addrspace(11)*
  %7 = load i64*, i64* addrspace(11)* %6, align 8
  %8 = load i64, i64* %7, align 8
  %9 = getelementptr i64, i64* %7, i64 1
  %10 = load i64, i64* %9, align 8
  %.sroa.0.0..sroa_idx = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 0
  store i64 %8, i64 addrspace(11)* %.sroa.0.0..sroa_idx, align 8
  %.sroa.3.0..sroa_idx13 = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 1
  store i64 %10, i64 addrspace(11)* %.sroa.3.0..sroa_idx13, align 8
;}
  ret void
}

julia> g(a) = @inbounds return reinterpret(Complex{Int64}, a)[1]
g (generic function with 1 method)

julia> @code_llvm g(randn(1000))

; Function g
; Location: REPL[4]
define void @julia_g_63642({ i64, i64 } addrspace(11)* noalias nocapture sret, %jl_value_t addrspace(10)* dereferenceable(40)) #0 {
top:
; Location: REPL[4]:1
; Function getindex; {
; Location: reinterpretarray.jl:31
  %2 = addrspacecast %jl_value_t addrspace(10)* %1 to %jl_value_t addrspace(11)*
  %3 = bitcast %jl_value_t addrspace(11)* %2 to double* addrspace(11)*
  %4 = load double*, double* addrspace(11)* %3, align 8
  %5 = bitcast double* %4 to i64*
  %6 = load i64, i64* %5, align 8
  %7 = getelementptr double, double* %4, i64 1
  %8 = bitcast double* %7 to i64*
  %9 = load i64, i64* %8, align 8
  %.sroa.0.0..sroa_idx = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 0
  store i64 %6, i64 addrspace(11)* %.sroa.0.0..sroa_idx, align 8
  %.sroa.3.0..sroa_idx13 = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 1
  store i64 %9, i64 addrspace(11)* %.sroa.3.0..sroa_idx13, align 8
;}
  ret void
}
```

In addition, the new `reinterpret` implementation is able to handle any AbstractArray (whether that is useful or not is a separate decision):

```
invoke(reinterpret, Tuple{Type{Complex{Float64}}, AbstractArray}, Complex{Float64}, speye(10))
5×10 Base.ReinterpretArray{Complex{Float64},Float64,2,SparseMatrixCSC{Float64,Int64}}:
 1.0+0.0im  0.0+1.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im
 0.0+0.0im  0.0+0.0im  1.0+0.0im  0.0+1.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im
 0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  1.0+0.0im  0.0+1.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im
 0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  1.0+0.0im  0.0+1.0im  0.0+0.0im  0.0+0.0im
 0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  1.0+0.0im  0.0+1.0im
```

The remaining todo is to audit the uses of `reinterpret` in Base. I've fixed up the uses themselves, but there's code deeper in the array code that needs to be broadened to allow `ReinterpretArray`.

Fixes #22849
Fixes #19238
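To make the "reinterpret lazily on demand" idea concrete, here is a minimal sketch of a wrapper whose `getindex` copies the needed parent elements into scratch storage and reads them back as the target type. It is not the code in `reinterpretarray.jl`: the name `LazyReinterpret` is hypothetical, and the sketch assumes one-dimensional parents, bits types only, and `sizeof(T)` being a whole multiple of `sizeof(S)`.

```julia
# Hypothetical illustration only; Base.ReinterpretArray is more general.
struct LazyReinterpret{T,S,A<:AbstractVector{S}} <: AbstractVector{T}
    parent::A   # the storage keeps its original element type S
end

function LazyReinterpret(::Type{T}, a::AbstractVector{S}) where {T,S}
    (isbitstype(T) && isbitstype(S)) ||
        throw(ArgumentError("both element types must be bits types"))
    sizeof(T) % sizeof(S) == 0 ||
        throw(ArgumentError("sizeof(T) must be a multiple of sizeof(S) in this sketch"))
    (length(a) * sizeof(S)) % sizeof(T) == 0 ||
        throw(ArgumentError("parent does not hold a whole number of T elements"))
    return LazyReinterpret{T,S,typeof(a)}(a)
end

Base.size(r::LazyReinterpret{T,S}) where {T,S} =
    (length(r.parent) * sizeof(S) ÷ sizeof(T),)
Base.IndexStyle(::Type{<:LazyReinterpret}) = IndexLinear()

function Base.getindex(r::LazyReinterpret{T,S}, i::Int) where {T,S}
    @boundscheck checkbounds(r, i)
    n = sizeof(T) ÷ sizeof(S)        # parent elements per reinterpreted element
    buf = Ref{T}()                   # scratch storage for a single T
    GC.@preserve buf begin
        p = Ptr{S}(Base.unsafe_convert(Ptr{T}, buf))
        for k in 1:n
            # ordinary, correctly typed loads from the parent array
            unsafe_store!(p, r.parent[(i - 1) * n + k], k)
        end
    end
    return buf[]                     # read the bytes back as a T
end

a = Int64[1, 2, 3, 4]
r = LazyReinterpret(Complex{Int64}, a)
r[2]   # built from a[3] and a[4]
```

The point of the design is that the parent is only ever accessed through its own `getindex`, so no `Array` with incorrectly typed storage is created; the type punning is confined to the small scratch buffer.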

@inbounds

This redoes `reinterpret` in julia rather than punning the memory of the actual array. The motivation for this is to avoid the API limitations of the current reinterpret implementation (Array only, preventing strong TBAA, alignment problems). The surface API essentially unchanged, though the shape argument to reinterpret is removed, since those concepts are now orthogonal. The return type from `reinterpret` is now `ReinterpretArray`, which implements the AbstractArray interface and does the reinterpreting lazily on demand. The compiler is able to fold away the abstraction and generate very tight IR: ``` julia> ar = reinterpret(Complex{Int64}, rand(Int64, 1000)); julia> typeof(ar) Base.ReinterpretArray{Complex{Int64},Int64,1,Array{Int64,1}} julia> f(ar) = @inbounds return ar[1] f (generic function with 1 method) julia> @code_llvm f(ar) ; Function f ; Location: REPL[2] define void @julia_f_63575({ i64, i64 } addrspace(11)* noalias nocapture sret, %jl_value_t addrspace(10)* dereferenceable(8)) #0 { top: ; Location: REPL[2]:1 ; Function getindex; { ; Location: reinterpretarray.jl:31 %2 = addrspacecast %jl_value_t addrspace(10)* %1 to %jl_value_t addrspace(11)* %3 = bitcast %jl_value_t addrspace(11)* %2 to %jl_value_t addrspace(10)* addrspace(11)* %4 = load %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)* addrspace(11)* %3, align 8 %5 = addrspacecast %jl_value_t addrspace(10)* %4 to %jl_value_t addrspace(11)* %6 = bitcast %jl_value_t addrspace(11)* %5 to i64* addrspace(11)* %7 = load i64*, i64* addrspace(11)* %6, align 8 %8 = load i64, i64* %7, align 8 %9 = getelementptr i64, i64* %7, i64 1 %10 = load i64, i64* %9, align 8 %.sroa.0.0..sroa_idx = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 0 store i64 %8, i64 addrspace(11)* %.sroa.0.0..sroa_idx, align 8 %.sroa.3.0..sroa_idx13 = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 1 store i64 %10, i64 addrspace(11)* %.sroa.3.0..sroa_idx13, align 8 ;} ret 
void } julia> g(a) = @inbounds return reinterpret(Complex{Int64}, a)[1] g (generic function with 1 method) julia> @code_llvm g(randn(1000)) ; Function g ; Location: REPL[4] define void @julia_g_63642({ i64, i64 } addrspace(11)* noalias nocapture sret, %jl_value_t addrspace(10)* dereferenceable(40)) #0 { top: ; Location: REPL[4]:1 ; Function getindex; { ; Location: reinterpretarray.jl:31 %2 = addrspacecast %jl_value_t addrspace(10)* %1 to %jl_value_t addrspace(11)* %3 = bitcast %jl_value_t addrspace(11)* %2 to double* addrspace(11)* %4 = load double*, double* addrspace(11)* %3, align 8 %5 = bitcast double* %4 to i64* %6 = load i64, i64* %5, align 8 %7 = getelementptr double, double* %4, i64 1 %8 = bitcast double* %7 to i64* %9 = load i64, i64* %8, align 8 %.sroa.0.0..sroa_idx = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 0 store i64 %6, i64 addrspace(11)* %.sroa.0.0..sroa_idx, align 8 %.sroa.3.0..sroa_idx13 = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 1 store i64 %9, i64 addrspace(11)* %.sroa.3.0..sroa_idx13, align 8 ;} ret void } ``` In addition, the new `reinterpret` implementation is able to handle any AbstractArray (whether useful or not is a separate decision): ``` invoke(reinterpret, Tuple{Type{Complex{Float64}}, AbstractArray}, Complex{Float64}, speye(10)) 5×10 Base.ReinterpretArray{Complex{Float64},Float64,2,SparseMatrixCSC{Float64,Int64}}: 1.0+0.0im 0.0+1.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 1.0+0.0im 0.0+1.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 1.0+0.0im 0.0+1.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 1.0+0.0im 0.0+1.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 1.0+0.0im 0.0+1.0im ``` The remaining todo is to audit the uses of 
reinterpret in base. I've fixed up the uses themselves, but there's code deeper in the array code that needs to be broadened to allow ReinterpretArray. Fixes #22849 Fixes #19238

@inbounds

This redoes `reinterpret` in julia rather than punning the memory of the actual array. The motivation for this is to avoid the API limitations of the current reinterpret implementation (Array only, preventing strong TBAA, alignment problems). The surface API essentially unchanged, though the shape argument to reinterpret is removed, since those concepts are now orthogonal. The return type from `reinterpret` is now `ReinterpretArray`, which implements the AbstractArray interface and does the reinterpreting lazily on demand. The compiler is able to fold away the abstraction and generate very tight IR: ``` julia> ar = reinterpret(Complex{Int64}, rand(Int64, 1000)); julia> typeof(ar) Base.ReinterpretArray{Complex{Int64},Int64,1,Array{Int64,1}} julia> f(ar) = @inbounds return ar[1] f (generic function with 1 method) julia> @code_llvm f(ar) ; Function f ; Location: REPL[2] define void @julia_f_63575({ i64, i64 } addrspace(11)* noalias nocapture sret, %jl_value_t addrspace(10)* dereferenceable(8)) #0 { top: ; Location: REPL[2]:1 ; Function getindex; { ; Location: reinterpretarray.jl:31 %2 = addrspacecast %jl_value_t addrspace(10)* %1 to %jl_value_t addrspace(11)* %3 = bitcast %jl_value_t addrspace(11)* %2 to %jl_value_t addrspace(10)* addrspace(11)* %4 = load %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)* addrspace(11)* %3, align 8 %5 = addrspacecast %jl_value_t addrspace(10)* %4 to %jl_value_t addrspace(11)* %6 = bitcast %jl_value_t addrspace(11)* %5 to i64* addrspace(11)* %7 = load i64*, i64* addrspace(11)* %6, align 8 %8 = load i64, i64* %7, align 8 %9 = getelementptr i64, i64* %7, i64 1 %10 = load i64, i64* %9, align 8 %.sroa.0.0..sroa_idx = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 0 store i64 %8, i64 addrspace(11)* %.sroa.0.0..sroa_idx, align 8 %.sroa.3.0..sroa_idx13 = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 1 store i64 %10, i64 addrspace(11)* %.sroa.3.0..sroa_idx13, align 8 ;} ret 
void } julia> g(a) = @inbounds return reinterpret(Complex{Int64}, a)[1] g (generic function with 1 method) julia> @code_llvm g(randn(1000)) ; Function g ; Location: REPL[4] define void @julia_g_63642({ i64, i64 } addrspace(11)* noalias nocapture sret, %jl_value_t addrspace(10)* dereferenceable(40)) #0 { top: ; Location: REPL[4]:1 ; Function getindex; { ; Location: reinterpretarray.jl:31 %2 = addrspacecast %jl_value_t addrspace(10)* %1 to %jl_value_t addrspace(11)* %3 = bitcast %jl_value_t addrspace(11)* %2 to double* addrspace(11)* %4 = load double*, double* addrspace(11)* %3, align 8 %5 = bitcast double* %4 to i64* %6 = load i64, i64* %5, align 8 %7 = getelementptr double, double* %4, i64 1 %8 = bitcast double* %7 to i64* %9 = load i64, i64* %8, align 8 %.sroa.0.0..sroa_idx = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 0 store i64 %6, i64 addrspace(11)* %.sroa.0.0..sroa_idx, align 8 %.sroa.3.0..sroa_idx13 = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 1 store i64 %9, i64 addrspace(11)* %.sroa.3.0..sroa_idx13, align 8 ;} ret void } ``` In addition, the new `reinterpret` implementation is able to handle any AbstractArray (whether useful or not is a separate decision): ``` invoke(reinterpret, Tuple{Type{Complex{Float64}}, AbstractArray}, Complex{Float64}, speye(10)) 5×10 Base.ReinterpretArray{Complex{Float64},Float64,2,SparseMatrixCSC{Float64,Int64}}: 1.0+0.0im 0.0+1.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 1.0+0.0im 0.0+1.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 1.0+0.0im 0.0+1.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 1.0+0.0im 0.0+1.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 1.0+0.0im 0.0+1.0im ``` The remaining todo is to audit the uses of 
reinterpret in base. I've fixed up the uses themselves, but there's code deeper in the array code that needs to be broadened to allow ReinterpretArray. Fixes #22849 Fixes #19238

@inbounds

This redoes `reinterpret` in julia rather than punning the memory of the actual array. The motivation for this is to avoid the API limitations of the current reinterpret implementation (Array only, preventing strong TBAA, alignment problems). The surface API essentially unchanged, though the shape argument to reinterpret is removed, since those concepts are now orthogonal. The return type from `reinterpret` is now `ReinterpretArray`, which implements the AbstractArray interface and does the reinterpreting lazily on demand. The compiler is able to fold away the abstraction and generate very tight IR: ``` julia> ar = reinterpret(Complex{Int64}, rand(Int64, 1000)); julia> typeof(ar) Base.ReinterpretArray{Complex{Int64},Int64,1,Array{Int64,1}} julia> f(ar) = @inbounds return ar[1] f (generic function with 1 method) julia> @code_llvm f(ar) ; Function f ; Location: REPL[2] define void @julia_f_63575({ i64, i64 } addrspace(11)* noalias nocapture sret, %jl_value_t addrspace(10)* dereferenceable(8)) #0 { top: ; Location: REPL[2]:1 ; Function getindex; { ; Location: reinterpretarray.jl:31 %2 = addrspacecast %jl_value_t addrspace(10)* %1 to %jl_value_t addrspace(11)* %3 = bitcast %jl_value_t addrspace(11)* %2 to %jl_value_t addrspace(10)* addrspace(11)* %4 = load %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)* addrspace(11)* %3, align 8 %5 = addrspacecast %jl_value_t addrspace(10)* %4 to %jl_value_t addrspace(11)* %6 = bitcast %jl_value_t addrspace(11)* %5 to i64* addrspace(11)* %7 = load i64*, i64* addrspace(11)* %6, align 8 %8 = load i64, i64* %7, align 8 %9 = getelementptr i64, i64* %7, i64 1 %10 = load i64, i64* %9, align 8 %.sroa.0.0..sroa_idx = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 0 store i64 %8, i64 addrspace(11)* %.sroa.0.0..sroa_idx, align 8 %.sroa.3.0..sroa_idx13 = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 1 store i64 %10, i64 addrspace(11)* %.sroa.3.0..sroa_idx13, align 8 ;} ret 
void } julia> g(a) = @inbounds return reinterpret(Complex{Int64}, a)[1] g (generic function with 1 method) julia> @code_llvm g(randn(1000)) ; Function g ; Location: REPL[4] define void @julia_g_63642({ i64, i64 } addrspace(11)* noalias nocapture sret, %jl_value_t addrspace(10)* dereferenceable(40)) #0 { top: ; Location: REPL[4]:1 ; Function getindex; { ; Location: reinterpretarray.jl:31 %2 = addrspacecast %jl_value_t addrspace(10)* %1 to %jl_value_t addrspace(11)* %3 = bitcast %jl_value_t addrspace(11)* %2 to double* addrspace(11)* %4 = load double*, double* addrspace(11)* %3, align 8 %5 = bitcast double* %4 to i64* %6 = load i64, i64* %5, align 8 %7 = getelementptr double, double* %4, i64 1 %8 = bitcast double* %7 to i64* %9 = load i64, i64* %8, align 8 %.sroa.0.0..sroa_idx = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 0 store i64 %6, i64 addrspace(11)* %.sroa.0.0..sroa_idx, align 8 %.sroa.3.0..sroa_idx13 = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 1 store i64 %9, i64 addrspace(11)* %.sroa.3.0..sroa_idx13, align 8 ;} ret void } ``` In addition, the new `reinterpret` implementation is able to handle any AbstractArray (whether useful or not is a separate decision): ``` invoke(reinterpret, Tuple{Type{Complex{Float64}}, AbstractArray}, Complex{Float64}, speye(10)) 5×10 Base.ReinterpretArray{Complex{Float64},Float64,2,SparseMatrixCSC{Float64,Int64}}: 1.0+0.0im 0.0+1.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 1.0+0.0im 0.0+1.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 1.0+0.0im 0.0+1.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 1.0+0.0im 0.0+1.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 1.0+0.0im 0.0+1.0im ``` The remaining todo is to audit the uses of 
reinterpret in base. I've fixed up the uses themselves, but there's code deeper in the array code that needs to be broadened to allow ReinterpretArray. Fixes #22849 Fixes #19238

@inbounds

This redoes `reinterpret` in julia rather than punning the memory of the actual array. The motivation for this is to avoid the API limitations of the current reinterpret implementation (Array only, preventing strong TBAA, alignment problems). The surface API essentially unchanged, though the shape argument to reinterpret is removed, since those concepts are now orthogonal. The return type from `reinterpret` is now `ReinterpretArray`, which implements the AbstractArray interface and does the reinterpreting lazily on demand. The compiler is able to fold away the abstraction and generate very tight IR: ``` julia> ar = reinterpret(Complex{Int64}, rand(Int64, 1000)); julia> typeof(ar) Base.ReinterpretArray{Complex{Int64},Int64,1,Array{Int64,1}} julia> f(ar) = @inbounds return ar[1] f (generic function with 1 method) julia> @code_llvm f(ar) ; Function f ; Location: REPL[2] define void @julia_f_63575({ i64, i64 } addrspace(11)* noalias nocapture sret, %jl_value_t addrspace(10)* dereferenceable(8)) #0 { top: ; Location: REPL[2]:1 ; Function getindex; { ; Location: reinterpretarray.jl:31 %2 = addrspacecast %jl_value_t addrspace(10)* %1 to %jl_value_t addrspace(11)* %3 = bitcast %jl_value_t addrspace(11)* %2 to %jl_value_t addrspace(10)* addrspace(11)* %4 = load %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)* addrspace(11)* %3, align 8 %5 = addrspacecast %jl_value_t addrspace(10)* %4 to %jl_value_t addrspace(11)* %6 = bitcast %jl_value_t addrspace(11)* %5 to i64* addrspace(11)* %7 = load i64*, i64* addrspace(11)* %6, align 8 %8 = load i64, i64* %7, align 8 %9 = getelementptr i64, i64* %7, i64 1 %10 = load i64, i64* %9, align 8 %.sroa.0.0..sroa_idx = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 0 store i64 %8, i64 addrspace(11)* %.sroa.0.0..sroa_idx, align 8 %.sroa.3.0..sroa_idx13 = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 1 store i64 %10, i64 addrspace(11)* %.sroa.3.0..sroa_idx13, align 8 ;} ret 
void } julia> g(a) = @inbounds return reinterpret(Complex{Int64}, a)[1] g (generic function with 1 method) julia> @code_llvm g(randn(1000)) ; Function g ; Location: REPL[4] define void @julia_g_63642({ i64, i64 } addrspace(11)* noalias nocapture sret, %jl_value_t addrspace(10)* dereferenceable(40)) #0 { top: ; Location: REPL[4]:1 ; Function getindex; { ; Location: reinterpretarray.jl:31 %2 = addrspacecast %jl_value_t addrspace(10)* %1 to %jl_value_t addrspace(11)* %3 = bitcast %jl_value_t addrspace(11)* %2 to double* addrspace(11)* %4 = load double*, double* addrspace(11)* %3, align 8 %5 = bitcast double* %4 to i64* %6 = load i64, i64* %5, align 8 %7 = getelementptr double, double* %4, i64 1 %8 = bitcast double* %7 to i64* %9 = load i64, i64* %8, align 8 %.sroa.0.0..sroa_idx = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 0 store i64 %6, i64 addrspace(11)* %.sroa.0.0..sroa_idx, align 8 %.sroa.3.0..sroa_idx13 = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 1 store i64 %9, i64 addrspace(11)* %.sroa.3.0..sroa_idx13, align 8 ;} ret void } ``` In addition, the new `reinterpret` implementation is able to handle any AbstractArray (whether useful or not is a separate decision): ``` invoke(reinterpret, Tuple{Type{Complex{Float64}}, AbstractArray}, Complex{Float64}, speye(10)) 5×10 Base.ReinterpretArray{Complex{Float64},Float64,2,SparseMatrixCSC{Float64,Int64}}: 1.0+0.0im 0.0+1.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 1.0+0.0im 0.0+1.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 1.0+0.0im 0.0+1.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 1.0+0.0im 0.0+1.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 1.0+0.0im 0.0+1.0im ``` The remaining todo is to audit the uses of 
reinterpret in base. I've fixed up the uses themselves, but there's code deeper in the array code that needs to be broadened to allow ReinterpretArray. Fixes #22849 Fixes #19238

@inbounds

This redoes `reinterpret` in julia rather than punning the memory of the actual array. The motivation for this is to avoid the API limitations of the current reinterpret implementation (Array only, preventing strong TBAA, alignment problems). The surface API essentially unchanged, though the shape argument to reinterpret is removed, since those concepts are now orthogonal. The return type from `reinterpret` is now `ReinterpretArray`, which implements the AbstractArray interface and does the reinterpreting lazily on demand. The compiler is able to fold away the abstraction and generate very tight IR: ``` julia> ar = reinterpret(Complex{Int64}, rand(Int64, 1000)); julia> typeof(ar) Base.ReinterpretArray{Complex{Int64},Int64,1,Array{Int64,1}} julia> f(ar) = @inbounds return ar[1] f (generic function with 1 method) julia> @code_llvm f(ar) ; Function f ; Location: REPL[2] define void @julia_f_63575({ i64, i64 } addrspace(11)* noalias nocapture sret, %jl_value_t addrspace(10)* dereferenceable(8)) #0 { top: ; Location: REPL[2]:1 ; Function getindex; { ; Location: reinterpretarray.jl:31 %2 = addrspacecast %jl_value_t addrspace(10)* %1 to %jl_value_t addrspace(11)* %3 = bitcast %jl_value_t addrspace(11)* %2 to %jl_value_t addrspace(10)* addrspace(11)* %4 = load %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)* addrspace(11)* %3, align 8 %5 = addrspacecast %jl_value_t addrspace(10)* %4 to %jl_value_t addrspace(11)* %6 = bitcast %jl_value_t addrspace(11)* %5 to i64* addrspace(11)* %7 = load i64*, i64* addrspace(11)* %6, align 8 %8 = load i64, i64* %7, align 8 %9 = getelementptr i64, i64* %7, i64 1 %10 = load i64, i64* %9, align 8 %.sroa.0.0..sroa_idx = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 0 store i64 %8, i64 addrspace(11)* %.sroa.0.0..sroa_idx, align 8 %.sroa.3.0..sroa_idx13 = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 1 store i64 %10, i64 addrspace(11)* %.sroa.3.0..sroa_idx13, align 8 ;} ret 
void } julia> g(a) = @inbounds return reinterpret(Complex{Int64}, a)[1] g (generic function with 1 method) julia> @code_llvm g(randn(1000)) ; Function g ; Location: REPL[4] define void @julia_g_63642({ i64, i64 } addrspace(11)* noalias nocapture sret, %jl_value_t addrspace(10)* dereferenceable(40)) #0 { top: ; Location: REPL[4]:1 ; Function getindex; { ; Location: reinterpretarray.jl:31 %2 = addrspacecast %jl_value_t addrspace(10)* %1 to %jl_value_t addrspace(11)* %3 = bitcast %jl_value_t addrspace(11)* %2 to double* addrspace(11)* %4 = load double*, double* addrspace(11)* %3, align 8 %5 = bitcast double* %4 to i64* %6 = load i64, i64* %5, align 8 %7 = getelementptr double, double* %4, i64 1 %8 = bitcast double* %7 to i64* %9 = load i64, i64* %8, align 8 %.sroa.0.0..sroa_idx = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 0 store i64 %6, i64 addrspace(11)* %.sroa.0.0..sroa_idx, align 8 %.sroa.3.0..sroa_idx13 = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 1 store i64 %9, i64 addrspace(11)* %.sroa.3.0..sroa_idx13, align 8 ;} ret void } ``` In addition, the new `reinterpret` implementation is able to handle any AbstractArray (whether useful or not is a separate decision): ``` invoke(reinterpret, Tuple{Type{Complex{Float64}}, AbstractArray}, Complex{Float64}, speye(10)) 5×10 Base.ReinterpretArray{Complex{Float64},Float64,2,SparseMatrixCSC{Float64,Int64}}: 1.0+0.0im 0.0+1.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 1.0+0.0im 0.0+1.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 1.0+0.0im 0.0+1.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 1.0+0.0im 0.0+1.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 1.0+0.0im 0.0+1.0im ``` The remaining todo is to audit the uses of 
reinterpret in base. I've fixed up the uses themselves, but there's code deeper in the array code that needs to be broadened to allow ReinterpretArray. Fixes #22849 Fixes #19238

@inbounds

This redoes `reinterpret` in julia rather than punning the memory of the actual array. The motivation for this is to avoid the API limitations of the current reinterpret implementation (Array only, preventing strong TBAA, alignment problems). The surface API essentially unchanged, though the shape argument to reinterpret is removed, since those concepts are now orthogonal. The return type from `reinterpret` is now `ReinterpretArray`, which implements the AbstractArray interface and does the reinterpreting lazily on demand. The compiler is able to fold away the abstraction and generate very tight IR: ``` julia> ar = reinterpret(Complex{Int64}, rand(Int64, 1000)); julia> typeof(ar) Base.ReinterpretArray{Complex{Int64},Int64,1,Array{Int64,1}} julia> f(ar) = @inbounds return ar[1] f (generic function with 1 method) julia> @code_llvm f(ar) ; Function f ; Location: REPL[2] define void @julia_f_63575({ i64, i64 } addrspace(11)* noalias nocapture sret, %jl_value_t addrspace(10)* dereferenceable(8)) #0 { top: ; Location: REPL[2]:1 ; Function getindex; { ; Location: reinterpretarray.jl:31 %2 = addrspacecast %jl_value_t addrspace(10)* %1 to %jl_value_t addrspace(11)* %3 = bitcast %jl_value_t addrspace(11)* %2 to %jl_value_t addrspace(10)* addrspace(11)* %4 = load %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)* addrspace(11)* %3, align 8 %5 = addrspacecast %jl_value_t addrspace(10)* %4 to %jl_value_t addrspace(11)* %6 = bitcast %jl_value_t addrspace(11)* %5 to i64* addrspace(11)* %7 = load i64*, i64* addrspace(11)* %6, align 8 %8 = load i64, i64* %7, align 8 %9 = getelementptr i64, i64* %7, i64 1 %10 = load i64, i64* %9, align 8 %.sroa.0.0..sroa_idx = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 0 store i64 %8, i64 addrspace(11)* %.sroa.0.0..sroa_idx, align 8 %.sroa.3.0..sroa_idx13 = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 1 store i64 %10, i64 addrspace(11)* %.sroa.3.0..sroa_idx13, align 8 ;} ret 
void } julia> g(a) = @inbounds return reinterpret(Complex{Int64}, a)[1] g (generic function with 1 method) julia> @code_llvm g(randn(1000)) ; Function g ; Location: REPL[4] define void @julia_g_63642({ i64, i64 } addrspace(11)* noalias nocapture sret, %jl_value_t addrspace(10)* dereferenceable(40)) #0 { top: ; Location: REPL[4]:1 ; Function getindex; { ; Location: reinterpretarray.jl:31 %2 = addrspacecast %jl_value_t addrspace(10)* %1 to %jl_value_t addrspace(11)* %3 = bitcast %jl_value_t addrspace(11)* %2 to double* addrspace(11)* %4 = load double*, double* addrspace(11)* %3, align 8 %5 = bitcast double* %4 to i64* %6 = load i64, i64* %5, align 8 %7 = getelementptr double, double* %4, i64 1 %8 = bitcast double* %7 to i64* %9 = load i64, i64* %8, align 8 %.sroa.0.0..sroa_idx = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 0 store i64 %6, i64 addrspace(11)* %.sroa.0.0..sroa_idx, align 8 %.sroa.3.0..sroa_idx13 = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 1 store i64 %9, i64 addrspace(11)* %.sroa.3.0..sroa_idx13, align 8 ;} ret void } ``` In addition, the new `reinterpret` implementation is able to handle any AbstractArray (whether useful or not is a separate decision): ``` invoke(reinterpret, Tuple{Type{Complex{Float64}}, AbstractArray}, Complex{Float64}, speye(10)) 5×10 Base.ReinterpretArray{Complex{Float64},Float64,2,SparseMatrixCSC{Float64,Int64}}: 1.0+0.0im 0.0+1.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 1.0+0.0im 0.0+1.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 1.0+0.0im 0.0+1.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 1.0+0.0im 0.0+1.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 1.0+0.0im 0.0+1.0im ``` The remaining todo is to audit the uses of 
reinterpret in base. I've fixed up the uses themselves, but there's code deeper in the array code that needs to be broadened to allow ReinterpretArray. Fixes #22849 Fixes #19238

@inbounds

This redoes `reinterpret` in julia rather than punning the memory of the actual array. The motivation for this is to avoid the API limitations of the current reinterpret implementation (Array only, preventing strong TBAA, alignment problems). The surface API essentially unchanged, though the shape argument to reinterpret is removed, since those concepts are now orthogonal. The return type from `reinterpret` is now `ReinterpretArray`, which implements the AbstractArray interface and does the reinterpreting lazily on demand. The compiler is able to fold away the abstraction and generate very tight IR: ``` julia> ar = reinterpret(Complex{Int64}, rand(Int64, 1000)); julia> typeof(ar) Base.ReinterpretArray{Complex{Int64},Int64,1,Array{Int64,1}} julia> f(ar) = @inbounds return ar[1] f (generic function with 1 method) julia> @code_llvm f(ar) ; Function f ; Location: REPL[2] define void @julia_f_63575({ i64, i64 } addrspace(11)* noalias nocapture sret, %jl_value_t addrspace(10)* dereferenceable(8)) #0 { top: ; Location: REPL[2]:1 ; Function getindex; { ; Location: reinterpretarray.jl:31 %2 = addrspacecast %jl_value_t addrspace(10)* %1 to %jl_value_t addrspace(11)* %3 = bitcast %jl_value_t addrspace(11)* %2 to %jl_value_t addrspace(10)* addrspace(11)* %4 = load %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)* addrspace(11)* %3, align 8 %5 = addrspacecast %jl_value_t addrspace(10)* %4 to %jl_value_t addrspace(11)* %6 = bitcast %jl_value_t addrspace(11)* %5 to i64* addrspace(11)* %7 = load i64*, i64* addrspace(11)* %6, align 8 %8 = load i64, i64* %7, align 8 %9 = getelementptr i64, i64* %7, i64 1 %10 = load i64, i64* %9, align 8 %.sroa.0.0..sroa_idx = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 0 store i64 %8, i64 addrspace(11)* %.sroa.0.0..sroa_idx, align 8 %.sroa.3.0..sroa_idx13 = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 1 store i64 %10, i64 addrspace(11)* %.sroa.3.0..sroa_idx13, align 8 ;} ret 
void } julia> g(a) = @inbounds return reinterpret(Complex{Int64}, a)[1] g (generic function with 1 method) julia> @code_llvm g(randn(1000)) ; Function g ; Location: REPL[4] define void @julia_g_63642({ i64, i64 } addrspace(11)* noalias nocapture sret, %jl_value_t addrspace(10)* dereferenceable(40)) #0 { top: ; Location: REPL[4]:1 ; Function getindex; { ; Location: reinterpretarray.jl:31 %2 = addrspacecast %jl_value_t addrspace(10)* %1 to %jl_value_t addrspace(11)* %3 = bitcast %jl_value_t addrspace(11)* %2 to double* addrspace(11)* %4 = load double*, double* addrspace(11)* %3, align 8 %5 = bitcast double* %4 to i64* %6 = load i64, i64* %5, align 8 %7 = getelementptr double, double* %4, i64 1 %8 = bitcast double* %7 to i64* %9 = load i64, i64* %8, align 8 %.sroa.0.0..sroa_idx = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 0 store i64 %6, i64 addrspace(11)* %.sroa.0.0..sroa_idx, align 8 %.sroa.3.0..sroa_idx13 = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 1 store i64 %9, i64 addrspace(11)* %.sroa.3.0..sroa_idx13, align 8 ;} ret void } ``` In addition, the new `reinterpret` implementation is able to handle any AbstractArray (whether useful or not is a separate decision): ``` invoke(reinterpret, Tuple{Type{Complex{Float64}}, AbstractArray}, Complex{Float64}, speye(10)) 5×10 Base.ReinterpretArray{Complex{Float64},Float64,2,SparseMatrixCSC{Float64,Int64}}: 1.0+0.0im 0.0+1.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 1.0+0.0im 0.0+1.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 1.0+0.0im 0.0+1.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 1.0+0.0im 0.0+1.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 1.0+0.0im 0.0+1.0im ``` The remaining todo is to audit the uses of 
reinterpret in base. I've fixed up the uses themselves, but there's code deeper in the array code that needs to be broadened to allow ReinterpretArray. Fixes #22849 Fixes #19238

@inbounds

This redoes `reinterpret` in julia rather than punning the memory of the actual array. The motivation for this is to avoid the API limitations of the current reinterpret implementation (Array only, preventing strong TBAA, alignment problems). The surface API is essentially unchanged, though the shape argument to reinterpret is removed, since those concepts are now orthogonal. The return type from `reinterpret` is now `ReinterpretArray`, which implements the AbstractArray interface and does the reinterpreting lazily on demand. The compiler is able to fold away the abstraction and generate very tight IR:

```
julia> ar = reinterpret(Complex{Int64}, rand(Int64, 1000));

julia> typeof(ar)
Base.ReinterpretArray{Complex{Int64},Int64,1,Array{Int64,1}}

julia> f(ar) = @inbounds return ar[1]
f (generic function with 1 method)

julia> @code_llvm f(ar)

; Function f
; Location: REPL[2]
define void @julia_f_63575({ i64, i64 } addrspace(11)* noalias nocapture sret, %jl_value_t addrspace(10)* dereferenceable(8)) #0 {
top:
; Location: REPL[2]:1
; Function getindex; {
; Location: reinterpretarray.jl:31
  %2 = addrspacecast %jl_value_t addrspace(10)* %1 to %jl_value_t addrspace(11)*
  %3 = bitcast %jl_value_t addrspace(11)* %2 to %jl_value_t addrspace(10)* addrspace(11)*
  %4 = load %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)* addrspace(11)* %3, align 8
  %5 = addrspacecast %jl_value_t addrspace(10)* %4 to %jl_value_t addrspace(11)*
  %6 = bitcast %jl_value_t addrspace(11)* %5 to i64* addrspace(11)*
  %7 = load i64*, i64* addrspace(11)* %6, align 8
  %8 = load i64, i64* %7, align 8
  %9 = getelementptr i64, i64* %7, i64 1
  %10 = load i64, i64* %9, align 8
  %.sroa.0.0..sroa_idx = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 0
  store i64 %8, i64 addrspace(11)* %.sroa.0.0..sroa_idx, align 8
  %.sroa.3.0..sroa_idx13 = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 1
  store i64 %10, i64 addrspace(11)* %.sroa.3.0..sroa_idx13, align 8
;}
  ret void
}

julia> g(a) = @inbounds return reinterpret(Complex{Int64}, a)[1]
g (generic function with 1 method)

julia> @code_llvm g(randn(1000))

; Function g
; Location: REPL[4]
define void @julia_g_63642({ i64, i64 } addrspace(11)* noalias nocapture sret, %jl_value_t addrspace(10)* dereferenceable(40)) #0 {
top:
; Location: REPL[4]:1
; Function getindex; {
; Location: reinterpretarray.jl:31
  %2 = addrspacecast %jl_value_t addrspace(10)* %1 to %jl_value_t addrspace(11)*
  %3 = bitcast %jl_value_t addrspace(11)* %2 to double* addrspace(11)*
  %4 = load double*, double* addrspace(11)* %3, align 8
  %5 = bitcast double* %4 to i64*
  %6 = load i64, i64* %5, align 8
  %7 = getelementptr double, double* %4, i64 1
  %8 = bitcast double* %7 to i64*
  %9 = load i64, i64* %8, align 8
  %.sroa.0.0..sroa_idx = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 0
  store i64 %6, i64 addrspace(11)* %.sroa.0.0..sroa_idx, align 8
  %.sroa.3.0..sroa_idx13 = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 1
  store i64 %9, i64 addrspace(11)* %.sroa.3.0..sroa_idx13, align 8
;}
  ret void
}
```

In addition, the new `reinterpret` implementation is able to handle any AbstractArray (whether useful or not is a separate decision):

```
invoke(reinterpret, Tuple{Type{Complex{Float64}}, AbstractArray}, Complex{Float64}, speye(10))
5×10 Base.ReinterpretArray{Complex{Float64},Float64,2,SparseMatrixCSC{Float64,Int64}}:
 1.0+0.0im  0.0+1.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im
 0.0+0.0im  0.0+0.0im  1.0+0.0im  0.0+1.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im
 0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  1.0+0.0im  0.0+1.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im
 0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  1.0+0.0im  0.0+1.0im  0.0+0.0im  0.0+0.0im
 0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  1.0+0.0im  0.0+1.0im
```

The remaining todo is to audit the uses of reinterpret in base. I've fixed up the uses themselves, but there's code deeper in the array code that needs to be broadened to allow ReinterpretArray.

Fixes #22849
Fixes #19238
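To make the "reinterpreting lazily on demand" idea concrete, here is a minimal sketch of a wrapper whose `getindex` loads from the parent array and repacks the bits on the stack. It is not the code in Base: the name `LazyReinterpret` is hypothetical, and the sketch assumes one-dimensional parents, isbits element types, and a target size that is a whole multiple of the source size.

```julia
# Hypothetical illustration only; Base.ReinterpretArray is more general.
struct LazyReinterpret{T,S,A<:AbstractVector{S}} <: AbstractVector{T}
    parent::A
end

function LazyReinterpret(::Type{T}, a::AbstractVector{S}) where {T,S}
    (isbitstype(T) && isbitstype(S)) ||
        throw(ArgumentError("both element types must be isbits"))
    sizeof(T) % sizeof(S) == 0 ||
        throw(ArgumentError("sizeof(T) must be a multiple of sizeof(S) in this sketch"))
    (length(a) * sizeof(S)) % sizeof(T) == 0 ||
        throw(ArgumentError("parent length does not divide evenly into T elements"))
    return LazyReinterpret{T,S,typeof(a)}(a)
end

Base.size(r::LazyReinterpret{T,S}) where {T,S} =
    (length(r.parent) * sizeof(S) ÷ sizeof(T),)
Base.IndexStyle(::Type{<:LazyReinterpret}) = IndexLinear()

function Base.getindex(r::LazyReinterpret{T,S}, i::Int) where {T,S}
    @boundscheck checkbounds(r, i)
    n = sizeof(T) ÷ sizeof(S)          # source elements per target element
    out = Ref{T}()                     # scratch storage for one T
    GC.@preserve out begin
        dst = convert(Ptr{S}, Base.unsafe_convert(Ptr{T}, out))
        for k in 1:n
            # Ordinary, correctly typed loads from the parent array.
            unsafe_store!(dst, r.parent[(i - 1) * n + k], k)
        end
    end
    return out[]
end

a = Int64[1, 2, 3, 4]
r = LazyReinterpret(Complex{Int64}, a)
```

The parent array keeps its original element type throughout, so no `Array` ever holds incorrectly typed storage; only the scratch `Ref` is accessed under two types.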


mschauer · 2017-12-09T12:22:14Z

For myself, and maybe others with the same initial reception of this:

TBAA is "type-based alias analysis", https://en.wikipedia.org/wiki/Alias_analysis
If two variables of different types cannot share the same memory location simultaneously, code can be optimised in a different way.
The old reinterpret implementation breaks this: its existence prevents those optimisations and thus makes completely unrelated code slower.
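A small sketch of the kind of code this affects; the function name `store_then_load` is made up for illustration, and whether the compiler actually applies the optimisation depends on the Julia version.

```julia
# If a Vector{Int64} and a Vector{Float64} can never share memory, the
# compiler is free to reuse `x` for the second read of ints[1] instead of
# reloading it after the store to `floats`.
function store_then_load(ints::Vector{Int64}, floats::Vector{Float64})
    x = ints[1]
    floats[1] = 1.0
    return x + ints[1]
end

a = Int64[21]
total = store_then_load(a, [0.0])

# With the lazy implementation, the reinterpreted view is a distinct wrapper
# type rather than an Array, so it cannot be passed as `floats` above.
v = reinterpret(Float64, a)
v[1] = 1.0   # writes through to the parent's Int64 storage
```

Under the old implementation, `reinterpret(Float64, a)` returned a real `Vector{Float64}` backed by the same memory as `a`, so a call like `store_then_load(a, reinterpret(Float64, a))` was legal and the compiler had to assume the two arguments might alias.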

Keno added this to the 1.0 milestone Jul 17, 2017

Keno changed the title ~~Get rid of reinterpret in it's current form~~ Get rid of reinterpret in its current form Jul 17, 2017

kshyatt added the types and dispatch Types, subtyping and method dispatch label Jul 18, 2017

JeffBezanson assigned Keno Aug 10, 2017

StefanKarpinski added the arrays [a, r, r, a, y, s] label Aug 31, 2017

ajkeller34 mentioned this issue Sep 7, 2017

Adds overloads for rand and ones PainterQubits/Unitful.jl#96

Merged

Keno mentioned this issue Sep 18, 2017

Implement ReinterpretArray #23750

Merged

JeffBezanson added compiler:codegen Generation of LLVM IR and native code and removed types and dispatch Types, subtyping and method dispatch labels Sep 19, 2017

Keno closed this as completed in #23750 Oct 9, 2017

albop mentioned this issue Dec 21, 2017

ENH: special object for decision rules. EconForge/Dolo.jl#126

Closed

mschauer mentioned this issue May 22, 2018

Reinterpret an Array of Float64 as an Array of SVector{Float64} ? JuliaArrays/StaticArrays.jl#410

Closed

JeffFessler mentioned this issue Nov 21, 2022

Why is reinterpret prohibited? JuliaSparse/SparseArrays.jl#289

Closed
