Improve reinterpret #27213

Keno · 2018-05-23T02:04:19Z

This fixes two issues with reinterpret. As always, the commit messages have details.

The first commit fixes #25908, by making the problematic reinterprets an error. We can choose more lax semantics at a later point, but this seems like the right behavior for 1.0.
The second fixes #25014 by adding a special case for reinterpret(T, ::Array). We should also investigate why LLVM stopped being able to optimize this, but that's for another time.

vtjnash · 2018-05-23T06:37:39Z

base/reinterpretarray.jl

+end
+
+using .Iterators: Stateful
+@pure function array_subpadding(S, T)


Stateful isn't pure and will break inference

andyferris · 2018-05-23T09:34:21Z

base/reinterpretarray.jl

+
+# Special case for StridedArray
+reinterpret_alignment(::Type{ReinterpretArray{T,N,S,A}} where {N, S, A<:Array}) where {T} = datatype_alignment(T)
+reinterpret_alignment(::Type{<:ReinterpretArray}) = gcd(datatype_alignment(T), datatype_alignment(S))


What’s T, S?

If this is indeed a bug (seems very likely), it would be good to have a test case that would catch it.

andyferris · 2018-05-24T06:50:20Z

EDIT: scratch that, bad benchmark, I'll try again sorry.

andyferris · 2018-05-24T23:03:23Z

I can confirm that this improves performance significantly for a few of my simpler use cases that I checked (such as finding the means and covariance matrices of vectors of 3-vectors). I did observe that there is sometimes still a small, but tolerable, speed difference between Array and ReinterpretArray (EDIT: but no difference in many cases).

(I didn't dig in too deeply but I did observe that iteration over ReinterpretArray might not be quite as fast as iterating over the indices and doing getindex, (all within @inbounds). Sorry if that's a red herring but I got a 20% speedup by doing that for ReinterpretArray, but it didn't make a difference for Array)

Keno · 2018-05-24T23:05:13Z

If there's a benchmark you care about, ideally add it to BaseBenchmarks, so it'll get tracked.

andyferris · 2018-05-24T23:09:31Z

True - I guess we can find something roughly equivalent that doesn't need using StaticArrays (I don't think including that would be a good idea).

In #25908 it was noted that reinterpreting structures with paddings exposes undef LLVM values to user code. This is problematic, because an LLVM undef value is quite dangerous (it can have a different value at every use, e.g. for `a::Bool` undef, we can have `a || !a == true`. There are proposal in LLVM to create values that are merely arbitrary (but the same at every use), but that capability does not currently exist in LLVM. As such, we should try hard to prevent `undef` showing up in a user-visible way. There are several ways to fix this: 1. Wait until LLVM comes up with a safer `undef` and have the value merely be arbitrary, but not dangerous. 2. Always guarantee that padding bytes will be 0. 3. For contiguous-memory arrays, guarantee that we end up with the underlying bytes from that array. However, for now, I think don't think we should make a choice here. Issues like #21912, may play into the consideration, and I think we should be able to reserve making a choice until that point. So what this PR does is only allow reinterprets when they would not expose padding. This should hopefully cover the most common use cases of reinterpret: - Reinterpreting a vector or matrix of values to StaticVectors of the same element type. These should generally always have compatiable padding (if not, reinterpret was likely the wrong API to use). - Reinterpreting from a Vector{UInt8} to a vector of structs (that may have padding). This PR allows this for reading (but not for writing). Both cases are generally better served by the IO APIs, but hopefully this should still allow the common cases. Fixes #25908

When I originally wrote the new ReinterpretArray code, I made sure that LLVM was able to optimize reinterpret(::Array) back to a single memory access with appropriate TBAA and alignment info. Somewhere along the line LLVM lost that ability. While we should try to recover that capability in LLVM, that showed that that is a relatively brittle optimization for a very simple operation. So this patch takes a different approach: We add two new intrinsics `tbaa_pointerref` and `tbaa_pointerset` that behave like their non-TBAA variants, but additionally take a type to use as the TBAA tag. This allows us to write a special case for `reinterpret(T, ::Array)` that directly emits the correct pointer access. It's also a model for what a post-1.0 pure Julia implementation of `Array` (e.g. on top of a buffer type) may look like. Fixes #25014

vtjnash · 2018-05-25T15:21:56Z

I don't think we should implement new intrinsics just before the release. After the release, we can consider whether the complexity is worthwhile, but:

but additionally take a type to use as the TBAA tag

That's not what a TBAA tag is. TBAA is a path-based constraint on what type of pointer you might have. The Julia type tag is just one of the possible sources of that root of that information. But we also might include additional information about the aliasing set (where the pointer was allocated). That tag might even change due to a reinterpret or field access, while a TBAA annotation on a subsequent load would need to use the tag from the original object allocation.

vtjnash

Remove @pure and tbaa usages

Keno · 2018-05-25T16:01:42Z

That's not what a TBAA tag is. TBAA is a path-based constraint on what type of pointer you might have. The Julia type tag is just one of the possible sources of that root of that information. But we also might include additional information about the aliasing set (where the pointer was allocated). That tag might even change due to a reinterpret or field access, while a TBAA annotation on a subsequent load would need to use the tag from the original object allocation.

I do know what TBAA is. The whole point of the reinterpret changes was to be able to separate out the TBAA for the different types of arrays. If you look through this change again, you will see that it indeed uses the original allocated type of the array as the source of the TBAA information.

Keno · 2018-05-25T17:44:54Z

As for @pure, I've added a constructor to Stateful that does not call back into inference and used that. I assume that was your objection. If it was, not, please explain what it was and suggest how to improve it.

favba · 2018-06-19T13:16:05Z

Will this be part of v7.0? Or will it be postponed to v1.x?

StefanKarpinski · 2018-06-19T13:56:48Z

It's a breaking change (IIUC) so it seems like it would be good to get it in if possible.

andyferris · 2018-06-20T23:42:27Z

The performance regression this fixes is pretty catastrophic for my workloads, so it would be really great if we could get this in :)

KristofferC · 2018-06-26T10:32:14Z

Bump, should this be triaged? I also have code that is now bad performance and not sure how to rewrite it to be as efficient as it used to be.

Keno · 2018-06-28T19:15:02Z

Repeating my request from triage: @vtjnash If you think there is a case in which the current implementation would cause bugs or other incorrect behavior, please say what that case is. Otherwise, I think we can merge this.

vtjnash · 2018-06-28T19:21:42Z

Please split the breaking changes from the performance concerns

JeffBezanson · 2018-06-28T19:30:31Z

Agreed, that's always for the best.

Keno · 2018-06-28T19:32:18Z

There's two commits in this PR, one for the breaking changes, one for the performance concerns.

Keno · 2018-06-29T17:33:13Z

I suggested to Jameson to replace the tbaa intrinsic by an intrinsic that takes the array as well as an element type and a byte offset. I'll split this PR to get the breaking change in first in a separate PR and rework the performance part.

JeffBezanson · 2018-07-12T19:06:23Z

I don't think this is blocking for 0.7.

KristofferC · 2018-07-18T08:56:56Z

So for 0.7, what is the upgrade path? Use pointers and unsafe_load + unsafe_store!?

andyferris · 2018-07-18T10:03:47Z

I agree - we need a viable workaround, please.

vtjnash · 2018-07-18T14:04:58Z

We're currently hoping that LLVM recognizes that:

julia/base/reinterpretarray.jl

Lines 120 to 133 in 5369849

    
           nbytes_copied = 0 
        
           # This is a bit complicated to deal with partial elements 
        
           # at both the start and the end. LLVM will fold as appropriate, 
        
           # once it knows the data layout 
        
           while nbytes_copied < sizeof(T) 
        
               s[] = a.parent[ind_start + i, tailinds...] 
        
               while nbytes_copied < sizeof(T) && sidx < sizeof(S) 
        
                   unsafe_store!(tptr, unsafe_load(sptr, sidx + 1), nbytes_copied + 1) 
        
                   sidx += 1 
        
                   nbytes_copied += 1 
        
               end 
        
               sidx = 0 
        
               i += 1 
        
           end

is actually a call to memcpy (which it then is optimized to handle), maybe we could just make that more explicit to it?

nbytes_copied = 0 
# This is a bit complicated to deal with partial elements 
# at both the start and the end. LLVM will fold as appropriate, 
# once it knows the data layout 
while nbytes_copied < sizeof(T)
    s[] = a.parent[ind_start + i, tailinds...]
    tocopy = min(sizeof(T) - nbytes_copied, sizeof(S) - sidx)
    unsafe_copy!(tptr + nbytes_copied, sptr + sidx, tocopy)
    nbytes_copied += tocopy
    sidx = 0
    i += 1
end

Keno · 2018-07-19T18:37:55Z

We've done the breaking changes here. I still plan to do some work on performance before 0.7. Closing this since it's outdated, but leave the branch around for reference.

vtjnash reviewed May 23, 2018

View reviewed changes

mschauer mentioned this pull request May 23, 2018

Reinterpret an Array of Float64 as an Array of SVector{Float64} ? JuliaArrays/StaticArrays.jl#410

Closed

andyferris reviewed May 23, 2018

View reviewed changes

Keno added 2 commits May 24, 2018 19:12

Keno force-pushed the kf/reinterpretredux branch from 65fb7f2 to ddf6d9f Compare May 24, 2018 23:13

vtjnash requested changes May 25, 2018

View reviewed changes

kshyatt added the types and dispatch Types, subtyping and method dispatch label May 27, 2018

StefanKarpinski added the triage This should be discussed on a triage call label Jun 26, 2018

Keno mentioned this pull request Jun 30, 2018

Don't allow reinterprets that would expose padding #27877

Merged

timholy mentioned this pull request Jul 2, 2018

ReinterpretArray and IndexStyle #27909

Merged

JeffBezanson added this to the 1.0 milestone Jul 12, 2018

Keno closed this Jul 19, 2018

Keno removed the triage This should be discussed on a triage call label Jul 19, 2018

JonasIsensee mentioned this pull request Jul 20, 2018

Throw a warning when creating Dataset of D > 100 JuliaDynamics/DynamicalSystemsBase.jl#70

Closed

chethega mentioned this pull request Sep 3, 2018

Performance of ReinterpretArray, continued #28980

Closed

DilumAluthge deleted the kf/reinterpretredux branch March 25, 2021 22:05

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve reinterpret #27213

Improve reinterpret #27213

Keno commented May 23, 2018

vtjnash May 23, 2018

andyferris May 23, 2018

StefanKarpinski May 23, 2018

andyferris commented May 24, 2018 •

edited

Loading

andyferris commented May 24, 2018 •

edited

Loading

Keno commented May 24, 2018

andyferris commented May 24, 2018

vtjnash commented May 25, 2018

vtjnash left a comment

Keno commented May 25, 2018

Keno commented May 25, 2018

favba commented Jun 19, 2018

StefanKarpinski commented Jun 19, 2018 •

edited

Loading

andyferris commented Jun 20, 2018

KristofferC commented Jun 26, 2018

Keno commented Jun 28, 2018

vtjnash commented Jun 28, 2018

JeffBezanson commented Jun 28, 2018

Keno commented Jun 28, 2018

Keno commented Jun 29, 2018

JeffBezanson commented Jul 12, 2018

KristofferC commented Jul 18, 2018

andyferris commented Jul 18, 2018

vtjnash commented Jul 18, 2018

Keno commented Jul 19, 2018

Improve reinterpret #27213

Improve reinterpret #27213

Conversation

Keno commented May 23, 2018

vtjnash May 23, 2018

Choose a reason for hiding this comment

andyferris May 23, 2018

Choose a reason for hiding this comment

StefanKarpinski May 23, 2018

Choose a reason for hiding this comment

andyferris commented May 24, 2018 • edited Loading

andyferris commented May 24, 2018 • edited Loading

Keno commented May 24, 2018

andyferris commented May 24, 2018

vtjnash commented May 25, 2018

vtjnash left a comment

Choose a reason for hiding this comment

Keno commented May 25, 2018

Keno commented May 25, 2018

favba commented Jun 19, 2018

StefanKarpinski commented Jun 19, 2018 • edited Loading

andyferris commented Jun 20, 2018

KristofferC commented Jun 26, 2018

Keno commented Jun 28, 2018

vtjnash commented Jun 28, 2018

JeffBezanson commented Jun 28, 2018

Keno commented Jun 28, 2018

Keno commented Jun 29, 2018

JeffBezanson commented Jul 12, 2018

KristofferC commented Jul 18, 2018

andyferris commented Jul 18, 2018

vtjnash commented Jul 18, 2018

Keno commented Jul 19, 2018

andyferris commented May 24, 2018 •

edited

Loading

andyferris commented May 24, 2018 •

edited

Loading

StefanKarpinski commented Jun 19, 2018 •

edited

Loading