Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve reinterpret #27213

Closed
wants to merge 2 commits into from
Closed

Improve reinterpret #27213

wants to merge 2 commits into from

Conversation

Keno
Copy link
Member

@Keno Keno commented May 23, 2018

This fixes two issues with reinterpret. As always, the commit messages have details.

The first commit fixes #25908, by making the problematic reinterprets an error. We can choose more lax semantics at a later point, but this seems like the right behavior for 1.0.
The second fixes #25014 by adding a special case for reinterpret(T, ::Array). We should also investigate why LLVM stopped being able to optimize this, but that's for another time.

end

using .Iterators: Stateful
@pure function array_subpadding(S, T)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stateful isn't pure and will break inference


# Special case for StridedArray
reinterpret_alignment(::Type{ReinterpretArray{T,N,S,A}} where {N, S, A<:Array}) where {T} = datatype_alignment(T)
reinterpret_alignment(::Type{<:ReinterpretArray}) = gcd(datatype_alignment(T), datatype_alignment(S))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What’s T, S?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is indeed a bug (seems very likely), it would be good to have a test case that would catch it.

@andyferris
Copy link
Member

andyferris commented May 24, 2018

EDIT: scratch that, bad benchmark, I'll try again sorry.

@andyferris
Copy link
Member

andyferris commented May 24, 2018

I can confirm that this improves performance significantly for a few of my simpler use cases that I checked (such as finding the means and covariance matrices of vectors of 3-vectors). I did observe that there is sometimes still a small, but tolerable, speed difference between Array and ReinterpretArray (EDIT: but no difference in many cases).

(I didn't dig in too deeply but I did observe that iteration over ReinterpretArray might not be quite as fast as iterating over the indices and doing getindex, (all within @inbounds). Sorry if that's a red herring but I got a 20% speedup by doing that for ReinterpretArray, but it didn't make a difference for Array)

@Keno
Copy link
Member Author

Keno commented May 24, 2018

If there's a benchmark you care about, ideally add it to BaseBenchmarks, so it'll get tracked.

@andyferris
Copy link
Member

True - I guess we can find something roughly equivalent that doesn't need using StaticArrays (I don't think including that would be a good idea).

Keno added 2 commits May 24, 2018 19:12
In #25908 it was noted that reinterpreting structures with paddings
exposes undef LLVM values to user code. This is problematic, because
an LLVM undef value is quite dangerous (it can have a different value
at every use, e.g. for `a::Bool` undef, we can have `a || !a == true`.
There are proposal in LLVM to create values that are merely arbitrary
(but the same at every use), but that capability does not currently
exist in LLVM. As such, we should try hard to prevent `undef` showing
up in a user-visible way. There are several ways to fix this:
 1. Wait until LLVM comes up with a safer `undef` and have the value merely
    be arbitrary, but not dangerous.
 2. Always guarantee that padding bytes will be 0.
 3. For contiguous-memory arrays, guarantee that we end up with the underlying
    bytes from that array.

However, for now, I think don't think we should make a choice here. Issues
like #21912, may play into the consideration, and I think we should be
able to reserve making a choice until that point. So what this PR does
is only allow reinterprets when they would not expose padding. This should
hopefully cover the most common use cases of reinterpret:
 - Reinterpreting a vector or matrix of values to StaticVectors of the same
   element type. These should generally always have compatiable padding (if
   not, reinterpret was likely the wrong API to use).
 - Reinterpreting from a Vector{UInt8} to a vector of structs (that may have padding).
   This PR allows this for reading (but not for writing). Both cases are generally
   better served by the IO APIs, but hopefully this should still allow the
   common cases.

Fixes #25908
When I originally wrote the new ReinterpretArray code, I made sure that LLVM was able
to optimize reinterpret(::Array) back to a single memory access with appropriate TBAA
and alignment info. Somewhere along the line LLVM lost that ability. While we should
try to recover that capability in LLVM, that showed that that is a relatively brittle
optimization for a very simple operation. So this patch takes a different approach:
We add two new intrinsics `tbaa_pointerref` and `tbaa_pointerset` that behave like
their non-TBAA variants, but additionally take a type to use as the TBAA tag. This
allows us to write a special case for `reinterpret(T, ::Array)` that directly emits
the correct pointer access. It's also a model for what a post-1.0 pure Julia
implementation of `Array` (e.g. on top of a buffer type) may look like.

Fixes #25014
@vtjnash
Copy link
Member

vtjnash commented May 25, 2018

I don't think we should implement new intrinsics just before the release. After the release, we can consider whether the complexity is worthwhile, but:

but additionally take a type to use as the TBAA tag

That's not what a TBAA tag is. TBAA is a path-based constraint on what type of pointer you might have. The Julia type tag is just one of the possible sources of that root of that information. But we also might include additional information about the aliasing set (where the pointer was allocated). That tag might even change due to a reinterpret or field access, while a TBAA annotation on a subsequent load would need to use the tag from the original object allocation.

Copy link
Member

@vtjnash vtjnash left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove @pure and tbaa usages

@Keno
Copy link
Member Author

Keno commented May 25, 2018

That's not what a TBAA tag is. TBAA is a path-based constraint on what type of pointer you might have. The Julia type tag is just one of the possible sources of that root of that information. But we also might include additional information about the aliasing set (where the pointer was allocated). That tag might even change due to a reinterpret or field access, while a TBAA annotation on a subsequent load would need to use the tag from the original object allocation.

I do know what TBAA is. The whole point of the reinterpret changes was to be able to separate out the TBAA for the different types of arrays. If you look through this change again, you will see that it indeed uses the original allocated type of the array as the source of the TBAA information.

@Keno
Copy link
Member Author

Keno commented May 25, 2018

As for @pure, I've added a constructor to Stateful that does not call back into inference and used that. I assume that was your objection. If it was, not, please explain what it was and suggest how to improve it.

@kshyatt kshyatt added the types and dispatch Types, subtyping and method dispatch label May 27, 2018
@favba
Copy link
Contributor

favba commented Jun 19, 2018

Will this be part of v7.0? Or will it be postponed to v1.x?

@StefanKarpinski
Copy link
Member

StefanKarpinski commented Jun 19, 2018

It's a breaking change (IIUC) so it seems like it would be good to get it in if possible.

@andyferris
Copy link
Member

The performance regression this fixes is pretty catastrophic for my workloads, so it would be really great if we could get this in :)

@KristofferC
Copy link
Member

Bump, should this be triaged? I also have code that is now bad performance and not sure how to rewrite it to be as efficient as it used to be.

@StefanKarpinski StefanKarpinski added the triage This should be discussed on a triage call label Jun 26, 2018
@Keno
Copy link
Member Author

Keno commented Jun 28, 2018

Repeating my request from triage: @vtjnash If you think there is a case in which the current implementation would cause bugs or other incorrect behavior, please say what that case is. Otherwise, I think we can merge this.

@vtjnash
Copy link
Member

vtjnash commented Jun 28, 2018

Please split the breaking changes from the performance concerns

@JeffBezanson
Copy link
Member

Agreed, that's always for the best.

@Keno
Copy link
Member Author

Keno commented Jun 28, 2018

There's two commits in this PR, one for the breaking changes, one for the performance concerns.

@Keno
Copy link
Member Author

Keno commented Jun 29, 2018

I suggested to Jameson to replace the tbaa intrinsic by an intrinsic that takes the array as well as an element type and a byte offset. I'll split this PR to get the breaking change in first in a separate PR and rework the performance part.

@JeffBezanson
Copy link
Member

I don't think this is blocking for 0.7.

@JeffBezanson JeffBezanson added this to the 1.0 milestone Jul 12, 2018
@KristofferC
Copy link
Member

So for 0.7, what is the upgrade path? Use pointers and unsafe_load + unsafe_store!?

@andyferris
Copy link
Member

I agree - we need a viable workaround, please.

@vtjnash
Copy link
Member

vtjnash commented Jul 18, 2018

We're currently hoping that LLVM recognizes that:

nbytes_copied = 0
# This is a bit complicated to deal with partial elements
# at both the start and the end. LLVM will fold as appropriate,
# once it knows the data layout
while nbytes_copied < sizeof(T)
s[] = a.parent[ind_start + i, tailinds...]
while nbytes_copied < sizeof(T) && sidx < sizeof(S)
unsafe_store!(tptr, unsafe_load(sptr, sidx + 1), nbytes_copied + 1)
sidx += 1
nbytes_copied += 1
end
sidx = 0
i += 1
end
is actually a call to memcpy (which it then is optimized to handle), maybe we could just make that more explicit to it?

nbytes_copied = 0 
# This is a bit complicated to deal with partial elements 
# at both the start and the end. LLVM will fold as appropriate, 
# once it knows the data layout 
while nbytes_copied < sizeof(T)
    s[] = a.parent[ind_start + i, tailinds...]
    tocopy = min(sizeof(T) - nbytes_copied, sizeof(S) - sidx)
    unsafe_copy!(tptr + nbytes_copied, sptr + sidx, tocopy)
    nbytes_copied += tocopy
    sidx = 0
    i += 1
end

@Keno
Copy link
Member Author

Keno commented Jul 19, 2018

We've done the breaking changes here. I still plan to do some work on performance before 0.7. Closing this since it's outdated, but leave the branch around for reference.

@Keno Keno closed this Jul 19, 2018
@Keno Keno removed the triage This should be discussed on a triage call label Jul 19, 2018
@DilumAluthge DilumAluthge deleted the kf/reinterpretredux branch March 25, 2021 22:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
types and dispatch Types, subtyping and method dispatch
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Reinterpret and structure padding Performance concerns with ReinterpretArray
8 participants