Improve reinterpret #27213
end

using .Iterators: Stateful
@pure function array_subpadding(S, T)
`Stateful` isn't pure and will break inference.
base/reinterpretarray.jl (outdated)

# Special case for StridedArray
reinterpret_alignment(::Type{ReinterpretArray{T,N,S,A}} where {N, S, A<:Array}) where {T} = datatype_alignment(T)
reinterpret_alignment(::Type{<:ReinterpretArray}) = gcd(datatype_alignment(T), datatype_alignment(S))
What's `T`, `S`?
If this is indeed a bug (seems very likely), it would be good to have a test case that would catch it.
EDIT: scratch that, bad benchmark, I'll try again, sorry.
I can confirm that this improves performance significantly for a few of my simpler use cases that I checked (such as finding the means and covariance matrices of vectors of 3-vectors). I did observe that there is sometimes still a small, but tolerable, speed difference between (I didn't dig in too deeply but I did observe that iteration over
If there's a benchmark you care about, ideally add it to BaseBenchmarks, so it'll get tracked.
True - I guess we can find something roughly equivalent that doesn't need
In #25908 it was noted that reinterpreting structures with padding exposes undef LLVM values to user code. This is problematic, because an LLVM undef value is quite dangerous: it can have a different value at every use, e.g. for an undef `a::Bool`, `a || !a` can evaluate to `false`. There are proposals in LLVM to create values that are merely arbitrary (but the same at every use), but that capability does not currently exist in LLVM. As such, we should try hard to prevent `undef` from showing up in a user-visible way. There are several ways to fix this:

1. Wait until LLVM comes up with a safer `undef` and have the value merely be arbitrary, but not dangerous.
2. Always guarantee that padding bytes will be 0.
3. For contiguous-memory arrays, guarantee that we end up with the underlying bytes from that array.

However, for now, I don't think we should make a choice here. Issues like #21912 may play into the consideration, and I think we should be able to reserve making a choice until that point. So what this PR does is only allow reinterprets when they would not expose padding. This should hopefully cover the most common use cases of reinterpret:

- Reinterpreting a vector or matrix of values to StaticVectors of the same element type. These should generally always have compatible padding (if not, reinterpret was likely the wrong API to use).
- Reinterpreting from a Vector{UInt8} to a vector of structs (that may have padding). This PR allows this for reading (but not for writing). Both cases are generally better served by the IO APIs, but hopefully this should still allow the common cases.

Fixes #25908
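To make the padding condition concrete, here is a minimal sketch of how one might locate the padding bytes of a struct using only documented reflection functions (`fieldcount`, `fieldoffset`, `fieldtype`, `sizeof`). The helper `padding_ranges` and the types `Padded` and `Packed` are hypothetical names for illustration; this is not the check the PR implements, and it only looks at top-level fields (padding inside nested struct fields is not reported).

```julia
# Hypothetical helper (not part of Base): return the zero-based byte ranges of
# a bits type that are not covered by any top-level field, i.e. its padding.
function padding_ranges(::Type{T}) where {T}
    isbitstype(T) || throw(ArgumentError("expected a bits type, got $T"))
    ranges = UnitRange{Int}[]
    pos = 0  # first byte offset not yet covered by a field
    for i in 1:fieldcount(T)
        off = Int(fieldoffset(T, i))
        # A gap between the end of the previous field and this one is padding.
        off > pos && push!(ranges, pos:off-1)
        pos = off + sizeof(fieldtype(T, i))
    end
    # Bytes after the last field, up to sizeof(T), are trailing padding.
    pos < sizeof(T) && push!(ranges, pos:sizeof(T)-1)
    return ranges
end

# Illustrative types (assumptions, not taken from the PR).
struct Padded
    a::UInt8   # followed by 3 alignment bytes before `b`
    b::UInt32
end

struct Packed
    a::UInt32
    b::UInt32
end

padding_ranges(Padded)  # bytes 1 through 3 are padding
padding_ranges(Packed)  # no padding
```

Under the rule described above, a type like `Packed` could be reinterpreted freely, while writing through a reinterpret to something like `Padded` is the kind of case the PR rejects.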
When I originally wrote the new ReinterpretArray code, I made sure that LLVM was able to optimize reinterpret(::Array) back to a single memory access with appropriate TBAA and alignment info. Somewhere along the line LLVM lost that ability. While we should try to recover that capability in LLVM, this showed that the optimization is relatively brittle for a very simple operation.

So this patch takes a different approach: We add two new intrinsics `tbaa_pointerref` and `tbaa_pointerset` that behave like their non-TBAA variants, but additionally take a type to use as the TBAA tag. This allows us to write a special case for `reinterpret(T, ::Array)` that directly emits the correct pointer access. It's also a model for what a post-1.0 pure Julia implementation of `Array` (e.g. on top of a buffer type) may look like.

Fixes #25014
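As a rough illustration of what "a single memory access" means here, the sketch below reads an element of type `T` directly from an `Array`'s memory with `unsafe_load`. The helper `load_as` is a hypothetical name, not something this PR adds, and it carries none of the TBAA information that the proposed `tbaa_pointerref` intrinsic is meant to attach; it only shows the shape of the direct pointer access.

```julia
# Hypothetical helper for illustration only; Base does not define this.
# Reads the i-th element of type T from the raw bytes of `a`.
function load_as(::Type{T}, a::Array, i::Integer) where {T}
    isbitstype(T) && isbitstype(eltype(a)) ||
        throw(ArgumentError("only bits types are supported"))
    nbytes = length(a) * sizeof(eltype(a))
    # Make sure the whole element lies inside the array's memory.
    1 <= i && i * sizeof(T) <= nbytes || throw(BoundsError(a, i))
    # Keep `a` rooted while we hold a raw pointer into it.
    GC.@preserve a unsafe_load(Ptr{T}(pointer(a)), i)
end

a = UInt32[0x3f800000, 0x40000000]  # bit patterns of 1.0f0 and 2.0f0
load_as(Float32, a, 2) == reinterpret(Float32, a)[2]
```

The comparison at the end checks that the direct load agrees with what `reinterpret` returns for the same element.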
I don't think we should implement new intrinsics just before the release. After the release, we can consider whether the complexity is worthwhile, but:
That's not what a TBAA tag is. TBAA is a path-based constraint on what type of pointer you might have. The Julia type tag is just one of the possible sources of the root of that information. But we also might include additional information about the aliasing set (where the pointer was allocated). That tag might even change due to a reinterpret or field access, while a TBAA annotation on a subsequent load would need to use the tag from the original object allocation.
Remove `@pure` and `tbaa` usages.
I do know what TBAA is. The whole point of the reinterpret changes was to be able to separate out the TBAA for the different types of arrays. If you look through this change again, you will see that it indeed uses the original allocated type of the array as the source of the TBAA information.
As for
Will this be part of v0.7? Or will it be postponed to v1.x?
It's a breaking change (IIUC) so it seems like it would be good to get it in if possible.
The performance regression this fixes is pretty catastrophic for my workloads, so it would be really great if we could get this in :)
Bump, should this be triaged? I also have code that now performs badly, and I'm not sure how to rewrite it to be as efficient as it used to be.
Repeating my request from triage: @vtjnash If you think there is a case in which the current implementation would cause bugs or other incorrect behavior, please say what that case is. Otherwise, I think we can merge this.
Please split the breaking changes from the performance concerns.
Agreed, that's always for the best.
There are two commits in this PR, one for the breaking changes, one for the performance concerns.
I suggested to Jameson to replace the tbaa intrinsic by an intrinsic that takes the array as well as an element type and a byte offset. I'll split this PR to get the breaking change in first in a separate PR and rework the performance part.
I don't think this is blocking for 0.7.
So for 0.7, what is the upgrade path? Use
I agree - we need a viable workaround, please.
We're currently hoping that LLVM recognizes that:
julia/base/reinterpretarray.jl, lines 120 to 133 in 5369849
nbytes_copied = 0
# This is a bit complicated to deal with partial elements
# at both the start and the end. LLVM will fold as appropriate,
# once it knows the data layout
while nbytes_copied < sizeof(T)
s[] = a.parent[ind_start + i, tailinds...]
tocopy = min(sizeof(T) - nbytes_copied, sizeof(S) - sidx)
unsafe_copy!(tptr + nbytes_copied, sptr + sidx, tocopy)
nbytes_copied += tocopy
sidx = 0
i += 1
end
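For readers who want to run the idea behind this loop in isolation, here is a self-contained sketch of the same byte-wise copy for a plain `Vector`. The function `read_via_copy` and its zero-based `byte_offset` parameter are assumptions made for illustration, not the code in `base/reinterpretarray.jl`; it uses `unsafe_copyto!`, the Julia 1.x name of the `unsafe_copy!` call in the excerpt.

```julia
# Hypothetical stand-alone version of the copy loop above (not Base code).
# Assembles one value of type T from the bytes of `parent`, starting at the
# zero-based `byte_offset`, copying at most one parent element per iteration.
function read_via_copy(::Type{T}, parent::Vector{S}, byte_offset::Int) where {T,S}
    isbitstype(T) && isbitstype(S) ||
        throw(ArgumentError("only bits types are supported"))
    byte_offset >= 0 || throw(ArgumentError("byte_offset must be non-negative"))
    t = Ref{T}()  # destination buffer
    s = Ref{S}()  # holds the parent element currently being copied from
    GC.@preserve t s begin
        tptr = Ptr{UInt8}(Base.unsafe_convert(Ptr{T}, t))
        sptr = Ptr{UInt8}(Base.unsafe_convert(Ptr{S}, s))
        # i: zero-based parent index, sidx: byte offset inside that element
        i, sidx = divrem(byte_offset, sizeof(S))
        nbytes_copied = 0
        while nbytes_copied < sizeof(T)
            s[] = parent[i + 1]  # bounds-checked read of the parent element
            tocopy = min(sizeof(T) - nbytes_copied, sizeof(S) - sidx)
            unsafe_copyto!(tptr + nbytes_copied, sptr + sidx, tocopy)
            nbytes_copied += tocopy
            sidx = 0  # later elements are always read from their first byte
            i += 1
        end
    end
    return t[]
end

bytes = UInt8[0x01, 0x02, 0x03, 0x04]
read_via_copy(UInt32, bytes, 0) == reinterpret(UInt32, bytes)[1]
```

The loop handles both directions: a `T` wider than `S` takes several iterations, while a `T` narrower than `S` takes one iteration starting at a nonzero `sidx`. The hope expressed in the comment above is that LLVM folds all of this into a single load once the sizes are known.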
We've done the breaking changes here. I still plan to do some work on performance before 0.7. Closing this since it's outdated, but leaving the branch around for reference.
This fixes two issues with reinterpret. As always, the commit messages have details.
The first commit fixes #25908, by making the problematic reinterprets an error. We can choose more lax semantics at a later point, but this seems like the right behavior for 1.0.
The second fixes #25014 by adding a special case for `reinterpret(T, ::Array)`. We should also investigate why LLVM stopped being able to optimize this, but that's for another time.