in-place broadcast (e.g. .+=) significantly slower for reinterpreted array #48801

Open
Moelf opened this issue Feb 26, 2023 · 5 comments
Labels: broadcast (Applying a function over a collection), performance (Must go faster)

Comments

@Moelf
Contributor

Moelf commented Feb 26, 2023

on 1.9-beta4 and master branch (0b8e8fc):

julia> using BenchmarkTools

julia> @btime ary.+=Int32(1) setup = (ary = rand(Int32, 10^5));
  3.389 μs (0 allocations: 0 bytes)

julia> @btime ary.+=Int32(1) setup = (ary = reinterpret(Int32, rand(UInt8, 10^5*4)));
  31.680 μs (0 allocations: 0 bytes)

not sure if it's just another facet of:

@Moelf
Contributor Author

Moelf commented Feb 26, 2023

a possible lead (#43153 (comment)):

julia> ary = reinterpret(Int32, rand(UInt8, 10^5*4))

julia> f(ary) = ary .+= Int32(1);

julia> @code_llvm f(ary)
....
vector.memcheck92:                                ; preds = %pass19.lr.ph.split.us
  %scevgep93 = getelementptr i8, i8* %12, i64 4
  %bound097 = icmp ult i8* %12, %scevgep93
  br i1 %bound097, label %scalar.ph101, label %vector.ph104
...

although I don't see the conflict flag variable

@Moelf
Contributor Author

Moelf commented Oct 10, 2023

#44186 doesn't help

@N5N3
Member

N5N3 commented Oct 11, 2023

IIRC, LLVM's vectorizer inserts an overly strict memcheck unless it can prove dest === src.
Unfortunately, our broadcast_unalias mechanism fools the compiler.
Since #44186 doesn't help, we still need ivdep to force LLVM to skip the unneeded memcheck.
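
For readers unfamiliar with the unaliasing step mentioned above, here is a minimal sketch of its runtime behavior. It assumes the internal, unexported helpers `Base.Broadcast.broadcast_unalias` and `Base.mightalias` behave as they do in recent Julia versions; the variable names are made up for illustration. It only shows what the helper returns, not how LLVM sees the resulting code.

```julia
# Sketch only: relies on Base internals that are not public API.
bytes = rand(UInt8, 16)
dest  = reinterpret(Int32, bytes)            # wrapper sharing memory with `bytes`
other = reinterpret(Int32, rand(UInt8, 16))  # unrelated memory

# dest === src: the source is passed through untouched.
same = Base.Broadcast.broadcast_unalias(dest, dest)

# No shared memory: the source is also passed through untouched.
unrelated = Base.Broadcast.broadcast_unalias(dest, other)

# Possibly shared memory: the source is defensively copied.
shares = Base.mightalias(dest, bytes)
copied = Base.Broadcast.broadcast_unalias(dest, bytes)
```

In the `.+=` case from this issue, dest and src are the same object, so no copy is made at run time; the extra branch is what the comment above refers to.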

@Moelf
Contributor Author

Moelf commented Oct 11, 2023

> Since #44186 doesn't help, we still need ivdep to force LLVM to skip the unneeded memcheck.

What does "need" mean here (where should this happen)? I would love to help, but I'm not sure I know enough about LLVM.

@N5N3
Member

N5N3 commented Oct 13, 2023

I mean adding @simd ivdep to the broadcast kernel. It's the simplest fix, but an unsafe one. Such a change might be safer once we have compile-time infer_effect: we could add ivdep once we confirm that the broadcast function is effect-free and the inputs contain no overlapping memory.
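
To make the `@simd ivdep` idea concrete outside of Base, here is a minimal hand-written loop sketch. The function name `add_one_ivdep!` is hypothetical, and this is not the broadcast kernel itself. `ivdep` is only valid here because each iteration reads and writes the same single element, so there is no loop-carried memory dependency. Whether it actually removes the memcheck or closes the gap reported above would need to be checked with `@code_llvm` and `@btime`.

```julia
# Hypothetical helper: a manual stand-in for `ary .+= Int32(1)`.
function add_one_ivdep!(ary::AbstractVector{Int32})
    # `ivdep` promises the compiler that iterations do not depend on
    # each other through memory; that holds here since only ary[i] is touched.
    @inbounds @simd ivdep for i in eachindex(ary)
        ary[i] += Int32(1)
    end
    return ary
end

ary      = reinterpret(Int32, rand(UInt8, 4 * 10^3))
expected = copy(ary) .+ Int32(1)   # reference result via ordinary broadcast on a plain Array
add_one_ivdep!(ary)
```

The same promise would be wrong for a kernel where the destination overlaps a shifted view of a source, which is why the comment above calls the blanket change unsafe.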
