Performance concerns with ReinterpretArray #25014
Comments
Still seeing this problem. I’m getting a median time of 2.6 ns for accessing an array created with unsafe_wrap, and a median time of 10.0 ns for a reinterpreted array. From what I gather on Discourse, very few people seem to have the expertise to tackle this.
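For readers who want to reproduce the comparison, here is a minimal sketch of the two access paths being timed. The helper name `read_both` and the choice of `UInt32` are assumptions made for illustration; the original benchmark code is not shown in this thread, and no timing harness is included here.

```julia
# Hypothetical helper (not from the issue): read element `i` of the same
# bytes through a lazy reinterpret view and through an unsafe_wrap'd Array.
function read_both(bytes::Vector{UInt8}, i::Int)
    # Lazy view: a Base.ReinterpretArray wrapping `bytes`.
    lazy = reinterpret(UInt32, bytes)
    n = length(bytes) ÷ sizeof(UInt32)
    # unsafe_wrap aliases the memory without keeping `bytes` alive,
    # so the parent array is explicitly preserved while `raw` is used.
    GC.@preserve bytes begin
        raw = unsafe_wrap(Array, convert(Ptr{UInt32}, pointer(bytes)), n)
        (lazy[i], raw[i])
    end
end
```

Both paths should return the same value; the difference reported above is only in how much work each element access costs.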
Keno added a commit that referenced this issue on May 23, 2018
When I originally wrote the new ReinterpretArray code, I made sure that LLVM was able to optimize reinterpret(::Array) back to a single memory access with appropriate TBAA and alignment info. Somewhere along the line, LLVM lost that ability. While we should try to recover that capability in LLVM, this showed that it is a relatively brittle optimization for a very simple operation. So this patch takes a different approach: we add two new intrinsics, `tbaa_pointerref` and `tbaa_pointerset`, that behave like their non-TBAA variants but additionally take a type to use as the TBAA tag. This allows us to write a special case for `reinterpret(T, ::Array)` that directly emits the correct pointer access. It's also a model for what a post-1.0 pure-Julia implementation of `Array` (e.g. on top of a buffer type) may look like. Fixes #25014
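As a rough illustration of what "directly emits the correct pointer access" means, the sketch below uses the ordinary `unsafe_load` and `unsafe_store!` functions, i.e. the non-TBAA path. The helper names `load_as` and `store_as!` are hypothetical, and the TBAA tagging that the proposed `tbaa_pointerref` and `tbaa_pointerset` intrinsics would add cannot be expressed this way; their exact signatures are not given in the commit message, so they are not used here.

```julia
# Hypothetical helpers (not part of Base): treat the memory of `a` as
# elements of type T and access element `i` with one pointer operation.
function load_as(::Type{T}, a::Array{S}, i::Int) where {T,S}
    (isbitstype(T) && isbitstype(S)) || throw(ArgumentError("bits types only"))
    n = (length(a) * sizeof(S)) ÷ sizeof(T)
    1 <= i <= n || throw(BoundsError(a, i))
    # Keep `a` alive while its raw pointer is in use.
    GC.@preserve a unsafe_load(convert(Ptr{T}, pointer(a)), i)
end

function store_as!(a::Array{S}, x::T, i::Int) where {T,S}
    (isbitstype(T) && isbitstype(S)) || throw(ArgumentError("bits types only"))
    n = (length(a) * sizeof(S)) ÷ sizeof(T)
    1 <= i <= n || throw(BoundsError(a, i))
    GC.@preserve a unsafe_store!(convert(Ptr{T}, pointer(a)), x, i)
    return a
end
```

For example, `load_as(UInt32, Float32[1.0], 1)` reads the bit pattern of `1.0f0` without constructing a wrapper array.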
Closed
Keno added a commit that referenced this issue on Aug 16, 2018
This fixes #25014 by making it more obvious to LLVM what's going on. Instead of a memcpy loop, we use a new intrinsic that puts an actual llvm.memcpy into the IR, which is enough for LLVM to fold everything away. In the benchmark from #25014, we still see some regressions from 0.6, but that is because the benchmark needs to dereference through the pointers in the reinterpret and reshape wrappers. In any real code, that dereferencing should be hoisted out of the inner loop as loop-invariant.
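The point about wrapper dereferencing can be pictured with a loop like the following sketch. The function name `wrapped_sum` is hypothetical and is not the benchmark from the issue. Each `mat[i, j]` goes through a reshape wrapper and a reinterpret wrapper before reaching the underlying bytes; inside a loop those wrapper loads do not change between iterations, whereas a benchmark timing one isolated access pays for them every time. Whether the compiler actually hoists them depends on the Julia and LLVM versions in use.

```julia
# Hypothetical example: sum a byte buffer viewed as an nrows-by-? Float64 matrix.
function wrapped_sum(bytes::Vector{UInt8}, nrows::Int)
    flat = reinterpret(Float64, bytes)  # reinterpret wrapper over `bytes`
    mat = reshape(flat, nrows, :)       # reshape wrapper over the reinterpret wrapper
    total = 0.0
    # Column-major traversal; the wrappers' parent references are loop-invariant.
    for j in axes(mat, 2), i in axes(mat, 1)
        total += mat[i, j]
    end
    return total
end
```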
Keno added a commit that referenced this issue on Aug 16, 2018
This fixes #25014 by making it more obvious to LLVM what's going on. Instead of a memcpy loop, we use a ccall to :memcpy and turn this into llvm.memcpy at the IR level, which is enough for LLVM to fold everything away. In the benchmark from #25014, we still see some regressions from 0.6, but that is because the benchmark needs to dereference through the pointers in the reinterpret and reshape wrappers. In any real code, that dereferencing should be hoisted out of the inner loop as loop-invariant.
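A minimal sketch of the "ccall to :memcpy" idea, assuming a hypothetical helper `memcpy_load` that is not the code in Base: the bytes for one element are copied into a `Ref{T}` with a single `memcpy` call rather than a byte-by-byte loop. Per the commit message, the intent is that the call is lowered to `llvm.memcpy`, which LLVM can then fold into a plain load; this sketch does not verify that lowering.

```julia
# Hypothetical helper: read the i-th T-sized element out of a byte vector.
function memcpy_load(::Type{T}, a::Vector{UInt8}, i::Int) where {T}
    isbitstype(T) || throw(ArgumentError("T must be a bits type"))
    offset = (i - 1) * sizeof(T)
    (i >= 1 && offset + sizeof(T) <= length(a)) || throw(BoundsError(a, i))
    out = Ref{T}()  # destination buffer for exactly one T
    GC.@preserve a out begin
        src = pointer(a) + offset                       # Ptr{UInt8}, byte offset
        dst = convert(Ptr{UInt8}, Base.unsafe_convert(Ptr{T}, out))
        ccall(:memcpy, Ptr{Cvoid}, (Ptr{UInt8}, Ptr{UInt8}, Csize_t),
              dst, src, sizeof(T))
    end
    return out[]
end
```

Copying through a `Ref` also sidesteps alignment concerns, since `memcpy` makes no alignment assumptions about the source bytes.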
KristofferC pushed a commit that referenced this issue on Aug 19, 2018
This fixes #25014 by making it more obvious to LLVM what's going on. Instead of a memcpy loop, we use a ccall to :memcpy and turn this into llvm.memcpy at the IR level, which is enough for LLVM to fold everything away. In the benchmark from #25014, we still see some regressions from 0.6, but that is because the benchmark needs to dereference through the pointers in the reinterpret and reshape wrappers. In any real code, that dereferencing should be hoisted out of the inner loop as loop-invariant. (cherry picked from commit 777810b)
Ref https://discourse.julialang.org/t/big-overhead-with-the-new-lazy-reshape-reinterpret/7635