[Perf] Improve sparse computation performance #1263
Labels: feature request
One performance issue I found:

```python
import taichi as ti

ti.init(arch=ti.cuda, print_ir=True, kernel_profiler=True)

a = ti.var(ti.i32)
b = ti.var(ti.i32)

block = ti.root.pointer(ti.i, 1024 * 128)
block.dense(ti.i, 1024).place(a)
block.dense(ti.i, 1024).place(b)

@ti.kernel
def activate():
    for i in range(1024 * 128 * 1024):
        a[i] = i

@ti.kernel
def copy():
    for i in a:
        b[i] = a[i]

activate()
for i in range(10):
    copy()

ti.kernel_profiler_print()
```

Current
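For readers unfamiliar with Taichi's `pointer`/`dense` hierarchy, here is a rough plain-Python sketch of what the two-level layout in the snippet does. This is illustrative only, not Taichi's actual runtime (which allocates device memory and parallelizes the loops); the class and method names are hypothetical:

```python
# Rough plain-Python model of a two-level pointer -> dense sparse layout.
# Illustrative only; Taichi's real runtime manages GPU memory and parallelism.

BLOCK_SIZE = 1024  # size of each dense block, as in block.dense(ti.i, 1024)

class SparseArray:
    def __init__(self):
        # pointer level: block index -> dense block, allocated lazily
        self.blocks = {}

    def __setitem__(self, i, value):
        b, off = divmod(i, BLOCK_SIZE)
        if b not in self.blocks:          # "activation": allocate on first write
            self.blocks[b] = [0] * BLOCK_SIZE
        self.blocks[b][off] = value

    def __getitem__(self, i):
        b, off = divmod(i, BLOCK_SIZE)
        block = self.blocks.get(b)
        return block[off] if block is not None else 0  # inactive cells read as 0

    def active_indices(self):
        # struct-for analogue: visit only indices inside activated blocks
        for b in sorted(self.blocks):
            base = b * BLOCK_SIZE
            for off in range(BLOCK_SIZE):
                yield base + off

arr = SparseArray()
arr[5] = 42          # activates block 0 only
```

A struct-for such as `for i in a` in the `copy` kernel only visits indices inside activated blocks, which is why block activation and traversal of the active-block list are what the kernel profiler measures here.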
Note that after #1270, it takes 6.8 ms (1.6x faster), and the IR becomes
We haven't done systematic performance engineering since we switched to the LLVM backend. I suspect there is a lot of room for performance optimization, especially when it comes to sparsity and hierarchical data structures. Note that we have made pretty big changes to the IR/Opt system without performance regression tests, so sparse computation performance has probably degraded over time (e.g. #1182).
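One way to catch such silent degradations is a small timing harness that compares a kernel's runtime against a recorded baseline. A minimal sketch, assuming plain wall-clock timing on the Python side (the helper names `time_kernel` and `check_regression` are hypothetical, not existing Taichi utilities):

```python
import time

def time_kernel(fn, warmup=3, repeats=10):
    """Return the median wall-clock time of fn() in milliseconds."""
    for _ in range(warmup):        # warm up JIT compilation and caches
        fn()
    samples = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - t0) * 1e3)
    samples.sort()
    return samples[len(samples) // 2]

def check_regression(current_ms, baseline_ms, tolerance=1.2):
    """True if current_ms is within tolerance-x of the recorded baseline."""
    return current_ms <= baseline_ms * tolerance

# Example use, guarding the `copy` kernel above against slowdowns:
# ms = time_kernel(copy)
# assert check_regression(ms, baseline_ms=6.8)
```

In practice one would store per-kernel baselines alongside the test suite and fail CI when a kernel exceeds its baseline by the tolerance factor, so IR/Opt changes that slow down sparse kernels get flagged immediately.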
Steps