After printing out the CHI IR, I found that the block_dim set for from_numpy() is 128, while the block_dim set for your init() kernel is 32. Since you are using the CPU backend, block_dim is the number of iterations handled by one task, and all the tasks are scheduled onto the threads you configured. Assume we are using the default setting, where the number of threads equals the number of processors you have.
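To make the role of block_dim on the CPU backend concrete, here is a minimal plain-Python sketch of how a loop could be cut into tasks. It is only an illustration, not Taichi's actual scheduler: the helper name `split_into_tasks` and the iteration count of 1000 are assumptions made up for this example.

```python
def split_into_tasks(num_iterations, block_dim):
    """Split iterations [0, num_iterations) into tasks of at most block_dim iterations.

    Hypothetical helper for illustration; Taichi's real task splitting may differ.
    """
    return [
        (start, min(start + block_dim, num_iterations))
        for start in range(0, num_iterations, block_dim)
    ]


# Same loop, two different block_dim values (1000 iterations is an arbitrary choice).
tasks_32 = split_into_tasks(1000, 32)
tasks_128 = split_into_tasks(1000, 128)

print(len(tasks_32), len(tasks_128))  # 32 8
print(tasks_32[-1], tasks_128[-1])    # (992, 1000) (896, 1000)
```

With a larger block_dim the same iterations end up in fewer, larger tasks, so the scheduler has fewer tasks to hand out.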
Let's first see why block_dim matters. In the case of your init() kernel, each iteration takes the same amount of time. Therefore, as long as the total number of iterations executed by each processor stays roughly the same, reducing the number of tasks (by increasing block_dim) reduces the total time, because there is less scheduling overhead. On my local machine, change your kernel into:
Now let's try to understand why different block_dim values are set in your program. from_numpy() is internally implemented with a kernel that uses a struct-for, whose block_dim defaults to 128, while your init() kernel uses a range-for, whose block_dim defaults to 32.
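The effect of the two defaults can be estimated with a toy cost model. This is a rough sketch under stated assumptions, not a measurement: the function `estimated_time`, the per-iteration cost, the per-task overhead, the thread count, and the loop size are all hypothetical values picked for illustration, and the model assumes the work is perfectly balanced across threads.

```python
import math


def estimated_time(num_iterations, block_dim, num_threads,
                   per_iteration_cost, per_task_overhead):
    """Toy model: compute time plus scheduling overhead, split evenly over threads.

    Hypothetical model for illustration; it ignores load imbalance and caches.
    """
    num_tasks = math.ceil(num_iterations / block_dim)
    compute = num_iterations * per_iteration_cost
    overhead = num_tasks * per_task_overhead
    return (compute + overhead) / num_threads


# Assumed numbers in arbitrary time units, not taken from the original report.
params = dict(num_iterations=1_000_000, num_threads=8,
              per_iteration_cost=1.0, per_task_overhead=50.0)

t_range_for = estimated_time(block_dim=32, **params)    # range-for default
t_struct_for = estimated_time(block_dim=128, **params)  # struct-for default

print(t_range_for > t_struct_for)  # True
```

In this model the compute term is identical for both settings, so the whole gap comes from the number of tasks. How large the gap is in practice depends on how expensive one iteration is relative to scheduling one task.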
Related forum topic: https://forum.taichi.graphics/t/8-fem/2136
Test script: