
Why from_numpy is faster than kernel initialization #3734

Closed
FantasyVR opened this issue Dec 7, 2021 · 1 comment

Labels
question Question on using Taichi

FantasyVR (Collaborator) commented Dec 7, 2021

import taichi as ti
import numpy as np
import time

ti.init(arch=ti.cpu)

n = 1000000000
v_ti = ti.field(ti.f32, shape=n)
v_tiFnp = ti.field(ti.f32, shape=n)

@ti.kernel
def init():
    for i in range(n):
        v_ti[i] = i

# Test from_numpy performance
start = time.time()
v_np = np.arange(n, dtype=np.float32)
v_tiFnp.from_numpy(v_np)
ti.sync()
end = time.time()
print(f">> Time using from_numpy: {end - start}")

# Test kernel initialization performance
start = time.time()
init()
ti.sync()
end = time.time()
print(f">> Time using init kernel: {end - start}")
The results show:
>> Time using from_numpy: 3.888772964477539
>> Time using init kernel: 6.807356357574463
strongoier (Contributor) commented:
After printing out the CHI IRs, I found that the block_dim set for from_numpy() is 128, while the block_dim set for your init() kernel is 32. On the CPU backend, block_dim is the number of iterations handled in a single task, and all tasks are scheduled onto the threads you configure. Assume we are using the default setting, where the number of threads equals the number of processors on your machine.
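
One way to inspect the CHI IR yourself is the print_ir init flag (a minimal sketch; the exact IR output format varies across Taichi versions):

import taichi as ti

# print_ir=True dumps the CHI IR of every compiled kernel; each offloaded
# parallel loop is printed together with its block_dim.
ti.init(arch=ti.cpu, print_ir=True)

x = ti.field(ti.f32, shape=16)

@ti.kernel
def fill():
    for i in range(16):  # range for: block_dim defaults to 32
        x[i] = i

fill()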

Let's first see why block_dim matters. In the case of your init() kernel, each iteration takes the same amount of time. Therefore, as long as the total number of iterations executed by each processor stays roughly the same, reducing the number of tasks (by increasing block_dim) reduces the total time, because the scheduling overhead shrinks. On my local machine, changing your kernel to:

@ti.kernel
def init():
    ti.block_dim(1 << 10)
    for i in range(n):
        v_ti[i] = i

yields a 25x speedup.

Now let's try to understand why different block_dims are set in your program. This is because from_numpy() is internally implemented with a kernel using a struct for, whose block_dim defaults to 128, while your init() kernel uses a range for, whose block_dim defaults to 32.
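
For comparison, a minimal sketch of the same initialization written as a struct for over the field (reusing v_ti from the snippet above), which picks up the 128 default without an explicit ti.block_dim call:

@ti.kernel
def init_struct_for():
    # Struct for: iterates over all indices of v_ti; its block_dim
    # defaults to 128, like the kernel behind from_numpy().
    for i in v_ti:
        v_ti[i] = i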
