Add simt functions in operators
qiao-bo committed Jul 5, 2022
1 parent cf9f09f commit 3dfb93d
Showing 1 changed file with 49 additions and 0 deletions.
49 changes: 49 additions & 0 deletions docs/lang/articles/basic/operator.md
@@ -323,3 +323,52 @@ a@a # @ denotes matrix multiplication
For now, `determinant()` and `inverse()` only work in Taichi-scope, and the
size of the matrix must be 1x1, 2x2, 3x3, or 4x4.
:::


## Supported SIMT intrinsics
On the CUDA backend, Taichi supports the warp-level and block-level intrinsics
needed for writing high-performance SIMT kernels. You can use them in Taichi
much as you would [in CUDA kernels](https://developer.nvidia.com/blog/using-cuda-warp-level-primitives/). The following functions are currently supported:


| Operation | Mapped CUDA intrinsic |
| -------------------------- | ----------------------------- |
|`ti.simt.warp.all_nonzero` | `__all_sync` |
|`ti.simt.warp.any_nonzero` | `__any_sync` |
|`ti.simt.warp.unique` | `__uni_sync` |
|`ti.simt.warp.ballot` | `__ballot_sync` |
|`ti.simt.warp.shfl_sync_i32`| `__shfl_sync` |
|`ti.simt.warp.shfl_sync_f32`| `__shfl_sync` |
|`ti.simt.warp.shfl_up_i32` | `__shfl_up_sync` |
|`ti.simt.warp.shfl_up_f32` | `__shfl_up_sync` |
|`ti.simt.warp.shfl_down_i32`| `__shfl_down_sync`|
|`ti.simt.warp.shfl_down_f32`| `__shfl_down_sync`|
|`ti.simt.warp.shfl_xor_i32` | `__shfl_xor_sync` |
|`ti.simt.warp.match_any` | `__match_any_sync`|
|`ti.simt.warp.match_all` | `__match_all_sync`|
|`ti.simt.warp.active_mask` | `__activemask` |
|`ti.simt.warp.sync` | `__syncwarp` |

Please refer to our [API docs](https://docs.taichi.graphics/api/taichi/lang/simt/warp/#module-taichi.lang.simt.warp)
for more information on each function.
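
To make the vote-style entries in the table more concrete, here is a minimal pure-Python sketch of the semantics of the CUDA intrinsics they map to (`__all_sync`, `__any_sync`, `__ballot_sync`). It is only an emulation for illustration, not Taichi's implementation: the helper names `participating`, `all_nonzero`, `any_nonzero`, and `ballot` below are hypothetical stand-ins that operate on a plain list of 32 per-lane values, and the real `ti.simt.warp` signatures should be taken from the API docs.

```python
WARP_SIZE = 32
FULL_MASK = 0xFFFFFFFF  # every lane participates


def participating(mask):
    """Return the lane indices whose bit is set in `mask`."""
    return [lane for lane in range(WARP_SIZE) if (mask >> lane) & 1]


def all_nonzero(mask, predicates):
    """1 if every participating lane has a nonzero predicate, else 0."""
    return int(all(predicates[lane] != 0 for lane in participating(mask)))


def any_nonzero(mask, predicates):
    """1 if at least one participating lane has a nonzero predicate, else 0."""
    return int(any(predicates[lane] != 0 for lane in participating(mask)))


def ballot(mask, predicates):
    """Bitmask with bit `lane` set when that participating lane's predicate is nonzero."""
    bits = 0
    for lane in participating(mask):
        if predicates[lane] != 0:
            bits |= 1 << lane
    return bits


# Odd lanes vote 1, even lanes vote 0.
predicates = [lane % 2 for lane in range(WARP_SIZE)]
print(all_nonzero(FULL_MASK, predicates))  # 0
print(any_nonzero(FULL_MASK, predicates))  # 1
print(hex(ballot(FULL_MASK, predicates)))  # 0xaaaaaaaa
```

The mask argument selects which lanes take part in the vote, which is why restricting the mask to the odd lanes would make `all_nonzero` return 1 for the same predicates.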

The following example exchanges data between the lanes of a warp in Taichi:


```python
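import taichi as ti

ti.init(arch=ti.cuda)  # warp intrinsics require the CUDA backend

a = ti.field(dtype=ti.i32, shape=32)

@ti.kernel
def foo():
    # A block of 32 threads, so the 32 iterations map onto one warp.
    ti.loop_config(block_dim=32)
    for i in range(32):
        # Each lane reads the value held by the lane one position below it.
        a[i] = ti.simt.warp.shfl_up_i32(ti.u32(0xFFFFFFFF), a[i], 1)

# Initialize the field from Python scope.
for i in range(32):
    a[i] = i * i

foo()

# Lane 0 has no lower lane to read from, so the check starts at lane 1.
for i in range(1, 32):
    assert a[i] == (i - 1) * (i - 1)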
```
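
To see why the assertion in the example starts at index 1, the shuffle-up step can be emulated on a plain Python list. This is a sketch of the `__shfl_up_sync` semantics with a full mask, not Taichi code: `shfl_up` here is a hypothetical helper, and it assumes the CUDA behavior that a lane with no source lane below it keeps its own value.

```python
WARP_SIZE = 32


def shfl_up(values, delta):
    """Lane `i` receives the value of lane `i - delta`; lanes below `delta` keep their own."""
    return [
        values[lane - delta] if lane >= delta else values[lane]
        for lane in range(WARP_SIZE)
    ]


before = [i * i for i in range(WARP_SIZE)]
after = shfl_up(before, 1)

print(after[:5])  # [0, 0, 1, 4, 9]
```

Lane 0 is left unchanged, while every other lane `i` ends up holding `(i - 1) * (i - 1)`, which matches the check performed after `foo()` runs.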
