From 6055f07ee7feb75e60105ff38fdeda199920aff9 Mon Sep 17 00:00:00 2001
From: neozhaoliang
Date: Wed, 21 Dec 2022 20:41:49 +0800
Subject: [PATCH] update docs

---
 docs/lang/articles/basic/sparse.md                   | 2 +-
 docs/lang/articles/performance_tuning/performance.md | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/lang/articles/basic/sparse.md b/docs/lang/articles/basic/sparse.md
index e92637e7c2d62..714fd7b32d71a 100644
--- a/docs/lang/articles/basic/sparse.md
+++ b/docs/lang/articles/basic/sparse.md
@@ -34,7 +34,7 @@ Sparse data structures are traditionally based on [Quadtrees](https://en.wikiped
 [VDB](https://www.openvdb.org/) and [SPGrid](http://pages.cs.wisc.edu/~sifakis/papers/SPGrid.pdf) are such examples.
 In Taichi, programmers can compose data structures similar to VDB and SPGrid with SNodes. The advantages of Taichi spatially sparse data structures include:
 
-- Array and List access time, which is the equivalent to accessing a dense data structure.
+- Access with indices, just like accessing a dense data structure.
 - Automatic parallelization when iterating.
 - Automatic memory access optimization.
 
diff --git a/docs/lang/articles/performance_tuning/performance.md b/docs/lang/articles/performance_tuning/performance.md
index 526d4e567e082..cebaf181dc0da 100644
--- a/docs/lang/articles/performance_tuning/performance.md
+++ b/docs/lang/articles/performance_tuning/performance.md
@@ -134,7 +134,7 @@ hierarchy matches `ti.root.(sparse SNode)+.dense`), Taichi will assign one CUDA
 thread block to each `dense` container (or `dense` block). BLS optimization
 works specifically for such kinds of fields.
 
-BLS intends to enhance stencil computing processes by utilising CUDA shared memory. This optimization begins with users annotating the set of fields they want to cache using `ti.block_local`. At *compile time*, Taichi tries to identify the accessing range in relation to the `dense` block of these annotated fields. If Taichi is successful, it creates code that first loads all of the accessible data in range into a *block local* buffer (CUDA's shared memory), then replaces all accesses to the relevant slots into this buffer.
+BLS aims to accelerate stencil computations by utilizing CUDA shared memory. This optimization begins with users annotating the set of fields they want to cache with `ti.block_local`. At *compile time*, Taichi tries to infer the access range of these annotated fields relative to the `dense` block. If the inference succeeds, Taichi generates code that first loads all of the accessed data in range into a *block local* buffer (CUDA's shared memory), then redirects all accesses to the relevant slots to this buffer.
 
 Here is an example illustrating the usage of BLS. `a` is a sparse field with a
 block size of `4x4`.