diff --git a/docs/lang/articles/deployment/tutorial.md b/docs/lang/articles/deployment/tutorial.md
index f1fd8928dc877..ba5d33631b881 100644
--- a/docs/lang/articles/deployment/tutorial.md
+++ b/docs/lang/articles/deployment/tutorial.md
@@ -4,185 +4,233 @@ sidebar_position: 1
# Tutorial: Run Taichi programs in C++ applications
-Taichi makes it easy to write high-performance programs with efficient parallelism, but in many applications we cannot simply deploy the Python scripts. Taichi offers a runtime library (TiRT) with a C interface so that your Taichi kernels can be launched in any native application. In this tutorial, we'll walk through the steps to deploy a Taichi program in a C++ application.
+Taichi makes it easy to write high-performance programs with efficient parallelism, but in many applications we cannot simply deploy the Python scripts. Taichi offers a runtime library (TiRT) with a C interface, as well as a C++ wrapper around it, so your Taichi kernels can be launched in any native application. In this tutorial, we'll walk through the steps to deploy a Taichi program in a C++ application.
## Overview
![AOT E2E](../static/assets/aot_tutorial.png)
-In Python, when you call a function decorated with `@ti.kernel`, Taichi immediately compiles the kernel and sends it to the device for execution. This is called just-in-time (JIT) compilation. But generally speaking, we don't want to compile the kernels on a mobile phone, or to expose the source code to the users. For this Taichi introduced ahead-of-time (AOT) compilation so that you can compile kernels on a development machine, and launch them on user devices via TiRT.
+In Python, when you call a function decorated with `@ti.kernel`, Taichi immediately compiles the kernel and sends it to the device for execution. This is called just-in-time (JIT) compilation. However, in general we don't want to compile kernels on a mobile phone, or to expose the source code to users. For this reason, Taichi introduced ahead-of-time (AOT) compilation, so that you can compile kernels on a development machine and launch them on user devices via TiRT.
+
+In summary, running a Taichi program in a C++ application involves two steps:
1. Compile Taichi kernels from Python and save the artifacts.
-2. Load AOT modules with TiRT and launch them in your applications.
+2. Load the AOT modules with TiRT and launch them in your applications.
Although this tutorial only demonstrates integrating Taichi in a C++ application, the C interface allows you to integrate TiRT with many other programming languages including C/C++, Swift, Rust, C# (via P/Invoke) and Java (via JNI).
-### 1. Write kernels for AOT compilation
+## Quick-Start
-A Taichi kernel describes two aspects of a computer program: The computation itself, and the data it operates on. Because we don't know what kind of data will be fed into the kernel before execution, we have to clearly annotate the argument types for the AOT compiler.
+In this section, we will write a Taichi kernel that generates images of the [Julia fractal](https://en.wikipedia.org/wiki/Julia_set) and deploy it in a C++ application. The project layout is shown below; next, we will walk through each file to see what it does.
-Taichi supports the following argument types:
+```
+.
+├── cmake
+│   └── FindTaichi.cmake   // finds the Taichi runtime library
+├── CMakeLists.txt         // builds the project
+├── app.py                 // defines and compiles the Taichi kernel
+├── app.cpp                // deploys the compiled artifact to the application
+└── module.tcm             // the compiled Taichi kernel artifact
+```
-- `ti.i32`
-- `ti.f32`
-- `ti.Ndarray`
+Before we start, it is recommended to install Taichi through the `taichi-nightly` Python wheels using the following command. Be aware that strict version compatibility is not enforced yet, so it is highly recommended to use a Python Taichi and a TiRT built from exactly the same commit.
+
+```shell
+pip install -i https://pypi.taichi.graphics/simple/ taichi-nightly
+```
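+To confirm which build you have just installed (and will later need to match on the TiRT side), you can print the installed version from Python. This is only a quick sanity check and is not required for the rest of the tutorial:
+
+```python
+import taichi as ti
+
+# Print the installed Taichi version; the TiRT you deploy should come from the same build.
+print(ti.__version__)
+```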
-Despite integers and floating-point numbers, we have a commonly-used data container called [`Ndarray`](https://docs.taichi-lang.org/api/taichi/lang/_ndarray/#taichi.lang._ndarray.Ndarray). It's similar to an [`ndarray`](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.html) in NumPy, or a [`Tensor`](https://pytorch.org/docs/stable/tensors.html) in PyTorch. It can be multidimensional and is laid out continuously in memory. If you have experienced the multidimensional arrays in C++, You can treat it as a nested array type like `float[6][14]`.
+### 1. Compile the Taichi kernel in a Python script
-As an example of its usage, the following init kernel has an Ndarray argument named x. The dtype attribute is set to ti.f32 to specify that the Ndarray stores floating-point data, and the ndim attribute is set to 1, indicating that the Ndarray has only one dimension. Upon execution, every element within the x Ndarray will be set to zero.
+We first write a Python script named `app.py`, which compiles the Taichi kernel into an artifact. Save the following code to your local machine and run it; you will obtain an archived `module.tcm` in the same directory as `app.py`.
```python
+import taichi as ti
+
+ti.init(arch=ti.vulkan)
+if ti.lang.impl.current_cfg().arch != ti.vulkan:
+    raise RuntimeError("Vulkan is not available.")
+
@ti.kernel
-def init(x: ti.types.ndarray(dtype=ti.f32, ndim=1)):
-    for i in x:
-        x[i] = 0
+def paint(n: ti.i32, t: ti.f32, pixels: ti.types.ndarray(dtype=ti.f32, ndim=2)):
+    for i, j in pixels:  # Parallelized over all pixels
+        c = ti.Vector([-0.8, ti.cos(t) * 0.2])
+        z = ti.Vector([i / n - 1, j / n - 0.5]) * 2
+        iterations = 0
+        while z.norm() < 20 and iterations < 50:
+            z = ti.Vector([z[0]**2 - z[1]**2, z[1] * z[0] * 2]) + c
+            iterations += 1
+        pixels[i, j] = 1 - iterations * 0.02
+
+mod = ti.aot.Module(ti.vulkan)
+mod.add_kernel(paint)
+mod.archive("module.tcm")
```
-After initialization, in kernel `add_base`, we want to add a floating-point number `base` to those in `x` in each frame.
+Let's dive into the code example to see what happens.
+
+We initialize Taichi at the beginning, specifying `ti.vulkan` as the backend. Because Taichi falls back to a CPU device if the target architecture is unavailable, we check that the current backend actually meets our requirement.
```python
-@ti.kernel
-def add_base(x: ti.types.ndarray(ndim=1), base: ti.f32):
-    for i in range(x.shape[0]):
-        x[i] += base
+ti.init(arch=ti.vulkan)
+if ti.lang.impl.current_cfg().arch != ti.vulkan:
+    raise RuntimeError("Vulkan is not available.")
```
-You can also create an ndarray and launch the kernels in the same script to ensure they do everything you expect.
+Then, we define the Taichi kernel that computes each pixel of the image.
+A Taichi kernel describes two aspects of a computer program: the computation itself, and the data it operates on. Because we don't know what kind of data will be fed into the kernel before execution, we have to clearly annotate the argument types for the AOT compiler.
-```python
-x = ti.ndarray(ti.f32, shape=(8192))
-init(x)
+The Taichi AOT module supports the following argument types: `ti.i32`, `ti.f32`, and `ti.Ndarray`. Besides integers and floating-point numbers, there is a commonly-used data container called [`Ndarray`](https://docs.taichi-lang.org/api/taichi/lang/_ndarray/#taichi.lang._ndarray.Ndarray). It is similar to an [`ndarray`](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.html) in NumPy or a [`Tensor`](https://pytorch.org/docs/stable/tensors.html) in PyTorch: it can be multidimensional and is laid out contiguously in memory. If you have worked with multidimensional arrays in C++, you can treat it as a nested array type like `float[6][14]`.
-N_ITER = 50
-for _ in range(N_ITER):
-    add_base(x, 0.1)
-```
+Our Taichi kernel accepts an integer `n`, a floating-point number `t`, and a 2-dimensional Ndarray `pixels` as arguments. Each element of `pixels` is a floating-point number ranging from 0.0 to 1.0.
-### 2. Compile and save the artifacts
+```python
+@ti.kernel
+def paint(n: ti.i32, t: ti.f32, pixels: ti.types.ndarray(dtype=ti.f32, ndim=2)):
+    for i, j in pixels:  # Parallelized over all pixels
+        c = ti.Vector([-0.8, ti.cos(t) * 0.2])
+        z = ti.Vector([i / n - 1, j / n - 0.5]) * 2
+        iterations = 0
+        while z.norm() < 20 and iterations < 50:
+            z = ti.Vector([z[0]**2 - z[1]**2, z[1] * z[0] * 2]) + c
+            iterations += 1
+        pixels[i, j] = 1 - iterations * 0.02
+```
-Now let's compile the kernels into an AOT module.
-The compiled version of a Taichi kernel, such as the `add_base` example, includes all compiled components that were generated when compiling the `ti.kernel` . In this case, the argument base has a declared data type of `ti.f32`. This type information is used during the compilation process and is embedded within the compiled artifact. At runtime, the `add_base` kernel can be executed with any floating-point number as the argument for base.
+Finally, we compile the kernel into an artifact. The following piece of code creates the AOT module, adds the kernel to it, and archives the compiled artifact as `module.tcm` in the working directory.
```python
mod = ti.aot.Module(ti.vulkan)
-mod.add_kernel(init, template_args={'x': x})
-mod.add_kernel(add_base, template_args={'x': x})
-mod.save(target_dir)
+mod.add_kernel(paint)
+mod.archive("module.tcm")
```
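+If you want to make sure the kernel behaves as expected before shipping the archive, you can also launch it directly from the same Python script. The snippet below is a minimal sanity check (the shape `(640, 320)` simply mirrors the `2 * n` by `n` layout used later on the C++ side):
+
+```python
+pixels = ti.ndarray(ti.f32, shape=(640, 320))
+paint(320, 0.0, pixels)
+img = pixels.to_numpy()
+print(img.shape, img.min(), img.max())  # values should fall roughly within [0, 1]
+```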
-`ti.types.ndarray` is a bit more complicated since it requires both `dtype` and `ndim` as its type information. To compile Taichi kernels with `ti.types.ndarray` arguments, you'll have to supply that information either directly in the type annotation, or provide an example input via `template_args`.
+### 2. Work with the Taichi C-API in a C++ program
-Now that we're done with Kernel compilation, let's take a look at the generated artifacts and its layout:
+We are now done with Python and ready to build our application. The compiled artifact saved as `module.tcm` and the Taichi Runtime Library (TiRT) are all we need. TiRT provides a fundamental C interface for optimal portability, but we also provide a header-only C++ wrapper to save you from writing verbose C code. For simplicity, we'll stick with the C++ wrapper in this tutorial.
-```text
-// FUTURE WORK: This is just a zip. Replace tcb with readable JSON
-// Structure of compiled artifacts
-.
-├── demo
-│   ├── add_base_c78_0_k0001_vk_0_t00.spv
-│   ├── init_c76_0_k0000_vk_0_t00.spv
-│   └── metadata.json
-└── demo.py
+First, we need to include the C++ wrapper header of the Taichi C-API.
+
+```c++
+#include <taichi/cpp/taichi.hpp>
```
-### 3. Get Taichi Runtime Library (TiRT)
+Next, create a Taichi runtime with the target architecture, load the compiled artifact from `module.tcm`, and fetch our `paint` kernel from the module.
+
+```c++
+ti::Runtime runtime(TI_ARCH_VULKAN);
+ti::AotModule aot_module = runtime.load_aot_module("module.tcm");
+ti::Kernel kernel_paint = aot_module.get_kernel("paint");
+```
+The `paint` kernel accepts three arguments, so we need to declare the corresponding variables in the C++ program. We allocate memory for `pixels` through TiRT's `allocate_ndarray` interface: the width and height are set to `2 * n` and `n` respectively, the element shape is set to `1`, and the trailing `true` requests a host-accessible allocation so that we can read the result back later.
-![TiRT](../static/assets/runtime.png)
+```c++
+int n = 320;
+float t = 0.0f;
+ti::NdArray<float> pixels = runtime.allocate_ndarray<float>({(uint32_t)(2 * n), (uint32_t)n}, {1}, true);
+```
-With the completion of your Python work, you are now ready to develop your application! The compiled artifacts and TiRT that you have saved are all you require.
+Then, we specify the arguments for the kernel; the index on `kernel_paint` indicates the position in the kernel's argument list. Launch the kernel and wait for it to finish.
-Currently, TiRT is included with the `taichi-nightly` Python package. Please note that there is not yet a strict version compatibility enforced, so it is strongly advised to use a Python Taichi and TiRT that were built from the same exact commit for compatibility purposes.
+```c++
+kernel_paint[0] = n;
+kernel_paint[1] = t;
+kernel_paint[2] = pixels;
+kernel_paint.launch();
+runtime.wait();
+```
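+The argument slots of `kernel_paint` keep their values between launches, so an interactive application could, for example, animate the fractal by only updating `t` and relaunching the kernel every frame. The following is a rough sketch rather than part of the demo app; the frame count and time step are arbitrary:
+
+```c++
+for (int frame = 0; frame < 60; ++frame) {
+  kernel_paint[1] = 0.03f * frame;  // advance `t`; the other arguments keep the values set above
+  kernel_paint.launch();
+}
+runtime.wait();
+```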
-TODO: We'll figure out a proper way to release it once the versioning issue is improved.
+Finally, the `pixels` Ndarray holds the kernel output. Before we read the output pixel data, we must map the device memory to a user-addressable space. The image data is then saved in the plain-text PPM format with a utility function `save_ppm`. For details of the PPM format, please refer to [Wikipedia](https://en.wikipedia.org/wiki/Netpbm#File_formats).
-```bash
-# Install python taichi:
-pip install -i https://pypi.taichi.graphics/simple/ taichi-nightly
-# Get the runtime library:
-pip download --no-deps -i https://pypi.taichi.graphics/simple/ taichi-nightly
-# For example
-unzip taichi_nightly-1.3.0.post20221102-cp38-cp38-manylinux_2_27_x86_64.whl
-export TAICHI_C_API_INSTALL_DIR=$PWD/taichi_nightly-1.3.0.post20221102.data/data/c_api/
+```c++
+auto pixels_data = (const float*)pixels.map();
+save_ppm(pixels_data, 2 * n, n, "result.ppm");
+pixels.unmap();
```
-Currently, only TiRT for Linux systems is included in the nightly distributions. If you need one for Android / Windows, please see the FAQ below to build it from source.
+The complete C++ source code is shown below; save it as `app.cpp` in the same directory as `app.py`.
-Integrate `TiRT` to your CMakeLists.txt:
+```c++
+#include <fstream>
+#include <taichi/cpp/taichi.hpp>
-```cmake
-# Find built taichi C-API library in `TAICHI_C_API_INSTALL_DIR`.
-find_library(taichi_c_api taichi_c_api HINTS ${TAICHI_C_API_INSTALL_DIR}/lib NO_CMAKE_FIND_ROOT_PATH)
-if (NOT EXISTS ${taichi_c_api})
-    message(FATAL_ERROR "Couldn't find C-API library in ${TAICHI_C_API_INSTALL_DIR}")
-endif()
-
-# Make sure your target is properly linked!
-set(TAICHI_TUTORIAL_DEMO_NAME "0_tutorial_kernel")
-message("-- Building ${TAICHI_TUTORIAL_DEMO_NAME}")
-add_executable(${TAICHI_TUTORIAL_DEMO_NAME} ${CMAKE_CURRENT_SOURCE_DIR}/app.cpp)
-target_include_directories(${TAICHI_TUTORIAL_DEMO_NAME} PUBLIC ${TAICHI_C_API_INSTALL_DIR}/include)
-target_link_libraries(${TAICHI_TUTORIAL_DEMO_NAME} ${taichi_c_api})
+// Write the grayscale image as a plain-text PPM (P3) file. The Ndarray has
+// shape (w, h) and is laid out row-major, so element (i, j) sits at i * h + j.
+void save_ppm(const float* pixels, uint32_t w, uint32_t h, const char* path) {
+  std::fstream f(path, std::ios::out | std::ios::trunc);
+  f << "P3\n" << w << ' ' << h << "\n255\n";
+  for (int j = h - 1; j >= 0; --j) {
+    for (int i = 0; i < w; ++i) {
+      f << static_cast<int>(255.999 * pixels[i * h + j]) << ' '
+        << static_cast<int>(255.999 * pixels[i * h + j]) << ' '
+        << static_cast<int>(255.999 * pixels[i * h + j]) << '\n';
+    }
+  }
+  f.flush();
+  f.close();
+}
+
+int main(int argc, const char** argv) {
+  ti::Runtime runtime(TI_ARCH_VULKAN);
+  ti::AotModule aot_module = runtime.load_aot_module("module.tcm");
+  ti::Kernel kernel_paint = aot_module.get_kernel("paint");
+
+  int n = 320;
+  float t = 0.0f;
+  ti::NdArray<float> pixels = runtime.allocate_ndarray<float>({(uint32_t)(2 * n), (uint32_t)n}, {1}, true);
+
+  kernel_paint[0] = n;
+  kernel_paint[1] = t;
+  kernel_paint[2] = pixels;
+  kernel_paint.launch();
+  runtime.wait();
+
+  auto pixels_data = (const float*)pixels.map();
+  save_ppm(pixels_data, 2 * n, n, "result.ppm");
+  pixels.unmap();
+
+  return 0;
+}
```
-### 4. Run taichi kernels in your application
+### 3. Build the project with CMake
-TiRT provides a fundamental C interface to help achieve optimal portability, however we also kindly provide a header-only C++ wrapper to save you from writing verbose C code. For simplicity purpose, we'll stick with the C++ wrapper in this tutorial.
+We use CMake to build the project and introduce the utility CMake module [`cmake/FindTaichi.cmake`](https://github.com/taichi-dev/taichi/blob/master/c_api/cmake/FindTaichi.cmake). It first looks for the Taichi installation directory specified by the environment variable `TAICHI_C_API_INSTALL_DIR`; if the variable is not set, CMake falls back to the Taichi library shipped in the installed Python wheel. It then defines the `Taichi::Runtime` target, which our project links against.
-Calling Taichi in C++ as easy as what you'd imagine:
+The utility module is then used from the `CMakeLists.txt`, which looks as follows.
-- Create a Taichi runtime with target arch
-- Load the compiled artifacts from disk through TiRT's `load_aot_module` interface.
-- Load kernels from the module to `k_init_` and `k_add_base_`
-- Prepare the inputs: ndarray `x_` and float `base`
-- Launch the kernels!
+```cmake
+cmake_minimum_required(VERSION 3.17)
-A complete C++ application with embedded Taichi is shown below:
+set(TAICHI_AOT_APP_NAME TaichiAot)
+project(${TAICHI_AOT_APP_NAME} LANGUAGES C CXX)
+set(CMAKE_CXX_STANDARD 17)
+set(CMAKE_CXX_STANDARD_REQUIRED ON)
-```cpp
-#include
+# Declare executable target.
+add_executable(${TAICHI_AOT_APP_NAME} app.cpp)
+target_include_directories(${TAICHI_AOT_APP_NAME} PUBLIC ${TAICHI_C_API_INSTALL_DIR}/include)
-struct App0_tutorial {
-  static const uint32_t NPARTICLE = 8192 * 2;
-  static const uint32_t N_ITER = 50;
-
-  ti::Runtime runtime_;
-  ti::AotModule module_;
-  ti::Kernel k_init_;
-  ti::Kernel k_add_base_;
-  ti::NdArray x_;
-
-  App0_tutorial() {
-    runtime_ = ti::Runtime(TI_ARCH_VULKAN);
-    module_ = runtime_.load_aot_module("0_tutorial_kernel/assets/tutorial");
-    k_init_ = module_.get_kernel("init");
-    k_add_base_ = module_.get_kernel("add_base");
-    x_ = runtime_.allocate_ndarray({NPARTICLE}, {}, true);
-    std::cout << "Initialized!" << std::endl;
-  }
+
+# Find and link Taichi runtime library.
+set(CMAKE_MODULE_PATH ${PROJECT_SOURCE_DIR}/cmake)
+find_package(Taichi REQUIRED)
+target_link_libraries(${TAICHI_AOT_APP_NAME} Taichi::Runtime)
+```
-  bool run() {
-    float base = 0.2;
+Build the project with the following commands:
-    k_init_.push_arg(x_);
-    k_init_.launch();
-    k_add_base_.push_arg(x_);
-    k_add_base_.push_arg(base);
-    for (int i = 0; i < N_ITER; i++) {
-      k_add_base_.launch();
-    }
-    runtime_.wait();
-    return true;
-  }
-};
+```shell
+cmake -B build
+cmake --build build
+```
-int main(int argc, const char** argv) {
-  App0_tutorial app;
-  app.run();
-  return 0;
-}
+Run the `TaichiAot` executable:
+
+```shell
+./build/TaichiAot
```
+An image of the Julia fractal, shown below, is saved as `result.ppm` in the project directory.
+
+![](../static/assets/fractal.png)
+
## FAQ
### Map your Taichi data types from Python to C++
diff --git a/docs/lang/articles/static/assets/fractal.png b/docs/lang/articles/static/assets/fractal.png
new file mode 100644
index 0000000000000..b956502a934a8
Binary files /dev/null and b/docs/lang/articles/static/assets/fractal.png differ