From e7d5d41c1e9a04f4b34041c0f4515cce17d6b457 Mon Sep 17 00:00:00 2001
From: PENGUINLIONG <admin@penguinliong.moe>
Date: Thu, 11 Aug 2022 19:12:44 +0800
Subject: [PATCH 01/59] C-API documentation generator

---
 c_api/docs/taichi/taichi_core.h.md | 230 ++++++++++++
 docs/c_api/taichi/taichi_core.h.md | 554 +++++++++++++++++++++++++++++
 misc/generate_c_api_docs.py        | 130 +++++++
 misc/taichi_json.py                |   3 +
 4 files changed, 917 insertions(+)
 create mode 100644 c_api/docs/taichi/taichi_core.h.md
 create mode 100644 docs/c_api/taichi/taichi_core.h.md
 create mode 100644 misc/generate_c_api_docs.py
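
The template format consumed by `misc/generate_c_api_docs.py` can be sketched in isolation. In the template, everything after `## Declarations` is a sequence of marker lines such as `` `handle.runtime` ``, each followed by the prose for that symbol. The snippet below is a minimal, standalone approximation of that grouping step, not the script itself: `collect_symbol_docs` and `SYMBOL_MARKER` are hypothetical names, and unlike the patch it requires the marker to fill the whole line and drops any prose that appears before the first marker.

```python
import re
from collections import defaultdict

# A marker line looks like `handle.runtime`: a backquoted "<kind>.<name>" pair.
SYMBOL_MARKER = re.compile(r"`\w+\.\w+`")


def collect_symbol_docs(template_lines):
    """Group the prose after '## Declarations' by the marker that precedes it."""
    # Index of the '## Declarations' heading, or the end if it is missing.
    start = next(
        (i for i, line in enumerate(template_lines)
         if line.startswith("## Declarations")),
        len(template_lines),
    )
    docs = defaultdict(list)
    current = None
    for raw in template_lines[start + 1:]:
        line = raw.strip()
        if SYMBOL_MARKER.fullmatch(line):
            # Strip the surrounding backquotes to get the symbol id.
            current = line[1:-1]
            continue
        if current is not None:
            docs[current].append(line)
    return docs


sample = [
    "# Title\n",
    "## Declarations\n",
    "\n",
    "`handle.runtime`\n",
    "\n",
    "A runtime instance.\n",
    "\n",
    "`function.wait`\n",
    "\n",
    "Wait for the device.\n",
]
docs = collect_symbol_docs(sample)
```

Blank lines are kept as empty strings so that the paragraph spacing of the template carries over into the generated page.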

diff --git a/c_api/docs/taichi/taichi_core.h.md b/c_api/docs/taichi/taichi_core.h.md
new file mode 100644
index 0000000000000..f36c4af7eaa71
--- /dev/null
+++ b/c_api/docs/taichi/taichi_core.h.md
@@ -0,0 +1,230 @@
+# Taichi C-API: Core Functionalities
+
+Taichi Core exposes all the interfaces necessary to offload AOT modules to Taichi. This document lists the features that are universally available regardless of the backend. The Taichi Core APIs are guaranteed to be forward compatible.
+
+## Definitions
+
+To guarantee a uniform behavior on any platform, we make the following definitions as reference.
+
+```c
+//~alias.bool
+//~definition.false
+//~definition.true
+```
+
+A boolean value is represented by an unsigned 32-bit integer. 1 represents a `true` state and 0 represents a `false` state.
+
+```c
+//~alias.flags
+```
+
+A bit-field of flags is represented by an unsigned 32-bit integer.
+
+```c
+//~definition.null_handle
+```
+
+A handle is an unsigned 64-bit integer. A null handle is a handle of zero value.
+
+## Runtime
+
+A runtime is an instance of Taichi targeting an offload device.
+
+```c
+//~handle.runtime
+```
+
+A runtime must be created with the `enumeration.arch` of the desired backend device.
+
+```c
+//~enumeration.arch
+//~function.create_runtime
+```
+
+## AOT Module
+
+An AOT module is a pre-compiled collection of compute graphs and kernels.
+
+```c
+//~handle.aot_module
+```
+
+AOT modules can be loaded from the file system directly.
+
+```c
+//~function.load_aot_module
+```
+
+## Device Commands
+
+Device commands are interfaces that logical device
+
+## Declarations
+
+`alias.bool`
+
+A boolean value. Can be either `definition.true` or `definition.false`. Assignment with other values could lead to undefined behavior.
+
+`definition.true`
+
+A condition or a predicate is satisfied; a statement is valid.
+
+`definition.false`
+
+A condition or a predicate is not satisfied; a statement is invalid.
+
+`alias.flags`
+
+A bit field that can be used to represent 32 orthogonal flags.
+
+`definition.null_handle`
+
+A sentinel invalid handle that will never be produced from a valid call to the Taichi C-API.
+
+`handle.runtime`
+
+Taichi runtime represents an instance of a logical computing device and its internal dynamic states. The user is responsible for synchronizing any use of `handle.runtime`.
+
+`handle.aot_module`
+
+An ahead-of-time (AOT) compiled Taichi module, which contains a collection of kernels and compute graphs.
+
+`handle.event`
+
+A synchronization primitive to manage on-device execution flows in multiple queues.
+
+`handle.memory`
+
+A contiguous allocation of on-device memory.
+
+`handle.kernel`
+
+A Taichi kernel that can be launched on device for execution.
+
+`handle.compute_graph`
+
+A collection of Taichi kernels (a compute graph) to be launched on device with predefined order.
+
+`enumeration.arch`
+
+Types of logical offload devices.
+
+`enumeration.data_type`
+
+Elementary (primitive) data types.
+
+`enumeration.argument_type`
+
+Types of kernel and compute graph argument.
+
+`bit_field.memory_usage`
+
+Usages of a memory allocation.
+
+`structure.memory_allocate_info`
+
+Parameters of a newly allocated memory.
+
+`structure.memory_slice`
+
+A subsection of a memory allocation.
+
+`structure.nd_shape`
+
+Multi-dimensional size of an ND-array.
+
+`structure.nd_array`
+
+Multi-dimensional array of dense primitive data.
+
+`union.argument_value`
+
+A scalar or structured argument value.
+
+`structure.argument`
+
+An argument value to feed kernels.
+
+`structure.named_argument`
+
+A named argument value to feed compute graphs.
+
+`function.create_runtime`
+
+Create a Taichi Runtime with the specified `enumeration.arch`.
+
+`function.destroy_runtime`
+
+Destroy a Taichi Runtime.
+
+`function.allocate_memory`
+
+Allocate a contiguous on-device memory with provided parameters.
+
+`function.free_memory`
+
+Free a memory allocation.
+
+`function.map_memory`
+
+Map an on-device memory to a host-addressable space. The user MUST ensure the device is not being used by any device command before the map. 
+
+`function.unmap_memory`
+
+Unmap an on-device memory and make any host-side changes to the memory visible to the device. The user MUST ensure there is no further access to the previously mapped host-addressable space.
+
+`function.create_event`
+
+Create an event primitive.
+
+`function.destroy_event`
+
+Destroy an event primitive.
+
+`function.copy_memory_device_to_device`
+
+Copy the content of a contiguous subsection of on-device memory to another. The two subsections MUST NOT overlap.
+
+`function.launch_kernel`
+
+Launch a Taichi kernel with provided arguments. The arguments MUST have the same count and types in the same order as in the source code.
+
+`function.launch_compute_graph`
+
+Launch a Taichi compute graph with provided named arguments. The named arguments MUST have the same count, names and types as in the source code.
+
+`function.signal_event`
+
+Set an event primitive to a signaled state, so the queues waiting upon the event can proceed with execution. If the event has been signaled before, the event MUST be reset with `function.reset_event`; otherwise the behavior is undefined.
+
+`function.reset_event`
+
+Set a signaled event primitive back to an unsignaled state.
+
+`function.wait_event`
+
+Wait on an event primitive until it transitions to a signaled state. The user MUST signal the awaited event; otherwise the behavior is undefined.
+
+`function.submit`
+
+Submit all commands to the logical device for execution. Ensure that any previous device command has been offloaded to the logical computing device.
+
+`function.wait`
+
+Wait until all previously invoked device commands have finished execution.
+
+`function.load_aot_module`
+
+Load a precompiled AOT module from the filesystem. `definition.null_handle` is returned if the runtime failed to load the AOT module from the given path.
+
+`function.destroy_aot_module`
+
+Destroy a loaded AOT module and release all related resources.
+
+`function.get_aot_module_kernel`
+
+Get a precompiled Taichi kernel from the AOT module. `definition.null_handle` is returned if the module does not have a kernel of the specified name.
+
+`function.get_aot_module_compute_graph`
+
+Get a precompiled compute graph from the AOT module. `definition.null_handle` is returned if the module does not have a compute graph of the specified name.
diff --git a/docs/c_api/taichi/taichi_core.h.md b/docs/c_api/taichi/taichi_core.h.md
new file mode 100644
index 0000000000000..16759d8b96f3c
--- /dev/null
+++ b/docs/c_api/taichi/taichi_core.h.md
@@ -0,0 +1,554 @@
+# Taichi C-API: Core Functionality
+
+Taichi Core exposes all the interfaces necessary to offload AOT modules to Taichi. This document lists the features that are universally available regardless of the backend. The Taichi Core APIs are guaranteed to be forward compatible.
+
+TODO: (@PENGUINLIONG) Example usage.
+
+## Declarations
+
+---
+### Alias `TiBool`
+
+```c
+// alias.bool
+typedef uint32_t TiBool;
+```
+
+A boolean value. Can be either `definition.true` or `definition.false`. Assignment with other values could lead to undefined behavior.
+
+---
+### Definition `TI_FALSE`
+
+```c
+// definition.false
+#define TI_FALSE 0
+```
+
+A condition or a predicate is not satisfied; a statement is invalid.
+
+---
+### Definition `TI_TRUE`
+
+```c
+// definition.true
+#define TI_TRUE 1
+```
+
+A condition or a predicate is satisfied; a statement is valid.
+
+---
+### Alias `TiFlags`
+
+```c
+// alias.flags
+typedef uint32_t TiFlags;
+```
+
+A bit field that can be used to represent 32 orthogonal flags.
+
+---
+### Definition `TI_NULL_HANDLE`
+
+```c
+// definition.null_handle
+#define TI_NULL_HANDLE 0
+```
+
+A sentinel invalid handle that will never be produced from a valid call to the Taichi C-API.
+
+---
+### Handle `TiRuntime`
+
+```c
+// handle.runtime
+typedef struct TiRuntime_t* TiRuntime;
+```
+
+Taichi runtime represents an instance of a logical computing device and its internal dynamic states. The user is responsible for synchronizing any use of `handle.runtime`.
+
+---
+### Handle `TiAotModule`
+
+```c
+// handle.aot_module
+typedef struct TiAotModule_t* TiAotModule;
+```
+
+An ahead-of-time (AOT) compiled Taichi module, which contains a collection of kernels and compute graphs.
+
+---
+### Handle `TiEvent`
+
+```c
+// handle.event
+typedef struct TiEvent_t* TiEvent;
+```
+
+A synchronization primitive to manage on-device execution flows in multiple queues.
+
+---
+### Handle `TiMemory`
+
+```c
+// handle.memory
+typedef struct TiMemory_t* TiMemory;
+```
+
+A contiguous allocation of on-device memory.
+
+---
+### Handle `TiKernel`
+
+```c
+// handle.kernel
+typedef struct TiKernel_t* TiKernel;
+```
+
+A Taichi kernel that can be launched on device for execution.
+
+---
+### Handle `TiComputeGraph`
+
+```c
+// handle.compute_graph
+typedef struct TiComputeGraph_t* TiComputeGraph;
+```
+
+A collection of Taichi kernels (a compute graph) to be launched on device with predefined order.
+
+---
+### Enumeration `TiArch`
+
+```c
+// enumeration.arch
+typedef enum TiArch {
+  TI_ARCH_X64 = 0,
+  TI_ARCH_ARM64 = 1,
+  TI_ARCH_JS = 2,
+  TI_ARCH_CC = 3,
+  TI_ARCH_WASM = 4,
+  TI_ARCH_CUDA = 5,
+  TI_ARCH_METAL = 6,
+  TI_ARCH_OPENGL = 7,
+  TI_ARCH_DX11 = 8,
+  TI_ARCH_OPENCL = 9,
+  TI_ARCH_AMDGPU = 10,
+  TI_ARCH_VULKAN = 11,
+  TI_ARCH_MAX_ENUM = 0xffffffff,
+} TiArch;
+```
+
+Types of logical offload devices.
+
+---
+### Enumeration `TiDataType`
+
+```c
+// enumeration.data_type
+typedef enum TiDataType {
+  TI_DATA_TYPE_F16 = 0,
+  TI_DATA_TYPE_F32 = 1,
+  TI_DATA_TYPE_F64 = 2,
+  TI_DATA_TYPE_I8 = 3,
+  TI_DATA_TYPE_I16 = 4,
+  TI_DATA_TYPE_I32 = 5,
+  TI_DATA_TYPE_I64 = 6,
+  TI_DATA_TYPE_U1 = 7,
+  TI_DATA_TYPE_U8 = 8,
+  TI_DATA_TYPE_U16 = 9,
+  TI_DATA_TYPE_U32 = 10,
+  TI_DATA_TYPE_U64 = 11,
+  TI_DATA_TYPE_GEN = 12,
+  TI_DATA_TYPE_UNKNOWN = 13,
+  TI_DATA_TYPE_MAX_ENUM = 0xffffffff,
+} TiDataType;
+```
+
+Elementary (primitive) data types.
+
+---
+### Enumeration `TiArgumentType`
+
+```c
+// enumeration.argument_type
+typedef enum TiArgumentType {
+  TI_ARGUMENT_TYPE_I32 = 0,
+  TI_ARGUMENT_TYPE_F32 = 1,
+  TI_ARGUMENT_TYPE_NDARRAY = 2,
+  TI_ARGUMENT_TYPE_MAX_ENUM = 0xffffffff,
+} TiArgumentType;
+```
+
+Types of kernel and compute graph argument.
+
+---
+### Bit Field `TiMemoryUsageFlagBits`
+
+```c
+// bit_field.memory_usage
+typedef enum TiMemoryUsageFlagBits {
+  TI_MEMORY_USAGE_STORAGE_BIT = 1 << 0,
+  TI_MEMORY_USAGE_UNIFORM_BIT = 1 << 1,
+  TI_MEMORY_USAGE_VERTEX_BIT = 1 << 2,
+  TI_MEMORY_USAGE_INDEX_BIT = 1 << 3,
+} TiMemoryUsageFlagBits;
+typedef TiFlags TiMemoryUsageFlags;
+```
+
+Usages of a memory allocation.
+
+---
+### Structure `TiMemoryAllocateInfo`
+
+```c
+// structure.memory_allocate_info
+typedef struct TiMemoryAllocateInfo {
+  uint64_t size;
+  TiBool host_write;
+  TiBool host_read;
+  TiBool export_sharing;
+  TiMemoryUsageFlagBits usage;
+} TiMemoryAllocateInfo;
+```
+
+Parameters of a newly allocated memory.
+
+---
+### Structure `TiMemorySlice`
+
+```c
+// structure.memory_slice
+typedef struct TiMemorySlice {
+  TiMemory memory;
+  uint64_t offset;
+  uint64_t size;
+} TiMemorySlice;
+```
+
+A subsection of a memory allocation.
+
+---
+### Structure `TiNdShape`
+
+```c
+// structure.nd_shape
+typedef struct TiNdShape {
+  uint32_t dim_count;
+  uint32_t dims[16];
+} TiNdShape;
+```
+
+Multi-dimensional size of an ND-array.
+
+---
+### Structure `TiNdArray`
+
+```c
+// structure.nd_array
+typedef struct TiNdArray {
+  TiMemory memory;
+  TiNdShape shape;
+  TiNdShape elem_shape;
+  TiDataType elem_type;
+} TiNdArray;
+```
+
+Multi-dimensional array of dense primitive data.
+
+---
+### Union `TiArgumentValue`
+
+```c
+// union.argument_value
+typedef union TiArgumentValue {
+  int32_t i32;
+  float f32;
+  TiNdArray ndarray;
+} TiArgumentValue;
+```
+
+A scalar or structured argument value.
+
+---
+### Structure `TiArgument`
+
+```c
+// structure.argument
+typedef struct TiArgument {
+  TiArgumentType type;
+  TiArgumentValue value;
+} TiArgument;
+```
+
+An argument value to feed kernels.
+
+---
+### Structure `TiNamedArgument`
+
+```c
+// structure.named_argument
+typedef struct TiNamedArgument {
+  const char* name;
+  TiArgument argument;
+} TiNamedArgument;
+```
+
+A named argument value to feed compute graphs.
+
+---
+### Function `ti_create_runtime`
+
+```c
+// function.create_runtime
+TI_DLL_EXPORT TiRuntime TI_API_CALL ti_create_runtime(
+  TiArch arch
+);
+```
+
+Create a Taichi Runtime with the specified `enumeration.arch`.
+
+---
+### Function `ti_destroy_runtime`
+
+```c
+// function.destroy_runtime
+TI_DLL_EXPORT void TI_API_CALL ti_destroy_runtime(
+  TiRuntime runtime
+);
+```
+
+Destroy a Taichi Runtime.
+
+---
+### Function `ti_allocate_memory`
+
+```c
+// function.allocate_memory
+TI_DLL_EXPORT TiMemory TI_API_CALL ti_allocate_memory(
+  TiRuntime runtime,
+  const TiMemoryAllocateInfo* allocate_info
+);
+```
+
+Allocate a contiguous on-device memory with provided parameters.
+
+---
+### Function `ti_free_memory`
+
+```c
+// function.free_memory
+TI_DLL_EXPORT void TI_API_CALL ti_free_memory(
+  TiRuntime runtime,
+  TiMemory memory
+);
+```
+
+Free a memory allocation.
+
+---
+### Function `ti_map_memory`
+
+```c
+// function.map_memory
+TI_DLL_EXPORT void* TI_API_CALL ti_map_memory(
+  TiRuntime runtime,
+  TiMemory memory
+);
+```
+
+Map an on-device memory to a host-addressable space. The user MUST ensure the device is not being used by any device command before the map.
+
+---
+### Function `ti_unmap_memory`
+
+```c
+// function.unmap_memory
+TI_DLL_EXPORT void TI_API_CALL ti_unmap_memory(
+  TiRuntime runtime,
+  TiMemory memory
+);
+```
+
+Unmap an on-device memory and make any host-side changes to the memory visible to the device. The user MUST ensure there is no further access to the previously mapped host-addressable space.
+
+---
+### Function `ti_create_event`
+
+```c
+// function.create_event
+TI_DLL_EXPORT TiEvent TI_API_CALL ti_create_event(
+  TiRuntime runtime
+);
+```
+
+Create an event primitive.
+
+---
+### Function `ti_destroy_event`
+
+```c
+// function.destroy_event
+TI_DLL_EXPORT void TI_API_CALL ti_destroy_event(
+  TiEvent event
+);
+```
+
+Destroy an event primitive.
+
+---
+### Function `ti_copy_memory_device_to_device` (Device Command)
+
+```c
+// function.copy_memory_device_to_device
+TI_DLL_EXPORT void TI_API_CALL ti_copy_memory_device_to_device(
+  TiRuntime runtime,
+  const TiMemorySlice* dst_memory,
+  const TiMemorySlice* src_memory
+);
+```
+
+Copy the content of a contiguous subsection of on-device memory to another. The two subsections MUST NOT overlap.
+
+---
+### Function `ti_launch_kernel` (Device Command)
+
+```c
+// function.launch_kernel
+TI_DLL_EXPORT void TI_API_CALL ti_launch_kernel(
+  TiRuntime runtime,
+  TiKernel kernel,
+  uint32_t arg_count,
+  const TiArgument* args
+);
+```
+
+Launch a Taichi kernel with provided arguments. The arguments MUST have the same count and types in the same order as in the source code.
+
+---
+### Function `ti_launch_compute_graph` (Device Command)
+
+```c
+// function.launch_compute_graph
+TI_DLL_EXPORT void TI_API_CALL ti_launch_compute_graph(
+  TiRuntime runtime,
+  TiComputeGraph compute_graph,
+  uint32_t arg_count,
+  const TiNamedArgument* args
+);
+```
+
+Launch a Taichi compute graph with provided named arguments. The named arguments MUST have the same count, names and types as in the source code.
+
+---
+### Function `ti_signal_event` (Device Command)
+
+```c
+// function.signal_event
+TI_DLL_EXPORT void TI_API_CALL ti_signal_event(
+  TiRuntime runtime,
+  TiEvent event
+);
+```
+
+Set an event primitive to a signaled state, so the queues waiting upon the event can proceed with execution. If the event has been signaled before, the event MUST be reset with `function.reset_event`; otherwise the behavior is undefined.
+
+---
+### Function `ti_reset_event` (Device Command)
+
+```c
+// function.reset_event
+TI_DLL_EXPORT void TI_API_CALL ti_reset_event(
+  TiRuntime runtime,
+  TiEvent event
+);
+```
+
+Set a signaled event primitive back to an unsignaled state.
+
+---
+### Function `ti_wait_event` (Device Command)
+
+```c
+// function.wait_event
+TI_DLL_EXPORT void TI_API_CALL ti_wait_event(
+  TiRuntime runtime,
+  TiEvent event
+);
+```
+
+Wait on an event primitive until it transitions to a signaled state. The user MUST signal the awaited event; otherwise the behavior is undefined.
+
+---
+### Function `ti_submit`
+
+```c
+// function.submit
+TI_DLL_EXPORT void TI_API_CALL ti_submit(
+  TiRuntime runtime
+);
+```
+
+Submit all commands to the logical device for execution. Ensure that any previous device command has been offloaded to the logical computing device.
+
+---
+### Function `ti_wait`
+
+```c
+// function.wait
+TI_DLL_EXPORT void TI_API_CALL ti_wait(
+  TiRuntime runtime
+);
+```
+
+Wait until all previously invoked device commands have finished execution.
+
+---
+### Function `ti_load_aot_module`
+
+```c
+// function.load_aot_module
+TI_DLL_EXPORT TiAotModule TI_API_CALL ti_load_aot_module(
+  TiRuntime runtime,
+  const char* module_path
+);
+```
+
+Load a precompiled AOT module from the filesystem. `definition.null_handle` is returned if the runtime failed to load the AOT module from the given path.
+
+---
+### Function `ti_destroy_aot_module`
+
+```c
+// function.destroy_aot_module
+TI_DLL_EXPORT void TI_API_CALL ti_destroy_aot_module(
+  TiAotModule aot_module
+);
+```
+
+Destroy a loaded AOT module and release all related resources.
+
+---
+### Function `ti_get_aot_module_kernel`
+
+```c
+// function.get_aot_module_kernel
+TI_DLL_EXPORT TiKernel TI_API_CALL ti_get_aot_module_kernel(
+  TiAotModule aot_module,
+  const char* name
+);
+```
+
+Get a precompiled Taichi kernel from the AOT module. `definition.null_handle` is returned if the module does not have a kernel of the specified name.
+
+---
+### Function `ti_get_aot_module_compute_graph`
+
+```c
+// function.get_aot_module_compute_graph
+TI_DLL_EXPORT TiComputeGraph TI_API_CALL ti_get_aot_module_compute_graph(
+  TiAotModule aot_module,
+  const char* name
+);
+```
+
+Get a precompiled compute graph from the AOT module. `definition.null_handle` is returned if the module does not have a compute graph of the specified name.
\ No newline at end of file
diff --git a/misc/generate_c_api_docs.py b/misc/generate_c_api_docs.py
new file mode 100644
index 0000000000000..6f19770ead5a5
--- /dev/null
+++ b/misc/generate_c_api_docs.py
@@ -0,0 +1,130 @@
+from collections import defaultdict
+from pathlib import Path
+import re
+from taichi_json import (Alias, BitField, BuiltInType, Definition, EntryBase,
+                         Enumeration, Field, Function, Handle, Module,
+                         Structure, Union)
+
+from generate_c_api import (get_type_name, get_field, get_declr)
+
+def get_title(x: EntryBase):
+    ty = type(x)
+    if ty is BuiltInType:
+        return ""
+
+    elif ty is Alias:
+        return f"Alias `{get_type_name(x)}`"
+
+    elif ty is Definition:
+        return f"Definition `{x.name.screaming_snake_case}`"
+
+    elif ty is Handle:
+        return f"Handle `{get_type_name(x)}`"
+
+    elif ty is Enumeration:
+        return f"Enumeration `{get_type_name(x)}`"
+
+    elif ty is BitField:
+        return f"Bit Field `{get_type_name(x)}`"
+
+    elif ty is Structure:
+        return f"Structure `{get_type_name(x)}`"
+
+    elif ty is Union:
+        return f"Union `{get_type_name(x)}`"
+
+    elif ty is Function:
+        extra = ""
+        if x.is_device_command:
+            extra += " (Device Command)"
+        return f"Function `{x.name.snake_case}`" + extra
+
+    else:
+        raise RuntimeError(f"'{x.id}' doesn't need title")
+
+
+def print_module_doc(module: Module, templ):
+    out = []
+
+    for i in range(len(templ)):
+        line = templ[i]
+        out += [line.strip()]
+        if line.startswith("## Declarations"):
+            break
+
+    out += [""]
+
+    cur_sym = None
+    documented_syms = defaultdict(list)
+    for line in templ[i:]:
+        line = line.strip()
+        if re.match(r"\`\w+\.\w+\`", line):
+            cur_sym = line[1:-1]
+            continue
+        documented_syms[cur_sym] += [line]
+
+
+    for x in module.declr_reg:
+        declr = module.declr_reg.resolve(x)
+
+        out += [
+            "---",
+            f"### {get_title(declr)}",
+            "",
+            "```c",
+            f"// {x}",
+            get_declr(declr),
+            "```",
+        ]
+
+        if x in documented_syms:
+            out += documented_syms[x]
+        else:
+            print(f"WARNING: `{x}` is not documented")
+
+    return '\n'.join(out)
+
+
+def generate_module_header(module):
+    if module.is_built_in:
+        return
+
+    templ_path = f"c_api/docs/{module.name}.md"
+    templ = None
+    if Path(templ_path).exists():
+        with open(templ_path) as f:
+            templ = f.readlines()
+    else: 
+        print(f"ignored {templ_path} because the documentation template cannot be found")
+        return
+
+    print(f"processing module '{module.name}'")
+    path = f"docs/c_api/{module.name}.md"
+    with open(path, "w") as f:
+        f.write(print_module_doc(module, templ))
+
+    #system(f"clang-format {path} -i")
+
+
+if __name__ == "__main__":
+    builtin_tys = {
+        BuiltInType("uint64_t", "uint64_t"),
+        BuiltInType("int64_t", "int64_t"),
+        BuiltInType("uint32_t", "uint32_t"),
+        BuiltInType("int32_t", "int32_t"),
+        BuiltInType("float", "float"),
+        BuiltInType("const char*", "const char*"),
+        BuiltInType("const char**", "const char**"),
+        BuiltInType("void*", "void*"),
+        BuiltInType("const void*", "const void*"),
+        BuiltInType("VkInstance", "VkInstance"),
+        BuiltInType("VkPhysicalDevice", "VkPhysicalDevice"),
+        BuiltInType("VkDevice", "VkDevice"),
+        BuiltInType("VkQueue", "VkQueue"),
+        BuiltInType("VkBuffer", "VkBuffer"),
+        BuiltInType("VkBufferUsageFlags", "VkBufferUsageFlags"),
+        BuiltInType("VkEvent", "VkEvent"),
+    }
+
+    for module in Module.load_all(builtin_tys):
+        generate_module_header(module)
diff --git a/misc/taichi_json.py b/misc/taichi_json.py
index 8550e2d861c6a..ba238ae817acd 100644
--- a/misc/taichi_json.py
+++ b/misc/taichi_json.py
@@ -201,6 +201,7 @@ def __init__(self, j):
         super().__init__(j, "function")
         self.return_value_type = None
         self.params = []
+        self.is_device_command = False
 
         if "parameters" in j:
             for x in j["parameters"]:
@@ -209,6 +210,8 @@ def __init__(self, j):
                     self.return_value_type = field.type
                 else:
                     self.params += [field]
+        if "is_device_command" in j:
+            self.is_device_command = bool(j["is_device_command"])
 
 
 class Module:

From 25a7abcb2ada8a5f01ef813770b90df726ec32c5 Mon Sep 17 00:00:00 2001
From: "pre-commit-ci[bot]"
 <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Date: Thu, 11 Aug 2022 11:15:43 +0000
Subject: [PATCH 02/59] [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci
---
 c_api/docs/taichi/taichi_core.h.md |  2 +-
 docs/c_api/taichi/taichi_core.h.md |  2 +-
 misc/generate_c_api_docs.py        | 12 +++++++-----
 3 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/c_api/docs/taichi/taichi_core.h.md b/c_api/docs/taichi/taichi_core.h.md
index f36c4af7eaa71..9c478f91c3728 100644
--- a/c_api/docs/taichi/taichi_core.h.md
+++ b/c_api/docs/taichi/taichi_core.h.md
@@ -167,7 +167,7 @@ Free a memory allocation.
 
 `function.map_memory`
 
-Map an on-device memory to a host-addressable space. The user MUST ensure the device is not being used by any device command before the map. 
+Map an on-device memory to a host-addressable space. The user MUST ensure the device is not being used by any device command before the map.
 
 `function.unmap_memory`
 
diff --git a/docs/c_api/taichi/taichi_core.h.md b/docs/c_api/taichi/taichi_core.h.md
index 16759d8b96f3c..527f00e62547f 100644
--- a/docs/c_api/taichi/taichi_core.h.md
+++ b/docs/c_api/taichi/taichi_core.h.md
@@ -551,4 +551,4 @@ TI_DLL_EXPORT TiComputeGraph TI_API_CALL ti_get_aot_module_compute_graph(
 );
 ```
 
-Get a precompiled compute graph from the AOT module. `definition.null_handle` is returned if the module does not have a compute graph of the specified name.
\ No newline at end of file
+Get a precompiled compute graph from the AOT module. `definition.null_handle` is returned if the module does not have a compute graph of the specified name.
diff --git a/misc/generate_c_api_docs.py b/misc/generate_c_api_docs.py
index 6f19770ead5a5..3d21cd1d17745 100644
--- a/misc/generate_c_api_docs.py
+++ b/misc/generate_c_api_docs.py
@@ -1,11 +1,12 @@
+import re
 from collections import defaultdict
 from pathlib import Path
-import re
+
+from generate_c_api import get_declr, get_field, get_type_name
 from taichi_json import (Alias, BitField, BuiltInType, Definition, EntryBase,
                          Enumeration, Field, Function, Handle, Module,
                          Structure, Union)
 
-from generate_c_api import (get_type_name, get_field, get_declr)
 
 def get_title(x: EntryBase):
     ty = type(x)
@@ -63,7 +64,6 @@ def print_module_doc(module: Module, templ):
             continue
         documented_syms[cur_sym] += [line]
 
-
     for x in module.declr_reg:
         declr = module.declr_reg.resolve(x)
 
@@ -94,8 +94,10 @@ def generate_module_header(module):
     if Path(templ_path).exists():
         with open(templ_path) as f:
             templ = f.readlines()
-    else: 
-        print(f"ignored {templ_path} because the documentation template cannot be found")
+    else:
+        print(
+            f"ignored {templ_path} because the documentation template cannot be found"
+        )
         return
 
     print(f"processing module '{module.name}'")

From 6db53ecf12260736edef361f5308246827055f5e Mon Sep 17 00:00:00 2001
From: PENGUINLIONG <admin@penguinliong.moe>
Date: Fri, 12 Aug 2022 10:04:47 +0800
Subject: [PATCH 03/59] Relocate C-API documentations

---
 docs/{c_api/taichi => lang/articles/c-api}/taichi_core.h.md | 0
 misc/generate_c_api_docs.py                                 | 2 +-
 2 files changed, 1 insertion(+), 1 deletion(-)
 rename docs/{c_api/taichi => lang/articles/c-api}/taichi_core.h.md (100%)

diff --git a/docs/c_api/taichi/taichi_core.h.md b/docs/lang/articles/c-api/taichi_core.h.md
similarity index 100%
rename from docs/c_api/taichi/taichi_core.h.md
rename to docs/lang/articles/c-api/taichi_core.h.md
diff --git a/misc/generate_c_api_docs.py b/misc/generate_c_api_docs.py
index 3d21cd1d17745..1681037367296 100644
--- a/misc/generate_c_api_docs.py
+++ b/misc/generate_c_api_docs.py
@@ -101,7 +101,7 @@ def generate_module_header(module):
         return
 
     print(f"processing module '{module.name}'")
-    path = f"docs/c_api/{module.name}.md"
+    path = f"docs/lang/articles/c-api/{module.name}.md"
     with open(path, "w") as f:
         f.write(print_module_doc(module, templ))
 

From 0ca47494c7d0c75a2dbd4099bae882e308ea9907 Mon Sep 17 00:00:00 2001
From: PENGUINLIONG <admin@penguinliong.moe>
Date: Fri, 12 Aug 2022 10:31:02 +0800
Subject: [PATCH 04/59] Added category json

---
 docs/lang/articles/c-api/_category_.json | 4 ++++
 1 file changed, 4 insertions(+)
 create mode 100644 docs/lang/articles/c-api/_category_.json

diff --git a/docs/lang/articles/c-api/_category_.json b/docs/lang/articles/c-api/_category_.json
new file mode 100644
index 0000000000000..9e854c7688fd4
--- /dev/null
+++ b/docs/lang/articles/c-api/_category_.json
@@ -0,0 +1,4 @@
+{
+  "label": "Taichi Runtime C-API",
+  "position": 17
+}

From 2034accc37b53073283001e71e636e8fdbf6ae69 Mon Sep 17 00:00:00 2001
From: PENGUINLIONG <admin@penguinliong.moe>
Date: Fri, 12 Aug 2022 10:31:54 +0800
Subject: [PATCH 05/59] Rearrange

---
 docs/lang/articles/c-api/taichi_core.h.md | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/docs/lang/articles/c-api/taichi_core.h.md b/docs/lang/articles/c-api/taichi_core.h.md
index 527f00e62547f..34d9c08954fd7 100644
--- a/docs/lang/articles/c-api/taichi_core.h.md
+++ b/docs/lang/articles/c-api/taichi_core.h.md
@@ -1,4 +1,8 @@
-# Taichi C-API: Core Functionality
+---
+sidebar_position: 1
+---
+
+# Core Functionality
 
 Taichi Core exposes all the interfaces necessary to offload AOT modules to Taichi. This document lists the features that are universally available regardless of the backend. The Taichi Core APIs are guaranteed to be forward compatible.
 

From 30664c3abda503b1701442229670217f4f3d32e4 Mon Sep 17 00:00:00 2001
From: PENGUINLIONG <admin@penguinliong.moe>
Date: Fri, 12 Aug 2022 14:07:43 +0800
Subject: [PATCH 06/59] Enriched documentation

---
 c_api/docs/taichi/taichi_core.h.md            | 190 +++++++++++++++---
 .../{taichi_core.h.md => taichi_core.md}      | 188 ++++++++++++++++-
 misc/generate_c_api_docs.py                   |   9 +-
 3 files changed, 347 insertions(+), 40 deletions(-)
 rename docs/lang/articles/c-api/{taichi_core.h.md => taichi_core.md} (62%)

diff --git a/c_api/docs/taichi/taichi_core.h.md b/c_api/docs/taichi/taichi_core.h.md
index 9c478f91c3728..e9a4d6f59c8e9 100644
--- a/c_api/docs/taichi/taichi_core.h.md
+++ b/c_api/docs/taichi/taichi_core.h.md
@@ -1,63 +1,189 @@
-# Taichi C-API: Core Functionalities
+---
+sidebar_position: 1
+---
+
+# Core Functionalities
 
 Taichi Core exposes all the interfaces necessary to offload AOT modules to Taichi. This document lists the features that are universally available regardless of the backend. The Taichi Core APIs are guaranteed to be forward compatible.
 
-## Definitions
+## Availability
+
+Taichi C-API has bridged the following backends:
+
+|Backend|Offload Target|Maintenance Tier|
+|-|-|-|
+|Vulkan|GPU|Tier 1|
+|CUDA|GPU (NVIDIA)|Tier 1|
+|CPU (LLVM)|CPU|Tier 1|
+|DirectX 11|GPU (Windows)|N/A|
+|Metal|GPU (macOS, iOS)|N/A|
+|OpenGL|GPU|N/A|
+
+Backends with tier 1 support are the most intensively developed and tested. In contrast, expect a delay in fixes for minor issues on tier 2 backends. Backends that are currently unsupported might become supported in the future.
+
+Among all the tier 1 backends, Vulkan has the most outstanding cross-platform compatibility, so most of the new features will be first available on Vulkan.
+
+For convenience, in the following text (and other C-API documents), the term *host* refers to the user of the C-API; the term *device* refers to the logical (conceptual) compute device that Taichi Runtime offloads its compute tasks to. A *device* might not be an actual discrete processor separate from the CPU, and the *host* MAY NOT be able to access the memory allocated on the *device*.
+
+Unless otherwise stated, *device*, *backend*, *offload target* and *GPU* are used interchangeably; *host*, *user code*, *user procedure* and *CPU* are likewise used interchangeably.
+
+## How to...
+
+This section gives a brief introduction to what you might want to do with the Taichi C-API.
+
+### Create and Destroy a Runtime Instance
+
+To work with Taichi, you first create a runtime instance. You SHOULD only create a single runtime per thread. Currently we don't officially claim that multiple runtime instances can coexist in a process; please feel free to [report issues](https://github.com/taichi-dev/taichi/issues) if you encounter any problem with such usage.
+
+```cpp
+TiRuntime runtime = ti_create_runtime(TI_ARCH_VULKAN);
+```
+
+When your program reaches the end, you SHOULD destroy the runtime instance. Please ensure any other related resources have been destroyed before the `handle.runtime` itself.
+
+```cpp
+ti_destroy_runtime(runtime);
+```
+
+### Allocate and Free Device-Only Memory
+
+Allocate a piece of memory that is only visible to the device. On GPU backends, it usually means that the memory is located in the graphics memory (GRAM).
+
+```cpp
+TiMemoryAllocateInfo mai {};
+mai.size = 1024; // Size in bytes.
+mai.usage = TI_MEMORY_USAGE_STORAGE_BIT;
+TiMemory memory = ti_allocate_memory(runtime, &mai);
+```
+
+**NOTE** You don't need to allocate memory for field allocations. They are automatically allocated when the AOT module is loaded.
 
-To guarantee a uniform behavior on any platform, we make the following definitions as reference.
+You MAY free allocated memory explicitly; but memory allocations will be automatically freed when the related `handle.runtime` is destroyed.
 
-```c
-//~alias.bool
-//~definition.false
-//~definition.true
+```cpp
+ti_free_memory(runtime, memory);
 ```
 
-A boolean value is represented by an unsigned 32-bit integer. 1 represents a `true` state and 0 represents a `false` state.
+### Allocate Host-Accessible Memory
+
+To allow data to be streamed into the memory, `host_write` MUST be set true.
+
+```cpp
+TiMemoryAllocateInfo mai {};
+mai.size = 1024; // Size in bytes.
+mai.host_write = true;
+mai.usage = TI_MEMORY_USAGE_STORAGE_BIT;
+TiMemory streaming_memory = ti_allocate_memory(runtime, &mai);
+
+// ...
 
-```c
-//~alias.flags
+std::vector<uint8_t> src = some_random_data_source();
+
+void* dst = ti_map_memory(runtime, streaming_memory);
+std::memcpy(dst, src.data(), src.size());
+ti_unmap_memory(runtime, streaming_memory);
 ```
 
-A bit-field of flags is represented by an unsigned 32-bit integer.
+To read data back to the host, `host_read` MUST be set true.
+
+```cpp
+TiMemoryAllocateInfo mai {};
+mai.size = 1024; // Size in bytes.
+mai.host_read = true;
+mai.usage = TI_MEMORY_USAGE_STORAGE_BIT;
+TiMemory read_back_memory = ti_allocate_memory(runtime, &mai);
+
+// ...
 
-```c
-//~definition.null_handle
+std::vector<uint8_t> dst(1024);
+void* src = ti_map_memory(runtime, read_back_memory);
+std::memcpy(dst.data(), src, dst.size());
+ti_unmap_memory(runtime, read_back_memory);
+
+ti_free_memory(runtime, read_back_memory);
 ```
 
-A handle is an unsigned 64-bit interger. And a null handle is a handle of zero value.
+**NOTE** `host_read` and `host_write` can be set true simultaneously. But please note that host-accessible allocations MAY slow down computation on a GPU because of the limited bus bandwidth between the host memory and the device.
 
-## Runtime
+### Load and destroy a Taichi AOT Module
 
-A runtime is an instance of Taichi targeting an offload devices.
+You can load a Taichi AOT module from the filesystem.
 
-```c
-//~handle.runtime
+```cpp
+TiAotModule aot_module = ti_load_aot_module(runtime, "/path/to/aot/module");
 ```
 
-A runtime needs to be created with the `enumeration.arch` of the demanded backend device.
+`/path/to/aot/module` should point to the directory that contains a `metadata.tcb`.
+
+You can destroy an AOT module once you are done with it; but please ensure that no kernel or compute graph from it is still pending submission via `function.submit`.
 
-```c
-//~enumeration.arch
-//~function.create_runtime
+```cpp
+ti_destroy_aot_module(aot_module);
 ```
 
-## AOT Module
+### Launch Kernels and Compute Graphs
 
-An AOT module is a pre-compiled collection of compute graphs and kernels.
+You can extract kernels and compute graphs from an AOT module. Kernels and compute graphs are part of the module, so you don't have to destroy them.
 
-```c
-//~handle.aot_module
+```cpp
+TiKernel kernel = ti_get_aot_module_kernel(aot_module, "foo");
+TiComputeGraph compute_graph = ti_get_aot_module_compute_graph(aot_module, "bar");
 ```
 
-AOT modules can be loaded from the file system directly.
+You can launch a kernel with positional arguments. Please ensure the types, the sizes and the order matches the source code in Python.  
 
-```c
-//~function.load_aot_module
+```cpp
+TiNdArray ndarray{};
+ndarray.memory = get_some_memory();
+ndarray.shape.dim_count = 1;
+ndarray.shape.dims[0] = 16;
+ndarray.elem_shape.dim_count = 2;
+ndarray.elem_shape.dims[0] = 4;
+ndarray.elem_shape.dims[1] = 4;
+ndarray.elem_type = TI_DATA_TYPE_F32;
+
+std::array<TiArgument, 3> args{};
+
+TiArgument& arg0 = args[0];
+arg0.type = TI_ARGUMENT_TYPE_I32;
+arg0.value.i32 = 123;
+
+TiArgument& arg1 = args[1];
+arg1.type = TI_ARGUMENT_TYPE_F32;
+arg1.value.f32 = 123.0f;
+
+TiArgument& arg2 = args[2];
+arg2.type = TI_ARGUMENT_TYPE_NDARRAY;
+arg2.value.ndarray = ndarray;
+
+ti_launch_kernel(runtime, kernel, args.size(), args.data());
 ```
 
-## Device Commands
+You can launch a compute graph in a similar way. But additionally please ensure the argument names match those in the Python source.
+
+```cpp
+std::array<TiNamedArgument, 3> named_args{};
+TiNamedArgument& named_arg0 = named_args[0];
+named_arg0.name = "foo";
+named_arg0.argument = args[0];
+TiNamedArgument& named_arg1 = named_args[1];
+named_arg1.name = "bar";
+named_arg1.argument = args[1];
+TiNamedArgument& named_arg2 = named_args[2];
+named_arg2.name = "baz";
+named_arg2.argument = args[2];
+
+ti_launch_compute_graph(runtime, compute_graph, named_args.size(), named_args.data());
+```
+
+When you have launched all kernels and compute graphs for this batch, you should call `function.submit` and then `function.wait` to wait for the execution to finish.
+
+```cpp
+ti_submit(runtime);
+ti_wait(runtime);
+```
 
-Device commands are interfaces that logical device
+**WARNING** This part is subject to change. We're gonna introduce multi-queue in the future.
 
 ## Declarations
 
@@ -83,7 +209,7 @@ A sentinal invalid handle that will never be produced from a valid call to Taich
 
 `handle.runtime`
 
-Taichi runtime represents an instance of a logical computating device and its internal dynamic states. The user is responsible to synchronize any use of `handle.runtime`.
+Taichi runtime represents an instance of a logical computing device and its internal dynamic states. The user is responsible for synchronizing any use of `handle.runtime`. The user MUST NOT manipulate multiple `handle.runtime`s in the same thread.
 
 `handle.aot_module`
 
diff --git a/docs/lang/articles/c-api/taichi_core.h.md b/docs/lang/articles/c-api/taichi_core.md
similarity index 62%
rename from docs/lang/articles/c-api/taichi_core.h.md
rename to docs/lang/articles/c-api/taichi_core.md
index 34d9c08954fd7..b4661eb1f41b5 100644
--- a/docs/lang/articles/c-api/taichi_core.h.md
+++ b/docs/lang/articles/c-api/taichi_core.md
@@ -2,15 +2,191 @@
 sidebar_position: 1
 ---
 
-# Core Functionality
+# Core Functionalities
 
 Taichi Core exposes all necessary interfaces to offload AOT modules to Taichi. Here lists the features universally available disregards to any specific backend. The Taichi Core APIs are guaranteed to be forward compatible.
 
-TODO: (@PENGUINLIONG) Example usage.
+## Availability
 
-## Declarations
+Taichi C-API has bridged the following backends:
+
+|Backend|Offload Target|Maintenance Tier|
+|-|-|-|
+|Vulkan|GPU|Tier 1|
+|CUDA|GPU (NVIDIA)|Tier 1|
+|CPU (LLVM)|CPU|Tier 1|
+|DirectX 11|GPU (Windows)|N/A|
+|Metal|GPU (macOS, iOS)|N/A|
+|OpenGL|GPU|N/A|
+
+The backends with tier 1 support are the most intensively developed and tested ones. In contrast, you would expect a delay in fixes against minor issues on tier 2 backends. The backends currently unsupported might become supported.
+
+Among all the tier 1 backends, Vulkan has the most outstanding cross-platform compatibility, so most of the new features will be first available on Vulkan.
+
+For convenience, in the following text (and other C-API documentations), the term *host* refers to the user of the C-API; the term *device* refers to the logical (conceptual) compute device that Taichi Runtime offloads its compute tasks to. A *device* might not be an actual discrete processor away from the CPU and the *host* MAY NOT be able to access the memory allocated on the *device*.
+
+Unless explicitly explained, *device*, *backend*, *offload targer* and *GPU* can be used interchangeably; *host*, "user code", "user procedure" and "CPU" can too be used interchangeably.
+
+## How to...
+
+This section gives a brief introduction to what you might want to do with the Taichi C-API.
+
+### Create and destroy a Runtime Instance
+
+To work with Taichi, you first create a runtime instance. You SHOULD only create a single runtime per thread. Currently we don't officially claim that multiple runtime instances can coexist in a process; please feel free to [report issues](https://github.com/taichi-dev/taichi/issues) if you encounter any problem with such usage.
+
+```cpp
+TiRuntime runtime = ti_create_runtime(TI_ARCH_VULKAN);
+```
+
+When your program reaches the end, you SHOULD destroy the runtime instance. Please ensure any other related resources have been destroyed before the `handle.runtime` itself.
+
+```cpp
+ti_destroy_runtime(runtime);
+```
+
+### Allocate and Free Device-Only Memory
+
+Allocate a piece of memory that is only visible to the device. On GPU backends, it usually means that the memory is located in the graphics memory (GRAM).
+
+```cpp
+TiMemoryAllocateInfo mai {};
+mai.size = 1024; // Size in bytes.
+mai.usage = TI_MEMORY_USAGE_STORAGE_BIT;
+TiMemory memory = ti_allocate_memory(runtime, &mai);
+```
+
+**NOTE** You don't need to allocate memory for field allocations. They are automatically allocated when the AOT module is loaded.
+
+You MAY free allocated memory explicitly; but memory allocations will be automatically freed when the related `handle.runtime` is destroyed.
+
+```cpp
+ti_free_memory(runtime, memory);
+```
+
+### Allocate Host-Accessible Memory
+
+To allow data to be streamed into the memory, `host_write` MUST be set true.
+
+```cpp
+TiMemoryAllocateInfo mai {};
+mai.size = 1024; // Size in bytes.
+mai.host_write = true;
+mai.usage = TI_MEMORY_USAGE_STORAGE_BIT;
+TiMemory streaming_memory = ti_allocate_memory(runtime, &mai);
+
+// ...
+
+std::vector<uint8_t> src = some_random_data_source();
+
+void* dst = ti_map_memory(runtime, streaming_memory);
+std::memcpy(dst, src.data(), src.size());
+ti_unmap_memory(runtime, streaming_memory);
+```
+
+To read data back to the host, `host_read` MUST be set true.
+
+```cpp
+TiMemoryAllocateInfo mai {};
+mai.size = 1024; // Size in bytes.
+mai.host_read = true;
+mai.usage = TI_MEMORY_USAGE_STORAGE_BIT;
+TiMemory read_back_memory = ti_allocate_memory(runtime, &mai);
+
+// ...
+
+std::vector<uint8_t> dst(1024);
+void* src = ti_map_memory(runtime, read_back_memory);
+std::memcpy(dst.data(), src, dst.size());
+ti_unmap_memory(runtime, read_back_memory);
+
+ti_free_memory(runtime, read_back_memory);
+```
+
+**NOTE** `host_read` and `host_write` can be set true simultaneously. But please note that host-accessible allocations MAY slow down computation on a GPU because of the limited bus bandwidth between the host memory and the device.
+
+### Load and destroy a Taichi AOT Module
+
+You can load a Taichi AOT module from the filesystem.
+
+```cpp
+TiAotModule aot_module = ti_load_aot_module(runtime, "/path/to/aot/module");
+```
+
+`/path/to/aot/module` should point to the directory that contains a `metadata.tcb`.
+
+You can destroy an unused AOT module if you have done with it; but please ensure there is no kernel or compute graph related to it pending to `function.submit`.
+
+```cpp
+ti_destroy_aot_module(aot_module);
+```
+
+### Launch Kernels and Compute Graphs
+
+You can extract kernels and compute graphs from an AOT module. Kernels and compute graphs are part of the module, so you don't have to destroy them.
+
+```cpp
+TiKernel kernel = ti_get_aot_module_kernel(aot_module, "foo");
+TiComputeGraph compute_graph = ti_get_aot_module_compute_graph(aot_module, "bar");
+```
+
+You can launch a kernel with positional arguments. Please ensure the types, the sizes and the order match the source code in Python.
+
+```cpp
+TiNdArray ndarray{};
+ndarray.memory = get_some_memory();
+ndarray.shape.dim_count = 1;
+ndarray.shape.dims[0] = 16;
+ndarray.elem_shape.dim_count = 2;
+ndarray.elem_shape.dims[0] = 4;
+ndarray.elem_shape.dims[1] = 4;
+ndarray.elem_type = TI_DATA_TYPE_F32;
+
+std::array<TiArgument, 3> args{};
+
+TiArgument& arg0 = args[0];
+arg0.type = TI_ARGUMENT_TYPE_I32;
+arg0.value.i32 = 123;
+
+TiArgument& arg1 = args[1];
+arg1.type = TI_ARGUMENT_TYPE_F32;
+arg1.value.f32 = 123.0f;
+
+TiArgument& arg2 = args[2];
+arg2.type = TI_ARGUMENT_TYPE_NDARRAY;
+arg2.value.ndarray = ndarray;
+
+ti_launch_kernel(runtime, kernel, args.size(), args.data());
+```
+
+You can launch a compute graph in a similar way. But additionally please ensure the argument names match those in the Python source.
+
+```cpp
+std::array<TiNamedArgument, 3> named_args{};
+TiNamedArgument& named_arg0 = named_args[0];
+named_arg0.name = "foo";
+named_arg0.argument = args[0];
+TiNamedArgument& named_arg1 = named_args[1];
+named_arg1.name = "bar";
+named_arg1.argument = args[1];
+TiNamedArgument& named_arg2 = named_args[2];
+named_arg2.name = "baz";
+named_arg2.argument = args[2];
+
+ti_launch_compute_graph(runtime, compute_graph, named_args.size(), named_args.data());
+```
+
+When you have launched all kernels and compute graphs for this batch, you should `function.submit` and `function.wait` for the execution to finish.
+
+```cpp
+ti_submit(runtime);
+ti_wait(runtime);
+```
+
+**WARNING** This part is subject to change. We're gonna introduce multi-queue in the future.
+
+## API Reference
 
----
 ### Alias `TiBool`
 
 ```c
@@ -68,7 +244,7 @@ A sentinal invalid handle that will never be produced from a valid call to Taich
 typedef struct TiRuntime_t* TiRuntime;
 ```
 
-Taichi runtime represents an instance of a logical computating device and its internal dynamic states. The user is responsible to synchronize any use of `handle.runtime`.
+Taichi runtime represents an instance of a logical computing device and its internal dynamic states. The user is responsible for synchronizing any use of `handle.runtime`. The user MUST NOT manipulate multiple `handle.runtime`s in the same thread.
 
 ---
 ### Handle `TiAotModule`
@@ -555,4 +731,4 @@ TI_DLL_EXPORT TiComputeGraph TI_API_CALL ti_get_aot_module_compute_graph(
 );
 ```
 
-Get a precompiled compute graph from the AOt module. `definition.null_handle` is returned if the module does not have a kernel of the specified name.
+Get a precompiled compute graph from the AOt module. `definition.null_handle` is returned if the module does not have a kernel of the specified name.
\ No newline at end of file
diff --git a/misc/generate_c_api_docs.py b/misc/generate_c_api_docs.py
index 1681037367296..05598a1cc6a42 100644
--- a/misc/generate_c_api_docs.py
+++ b/misc/generate_c_api_docs.py
@@ -64,11 +64,16 @@ def print_module_doc(module: Module, templ):
             continue
         documented_syms[cur_sym] += [line]
 
+    is_first = True
     for x in module.declr_reg:
         declr = module.declr_reg.resolve(x)
 
+        if is_first:
+            is_first = False
+        else:
+            out += ["---"]
+
         out += [
-            "---",
             f"### {get_title(declr)}",
             "",
             "```c",
@@ -101,7 +106,7 @@ def generate_module_header(module):
         return
 
     print(f"processing module '{module.name}'")
-    path = f"docs/lang/articles/c-api/{module.name}.md"
+    path = f"docs/lang/articles/c-api/{module.name[7:-2]}.md"
     with open(path, "w") as f:
         f.write(print_module_doc(module, templ))
 

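Note on the output path change above: the slice `module.name[7:-2]` appears to assume module names shaped like `taichi/taichi_core.h`, dropping the seven-character `taichi/` prefix and the two-character `.h` suffix. The sketch below is only an illustration under that assumption; `doc_stem_by_slice` and `doc_stem_by_path` are hypothetical helper names, and the `pathlib` variant is an alternative that is not part of the patch.

```python
from pathlib import PurePosixPath


def doc_stem_by_slice(module_name: str) -> str:
    # Mirrors the slice used in the hunk: drop 7 leading and 2 trailing characters.
    return module_name[7:-2]


def doc_stem_by_path(module_name: str) -> str:
    # Hypothetical alternative: keep the file name and drop its last extension.
    return PurePosixPath(module_name).stem


# Assumed shape of module.name; not confirmed by the patch itself.
name = "taichi/taichi_core.h"
print(doc_stem_by_slice(name))
print(doc_stem_by_path(name))
```

The slice only gives the intended result while every module lives directly under `taichi/` and ends in `.h`; the path-based variant would not depend on the length of the directory name.
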
From 76a719f49cb9349ea841c931b047aff695e5cc07 Mon Sep 17 00:00:00 2001
From: "pre-commit-ci[bot]"
 <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Date: Fri, 12 Aug 2022 06:09:03 +0000
Subject: [PATCH 07/59] [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci
---
 c_api/docs/taichi/taichi_core.h.md      | 2 +-
 docs/lang/articles/c-api/taichi_core.md | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/c_api/docs/taichi/taichi_core.h.md b/c_api/docs/taichi/taichi_core.h.md
index e9a4d6f59c8e9..0194736117981 100644
--- a/c_api/docs/taichi/taichi_core.h.md
+++ b/c_api/docs/taichi/taichi_core.h.md
@@ -130,7 +130,7 @@ TiKernel kernel = ti_get_aot_module_kernel(aot_module, "foo");
 TiComputeGraph compute_graph = ti_get_aot_module_compute_graph(aot_module, "bar");
 ```
 
-You can launch a kernel with positional arguments. Please ensure the types, the sizes and the order matches the source code in Python.  
+You can launch a kernel with positional arguments. Please ensure the types, the sizes and the order match the source code in Python.
 
 ```cpp
 TiNdArray ndarray{};
diff --git a/docs/lang/articles/c-api/taichi_core.md b/docs/lang/articles/c-api/taichi_core.md
index b4661eb1f41b5..8005bcc98f6ea 100644
--- a/docs/lang/articles/c-api/taichi_core.md
+++ b/docs/lang/articles/c-api/taichi_core.md
@@ -731,4 +731,4 @@ TI_DLL_EXPORT TiComputeGraph TI_API_CALL ti_get_aot_module_compute_graph(
 );
 ```
 
-Get a precompiled compute graph from the AOt module. `definition.null_handle` is returned if the module does not have a kernel of the specified name.
\ No newline at end of file
+Get a precompiled compute graph from the AOT module. `definition.null_handle` is returned if the module does not have a compute graph of the specified name.

From 4e780856163c6254ff1a0559d2dd2b1d23d23424 Mon Sep 17 00:00:00 2001
From: PENGUINLIONG <admin@penguinliong.moe>
Date: Fri, 12 Aug 2022 14:57:57 +0800
Subject: [PATCH 08/59] Fixed docs a bit

---
 c_api/docs/taichi/taichi_core.h.md      | 10 ++++------
 docs/lang/articles/c-api/taichi_core.md |  8 +++-----
 misc/generate_c_api_docs.py             |  4 +++-
 3 files changed, 10 insertions(+), 12 deletions(-)
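
Note on the template handling touched by this patch: the generator copies the template up to the `## API Reference` heading and then collects per-symbol documentation from the lines that follow. A simplified sketch of that split is given below. `split_template` and `SYM_LINE` are hypothetical names used for illustration, and the real `print_module_doc` differs in details.

```python
import re
from collections import defaultdict

# A symbol header is a line consisting of a single `kind.name` reference.
SYM_LINE = re.compile(r"`\w+\.\w+`$")
MARKER = "## API Reference"


def split_template(lines):
    """Split template lines into the prelude and per-symbol doc lines."""
    prelude = []
    docs = defaultdict(list)
    it = iter(lines)
    # Copy everything up to and including the marker heading.
    for raw in it:
        line = raw.strip()
        prelude.append(line)
        if line.startswith(MARKER):
            break
    # After the marker, group lines under the most recent symbol header.
    current = None
    for raw in it:
        line = raw.strip()
        if SYM_LINE.match(line):
            current = line[1:-1]  # strip the surrounding backticks
            continue
        if current is not None:
            docs[current].append(line)
    return prelude, dict(docs)


templ = [
    "# Title", "", "Intro.", "## API Reference", "",
    "`alias.bool`", "", "A boolean value.",
    "`handle.runtime`", "", "A runtime.",
]
prelude, docs = split_template(templ)
print(prelude)
print(docs)
```

Because the split keys on the heading text, the template heading and the string checked by the generator have to be renamed together, which is what this patch does.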

diff --git a/c_api/docs/taichi/taichi_core.h.md b/c_api/docs/taichi/taichi_core.h.md
index 0194736117981..afc7147cf8ce7 100644
--- a/c_api/docs/taichi/taichi_core.h.md
+++ b/c_api/docs/taichi/taichi_core.h.md
@@ -19,13 +19,11 @@ Taichi C-API has bridged the following backends:
 |Metal|GPU (macOS, iOS)|N/A|
 |OpenGL|GPU|N/A|
 
-The backends with tier 1 support are the most intensively developed and tested ones. In contrast, you would expect a delay in fixes against minor issues on tier 2 backends. The backends currently unsupported might become supported.
+The backends with tier 1 support are the most intensively developed and tested ones. In contrast, you would expect a delay in fixes against minor issues on tier 2 backends. The backends currently unsupported might become supported. Among all the tier 1 backends, Vulkan has the most outstanding cross-platform compatibility, so most of the new features will be first available on Vulkan.
 
-Among all the tier 1 backends, Vulkan has the most outstanding cross-platform compatibility, so most of the new features will be first available on Vulkan.
+For convenience, in the following text (and other C-API documentation), the term **host** refers to the user of the C-API; the term **device** refers to the logical (conceptual) compute device that Taichi Runtime offloads its compute tasks to. A *device* might not be an actual discrete processor separate from the CPU, and the *host* MAY NOT be able to access the memory allocated on the *device*.
 
-For convenience, in the following text (and other C-API documentations), the term *host* refers to the user of the C-API; the term *device* refers to the logical (conceptual) compute device that Taichi Runtime offloads its compute tasks to. A *device* might not be an actual discrete processor away from the CPU and the *host* MAY NOT be able to access the memory allocated on the *device*.
-
-Unless explicitly explained, *device*, *backend*, *offload targer* and *GPU* can be used interchangeably; *host*, "user code", "user procedure" and "CPU" can too be used interchangeably.
+Unless explicitly explained, **device**, **backend**, **offload target** and **GPU** are used interchangeably; **host**, **user code**, **user procedure** and **CPU** are used interchangeably too.
 
 ## How to...
 
@@ -185,7 +183,7 @@ ti_wait(runtime);
 
 **WARNING** This part is subject to change. We're gonna introduce multi-queue in the future.
 
-## Declarations
+## API Reference
 
 `alias.bool`
 
diff --git a/docs/lang/articles/c-api/taichi_core.md b/docs/lang/articles/c-api/taichi_core.md
index 8005bcc98f6ea..1bcd9141c7e47 100644
--- a/docs/lang/articles/c-api/taichi_core.md
+++ b/docs/lang/articles/c-api/taichi_core.md
@@ -19,13 +19,11 @@ Taichi C-API has bridged the following backends:
 |Metal|GPU (macOS, iOS)|N/A|
 |OpenGL|GPU|N/A|
 
-The backends with tier 1 support are the most intensively developed and tested ones. In contrast, you would expect a delay in fixes against minor issues on tier 2 backends. The backends currently unsupported might become supported.
+The backends with tier 1 support are the most intensively developed and tested ones. In contrast, you would expect a delay in fixes against minor issues on tier 2 backends. The backends currently unsupported might become supported. Among all the tier 1 backends, Vulkan has the most outstanding cross-platform compatibility, so most of the new features will be first available on Vulkan.
 
-Among all the tier 1 backends, Vulkan has the most outstanding cross-platform compatibility, so most of the new features will be first available on Vulkan.
+For convenience, in the following text (and other C-API documentation), the term **host** refers to the user of the C-API; the term **device** refers to the logical (conceptual) compute device that Taichi Runtime offloads its compute tasks to. A *device* might not be an actual discrete processor separate from the CPU, and the *host* MAY NOT be able to access the memory allocated on the *device*.
 
-For convenience, in the following text (and other C-API documentations), the term *host* refers to the user of the C-API; the term *device* refers to the logical (conceptual) compute device that Taichi Runtime offloads its compute tasks to. A *device* might not be an actual discrete processor away from the CPU and the *host* MAY NOT be able to access the memory allocated on the *device*.
-
-Unless explicitly explained, *device*, *backend*, *offload targer* and *GPU* can be used interchangeably; *host*, "user code", "user procedure" and "CPU" can too be used interchangeably.
+Unless explicitly explained, **device**, **backend**, **offload target** and **GPU** are used interchangeably; **host**, **user code**, **user procedure** and **CPU** are used interchangeably too.
 
 ## How to...
 
diff --git a/misc/generate_c_api_docs.py b/misc/generate_c_api_docs.py
index 05598a1cc6a42..3fb711494e05d 100644
--- a/misc/generate_c_api_docs.py
+++ b/misc/generate_c_api_docs.py
@@ -50,7 +50,7 @@ def print_module_doc(module: Module, templ):
     for i in range(len(templ)):
         line = templ[i]
         out += [line.strip()]
-        if line.startswith("## Declarations"):
+        if line.startswith("## API Reference"):
             break
 
     out += [""]
@@ -87,6 +87,8 @@ def print_module_doc(module: Module, templ):
         else:
             print(f"WARNING: `{x}` is not documented")
 
+    out += [""]
+
     return '\n'.join(out)
 
 

From cc29789532fc5d34b20efb98f1bbdaa4c777bfdd Mon Sep 17 00:00:00 2001
From: PENGUINLIONG <admin@penguinliong.moe>
Date: Fri, 12 Aug 2022 17:18:25 +0800
Subject: [PATCH 09/59] Inline symbol reference

---
 docs/lang/articles/c-api/taichi_core.md | 10 ++---
 misc/generate_c_api.py                  | 21 +++++++++
 misc/generate_c_api_docs.py             | 57 ++++++++++++-------------
 3 files changed, 53 insertions(+), 35 deletions(-)
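
Note on the inline symbol resolution added by this patch: references such as `handle.runtime` in the prose are replaced with their C names. The sketch below illustrates the idea with a plain dictionary standing in for `module.declr_reg`, which is an assumption made for illustration. It also uses `re.sub` with a callback rather than the `findall` plus `str.replace` approach in the patch.

```python
import re

SYM_PATTERN = r"\`(\w+\.\w+)\`"

# Hypothetical registry; the real generator resolves symbols via module.declr_reg.
REGISTRY = {
    "handle.runtime": "TiRuntime",
    "function.submit": "ti_submit",
    "function.wait": "ti_wait",
}


def resolve_inline_symbols(registry, line):
    """Replace each `kind.name` reference with its human-readable C name."""
    def substitute(match):
        name = registry.get(match.group(1))
        if name is None:
            # Unresolved references are left untouched.
            return match.group(0)
        return f"`{name}`"

    return re.sub(SYM_PATTERN, substitute, line)


print(resolve_inline_symbols(REGISTRY, "Call `function.submit` then `function.wait`."))
print(resolve_inline_symbols(REGISTRY, "The directory contains a `metadata.tcb`."))
```

The second call shows why unresolved matches should be tolerated: an ordinary file name such as `metadata.tcb` has the same `word.word` shape as a symbol reference.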

diff --git a/docs/lang/articles/c-api/taichi_core.md b/docs/lang/articles/c-api/taichi_core.md
index 1bcd9141c7e47..ea94740c33e54 100644
--- a/docs/lang/articles/c-api/taichi_core.md
+++ b/docs/lang/articles/c-api/taichi_core.md
@@ -37,7 +37,7 @@ To work with Taichi, you first create an runtime instance. You SHOULD only creat
 TiRuntime runtime = ti_create_runtime(TI_ARCH_VULKAN);
 ```
 
-When your program reaches the end, you SHOULD destroy the runtime instance. Please ensure any other related resources have been destroyed before the `handle.runtime` itself.
+When your program reaches the end, you SHOULD destroy the runtime instance. Please ensure any other related resources have been destroyed before the `TiRuntime` itself.
 
 ```cpp
 ti_destroy_runtime(runtime);
@@ -56,7 +56,7 @@ TiMemory memory = ti_allocate_memory(runtime, &mai);
 
 **NOTE** You don't need to allocate memory for field allocations. They are automatically allocated when the AOT module is loaded.
 
-You MAY free allocated memory explicitly; but memory allocations will be automatically freed when the related `handle.runtime` is destroyed.
+You MAY free allocated memory explicitly; but memory allocations will be automatically freed when the related `TiRuntime` is destroyed.
 
 ```cpp
 ti_free_memory(runtime, memory);
@@ -113,7 +113,7 @@ TiAotModule aot_module = ti_load_aot_module(runtime, "/path/to/aot/module");
 
 `/path/to/aot/module` should point to the directory that contains a `metadata.tcb`.
 
-You can destroy an unused AOT module if you have done with it; but please ensure there is no kernel or compute graph related to it pending to `function.submit`.
+You can destroy an AOT module once you are done with it; but please ensure that no kernel or compute graph from it is still pending submission via `ti_submit`.
 
 ```cpp
 ti_destroy_aot_module(aot_module);
@@ -174,7 +174,7 @@ named_arg2.argument = args[2];
 ti_launch_compute_graph(runtime, compute_graph, named_args.size(), named_args.data());
 ```
 
-When you have launched all kernels and compute graphs for this batch, you should `function.submit` and `function.wait` for the execution to finish.
+When you have launched all kernels and compute graphs for this batch, you should call `ti_submit` and then `ti_wait` to wait for the execution to finish.
 
 ```cpp
 ti_submit(runtime);
@@ -360,7 +360,7 @@ typedef enum TiArgumentType {
 Types of kernel and compute graph argument.
 
 ---
-### Bit Field `TiMemoryUsageFlagBits`
+### BitField `TiMemoryUsageFlagBits`
 
 ```c
 // bit_field.memory_usage
diff --git a/misc/generate_c_api.py b/misc/generate_c_api.py
index fde1138ddc2ea..be3317ffa80cd 100644
--- a/misc/generate_c_api.py
+++ b/misc/generate_c_api.py
@@ -98,6 +98,27 @@ def get_declr(x: EntryBase):
         raise RuntimeError(f"'{x.id}' doesn't need declaration")
 
 
+def get_human_readable_name(x: EntryBase):
+    ty = type(x)
+    if ty is BuiltInType:
+        return ""
+
+    elif ty is Alias:
+        return f"{get_type_name(x)}"
+
+    elif ty is Definition:
+        return f"{x.name.screaming_snake_case}"
+
+    elif isinstance(x, (Handle, Enumeration, BitField, Structure, Union)):
+        return f"{get_type_name(x)}"
+
+    elif ty is Function:
+        return f"{x.name.snake_case}"
+
+    else:
+        raise RuntimeError(f"'{x.id}' doesn't have a human readable name")
+
+
 def print_module_header(module):
     out = ["#pragma once"]
 
diff --git a/misc/generate_c_api_docs.py b/misc/generate_c_api_docs.py
index 3fb711494e05d..59f04139fe74b 100644
--- a/misc/generate_c_api_docs.py
+++ b/misc/generate_c_api_docs.py
@@ -2,54 +2,51 @@
 from collections import defaultdict
 from pathlib import Path
 
-from generate_c_api import get_declr, get_field, get_type_name
+from generate_c_api import get_declr, get_human_readable_name
 from taichi_json import (Alias, BitField, BuiltInType, Definition, EntryBase,
                          Enumeration, Field, Function, Handle, Module,
                          Structure, Union)
 
+SYM_PATTERN = r"\`(\w+\.\w+)\`"
+
 
 def get_title(x: EntryBase):
-    ty = type(x)
-    if ty is BuiltInType:
+    if isinstance(x, BuiltInType):
         return ""
 
-    elif ty is Alias:
-        return f"Alias `{get_type_name(x)}`"
-
-    elif ty is Definition:
-        return f"Definition `{x.name.screaming_snake_case}`"
-
-    elif ty is Handle:
-        return f"Handle `{get_type_name(x)}`"
+    extra = ""
+    if isinstance(x, Function) and x.is_device_command:
+        extra += " (Device Command)"
 
-    elif ty is Enumeration:
-        return f"Enumeration `{get_type_name(x)}`"
-
-    elif ty is BitField:
-        return f"Bit Field `{get_type_name(x)}`"
+    if isinstance(x, (Alias, Definition, Handle, Enumeration, BitField, Structure, Union, Function)):
+        return f"{type(x).__name__} `{get_human_readable_name(x)}`" + extra
+    else:
+        raise RuntimeError(f"'{x.id}' doesn't need title")
 
-    elif ty is Structure:
-        return f"Structure `{get_type_name(x)}`"
 
-    elif ty is Union:
-        return f"Union `{get_type_name(x)}`"
+def resolve_inline_symbols(module: Module, line: str):
+    matches = re.findall(SYM_PATTERN, line)
 
-    elif ty is Function:
-        extra = ""
-        if x.is_device_command:
-            extra += " (Device Command)"
-        return f"Function `{x.name.snake_case}`" + extra
+    replacements = {}
+    for m in matches:
+        sym = str(m)
+        replacements[sym] = module.declr_reg.resolve(sym)
 
-    else:
-        raise RuntimeError(f"'{x.id}' doesn't need title")
+    for old, new in replacements.items():
+        if new is None:
+            print(f"WARNING: Unresolved inline symbol `{old}`")
+        else:
+            line = line.replace(old, get_human_readable_name(new))
+    return line
 
 
 def print_module_doc(module: Module, templ):
     out = []
 
     for i in range(len(templ)):
-        line = templ[i]
-        out += [line.strip()]
+        line = templ[i].strip()
+        line = resolve_inline_symbols(module, line)
+        out += [line]
         if line.startswith("## API Reference"):
             break
 
@@ -59,7 +56,7 @@ def print_module_doc(module: Module, templ):
     documented_syms = defaultdict(list)
     for line in templ[i:]:
         line = line.strip()
-        if re.match(r"\`\w+\.\w+\`", line):
+        if re.match(SYM_PATTERN, line):
             cur_sym = line[1:-1]
             continue
         documented_syms[cur_sym] += [line]

From de78a209609ecfc9c81376999421e6c0681c4e90 Mon Sep 17 00:00:00 2001
From: PENGUINLIONG <admin@penguinliong.moe>
Date: Fri, 12 Aug 2022 17:22:23 +0800
Subject: [PATCH 10/59] Addressed review comment

---
 c_api/docs/taichi/taichi_core.h.md | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/c_api/docs/taichi/taichi_core.h.md b/c_api/docs/taichi/taichi_core.h.md
index afc7147cf8ce7..80bdd44b6ac6c 100644
--- a/c_api/docs/taichi/taichi_core.h.md
+++ b/c_api/docs/taichi/taichi_core.h.md
@@ -4,7 +4,7 @@ sidebar_position: 1
 
 # Core Functionalities
 
-Taichi Core exposes all necessary interfaces to offload AOT modules to Taichi. Here lists the features universally available disregards to any specific backend. The Taichi Core APIs are guaranteed to be forward compatible.
+Taichi Core exposes all necessary interfaces to offload AOT modules to Taichi. This document lists the features that are universally available regardless of backend. These APIs are still in active development, so they are subject to change.
 
 ## Availability
 
@@ -13,7 +13,7 @@ Taichi C-API has bridged the following backends:
 |Backend|Offload Target|Maintenance Tier|
 |-|-|-|
 |Vulkan|GPU|Tier 1|
-|CUDA|GPU (NVIDIA)|Tier 1|
+|CUDA (LLVM)|GPU (NVIDIA)|Tier 1|
 |CPU (LLVM)|CPU|Tier 1|
 |DirectX 11|GPU (Windows)|N/A|
 |Metal|GPU (macOS, iOS)|N/A|
@@ -54,8 +54,6 @@ mai.usage = TI_MEMORY_USAGE_STORAGE_BIT;
 TiMemory memory = ti_allocate_memory(runtime, &mai);
 ```
 
-**NOTE** You don't need to allocate memory for field allocations. They are automatically allocated when the AOT module is loaded.
-
 You MAY free allocated memory explicitly; but memory allocations will be automatically freed when the related `handle.runtime` is destroyed.
 
 ```cpp

From ea3d029709582e43e9eba4d8f0fe737609cd605b Mon Sep 17 00:00:00 2001
From: PENGUINLIONG <admin@penguinliong.moe>
Date: Fri, 12 Aug 2022 18:31:12 +0800
Subject: [PATCH 11/59] Field documentations

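The widened `SYM_PATTERN` accepts an optional third dot-separated segment, so field references such as `structure.memory_allocate_info.size` are matched alongside two-segment symbols. A minimal sketch of the matching and of the split into symbol ID and field name, roughly mirroring what `resolve_symbol_to_name` does. `split_symbol` is a hypothetical helper for illustration only and is not part of the generator:

```python
import re

# Same pattern as misc/generate_c_api_docs.py after this patch.
SYM_PATTERN = r"\`(\w+\.\w+(?:\.\w+)?)\`"


def split_symbol(sym_id):
    """Split 'kind.name[.field]' into ('kind.name', field). Hypothetical helper."""
    parts = sym_id.split(".", 2)
    if len(parts) < 2:
        return None
    field_name = parts[2] if len(parts) == 3 else ""
    return ".".join(parts[:2]), field_name


line = "See `enumeration.arch` and `structure.memory_allocate_info.size`."
# findall returns the captured group, i.e. the symbol without backticks.
symbols = re.findall(SYM_PATTERN, line)
pairs = [split_symbol(s) for s in symbols]
```

The first element of each pair is what gets looked up in the declaration registry; the second, when non-empty, selects the enum case, flag bit, or struct field.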
---
 c_api/docs/taichi/taichi_core.h.md      | 15 ++++++
 docs/lang/articles/c-api/taichi_core.md | 35 +++++++++-----
 misc/generate_c_api_docs.py             | 61 ++++++++++++++++++++++---
 3 files changed, 93 insertions(+), 18 deletions(-)

diff --git a/c_api/docs/taichi/taichi_core.h.md b/c_api/docs/taichi/taichi_core.h.md
index 80bdd44b6ac6c..80bff3b1d00cb 100644
--- a/c_api/docs/taichi/taichi_core.h.md
+++ b/c_api/docs/taichi/taichi_core.h.md
@@ -239,14 +239,29 @@ Elementary (primitive) data types.
 
 Types of kernel and compute graph argument.
 
+- `enumeration.argument_type.i32`: Signed 32-bit integer.
+- `enumeration.argument_type.f32`: Signed 32-bit floating-point number.
+- `enumeration.argument_type.ndarray`: ND-array wrapped around a `handle.memory`.
+
 `bit_field.memory_usage`
 
 Usages of a memory allocation.
 
+- `bit_field.memory_usage.storage`: The memory can be read/write accessed by any shader. You usually only need to set this flag.
+- `bit_field.memory_usage.uniform`: The memory can be used as a uniform buffer in graphics pipelines.
+- `bit_field.memory_usage.vertex`: The memory can be used as a vertex buffer in graphics pipelines.
+- `bit_field.memory_usage.index`: The memory can be used as an index buffer in graphics pipelines.
+
 `structure.memory_allocate_info`
 
 Parameters of a newly allocated memory.
 
+- `structure.memory_allocate_info.size`: Size of the allocation in bytes.
+- `structure.memory_allocate_info.host_write`: True if the host needs to write to the allocated memory.
+- `structure.memory_allocate_info.host_read`: True if the host needs to read from the allocated memory.
+- `structure.memory_allocate_info.export_sharing`: True if the memory allocation needs to be exported to other backends (e.g., from Vulkan to CUDA).
+- `structure.memory_allocate_info.usage`: All possible usage of this memory allocation. In most of the cases, `bit_field.memory_usage.storage` is enough.
+
 `structure.memory_slice`
 
 A subsection of a memory allocation.
diff --git a/docs/lang/articles/c-api/taichi_core.md b/docs/lang/articles/c-api/taichi_core.md
index ea94740c33e54..a530703282c02 100644
--- a/docs/lang/articles/c-api/taichi_core.md
+++ b/docs/lang/articles/c-api/taichi_core.md
@@ -4,7 +4,7 @@ sidebar_position: 1
 
 # Core Functionalities
 
-Taichi Core exposes all necessary interfaces to offload AOT modules to Taichi. Here lists the features universally available disregards to any specific backend. The Taichi Core APIs are guaranteed to be forward compatible.
+Taichi Core exposes all necessary interfaces to offload AOT modules to Taichi. Here lists the features universally available disregards to any specific backend. These APIs are still in active development so is subject to change.
 
 ## Availability
 
@@ -13,7 +13,7 @@ Taichi C-API has bridged the following backends:
 |Backend|Offload Target|Maintenance Tier|
 |-|-|-|
 |Vulkan|GPU|Tier 1|
-|CUDA|GPU (NVIDIA)|Tier 1|
+|CUDA (LLVM)|GPU (NVIDIA)|Tier 1|
 |CPU (LLVM)|CPU|Tier 1|
 |DirectX 11|GPU (Windows)|N/A|
 |Metal|GPU (macOS, iOS)|N/A|
@@ -54,8 +54,6 @@ mai.usage = TI_MEMORY_USAGE_STORAGE_BIT;
 TiMemory memory = ti_allocate_memory(runtime, &mai);
 ```
 
-**NOTE** You don't need to allocate memory for field allocations. They are automatically allocated when the AOT module is loaded.
-
 You MAY free allocated memory explicitly; but memory allocations will be automatically freed when the related `TiRuntime` is destroyed.
 
 ```cpp
@@ -192,7 +190,7 @@ ti_wait(runtime);
 typedef uint32_t TiBool;
 ```
 
-A boolean value. Can be either `definition.true` or `definition.false`. Assignment with other values could lead to undefined behavior.
+A boolean value. Can be either `TI_TRUE` or `TI_FALSE`. Assignment with other values could lead to undefined behavior.
 
 ---
 ### Definition `TI_FALSE`
@@ -242,7 +240,7 @@ A sentinal invalid handle that will never be produced from a valid call to Taich
 typedef struct TiRuntime_t* TiRuntime;
 ```
 
-Taichi runtime represents an instance of a logical computating device and its internal dynamic states. The user is responsible to synchronize any use of `handle.runtime`. The user MUST NOT manipulate multiple `handle.runtime`s in a same thread.
+Taichi runtime represents an instance of a logical computating device and its internal dynamic states. The user is responsible to synchronize any use of `TiRuntime`. The user MUST NOT manipulate multiple `TiRuntime`s in a same thread.
 
 ---
 ### Handle `TiAotModule`
@@ -359,6 +357,10 @@ typedef enum TiArgumentType {
 
 Types of kernel and compute graph argument.
 
+- `TI_ARGUMENT_TYPE_I32`: Signed 32-bit integer.
+- `TI_ARGUMENT_TYPE_F32`: Signed 32-bit floating-point number.
+- `TI_ARGUMENT_TYPE_NDARRAY`: ND-array wrapped around a `TiMemory`.
+
 ---
 ### BitField `TiMemoryUsageFlagBits`
 
@@ -375,6 +377,11 @@ typedef TiFlags TiMemoryUsageFlags;
 
 Usages of a memory allocation.
 
+- `TI_MEMORY_USAGE_STORAGE_BIT`: The memory can be read/write accessed by any shader. You usually only need to set this flag.
+- `TI_MEMORY_USAGE_UNIFORM_BIT`: The memory can be used as a uniform buffer in graphics pipelines.
+- `TI_MEMORY_USAGE_VERTEX_BIT`: The memory can be used as a vertex buffer in graphics pipelines.
+- `TI_MEMORY_USAGE_INDEX_BIT`: The memory can be used as an index buffer in graphics pipelines.
+
 ---
 ### Structure `TiMemoryAllocateInfo`
 
@@ -391,6 +398,12 @@ typedef struct TiMemoryAllocateInfo {
 
 Parameters of a newly allocated memory.
 
+- `TiMemoryAllocateInfo.size`: Size of the allocation in bytes.
+- `TiMemoryAllocateInfo.host_write`: True if the host needs to write to the allocated memory.
+- `TiMemoryAllocateInfo.host_read`: True if the host needs to read from the allocated memory.
+- `TiMemoryAllocateInfo.export_sharing`: True if the memory allocation needs to be exported to other backends (e.g., from Vulkan to CUDA).
+- `TiMemoryAllocateInfo.usage`: All possible usage of this memory allocation. In most of the cases, `TI_MEMORY_USAGE_STORAGE_BIT` is enough.
+
 ---
 ### Structure `TiMemorySlice`
 
@@ -483,7 +496,7 @@ TI_DLL_EXPORT TiRuntime TI_API_CALL ti_create_runtime(
 );
 ```
 
-Create a Taichi Runtime with the specified `enumeration.arch`.
+Create a Taichi Runtime with the specified `TiArch`.
 
 ---
 ### Function `ti_destroy_runtime`
@@ -628,7 +641,7 @@ TI_DLL_EXPORT void TI_API_CALL ti_signal_event(
 );
 ```
 
-Set an event primitive to a signaled state, so the queues waiting upon the event can go on execution. If the event has been signaled before, the event MUST be reset with `function.reset_event`; otherwise it is an undefined behavior.
+Set an event primitive to a signaled state, so the queues waiting upon the event can go on execution. If the event has been signaled before, the event MUST be reset with `ti_reset_event`; otherwise it is an undefined behavior.
 
 ---
 ### Function `ti_reset_event` (Device Command)
@@ -691,7 +704,7 @@ TI_DLL_EXPORT TiAotModule TI_API_CALL ti_load_aot_module(
 );
 ```
 
-Load a precompiled AOT module from the filesystem. `definition.null_handle` is returned if the runtime failed to load the AOT module from the given path.
+Load a precompiled AOT module from the filesystem. `TI_NULL_HANDLE` is returned if the runtime failed to load the AOT module from the given path.
 
 ---
 ### Function `ti_destroy_aot_module`
@@ -716,7 +729,7 @@ TI_DLL_EXPORT TiKernel TI_API_CALL ti_get_aot_module_kernel(
 );
 ```
 
-Get a precompiled Taichi kernel from the AOT module. `definition.null_handle` is returned if the module does not have a kernel of the specified name.
+Get a precompiled Taichi kernel from the AOT module. `TI_NULL_HANDLE` is returned if the module does not have a kernel of the specified name.
 
 ---
 ### Function `ti_get_aot_module_compute_graph`
@@ -729,4 +742,4 @@ TI_DLL_EXPORT TiComputeGraph TI_API_CALL ti_get_aot_module_compute_graph(
 );
 ```
 
-Get a precompiled compute graph from the AOt module. `definition.null_handle` is returned if the module does not have a kernel of the specified name.
+Get a precompiled compute graph from the AOt module. `TI_NULL_HANDLE` is returned if the module does not have a kernel of the specified name.
diff --git a/misc/generate_c_api_docs.py b/misc/generate_c_api_docs.py
index 59f04139fe74b..ca7a6b7e02785 100644
--- a/misc/generate_c_api_docs.py
+++ b/misc/generate_c_api_docs.py
@@ -7,7 +7,7 @@
                          Enumeration, Field, Function, Handle, Module,
                          Structure, Union)
 
-SYM_PATTERN = r"\`(\w+\.\w+)\`"
+SYM_PATTERN = r"\`(\w+\.\w+(?:\.\w+)?)\`"
 
 
 def get_title(x: EntryBase):
@@ -24,19 +24,66 @@ def get_title(x: EntryBase):
         raise RuntimeError(f"'{x.id}' doesn't need title")
 
 
-def resolve_inline_symbols(module: Module, line: str):
+def get_human_readable_field_name(x: EntryBase, field_name: str):
+    out = None
+    if isinstance(x, Enumeration):
+        out = x.name.extend(field_name).screaming_snake_case
+    elif isinstance(x, BitField):
+        out = x.name.extend(field_name).extend('bit').screaming_snake_case
+    elif isinstance(x, Structure):
+        for field in x.fields:
+            if field.name.snake_case == field_name:
+                out = f"{x.name.upper_camel_case}.{field.name.snake_case}"
+                break
+    elif isinstance(x, Union):
+        for field in x.variants:
+            if field.name.snake_case == field_name:
+                out = f"{x.name.upper_camel_case}.{field.name.snake_case}"
+                break
+    return out
+
+
+def resolve_symbol_to_name(module: Module, id: str):
+    try:
+        ifirst_dot = id.index('.')
+    except ValueError:
+        return None
+
+    field_name = ""
+    try:
+        isecond_dot = id.index('.', ifirst_dot + 1)
+        field_name = id[isecond_dot + 1:]
+        id = id[:isecond_dot]
+    except ValueError:
+        pass
+
+    out = module.declr_reg.resolve(id)
+
+    try:
+        if field_name:
+            out = get_human_readable_field_name(out, field_name)
+        else:
+            out = get_human_readable_name(out)
+    except Exception:
+        print(f"WARNING: Unable to resolve symbol {id}")
+        out = id
+
+    return out
+
+
+def resolve_inline_symbols_to_names(module: Module, line: str):
     matches = re.findall(SYM_PATTERN, line)
 
     replacements = {}
     for m in matches:
-        sym = str(m)
-        replacements[sym] = module.declr_reg.resolve(sym)
+        id = str(m)
+        replacements[id] = resolve_symbol_to_name(module, id)
 
     for old, new in replacements.items():
         if new is None:
             print(f"WARNING: Unresolved inline symbol `{old}`")
         else:
-            line = line.replace(old, get_human_readable_name(new))
+            line = line.replace(old, new)
     return line
 
 
@@ -45,7 +92,7 @@ def print_module_doc(module: Module, templ):
 
     for i in range(len(templ)):
         line = templ[i].strip()
-        line = resolve_inline_symbols(module, line)
+        line = resolve_inline_symbols_to_names(module, line)
         out += [line]
         if line.startswith("## API Reference"):
             break
@@ -59,7 +106,7 @@ def print_module_doc(module: Module, templ):
         if re.match(SYM_PATTERN, line):
             cur_sym = line[1:-1]
             continue
-        documented_syms[cur_sym] += [line]
+        documented_syms[cur_sym] += [resolve_inline_symbols_to_names(module, line)]
 
     is_first = True
     for x in module.declr_reg:

From 4e4a9290805c7210172e655ba389f1fb641947c8 Mon Sep 17 00:00:00 2001
From: "pre-commit-ci[bot]"
 <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Date: Fri, 12 Aug 2022 09:19:53 +0000
Subject: [PATCH 12/59] [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci
---
 misc/generate_c_api_docs.py | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/misc/generate_c_api_docs.py b/misc/generate_c_api_docs.py
index ca7a6b7e02785..14743a52c36ed 100644
--- a/misc/generate_c_api_docs.py
+++ b/misc/generate_c_api_docs.py
@@ -18,7 +18,8 @@ def get_title(x: EntryBase):
     if isinstance(x, Function) and x.is_device_command:
         extra += " (Device Command)"
 
-    if isinstance(x, (Alias, Definition, Handle, Enumeration, BitField, Structure, Union, Function)):
+    if isinstance(x, (Alias, Definition, Handle, Enumeration, BitField,
+                      Structure, Union, Function)):
         return f"{type(x).__name__} `{get_human_readable_name(x)}`" + extra
     else:
         raise RuntimeError(f"'{x.id}' doesn't need title")

From 67815c5af10f1c9265b1287b5e23d54a00c37903 Mon Sep 17 00:00:00 2001
From: PENGUINLIONG <admin@penguinliong.moe>
Date: Sat, 13 Aug 2022 17:29:22 +0800
Subject: [PATCH 13/59] Improved docs

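One constraint documented in this patch is that the sum of a memory slice's `offset` and `size` cannot exceed the size of the underlying allocation. A minimal sketch of that check, written in Python for brevity. `is_valid_slice` is a hypothetical name; the C-API does not expose such a function:

```python
def is_valid_slice(allocation_size, offset, size):
    """Return True if the byte range [offset, offset + size) fits in the allocation."""
    if offset < 0 or size < 0:
        return False
    return offset + size <= allocation_size


# A 1024-byte allocation: the second half is a valid slice,
# but extending it by one byte runs past the end.
fits = is_valid_slice(1024, 512, 512)
overruns = not is_valid_slice(1024, 512, 513)
```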
---
 c_api/docs/taichi/taichi_core.h.md      | 59 +++++++++++++++++----
 docs/lang/articles/c-api/taichi_core.md | 69 ++++++++++++++++++++-----
 misc/generate_c_api_docs.py             |  8 +--
 3 files changed, 109 insertions(+), 27 deletions(-)

diff --git a/c_api/docs/taichi/taichi_core.h.md b/c_api/docs/taichi/taichi_core.h.md
index 80bff3b1d00cb..55bb30e18131e 100644
--- a/c_api/docs/taichi/taichi_core.h.md
+++ b/c_api/docs/taichi/taichi_core.h.md
@@ -99,7 +99,7 @@ ti_unmap_memory(runtime, read_back_memory);
 ti_free_memory(runtime, read_back_memory);
 ```
 
-**NOTE** `host_read` and `host_write` can be set true simultaneously. But please note that host-accessible allocations MAY slow down computation on a GPU because the limited bus bandwidth between the host memory and the device.
+**NOTE** `host_read` and `host_write` can be set true simultaneously. But please note that host-accessible allocations MAY slow down computation on GPU because of the limited bus bandwidth between the host memory and the device.
 
 ### Load and destroy a Taichi AOT Module
 
@@ -197,7 +197,9 @@ A condition or a predicate is not satisfied; a statement is invalid.
 
 `alias.flags`
 
-A bit field that can be used to represent 32 orthogonal flags.
+A bit field that can be used to represent 32 orthogonal flags. Bits unspecified in the corresponding flag enum are ignored.
+
+**NOTE** Enumerations and bit-field flags in the C-API have a `TI_XXX_MAX_ENUM` case to ensure the enum has a 32-bit range and in-memory size. It has no semantic impact and can be safely ignored.
 
 `definition.null_handle`
 
@@ -205,7 +207,7 @@ A sentinal invalid handle that will never be produced from a valid call to Taich
 
 `handle.runtime`
 
-Taichi runtime represents an instance of a logical computating device and its internal dynamic states. The user is responsible to synchronize any use of `handle.runtime`. The user MUST NOT manipulate multiple `handle.runtime`s in a same thread.
+Taichi runtime represents an instance of a logical backend and its internal dynamic state. The user is responsible for synchronizing any use of `handle.runtime`. The user MUST NOT manipulate multiple `handle.runtime`s in the same thread.
 
 `handle.aot_module`
 
@@ -225,15 +227,32 @@ A Taichi kernel that can be launched on device for execution.
 
 `handle.compute_graph`
 
-A collection of Taichi kernels (a compute graph) to be launched on device with predefined order.
+A collection of Taichi kernels (a compute graph) to be launched on device in predefined order.
 
 `enumeration.arch`
 
-Types of logical offload devices.
+Types of backend archs.
+
+- `enumeration.arch.x64`: x64 native CPU backend.
+- `enumeration.arch.arm64`: Arm64 native CPU backend.
+- `enumeration.arch.cuda`: NVIDIA CUDA GPU backend.
+- `enumeration.arch.vulkan`: Vulkan GPU backend.
 
 `enumeration.data_type`
 
-Elementary (primitive) data types.
+Elementary (primitive) data types. There might be vendor-specific constraints on the available data types so it's recommended to use 32-bit data types if multi-platform distribution is desired.
+
+- `enumeration.data_type.f16`: 16-bit IEEE 754 floating-point number.
+- `enumeration.data_type.f32`: 32-bit IEEE 754 floating-point number.
+- `enumeration.data_type.f64`: 64-bit IEEE 754 floating-point number.
+- `enumeration.data_type.i8`: 8-bit two's complement signed integer.
+- `enumeration.data_type.i16`: 16-bit two's complement signed integer.
+- `enumeration.data_type.i32`: 32-bit two's complement signed integer.
+- `enumeration.data_type.i64`: 64-bit two's complement signed integer.
+- `enumeration.data_type.u8`: 8-bit unsigned integer.
+- `enumeration.data_type.u16`: 16-bit unsigned integer.
+- `enumeration.data_type.u32`: 32-bit unsigned integer.
+- `enumeration.data_type.u64`: 64-bit unsigned integer.
 
 `enumeration.argument_type`
 
@@ -264,28 +283,50 @@ Parameters of a newly allocated memory.
 
 `structure.memory_slice`
 
-A subsection of a memory allocation.
+A subsection of a memory allocation. The sum of `structure.memory_slice.offset` and `structure.memory_slice.size` cannot exceed the size of `structure.memory_slice.memory`.
+
+- `structure.memory_slice.memory`: The subsectioned memory allocation.
+- `structure.memory_slice.offset`: Offset from the beginning of the allocation.
+- `structure.memory_slice.size`: Size of the subsection.
 
 `structure.nd_shape`
 
-Multi-dimensional size of an ND-array.
+Multi-dimensional size of an ND-array. Dimension sizes after `structure.nd_shape.dim_count` are ignored.
+
+- `structure.nd_shape.dim_count`: Number of dimensions.
+- `structure.nd_shape.dims`: Dimension sizes.
 
 `structure.nd_array`
 
 Multi-dimentional array of dense primitive data.
 
+- `structure.nd_array.memory`: Memory bound to the ND-array.
+- `structure.nd_array.shape`: Shape of the ND-array.
+- `structure.nd_array.elem_shape`: Shape of the ND-array elements. You usually need to set this if it's a vector or matrix ND-array.
+- `structure.nd_array.elem_type`: Primitive data type of the ND-array elements.
+
 `union.argument_value`
 
 A scalar or structured argument value.
 
+- `union.argument_value.i32`: Value of a 32-bit two's complement signed integer.
+- `union.argument_value.f32`: Value of a 32-bit IEEE 754 floating-point number.
+- `union.argument_value.ndarray`: An ND-array to be bound.
+
 `structure.argument`
 
 An argument value to feed kernels.
 
+- `structure.argument.type`: Type of the argument.
+- `structure.argument.value`: Value of the argument.
+
 `structure.named_argument`
 
 An named argument value to feed compute graphcs.
 
+- `structure.named_argument.name`: Name of the argument.
+- `structure.named_argument.argument`: Argument body.
+
 `function.create_runtime`
 
 Create a Taichi Runtime with the specified `enumeration.arch`.
@@ -328,7 +369,7 @@ Launch a Taichi kernel with provided arguments. The arguments MUST have the same
 
 `function.launch_compute_graph`
 
-Launch a Taichi kernel with provided named arguments. The named arguments MUST have the same count, names and types as in the source code.
+Launch a Taichi compute graph with provided named arguments. The named arguments MUST have the same count, names and types as in the source code.
 
 `function.signal_event`
 
diff --git a/docs/lang/articles/c-api/taichi_core.md b/docs/lang/articles/c-api/taichi_core.md
index a530703282c02..da3759e500f97 100644
--- a/docs/lang/articles/c-api/taichi_core.md
+++ b/docs/lang/articles/c-api/taichi_core.md
@@ -99,7 +99,7 @@ ti_unmap_memory(runtime, read_back_memory);
 ti_free_memory(runtime, read_back_memory);
 ```
 
-**NOTE** `host_read` and `host_write` can be set true simultaneously. But please note that host-accessible allocations MAY slow down computation on a GPU because the limited bus bandwidth between the host memory and the device.
+**NOTE** `host_read` and `host_write` can be set true simultaneously. But please note that host-accessible allocations MAY slow down computation on GPU because of the limited bus bandwidth between the host memory and the device.
 
 ### Load and destroy a Taichi AOT Module
 
@@ -220,7 +220,9 @@ A condition or a predicate is satisfied; a statement is valid.
 typedef uint32_t TiFlags;
 ```
 
-A bit field that can be used to represent 32 orthogonal flags.
+A bit field that can be used to represent 32 orthogonal flags. Bits unspecified in the corresponding flag enum are ignored.
+
+**NOTE** Enumerations and bit-field flags in the C-API have a `TI_XXX_MAX_ENUM` case to ensure the enum has a 32-bit range and in-memory size. It has no semantic impact and can be safely ignored.
 
 ---
 ### Definition `TI_NULL_HANDLE`
@@ -240,7 +242,7 @@ A sentinal invalid handle that will never be produced from a valid call to Taich
 typedef struct TiRuntime_t* TiRuntime;
 ```
 
-Taichi runtime represents an instance of a logical computating device and its internal dynamic states. The user is responsible to synchronize any use of `TiRuntime`. The user MUST NOT manipulate multiple `TiRuntime`s in a same thread.
+Taichi runtime represents an instance of a logical backend and its internal dynamic state. The user is responsible for synchronizing any use of `TiRuntime`. The user MUST NOT manipulate multiple `TiRuntime`s in the same thread.
 
 ---
 ### Handle `TiAotModule`
@@ -290,7 +292,7 @@ A Taichi kernel that can be launched on device for execution.
 typedef struct TiComputeGraph_t* TiComputeGraph;
 ```
 
-A collection of Taichi kernels (a compute graph) to be launched on device with predefined order.
+A collection of Taichi kernels (a compute graph) to be launched on device in predefined order.
 
 ---
 ### Enumeration `TiArch`
@@ -314,7 +316,12 @@ typedef enum TiArch {
 } TiArch;
 ```
 
-Types of logical offload devices.
+Types of backend archs.
+
+- `TI_ARCH_X64`: x64 native CPU backend.
+- `TI_ARCH_ARM64`: Arm64 native CPU backend.
+- `TI_ARCH_CUDA`: NVIDIA CUDA GPU backend.
+- `TI_ARCH_VULKAN`: Vulkan GPU backend.
 
 ---
 ### Enumeration `TiDataType`
@@ -340,7 +347,19 @@ typedef enum TiDataType {
 } TiDataType;
 ```
 
-Elementary (primitive) data types.
+Elementary (primitive) data types. There might be vendor-specific constraints on the available data types so it's recommended to use 32-bit data types if multi-platform distribution is desired.
+
+- `TI_DATA_TYPE_F16`: 16-bit IEEE 754 floating-point number.
+- `TI_DATA_TYPE_F32`: 32-bit IEEE 754 floating-point number.
+- `TI_DATA_TYPE_F64`: 64-bit IEEE 754 floating-point number.
+- `TI_DATA_TYPE_I8`: 8-bit two's complement signed integer.
+- `TI_DATA_TYPE_I16`: 16-bit two's complement signed integer.
+- `TI_DATA_TYPE_I32`: 32-bit two's complement signed integer.
+- `TI_DATA_TYPE_I64`: 64-bit two's complement signed integer.
+- `TI_DATA_TYPE_U8`: 8-bit unsigned integer.
+- `TI_DATA_TYPE_U16`: 16-bit unsigned integer.
+- `TI_DATA_TYPE_U32`: 32-bit unsigned integer.
+- `TI_DATA_TYPE_U64`: 64-bit unsigned integer.
 
 ---
 ### Enumeration `TiArgumentType`
@@ -398,11 +417,11 @@ typedef struct TiMemoryAllocateInfo {
 
 Parameters of a newly allocated memory.
 
-- `TiMemoryAllocateInfo.size`: Size of the allocation in bytes.
-- `TiMemoryAllocateInfo.host_write`: True if the host needs to write to the allocated memory.
-- `TiMemoryAllocateInfo.host_read`: True if the host needs to read from the allocated memory.
-- `TiMemoryAllocateInfo.export_sharing`: True if the memory allocation needs to be exported to other backends (e.g., from Vulkan to CUDA).
-- `TiMemoryAllocateInfo.usage`: All possible usage of this memory allocation. In most of the cases, `TI_MEMORY_USAGE_STORAGE_BIT` is enough.
+- `size`: Size of the allocation in bytes.
+- `host_write`: True if the host needs to write to the allocated memory.
+- `host_read`: True if the host needs to read from the allocated memory.
+- `export_sharing`: True if the memory allocation needs to be exported to other backends (e.g., from Vulkan to CUDA).
+- `usage`: All possible usage of this memory allocation. In most of the cases, `TI_MEMORY_USAGE_STORAGE_BIT` is enough.
 
 ---
 ### Structure `TiMemorySlice`
@@ -416,7 +435,11 @@ typedef struct TiMemorySlice {
 } TiMemorySlice;
 ```
 
-A subsection of a memory allocation.
+A subsection of a memory allocation. The sum of `offset` and `size` cannot exceed the size of `memory`.
+
+- `memory`: The subsectioned memory allocation.
+- `offset`: Offset from the beginning of the allocation.
+- `size`: Size of the subsection.
 
 ---
 ### Structure `TiNdShape`
@@ -429,7 +452,10 @@ typedef struct TiNdShape {
 } TiNdShape;
 ```
 
-Multi-dimensional size of an ND-array.
+Multi-dimensional size of an ND-array. Dimension sizes after `dim_count` are ignored.
+
+- `dim_count`: Number of dimensions.
+- `dims`: Dimension sizes.
 
 ---
 ### Structure `TiNdArray`
@@ -446,6 +472,11 @@ typedef struct TiNdArray {
 
 Multi-dimentional array of dense primitive data.
 
+- `memory`: Memory bound to the ND-array.
+- `shape`: Shape of the ND-array.
+- `elem_shape`: Shape of the ND-array elements. You usually need to set this if it's a vector or matrix ND-array.
+- `elem_type`: Primitive data type of the ND-array elements.
+
 ---
 ### Union `TiArgumentValue`
 
@@ -460,6 +491,10 @@ typedef union TiArgumentValue {
 
 A scalar or structured argument value.
 
+- `i32`: Value of a 32-bit two's complement signed integer.
+- `f32`: Value of a 32-bit IEEE 754 floating-point number.
+- `ndarray`: An ND-array to be bound.
+
 ---
 ### Structure `TiArgument`
 
@@ -473,6 +508,9 @@ typedef struct TiArgument {
 
 An argument value to feed kernels.
 
+- `type`: Type of the argument.
+- `value`: Value of the argument.
+
 ---
 ### Structure `TiNamedArgument`
 
@@ -486,6 +524,9 @@ typedef struct TiNamedArgument {
 
 An named argument value to feed compute graphcs.
 
+- `name`: Name of the argument.
+- `argument`: Argument body.
+
 ---
 ### Function `ti_create_runtime`
 
@@ -628,7 +669,7 @@ TI_DLL_EXPORT void TI_API_CALL ti_launch_compute_graph(
 );
 ```
 
-Launch a Taichi kernel with provided named arguments. The named arguments MUST have the same count, names and types as in the source code.
+Launch a Taichi compute graph with provided named arguments. The named arguments MUST have the same count, names and types as in the source code.
 
 ---
 ### Function `ti_signal_event` (Device Command)
diff --git a/misc/generate_c_api_docs.py b/misc/generate_c_api_docs.py
index 14743a52c36ed..fe538e5050747 100644
--- a/misc/generate_c_api_docs.py
+++ b/misc/generate_c_api_docs.py
@@ -33,13 +33,13 @@ def get_human_readable_field_name(x: EntryBase, field_name: str):
         out = x.name.extend(field_name).extend('bit').screaming_snake_case
     elif isinstance(x, Structure):
         for field in x.fields:
-            if field.name.snake_case == field_name:
-                out = f"{x.name.upper_camel_case}.{field.name.snake_case}"
+            if str(field.name) == field_name:
+                out = str(field.name)
                 break
     elif isinstance(x, Union):
         for field in x.variants:
-            if field.name.snake_case == field_name:
-                out = f"{x.name.upper_camel_case}.{field.name.snake_case}"
+            if str(field.name) == field_name:
+                out = str(field.name)
                 break
     return out
 

From 0bba11f330174dfe35c92c8ddc0c2d73831300b8 Mon Sep 17 00:00:00 2001
From: "pre-commit-ci[bot]"
 <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Date: Fri, 12 Aug 2022 10:32:55 +0000
Subject: [PATCH 14/59] [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci
---
 misc/generate_c_api_docs.py | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/misc/generate_c_api_docs.py b/misc/generate_c_api_docs.py
index fe538e5050747..134044f4e5c93 100644
--- a/misc/generate_c_api_docs.py
+++ b/misc/generate_c_api_docs.py
@@ -107,7 +107,9 @@ def print_module_doc(module: Module, templ):
         if re.match(SYM_PATTERN, line):
             cur_sym = line[1:-1]
             continue
-        documented_syms[cur_sym] += [resolve_inline_symbols_to_names(module, line)]
+        documented_syms[cur_sym] += [
+            resolve_inline_symbols_to_names(module, line)
+        ]
 
     is_first = True
     for x in module.declr_reg:

From 34b18e03c348e648672d2fc2b410cf536d785e10 Mon Sep 17 00:00:00 2001
From: PENGUINLIONG <admin@penguinliong.moe>
Date: Sun, 14 Aug 2022 10:53:54 +0800
Subject: [PATCH 15/59] Vulkan C-API docs

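The interop structure documented here takes a compute queue and a graphics queue, which may be the same queue when one family has both `VK_QUEUE_COMPUTE_BIT` and `VK_QUEUE_GRAPHICS_BIT` set. A minimal sketch of that selection logic, in Python for brevity. The flag values are those of `VkQueueFlagBits` in the Vulkan specification; `find_queue_family` and the sample device are hypothetical:

```python
# Values of VkQueueFlagBits as defined by the Vulkan specification.
VK_QUEUE_GRAPHICS_BIT = 0x00000001
VK_QUEUE_COMPUTE_BIT = 0x00000002


def find_queue_family(family_flags, required):
    """Return the index of the first family whose flags include `required`, or None."""
    for index, flags in enumerate(family_flags):
        if (flags & required) == required:
            return index
    return None


# Hypothetical device: family 0 supports graphics and compute, family 1 compute only.
family_flags = [VK_QUEUE_GRAPHICS_BIT | VK_QUEUE_COMPUTE_BIT, VK_QUEUE_COMPUTE_BIT]
compute_index = find_queue_family(family_flags, VK_QUEUE_COMPUTE_BIT)
graphics_index = find_queue_family(family_flags, VK_QUEUE_GRAPHICS_BIT)
# When both lookups land on the same family, one queue can serve both roles.
shared = compute_index == graphics_index
```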
---
 c_api/docs/taichi/taichi_core.h.md        |   2 +-
 c_api/docs/taichi/taichi_vulkan.h.md      |  66 +++++++++
 docs/lang/articles/c-api/taichi_core.md   |   2 +-
 docs/lang/articles/c-api/taichi_vulkan.md | 165 ++++++++++++++++++++++
 misc/generate_c_api.py                    |   4 +-
 5 files changed, 235 insertions(+), 4 deletions(-)
 create mode 100644 c_api/docs/taichi/taichi_vulkan.h.md
 create mode 100644 docs/lang/articles/c-api/taichi_vulkan.md

diff --git a/c_api/docs/taichi/taichi_core.h.md b/c_api/docs/taichi/taichi_core.h.md
index 55bb30e18131e..e26dbafeb795a 100644
--- a/c_api/docs/taichi/taichi_core.h.md
+++ b/c_api/docs/taichi/taichi_core.h.md
@@ -2,7 +2,7 @@
 sidebar_position: 1
 ---
 
-# Core Functionalities
+# Core Functionality
 
 Taichi Core exposes all necessary interfaces to offload AOT modules to Taichi. Here lists the features universally available disregards to any specific backend. These APIs are still in active development so is subject to change.
 
diff --git a/c_api/docs/taichi/taichi_vulkan.h.md b/c_api/docs/taichi/taichi_vulkan.h.md
new file mode 100644
index 0000000000000..16744941939f4
--- /dev/null
+++ b/c_api/docs/taichi/taichi_vulkan.h.md
@@ -0,0 +1,66 @@
+---
+sidebar_position: 2
+---
+
+# Vulkan Backend Features
+
+Taichi's Vulkan API gives you further control over Vulkan version and extension requirements and allows you to interop with external Vulkan applications with shared resources.
+
+## API Reference
+
+`structure.vulkan_runtime_interop_info`
+
+Necessary details to share the same Vulkan runtime between Taichi and user applications.
+
+- `structure.vulkan_runtime_interop_info.api_version`: Targeted Vulkan API version.
+- `structure.vulkan_runtime_interop_info.instance`: Vulkan instance handle.
+- `structure.vulkan_runtime_interop_info.physical_device`: Vulkan physical device handle.
+- `structure.vulkan_runtime_interop_info.device`: Vulkan logical device handle.
+- `structure.vulkan_runtime_interop_info.compute_queue`: Vulkan queue handle created in the queue family at `structure.vulkan_runtime_interop_info.compute_queue_family_index`.
+- `structure.vulkan_runtime_interop_info.compute_queue_family_index`: Index of a Vulkan queue family with the `VK_QUEUE_COMPUTE_BIT` set.
+- `structure.vulkan_runtime_interop_info.graphics_queue`: Vulkan queue handle created in the queue family at `structure.vulkan_runtime_interop_info.graphics_queue_family_index`.
+- `structure.vulkan_runtime_interop_info.graphics_queue_family_index`: Index of a Vulkan queue family with the `VK_QUEUE_GRAPHICS_BIT` set.
+
+**NOTE** `structure.vulkan_runtime_interop_info.compute_queue` and `structure.vulkan_runtime_interop_info.graphics_queue` can be the same if the queue family has `VK_QUEUE_COMPUTE_BIT` and `VK_QUEUE_GRAPHICS_BIT` set at the same time.
+
+`structure.vulkan_memory_interop_info`
+
+Necessary details to share the same Vulkan buffer between Taichi and user applications.
+
+- `structure.vulkan_memory_interop_info.buffer`: Vulkan buffer.
+- `structure.vulkan_memory_interop_info.size`: Size of the piece of memory in bytes.
+- `structure.vulkan_memory_interop_info.usage`: Vulkan buffer usage. You usually want the `VK_BUFFER_USAGE_STORAGE_BUFFER_BIT` set.
+
+`structure.vulkan_event_interop_info`
+
+Necessary details to share the same Vulkan event synchronization primitive between Taichi and user applications.
+
+- `structure.vulkan_event_interop_info.event`: Vulkan event handle.
+
+`function.create_vulkan_runtime`
+
+Create a Vulkan Taichi runtime with user-controlled capability settings.
+
+`function.import_vulkan_runtime`
+
+Import a Vulkan runtime owned by an external user application into Taichi.
+
+`function.export_vulkan_runtime`
+
+Export the Vulkan runtime owned by Taichi to external user applications.
+
+`function.import_vulkan_memory`
+
+Import a Vulkan buffer owned by an external user application into Taichi.
+
+`function.export_vulkan_memory`
+
+Export the Vulkan buffer owned by Taichi to external user applications.
+
+`function.import_vulkan_event`
+
+Import a Vulkan event owned by an external user application into Taichi.
+
+`function.export_vulkan_event`
+
+Export the Vulkan event owned by Taichi to external user applications.
diff --git a/docs/lang/articles/c-api/taichi_core.md b/docs/lang/articles/c-api/taichi_core.md
index da3759e500f97..37db74f350399 100644
--- a/docs/lang/articles/c-api/taichi_core.md
+++ b/docs/lang/articles/c-api/taichi_core.md
@@ -2,7 +2,7 @@
 sidebar_position: 1
 ---
 
-# Core Functionalities
+# Core Functionality
 
 Taichi Core exposes all necessary interfaces to offload AOT modules to Taichi. Here lists the features universally available disregards to any specific backend. These APIs are still in active development so is subject to change.
 
diff --git a/docs/lang/articles/c-api/taichi_vulkan.md b/docs/lang/articles/c-api/taichi_vulkan.md
new file mode 100644
index 0000000000000..468b2cc30b499
--- /dev/null
+++ b/docs/lang/articles/c-api/taichi_vulkan.md
@@ -0,0 +1,165 @@
+---
+sidebar_position: 2
+---
+
+# Vulkan Backend Features
+
+Taichi's Vulkan API gives you further control over Vulkan version and extension requirements and allows you to interop with external Vulkan applications with shared resources.
+
+## API Reference
+
+### Structure `TiVulkanRuntimeInteropInfo`
+
+```c
+// structure.vulkan_runtime_interop_info
+typedef struct TiVulkanRuntimeInteropInfo {
+  uint32_t api_version;
+  VkInstance instance;
+  VkPhysicalDevice physical_device;
+  VkDevice device;
+  VkQueue compute_queue;
+  uint32_t compute_queue_family_index;
+  VkQueue graphics_queue;
+  uint32_t graphics_queue_family_index;
+} TiVulkanRuntimeInteropInfo;
+```
+
+Necessary details for sharing the same Vulkan runtime between Taichi and user applications.
+
+- `api_version`: Targeted Vulkan API version.
+- `instance`: Vulkan instance handle.
+- `physical_device`: Vulkan physical device handle.
+- `device`: Vulkan logical device handle.
+- `compute_queue`: Vulkan queue handle created in the queue family at `compute_queue_family_index`.
+- `compute_queue_family_index`: Index of a Vulkan queue family with the `VK_QUEUE_COMPUTE_BIT` set.
+- `graphics_queue`: Vulkan queue handle created in the queue family at `graphics_queue_family_index`.
+- `graphics_queue_family_index`: Index of a Vulkan queue family with the `VK_QUEUE_GRAPHICS_BIT` set.
+
+**NOTE** `compute_queue` and `graphics_queue` can be the same if the queue family has both `VK_QUEUE_COMPUTE_BIT` and `VK_QUEUE_GRAPHICS_BIT` set.
+
+---
+### Structure `TiVulkanMemoryInteropInfo`
+
+```c
+// structure.vulkan_memory_interop_info
+typedef struct TiVulkanMemoryInteropInfo {
+  VkBuffer buffer;
+  uint64_t size;
+  VkBufferUsageFlags usage;
+} TiVulkanMemoryInteropInfo;
+```
+
+Necessary details for sharing the same Vulkan buffer between Taichi and user applications.
+
+- `buffer`: Vulkan buffer.
+- `size`: Size of the piece of memory in bytes.
+- `usage`: Vulkan buffer usage. You usually want the `VK_BUFFER_USAGE_STORAGE_BUFFER_BIT` set.
+
+---
+### Structure `TiVulkanEventInteropInfo`
+
+```c
+// structure.vulkan_event_interop_info
+typedef struct TiVulkanEventInteropInfo {
+  VkEvent event;
+} TiVulkanEventInteropInfo;
+```
+
+Necessary details for sharing the same Vulkan event synchronization primitive between Taichi and user applications.
+
+- `event`: Vulkan event handle.
+
+---
+### Function `ti_create_vulkan_runtime_ext`
+
+```c
+// function.create_vulkan_runtime
+TI_DLL_EXPORT TiRuntime TI_API_CALL ti_create_vulkan_runtime_ext(
+  uint32_t api_version,
+  uint32_t instance_extension_count,
+  const char** instance_extensions,
+  uint32_t device_extension_count,
+  const char** device_extensions
+);
+```
+
+Create a Vulkan Taichi runtime with user-controlled capability settings.
+
+---
+### Function `ti_import_vulkan_runtime`
+
+```c
+// function.import_vulkan_runtime
+TI_DLL_EXPORT TiRuntime TI_API_CALL ti_import_vulkan_runtime(
+  const TiVulkanRuntimeInteropInfo* interop_info
+);
+```
+
+Import a Vulkan runtime owned by an external user application into Taichi.
+
+---
+### Function `ti_export_vulkan_runtime`
+
+```c
+// function.export_vulkan_runtime
+TI_DLL_EXPORT void TI_API_CALL ti_export_vulkan_runtime(
+  TiRuntime runtime,
+  TiVulkanRuntimeInteropInfo* interop_info
+);
+```
+
+Export the Vulkan runtime owned by Taichi to external user applications.
+
+---
+### Function `ti_import_vulkan_memory`
+
+```c
+// function.import_vulkan_memory
+TI_DLL_EXPORT TiMemory TI_API_CALL ti_import_vulkan_memory(
+  TiRuntime runtime,
+  const TiVulkanMemoryInteropInfo* interop_info
+);
+```
+
+Import a Vulkan buffer owned by an external user application into Taichi.
+
+---
+### Function `ti_export_vulkan_memory`
+
+```c
+// function.export_vulkan_memory
+TI_DLL_EXPORT void TI_API_CALL ti_export_vulkan_memory(
+  TiRuntime runtime,
+  TiMemory memory,
+  TiVulkanMemoryInteropInfo* interop_info
+);
+```
+
+Export the Vulkan buffer owned by Taichi to external user applications.
+
+---
+### Function `ti_import_vulkan_event`
+
+```c
+// function.import_vulkan_event
+TI_DLL_EXPORT TiEvent TI_API_CALL ti_import_vulkan_event(
+  TiRuntime runtime,
+  const TiVulkanEventInteropInfo* interop_info
+);
+```
+
+Import a Vulkan event owned by an external user application into Taichi.
+
+---
+### Function `ti_export_vulkan_event`
+
+```c
+// function.export_vulkan_event
+TI_DLL_EXPORT void TI_API_CALL ti_export_vulkan_event(
+  TiRuntime runtime,
+  TiEvent event,
+  TiVulkanEventInteropInfo* interop_info
+);
+```
+
+Export the Vulkan event owned by Taichi to external user applications.
diff --git a/misc/generate_c_api.py b/misc/generate_c_api.py
index be3317ffa80cd..c91e2ee6ac368 100644
--- a/misc/generate_c_api.py
+++ b/misc/generate_c_api.py
@@ -22,11 +22,11 @@ def get_field(x: Field):
     is_dyn_array = x.count and not isinstance(x.count, int)
 
     is_ptr = x.by_ref or x.by_mut or is_dyn_array
-    const_q = "const" if not x.by_mut else ""
+    const_q = "const " if not x.by_mut else ""
     type_name = get_type_name(x.type)
 
     if is_ptr:
-        return f"{const_q} {type_name}* {x.name}"
+        return f"{const_q}{type_name}* {x.name}"
     elif x.count:
         return f"{type_name} {x.name}[{x.count}]"
     else:

From 79eff3c65973904b1a58cd8b0cac3f003860ef81 Mon Sep 17 00:00:00 2001
From: PENGUINLIONG <admin@penguinliong.moe>
Date: Sun, 14 Aug 2022 12:18:59 +0800
Subject: [PATCH 16/59] Minor fixes

---
 c_api/docs/taichi/taichi_core.h.md | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/c_api/docs/taichi/taichi_core.h.md b/c_api/docs/taichi/taichi_core.h.md
index e26dbafeb795a..0955f9b26b256 100644
--- a/c_api/docs/taichi/taichi_core.h.md
+++ b/c_api/docs/taichi/taichi_core.h.md
@@ -43,7 +43,7 @@ When your program reaches the end, you SHOULD destroy the runtime instance. Plea
 ti_destroy_runtime(runtime);
 ```
 
-### Allocate and Free Device-Only Memory
+### Allocate and Free Memory
 
 Allocate a piece of memory that is only visible to the device. On GPU backends, it usually means that the memory is located in the graphics memory (GRAM).
 
@@ -62,6 +62,8 @@ ti_free_memory(runtime, memory);
 
 ### Allocate Host-Accessible Memory
 
+By default, memory allocations are physically or conceptually local to the offload target for performance reasons. You can configure the allocate info to enable host access to memory allocations. But please note that host-accessible allocations MAY slow down computation on GPU because of the limited bus bandwidth between the host memory and the device.
+
 To allow data to be streamed into the memory, `host_write` MUST be set true.
 
 ```cpp
@@ -99,7 +101,7 @@ ti_unmap_memory(runtime, read_back_memory);
 ti_free_memory(runtime, read_back_memory);
 ```
 
-**NOTE** `host_read` and `host_write` can be set true simultaneously. But please note that host-accessible allocations MAY slow down computation on GPU because of the limited bus bandwidth between the host memory and the device.
+**NOTE** `host_read` and `host_write` can be set true simultaneously.
 
 ### Load and destroy a Taichi AOT Module
 

From acc28ba2a1f83ff171f0bfeee4fafdb88b039495 Mon Sep 17 00:00:00 2001
From: PENGUINLIONG <admin@penguinliong.moe>
Date: Sun, 14 Aug 2022 12:35:36 +0800
Subject: [PATCH 17/59] Reference links

---
 docs/lang/articles/c-api/taichi_core.md | 30 +++++++++++++------------
 misc/generate_c_api_docs.py             | 13 ++++++++---
 2 files changed, 26 insertions(+), 17 deletions(-)

diff --git a/docs/lang/articles/c-api/taichi_core.md b/docs/lang/articles/c-api/taichi_core.md
index 37db74f350399..7bc0b710ee55f 100644
--- a/docs/lang/articles/c-api/taichi_core.md
+++ b/docs/lang/articles/c-api/taichi_core.md
@@ -37,13 +37,13 @@ To work with Taichi, you first create an runtime instance. You SHOULD only creat
 TiRuntime runtime = ti_create_runtime(TI_ARCH_VULKAN);
 ```
 
-When your program reaches the end, you SHOULD destroy the runtime instance. Please ensure any other related resources have been destroyed before the `TiRuntime` itself.
+When your program reaches the end, you SHOULD destroy the runtime instance. Please ensure any other related resources have been destroyed before the [`TiRuntime`](#handle-tiruntime) itself.
 
 ```cpp
 ti_destroy_runtime(runtime);
 ```
 
-### Allocate and Free Device-Only Memory
+### Allocate and Free Memory
 
 Allocate a piece of memory that is only visible to the device. On GPU backends, it usually means that the memory is located in the graphics memory (GRAM).
 
@@ -54,7 +54,7 @@ mai.usage = TI_MEMORY_USAGE_STORAGE_BIT;
 TiMemory memory = ti_allocate_memory(runtime, &mai);
 ```
 
-You MAY free allocated memory explicitly; but memory allocations will be automatically freed when the related `TiRuntime` is destroyed.
+You MAY free allocated memory explicitly; but memory allocations will be automatically freed when the related [`TiRuntime`](#handle-tiruntime) is destroyed.
 
 ```cpp
 ti_free_memory(runtime, memory);
@@ -62,6 +62,8 @@ ti_free_memory(runtime, memory);
 
 ### Allocate Host-Accessible Memory
 
+By default, memory allocations are physically or conceptually local to the offload target for performance reasons. You can configure the allocate info to enable host access to memory allocations. But please note that host-accessible allocations MAY slow down computation on GPU because of the limited bus bandwidth between the host memory and the device.
+
 To allow data to be streamed into the memory, `host_write` MUST be set true.
 
 ```cpp
@@ -99,7 +101,7 @@ ti_unmap_memory(runtime, read_back_memory);
 ti_free_memory(runtime, read_back_memory);
 ```
 
-**NOTE** `host_read` and `host_write` can be set true simultaneously. But please note that host-accessible allocations MAY slow down computation on GPU because of the limited bus bandwidth between the host memory and the device.
+**NOTE** `host_read` and `host_write` can be set true simultaneously.
 
 ### Load and destroy a Taichi AOT Module
 
@@ -111,7 +113,7 @@ TiAotModule aot_module = ti_load_aot_module(runtime, "/path/to/aot/module");
 
 `/path/to/aot/module` should point to the directory that contains a `metadata.tcb`.
 
-You can destroy an unused AOT module if you have done with it; but please ensure there is no kernel or compute graph related to it pending to `ti_submit`.
+You can destroy an AOT module once you are done with it, but please ensure that no kernel or compute graph related to it is still waiting to be submitted with [`ti_submit`](#function-ti_submit).
 
 ```cpp
 ti_destroy_aot_module(aot_module);
@@ -172,7 +174,7 @@ named_arg2.argument = args[2];
 ti_launch_compute_graph(runtime, compute_graph, named_args.size(), named_args.data());
 ```
 
-When you have launched all kernels and compute graphs for this batch, you should `ti_submit` and `ti_wait` for the execution to finish.
+When you have launched all kernels and compute graphs for this batch, you should call [`ti_submit`](#function-ti_submit) and then [`ti_wait`](#function-ti_wait) to wait for the execution to finish.
 
 ```cpp
 ti_submit(runtime);
@@ -190,7 +192,7 @@ ti_wait(runtime);
 typedef uint32_t TiBool;
 ```
 
-A boolean value. Can be either `TI_TRUE` or `TI_FALSE`. Assignment with other values could lead to undefined behavior.
+A boolean value. Can be either [`TI_TRUE`](#definition-ti_true) or [`TI_FALSE`](#definition-ti_false). Assignment with other values could lead to undefined behavior.
 
 ---
 ### Definition `TI_FALSE`
@@ -242,7 +244,7 @@ A sentinal invalid handle that will never be produced from a valid call to Taich
 typedef struct TiRuntime_t* TiRuntime;
 ```
 
-Taichi runtime represents an instance of a logical backend and its internal dynamic state. The user is responsible to synchronize any use of `TiRuntime`. The user MUST NOT manipulate multiple `TiRuntime`s in a same thread.
+Taichi runtime represents an instance of a logical backend and its internal dynamic state. The user is responsible for synchronizing any use of [`TiRuntime`](#handle-tiruntime). The user MUST NOT manipulate multiple [`TiRuntime`](#handle-tiruntime)s in the same thread.
 
 ---
 ### Handle `TiAotModule`
@@ -378,7 +380,7 @@ Types of kernel and compute graph argument.
 
 - `TI_ARGUMENT_TYPE_I32`: Signed 32-bit integer.
 - `TI_ARGUMENT_TYPE_F32`: Signed 32-bit floating-point number.
-- `TI_ARGUMENT_TYPE_NDARRAY`: ND-array wrapped around a `TiMemory`.
+- `TI_ARGUMENT_TYPE_NDARRAY`: ND-array wrapped around a [`TiMemory`](#handle-timemory).
 
 ---
 ### BitField `TiMemoryUsageFlagBits`
@@ -537,7 +539,7 @@ TI_DLL_EXPORT TiRuntime TI_API_CALL ti_create_runtime(
 );
 ```
 
-Create a Taichi Runtime with the specified `TiArch`.
+Create a Taichi Runtime with the specified [`TiArch`](#enumeration-tiarch).
 
 ---
 ### Function `ti_destroy_runtime`
@@ -682,7 +684,7 @@ TI_DLL_EXPORT void TI_API_CALL ti_signal_event(
 );
 ```
 
-Set an event primitive to a signaled state, so the queues waiting upon the event can go on execution. If the event has been signaled before, the event MUST be reset with `ti_reset_event`; otherwise it is an undefined behavior.
+Set an event primitive to a signaled state, so the queues waiting upon the event can go on execution. If the event has been signaled before, the event MUST be reset with [`ti_reset_event`](#function-ti_reset_event-(device-command)); otherwise it is an undefined behavior.
 
 ---
 ### Function `ti_reset_event` (Device Command)
@@ -745,7 +747,7 @@ TI_DLL_EXPORT TiAotModule TI_API_CALL ti_load_aot_module(
 );
 ```
 
-Load a precompiled AOT module from the filesystem. `TI_NULL_HANDLE` is returned if the runtime failed to load the AOT module from the given path.
+Load a precompiled AOT module from the filesystem. [`TI_NULL_HANDLE`](#definition-ti_null_handle) is returned if the runtime failed to load the AOT module from the given path.
 
 ---
 ### Function `ti_destroy_aot_module`
@@ -770,7 +772,7 @@ TI_DLL_EXPORT TiKernel TI_API_CALL ti_get_aot_module_kernel(
 );
 ```
 
-Get a precompiled Taichi kernel from the AOT module. `TI_NULL_HANDLE` is returned if the module does not have a kernel of the specified name.
+Get a precompiled Taichi kernel from the AOT module. [`TI_NULL_HANDLE`](#definition-ti_null_handle) is returned if the module does not have a kernel of the specified name.
 
 ---
 ### Function `ti_get_aot_module_compute_graph`
@@ -783,4 +785,4 @@ TI_DLL_EXPORT TiComputeGraph TI_API_CALL ti_get_aot_module_compute_graph(
 );
 ```
 
-Get a precompiled compute graph from the AOt module. `TI_NULL_HANDLE` is returned if the module does not have a kernel of the specified name.
+Get a precompiled compute graph from the AOT module. [`TI_NULL_HANDLE`](#definition-ti_null_handle) is returned if the module does not have a compute graph of the specified name.
diff --git a/misc/generate_c_api_docs.py b/misc/generate_c_api_docs.py
index 134044f4e5c93..b3eecbc42d34a 100644
--- a/misc/generate_c_api_docs.py
+++ b/misc/generate_c_api_docs.py
@@ -45,6 +45,7 @@ def get_human_readable_field_name(x: EntryBase, field_name: str):
 
 
 def resolve_symbol_to_name(module: Module, id: str):
+    """Returns the resolved symbol and its hyperlink (if available)"""
     try:
         ifirst_dot = id.index('.')
     except ValueError:
@@ -59,17 +60,19 @@ def resolve_symbol_to_name(module: Module, id: str):
         pass
 
     out = module.declr_reg.resolve(id)
+    href = None
 
     try:
         if field_name:
             out = get_human_readable_field_name(out, field_name)
         else:
+            href = "#" + get_title(out).lower().replace(' ', '-').replace('`', '')
             out = get_human_readable_name(out)
     except:
         print(f"WARNING: Unable to resolve symbol {id}")
         out = id
 
-    return out
+    return out, href
 
 
 def resolve_inline_symbols_to_names(module: Module, line: str):
@@ -80,11 +83,15 @@ def resolve_inline_symbols_to_names(module: Module, line: str):
         id = str(m)
         replacements[id] = resolve_symbol_to_name(module, id)
 
-    for old, new in replacements.items():
+    for old, (new, href) in replacements.items():
         if new is None:
             print(f"WARNING: Unresolved inline symbol `{old}`")
         else:
-            line = line.replace(old, new)
+            if href is None:
+                new = f"`{new}`"
+            else:
+                new = f"[`{new}`]({href})"
+            line = line.replace(f"`{old}`", new)
     return line
 
 

From 2c002aeb57acca986acf82a650fa99e32bd56248 Mon Sep 17 00:00:00 2001
From: "pre-commit-ci[bot]"
 <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Date: Sun, 14 Aug 2022 04:36:56 +0000
Subject: [PATCH 18/59] [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci
---
 misc/generate_c_api_docs.py | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/misc/generate_c_api_docs.py b/misc/generate_c_api_docs.py
index b3eecbc42d34a..337600c06e3f7 100644
--- a/misc/generate_c_api_docs.py
+++ b/misc/generate_c_api_docs.py
@@ -66,7 +66,8 @@ def resolve_symbol_to_name(module: Module, id: str):
         if field_name:
             out = get_human_readable_field_name(out, field_name)
         else:
-            href = "#" + get_title(out).lower().replace(' ', '-').replace('`', '')
+            href = "#" + get_title(out).lower().replace(' ', '-').replace(
+                '`', '')
             out = get_human_readable_name(out)
     except:
         print(f"WARNING: Unable to resolve symbol {id}")

From 0fdc6d2fbd7a8f0e9602034263fd1b70cf466357 Mon Sep 17 00:00:00 2001
From: PENGUINLIONG <admin@penguinliong.moe>
Date: Sun, 14 Aug 2022 12:40:32 +0800
Subject: [PATCH 19/59] Fixed href to device commands

---
 docs/lang/articles/c-api/taichi_core.md | 2 +-
 misc/generate_c_api_docs.py             | 3 +--
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/docs/lang/articles/c-api/taichi_core.md b/docs/lang/articles/c-api/taichi_core.md
index 7bc0b710ee55f..240903008fe85 100644
--- a/docs/lang/articles/c-api/taichi_core.md
+++ b/docs/lang/articles/c-api/taichi_core.md
@@ -684,7 +684,7 @@ TI_DLL_EXPORT void TI_API_CALL ti_signal_event(
 );
 ```
 
-Set an event primitive to a signaled state, so the queues waiting upon the event can go on execution. If the event has been signaled before, the event MUST be reset with [`ti_reset_event`](#function-ti_reset_event-(device-command)); otherwise it is an undefined behavior.
+Set an event primitive to a signaled state, so the queues waiting upon the event can continue execution. If the event has been signaled before, the event MUST be reset with [`ti_reset_event`](#function-ti_reset_event-device-command); otherwise the behavior is undefined.
 
 ---
 ### Function `ti_reset_event` (Device Command)
diff --git a/misc/generate_c_api_docs.py b/misc/generate_c_api_docs.py
index 337600c06e3f7..5cf51806f9641 100644
--- a/misc/generate_c_api_docs.py
+++ b/misc/generate_c_api_docs.py
@@ -66,8 +66,7 @@ def resolve_symbol_to_name(module: Module, id: str):
         if field_name:
             out = get_human_readable_field_name(out, field_name)
         else:
-            href = "#" + get_title(out).lower().replace(' ', '-').replace(
-                '`', '')
+            href = "#" + get_title(out).lower().replace(' ', '-').replace('`', '').replace('(', '').replace(')', '')
             out = get_human_readable_name(out)
     except:
         print(f"WARNING: Unable to resolve symbol {id}")

From 197d3d41b52486b4c9697b1a1031708b211a49ce Mon Sep 17 00:00:00 2001
From: "pre-commit-ci[bot]"
 <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Date: Sun, 14 Aug 2022 04:41:55 +0000
Subject: [PATCH 20/59] [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci
---
 misc/generate_c_api_docs.py | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/misc/generate_c_api_docs.py b/misc/generate_c_api_docs.py
index 5cf51806f9641..96ed8688e42ae 100644
--- a/misc/generate_c_api_docs.py
+++ b/misc/generate_c_api_docs.py
@@ -66,7 +66,8 @@ def resolve_symbol_to_name(module: Module, id: str):
         if field_name:
             out = get_human_readable_field_name(out, field_name)
         else:
-            href = "#" + get_title(out).lower().replace(' ', '-').replace('`', '').replace('(', '').replace(')', '')
+            href = "#" + get_title(out).lower().replace(' ', '-').replace(
+                '`', '').replace('(', '').replace(')', '')
             out = get_human_readable_name(out)
     except:
         print(f"WARNING: Unable to resolve symbol {id}")

From 23bcdd70036017a9565aee44f9e2dd85f3ddc31d Mon Sep 17 00:00:00 2001
From: PENGUINLIONG <admin@penguinliong.moe>
Date: Fri, 23 Sep 2022 18:38:27 +0800
Subject: [PATCH 21/59] Update c_api/docs/taichi/taichi_core.h.md

Co-authored-by: Vissidarte-Herman <93570324+Vissidarte-Herman@users.noreply.github.com>
---
 c_api/docs/taichi/taichi_core.h.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/c_api/docs/taichi/taichi_core.h.md b/c_api/docs/taichi/taichi_core.h.md
index 0955f9b26b256..08ea418e86760 100644
--- a/c_api/docs/taichi/taichi_core.h.md
+++ b/c_api/docs/taichi/taichi_core.h.md
@@ -64,7 +64,7 @@ ti_free_memory(runtime, memory);
 
 By default, memory allocations are physically or conceptually local to the offload target for performance reasons. You can configure the allocate info to enable host access to memory allocations. But please note that host-accessible allocations MAY slow down computation on GPU because of the limited bus bandwidth between the host memory and the device.
 
-To allow data to be streamed into the memory, `host_write` MUST be set true.
+You *must* set `host_write` to `true` to allow streaming data to the memory.
 
 ```cpp
 TiMemoryAllocateInfo mai {};

From b95c31cf51736e161ce06a75060db3473deadfe0 Mon Sep 17 00:00:00 2001
From: PENGUINLIONG <admin@penguinliong.moe>
Date: Fri, 23 Sep 2022 18:38:37 +0800
Subject: [PATCH 22/59] Update c_api/docs/taichi/taichi_core.h.md

Co-authored-by: Vissidarte-Herman <93570324+Vissidarte-Herman@users.noreply.github.com>
---
 c_api/docs/taichi/taichi_core.h.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/c_api/docs/taichi/taichi_core.h.md b/c_api/docs/taichi/taichi_core.h.md
index 08ea418e86760..d2be8f686b5dd 100644
--- a/c_api/docs/taichi/taichi_core.h.md
+++ b/c_api/docs/taichi/taichi_core.h.md
@@ -391,7 +391,7 @@ Submit all commands to the logical device for execution. Ensure that any previou
 
 `function.wait`
 
-Wait until all previously invoked device command has finished execution.
+Waits until all previously invoked device commands are executed.
 
 `function.load_aot_module`
 

From fd51672db1b0ae06a4843046d2021af802a2927c Mon Sep 17 00:00:00 2001
From: PENGUINLIONG <admin@penguinliong.moe>
Date: Fri, 23 Sep 2022 18:39:04 +0800
Subject: [PATCH 23/59] Update c_api/docs/taichi/taichi_core.h.md

Co-authored-by: Vissidarte-Herman <93570324+Vissidarte-Herman@users.noreply.github.com>
---
 c_api/docs/taichi/taichi_core.h.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/c_api/docs/taichi/taichi_core.h.md b/c_api/docs/taichi/taichi_core.h.md
index d2be8f686b5dd..7bfe7b71a3677 100644
--- a/c_api/docs/taichi/taichi_core.h.md
+++ b/c_api/docs/taichi/taichi_core.h.md
@@ -399,7 +399,7 @@ Load a precompiled AOT module from the filesystem. `definition.null_handle` is r
 
 `function.destroy_aot_module`
 
-Destroy a loaded AOT module and release all related resources.
+Destroys a loaded AOT module and releases all related resources.
 
 `function.get_aot_module_kernel`
 

From 0c3eeac7d5f6e29d27f945968fa10e2bcca6adc2 Mon Sep 17 00:00:00 2001
From: PENGUINLIONG <admin@penguinliong.moe>
Date: Fri, 23 Sep 2022 18:39:13 +0800
Subject: [PATCH 24/59] Update c_api/docs/taichi/taichi_core.h.md

Co-authored-by: Vissidarte-Herman <93570324+Vissidarte-Herman@users.noreply.github.com>
---
 c_api/docs/taichi/taichi_core.h.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/c_api/docs/taichi/taichi_core.h.md b/c_api/docs/taichi/taichi_core.h.md
index 7bfe7b71a3677..9d15e118ba1c3 100644
--- a/c_api/docs/taichi/taichi_core.h.md
+++ b/c_api/docs/taichi/taichi_core.h.md
@@ -379,7 +379,7 @@ Set an event primitive to a signaled state, so the queues waiting upon the event
 
 `function.reset_event`
 
-Set a signaled event primitive back to an unsignaled state.
+Sets a signaled event primitive back to an unsignaled state.
 
 `function.wait_event`
 

From d63c12d876a23358a4377a864266a53e9ace4ad4 Mon Sep 17 00:00:00 2001
From: PENGUINLIONG <admin@penguinliong.moe>
Date: Fri, 23 Sep 2022 18:39:30 +0800
Subject: [PATCH 25/59] Update c_api/docs/taichi/taichi_core.h.md

Co-authored-by: Vissidarte-Herman <93570324+Vissidarte-Herman@users.noreply.github.com>
---
 c_api/docs/taichi/taichi_core.h.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/c_api/docs/taichi/taichi_core.h.md b/c_api/docs/taichi/taichi_core.h.md
index 9d15e118ba1c3..89d3c01580ee6 100644
--- a/c_api/docs/taichi/taichi_core.h.md
+++ b/c_api/docs/taichi/taichi_core.h.md
@@ -4,7 +4,7 @@ sidebar_position: 1
 
 # Core Functionality
 
-Taichi Core exposes all necessary interfaces to offload AOT modules to Taichi. Here lists the features universally available disregards to any specific backend. These APIs are still in active development so is subject to change.
+Taichi Core exposes all necessary interfaces for offloading the AOT modules to Taichi. The following is a list of features that are available regardless of your backend. The corresponding APIs are still under development and subject to change.
 
 ## Availability
 

From 0724052ef7df4f6b3d8c6a25ecd86fa7fe520179 Mon Sep 17 00:00:00 2001
From: PENGUINLIONG <admin@penguinliong.moe>
Date: Fri, 23 Sep 2022 18:39:49 +0800
Subject: [PATCH 26/59] Update c_api/docs/taichi/taichi_core.h.md

Co-authored-by: Vissidarte-Herman <93570324+Vissidarte-Herman@users.noreply.github.com>
---
 c_api/docs/taichi/taichi_core.h.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/c_api/docs/taichi/taichi_core.h.md b/c_api/docs/taichi/taichi_core.h.md
index 89d3c01580ee6..773ae8118cd41 100644
--- a/c_api/docs/taichi/taichi_core.h.md
+++ b/c_api/docs/taichi/taichi_core.h.md
@@ -45,7 +45,7 @@ ti_destroy_runtime(runtime);
 
 ### Allocate and Free Memory
 
-Allocate a piece of memory that is only visible to the device. On GPU backends, it usually means that the memory is located in the graphics memory (GRAM).
+Allocate a piece of memory that is visible only to the device. On the GPU backends, it usually means that the memory is located in the graphics memory (GRAM).
 
 ```cpp
 TiMemoryAllocateInfo mai {};

From fb423190ad8e28040e80e84176e4416ffdc17f68 Mon Sep 17 00:00:00 2001
From: PENGUINLIONG <admin@penguinliong.moe>
Date: Fri, 23 Sep 2022 18:40:01 +0800
Subject: [PATCH 27/59] Update c_api/docs/taichi/taichi_core.h.md

Co-authored-by: Vissidarte-Herman <93570324+Vissidarte-Herman@users.noreply.github.com>
---
 c_api/docs/taichi/taichi_core.h.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/c_api/docs/taichi/taichi_core.h.md b/c_api/docs/taichi/taichi_core.h.md
index 773ae8118cd41..03277a262a25c 100644
--- a/c_api/docs/taichi/taichi_core.h.md
+++ b/c_api/docs/taichi/taichi_core.h.md
@@ -371,7 +371,7 @@ Launch a Taichi kernel with provided arguments. The arguments MUST have the same
 
 `function.launch_compute_graph`
 
-Launch a Taichi compute graph with provided named arguments. The named arguments MUST have the same count, names and types as in the source code.
+Launches a Taichi compute graph with provided named arguments. The named arguments *must* have the same count, names, and types as in the source code.
 
 `function.signal_event`
 

From ddda967db3e706fff7cd7ba3897704898a8d7384 Mon Sep 17 00:00:00 2001
From: PENGUINLIONG <admin@penguinliong.moe>
Date: Fri, 23 Sep 2022 18:40:09 +0800
Subject: [PATCH 28/59] Update c_api/docs/taichi/taichi_core.h.md

Co-authored-by: Vissidarte-Herman <93570324+Vissidarte-Herman@users.noreply.github.com>
---
 c_api/docs/taichi/taichi_core.h.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/c_api/docs/taichi/taichi_core.h.md b/c_api/docs/taichi/taichi_core.h.md
index 03277a262a25c..25b8fd18b6181 100644
--- a/c_api/docs/taichi/taichi_core.h.md
+++ b/c_api/docs/taichi/taichi_core.h.md
@@ -359,7 +359,7 @@ Create an event primitive.
 
 `function.destroy_event`
 
-Destroy an event primitive.
+Destroys an event primitive.
 
 `function.copy_memory_device_to_device`
 

From 6d4eb6c473e2a7b61146e6d5ac8bda8625c3b1e1 Mon Sep 17 00:00:00 2001
From: PENGUINLIONG <admin@penguinliong.moe>
Date: Fri, 23 Sep 2022 18:40:18 +0800
Subject: [PATCH 29/59] Update c_api/docs/taichi/taichi_core.h.md

Co-authored-by: Vissidarte-Herman <93570324+Vissidarte-Herman@users.noreply.github.com>
---
 c_api/docs/taichi/taichi_core.h.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/c_api/docs/taichi/taichi_core.h.md b/c_api/docs/taichi/taichi_core.h.md
index 25b8fd18b6181..2674c671df2c0 100644
--- a/c_api/docs/taichi/taichi_core.h.md
+++ b/c_api/docs/taichi/taichi_core.h.md
@@ -375,7 +375,7 @@ Launches a Taichi compute graph with provided named arguments. The named argumen
 
 `function.signal_event`
 
-Set an event primitive to a signaled state, so the queues waiting upon the event can go on execution. If the event has been signaled before, the event MUST be reset with `function.reset_event`; otherwise it is an undefined behavior.
+Sets an event primitive to a signaled state so that the queues waiting for it can continue execution. If the event has been signaled, you *must* call `function.reset_event` to reset it; otherwise, the behavior is undefined.
 
 `function.reset_event`
 

From 75350376eea9bd48aa889bd88f76135e13cf462e Mon Sep 17 00:00:00 2001
From: PENGUINLIONG <admin@penguinliong.moe>
Date: Fri, 23 Sep 2022 18:41:32 +0800
Subject: [PATCH 30/59] Update c_api/docs/taichi/taichi_core.h.md

Co-authored-by: Vissidarte-Herman <93570324+Vissidarte-Herman@users.noreply.github.com>
---
 c_api/docs/taichi/taichi_core.h.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/c_api/docs/taichi/taichi_core.h.md b/c_api/docs/taichi/taichi_core.h.md
index 2674c671df2c0..c926313f0f9db 100644
--- a/c_api/docs/taichi/taichi_core.h.md
+++ b/c_api/docs/taichi/taichi_core.h.md
@@ -355,7 +355,7 @@ Unmap an on-device memory and make any host-side changes about the memory visibl
 
 `function.create_event`
 
-Create an event primitive.
+Creates an event primitive.
 
 `function.destroy_event`
 

From ab34823e97d29aaac87a22772aa6918f1f3e8626 Mon Sep 17 00:00:00 2001
From: PENGUINLIONG <admin@penguinliong.moe>
Date: Fri, 23 Sep 2022 18:41:41 +0800
Subject: [PATCH 31/59] Update c_api/docs/taichi/taichi_core.h.md

Co-authored-by: Vissidarte-Herman <93570324+Vissidarte-Herman@users.noreply.github.com>
---
 c_api/docs/taichi/taichi_core.h.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/c_api/docs/taichi/taichi_core.h.md b/c_api/docs/taichi/taichi_core.h.md
index c926313f0f9db..5b148e5510a34 100644
--- a/c_api/docs/taichi/taichi_core.h.md
+++ b/c_api/docs/taichi/taichi_core.h.md
@@ -403,7 +403,8 @@ Destroys a loaded AOT module and releases all related resources.
 
 `function.get_aot_module_kernel`
 
-Get a precompiled Taichi kernel from the AOT module. `definition.null_handle` is returned if the module does not have a kernel of the specified name.
+Retrieves a pre-compiled Taichi kernel from the AOT module. 
+Returns `definition.null_handle` if the module does not have a kernel of the specified name.
 
 `function.get_aot_module_compute_graph`
 

From 9e3daf88580a55ca99cfcc0fde9d6b78f5644ff3 Mon Sep 17 00:00:00 2001
From: PENGUINLIONG <admin@penguinliong.moe>
Date: Fri, 23 Sep 2022 18:42:16 +0800
Subject: [PATCH 32/59] Update c_api/docs/taichi/taichi_core.h.md

Co-authored-by: Vissidarte-Herman <93570324+Vissidarte-Herman@users.noreply.github.com>
---
 c_api/docs/taichi/taichi_core.h.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/c_api/docs/taichi/taichi_core.h.md b/c_api/docs/taichi/taichi_core.h.md
index 5b148e5510a34..c6759b23e5605 100644
--- a/c_api/docs/taichi/taichi_core.h.md
+++ b/c_api/docs/taichi/taichi_core.h.md
@@ -19,7 +19,8 @@ Taichi C-API has bridged the following backends:
 |Metal|GPU (macOS, iOS)|N/A|
 |OpenGL|GPU|N/A|
 
-The backends with tier 1 support are the most intensively developed and tested ones. In contrast, you would expect a delay in fixes against minor issues on tier 2 backends. The backends currently unsupported might become supported. Among all the tier 1 backends, Vulkan has the most outstanding cross-platform compatibility, so most of the new features will be first available on Vulkan.
+The backends with tier-1 support are being developed and tested more intensively. And most new features will be available on Vulkan first, because it has the most outstanding cross-platform compatibility among all the tier-1 backends. 
+For the backends with tier-2 support, you should expect a delay in the fixes to the minor issues. 
 
 For convenience, in the following text (and other C-API documentations), the term **host** refers to the user of the C-API; the term **device** refers to the logical (conceptual) compute device that Taichi Runtime offloads its compute tasks to. A *device* might not be an actual discrete processor away from the CPU and the *host* MAY NOT be able to access the memory allocated on the *device*.
 

From bc66dd51772f183ec254c3cbd17ded5089608685 Mon Sep 17 00:00:00 2001
From: PENGUINLIONG <admin@penguinliong.moe>
Date: Fri, 23 Sep 2022 18:43:03 +0800
Subject: [PATCH 33/59] Update c_api/docs/taichi/taichi_core.h.md

Co-authored-by: Vissidarte-Herman <93570324+Vissidarte-Herman@users.noreply.github.com>
---
 c_api/docs/taichi/taichi_core.h.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/c_api/docs/taichi/taichi_core.h.md b/c_api/docs/taichi/taichi_core.h.md
index c6759b23e5605..b63455adec227 100644
--- a/c_api/docs/taichi/taichi_core.h.md
+++ b/c_api/docs/taichi/taichi_core.h.md
@@ -182,7 +182,7 @@ ti_submit(runtime);
 ti_wait(runtime);
 ```
 
-**WARNING** This part is subject to change. We're gonna introduce multi-queue in the future.
+**WARNING** This part is subject to change. We will introduce multi-queue in the future.
 
 ## API Reference
 

From 881f4a086ca2f33f5f136bdd00a57f270bb9b0bc Mon Sep 17 00:00:00 2001
From: PENGUINLIONG <admin@penguinliong.moe>
Date: Fri, 23 Sep 2022 18:43:14 +0800
Subject: [PATCH 34/59] Update c_api/docs/taichi/taichi_core.h.md

Co-authored-by: Vissidarte-Herman <93570324+Vissidarte-Herman@users.noreply.github.com>
---
 c_api/docs/taichi/taichi_core.h.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/c_api/docs/taichi/taichi_core.h.md b/c_api/docs/taichi/taichi_core.h.md
index b63455adec227..1d67317347a5c 100644
--- a/c_api/docs/taichi/taichi_core.h.md
+++ b/c_api/docs/taichi/taichi_core.h.md
@@ -348,7 +348,7 @@ Free a memory allocation.
 
 `function.map_memory`
 
-Map an on-device memory to a host-addressible space. The user MUST ensure the device is not being used by any device command before the map.
+Maps an on-device memory to a host-addressable space. You *must* ensure that the device is not being used by any device command before the mapping.
 
 `function.unmap_memory`
 

From 662e6c380bb3d2ae96b0421baab3f7125a5a9fdd Mon Sep 17 00:00:00 2001
From: PENGUINLIONG <admin@penguinliong.moe>
Date: Fri, 23 Sep 2022 18:43:59 +0800
Subject: [PATCH 35/59] Update c_api/docs/taichi/taichi_core.h.md

Co-authored-by: Vissidarte-Herman <93570324+Vissidarte-Herman@users.noreply.github.com>
---
 c_api/docs/taichi/taichi_core.h.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/c_api/docs/taichi/taichi_core.h.md b/c_api/docs/taichi/taichi_core.h.md
index 1d67317347a5c..2133a8d840519 100644
--- a/c_api/docs/taichi/taichi_core.h.md
+++ b/c_api/docs/taichi/taichi_core.h.md
@@ -352,7 +352,7 @@ Maps an on-device memory to a host-addressible space. You *must* ensure that the
 
 `function.unmap_memory`
 
-Unmap an on-device memory and make any host-side changes about the memory visible to the device. The user MUST ensure there is no further access to the previously mapped host-addressible space.
+Unmaps an on-device memory and makes any host-side changes to the memory visible to the device. You *must* ensure that there is no further access to the previously mapped host-addressable space.
 
 `function.create_event`
 

From 81c10e8f08736a7a10c964413287c1d9ecf1a192 Mon Sep 17 00:00:00 2001
From: PENGUINLIONG <admin@penguinliong.moe>
Date: Fri, 23 Sep 2022 18:44:08 +0800
Subject: [PATCH 36/59] Update c_api/docs/taichi/taichi_core.h.md

Co-authored-by: Vissidarte-Herman <93570324+Vissidarte-Herman@users.noreply.github.com>
---
 c_api/docs/taichi/taichi_core.h.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/c_api/docs/taichi/taichi_core.h.md b/c_api/docs/taichi/taichi_core.h.md
index 2133a8d840519..eda0e6385eb42 100644
--- a/c_api/docs/taichi/taichi_core.h.md
+++ b/c_api/docs/taichi/taichi_core.h.md
@@ -344,7 +344,7 @@ Allocate a contiguous on-device memory with provided parameters.
 
 `function.free_memory`
 
-Free a memory allocation.
+Frees a memory allocation.
 
 `function.map_memory`
 

From d1a62199697976e8363f7f97ad8f18a4574e8d82 Mon Sep 17 00:00:00 2001
From: PENGUINLIONG <admin@penguinliong.moe>
Date: Fri, 23 Sep 2022 18:44:17 +0800
Subject: [PATCH 37/59] Update c_api/docs/taichi/taichi_core.h.md

Co-authored-by: Vissidarte-Herman <93570324+Vissidarte-Herman@users.noreply.github.com>
---
 c_api/docs/taichi/taichi_core.h.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/c_api/docs/taichi/taichi_core.h.md b/c_api/docs/taichi/taichi_core.h.md
index eda0e6385eb42..0417c713131d0 100644
--- a/c_api/docs/taichi/taichi_core.h.md
+++ b/c_api/docs/taichi/taichi_core.h.md
@@ -120,7 +120,7 @@ You can destroy an unused AOT module if you have done with it; but please ensure
 ti_destroy_aot_module(aot_module);
 ```
 
-### Launch Kernels and Compute Graphs
+### Launch kernels and compute graphs
 
 You can extract kernels and compute graphs from an AOT module. Kernel and compute graphs are a part of the module, so you don't have to destroy them.
 

From a77648b199dbeb81e2303c1466e57db2a440809c Mon Sep 17 00:00:00 2001
From: PENGUINLIONG <admin@penguinliong.moe>
Date: Fri, 23 Sep 2022 18:44:25 +0800
Subject: [PATCH 38/59] Update c_api/docs/taichi/taichi_core.h.md

Co-authored-by: Vissidarte-Herman <93570324+Vissidarte-Herman@users.noreply.github.com>
---
 c_api/docs/taichi/taichi_core.h.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/c_api/docs/taichi/taichi_core.h.md b/c_api/docs/taichi/taichi_core.h.md
index 0417c713131d0..7535e364a4af3 100644
--- a/c_api/docs/taichi/taichi_core.h.md
+++ b/c_api/docs/taichi/taichi_core.h.md
@@ -364,7 +364,7 @@ Destroys an event primitive.
 
 `function.copy_memory_device_to_device`
 
-Copy the content of a contiguous subsection of on-device memory to another. The two subsections MUST NOT overlap.
+Copies the data in a contiguous subsection of the on-device memory to another subsection. Note that the two subsections *must not* overlap.
 
 `function.launch_kernel`
 

From de9dbbd7172b4cee910961a37d6e56b66d9845c5 Mon Sep 17 00:00:00 2001
From: PENGUINLIONG <admin@penguinliong.moe>
Date: Fri, 23 Sep 2022 18:44:53 +0800
Subject: [PATCH 39/59] Update c_api/docs/taichi/taichi_core.h.md

Co-authored-by: Vissidarte-Herman <93570324+Vissidarte-Herman@users.noreply.github.com>
---
 c_api/docs/taichi/taichi_core.h.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/c_api/docs/taichi/taichi_core.h.md b/c_api/docs/taichi/taichi_core.h.md
index 7535e364a4af3..feb91ea4ba2b7 100644
--- a/c_api/docs/taichi/taichi_core.h.md
+++ b/c_api/docs/taichi/taichi_core.h.md
@@ -55,7 +55,7 @@ mai.usage = TI_MEMORY_USAGE_STORAGE_BIT;
 TiMemory memory = ti_allocate_memory(runtime, &mai);
 ```
 
-You MAY free allocated memory explicitly; but memory allocations will be automatically freed when the related `handle.runtime` is destroyed.
+Allocated memory is automatically freed when the related `handle.runtime` is destroyed. You can also manually free the allocated memory. 
 
 ```cpp
 ti_free_memory(runtime, memory);

From e6ad433f89b6d3b063628489108af60597d955ec Mon Sep 17 00:00:00 2001
From: PENGUINLIONG <admin@penguinliong.moe>
Date: Fri, 23 Sep 2022 18:45:03 +0800
Subject: [PATCH 40/59] Update c_api/docs/taichi/taichi_core.h.md

Co-authored-by: Vissidarte-Herman <93570324+Vissidarte-Herman@users.noreply.github.com>
---
 c_api/docs/taichi/taichi_core.h.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/c_api/docs/taichi/taichi_core.h.md b/c_api/docs/taichi/taichi_core.h.md
index feb91ea4ba2b7..1ebdcca318aba 100644
--- a/c_api/docs/taichi/taichi_core.h.md
+++ b/c_api/docs/taichi/taichi_core.h.md
@@ -44,7 +44,7 @@ When your program reaches the end, you SHOULD destroy the runtime instance. Plea
 ti_destroy_runtime(runtime);
 ```
 
-### Allocate and Free Memory
+### Allocate and free memory
 
 Allocate a piece of memory that is visible only to the device. On the GPU backends, it usually means that the memory is located in the graphics memory (GRAM).
 

From 4cdb41e38d668797f83aab05b0238f12a9076bad Mon Sep 17 00:00:00 2001
From: PENGUINLIONG <admin@penguinliong.moe>
Date: Fri, 23 Sep 2022 18:45:17 +0800
Subject: [PATCH 41/59] Update c_api/docs/taichi/taichi_core.h.md

Co-authored-by: Vissidarte-Herman <93570324+Vissidarte-Herman@users.noreply.github.com>
---
 c_api/docs/taichi/taichi_core.h.md | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/c_api/docs/taichi/taichi_core.h.md b/c_api/docs/taichi/taichi_core.h.md
index 1ebdcca318aba..bc38ddeb91209 100644
--- a/c_api/docs/taichi/taichi_core.h.md
+++ b/c_api/docs/taichi/taichi_core.h.md
@@ -38,7 +38,9 @@ To work with Taichi, you first create an runtime instance. You SHOULD only creat
 TiRuntime runtime = ti_create_runtime(TI_ARCH_VULKAN);
 ```
 
-When your program reaches the end, you SHOULD destroy the runtime instance. Please ensure any other related resources have been destroyed before the `handle.runtime` itself.
+When your program runs to the end, ensure that:
+- You destroy the runtime instance,
+- All related resources are destroyed before the `handle.runtime` itself.
 
 ```cpp
 ti_destroy_runtime(runtime);

From 1449e05efce53f0f8ec4a89f9143b84884f68a9e Mon Sep 17 00:00:00 2001
From: "pre-commit-ci[bot]"
 <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Date: Fri, 23 Sep 2022 10:45:28 +0000
Subject: [PATCH 42/59] [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci
---
 c_api/docs/taichi/taichi_core.h.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/c_api/docs/taichi/taichi_core.h.md b/c_api/docs/taichi/taichi_core.h.md
index bc38ddeb91209..967b5ae03e114 100644
--- a/c_api/docs/taichi/taichi_core.h.md
+++ b/c_api/docs/taichi/taichi_core.h.md
@@ -19,8 +19,8 @@ Taichi C-API has bridged the following backends:
 |Metal|GPU (macOS, iOS)|N/A|
 |OpenGL|GPU|N/A|
 
-The backends with tier-1 support are being developed and tested more intensively. And most new features will be available on Vulkan first, because it has the most outstanding cross-platform compatibility among all the tier-1 backends. 
-For the backends with tier-2 support, you should expect a delay in the fixes to the minor issues. 
+The backends with tier-1 support are being developed and tested more intensively. And most new features will be available on Vulkan first, because it has the most outstanding cross-platform compatibility among all the tier-1 backends.
+For the backends with tier-2 support, you should expect a delay in the fixes to the minor issues.
 
 For convenience, in the following text (and other C-API documentations), the term **host** refers to the user of the C-API; the term **device** refers to the logical (conceptual) compute device that Taichi Runtime offloads its compute tasks to. A *device* might not be an actual discrete processor away from the CPU and the *host* MAY NOT be able to access the memory allocated on the *device*.
 
@@ -406,7 +406,7 @@ Destroys a loaded AOT module and releases all related resources.
 
 `function.get_aot_module_kernel`
 
-Retrieves a pre-compiled Taichi kernel from the AOT module. 
+Retrieves a pre-compiled Taichi kernel from the AOT module.
 Returns `definition.null_handle` if the module does not have a kernel of the specified name.
 
 `function.get_aot_module_compute_graph`

From 633d9f5e9c55573ff56b98ad34bb7fa4afdff8a0 Mon Sep 17 00:00:00 2001
From: "pre-commit-ci[bot]"
 <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Date: Fri, 23 Sep 2022 10:51:58 +0000
Subject: [PATCH 43/59] [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci
---
 c_api/docs/taichi/taichi_core.h.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/c_api/docs/taichi/taichi_core.h.md b/c_api/docs/taichi/taichi_core.h.md
index 967b5ae03e114..8713c81ff91ff 100644
--- a/c_api/docs/taichi/taichi_core.h.md
+++ b/c_api/docs/taichi/taichi_core.h.md
@@ -57,7 +57,7 @@ mai.usage = TI_MEMORY_USAGE_STORAGE_BIT;
 TiMemory memory = ti_allocate_memory(runtime, &mai);
 ```
 
-Allocated memory is automatically freed when the related `handle.runtime` is destroyed. You can also manually free the allocated memory. 
+Allocated memory is automatically freed when the related `handle.runtime` is destroyed. You can also manually free the allocated memory.
 
 ```cpp
 ti_free_memory(runtime, memory);

From 446b78428bce8010a7e8d2b3871d544835d2fc10 Mon Sep 17 00:00:00 2001
From: PENGUINLIONG <admin@penguinliong.moe>
Date: Fri, 23 Sep 2022 21:31:59 +0800
Subject: [PATCH 44/59] Update c_api/docs/taichi/taichi_core.h.md

Co-authored-by: Vissidarte-Herman <93570324+Vissidarte-Herman@users.noreply.github.com>
---
 c_api/docs/taichi/taichi_core.h.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/c_api/docs/taichi/taichi_core.h.md b/c_api/docs/taichi/taichi_core.h.md
index 8713c81ff91ff..5412a07ea2286 100644
--- a/c_api/docs/taichi/taichi_core.h.md
+++ b/c_api/docs/taichi/taichi_core.h.md
@@ -398,7 +398,8 @@ Waits until all previously invoked device commands are executed.
 
 `function.load_aot_module`
 
-Load a precompiled AOT module from the filesystem. `definition.null_handle` is returned if the runtime failed to load the AOT module from the given path.
+Loads a pre-compiled AOT module from the file system. 
+Returns `definition.null_handle` if the runtime fails to load the AOT module from the specified path.
 
 `function.destroy_aot_module`
 

From 97c6060a0550308f421ec6bab8fd636db33702ca Mon Sep 17 00:00:00 2001
From: "pre-commit-ci[bot]"
 <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Date: Fri, 23 Sep 2022 13:35:00 +0000
Subject: [PATCH 45/59] [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci
---
 c_api/docs/taichi/taichi_core.h.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/c_api/docs/taichi/taichi_core.h.md b/c_api/docs/taichi/taichi_core.h.md
index 5412a07ea2286..a100c2563b44a 100644
--- a/c_api/docs/taichi/taichi_core.h.md
+++ b/c_api/docs/taichi/taichi_core.h.md
@@ -398,7 +398,7 @@ Waits until all previously invoked device commands are executed.
 
 `function.load_aot_module`
 
-Loads a pre-compiled AOT module from the file system. 
+Loads a pre-compiled AOT module from the file system.
 Returns `definition.null_handle` if the runtime fails to load the AOT module from the specified path.
 
 `function.destroy_aot_module`

From 2fd41a0e42a0ae4cbabf17008fe2a191e5dff3ce Mon Sep 17 00:00:00 2001
From: PENGUINLIONG <admin@penguinliong.moe>
Date: Fri, 23 Sep 2022 21:40:57 +0800
Subject: [PATCH 46/59] Editorial changes

---
 c_api/docs/taichi/taichi_core.h.md | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/c_api/docs/taichi/taichi_core.h.md b/c_api/docs/taichi/taichi_core.h.md
index 5412a07ea2286..99b0d0d4e3aa1 100644
--- a/c_api/docs/taichi/taichi_core.h.md
+++ b/c_api/docs/taichi/taichi_core.h.md
@@ -8,21 +8,21 @@ Taichi Core exposes all necessary interfaces for offloading the AOT modules to T
 
 ## Availability
 
-Taichi C-API has bridged the following backends:
+Taichi C-API intends to support the following backends:
 
 |Backend|Offload Target|Maintenance Tier|
 |-|-|-|
 |Vulkan|GPU|Tier 1|
 |CUDA (LLVM)|GPU (NVIDIA)|Tier 1|
 |CPU (LLVM)|CPU|Tier 1|
+|OpenGL|GPU|Tier 2|
 |DirectX 11|GPU (Windows)|N/A|
 |Metal|GPU (macOS, iOS)|N/A|
-|OpenGL|GPU|N/A|
 
 The backends with tier-1 support are being developed and tested more intensively. And most new features will be available on Vulkan first, because it has the most outstanding cross-platform compatibility among all the tier-1 backends.
 For the backends with tier-2 support, you should expect a delay in the fixes to the minor issues.
 
-For convenience, in the following text (and other C-API documentations), the term **host** refers to the user of the C-API; the term **device** refers to the logical (conceptual) compute device that Taichi Runtime offloads its compute tasks to. A *device* might not be an actual discrete processor away from the CPU and the *host* MAY NOT be able to access the memory allocated on the *device*.
+For convenience, in the following text and other C-API documents, the term *host* refers to the user of the C-API; the term *device* refers to the logical (conceptual) compute device, to which Taichi's runtime offloads its compute tasks. A *device* may not be a physical discrete processor other than the CPU and the *host* may *not* be able to access the memory allocated on the *device*.
 
 Unless explicitly explained, **device**, **backend**, **offload targer** and **GPU** are used interchangeably; **host**, **user code**, **user procedure** and **CPU** are used interchangeably too.
 
@@ -32,7 +32,7 @@ In this section we give an brief introduction about what you might want to do wi
 
 ### Create and destroy a Runtime Instance
 
-To work with Taichi, you first create an runtime instance. You SHOULD only create a single runtime per thread. Currently we don't officially claim that multiple runtime instances can coexist in a process, please feel free to [report issues](https://github.com/taichi-dev/taichi/issues) if you encountered any problem with such usage.
+You *must* create a runtime instance before working with Taichi, and *only* one runtime per thread. Currently we do not officially claim that multiple runtime instances can coexist in a process, but please feel free to [file an issue with us](https://github.com/taichi-dev/taichi/issues) if you run into any problem with runtime instance coexistence.
 
 ```cpp
 TiRuntime runtime = ti_create_runtime(TI_ARCH_VULKAN);
@@ -232,7 +232,7 @@ A Taichi kernel that can be launched on device for execution.
 
 `handle.compute_graph`
 
-A collection of Taichi kernels (a compute graph) to be launched on device in predefined order.
+A collection of Taichi kernels (a compute graph) to launch on device in a predefined order.
 
 `enumeration.arch`
 

From a2041a3dc5abcf79b463ae9c5b6dfd0d13b26269 Mon Sep 17 00:00:00 2001
From: PENGUINLIONG <admin@penguinliong.moe>
Date: Fri, 23 Sep 2022 23:39:33 +0800
Subject: [PATCH 47/59] Finalizing editorial changes

---
 c_api/docs/taichi/taichi_core.h.md        |  30 +-
 docs/lang/articles/c-api/taichi_core.md   | 421 ++++++++++++++++++++--
 docs/lang/articles/c-api/taichi_vulkan.md |  41 +++
 misc/generate_c_api_docs.py               |  11 +
 4 files changed, 466 insertions(+), 37 deletions(-)

diff --git a/c_api/docs/taichi/taichi_core.h.md b/c_api/docs/taichi/taichi_core.h.md
index 534be2eafd902..4c9465d8b7e2c 100644
--- a/c_api/docs/taichi/taichi_core.h.md
+++ b/c_api/docs/taichi/taichi_core.h.md
@@ -226,6 +226,14 @@ A synchronization primitive to manage on-device execution flows in multiple queu
 
 A contiguous allocation of on-device memory.
 
+`handle.image`
+
+A contiguous allocation of on-device image.
+
+`handle.sampler`
+
+An image sampler. `definition.null_handle` represents a default image sampler provided by the runtime implementation. The filter modes and address modes of default samplers depend on the backend implementation.
+
 `handle.kernel`
 
 A Taichi kernel that can be launched on device for execution.
@@ -234,6 +242,22 @@ A Taichi kernel that can be launched on device for execution.
 
 A collection of Taichi kernels (a compute graph) to launch on device in a predefined order.
 
+`enumeration.error`
+
+Errors reported by the Taichi C-API.
+
+- `enumeration.error.incomplete`: The output data is truncated because the user-provided buffer is too small.
+- `enumeration.error.success`: The Taichi C-API invocation finished gracefully.
+- `enumeration.error.not_supported`: The invoked API or the combination of parameters is not supported by the Taichi C-API.
+- `enumeration.error.corrupted_data`: Provided data is corrupted.
+- `enumeration.error.name_not_found`: Provided name does not refer to any existing item.
+- `enumeration.error.invalid_argument`: One or more function arguments violate constraints specified in C-API documents; or kernel arguments mismatch the kernel argument list defined in the AOT module.
+- `enumeration.error.argument_null`: One or more by-reference (pointer) function arguments point to null.
+- `enumeration.error.argument_out_of_range`: One or more function arguments are out of their acceptable range; or enumeration arguments have undefined values.
+- `enumeration.error.argument_not_found`: One or more kernel arguments are missing.
+- `enumeration.error.invalid_interop`: The intended interoperation is not possible on the current arch. For example, attempting to export a Vulkan object from a CUDA runtime is not allowed.
+- `enumeration.error.invalid_state`: The Taichi C-API has entered an unrecoverable invalid state. Related Taichi objects are potentially corrupted. You *should* release the contaminated resources for stability. Please feel free to file an issue if you encounter this error during normal use.
+
 `enumeration.arch`
 
 Types of backend archs.
@@ -271,7 +295,7 @@ Types of kernel and compute graph argument.
 
 Usages of a memory allocation.
 
-- `bit_field.memory_usage.storage`: The memory can be read/write accessed by any shader, you usually only need to set this flag.
+- `bit_field.memory_usage.storage`: The memory can be read/write accessed by any kernel. In most cases, you only need to set this flag.
 - `bit_field.memory_usage.uniform`: The memory can be used as a uniform buffer in graphics pipelines.
 - `bit_field.memory_usage.vertex`: The memory can be used as a vertex buffer in graphics pipelines.
 - `bit_field.memory_usage.index`: The memory can be used as a index buffer in graphics pipelines.
@@ -306,8 +330,8 @@ Multi-dimensional size of an ND-array. Dimension sizes after `structure.nd_shape
 Multi-dimentional array of dense primitive data.
 
 - `structure.nd_array.memory`: Memory bound to the ND-array.
-- `structure.nd_array.shape`: Shape of the ND-array.
-- `structure.nd_array.elem_shape`: Shape of the ND-array elements. You usually need to set this if it's a vector or matrix ND-array.
+- `structure.nd_array.shape`: Shape of the ND-array. 
+- `structure.nd_array.elem_shape`: Shape of the ND-array elements. It *must not* be empty for vector or matrix ND-arrays.
 - `structure.nd_array.elem_type`: Primitive data type of the ND-array elements.
 
 `union.argument_value`
diff --git a/docs/lang/articles/c-api/taichi_core.md b/docs/lang/articles/c-api/taichi_core.md
index 240903008fe85..bdcc7f2b8c393 100644
--- a/docs/lang/articles/c-api/taichi_core.md
+++ b/docs/lang/articles/c-api/taichi_core.md
@@ -4,24 +4,25 @@ sidebar_position: 1
 
 # Core Functionality
 
-Taichi Core exposes all necessary interfaces to offload AOT modules to Taichi. Here lists the features universally available disregards to any specific backend. These APIs are still in active development so is subject to change.
+Taichi Core exposes all necessary interfaces for offloading the AOT modules to Taichi. The following is a list of features that are available regardless of your backend. The corresponding APIs are still under development and subject to change.
 
 ## Availability
 
-Taichi C-API has bridged the following backends:
+Taichi C-API intends to support the following backends:
 
 |Backend|Offload Target|Maintenance Tier|
 |-|-|-|
 |Vulkan|GPU|Tier 1|
 |CUDA (LLVM)|GPU (NVIDIA)|Tier 1|
 |CPU (LLVM)|CPU|Tier 1|
+|OpenGL|GPU|Tier 2|
 |DirectX 11|GPU (Windows)|N/A|
 |Metal|GPU (macOS, iOS)|N/A|
-|OpenGL|GPU|N/A|
 
-The backends with tier 1 support are the most intensively developed and tested ones. In contrast, you would expect a delay in fixes against minor issues on tier 2 backends. The backends currently unsupported might become supported. Among all the tier 1 backends, Vulkan has the most outstanding cross-platform compatibility, so most of the new features will be first available on Vulkan.
+The backends with tier-1 support are being developed and tested more intensively. And most new features will be available on Vulkan first, because it has the most outstanding cross-platform compatibility among all the tier-1 backends.
+For the backends with tier-2 support, you should expect a delay in the fixes to the minor issues.
 
-For convenience, in the following text (and other C-API documentations), the term **host** refers to the user of the C-API; the term **device** refers to the logical (conceptual) compute device that Taichi Runtime offloads its compute tasks to. A *device* might not be an actual discrete processor away from the CPU and the *host* MAY NOT be able to access the memory allocated on the *device*.
+For convenience, in the following text and other C-API documents, the term *host* refers to the user of the C-API; the term *device* refers to the logical (conceptual) compute device, to which Taichi's runtime offloads its compute tasks. A *device* may not be a physical discrete processor other than the CPU and the *host* may *not* be able to access the memory allocated on the *device*.
 
 Unless explicitly explained, **device**, **backend**, **offload targer** and **GPU** are used interchangeably; **host**, **user code**, **user procedure** and **CPU** are used interchangeably too.
 
@@ -31,21 +32,23 @@ In this section we give an brief introduction about what you might want to do wi
 
 ### Create and destroy a Runtime Instance
 
-To work with Taichi, you first create an runtime instance. You SHOULD only create a single runtime per thread. Currently we don't officially claim that multiple runtime instances can coexist in a process, please feel free to [report issues](https://github.com/taichi-dev/taichi/issues) if you encountered any problem with such usage.
+You *must* create a runtime instance before working with Taichi, and *only* one runtime per thread. Currently we do not officially claim that multiple runtime instances can coexist in a process, but please feel free to [file an issue with us](https://github.com/taichi-dev/taichi/issues) if you run into any problem with runtime instance coexistence.
 
 ```cpp
 TiRuntime runtime = ti_create_runtime(TI_ARCH_VULKAN);
 ```
 
-When your program reaches the end, you SHOULD destroy the runtime instance. Please ensure any other related resources have been destroyed before the [`TiRuntime`](#handle-tiruntime) itself.
+When your program runs to the end, ensure that:
+- You destroy the runtime instance,
+- All related resources are destroyed before the [`TiRuntime`](#handle-tiruntime) itself.
 
 ```cpp
 ti_destroy_runtime(runtime);
 ```
 
-### Allocate and Free Memory
+### Allocate and free memory
 
-Allocate a piece of memory that is only visible to the device. On GPU backends, it usually means that the memory is located in the graphics memory (GRAM).
+Allocate a piece of memory that is visible only to the device. On the GPU backends, it usually means that the memory is located in the graphics memory (GRAM).
 
 ```cpp
 TiMemoryAllocateInfo mai {};
@@ -54,7 +57,7 @@ mai.usage = TI_MEMORY_USAGE_STORAGE_BIT;
 TiMemory memory = ti_allocate_memory(runtime, &mai);
 ```
 
-You MAY free allocated memory explicitly; but memory allocations will be automatically freed when the related [`TiRuntime`](#handle-tiruntime) is destroyed.
+Allocated memory is automatically freed when the related [`TiRuntime`](#handle-tiruntime) is destroyed. You can also manually free the allocated memory.
 
 ```cpp
 ti_free_memory(runtime, memory);
@@ -64,7 +67,7 @@ ti_free_memory(runtime, memory);
 
 By default, memory allocations are physically or conceptually local to the offload target for performance reasons. You can configure the allocate info to enable host access to memory allocations. But please note that host-accessible allocations MAY slow down computation on GPU because of the limited bus bandwidth between the host memory and the device.
 
-To allow data to be streamed into the memory, `host_write` MUST be set true.
+You *must* set `host_write` to `true` to allow streaming data to the memory.
 
 ```cpp
 TiMemoryAllocateInfo mai {};
@@ -119,7 +122,7 @@ You can destroy an unused AOT module if you have done with it; but please ensure
 ti_destroy_aot_module(aot_module);
 ```
 
-### Launch Kernels and Compute Graphs
+### Launch kernels and compute graphs
 
 You can extract kernels and compute graphs from an AOT module. Kernel and compute graphs are a part of the module, so you don't have to destroy them.
 
@@ -181,7 +184,7 @@ ti_submit(runtime);
 ti_wait(runtime);
 ```
 
-**WARNING** This part is subject to change. We're gonna introduce multi-queue in the future.
+**WARNING** This part is subject to change. We will introduce multi-queue in the future.
 
 ## API Reference
 
@@ -276,6 +279,26 @@ typedef struct TiMemory_t* TiMemory;
 
 A contiguous allocation of on-device memory.
 
+---
+### Handle `TiImage`
+
+```c
+// handle.image
+typedef struct TiImage_t* TiImage;
+```
+
+A contiguous allocation of on-device image.
+
+---
+### Handle `TiSampler`
+
+```c
+// handle.sampler
+typedef struct TiSampler_t* TiSampler;
+```
+
+An image sampler. [`TI_NULL_HANDLE`](#definition-ti_null_handle) represents a default image sampler provided by the runtime implementation. The filter modes and address modes of default samplers depend on backend implementation.
+
 ---
 ### Handle `TiKernel`
 
@@ -294,7 +317,42 @@ A Taichi kernel that can be launched on device for execution.
 typedef struct TiComputeGraph_t* TiComputeGraph;
 ```
 
-A collection of Taichi kernels (a compute graph) to be launched on device in predefined order.
+A collection of Taichi kernels (a compute graph) to launch on device in a predefined order.
+
+---
+### Enumeration `TiError`
+
+```c
+// enumeration.error
+typedef enum TiError {
+  TI_ERROR_INCOMPLETE = 1,
+  TI_ERROR_SUCCESS = 0,
+  TI_ERROR_NOT_SUPPORTED = -1,
+  TI_ERROR_CORRUPTED_DATA = -2,
+  TI_ERROR_NAME_NOT_FOUND = -3,
+  TI_ERROR_INVALID_ARGUMENT = -4,
+  TI_ERROR_ARGUMENT_NULL = -5,
+  TI_ERROR_ARGUMENT_OUT_OF_RANGE = -6,
+  TI_ERROR_ARGUMENT_NOT_FOUND = -7,
+  TI_ERROR_INVALID_INTEROP = -8,
+  TI_ERROR_INVALID_STATE = -9,
+  TI_ERROR_MAX_ENUM = 0xffffffff,
+} TiError;
+```
+
+Errors reported by the Taichi C-API.
+
+- `TI_ERROR_INCOMPLETE`: The output data is truncated because the user-provided buffer is too small.
+- `TI_ERROR_SUCCESS`: The Taichi C-API invocation finished gracefully.
+- `TI_ERROR_NOT_SUPPORTED`: The invoked API, or the combination of parameters, is not supported by the Taichi C-API.
+- `TI_ERROR_CORRUPTED_DATA`: Provided data is corrupted.
+- `TI_ERROR_NAME_NOT_FOUND`: Provided name does not refer to any existing item.
+- `TI_ERROR_INVALID_ARGUMENT`: One or more function arguments violate constraints specified in C-API documents; or kernel arguments mismatch the kernel argument list defined in the AOT module.
+- `TI_ERROR_ARGUMENT_NULL`: One or more by-reference (pointer) function arguments point to null.
+- `TI_ERROR_ARGUMENT_OUT_OF_RANGE`: One or more function arguments are out of their acceptable range; or enumeration arguments have undefined values.
+- `TI_ERROR_ARGUMENT_NOT_FOUND`: One or more kernel arguments are missing.
+- `TI_ERROR_INVALID_INTEROP`: The intended interoperation is not possible on the current arch. For example, attempting to export a Vulkan object from a CUDA runtime is not allowed.
+- `TI_ERROR_INVALID_STATE`: The Taichi C-API enters an unrecoverable invalid state. Related Taichi objects are potentially corrupted. The users *should* release the contaminated resources for stability. Please feel free to file an issue if you encounter this error in a normal routine.
 
 ---
 ### Enumeration `TiArch`
@@ -311,9 +369,10 @@ typedef enum TiArch {
   TI_ARCH_METAL = 6,
   TI_ARCH_OPENGL = 7,
   TI_ARCH_DX11 = 8,
-  TI_ARCH_OPENCL = 9,
-  TI_ARCH_AMDGPU = 10,
-  TI_ARCH_VULKAN = 11,
+  TI_ARCH_DX12 = 9,
+  TI_ARCH_OPENCL = 10,
+  TI_ARCH_AMDGPU = 11,
+  TI_ARCH_VULKAN = 12,
   TI_ARCH_MAX_ENUM = 0xffffffff,
 } TiArch;
 ```
@@ -372,6 +431,7 @@ typedef enum TiArgumentType {
   TI_ARGUMENT_TYPE_I32 = 0,
   TI_ARGUMENT_TYPE_F32 = 1,
   TI_ARGUMENT_TYPE_NDARRAY = 2,
+  TI_ARGUMENT_TYPE_TEXTURE = 3,
   TI_ARGUMENT_TYPE_MAX_ENUM = 0xffffffff,
 } TiArgumentType;
 ```
@@ -383,7 +443,7 @@ Types of kernel and compute graph argument.
 - `TI_ARGUMENT_TYPE_NDARRAY`: ND-array wrapped around a [`TiMemory`](#handle-timemory).
 
 ---
-### BitField `TiMemoryUsageFlagBits`
+### BitField `TiMemoryUsageFlags`
 
 ```c
 // bit_field.memory_usage
@@ -398,7 +458,7 @@ typedef TiFlags TiMemoryUsageFlags;
 
 Usages of a memory allocation.
 
-- `TI_MEMORY_USAGE_STORAGE_BIT`: The memory can be read/write accessed by any shader, you usually only need to set this flag.
+- `TI_MEMORY_USAGE_STORAGE_BIT`: The memory can be read from and written to by any kernel. In most cases, users only need to set this flag.
 - `TI_MEMORY_USAGE_UNIFORM_BIT`: The memory can be used as a uniform buffer in graphics pipelines.
 - `TI_MEMORY_USAGE_VERTEX_BIT`: The memory can be used as a vertex buffer in graphics pipelines.
 - `TI_MEMORY_USAGE_INDEX_BIT`: The memory can be used as a index buffer in graphics pipelines.
@@ -413,7 +473,7 @@ typedef struct TiMemoryAllocateInfo {
   TiBool host_write;
   TiBool host_read;
   TiBool export_sharing;
-  TiMemoryUsageFlagBits usage;
+  TiMemoryUsageFlags usage;
 } TiMemoryAllocateInfo;
 ```
 
@@ -476,9 +536,206 @@ Multi-dimentional array of dense primitive data.
 
 - `memory`: Memory bound to the ND-array.
 - `shape`: Shape of the ND-array.
-- `elem_shape`: Shape of the ND-array elements. You usually need to set this if it's a vector or matrix ND-array.
+- `elem_shape`: Shape of the ND-array elements. It *must not* be empty for vector or matrix ND-arrays.
 - `elem_type`: Primitive data type of the ND-array elements.
 
+---
+### BitField `TiImageUsageFlags`
+
+```c
+// bit_field.image_usage
+typedef enum TiImageUsageFlagBits {
+  TI_IMAGE_USAGE_STORAGE_BIT = 1 << 0,
+  TI_IMAGE_USAGE_SAMPLED_BIT = 1 << 1,
+  TI_IMAGE_USAGE_ATTACHMENT_BIT = 1 << 2,
+} TiImageUsageFlagBits;
+typedef TiFlags TiImageUsageFlags;
+```
+---
+### Enumeration `TiImageDimension`
+
+```c
+// enumeration.image_dimension
+typedef enum TiImageDimension {
+  TI_IMAGE_DIMENSION_1D = 0,
+  TI_IMAGE_DIMENSION_2D = 1,
+  TI_IMAGE_DIMENSION_3D = 2,
+  TI_IMAGE_DIMENSION_1D_ARRAY = 3,
+  TI_IMAGE_DIMENSION_2D_ARRAY = 4,
+  TI_IMAGE_DIMENSION_CUBE = 5,
+  TI_IMAGE_DIMENSION_MAX_ENUM = 0xffffffff,
+} TiImageDimension;
+```
+---
+### Enumeration `TiImageLayout`
+
+```c
+// enumeration.image_layout
+typedef enum TiImageLayout {
+  TI_IMAGE_LAYOUT_UNDEFINED = 0,
+  TI_IMAGE_LAYOUT_SHADER_READ = 1,
+  TI_IMAGE_LAYOUT_SHADER_WRITE = 2,
+  TI_IMAGE_LAYOUT_SHADER_READ_WRITE = 3,
+  TI_IMAGE_LAYOUT_COLOR_ATTACHMENT = 4,
+  TI_IMAGE_LAYOUT_COLOR_ATTACHMENT_READ = 5,
+  TI_IMAGE_LAYOUT_DEPTH_ATTACHMENT = 6,
+  TI_IMAGE_LAYOUT_DEPTH_ATTACHMENT_READ = 7,
+  TI_IMAGE_LAYOUT_TRANSFER_DST = 8,
+  TI_IMAGE_LAYOUT_TRANSFER_SRC = 9,
+  TI_IMAGE_LAYOUT_PRESENT_SRC = 10,
+  TI_IMAGE_LAYOUT_MAX_ENUM = 0xffffffff,
+} TiImageLayout;
+```
+---
+### Enumeration `TiFormat`
+
+```c
+// enumeration.format
+typedef enum TiFormat {
+  TI_FORMAT_UNKNOWN = 0,
+  TI_FORMAT_R8 = 1,
+  TI_FORMAT_RG8 = 2,
+  TI_FORMAT_RGBA8 = 3,
+  TI_FORMAT_RGBA8SRGB = 4,
+  TI_FORMAT_BGRA8 = 5,
+  TI_FORMAT_BGRA8SRGB = 6,
+  TI_FORMAT_R8U = 7,
+  TI_FORMAT_RG8U = 8,
+  TI_FORMAT_RGBA8U = 9,
+  TI_FORMAT_R8I = 10,
+  TI_FORMAT_RG8I = 11,
+  TI_FORMAT_RGBA8I = 12,
+  TI_FORMAT_R16 = 13,
+  TI_FORMAT_RG16 = 14,
+  TI_FORMAT_RGB16 = 15,
+  TI_FORMAT_RGBA16 = 16,
+  TI_FORMAT_R16U = 17,
+  TI_FORMAT_RG16U = 18,
+  TI_FORMAT_RGB16U = 19,
+  TI_FORMAT_RGBA16U = 20,
+  TI_FORMAT_R16I = 21,
+  TI_FORMAT_RG16I = 22,
+  TI_FORMAT_RGB16I = 23,
+  TI_FORMAT_RGBA16I = 24,
+  TI_FORMAT_R16F = 25,
+  TI_FORMAT_RG16F = 26,
+  TI_FORMAT_RGB16F = 27,
+  TI_FORMAT_RGBA16F = 28,
+  TI_FORMAT_R32U = 29,
+  TI_FORMAT_RG32U = 30,
+  TI_FORMAT_RGB32U = 31,
+  TI_FORMAT_RGBA32U = 32,
+  TI_FORMAT_R32I = 33,
+  TI_FORMAT_RG32I = 34,
+  TI_FORMAT_RGB32I = 35,
+  TI_FORMAT_RGBA32I = 36,
+  TI_FORMAT_R32F = 37,
+  TI_FORMAT_RG32F = 38,
+  TI_FORMAT_RGB32F = 39,
+  TI_FORMAT_RGBA32F = 40,
+  TI_FORMAT_DEPTH16 = 41,
+  TI_FORMAT_DEPTH24STENCIL8 = 42,
+  TI_FORMAT_DEPTH32F = 43,
+  TI_FORMAT_MAX_ENUM = 0xffffffff,
+} TiFormat;
+```
+---
+### Structure `TiImageOffset`
+
+```c
+// structure.image_offset
+typedef struct TiImageOffset {
+  uint32_t x;
+  uint32_t y;
+  uint32_t z;
+  uint32_t array_layer_offset;
+} TiImageOffset;
+```
+---
+### Structure `TiImageExtent`
+
+```c
+// structure.image_extent
+typedef struct TiImageExtent {
+  uint32_t width;
+  uint32_t height;
+  uint32_t depth;
+  uint32_t array_layer_count;
+} TiImageExtent;
+```
+---
+### Structure `TiImageAllocateInfo`
+
+```c
+// structure.image_allocate_info
+typedef struct TiImageAllocateInfo {
+  TiImageDimension dimension;
+  TiImageExtent extent;
+  uint32_t mip_level_count;
+  TiFormat format;
+  TiImageUsageFlags usage;
+} TiImageAllocateInfo;
+```
+---
+### Structure `TiImageSlice`
+
+```c
+// structure.image_slice
+typedef struct TiImageSlice {
+  TiImage image;
+  TiImageOffset offset;
+  TiImageExtent extent;
+  uint32_t mip_level;
+} TiImageSlice;
+```
+---
+### Enumeration `TiFilter`
+
+```c
+// enumeration.filter
+typedef enum TiFilter {
+  TI_FILTER_NEAREST = 0,
+  TI_FILTER_LINEAR = 1,
+  TI_FILTER_MAX_ENUM = 0xffffffff,
+} TiFilter;
+```
+---
+### Enumeration `TiAddressMode`
+
+```c
+// enumeration.address_mode
+typedef enum TiAddressMode {
+  TI_ADDRESS_MODE_REPEAT = 0,
+  TI_ADDRESS_MODE_MIRRORED_REPEAT = 1,
+  TI_ADDRESS_MODE_CLAMP_TO_EDGE = 2,
+  TI_ADDRESS_MODE_MAX_ENUM = 0xffffffff,
+} TiAddressMode;
+```
+---
+### Structure `TiSamplerCreateInfo`
+
+```c
+// structure.sampler_create_info
+typedef struct TiSamplerCreateInfo {
+  TiFilter mag_filter;
+  TiFilter min_filter;
+  TiAddressMode address_mode;
+  float max_anisotropy;
+} TiSamplerCreateInfo;
+```
+---
+### Structure `TiTexture`
+
+```c
+// structure.texture
+typedef struct TiTexture {
+  TiImage image;
+  TiSampler sampler;
+  TiImageDimension dimension;
+  TiImageExtent extent;
+  TiFormat format;
+} TiTexture;
+```
 ---
 ### Union `TiArgumentValue`
 
@@ -488,6 +745,7 @@ typedef union TiArgumentValue {
   int32_t i32;
   float f32;
   TiNdArray ndarray;
+  TiTexture texture;
 } TiArgumentValue;
 ```
 
@@ -529,6 +787,26 @@ An named argument value to feed compute graphcs.
 - `name`: Name of the argument.
 - `argument`: Argument body.
 
+---
+### Function `ti_get_last_error`
+
+```c
+// function.get_last_error
+TI_DLL_EXPORT TiError TI_API_CALL ti_get_last_error(
+  uint64_t message_size,
+  char* message
+);
+```
+---
+### Function `ti_set_last_error`
+
+```c
+// function.set_last_error
+TI_DLL_EXPORT void TI_API_CALL ti_set_last_error(
+  TiError error,
+  const char* message
+);
+```
 ---
 ### Function `ti_create_runtime`
 
@@ -577,7 +855,7 @@ TI_DLL_EXPORT void TI_API_CALL ti_free_memory(
 );
 ```
 
-Free a memory allocation.
+Frees a memory allocation.
 
 ---
 ### Function `ti_map_memory`
@@ -590,7 +868,7 @@ TI_DLL_EXPORT void* TI_API_CALL ti_map_memory(
 );
 ```
 
-Map an on-device memory to a host-addressible space. The user MUST ensure the device is not being used by any device command before the map.
+Maps on-device memory to a host-addressable space. You *must* ensure that the device is not being used by any device command before the mapping.
 
 ---
 ### Function `ti_unmap_memory`
@@ -603,8 +881,48 @@ TI_DLL_EXPORT void TI_API_CALL ti_unmap_memory(
 );
 ```
 
-Unmap an on-device memory and make any host-side changes about the memory visible to the device. The user MUST ensure there is no further access to the previously mapped host-addressible space.
+Unmaps on-device memory and makes any host-side changes to the memory visible to the device. You *must* ensure that there is no further access to the previously mapped host-addressable space.
+
+---
+### Function `ti_allocate_image`
 
+```c
+// function.allocate_image
+TI_DLL_EXPORT TiImage TI_API_CALL ti_allocate_image(
+  TiRuntime runtime,
+  const TiImageAllocateInfo* allocate_info
+);
+```
+---
+### Function `ti_free_image`
+
+```c
+// function.free_image
+TI_DLL_EXPORT void TI_API_CALL ti_free_image(
+  TiRuntime runtime,
+  TiImage image
+);
+```
+---
+### Function `ti_create_sampler`
+
+```c
+// function.create_sampler
+TI_DLL_EXPORT TiSampler TI_API_CALL ti_create_sampler(
+  TiRuntime runtime,
+  const TiSamplerCreateInfo* create_info
+);
+```
+---
+### Function `ti_destroy_sampler`
+
+```c
+// function.destroy_sampler
+TI_DLL_EXPORT void TI_API_CALL ti_destroy_sampler(
+  TiRuntime runtime,
+  TiSampler sampler
+);
+```
 ---
 ### Function `ti_create_event`
 
@@ -615,7 +933,7 @@ TI_DLL_EXPORT TiEvent TI_API_CALL ti_create_event(
 );
 ```
 
-Create an event primitive.
+Creates an event primitive.
 
 ---
 ### Function `ti_destroy_event`
@@ -627,7 +945,7 @@ TI_DLL_EXPORT void TI_API_CALL ti_destroy_event(
 );
 ```
 
-Destroy an event primitive.
+Destroys an event primitive.
 
 ---
 ### Function `ti_copy_memory_device_to_device` (Device Command)
@@ -641,8 +959,41 @@ TI_DLL_EXPORT void TI_API_CALL ti_copy_memory_device_to_device(
 );
 ```
 
-Copy the content of a contiguous subsection of on-device memory to another. The two subsections MUST NOT overlap.
+Copies the data in a contiguous subsection of the on-device memory to another subsection. Note that the two subsections *must not* overlap.
 
+---
+### Function `ti_copy_image_device_to_device` (Device Command)
+
+```c
+// function.copy_image_device_to_device
+TI_DLL_EXPORT void TI_API_CALL ti_copy_image_device_to_device(
+  TiRuntime runtime,
+  const TiImageSlice* dst_image,
+  const TiImageSlice* src_image
+);
+```
+---
+### Function `ti_track_image_ext`
+
+```c
+// function.track_image
+TI_DLL_EXPORT void TI_API_CALL ti_track_image_ext(
+  TiRuntime runtime,
+  TiImage image,
+  TiImageLayout layout
+);
+```
+---
+### Function `ti_transition_image` (Device Command)
+
+```c
+// function.transition_image
+TI_DLL_EXPORT void TI_API_CALL ti_transition_image(
+  TiRuntime runtime,
+  TiImage image,
+  TiImageLayout layout
+);
+```
 ---
 ### Function `ti_launch_kernel` (Device Command)
 
@@ -671,7 +1022,7 @@ TI_DLL_EXPORT void TI_API_CALL ti_launch_compute_graph(
 );
 ```
 
-Launch a Taichi compute graph with provided named arguments. The named arguments MUST have the same count, names and types as in the source code.
+Launches a Taichi compute graph with provided named arguments. The named arguments *must* have the same count, names, and types as in the source code.
 
 ---
 ### Function `ti_signal_event` (Device Command)
@@ -684,7 +1035,7 @@ TI_DLL_EXPORT void TI_API_CALL ti_signal_event(
 );
 ```
 
-Set an event primitive to a signaled state, so the queues waiting upon the event can go on execution. If the event has been signaled before, the event MUST be reset with [`ti_reset_event`](#function-ti_reset_event-device-command); otherwise it is an undefined behavior.
+Sets an event primitive to a signaled state so that the queues waiting for it can continue execution. If the event has been signaled before, you *must* call [`ti_reset_event`](#function-ti_reset_event-device-command) to reset it; otherwise, the behavior is undefined.
 
 ---
 ### Function `ti_reset_event` (Device Command)
@@ -697,7 +1048,7 @@ TI_DLL_EXPORT void TI_API_CALL ti_reset_event(
 );
 ```
 
-Set a signaled event primitive back to an unsignaled state.
+Sets a signaled event primitive back to an unsignaled state.
 
 ---
 ### Function `ti_wait_event` (Device Command)
@@ -734,7 +1085,7 @@ TI_DLL_EXPORT void TI_API_CALL ti_wait(
 );
 ```
 
-Wait until all previously invoked device command has finished execution.
+Waits until all previously invoked device commands are executed.
 
 ---
 ### Function `ti_load_aot_module`
@@ -747,7 +1098,8 @@ TI_DLL_EXPORT TiAotModule TI_API_CALL ti_load_aot_module(
 );
 ```
 
-Load a precompiled AOT module from the filesystem. [`TI_NULL_HANDLE`](#definition-ti_null_handle) is returned if the runtime failed to load the AOT module from the given path.
+Loads a pre-compiled AOT module from the file system.
+Returns [`TI_NULL_HANDLE`](#definition-ti_null_handle) if the runtime fails to load the AOT module from the specified path.
 
 ---
 ### Function `ti_destroy_aot_module`
@@ -759,7 +1111,7 @@ TI_DLL_EXPORT void TI_API_CALL ti_destroy_aot_module(
 );
 ```
 
-Destroy a loaded AOT module and release all related resources.
+Destroys a loaded AOT module and releases all related resources.
 
 ---
 ### Function `ti_get_aot_module_kernel`
@@ -772,7 +1124,8 @@ TI_DLL_EXPORT TiKernel TI_API_CALL ti_get_aot_module_kernel(
 );
 ```
 
-Get a precompiled Taichi kernel from the AOT module. [`TI_NULL_HANDLE`](#definition-ti_null_handle) is returned if the module does not have a kernel of the specified name.
+Retrieves a pre-compiled Taichi kernel from the AOT module.
+Returns [`TI_NULL_HANDLE`](#definition-ti_null_handle) if the module does not have a kernel of the specified name.
 
 ---
 ### Function `ti_get_aot_module_compute_graph`
diff --git a/docs/lang/articles/c-api/taichi_vulkan.md b/docs/lang/articles/c-api/taichi_vulkan.md
index 468b2cc30b499..b2aaf5950fd81 100644
--- a/docs/lang/articles/c-api/taichi_vulkan.md
+++ b/docs/lang/articles/c-api/taichi_vulkan.md
@@ -13,6 +13,7 @@ Taichi's Vulkan API gives you further control over Vulkan version and extension
 ```c
 // structure.vulkan_runtime_interop_info
 typedef struct TiVulkanRuntimeInteropInfo {
+  PFN_vkGetInstanceProcAddr get_instance_proc_addr;
   uint32_t api_version;
   VkInstance instance;
   VkPhysicalDevice physical_device;
@@ -55,6 +56,23 @@ Necessary detail to share a same piece of Vulkan buffer between Taichi and user
 - `size`: Size of the piece of memory in bytes.
 - `size`: Vulkan buffer usage. You usually want the `VK_BUFFER_USAGE_STORAGE_BUFFER_BIT` set.
 
+---
+### Structure `TiVulkanImageInteropInfo`
+
+```c
+// structure.vulkan_image_interop_info
+typedef struct TiVulkanImageInteropInfo {
+  VkImage image;
+  VkImageType image_type;
+  VkFormat format;
+  VkExtent3D extent;
+  uint32_t mip_level_count;
+  uint32_t array_layer_count;
+  VkSampleCountFlagBits sample_count;
+  VkImageTiling tiling;
+  VkImageUsageFlags usage;
+} TiVulkanImageInteropInfo;
+```
 ---
 ### Structure `TiVulkanEventInteropInfo`
 
@@ -137,6 +155,29 @@ TI_DLL_EXPORT void TI_API_CALL ti_export_vulkan_memory(
 
 Export a Vulkan buffer from external user applications to Taichi.
 
+---
+### Function `ti_import_vulkan_image`
+
+```c
+// function.import_vulkan_image
+TI_DLL_EXPORT TiImage TI_API_CALL ti_import_vulkan_image(
+  TiRuntime runtime,
+  const TiVulkanImageInteropInfo* interop_info,
+  VkImageViewType view_type,
+  VkImageLayout layout
+);
+```
+---
+### Function `ti_export_vulkan_image`
+
+```c
+// function.export_vulkan_image
+TI_DLL_EXPORT void TI_API_CALL ti_export_vulkan_image(
+  TiRuntime runtime,
+  TiImage image,
+  TiVulkanImageInteropInfo* interop_info
+);
+```
 ---
 ### Function `ti_import_vulkan_event`
 
diff --git a/misc/generate_c_api_docs.py b/misc/generate_c_api_docs.py
index 96ed8688e42ae..b9df0c98ca90b 100644
--- a/misc/generate_c_api_docs.py
+++ b/misc/generate_c_api_docs.py
@@ -188,6 +188,17 @@ def generate_module_header(module):
         BuiltInType("VkBuffer", "VkBuffer"),
         BuiltInType("VkBufferUsageFlags", "VkBufferUsageFlags"),
         BuiltInType("VkEvent", "VkEvent"),
+        BuiltInType("VkImage", "VkImage"),
+        BuiltInType("VkImageType", "VkImageType"),
+        BuiltInType("VkFormat", "VkFormat"),
+        BuiltInType("VkExtent3D", "VkExtent3D"),
+        BuiltInType("VkSampleCountFlagBits", "VkSampleCountFlagBits"),
+        BuiltInType("VkImageTiling", "VkImageTiling"),
+        BuiltInType("VkImageLayout", "VkImageLayout"),
+        BuiltInType("VkImageUsageFlags", "VkImageUsageFlags"),
+        BuiltInType("VkImageViewType", "VkImageViewType"),
+        BuiltInType("PFN_vkGetInstanceProcAddr", "PFN_vkGetInstanceProcAddr"),
+        BuiltInType("char", "char"),
     }
 
     for module in Module.load_all(builtin_tys):

From aa64467421bed069840ac13679d1ed0ac7e108cd Mon Sep 17 00:00:00 2001
From: "pre-commit-ci[bot]"
 <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Date: Fri, 23 Sep 2022 15:41:05 +0000
Subject: [PATCH 48/59] [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci
---
 c_api/docs/taichi/taichi_core.h.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/c_api/docs/taichi/taichi_core.h.md b/c_api/docs/taichi/taichi_core.h.md
index 4c9465d8b7e2c..50cad4207bb82 100644
--- a/c_api/docs/taichi/taichi_core.h.md
+++ b/c_api/docs/taichi/taichi_core.h.md
@@ -330,7 +330,7 @@ Multi-dimensional size of an ND-array. Dimension sizes after `structure.nd_shape
 Multi-dimentional array of dense primitive data.
 
 - `structure.nd_array.memory`: Memory bound to the ND-array.
-- `structure.nd_array.shape`: Shape of the ND-array. 
+- `structure.nd_array.shape`: Shape of the ND-array.
 - `structure.nd_array.elem_shape`: Shape of the ND-array elements. It *must not* be empty for vector or matrix ND-arrays.
 - `structure.nd_array.elem_type`: Primitive data type of the ND-array elements.
 

From 496de891e997ebf9d27aadc037a31bad6d21bf3f Mon Sep 17 00:00:00 2001
From: PENGUINLIONG <admin@penguinliong.moe>
Date: Sat, 24 Sep 2022 08:35:07 +0800
Subject: [PATCH 49/59] Update c_api/docs/taichi/taichi_core.h.md

Co-authored-by: Vissidarte-Herman <93570324+Vissidarte-Herman@users.noreply.github.com>
---
 c_api/docs/taichi/taichi_core.h.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/c_api/docs/taichi/taichi_core.h.md b/c_api/docs/taichi/taichi_core.h.md
index 50cad4207bb82..ac4bebd33050a 100644
--- a/c_api/docs/taichi/taichi_core.h.md
+++ b/c_api/docs/taichi/taichi_core.h.md
@@ -63,7 +63,7 @@ Allocated memory is automatically freed when the related `handle.runtime` is des
 ti_free_memory(runtime, memory);
 ```
 
-### Allocate Host-Accessible Memory
+### Allocate host-accessible memory
 
 By default, memory allocations are physically or conceptually local to the offload target for performance reasons. You can configure the allocate info to enable host access to memory allocations. But please note that host-accessible allocations MAY slow down computation on GPU because of the limited bus bandwidth between the host memory and the device.
 

From 42d28cd629d7e933bf5b5d8be35aefbb387430ec Mon Sep 17 00:00:00 2001
From: PENGUINLIONG <admin@penguinliong.moe>
Date: Sat, 24 Sep 2022 09:05:02 +0800
Subject: [PATCH 50/59] Editorial updates

---
 c_api/docs/taichi/taichi_core.h.md | 23 ++++++++++++-----------
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/c_api/docs/taichi/taichi_core.h.md b/c_api/docs/taichi/taichi_core.h.md
index 4c9465d8b7e2c..c099aa8a9ff10 100644
--- a/c_api/docs/taichi/taichi_core.h.md
+++ b/c_api/docs/taichi/taichi_core.h.md
@@ -24,11 +24,11 @@ For the backends with tier-2 support, you should expect a delay in the fixes to
 
 For convenience, in the following text and other C-API documents, the term *host* refers to the user of the C-API; the term *device* refers to the logical (conceptual) compute device, to which Taichi's runtime offloads its compute tasks. A *device* may not be a physical discrete processor other than the CPU and the *host* may *not* be able to access the memory allocated on the *device*.
 
-Unless explicitly explained, **device**, **backend**, **offload targer** and **GPU** are used interchangeably; **host**, **user code**, **user procedure** and **CPU** are used interchangeably too.
+Unless otherwise specified, **device**, **backend**, **offload targer**, and **GPU** are interchangeable; **host**, **user code**, **user procedure**, and **CPU** are interchangeable.
 
 ## How to...
 
-In this section we give an brief introduction about what you might want to do with the Taichi C-API.
+The following section provides a brief introduction to the Taichi C-API.
 
 ### Create and destroy a Runtime Instance
 
@@ -65,7 +65,7 @@ ti_free_memory(runtime, memory);
 
 ### Allocate Host-Accessible Memory
 
-By default, memory allocations are physically or conceptually local to the offload target for performance reasons. You can configure the allocate info to enable host access to memory allocations. But please note that host-accessible allocations MAY slow down computation on GPU because of the limited bus bandwidth between the host memory and the device.
+By default, memory allocations are physically or conceptually local to the offload target for performance reasons. You can configure the `structure.memory_allocate_info` to enable host access to memory allocations. But please note that host-accessible allocations *may* slow down computation on GPU because of the limited bus bandwidth between the host memory and the device.
 
 You *must* set `host_write` to `true` to allow streaming data to the memory.
 
@@ -85,7 +85,7 @@ std::memcpy(dst, src.data(), src.size());
 ti_unmap_memory(runtime, streaming_memory);
 ```
 
-To read data back to the host, `host_read` MUST be set true.
+To read data back to the host, `host_read` *must* be set true.
 
 ```cpp
 TiMemoryAllocateInfo mai {};
@@ -104,9 +104,9 @@ ti_unmap_memory(runtime, read_back_memory);
 ti_free_memory(runtime, read_back_memory);
 ```
 
-**NOTE** `host_read` and `host_write` can be set true simultaneously.
+> You can set `host_read` and `host_write` at the same time.
 
-### Load and destroy a Taichi AOT Module
+### Load and destroy a Taichi AOT module
 
 You can load a Taichi AOT module from the filesystem.
 
@@ -394,7 +394,7 @@ Copies the data in a contiguous subsection of the on-device memory to another su
 
 `function.launch_kernel`
 
-Launch a Taichi kernel with provided arguments. The arguments MUST have the same count and types in the same order as in the source code.
+Launches a Taichi kernel with the provided arguments. The arguments *must* have the same count and types in the same order as in the source code.
 
 `function.launch_compute_graph`
 
@@ -410,15 +410,15 @@ Sets a signaled event primitive back to an unsignaled state.
 
 `function.wait_event`
 
-Wait on an event primitive until it transitions to a signaled state. The user MUST signal the awaited event; otherwise it is an undefined behavior.
+Waits until an event primitive transitions to a signaled state. The awaited event *must* be signaled by an external procedure or a previous invocation of `function.reset_event`; otherwise, the behavior is undefined.
 
 `function.submit`
 
-Submit all commands to the logical device for execution. Ensure that any previous device command has been offloaded to the logical computing device.
+Submits all previously invoked device commands to the offload device for execution.
 
 `function.wait`
 
-Waits until all previously invoked device commands are executed.
+Waits until all previously invoked device commands are executed. Any invoked command that has not been submitted is submitted first.
 
 `function.load_aot_module`
 
@@ -436,4 +436,5 @@ Returns `definition.null_handle` if the module does not have a kernel of the spe
 
 `function.get_aot_module_compute_graph`
 
-Get a precompiled compute graph from the AOt module. `definition.null_handle` is returned if the module does not have a kernel of the specified name.
+Retrieves a pre-compiled compute graph from the AOT module.
+Returns `definition.null_handle` if the module does not have a compute graph of the specified name.

From d923943fd7bcc09f47cca212dd08076cfbeb67a7 Mon Sep 17 00:00:00 2001
From: PENGUINLIONG <admin@penguinliong.moe>
Date: Sat, 24 Sep 2022 10:35:33 +0800
Subject: [PATCH 51/59] Added docs for new interfaces

---
 c_api/docs/taichi/taichi_core.h.md        | 158 +++++++++++++++++----
 c_api/docs/taichi/taichi_vulkan.h.md      |  49 +++++--
 c_api/taichi.json                         |   4 +
 docs/lang/articles/c-api/taichi_core.md   | 163 +++++++++++++++++-----
 docs/lang/articles/c-api/taichi_vulkan.md |  46 ++++--
 5 files changed, 330 insertions(+), 90 deletions(-)

diff --git a/c_api/docs/taichi/taichi_core.h.md b/c_api/docs/taichi/taichi_core.h.md
index b5a9d3f47a80f..0b837b7de85f1 100644
--- a/c_api/docs/taichi/taichi_core.h.md
+++ b/c_api/docs/taichi/taichi_core.h.md
@@ -4,7 +4,7 @@ sidebar_position: 1
 
 # Core Functionality
 
-Taichi Core exposes all necessary interfaces for offloading the AOT modules to Taichi. The following are a list of features that are available regardless of your backend. The corresponding APIs are still under development and subject to change.
+Taichi Core exposes all necessary interfaces for offloading the AOT modules to Taichi. The following is a list of features that are available regardless of your backend. The corresponding APIs are still under development and subject to change.
 
 ## Availability
 
@@ -19,12 +19,12 @@ Taichi C-API intends to support the following backends:
 |DirectX 11|GPU (Windows)|N/A|
 |Metal|GPU (macOS, iOS)|N/A|
 
-The backends with tier-1 support are being developed and tested more intensively. And most new features will be available on Vulkan first, because it has the most outstanding cross-platform compatibility among all the tier-1 backends.
-For the backends with tier-2 support, you should expect a delay in the fixes to the minor issues.
+The backends with tier-1 support are being developed and tested more intensively. And most new features will be available on Vulkan first because it has the most outstanding cross-platform compatibility among all the tier-1 backends.
+For the backends with tier-2 support, you should expect a delay in the fixes to minor issues.
 
 For convenience, in the following text and other C-API documents, the term *host* refers to the user of the C-API; the term *device* refers to the logical (conceptual) compute device, to which Taichi's runtime offloads its compute tasks. A *device* may not be a physical discrete processor other than the CPU and the *host* may *not* be able to access the memory allocated on the *device*.
 
-Unless otherwise specified, **device**, **backend**, **offload targer**, and **GPU** are interchangeable; **host**, **user code**, **user procedure**, and **CPU** are interchangeable.
+Unless otherwise specified, **device**, **backend**, **offload target**, and **GPU** are interchangeable; **host**, **user code**, **user procedure**, and **CPU** are interchangeable.
 
 ## How to...
 
@@ -32,7 +32,7 @@ The following section provides a brief introduction to the Taichi C-API.
 
 ### Create and destroy a Runtime Instance
 
-You *must* create a runtime instance before working with Taichi, and *only* one runtime per thread. Currently we do not officially claim that multiple runtime instances can coexist in a process, but please feel free to [file an issue with us](https://github.com/taichi-dev/taichi/issues) if you run into any problem with runtime instance coexistence.
+You *must* create a runtime instance before working with Taichi, and *only* one runtime per thread. Currently, we do not officially claim that multiple runtime instances can coexist in a process, but please feel free to [file an issue with us](https://github.com/taichi-dev/taichi/issues) if you run into any problem with runtime instance coexistence.
 
 ```cpp
 TiRuntime runtime = ti_create_runtime(TI_ARCH_VULKAN);
@@ -85,7 +85,7 @@ std::memcpy(dst, src.data(), src.size());
 ti_unmap_memory(runtime, streaming_memory);
 ```
 
-To read data back to the host, `host_read` *must* be set true.
+To read data back to the host, `host_read` *must* be set to true.
 
 ```cpp
 TiMemoryAllocateInfo mai {};
@@ -204,7 +204,7 @@ A condition or a predicate is not satisfied; a statement is invalid.
 
 A bit field that can be used to represent 32 orthogonal flags. Bits unspecified in the corresponding flag enum are ignored.
 
-**NOTE** Enumerations and bit-field flags in the C-API have a `TI_XXX_MAX_ENUM` case to ensure the enum to have a 32-bit range and in-memory size. It has no semantical impact and can be safely ignored.
+**NOTE** Enumerations and bit-field flags in the C-API have a `TI_XXX_MAX_ENUM` case to ensure the enum has a 32-bit range and in-memory size. It has no semantic impact and can be safely ignored.
 
 `definition.null_handle`
 
@@ -212,7 +212,7 @@ A sentinal invalid handle that will never be produced from a valid call to Taich
 
 `handle.runtime`
 
-Taichi runtime represents an instance of a logical backend and its internal dynamic state. The user is responsible to synchronize any use of `handle.runtime`. The user MUST NOT manipulate multiple `handle.runtime`s in a same thread.
+Taichi runtime represents an instance of a logical backend and its internal dynamic state. The user is responsible for synchronizing any use of `handle.runtime`. The user *must not* manipulate multiple `handle.runtime`s in the same thread.
 
 `handle.aot_module`
 
@@ -220,27 +220,27 @@ An ahead-of-time (AOT) compiled Taichi module, which contains a collection of ke
 
 `handle.event`
 
-A synchronization primitive to manage on-device execution flows in multiple queues.
+A synchronization primitive to manage device execution flows in multiple queues.
 
 `handle.memory`
 
-A contiguous allocation of on-device memory.
+A contiguous allocation of device memory.
 
 `handle.image`
 
-A contiguous allocation of on-device image.
+A contiguous allocation of device image.
 
 `handle.sampler`
 
-An image sampler. `definition.null_handle` represents a default image sampler provided by the runtime implementation. The filter modes, address modes of default samplers depends on backend implementation.
+An image sampler. `definition.null_handle` represents a default image sampler provided by the runtime implementation. The filter modes and address modes of default samplers depend on backend implementation.
 
 `handle.kernel`
 
-A Taichi kernel that can be launched on device for execution.
+A Taichi kernel that can be launched on the offload target for execution.
 
 `handle.compute_graph`
 
-A collection of Taichi kernels (a compute graph) to launch on device in a predefined order.
+A collection of Taichi kernels (a compute graph) to launch on the offload target in a predefined order.
 
 `enumeration.error`
 
@@ -251,11 +251,11 @@ Errors reported by the Taichi C-API.
 - `enumeration.error.not_supported`: The invoked API, or the combination of parameters is not supported by the Taichi C-API.
 - `enumeration.error.corrupted_data`: Provided data is corrupted.
 - `enumeration.error.name_not_found`: Provided name does not refer to any existing item.
-- `enumeration.error.invalid_argument`: One or more function arguments violate constraints specified in C-API documents; or kernel arguments mismatch the kernel argument list defined in the AOT module.
+- `enumeration.error.invalid_argument`: One or more function arguments violate constraints specified in C-API documents, or kernel arguments mismatch the kernel argument list defined in the AOT module.
 - `enumeration.error.argument_null`: One or more by-reference (pointer) function arguments point to null.
 - `enumeration.error.argument_out_of_range`: One or more function arguments are out of its acceptable range; or enumeration arguments have undefined value.
 - `enumeration.error.argument_not_found`: One or more kernel arguments are missing.
-- `enumeration.error.invalid_interop`: The intended interoperation is not possible on the current arch. For example, attempts to export a Vulkan object from a CUDA runtime is not allowed.
+- `enumeration.error.invalid_interop`: The intended interoperation is not possible on the current arch. For example, attempts to export a Vulkan object from a CUDA runtime are not allowed.
 - `enumeration.error.invalid_state`: The Taichi C-API enters an unrecoverable invalid state. Related Taichi objects are potentially corrupted. The users *should* release the contaminated resources for stability. Please feel free to file an issue if you encountered this error in a normal routine.
 
 `enumeration.arch`
@@ -295,7 +295,7 @@ Types of kernel and compute graph argument.
 
 Usages of a memory allocation.
 
-- `bit_field.memory_usage.storage`: The memory can be read/write accessed by any kernel. In most of the cases, the users only need to set this flag.
+- `bit_field.memory_usage.storage`: The memory can be read/write accessed by any kernel. In most cases, the users only need to set this flag.
 - `bit_field.memory_usage.uniform`: The memory can be used as a uniform buffer in graphics pipelines.
 - `bit_field.memory_usage.vertex`: The memory can be used as a vertex buffer in graphics pipelines.
 - `bit_field.memory_usage.index`: The memory can be used as a index buffer in graphics pipelines.
@@ -308,7 +308,7 @@ Parameters of a newly allocated memory.
 - `structure.memory_allocate_info.host_write`: True if the host needs to write to the allocated memory.
 - `structure.memory_allocate_info.host_read`: True if the host needs to read from the allocated memory.
 - `structure.memory_allocate_info.export_sharing`: True if the memory allocation needs to be exported to other backends (e.g., from Vulkan to CUDA).
-- `structure.memory_allocate_info.usage`: All possible usage of this memory allocation. In most of the cases, `bit_field.memory_usage.storage` is enough.
+- `structure.memory_allocate_info.usage`: All possible usage of this memory allocation. In most cases, `bit_field.memory_usage.storage` is enough.
 
 `structure.memory_slice`
 
@@ -327,13 +327,95 @@ Multi-dimensional size of an ND-array. Dimension sizes after `structure.nd_shape
 
 `structure.nd_array`
 
-Multi-dimentional array of dense primitive data.
+Multi-dimensional array of dense primitive data.
 
 - `structure.nd_array.memory`: Memory bound to the ND-array.
 - `structure.nd_array.shape`: Shape of the ND-array.
 - `structure.nd_array.elem_shape`: Shape of the ND-array elements. It *must not* be empty for vector or matrix ND-arrays.
 - `structure.nd_array.elem_type`: Primitive data type of the ND-array elements.
 
+`bit_field.image_usage`
+
+Usages of an image allocation.
+
+- `bit_field.image_usage.storage`: The image can be read/write accessed by any kernel. In most cases, the users only need to set this flag and `bit_field.image_usage.sampled`.
+- `bit_field.image_usage.sampled`: The image can be read-only accessed by any kernel. In most cases, the users only need to set this flag and `bit_field.image_usage.storage`.
+- `bit_field.image_usage.attachment`: The image can be used as a color or depth-stencil attachment depending on its format.
+
+`enumeration.image_dimension`
+
+Dimensions of an image allocation.
+
+- `enumeration.image_dimension.1d`: The image is 1-dimensional.
+- `enumeration.image_dimension.2d`: The image is 2-dimensional.
+- `enumeration.image_dimension.3d`: The image is 3-dimensional.
+- `enumeration.image_dimension.1d_array`: The image is 1-dimensional and it has one or more layers.
+- `enumeration.image_dimension.2d_array`: The image is 2-dimensional and it has one or more layers.
+- `enumeration.image_dimension.cube`: The image is 2-dimensional and it has 6 layers for the faces towards +X, -X, +Y, -Y, +Z, -Z in sequence.
+
+`enumeration.image_layout`
+
+- `enumeration.image_layout.undefined`: Undefined layout. An image in this layout does not contain any semantical information.
+- `enumeration.image_layout.shader_read`: Optimal layout for read-only access, including sampling.
+- `enumeration.image_layout.shader_write`: Optimal layout for write-only access.
+- `enumeration.image_layout.shader_read_write`: Optimal layout for read/write access.
+- `enumeration.image_layout.color_attachment`: Optimal layout as a color attachment.
+- `enumeration.image_layout.color_attachment_read`: Optimal layout as an input color attachment.
+- `enumeration.image_layout.depth_attachment`: Optimal layout as a depth attachment.
+- `enumeration.image_layout.depth_attachment_read`: Optimal layout as an input depth attachment.
+- `enumeration.image_layout.transfer_dst`: Optimal layout as a data copy destination.
+- `enumeration.image_layout.transfer_src`: Optimal layout as a data copy source.
+- `enumeration.image_layout.present_src`: Optimal layout as a presentation source.
+
+`structure.image_offset`
+
+Offsets of an image in X, Y, Z, and array layers.
+
+- `structure.image_offset.x`: Image offset in the X direction.
+- `structure.image_offset.y`: Image offset in the Y direction. *Must* be 0 if the image has a dimension of `enumeration.image_dimension.1d` or `enumeration.image_dimension.1d_array`.
+- `structure.image_offset.z`: Image offset in the Z direction. *Must* be 0 if the image has a dimension of `enumeration.image_dimension.1d`, `enumeration.image_dimension.2d`, `enumeration.image_dimension.1d_array`, `enumeration.image_dimension.2d_array` or `enumeration.image_dimension.cube_array`.
+- `structure.image_offset.array_layer_offset`: Image offset in array layers. *Must* be 0 if the image has a dimension of `enumeration.image_dimension.1d`, `enumeration.image_dimension.2d` or `enumeration.image_dimension.3d`.
+
+`structure.image_extent`
+
+Extents of an image in X, Y, Z, and array layers.
+
+- `structure.image_extent.width`: Image extent in the X direction.
+- `structure.image_extent.height`: Image extent in the Y direction. *Must* be 1 if the image has a dimension of `enumeration.image_dimension.1d` or `enumeration.image_dimension.1d_array`.
+- `structure.image_extent.depth`: Image extent in the Z direction. *Must* be 1 if the image has a dimension of `enumeration.image_dimension.1d`, `enumeration.image_dimension.2d`, `enumeration.image_dimension.1d_array`, `enumeration.image_dimension.2d_array` or `enumeration.image_dimension.cube_array`.
+- `structure.image_extent.array_layer_count`: Image extent in array layers. *Must* be 1 if the image has a dimension of `enumeration.image_dimension.1d`, `enumeration.image_dimension.2d` or `enumeration.image_dimension.3d`. *Must* be 6 if the image has a dimension of `enumeration.image_dimension.cube_array`.
+
+`structure.image_allocate_info`
+
+Parameters of a newly allocated image.
+
+- `structure.image_allocate_info.dimension`: Image dimension.
+- `structure.image_allocate_info.extent`: Image extent.
+- `structure.image_allocate_info.mip_level_count`: Number of mip-levels.
+- `structure.image_allocate_info.format`: Image texel format.
+- `structure.image_allocate_info.host_read`: True if the host needs to read from the allocated memory.
+- `structure.image_allocate_info.export_sharing`: True if the memory allocation needs to be exported to other backends (e.g., from Vulkan to CUDA).
+- `structure.image_allocate_info.usage`: All possible usage of this image allocation. In most cases, `bit_field.image_usage.storage` and `bit_field.image_usage.sampled` are enough.
+
+`structure.image_slice`
+
+A subsection of an image allocation. The sum of `structure.image_slice.offset` and `structure.image_slice.extent` in each dimension cannot exceed the size of `structure.image_slice.image`.
+
+- `structure.image_slice.image`: The subsectioned image allocation.
+- `structure.image_slice.offset`: Offset from the beginning of the allocation in each dimension.
+- `structure.image_slice.extent`: Size of the subsection in each dimension.
+- `structure.image_slice.mip_level`: The subsectioned mip-level.
+
+`structure.texture`
+
+Image data bound to a sampler.
+
+- `structure.texture.image`: Image bound to the texture.
+- `structure.texture.sampler`: The bound sampler that controls the sampling behavior of `structure.texture.image`.
+- `structure.texture.dimension`: Image dimension.
+- `structure.texture.extent`: Image extent.
+- `structure.texture.format`: Image texel format.
+
 `union.argument_value`
 
 A scalar or structured argument value.
@@ -351,22 +433,22 @@ An argument value to feed kernels.
 
 `structure.named_argument`
 
-An named argument value to feed compute graphcs.
+A named argument value to feed compute graphs.
 
 - `structure.named_argument.name`: Name of the argument.
 - `structure.named_argument.argument`: Argument body.
 
 `function.create_runtime`
 
-Create a Taichi Runtime with the specified `enumeration.arch`.
+Creates a Taichi Runtime with the specified `enumeration.arch`.
 
 `function.destroy_runtime`
 
-Destroy a Taichi Runtime.
+Destroys a Taichi Runtime.
 
 `function.allocate_memory`
 
-Allocate a contiguous on-device memory with provided parameters.
+Allocates a contiguous device memory with provided parameters.
 
 `function.free_memory`
 
@@ -374,11 +456,19 @@ Frees a memory allocation.
 
 `function.map_memory`
 
-Maps an on-device memory to a host-addressible space. You *must* ensure that the device is not being used by any device command before the mapping.
+Maps a device memory to a host-addressable space. You *must* ensure that the device is not being used by any device command before the mapping.
 
 `function.unmap_memory`
 
-Unmaps an on-device memory and makes any host-side changes about the memory visible to the device. You *must* ensure that there is no further access to the previously mapped host-addressible space.
+Unmaps a device memory and makes any host-side changes to the memory visible to the device. You *must* ensure that there is no further access to the previously mapped host-addressable space.
+
+`function.allocate_image`
+
+Allocates a device image with provided parameters.
+
+`function.free_image`
+
+Frees an image allocation.
 
 `function.create_event`
 
@@ -390,7 +480,19 @@ Destroys an event primitive.
 
 `function.copy_memory_device_to_device`
 
-Copies the data in a contiguous subsection of the on-device memory to another subsection. Note that the two subsections *must not* overlap.
+Copies the data in a contiguous subsection of the device memory to another subsection. The two subsections *must not* overlap.
+
+`function.copy_image_device_to_device`
+
+Copies the image data in a contiguous subsection of the device image to another subsection. The two subsections *must not* overlap.
+
+`function.track_image`
+
+Tracks the device image with the provided image layout. Because Taichi tracks image layouts internally, it is *only* useful to inform Taichi that the image is transitioned to a new layout by external procedures.
+
+`function.transition_image`
+
+Transitions the image to the provided image layout. Because Taichi tracks image layouts internally, it is *only* useful to enforce an image layout for external procedures to use.
 
 `function.launch_kernel`
 
@@ -410,7 +512,7 @@ Sets a signaled event primitive back to an unsignaled state.
 
 `function.wait_event`
 
-Waits until an event primitive transitions to a signaled state. The awaited event *must* be signaled by an external procedure or an previous invocation to `function.reset_event`; otherwise, an undefined behavior would occur.
+Waits until an event primitive transitions to a signaled state. The awaited event *must* be signaled by an external procedure or a previous invocation to `function.reset_event`; otherwise, the behavior is undefined.
 
 `function.submit`
 
@@ -436,5 +538,5 @@ Returns `definition.null_handle` if the module does not have a kernel of the spe
 
 `function.get_aot_module_compute_graph`
 
-Retrieves a pre-compiled compute graph from the AOT module. 
+Retrieves a pre-compiled compute graph from the AOT module.
 Returns `definition.null_handle` if the module does not have a compute graph of the specified name.
diff --git a/c_api/docs/taichi/taichi_vulkan.h.md b/c_api/docs/taichi/taichi_vulkan.h.md
index 16744941939f4..5cd73795dc93e 100644
--- a/c_api/docs/taichi/taichi_vulkan.h.md
+++ b/c_api/docs/taichi/taichi_vulkan.h.md
@@ -4,15 +4,16 @@ sidebar_positions: 2
 
 # Vulkan Backend Features
 
-Taichi's Vulkan API gives you further control over Vulkan version and extension requirements and allows you to interop with external Vulkan applications with shared resources.
+Taichi's Vulkan API gives you further control over the Vulkan version and extension requirements and allows you to interop with external Vulkan applications with shared resources.
 
 ## API Reference
 
 `structure.vulkan_runtime_interop_info`
 
-Necessary detail to share a same Vulkan runtime between Taichi and user applications.
+Necessary detail to share the same Vulkan runtime between Taichi and external procedures.
 
-- `structure.vulkan_runtime_interop_info.api_version`: Targeted Vulkan API version.
+- `structure.vulkan_runtime_interop_info.get_instance_proc_addr`: Pointer to Vulkan loader function `vkGetInstanceProcAddr`.
+- `structure.vulkan_runtime_interop_info.api_version`: Target Vulkan API version.
 - `structure.vulkan_runtime_interop_info.instance`: Vulkan instance handle.
 - `structure.vulkan_runtime_interop_info.physical_device`: Vulkan physical device handle.
 - `structure.vulkan_runtime_interop_info.device`: Vulkan logical device handle.
@@ -25,42 +26,64 @@ Necessary detail to share a same Vulkan runtime between Taichi and user applicat
 
 `structure.vulkan_memory_interop_info`
 
-Necessary detail to share a same piece of Vulkan buffer between Taichi and user applications.
+Necessary detail to share the same piece of Vulkan buffer between Taichi and external procedures.
 
 - `structure.vulkan_memory_interop_info.buffer`: Vulkan buffer.
 - `structure.vulkan_memory_interop_info.size`: Size of the piece of memory in bytes.
-- `structure.vulkan_memory_interop_info.size`: Vulkan buffer usage. You usually want the `VK_BUFFER_USAGE_STORAGE_BUFFER_BIT` set.
+- `structure.vulkan_memory_interop_info.usage`: Vulkan buffer usage. In most cases, Taichi requires the `VK_BUFFER_USAGE_STORAGE_BUFFER_BIT`.
+
+`structure.vulkan_image_interop_info`
+
+Necessary detail to share the same piece of Vulkan image between Taichi and external procedures.
+
+- `structure.vulkan_image_interop_info.image`: Vulkan image.
+- `structure.vulkan_image_interop_info.image_type`: Vulkan image allocation type.
+- `structure.vulkan_image_interop_info.format`: Pixel format.
+- `structure.vulkan_image_interop_info.extent`: Image extent.
+- `structure.vulkan_image_interop_info.mip_level_count`: Number of mip-levels of the image.
+- `structure.vulkan_image_interop_info.array_layer_count`: Number of array layers.
+- `structure.vulkan_image_interop_info.sample_count`: Number of samples per pixel.
+- `structure.vulkan_image_interop_info.tiling`: Image tiling.
+- `structure.vulkan_image_interop_info.usage`: Vulkan image usage. In most cases, Taichi requires the `VK_IMAGE_USAGE_STORAGE_BIT` and the `VK_IMAGE_USAGE_SAMPLED_BIT`.
 
 `structure.vulkan_event_interop_info`
 
-Necessary detail to share a same Vulkan event synchronization primitive between Taichi and user application.
+Necessary detail to share the same Vulkan event synchronization primitive between Taichi and the user application.
 
 - `structure.vulkan_event_interop_info.event`: Vulkan event handle.
 
 `function.create_vulkan_runtime`
 
-Create a Vulkan Taichi runtime with user controlled capability settings.
+Create a Vulkan Taichi runtime with user-controlled capability settings.
 
 `function.import_vulkan_runtime`
 
-Import the Vulkan runtime owned by Taichi to external user applications.
+Import the Vulkan runtime owned by Taichi to external procedures.
 
 `function.export_vulkan_runtime`
 
-Export a Vulkan runtime from external user applications to Taichi.
+Export a Vulkan runtime from external procedures to Taichi.
 
 `function.import_vulkan_memory`
 
-Import the Vulkan buffer owned by Taichi to external user applications.
+Import the Vulkan buffer owned by Taichi to external procedures.
 
 `function.export_vulkan_memory`
 
-Export a Vulkan buffer from external user applications to Taichi.
+Export a Vulkan buffer from external procedures to Taichi.
+
+`function.import_vulkan_image`
+
+Import the Vulkan image owned by Taichi to external procedures.
+
+`function.export_vulkan_image`
+
+Export a Vulkan image from external procedures to Taichi.
 
 `function.import_vulkan_event`
 
-Import the Vulkan event owned by Taichi to external user applications.
+Import the Vulkan event owned by Taichi to external procedures.
 
 `function.export_vulkan_event`
 
-Export a Vulkan event from external user applications to Taichi.
+Export a Vulkan event from external procedures to Taichi.
diff --git a/c_api/taichi.json b/c_api/taichi.json
index dad2b9b44bac4..0968f3d1dc840 100644
--- a/c_api/taichi.json
+++ b/c_api/taichi.json
@@ -302,6 +302,10 @@
                             "name": "format",
                             "type": "enumeration.format"
                         },
+                        {
+                            "name": "export_sharing",
+                            "type": "alias.bool"
+                        },
                         {
                             "name": "usage",
                             "type": "bit_field.image_usage"
diff --git a/docs/lang/articles/c-api/taichi_core.md b/docs/lang/articles/c-api/taichi_core.md
index bdcc7f2b8c393..abc7bfaab9a37 100644
--- a/docs/lang/articles/c-api/taichi_core.md
+++ b/docs/lang/articles/c-api/taichi_core.md
@@ -4,7 +4,7 @@ sidebar_position: 1
 
 # Core Functionality
 
-Taichi Core exposes all necessary interfaces for offloading the AOT modules to Taichi. The following are a list of features that are available regardless of your backend. The corresponding APIs are still under development and subject to change.
+Taichi Core exposes all necessary interfaces for offloading the AOT modules to Taichi. The following is a list of features that are available regardless of your backend. The corresponding APIs are still under development and subject to change.
 
 ## Availability
 
@@ -19,20 +19,20 @@ Taichi C-API intends to support the following backends:
 |DirectX 11|GPU (Windows)|N/A|
 |Metal|GPU (macOS, iOS)|N/A|
 
-The backends with tier-1 support are being developed and tested more intensively. And most new features will be available on Vulkan first, because it has the most outstanding cross-platform compatibility among all the tier-1 backends.
-For the backends with tier-2 support, you should expect a delay in the fixes to the minor issues.
+The backends with tier-1 support are being developed and tested more intensively. Most new features will be available on Vulkan first because it has the best cross-platform compatibility among the tier-1 backends.
+For the backends with tier-2 support, you should expect a delay in the fixes to minor issues.
 
 For convenience, in the following text and other C-API documents, the term *host* refers to the user of the C-API; the term *device* refers to the logical (conceptual) compute device, to which Taichi's runtime offloads its compute tasks. A *device* may not be a physical discrete processor other than the CPU and the *host* may *not* be able to access the memory allocated on the *device*.
 
-Unless explicitly explained, **device**, **backend**, **offload targer** and **GPU** are used interchangeably; **host**, **user code**, **user procedure** and **CPU** are used interchangeably too.
+Unless otherwise specified, **device**, **backend**, **offload target**, and **GPU** are interchangeable; **host**, **user code**, **user procedure**, and **CPU** are interchangeable.
 
 ## How to...
 
-In this section we give an brief introduction about what you might want to do with the Taichi C-API.
+The following section provides a brief introduction to the Taichi C-API.
 
 ### Create and destroy a Runtime Instance
 
-You *must* create a runtime instance before working with Taichi, and *only* one runtime per thread. Currently we do not officially claim that multiple runtime instances can coexist in a process, but please feel free to [file an issue with us](https://github.com/taichi-dev/taichi/issues) if you run into any problem with runtime instance coexistence.
+You *must* create a runtime instance before working with Taichi, and *only* one runtime per thread. Currently, we do not officially claim that multiple runtime instances can coexist in a process, but please feel free to [file an issue with us](https://github.com/taichi-dev/taichi/issues) if you run into any problem with runtime instance coexistence.
 
 ```cpp
 TiRuntime runtime = ti_create_runtime(TI_ARCH_VULKAN);
@@ -63,9 +63,9 @@ Allocated memory is automatically freed when the related [`TiRuntime`](#handle-t
 ti_free_memory(runtime, memory);
 ```
 
-### Allocate Host-Accessible Memory
+### Allocate host-accessible memory
 
-By default, memory allocations are physically or conceptually local to the offload target for performance reasons. You can configure the allocate info to enable host access to memory allocations. But please note that host-accessible allocations MAY slow down computation on GPU because of the limited bus bandwidth between the host memory and the device.
+By default, memory allocations are physically or conceptually local to the offload target for performance reasons. You can configure the [`TiMemoryAllocateInfo`](#structure-timemoryallocateinfo) to enable host access to memory allocations. But please note that host-accessible allocations *may* slow down computation on GPU because of the limited bus bandwidth between the host memory and the device.
 
 You *must* set `host_write` to `true` to allow streaming data to the memory.
 
@@ -85,7 +85,7 @@ std::memcpy(dst, src.data(), src.size());
 ti_unmap_memory(runtime, streaming_memory);
 ```
 
-To read data back to the host, `host_read` MUST be set true.
+To read data back to the host, `host_read` *must* be set to true.
 
 ```cpp
 TiMemoryAllocateInfo mai {};
@@ -104,9 +104,9 @@ ti_unmap_memory(runtime, read_back_memory);
 ti_free_memory(runtime, read_back_memory);
 ```
 
-**NOTE** `host_read` and `host_write` can be set true simultaneously.
+> You can set `host_read` and `host_write` at the same time.
 
-### Load and destroy a Taichi AOT Module
+### Load and destroy a Taichi AOT module
 
 You can load a Taichi AOT module from the filesystem.
 
@@ -227,7 +227,7 @@ typedef uint32_t TiFlags;
 
 A bit field that can be used to represent 32 orthogonal flags. Bits unspecified in the corresponding flag enum are ignored.
 
-**NOTE** Enumerations and bit-field flags in the C-API have a `TI_XXX_MAX_ENUM` case to ensure the enum to have a 32-bit range and in-memory size. It has no semantical impact and can be safely ignored.
+**NOTE** Enumerations and bit-field flags in the C-API have a `TI_XXX_MAX_ENUM` case to ensure the enum has a 32-bit range and in-memory size. It has no semantical impact and can be safely ignored.
 
 ---
 ### Definition `TI_NULL_HANDLE`
@@ -247,7 +247,7 @@ A sentinal invalid handle that will never be produced from a valid call to Taich
 typedef struct TiRuntime_t* TiRuntime;
 ```
 
-Taichi runtime represents an instance of a logical backend and its internal dynamic state. The user is responsible to synchronize any use of [`TiRuntime`](#handle-tiruntime). The user MUST NOT manipulate multiple [`TiRuntime`](#handle-tiruntime)s in a same thread.
+Taichi runtime represents an instance of a logical backend and its internal dynamic state. The user is responsible for synchronizing any use of [`TiRuntime`](#handle-tiruntime). The user MUST NOT manipulate multiple [`TiRuntime`](#handle-tiruntime)s in the same thread.
 
 ---
 ### Handle `TiAotModule`
@@ -267,7 +267,7 @@ An ahead-of-time (AOT) compiled Taichi module, which contains a collection of ke
 typedef struct TiEvent_t* TiEvent;
 ```
 
-A synchronization primitive to manage on-device execution flows in multiple queues.
+A synchronization primitive to manage device execution flows in multiple queues.
 
 ---
 ### Handle `TiMemory`
@@ -277,7 +277,7 @@ A synchronization primitive to manage on-device execution flows in multiple queu
 typedef struct TiMemory_t* TiMemory;
 ```
 
-A contiguous allocation of on-device memory.
+A contiguous allocation of device memory.
 
 ---
 ### Handle `TiImage`
@@ -287,7 +287,7 @@ A contiguous allocation of on-device memory.
 typedef struct TiImage_t* TiImage;
 ```
 
-A contiguous allocation of on-device image.
+A contiguous allocation of device image.
 
 ---
 ### Handle `TiSampler`
@@ -297,7 +297,7 @@ A contiguous allocation of on-device image.
 typedef struct TiSampler_t* TiSampler;
 ```
 
-An image sampler. [`TI_NULL_HANDLE`](#definition-ti_null_handle) represents a default image sampler provided by the runtime implementation. The filter modes, address modes of default samplers depends on backend implementation.
+An image sampler. [`TI_NULL_HANDLE`](#definition-ti_null_handle) represents a default image sampler provided by the runtime implementation. The filter modes and address modes of default samplers depend on backend implementation.
 
 ---
 ### Handle `TiKernel`
@@ -307,7 +307,7 @@ An image sampler. [`TI_NULL_HANDLE`](#definition-ti_null_handle) represents a de
 typedef struct TiKernel_t* TiKernel;
 ```
 
-A Taichi kernel that can be launched on device for execution.
+A Taichi kernel that can be launched on the offload target for execution.
 
 ---
 ### Handle `TiComputeGraph`
@@ -317,7 +317,7 @@ A Taichi kernel that can be launched on device for execution.
 typedef struct TiComputeGraph_t* TiComputeGraph;
 ```
 
-A collection of Taichi kernels (a compute graph) to launch on device in a predefined order.
+A collection of Taichi kernels (a compute graph) to launch on the offload target in a predefined order.
 
 ---
 ### Enumeration `TiError`
@@ -347,11 +347,11 @@ Errors reported by the Taichi C-API.
 - `TI_ERROR_NOT_SUPPORTED`: The invoked API, or the combination of parameters is not supported by the Taichi C-API.
 - `TI_ERROR_CORRUPTED_DATA`: Provided data is corrupted.
 - `TI_ERROR_NAME_NOT_FOUND`: Provided name does not refer to any existing item.
-- `TI_ERROR_INVALID_ARGUMENT`: One or more function arguments violate constraints specified in C-API documents; or kernel arguments mismatch the kernel argument list defined in the AOT module.
+- `TI_ERROR_INVALID_ARGUMENT`: One or more function arguments violate constraints specified in C-API documents, or kernel arguments mismatch the kernel argument list defined in the AOT module.
 - `TI_ERROR_ARGUMENT_NULL`: One or more by-reference (pointer) function arguments point to null.
 - `TI_ERROR_ARGUMENT_OUT_OF_RANGE`: One or more function arguments are out of its acceptable range; or enumeration arguments have undefined value.
 - `TI_ERROR_ARGUMENT_NOT_FOUND`: One or more kernel arguments are missing.
-- `TI_ERROR_INVALID_INTEROP`: The intended interoperation is not possible on the current arch. For example, attempts to export a Vulkan object from a CUDA runtime is not allowed.
+- `TI_ERROR_INVALID_INTEROP`: The intended interoperation is not possible on the current arch. For example, attempts to export a Vulkan object from a CUDA runtime are not allowed.
 - `TI_ERROR_INVALID_STATE`: The Taichi C-API enters an unrecoverable invalid state. Related Taichi objects are potentially corrupted. The users *should* release the contaminated resources for stability. Please feel free to file an issue if you encountered this error in a normal routine.
 
 ---
@@ -458,7 +458,7 @@ typedef TiFlags TiMemoryUsageFlags;
 
 Usages of a memory allocation.
 
-- `TI_MEMORY_USAGE_STORAGE_BIT`: The memory can be read/write accessed by any kernel. In most of the cases, the users only need to set this flag.
+- `TI_MEMORY_USAGE_STORAGE_BIT`: The memory can be read/write accessed by any kernel. In most cases, the users only need to set this flag.
 - `TI_MEMORY_USAGE_UNIFORM_BIT`: The memory can be used as a uniform buffer in graphics pipelines.
 - `TI_MEMORY_USAGE_VERTEX_BIT`: The memory can be used as a vertex buffer in graphics pipelines.
 - `TI_MEMORY_USAGE_INDEX_BIT`: The memory can be used as a index buffer in graphics pipelines.
@@ -483,7 +483,7 @@ Parameters of a newly allocated memory.
 - `host_write`: True if the host needs to write to the allocated memory.
 - `host_read`: True if the host needs to read from the allocated memory.
 - `export_sharing`: True if the memory allocation needs to be exported to other backends (e.g., from Vulkan to CUDA).
-- `usage`: All possible usage of this memory allocation. In most of the cases, `TI_MEMORY_USAGE_STORAGE_BIT` is enough.
+- `usage`: All possible usage of this memory allocation. In most cases, `TI_MEMORY_USAGE_STORAGE_BIT` is enough.
 
 ---
 ### Structure `TiMemorySlice`
@@ -532,7 +532,7 @@ typedef struct TiNdArray {
 } TiNdArray;
 ```
 
-Multi-dimentional array of dense primitive data.
+Multi-dimensional array of dense primitive data.
 
 - `memory`: Memory bound to the ND-array.
 - `shape`: Shape of the ND-array.
@@ -551,6 +551,13 @@ typedef enum TiImageUsageFlagBits {
 } TiImageUsageFlagBits;
 typedef TiFlags TiImageUsageFlags;
 ```
+
+Usages of an image allocation.
+
+- `TI_IMAGE_USAGE_STORAGE_BIT`: The image can be read/write accessed by any kernel. In most cases, the users only need to set this flag and `TI_IMAGE_USAGE_SAMPLED_BIT`.
+- `TI_IMAGE_USAGE_SAMPLED_BIT`: The image can be read-only accessed by any kernel. In most cases, the users only need to set this flag and `TI_IMAGE_USAGE_STORAGE_BIT`.
+- `TI_IMAGE_USAGE_ATTACHMENT_BIT`: The image can be used as a color or depth-stencil attachment depending on its format.
+
 ---
 ### Enumeration `TiImageDimension`
 
@@ -566,6 +573,16 @@ typedef enum TiImageDimension {
   TI_IMAGE_DIMENSION_MAX_ENUM = 0xffffffff,
 } TiImageDimension;
 ```
+
+Dimensions of an image allocation.
+
+- `TI_IMAGE_DIMENSION_1D`: The image is 1-dimensional.
+- `TI_IMAGE_DIMENSION_2D`: The image is 2-dimensional.
+- `TI_IMAGE_DIMENSION_3D`: The image is 3-dimensional.
+- `TI_IMAGE_DIMENSION_1D_ARRAY`: The image is 1-dimensional and it has one or more layers.
+- `TI_IMAGE_DIMENSION_2D_ARRAY`: The image is 2-dimensional and it has one or more layers.
+- `TI_IMAGE_DIMENSION_CUBE`: The image is 2-dimensional and it has 6 layers for the faces towards +X, -X, +Y, -Y, +Z, -Z in sequence.
+
 ---
 ### Enumeration `TiImageLayout`
 
@@ -586,6 +603,19 @@ typedef enum TiImageLayout {
   TI_IMAGE_LAYOUT_MAX_ENUM = 0xffffffff,
 } TiImageLayout;
 ```
+
+- `enumeration.image_layout.`: Undefined layout. An image in this layout does not contain any semantical information.
+- `TI_IMAGE_LAYOUT_SHADER_READ`: Optimal layout for read-only access, including sampling.
+- `TI_IMAGE_LAYOUT_SHADER_WRITE`: Optimal layout for write-only access.
+- `TI_IMAGE_LAYOUT_SHADER_READ_WRITE`: Optimal layout for read/write access.
+- `TI_IMAGE_LAYOUT_COLOR_ATTACHMENT`: Optimal layout as a color attachment.
+- `TI_IMAGE_LAYOUT_COLOR_ATTACHMENT_READ`: Optimal layout as an input color attachment.
+- `TI_IMAGE_LAYOUT_DEPTH_ATTACHMENT`: Optimal layout as a depth attachment.
+- `TI_IMAGE_LAYOUT_DEPTH_ATTACHMENT_READ`: Optimal layout as an input depth attachment.
+- `TI_IMAGE_LAYOUT_TRANSFER_DST`: Optimal layout as a data copy destination.
+- `TI_IMAGE_LAYOUT_TRANSFER_SRC`: Optimal layout as a data copy source.
+- `TI_IMAGE_LAYOUT_PRESENT_SRC`: Optimal layout as a presentation source.
+
 ---
 ### Enumeration `TiFormat`
 
@@ -651,6 +681,14 @@ typedef struct TiImageOffset {
   uint32_t array_layer_offset;
 } TiImageOffset;
 ```
+
+Offsets of an image in X, Y, Z, and array layers.
+
+- `x`: Image offset in the X direction.
+- `y`: Image offset in the Y direction. *Must* be 0 if the image has a dimension of `TI_IMAGE_DIMENSION_1D` or `TI_IMAGE_DIMENSION_1D_ARRAY`.
+- `z`: Image offset in the Z direction. *Must* be 0 if the image has a dimension of `TI_IMAGE_DIMENSION_1D`, `TI_IMAGE_DIMENSION_2D`, `TI_IMAGE_DIMENSION_1D_ARRAY`, `TI_IMAGE_DIMENSION_2D_ARRAY` or `TI_IMAGE_DIMENSION_CUBE_ARRAY`.
+- `array_layer_offset`: Image offset in array layers. *Must* be 0 if the image has a dimension of `TI_IMAGE_DIMENSION_1D`, `TI_IMAGE_DIMENSION_2D` or `TI_IMAGE_DIMENSION_3D`.
+
 ---
 ### Structure `TiImageExtent`
 
@@ -663,6 +701,14 @@ typedef struct TiImageExtent {
   uint32_t array_layer_count;
 } TiImageExtent;
 ```
+
+Extents of an image in X, Y, Z, and array layers.
+
+- `width`: Image extent in the X direction.
+- `height`: Image extent in the Y direction. *Must* be 1 if the image has a dimension of `TI_IMAGE_DIMENSION_1D` or `TI_IMAGE_DIMENSION_1D_ARRAY`.
+- `depth`: Image extent in the Z direction. *Must* be 1 if the image has a dimension of `TI_IMAGE_DIMENSION_1D`, `TI_IMAGE_DIMENSION_2D`, `TI_IMAGE_DIMENSION_1D_ARRAY`, `TI_IMAGE_DIMENSION_2D_ARRAY` or `TI_IMAGE_DIMENSION_CUBE_ARRAY`.
+- `array_layer_count`: Image extent in array layers. *Must* be 1 if the image has a dimension of `TI_IMAGE_DIMENSION_1D`, `TI_IMAGE_DIMENSION_2D` or `TI_IMAGE_DIMENSION_3D`. *Must* be 6 if the image has a dimension of `TI_IMAGE_DIMENSION_CUBE_ARRAY`.
+
 ---
 ### Structure `TiImageAllocateInfo`
 
@@ -673,9 +719,21 @@ typedef struct TiImageAllocateInfo {
   TiImageExtent extent;
   uint32_t mip_level_count;
   TiFormat format;
+  TiBool export_sharing;
   TiImageUsageFlags usage;
 } TiImageAllocateInfo;
 ```
+
+Parameters of a newly allocated image.
+
+- `dimension`: Image dimension.
+- `extent`: Image extent.
+- `mip_level_count`: Number of mip-levels.
+- `format`: Image texel format.
+- `structure.image_allocate_info.host_read`: True if the host needs to read from the allocated memory.
+- `export_sharing`: True if the memory allocation needs to be exported to other backends (e.g., from Vulkan to CUDA).
+- `usage`: All possible usage of this image allocation. In most cases, `TI_IMAGE_USAGE_STORAGE_BIT` and `TI_IMAGE_USAGE_SAMPLED_BIT` enough.
+
 ---
 ### Structure `TiImageSlice`
 
@@ -688,6 +746,14 @@ typedef struct TiImageSlice {
   uint32_t mip_level;
 } TiImageSlice;
 ```
+
+A subsection of an image allocation. The sum of `offset` and `extent` in each dimension cannot exceed the size of `image`.
+
+- `image`: The subsectioned image allocation.
+- `offset`: Offset from the beginning of the allocation in each dimension.
+- `extent`: Size of the subsection in each dimension.
+- `mip_level`: The subsectioned mip-level.
+
 ---
 ### Enumeration `TiFilter`
 
@@ -736,6 +802,15 @@ typedef struct TiTexture {
   TiFormat format;
 } TiTexture;
 ```
+
+Image data bound to a sampler.
+
+- `structure.nd_array.image`: Image bound to the texture.
+- `structure.nd_array.sampler`: The bound sampler that controls the sampling behavior of `structure.nd_array.image`.
+- `structure.nd_array.dimension`: Image Dimension.
+- `structure.nd_array.extent`: Extent of image.
+- `structure.nd_array.format`: Image texel format.
+
 ---
 ### Union `TiArgumentValue`
 
@@ -782,7 +857,7 @@ typedef struct TiNamedArgument {
 } TiNamedArgument;
 ```
 
-An named argument value to feed compute graphcs.
+A named argument value to feed compute graphs.
 
 - `name`: Name of the argument.
 - `argument`: Argument body.
@@ -817,7 +892,7 @@ TI_DLL_EXPORT TiRuntime TI_API_CALL ti_create_runtime(
 );
 ```
 
-Create a Taichi Runtime with the specified [`TiArch`](#enumeration-tiarch).
+Creates a Taichi Runtime with the specified [`TiArch`](#enumeration-tiarch).
 
 ---
 ### Function `ti_destroy_runtime`
@@ -829,7 +904,7 @@ TI_DLL_EXPORT void TI_API_CALL ti_destroy_runtime(
 );
 ```
 
-Destroy a Taichi Runtime.
+Destroys a Taichi Runtime.
 
 ---
 ### Function `ti_allocate_memory`
@@ -842,7 +917,7 @@ TI_DLL_EXPORT TiMemory TI_API_CALL ti_allocate_memory(
 );
 ```
 
-Allocate a contiguous on-device memory with provided parameters.
+Allocates a contiguous device memory with provided parameters.
 
 ---
 ### Function `ti_free_memory`
@@ -868,7 +943,7 @@ TI_DLL_EXPORT void* TI_API_CALL ti_map_memory(
 );
 ```
 
-Maps an on-device memory to a host-addressible space. You *must* ensure that the device is not being used by any device command before the mapping.
+Maps a device memory to a host-addressable space. You *must* ensure that the device is not being used by any device command before the mapping.
 
 ---
 ### Function `ti_unmap_memory`
@@ -881,7 +956,7 @@ TI_DLL_EXPORT void TI_API_CALL ti_unmap_memory(
 );
 ```
 
-Unmaps an on-device memory and makes any host-side changes about the memory visible to the device. You *must* ensure that there is no further access to the previously mapped host-addressible space.
+Unmaps a device memory and makes any host-side changes to the memory visible to the device. You *must* ensure that there is no further access to the previously mapped host-addressable space.
 
 ---
 ### Function `ti_allocate_image`
@@ -893,6 +968,9 @@ TI_DLL_EXPORT TiImage TI_API_CALL ti_allocate_image(
   const TiImageAllocateInfo* allocate_info
 );
 ```
+
+Allocate a device image with provided parameters.
+
 ---
 ### Function `ti_free_image`
 
@@ -903,6 +981,9 @@ TI_DLL_EXPORT void TI_API_CALL ti_free_image(
   TiImage image
 );
 ```
+
+Frees an image allocation.
+
 ---
 ### Function `ti_create_sampler`
 
@@ -959,7 +1040,7 @@ TI_DLL_EXPORT void TI_API_CALL ti_copy_memory_device_to_device(
 );
 ```
 
-Copies the data in a contiguous subsection of the on-device memory to another subsection. Note that the two subsections *must not* overlap.
+Copies the data in a contiguous subsection of the device memory to another subsection. The two subsections *must not* overlap.
 
 ---
 ### Function `ti_copy_image_device_to_device` (Device Command)
@@ -972,6 +1053,9 @@ TI_DLL_EXPORT void TI_API_CALL ti_copy_image_device_to_device(
   const TiImageSlice* src_image
 );
 ```
+
+Copies the image data in a contiguous subsection of the device image to another subsection. The two subsections *must not* overlap.
+
 ---
 ### Function `ti_track_image_ext`
 
@@ -983,6 +1067,9 @@ TI_DLL_EXPORT void TI_API_CALL ti_track_image_ext(
   TiImageLayout layout
 );
 ```
+
+Tracks the device image with the provided image layout. Because Taichi tracks image layouts internally, this function is *only* useful to inform Taichi that the image has been transitioned to a new layout by external procedures.
+
 ---
 ### Function `ti_transition_image` (Device Command)
 
@@ -994,6 +1081,9 @@ TI_DLL_EXPORT void TI_API_CALL ti_transition_image(
   TiImageLayout layout
 );
 ```
+
+Transition the image to the provided image layout. Because Taichi tracks image layouts internally, it is *only* useful to enforce an image layout for external procedures to use.
+
 ---
 ### Function `ti_launch_kernel` (Device Command)
 
@@ -1007,7 +1097,7 @@ TI_DLL_EXPORT void TI_API_CALL ti_launch_kernel(
 );
 ```
 
-Launch a Taichi kernel with provided arguments. The arguments MUST have the same count and types in the same order as in the source code.
+Launches a Taichi kernel with the provided arguments. The arguments *must* have the same count and types in the same order as in the source code.
 
 ---
 ### Function `ti_launch_compute_graph` (Device Command)
@@ -1061,7 +1151,7 @@ TI_DLL_EXPORT void TI_API_CALL ti_wait_event(
 );
 ```
 
-Wait on an event primitive until it transitions to a signaled state. The user MUST signal the awaited event; otherwise it is an undefined behavior.
+Waits until an event primitive transitions to a signaled state. The awaited event *must* be signaled by an external procedure or a previous invocation to [`ti_signal_event`](#function-ti_signal_event-device-command); otherwise, the behavior is undefined.
 
 ---
 ### Function `ti_submit`
@@ -1073,7 +1163,7 @@ TI_DLL_EXPORT void TI_API_CALL ti_submit(
 );
 ```
 
-Submit all commands to the logical device for execution. Ensure that any previous device command has been offloaded to the logical computing device.
+Submits all previously invoked device commands to the offload device for execution.
 
 ---
 ### Function `ti_wait`
@@ -1085,7 +1175,7 @@ TI_DLL_EXPORT void TI_API_CALL ti_wait(
 );
 ```
 
-Waits until all previously invoked device commands are executed.
+Waits until all previously invoked device commands are executed. Any invoked command that has not been submitted is submitted first.
 
 ---
 ### Function `ti_load_aot_module`
@@ -1138,4 +1228,5 @@ TI_DLL_EXPORT TiComputeGraph TI_API_CALL ti_get_aot_module_compute_graph(
 );
 ```
 
-Get a precompiled compute graph from the AOt module. [`TI_NULL_HANDLE`](#definition-ti_null_handle) is returned if the module does not have a kernel of the specified name.
+Retrieves a pre-compiled compute graph from the AOT module.
+Returns [`TI_NULL_HANDLE`](#definition-ti_null_handle) if the module does not have a compute graph of the specified name.
diff --git a/docs/lang/articles/c-api/taichi_vulkan.md b/docs/lang/articles/c-api/taichi_vulkan.md
index b2aaf5950fd81..417a34a97ce83 100644
--- a/docs/lang/articles/c-api/taichi_vulkan.md
+++ b/docs/lang/articles/c-api/taichi_vulkan.md
@@ -4,7 +4,7 @@ sidebar_positions: 2
 
 # Vulkan Backend Features
 
-Taichi's Vulkan API gives you further control over Vulkan version and extension requirements and allows you to interop with external Vulkan applications with shared resources.
+Taichi's Vulkan API gives you further control over the Vulkan version and extension requirements and allows you to interoperate with external Vulkan applications through shared resources.
 
 ## API Reference
 
@@ -25,9 +25,10 @@ typedef struct TiVulkanRuntimeInteropInfo {
 } TiVulkanRuntimeInteropInfo;
 ```
 
-Necessary detail to share a same Vulkan runtime between Taichi and user applications.
+Necessary detail to share the same Vulkan runtime between Taichi and external procedures.
 
-- `api_version`: Targeted Vulkan API version.
+- `get_instance_proc_addr`: Pointer to Vulkan loader function `vkGetInstanceProcAddr`.
+- `api_version`: Target Vulkan API version.
 - `instance`: Vulkan instance handle.
 - `physical_device`: Vulkan physical device handle.
 - `device`: Vulkan logical device handle.
@@ -50,11 +51,11 @@ typedef struct TiVulkanMemoryInteropInfo {
 } TiVulkanMemoryInteropInfo;
 ```
 
-Necessary detail to share a same piece of Vulkan buffer between Taichi and user applications.
+Necessary detail to share the same piece of Vulkan buffer between Taichi and external procedures.
 
 - `buffer`: Vulkan buffer.
 - `size`: Size of the piece of memory in bytes.
-- `size`: Vulkan buffer usage. You usually want the `VK_BUFFER_USAGE_STORAGE_BUFFER_BIT` set.
+- `usage`: Vulkan buffer usage. In most cases, Taichi requires the `VK_BUFFER_USAGE_STORAGE_BUFFER_BIT`.
 
 ---
 ### Structure `TiVulkanImageInteropInfo`
@@ -73,6 +74,19 @@ typedef struct TiVulkanImageInteropInfo {
   VkImageUsageFlags usage;
 } TiVulkanImageInteropInfo;
 ```
+
+Necessary detail to share the same piece of Vulkan image between Taichi and external procedures.
+
+- `image`: Vulkan image.
+- `image_type`: Vulkan image allocation type.
+- `format`: Pixel format.
+- `extent`: Image extent.
+- `mip_level_count`: Number of mip-levels of the image.
+- `array_layer_count`: Number of array layers.
+- `sample_count`: Number of samples per pixel.
+- `tiling`: Image tiling.
+- `usage`: Vulkan image usage. In most cases, Taichi requires the `VK_IMAGE_USAGE_STORAGE_BIT` and the `VK_IMAGE_USAGE_SAMPLED_BIT`.
+
 ---
 ### Structure `TiVulkanEventInteropInfo`
 
@@ -83,7 +97,7 @@ typedef struct TiVulkanEventInteropInfo {
 } TiVulkanEventInteropInfo;
 ```
 
-Necessary detail to share a same Vulkan event synchronization primitive between Taichi and user application.
+Necessary detail to share the same Vulkan event synchronization primitive between Taichi and external procedures.
 
 - `event`: Vulkan event handle.
 
@@ -101,7 +115,7 @@ TI_DLL_EXPORT TiRuntime TI_API_CALL ti_create_vulkan_runtime_ext(
 );
 ```
 
-Create a Vulkan Taichi runtime with user controlled capability settings.
+Create a Vulkan Taichi runtime with user-controlled capability settings.
 
 ---
 ### Function `ti_import_vulkan_runtime`
@@ -113,7 +127,7 @@ TI_DLL_EXPORT TiRuntime TI_API_CALL ti_import_vulkan_runtime(
 );
 ```
 
-Import the Vulkan runtime owned by Taichi to external user applications.
+Import the Vulkan runtime owned by Taichi to external procedures.
 
 ---
 ### Function `ti_export_vulkan_runtime`
@@ -126,7 +140,7 @@ TI_DLL_EXPORT void TI_API_CALL ti_export_vulkan_runtime(
 );
 ```
 
-Export a Vulkan runtime from external user applications to Taichi.
+Export a Vulkan runtime from external procedures to Taichi.
 
 ---
 ### Function `ti_import_vulkan_memory`
@@ -139,7 +153,7 @@ TI_DLL_EXPORT TiMemory TI_API_CALL ti_import_vulkan_memory(
 );
 ```
 
-Import the Vulkan buffer owned by Taichi to external user applications.
+Import the Vulkan buffer owned by Taichi to external procedures.
 
 ---
 ### Function `ti_export_vulkan_memory`
@@ -153,7 +167,7 @@ TI_DLL_EXPORT void TI_API_CALL ti_export_vulkan_memory(
 );
 ```
 
-Export a Vulkan buffer from external user applications to Taichi.
+Export a Vulkan buffer from external procedures to Taichi.
 
 ---
 ### Function `ti_import_vulkan_image`
@@ -167,6 +181,9 @@ TI_DLL_EXPORT TiImage TI_API_CALL ti_import_vulkan_image(
   VkImageLayout layout
 );
 ```
+
+Import the Vulkan image owned by Taichi to external procedures.
+
 ---
 ### Function `ti_export_vulkan_image`
 
@@ -178,6 +195,9 @@ TI_DLL_EXPORT void TI_API_CALL ti_export_vulkan_image(
   TiVulkanImageInteropInfo* interop_info
 );
 ```
+
+Export a Vulkan image from external procedures to Taichi.
+
 ---
 ### Function `ti_import_vulkan_event`
 
@@ -189,7 +209,7 @@ TI_DLL_EXPORT TiEvent TI_API_CALL ti_import_vulkan_event(
 );
 ```
 
-Import the Vulkan event owned by Taichi to external user applications.
+Import the Vulkan event owned by Taichi to external procedures.
 
 ---
 ### Function `ti_export_vulkan_event`
@@ -203,4 +223,4 @@ TI_DLL_EXPORT void TI_API_CALL ti_export_vulkan_event(
 );
 ```
 
-Export a Vulkan event from external user applications to Taichi.
+Export a Vulkan event from external procedures to Taichi.

From 9b3cc348c443a995ca33a7e95b1e8db208b14c28 Mon Sep 17 00:00:00 2001
From: PENGUINLIONG <admin@penguinliong.moe>
Date: Sat, 24 Sep 2022 10:48:11 +0800
Subject: [PATCH 52/59] Fixed docs

---
 c_api/docs/taichi/taichi_core.h.md        | 29 ++++++++++++++++-------
 c_api/docs/taichi/taichi_vulkan.h.md      | 18 +++++++-------
 docs/lang/articles/c-api/taichi_core.md   | 27 ++++++++++++++-------
 docs/lang/articles/c-api/taichi_vulkan.md | 18 +++++++-------
 misc/generate_c_api_docs.py               |  5 ++++
 5 files changed, 63 insertions(+), 34 deletions(-)

diff --git a/c_api/docs/taichi/taichi_core.h.md b/c_api/docs/taichi/taichi_core.h.md
index 0b837b7de85f1..c9b63d3f8d8af 100644
--- a/c_api/docs/taichi/taichi_core.h.md
+++ b/c_api/docs/taichi/taichi_core.h.md
@@ -393,7 +393,6 @@ Parameters of a newly allocated image.
 - `structure.image_allocate_info.extent`: Image extent.
 - `structure.image_allocate_info.mip_level_count`: Number of mip-levels.
 - `structure.image_allocate_info.format`: Image texel format.
-- `structure.image_allocate_info.host_read`: True if the host needs to read from the allocated memory.
 - `structure.image_allocate_info.export_sharing`: True if the memory allocation needs to be exported to other backends (e.g., from Vulkan to CUDA).
 - `structure.image_allocate_info.usage`: All possible usage of this image allocation. In most cases, `bit_field.image_usage.storage` and `bit_field.image_usage.sampled` enough.
 
@@ -410,11 +409,11 @@ A subsection of a memory allocation. The sum of `structure.image_slice.offset` a
 
 Image data bound to a sampler.
 
-- `structure.nd_array.image`: Image bound to the texture.
-- `structure.nd_array.sampler`: The bound sampler that controls the sampling behavior of `structure.nd_array.image`.
-- `structure.nd_array.dimension`: Image Dimension.
-- `structure.nd_array.extent`: Extent of image.
-- `structure.nd_array.format`: Image texel format.
+- `structure.texture.image`: Image bound to the texture.
+- `structure.texture.sampler`: The bound sampler that controls the sampling behavior of `structure.texture.image`.
+- `structure.texture.dimension`: Image Dimension.
+- `structure.texture.extent`: Extent of image.
+- `structure.texture.format`: Image texel format.
 
 `union.argument_value`
 
@@ -438,6 +437,20 @@ A named argument value to feed compute graphs.
 - `structure.named_argument.name`: Name of the argument.
 - `structure.named_argument.argument`: Argument body.
 
+`function.get_last_error`
+
+Gets the last error raised by Taichi C-API invocations. Returns the semantic error code.
+
+- `function.get_last_error.message_size`: Size of the textual error message in `function.get_last_error.message`.
+- `function.get_last_error.message`: Text buffer for the textual error message. Ignored when `message_size` is 0.
+
+`function.set_last_error`
+
+Sets the provided error as the last error raised by Taichi C-API invocations. It can be useful in extended validation procedures in Taichi C-API wrappers and helper libraries.
+
+- `function.set_last_error.error`: Semantic error code.
+- `function.set_last_error.message`: A `\0`-terminated string of the textual error message. Ignored when `message_size` is 0.
+
 `function.create_runtime`
 
 Creates a Taichi Runtime with the specified `enumeration.arch`.
@@ -464,7 +477,7 @@ Unmaps a device memory and makes any host-side changes about the memory visible
 
 `function.allocate_image`
 
-Allocate a device image with provided parameters.
+Allocates a device image with provided parameters.
 
 `function.free_image`
 
@@ -492,7 +505,7 @@ Tracks the device image with the provided image layout. Because Taichi tracks im
 
 `function.transition_image`
 
-Transition the image to the provided image layout. Because Taichi tracks image layouts internally, it is *only* useful to enforce an image layout for external procedures to use.
+Transitions the image to the provided image layout. Because Taichi tracks image layouts internally, it is *only* useful to enforce an image layout for external procedures to use.
 
 `function.launch_kernel`
 
diff --git a/c_api/docs/taichi/taichi_vulkan.h.md b/c_api/docs/taichi/taichi_vulkan.h.md
index 5cd73795dc93e..22122cbadd024 100644
--- a/c_api/docs/taichi/taichi_vulkan.h.md
+++ b/c_api/docs/taichi/taichi_vulkan.h.md
@@ -54,36 +54,36 @@ Necessary detail to share the same Vulkan event synchronization primitive betwee
 
 `function.create_vulkan_runtime`
 
-Create a Vulkan Taichi runtime with user-controlled capability settings.
+Creates a Vulkan Taichi runtime with user-controlled capability settings.
 
 `function.import_vulkan_runtime`
 
-Import the Vulkan runtime owned by Taichi to external procedures.
+Imports a Vulkan runtime owned by external procedures into Taichi.
 
 `function.export_vulkan_runtime`
 
-Export a Vulkan runtime from external procedures to Taichi.
+Exports the Vulkan runtime owned by Taichi to external procedures.
 
 `function.import_vulkan_memory`
 
-Import the Vulkan buffer owned by Taichi to external procedures.
+Imports a Vulkan buffer owned by external procedures into Taichi.
 
 `function.export_vulkan_memory`
 
-Export a Vulkan buffer from external procedures to Taichi.
+Exports the Vulkan buffer owned by Taichi to external procedures.
 
 `function.import_vulkan_image`
 
-Import the Vulkan image owned by Taichi to external procedures.
+Imports a Vulkan image owned by external procedures into Taichi.
 
 `function.export_vulkan_image`
 
-Export a Vulkan image from external procedures to Taichi.
+Exports the Vulkan image owned by Taichi to external procedures.
 
 `function.import_vulkan_event`
 
-Import the Vulkan event owned by Taichi to external procedures.
+Imports a Vulkan event owned by external procedures into Taichi.
 
 `function.export_vulkan_event`
 
-Export a Vulkan event from external procedures to Taichi.
+Exports the Vulkan event owned by Taichi to external procedures.
diff --git a/docs/lang/articles/c-api/taichi_core.md b/docs/lang/articles/c-api/taichi_core.md
index abc7bfaab9a37..63bcff6b8a84a 100644
--- a/docs/lang/articles/c-api/taichi_core.md
+++ b/docs/lang/articles/c-api/taichi_core.md
@@ -730,7 +730,6 @@ Parameters of a newly allocated image.
 - `extent`: Image extent.
 - `mip_level_count`: Number of mip-levels.
 - `format`: Image texel format.
-- `structure.image_allocate_info.host_read`: True if the host needs to read from the allocated memory.
 - `export_sharing`: True if the memory allocation needs to be exported to other backends (e.g., from Vulkan to CUDA).
 - `usage`: All possible usage of this image allocation. In most cases, `TI_IMAGE_USAGE_STORAGE_BIT` and `TI_IMAGE_USAGE_SAMPLED_BIT` enough.
 
@@ -805,11 +804,11 @@ typedef struct TiTexture {
 
 Image data bound to a sampler.
 
-- `structure.nd_array.image`: Image bound to the texture.
-- `structure.nd_array.sampler`: The bound sampler that controls the sampling behavior of `structure.nd_array.image`.
-- `structure.nd_array.dimension`: Image Dimension.
-- `structure.nd_array.extent`: Extent of image.
-- `structure.nd_array.format`: Image texel format.
+- `image`: Image bound to the texture.
+- `sampler`: The bound sampler that controls the sampling behavior of `image`.
+- `dimension`: Image Dimension.
+- `extent`: Extent of image.
+- `format`: Image texel format.
 
 ---
 ### Union `TiArgumentValue`
@@ -872,6 +871,12 @@ TI_DLL_EXPORT TiError TI_API_CALL ti_get_last_error(
   char* message
 );
 ```
+
+Gets the last error raised by Taichi C-API invocations. Returns the semantic error code.
+
+- `message_size`: Size of the textual error message in `message`.
+- `message`: Text buffer for the textual error message. Ignored when `message_size` is 0.
+
 ---
 ### Function `ti_set_last_error`
 
@@ -882,6 +887,12 @@ TI_DLL_EXPORT void TI_API_CALL ti_set_last_error(
   const char* message
 );
 ```
+
+Sets the provided error as the last error raised by Taichi C-API invocations. It can be useful in extended validation procedures in Taichi C-API wrappers and helper libraries.
+
+- `error`: Semantic error code.
+- `message`: A `\0`-terminated string of the textual error message. Ignored when `message_size` is 0.
+
 ---
 ### Function `ti_create_runtime`
 
@@ -969,7 +980,7 @@ TI_DLL_EXPORT TiImage TI_API_CALL ti_allocate_image(
 );
 ```
 
-Allocate a device image with provided parameters.
+Allocates a device image with provided parameters.
 
 ---
 ### Function `ti_free_image`
@@ -1082,7 +1093,7 @@ TI_DLL_EXPORT void TI_API_CALL ti_transition_image(
 );
 ```
 
-Transition the image to the provided image layout. Because Taichi tracks image layouts internally, it is *only* useful to enforce an image layout for external procedures to use.
+Transitions the image to the provided image layout. Because Taichi tracks image layouts internally, it is *only* useful to enforce an image layout for external procedures to use.
 
 ---
 ### Function `ti_launch_kernel` (Device Command)
diff --git a/docs/lang/articles/c-api/taichi_vulkan.md b/docs/lang/articles/c-api/taichi_vulkan.md
index 417a34a97ce83..5ee4029d86611 100644
--- a/docs/lang/articles/c-api/taichi_vulkan.md
+++ b/docs/lang/articles/c-api/taichi_vulkan.md
@@ -115,7 +115,7 @@ TI_DLL_EXPORT TiRuntime TI_API_CALL ti_create_vulkan_runtime_ext(
 );
 ```
 
-Create a Vulkan Taichi runtime with user-controlled capability settings.
+Creates a Vulkan Taichi runtime with user-controlled capability settings.
 
 ---
 ### Function `ti_import_vulkan_runtime`
@@ -127,7 +127,7 @@ TI_DLL_EXPORT TiRuntime TI_API_CALL ti_import_vulkan_runtime(
 );
 ```
 
-Import the Vulkan runtime owned by Taichi to external procedures.
+Imports a Vulkan runtime owned by external procedures into Taichi.
 
 ---
 ### Function `ti_export_vulkan_runtime`
@@ -140,7 +140,7 @@ TI_DLL_EXPORT void TI_API_CALL ti_export_vulkan_runtime(
 );
 ```
 
-Export a Vulkan runtime from external procedures to Taichi.
+Exports the Vulkan runtime owned by Taichi to external procedures.
 
 ---
 ### Function `ti_import_vulkan_memory`
@@ -153,7 +153,7 @@ TI_DLL_EXPORT TiMemory TI_API_CALL ti_import_vulkan_memory(
 );
 ```
 
-Import the Vulkan buffer owned by Taichi to external procedures.
+Imports a Vulkan buffer owned by external procedures into Taichi.
 
 ---
 ### Function `ti_export_vulkan_memory`
@@ -167,7 +167,7 @@ TI_DLL_EXPORT void TI_API_CALL ti_export_vulkan_memory(
 );
 ```
 
-Export a Vulkan buffer from external procedures to Taichi.
+Exports the Vulkan buffer owned by Taichi to external procedures.
 
 ---
 ### Function `ti_import_vulkan_image`
@@ -182,7 +182,7 @@ TI_DLL_EXPORT TiImage TI_API_CALL ti_import_vulkan_image(
 );
 ```
 
-Import the Vulkan image owned by Taichi to external procedures.
+Imports a Vulkan image owned by external procedures into Taichi.
 
 ---
 ### Function `ti_export_vulkan_image`
@@ -196,7 +196,7 @@ TI_DLL_EXPORT void TI_API_CALL ti_export_vulkan_image(
 );
 ```
 
-Export a Vulkan image from external procedures to Taichi.
+Exports the Vulkan image owned by Taichi to external procedures.
 
 ---
 ### Function `ti_import_vulkan_event`
@@ -209,7 +209,7 @@ TI_DLL_EXPORT TiEvent TI_API_CALL ti_import_vulkan_event(
 );
 ```
 
-Import the Vulkan event owned by Taichi to external procedures.
+Imports a Vulkan event owned by external procedures into Taichi.
 
 ---
 ### Function `ti_export_vulkan_event`
@@ -223,4 +223,4 @@ TI_DLL_EXPORT void TI_API_CALL ti_export_vulkan_event(
 );
 ```
 
-Export a Vulkan event from external procedures to Taichi.
+Exports the Vulkan event owned by Taichi to external procedures.
diff --git a/misc/generate_c_api_docs.py b/misc/generate_c_api_docs.py
index b9df0c98ca90b..bf9902390b84a 100644
--- a/misc/generate_c_api_docs.py
+++ b/misc/generate_c_api_docs.py
@@ -41,6 +41,11 @@ def get_human_readable_field_name(x: EntryBase, field_name: str):
             if str(field.name) == field_name:
                 out = str(field.name)
                 break
+    elif isinstance(x, Function):
+        for field in x.params:
+            if str(field.name) == field_name:
+                out = str(field.name)
+                break
     return out
 
 

From 3e202ae9657d62af80c58fc54bac7d39942fc30d Mon Sep 17 00:00:00 2001
From: PENGUINLIONG <admin@penguinliong.moe>
Date: Sat, 24 Sep 2022 18:04:43 +0800
Subject: [PATCH 53/59] Editorial updates

---
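Reviewer note: this patch documents that enumerants of `enumeration.error`
greater than or equal to zero are success states. A minimal sketch of a check
a caller might build on that rule follows. `DemoError`, its enumerants, their
values, and `demo_is_success` are hypothetical stand-ins for illustration and
are not taken from `taichi_core.h`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the C-API error enumeration. The names and
 * values are illustrative assumptions, not the real Taichi definitions. */
typedef enum DemoError {
  DEMO_ERROR_INCOMPLETE = 1,     /* success state: output was truncated */
  DEMO_ERROR_SUCCESS = 0,        /* success state: finished gracefully */
  DEMO_ERROR_INVALID_STATE = -1  /* failure state */
} DemoError;

/* Applies the documented rule: values >= 0 are success states. */
static bool demo_is_success(DemoError error) {
  return (int32_t)error >= 0;
}
```

With this convention a caller only needs one comparison to separate failures
from the success states, including a truncated-but-usable result.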
 c_api/docs/taichi/taichi_core.h.md | 32 ++++++++++++++++--------------
 1 file changed, 17 insertions(+), 15 deletions(-)

diff --git a/c_api/docs/taichi/taichi_core.h.md b/c_api/docs/taichi/taichi_core.h.md
index c9b63d3f8d8af..24b705887e83e 100644
--- a/c_api/docs/taichi/taichi_core.h.md
+++ b/c_api/docs/taichi/taichi_core.h.md
@@ -90,7 +90,7 @@ To read data back to the host, `host_read` *must* be set to true.
 ```cpp
 TiMemoryAllocateInfo mai {};
 mai.size = 1024; // Size in bytes.
-mai.host_write = true;
+mai.host_read = true;
 mai.usage = TI_MEMORY_USAGE_STORAGE_BIT;
 TiMemory read_back_memory = ti_allocate_memory(runtime, &mai);
 
@@ -244,7 +244,7 @@ A collection of Taichi kernels (a compute graph) to launch on the offload target
 
 `enumeration.error`
 
-Errors reported by the Taichi C-API.
+Errors reported by the Taichi C-API. Enumerants greater than or equal to zero are success states.
 
 - `enumeration.error.incomplete`: The output data is truncated because the user-provided buffer is too small.
 - `enumeration.error.success`: The Taichi C-API invocation finished gracefully.
@@ -271,9 +271,9 @@ Types of backend archs.
 
 Elementary (primitive) data types. There might be vendor-specific constraints on the available data types so it's recommended to use 32-bit data types if multi-platform distribution is desired.
 
-- `enumeration.data_type.f16`: 16-bit IEEE 754 floating-point number.
-- `enumeration.data_type.f32`: 32-bit IEEE 754 floating-point number.
-- `enumeration.data_type.f64`: 64-bit IEEE 754 floating-point number.
+- `enumeration.data_type.f16`: 16-bit IEEE 754 half-precision floating-point number.
+- `enumeration.data_type.f32`: 32-bit IEEE 754 single-precision floating-point number.
+- `enumeration.data_type.f64`: 64-bit IEEE 754 double-precision floating-point number.
 - `enumeration.data_type.i8`: 8-bit one's complement signed integer.
 - `enumeration.data_type.i16`: 16-bit one's complement signed integer.
 - `enumeration.data_type.i32`: 32-bit one's complement signed integer.
@@ -287,15 +287,16 @@ Elementary (primitive) data types. There might be vendor-specific constraints on
 
 Types of kernel and compute graph argument.
 
-- `enumeration.argument_type.i32`: Signed 32-bit integer.
-- `enumeration.argument_type.f32`: Signed 32-bit floating-point number.
+- `enumeration.argument_type.i32`: 32-bit two's complement signed integer.
+- `enumeration.argument_type.f32`: 32-bit IEEE 754 single-precision floating-point number.
 - `enumeration.argument_type.ndarray`: ND-array wrapped around a `handle.memory`.
+- `enumeration.argument_type.texture`: Texture wrapped around a `handle.image`.
 
 `bit_field.memory_usage`
 
-Usages of a memory allocation.
+Usages of a memory allocation. Taichi requires kernel argument memories to be allocated with `bit_field.memory_usage.storage`.
 
-- `bit_field.memory_usage.storage`: The memory can be read/write accessed by any kernel. In most cases, the users only need to set this flag.
+- `bit_field.memory_usage.storage`: The memory can be read/write accessed by any kernel.
 - `bit_field.memory_usage.uniform`: The memory can be used as a uniform buffer in graphics pipelines.
 - `bit_field.memory_usage.vertex`: The memory can be used as a vertex buffer in graphics pipelines.
 - `bit_field.memory_usage.index`: The memory can be used as a index buffer in graphics pipelines.
@@ -336,10 +337,10 @@ Multi-dimensional array of dense primitive data.
 
 `bit_field.image_usage`
 
-Usages of an image allocation.
+Usages of an image allocation. Taichi requires kernel argument images to be allocated with `bit_field.image_usage.storage` and `bit_field.image_usage.sampled`.
 
-- `bit_field.image_usage.storage`: The image can be read/write accessed by any kernel. In most cases, the users only need to set this flag and `bit_field.image_usage.sampled`.
-- `bit_field.image_usage.sampled`: The image can be read-only accessed by any kernel. In most cases, the users only need to set this flag and `bit_field.image_usage.storage`.
+- `bit_field.image_usage.storage`: The image can be read/write accessed by any kernel.
+- `bit_field.image_usage.sampled`: The image can be read-only accessed by any kernel.
 - `bit_field.image_usage.attachment`: The image can be used as a color or depth-stencil attachment depending on its format.
 
 `enumeration.image_dimension`
@@ -355,7 +356,7 @@ Dimensions of an image allocation.
 
 `enumeration.image_layout`
 
-- `enumeration.image_layout.`: Undefined layout. An image in this layout does not contain any semantical information.
+- `enumeration.image_layout.undefined`: Undefined layout. An image in this layout does not contain any semantical information.
 - `enumeration.image_layout.shader_read`: Optimal layout for read-only access, including sampling.
 - `enumeration.image_layout.shader_write`: Optimal layout for write-only access.
 - `enumeration.image_layout.shader_read_write`: Optimal layout for read/write access.
@@ -412,7 +413,7 @@ Image data bound to a sampler.
 - `structure.texture.image`: Image bound to the texture.
 - `structure.texture.sampler`: The bound sampler that controls the sampling behavior of `structure.texture.image`.
 - `structure.texture.dimension`: Image Dimension.
-- `structure.texture.extent`: Extent of image.
+- `structure.texture.extent`: Image extent.
 - `structure.texture.format`: Image texel format.
 
 `union.argument_value`
@@ -420,8 +421,9 @@ Image data bound to a sampler.
 A scalar or structured argument value.
 
 - `union.argument_value.i32`: Value of a 32-bit one's complement signed integer.
-- `union.argument_value.f32`: Value of a 32-bit IEEE 754 floating-poing number.
+- `union.argument_value.f32`: Value of a 32-bit IEEE 754 single-precision floating-point number.
 - `union.argument_value.ndarray`: An ND-array to be bound.
+- `union.argument_value.texture`: A texture to be bound.
 
 `structure.argument`
 

From e600355a2ba10f2d4e339373a7ca554253921964 Mon Sep 17 00:00:00 2001
From: PENGUINLIONG <admin@penguinliong.moe>
Date: Sat, 24 Sep 2022 18:16:46 +0800
Subject: [PATCH 54/59] Editorial updates

---
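Note: this patch documents that error enumerants greater than or equal to zero are success states. The sketch below illustrates how a caller might test for that. It is only an illustration under stated assumptions: `SketchError`, its enumerant values, and `is_success_state` are hypothetical stand-ins, not the actual `TiError` definition from the C-API header.

```cpp
#include <cstdint>

// Hypothetical stand-in for an error enum that follows the documented
// convention: values >= 0 are success states, negative values are failures.
enum SketchError : int32_t {
  SKETCH_ERROR_INCOMPLETE = 1,     // assumed value: succeeded, but output was truncated
  SKETCH_ERROR_SUCCESS = 0,        // assumed value: finished gracefully
  SKETCH_ERROR_SOME_FAILURE = -1,  // assumed value: any failure state
};

// Returns true when the code is a success state under that convention.
constexpr bool is_success_state(SketchError error) {
  return static_cast<int32_t>(error) >= 0;
}

// A truncated result still counts as a success state.
static_assert(is_success_state(SKETCH_ERROR_INCOMPLETE),
              "non-negative codes are success states");
```

A wrapper library could use such a predicate to decide whether to retry with a larger buffer (success, but incomplete) or to abort (negative code).
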
 c_api/docs/taichi/taichi_core.h.md      |  2 +-
 docs/lang/articles/c-api/taichi_core.md | 34 +++++++++++++------------
 2 files changed, 19 insertions(+), 17 deletions(-)

diff --git a/c_api/docs/taichi/taichi_core.h.md b/c_api/docs/taichi/taichi_core.h.md
index 24b705887e83e..93541731eb5c2 100644
--- a/c_api/docs/taichi/taichi_core.h.md
+++ b/c_api/docs/taichi/taichi_core.h.md
@@ -299,7 +299,7 @@ Usages of a memory allocation. Taichi requires kernel argument memories to be al
 - `bit_field.memory_usage.storage`: The memory can be read/write accessed by any kernel.
 - `bit_field.memory_usage.uniform`: The memory can be used as a uniform buffer in graphics pipelines.
 - `bit_field.memory_usage.vertex`: The memory can be used as a vertex buffer in graphics pipelines.
-- `bit_field.memory_usage.index`: The memory can be used as a index buffer in graphics pipelines.
+- `bit_field.memory_usage.index`: The memory can be used as an index buffer in graphics pipelines.
 
 `structure.memory_allocate_info`
 
diff --git a/docs/lang/articles/c-api/taichi_core.md b/docs/lang/articles/c-api/taichi_core.md
index 63bcff6b8a84a..e5eb4e4d283a5 100644
--- a/docs/lang/articles/c-api/taichi_core.md
+++ b/docs/lang/articles/c-api/taichi_core.md
@@ -90,7 +90,7 @@ To read data back to the host, `host_read` *must* be set to true.
 ```cpp
 TiMemoryAllocateInfo mai {};
 mai.size = 1024; // Size in bytes.
-mai.host_write = true;
+mai.host_read = true;
 mai.usage = TI_MEMORY_USAGE_STORAGE_BIT;
 TiMemory read_back_memory = ti_allocate_memory(runtime, &mai);
 
@@ -340,7 +340,7 @@ typedef enum TiError {
 } TiError;
 ```
 
-Errors reported by the Taichi C-API.
+Errors reported by the Taichi C-API. Enumerants greater than or equal to zero are success states.
 
 - `TI_ERROR_INCOMPLETE`: The output data is truncated because the user-provided buffer is too small.
 - `TI_ERROR_SUCCESS`: The Taichi C-API invocation finished gracefully.
@@ -410,9 +410,9 @@ typedef enum TiDataType {
 
 Elementary (primitive) data types. There might be vendor-specific constraints on the available data types so it's recommended to use 32-bit data types if multi-platform distribution is desired.
 
-- `TI_DATA_TYPE_F16`: 16-bit IEEE 754 floating-point number.
-- `TI_DATA_TYPE_F32`: 32-bit IEEE 754 floating-point number.
-- `TI_DATA_TYPE_F64`: 64-bit IEEE 754 floating-point number.
+- `TI_DATA_TYPE_F16`: 16-bit IEEE 754 half-precision floating-point number.
+- `TI_DATA_TYPE_F32`: 32-bit IEEE 754 single-precision floating-point number.
+- `TI_DATA_TYPE_F64`: 64-bit IEEE 754 double-precision floating-point number.
 - `TI_DATA_TYPE_I8`: 8-bit one's complement signed integer.
 - `TI_DATA_TYPE_I16`: 16-bit one's complement signed integer.
 - `TI_DATA_TYPE_I32`: 32-bit one's complement signed integer.
@@ -438,9 +438,10 @@ typedef enum TiArgumentType {
 
 Types of kernel and compute graph argument.
 
-- `TI_ARGUMENT_TYPE_I32`: Signed 32-bit integer.
-- `TI_ARGUMENT_TYPE_F32`: Signed 32-bit floating-point number.
+- `TI_ARGUMENT_TYPE_I32`: 32-bit one's complement signed integer.
+- `TI_ARGUMENT_TYPE_F32`: 32-bit IEEE 754 single-precision floating-point number.
 - `TI_ARGUMENT_TYPE_NDARRAY`: ND-array wrapped around a [`TiMemory`](#handle-timemory).
+- `TI_ARGUMENT_TYPE_TEXTURE`: Texture wrapped around a [`TiImage`](#handle-tiimage).
 
 ---
 ### BitField `TiMemoryUsageFlags`
@@ -456,12 +457,12 @@ typedef enum TiMemoryUsageFlagBits {
 typedef TiFlags TiMemoryUsageFlags;
 ```
 
-Usages of a memory allocation.
+Usages of a memory allocation. Taichi requires kernel argument memories to be allocated with `TI_MEMORY_USAGE_STORAGE_BIT`.
 
-- `TI_MEMORY_USAGE_STORAGE_BIT`: The memory can be read/write accessed by any kernel. In most cases, the users only need to set this flag.
+- `TI_MEMORY_USAGE_STORAGE_BIT`: The memory can be read/write accessed by any kernel.
 - `TI_MEMORY_USAGE_UNIFORM_BIT`: The memory can be used as a uniform buffer in graphics pipelines.
 - `TI_MEMORY_USAGE_VERTEX_BIT`: The memory can be used as a vertex buffer in graphics pipelines.
-- `TI_MEMORY_USAGE_INDEX_BIT`: The memory can be used as a index buffer in graphics pipelines.
+- `TI_MEMORY_USAGE_INDEX_BIT`: The memory can be used as an index buffer in graphics pipelines.
 
 ---
 ### Structure `TiMemoryAllocateInfo`
@@ -552,10 +553,10 @@ typedef enum TiImageUsageFlagBits {
 typedef TiFlags TiImageUsageFlags;
 ```
 
-Usages of an image allocation.
+Usages of an image allocation. Taichi requires kernel argument images to be allocated with `TI_IMAGE_USAGE_STORAGE_BIT` and `TI_IMAGE_USAGE_SAMPLED_BIT`.
 
-- `TI_IMAGE_USAGE_STORAGE_BIT`: The image can be read/write accessed by any kernel. In most cases, the users only need to set this flag and `TI_IMAGE_USAGE_SAMPLED_BIT`.
-- `TI_IMAGE_USAGE_SAMPLED_BIT`: The image can be read-only accessed by any kernel. In most cases, the users only need to set this flag and `TI_IMAGE_USAGE_STORAGE_BIT`.
+- `TI_IMAGE_USAGE_STORAGE_BIT`: The image can be read/write accessed by any kernel.
+- `TI_IMAGE_USAGE_SAMPLED_BIT`: The image can be read-only accessed by any kernel.
 - `TI_IMAGE_USAGE_ATTACHMENT_BIT`: The image can be used as a color or depth-stencil attachment depending on its format.
 
 ---
@@ -604,7 +605,7 @@ typedef enum TiImageLayout {
 } TiImageLayout;
 ```
 
-- `enumeration.image_layout.`: Undefined layout. An image in this layout does not contain any semantical information.
+- `TI_IMAGE_LAYOUT_UNDEFINED`: Undefined layout. An image in this layout does not contain any semantical information.
 - `TI_IMAGE_LAYOUT_SHADER_READ`: Optimal layout for read-only access, including sampling.
 - `TI_IMAGE_LAYOUT_SHADER_WRITE`: Optimal layout for write-only access.
 - `TI_IMAGE_LAYOUT_SHADER_READ_WRITE`: Optimal layout for read/write access.
@@ -807,7 +808,7 @@ Image data bound to a sampler.
 - `image`: Image bound to the texture.
 - `sampler`: The bound sampler that controls the sampling behavior of `image`.
 - `dimension`: Image Dimension.
-- `extent`: Extent of image.
+- `extent`: Image extent.
 - `format`: Image texel format.
 
 ---
@@ -826,8 +827,9 @@ typedef union TiArgumentValue {
 A scalar or structured argument value.
 
 - `i32`: Value of a 32-bit one's complement signed integer.
-- `f32`: Value of a 32-bit IEEE 754 floating-poing number.
+- `f32`: Value of a 32-bit IEEE 754 single-precision floating-point number.
 - `ndarray`: An ND-array to be bound.
+- `texture`: A texture to be bound.
 
 ---
 ### Structure `TiArgument`

From 7da89837b5128825fffc23724f3e4cdf7acde41e Mon Sep 17 00:00:00 2001
From: PENGUINLIONG <admin@penguinliong.moe>
Date: Sat, 24 Sep 2022 18:23:41 +0800
Subject: [PATCH 55/59] Editorial update

---
 c_api/docs/taichi/taichi_core.h.md      | 1 +
 docs/lang/articles/c-api/taichi_core.md | 1 +
 2 files changed, 2 insertions(+)

diff --git a/c_api/docs/taichi/taichi_core.h.md b/c_api/docs/taichi/taichi_core.h.md
index 93541731eb5c2..cccfadc163f55 100644
--- a/c_api/docs/taichi/taichi_core.h.md
+++ b/c_api/docs/taichi/taichi_core.h.md
@@ -266,6 +266,7 @@ Types of backend archs.
 - `enumeration.arch.arm64`: Arm64 native CPU backend.
 - `enumeration.arch.cuda`: NVIDIA CUDA GPU backend.
 - `enumeration.arch.vulkan`: Vulkan GPU backend.
+- `enumeration.arch.opengl`: OpenGL GPU backend.
 
 `enumeration.data_type`
 
diff --git a/docs/lang/articles/c-api/taichi_core.md b/docs/lang/articles/c-api/taichi_core.md
index e5eb4e4d283a5..421c0522da140 100644
--- a/docs/lang/articles/c-api/taichi_core.md
+++ b/docs/lang/articles/c-api/taichi_core.md
@@ -383,6 +383,7 @@ Types of backend archs.
 - `TI_ARCH_ARM64`: Arm64 native CPU backend.
 - `TI_ARCH_CUDA`: NVIDIA CUDA GPU backend.
 - `TI_ARCH_VULKAN`: Vulkan GPU backend.
+- `TI_ARCH_OPENGL`: OpenGL GPU backend.
 
 ---
 ### Enumeration `TiDataType`

From 5861295b3b7c4ed2544c813166a63cfd634e5cab Mon Sep 17 00:00:00 2001
From: PENGUINLIONG <admin@penguinliong.moe>
Date: Sat, 24 Sep 2022 18:40:11 +0800
Subject: [PATCH 56/59] Editorial update

---
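Note: this patch switches the allocation examples from the C++ literal `true` to `TI_TRUE`. The sketch below illustrates the idea, assuming (as the Definitions section of the document states) that a boolean is an unsigned 32-bit integer with 1 for true and 0 for false. `SketchBool`, `SKETCH_TRUE`, `SKETCH_FALSE`, `SketchAllocateInfo`, and `make_read_back_info` are hypothetical stand-ins, not the real C-API declarations.

```cpp
#include <cstdint>

// Stand-ins for the C-API boolean alias and its two definitions.
typedef uint32_t SketchBool;
#define SKETCH_FALSE 0u
#define SKETCH_TRUE 1u

// The alias has a fixed 32-bit size, unlike the C++ `bool` type.
static_assert(sizeof(SketchBool) == 4, "boolean alias is 32 bits wide");

// Hypothetical, reduced mirror of an allocation-info structure.
struct SketchAllocateInfo {
  uint64_t size;          // Size in bytes.
  SketchBool host_write;  // Host may stream data into the allocation.
  SketchBool host_read;   // Host may read data back from the allocation.
};

// Builds parameters for a read-back allocation: only host_read is enabled.
inline SketchAllocateInfo make_read_back_info(uint64_t size) {
  SketchAllocateInfo info{};  // Zero-initialized, so both flags start as false.
  info.size = size;
  info.host_read = SKETCH_TRUE;
  return info;
}
```

Writing the flag with the API's own definition keeps the examples valid for C callers as well, where `true` is unavailable without `<stdbool.h>`.
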
 c_api/docs/taichi/taichi_core.h.md      | 18 +++++++++---------
 docs/lang/articles/c-api/taichi_core.md | 18 +++++++++---------
 2 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/c_api/docs/taichi/taichi_core.h.md b/c_api/docs/taichi/taichi_core.h.md
index cccfadc163f55..baddfcf49476f 100644
--- a/c_api/docs/taichi/taichi_core.h.md
+++ b/c_api/docs/taichi/taichi_core.h.md
@@ -67,12 +67,12 @@ ti_free_memory(runtime, memory);
 
 By default, memory allocations are physically or conceptually local to the offload target for performance reasons. You can configure the `structure.memory_allocate_info` to enable host access to memory allocations. But please note that host-accessible allocations *may* slow down computation on GPU because of the limited bus bandwidth between the host memory and the device.
 
-You *must* set `host_write` to `true` to allow streaming data to the memory.
+You *must* set `host_write` to `definition.true` to allow streaming data to the memory.
 
 ```cpp
 TiMemoryAllocateInfo mai {};
 mai.size = 1024; // Size in bytes.
-mai.host_write = true;
+mai.host_write = TI_TRUE;
 mai.usage = TI_MEMORY_USAGE_STORAGE_BIT;
 TiMemory steaming_memory = ti_allocate_memory(runtime, &mai);
 
@@ -85,12 +85,12 @@ std::memcpy(dst, src.data(), src.size());
 ti_unmap_memory(runtime, streaming_memory);
 ```
 
-To read data back to the host, `host_read` *must* be set to true.
+To read data back to the host, `host_read` *must* be set to `definition.true`.
 
 ```cpp
 TiMemoryAllocateInfo mai {};
 mai.size = 1024; // Size in bytes.
-mai.host_read = true;
+mai.host_read = TI_TRUE;
 mai.usage = TI_MEMORY_USAGE_STORAGE_BIT;
 TiMemory read_back_memory = ti_allocate_memory(runtime, &mai);
 
@@ -154,8 +154,8 @@ arg1.type = TI_ARGUMENT_TYPE_F32;
 arg1.value.f32 = 123.0f;
 
 TiArgument& arg2 = args[2];
-arg1.type = TI_ARGUMENT_TYPE_NDARRAY;
-arg1.value.ndarray = ndarray;
+arg2.type = TI_ARGUMENT_TYPE_NDARRAY;
+arg2.value.ndarray = ndarray;
 
 ti_launch_kernel(runtime, kernel, args.size(), args.data());
 ```
@@ -204,7 +204,7 @@ A condition or a predicate is not satisfied; a statement is invalid.
 
 A bit field that can be used to represent 32 orthogonal flags. Bits unspecified in the corresponding flag enum are ignored.
 
-**NOTE** Enumerations and bit-field flags in the C-API have a `TI_XXX_MAX_ENUM` case to ensure the enum has a 32-bit range and in-memory size. It has no semantical impact and can be safely ignored.
+> Enumerations and bit-field flags in the C-API have a `TI_XXX_MAX_ENUM` case to ensure the enum has a 32-bit range and in-memory size. It has no semantical impact and can be safely ignored.
 
 `definition.null_handle`
 
@@ -396,7 +396,7 @@ Parameters of a newly allocated image.
 - `structure.image_allocate_info.mip_level_count`: Number of mip-levels.
 - `structure.image_allocate_info.format`: Image texel format.
 - `structure.image_allocate_info.export_sharing`: True if the memory allocation needs to be exported to other backends (e.g., from Vulkan to CUDA).
-- `structure.image_allocate_info.usage`: All possible usage of this image allocation. In most cases, `bit_field.image_usage.storage` and `bit_field.image_usage.sampled` enough.
+- `structure.image_allocate_info.usage`: All possible usages of this image allocation. In most cases, `bit_field.image_usage.storage` and `bit_field.image_usage.sampled` are enough.
 
 `structure.image_slice`
 
@@ -452,7 +452,7 @@ Get the last error raised by Taichi C-API invocations. Returns the semantical er
 Set the provided error as the last error raised by Taichi C-API invocations. It can be useful in extended validation procedures in Taichi C-API wrappers and helper libraries.
 
 - `function.set_last_error.error`: Semantical error code.
-- `function.set_last_error.message`: A `\0`-terminated string of the textual error message. Ignored when `message_size` is 0.
+- `function.set_last_error.message`: A null-terminated string of the textual error message, or `nullptr` for an empty error message.
 
 `function.create_runtime`
 
diff --git a/docs/lang/articles/c-api/taichi_core.md b/docs/lang/articles/c-api/taichi_core.md
index 421c0522da140..69e7fa2675a03 100644
--- a/docs/lang/articles/c-api/taichi_core.md
+++ b/docs/lang/articles/c-api/taichi_core.md
@@ -67,12 +67,12 @@ ti_free_memory(runtime, memory);
 
 By default, memory allocations are physically or conceptually local to the offload target for performance reasons. You can configure the [`TiMemoryAllocateInfo`](#structure-timemoryallocateinfo) to enable host access to memory allocations. But please note that host-accessible allocations *may* slow down computation on GPU because of the limited bus bandwidth between the host memory and the device.
 
-You *must* set `host_write` to `true` to allow streaming data to the memory.
+You *must* set `host_write` to [`TI_TRUE`](#definition-ti_true) to allow streaming data to the memory.
 
 ```cpp
 TiMemoryAllocateInfo mai {};
 mai.size = 1024; // Size in bytes.
-mai.host_write = true;
+mai.host_write = TI_TRUE;
 mai.usage = TI_MEMORY_USAGE_STORAGE_BIT;
 TiMemory steaming_memory = ti_allocate_memory(runtime, &mai);
 
@@ -85,12 +85,12 @@ std::memcpy(dst, src.data(), src.size());
 ti_unmap_memory(runtime, streaming_memory);
 ```
 
-To read data back to the host, `host_read` *must* be set to true.
+To read data back to the host, `host_read` *must* be set to [`TI_TRUE`](#definition-ti_true).
 
 ```cpp
 TiMemoryAllocateInfo mai {};
 mai.size = 1024; // Size in bytes.
-mai.host_read = true;
+mai.host_read = TI_TRUE;
 mai.usage = TI_MEMORY_USAGE_STORAGE_BIT;
 TiMemory read_back_memory = ti_allocate_memory(runtime, &mai);
 
@@ -154,8 +154,8 @@ arg1.type = TI_ARGUMENT_TYPE_F32;
 arg1.value.f32 = 123.0f;
 
 TiArgument& arg2 = args[2];
-arg1.type = TI_ARGUMENT_TYPE_NDARRAY;
-arg1.value.ndarray = ndarray;
+arg2.type = TI_ARGUMENT_TYPE_NDARRAY;
+arg2.value.ndarray = ndarray;
 
 ti_launch_kernel(runtime, kernel, args.size(), args.data());
 ```
@@ -227,7 +227,7 @@ typedef uint32_t TiFlags;
 
 A bit field that can be used to represent 32 orthogonal flags. Bits unspecified in the corresponding flag enum are ignored.
 
-**NOTE** Enumerations and bit-field flags in the C-API have a `TI_XXX_MAX_ENUM` case to ensure the enum has a 32-bit range and in-memory size. It has no semantical impact and can be safely ignored.
+> Enumerations and bit-field flags in the C-API have a `TI_XXX_MAX_ENUM` case to ensure the enum has a 32-bit range and in-memory size. It has no semantical impact and can be safely ignored.
 
 ---
 ### Definition `TI_NULL_HANDLE`
@@ -733,7 +733,7 @@ Parameters of a newly allocated image.
 - `mip_level_count`: Number of mip-levels.
 - `format`: Image texel format.
 - `export_sharing`: True if the memory allocation needs to be exported to other backends (e.g., from Vulkan to CUDA).
-- `usage`: All possible usage of this image allocation. In most cases, `TI_IMAGE_USAGE_STORAGE_BIT` and `TI_IMAGE_USAGE_SAMPLED_BIT` enough.
+- `usage`: All possible usages of this image allocation. In most cases, `TI_IMAGE_USAGE_STORAGE_BIT` and `TI_IMAGE_USAGE_SAMPLED_BIT` are enough.
 
 ---
 ### Structure `TiImageSlice`
@@ -894,7 +894,7 @@ TI_DLL_EXPORT void TI_API_CALL ti_set_last_error(
 Set the provided error as the last error raised by Taichi C-API invocations. It can be useful in extended validation procedures in Taichi C-API wrappers and helper libraries.
 
 - `error`: Semantical error code.
-- `message`: A `\0`-terminated string of the textual error message. Ignored when `message_size` is 0.
+- `message`: A null-terminated string of the textual error message, or `nullptr` for an empty error message.
 
 ---
 ### Function `ti_create_runtime`

From 01597b928112e4d6b3b1ee77279e6b089cce0238 Mon Sep 17 00:00:00 2001
From: PENGUINLIONG <admin@penguinliong.moe>
Date: Sun, 25 Sep 2022 08:40:22 +0800
Subject: [PATCH 57/59] Editorial update

---
 c_api/docs/taichi/taichi_core.h.md      | 2 +-
 docs/lang/articles/c-api/taichi_core.md | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/c_api/docs/taichi/taichi_core.h.md b/c_api/docs/taichi/taichi_core.h.md
index baddfcf49476f..580c254f2519f 100644
--- a/c_api/docs/taichi/taichi_core.h.md
+++ b/c_api/docs/taichi/taichi_core.h.md
@@ -67,7 +67,7 @@ ti_free_memory(runtime, memory);
 
 By default, memory allocations are physically or conceptually local to the offload target for performance reasons. You can configure the `structure.memory_allocate_info` to enable host access to memory allocations. But please note that host-accessible allocations *may* slow down computation on GPU because of the limited bus bandwidth between the host memory and the device.
 
-You *must* set `host_write` to `definition.true` to allow streaming data to the memory.
+You *must* set `host_write` to `definition.true` to allow zero-copy data streaming to the memory.
 
 ```cpp
 TiMemoryAllocateInfo mai {};
diff --git a/docs/lang/articles/c-api/taichi_core.md b/docs/lang/articles/c-api/taichi_core.md
index 69e7fa2675a03..f5f2433ea182d 100644
--- a/docs/lang/articles/c-api/taichi_core.md
+++ b/docs/lang/articles/c-api/taichi_core.md
@@ -67,7 +67,7 @@ ti_free_memory(runtime, memory);
 
 By default, memory allocations are physically or conceptually local to the offload target for performance reasons. You can configure the [`TiMemoryAllocateInfo`](#structure-timemoryallocateinfo) to enable host access to memory allocations. But please note that host-accessible allocations *may* slow down computation on GPU because of the limited bus bandwidth between the host memory and the device.
 
-You *must* set `host_write` to [`TI_TRUE`](#definition-ti_true) to allow streaming data to the memory.
+You *must* set `host_write` to [`TI_TRUE`](#definition-ti_true) to allow zero-copy data streaming to the memory.
 
 ```cpp
 TiMemoryAllocateInfo mai {};

From 5a495fb01da9694a28001b1d3987705628341fed Mon Sep 17 00:00:00 2001
From: PENGUINLIONG <admin@penguinliong.moe>
Date: Sun, 25 Sep 2022 08:44:46 +0800
Subject: [PATCH 58/59] Editorial update

---
 c_api/docs/taichi/taichi_core.h.md      | 4 ++--
 docs/lang/articles/c-api/taichi_core.md | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/c_api/docs/taichi/taichi_core.h.md b/c_api/docs/taichi/taichi_core.h.md
index 580c254f2519f..946f237144d59 100644
--- a/c_api/docs/taichi/taichi_core.h.md
+++ b/c_api/docs/taichi/taichi_core.h.md
@@ -212,7 +212,7 @@ A sentinal invalid handle that will never be produced from a valid call to Taich
 
 `handle.runtime`
 
-Taichi runtime represents an instance of a logical backend and its internal dynamic state. The user is responsible to synchronize any use of `handle.runtime`. The user MUST NOT manipulate multiple `handle.runtime`s in the same thread.
+Taichi runtime represents an instance of a logical backend and its internal dynamic state. The user is responsible for synchronizing any use of `handle.runtime`. The user *must not* manipulate multiple `handle.runtime`s in the same thread.
 
 `handle.aot_module`
 
@@ -512,7 +512,7 @@ Transitions the image to the provided image layout. Because Taichi tracks image
 
 `function.launch_kernel`
 
-Launches a Taichi kernel with the provided arguments. The arguments MUST have the same count and types in the same order as in the source code.
+Launches a Taichi kernel with the provided arguments. The arguments *must* have the same count and types in the same order as in the source code.
 
 `function.launch_compute_graph`
 
diff --git a/docs/lang/articles/c-api/taichi_core.md b/docs/lang/articles/c-api/taichi_core.md
index f5f2433ea182d..da47354f44d7d 100644
--- a/docs/lang/articles/c-api/taichi_core.md
+++ b/docs/lang/articles/c-api/taichi_core.md
@@ -247,7 +247,7 @@ A sentinal invalid handle that will never be produced from a valid call to Taich
 typedef struct TiRuntime_t* TiRuntime;
 ```
 
-Taichi runtime represents an instance of a logical backend and its internal dynamic state. The user is responsible to synchronize any use of [`TiRuntime`](#handle-tiruntime). The user MUST NOT manipulate multiple [`TiRuntime`](#handle-tiruntime)s in the same thread.
+Taichi runtime represents an instance of a logical backend and its internal dynamic state. The user is responsible for synchronizing any use of [`TiRuntime`](#handle-tiruntime). The user *must not* manipulate multiple [`TiRuntime`](#handle-tiruntime)s in the same thread.
 
 ---
 ### Handle `TiAotModule`
@@ -1111,7 +1111,7 @@ TI_DLL_EXPORT void TI_API_CALL ti_launch_kernel(
 );
 ```
 
-Launches a Taichi kernel with the provided arguments. The arguments MUST have the same count and types in the same order as in the source code.
+Launches a Taichi kernel with the provided arguments. The arguments *must* have the same count and types in the same order as in the source code.
 
 ---
 ### Function `ti_launch_compute_graph` (Device Command)

From 59c8faa213f81ba76bb77d97ce645329d48acef8 Mon Sep 17 00:00:00 2001
From: PENGUINLIONG <admin@penguinliong.moe>
Date: Tue, 27 Sep 2022 09:29:00 +0800
Subject: [PATCH 59/59] Insert C-API references after 'Deployment'

---
 c_api/docs/taichi/taichi_core.h.md              | 4 ++--
 docs/lang/articles/c-api/_category_.json        | 2 +-
 docs/lang/articles/c-api/taichi_core.md         | 4 ++--
 docs/lang/articles/contribution/_category_.json | 2 +-
 docs/lang/articles/internals/_category_.json    | 2 +-
 docs/lang/articles/math/_category_.json         | 2 +-
 docs/lang/articles/reference/_category_.json    | 2 +-
 7 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/c_api/docs/taichi/taichi_core.h.md b/c_api/docs/taichi/taichi_core.h.md
index 946f237144d59..a76eafefb7153 100644
--- a/c_api/docs/taichi/taichi_core.h.md
+++ b/c_api/docs/taichi/taichi_core.h.md
@@ -26,7 +26,7 @@ For convenience, in the following text and other C-API documents, the term *host
 
 Unless otherwise specified, **device**, **backend**, **offload target**, and **GPU** are interchangeable; **host**, **user code**, **user procedure**, and **CPU** are interchangeable.
 
-## How to...
+## HowTo
 
 The following section provides a brief introduction to the Taichi C-API.
 
@@ -116,7 +116,7 @@ TiAotModule aot_module = ti_load_aot_module(runtime, "/path/to/aot/module");
 
 `/path/to/aot/module` should point to the directory that contains a `metadata.tcb`.
 
-You can destroy an unused AOT module if you have done with it; but please ensure there is no kernel or compute graph related to it pending to `function.submit`.
+You can destroy an unused AOT module, but please ensure that there is no kernel or compute graph related to it pending to [`ti_submit`](#function-ti_submit).
 
 ```cpp
 ti_destroy_aot_module(aot_module);
diff --git a/docs/lang/articles/c-api/_category_.json b/docs/lang/articles/c-api/_category_.json
index 9e854c7688fd4..30b93f60aa8b6 100644
--- a/docs/lang/articles/c-api/_category_.json
+++ b/docs/lang/articles/c-api/_category_.json
@@ -1,4 +1,4 @@
 {
   "label": "Taichi Runtime C-API",
-  "position": 17
+  "position": 11
 }
diff --git a/docs/lang/articles/c-api/taichi_core.md b/docs/lang/articles/c-api/taichi_core.md
index da47354f44d7d..0a7ee38fe849b 100644
--- a/docs/lang/articles/c-api/taichi_core.md
+++ b/docs/lang/articles/c-api/taichi_core.md
@@ -26,7 +26,7 @@ For convenience, in the following text and other C-API documents, the term *host
 
 Unless otherwise specified, **device**, **backend**, **offload target**, and **GPU** are interchangeable; **host**, **user code**, **user procedure**, and **CPU** are interchangeable.
 
-## How to...
+## HowTo
 
 The following section provides a brief introduction to the Taichi C-API.
 
@@ -116,7 +116,7 @@ TiAotModule aot_module = ti_load_aot_module(runtime, "/path/to/aot/module");
 
 `/path/to/aot/module` should point to the directory that contains a `metadata.tcb`.
 
-You can destroy an unused AOT module if you have done with it; but please ensure there is no kernel or compute graph related to it pending to [`ti_submit`](#function-ti_submit).
+You can destroy an unused AOT module, but please ensure that there is no kernel or compute graph related to it pending to [`ti_submit`](#function-ti_submit).
 
 ```cpp
 ti_destroy_aot_module(aot_module);
diff --git a/docs/lang/articles/contribution/_category_.json b/docs/lang/articles/contribution/_category_.json
index 48b78c06f1537..439f5dafc6b50 100644
--- a/docs/lang/articles/contribution/_category_.json
+++ b/docs/lang/articles/contribution/_category_.json
@@ -1,4 +1,4 @@
 {
   "label": "Contribution",
-  "position": 12
+  "position": 13
 }
diff --git a/docs/lang/articles/internals/_category_.json b/docs/lang/articles/internals/_category_.json
index 461c4610b1b2d..b2a238177ea77 100644
--- a/docs/lang/articles/internals/_category_.json
+++ b/docs/lang/articles/internals/_category_.json
@@ -1,4 +1,4 @@
 {
   "label": "Internals",
-  "position": 14
+  "position": 15
 }
diff --git a/docs/lang/articles/math/_category_.json b/docs/lang/articles/math/_category_.json
index 1de4c73752683..8d1a77265b9c3 100644
--- a/docs/lang/articles/math/_category_.json
+++ b/docs/lang/articles/math/_category_.json
@@ -1,4 +1,4 @@
 {
   "label": "Math Library",
-  "position": 11
+  "position": 12
 }
diff --git a/docs/lang/articles/reference/_category_.json b/docs/lang/articles/reference/_category_.json
index 5550f144e8609..fe517970195fb 100644
--- a/docs/lang/articles/reference/_category_.json
+++ b/docs/lang/articles/reference/_category_.json
@@ -1,4 +1,4 @@
 {
   "label": "References",
-  "position": 13
+  "position": 14
 }