
Releases: taichi-dev/taichi

v1.2.0

25 Oct 09:56
f189fd7

Starting from the v1.2.0 release, Taichi follows semantic versioning: regular releases cut from the master branch bump the MINOR version, and the PATCH version is bumped only when critical bug fixes are cherry-picked.

Deprecation Notice

Indexing a multi-dimensional ti.ndrange() with a single loop index will be disallowed in future releases.
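For example, a minimal sketch of the pattern that will be disallowed and its replacement (the dimensions here are arbitrary):

import taichi as ti

ti.init()

@ti.kernel
def foo():
    # Deprecated: iterating a 2-D ti.ndrange() with a single loop variable
    # for i in ti.ndrange(4, 4):
    #     ...
    # Use one loop variable per ndrange dimension instead:
    for i, j in ti.ndrange(4, 4):
        print(i, j)

foo()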

Highlights

New features

Offline Cache

We introduced the offline cache on the CPU and CUDA backends in v1.1.0. This release extends the feature to more backends, including Vulkan, OpenGL, and Metal.

  • If your code behaves abnormally, disable the offline cache by setting the environment variable TI_OFFLINE_CACHE=0 or passing offline_cache=False to ti.init(), and file an issue with us on Taichi's GitHub repo.
  • See Offline cache for more information.
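For example, a minimal sketch of disabling the offline cache explicitly (ti.vulkan here is just one of the newly supported backends):

import taichi as ti

# Either export TI_OFFLINE_CACHE=0 before running the program,
# or pass offline_cache=False to ti.init():
ti.init(arch=ti.vulkan, offline_cache=False)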

GDAR (Global Data Access Rule)

A checker is provided for detecting potential violations of global data access rules.

  1. The checker only works in debug mode. To enable it, set debug=True when calling ti.init().
  2. Set validation=True when using ti.ad.Tape() to validate the kernels it captures.
    If a violation occurs, the checker pinpoints the line of code that breaks the rules.

For example:

import taichi as ti
ti.init(debug=True)

N = 5
x = ti.field(dtype=ti.f32, shape=N, needs_grad=True)
loss = ti.field(dtype=ti.f32, shape=(), needs_grad=True)
b = ti.field(dtype=ti.f32, shape=(), needs_grad=True)

@ti.kernel
def func_1():
    for i in range(N):
        loss[None] += x[i] * b[None]

@ti.kernel
def func_2():
    b[None] += 100

b[None] = 10
with ti.ad.Tape(loss, validation=True):
    func_1()
    func_2()

"""
taichi.lang.exception.TaichiAssertionError:
(kernel=func_2_c78_0) Breaks the global data access rule. Snode S10 is overwritten unexpectedly.
File "across_kernel.py", line 16, in func_2:
    b[None] += 100
    ^^^^^^^^^^^^^^
"""

Improvements

Performance

Improved Vulkan performance with loops (#6072) (by Lin Jiang)

Python Frontend

  • PrefixSumExecutor has been added to improve the performance of prefix-sum operations. The legacy prefix-sum function allocates auxiliary GPU buffers at every call, which causes a noticeable performance problem. The new PrefixSumExecutor avoids these repeated allocations: for arrays of the same length, the executor only needs to be initialized once and can then perform any number of prefix-sum operations without redundant field allocations. The prefix-sum operation is currently supported only on the CUDA backend. (#6132) (by Yu Zhang)

    Usage:

    import taichi as ti

    ti.init(arch=ti.cuda)  # the prefix-sum operation currently requires the CUDA backend

    N = 100
    dtype = ti.i32
    arr0 = ti.field(dtype, N)
    arr1 = ti.field(dtype, N)
    arr2 = ti.field(dtype, N)
    arr3 = ti.field(dtype, N)
    arr4 = ti.field(dtype, N)

    # initialize arr0, arr1, arr2, arr3, arr4, ...
    # ...

    # Perform an inclusive, in-place parallel prefix sum.
    # Only one executor is needed for a given array length.
    executor = ti.algorithms.PrefixSumExecutor(N)
    executor.run(arr0)
    executor.run(arr1)
    executor.run(arr2)
    executor.run(arr3)
    executor.run(arr4)
    
  • Runtime integer overflow detection for the addition, subtraction, multiplication, and left-shift operators on the Vulkan, CPU, and CUDA backends is now available when debug mode is on. To use overflow detection on the Vulkan backend, you need to enable printing, and overflow detection for 64-bit multiplication on Vulkan requires NVIDIA driver 510 or higher. (#6178) (#6279) (by Lin Jiang)

    For the following program:

    import taichi as ti

    ti.init(debug=True)

    @ti.kernel
    def add(a: ti.u64, b: ti.u64) -> ti.u64:
        return a + b

    add(2 ** 63, 2 ** 63)

    The following warning is printed at runtime:

    Addition overflow detected in File "/home/lin/test/overflow.py", line 7, in add:
        return a + b
               ^^^^^
    
  • Printing is now supported on the Vulkan backend on Unix/Windows platforms. To enable printing on the Vulkan backend, follow the instructions at https://docs.taichi-lang.org/docs/master/debugging#applicable-backends (#6075) (by Ailing)
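    For instance, a minimal sketch (assuming debug mode is the switch described in the linked instructions):

    import taichi as ti

    ti.init(arch=ti.vulkan, debug=True)

    @ti.kernel
    def greet():
        print("hello from the Vulkan backend")

    greet()
    ti.sync()  # make sure device-side prints are flushed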

GGUI

Taichi Examples

Three new examples from community contributors are also merged in this release. They include:

  • Animating the fundamental solution of the Laplace equation (#6249) (by @bismarckkk)
  • Animating the Kármán vortex street using LBM (#6249) (by @hietwl)
  • Animating the two-stream instability (#6249) (by JiaoLuhuai)

You can view these examples by running ti example in the terminal and selecting the corresponding index.

Important bug fixes

  • "ti.data_oriented" class instance now correctly releases its allocated memory upon garbage collection. (#6256) (by Zhanlue Yang)
  • "ti.fields" can now be correctly indexed using non-i32 typed indices. (#6276) (by Zhanlue Yang)
  • "ti.select" and "ti.ifte" can now be printed correctly in Taichi Kernels. (#6297) (by Zhanlue Yang)
  • Before this release, setting u64 arguments with numbers greater than 2^63 raised an error, and u64 return values were treated as i64 in Python (integers greater than 2^63 were returned as negative numbers). This release fixes both bugs. (#6267) (#6364) (by Lin Jiang)
  • Taichi now raises an error when the number of loop variables does not match the dimension of an ndrange for-loop, instead of malfunctioning. (#6360) (by Lin Jiang)
  • Calling ti.append with a vector/matrix now throws a proper error message. (#6322) (by Ailing)
  • Division on unsigned integers now works properly on LLVM backends. (#6128) (by Yi Xu)
  • Operator ">>=" now works properly. (#6153) (by Yi Xu)
  • Numpy int is now allowed for SNode shape setting. (#6211) (by Yi Xu)
  • Dimension check for GlobalPtrStmt is now aware of whether it is a cell access. (#6275) (by Yi Xu)
  • Before this release, Taichi autodiff could fail in cases where the condition of an if statement depends on the index of an outer for-loop. The bug has been fixed in this release. (#6207) (by Mingrui Zhang)

Full changelog:

  • [Error] Deprecate ndrange with number of the loop variables != the dimension of the ndrange (#6422) (by Lin Jiang)
  • Adjust aot_demo.sh (by jim19930609)
  • [error] Warn Linux users about manylinux2014 build on startup (#6416) (by Proton)
  • [misc] Bug fix (by jim19930609)
  • [misc] Bump version (by jim19930609)
  • [vulkan] [bug] Stop using the buffer device address feature on macOS (#6415) (by Yi Xu)
  • [Lang] [bug] Allow filling a field with Expr (#6391) (by Yi Xu)
  • [misc] Rc v1.2.0 cherry-pick PR number 2 (#6384) (by Zhanlue Yang)
  • [misc] Revert PR 6360 (#6386) (by Zhanlue Yang)
  • [misc] Rc v1.2.0 c1 (#6380) (by Zhanlue Yang)
  • [bug] Fix potential bug in #6362 (#6363) (#6371) (by Zhanlue Yang)
  • [example] Add example "laplace equation" (#6302) (by 猫猫子Official)
  • [ci] Android Demo: leave Docker containers intact for debugging (#6357) (by Proton)
  • [autodiff] Skip gradient kernel compilation for validation kernel (#6356) (by Mingrui Zhang)
  • [autodiff] Move autodiff gdar checker to release (#6355) (by Mingrui Zhang)
  • [aot] Removed constraint on same-allocation copy (#6354) (by PENGUINLIONG)
  • [ci] Add new performance monitoring (#6349) (by Proton)
  • [dx12] Only use llvm to compile dx12. (#6339) (by Xiang Li)
  • [opengl] Fix with_opengl when TI_WITH_OPENGL is off (#6353) (by Ailing)
  • [Doc] Add instructions about running clang-tidy checks locally (by Ailing Zhang)
  • [build] Enable readability-redundant-member-init in clang-tidy check (by Ailing Zhang)
  • [build] Enable TI_WITH_VULKAN and TI_WITH_OPENGL for clang-tidy checks (by Ailing Zhang)
  • [build] Enable a few modernize checks in clang-tidy (by Ailing Zhang)
  • [autodiff] Recover kernel autodiff mode after validation (#6265) (by Mingrui Zhang)
  • [test] Adjust rtol for sparse_linear_solver tests (#6352) (by Ailing)
  • [lang] MatrixType bug fix: Fix array indexing with MatrixType-index (#6323) (by Zhanlue Yang)
  • [Lang] MatrixNdarray refactor part13: Add scalarization for TernaryOpStmt (#6314) (by Zhanlue Yang)
  • [Lang] MatrixNdarray refactor part12: Add scalarization for AtomicOpStmt (#6312) (by Zhanlue Yang)
  • [build] Enable a few modernize checks in clang-tidy (by Ailing Zhang)
  • [build] Enable google-explicit-constructor check in clang-tidy (by Ailing Zhang)
  • [build] Enable google-build-explicit-make-pair check in clang-tidy (by Ailing Zhang)
  • [build] Enable a few bugprone related rules in clang-tidy (by Ailing Zhang)
  • [build] Enable modernize-use-override in clang-tidy (by Ailing Zhang)
  • [ci] Use .clang-tidy for check_static_analyzer job (by Ailing Zhang)
  • [mesh] Support arm64 backend for MeshTaichi (#6329) (by Chang Yu)
  • [lang] Throw proper error message if calling ti.append with vector/matrix (#6322) (by Ailing)
  • [aot] Fixed buffer device address import (#6326) (by PENGUINLIONG)
  • [aot] Fixed export of get_instance_proc_addr (#6324) (by PENGUINLIONG)
  • [build] Allow building test when LLVM is off (#6327) (by Ailing)
  • [bug] Fix generating LLVM AOT module for the second time failed (#6311) (by PGZXB)
  • [aot] Per-parameter documentation in C-API header (#6317) (by **P...

v1.1.3

20 Sep 06:06
1262a70

Highlights:

  • Aot module
    • Added texture interfaces to C-API (#5520) (by PENGUINLIONG)
  • Bug fixes
    • Disable vkCmdWriteTimestamp with MacOS to enable tests on Vulkan (#6020) (by Zhanlue Yang)
    • Fix printing i8/u8 (#5893) (by Yi Xu)
    • Fix wrong type cast in codegen of storing quant floats (#5818) (by Yi Xu)
    • Remove wrong optimization: Float x // 1 -> x (#5672) (by Yi Xu)
  • Build system
    • Clean up Taichi core cmake (#5595) (by Bo Qiao)
  • CI/CD workflow
    • Update torch and cuda version (#6054) (by pengyu)
  • Documentation
    • Refactor field (#6006) (by Zhao Liang)
    • Update docstring of pow() (#6046) (by Yi Xu)
    • Fix spelling of numerical and nightly in README.md (#6025) (by Lauchlin)
    • Added Accelerate Python (#5940) (by Vissidarte-Herman)
    • New FAQs added (#5784) (by Olinaaaloompa)
    • Update type cast (#5831) (by Zhao Liang)
    • Update global_settings.md (#5764) (by Zhao Liang)
    • Update init docstring (#5759) (by Zhao Liang)
    • Add introduction to quantized types (#5705) (by Yi Xu)
    • Add docs for GGUI's new features (#5647) (by Mocki)
    • Add introduction to forward mode autodiff (#5680) (by Mingrui Zhang)
    • Add doc about offline cache (#5646) (by Mingming Zhang)
    • Typo in the doc. (#5652) (by dongqi shen)
  • Error messages
    • Add error when breaking/continuing a static for inside non-static if (#5755) (by Lin Jiang)
    • Do not show warning when the offline cache path does not exist (#5747) (by Lin Jiang)
  • Language and syntax
    • Sort coo to build correct csr format sparse matrix on GPU (#6050) (by pengyu)
    • MatrixNdarray refactor part6: Add scalarization for LocalLoadStmt & GlobalLoadStmt with TensorType (#6024) (by Zhanlue Yang)
    • MatrixField refactor 4/n: Disallow invalid matrix field definition (#6074) (by Yi Xu)
    • Fixes matrix-vector multiplication (#6014) (by Mike He)
    • MatrixNdarray refactor part5: Add scalarization for LocalStoreStmt & GlobalStoreStmt with TensorType (#5946) (by Zhanlue Yang)
    • Deprecate SOA-layout for NdarrayMatrix/NdarrayVector (#6030) (by Zhanlue Yang)
    • Indexing for new local matrix implementation (#5783) (by Mike He)
    • Make scalar kernel arguments immutable (#5990) (by Lin Jiang)
    • Demote pow() with integer exponent (#6044) (by Yi Xu)
    • Support abs(i64) (#6018) (by Yi Xu)
    • MatrixNdarray refactor part4: Lowered TensorType to CHI IR level for elementwise-indexed MatrixNdarray (#5936) (by Zhanlue Yang)
    • MatrixNdarray refactor part3: Enable TensorType for MatrixNdarray at Frontend IR level (#5900) (by Zhanlue Yang)
    • Support linear system solving on GPU with cuSolver (#5860) (by pengyu)
    • MatrixNdarray refactor part2: Remove redundant members in python-scope AnyArray (#5885) (by Zhanlue Yang)
    • MatrixNdarray refactor part1: Refactor Taichi kernel argument to use TensorType (#5881) (by Zhanlue Yang)
    • MatrixNdarray refactor part0: Support direct TensorType construction in Ndarray and refactor use of element_shape (#5875) (by Zhanlue Yang)
    • Enable definition of local matrices/vectors (#5782) (by Mike He)
    • Build csr sparse matrix on GPU using coo format ndarray (#5838) (by pengyu)
    • Add @python_scope decorator for selected MatrixNdarray/VectorNdarray methods (#5844) (by Zhanlue Yang)
    • Make python scope comparison return 1 instead of -1 (#5840) (by daylily)
    • Allow implicit conversion of integer types in if conditions (#5763) (by daylily)
    • Support sparse matrix on GPU (#5185) (by pengyu)
    • Improve loop error message and remove the check for real type id (#5792) (by Zhao Liang)
    • Implement index validation for matrices/vectors (#5605) (by Mike He)
  • MeshTaichi
    • Fix nested mesh for (#6062) (by Chang Yu)
  • Vulkan backend
    • Track image layout internally (#5597) (by PENGUINLIONG)

Full changelog:

  • [bug] [gui] Fix a bug of drawing mesh instacing that cpu/cuda objects have an offset when copying to vulkan object (#6028) (by Mocki)
  • [bug] Fix cleaning cache failed (#6100) (by PGZXB)
  • [aot] Support multi-target builds for Apple M1 (#6083) (by PENGUINLIONG)
  • [spirv] [refactor] Rename debug_ segment to names_ (#6094) (by Ailing)
  • [dx12] Update codegen for range_for and mesh_for (#6092) (by Xiang Li)
  • [gui] Direct image presentation & faster direct copy routine (#6085) (by Bob Cao)
  • [vulkan] Support printing in debug mode on vulkan backend (#6075) (by Ailing)
  • [bug] Fix crashing when loading old offline cache files (#6089) (by PGZXB)
  • [ci] Update prebuild binary for llvm 15. (#6091) (by Xiang Li)
  • [example] Add RHI examples (#5969) (by Bob Cao)
  • [aot] Pragma once in taichi.cpp (#6088) (by PENGUINLIONG)
  • [Lang] Sort coo to build correct csr format sparse matrix on GPU (#6050) (by pengyu)
  • [build] Refactor test infrastructure for AOT tests (#6064) (by Zhanlue Yang)
  • [Lang] MatrixNdarray refactor part6: Add scalarization for LocalLoadStmt & GlobalLoadStmt with TensorType (#6024) (by Zhanlue Yang)
  • [Lang] MatrixField refactor 4/n: Disallow invalid matrix field definition (#6074) (by Yi Xu)
  • [bug] Remove unnecessary lower() in AotModuleBuilder::add (#6068) (by PGZXB)
  • [lang] Preserve shape info for Vectors (#6076) (by Mike He)
  • [misc] Simplify PR template (#6063) (by Ailing)
  • [Bug] Disable vkCmdWriteTimestamp with MacOS to enable tests on Vulkan (#6020) (by Zhanlue Yang)
  • [bug] Set cfg.offline_cache after reset() (#6073) (by PGZXB)
  • [ci] [dx12] Enable dx12 build for windows cpu ci. (#6069) (by Xiang Li)
  • [ci] Upgrade conda cudatoolkit version to 11.3 (#6070) (by Proton)
  • [Mesh] [bug] Fix nested mesh for (#6062) (by Chang Yu)
  • [Lang] Fixes matrix-vector multiplication (#6014) (by Mike He)
  • [ir] MatrixField refactor 3/n: Add MatrixFieldExpression (#6010) (by Yi Xu)
  • [dx12] Drop code for llvm passes which prepare for DXIL generation. (#5998) (by Xiang Li)
  • [aot] Guard C-API interfaces with try-catch (#6060) (by PENGUINLIONG)
  • [CI] Update torch and cuda version (#6054) (by pengyu)
  • [Lang] MatrixNdarray refactor part5: Add scalarization for LocalStoreStmt & GlobalStoreStmt with TensorType (#5946) (by Zhanlue Yang)
  • [Lang] Deprecate SOA-layout for NdarrayMatrix/NdarrayVector (#6030) (by Zhanlue Yang)
  • [aot] Dump required device capability in AOT module meta (#6056) (by PENGUINLIONG)
  • [Doc] Refactor field (#6006) (by Zhao Liang)
  • [Lang] Indexing for new local matrix implementation (#5783) (by Mike He)
  • [lang] Reformat source indicator in Python convention (#6053) (by PENGUINLIONG)
  • [misc] Enable offline cache in frontend instead of C++ Side (#6051) (by PGZXB)
  • [lang] Remove redundant codegen of integer pow (#6048) (by Yi Xu)
  • [Doc] Update docstring of pow() (#6046) (by Yi Xu)
  • [Lang] Make scalar kernel arguments immutable (#5990) (by Lin Jiang)
  • [build] Fix compile error on gcc (#6047) (by PGZXB)
  • [llvm] [refactor] Split LLVMCompiledData of kernels and tasks (#6019) (by Lin Jiang)
  • [Lang] Demote pow() with integer exponent (#6044) (by Yi Xu)
  • [doc] Refactor type system (#5984) (by Zhao Liang)
  • [test] Change deprecated make_camera() to Camera() (#6009) (by Zihua Wu)
  • [doc] Fix a typo in README.md (#6033) (by OccupyMars2025)
  • [misc] Lazy load spirv code from disk during offline cache (#6000) (by PGZXB)
  • [aot] Fixed compilation on Linux distros (#6043) (by PENGUINLIONG)
  • [bug] [test] Run C-API tests correctly on Windows (#6038) (by PGZXB)
  • [aot] C-API texture support and tests (#5994) (by PENGUINLIONG)
  • [Doc] Fix spelling of numerical and nightly in README.md (#6025) (by Lauchlin)
  • [doc] Fixed a format issue (#6023) (by Vissidarte-Herman)
  • [doc] Indenting (#6022) (by Vissidarte-Herman)
  • [Lang] Support abs(i64) (#6018) (by Yi Xu)
  • [lang] Merge ti_core.make_index_expr and ti_core.subscript (#5993) (by Zhanlue Yang)
  • [llvm] [refactor] Remove the use of vector with size=1 (#6002) (by Lin Jiang)
  • [bug] [test] Fix patch_os_environ_helper (#6017) (by Lin Jiang)
  • [ci] Remove legacy perf monitoring (to be reworked) (#6015) (by Proton)
  • Fix (#5999) (by PGZXB)
  • [doc] Format updates (#6016) (by Olinaaaloompa)
  • [refactor] Turn on torch_io tests for opengl, vulkan and dx11 backend (#5997) (by Ailing)
  • [ci] Adjust Windows GPU task buildbot tag (#6008) (by Proton)
  • Fixed compilation (#6005) (by PENGUINLIONG)
  • [autodiff] Avoid initializing Field with None (#6007) (by Yi Xu)
  • [doc] Cloth simulation tutorial (#6004) (by Olinaaaloompa)
  • [Lang] MatrixNdarray refactor part4: Lowered TensorType to CHI IR level for elementwise-indexed MatrixNdarray (#5936) (by Zhanlue Yang)
  • [llvm] [refactor] Link modules instead of cloning modules (#5962) (by Lin Jiang)
  • [dx12] Drop code for dxil generation. (#5958) (by Xiang Li)
  • [ci] Windows Build: Use PowerShell 7 (pwsh) (#5996) (by Proton)
  • Use CUDA primary context to work with PyTorch and Numba. (#5992) (by Haidong Lan)
  • [vulkan] Implement offline cache cleaning on vulkan (#5968) (by PGZXB)
  • [ir] MatrixField refactor 2/n: Rename GlobalVariableExpression to FieldExpression (#5989) (by Yi Xu)
    -...

v1.1.2

18 Aug 13:20

This is a bug fix release for v1.1.0.
Full changelog:

  • [misc] Bump version to v1.1.2
  • [Bug] [type] Fix wrong type cast in codegen of storing quant floats (#5818)
  • [bug] Fix incorrect autodiff_mode information in offline cache key (#5737)
  • [Error] Do not show warning when the offline cache path does not exist (#5747)
  • [autodiff] Support shift ptr in dynamic index (#5770)

v1.1.0

10 Aug 17:57
f5bb646

Highlights

New features

Quantized data types

High-resolution simulations can deliver great visual quality, but are often limited by the capacity of the onboard GPU memory. This release adds quantized data types, allowing you to define your own integers, fixed-point numbers, or floating-point numbers with an arbitrary number of bits, so you can strike a balance between your hardware limits and simulation quality. See Using quantized data types for a comprehensive introduction.
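The sketch below illustrates the idea with the documented ti.types.quant and ti.BitpackedFields APIs; the specific bit widths (a 19-bit unsigned integer plus a 13-bit fixed-point number packed into one 32-bit word) are only an illustration:

import taichi as ti

ti.init()

u19 = ti.types.quant.int(bits=19, signed=False)      # 19-bit unsigned integer
fixed13 = ti.types.quant.fixed(frac=13, range=2.0)   # 13-bit fixed-point number in [-2, 2)

x = ti.field(dtype=u19)
y = ti.field(dtype=fixed13)

# Pack both quantized members into a single 32-bit word.
bitpack = ti.BitpackedFields(max_num_bits=32)
bitpack.place(x, y)
ti.root.dense(ti.i, 8).place(bitpack)

@ti.kernel
def fill():
    for i in range(8):
        x[i] = i
        y[i] = 0.1 * i

fill()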

Offline cache

A Taichi kernel is implicitly compiled the first time it is called. The compilation results are kept in an online in-memory cache to reduce the overhead of subsequent calls: as long as the kernel function is unchanged, it can be loaded and launched directly. The cache, however, is no longer available once the program terminates, so running the program again forces Taichi to re-compile all kernel functions and reconstruct the in-memory cache, making the first launch of a Taichi kernel slow due to the compilation overhead.
To address this problem, this release adds the offline cache feature, which dumps the compilation cache to disk for future runs, drastically reducing the first-launch overhead in subsequent runs. Taichi now constructs and maintains an offline cache by default.
The following table shows the launch overhead of running cornell_box on the CUDA backend with and without offline cache:

Time spent on compilation and cached data loading:

  • Offline cache disabled: 24.856s
  • Offline cache enabled (1st run): 25.435s
  • Offline cache enabled (2nd run): 0.677s

Note that, for now, the offline cache feature works only on the CPU and CUDA backends. If your code behaves abnormally, disable offline cache by setting the environment variable TI_OFFLINE_CACHE=0 or ti.init(offline_cache=False) and file an issue with us on Taichi's GitHub repo. See Offline cache for more information.

Forward-mode automatic differentiation

Adds forward-mode automatic differentiation via ti.ad.FwdMode. Unlike the existing reverse-mode automatic differentiation, which computes the vector-Jacobian product (vJp), forward mode computes the Jacobian-vector product (Jvp) when evaluating derivatives. Forward-mode automatic differentiation is therefore much more efficient when a function has more outputs than inputs. Read this example, which demonstrates Jacobian matrix computation in forward mode and reverse mode.
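A minimal sketch of the interface (the field shapes and seed vector below are illustrative):

import taichi as ti

ti.init()

N = 3
x = ti.field(dtype=ti.f32, shape=N, needs_dual=True)
y = ti.field(dtype=ti.f32, shape=(), needs_dual=True)

@ti.kernel
def compute_y():
    for i in range(N):
        y[None] += x[i] ** 2

for i in range(N):
    x[i] = float(i + 1)

# Forward mode evaluates a Jacobian-vector product; `seed` is the input tangent vector.
with ti.ad.FwdMode(loss=y, param=x, seed=[1.0, 0.0, 0.0]):
    compute_y()

print(y.dual[None])  # expected: dy/dx[0] = 2 * x[0] = 2.0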

SharedArray (experimental)

A GPU's shared memory is a small but fast memory region visible to all threads within a thread block (or a workgroup in Vulkan). It is widely used in scenarios where performance is a crucial concern. To give you access to your GPU's shared memory, this release adds the SharedArray API under the namespace ti.simt.block.
The following diagram illustrates the performance benefits of Taichi's SharedArray. With SharedArray, Taichi Lang performs comparably to, or even outperforms, the equivalent CUDA code.

n-body benchmarking
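A minimal sketch of using a SharedArray inside a kernel on a GPU backend (the block size and the neighbor-averaging pattern are illustrative only):

import taichi as ti

ti.init(arch=ti.cuda)

N = 1024
src = ti.field(ti.f32, shape=N)
dst = ti.field(ti.f32, shape=N)

@ti.kernel
def smooth():
    ti.loop_config(block_dim=64)
    for i in range(N):
        tid = i % 64
        # Stage a tile of `src` in block-shared memory.
        pad = ti.simt.block.SharedArray((64,), ti.f32)
        pad[tid] = src[i]
        ti.simt.block.sync()
        # Each thread reads its neighbor within the tile from shared memory.
        dst[i] = 0.5 * (pad[tid] + pad[(tid + 1) % 64])

smooth()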

Texture (experimental)

Taichi now supports texture bilinear sampling and raw texel fetch on both the Vulkan and OpenGL backends. This feature leverages the hardware texture unit and removes the need to hand-write bilinear interpolation code in image processing tasks. It also provides an easy way to do texture mapping in tasks such as rasterization or ray-tracing. On the Vulkan backend, Taichi additionally supports image load and store: you can directly manipulate the texels of an image and use that image in subsequent texture mapping.

Note that the current texture and image APIs are in the early stages and subject to change. In the future we plan to support bindless textures to extend to tasks such as ray-tracing. We also plan to extend full texture support to all backends that support texture APIs.

Run ti example simple_texture to see an example of texture support!

Improvements

GGUI

  • Supports fetching and storing the depth information of the current scene:
    • In a Taichi field: ti.ui.Window.get_depth_buffer(field);
    • In a NumPy array: ti.ui.Window.get_depth_buffer_as_numpy().
  • Supports drawing 3D lines using Scene.lines(vertices, width).
  • Supports drawing mesh instances. You can pass a list of transformation matrices (ti.Matrix.field(4, 4, ti.f32, shape=N)) and call ti.ui.Scene.mesh_instance(vertices, transforms=TransformMatrixField) to put various mesh instances at different places.
  • Supports showing the wireframe of a mesh when calling Scene.mesh() or Scene.mesh_instance() by setting show_wireframe=True.
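Putting these GGUI additions together, a minimal sketch (the line endpoints, camera setup, and the point at which the depth buffer is fetched are illustrative):

import taichi as ti

ti.init(arch=ti.vulkan)

# Two line segments, two endpoints each.
verts = ti.Vector.field(3, ti.f32, shape=4)
verts[0] = ti.Vector([0.0, 0.0, 0.0])
verts[1] = ti.Vector([1.0, 0.0, 0.0])
verts[2] = ti.Vector([0.0, 1.0, 0.0])
verts[3] = ti.Vector([1.0, 1.0, 0.0])

window = ti.ui.Window("GGUI lines + depth buffer", (640, 480))
canvas = window.get_canvas()
scene = ti.ui.Scene()
camera = ti.ui.Camera()
camera.position(0.5, 0.5, 2.0)
camera.lookat(0.5, 0.5, 0.0)

while window.running:
    scene.set_camera(camera)
    scene.ambient_light((0.8, 0.8, 0.8))
    scene.lines(verts, width=2.0)
    canvas.scene(scene)
    depth = window.get_depth_buffer_as_numpy()  # depth of the current scene as a NumPy array
    window.show()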

Syntax

  • Taichi dataclass: Taichi now recommends using the @ti.dataclass decorator to define struct types, or even attach functions to them. See Taichi dataclasses for more information.

    import math

    import taichi as ti
    from taichi.math import vec3

    ti.init()

    @ti.dataclass
    class Sphere:
        center: vec3
        radius: ti.f32

        @ti.func
        def area(self):
            # a function to run in taichi scope
            return 4 * math.pi * self.radius * self.radius

        def is_zero_sized(self):
            # a python scope function
            return self.radius == 0.0
  • As shown in the dataclass example above, vec2, vec3, and vec4 in the taichi.math module (same for ivec and uvec) can be directly used as type hints. The numeric precision of these types is determined by default_ip or default_fp in ti.init().

  • More flexible instantiation for a struct or dataclass:
    In earlier releases, to instantiate a taichi.types.struct or a taichi.dataclass, you had to explicitly provide a complete list of member-value pairs, like:

    ray = Ray(ro=vec3(0), rd=vec3(1, 0, 0), t=1.0)

    As of this release, you are given more options. The positional arguments are passed to the struct members in the order they are defined; the keyword arguments set the corresponding struct members. Unspecified struct members are automatically set to zero. For example:

    # use positional arguments to set struct members in order
    ray = Ray(vec3(0), vec3(1, 0, 0), 1.0)
    
    # ro is set to vec3(0) and t will be set to 0
    ray = Ray(vec3(0), rd=vec3(1, 0, 0))
    
    # both ro and rd are set to vec3(0)
    ray = Ray(t=1.0)
    
    # ro is set to vec3(1), rd=vec3(0) and t=0.0
    ray = Ray(1)
    
    # all members are set to 0.
    ray = Ray()
  • Supports calling fill() from both the Python scope and the Taichi scope.
    In earlier releases, you could only call fill() from the Python scope, where it is a method of the ScalarField or MatrixField class. As of this release, you can call this method from either the Python scope or the Taichi scope. See the following code snippet:

    x = ti.field(int, shape=(10, 10))
    x.fill(1)
    
    @ti.kernel
    def test():
        x.fill(-1)
  • More flexible initialization for customized matrix types:
    As the following code snippet shows, matrix types created using taichi.types.matrix() or taichi.types.vector() can be initialized more flexibly: Taichi automatically combines the inputs and converts them to a matrix whose shape matches the shape of the target matrix type.

    # mat2 and vec3 are predefined types in the ti.math module
    mat2 = ti.types.matrix(2, 2, float)
    vec3 = ti.types.vector(3, float)
    
    m = mat2(1)  # [[1., 1.], [1., 1.]]
    m = mat2(1, 2, 3, 4)  # [[1., 2.], [3, 4.]]
    m = mat2([1, 2], [3, 4])  # [[1., 2.], [3, 4.]]
    m = mat2([1, 2, 3, 4])  # [[1., 2.], [3, 4.]]
    v = vec3(1, 2, 3)
    m = mat2(v, 4)  # [[1., 2.], [3, 4.]]
  • Makes ti.f32(x) syntax sugar for ti.cast(x, ti.f32), if x is neither a literal nor of a compound data type. Same for other primitive types such as ti.i32, ti.u8, or ti.f64.

  • More convenient axes order adjustment: A common way to improve the performance of a Taichi program is to adjust the order of axes when laying out field data in memory. In earlier releases, this required in-depth knowledge of the data definition language (the SNode system) and could become an extra burden in situations where sparse data structures are not required. As of this release, Taichi supports specifying the order of axes when defining a Taichi field.

    # Before
    x = ti.field(ti.i32)
    y = ti.field(ti.i32)
    ti.root.dense(ti.i, M).dense(ti.j, N).place(x)  # row-major
    ti.root.dense(ti.j, N).dense(ti.i, M).place(y)  # column-major
    # New syntax
    x = ti.field(ti.i32, shape=(M, N), order='ij')
    y = ti.field(ti.i32, shape=(M, N), order='ji')
    # SoA vs. AoS example
    p = ti.Vector.field(3, ti.i32, shape=(M, N), order='ji', layout=ti.Layout.SOA)
    q = ti.Vector.field(3, ti.i32, shape=(M, N), order='ji', layout=ti.Layout.AOS)

Important bug fixes

  • Fixed infinite loop when an integer pow() has a negative exponent (#5275)
  • Fixed numerical issues with matrix slicing (#4677)
  • Improved data type checks for ti.ndrange (#4478)

API changes

Added

  • ti.BitpackedFields
  • `ti.from_p...

v1.0.4

12 Jul 11:44

Highlights:

  • Documentation
    • Fix typos (#5283) (by Kian-Meng Ang)
    • Update dev_install.md (#5266) (by Vissidarte-Herman)
    • Updated README command lines (#5199) (by Vissidarte-Herman)
    • Modify compilation warnings (#5180) (by Olinaaaloompa)
    • Updated odop.md, removing obsolete information (#5163) (by Vissidarte-Herman)
  • Language and syntax
    • Refine SNode with quant 7/n: Support placing QuantFixedType under quant_array (#5386) (by Yi Xu)
    • Add determinant for 1d case (#5375) (by Zhao Liang)
    • Make floor, ceil and round accept a dtype optional argument (#5307) (by Zhao Liang)
    • Rename struct_class to dataclass (#5365) (by Zhao Liang)
    • Improve ti example so that users can choose which example to run by entering numbers. (#5265) (by Zhao Liang)
    • Refine SNode with quant 5/n: Rename bit_array to quant_array (#5344) (by Yi Xu)
    • Make bit_vectorize a parameter of ti.loop_config (#5334) (by Yi Xu)
    • Refine SNode with quant 3/n: Turn bit_vectorize into an on/off switch (#5331) (by Yi Xu)
    • Add errror message for missing init call (#5280) (by Zhao Liang)
    • Fix fractal gui close warning (#5281) (by Zhao Liang)
    • Refine SNode with quant 2/n: Enable struct for on bit_array with bit_vectorize off (#5253) (by Yi Xu)
    • Refactor indexing expressions in AST & enforce integer indices (#5138) (by daylily)

Full changelog:

  • Revert "[llvm] (Decomp of #5251 11/n) Enable parallel compilation on CPU backend (#5394)" (by Proton)
  • [refactor] Default dtype of ndarray type should be None instead of f32 (#5391) (by Ailing)
  • [llvm] (Decomp of #5251 11/n) Enable parallel compilation on CPU backend (#5394) (by Lin Jiang)
  • [gui] [vulkan] Surpport for python users to control the start index and count number of particles & meshes data. (#5388) (by Mocki)
  • [autodiff] Support binary operators for forward mode (#5389) (by Mingrui Zhang)
  • [llvm] (Decomp of #5251 10/n) Make SNode tree compatible with parallel compilation (#5390) (by Lin Jiang)
  • [llvm] [refactor] (Decomp of #5251 9/n) Refactor CodeGen to support parallel compilation on LLVM backend (#5387) (by Lin Jiang)
  • [Lang] [type] Refine SNode with quant 7/n: Support placing QuantFixedType under quant_array (#5386) (by Yi Xu)
  • [llvm] [refactor] (Decomp of #5251 8/n) Refactor KernelCacheData (#5383) (by Lin Jiang)
  • [cuda] [type] Refine SNode with quant 6/n: Support __ldg for loading QuantFixedType and QuantFloatType (#5374) (by Yi Xu)
  • [doc] Add simt functions in operators (#5333) (by Bo Qiao)
  • [Lang] Add determinant for 1d case (#5375) (by Zhao Liang)
  • [lang] Texture image load store support (#5317) (by Bob Cao)
  • [bug] Cast scalar to right type before converting to uint64 (by Ailing Zhang)
  • [refactor] Check dtype mismatch in cgraph compilation and runtime (by Ailing Zhang)
  • [refactor] Check field_dim mismatch in cgraph compilation and runtime (by Ailing Zhang)
  • [test] Check repeated arg names in cgraph (by Ailing Zhang)
  • [llvm] [refactor] (Decomp of #5251 6/n) Let ModuleToFunctionConverter support multiple modules (#5372) (by Lin Jiang)
  • [Lang] Make floor, ceil and round accept a dtype optional argument (#5307) (by Zhao Liang)
  • [refactor] Rename the confused needs_grad (#5359) (by Mingrui Zhang)
  • [autodiff] Support unary ops for forward mode (#5366) (by Mingrui Zhang)
  • [llvm] (Decomp of #5251 7/n) Change the way to record the time of offline cache (#5373) (by Lin Jiang)
  • [llvm] (Decomp of #5251 5/n) Add the parallel compilation worker to LlvmProgramImpl (#5364) (by Lin Jiang)
  • [gui] [test] Fix bug in test_ggui.py when some pc env do not surrport ggui (#5370) (by Mocki)
  • [Lang] Rename struct_class to dataclass (#5365) (by Zhao Liang)
  • [llvm] Drop code for llvm 15. (#5313) (by Xiang Li)
  • [llvm] [aot] Rewrite LLVM AOT tests with LlvmRuntimeExecutor (#5358) (by Zhanlue Yang)
  • [example] Avoid f64 type in simulation/initial_value_problem.py (#5355) (by Proton)
  • [ci] testing: add retention-days for broken wheels (#5326) (by Proton)
  • [test] (Decomp of #5251 4/n) Delete tests for AsyncTaichi (#5357) (by Lin Jiang)
  • [llvm] [refactor] (Decomp of #5251 2/n) Make modulegen a virtual function and let LLVMCompiledData replace ModuleGenValue (#5353) (by Lin Jiang)
  • [gui] Support exporting gif && video in GGUI (#5354) (by Mocki)
  • [autodiff] Handle field accessing by zero for forward mode (#5339) (by Mingrui Zhang)
  • [llvm] [refactor] (Decomp of #5251 3/n) Remove codegen from OffloadedTask and let it replace OffloadedTaskCacheData (#5356) (by Lin Jiang)
  • [refactor] Turn off stack traceback info by default (#5347) (by Ailing)
  • [refactor] (Decomp of #5251 1/n) Move ParallelExecutor out of async engine (#5351) (by Lin Jiang)
  • [Lang] Improve ti example so that users can choose which example to run by entering numbers. (#5265) (by Zhao Liang)
  • [gui] Add get_view_matrix() and get_projection_matrix() APIs for camera (#5345) (by Mocki)
  • [bug] Added warning messages for implicit type conversion for RangeFor boundaries (#5322) (by Zhanlue Yang)
  • [example] Fix simulation/waterwave.py:update race condition (#5346) (by Proton)
  • [Lang] [type] Refine SNode with quant 5/n: Rename bit_array to quant_array (#5344) (by Yi Xu)
  • [llvm] [aot] Added CGraph tests for LLVM backend (#5305) (by Zhanlue Yang)
  • [autodiff] [test] Add for-loop tests for forward mode (#5336) (by Mingrui Zhang)
  • [example] Lower example GUI resolution to fit buildbot display (#5337) (by Proton)
  • [build] [bug] Fix building on macOS 10.14 failed (#5332) (by PGZXB)
  • [llvm] [aot] Replaced LlvmProgramImpl with LlvmRuntimeExecutor for LlvmAotModuleLoader (#5330) (by Zhanlue Yang)
  • [AOT] Fixed certain crashes in C-API (#5335) (by PENGUINLIONG)
  • [Lang] [type] Make bit_vectorize a parameter of ti.loop_config (#5334) (by Yi Xu)
  • [autodiff] Skip store forwarding to keep the GlobalLoadStmt alive (#5315) (by Mingrui Zhang)
  • [llvm] [aot] RModified ModuleToFunctionConverter to use LlvmRuntimeExecutor instead of LlvmProgramImpl (#5328) (by Zhanlue Yang)
  • [llvm] Changed LlvmProgramImpl to save cache_data_ with unique_ptr instead of raw object (#5329) (by Zhanlue Yang)
  • [Lang] [type] Refine SNode with quant 3/n: Turn bit_vectorize into an on/off switch (#5331) (by Yi Xu)
  • [misc] Fix a few compilation warnings (#5325) (by yekuang)
  • [bug] Accept numpy integers in ndrange (#5245) (#5323) (by Proton)
  • [misc] Implement cache file cleaning (#5310) (by PGZXB)
  • Fixed C-AP build on Android (#5321) (by PENGUINLIONG)
  • [AOT] Save AOT module artifacts as zip archive (#5316) (by PENGUINLIONG)
  • [llvm] [aot] Added LLVM backend support for Compute Graph (#5294) (by Zhanlue Yang)
  • [AOT] Unity native plugin interfaces (#5273) (by PENGUINLIONG)
  • [autodiff] Check not placed field.grad when needs_grad = True (#5295) (by Mingrui Zhang)
  • [autodiff] Fix alloca block and add control flow test case for forward mode (#5301) (by Mingrui Zhang)
  • [refactor] Synchronize should always be called in non-async mode (#5302) (by Ailing)
  • [Lang] Add errror message for missing init call (#5280) (by Zhao Liang)
  • Update prtags.json (#5304) (by Bob Cao)
  • [refactor] Get rid ndarray host accessor kernels (by Ailing Zhang)
  • [refactor] Use device api for CPU/CUDA ndarray (by Ailing Zhang)
  • [refactor] Switch to using staging buffer for metal/vulkan/opengl (by Ailing Zhang)
  • [llvm] Use LlvmProgramImpl::cache_data_ to store compiled kernel info (#5290) (by Zhanlue Yang)
  • [opengl] Texture support in OpenGL (#5296) (by Bob Cao)
  • [build] [refactor] Cleanup backends folder and rename to RHI (#5288) (by Bo Qiao)
  • [Lang] Fix fractal gui close warning (#5281) (by Zhao Liang)
  • [autodiff] [test] Add atomic test for forward autodiff (#5286) (by Mingrui Zhang)
  • [dx11] Fix DX backend with new runtime & Better D3D11 buffer handling (#5244) (by Bob Cao)
  • [autodiff] Set default seed only for scalar parameter to avoid silent unexpected results (#5287) (by Mingrui Zhang)
  • test (#5292) (by Ailing)
  • [AOT] Added C-API for on-device memory copy (#5271) (by PENGUINLIONG)
  • [Doc] Fix typos (#5283) (by Kian-Meng Ang)
  • [autodiff] Support control flow for forward mode (by mingrui)
  • [autodiff] Support for-loop and mutation for forward mode (by mingrui)
  • [autodiff] Refactor dual field allocation (by mingrui)
  • [AOT] Refactor C-API codegen (#5272) (by PENGUINLIONG)
  • Update README.md (#5279) (by Taichi contributor)
  • [metal] Support memcpy_internal via buffer_copy (#5268) (by Ailing)
  • [bug] Fix missing old but useful metadata in offline cache (#5267) (by PGZXB)
  • [Lang] [type] Refine SNode with quant 2/n: Enable struct for on bit_array with bit_vectorize off (#5253) (by Yi Xu)
  • [Doc] Update dev_install.md (#5266) (by Vissidarte-Herman)
  • [build] [bug] Fix dependency for opengl_rhi target (by Bo Qiao)
  • Update fallback order, move opengl behind Vulkan (#5257) (by Bob Cao)
  • [opengl] Move OpenGL backend onto Gfx runtime (#5246) (by Bob Cao)
  • [build] [refactor] Move LLVM source files to target locations (#5254) (by Bo Qiao)
  • [bug] Fixed misuse of std::forward (#5237) (by Zhanlue Yang)
  • [AOT] Added safety checks to prevent hard crashes on failure (#5249) (...

v1.0.3

13 Jun 23:07
fae94a2

Highlights:

  • Aot module
    • Support importing external Vulkan buffers (#5020) (by PENGUINLIONG)
    • Supported inclusion of taichi as subdirectory for AOT modules (#5007) (by PENGUINLIONG)
  • Bug fixes
    • Fix frontend type check for reading a whole bit_struct (#5027) (by Yi Xu)
    • Remove redundant AllocStmt when lowering FrontendWhileStmt (#4870) (by Zhanlue Yang)
  • Build system
    • Improve Windows build script (#4955) (by PENGUINLIONG)
    • Improved building on Windows (#4925) (by PENGUINLIONG)
    • Define Cmake OpenGL runtime target (#4887) (by Bo Qiao)
    • Use keywords instead of plain target_link_libraries CMake (#4864) (by Bo Qiao)
    • Define runtime build target (#4838) (by Bo Qiao)
    • Switch to scikit-build as the build backend (#4624) (by Frost Ming)
  • Documentation
    • Improve ODOP doc structure (#5089) (by Yi Xu)
    • Add documentation of Taichi Struct Classes. (#5075) (by bsavery)
    • Updated type system (#5054) (by Vissidarte-Herman)
    • Branding updates. Also tests netlify. (#4994) (by Vissidarte-Herman)
    • Fix netlify cache & sync doc without pr content (#5003) (by Justin)
    • Update trouble shooting URL in bug report template (#4988) (by Haidong Lan)
    • Updated URL (#4990) (by Vissidarte-Herman)
    • Fix docs deploy netlify test configuration (#4991) (by Justin)
    • Updated relative path (#4929) (by Vissidarte-Herman)
    • Updated broken links (#4912) (by Vissidarte-Herman)
    • Updated links that may break. (#4874) (by Vissidarte-Herman)
    • Add limitation about TLS optimization (#4877) (by Ailing)
  • Examples
    • Fix block_dim warning in ggui (#5128) (by Zhao Liang)
    • Update visual effects of mass_spring_3d_ggui.py (#5081) (by Zhao Liang)
    • Update mass_spring_3d_ggui.py to v2 (#3879) (by Alex Brown)
  • Language and syntax
    • Add more initialization routines for glsl matrix types (#5069) (by Zhao Liang)
    • Support constructing vector and matrix ndarray from ti.ndarray() (by ailzhang)
    • Disallow reading a whole bit_struct (#5061) (by Yi Xu)
    • Struct Classes implementation (#4989) (by bsavery)
    • Add short-circuit if-then-else operator (#5022) (by daylily)
    • Build sparse matrix from ndarray (#4841) (by pengyu)
    • Fix potential precision bug when using math vector and matrix types (#5032) (by Zhao Liang)
    • Refactor quant type definition APIs (#5036) (by Yi Xu)
    • Fix parameter name 'range' for ti.types.quant.fixed (#5006) (by Yi Xu)
    • Refactor quantized_types module and make quant APIs public (#4985) (by Yi Xu)
    • Add more functions to math module (#4939) (by Zhao Liang)
    • Support sparse matrix datatype and storage format configuration (#4673) (by pengyu)
    • Copy-free interaction between Taichi and PaddlePaddle (#4886) (by 0xzhang)
  • LLVM backend (CPU and CUDA)
    • Add AOT builder and loader (#5013) (by yekuang)
  • Metal backend
    • Support Ndarray (#4720) (by yekuang)
  • RFC
    • AOT for all SNodes (#4806) (by yekuang)
  • SIMT programming
    • Add match_all warp intrinsics (#4961) (by Zeyu Li)
    • Add match_any warp intrinsics (#4921) (by Zeyu Li)
    • Add uni_sync warp intrinsics (#4927) (by 0xzhang)
    • Add activemask warp intrinsics (#4918) (by Zeyu Li)
    • Add syncwarp warp intrinsics (#4917) (by Zeyu Li)
  • Vulkan backend
    • Fixed vulkan backend crash on AOT examples (#5047) (by PENGUINLIONG)
  • GitHub Actions/Workflows
    • Update release_test.sh (#4960) (by Chuandong Yan)

Full changelog:

  • [aot] [llvm] LLVM AOT Field #2: Updated LLVM AOTModuleLoader & AOTModuleBuilder to support Fields (#5120) (by Zhanlue Yang)
  • [type] [refactor] Misc improvements to quant codegen (#5129) (by Yi Xu)
  • [ci] Enable yapf and isort on example files (#5140) (by Ailing)
  • [Example] Fix block_dim warning in ggui (#5128) (by Zhao Liang)
  • fix mass_spring_3d_ggui backend (#5127) (by Zhao Liang)
  • [lang] Texture support 0/n: IR changes (#5134) (by Bob Cao)
  • Editorial update (#5119) (by Olinaaaloompa)
  • [aot] [llvm] LLVM AOT Field #1: Adjust serialization/deserialization logics for FieldCacheData (#5111) (by Zhanlue Yang)
  • [aot][bug] Use cached compiled kernel pointer when it's added to graph (#5122) (by Ailing)
  • [aot] [llvm] LLVM AOT Field #0: Implemented FieldCacheData & refactored initialize_llvm_runtime_snodes() (#5108) (by Zhanlue Yang)
  • [autodiff] Add forward mode pipeline for autodiff pass (#5098) (by Mingrui Zhang)
  • [build] [refactor] Move Vulkan runtime out of backends dir (#5106) (by Bo Qiao)
  • [bug] Fix build without llvm backend crash (#5113) (by Bo Qiao)
  • [type] [llvm] [refactor] Fix function names in codegen_llvm_quant (#5115) (by Yi Xu)
  • [llvm] [refactor] Replace cast_int() with LLVM native integer cast (#5110) (by Yi Xu)
  • [type] [refactor] Remove redundant promotion for custom int in type_check (#5102) (by Yi Xu)
  • [Example] Update visual effects of mass_spring_3d_ggui.py (#5081) (by Zhao Liang)
  • [test] Save mpm88 graph in python and load in C++ test. (#5104) (by Ailing)
  • [llvm] [refactor] Move load_bit_pointer() to CodeGenLLVM (#5099) (by Yi Xu)
  • [refactor] Remove ndarray element shape from extra arg buffer (#5100) (by Haidong Lan)
  • [refactor] Update Ndarray constructor used in AOT runtime. (#5095) (by Ailing)
  • clean hidden override functions (#5097) (by Mingrui Zhang)
  • [llvm] [aot] CUDA-AOT PR #2: Implemented AOTModuleLoader & AOTModuleBuilder for LLVM-CUDA backend (#5087) (by Zhanlue Yang)
  • [Doc] Improve ODOP doc structure (#5089) (by Yi Xu)
  • Use pre-calculated runtime size array for gfx runtime. (#5094) (by Haidong Lan)
  • [bug] Minor fix for ndarray element_shape in graph mode (#5093) (by Ailing)
  • [llvm] [refactor] Use LLVM native atomic ops if possible (#5091) (by Yi Xu)
  • [autodiff] Extract shared components for reverse and forward mode (#5088) (by Mingrui Zhang)
  • [llvm] [aot] Add LLVM-CPU AOT tests (#5079) (by Zhanlue Yang)
  • [Doc] Add documentation of Taichi Struct Classes. (#5075) (by bsavery)
  • [build] [refactor] Change CMake global include_directories to target based function (#5082) (by Bo Qiao)
  • [autodiff] Allocate dual and adjoint snode (#5083) (by Mingrui Zhang)
  • [refactor] Make sure Ndarray shape is field shape (#5085) (by Ailing)
  • [llvm] [refactor] Merge AtomicOpStmt codegen in CPU and CUDA backends (#5086) (by Yi Xu)
  • [llvm] [aot] CUDA-AOT PR #1: Extracted common logics from CPUAotModuleImpl into LLVMAotModule (#5072) (by Zhanlue Yang)
  • [infra] Refactor Vulkan runtime into true Common Runtime (#5058) (by Bob Cao)
  • [refactor] Correctly set ndarray element_size and nelement (#5080) (by Ailing)
  • [cuda] [simt] Add assertions for warp intrinsics on old GPUs (#5077) (by Bo Qiao)
  • [Lang] Add more initialization routines for glsl matrix types (#5069) (by Zhao Liang)
  • [spirv] Specialize element shape for spirv codegen. (#5068) (by Haidong Lan)
  • [llvm] Specialize element shape for LLVM backend (#5071) (by Haidong Lan)
  • [doc] Fix broken link for github action status badge (#5076) (by Ailing)
  • [Example] Update mass_spring_3d_ggui.py to v2 (#3879) (by Alex Brown)
  • [refactor] Resolve comments from #5065 (#5074) (by Ailing)
  • [Lang] Support constructing vector and matrix ndarray from ti.ndarray() (by ailzhang)
  • [refactor] Pass element_shape and layout to C++ Ndarray (by ailzhang)
  • [refactor] Specialized Ndarray Type is (element_type, shape, layout) (by ailzhang)
  • [aot] [CUDA-AOT PR #0] Refactored compile_module_to_executable() to CUDAModuleToFunctionConverter (#5070) (by Zhanlue Yang)
  • [refactor] Split GraphBuilder out of Graph class (#5064) (by Ailing)
  • [build] [bug] Ensure the assets folder is copied to the project directory (#5063) (by Frost Ming)
  • [bug] Remove operator ! for Expr (#5062) (by Yi Xu)
  • [Lang] [type] Disallow reading a whole bit_struct (#5061) (by Yi Xu)
  • [Lang] Struct Classes implementation (#4989) (by bsavery)
  • [Lang] [ir] Add short-circuit if-then-else operator (#5022) (by daylily)
  • [bug] Ndarray type should include primitive dtype as well (#5052) (by Ailing)
  • [Doc] Updated type system (#5054) (by Vissidarte-Herman)
  • [bug] Added type promotion support for atan2 (#5037) (by Zhanlue Yang)
  • [Lang] Build sparse matrix from ndarray (#4841) (by pengyu)
  • Set host_write to false for opengl ndarray (#5038) (by Ailing)
  • [ci] Run cpp tests via run_tests.py (#5035) (by yekuang)
  • Exit CI builds when download of prebuilt packages fails (#5043) (by PENGUINLIONG)
  • [Vulkan] Fixed vulkan backend crash on AOT examples (#5047) (by PENGUINLIONG)
  • [Lang] Fix potential precision bug when using math vector and matrix types (#5032) (by Zhao Liang)
  • [Metal] Support Ndarray (#4720) (by yekuang)
  • [Lang] [type] Refactor quant type definition APIs (#5036) (by Yi Xu)
  • [aot] Bind graph APIs to python and add mpm88 example (#5034) (by Ailing)
  • [aot] Move ArgKind as first argument in Arg class (by ailzhang)
  • [aot] Serialize built graph, deserialize and run. (by ailzhang)
  • [ci] Disable win cpu docker job test (#5033) (by Bo Qiao)
  • [doc] Update OS names (#5030) (by Bo Qiao)...

v1.0.2

18 May 07:44

Highlights:

The v1.0.2 release is a patch fix that improves Taichi's stability on multiple platforms, especially for GGUI and the Vulkan backend.

  • Bug fixes
    • Remove redundant AllocStmt when lowering FrontendWhileStmt (#4870) (by Zhanlue Yang)
  • Build system
    • Define Cmake OpenGL runtime target (#4887) (by Bo Qiao)
    • Use keywords instead of plain target_link_libraries CMake (#4864) (by Bo Qiao)
    • Define runtime build target (#4838) (by Bo Qiao)
    • Switch to scikit-build as the build backend (#4624) (by Frost Ming)
  • Documentation
    • Add limitation about TLS optimization (#4877) (by Ailing)

Full changelog:

  • [ci] Fix Nightly (#4948) (by Bo Qiao)
  • [ci] [build] Containerize Windows CPU build and test (#4933) (by Bo Qiao)
  • [vulkan] Set kApiVersion to VK_API_VERSION_1_3 (#4970) (by Haidong Lan)
  • [ci] Add new buildbot with latest driver for Linux/Vulkan test (#4953) (by Bo Qiao)
  • [vulkan] Add new VMA vulkan functions. (#4893) (by Bob Cao)
  • [vulkan] Fix typo for waitSemaphoreCount (#4892) (by Gabriel H)
  • [Build] [refactor] Define Cmake OpenGL runtime target (#4887) (by Bo Qiao)
  • [Build] [refactor] Use keywords instead of plain target_link_libraries CMake (#4864) (by Bo Qiao)
  • [vulkan] Device API explicit semaphores (#4852) (by Bob Cao)
  • [build] Change the library output dir for export core (#4880) (by Frost Ming)
  • [ci] Use the updated docker image for libtaichi_export_core (#4881) (by Bo Qiao)
  • [Doc] Add limitation about TLS optimization (#4877) (by Ailing)
  • [Build] [refactor] Define runtime build target (#4838) (by Bo Qiao)
  • [ci] Add libtaichi_export_core build for desktop in CI (#4871) (by Ailing)
  • [build] [bug] Fix a bug of skbuild that loses the root package_dir (#4875) (by Frost Ming)
  • [Bug] Remove redundant AllocStmt when lowering FrontendWhileStmt (#4870) (by Zhanlue Yang)
  • [misc] Bump version to v1.0.2 (#4867) (by Taichi Gardener)
  • [build] Install export core library to build dir (#4866) (by Frost Ming)
  • [Build] Switch to scikit-build as the build backend (#4624) (by Frost Ming)

v1.0.1

27 Apr 04:36
1c3619d

Highlights:

  • Automatic differentiation
    • Implement ti.ad.no_grad to skip autograd (#4751) (by Shawn Yao)
  • Bug fixes
    • Fix and refactor type check for atomic ops (#4858) (by Yi Xu)
    • Fix and refactor type check for local stores (#4843) (by Yi Xu)
    • Fix implicit cast warning for global stores (#4834) (by Yi Xu)
  • Documentation
    • Updated URL (#4847) (by Vissidarte-Herman)
    • LLVM sparse runtime design doc (#4790) (by yekuang)
    • Proofread Getting started (#4682) (by Vissidarte-Herman)
    • Editorial review to fields (advanced) (#4686) (by Vissidarte-Herman)
    • Update docstring for ti.Mesh (#4818) (by Chang Yu)
    • Remove redundant semicolon in path (#4801) (by gaoxinge)
  • Error messages
    • Show warning when serialize=True is set on a struct for (#4844) (by Lin Jiang)
    • Provide source code info in warnings (#4840) (by Yi Xu)
  • Language and syntax
    • Add single character property for vector swizzle && test (#4845) (by Zhao Liang)
    • Remove obsolete vectypes class (#4831) (by LiangZhao)
    • Add support for keyword arguments (#4794) (by Lin Jiang)
    • Support swizzles on all Matrix/Vector types (#4828) (by yekuang)
    • Add 2d and 3d rotation functions to math module (#4822) (by Zhao Liang)
    • Walkaround Vulkan backend behavior which changes cwd on Mac (#4812) (by TiGeekMan)
    • Add mod function to math module (#4809) (by Zhao Liang)
    • Support in-place operator of ti.Matrix in python scope (#4799) (by Lin Jiang)
    • Move short-circuit boolean logic into AST-to-IR passes (#4580) (by daylily)
    • Promote output type of log, exp, and sqrt ops (#4622) (by Andrew Sun)
    • Fix integral type promotion rules (e.g., u8 + u8 now leads to u8 instead of i32) (#4789) (by Yuanming Hu)
    • Add basic complex arithmetic and add a mandelbrot example (#4780) (by Zhao Liang)
  • SIMT programming
    • Add shfl_down_f32 intrinsic. (#4819) (by Chun Cai)

Full changelog:

  • [gui] Avoid implicit type casts in staging_buffer (#4861) (by Yi Xu)
  • [lang] Add better error detection for swizzle patterens (#4860) (by yekuang)
  • [Bug] [ir] Fix and refactor type check for atomic ops (#4858) (by Yi Xu)
  • [Doc] Updated URL (#4847) (by Vissidarte-Herman)
  • [bug] Fix bug that building with TI_EXPORT_CORE:BOOL=ON failed (#4850) (by PGZXB)
  • [Error] Show warning when serialize=True is set on a struct for (#4844) (by Lin Jiang)
  • [lang] Group related Matrix methods closer (#4836) (by yekuang)
  • [Lang] Add single character property for vector swizzle && test (#4845) (by Zhao Liang)
  • [Bug] [ir] Fix and refactor type check for local stores (#4843) (by Yi Xu)
  • [Error] Provide source code info in warnings (#4840) (by Yi Xu)
  • [misc] Update pre-commit hooks (#4713) (by pre-commit-ci[bot])
  • [Bug] [ir] Fix implicit cast warning for global stores (#4834) (by Yi Xu)
  • [mesh] Remove link hints from ti.Mesh (#4825) (by yixu)
  • [Lang] Remove obsolete vectypes class (#4831) (by LiangZhao)
  • [doc] Fix doc link (#4835) (by yekuang)
  • [Doc] LLVM sparse runtime design doc (#4790) (by yekuang)
  • [Lang] Add support for keyword arguments (#4794) (by Lin Jiang)
  • [Lang] Support swizzles on all Matrix/Vector types (#4828) (by yekuang)
  • [test] Add simple test for offline-cache-key of compile-config (#4805) (by PGZXB)
  • [vulkan] Device API blending (#4815) (by Bob Cao)
  • [spirv] Fix int casts (#4814) (by Bob Cao)
  • [gui] Only call ImGui_ImplVulkan_Shutdown if it's initialized (#4827) (by Ailing)
  • [ci] Use a new PAT for project with org permission (#4826) (by Frost Ming)
  • [Lang] Add 2d and 3d rotation functions to math module (#4822) (by Zhao Liang)
  • [Doc] Proofread Getting started (#4682) (by Vissidarte-Herman)
  • [Doc] Editorial review to fields (advanced) (#4686) (by Vissidarte-Herman)
  • [bug] Fix bug that building with gcc9.4 will fail (#4823) (by PGZXB)
  • [SIMT] Add shfl_down_f32 intrinsic. (#4819) (by Chun Cai)
  • [workflow] Add issues to project when issue opened (#4816) (by Frost Ming)
  • [vulkan] Fix vulkan initialization on macOS with cpu backend (#4813) (by Bob Cao)
  • [Doc] [mesh] Update docstring for ti.Mesh (#4818) (by Chang Yu)
  • [vulkan] Fix Vulkan device score bug (#4803) (by Andrew Sun)
  • [Lang] Walkaround Vulkan backend behavior which changes cwd on Mac (#4812) (by TiGeekMan)
  • [misc] Add SNode to offline-cache key (#4716) (by PGZXB)
  • [Lang] Add mod function to math module (#4809) (by Zhao Liang)
  • [doc] Fix doc of running C++ tests (#4798) (by Yi Xu)
  • [Lang] Support in-place operator of ti.Matrix in python scope (#4799) (by Lin Jiang)
  • [Lang] [ir] Move short-circuit boolean logic into AST-to-IR passes (#4580) (by daylily)
  • [lang] Fix frontend type check for sqrt, log, exp (#4797) (by Yi Xu)
  • [Doc] Remove redundant semicolon in path (#4801) (by gaoxinge)
  • [Lang] [ir] Promote output type of log, exp, and sqrt ops (#4622) (by Andrew Sun)
  • [ci] Update ci images to use latest git (#4792) (by Bo Qiao)
  • [Lang] Fix integral type promotion rules (e.g., u8 + u8 now leads to u8 instead of i32) (#4789) (by Yuanming Hu)
  • [Lang] Add basic complex arithmetic and add a mandelbrot example (#4780) (by Zhao Liang)
  • Update index.md (#4791) (by Bob Cao)
  • [spirv] Add 16 bit float immediate number (#4787) (by Bob Cao)
  • [ci] Update ubuntu 18.04 image to use latest git (#4785) (by Frost Ming)
  • [lang] Store relations with 16-bit type (#4779) (by Chang Yu)
  • [Autodiff] Implement ti.ad.no_grad to skip autograd (#4751) (by Shawn Yao)
  • [misc] Remove some unnecessary attributes from offline-cache key of compile-config (#4770) (by PGZXB)
  • [doc] Update install instruction with "--upgrade" (#4775) (by Yuanming Hu)
  • Expose VboHelpers class (#4773) (by Ailing)
  • Bump version to v1.0.1 (#4774) (by Taichi Gardener)
  • [refactor] Merge Kernel.argument_names and argument_annotations (#4753) (by dongqi shen)
  • [dx11] Constant buffer binding and AtomicIncrement in RAND_STATE (#4650) (by quadpixels)

v1.0.0

13 Apr 04:36
6a15da8

v1.0.0 was released on April 13, 2022.

Compatibility changes

License change

Taichi's license is changed from MIT to Apache-2.0 after a public vote in #4607.

Python 3.10 support

This release supports Python 3.10 on all supported operating systems (Windows, macOS, and Linux).

Manylinux2014-compatible wheels

Before v1.0.0, Taichi worked only on Linux distributions that support glibc 2.27+ (for example, Ubuntu 18.04+). As of v1.0.0, in addition to the normal Taichi wheels, Taichi provides manylinux2014-compatible wheels that work on most modern Linux distributions, including CentOS 7.

  • The normal wheels support all backends; the manylinux2014-compatible wheels support only the CPU and CUDA backends. Choose the wheels that work best for you.
  • If you encounter any issue when installing the wheels, try upgrading your pip to the latest version first.

Deprecations

  • This release deprecates ti.ext_arr() and uses ti.types.ndarray() instead. ti.types.ndarray() supports both Taichi Ndarrays and external arrays, for example NumPy arrays.
  • Taichi plans to drop support for Python 3.6 in the next minor release (v1.1.0). If you have any questions or concerns, please let us know at #4772.
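For example, a kernel that previously annotated its argument with ti.ext_arr() now uses ti.types.ndarray() and accepts both NumPy arrays and Taichi Ndarrays (a minimal sketch):

import numpy as np
import taichi as ti

ti.init()

@ti.kernel
def double(arr: ti.types.ndarray()):  # previously: arr: ti.ext_arr()
    for i in arr:
        arr[i] *= 2

a = np.arange(8, dtype=np.float32)
double(a)  # works with a NumPy array

nd = ti.ndarray(ti.f32, shape=8)
double(nd)  # works with a Taichi Ndarray as well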

New features

Non-Python deployment solution

By working together with OPPO US Research Center, Taichi delivers Taichi AOT, a solution for deploying kernels in non-Python environments, such as on mobile devices.

Compiled Taichi kernels can be saved from a Python process, then loaded and run by the provided C++ runtime library. With a set of APIs, your Python/Taichi code can be easily deployed in any C++ environment. We demonstrate the simplicity of this workflow by porting the implicit FEM (finite element method) demo released in v0.9.0 to an Android application. Download the Android package and find out what Taichi AOT has to offer! If you want to try out this solution, please also check out the taichi-aot-demo repo.

# In Python app.py
module = ti.aot.Module(ti.vulkan) 
module.add_kernel(my_kernel, template_args={'x': x})
module.save('my_app')

The following code snippet shows the C++ workflow for loading the compiled AOT modules.

// Initialize Vulkan program pipeline
taichi::lang::vulkan::VulkanDeviceCreator::Params evd_params;
evd_params.api_version = VK_API_VERSION_1_2;
auto embedded_device =
    std::make_unique<taichi::lang::vulkan::VulkanDeviceCreator>(evd_params);

std::vector<uint64_t> host_result_buffer;
host_result_buffer.resize(taichi_result_buffer_entries);
taichi::lang::vulkan::VkRuntime::Params params;
params.host_result_buffer = host_result_buffer.data();
params.device = embedded_device->device();
auto vulkan_runtime = std::make_unique<taichi::lang::vulkan::VkRuntime>(std::move(params));

// Load AOT module saved from Python
taichi::lang::vulkan::AotModuleParams aot_params{"my_app", vulkan_runtime.get()};
auto module = taichi::lang::aot::Module::load(taichi::Arch::vulkan, aot_params);
auto my_kernel = module->get_kernel("my_kernel");

// Allocate device buffer
taichi::lang::Device::AllocParams alloc_params;
alloc_params.host_write = true;
alloc_params.size = /*Ndarray size for `x`*/;
alloc_params.usage = taichi::lang::AllocUsage::Storage;
auto devalloc_x = embedded_device->device()->allocate_memory(alloc_params);

// Execute my_kernel without Python environment
taichi::lang::RuntimeContext host_ctx;
host_ctx.set_arg_devalloc(/*arg_id=*/0, devalloc_x, /*shape=*/{128}, /*element_shape=*/{3, 1});
my_kernel->launch(&host_ctx);

Note that Taichi only supports the Vulkan backend in the C++ runtime library. The Taichi team is working on supporting more backends.

Real functions (experimental)

All Taichi functions are inlined into the Taichi kernel during compile time. However, the kernel becomes lengthy and requires longer compile time if it has too many Taichi function calls. This becomes especially obvious if a Taichi function involves compile-time recursion. For example, the following code calculates the Fibonacci numbers recursively:

@ti.func
def fib_impl(n: ti.template()):
    if ti.static(n <= 0):
        return 0
    if ti.static(n == 1):
        return 1
    return fib_impl(n - 1) + fib_impl(n - 2)

@ti.kernel
def fibonacci(n: ti.template()):
    print(fib_impl(n))

In this code, fib_impl() recursively calls itself until n reaches 1 or 0. The total number of calls to fib_impl() increases exponentially as n grows, so the length of the kernel also increases exponentially. When n reaches 25, it takes more than a minute to compile the kernel.

This release introduces "real function", a new type of Taichi function that compiles independently instead of being inlined into the kernel. It is an experimental feature and only supports scalar arguments and scalar return value for now.

You can use it by decorating the function with @ti.experimental.real_func. For example, the following is the real function version of the code above.

@ti.experimental.real_func
def fib_impl(n: ti.i32) -> ti.i32:
    if n <= 0:
        return 0
    if n == 1:
        return 1
    return fib_impl(n - 1) + fib_impl(n - 2)

@ti.kernel
def fibonacci(n: ti.i32):
    print(fib_impl(n))

The length of the kernel does not increase as n grows because the kernel only makes a call to the function instead of inlining the whole function. As a result, the code takes far less than a second to compile regardless of the value of n.

The main differences between a normal Taichi function and a real function are listed below:

  • You can write return statements in any part of a real function, while in a normal Taichi function you cannot write return statements inside the scope of non-static if / for / while statements (see the sketch after this list).
  • A real function can be called recursively at runtime, while a normal Taichi function only supports compile-time recursion.
  • The return value and arguments of a real function must be type hinted, while the type hints are optional in a normal Taichi function.
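
To illustrate the first and last points, here is a minimal sketch. The function names clamp_positive and run are hypothetical, and the snippet assumes a backend that supports real functions, such as the CPU backend:

import taichi as ti

ti.init()

@ti.experimental.real_func
def clamp_positive(x: ti.i32) -> ti.i32:  # type hints are required
    if x < 0:      # runtime (non-static) branch
        return 0   # early return inside the branch is allowed in a real function
    return x

@ti.kernel
def run(x: ti.i32) -> ti.i32:
    return clamp_positive(x)

print(run(-5))  # 0
print(run(7))   # 7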

Type annotations for literals

Previously, you could not explicitly give a type to a literal. For example:

@ti.kernel
def foo():
    a = 2891336453  # i32 overflow (>2^31-1)

In the code snippet above, 2891336453 is first turned into the default integer type (ti.i32 unless changed), which causes an overflow. Starting from v1.0.0, you can write type annotations for literals:

@ti.kernel
def foo():
    a = ti.u32(2891336453)  # similar to 2891336453u in C

Top-level loop configurations

You can use ti.loop_config to control the behavior of the subsequent top-level for-loop. Available parameters are:

  • block_dim: Sets the number of threads in a block on GPU.
  • parallelize: Sets the number of threads to use on CPU.
  • serialize: If you set serialize to True, the for-loop runs serially, and you can write break statements inside it (this applies only to range/ndrange for-loops). Setting serialize to True is equivalent to setting parallelize to 1.

Here are two examples:

@ti.kernel
def break_in_serial_for() -> ti.i32:
    a = 0
    ti.loop_config(serialize=True)
    for i in range(100):  # This loop runs serially
        a += i
        if i == 10:
            break
    return a

break_in_serial_for()  # returns 55
n = 128
val = ti.field(ti.i32, shape=n)

@ti.kernel
def fill():
    ti.loop_config(parallelize=8, block_dim=16)
    # If the kernel is run on the CPU backend, 8 threads will be used to run it
    # If the kernel is run on the CUDA backend, each block will have 16 threads
    for i in range(n):
        val[i] = i

math module

This release adds a math module to support GLSL-standard vector operations and to make it easier to port GLSL shader code to Taichi. Vector and matrix types (vec2, vec3, vec4, mat2, mat3, and mat4) and functions such as mix(), clamp(), and smoothstep() behave similarly to their GLSL counterparts. See the following examples:

Vector initialization and swizzling

You can use the xyzw, rgba, and uvw swizzle attributes to get and set vector entries:

import taichi as ti
import taichi.math as tm

ti.init()

@ti.kernel
def example():
    v = tm.vec3(1.0)  # (1.0, 1.0, 1.0)
    w = tm.vec4(0.0, 1.0, 2.0, 3.0)
    v.rgg += 1.0  # v = (2.0, 3.0, 1.0)
    w.zxy += tm.sin(v)

Matrix multiplication

Each Taichi vector is implemented as a column vector. Ensure that you put the matrix before the vector in a matrix multiplication.

@ti.kernel
def example():
    M = ti.Matrix([[1, 0, 0], [0, 1, 0], [0, 0, 1]])
    v = tm.vec3(1, 2, 3)
    w = (M @ v).xyz  # [1, 2, 3]

GLSL-standard functions

@ti.kernel
def example():
    v = tm.vec3(0., 1., 2.)
    w = tm.smoothstep(0.0, 1.0, v.xyz)
    w = tm.clamp(w, 0.2, 0.8)
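
mix() is mentioned above but not demonstrated; here is a minimal sketch of component-wise linear interpolation with tm.mix() (the kernel name and values are chosen only for illustration):

@ti.kernel
def mix_example():
    a = tm.vec3(0.0, 0.0, 0.0)
    b = tm.vec3(1.0, 2.0, 3.0)
    c = tm.mix(a, b, 0.5)  # component-wise linear interpolation: (0.5, 1.0, 1.5)
    print(c)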

CLI command ti gallery

This release introduces the CLI command ti gallery, which lets you select and run Taichi examples in a pop-up window. To use it, open a terminal and run:

ti gallery

A window then pops up listing the available examples.


v0.9.2

23 Mar 06:33
Compare
Choose a tag to compare

Highlights:

  • CI/CD workflow
    • Generate manylinux2014-compatible wheels with CUDA backend in release workflow (#4550) (by Yi Xu)
  • Command line interface
    • Fix a few bugs in taichi gallery command (#4548) (by Zhao Liang)
  • Documentation
    • Fixed broken links. (#4563) (by Vissidarte-Herman)
    • Refactored README.md (#4549) (by Vissidarte-Herman)
    • Create CODE_OF_CONDUCT (#4564) (by notginger)
    • Update syntax.md (#4557) (by Vissidarte-Herman)
    • Update docstring for ndrange (#4486) (by Zhao Liang)
    • Minor updates: It is recommended to type hint arguments and return values (#4510) (by Vissidarte-Herman)
    • Refactored Kernels and functions. (#4496) (by Vissidarte-Herman)
    • Add initial variable and fragments (#4457) (by Justin)
  • Language and syntax
    • Add taichi gallery command for user to choose and run example in gui (#4532) (by TiGeekMan)
    • Add ti.serialize and ti.loop_config (#4525) (by Lin Jiang)
    • Support simple matrix slicing (#4488) (by Xiangyun Yang)
    • Remove legacy ways to construct matrices (#4521) (by Yi Xu)

Full changelog:

  • [lang] Replace keywords in python (#4606) (by Jiasheng Zhang)
  • [lang] Fix py36 block_dim bug (#4601) (by Jiasheng Zhang)
  • [ci] Fix release script bug (#4599) (by Jiasheng Zhang)
  • [aot] Support return in vulkan aot (#4593) (by Ailing)
  • [ci] Release script add test for tests/python/examples (#4590) (by Jiasheng Zhang)
  • [misc] Write version info right after creation of uuid (#4589) (by Jiasheng Zhang)
  • [gui] Make GGUI VBO configurable (#4575) (by Ye Kuang)
  • [test] Fix ill-formed test_binary_func_ret (#4587) (by Yi Xu)
  • Update differences_between_taichi_and_python_programs.md (#4583) (by Vissidarte-Herman)
  • [misc] Fix a few warnings (#4572) (by Ye Kuang)
  • [aot] Remove redundant module_path argument (#4573) (by Ailing)
  • [bug] [opt] Fix some bugs when deal with real function (#4568) (by Xiangyun Yang)
  • [build] Guard llvm usage inside TI_WITH_LLVM (#4570) (by Ailing)
  • [aot] [refactor] Add make_new_field for Metal (#4559) (by Bo Qiao)
  • [llvm] [lang] Add support for multiple return statements in real function (#4536) (by Lin Jiang)
  • [test] Add test for offline-cache (#4562) (by PGZXB)
  • Format updates (#4567) (by Vissidarte-Herman)
  • [aot] Add KernelTemplate interface (#4558) (by Ye Kuang)
  • [test] Eliminate the warnings in test suite (#4556) (by Frost Ming)
  • [Doc] Fixed broken links. (#4563) (by Vissidarte-Herman)
  • [Doc] Refactored README.md (#4549) (by Vissidarte-Herman)
  • [Doc] Create CODE_OF_CONDUCT (#4564) (by notginger)
  • [misc] Reset counters in Program::finalize() (#4561) (by PGZXB)
  • [misc] Add TI_CI env to CI/CD (#4551) (by Jiasheng Zhang)
  • [ir] Add basic tests for Block (#4553) (by Ye Kuang)
  • [refactor] Fix error message (#4552) (by Ye Kuang)
  • [Doc] Update syntax.md (#4557) (by Vissidarte-Herman)
  • [gui] Hack to make GUI.close() work on macOS (#4555) (by Ye Kuang)
  • [aot] Fix get_kernel API semantics (#4554) (by Ye Kuang)
  • [opt] Support offline-cache for kernel with arch=cpu (#4500) (by PGZXB)
  • [CLI] Fix a few bugs in taichi gallery command (#4548) (by Zhao Liang)
  • [ir] Small optimizations to codegen (#4442) (by Bob Cao)
  • [CI] Generate manylinux2014-compatible wheels with CUDA backend in release workflow (#4550) (by Yi Xu)
  • [misc] Metadata update (#4539) (by Jiasheng Zhang)
  • [test] Parametrize the test cases with pytest.mark (#4546) (by Frost Ming)
  • [Doc] Update docstring for ndrange (#4486) (by Zhao Liang)
  • [build] Default symbol visibility to hidden for all targets (#4545) (by Gabriel H)
  • [autodiff] Handle multiple, mixed Independent Blocks (IBs) within multi-levels serial for-loops (#4523) (by Mingrui Zhang)
  • [bug] [lang] Cast the arguments of real function to the desired types (#4538) (by Lin Jiang)
  • [Lang] Add taichi gallery command for user to choose and run example in gui (#4532) (by TiGeekMan)
  • [bug] Fix bug that calling std::getenv when cpp-tests running will fail (#4537) (by PGZXB)
  • [vulkan] Fix performance (#4535) (by Bob Cao)
  • [Lang] Add ti.serialize and ti.loop_config (#4525) (by Lin Jiang)
  • [Lang] Support simple matrix slicing (#4488) (by Xiangyun Yang)
  • Update vulkan_api.cpp (#4533) (by Bob Cao)
  • [lang] Quick fix for mesh_local analyzer (#4529) (by Chang Yu)
  • [test] Show arch info in the verbose test report (#4528) (by Frost Ming)
  • [aot] Add binding_id of root/gtmp/rets/args bufs to CompiledOffloadedTask (#4522) (by Ailing)
  • [vulkan] Relax a few test precisions for vulkan (#4524) (by Ailing)
  • [build] Option to use LLD (#4513) (by Bob Cao)
  • [misc] [linux] Implement XDG Base Directory support (#4514) (by ruro)
  • [Lang] [refactor] Remove legacy ways to construct matrices (#4521) (by Yi Xu)
  • [misc] Make result of irpass::print hold more information (#4517) (by PGZXB)
  • [refactor] Misc improvements over AST helper functions (#4398) (by daylily)
  • [misc] [build] Bump catch external library 2.13.3 -> 2.13.8 (#4516) (by ruro)
  • [autodiff] Reduce the number of ad stack using knowledge of derivative formulas (#4512) (by Mingrui Zhang)
  • [ir] [opt] Fix a bug about 'continue' stmt in cfg_build (#4507) (by Xiangyun Yang)
  • [Doc] Minor updates: It is recommended to type hint arguments and return values (#4510) (by Vissidarte-Herman)
  • [ci] Fix the taichi repo name by hardcode (#4506) (by Frost Ming)
  • [build] Guard dx lib search with TI_WITH_DX11 (#4505) (by Ailing)
  • [ci] Reduce the default device memory usage for GPU tests (#4508) (by Bo Qiao)
  • [Doc] Refactored Kernels and functions. (#4496) (by Vissidarte-Herman)
  • [aot] [refactor] Refactor AOT field API for Vulkan (#4490) (by Bo Qiao)
  • [ci] Fix: fill in the pull request body created by bot (#4503) (by Frost Ming)
  • [ci] Skip in steps rather than the whole job (#4499) (by Frost Ming)
  • [ci] Add a Dockerfile for building manylinux2014-compatible Taichi wheels with CUDA backend (#4491) (by Yi Xu)
  • [ci] Automate release publishing (#4428) (by Frost Ming)
  • [fix] dangling ti.func decorator in euler.py (#4492) (by Zihua Wu)
  • [ir] Fix a bug in simplify pass (#4489) (by Xiangyun Yang)
  • [test] Add test for recursive real function (#4477) (by Lin Jiang)
  • [Doc] Add initial variable and fragments (#4457) (by Justin)
  • [misc] Add a convenient script for testing compatibility of Taichi releases. (#4485) (by Chengchen(Rex) Wang)
  • [misc] Version bump: v0.9.1 -> v0.9.2 (#4484) (by Chengchen(Rex) Wang)
  • [ci] Update gpu docker image to test python 3.10 (#4472) (by Bo Qiao)