Releases: taichi-dev/taichi
v1.7.2
Highlights:
- Bug fixes
- Build system
- Language and syntax
Full changelog:
- [misc] Bump version to v1.7.2 (by Proton)
- [aot] Add stream_ variable for CUDAContext to use a specific CUDA stream to launch CUDA kernel (#8579) (by Sichao He)
- [Build] Lift macOS min compat version to Big Sur (#8583) (by Proton)
- [ci] Drop nvidia driver 510 support (already EOL'd) (#8582) (by Proton)
- [Build] Drop manylinux2014 wheel support (#8581) (by Proton)
- [Bug] Fix Loop-Invariant-Cache for dynamic indexed pointers (#8577) (by Zhanlue Yang)
- [bug] Fix assign may lose precision warning & improve related logging (#8553) (by Bob Cao)
- [bug] Fixes for numpy 2.0 (unblocking python 3.12 release build on mac) (#8552) (by Bob Cao)
- [doc] Fix typo: 'inheritence' -> 'inheritance' (#8551) (by 3n3l)
- [misc] Ensure succeeded variable is properly initialized in matrix-free solvers (#8484) (by liblaf)
- [Bug] Fix bug to disable taichi header print (#8517) (by Yong-Chao Wu)
- [misc] Add conversions for unsigned types, torch > 2.3.0 (#8528) (by Oliver Batchelor)
- [doc] Fix typo & missing typedef in math/math_module.md (#8541) (by Jingwei Xu)
- [ci] Remove driver470, add driver 550 (#8546) (by Proton)
- [misc] Bump spdlog version and fix unformattable error (#8543) (by Bob Cao)
- [build] Fix build.py bootstrap corner cases (#8544) (by Proton)
- [ci] Force TI_USE_GIT_CACHE on (#8545) (by Proton)
- [Lang] Migrate irpass::force_scalarize_matrix() beforehand (#8532) (by Zhanlue Yang)
- [bug] Fix offline cache emit dependencies (#8510) (by Mingrui Zhang)
- [Lang] Add config.force_scalarize_matrix to avoid perf-regression in certain scenario (#8509) (by Zhanlue Yang)
v1.7.1
Highlights:
- Bug fixes
  - Fix CFG aliasing error with matrix of matrix (#8445) (by Zhanlue Yang)
- Documentation
- Miscellaneous
  - Bump version to 1.7.1 (by Haidong Lan)
  - Bump taichi version to v1.8.0 (#8458) (by Zhanlue Yang)
Full changelog:
- [Misc] Bump version to 1.7.1 (by Haidong Lan)
- [bug] Fix abs on unsigned types (#8476) (by Lin Jiang)
- [Doc] Update offset.md (#8470) (by Kenshi Takayama)
- [Doc] Update math_module.md (#8471) (by Kenshi Takayama)
- [Doc] Update accelerate_pytorch.md | Fix typo in recap: Eeasy -> Easy (#8475) (by Aryan Garg)
- [Misc] Bump taichi version to v1.8.0 (#8458) (by Zhanlue Yang)
- [lang] Warn about non-contiguous gradient tensors (#8450) (by Bob Cao)
- [autodiff] Fix the type of cmp statements in autodiff (#8452) (by Lin Jiang)
- [Bug] Fix CFG aliasing error with matrix of matrix (#8445) (by Zhanlue Yang)
- [misc] Add flag to disable taichi header print (#8413) (by Chaoming Wang)
v1.7.0
1. New features
1.1 Real Function
We are excited to announce the stabilization of the Real Function feature in Taichi Lang v1.7.0. Initially introduced as an experimental feature in v1.0.0, it has now matured with enhanced capabilities and usability.
Key Updates
- Decorator Change: The Real Function now uses @ti.real_func. The previous decorator, @ti.experimental.real_func, is deprecated.
- Performance Improvements: Real Functions, unlike Taichi inline functions (@ti.func), are compiled as separate entities, akin to CUDA's device functions. This separation allows for recursive runtime calls and significantly faster compilation. For instance, the Cornell box example's compilation time is reduced from 2.34s to 1.01s on an i9-11900K when switching from inline to real functions.
- Enhanced Functionality: Real Functions support multiple return statements, offering greater flexibility in coding.
Limitations
- Backend Support: Real Functions are currently only compatible with LLVM-based backends, including CPU and CUDA.
- Parallel Loops: Writing parallel loops within Real Functions is not supported. However, if called within a parallel loop in a kernel, the Real Function will be parallelized accordingly.
Important Note on Usage: Ensure all arguments and return values in Real Functions are explicitly type-hinted.
Usage Example
The following example demonstrates the recursive capability of Real Functions. The sum_func Real Function is used to calculate the sum of numbers from 1 to n, showcasing its ability to handle multiple return statements and variable recursion depths.
import taichi as ti
ti.init()

@ti.real_func
def sum_func(n: ti.i32) -> ti.i32:
    if n == 0:
        return 0
    return sum_func(n - 1) + n

@ti.kernel
def sum(n: ti.i32) -> ti.i32:
    return sum_func(n)

print(sum(100))  # 5050
You can find more examples of the real function in the repository.
1.2 Enhancements in Kernel Arguments and Return Values
Support for Multiple Return Values in Taichi Kernel:
In this update, we've introduced the capability to return multiple values from a Taichi kernel. This can be achieved by specifying a tuple as the return type. You can use (ti.f32, s0) directly as the type hint, or write it in the Python manner as typing.Tuple[ti.f32, s0] or, for Python 3.9 and above, tuple[ti.f32, s0]. The following example illustrates this new feature:
import taichi as ti
ti.init()

s0 = ti.types.struct(a=ti.math.vec3, b=ti.i16)

@ti.real_func
def foo() -> (ti.f32, s0):
    return 1, s0(a=ti.math.vec3([100, 0.5, 3]), b=1)

@ti.kernel
def bar() -> (ti.f32, s0):
    return foo()

ret1, ret2 = bar()
print(ret1)  # 1.0
print(ret2)  # {'a': [100.0, 0.5, 3.0], 'b': 1}
Removal of Size Limit on Kernel Arguments and Return Values:
We have eliminated the size restrictions on kernel arguments and return values. We still recommend keeping them small, because large arguments or return values can lead to substantially longer compile times. Arguments and return values exceeding 4KB have not been thoroughly tested, and we cannot guarantee that they work correctly.
1.3 Argument Pack
Taichi now introduces a powerful feature for developers - Argument Packs. This new functionality enables efficient caching of unchanged parameters between multiple kernel calls, which not only provides convenience when launching a kernel, but also boosts the performance.
Key Advantages
- Argument Pack: User-defined data types that encapsulate multiple parameters into a single, manageable unit.
- Buffering Capability: Store and reuse parameters that remain constant across kernel calls, reducing the overhead of repeated parameter passing.
- Device-level Caching: Taichi optimizes performance by caching argpacks directly on the device.
Usage Example
import taichi as ti
ti.init()

# Defining a custom argument type using "ti.types.argpack"
view_params_tmpl = ti.types.argpack(view_mtx=ti.math.mat4, proj_mtx=ti.math.mat4, far=ti.f32)

# Declaration of a Taichi kernel leveraging Argument Packs
@ti.kernel
def p(view_params: view_params_tmpl) -> ti.f32:
    return view_params.far

# Instantiation of the argument pack
view_params = view_params_tmpl(
    view_mtx=ti.math.mat4(
        [[1, 0, 0, 0],
         [0, 1, 0, 0],
         [0, 0, 1, 0],
         [0, 0, 0, 1]]),
    proj_mtx=ti.math.mat4(
        [[1, 0, 0, 0],
         [0, 1, 0, 0],
         [0, 0, 1, 0],
         [0, 0, 0, 1]]),
    far=1)

# Executing the kernel with the Argument Pack
print(p(view_params))  # Outputs: 1.0
Supported Data Types
Argument Packs are currently compatible with a variety of data types, including scalar, matrix, vector, Ndarray, and Struct.
Limitations
Please note that Argument Packs currently do not support the following features and data types:
- Ahead-of-Time (AOT) Compilation and Compute Graph
- ti.template
- ti.data_oriented
2. Improvements
2.1 CUDA Memory Allocation Improvements
Dynamic VRAM Allocation:
- In our latest update, the CUDA backend has been optimized to dynamically allocate Video RAM (VRAM), significantly reducing the initial preallocation requirement. Now, less than 50MB is preallocated upon ti.init.
Changes in device_memory_GB and device_memory_fraction Usage:
- These settings are now specifically tailored for preallocating memory for SPARSE data structures, such as ti.pointer. This preallocation occurs only once a Sparse data structure is detected in your code.
Impact on VRAM Consumption:
- Users can expect a noticeable decrease in VRAM usage with these enhancements. For instance:
  - diffmpm3d: 3866 MB --> 3190 MB
  - nerf_train_deploy: 5618 MB --> 4664 MB
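To put the two figures above in relative terms, the saving can be worked out directly from the reported before/after numbers. This is plain arithmetic on the two data points quoted in these notes; the helper name reduction_percent is ours and is not part of Taichi.

```python
# VRAM usage in MB (before, after), as quoted in the notes above.
reported = {
    "diffmpm3d": (3866, 3190),
    "nerf_train_deploy": (5618, 4664),
}

def reduction_percent(before_mb, after_mb):
    """Relative VRAM saving as a percentage of the original usage."""
    return (before_mb - after_mb) / before_mb * 100.0

# Round to one decimal place for display.
savings = {
    name: round(reduction_percent(before, after), 1)
    for name, (before, after) in reported.items()
}

for name, pct in savings.items():
    print(f"{name}: {pct}% less VRAM")
```

Both workloads come out at roughly 17% lower VRAM usage.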
2.2 CUDA SIMT APIs
Added the following ti.simt.block APIs:
- ti.simt.block.sync_any_nonzero
- ti.simt.block.sync_all_nonzero
- ti.simt.block.sync_count_nonzero
2.3 Sparse grid APIs
Added helper function to create a 2D/3D sparse grid, for example:
import taichi as ti
ti.init()

# create a 2D sparse grid
grid = ti.sparse.grid(
    {
        "pos": ti.math.vec2,
        "mass": ti.f32,
        "grid2particles": ti.types.vector(20, ti.i32),
    },
    shape=(10, 10),
)

# access
grid[0, 0].pos = ti.math.vec2(1, 2)
grid[0, 0].mass = 1.0
grid[0, 0].grid2particles[2] = 123
2.4 GGUI
- Added Metal backend support for GGUI
2.5 AOT
- Added C-APIs of ti_import_cpu_memory() and ti_import_cuda_memory()
- Added support for multiple AOT runtime devices
- Added support for matrix/vector in compute graph in C-API
- Added support for matrix/vector in compute graph in Python
2.6 Error reporting
- Improved the quality and coverage of error messages
2.7 Autodiff
- Added support for passing vector/matrix arguments in autodiff kernels
- Added support for autodiff for torch Tensor and Taichi ndarray on CPU and CUDA
- Added support for passing grad tensors to primal kernels
3. Bug Fixes
3.1 Autodiff Bugfixes
- Fixed a few bugs with use of ti.ad.Tape
- Fixed a bug with random seed for loss
3.2 AOT Bugfixes
- Fixed a few bugs with compute graph
- Fixed a few bugs with C-API
3.3 API Bugfixes
- Fixed a bunch of bugs related to Matrix/Vector
- Fixed an error with Ndarray type check
- Fixed a few errors with taichi.math APIs
- Fixed an error with SNode destruction
- Fixed an error with dataclass support for struct with matrix
- Fixed an error with ti.func
- Fixed a few errors with ti.struct and struct field
- Fixed a few errors with Sparse Matrix
3.4 Build & Environment Bugfixes
- Fixed a few compilation issues on Windows platform
- Fixed an issue with cusolver dependency
3.5 GGUI Bugfixes
- Fix vec_to_euler that breaks GGUI cameras & handle camera logic better
- Fix for ImGui widget size on HiDPI
4. Deprecation Notice
- We have removed the CC backend because it is rarely used, and it lacks maintenance.
- We are deprecating ti.experimental.real_func because it is no longer experimental. Please use ti.real_func instead.
5. Full changelog
Highlights:
- **Bug fixes**
- Fix macro error with ti_import_cpu_memory (#8401) (by **Zhanlue Yang**)
- Fix argpack nesting issues (by **listerily**)
- Convert matrices to structs in argpack type members, Fixing layout error (by **listerily**)
- Fix error when returning a struct field member when the return … (#8271) (by **秋云未云**)
- Fix Erroneous handling of ndarray in real function in CFG (#8245) (by **Lin Jiang**)
- Fix issue with passing python-scope Matrix as ti.func argument (#8197) (by **Zhanlue Yang**)
- Fix incorrect CFG Graph structure due to missing Block wiith OffloadedStmts on LLVM backend (#8113) (by **Zhanlue Yang**)
- Fix type inference error with LowerMatrixPtr pass (#8105) (by **Zhanlue Yang**)
- Set initial value for Cuda device allocation (#8063) (by **Zhanlue Yang**)
- Fix the insertion position of the access chain (#7957) (by **Lin Jiang**)
- Fix wrong datatype size when writing to ndarray from Python scope (by **Ailing Zhang**)
- **CUDA backend**
- Warn driver version if it doesn't support memory pool. (#7912) (by **Haidong Lan**)
- **Documentation**
- Fixing typo in impl.py on ti.grouped function documentation (#8407) (by **Quentin Warnant**)
- Update doc about kernels and functions (#8400) (by **Lin Jiang**)
- Update documentation (#8089) (by **Zhao Liang**)
- Update docstring for inverse func (#8170) (by **Zhao Liang**)
- Update type.md, add descriptions of the vector (#8048) (by **Chenzhan Shang**)
- Fix a bug in faq.md (#7992) (by **Zhao Liang**)
...
v1.6.0
Deprecation Notice
- We removed some APIs that were deprecated a long time ago. See the table below:
Removed API | Replace with |
---|---|
Using atomic operations like a.atomic_add(b) | ti.atomic_add(a, b) or a += b |
Using is and is not inside Taichi kernel and Taichi function | Not supported |
Ndrange for loop with the number of the loop variables not equal to the dimension of the ndrange | Not supported |
ti.ui.make_camera() | ti.ui.Camera() |
ti.ui.Window.write_image() | ti.ui.Window.save_image() |
ti.SOA | ti.Layout.SOA |
ti.AOS | ti.Layout.AOS |
ti.print_profile_info | ti.profiler.print_scoped_profiler_info |
ti.clear_profile_info | ti.profiler.clear_scoped_profiler_info |
ti.print_memory_profile_info | ti.profiler.print_memory_profiler_info |
ti.CuptiMetric | ti.profiler.CuptiMetric |
ti.get_predefined_cupti_metrics | ti.profiler.get_predefined_cupti_metrics |
ti.print_kernel_profile_info | ti.profiler.print_kernel_profiler_info |
ti.query_kernel_profile_info | ti.profiler.query_kernel_profiler_info |
ti.clear_kernel_profile_info | ti.profiler.clear_kernel_profiler_info |
ti.kernel_profiler_total_time | ti.profiler.get_kernel_profiler_total_time |
ti.set_kernel_profiler_toolkit | ti.profiler.set_kernel_profiler_toolkit |
ti.set_kernel_profile_metrics | ti.profiler.set_kernel_profiler_metrics |
ti.collect_kernel_profile_metrics | ti.profiler.collect_kernel_profiler_metrics |
ti.VideoManager | ti.tools.VideoManager |
ti.PLYWriter | ti.tools.PLYWriter |
ti.imread | ti.tools.imread |
ti.imresize | ti.tools.imresize |
ti.imshow | ti.tools.imshow |
ti.imwrite | ti.tools.imwrite |
ti.ext_arr | ti.types.ndarray |
ti.any_arr | ti.types.ndarray |
ti.Tape | ti.ad.Tape |
ti.clear_all_gradients | ti.ad.clear_all_gradients |
ti.linalg.sparse_matrix_builder | ti.types.sparse_matrix_builder |
- The builtin min/max functions in Taichi kernels are no longer deprecated.
- We deprecate some arguments in the declaration of compute graph arguments; they will be removed in v1.7.0. These include:
  - element_shape argument for scalar and ndarray
  - shape, channel_format and num_channels arguments for texture
- cc backend will be removed in the next release (v1.7.0)
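When upgrading an older code base, it can help to scan it for names from the removed-API table above. The following is a minimal sketch using only the Python standard library. The mapping is a hand-copied subset of that table, and find_removed_apis and the sample string are hypothetical illustrations, not a tool shipped with Taichi.

```python
import re

# Subset of the removed-API table above: old name -> replacement.
REPLACEMENTS = {
    "ti.ui.make_camera": "ti.ui.Camera",
    "ti.SOA": "ti.Layout.SOA",
    "ti.AOS": "ti.Layout.AOS",
    "ti.imread": "ti.tools.imread",
    "ti.imwrite": "ti.tools.imwrite",
    "ti.ext_arr": "ti.types.ndarray",
    "ti.any_arr": "ti.types.ndarray",
    "ti.Tape": "ti.ad.Tape",
    "ti.clear_all_gradients": "ti.ad.clear_all_gradients",
}

def find_removed_apis(source):
    """Return (line_number, old_name, replacement) for each removed API found."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for old, new in REPLACEMENTS.items():
            # Match only a whole dotted name: not preceded by a word
            # character or dot, and not followed by a word character.
            pattern = r"(?<![\w.])" + re.escape(old) + r"(?!\w)"
            if re.search(pattern, line):
                hits.append((lineno, old, new))
    return hits

# Hypothetical pre-v1.6.0 snippet used only to exercise the scanner.
sample = """\
import taichi as ti
with ti.Tape(loss):
    step()
img = ti.imread("in.png")
with ti.ad.Tape(loss):
    step()
"""

hits = find_removed_apis(sample)
for lineno, old, new in hits:
    print(f"line {lineno}: {old} -> {new}")
```

The scanner only reports candidates. Entries in the table marked "Not supported", such as is / is not inside kernels, have no drop-in replacement and need manual changes.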
New features
Struct arguments
You can now use struct arguments in all backends. Structs can be nested, and they can contain matrices and vectors. Here's an example:
import taichi as ti
ti.init()

transform_type = ti.types.struct(R=ti.math.mat3, T=ti.math.vec3)
pos_type = ti.types.struct(x=ti.math.vec3, trans=transform_type)

@ti.kernel
def kernel_with_nested_struct_arg(p: pos_type) -> ti.math.vec3:
    return p.trans.R @ p.x + p.trans.T

trans = transform_type(ti.math.mat3(1), [1, 1, 1])
p = pos_type(x=[1, 1, 1], trans=trans)
print(kernel_with_nested_struct_arg(p))  # [4., 4., 4.]
Ndarray
- Support 0 dim ndarray read & write in python scope
- Fixed a bug when writing into ndarray from Python scope
Improvements
- Support rsqrt operator in autodiff
- Added assembly printer for CPU backend (by Zhanlue Yang)
- Support CUDA shared array allocation over 48KiB
Performance
- Improved vectorization support on CPU backend, with significant performance gains for specific applications
New Examples
- 2D euler fluid simulation example by Lee-abcde
Misc
- Python 3.11 support
- ti.frexp is supported on CUDA, Vulkan, Metal, OpenGL backends.
- ti.math.popcnt intrinsic by Garry Ling
- Fixed a memory leak issue during SNodeTree destruction (by Zhanlue Yang)
- Added validation and improved error report for ti.Field finalization (by Zhanlue Yang)
- Fixed a memory leak issue with Cuda backend in C-API (by Zhanlue Yang)
- Added support for formatted printing with str.format() and f-strings (by Tianyi Liu)
- Changed Python code formatter from yapf to black
Developer Experience
- build.py script for preparing build & testing environment
Full changelog
Highlights:
- Bug fixes
  - Fix wrong datatype size when writing to ndarray from Python scope (by Ailing Zhang)
- CUDA backend
- Documentation
  - Add doc about struct arguments (#7959) (by Lin Jiang)
  - Fix docstring of mix function (#7922) (by Zhao Liang)
  - Update faq and ggui, and add them to CI (#7861) (by Zhao Liang)
  - Update doc for dynamic snode (#7804) (by Zhao Liang)
  - Update field.md (#7819) (by zhoooou)
  - Update readme (#7808) (by yanqingzhang)
  - Update write_test.md (#7745) (by Qian Bao)
  - Update performance.md (#7720) (by Zhao Liang)
  - Update readme (#7673) (by Zhao Liang)
  - Update tutorial.md (#7512) (by Chenzhan Shang)
  - Update gui_system.md (#7628) (by Qian Bao)
  - Remove deprecated api docstrings (#7596) (by pengyu)
  - Fix the cexp docstring (#7588) (by Zhao Liang)
  - Add doc about returning struct (#7556) (by Lin Jiang)
- Error messages
  - Update deprecation warning of the graph arguments (#7965) (by Lin Jiang)
- Language and syntax
  - Remove deprecated funcs in init.py (#7941) (by Lin Jiang)
  - Remove deprecated sparse_matrix_builder function (#7942) (by Lin Jiang)
  - Remove deprecated funcs in ti.ui (#7940) (by Lin Jiang)
  - Remove the support for 'is' (#7930) (by Lin Jiang)
  - Raise error when the dimension of the ndrange does not equal to the number of the loop variable (#7933) (by Lin Jiang)
  - Remove a.atomic(b) (#7925) (by Lin Jiang)
  - Cancel deprecating native min/max (#7928) (by Lin Jiang)
  - Let nested data classes have methods (#7909) (by Lin Jiang)
  - Let kernel argument support matrix nested in a struct (by lin-hitonami)
  - Support the functions of dataclass as kernel argument and return value (#7865) (by Lin Jiang)
  - Fix a bug on PosixPath (#7860) (by Zhao Liang)
  - Seprate out the scalarization for MatrixOfMatrixPtrStmt and MatrixOfGlobalPtrStmt (#7803) (by Zhanlue Yang)
  - Fix pylance warning (#7805) (by Zhao Liang)
  - Support taking structs as kernel arguments (by lin-hitonami)
  - Fix math module circular import bugs (#7762) (by Zhao Liang)
  - Support formatted printing in str.format() and f-strings (#7686) (by 魔法少女赵志辉)
  - Replace internal representation of Python-scope ti.Matrix with numpy arrays (#7559) (by Yi Xu)
  - Stop letting ti.Struct inherit from TaichiOperations (#7474) (by Yi Xu)
  - Support writing sparse matrix as matrix market file (#7529) (by pengyu)
...
v1.5.0
Deprecation Notice
- ndarray no longer accepts field_dim, replaced by the ndim argument.
- [RFC] Deprecate ti.cc backend in favor of TiRT and its C API. If you have any concerns, please let us know at #7629.
New features
AOT
- Taichi Runtime (TiRT) now supports Apple's Metal API and OpenGL ES for compatibility with older mobile platforms. Taichi programs can now be deployed to any mainstream consumer device.
  NOTE: Taichi program deployment on mobile platforms is experimental. Please contact us at contact@taichi.graphics for long-term services.
- Taichi AOT now fully supports float16 dtype.
Ndarray
- Out of bound check is now supported on ndarrays
Improvements
Python Frontend
We now support returning a struct on LLVM-based backends (CPU and CUDA backend). The struct can contain vectors and matrices, and it can also nest with other structs. Here's an example.
import taichi as ti
ti.init()

s0 = ti.types.struct(a=ti.math.vec3, b=ti.i16)
s1 = ti.types.struct(a=ti.f32, b=s0)

@ti.kernel
def foo() -> s1:
    return s1(a=1, b=s0(a=ti.math.vec3(100, 0.2, 3), b=1))

print(foo())  # {'a': 1.0, 'b': {'a': [100.0, 0.2, 3.0], 'b': 1}}
Performance
- Support atomic operations on half2 for the CUDA backend (with compute capability > 60). You can enable this with ti.init(half2_vectorization=True). This feature can effectively accelerate the NeRF training process; please refer to this repo for details.
GGUI
- GGUI now has no computing backend restrictions! You can now use Metal, OpenGL, AMDGPU, or DirectX 11, in addition to CPU, CUDA, and Vulkan, which GGUI previously supported.
- GGUI has now been validated on Mesa's software rasterizer lavapipe. You can use this solution for headless server visualization, or on servers with no graphics capabilities (such as A100).
- Add the fps_limit option which adjusts the maximal frame rate in GGUI.
Full changelog:
Highlights:
- **AMDGPU backend**
- Enable shared array on amdgpu backend (#7403) (by **Zeyu Li**)
- Add print kernel amdgcn (#7357) (by **Zeyu Li**)
- Add amdgpu backend profiler (#7330) (by **Zeyu Li**)
- **Aot module**
- Let AOT kernel inherit CallableBase and use LaunchContextBuilder (by **lin-hitonami**)
- Deprecate element shape and field dim for AOT symbolic args (#7100) (by **Haidong Lan**)
- **Bug fixes**
- Fix copy_from() of StructField (#7294) (by **Yi Xu**)
- Fix caching same loop invariant global vars inside nested fors (#7285) (by **Lin Jiang**)
- Fix num_splits in parallel_struct_for (#7121) (by **Yi Xu**)
- Fix ret_type and cast_type of UnaryOpStmt in Scalarize (#7082) (by **Yi Xu**)
- **Documentation**
- Update GGUI docs with correct API (#7525) (by **pengyu**)
- Fix typos and improve example code in data_oriented_class.md (#7520) (by **pengyu**)
- Update gui_system.md, remove unnecessary example (#7487) (by **NextoneX**)
- Fix typo in API doc (#7511) (by **pengyu**)
- Update math_module (#7405) (by **Zhao Liang**)
- Update hello_world.md (#7400) (by **Zhao Liang**)
- Update debugging.md (#7401) (by **Zhao Liang**)
- Update hello_world.md (#7380) (by **Zhao Liang**)
- Update type.md (#7376) (by **Zhao Liang**)
- Update kernel_function.md (#7375) (by **Zhao Liang**)
- Update hello_world.md (#7369) (by **Zhao Liang**)
- Update hello_world.md (#7368) (by **Zhao Liang**)
- Update data_oriented_class.md (#6790) (by **Zhao Liang**)
- Update hello_world.md (#7367) (by **Zhao Liang**)
- Update kernel_function.md (#7364) (by **Zhao Liang**)
- Update hello_world.md (#7354) (by **Zhao Liang**)
- Update llvm_sparse_runtime.md (#7323) (by **Gabriel Vainer**)
- Update profiler.md (#7358) (by **Zhao Liang**)
- Update kernel_function.md (#7356) (by **Zhao Liang**)
- Update tut.md (#7352) (by **Gabriel Vainer**)
- Update type.md (#7350) (by **Zhao Liang**)
- Update hello_world.md (#7337) (by **Zhao Liang**)
- Update append docstring (#7265) (by **Zhao Liang**)
- Update ndarray.md (#7236) (by **Gabriel Vainer**)
- Update llvm_sparse_runtime.md (#7215) (by **Zhao Liang**)
- Remove doc tutorial (#7198) (by **Olinaaaloompa**)
- Rename tutorial doc (#7186) (by **Zhao Liang**)
- Update tutorial.md (#7176) (by **Zhao Liang**)
- Update math_module.md (#7175) (by **Zhao Liang**)
- Update debugging.md (#7173) (by **Zhao Liang**)
- Fix C++ tutorial does not display on doc site (#7174) (by **Zhao Liang**)
- Update doc regarding dynamic index (#7148) (by **Yi Xu**)
- Move glossary to top level (#7118) (by **Zhao Liang**)
- Update type.md (#7038) (by **Zhao Liang**)
- Fix docstring (#7065) (by **Zhao Liang**)
- **Error messages**
- Allow IfExp on matrices when the condition is scalar (#7241) (by **Lin Jiang**)
- Remove deprecations in ti.ui in 1.6.0 (#7229) (by **Lin Jiang**)
- Remove deprecated ti.linalg.sparse_matrix_builder in 1.6.0 (#7228) (by **Lin Jiang**)
- Remove deprecations in ASTTransformer in 1.6.0 (#7226) (by **Lin Jiang**)
- Remove deprecated a.atomic_op(b) in Taichi v1.6.0 (#7225) (by **Lin Jiang**)
- Remove deprecations in taichi/__init__.py in v1.6.0 (#7222) (by **Lin Jiang**)
- Raise error when using deprecated ifexp on matrices (#7224) (by **Lin Jiang**)
- Better error message when creating sparse snodes on backends that do not support sparse (#7191) (by **Lin Jiang**)
- Raise errors when using metal sparse (#7113) (by **Lin Jiang**)
- **GUI**
- GGUI use shader "factory" (GGUI rework n/N) (#7271) (by **Bob Cao**)
- **Intermediate representation**
- Unified type system for internal operations (#6337) (by **daylily**)
- **Language and syntax**
- Keep ti.pyfunc (#7530) (by **Lin Jiang**)
- Type check assignments between tensors (#7480) (by **Yi Xu**)
- Fix pylance warnings raised by ti.static (#7437) (by **Zhao Liang**)
- Deprecate arithmetic operations and fill() on ti.Struct (#7456) (by **Yi Xu**)
- Fix pylance warnnings by ti.random (#7439) (by **Zhao Liang**)
- Fix pylance types warning (#7417) (by **Zhao Liang**)
- Add better error message for dynamic snode (#7238) (by **Zhao Liang**)
- Simplify the swizzle generator (#7216) (by **Zhao Liang**)
- Remove the deprecated dynamic_index switch (#7195) (by **Yi Xu**)
- Remove deprecated packed switch (#7104) (by **Yi Xu**)
- Raise errors when using the packed switch (#7125) (by **Yi Xu**)
- Fix cannot use taichi in REPL (#7114) (by **Zhao Liang**)
- Remove deprecated ti.Matrix.rotation2d() (#7098) (by **Yi Xu**)
- Remove filename kwarg in aot Module save() (#7085) (by **Ailing**)
- Remove sourceinspect deprecation warning message (#7081) (by **Zhao Liang**)
- Make slicing a single row/column of a matrix return a vector (#7068) (by **Yi Xu**)
- **Miscellaneous**
- Strictly check ndim with external array (#7126) (by **Haidong Lan**)
Full changelog:
- [cc] Add deprecation notice for cc backend (#7651) (by **Ailing**)
- [misc] Cherry pick struct return related commits (#7575) (by **Haidong Lan**)
- [Lang] Keep ti.pyfunc (#7530) (by **Lin Jiang**)
- [bug] Fix symbol conflicts with taichi_cpp_tests (#7528) (by **Zhanlue Yang**)
- [bug] Fix numerical issue with TensorType'd arithmetics (#7526) (by **Zhanlue Yang**)
- [aot] Enable Metal AOT test (#7461) (by **PENGUINLIONG**)
- [Doc] Update GGUI docs with correct API (#7525) (by **pengyu**)
- [misc] Implement KernelCompialtionManager::clean_offline_cache (#7515) (by **PGZXB**)
- [ir] Except shared array from demote atomics pass. (#7513) (by **Haidong Lan**)
- [bug] Fix error with windows-clang compilation for cuda_runtime.cu (#7519) (by **Zhanlue Yang**)
- [misc] Deprecate field dim and update deprecation warnings (#7491) (by **Haidong Lan**)
- [build] Fix build failure without nvcc (#7521) (by **Ailing**)
- [Doc] Fix typos and improve example code in data_oriented_class.md (#7520) (by **pengyu**)
- [aot] Kernel argument count limit (#7518) (by **PENGUINLIONG**)
- [Doc] Update gui_system.md, remove unnecessary example (#7487) (by **NextoneX**)
- [AOT] [llvm] Let AOT kernel inherit CallableBase and use LaunchContextBuilder (by **lin-hitonami**)
- [llvm] Let the offline cache record the type info of arguments and return values (by **lin-hitonami**)
- [ir] Separate LaunchContextBuilder from Kernel (by **lin-hitonami**)
- [Doc] Fix typo in API doc (#7511) (by **pengyu**)
- [aot] Build Runtime C-API by default (#7508) (by **PENGUINLIONG**)
- [bug] Fix run_tests.py --with-offline-cache (#7507) (by **PGZXB**)
- [vulkan] Support printing constant strings containing % (#7499) (by **魔法少女赵志辉**)
- [ci] Fix nightly version number, 2nd try (#7501) (by **Proton**)
- [aot] Fixed memory leak in metal backend (#7500) (by **PENGUINLIONG**)
- [ci] Fix nightly version number issue (#7498) (by **Proton**)
- [example] Remove cv2, cairo dependency (#7496) (by **Zhao Liang**)
- [type] Let Type * be serializable (by **lin-hitonami**)
- [ci] Second attempt at permission check for ghstack landing (#7490) (by **Proton**)
- [docs] Reword words of warning about building from source (#7488) (by **Anselm Schüler**)
- [lang] Fixed double release of Metal command buffer (#7484) (by **PENGUINLIONG**)
- [ci] Switch Android bots lock redis to bot-master (#7482) (by **Proton**)
- [ci] Status check of ghstack CI bot (#7479) (by **Proton**)
- [Lang] Type check assignments between tensors (#7480) (by **Yi Xu**)
- [doc] Fix typo i...
v1.4.1
Highlights:
Full changelog:
v1.4.0
Deprecation Notice
- Support for sparse SNodes on the Metal backend has been removed.
- ti.Matrix.rotation2d() has been removed.
- The packed switch in ti.init() has been removed.
- The dynamic_index switch in ti.init() is now deprecated and will be removed in v1.5.0. See the feature introduction below for details.
- Slicing from a single row/column of a matrix (e.g. a[x, a:b]) now returns a vector instead of a matrix.
New features
AOT
Taichi AOT is officially available in Taichi v1.4.0, along with a native Taichi Runtime (TiRT) library taichi_c_api. Native applications can now load compiled AOT modules and launch Taichi kernels without a Python interpreter.
In this release, TiRT has stabilized the Vulkan backend on desktop platforms and Android. You can find prebuilt TiRT binaries on the release page. You can refer to a comprehensive tutorial on the doc site; the detailed TiRT C-API documentation is available at https://docs.taichi-lang.org/docs/taichi_core.
Ndarray
Taichi ndarray is now formally released in v1.4.0. The ndarray is an array object that holds contiguous multi-dimensional data to allow easy exchange with external libraries. See documentation for more details.
Dynamic index
Before v1.4.0, when you wanted to access a vector/matrix with a runtime variable instead of a compile-time constant, you had to set ti.init(dynamic_index=True). However, that option only works for LLVM-based backends (CPU & CUDA) and may slow down runtime performance because all matrices are affected. Starting from v1.4.0, that option is no longer needed. You can use variable indices whenever necessary on all backends without affecting the performance of those matrices with only constant indices.
Improvements
Performance
- The compilation speed has been optimized by ~2x.
Example list & ti gallery
Since v1.0.0, we have been enriching our taichi example collection, bringing the number of demos in the gallery window from eight to twelve. Run ti gallery to check out some new demos!
Bug fixes
- Incorrect behavior of struct fors on sparse SNodes in certain cases has been fixed. (#7121)
- CUDA will no longer allocate extra device memory when performing to_numpy() and from_numpy(). (#7008)
- StructType is now allowed as a type hint to ti.func. (#6964)
- Incorrect recompilation caused by filling in a matrix field with the same matrix has been fixed. (#6951)
- Matrix type inference has been fixed. (#6928)
- Getting 64-bit data from ndarrays in the Python scope is now handled correctly. (#6836)
- Name collision problem in ti.dataclass has been fixed. (#6737)
Highlights:
- Aot module
  - Deprecate element shape and field dim for AOT symbolic args (#7100) (by Haidong Lan)
- Bug fixes
- Build system
  - Deprecate export_core (#7028) (by Zhanlue Yang)
- Command line interface
  - Add "ti cache clean" command to clean the offline cache files manually (#6937) (by PGZXB)
- Documentation
  - Update tutorial.md (#7176) (by Zhao Liang)
  - Update math_module.md (#7175) (by Zhao Liang)
  - Update debugging.md (#7173) (by Zhao Liang)
  - Fix C++ tutorial does not display on doc site (#7174) (by Zhao Liang)
  - Update doc regarding dynamic index (#7148) (by Yi Xu)
  - Move glossary to top level (#7118) (by Zhao Liang)
  - Update type.md (#7038) (by Zhao Liang)
  - Fix docstring (#7065) (by Zhao Liang)
  - Remove packed mode in doc (#7030) (by Zhao Liang)
  - Minor doc update (#6952) (by Zhao Liang)
  - Glossary (#6101) (by Olinaaaloompa)
  - Update dac (#6875) (by Gabriel Vainer)
  - Update faq.md (#6921) (by Zhao Liang)
  - Update dataclass.md (#6876) (by Gabriel Vainer)
  - Update the documentation about Dynamic SNode (#6752) (by Lin Jiang)
  - Stop mentioning packed mode (#6755) (by Yi Xu)
- Error messages
- GUI
  - Support colored texts (#7036) (by Dunfan Lu)
- Intermediate representation
  - Allow a maximum of 12 SNode indices (#6901) (by Dunfan Lu)
- Language and syntax
  - Raise errors when using the packed switch (#7125) (by Yi Xu)
  - Fix cannot use taichi in REPL (#7114) (by Zhao Liang)
  - Remove deprecated ti.Matrix.rotation2d() (#7098) (by Yi Xu)
  - Remove filename kwarg in aot Module save() (#7085) (by Ailing)
  - Remove sourceinspect deprecation warning message (#7081) (by Zhao Liang)
  - Make slicing a single row/column of a matrix return a vector (#7068) (by Yi Xu)
  - Deprecate the dynamic_index switch (#7071) (by Yi Xu)
  - Add irpass::eliminate_immutable_local_vars() test cases for TensorType (#7043) (by Zhanlue Yang)
  - Fix gui docstring (#7003) (by Zhao Liang)
  - Support dynamic indexing in spirv (#6990) (by Yi Xu)
  - Support dynamic indexing in metal (#6985) (by Yi Xu)
  - Support LU sparse solver on CUDA backend (#6967) (by pengyu)
  - Fix struct type problem (#6949) (by Zhao Liang)
  - Add warning message when converting dynamic snode to numpy (#6853) (by Zhao Liang)
  - Deprecate sourceinspect dependency (#6894) (by Zhao Liang)
  - Warn users if ndarray size is out of int32 boundary (#6846) (by Yi Xu)
  - Remove the real_matrix switch (#6885) (by Yi Xu)
  - Enable real_matrix and real_matrix_scalarize by default (#6801) (by Zhanlue Yang)
  - Raise an error for the semantic change of transpose() (#6813) (by Yi Xu)
  - Add bool type in python as an alias to i32 (#6742) (by daylily)
  - Add deprecation warning for the removal of the packed switch (#6753) (by Yi Xu)
- Metal backend
  - Raise deprecate warning and error when using sparse snodes on metal (#6739) (by Lin Jiang)
- Miscellaneous
Full changelog:
- [Doc] Update tutorial.md (#7176) (by Zhao Liang)
- [aot] (cherry-pick) Removed unused archs in C-API (#7167), FindTaichi CMake module to help outside project integration (#7168) (#7177) (by PENGUINLIONG)
- [docs] Create windows_debug.md (#7164) (by Bob Cao)
- [Doc] Update math_module.md (#7175) (by Zhao Liang)
- [Doc] Update debugging.md (#7173) (by Zhao Liang)
- [Doc] Fix C++ tutorial does not display on doc site (#7174) (by Zhao Liang)
- [doc] Fix spelling of "paticle_field" (#7024) (by Xiang (Kevin) Li)
- [doc] Update accelerate_python.md to use ti.max (#7161) (by Tao Jin)
- [aot] Fixed ti_get_last_error signature (#7165) (by PENGUINLIONG)
- [example] Update quaternion arithmetics in fractal_3d_ggui (#7139) (by Zhao Liang)
- [doc] Add doc ndarray (#7157) (by Olinaaaloompa)
- [doc] Update field.md (Fields advanced) (#6867) (by Gabriel Vainer)
- [ci] Use make_changelog.py to generate the full changelog (#7152) (by Lin Jiang)
- [aot] Introduce new AOT deployment tutorial (#7144) (by PENGUINLIONG)
- [Doc] Update doc regarding dynamic index (#7148) (by Yi Xu)
- [Misc] Strictly check ndim with external array (#7126) (by Haidong Lan)
- [ci] Run test when pushing to rc branches (#7146) (by Lin Jiang)
- [ci] Disable backward_cpp on macOS (#7145) (by Proton)
- [gui] Fix scene line renderable (#7131) (by Bob Cao)
- [Lang] Raise errors when using the packed switch (#7125) (by Yi Xu)
- [cpu] Reuse VirtualMemoryAllocator for CPU ndarray memory allocation (#7128) (by Ailing)
- [ci] Temporarily disable ad_external_array on Metal (#7136) (by Bob Cao)
- [Error] Raise errors when using metal sparse (#7113) (by Lin Jiang)
- [misc] Cherry-pick #7072 into rc-v1.4.0 (#7135) (by Ailing)
- [aot] Rename device capability atomic_i64 to atomic_int64 for consistency (#7095) (by PENGUINLIONG)
- [Lang] Fix cannot use taichi in REPL (#7114) (by Zhao Liang)
- [Bug] Fix num_splits in parallel_struct_for (#7121) (by Yi Xu)
- [Doc] Move glossary to top level (#7118) (by Zhao Liang)
- [Aot] Deprecate element shape and field dim for AOT symbolic args (#7100) (by Haidong Lan)
- [Lang] Remove deprecated ti.Matrix.rotation2d() (#7098) (by Yi Xu)
- [doc] Modified some errors in the function examples (#7094) (by welann)
- [ci] More Windows git hacks (#7102) (by Proton)
- [Lang] Remove filename kwarg in aot Module save() (#7085) (by Ailing)
- [Lang] Remove sourceinspect deprecation warning message (#7081) (by Zhao Liang)
- [example] Remove gui warning message (#7090) (by Zhao Liang)
- [Bug] Fix ret_type and cast_type of UnaryOpStmt in Scalarize (#7082) (by Yi Xu)
- [doc] Update ndarray deprecation warning to 1.5.0 (#7083) (by Haidong Lan)
- [example] Update gallery images (#7053) (by Zhao Liang)
- [Doc] Update type.md (#7038) (by Zhao Liang)
- [Doc] Fix docstring (#7065) (by Zhao Liang)
- [Lang] Make sl...
v1.3.0
Deprecation Notice
- Using sparse data structures on the Metal backend is now deprecated. Support for Dynamic SNode has been removed in v1.3.0, and support for Pointer/Bitmasked SNode will be removed in v1.4.0.
- The packed switch in ti.init() is now deprecated and will be removed in v1.4.0. See the feature introduction below for details.
- ti.Matrix.rotation2d() is now deprecated and will be removed in v1.4.0. Use ti.math.rotation2d() instead.
- To clearly distinguish vectors from matrices, transpose() on a vector is no longer allowed. If you want something like a @ b.transpose(), write a.outer_product(b) instead.
- Ndarray: The ndarray type annotation arguments element_dim, element_shape and field_dim will be deprecated in v1.4.0. field_dim is renamed to ndim to make it more intuitive. element_dim and element_shape will be replaced by passing a matrix type as the dtype argument. For example, ti.types.ndarray(element_dim=2, element_shape=(3,3)) will be replaced by ti.types.ndarray(dtype=ti.math.mat3).
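To see why a.outer_product(b) is the replacement for a @ b.transpose(), the following plain-Python sketch may help. It is not Taichi code: the helpers outer_product and matmul are hypothetical names written for this illustration. It treats a as an n×1 column and b.transpose() as a 1×m row, then checks that both spellings produce the same matrix.

```python
def outer_product(a, b):
    """Return the len(a) x len(b) matrix whose (i, j) entry is a[i] * b[j]."""
    return [[ai * bj for bj in b] for ai in a]


def matmul(x, y):
    """Naive matrix product of two matrices stored as nested lists."""
    inner = len(y)
    cols = len(y[0])
    return [
        [sum(row[k] * y[k][j] for k in range(inner)) for j in range(cols)]
        for row in x
    ]


a = [1, 2, 3]
b = [4, 5]

column_a = [[v] for v in a]  # a viewed as a 3x1 matrix
row_b = [b]                  # b.transpose() viewed as a 1x2 matrix

via_matmul = matmul(column_a, row_b)
via_outer = outer_product(a, b)

print(via_outer)  # [[4, 5], [8, 10], [12, 15]]
```

Both routes give the same 3×2 result, so the outer product expresses the intent directly without needing a transposed vector.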
New features
Dynamic SNode
To support variable-length fields, Taichi provides dynamic SNodes.
You can now use the dynamic SNode on fields of different data types, even struct fields and matrix fields.
You can use x[i].append(...) to append an element, x[i].length() to get the length, and x[i].deactivate() to clear the list, as shown in the following code snippet.
import taichi as ti

ti.init(arch=ti.cpu)

# A struct type with two members, stored in a field of variable-length lists.
pair = ti.types.struct(a=ti.i16, b=ti.i64)
pair_field = pair.field()

block = ti.root.dense(ti.i, 4)
pixel = block.dynamic(ti.j, 100, chunk_size=4)
pixel.place(pair_field)
l = ti.field(ti.i32)
ti.root.dense(ti.i, 5).place(l)

@ti.kernel
def dynamic_pair():
    for i in range(4):
        pair_field[i].deactivate()  # clear the i-th list
        for j in range(i * i):
            pair_field[i].append(pair(i, j + 1))
        # pair_field = [[],
        #               [(1, 1)],
        #               [(2, 1), (2, 2), (2, 3), (2, 4)],
        #               [(3, 1), (3, 2), ... , (3, 8), (3, 9)]]
        l[i] = pair_field[i].length()  # l = [0, 1, 4, 9]
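The values in the comments above can be cross-checked without Taichi. The sketch below is a plain-Python model only: ordinary lists stand in for the dynamic SNode cells, and clear(), append() and len() play the roles of deactivate(), append() and length(). The names rows and lengths are made up for this illustration.

```python
# One growable list per row stands in for one dynamic SNode cell.
rows = [[] for _ in range(4)]
lengths = [0] * 5  # mirrors field l, which has 5 entries; only 4 are written

for i in range(4):
    rows[i].clear()                 # plays the role of deactivate()
    for j in range(i * i):
        rows[i].append((i, j + 1))  # plays the role of append(pair(i, j + 1))
    lengths[i] = len(rows[i])       # plays the role of length()

print(rows[2])      # [(2, 1), (2, 2), (2, 3), (2, 4)]
print(lengths[:4])  # [0, 1, 4, 9]
```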
Packed Mode
Packed mode was introduced in v0.8.0 to allow users to trade runtime performance for memory usage. In v1.3.0, after the elimination of runtime overhead in common cases, packed mode has become the default mode. There's no longer any automatic padding behavior behind the scenes, so users can use fields and SNodes without surprise.
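To give a feel for the memory side of that trade-off, here is a small plain-Python sketch. It assumes, as older Taichi documentation described, that the non-packed layout rounded each axis of a field up to the next power of two. The helper names are hypothetical and this is not Taichi code.

```python
def next_power_of_two(n):
    """Smallest power of two that is >= n (for n >= 1)."""
    p = 1
    while p < n:
        p *= 2
    return p


def padded_shape(shape):
    """Shape after rounding every axis up to a power of two (assumed old behavior)."""
    return tuple(next_power_of_two(n) for n in shape)


def num_elements(shape):
    """Total number of elements in a dense array of the given shape."""
    total = 1
    for n in shape:
        total *= n
    return total


shape = (18, 65)
padded = padded_shape(shape)

print(padded)                                      # (32, 128)
print(num_elements(shape), num_elements(padded))   # 1170 4096
```

Under that assumption, a field of shape (18, 65) used to occupy 4096 slots for 1170 elements. With packed mode as the default, the field is allocated at exactly the shape requested.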
Sparse Matrix
We introduce experimental sparse matrix and sparse solver support on the CUDA backend. The API is the same as on the CPU backend. Currently, only the f32 data type and the LLT linear solver are supported on CUDA, and only ti.ndarray can be used for SpMV and linear solver operations. Support for the f64 data type and other linear solvers is in progress.
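For readers unfamiliar with the term, SpMV is the sparse matrix-vector product y = A x. The plain-Python sketch below shows what that operation computes for a matrix stored in compressed sparse row (CSR) form. It is an illustration only: the function name spmv_csr and the CSR layout are assumptions made for this example, and it does not describe how Taichi's CUDA backend stores or multiplies the matrix.

```python
def spmv_csr(values, col_indices, row_offsets, x):
    """Compute y = A @ x for a matrix A stored in CSR form.

    values      -- nonzero entries, row by row
    col_indices -- column index of each entry in values
    row_offsets -- row r owns entries row_offsets[r] .. row_offsets[r + 1] - 1
    """
    y = []
    for row in range(len(row_offsets) - 1):
        acc = 0.0
        for k in range(row_offsets[row], row_offsets[row + 1]):
            acc += values[k] * x[col_indices[k]]
        y.append(acc)
    return y


# A = [[4, 0, 1],
#      [0, 3, 0],
#      [1, 0, 2]]
values = [4.0, 1.0, 3.0, 1.0, 2.0]
col_indices = [0, 2, 1, 0, 2]
row_offsets = [0, 2, 3, 5]
x = [1.0, 2.0, 3.0]

y = spmv_csr(values, col_indices, row_offsets, x)
print(y)  # [7.0, 6.0, 7.0]
```

Only the stored nonzeros are visited, which is why the cost scales with the number of nonzeros rather than with the full matrix size.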
Improvements
Python Frontend
- Matrix slicing now supports augmented assign (e.g. +=) besides assign.
Taichi Examples
- Our user https://github.com/Linyou contributed an excellent example of an instant ngp renderer in PR #6673. Run taichi_ngp to check it out!
[Developers only] LLVM15 upgrade
Starting from v1.3.0, Taichi has upgraded its LLVM dependency to version 15.0.0. If you're interested in contributing or simply building Taichi from source, please follow our installation doc for developers.
Note this change has no impact on Taichi users.
Highlights
- Documentation
- Language and syntax
- Add deprecation warning for the removal of the packed switch (#6753) (by Yi Xu)
- Metal backend
- Raise deprecate warning and error when using sparse snodes on metal (#6739) (by Lin Jiang)
Full changelog
- [aot] Revert C-API Device capability improvements (#6772) (by PENGUINLIONG)
- [aot] C-API Device capability improvements (#6702) (by PENGUINLIONG)
- [aot] C-API to get available archs (#6766) (by PENGUINLIONG)
- [doc] Update sparse matrix document (#6719) (by pengyu)
- [autodiff] Separate non-linear operators to an individual class (#6700) (by Mingrui Zhang)
- [bug] Fix dereferencing nullptr (#6763) (by Yi Xu)
- [Doc] Update the documentation about Dynamic SNode (#6752) (by Lin Jiang)
- [doc] Update dev install about clang version (#6759) (by Ailing)
- [build] Improve TI_WITH_CUDA guards for CUDA related test cases (#6698) (by Zhanlue Yang)
- [Lang] Add deprecation warning for the removal of the packed switch (#6753) (by Yi Xu)
- [lang] Improve sparse matrix building on GPU (#6748) (by pengyu)
- [aot] JSON serde (#6754) (by PENGUINLIONG)
- [bug] MatrixType bug fix: Fix error with to_numpy() and from_numpy() (#6726) (by Zhanlue Yang)
- [Doc] Stop mentioning packed mode (#6755) (by Yi Xu)
- [lang] Get the length of dynamic SNode by x.length() (#6750) (by Lin Jiang)
- [llvm] Support nested struct with matrix return value on real function (#6734) (by Lin Jiang)
- [Metal] [error] Raise deprecate warning and error when using sparse snodes on metal (#6739) (by Lin Jiang)
- [build] Integrate backward_cpp to test targets for enabling C++ stack trace (#6697) (by Zhanlue Yang)
- [aot] Load AOT module from memory (#6692) (#6714) (by PENGUINLIONG)
- [ci] Add dockerfile.ubuntu-18.04.amdgpu (#6736) (by Zeyu Li)
- [doc] Update LLVM10 -> LLVM15 in installation guide (#6747) (by Zhanlue Yang)
- [misc] Fix warnings of taichi examples (#6740) (by PGZXB)
- [example] Ti-example: instant ngp renderer (#6673) (by Youtian Lin)
- [build] Use a separate prebuilt llvm15 binary for manylinux environment (#6732) (by Ailing)