[SIMT] Add uni_sync warp intrinsics #4927

0xzhang · 2022-05-07T05:08:02Z

Related issue = #4631

netlify · 2022-05-07T05:08:07Z

✅ Deploy Preview for docsite-preview ready!

Name	Link
🔨 Latest commit	`e02f89b`
🔍 Latest deploy log	https://app.netlify.com/sites/docsite-preview/deploys/627801d857d25c000871aa9c
😎 Deploy Preview	https://deploy-preview-4927--docsite-preview.netlify.app

0xzhang · 2022-05-07T06:30:06Z

I'm new to CUDA, and I couldn't find any accurate information about __uni_sync.
I misread "unique" as meaning "equal".
https://docs.nvidia.com/cuda/nvvm-ir-spec/index.html#nvvm-intrin-warp-level-vote
I'll try to fix it.

0xzhang · 2022-05-08T12:44:10Z

GeForce GTX 1650
CUDA 11.4

It seems test_simt.py cannot be run on my PC.
I used the command below.

python tests/run_tests.py -t1 -k test_simt

Warp functions require a minimum compute capability from the device. Maybe my card's compute capability is too low according to the list, or maybe the WSL environment limits the compute capability.

0xzhang · 2022-05-08T17:30:20Z

__uni_sync() returns 1 if and only if the predicates of all threads specified by mask are either all non-zero or all zero; otherwise it returns 0.
This is the only description of __uni_sync() I could find on the whole Internet:

https://www.cnblogs.com/cuancuancuanhao/p/7841512.html
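
The description above can be sketched as a small host-side reference model. This is only an illustration of the quoted semantics, not Taichi's or CUDA's implementation; the function name `uni_sync_reference` and the list-of-lanes representation are assumptions made for this example.

```python
from typing import Sequence


def uni_sync_reference(mask: int, predicates: Sequence[int]) -> int:
    """Hypothetical host-side model of the quoted uni_sync description.

    `predicates[lane]` is the predicate value held by thread `lane` of a warp;
    bit `lane` of `mask` selects whether that thread takes part in the vote.
    """
    # Keep only the truthiness of predicates from lanes selected by the mask.
    selected = [bool(p) for lane, p in enumerate(predicates) if (mask >> lane) & 1]
    # Uniform means all selected predicates are non-zero, or all are zero.
    # Assumption: an empty selection counts as uniform (all([]) is True).
    return int(all(selected) or not any(selected))


if __name__ == "__main__":
    full_mask = 0xFFFFFFFF
    print(uni_sync_reference(full_mask, [1] * 32))        # all non-zero
    print(uni_sync_reference(full_mask, [0] * 32))        # all zero
    print(uni_sync_reference(full_mask, [1] + [0] * 31))  # mixed
```

Under this model, a test for the intrinsic would compare the value every participating thread receives against the result computed on the host for the same mask and predicates.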

galeselee · 2022-05-09T02:30:25Z

> GeForce GTX 1650
>
> CUDA 11.4
>
> It seems test_simt.py cannot be run on my PC. I used the command below.
> python tests/run_tests.py -t1 -k test_simt
> Warp functions require a minimum compute capability from the device. Maybe my card's compute capability is too low according to the list, or maybe the WSL environment limits the compute capability.

Did you solve this problem? I ran into a similar problem while writing the match_all warp intrinsic.

0xzhang · 2022-05-09T03:01:35Z

@galeselee No, I'm not sure what causes the problem. I can only rely on CI to verify that my test passes.

turbo0628

LGTM!

turbo0628 · 2022-05-09T03:08:55Z

@galeselee Can you try on that Linux RTX3080 machine?

galeselee · 2022-05-09T03:29:48Z

> @galeselee Can you try on that Linux RTX3080 machine?

Yeah, I'll come back later with the results.

galeselee · 2022-05-09T03:55:05Z

> yeah, I'll come back later with the results

The same error occurs on the RTX3080 machine.

turbo0628 · 2022-05-09T04:22:58Z

> It occurs the same error on RTX3080 machine.

Could you post more information about the crash in the tests? Mine works fine, and I think we are using identical environments.

galeselee · 2022-05-09T04:45:01Z

> > yeah, I'll come back later with the results
>
> It occurs the same error on RTX3080 machine.

I was wrong. No error occurred on the RTX3080 machine.

turbo0628 · 2022-05-09T04:47:51Z

@0xzhang I think your guess is right: some legacy GPUs cannot support SIMT instructions.

However, Taichi should report a clear error when the GPU cannot run warp-level primitives. Could you file an issue to track this problem?

0xzhang · 2022-05-09T05:55:57Z

@turbo0628 Yes, I created a new issue (#4935) to track this problem.

turbo0628 · 2022-05-09T06:38:22Z

Thanks!

* [Build] Switch to scikit-build as the build backend (#4624) * switch to skbuild * Switch the build system to scikit-build * include bc and libmolten * find llvm runtime bc * fix bc files installation * install bc after compile * Add more message * Auto Format * fix findpython * Kickstart CI * add empty line * add missing dependency * fix python args * start CI * Fix clang tidy run * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Co-authored-by: Taichi Gardener <taichigardener@gmail.com> Co-authored-by: Ailing <ailzhang@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [build] Install export core library to build dir (#4866) * [misc] Bump version to v1.0.2 (#4867) * [Bug] Remove redundant AllocStmt when lowering FrontendWhileStmt (#4870) * [build] [bug] Fix a bug of skbuild that loses the root package_dir (#4875) * [ci] Add libtaichi_export_core build for desktop in CI (#4871) * [Build] [refactor] Define runtime build target (#4838) * Move LLVM Cmake to its own dir * Suppress warning from submodules * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Use current source dir * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Separate Vulkan runtime files from codegen * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [Doc] Add limitation about TLS optimization (#4877) * [Doc] Add limitation about TLS optimization * Add link to reduction sum benchmark * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Co-authored-by: Haidong Lan <turbo0628g@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [ci] Use 
the updated docker image for libtaichi_export_core (#4881) * [refactor] Add ASTSerializer and use it to generate offline-cache-key (#4863) * Add ASTSerializer, using it to generate offline-cache-key * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [build] Change the library output dir for export core (#4880) * Change the library output dir for export core * limit the change to the target * [vulkan] Device API explicit semaphores (#4852) * Device API explicit semaphores * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Destroy the semaphore before the context * Fix type warnings * fix nits * return nullptr for devices that don't need semaphores * test out no semaphores between same queue * Use native command list instead of emulated for dx11 * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove the in-queue semaphore * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Use flush instead of sync in places * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix possible null semaphore Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [metal] Complete Device API (#4862) * [metal] Complete Device API * fix * fix * [Doc] Updated links that may break. (#4874) * Updated logo * Updated links that may break when the doc site has versions * Added information that numpy arrays and torch tensors can be passed as arguments * Fixed a broken link. 
* [error] [lang] Improved error messages for illegal slicing or indexing to ti.field (#4873) * [bug] Improved error messages for ilegal slicing or indexing to ti.field * Fixed test failures * Addressed code-review comments * [metal] Migrate runtime's MTLBuffer allocation to unified device API (#4865) * wip * migrate all buffers * [Build] [refactor] Use keywords instead of plain target_link_libraries CMake (#4864) * Move LLVM Cmake to its own dir * Suppress warning from submodules * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Use current source dir * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Separate Vulkan runtime files from codegen * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Use keywords instead of plain target_link_libraries * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [bug] Fixed type promotion rule for bit-shift operations (#4884) * [bug] Fixed type promotion rule for shift operations * removed debug info * Addressed review comments * [aot] [vulkan] Expose symbols for AOT (#4879) * [aot] [vulkan] Expose symbols for AOT * weird windows * hide to make win happy * fix * [Build] [refactor] Define Cmake OpenGL runtime target (#4887) * Move LLVM Cmake to its own dir * Suppress warning from submodules * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Use current source dir * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Separate Vulkan runtime files from codegen * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Use keywords instead of plain target_link_libraries * [pre-commit.ci] 
auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Separate opengl runtime files from backend * Remove some warnings * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Minor * Add glfw include * Add link to taichi core * Update taichi/program/extension.cpp Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: yekuang <k-ye@users.noreply.github.com> * [vulkan] Fix typo for waitSemaphoreCount (#4892) * [vulkan] Add new VMA vulkan functions. (#4893) * Add new VMA vulkan functions. * fix * Use Ninja generator on Windows and skip generator test (#4896) * [Lang] [test] Copy-free interaction between Taichi and PaddlePaddle (#4886) * Implement has_paddle(), to_paddle_type() and update to_taichi_type in python\taichi\lang\util.py * Implement get_paddle_callbacks() and update get_function_body(), match_ext_arr() in python\taichi\lang\kernel_impl.py * Add test test_io_devices() in tests\python\test_torch_io.py * Implement callback for CPU-GPU/GPU-CPU copy between Taichi and Paddle * Partially implement to_torch()/from_torch() according to PyTorch in Taichi * Fix paddle.Tensor's backend check * Update tests for from_paddle()/to_paddle() * [doc] Update Global settings with TI_ENABLE_PADDLE * Fix to avoid fail when only import paddle * [test] Fix the expected list alphabetically * [doc] Add info about paddle.Tensor * [ci] Try to test paddle's GPU version * Fix the usage of paddle.ones * Fix f16 tests for paddle * Fixed supported archs for tests of paddle * Use 1 thread run tests for torch and paddle * Fix linux test * Fix windows test * Unify the name to Paddle * Add tests for paddle * Replace usage of device to place for paddle * Paddle's GPU develop package on Linux import error * [test] Cancel tests for Paddle on GPU (#4914) * remove debug print (#4883) * [Doc] Updated broken links (#4912) * [Doc] Updated broken links * Updated links 
that may break. * Added .md * [test] Exit on error during Paddle windows test (#4910) * [test] Exit on error during Paddle windows test * Check if paddle test leaks memory * Increase device memory and reduce thread number * Revert "Check if paddle test leaks memory" This reverts commit e0522b0e520050fb50d2c338a2a7d0b2a363bfb0. * Disable paddle for non-paddle test * [build] Warning Suppression PR #2: Fixed codebase warnings (#4909) * [SIMT] Add syncwarp warp intrinsics (#4917) * add warp_barries warp instrinsic add warp_barrier unit test fix error: add Args mask in warp.py * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [refactor] Create MatrixImpl to differentiate Taichi and Python scopes (#4853) * wip * wip * wip * wip * wip * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * cleanup * fix impl._subscript() * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix mesh * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix useless __init__ * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix py-scope subscript * fix swizzle * fix doc * fix api * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [build] Warning Suppression PR #1: Turned on -Wno-ignored-attributes & Removed unused functions (#4916) * [SIMT] Add activemask warp intrinsics (#4918) * add activemask warp intrinsic add test function call del extra space unit-test print->assert * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Co-authored-by: 
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [build] Warning Suppression PR #3: Eliminate warnings from third-party headers (#4920) * [build] Warning Suppression PR #1: Turned on -Wno-ignored-attributes & Removed unused functions * [build] Warning Suppression PR #2: Eliminate warnings from third-party headers * Fixed an warning with enum comparison * [build] Warning Suppression PR #4: Fixed warnings with MacOS (#4926) * [build] Warning Suppression PR #1: Turned on -Wno-ignored-attributes & Removed unused functions * [build] Warning Suppression PR #2: Eliminate warnings from third-party headers * Fixed an warning with enum comparison * [build] Warning Suppression PR #4: Fixed Mac-specific warnings * [refactor] Simplify Matrix's initializer (#4923) * [refactor] Simplify Matrix's initializer * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix * fix * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update python/taichi/lang/matrix.py * Update python/taichi/lang/matrix.py Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [Doc] Updated relative path (#4929) * Update field.md * Updated one broken link. 
* [Lang] Support sparse matrix datatype and storage format configuration (#4673) * Add sparse matrix datatype configuration * create sparse matrix with datatype in Python * sparse solver takes as sparse matrix with datatype parameters * operator overloading with bug * fix operator overloading bugs * Add more operator overloading functions * EigenSparseMatrix operator overloading * improve * Clang-tidy * add more datatype EigenSparseMatrix * get/set element bug fix * Bugfix:sparse matrix shape configuration * improve sparse matrix test cases * Update tests/python/test_sparse_matrix.py Co-authored-by: Yi Xu <xy_xuyi@foxmail.com> * improve * Update taichi/program/sparse_matrix.h Co-authored-by: Yi Xu <xy_xuyi@foxmail.com> Co-authored-by: Yi Xu <xy_xuyi@foxmail.com> Co-authored-by: taichiCourse01 <tgc01@taichi.graphics> * [lang] Fix type check warnings for ti.Mesh (#4930) * fix * fix * [SIMT] Add uni_sync warp intrinsics (#4927) * [SIMT] Add uni_sync warp intrinsics * [build] Enable -Werror on Linux & Mac (#4941) * [build] Turn on -Werror on Linux and Mac platforms (#4928) * [build] Turn on -Werror on Linux and Mac platforms * Added documentations for Werror * Patched documentation * [doc] Updated documentations for implicit type casting rules (#4885) * [doc] Updated documentations for type promotion rules * Rearranged type promotion docs * [refactor] Remove unused snode_trees in ProgramImpl interface (#4942) * [refactor] Remove unused snode_trees in ProgramImpl interface * Update taichi/codegen/codegen_llvm.h * [build] Turned off -Werror temporarily for issues with performance-bot (#4946) * [refactor] [llvm] Remove struct_compiler_ as a member variable (#4945) * [build] Limit -Werror to Clang-compiler only (#4947) * [build] Enable -Werror on Linux & Mac * [build] Limit -Werror to Clang-compiler only * [ci] Fix Nightly (#4948) * [ci] Fix nightly test * Add python 3.7 3.9 in nightly * [Build] Improved building on Windows (#4925) Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> * [Lang] Add more functions to math module (#4939) * add more functions to math module * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more functions to math module * add more functions to math module * add more functions to math module * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more functions to math module * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more functions to math module * add more functions to math module * Update _funcs.py * Update python/taichi/_funcs.py Co-authored-by: pengyu <6712304+FantasyVR@users.noreply.github.com> * Update python/taichi/math/mathimpl.py Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: pengyu <6712304+FantasyVR@users.noreply.github.com> * [lang] [bug] Implement Expression serializing and fix some bugs (#4931) * Serialize Expression and remove old useless ExpressionOfflineCacheKeyGenerator * Fix some bugs(reported by test_assert and test-snodes with offline_cache=True) * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [refactor] Add ArrayMetadata to store the array runtime size (#4950) * [refactor] Add ArrayMetadata to store the array runtime size * rm macros * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix * revert to debug * decompose * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [refactor] Some renamings (#4959) * [lang] Add reference type support on real functions (#4889) * 
wip * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fixes * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fixes * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix * add test * fix * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix test_api Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [llvm] Move cache directory to dump() (#4963) * [llvm] Move cache directory to dump() * fix * fix * [RFC] AOT for all SNodes (#4806) * [rfc] AOT for all SNodes * add rfc tag * fix * fix * Update docs/rfcs/20220413-aot-for-all-snode.md Co-authored-by: Ailing <ailzhang@users.noreply.github.com> * fix * soa * more contents on autodiff, add_field() * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * toc * Update docs/rfcs/20220413-aot-for-all-snode.md Co-authored-by: Yi Xu <xy_xuyi@foxmail.com> * Update docs/rfcs/20220413-aot-for-all-snode.md Co-authored-by: Yi Xu <xy_xuyi@foxmail.com> * Update docs/rfcs/20220413-aot-for-all-snode.md Co-authored-by: Yi Xu <xy_xuyi@foxmail.com> * improvements * improvements Co-authored-by: Ailing <ailzhang@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Yi Xu <xy_xuyi@foxmail.com> * [ci] Add new buildbot with latest driver for Linux/Vulkan test (#4953) * [ci] Add new buildbot with latest driver for Linux test * Removed unused Jenkinsfile and travis * Ref to issue * Change matrix * Change matrix format * Change indented maybe * String maybe * First remove runs-on * Minor * Use nested array * Use nested array * Use nested array * Debug path * Revert "Debug path" This reverts commit 000db2ad746f1d670e7fa7c9bdd1fad0209b8147. 
* Debug path * Revert * Remove trailing space * [vulkan] Set kApiVersion to VK_API_VERSION_1_3 (#4970) * Change vulkan version to fix AMD crash problem. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [bug] [simt] Fix the problem that some intrinsics are never called (#4957) * [bug] [simt] Fix the problem that some intrinsics are never called * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix format * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [llvm] Create ModuleToFunctionConverter (#4962) * [llvm] Create ModuleToFunctionConverter * fix wild pointer * get_compute_device * [build] Fixed Ilegal Instruction Error when importing PaddlePaddle module (#4969) * Trigger CI failure * [build] Fixed Ilegal Instruction Error when importing PaddlePaddle module * CI run: second time * CI run: third time * Log hardware info for CI build-bot * [test] Add an ndarray test in C++. 
(#4972) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [llvm] Make codegen produce static llvm::Module (#4975) * [llvm] Make codegen produces static llvm::Module * Update taichi/codegen/codegen_llvm.h * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * TI_WITH_LLVM * fix Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [ci] [build] Containerize Windows CPU build and test (#4933) * [ci] [build] Containerize Windows CPU build and test * Disable ninja * Avoid pybind11_add_module() * Force reinstall * Find pybind11 * Include pybind11 dir * Update include dir * Remove trailing whitespace * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Use correct pybind11 * Add path * Enable no extras for pybind11_add_module * Add no_extra * Clone in the container * Use github job container * Add runs-on * Revert back to docker based jobs * Install instead of develop * [ci] [build] Containerize Windows CPU build and test * Disable ninja * Avoid pybind11_add_module() * Force reinstall * Find pybind11 * Include pybind11 dir * Update include dir * Remove trailing whitespace * Use correct pybind11 * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add path * Enable no extras for pybind11_add_module * Add no_extra * Clone in the container * Use github job container * Add runs-on * Revert back to docker based jobs * Install instead of develop * Use tar in jobs * Update cmake * Skip clone * Manual fixing white space * Remove comments Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [llvm] Make cache writer support BC format (#4978) * [Build] Improve Windows build script (#4955) * Improve Windows build script * Switch to clean up intermediates * [refactor] Improve serializer and cleanup utils (#4980) * 
[refactor] Improve serializer and cleanup utils * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [llvm] Support both BC and LL cache format (#4979) * [llvm] Support both BC and LL cache format * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * rm * fix fs * fix Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [misc] Add ASTSerializer::visit(ReferenceExpression *) (#4984) * [bug] Fix infinite recursion of get_offline_cache_key_of_snode_impl() (#4983) * Fix infinite recursion of get_offline_cache_key_of_snode_impl * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix some comments * Fix Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [cuda] Add block and grid level intrinsic for cuda backend (#4977) * Add block/grid level intrinsics * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add test * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix syntax Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [Workflow] Update release_test.sh (#4960) Co-authored-by: Chengchen(Rex) Wang <14366016+rexwangcc@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Provision of prebuilt LLVM 10 for VS2022 (#4987) * [llvm] Use serializer for LLVM cache (#4982) * [llvm] Use serializer for LLVM cache * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * ctor * fix * fix to pointer * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix 
order * wip * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix * fix Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [Doc] Fix docs deploy netlify test configuration (#4991) * fix docs deploy netlify test configuration * check netlify change to run docs preview * [Doc] Updated URL (#4990) * Updated URL * Updated URL * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [Doc] Update trouble shooting URL in bug report template (#4988) * [Lang] [type] Refactor quantized_types module and make quant APIs public (#4985) * [Type] Refactor quantized_types module and make quant APIs public * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix pylint Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Update README.md Tests if netlify is still working * [test] Fix a few mis-configured ndarray tests (#5000) * [Doc] Fix netlify cache & sync doc without pr content (#5003) * [refactor] Program owns allocated ndarrays. The end goal of this refactor is let Ndarray be a simple wrapper around (DeviceAllocation, dtype, shape) without having to worry about memory allocation/deallocation. But its current implementation heavily couples with Program*, so an intermediate state would be: - If created from Program, Ndarray handles deviceallocation in ctor/dtor. - We'll add another ctor simply constructing Ndarray from (DeviceAllocation, dtype, shape) and update the codebase to it. 
ghstack-source-id: bdfd24154428a3fd92ca05688333509d0402e53a Pull Request resolved: https://github.com/taichi-dev/taichi/pull/4996 * [test] Add test for Ndarray from DeviceAllocation ghstack-source-id: 7d7c5486bce0a491170b52ba3ae809b4853d5447 Pull Request resolved: https://github.com/taichi-dev/taichi/pull/4997 * [refactor] Construct ndarray from existing DeviceAllocation. The end goal is make this the only ctor for Ndarray class. ghstack-source-id: a7294096285b3879a84ffbb41196a136b2b605a8 Pull Request resolved: https://github.com/taichi-dev/taichi/pull/4998 * [refactor] Free ndarray's memory when python GC triggers Previously Program manages lifetime of all allocated ndarrays. So when you call `del ndarray` in python, its memory was not freed. This PR changes the behavior that `ndarray` memory gets deallocated when python GC triggers, or its containing `Program` gets destructed, whichever happens first. There're some quirks around how we handle the async python GC and manual `ti.reset()`. Thanks to k-ye, we now added a `generation` number to track the containing program instance of ndarrays so that memory deallocation happens correctly. ghstack-source-id: 4fdef9c8285e2188c4afaffe6febdd45e5164b15 Pull Request resolved: https://github.com/taichi-dev/taichi/pull/4999 * [refactor] Move ndarray fast fill methods to Program This PR gets rid of `LlvmProgramImpl*` member inside `Ndarray` class, which is a step closer towards decoupling `Ndarray` and memory management. ghstack-source-id: 181d28eba3ded5c95d8c70c95a293505bbfebf01 Pull Request resolved: https://github.com/taichi-dev/taichi/pull/5002 * [refactor] Get rid of data_ptr_ in Ndarray ghstack-source-id: d795592d21f4a3da4a6c8ccffdce1dbc40ad99aa Pull Request resolved: https://github.com/taichi-dev/taichi/pull/5004 * [Doc] Branding updates. Also tests netlify. (#4994) * Branding updates. Also tests netlify. * Minor editorial updates to trigger netlify preview. 
* Minor updates to re-trigger CI/CD * [AOT] Supported inclusion of taichi as subdirectory for AOT modules (#5007) * Support building taichi as CMake subdirectory * Fixes for export-less integration on Android * [misc] Version bump: v1.0.2 -> v1.0.3 (#5008) * [Lang] [type] Fix parameter name 'range' for ti.types.quant.fixed (#5006) * [SIMT] Add match_any warp intrinsics (#4921) * add match_any warp intrinsic del f32 reset * alter predicate to value * update warp.py to sync with PR4957 Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [doc] Update community section (#4943) * [doc] Update community section add active events and communication * fix typo * refine docs Co-authored-by: Vissidarte-Herman <93570324+Vissidarte-Herman@users.noreply.github.com> * Update README.md Co-authored-by: Vissidarte-Herman <93570324+Vissidarte-Herman@users.noreply.github.com> Co-authored-by: Vissidarte-Herman <93570324+Vissidarte-Herman@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [llvm] Add serializable LlvmLaunchArgInfo (#4992) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [bug] Fixed numerical error for Atomic-Sub between unsigned values with different number of bits (#5011) * [refactor] Move get ndarray data ptr to program (#5012) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [ci] [build] Enable ccache for windows docker (#5001) * Enable ccache for windows docker * only run windows docker job * copy ccache_folder * trigger CI * Re-enable all jobs * remove dumb text * [test] Unify kernel setup for ndarray related tests We'll reuse these two kernels for cgraph tests as well so let's clean it up first. 
ghstack-source-id: acd772f092ac9044197ac2e1f16100ba4ba9005d Pull Request resolved: https://github.com/taichi-dev/taichi/pull/5014 * [aot] Build and run graph without serialization This PR servces as the base PR with a minimal example of building and running a Graph. Runtime values for graph arguments can be either scalars or ndarrays. For detailed proposal please see #4786. Things handled in this PR: - Maximize common code/runtime shared by the two workflows below: 1. build -> compile -> run 2. build -> compile -> serialize -> deserilize -> run - Graph arguments are annotated with dtype and element shape for ndarray (temporary until we have vec3 types in C++) Things that we've discussed but not included in this PR: - C API: I'll leave that for a unified C API PR in the future. - bind IValues to graph: easy, will add later. ghstack-source-id: f459afccdde56b59ab0ecc860ed11d761a20fe0a Pull Request resolved: https://github.com/taichi-dev/taichi/pull/5015 * [Llvm] Add AOT builder and loader (#5013) * [Llvm] Add AOT builder and loader * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * check nullptr Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [ci] Fix nightly macos (#5018) * [bug] Revert freeing ndarray memory when python GC triggers (#5019) * [SIMT] Add match_all warp intrinsics (#4961) * add match_all warp intrinsic by ptx * add args to match_all in warp.py * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update warp.py to sync with PR4957 * update llvm_context.cpp: add more details about match_all_sync intrinsic * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update test_simt.py Initialize a with1 Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [AOT] Support importing external Vulkan buffers (#5020) * [Bug] [type] 
Fix frontend type check for reading a whole bit_struct (#5027)
* fix fast_gui rgba bug (#5031)
* [doc] Update OS names (#5030)
* [ci] Disable win cpu docker job test (#5033)
* Disable win cpu docker job test
* Revert changes on naming
* [aot] Serialize built graph, deserialize and run.

related: #4786

[Update]: based on an offline discussion with k-ye, I've split the original `Graph` class into `GraphBuilder` and `CompiledGraph` classes in C++. Note that the implementation didn't exactly follow the builder design pattern, as our builder is slightly simpler, as shown below. The complexity in our problem lies more in the need to serialize and deserialize the same graph representation than in its construction process. So IMHO it's good enough to separate the GraphBuilder and Runner (`CompiledGraph`) as we discussed. Please feel free to correct me if I'm wrong!

```
GraphBuilder
     |
  compile()
     |
     |
CompiledGraph <---- serialize/deserialize ----> file
     |
     |
   run()
```

This PR demonstrates a minimal example of serializing a built graph, deserializing and running it.

ghstack-source-id: 7dda7cc11ef3a946f31d75783a8cfd1836e47ba5
Pull Request resolved: https://github.com/taichi-dev/taichi/pull/5024

* [aot] Move ArgKind as first argument in Arg class

Thought this might be more intuitive for users.

ghstack-source-id: 865062f0982db4d69a41ba345a1d254c2054a12f
Pull Request resolved: https://github.com/taichi-dev/taichi/pull/5025

* [aot] Bind graph APIs to python and add mpm88 example (#5034)

This PR supports graph builder and runner APIs in Python. Note that for simplicity I've merged builder and runner in the same Python class. Please feel free to comment if you have any suggestions.

This PR also adds a test of saving the mpm88 graph in an aot module, as well as an example script to demonstrate the speed improvement (15fps -> 45fps) compared to the current taichi.
ghstack-source-id: 600e604b141f9e534045f930d8424125c38ed875 Pull Request resolved: https://github.com/taichi-dev/taichi/pull/5026 * [Lang] [type] Refactor quant type definition APIs (#5036) * [Lang] [type] Refactor quant type definition APIs * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [Metal] Support Ndarray (#4720) * [Metal] Support Ndarray * simple work * fix copying * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * wip * fixes * fix devalloc id bug, enable tests * fix extra_arg offset * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove size * rm size * ref * fix for ret matrix type * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix test * fix zero Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [Lang] Fix potential precision bug when using math vector and matrix types (#5032) * fix fast_gui rgba bug * fix floating precision problem for vector types * add more flexible initialization methods for matrix type * add more flexible initialization methods for matrix type * add glsl determinant and inverse function support * add glsl determinant and inverse function support * add glsl determinant and inverse function support * add glsl determinant and inverse function support * add glsl determinant and inverse function support * fix matrix precision type bug and use matrix-member inverse * fix matrix precision type bug and use matrix-member inverse * [Vulkan] Fixed vulkan backend crash on AOT examples (#5047) * Exit CI builds when download of prebuilt packages fails (#5043) * [ci] Run cpp tests via run_tests.py (#5035) * [ci] Run cpp tests via run_tests.py * default to False * enable cpp on win * Set 
host_write to false for opengl ndarray (#5038) As discussed ndarrays can be written through calling write kernels, but it shouldn't support directly map on host and write to it. * [Lang] Build sparse matrix from ndarray (#4841) * build sparse matrix from ndarray * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add shape property * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix errors * fix pylint * fix failed test * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * improve * improve * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix ndarray data ptr not found * pylint * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * check ndarray dimension when build sparse matrix * improve * add example docstring * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [bug] Added type promotion support for atan2 (#5037) * [bug] Added type promotion rule for atan2 * Fixed minor issue * Modified type promotion rule * [Doc] Updated type system (#5054) * Editorial updates * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [bug] Ndarray type should include primitive dtype as well (#5052) This PR does three things: - Switch cgraph `Arg` to take in `ti.f32/i32` instead of string `f32/i32` as inputs - Fix a bug that when we produce injected ndarray args for compilation we only produced f32 ndarrays, which won't work for ndarray of other primitive dtypes. - No need to specify `element_shape` if it's scalar arg or scalar ndarray arg. 
* [Lang] [ir] Add short-circuit if-then-else operator (#5022) * [Lang] [ir] Add a short-circuit if-then-else operator and use it to implement IfExp * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [Lang] Struct Classes implementation (#4989) * Initial Struct Classes implementation * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update method translation to mark as taichi funcs * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Revert unwanted changes to impl.py * Update test_api.py * Update struct.py Update with review comments. * Update class decorator docstring * Update func marking and add tests * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update tests/python/test_custom_struct.py Co-authored-by: Yi Xu <xy_xuyi@foxmail.com> * Update docstrings * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update python/taichi/lang/struct.py Co-authored-by: Yi Xu <xy_xuyi@foxmail.com> * Update python/taichi/lang/struct.py Co-authored-by: Yi Xu <xy_xuyi@foxmail.com> * Update python/taichi/lang/struct.py Co-authored-by: Yi Xu <xy_xuyi@foxmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Yi Xu <xy_xuyi@foxmail.com> * [Lang] [type] Disallow reading a whole bit_struct (#5061) * [bug] Remove operator ! 
for Expr (#5062) * [build] [bug] Ensure the assets folder is copied to the project directory (#5063) * Bugfix: ensure the assets folder are copied * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * call copy_assets before setup() Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [refactor] Split GraphBuilder out of Graph class (#5064) * [aot] [CUDA-AOT PR #0] Refactored compile_module_to_executable() to CUDAModuleToFunctionConverter (#5070) * [refactor] Specialized Ndarray Type is (element_type, shape, layout) ghstack-source-id: 977cd453359b8ccc09deccacc62a915abcd42734 Pull Request resolved: https://github.com/taichi-dev/taichi/pull/5065 * [refactor] Pass element_shape and layout to C++ Ndarray Note we still flatten element_shape in the C++ Ndarray, which is blocked by the accessors and will be fixed in the following PRs. ghstack-source-id: 0cb5c05f0ad4c188546d7174a1d82f398bc717c2 Pull Request resolved: https://github.com/taichi-dev/taichi/pull/5066 * [Lang] Support constructing vector and matrix ndarray from ti.ndarray() ghstack-source-id: 3055ba79c35aecea61c449038b4f8c07e87571b9 Pull Request resolved: https://github.com/taichi-dev/taichi/pull/5073 * [refactor] Resolve comments from #5065 (#5074) * [Example] Update mass_spring_3d_ggui.py to v2 (#3879) * cleaner mass_spring_3d_ggui.py * fix collision * Fix penetration * No capitalized globals * fix compute_force * parameter tweaks * Looks good now * Update TaichiCore.cmake * Update mass_spring_3d_ggui.py * Update mass_spring_3d_ggui.py Change variable name `allow_bending` to `bending_springs`. 
* [doc] Fix broken link for github action status badge (#5076) * [doc] Fix link for github action status badge * Update README.md Co-authored-by: Bo Qiao <boqiao@taichi.graphics> * Update README.md Co-authored-by: Bo Qiao <boqiao@taichi.graphics> Co-authored-by: Bo Qiao <boqiao@taichi.graphics> * [llvm] Specialize element shape for LLVM backend (#5071) * Specialize element shape for LLVM backend Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [spirv] Specialize element shape for spirv codegen. (#5068) * Specialize element shape for spirv codegen. * Fix index for size_var_names * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Slight changes for better code style. Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [Lang] Add more initialization routines for glsl matrix types (#5069) * add more initialization routines for glsl matrix types * add more initialization routines for glsl matrix types * [cuda] [simt] Add assertions for warp intrinsics on old GPUs (#5077) * Add guard for cc smaller than 70 * Fix * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix pylint Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [refactor] Correctly set ndarray element_size and nelement (#5080) * [infra] Refactor Vulkan runtime into true Common Runtime (#5058) * Remove all references to Vulkan in common runtime & fix device API for OpenGL (bindings) and DirectX 11 (memory leaks) * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix tests * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix cpp test * update * update Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [llvm] [aot] CUDA-AOT PR #1: 
Extracted common logics from CPUAotModuleImpl into LLVMAotModule (#5072) * [llvm] [aot] CUDA-AOT PR #1: Extracted common logics from CPUAotModuleImpl into LLVMAotModule * Renamed LLVMAotModule * Fixed minor issue * [llvm] [refactor] Merge AtomicOpStmt codegen in CPU and CUDA backends (#5086) * [llvm] [refactor] Merge AtomicOpStmt codegen in CodeGenLLVMCUDA and CodeGenLLVM * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [refactor] Make sure Ndarray shape is field shape (#5085) * [autodiff] Allocate dual and adjoint snode (#5083) * allocate dual and decouple grad and adjoint * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update * update * update the adjoint name * fix matrix * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * recover the grad name Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [build] [refactor] Change CMake global include_directories to target based function (#5082) * Change to target_include_directories * Update runtime cmake * Pre-commit format * [Doc] Add documentation of Taichi Struct Classes. (#5075) * Add documentation of Taichi Struct Classes. 
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * edits * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update docs/lang/articles/advanced/odop.md Co-authored-by: Yi Xu <xy_xuyi@foxmail.com> * Update docs/lang/articles/advanced/odop.md Co-authored-by: Yi Xu <xy_xuyi@foxmail.com> * Update docs/lang/articles/advanced/odop.md Co-authored-by: Yi Xu <xy_xuyi@foxmail.com> * Update docs/lang/articles/advanced/odop.md Co-authored-by: Yi Xu <xy_xuyi@foxmail.com> * Update docs/lang/articles/advanced/odop.md Co-authored-by: Yi Xu <xy_xuyi@foxmail.com> * Update docs/lang/articles/advanced/odop.md Co-authored-by: Yi Xu <xy_xuyi@foxmail.com> * Fix capitalization of Python Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Yi Xu <xy_xuyi@foxmail.com> * [llvm] [aot] Add LLVM-CPU AOT tests (#5079) * [llvm] [aot] Add LLVM-CPU AOT tests * Refactored AOT test framework * Fixed minor issue * Enabled LLVM CPU-AOT for arm64 architecture * Added aot unit tests programming guide * Fixed typo * Refactored AOT test framework * [autodiff] Extract shared components for reverse and forward mode (#5088) extract shared components for reverse and forward mode * [llvm] [refactor] Use LLVM native atomic ops if possible (#5091) * [llvm] [refactor] Use LLVM native atomic ops if possible * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [bug] Minor fix for ndarray element_shape in graph mode (#5093) Now that ndarray's element_shape is separated from shape, this hack can be removed. * Use pre-calculated runtime size array for gfx runtime. 
(#5094) * [Doc] Improve ODOP doc structure (#5089) * [llvm] [aot] CUDA-AOT PR #2: Implemented AOTModuleLoader & AOTModuleBuilder for LLVM-CUDA backend (#5087) * [llvm] [aot] Add LLVM-CPU AOT tests * Refactored AOT test framework * Fixed minor issue * Enabled LLVM CPU-AOT for arm64 architecture * Added aot unit tests programming guide * [llvm] [aot] CUDA-AOT PR #2: Implemented AOT Module Loader for LLVM-CUDA backend * Fixed typo * Fixed minor issue * Refactored AOT test framework * [llvm] [aot] Add LLVM-CUDA AOT tests * Added cuda device availability check * clean hidden override functions (#5097) * [refactor] Update Ndarray constructor used in AOT runtime. (#5095) This constructor is mainly used to construct an Ndarray out of an existing device allocation. This PR updates the behavior of this constructor to seprate element_shape out of shape. * [refactor] Remove ndarray element shape from extra arg buffer (#5100) * Remove element shape from extra args. * [llvm] [refactor] Move load_bit_pointer() to CodeGenLLVM (#5099) * [llvm] [refactor] Move load_bit_pointer() to CodeGenLLVM * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [test] Save mpm88 graph in python and load in C++ test. (#5104) This is a simplified version of https://github.com/ailzhang/taichi-aot-demo/tree/mpm88_cgraph_demo which strips the GGUI rendering part. Let's add this as a test (as well as demo ;) ) in the codebase. We used to test the saving part of mpm88 btw and it was replaced with this e2e test. Huge thanks to @k-ye for help debugging the GGUI rendering issue! 
* [Example] Update visual effects of mass_spring_3d_ggui.py (#5081) * update scene for mass_spring simulation * update scene for mass_spring simulation * update scene for mass_spring simulation * [type] [refactor] Remove redundant promotion for custom int in type_check (#5102) * [llvm] [refactor] Replace cast_int() with LLVM native integer cast (#5110) * [llvm] [refactor] Use LLVM native integer cast * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [type] [llvm] [refactor] Fix function names in codegen_llvm_quant (#5115) * [type] [llvm] [refactor] Fix function names in codegen_llvm_quant * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [bug] Fix build without llvm backend crash (#5113) * [bug] Fix build without llvm backend crash * Update taichi/python/export_lang.cpp Co-authored-by: yekuang <k-ye@users.noreply.github.com> Co-authored-by: yekuang <k-ye@users.noreply.github.com> * [build] [refactor] Move Vulkan runtime out of backends dir (#5106) * Precommit fix * Add spirv source * Move device code back to backends * Expose glfw include in vulkan rhi * Fix llvm include * Fix include for test * [autodiff] Add forward mode pipeline for autodiff pass (#5098) * Add forward mode pipeline for autodiff pass * Replace the grad parameter with AutodiffMode to distinguish three kinds of kernels primal, forward ad and reverse ad * [aot] [llvm] LLVM AOT Field #0: Implemented FieldCacheData & refactored initialize_llvm_runtime_snodes() (#5108) * [aot] [llvm] Implemented FieldCacheData and refactored initialize_llvm_runtime_snodes() * Addressed compilation erros * Added initialization for struct members * Minor fix * [aot][bug] Use cached compiled kernel pointer when it's added to graph 
(#5122) multiple times

This bug was triggered when we tried to port the stable_fluid demo, so this PR also added a cgraph-based stable fluid demo.

```
ti example stable_fluid_graph
```

Note it's not ideal to save both `FunctionType compiled_` and `aot::Kernel compiled_aot_kernel_` inside the C++ `Kernel` class. But we plan to clean that up (likely by getting rid of `FunctionType compiled_`) in #5114.

* [aot] [llvm] LLVM AOT Field #1: Adjust serialization/deserialization logics for FieldCacheData (#5111)
* [aot] [llvm] Implemented FieldCacheData and refactored initialize_llvm_runtime_snodes()
* Addressed compilation errors
* [aot] [llvm] LLVM AOT Field #1: Adjust serialization/deserialization logics for FieldCacheData
* Editorial update (#5119)
* [lang] Texture support 0/n: IR changes (#5134)
* fix mass_spring_3d_ggui backend (#5127)
* [Example] Fix block_dim warning in ggui (#5128)
* fix block dim warning in ggui
* fix block dim warning in ggui
* fix block dim warning in ggui
* [ci] Enable yapf and isort on example files (#5140)

Note we explicitly exclude running pylint on them as it requires a bunch of manual fixes first.
* [type] [refactor] Misc improvements to quant codegen (#5129) * Replace is_custom_type() with is_quant() * Rename two functions * Use get_constant() if possible * Rename two metal functions * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [aot] [llvm] LLVM AOT Field #2: Updated LLVM AOTModuleLoader & AOTModuleBuilder to support Fields (#5120) * [aot] [llvm] Implemented FieldCacheData and refactored initialize_llvm_runtime_snodes() * Addressed compilation erros * [aot] [llvm] LLVM AOT Field #1: Adjust serialization/deserialization logics for FieldCacheData * [llvm] [aot] Added Field support for LLVM AOT * [aot] [llvm] LLVM AOT Field #2: Updated LLVM AOTModuleLoader & AOTModuleBuilder to support Fields * Fixed merge issues * Stopped abusing Program* Co-authored-by: Frost Ming <mianghong@gmail.com> Co-authored-by: Taichi Gardener <taichigardener@gmail.com> Co-authored-by: Ailing <ailzhang@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Taichi Gardener <62079278+taichi-gardener@users.noreply.github.com> Co-authored-by: Zhanlue Yang <zy2284@columbia.edu> Co-authored-by: Bo Qiao <boqiao@taichi.graphics> Co-authored-by: Haidong Lan <turbo0628g@gmail.com> Co-authored-by: PGZXB <420254146@qq.com> Co-authored-by: Bob Cao <bobcaocheng@gmail.com> Co-authored-by: yekuang <k-ye@users.noreply.github.com> Co-authored-by: Vissidarte-Herman <93570324+Vissidarte-Herman@users.noreply.github.com> Co-authored-by: Zhanlue Yang <jim19930609@gmail.com> Co-authored-by: Gabriel H <64807734+ghuau-innopeak@users.noreply.github.com> Co-authored-by: 0xzhang <33616362+0xzhang@users.noreply.github.com> Co-authored-by: yixu <BillXu2000@126.com> Co-authored-by: Zeyu Li <47965866+GaleSeLee@users.noreply.github.com> Co-authored-by: pengyu 
<6712304+FantasyVR@users.noreply.github.com> Co-authored-by: Yi Xu <xy_xuyi@foxmail.com> Co-authored-by: taichiCourse01 <tgc01@taichi.graphics> Co-authored-by: Chang Yu <g1n0st@live.com> Co-authored-by: PENGUINLIONG <admin@penguinliong.moe> Co-authored-by: Lin Jiang <90667349+lin-hitonami@users.noreply.github.com> Co-authored-by: Haidong Lan <haidonglan@taichi.graphics> Co-authored-by: YuZhang <YuCrazing@users.noreply.github.com> Co-authored-by: Chuandong Yan <90600320+chuandongyan@users.noreply.github.com> Co-authored-by: Chengchen(Rex) Wang <14366016+rexwangcc@users.noreply.github.com> Co-authored-by: Justin <62801799+Justinterest@users.noreply.github.com> Co-authored-by: Ailing Zhang <ailing@taichi.graphics> Co-authored-by: Zeyu Li <li_zeyu@pku.edu.cn> Co-authored-by: yanqingzhang <yanqingdw@gmail.com> Co-authored-by: daylily <xy.r@outlook.com> Co-authored-by: bsavery <brian.savery@gmail.com> Co-authored-by: Alex Brown <96645475+AlexBrown42@users.noreply.github.com> Co-authored-by: Bo Qiao <qiao.bo@outlook.com> Co-authored-by: Mingrui Zhang <33411325+erizmr@users.noreply.github.com> Co-authored-by: Olinaaaloompa <106292061+Olinaaaloompa@users.noreply.github.com>
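The `GraphBuilder` / `CompiledGraph` split described in the commit log above (build, compile, then either run directly or serialize, deserialize and run) can be illustrated with a toy model. This is only a sketch in plain Python, not Taichi's C++ classes: the `dispatch` method, the `("add", 2)` step format and the JSON encoding are all hypothetical choices made for illustration.

```python
import json


class CompiledGraph:
    """Toy runner: holds an ordered list of steps and can be saved/restored."""

    def __init__(self, steps):
        # Each step is an (op_name, operand) pair, e.g. ("add", 2).
        self.steps = [tuple(step) for step in steps]

    def run(self, value):
        # Apply every step in order to the input value.
        for op, operand in self.steps:
            if op == "add":
                value += operand
            elif op == "mul":
                value *= operand
            else:
                raise ValueError(f"unknown op: {op}")
        return value

    def serialize(self):
        # Hypothetical on-disk format: a JSON list of [op, operand] pairs.
        return json.dumps(self.steps)

    @classmethod
    def deserialize(cls, text):
        return cls(json.loads(text))


class GraphBuilder:
    """Toy builder: only records steps; running is left to CompiledGraph."""

    def __init__(self):
        self._steps = []

    def dispatch(self, op, operand):
        self._steps.append((op, operand))
        return self  # allow chaining

    def compile(self):
        return CompiledGraph(self._steps)


# Workflow 1: build -> compile -> run
compiled = GraphBuilder().dispatch("add", 2).dispatch("mul", 3).compile()
# Workflow 2: build -> compile -> serialize -> deserialize -> run
restored = CompiledGraph.deserialize(compiled.serialize())
assert compiled.run(1) == restored.run(1) == 9
```

Both workflows end in the same `run()` code path, which is the point of keeping construction (`GraphBuilder`) separate from the serializable representation (`CompiledGraph`).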

0xzhang force-pushed the cuda-warp branch from 6b96f75 to 167cf98 Compare May 8, 2022 17:03

[SIMT] Add uni_sync warp intrinsics

137f111

fix: test according the meaning of __uni_sync()

e02f89b
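For reference, the semantics that the fixed test targets, as described earlier in this thread (return 1 if and only if the predicates of all threads selected by the mask are all non-zero or all zero), can be modeled on the host in plain Python. This is a sketch under that description only: `uni_sync_reference` is a hypothetical helper, not part of Taichi's API, and treating an empty mask as uniform is an assumption of this model.

```python
WARP_SIZE = 32  # number of lanes in a CUDA warp


def uni_sync_reference(mask, predicates):
    """Host-side model of the uni_sync semantics described in this thread.

    mask:       32-bit integer; bit i set means lane i participates.
    predicates: sequence of 32 integers, one predicate value per lane.
    Returns 1 if the participating lanes' predicates are all non-zero
    or all zero, otherwise 0.
    """
    if len(predicates) != WARP_SIZE:
        raise ValueError(f"expected {WARP_SIZE} predicates, got {len(predicates)}")

    # Collect the truth value of each participating lane's predicate.
    selected = [
        predicates[lane] != 0
        for lane in range(WARP_SIZE)
        if (mask >> lane) & 1
    ]

    # Uniform means "all true" or "all false".
    # Assumption: with no lanes selected, the model reports uniform (1).
    return int(all(selected) or not any(selected))


# Full mask, every lane agrees -> uniform.
assert uni_sync_reference(0xFFFFFFFF, [1] * 32) == 1
assert uni_sync_reference(0xFFFFFFFF, [0] * 32) == 1
# One lane disagrees -> not uniform.
assert uni_sync_reference(0xFFFFFFFF, [1] + [0] * 31) == 0
# Only lanes 0..15 participate; the disagreeing lanes are masked out.
assert uni_sync_reference(0x0000FFFF, [1] * 16 + [0] * 16) == 1
```

A model like this can serve as the expected-value oracle when checking a kernel's output in CI, which matters here because the test could not be run on the author's local GPU.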

0xzhang force-pushed the cuda-warp branch from 167cf98 to e02f89b Compare May 8, 2022 17:45

0xzhang requested review from qiao-bo and turbo0628 May 8, 2022 17:47

turbo0628 approved these changes May 9, 2022

turbo0628 self-requested a review May 9, 2022 03:09

0xzhang mentioned this pull request May 9, 2022

[cuda] CUDA warp level intrinsics failed on legacy GPU #4935

Closed

turbo0628 merged commit 407ff73 into taichi-dev:master May 9, 2022

qiao-bo mentioned this pull request May 10, 2022

[RFC] [SIMT] Add CUDA warp-level intrinsics to Taichi #4631

Open

37 tasks
