# GPU backends #92
Reopening this to track potential GPU backends.

## Overview

For now we want to limit ourselves to backends supported by LLVM. Another approach would be having a source code generator and using the corresponding runtime compiler, for example for OpenCL or Apple Metal.

## AMD GPUs

We can use the LLVM AMDGPU backend, which is considered stable and included in all recent LLVM builds by default: https://www.llvm.org/docs/AMDGPUUsage.html

Relevant inline assembly: see the RDNA 3 ISA doc, https://www.amd.com/system/files/TechDocs/rdna3-shader-instruction-set-architecture-feb-2023_0.pdf

## Apple Metal

There is no official LLVM IR to Apple Metal backend, but Apple uses a fork of LLVM (see https://developer.apple.com/forums/thread/707695). Metal doesn't seem to allow assembly for add-with-carry and extended-precision multiplication.

## Nvidia CUDA

Backend configured and added in #210.

## OpenCL

Generating OpenCL code through LLVM requires going through SPIR-V and loading the resulting kernel. SPIR-V is an experimental backend starting from LLVM 15 and likely needs extra configuration. Alternatively there is https://github.com/KhronosGroup/SPIRV-LLVM-Translator, but it would require compiling Nim in C++ mode.

## Intel GPUs

Inline assembly:

See also:

## ARM GPUs

Inline assembly:

## Other backends

Backends superseded by vendor-specific backends, in particular due to not being available in https://github.com/llvm/llvm-project/tree/main/llvm/lib/Target or not allowing inline assembly for add-with-carry and extended-precision multiplication:
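The add-with-carry requirement that keeps recurring above boils down to multi-precision limb addition. A minimal portable C sketch of what a hardware carry chain (e.g. Nvidia PTX `add.cc`/`addc.cc`, AMD `S_ADDC_U32`) performs per instruction, with an illustrative limb count and function name that are not Constantine's actual API:

```c
#include <stdint.h>

/* Multi-precision addition: r = a + b over 4 64-bit limbs.
   Returns the final carry. Each loop iteration is what a single
   add-with-carry instruction performs in hardware. */
static uint64_t add_limbs(uint64_t r[4], const uint64_t a[4], const uint64_t b[4]) {
    uint64_t carry = 0;
    for (int i = 0; i < 4; i++) {
        uint64_t s  = a[i] + b[i];
        uint64_t c1 = s < a[i];        /* carry out of a[i] + b[i] */
        r[i] = s + carry;
        uint64_t c2 = r[i] < s;        /* carry out of adding the previous carry */
        carry = c1 | c2;               /* at most one of c1, c2 can be set */
    }
    return carry;
}
```

Backends that expose a carry flag collapse each iteration into one instruction; backends that don't (like Metal, per the note above) must emit the comparison-based fallback.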
Some more investigation on the AMD backend: https://www.amd.com/content/dam/amd/en/documents/radeon-tech-docs/instruction-set-architectures/rdna3-shader-instruction-set-architecture-feb-2023_0.pdf

You have scalar and vector execution units, with 8x more vector units in this example. The S_ADDC_U32 instruction mentioned earlier is actually for scalar code, but we in fact need vector code, which does exist. However, it doesn't seem like AMD provides an auto-vectorizer like when we use the Nvidia PTX virtual ISA, so we'll have to vectorize the code ourselves, i.e. implement fp_add_x32, fp_mul_x32, ...
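One way to sketch the "vectorize ourselves" idea: a struct-of-arrays layout where limb j of all 32 lanes (one AMD wavefront) is contiguous, so the per-lane carry arithmetic maps to one vector instruction per limb. `fp_add_x32` is the placeholder name from the comment above; the layout, limb count, and the omission of modular reduction are all assumptions of this sketch:

```c
#include <stdint.h>

#define LANES 32   /* one AMD wavefront */
#define LIMBS 4    /* illustrative limb count */

/* Struct-of-arrays: limb j of every lane is contiguous in memory,
   so the inner lane loop is trivially vectorizable. */
typedef struct { uint64_t limb[LIMBS][LANES]; } fp_x32;

/* Hypothetical fp_add_x32: lane-wise multi-precision addition
   (modular reduction deliberately not shown). */
static void fp_add_x32(fp_x32 *r, const fp_x32 *a, const fp_x32 *b) {
    for (int l = 0; l < LANES; l++) {
        uint64_t carry = 0;
        for (int j = 0; j < LIMBS; j++) {
            uint64_t s  = a->limb[j][l] + b->limb[j][l];
            uint64_t c1 = s < a->limb[j][l];
            r->limb[j][l] = s + carry;
            carry = c1 | (r->limb[j][l] < s);
        }
    }
}
```

Every lane runs the identical carry chain, so there is no divergence; a backend would emit V_ADD-style vector instructions instead of the scalar comparisons shown here.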
Looking at some of the AMD codegen, it might be that we can stay within LLVM IR, as there were some LLVM improvements related to add-with-carry. Though we'll likely have the same issue with sub-with-borrow (what's the canonical IR?)
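For sub-with-borrow, the comparison-based idiom below is the plain-C pattern one would hope LLVM canonicalizes and lowers to a borrow chain; whether a given backend actually recognizes it is exactly the open question above. The limb count and function name are illustrative:

```c
#include <stdint.h>

/* Multi-precision subtraction: r = a - b over 4 limbs, returning the
   final borrow. The (a[i] < b[i]) comparison is the pattern a backend
   would need to fold into a subtract-with-borrow instruction chain. */
static uint64_t sub_limbs(uint64_t r[4], const uint64_t a[4], const uint64_t b[4]) {
    uint64_t borrow = 0;
    for (int i = 0; i < 4; i++) {
        uint64_t d  = a[i] - b[i];
        uint64_t b1 = a[i] < b[i];     /* borrow out of a[i] - b[i] */
        r[i] = d - borrow;
        borrow = b1 | (d < borrow);    /* borrow out of subtracting prev borrow */
    }
    return borrow;
}
```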
For OpenCL / SPIR-V on Intel GPUs, 2 extensions are of particular interest, as their inline assembly would allow guaranteeing addition with carry from the Intel virtual ISA: https://github.com/intel/intel-graphics-compiler/blob/master/documentation/visa/6_instructions.md
Closed by #465
Zero-knowledge proofs work by handling constraint circuits with millions of gates corresponding to field operations. Those can be executed in parallel, and Constantine's fully constant-time, branchless design actually helps to avoid divergence at the GPU warp level.
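The warp-divergence point can be made concrete with a branchless select, the basic building block of constant-time code (the function name is illustrative, not Constantine's API):

```c
#include <stdint.h>

/* Branchless select: returns x when cond == 1, y when cond == 0.
   Every thread executes the same instruction stream regardless of cond,
   so a warp never diverges on secret-dependent data. */
static uint64_t ct_select(uint64_t cond, uint64_t x, uint64_t y) {
    uint64_t mask = (uint64_t)0 - cond;   /* all-zeros or all-ones */
    return (x & mask) | (y & ~mask);
}
```

An `if (cond)` version would split a warp into two serialized execution paths whenever lanes disagree; the masked form keeps all 32 lanes in lockstep.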
Resources: