GPU backends #92

mratsim · 2020-09-27T07:42:53Z

Zero-knowledge proofs work by processing constraint circuits with millions of gates, each corresponding to a field operation.

Those operations can be executed in parallel, and Constantine's fully constant-time, branchless design actually helps avoid divergence at the GPU warp level.
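To illustrate why branchless code avoids warp divergence, here is a minimal C sketch of a mask-based conditional select and conditional copy. The names `ct_select_u32` and `ct_ccopy` are hypothetical, not Constantine's API (which is written in Nim). The condition is turned into an all-ones or all-zeros mask, so every lane executes the same instruction stream regardless of its data.

```c
#include <stdint.h>

/* Hypothetical helper: returns a if ctl == 1, b if ctl == 0, without branching. */
static uint32_t ct_select_u32(uint32_t ctl, uint32_t a, uint32_t b) {
  uint32_t mask = (uint32_t)0 - (ctl & 1u); /* 0xFFFFFFFF if ctl == 1, else 0 */
  return (a & mask) | (b & ~mask);
}

/* Hypothetical helper: copies src into dst when ctl == 1, leaves dst unchanged
   when ctl == 0. The loop bound is public, so all lanes run the same iterations. */
static void ct_ccopy(uint32_t *dst, const uint32_t *src, uint32_t ctl, int nlimbs) {
  for (int i = 0; i < nlimbs; i++)
    dst[i] = ct_select_u32(ctl, src[i], dst[i]);
}
```

On a GPU, a data-dependent `if` would split a warp into two serialized groups; here both outcomes are computed and the mask picks one.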

Resources:

mratsim · 2023-04-08T13:53:37Z

Reopening this to track potential GPU backends.

Overview

For now we want to limit ourselves to backends supported by LLVM.

Another approach would be to have a source code generator and use the corresponding runtime compiler, for example for OpenCL or Apple Metal.

AMD GPUs

We can use the LLVM AMDGPU backend, which is considered stable and is included by default in all recent LLVM builds: https://www.llvm.org/docs/AMDGPUUsage.html

Relevant inline assembly instructions:

- S_ADDC_U32: add-with-carry
- S_SUBB_U32: sub-with-borrow
- S_MUL_HI_U32: extended-precision multiplication, high limb
- S_CSELECT_B32(dst, a, b): conditional select, a if the SCC flag is set, b otherwise
- S_CMOV_B32(dst, a): conditional move of a into dst if the SCC flag is set

See the RDNA 3 ISA documentation: https://www.amd.com/system/files/TechDocs/rdna3-shader-instruction-set-architecture-feb-2023_0.pdf
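The semantics of S_ADDC_U32 and S_MUL_HI_U32 can be modeled in portable C. The sketch below, with hypothetical helper names, chains a 32-bit add-with-carry across little-endian limbs and computes the high limb of a 32x32-bit product. The 64-bit intermediate is only a modeling device; on the GPU the carry would live in the SCC flag.

```c
#include <stdint.h>
#include <stddef.h>

/* Model of S_ADDC_U32: returns a + b + carry_in (mod 2^32) and writes the carry-out. */
static uint32_t addc_u32(uint32_t a, uint32_t b, uint32_t carry_in, uint32_t *carry_out) {
  uint64_t t = (uint64_t)a + b + carry_in;
  *carry_out = (uint32_t)(t >> 32);
  return (uint32_t)t;
}

/* Model of S_MUL_HI_U32: upper 32 bits of the full 64-bit product. */
static uint32_t mul_hi_u32(uint32_t a, uint32_t b) {
  return (uint32_t)(((uint64_t)a * b) >> 32);
}

/* Hypothetical multi-limb addition r = a + b over n little-endian 32-bit limbs.
   Returns the final carry-out. */
static uint32_t bigint_add(uint32_t *r, const uint32_t *a, const uint32_t *b, size_t n) {
  uint32_t carry = 0;
  for (size_t i = 0; i < n; i++)
    r[i] = addc_u32(a[i], b[i], carry, &carry); /* carry-in is read before being overwritten */
  return carry;
}
```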

Apple Metal

There is no official LLVM IR to Apple Metal backend, but Apple uses a fork of LLVM.
By linking to it, it might be possible to generate Metal shaders using a target triple of the form:

air64-apple-macos13.0
air64-apple-ios16.0-macabi

(see https://developer.apple.com/forums/thread/707695)

Metal doesn't seem to allow inline assembly for add-with-carry and extended-precision multiplication.
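When a high-multiply instruction cannot be reached through assembly, one portable fallback is to build the 64-bit product from 16-bit halves using only 32-bit arithmetic. The C sketch below uses a hypothetical name, `mul_wide_u32_portable`. It shows the cost of that fallback: four multiplications plus carry bookkeeping, where a native instruction would need one or two. Whether a given shader compiler pattern-matches this back into a native instruction is not something this sketch establishes.

```c
#include <stdint.h>

/* Full 32x32 -> 64-bit product using only 32-bit multiplies and adds
   (schoolbook on 16-bit halves). Writes the low limb and returns the high limb. */
static uint32_t mul_wide_u32_portable(uint32_t a, uint32_t b, uint32_t *lo) {
  uint32_t a0 = a & 0xFFFFu, a1 = a >> 16;
  uint32_t b0 = b & 0xFFFFu, b1 = b >> 16;
  uint32_t p00 = a0 * b0; /* each 16x16 partial product fits in 32 bits */
  uint32_t p01 = a0 * b1;
  uint32_t p10 = a1 * b0;
  uint32_t p11 = a1 * b1;
  /* Column at bit 16: at most 3 * 0xFFFF, so it cannot overflow 32 bits. */
  uint32_t mid = (p00 >> 16) + (p01 & 0xFFFFu) + (p10 & 0xFFFFu);
  *lo = (mid << 16) | (p00 & 0xFFFFu);
  return p11 + (p01 >> 16) + (p10 >> 16) + (mid >> 16);
}
```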

Nvidia Cuda

Backend configured and added in #210

OpenCL

Generating OpenCL code through LLVM requires going through SPIR-V and loading the resulting kernel with clCreateProgramWithIL.
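Before handing a module to clCreateProgramWithIL, a host program might sanity-check that the buffer really is a SPIR-V binary. The C sketch below assumes the module has already been loaded into memory as 32-bit words, and the function name is hypothetical. Per the SPIR-V specification, a module starts with a 5-word header. Its first word is the magic number 0x07230203, and its second word encodes the version, with the major number in bits 16-23 and the minor number in bits 8-15. The OpenCL call itself appears only in a comment.

```c
#include <stdint.h>
#include <stddef.h>

#define SPIRV_MAGIC 0x07230203u
#define SPIRV_HEADER_WORDS 5

/* Hypothetical helper: returns 1 and reports the version if `words` looks like
   a native-endian SPIR-V module, 0 otherwise. */
static int spirv_check_header(const uint32_t *words, size_t nwords,
                              unsigned *major, unsigned *minor) {
  if (nwords < SPIRV_HEADER_WORDS || words[0] != SPIRV_MAGIC)
    return 0; /* too short, wrong magic, or opposite endianness (0x03022307) */
  *major = (words[1] >> 16) & 0xFFu;
  *minor = (words[1] >> 8) & 0xFFu;
  return 1;
}

/* On success, the host would then do something like (OpenCL 2.1+):
     cl_int err;
     cl_program prog = clCreateProgramWithIL(context, words,
                                             nwords * sizeof(uint32_t), &err);
   where `context` comes from the usual platform/device setup. */
```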

SPIR-V is an experimental backend starting from LLVM 15 and likely needs to be enabled through LLVM_EXPERIMENTAL_TARGETS_TO_BUILD (see https://stackoverflow.com/questions/46905464/how-to-enable-a-llvm-backend and https://reviews.llvm.org/D115009).

Alternatively, there is https://github.com/KhronosGroup/SPIRV-LLVM-Translator, but it would require compiling Nim in C++ mode.

Intel GPU inline assembly:

- ADDC: add-with-carry
- SUBB: sub-with-borrow
- MULH: extended-precision multiplication, high limb
- SEL(dst, a, b): conditional select, a if the predicate is set, else b
- MAD(dst, a, b, c): multiply-add dst = a*b+c (a multiply-accumulate dst += a*b when c is dst)
- MADW(dst, a, b, c): extended-precision MAD that stores the full 64-bit result
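MADW is what makes the inner loop of multi-precision multiplication cheap. A 32x32-bit product plus 32-bit addends always fits in 64 bits, because (2^32 - 1)^2 + 2(2^32 - 1) = 2^64 - 1. That leaves room for both the accumulator limb and the running carry. The C sketch below uses hypothetical names and models the low and high halves through a 64-bit intermediate. MADW itself takes a single addend, so the second addend here stands for folding in the carry, which still fits.

```c
#include <stdint.h>
#include <stddef.h>

/* (hi, lo) = a * b + c + d. With all inputs < 2^32 the result is at most
   2^64 - 1, so the 64-bit intermediate never overflows. */
static uint32_t mul_add2_u32(uint32_t a, uint32_t b, uint32_t c, uint32_t d, uint32_t *lo) {
  uint64_t t = (uint64_t)a * b + c + d;
  *lo = (uint32_t)t;
  return (uint32_t)(t >> 32);
}

/* Hypothetical row step of schoolbook multiplication: r[0..n] += a[0..n-1] * b.
   r must have n + 1 limbs; returns the carry out of r[n]. */
static uint32_t mul_add_row(uint32_t *r, const uint32_t *a, uint32_t b, size_t n) {
  uint32_t carry = 0;
  for (size_t i = 0; i < n; i++)
    carry = mul_add2_u32(a[i], b, r[i], carry, &r[i]);
  uint64_t top = (uint64_t)r[n] + carry;
  r[n] = (uint32_t)top;
  return (uint32_t)(top >> 32);
}
```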

Other backends

Backends superseded by vendor-specific backends, in particular because they are not available in https://github.com/llvm/llvm-project/tree/main/llvm/lib/Target or do not allow inline assembly for add-with-carry and extended-precision multiplication:

- OpenGL ES is the most widely supported GPU backend (on all phones and desktops), but it doesn't seem easy (or documented) to generate performant compute code, including inline assembly for extended-precision arithmetic.
- DirectX
- Vulkan: Vulkan requires the same steps as OpenCL (SPIR-V). But apparently Vulkan doesn't allow pointers? (comment from 2017: https://community.khronos.org/t/compiling-opencl-kernel-to-spir-v-then-use-in-vulkan/7213/2)
- WebGPU

mratsim · 2024-08-02T09:36:06Z

Some more investigation on the AMD backend: https://www.amd.com/content/dam/amd/en/documents/radeon-tech-docs/instruction-set-architectures/rdna3-shader-instruction-set-architecture-feb-2023_0.pdf

There are scalar and vector execution units, with 8x more vector units in this example.

There are up to 2 scalar units per wave (32 units).

The S_ADDC_U32 mentioned earlier is actually a scalar instruction, but we in fact need vector code, and a vector add-with-carry does exist.

However, AMD doesn't seem to provide an auto-vectorizer like the one we get with the Nvidia PTX virtual ISA, so we'll have to vectorize the code ourselves, i.e. implement fp_add_x32, fp_mul_x32, ...
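Vectorizing by hand mostly means choosing a structure-of-arrays layout. Limb j of all 32 lanes then sits contiguously, so each limb operation becomes one vector instruction across the wave. The C sketch below shows what an fp_add_x32 could look like. The signature, the `fp_x32` type and the 2-limb (64-bit) field are assumptions, and inputs are assumed already reduced below p. The loop over lanes stands in for the 32 vector lanes executing in lockstep. The final select is branchless, so lanes never diverge.

```c
#include <stdint.h>
#include <stddef.h>

#define LANES  32 /* one wave32 */
#define NLIMBS 2  /* hypothetical: 64-bit field elements as 2 x 32-bit limbs */

/* Structure-of-arrays: limb j of lane i lives at limb[j][i]. */
typedef struct { uint32_t limb[NLIMBS][LANES]; } fp_x32;

/* r = (a + b) mod p in every lane, assuming a, b < p. */
static void fp_add_x32(fp_x32 *r, const fp_x32 *a, const fp_x32 *b,
                       const uint32_t p[NLIMBS]) {
  for (size_t i = 0; i < LANES; i++) { /* the GPU would run these lanes in lockstep */
    uint32_t sum[NLIMBS], diff[NLIMBS], carry = 0, borrow = 0;
    for (size_t j = 0; j < NLIMBS; j++) { /* add-with-carry chain */
      uint64_t t = (uint64_t)a->limb[j][i] + b->limb[j][i] + carry;
      sum[j] = (uint32_t)t;
      carry = (uint32_t)(t >> 32);
    }
    for (size_t j = 0; j < NLIMBS; j++) { /* sub-with-borrow chain: sum - p */
      uint64_t t = (uint64_t)sum[j] - p[j] - borrow;
      diff[j] = (uint32_t)t;
      borrow = (uint32_t)(t >> 63); /* 1 if the subtraction wrapped below zero */
    }
    /* Keep sum only if the addition did not overflow and sum < p; else keep sum - p. */
    uint32_t keep_sum = (uint32_t)0 - (borrow & (carry ^ 1u));
    for (size_t j = 0; j < NLIMBS; j++)
      r->limb[j][i] = (sum[j] & keep_sum) | (diff[j] & ~keep_sum);
  }
}
```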

mratsim · 2024-08-02T10:22:09Z

Looking at some of the AMD codegen, we might be able to stay within LLVM IR, as there were some LLVM improvements related to add-with-carry:

- inefficient codegen: OpenCL wide add ROCm/ROCm#488
- [InstCombine] missed reducing/canonicalizing add overflow patterns llvm/llvm-project#59232

https://reviews.llvm.org/D138814, including @chfast's comment

This is of particular interest: https://reviews.llvm.org/D138814#3973599

though I'm unsure what's canonical for subtraction
See https://godbolt.org/z/zfnorbvzr
compile with: llc -march=amdgcn -mcpu=gfx900

```llvm
; Hand-written carry detection: the sum wrapped around iff (a + b) < a (unsigned)
define {i32, i1} @foo32(i32 %a, i32 %b) {
  %add = add i32 %a, %b
  %cmp = icmp ult i32 %add, %a
  %insert0 = insertvalue { i32, i1 } poison, i32 %add, 0
  %insert1 = insertvalue { i32, i1 } %insert0, i1 %cmp, 1
  ret {i32, i1} %insert1
}

; The same {sum, carry} pair through the overflow intrinsic
declare { i32, i1 } @llvm.uadd.with.overflow.i32(i32, i32)

define {i32, i1} @foo32_uaddo(i32 %a, i32 %b) {
  %uaddo = call { i32, i1 } @llvm.uadd.with.overflow.i32(i32 %a, i32 %b)
  ret { i32, i1 } %uaddo
}
```

Though we'll likely have the same issue with sub-with-borrow (what's the canonical IR?)
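For reference, here is the comparison-based formulation that the foo32 pattern generalizes to once a carry or borrow input is added. It is written in C with hypothetical names and uses only 32-bit operations and unsigned comparisons: add-with-carry and its mirror image, sub-with-borrow. This sketch does not settle which IR LLVM treats as canonical for subtraction (LLVM also provides llvm.usub.with.overflow). It only shows the borrow-detection logic that would have to be recognized.

```c
#include <stdint.h>

/* a + b + cin (cin in {0, 1}): carry-out if either partial addition wrapped. */
static uint32_t addc_cmp(uint32_t a, uint32_t b, uint32_t cin, uint32_t *cout) {
  uint32_t t = a + b;
  uint32_t c1 = t < a;  /* same test as `icmp ult %add, %a` in foo32 */
  uint32_t s = t + cin;
  uint32_t c2 = s < t;
  *cout = c1 | c2;      /* at most one of c1, c2 can be set */
  return s;
}

/* a - b - bin (bin in {0, 1}): borrow-out if either partial subtraction went below zero. */
static uint32_t subb_cmp(uint32_t a, uint32_t b, uint32_t bin, uint32_t *bout) {
  uint32_t t = a - b;
  uint32_t b1 = a < b;  /* mirror of the addition test */
  uint32_t d = t - bin;
  uint32_t b2 = t < bin;
  *bout = b1 | b2;      /* at most one of b1, b2 can be set */
  return d;
}
```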

mratsim · 2024-08-05T05:52:47Z

For OpenCL / SPIR-V on Intel GPUs, two extensions are of particular interest:

- SPV_INTEL_arbitrary_precision_integers
- SPV_INTEL_inline_assembly

since inline assembly would guarantee access to add-with-carry from the Intel virtual ISA (vISA): https://github.com/intel/intel-graphics-compiler/blob/master/documentation/visa/6_instructions.md

mratsim · 2024-08-27T15:25:05Z

Closed by #465

mratsim added the enhancement and Zero Knowledge 🤫 labels on Sep 27, 2020

mratsim mentioned this issue on Jan 11, 2023 in [Backend] Add support for Nvidia GPUs #210 (Merged)

mratsim closed this as completed in #210 on Jan 12, 2023

mratsim changed the title from "GPU backend for Zero Knowledge Proofs" to "GPU backends" on Apr 8, 2023

mratsim reopened this on Apr 8, 2023

mratsim mentioned this issue on Jun 14, 2023 in GPU prover taikoxyz/zkevm-circuits#16 (Open)

mratsim mentioned this issue on Aug 4, 2024 in AMDGPU JIT compiler #453 (Merged)

mratsim mentioned this issue on Aug 14, 2024 in LLVM: field addition with saturated fields #456 (Merged)

mratsim mentioned this issue on Aug 27, 2024 in [GPU] GPU / LLVM IR Elliptic curves implementation plan #465 (Open)

mratsim closed this as completed on Aug 27, 2024
