multi gpu docs #391

ImmanuelSegol · 2024-02-22T05:14:55Z

Describe the changes

This PR...

Linked Issues

Resolves #

docs/docs/icicle/multi-gpu.md

Co-authored-by: Jeremy Felder <jeremy.felder1@gmail.com>

vhnatyk · 2024-02-22T14:53:47Z

docs/docs/icicle/multi-gpu.md

+
+This approach wont let us tackle larger computation sizes but it will allow us to compute multiple computations which we wouldn't be able to load onto a single GPU.
+
+For example lets say that you had to compute two MSMs of size 2^20 on a 16GB VRAM GPU you would normally have to perform them asynchronously. However, if you double the number of GPUs in your system you can now run them in parallel. 


An MSM of size 2^20 fits into 16GB of VRAM, and quite a few more of them will too, right? Even with precomputation. For example, for BLS12-381: 2^20 * (48+32) bytes = 80MB.

Thanks! That's a typo, will fix.
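
For reference, the estimate in this thread can be reproduced with a few lines of Rust. This is only a sketch of the arithmetic: the helper names are made up for illustration, it counts input points and scalars only (no buckets or other working memory), and the 48-byte point size is the figure used in the comment above. An uncompressed affine BLS12-381 G1 point is two 48-byte field elements, so 96 can be passed instead.

```rust
/// Bytes needed to hold the inputs of one MSM of size 2^log_size:
/// one point and one scalar per term. Working memory is NOT included.
fn msm_input_bytes(log_size: u32, point_bytes: u64, scalar_bytes: u64) -> u64 {
    (1u64 << log_size) * (point_bytes + scalar_bytes)
}

/// Convert a byte count to MiB (2^20 bytes), rounding down.
fn to_mib(bytes: u64) -> u64 {
    bytes >> 20
}
```

With the numbers from the comment, `to_mib(msm_input_bytes(20, 48, 32))` gives 80, while the same call with `log_size = 26` gives 5120 MiB of inputs alone.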

DmytroTym · 2024-02-22T14:52:25Z

docs/docs/icicle/multi-gpu.md

+
+One common challenge with Zero-Knowledge computation is managing the large input sizes. It's not uncommon to encounter circuits surpassing 2^25 constraints, pushing the capabilities of even advanced GPUs to their limits. To effectively scale and process such large circuits, leveraging multiple GPUs in tandem becomes a necessity.
+
+Multi-GPU programming involves developing software to operate across multiple GPU devices. Lets first explore different approaches to Multi-GPU programming  then we will cover how ICICLE allows you to easily develop youR ZK computations to run across many GPUs.


"programming then" - double space

DmytroTym · 2024-02-22T15:07:32Z

docs/docs/icicle/multi-gpu.md

+
+This approach wont let us tackle larger computation sizes but it will allow us to compute multiple computations which we wouldn't be able to load onto a single GPU.
+
+For example lets say that you had to compute two MSMs of size 2^20 on a 16GB VRAM GPU you would normally have to perform them asynchronously. However, if you double the number of GPUs in your system you can now run them in parallel. 


I think a size 2^20 MSM on bls12 curves should require less than 500 MB. For bls12, 2^26 is probably the size at which one MSM fits into 16 GB but two do not.

DmytroTym · 2024-02-22T15:09:02Z

docs/docs/icicle/multi-gpu.md

+
+The approach we have taken for the moment is a GPU Server approach; we assume you have a machine with multiple GPUs and you wish to run some computation on each GPU.
+
+To dive deeper and learn about the API checkout the docs for our different ICICLE API


checkout -> check out

DmytroTym · 2024-02-22T15:11:10Z

docs/docs/icicle/rust-bindings/multi-gpu.md

+
+## Device context API
+
+The `DeviceContext` is embedded into `NTTConfig`, `MSMConfig` and `PoseidonConfig`, meaning you can simple pass a `device_id` to your existing config an the same computation will be triggered on a different device automatically.


simple -> simply

vhnatyk

Hi @ImmanuelSegol, I see the PR was merged already 😊. Looks great 👍🏻; there are a few small notes to consider.

vhnatyk · 2024-02-22T15:03:37Z

docs/docs/icicle/multi-gpu.md

+
+- Never hardcode device IDs, if you want your software to take advantage of all GPUs on a machine use methods such as `get_device_count` to support arbitrary number of GPUs.
+
+- Launch one thread per GPU, to avoid nasty errors and hard to read code we suggest that for every GPU task you wish to launch you create a dedicated thread. This will make your code way more manageable, easy to read and performant.


Umm, it's one CPU thread per GPU: you can actually run more tasks on that thread as long as they target the same GPU. Also, the section IMO needs a link to https://developer.nvidia.com/blog/cuda-pro-tip-always-set-current-device-avoid-multithreading-bugs/
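
A minimal sketch of the "one CPU thread per GPU" pattern discussed here, using only the standard library. `run_on_all_devices` is a hypothetical helper, and `device_count` stands in for whatever `get_device_count` reports; no ICICLE or CUDA call is made in this sketch. The point where a real program would set the thread's current device is marked in a comment.

```rust
use std::thread;

/// Run `task` once per device ID, each on its own dedicated CPU thread,
/// and collect the results in device order.
///
/// `device_count` is an assumption: in real code it would come from a
/// runtime query such as `get_device_count`, which is not called here.
fn run_on_all_devices<T, F>(device_count: usize, task: F) -> Vec<T>
where
    T: Send,
    F: Fn(usize) -> T + Sync,
{
    let task = &task;
    thread::scope(|scope| {
        // One thread per device; nothing is hardcoded beyond the count.
        let handles: Vec<_> = (0..device_count)
            .map(|device_id| {
                scope.spawn(move || {
                    // A real program would set this thread's current device
                    // to `device_id` here, before any other GPU call.
                    task(device_id)
                })
            })
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("device thread panicked"))
            .collect()
    })
}
```

The closure passed as `task` can run several computations back to back, which matches the note above: more than one task per thread is fine as long as they all target that thread's GPU.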

vhnatyk · 2024-02-22T15:05:16Z

docs/docs/icicle/rust-bindings/multi-gpu.md

+
+## Device context API
+
+The `DeviceContext` is embedded into `NTTConfig`, `MSMConfig` and `PoseidonConfig`, meaning you can simple pass a `device_id` to your existing config an the same computation will be triggered on a different device automatically.


"an the same" -> "and the same", typo?

Actually, the current implementation has no "automatic" behavior: we just check that the `device_id` from the config matches the current device ID for the thread, so the computation won't be executed on the wrong device.
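
To illustrate the check described in this comment, here is a std-only sketch. Everything in it is hypothetical: `CURRENT_DEVICE` and `set_current_device` stand in for the CUDA runtime's per-thread current device, and `check_device` is not ICICLE's actual function. It only shows the shape of the behavior, where a mismatched `device_id` is rejected rather than silently switching devices.

```rust
use std::cell::Cell;

thread_local! {
    // Hypothetical stand-in for the CUDA runtime's per-thread current device.
    static CURRENT_DEVICE: Cell<usize> = Cell::new(0);
}

/// Hypothetical stand-in for a `set_device`-style call on this thread.
fn set_current_device(device_id: usize) {
    CURRENT_DEVICE.with(|d| d.set(device_id));
}

/// Reject a config whose `device_id` differs from this thread's current
/// device. Nothing is switched automatically.
fn check_device(cfg_device_id: usize) -> Result<(), String> {
    let current = CURRENT_DEVICE.with(|d| d.get());
    if cfg_device_id == current {
        Ok(())
    } else {
        Err(format!(
            "config targets device {cfg_device_id}, but this thread is on device {current}"
        ))
    }
}
```

Under this model, the caller has to set the thread's device first and then pass a config with the matching `device_id`.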

vhnatyk · 2024-02-22T15:10:18Z

docs/docs/icicle/rust-bindings/multi-gpu.md

+
+#### [`DeviceContext`](https://github.com/vhnatyk/icicle/blob/eef6876b037a6b0797464e7cdcf9c1ecfcf41808/wrappers/rust/icicle-cuda-runtime/src/device_context.rs#L11)
+
+Represents the configuration a CUDA device, encapsulating the device's stream, ID, and memory pool. The default device is always `0`, unless configured otherwise.


", unless configured otherwise" should probably be removed; I doubt it's possible.

vhnatyk · 2024-02-22T15:17:55Z

docs/docs/icicle/rust-bindings/multi-gpu.md

+
+- **`device_id: usize`**
+
+  The index of the GPU currently in use. The default value is `0`, indicating the first GPU in the system.


Umm, assuming the invocation command was prefixed with `CUDA_VISIBLE_DEVICES=2,3,7` on a system with 8 GPUs, `device_id` 0 will correspond to the GPU with ID 2, so technically the third GPU in the system.
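
The remapping in this comment can be sketched as a small lookup. `physical_gpu` is a hypothetical helper written for illustration; it handles only a well-formed, comma-separated list of integer indices and does not model other forms the variable may take.

```rust
/// Map a logical CUDA device ID to the physical GPU index, given the value
/// of `CUDA_VISIBLE_DEVICES`. Returns `None` if the ID is out of range or
/// the entry at that position is not an integer.
fn physical_gpu(visible_devices: &str, device_id: usize) -> Option<u32> {
    visible_devices
        .split(',')
        .map(|entry| entry.trim().parse::<u32>().ok())
        .nth(device_id)
        .flatten()
}
```

With `"2,3,7"`, logical device 0 maps to physical GPU 2 and logical device 2 maps to physical GPU 7, while logical device 3 does not exist.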

DmytroTym · 2024-02-22T15:52:22Z

@vhnatyk @ImmanuelSegol maybe we can fix the issues in #389

* Update README.md (#385)
* refactor
* refactor
* refactor
* rename task
* update codespell
* multi gpu docs (#391)
* Refactor
* refacotr
* fix typo
* Apply suggestions from code review

Co-authored-by: Jeremy Felder <jeremy.felder1@gmail.com>

* refactor
* refactor

---------

Co-authored-by: DmytroTym <dmytrotym1@gmail.com>
Co-authored-by: ChickenLover <Romangg81@gmail.com>
Co-authored-by: Jeremy Felder <jeremy.felder1@gmail.com>

migrate docs website + improved docs (#389)

* Update README.md (#385)
* refactor
* refactor
* refactor
* rename task
* update codespell
* multi gpu docs (#391)
* Refactor
* refacotr
* fix typo
* Apply suggestions from code review
* refactor
* refactor

---------

Co-authored-by: ImmanuelSegol <3ditds@gmail.com>
Co-authored-by: DmytroTym <dmytrotym1@gmail.com>
Co-authored-by: ChickenLover <Romangg81@gmail.com>

refactor

fced2c2

ImmanuelSegol assigned jeremyfelder and unassigned jeremyfelder Feb 22, 2024

ImmanuelSegol requested a review from jeremyfelder February 22, 2024 05:15

jeremyfelder reviewed Feb 22, 2024

Update multi-gpu.md

c64049f

Co-authored-by: Jeremy Felder <jeremy.felder1@gmail.com>

jeremyfelder approved these changes Feb 22, 2024

ImmanuelSegol merged commit 4867b0d into migrate-docs Feb 22, 2024

ImmanuelSegol deleted the multi-gpu branch February 22, 2024 14:53

vhnatyk reviewed Feb 22, 2024

DmytroTym reviewed Feb 22, 2024

vhnatyk requested changes Feb 22, 2024

DmytroTym mentioned this pull request Feb 22, 2024

migrate docs website + improved docs #389

Merged
