feat: add warmup for CudaStream #422

Merged 5 commits into ingonyama-zk:main on Mar 7, 2024

Conversation

@alxiong (Contributor) commented Mar 6, 2024

Describe the changes

Add a non-blocking warmup function to CudaStream

When you run a benchmark (e.g. the msm example you have), the first run is always slow, with a constant 200-300 ms overhead from CUDA stream warmup. I want to get rid of that in my application by warming the stream up in parallel while the host does something else.
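
Roughly, the intended usage pattern is something like the following. This is a sketch only: the module path, constructor, and method names (`CudaStream::create`, `warmup`, `synchronize`) are assumptions for illustration, not necessarily the exact merged API.

```rust
use icicle_cuda_runtime::stream::CudaStream; // assumed module path

fn run() {
    // Create the stream up front.
    let stream = CudaStream::create().expect("failed to create CUDA stream");

    // Enqueue the warmup work; the call returns immediately because the
    // allocation and free are asynchronous with respect to the host.
    stream.warmup().expect("warmup failed");

    // ... other host-side setup (loading points/scalars, etc.) runs here
    // while the device finishes initializing ...

    // Wait for everything enqueued on the stream before timing the first MSM.
    stream.synchronize().expect("stream sync failed");
}
```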

Full credit to @DmytroTym for the work.

@jeremyfelder (Collaborator)

@alxiong Thanks for the contribution! 💪🏻 🚀

Can you run our formatter script (https://github.com/ingonyama-zk/icicle?tab=readme-ov-file#development-contributions)? We'll review ASAP.

@alxiong (Contributor, Author) commented Mar 6, 2024

Yup, fixing it now; will push soon. Thanks!

@DmytroTym (Contributor)

@alxiong thanks for the PR!

Could you please elaborate a bit on the use case? Do you need to free the memory before the object is dropped, and that's why the default free-on-drop doesn't work for you?

@alxiong (Contributor, Author) commented Mar 6, 2024

The use case is: I want to "warm up" the stream by allocating and deallocating some bytes, and I want this to be non-blocking.

When you run a benchmark (e.g. the msm example you have), the first run is always slow, with a constant 200-300 ms overhead from CUDA stream warmup. I want to get rid of that in my application by warming the stream up in parallel while the host does something else.

@DmytroTym (Contributor) commented Mar 6, 2024

Right. I think any call to the new function must be followed by std::mem::forget as otherwise double free will happen on drop. And forget cannot be called from inside the function, since it requires ownership. So the function looks pretty unsafe to use, to be honest.
Could we instead add a special function for the use case you've described? We can call it warmup or prealloc; it will just allocate and free memory asynchronously given a DeviceContext object. Would that be OK? If so, I can make a quick PR against your branch.
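
A minimal sketch of what such a warmup/prealloc helper could look like, written here against raw CUDA runtime FFI declarations for illustration; the real icicle wrapper has its own stream/context types and error handling, so everything below is an assumption rather than the merged code:

```rust
use std::ffi::c_void;

#[allow(non_camel_case_types)]
type cudaStream_t = *mut c_void;

extern "C" {
    // Stream-ordered allocation APIs from the CUDA runtime (cudart).
    fn cudaMallocAsync(dev_ptr: *mut *mut c_void, size: usize, stream: cudaStream_t) -> i32;
    fn cudaFreeAsync(dev_ptr: *mut c_void, stream: cudaStream_t) -> i32;
}

/// Enqueue a tiny allocation and free on `stream`, then return immediately.
/// The costly first-use initialization happens on the device while the host
/// keeps working; no pointer escapes this function, so there is nothing to
/// `std::mem::forget` and no double free on drop.
pub fn warmup(stream: cudaStream_t) -> Result<(), i32> {
    let mut ptr: *mut c_void = std::ptr::null_mut();
    unsafe {
        let err = cudaMallocAsync(&mut ptr, 1, stream);
        if err != 0 {
            return Err(err);
        }
        let err = cudaFreeAsync(ptr, stream);
        if err != 0 {
            return Err(err);
        }
    }
    Ok(())
}
```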

@alxiong (Contributor, Author) commented Mar 6, 2024

any call to the new function must be followed by std::mem::forget as otherwise double free will happen on drop.

Good point, you've convinced me. I didn't plan to use it in isolation anyway; I was planning to wrap it in my own fn warmup_new_stream(), but it's better if we have this in icicle upstream directly.
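
For concreteness, that wrapper might look roughly like this, assuming a CudaStream::create constructor and a warmup method along the lines of the discussion above; all paths and names here are illustrative:

```rust
use icicle_cuda_runtime::{
    error::CudaResult,   // assumed paths; adjust to the actual crate layout
    stream::CudaStream,
};

/// Hypothetical wrapper: create a stream and immediately enqueue warmup work,
/// so the stream is ready (or nearly so) by the time the caller first uses it.
fn warmup_new_stream() -> CudaResult<CudaStream> {
    let stream = CudaStream::create()?;
    stream.warmup()?; // non-blocking: the alloc/free are only enqueued here
    Ok(stream)
}
```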

We can call it warmup or prealloc; it will just allocate and free memory asynchronously given a DeviceContext object. Would that be OK? If so, I can make a quick PR against your branch.

That sounds perfect; feel free to just edit this PR if that's more convenient.

By the way, I want to ask: should I always carry around a DeviceContext rather than a single CudaStream? If I mimic the example/msm/main.rs code, it only creates a stream and never deals with a device context.

@DmytroTym (Contributor)

alxiong#1
About DeviceContext: there's just no straightforward way to determine the device id given a stream, which is why we created this struct. But for the warmup method, I just put a stream there for simplicity.
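
A rough sketch of the shape of such a struct, just to illustrate the point; the field names and types below are assumptions rather than the exact icicle definition:

```rust
use std::ffi::c_void;

/// Raw CUDA stream handle (illustrative alias; the real wrapper has its own type).
type CudaStreamHandle = *mut c_void;

/// A stream handle alone does not tell you which GPU it belongs to, so the
/// wrapper carries the device id alongside the stream explicitly.
pub struct DeviceContext {
    pub stream: CudaStreamHandle,
    pub device_id: usize,
}
```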

@alxiong changed the title from "feat: add rust api for cudaFreeAsync" to "feat: add warmup for CudaStream" on Mar 7, 2024
@jeremyfelder (Collaborator) left a comment

Awesome! Looks great 🚀

@DmytroTym (Contributor) left a comment

@alxiong if the PR looks good to you, I think we can merge it in.

@alxiong (Contributor, Author) commented Mar 7, 2024

Sure, feel free to merge whenever!

@DmytroTym merged commit 0e84fb4 into ingonyama-zk:main on Mar 7, 2024. 21 checks passed.