feat: add warmup for CudaStream #422

Merged 5 commits into ingonyama-zk:main on Mar 7, 2024

Conversation

@alxiong (Contributor) commented Mar 6, 2024

Describe the changes

Add a non-blocking warmup function to CudaStream

When you run a benchmark (e.g. the msm example you have), the first run is always slow, with a constant 200-300 ms overhead from CUDA stream warmup. I want to get rid of that in my application by warming the stream up in parallel while the host does something else.
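
Roughly, the intended usage pattern is something like the following. This is a sketch only: the module path, constructor, and method names (`CudaStream::create`, `warmup`, `synchronize`) are assumptions for illustration, not necessarily the exact merged API.

```rust
use icicle_cuda_runtime::stream::CudaStream; // assumed module path

fn run() {
    // Create the stream up front.
    let stream = CudaStream::create().expect("failed to create CUDA stream");

    // Enqueue the warmup work; the call returns immediately because the
    // allocation and free are asynchronous with respect to the host.
    stream.warmup().expect("warmup failed");

    // ... other host-side setup (loading points/scalars, etc.) runs here
    // while the device finishes initializing ...

    // Wait for everything enqueued on the stream before timing the first MSM.
    stream.synchronize().expect("stream sync failed");
}
```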

Full credit to @DmytroTym for the work.

@jeremyfelder (Collaborator)

@alxiong Thanks for the contribution! 💪🏻 🚀

Can you run our formatter script (https://github.com/ingonyama-zk/icicle?tab=readme-ov-file#development-contributions)? We'll review ASAP.

@alxiong (Contributor, Author) commented Mar 6, 2024

Yup, fixing it now; will push soon. Thanks!

@DmytroTym (Contributor)

@alxiong thanks for the PR!

Could you please elaborate a bit on the use case? Do you need to free the memory before the object is dropped, and that's why the default free-on-drop doesn't work for you?

@alxiong (Contributor, Author) commented Mar 6, 2024

The use case is: I want to "warm up" the stream by allocating and deallocating some bytes, and I want this to be non-blocking.

When you run a benchmark (e.g. the msm example you have), the first run is always slow, with a constant 200-300 ms overhead from CUDA stream warmup. I want to get rid of that in my application by warming the stream up in parallel while the host does something else.

@DmytroTym (Contributor) commented Mar 6, 2024

Right. I think any call to the new function must be followed by std::mem::forget as otherwise double free will happen on drop. And forget cannot be called from inside the function, since it requires ownership. So the function looks pretty unsafe to use, to be honest.
Could we instead add a special function for the use case you've described? We can call it warmup or prealloc; it will just allocate and free memory asynchronously given a DeviceContext object. Would that be OK? If so, I can make a quick PR against your branch.
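
A minimal sketch of what such a warmup/prealloc helper could look like, written here against raw CUDA runtime FFI declarations for illustration; the real icicle wrapper has its own stream/context types and error handling, so everything below is an assumption rather than the merged code:

```rust
use std::ffi::c_void;

#[allow(non_camel_case_types)]
type cudaStream_t = *mut c_void;

extern "C" {
    // Stream-ordered allocation APIs from the CUDA runtime (cudart).
    fn cudaMallocAsync(dev_ptr: *mut *mut c_void, size: usize, stream: cudaStream_t) -> i32;
    fn cudaFreeAsync(dev_ptr: *mut c_void, stream: cudaStream_t) -> i32;
}

/// Enqueue a tiny allocation and free on `stream`, then return immediately.
/// The costly first-use initialization happens on the device while the host
/// keeps working; no pointer escapes this function, so there is nothing to
/// `std::mem::forget` and no double free on drop.
pub fn warmup(stream: cudaStream_t) -> Result<(), i32> {
    let mut ptr: *mut c_void = std::ptr::null_mut();
    unsafe {
        let err = cudaMallocAsync(&mut ptr, 1, stream);
        if err != 0 {
            return Err(err);
        }
        let err = cudaFreeAsync(ptr, stream);
        if err != 0 {
            return Err(err);
        }
    }
    Ok(())
}
```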

@alxiong (Contributor, Author) commented Mar 6, 2024

any call to the new function must be followed by std::mem::forget as otherwise double free will happen on drop.

Good point, you've convinced me. I didn't plan to use it in isolation anyway; I was planning to wrap it in my own fn warmup_new_stream(), but it's better if we have this in icicle upstream directly.
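
For concreteness, that wrapper might look roughly like this, assuming a CudaStream::create constructor and a warmup method along the lines of the discussion above; all paths and names here are illustrative:

```rust
use icicle_cuda_runtime::{
    error::CudaResult,   // assumed paths; adjust to the actual crate layout
    stream::CudaStream,
};

/// Hypothetical wrapper: create a stream and immediately enqueue warmup work,
/// so the stream is ready (or nearly so) by the time the caller first uses it.
fn warmup_new_stream() -> CudaResult<CudaStream> {
    let stream = CudaStream::create()?;
    stream.warmup()?; // non-blocking: the alloc/free are only enqueued here
    Ok(stream)
}
```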

We can call it warmup or prealloc; it will just allocate and free memory asynchronously given a DeviceContext object. Would that be OK? If so, I can make a quick PR against your branch.

That sounds perfect; feel free to just edit this PR if that's more convenient.

By the way, I want to ask: should I always carry around a DeviceContext rather than a single CudaStream? If I mimic the example/msm/main.rs code, it only creates a stream and never deals with a device context.

@DmytroTym (Contributor)

alxiong#1
About DeviceContext: there's just no straightforward way to determine the device id given a stream, which is why we created this struct. But for the warmup method, I just put a stream there for simplicity.
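
A rough sketch of the shape of such a struct, just to illustrate the point; the field names and types below are assumptions rather than the exact icicle definition:

```rust
use std::ffi::c_void;

/// Raw CUDA stream handle (illustrative alias; the real wrapper has its own type).
type CudaStreamHandle = *mut c_void;

/// A stream handle alone does not tell you which GPU it belongs to, so the
/// wrapper carries the device id alongside the stream explicitly.
pub struct DeviceContext {
    pub stream: CudaStreamHandle,
    pub device_id: usize,
}
```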

@alxiong changed the title from "feat: add rust api for cudaFreeAsync" to "feat: add warmup for CudaStream" on Mar 7, 2024
@jeremyfelder (Collaborator) left a comment

Awesome! Looks great 🚀

@DmytroTym (Contributor) left a comment

@alxiong if the PR looks good to you, I think we can merge it in.

@alxiong (Contributor, Author) commented Mar 7, 2024

Sure, feel free to merge whenever!

@DmytroTym merged commit 0e84fb4 into ingonyama-zk:main on Mar 7, 2024. 21 checks passed.