Build for GPU on CircleCI #8829
This will allow us to build for CUDA on CI
It depends on #8828 for CUDA support in the CircleCI Docker image.
This change adds just enough support to build Velox with GPU support on CircleCI. That will help make sure the build with VELOX_ENABLE_GPU doesn't regress, even if it doesn't test just yet.
After trying to use the GPU executors, I'm coming to the realization that they don't seem compatible with running the steps in Docker. @assignUser @kgpai I'm not sure what to do next. I think we still want to go ahead and add the build on a Linux/Docker executor for now, and we can add a downstream test job that uses the GPU executor.
Sorry, can you add more detail? What is the problem you are seeing?
Hi @luhenry I think this is what should be done:
@kgpai @luhenry We are a bit blocked with the GHA migration due to limitations of I haven't used the official CUDA-capable runners yet, but with our self-hosted ones it's important to get the match between host and Docker CUDA versions correct, so either we run this without Docker or use a CUDA base image. I'll check out the GPU runners and see what works. Do we want to run this on every PR? Or would a scheduled nightly job be enough to spot regressions (plus maybe on PRs with direct changes to the experimental features)?
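The host/Docker version match mentioned above can be sketched as a simple pre-flight check. This is only an illustration under a simplified assumption: the container's CUDA toolkit version should not exceed the highest CUDA version the host driver supports. It ignores CUDA's minor-version and forward-compatibility mechanisms, and the function names are hypothetical, not part of Velox or of any CI tooling.

```python
def parse_cuda_version(text):
    """Turn a version string such as '12.2' into a (major, minor) tuple.

    A missing minor component is treated as 0, so '12' becomes (12, 0).
    """
    parts = (text.split(".") + ["0"])[:2]
    return tuple(int(part) for part in parts)


def container_cuda_supported(host_max_cuda, container_cuda):
    """Hypothetical helper: True when the host driver's highest supported
    CUDA version is at least the toolkit version inside the container.

    Simplified rule (an assumption): real deployments may also rely on
    minor-version compatibility or forward-compatibility packages.
    """
    return parse_cuda_version(host_max_cuda) >= parse_cuda_version(container_cuda)


if __name__ == "__main__":
    # Host driver supports up to CUDA 12.2, image ships the 11.8 toolkit.
    print(container_cuda_supported("12.2", "11.8"))
    # Host driver supports up to CUDA 11.4, image ships the 12.0 toolkit.
    print(container_cuda_supported("11.4", "12.0"))
```

A check like this could run as the first step of a GPU job so that a mismatch fails fast instead of surfacing as an obscure runtime error.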
On using GHA to build and run with GPU, when are you expecting to have all of this ready to go? At the moment, with #8828, we would be able to build the CUDA code on CircleCI today. I'm happy to do the work to add a specific GHA workflow/job to build and test on GHA. But if that's not going to be merged in the coming weeks/months because it's blocked on something else, I would much rather get what's currently available merged (even if it's only building, given it doesn't even build today) and improve things later on (even if it means the work I'm doing today goes in the trash and I need to do more later).
I think that the building part should be run on every PR, as it's only building and there shouldn't be any flakiness or slowness to it. For the testing, I think it entirely depends on the capacity and availability of the runners with GPUs. If there are enough, the tests shouldn't take much longer than a non-GPU test run. @pedroerp what do you think?
We expect this to happen in a few weeks at most. While we can have this working in CCI, we will still need to get this working in GHA, as we are migrating from CCI to GHA in the very short term. Changes to CCI are therefore additional, wasted effort.
This makes sense: we can build the GPU part on every PR but only exercise the code, say, nightly on a GPU.
Closed in favor of #9335
This change adds just enough support to build Velox with GPU support on
CircleCI. That will help make sure the build with VELOX_ENABLE_GPU
doesn't regress, even if it doesn't test just yet.
Co-authored-by: Sergei Lewis slewis@rivosinc.com