
Build for GPU on CircleCI #8829

Closed · wants to merge 13 commits

Conversation

@luhenry (Contributor) commented Feb 22, 2024

This change adds just enough support to build Velox with GPU support on
CircleCI. That will help ensure the build with VELOX_ENABLE_GPU
doesn't regress, even if it doesn't run tests just yet.

Co-authored-by: Sergei Lewis slewis@rivosinc.com
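
As a rough illustration (not the PR's actual diff), a CircleCI job enabling this build might look like the sketch below; only the VELOX_ENABLE_GPU flag comes from the description above, while the job name, image, and exact commands are assumptions:

```yaml
# Hypothetical CircleCI job sketch. Only VELOX_ENABLE_GPU is taken from
# the PR description; job name, image, and commands are assumptions.
jobs:
  linux-build-gpu:
    docker:
      - image: velox-ci-cuda:latest  # placeholder; #8828 adds CUDA to the CI image
    resource_class: 2xlarge
    steps:
      - checkout
      - run:
          name: Build with GPU support (build only, no GPU tests yet)
          command: |
            cmake -B _build/release -DCMAKE_BUILD_TYPE=Release \
                  -DVELOX_ENABLE_GPU=ON
            cmake --build _build/release -j 8
```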

This will allow us to build for CUDA on CI
@facebook-github-bot added the CLA Signed label Feb 22, 2024
netlify bot commented Feb 22, 2024

Deploy Preview for meta-velox canceled.

Latest commit: ef7cd95
Latest deploy log: https://app.netlify.com/sites/meta-velox/deploys/65d8b167efad3f0008006b7c

@luhenry (Contributor, Author) commented Feb 22, 2024

This depends on #8828 for CUDA support in the CircleCI Docker image.

@luhenry mentioned this pull request Feb 23, 2024
@luhenry (Contributor, Author) commented Feb 23, 2024

After trying to use the GPU executors, I've come to the realization that they don't seem compatible with running the steps in Docker. @assignUser @kgpai, I'm not sure what to do next. I think we still want to go ahead and add the build on a Linux/Docker executor for now, and we can add a downstream test job that uses the GPU executor.

@kgpai (Contributor) commented Feb 23, 2024

> After trying to use the GPU executors, I've come to the realization that they don't seem compatible with running the steps in Docker.

Sorry, can you add more detail? What is the problem you are seeing?

@kgpai (Contributor) commented Feb 23, 2024

Hi @luhenry, I think this is what should be done:

  1. We have 4-core-ubuntu-gpu-t4 as a GPU-supported runner in GHA.
  2. These runners are in GHA, so the changes in Add CUDA to the CircleCI docker image #8828 can't be used (as that's specific to CCI).
  3. You will need to make changes to the scripts/centos-8... or scripts/ubuntu Dockerfiles.
  4. Set up a new GHA job that uses these images and the new runner above, along with the changes in this PR (see the sketch after this list).
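
A minimal sketch of step 4, assuming the runner label from the list above and GHA's standard checkout action; the workflow name and build commands are illustrative guesses:

```yaml
# Hypothetical GHA job sketch. The runner label 4-core-ubuntu-gpu-t4 comes
# from the list above; everything else is an assumption.
name: GPU Build
on: [pull_request]

jobs:
  gpu-build:
    runs-on: 4-core-ubuntu-gpu-t4
    steps:
      - uses: actions/checkout@v4
      - name: Build with GPU support
        run: |
          cmake -B _build/release -DCMAKE_BUILD_TYPE=Release \
                -DVELOX_ENABLE_GPU=ON
          cmake --build _build/release -j 4
```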

@assignUser (Collaborator) commented

@kgpai @luhenry We are a bit blocked on the GHA migration due to limitations of actions/cache when it's used to cache a build cache like ccache, but I have a fix almost ready.
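
For context, the relevant limitation is that actions/cache entries are immutable once saved, so a continuously-updated build cache like ccache has to be saved under a fresh key each run and restored by prefix match; a minimal sketch (key names are illustrative):

```yaml
# Hypothetical sketch of caching ccache with actions/cache.
# Cache entries are immutable, so save under a per-run key and
# restore the most recent entry via a prefix match.
- uses: actions/cache@v4
  with:
    path: ~/.ccache
    key: ccache-gpu-${{ github.run_id }}  # unique key every run
    restore-keys: |
      ccache-gpu-
```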

I haven't used the official CUDA-capable runners yet, but with our self-hosted ones it's important to get the host and Docker CUDA versions to match, so we should either run this without Docker or use a CUDA base image. I'll check out the GPU runners and see what works.
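
A minimal sketch of the CUDA-base-image option, assuming a public nvidia/cuda devel tag; the exact tag and the container options are assumptions and must be compatible with the host driver:

```yaml
# Hypothetical sketch: run the job inside a CUDA devel base image so the
# toolkit version is pinned by the container, not the host.
jobs:
  gpu-test:
    runs-on: 4-core-ubuntu-gpu-t4
    container:
      image: nvidia/cuda:12.2.0-devel-ubuntu22.04  # assumed tag
      options: --gpus all  # assumed needed to expose the GPU to the container
    steps:
      - uses: actions/checkout@v4
      - run: nvcc --version  # sanity-check the toolkit inside the container
```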

Do we want to run this on every PR? Or would a scheduled nightly job be enough to spot regressions (+ maybe on PRs with direct changes to the experimental features?)

@luhenry (Contributor, Author) commented Feb 24, 2024

> Sorry, can you add more detail? What is the problem you are seeing?

It is a limitation of the CircleCI GPU runners: you have to specify a machine: executor, which makes it impossible to specify a docker: executor.
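
Concretely, CircleCI's GPU resource classes are only available with a machine executor, roughly as below; the image tag and resource class follow CircleCI's documented conventions but should be treated as assumptions here:

```yaml
# CircleCI GPU jobs require a machine executor; a docker executor
# cannot be combined with a GPU resource class.
jobs:
  gpu-test:
    machine:
      image: linux-cuda-12:default  # assumed CUDA machine image tag
    resource_class: gpu.nvidia.medium  # assumed GPU resource class
    steps:
      - checkout
```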

On using GHA to build and run with GPUs: when do you expect to have all of that ready? With #8828, we would be able to build the CUDA code on CircleCI today.

I'm happy to do the work to add a specific GHA workflow/job to build and test on GHA. But if that's not going to be merged in the coming weeks or months because it's blocked on something else, I would much rather get what's currently available merged (even if it's only building, given it doesn't even build today) and improve things later, even if it means the work I'm doing today gets thrown away and I have to redo it later.

> Do we want to run this on every PR? Or would a scheduled nightly job be enough to spot regressions (+ maybe on PRs with direct changes to the experimental features?)

I think the build step should run on every PR, since it's build-only and there shouldn't be any flakiness or slowness to it. For the testing, it entirely depends on the capacity and availability of the GPU runners. If there are enough, the tests shouldn't take much longer than a non-GPU test run.

@pedroerp what do you think?

@kgpai (Contributor) commented Feb 24, 2024

> On using GHA to build and run with GPUs: when do you expect to have all of that ready? With #8828, we would be able to build the CUDA code on CircleCI today.

We expect this to happen in a few weeks at most. While we could get this working in CCI, we will still need to get it working in GHA, since we are migrating from CCI to GHA in the very short term; changes to CCI are therefore additional, wasted effort.

> I think the build step should run on every PR, since it's build-only and there shouldn't be any flakiness or slowness to it. For the testing, it entirely depends on the capacity and availability of the GPU runners. If there are enough, the tests shouldn't take much longer than a non-GPU test run.

This makes sense: we can build the GPU part on every PR but only exercise the code, say, nightly on a GPU.
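
That split could be expressed with GHA triggers along these lines (the cron time is an arbitrary assumption):

```yaml
# Hypothetical trigger split: build the GPU code on every PR,
# run the GPU tests only on a nightly schedule.
on:
  pull_request:           # build-only check on every PR
  schedule:
    - cron: '0 3 * * *'   # nightly GPU test run
```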

@luhenry (Contributor, Author) commented Apr 2, 2024

Closed in favor of #9335

@assignUser closed this Apr 3, 2024