
Update latest CUDA version for build/test to 12.5 #73

Closed
13 tasks done
bdice opened this issue Jun 12, 2024 · 15 comments

@bdice
Contributor

bdice commented Jun 12, 2024

With the recent CCCL update (rapidsai/rapids-cmake#607), we should now be able to build RAPIDS with CUDA versions 12.5 and older.

We have CUDA driver R550 in CI now, which only supports up to CUDA 12.4, so that's the latest version we could adequately test. CUDA 12.5 needs driver R555, which does not yet have a production branch (PB) or long-term support (LTS) release.

edit: R550 is a Production Branch driver, and therefore supports CUDA Forward Compatibility with CUDA 12.5 containers. This means we are able to support CUDA 12.5 (the latest version at the time of writing).

I propose to update CI images, shared workflows, devcontainers, etc. to replace CUDA 12.2 with CUDA 12.5. We would retain CI testing for CUDA 12.0 as the lower bound of 12.x. This will also align with PyTorch's upcoming CUDA 12.4 support (there has been a series of PRs adding CUDA 12.4 support, like pytorch/builder#1720). edit: We will upgrade to the latest CUDA, 12.5, instead of 12.4. I will separately address CUDA compatibility questions between RAPIDS and PyTorch by working on our docs and release selector (see also: https://github.com/rapidsai/build-infra/issues/55).

Tasks

We can start this work now (not blocked by 12.5.1 updates above):

  • Open PR to shared-workflows: Update CUDA version to 12.5.1 (shared-workflows#229)
  • Open PRs for every RAPIDS repository to update 12.2.2 to 12.5.x (xref'd below)
    • See past CUDA 12.2 migration PRs for reference.
    • Most of the items below can be automated:
      • Add cuda-version matrix entry for 12.5
      • Update .github/workflows/ to use shared-workflows branch
      • Update any matrix_filter entries using 12.2 to 12.5
      • Update README/CONTRIBUTING docs to use 12.5, especially for installation
      • Update devcontainers to 12.5
    • rmm
    • cudf
    • ...
  • Publish new images from rapidsai/docker (add CUDA 12.5 images, docker#689)
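The version-string parts of the checklist (matrix entries, `matrix_filter` values, README/CONTRIBUTING text) are mostly find-and-replace work. Below is a minimal Python sketch of that automation, assuming the old and new versions are exactly `12.2.2`/`12.2` and `12.5.1`/`12.5`; `VERSION_MAP`, `bump_text`, and `bump_files` are hypothetical names, not part of any RAPIDS tooling.

```python
import re
from pathlib import Path

# Hypothetical mapping from the old latest CUDA version to the new one.
VERSION_MAP = {"12.2.2": "12.5.1", "12.2": "12.5"}

# Try longer versions first, and refuse matches embedded in a longer dotted
# number, so "12.2" never rewrites "12.20" or "112.2".
_ALTERNATIVES = "|".join(
    re.escape(v) for v in sorted(VERSION_MAP, key=len, reverse=True)
)
_PATTERN = re.compile(rf"(?<![\d.])({_ALTERNATIVES})(?![\d.]*\d)")


def bump_text(text: str) -> str:
    """Replace old CUDA version strings with their new counterparts."""
    return _PATTERN.sub(lambda m: VERSION_MAP[m.group(1)], text)


def bump_files(root: Path, patterns: list[str]) -> list[Path]:
    """Rewrite matching files under root in place; return the changed paths."""
    changed = []
    for pattern in patterns:
        for path in sorted(root.glob(pattern)):
            if not path.is_file():
                continue
            old = path.read_text()
            new = bump_text(old)
            if new != old:
                path.write_text(new)
                changed.append(path)
    return changed
```

A call might look like `bump_files(Path("."), [".github/workflows/*.yaml", "README.md"])`; those glob patterns are placeholders. Each diff would still need review, and any paths that themselves encode a CUDA version (if a repo has them) would have to be renamed separately.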

Once all repos are migrated, merge the shared-workflows PR and then revert to the current default shared-workflows branch.

Docs changes (wait until all repos are migrated):

@bdice changed the title from "Update CUDA version for build/test to 12.4" to "Update latest CUDA version for build/test to 12.4" Jun 12, 2024
@mmccarty

@ajschmidt8 - Please take a look at the possibility of updating this in July.

@bdice
Contributor Author

bdice commented Jun 13, 2024

I’ve spent a bit of time investigating and discussing this topic with others (including @jrhemstad). I’m coming to the conclusion that we may not need any driver updates, because the Production Branch driver we currently use (R550) is supported for CUDA Forward Compatibility with 12.5+ according to this table: https://docs.nvidia.com/deploy/cuda-compatibility/#id3

My local tests on a machine with R535, the LTS driver, also indicate compatibility should be fine. The key here is for us to remain on only Production Branch or LTS Branch drivers!
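To make the compatibility reasoning above concrete, here is a minimal Python sketch of how a CI driver/toolkit pairing might be classified. The minimum-driver table, the set of PB/LTSB branches, and the function names are illustrative assumptions rather than an official NVIDIA API. The compatibility table linked above is the authoritative source, and forward compatibility additionally depends on the `cuda-compat` package being present in the container and on a supported (mainly data center) GPU.

```python
# Hypothetical sketch: classify how a CUDA 12.x container would run on a host
# driver. Both tables below are illustrative assumptions; check NVIDIA's
# CUDA compatibility documentation for authoritative values.

# Assumed minimum Linux driver that natively supports each CUDA 12.x release.
MIN_NATIVE_DRIVER = {
    (12, 0): (525, 60, 13),
    (12, 2): (535, 54, 3),
    (12, 4): (550, 54, 14),
    (12, 5): (555, 42, 2),
}

# Assumed Production Branch (PB) / Long-Term Support Branch (LTSB) driver
# branches, e.g. R550 (PB) and R535 (LTSB) as discussed in this thread.
PB_OR_LTSB_BRANCHES = {535, 550}


def parse_driver(version: str) -> tuple[int, ...]:
    """Turn a driver string like '550.54.15' into (550, 54, 15)."""
    return tuple(int(part) for part in version.split("."))


def compatibility_mode(driver: str, cuda: tuple[int, int]) -> str:
    """Return 'native', 'forward-compat', or 'unsupported'."""
    drv = parse_driver(driver)
    if drv >= MIN_NATIVE_DRIVER[cuda]:
        return "native"
    # Newer toolkit on an older driver: only PB/LTSB branches qualify, and the
    # container must also ship the matching cuda-compat package.
    if drv[0] in PB_OR_LTSB_BRANCHES:
        return "forward-compat"
    return "unsupported"


if __name__ == "__main__":
    for driver in ("535.183.01", "545.23.06", "550.54.15", "555.42.06"):
        print(driver, compatibility_mode(driver, (12, 5)))
```

With these assumed tables, R535 and R550 drivers come out as `forward-compat` for CUDA 12.5, matching the conclusion above, while a driver from a branch outside the assumed PB/LTSB set does not.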

I propose a change to this plan: we should try to use CUDA 12.5 to build and test for a couple repos (rmm and cudf) and if it works we can update to 12.5 instead of 12.4. I will file PRs to the miniforge-cuda, ci-imgs, and shared-workflows repositories to enable this test.

@bdice changed the title from "Update latest CUDA version for build/test to 12.4" to "Update latest CUDA version for build/test to 12.5" Jun 13, 2024
@jakirkham
Member

It is worth noting that CUDA 12.5.1 packages (in various formats) are now out.

Also, pynvjitlink has been rebuilt with CUDA 12.5.1: rapidsai/pynvjitlink#95

@jakirkham
Member

jakirkham commented Jul 9, 2024

Looking at this one...

It appears the builds are already pulling in the latest distro packages for CUDA 12.5.1.

For example, this job from yesterday shows the following:

#13 4.389  cuda-compat-12-5                x86_64 1:555.42.06-1           cuda       38 M
#13 4.389  cuda-cudart-12-5                x86_64 12.5.82-1               cuda      226 k
#13 4.389  cuda-toolkit-12-5-config-common noarch 12.5.82-1               cuda      7.7 k
#13 4.389  cuda-toolkit-12-config-common   noarch 12.5.82-1               cuda      7.9 k
#13 4.389  cuda-toolkit-config-common      noarch 12.5.82-1               cuda      7.9 k

Note that these match the new versions in CUDA 12.5U1.

So it looks like this is done already.
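The same check can be repeated against any CI log by parsing the package rows and comparing them with the 12.5.1 versions. The sketch below is hypothetical: it assumes the BuildKit-prefixed `dnf` table layout quoted above, and it treats `12.5.82-1` as the CUDA 12.5.1 marker only because this comment identifies those versions as the 12.5U1 ones.

```python
# Excerpt of the CI log quoted above (BuildKit step 13, dnf transaction table).
LOG = """\
#13 4.389  cuda-compat-12-5                x86_64 1:555.42.06-1           cuda       38 M
#13 4.389  cuda-cudart-12-5                x86_64 12.5.82-1               cuda      226 k
#13 4.389  cuda-toolkit-12-5-config-common noarch 12.5.82-1               cuda      7.7 k
#13 4.389  cuda-toolkit-12-config-common   noarch 12.5.82-1               cuda      7.9 k
#13 4.389  cuda-toolkit-config-common      noarch 12.5.82-1               cuda      7.9 k
"""

# Versions treated as the CUDA 12.5.1 (12.5 Update 1) marker, per the thread.
EXPECTED = {
    "cuda-cudart-12-5": "12.5.82-1",
    "cuda-toolkit-12-5-config-common": "12.5.82-1",
}


def installed_versions(log: str) -> dict[str, str]:
    """Map CUDA package name -> version from dnf table rows in a build log."""
    versions = {}
    for line in log.splitlines():
        fields = line.split()
        # Assumed row shape: step, timestamp, name, arch, version, repo, size, unit.
        if len(fields) == 8 and fields[0].startswith("#") and fields[2].startswith("cuda-"):
            versions[fields[2]] = fields[4]
    return versions


def has_cuda_12_5_1(versions: dict[str, str]) -> bool:
    """True if every expected package is present at the expected version."""
    return all(versions.get(name) == ver for name, ver in EXPECTED.items())


if __name__ == "__main__":
    print(has_cuda_12_5_1(installed_versions(LOG)))
```

Splitting on whitespace is deliberately simple; if the log format differs (for example, without the BuildKit step prefix), the field-count check would need adjusting.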

Though it would be nice to update this miniforge-cuda line to 12.5.1, it doesn't seem to be a blocker.

Edit: It also looks like the ci-imgs were rebuilt more recently, so they are already using images that have the 12.5.1 packages.

Given this, I will go ahead and check these boxes.

@jakirkham
Member

cc @KyleFromNVIDIA (as we discussed this offline)

rapids-bot bot pushed a commit to rapidsai/ucx-py that referenced this issue Jul 19, 2024
After updating everything to CUDA 12.5.1, use `shared-workflows@branch-24.08` again.

Contributes to rapidsai/build-planning#73

Authors:
  - Kyle Edwards (https://github.com/KyleFromNVIDIA)

Approvers:
  - James Lamb (https://github.com/jameslamb)

URL: #1056
rapids-bot bot pushed a commit to rapidsai/rapids-cmake that referenced this issue Jul 19, 2024
After updating everything to CUDA 12.5.1, use `shared-workflows@branch-24.08` again.

Contributes to rapidsai/build-planning#73

Authors:
  - Kyle Edwards (https://github.com/KyleFromNVIDIA)

Approvers:
  - James Lamb (https://github.com/jameslamb)

URL: #647
rapids-bot bot pushed a commit to rapidsai/cuxfilter that referenced this issue Jul 19, 2024
After updating everything to CUDA 12.5.1, use `shared-workflows@branch-24.08` again.

Contributes to rapidsai/build-planning#73

Authors:
  - Kyle Edwards (https://github.com/KyleFromNVIDIA)

Approvers:
  - James Lamb (https://github.com/jameslamb)

URL: #608
rapids-bot bot pushed a commit to rapidsai/rmm that referenced this issue Jul 19, 2024
After updating everything to CUDA 12.5.1, use `shared-workflows@branch-24.08` again.

Contributes to rapidsai/build-planning#73

Authors:
  - Kyle Edwards (https://github.com/KyleFromNVIDIA)

Approvers:
  - James Lamb (https://github.com/jameslamb)

URL: #1617
rapids-bot bot pushed a commit to rapidsai/wholegraph that referenced this issue Jul 19, 2024
After updating everything to CUDA 12.5.1, use `shared-workflows@branch-24.08` again.

Contributes to rapidsai/build-planning#73

Authors:
  - Kyle Edwards (https://github.com/KyleFromNVIDIA)

Approvers:
  - James Lamb (https://github.com/jameslamb)

URL: #193
rapids-bot bot pushed a commit to rapidsai/kvikio that referenced this issue Jul 19, 2024
After updating everything to CUDA 12.5.1, use `shared-workflows@branch-24.08` again.

Contributes to rapidsai/build-planning#73

Authors:
  - Kyle Edwards (https://github.com/KyleFromNVIDIA)

Approvers:
  - James Lamb (https://github.com/jameslamb)

URL: #407
rapids-bot bot pushed a commit to rapidsai/cucim that referenced this issue Jul 19, 2024
After updating everything to CUDA 12.5.1, use `shared-workflows@branch-24.08` again.

Contributes to rapidsai/build-planning#73

Authors:
  - Kyle Edwards (https://github.com/KyleFromNVIDIA)

Approvers:
  - James Lamb (https://github.com/jameslamb)

URL: #749
rapids-bot bot pushed a commit to rapidsai/dask-cuda that referenced this issue Jul 19, 2024
After updating everything to CUDA 12.5.1, use `shared-workflows@branch-24.08` again.

Contributes to rapidsai/build-planning#73

Authors:
  - Kyle Edwards (https://github.com/KyleFromNVIDIA)

Approvers:
  - James Lamb (https://github.com/jameslamb)

URL: #1359
rapids-bot bot pushed a commit to rapidsai/cuvs that referenced this issue Jul 19, 2024
After updating everything to CUDA 12.5.1, use `shared-workflows@branch-24.08` again.

Contributes to rapidsai/build-planning#73

Authors:
  - Kyle Edwards (https://github.com/KyleFromNVIDIA)

Approvers:
  - James Lamb (https://github.com/jameslamb)

URL: #234
rapids-bot bot pushed a commit to rapidsai/ucxx that referenced this issue Jul 19, 2024
After updating everything to CUDA 12.5.1, use `shared-workflows@branch-24.08` again.

Contributes to rapidsai/build-planning#73

Authors:
  - Kyle Edwards (https://github.com/KyleFromNVIDIA)

Approvers:
  - James Lamb (https://github.com/jameslamb)
  - https://github.com/jakirkham

URL: #247
rapids-bot bot pushed a commit to rapidsai/cudf that referenced this issue Jul 19, 2024
After updating everything to CUDA 12.5.1, use `shared-workflows@branch-24.08` again.

Contributes to rapidsai/build-planning#73

Authors:
  - Kyle Edwards (https://github.com/KyleFromNVIDIA)

Approvers:
  - James Lamb (https://github.com/jameslamb)
  - https://github.com/jakirkham

URL: #16314
rapids-bot bot pushed a commit to rapidsai/cuspatial that referenced this issue Jul 19, 2024
This PR updates the latest CUDA build/test version 12.2.2 to 12.5.1.

Contributes to rapidsai/build-planning#73

Authors:
  - Kyle Edwards (https://github.com/KyleFromNVIDIA)
  - https://github.com/jakirkham

Approvers:
  - James Lamb (https://github.com/jameslamb)
  - https://github.com/jakirkham

URL: #1405
rapids-bot bot pushed a commit to rapidsai/cuml that referenced this issue Jul 19, 2024
After updating everything to CUDA 12.5.1, use `shared-workflows@branch-24.08` again.

Contributes to rapidsai/build-planning#73

Authors:
  - Kyle Edwards (https://github.com/KyleFromNVIDIA)

Approvers:
  - James Lamb (https://github.com/jameslamb)
  - https://github.com/jakirkham

URL: #5970
raydouglass pushed a commit to rapidsai/raft that referenced this issue Jul 19, 2024
After updating everything to CUDA 12.5.1, use `shared-workflows@branch-24.08` again.

Contributes to rapidsai/build-planning#73

Authors:
   - Kyle Edwards (https://github.com/KyleFromNVIDIA)

Approvers:
   - James Lamb (https://github.com/jameslamb)
   - https://github.com/jakirkham
raydouglass pushed a commit to rapidsai/cugraph that referenced this issue Jul 19, 2024
After updating everything to CUDA 12.5.1, use `shared-workflows@branch-24.08` again.

Contributes to rapidsai/build-planning#73

Authors:
   - Kyle Edwards (https://github.com/KyleFromNVIDIA)

Approvers:
   - James Lamb (https://github.com/jameslamb)
   - https://github.com/jakirkham
rapids-bot bot pushed a commit to rapidsai/docker that referenced this issue Jul 20, 2024
Contributes to rapidsai/build-planning#73

Proposes the following:

* adding CUDA 12.5 images
* PR builds:
    * +1 test job covering `(cuda=12.5, arch=arm64)`
* branch builds:
    * +1 test job covering `(cuda=12.5, arch=x86_64)`
    * -1 test job covering `(cuda=12.2, arch=arm64)`

context: rapidsai/build-planning#73 (comment)

## Notes for Reviewers

Per offline discussion with @raydouglass, this would be accompanied by a deprecation notice in the RAPIDS 24.08 release stating that the CUDA 12.2 images will be removed in some future release (future release not yet determined).

Authors:
  - James Lamb (https://github.com/jameslamb)
  - https://github.com/jakirkham

Approvers:
  - Ray Douglass (https://github.com/raydouglass)
  - Kyle Edwards (https://github.com/KyleFromNVIDIA)
  - https://github.com/jakirkham

URL: #689
@jameslamb
Member

I'm proposing switching to CUDA 12.5 images + Python 3.11 in the docs at https://docs.rapids.ai/deployment

rapidsai/deployment#398

@jakirkham added the epic label Aug 3, 2024
@jakirkham
Member

For the record, this is largely complete. All that remains is:

  • Adding this to the rapids.ai main page
  • Adding this to the stable install selector (currently it is only under nightlies)

Both of these are in the checklist in the OP.

These typically happen after the release is complete.

@jakirkham
Member

Ok the remaining webpage PRs are up. Thanks Ray! 🙏

I've noted those above and xref'd them here.

@jakirkham
Member

Closing as completed 🥳

Thanks everyone for all of your hard work shipping CUDA 12.5 this release! 👏

@bdice
Contributor Author

bdice commented Aug 19, 2024

For future reference: gputreeshap was missed during this upgrade. rapidsai/gputreeshap#50 (comment)

@jakirkham
Member

jakirkham commented Aug 19, 2024

It's a little hard to know what to do with that repo, given that cuML seems to vendor it. We've also struggled with updating it in the past (for example, rapidsai/gputreeshap#42). Maybe we should have a separate discussion on how it should be managed.
