[Bugfix] cuda error running llama 3.2 #11047
Conversation
Signed-off-by: Gene Su <e870252314@gmail.com>
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can do one of these:
🚀
cc @comaniac
LGTM
When and why would it fail?
When running Llama 3.2 in a Ray actor, a runtime error is thrown, which causes module loading to fail. But regardless, the way this is used now should probably never fail and should just return falsy values in those cases.
@GeneDer I'm concerned about this change, and would like to know why it fails. I'm afraid this might be caused by some incorrect setup on your infra side, because normally this works as long as you can run
@youkaichao There are literally no other changes on our end besides upgrading vLLM 😅 Also, you can see the offending PR calls those methods on module loading, which IMO is not supposed to fail outright and should just use the default values.
I think the right fix would be removing the function call on module loading, rather than changing the function.
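(As a rough, hypothetical sketch of this alternative, assuming the check is used to pick a kernel at call time rather than at import time; `pick_kernel`, `_supports_new_kernel`, and the `(8, 0)` threshold are illustrative placeholders, not vLLM's actual code.)

```python
# Hypothetical sketch only: defer the capability query from module scope
# into the function that needs it, so importing the module never touches CUDA.
import functools


@functools.lru_cache(maxsize=None)
def _supports_new_kernel() -> bool:
    # torch is imported lazily so that merely importing this module
    # never initializes CUDA.
    import torch

    if not torch.cuda.is_available():
        return False
    major, minor = torch.cuda.get_device_capability()
    return (major, minor) >= (8, 0)  # example threshold


def pick_kernel() -> str:
    # The capability check runs on first call, not at module import time.
    return "new_kernel" if _supports_new_kernel() else "fallback_kernel"
```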
While that also works, I disagree with the approach. The caller (
This is true. That's why I want to know why it fails in your environment. Usually it means something is wrong with the NVIDIA setup; NVML should work for all NVIDIA datacenter hardware.
I don't think that's the case, unless it's always been set up incorrectly until now and the prefix prefill module just revealed the issue lol. But the environment has not changed between the upgrade; it's always been running the engine in a Ray actor on a GPU cluster.
Can you try to reproduce it, and see if you can get an error from
Re: https://github.com/vllm-project/vllm/pull/9850/files#diff-107fd4a59dcd0831ff802fefe9c49eac02432b6a6d1f508075a8b1809c1468b4R11-R15

Those `.get_device_capability` and `.has_device_capability` calls are now made on module loading of prefix prefill; however, they can throw errors when used with CUDA. This PR catches those unexpected runtime errors and returns the corresponding default values (`None` and `False`) in the failure cases so the module can be loaded successfully.
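A minimal sketch of the catch-and-default behavior described above, assuming the underlying query can raise `RuntimeError` in environments such as a Ray actor; `safe_get_device_capability` and `safe_has_device_capability` are hypothetical names, not the actual vLLM API:

```python
# Hypothetical sketch of the defensive pattern described in this PR;
# not the actual vLLM implementation.
from typing import Optional, Tuple


def safe_get_device_capability(device_id: int = 0) -> Optional[Tuple[int, int]]:
    try:
        import torch  # imported lazily to keep module import cheap

        return torch.cuda.get_device_capability(device_id)
    except RuntimeError:
        # e.g. CUDA/NVML not usable at import time inside a Ray actor
        return None


def safe_has_device_capability(minimum: Tuple[int, int], device_id: int = 0) -> bool:
    capability = safe_get_device_capability(device_id)
    return capability is not None and capability >= minimum
```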