
[Bugfix] cuda error running llama 3.2 #11047

Merged
merged 2 commits into vllm-project:main from genesu/fix-llama-3dot2-cuda-issue on Dec 10, 2024

Conversation

GeneDer
Contributor

@GeneDer GeneDer commented Dec 10, 2024

Re: https://github.com/vllm-project/vllm/pull/9850/files#diff-107fd4a59dcd0831ff802fefe9c49eac02432b6a6d1f508075a8b1809c1468b4R11-R15

The .get_device_capability and .has_device_capability methods are now called at module load time in prefix prefill; however, they can throw errors when used with CUDA. This PR catches those unexpected runtime errors and returns the corresponding default (None and False) in the failure cases so the module can still be loaded successfully.
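For reference, a minimal standalone sketch of the guard pattern (not the exact classmethods in vLLM's CUDA platform; the simplified `DeviceCapability` tuple and the pynvml-backed lookups here are assumptions):

```python
# Simplified sketch: degrade to the defaults (None / False) when the
# NVML/CUDA query raises, instead of letting the error escape at import time.
from typing import NamedTuple, Optional, Tuple

import pynvml  # provided by the nvidia-ml-py package


class DeviceCapability(NamedTuple):
    major: int
    minor: int


def get_device_capability(device_id: int = 0) -> Optional[DeviceCapability]:
    try:
        pynvml.nvmlInit()
        handle = pynvml.nvmlDeviceGetHandleByIndex(device_id)
        major, minor = pynvml.nvmlDeviceGetCudaComputeCapability(handle)
        return DeviceCapability(major=major, minor=minor)
    except (RuntimeError, pynvml.NVMLError):
        # e.g. the query fails inside a Ray actor at module load time
        return None
    finally:
        try:
            pynvml.nvmlShutdown()
        except pynvml.NVMLError:
            pass


def has_device_capability(capability: Tuple[int, int],
                          device_id: int = 0) -> bool:
    current = get_device_capability(device_id)
    return current is not None and current >= capability
```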

Signed-off-by: Gene Su <e870252314@gmail.com>

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

  • Add ready label to the PR
  • Enable auto-merge.

🚀

@GeneDer
Contributor Author

GeneDer commented Dec 10, 2024

cc @comaniac

Signed-off-by: Gene Su <e870252314@gmail.com>
Collaborator

@comaniac comaniac left a comment


LGTM

@comaniac comaniac added the ready ONLY add when PR is ready to merge/full CI is needed label Dec 10, 2024
@comaniac comaniac enabled auto-merge (squash) December 10, 2024 01:22
@youkaichao
Member

when and why would it fail?

@comaniac comaniac merged commit 82c73fd into vllm-project:main Dec 10, 2024
61 checks passed
@GeneDer
Contributor Author

GeneDer commented Dec 10, 2024

when and why would it fail?

When running Llama 3.2 in a Ray actor, a runtime error is thrown, which causes module loading to fail. But regardless, the way this is used now should probably never fail and should just return falsy values in those cases.

@youkaichao
Member

@GeneDer I'm concerned about this change and would like to know why it fails. I'm afraid this might be caused by some incorrect setup on your infra side, because normally, as long as you can run nvidia-smi, the NVML library should work.

@GeneDer GeneDer deleted the genesu/fix-llama-3dot2-cuda-issue branch December 10, 2024 20:01
@GeneDer
Contributor Author

GeneDer commented Dec 10, 2024

@youkaichao There are literally no other changes on our end besides upgrading vLLM 😅 Also, you can see the offending PR calls those methods at module load time, which IMO is not supposed to fail outright and should just use the default values.

@youkaichao
Member

I think the right fix is to remove the function call at module load time, rather than changing the get_device_capability function.

@GeneDer
Contributor Author

GeneDer commented Dec 10, 2024

While that would also work, I disagree with the approach. The caller of those methods (prefix_prefill.py) shouldn't need to know which platform it is on or guard against platform-specific errors. get_device_capability and has_device_capability should be designed to be error-free and able to return the defaults as well. That is also the reason their interface allows returning optional values, right? https://github.com/vllm-project/vllm/blob/main/vllm/platforms/interface.py#L117
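For illustration, the caller-side contract being argued for looks roughly like this (a sketch, not the actual prefix_prefill.py code; the USE_NEW_KERNEL flag and the capability threshold are made up, and it assumes the helpers behave as in the linked interface):

```python
# Hypothetical module-level usage: the caller never guards against
# platform-specific errors, it only handles the falsy defaults.
from vllm.platforms import current_platform

capability = current_platform.get_device_capability()  # Optional, may be None

# Pick a kernel configuration based on the capability, falling back to a
# conservative default when the capability is unknown.
USE_NEW_KERNEL = capability is not None and capability.major >= 8
```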

@youkaichao
Member

get_device_capability and has_device_capability should be designed to be error-free

This is true. That's why I want to know why it fails in your environment. Usually it means something is wrong with the NVIDIA setup; NVML should work for all NVIDIA datacenter hardware.

@GeneDer
Contributor Author

GeneDer commented Dec 10, 2024

I don't think that's the case, unless it's always been set up incorrectly and the prefix prefill module just revealed the issue lol. But the environment has not changed between the upgrade; it's always been running the engine in a Ray actor on a GPU cluster.

@youkaichao
Member

Can you try to reproduce it, and see whether you can get an error from has_device_capability by running it on that GPU node?
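For example, something like this minimal probe (a sketch; it assumes Ray and vLLM are installed, the node has at least one GPU, and the Probe actor name is made up):

```python
# Probe the platform helpers from inside a Ray actor, mirroring how the
# engine is being launched; any NVML/CUDA error will surface in ray.get().
import ray


@ray.remote(num_gpus=1)
class Probe:
    def check(self):
        # Import inside the actor so platform detection runs in the actor
        # process, matching the failing setup.
        from vllm.platforms import current_platform
        return (current_platform.get_device_capability(),
                current_platform.has_device_capability(80))


if __name__ == "__main__":
    ray.init()
    print(ray.get(Probe.remote().check.remote()))
```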

sleepwalker2017 pushed a commit to sleepwalker2017/vllm that referenced this pull request Dec 13, 2024