Set correct NPU backend and distributed_type when using transfer_to_npu #3021

ArthurinRUC · 2024-08-16T09:07:23Z

What does this PR do?

After running from torch_npu.contrib import transfer_to_npu, torch.cuda.is_available() returns True, which leads to an incorrect backend and distributed_type. This PR fixes the problem by making sure the NPU environment check runs before the CUDA check.
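The ordering problem can be illustrated with a minimal, self-contained sketch. It does not import torch or accelerate: the DistributedType enum below is a simplified stand-in, and pick_backend with its boolean parameters is a hypothetical helper used only to show why the first passing check decides the result. The "nccl"/MULTI_GPU pairing for CUDA is assumed here as the usual default.

```python
from enum import Enum


class DistributedType(str, Enum):
    # Simplified stand-in for accelerate's DistributedType (assumption:
    # only the members needed for this illustration are listed).
    NO = "NO"
    MULTI_GPU = "MULTI_GPU"
    MULTI_NPU = "MULTI_NPU"


def pick_backend(npu_available: bool, cuda_available: bool, npu_first: bool = True):
    """Return (backend, distributed_type) for the first availability check that passes."""
    checks = [
        (npu_available, "hccl", DistributedType.MULTI_NPU),
        (cuda_available, "nccl", DistributedType.MULTI_GPU),
    ]
    if not npu_first:
        # Old ordering: CUDA is consulted before NPU.
        checks.reverse()
    for available, backend, distributed_type in checks:
        if available:
            return backend, distributed_type
    return None, DistributedType.NO


# With transfer_to_npu imported on an NPU machine, both checks report True.
cuda_first = pick_backend(npu_available=True, cuda_available=True, npu_first=False)
npu_first = pick_backend(npu_available=True, cuda_available=True, npu_first=True)
```

With CUDA checked first, the sketch selects the CUDA backend even though the hardware is an NPU; with NPU checked first, it selects hccl and MULTI_NPU.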

Fixes #3020

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

@muellerzr

BenjaminBossan · 2024-08-16T10:32:00Z

Let's add a comment that NPU has to be checked before CUDA and why (a reference to the issue). Also, any idea if the same can also happen with other devices, i.e. CUDA should always be the last check?

ArthurinRUC · 2024-08-16T10:44:27Z

> Let's add a comment that NPU has to be checked before CUDA and why (a reference to the issue). Also, any idea if the same can also happen with other devices, i.e. CUDA should always be the last check?

Sure, I will add a brief description to my fix. Also, I've already opened an issue here: #3020 :)

BenjaminBossan · 2024-08-16T10:54:51Z

> Sure, I will add a brief description to my fix. Also I've already open an issue here #3020 :)

Thanks. Yeah, I meant that the comment can reference said issue.

muellerzr · 2024-08-16T19:08:19Z

src/accelerate/state.py

+            elif is_npu_available():
+                backend = "hccl"
+                distributed_type = DistributedType.MULTI_NPU


Agreed with @BenjaminBossan. Let's add a comment here and at the torch.device("npu") check that clarifies (in the code) that these must be done before the CUDA check.
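As a hypothetical sketch of how such an in-code comment could read at the device-selection site: pick_device_type and its boolean parameters are illustrative and not accelerate's code, and the wording actually added in the follow-up commit is not shown in this thread.

```python
def pick_device_type(npu_available: bool, cuda_available: bool) -> str:
    """Return the device type string for the first available accelerator."""
    # NPU must be checked before CUDA: after
    # `from torch_npu.contrib import transfer_to_npu`,
    # torch.cuda.is_available() also returns True on NPU machines
    # (see issue #3020), so a CUDA-first check would pick the wrong device.
    if npu_available:
        return "npu"
    if cuda_available:
        return "cuda"
    return "cpu"
```

Keeping the explanation next to the branch means a later reordering of the checks is less likely to reintroduce the bug unnoticed.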

HuggingFaceDocBuilderDev · 2024-08-16T19:11:13Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

ArthurinRUC · 2024-08-17T02:03:14Z

All right I got it! Thanks :)

muellerzr

Great! Thanks for fixing this!

BenjaminBossan

Thanks!

yangyuanhang7 added 2 commits August 16, 2024 16:29

fix npu setting

f699b4b

fix npu setting

f1917d4

ArthurinRUC mentioned this pull request Aug 16, 2024

NPU backend and distributed_type was incorrect when using transfer_to_npu #3020

Closed

muellerzr reviewed Aug 16, 2024


add code comments

9edbc7b

muellerzr approved these changes Aug 17, 2024


BenjaminBossan approved these changes Aug 19, 2024


muellerzr merged commit 5536a3a into huggingface:main Aug 19, 2024
25 checks passed
