Update modeling.py by adding try-catch section to skip the unavailable devices #2681
Conversation
By the way, if the unavailable GPU is the first of the visible devices, an error will still occur even when the above method is used to obtain the maximum available VRAM. This is because PyTorch's cache-clearing function, `torch.cuda.empty_cache()`, acts on the current device, which is the first visible device unless another one has been selected. Since this is PyTorch's own behavior, I personally believe it may not be necessary to optimize for it within this project; instead, one could manually wrap the call in a try-catch block to handle this situation:
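A minimal sketch of such a guard (illustrative only; it assumes the failure surfaces as a `RuntimeError` during CUDA context initialization on the current device):

```python
import torch

# torch.cuda.empty_cache() acts on the current device (device 0 unless changed),
# so it can fail if that device is fully occupied or damaged.
try:
    torch.cuda.empty_cache()
except RuntimeError:
    # The current device is unavailable; skip cache clearing instead of crashing.
    pass
```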
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Thanks @MeVeryHandsome for the PR! I think it is best to use `info` instead of `warning`. As for your question about `empty_cache`, we use it a lot in other parts of the library, and it wouldn't make sense to try-catch it each time. This is probably something that PyTorch should try to fix!
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
- Update src/accelerate/utils/modeling.py (Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>)
- Update src/accelerate/utils/modeling.py (Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>)
- Update src/accelerate/utils/modeling.py (Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>)
What does this PR do?
In the current implementation of the library, when deploying tensor operations across multiple GPU devices, the program attempts to initialize a tensor on each device to measure its maximum available memory. However, this approach raises an error and terminates the program if it encounters a GPU that is unavailable (memory fully occupied, or the device damaged). For example, a CUDA out-of-memory error occurs when one of the devices is fully occupied by another process.
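A simplified reproduction of the failure mode (a hypothetical minimal sketch, not the library's exact code):

```python
import torch

# Probe every visible device by allocating a tiny tensor on it, forcing
# CUDA context initialization so available memory can then be measured.
for i in range(torch.cuda.device_count()):
    # Raises a CUDA out-of-memory RuntimeError on a fully occupied device,
    # aborting the whole loop even if later devices are perfectly usable.
    _ = torch.tensor([0], device=i)
```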
To address this issue, I propose modifying the logic for determining the maximum available memory on GPUs. Specifically, I have introduced a try-except block around the tensor deployment operation. If an error occurs due to the unavailability of a GPU, the code now catches the exception and continues checking the next device. This change ensures that the program gracefully skips over unavailable GPUs and only utilizes those that are operational.
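A minimal sketch of the proposed logic (the helper name `probe_max_memory` is illustrative; the actual change lives in `src/accelerate/utils/modeling.py`):

```python
import logging

import torch

logger = logging.getLogger(__name__)

def probe_max_memory():
    """Return the free memory of each usable CUDA device, skipping unavailable ones."""
    max_memory = {}
    for i in range(torch.cuda.device_count()):
        try:
            # Force context initialization; raises if the device is occupied or broken.
            _ = torch.tensor([0], device=i)
            free_bytes, _total = torch.cuda.mem_get_info(i)
            max_memory[i] = free_bytes
        except Exception:
            # Skip this device and keep checking the subsequent ones.
            logger.info(f"Device {i} seems unavailable, proceeding to check subsequent devices.")
    return max_memory
```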
This adjustment is intended to improve the library's usability in multi-device environments. I am open to any suggestions or further improvements from the community.
Thank you for reading this :)
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
I apologize, as I am uncertain whom to tag. Perhaps @muellerzr or @pacman100 can review this. Thanks!