warning msg/documentation on the tf32 related system flags and usage #6754
Comments
Also, found a related part in the repo, FYI: lines 173 to 198 in 2800a76
about #6754 .

### Types of changes
<!--- Put an `x` in all the boxes that apply, and remove the not applicable items -->
- [x] Non-breaking change (fix or new feature that would not break existing functionality).
- [ ] Breaking change (fix or new feature that would cause existing functionality to change).
- [ ] New tests added to cover the changes.
- [ ] Integration tests passed locally by running `./runtests.sh -f -u --net --coverage`.
- [ ] Quick tests passed locally by running `./runtests.sh --quick --unittests --disttests`.
- [ ] In-line docstrings updated.
- [x] Documentation updated, tested `make html` command in the `docs/` folder.

---------

Signed-off-by: Qingpeng Li <qingpeng9802@gmail.com>
about #6754 .

### Description
Show a warning if anything that may enable tf32 is detected.

### Types of changes
<!--- Put an `x` in all the boxes that apply, and remove the not applicable items -->
- [x] Non-breaking change (fix or new feature that would not break existing functionality).
- [ ] Breaking change (fix or new feature that would cause existing functionality to change).
- [ ] New tests added to cover the changes.
- [ ] Integration tests passed locally by running `./runtests.sh -f -u --net --coverage`.
- [ ] Quick tests passed locally by running `./runtests.sh --quick --unittests --disttests`.
- [x] In-line docstrings updated.
- [ ] Documentation updated, tested `make html` command in the `docs/` folder.

---------

Signed-off-by: Qingpeng Li <qingpeng9802@gmail.com>
Guys, I understand what you're trying to do, but I train on multi-GPU and the screen fills up with warnings, which is a bit overwhelming. a) Is there a way to disable these warnings? (I do know that TF32 is enabled.)
a) There is currently no way to disable it; maybe we can add an environment variable. Could you provide the code snippets related to
I think the main ambiguity from a user's perspective is often from this particular setting:
The thing is actually a bit complicated.
ok, looks like it's consistent in pytorch 2.0, then I think there's no need to warn in this case?
I don't think in regular use cases Since there are some changes in the previous versions of pytorch on this topic, perhaps we can focus on proper warnings for torch>=2.0 only. What do you think?
The behavior of PyTorch is consistent, but for users it seems a bit hard to troubleshoot, as the original problem behind this issue shows. This is essentially a tradeoff between bothering experienced users and helping inexperienced ones. I would suggest adding an environment variable as a flag to suppress the warnings. There is a similar idea in huggingface/transformers#16588 (comment)
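To make the environment-variable suggestion concrete, here is a minimal sketch of an opt-out flag. The variable name `SUPPRESS_TF32_WARNING` and the helper `warn_tf32` are hypothetical, chosen only for illustration; they are not taken from the monai code base.

```python
import os
import warnings

# Hypothetical variable name, used here only for illustration.
SUPPRESS_ENV_VAR = "SUPPRESS_TF32_WARNING"


def warn_tf32(message, environ=None):
    """Emit a TF32 warning unless the user opted out via the environment.

    Returns True if the warning was emitted, False if it was suppressed.
    """
    environ = os.environ if environ is None else environ
    # Treat "1", "true" and "yes" (case-insensitive) as an opt-out.
    if environ.get(SUPPRESS_ENV_VAR, "0").lower() in ("1", "true", "yes"):
        return False
    warnings.warn(message, UserWarning, stacklevel=2)
    return True
```

With something like this, an experienced user could export the variable once in the job script, while the default behavior for everyone else stays unchanged.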
Guys, I'm running the AutoRunner() from monai on 8 GPUs, and these warnings are overwhelming. It printed them 16 times (probably from DataAnalyzer(), which creates several parallel processes), then another 8 warnings when training starts. Can we please disable these warnings? Or at least show them just one time, and not so many. Thank you.
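One possible way to get closer to "show it just one time" is sketched below: warn only on the main rank, and only once per process for a given message. It assumes the launcher sets a `RANK` environment variable (as `torchrun` does); worker processes started by other means may not have it, so this would not necessarily cover the DataAnalyzer() case. The helper name `warn_tf32_once` is hypothetical.

```python
import os
import warnings

# Messages already emitted by this process.
_seen_messages = set()


def warn_tf32_once(message, environ=None):
    """Warn at most once per process, and only on rank 0.

    Returns True if the warning was emitted, False otherwise.
    Assumes the launcher exports RANK; a missing RANK is treated as rank 0.
    """
    environ = os.environ if environ is None else environ
    if environ.get("RANK", "0") != "0":
        return False  # stay silent on non-main ranks
    if message in _seen_messages:
        return False  # this process has already shown the message
    _seen_messages.add(message)
    warnings.warn(message, UserWarning, stacklevel=2)
    return True
```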
Thanks @myron, I'm creating a feature request and will have a look soon.
(follow up of #6525) My larger concern is that other operations in monai will also be affected by the tf32 issue (since all operations that use `cuda.matmul` are affected). This may lead to significant reproducibility issues.

My proposal is adding something like https://github.com/Lightning-AI/lightning/pull/16037/files#diff-909e246d6c36514f952ae5023bd9fbcc3e8f2c6a0837ebf81d7dc96790b5f938R190-R210 to related classes/functions in monai. Then, monai will print warnings when the flag is True. Not sure when it is better to print warnings, maybe during import? Maybe warnings can be suppressed when the flag is explicitly set by users, but it seems technically challenging.
&
adding a part in the documentation to educate users on how to use tf32 properly.
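As a rough illustration of what such a check might look at, the sketch below collects the TF32-related switches in one place. The PyTorch attributes `torch.backends.cuda.matmul.allow_tf32` and `torch.backends.cudnn.allow_tf32` are real flags; the list of environment variables and the decision to treat a value of `"1"` as "may enable tf32" are assumptions, and the helper names are hypothetical. This is not the code from the linked Lightning PR or from monai.

```python
import os


def tf32_env_overrides(environ=None):
    """Return names of environment variables set to "1" that may turn TF32 on."""
    environ = os.environ if environ is None else environ
    # Assumed list of override variables; adjust for your software stack.
    candidates = ("NVIDIA_TF32_OVERRIDE", "TORCH_ALLOW_TF32_CUBLAS_OVERRIDE")
    return [name for name in candidates if environ.get(name) == "1"]


def tf32_torch_flags():
    """Return the current PyTorch TF32 switches, or {} if torch is not installed."""
    try:
        import torch
    except ImportError:
        return {}
    return {
        "cuda.matmul.allow_tf32": bool(torch.backends.cuda.matmul.allow_tf32),
        "cudnn.allow_tf32": bool(torch.backends.cudnn.allow_tf32),
    }
```

For reproducibility, a user who wants full float32 precision can set both PyTorch flags to `False` at the start of the script.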
Originally posted by @qingpeng9802 in #6525 (comment)