`test` produces a warning when using DDP #12862
Your observations are correct. This is why we recommend splitting your fitting and testing procedures into separate scripts when DDP is used for fitting.
No
This can affect the reproducibility of previous results.
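For concreteness, here is a minimal sketch of the fit/test split recommended above. The script names, `MyLightningModule`, `MyDataModule`, the device counts, and the checkpoint path are placeholders rather than anything prescribed by Lightning:

```python
# --- fit_ddp.py (train with DDP across several GPUs) ---
import pytorch_lightning as pl

model = MyLightningModule()    # placeholder LightningModule
dm = MyDataModule()            # placeholder LightningDataModule
trainer = pl.Trainer(accelerator="gpu", devices=4, strategy="ddp")
trainer.fit(model, datamodule=dm)


# --- test_single_gpu.py (evaluate on one device, so no samples get duplicated) ---
import pytorch_lightning as pl

model = MyLightningModule.load_from_checkpoint("path/to/checkpoint.ckpt")
dm = MyDataModule()
trainer = pl.Trainer(accelerator="gpu", devices=1)
trainer.test(model, datamodule=dm)
```

The second script gets its own `Trainer`, which is exactly the inconvenience discussed further down in this thread (a separate experiment, no `ckpt_path="best"`), but it keeps the test metrics exact.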
Yes! It is mentioned in our docs:

```python
import warnings

warnings.filterwarnings("ignore", ".*Consider increasing the value of the `num_workers` argument*")

# or to ignore all warnings that could be false positives
from pytorch_lightning.utilities.warnings import PossibleUserWarning
warnings.filterwarnings("ignore", category=PossibleUserWarning)
```
I believe we looked into this but are hesitant about enabling it by default - cc @kaushikb11. You should be able to import and use this sampler to try it out at the moment.
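For anyone wanting to try that, a rough sketch of wiring the sampler in by hand. It assumes `UnrepeatedDistributedSampler` accepts the same arguments as torch's `DistributedSampler` and that sampler replacement is turned off via `replace_sampler_ddp=False` (the flag name in the 1.x releases this thread is about); `MyModel`, `self.test_dataset`, and the batch size are placeholders:

```python
import pytorch_lightning as pl
from torch.utils.data import DataLoader
from pytorch_lightning.overrides.distributed import UnrepeatedDistributedSampler


class MyModel(pl.LightningModule):
    # ... training/test steps omitted ...

    def test_dataloader(self):
        # By the time this hook runs under DDP, torch.distributed is initialized,
        # so the sampler can infer num_replicas and rank on its own.
        sampler = UnrepeatedDistributedSampler(self.test_dataset, shuffle=False)
        return DataLoader(self.test_dataset, batch_size=32, sampler=sampler)


trainer = pl.Trainer(
    accelerator="gpu",
    devices=2,
    strategy="ddp",
    replace_sampler_ddp=False,  # keep Lightning from swapping in its own DistributedSampler
)
trainer.test(MyModel())
```

Note the caveat discussed in the later comments: with uneven shards, any per-batch metric sync (for example `self.log(..., sync_dist=True)` inside `test_step`) can deadlock, so metrics should only be reduced once per epoch.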
Not sure what you mean by "my observations are correct". My point was that splitting the training and testing into separate scripts is not always practical. Should it be possible? As I've already mentioned, there are some cases where this ability would be really useful (you may want to tune, test and/or validate your model on a different device or with a different number of devices, and doing it in separate scripts is inconvenient or sometimes even impossible).
Yes, you are right, it does not. The reason is that some of these possibilities are non-trivial to handle, both for Lightning and for the user. That is why we say the "dumb" thing of using a single device for testing.
In this case you're good! No action needs to be taken. The warning can be safely ignored. Should we say that in the warning itself?
We do it because in Lightning we care about correctness and reproducibility. If the test metric varies between runs that use different numbers of devices, it raises questions that are hard to answer UNLESS you stumble over this post here or are deeply familiar with the internals of PyTorch and DDP.
Right now, no! It's really unfortunate, I know. Ideally this would be solved by Lightning and the user wouldn't have to do anything. For this, Lightning would need to support uneven inputs (with something like the unrepeated sampler you mention) AND support syncing the metric. The syncing of the metric is the tricky part here, specifically syncing with `on_step=True`, which would lead to a deadlock if processes have different numbers of batches. I believe that if some restrictions were in place, say, syncing on a per-batch level is not allowed and one uses torchmetrics exclusively, then the unrepeated sampler + DDP could work. However, these restrictions might be too strong in general. On a personal note, this incorrectness problem has bothered me for a long time and I have wanted to solve it since it emerged, but it is SO HARD!
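To make the "torchmetrics exclusively, no per-batch syncing" idea concrete, here is a sketch of what such a restricted setup could look like; it is an illustration of the idea rather than an officially supported pattern, and the 10-class `Accuracy` metric is just a stand-in:

```python
import pytorch_lightning as pl
import torchmetrics


class LitClassifier(pl.LightningModule):
    def __init__(self, backbone):
        super().__init__()
        self.backbone = backbone
        # Metric state accumulates locally; update() performs no communication.
        self.test_acc = torchmetrics.Accuracy(task="multiclass", num_classes=10)

    def test_step(self, batch, batch_idx):
        x, y = batch
        preds = self.backbone(x).argmax(dim=-1)
        # Local update only: no self.log(..., sync_dist=True) and no on_step syncing,
        # so ranks never block on a per-batch collective.
        self.test_acc.update(preds, y)

    def on_test_epoch_end(self):
        # compute() triggers a single cross-process sync that every rank reaches
        # exactly once, even if the ranks processed different numbers of batches.
        self.log("test_acc", self.test_acc.compute())
        self.test_acc.reset()
```

Combined with something like `UnrepeatedDistributedSampler`, this would avoid both the duplicated samples and the deadlock, at the cost of giving up per-step logging during evaluation.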
I don't know how representative my opinion is of the wider community, but I take warnings very seriously. I have a zero-warning policy, and I try to understand and fix all the warnings in my code. To me, a warning basically indicates to the user that they are probably doing something wrong. If there is no way for the user to fix the warning, then that warning is incorrect and is therefore a bug (a low-priority bug, but nevertheless still a bug). I know that you can silence warnings with `warnings.filterwarnings("ignore", ...)`, but I consider this approach to be bad practice and have several reasons against manually filtering warnings.
In my opinion, the "correct" way to fix a warning (as a user) is to change my program in such a way that the warning is not emitted. If there is no way to do so, then it is a bug in the library.
What do you think about only warning the user if ...
@RuRo I agree with your thoughts about warnings. This is why all these warnings are under our custom category `PossibleUserWarning`. At the moment, we don't have any ideas about how to improve this situation, but feel free to mention any that you have.
This issue has been automatically marked as stale because it hasn't had any recent activity. This issue will be closed in 7 days if no further activity occurs. Thank you for your contributions, Pytorch Lightning Team!
Just commenting because this is really important for the research community using PyTorch Lightning ... to make sure the results are reproducible. Are we still looking into this?
Hi @sounakdey |
I second @sounakdey here. For now, I'm working around this issue of not being able to run ...
Hi, any progress on this? I am currently a newbie to Lightning. I am working on a language generation task. The metric I use is:

```python
# hyps: a list of strings
# refs: a list of strings
score = get_bleu_score(hyps, refs)
```

so in the DDP test setting, I have to do the following:

1. let each GPU generate translations for its own share of the test data
2. gather all hypotheses and references from every GPU
3. compute the BLEU score once on rank zero

To perform 2 and 3, I think this will do (not sure if `self.all_gather` can gather non-tensor objects, but `mp.all_gather_object` will do):

```python
def test_step(self, batch, batch_idx):
    src, ref = batch
    translations = self.generate(src)  # model hypotheses for this batch
    return translations, ref

def test_epoch_end(self, outputs):
    all_hyps, all_refs = self.all_gather(outputs)
    if self.trainer.is_global_zero:
        score = get_bleu_score(all_hyps, all_refs)
        self.log("BLEU", score, rank_zero_only=True)
```

However, when I want to unevenly and sequentially distribute the test data to multiple GPUs, I don't know what to do. Here are some naive solutions I came up with:
```python
import math

import torch


class UnevenSequentialDistributedSampler(torch.utils.data.sampler.Sampler):
    """
    This is a slightly different version of SequentialDistributedSampler from
    https://github.com/huggingface/transformers/blob/81ac45f85c35244831f11f73c09ea10eee4f953a/src/transformers/trainer_pt_utils.py
    In this version, the dataset is not evenly split, since we don't need tensors
    of the same shape to reduce or gather.
    """

    def __init__(self, dataset, num_replicas=None, rank=None):
        if num_replicas is None:
            if not torch.distributed.is_available():
                raise RuntimeError("Requires distributed package to be available")
            num_replicas = torch.distributed.get_world_size()
        if rank is None:
            if not torch.distributed.is_available():
                raise RuntimeError("Requires distributed package to be available")
            rank = torch.distributed.get_rank()
        self.dataset = dataset
        self.num_replicas = num_replicas
        self.rank = rank
        self.num_samples = int(math.ceil(len(self.dataset) * 1.0 / self.num_replicas))
        self.total_size = self.num_samples * self.num_replicas
        indices = list(range(len(self.dataset)))
        # slicing past the end of a python list simply returns fewer items,
        # so the last rank may get a shorter shard
        self.indices = indices[self.rank * self.num_samples : (self.rank + 1) * self.num_samples]

    def __iter__(self):
        return iter(self.indices)

    def __len__(self):
        return len(self.indices)
```
```python
def test_epoch_end(self, outputs):
    all_hyps, all_refs = self.all_gather(outputs)
    # truncate the gathered outputs back to the true dataset size
    all_hyps = all_hyps[:self.number_of_test_samples]
    all_refs = all_refs[:self.number_of_test_samples]
    if self.trainer.is_global_zero:
        score = get_bleu_score(all_hyps, all_refs)
        self.log("BLEU", score, rank_zero_only=True)
```

I don't know what the best practice is here, could you please help me out?
Hello, is there any progress on this? Thank you!
Trying to `trainer.test` with multiple GPUs (or even when using a single GPU with `DDPStrategy`) produces a warning about `DistributedSampler` adding extra samples during evaluation. The problem is that the warning doesn't adequately explain how to fix this problem in all possible cases.
1. What if I am running `trainer.test` after `trainer.fit`?

Setting `devices=1` in that case is not really a solution, because I want to use multiple GPUs for training. Creating a new `Trainer` instance also doesn't quite work, because that would create a separate experiment (AFAIK?). For example, `ckpt_path="best"` wouldn't work with a new `Trainer` instance, the Tensorboard logs will get segmented, and so on.

Is it possible to use a different `Strategy` for `tune`, `fit` and `test` in a single `Trainer`? (btw, this might be useful even outside of this issue, as `tune` currently doesn't work well with DDP)
2. What if I don't care about `DistributedSampler` adding extra samples?

Please correct me if I am wrong, but `DistributedSampler` should add at most `num_devices - 1` extra samples (a quick check of this arithmetic is sketched below). This means that unless you are using hundreds of devices or extremely small datasets, the difference in metrics will probably be

a) less than the rounding precision
b) less than the natural fluctuations due to random initialization and non-deterministic CUDA shenanigans

I think that bothering users with such a minor issue isn't really desirable. Can this warning be silenced somehow?
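As a quick sanity check of the `num_devices - 1` bound, the snippet below just mirrors the arithmetic `DistributedSampler` uses with `drop_last=False`: every rank gets `ceil(N / num_devices)` indices and the shortfall is filled by repeating samples.

```python
import math


def duplicated_samples(dataset_len: int, num_devices: int) -> int:
    """How many samples DistributedSampler repeats when drop_last=False."""
    per_device = math.ceil(dataset_len / num_devices)
    return per_device * num_devices - dataset_len


print(duplicated_samples(10_000, 8))  # 0 -> divides evenly, the metric is exact
print(duplicated_samples(10_001, 8))  # 7 -> worst case is num_devices - 1 duplicates
```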
3. Can this be fixed without requiring any changes from the users?

I found `pytorch_lightning.overrides.distributed.UnrepeatedDistributedSampler`, which allegedly solves this exact problem, but doesn't work for training.

Does `UnrepeatedDistributedSampler` solve this issue? If it does, I think it should be at least mentioned in the warning and at best used automatically during `test` instead of warning the user.

cc @justusschock @kaushikb11 @awaelchli @akihironitta @rohitgr7