version 0.7.6 is 2x faster than version 0.8.1 #2290
Replies: 6 comments
-
This behavior is reminiscent of #2291.
-
This doesn't sound good. What happens if you disable everything except for the minimal forward and backward pass? Have you tried the profiler?
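For reference, the profiler can be switched on directly from the Trainer. A minimal sketch, assuming the 0.8.x API where `profiler=True` attaches the built-in simple profiler:

```python
import pytorch_lightning as pl

# profiler=True attaches the simple profiler, which prints per-hook timings
# (training step, backward, optimizer step, data loading) at the end of the run.
trainer = pl.Trainer(
    gpus=1,
    profiler=True,  # an AdvancedProfiler from pytorch_lightning.profiler gives per-call detail
)
# trainer.fit(model)  # 'model' stands for whatever LightningModule you are benchmarking
```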
-
@ternaus can we also observe this with, e.g., gpu_template.py in the pl_examples folder? That would be much easier to debug.
-
With the parity tests, we have not observed any slowdown compared to vanilla PyTorch. Would you mind sharing an example?
-
@ternaus this is not correct. We actually test for this on every PR: we make sure to stay within a threshold of vanilla PyTorch (in fact we're only about 300 ms slower per epoch). https://github.com/PyTorchLightning/pytorch-lightning/blob/master/benchmarks/test_parity.py You likely have loggers enabled, or your log_row_interval is too low (set it to 100 or so).
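To illustrate the suggestion, here is a rough sketch of how logging can be taken out of the comparison; the flag name is assumed to be `row_log_interval` in the 0.8.x Trainer API, so check your installed version:

```python
import pytorch_lightning as pl

# Log metrics less often and/or disable the logger entirely so that
# logging overhead does not distort the 0.7.6 vs 0.8.1 comparison.
trainer = pl.Trainer(
    row_log_interval=100,  # only write logged rows every 100 steps
    # logger=False,        # uncomment to rule out the logger completely
)
```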
-
We can reopen this if you can post a Colab that replicates this behavior. (Also, make sure to use ddp, not ddp_spawn, if you are using num_workers > 0.)
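As a sketch of that last point (Trainer flag names assumed from the 0.8.x API), the backend is selected like this:

```python
import pytorch_lightning as pl

# ddp launches one process per GPU via separate script invocations, whereas
# ddp_spawn relies on multiprocessing.spawn, which interacts badly with
# DataLoader workers (num_workers > 0) and can slow training down noticeably.
trainer = pl.Trainer(
    gpus=4,
    distributed_backend="ddp",  # prefer this over "ddp_spawn" when num_workers > 0
)
```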
-
4 x 2080 Ti
ddp
fp16
0.7.6 => 5 minutes for train + validation
0.8.1 => 9 minutes for train + validation
Is there anything I need to change?
I am not sure how to provide a minimal example to reproduce this.
The full pipeline is here: https://github.com/ternaus/retinafacemask/blob/master/retinafacemask/train.py
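For context, the setup above corresponds to roughly this Trainer configuration and timing loop (a sketch, argument names assumed from the 0.8.x API; the real values live in the linked train.py):

```python
import time
import pytorch_lightning as pl

# Approximate configuration being compared between 0.7.6 and 0.8.1:
# 4 x 2080 Ti, DDP, mixed precision (fp16).
trainer = pl.Trainer(
    gpus=4,
    distributed_backend="ddp",
    precision=16,  # fp16 / AMP
)

start = time.time()
# trainer.fit(model)  # same LightningModule and data in both runs; only the PL version changes
print(f"train + validation took {time.time() - start:.1f} s")
```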