PyTorch version training issues #14

longlong161 · 2024-07-24T09:05:07Z

After training the PyTorch version for a while, the metrics will not change. I trained twice and this problem started at 454 epochs.

By the way, can we also add SSIM evaluation metrics to the PyTorch version? Thank you

albrateanu · 2024-07-24T09:10:05Z

Hello. Will add SSIM evaluation today.
About the training issues. I still need to fix the model forward pass a bit. Moreover, training on PyTorch has different behaviour to TensorFlow apparently, so I will still be checking to see how to bring it as close as possible to the TensorFlow initial implementation.

longlong161 · 2024-07-24T09:16:34Z

Hello. Will add SSIM evaluation today. About the training issues. I still need to fix the model forward pass a bit. Moreover, training on PyTorch has different behaviour to TensorFlow apparently, so I will still be checking to see how to bring it as close as possible to the TensorFlow initial implementation.

Okay, thank you for your reply. Do you still need to add files related to testing in the future? Looking forward to the complete PyTorch version

stuhao251 · 2024-07-24T10:42:04Z

感觉是在那个轮次训练已经达到饱和了，哈哈哈

albrateanu · 2024-07-24T18:02:59Z

@longlong161 Don't know exactly what the issue with training is for you. I didn't have that. Looks like a gradient problem. Please retry with the last code and let me know if it still persists.

@stuhao251 It might be the case, unfortunately. I still get better metrics when training with TF for some reason. However, the baseline PyTorch implementation of the model should be exactly the same as the TF version, as both the TF and Torch versions of the model have 44923 parameters. So please feel free to develop on it if you think there's space for improvement.

longlong161 · 2024-07-26T10:04:26Z

@albrateanu I used the latest code you uploaded and encountered a new error that appeared on both machines.

GZY2000 · 2024-07-31T01:24:40Z

@albrateanu I used the latest code you uploaded and encountered a new error that appeared on both machines.

Hello, has pytorch version come out yet? Why is my result so much worse than the results of tensorflow in the paper?

albrateanu · 2024-07-31T16:49:26Z

@longlong161 I'm not sure how I can help with that. Please make sure the environment is ok and running PyTorch 2. And perhaps just attempt retraining more? Or try setting either the Smooth L1 loss or the MS-SSIM loss weight to 0? It seems like a gradient explosion problem and it's not very clear what causes it given that I didn't have that.
@GZY2000 I keep saying this, I am not sure. I am able to obtain at least around 0.8-1dB more on TensorFlow than on PyTorch. But it's a bit out of my knowledge.

albrateanu · 2024-08-03T20:22:14Z

@GZY2000 I have just posted PyTorch weights. On LOLv2 Real, PyTorch implementations has 0.61dB more than TensorFlow.

GZY2000 · 2024-08-07T01:45:11Z

@GZY2000 I have just posted PyTorch weights. On LOLv2 Real, PyTorch implementations has 0.61dB more than TensorFlow.

That's true, but synthetic is more than 2db off on this dataset, which is the best result you can get after a lot of training!

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

PyTorch version training issues #14

PyTorch version training issues #14

longlong161 commented Jul 24, 2024

albrateanu commented Jul 24, 2024

longlong161 commented Jul 24, 2024

stuhao251 commented Jul 24, 2024

albrateanu commented Jul 24, 2024 •

edited

Loading

longlong161 commented Jul 26, 2024

GZY2000 commented Jul 31, 2024

albrateanu commented Jul 31, 2024

albrateanu commented Aug 3, 2024

GZY2000 commented Aug 7, 2024

PyTorch version training issues #14

PyTorch version training issues #14

Comments

longlong161 commented Jul 24, 2024

albrateanu commented Jul 24, 2024

longlong161 commented Jul 24, 2024

stuhao251 commented Jul 24, 2024

albrateanu commented Jul 24, 2024 • edited Loading

longlong161 commented Jul 26, 2024

GZY2000 commented Jul 31, 2024

albrateanu commented Jul 31, 2024

albrateanu commented Aug 3, 2024

GZY2000 commented Aug 7, 2024

albrateanu commented Jul 24, 2024 •

edited

Loading