Discriminator loss converges to zero early in training #16
@jpfeil can you screenshot the paper section where they propose delaying the discriminator training? (and link the paper too)
@jpfeil do you have
Thanks @lucidrains. I'll try again with those parameters. I saw it in the taming implementation here:
@jpfeil welp.. whatever Robin and Patrick do goes; they are the best in the world. let me add that
@jpfeil you don't happen to have relatives in Massachusetts, do you?
@lucidrains Nice. Let me try it out again. No, I don't have any relatives in Massachusetts. Did you meet someone with the last name Pfeil?
yea, I knew someone back in high school with the Pfeil family name. Tragedy struck and they moved away though. You are the second Pfeil I've met!
That's amazing. It's not a common name. Sorry to hear about your friend.
I compared v0.1.26 (without the GAN) against v0.1.36 (with the GAN) on the Fashion-MNIST data and got better reconstructions without the GAN:
https://api.wandb.ai/links/pfeiljx/f7wdueh0
Do you have any suggestions for improving training?
I'm using a cosine scheduler for the model and discriminator. Should I use a different learning rate schedule for the discriminator?
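One common adjustment is to keep the cosine shape for both but give the discriminator its own base learning rate (the "two time-scale" trick). Here is a minimal, dependency-free sketch of that idea; the function name, the warmup option, and the specific rates are all illustrative, not anything from this repo's API:

```python
import math

def cosine_lr(step, total_steps, base_lr, min_lr=0.0, warmup_steps=0):
    """Cosine learning-rate schedule with optional linear warmup.

    Illustrative sketch only: decays from base_lr to min_lr over
    total_steps, after an optional linear warmup phase.
    """
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))

# Same cosine shape, but a larger base LR for the discriminator (TTUR-style).
# The 1e-4 / 4e-4 pair is a commonly cited starting point, not a recommendation
# tuned for this repo.
gen_lr = lambda step: cosine_lr(step, total_steps=100_000, base_lr=1e-4)
disc_lr = lambda step: cosine_lr(step, total_steps=100_000, base_lr=4e-4)
```

In PyTorch this maps naturally onto two optimizers (or two parameter groups), each with its own scheduler.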
I saw similar discriminator collapse with the VQ-GAN, and I read that delaying the discriminator until the generator model is optimized may help. Maybe delaying the discriminator until a certain reconstruction loss is achieved?
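For reference, the step-based delay in taming-transformers is just a weight gate (their `adopt_weight`, keyed off `discriminator_iter_start`); the logic below paraphrases it, and the second function is a hypothetical variant of the reconstruction-loss-threshold idea suggested here, not something from either codebase:

```python
def adopt_weight(weight, global_step, threshold=0, value=0.0):
    # Zero out the adversarial loss weight until `threshold` training steps
    # have passed, so the generator/autoencoder trains alone at first.
    # Paraphrase of the gating used in taming-transformers.
    return value if global_step < threshold else weight

def recon_gated_weight(weight, recon_loss, max_recon_loss, value=0.0):
    # Hypothetical alternative: enable the discriminator only once the
    # reconstruction loss has dropped below a target, instead of after a
    # fixed step count. Name and signature are illustrative.
    return value if recon_loss > max_recon_loss else weight
```

Either gate multiplies the discriminator/adversarial term in the total loss, so the discriminator contributes nothing until the condition is met.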
After googling some strategies, I saw the unrolled GAN where the generator stays a few steps ahead of the discriminator. I'm not sure how difficult it would be to implement a similar strategy here.
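A much simpler stand-in for that idea is an update ratio: let the generator take several optimizer steps per discriminator step. Note this is not the actual Unrolled GANs method, which differentiates the generator loss through k unrolled discriminator updates; this is just a cheap scheduling heuristic, with an illustrative name and default:

```python
def should_update_discriminator(step, gen_steps_per_disc_step=5):
    # Update the discriminator only every k-th training step, so the
    # generator effectively stays a few steps "ahead" of it.
    # Simplified heuristic, not the unrolled-GAN algorithm itself.
    return step % gen_steps_per_disc_step == 0
```

In a training loop you would always step the generator, and step the discriminator only when this returns True.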
I'm just brainstorming, so feel free to address or ignore any of these comments.