Low FVD scores and generating inverted samples? #16
For comparison, I am training StyleGAN-V on a relatively small dataset of faces (faces from the how2sign dataset). In particular, I am training StyleGAN-V on 10,000 videos, each with exactly 25 frames. After training for a sufficient amount of time (once I started noticing really good perceptual results), I ran inference, and the result looks as follows. The first thing I observe is that the generated video is inverted. The orientation of the intermediate predicted/generated videos also keeps changing during training.

Secondly, I measured the fvd2048_16f score using this pretrained checkpoint against the dataset the model was trained on, and I am getting a very high FVD score of ~1100. Is this expected since the model is trained on fewer samples, or is something wrong, given that the inferred videos are inverted? For training on the rest of the datasets (UCF, SkyTimeLapse, RainbowJelly), I am able to get videos in the correct orientation. Also attached below is one frame extracted from the generated video (its perceptual quality looks good to me).

Comments
Hi! The inverted images are being generated due to the use of differentiable augmentations (from StyleGAN2-ADA). Typically, one just needs to train for longer to get those diffaugs sorted out (you can check the StyleGAN2-ADA paper). A natural way to solve the issue would be to increase the dataset size, but I suspect that is not possible in your case. For RainbowJelly, please note that it is not a symmetric dataset, so it makes sense to disable mirroring for it here if you want to obtain better results on it (we did not do this in our case so as to remain comparable with other methods).
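To illustrate the mechanism, below is a minimal sketch of ADA-style differentiable augmentation. It is not the repository's actual code: the real pipeline applies a whole set of geometric and color transforms, while this sketch reduces it to a single vertical flip, and `ada_augment` / `discriminator_step` are hypothetical names. Because the same random transform is applied to both real and fake batches before the discriminator, a generator trained on limited data can start emitting already-flipped frames and still pass as real.

```python
# Simplified sketch of ADA-style differentiable augmentation (hypothetical code,
# reduced to a single vertical flip for illustration).
import torch

def ada_augment(images: torch.Tensor, p: float) -> torch.Tensor:
    """Flip each image in the batch along the height axis with probability p."""
    flip_mask = torch.rand(images.shape[0], device=images.device) < p
    flipped = torch.flip(images, dims=[2])  # [N, C, H, W] -> flip along H
    return torch.where(flip_mask[:, None, None, None], flipped, images)

def discriminator_step(D, G, reals: torch.Tensor, z: torch.Tensor, p: float):
    """Real and generated batches go through the same augmentation, so the
    discriminator cannot distinguish 'flipped' from 'original' orientation.
    With a small dataset and a high p, the flips can leak into the generator."""
    fakes = G(z)
    d_real = D(ada_augment(reals, p))
    d_fake = D(ada_augment(fakes, p))
    return d_real, d_fake
```

Training for longer (or lowering the augmentation probability) is what lets such leaked transforms disappear, which matches the advice above.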
I see, I will follow this advice and retrain the network on all the datasets. I don't remember the exact number of kimg that I trained the networks for, but I trained each network on a 4-GPU setup of NVIDIA RTX 2080 Tis for close to 2 days with the following batch configuration and resolution --
I did manually invert the videos in the case of the how2sign_faces predictions and used calc_metrics_for_dataset.py on 100 generated videos to compute the FVD -- it came to 297. As for SkyTimeLapse, I observed an FVD score of around 51 even on the smaller dataset, and its images were perceptually the best as well. I will try to retrain the models with the mentioned augmentations turned off and re-report the metrics and inferences.
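As a side note, un-inverting the generated clips before handing them to the metric script can be done by flipping every frame along the height axis; a minimal sketch, assuming the frames of one video are stored as a [T, H, W, C] numpy array:

```python
import numpy as np

def unflip_video(frames: np.ndarray) -> np.ndarray:
    """Undo a vertical inversion by flipping each frame along the height axis."""
    return frames[:, ::-1, :, :].copy()  # copy() yields a contiguous array
```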
Ok, sounds good. Also note that computing FVD on a small number of videos (100 instead of 2048) might lead to worse FVD values, because the metric will think that you have mode collapse in your statistics and will penalize you for that.
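To make the small-sample effect concrete: FVD is the Fréchet distance between two Gaussians fitted to video features (typically extracted with an I3D network), and with only ~100 generated videos the mean and especially the covariance of the generated set are estimated poorly, which the metric reads as reduced diversity. A hedged numpy sketch of the distance itself, with feature extraction omitted:

```python
# Frechet distance between Gaussians fitted to real and generated video features.
# Feature extraction (e.g. with an I3D network) is omitted here.
import numpy as np
from scipy import linalg

def frechet_distance(feats_real: np.ndarray, feats_fake: np.ndarray) -> float:
    """feats_*: arrays of shape [num_videos, feature_dim]."""
    mu_r, mu_f = feats_real.mean(axis=0), feats_fake.mean(axis=0)
    sigma_r = np.cov(feats_real, rowvar=False)
    sigma_f = np.cov(feats_fake, rowvar=False)
    covmean, _ = linalg.sqrtm(sigma_r @ sigma_f, disp=False)
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # discard tiny imaginary parts from numerics
    diff = mu_r - mu_f
    return float(diff @ diff + np.trace(sigma_r + sigma_f - 2.0 * covmean))
```

With ~100 videos and a feature dimension in the hundreds, the sample covariance is rank-deficient, so the trace term can inflate the score even for a perfect generator; that is one concrete way the "mode collapse in your statistics" penalty shows up.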
I will train with augmentations disabled on the smaller datasets in that case. I don't think generating and manually inverting 2048 generated videos would be a good idea.
@universome I tried running with augmentations disabled using the flag
Would it be this particular option in the base.yaml file under configs/training? Also, what should the discriminator augmentation be to avoid the situation discussed at the top? Currently I am using noaug for aug: and bgc for augpipe:. I am inclined to change augpipe from bgc to noise, though. Please let me know what you think.
Hi! The question you are asking is somewhat difficult, since it is hard to predict how the model would perform with one set of augmentations or another. I believe that you would need some augmentations enabled to make your model fit a small dataset. If you want to disable augmentations completely, then you should specify the corresponding option. How are your results going without any augmentations? If the model does not overfit, then you can disable them completely.
So the augmentations specified in the
There are three possible choices for augmentations: 1) no augmentations (
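For orientation, here is a hedged sketch of how those regimes typically differ in StyleGAN2-ADA-style training, which this repository builds on; the exact option names and the keys in configs/training/base.yaml may differ, and `resolve_aug_probability` is a made-up helper for illustration:

```python
# Hypothetical illustration of the three augmentation regimes:
#   "noaug" - discriminator augmentations disabled entirely (p = 0)
#   "fixed" - augmentations always on with a constant probability p
#   "ada"   - p is adapted during training from a discriminator-overfitting signal
def resolve_aug_probability(aug_mode: str,
                            fixed_p: float = 0.2,
                            current_p: float = 0.0,
                            overfit_signal: float = 0.0,
                            adjust_speed: float = 0.01) -> float:
    if aug_mode == "noaug":
        return 0.0
    if aug_mode == "fixed":
        return fixed_p
    if aug_mode == "ada":
        # Raise p when the discriminator overfits (signal > 0), lower it otherwise,
        # and keep the probability within [0, 1].
        return min(max(current_p + adjust_speed * overfit_signal, 0.0), 1.0)
    raise ValueError(f"Unknown augmentation mode: {aug_mode}")
```

On a ~10,000-video dataset, the advice above amounts to keeping one of the last two regimes enabled unless the discriminator shows no signs of overfitting.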