The results for CausalConv3d #11
@Epiphqny wow Yuqing! those results do not look half bad! i'll have to think about your results a bit more. so this work builds upon the cvivit from the phenaki paper. in that paper, i believe they encode the first frame separately from the rest (to allow for single image pretraining). however, in this work, they decide to just pad on the left and use the same encoding for the first frame vs the rest. perhaps i can add the cvivit way for the sake of comparing the two
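To make the "pad on the left" idea concrete, here is a minimal sketch in plain Python. It is not the repository's `CausalConv3d`; the function name `causal_conv1d_time`, the zero `pad_value`, and the use of one scalar per frame in place of a feature map are all assumptions made for illustration. It only shows the temporal part: with `k - 1` padding values placed before the first frame and none after the last, the output at frame `t` depends on frames `t - k + 1` through `t` and never on later frames.

```python
def causal_conv1d_time(frames, kernel, pad_value=0.0):
    """Convolve a per-frame sequence along time with causal (left-only) padding.

    frames: list of floats, one value per frame (a stand-in for a feature map).
    kernel: list of floats; kernel[-1] multiplies the current frame,
            kernel[0] multiplies the oldest frame in the window.
    pad_value: hypothetical choice of padding value (zeros here).
    """
    k = len(kernel)
    # Pad only on the left (the past), so no output can see a future frame.
    padded = [pad_value] * (k - 1) + list(frames)
    out = []
    for t in range(len(frames)):
        window = padded[t:t + k]  # covers frames t-k+1 .. t
        out.append(sum(w * x for w, x in zip(kernel, window)))
    return out


frames = [1.0, 2.0, 3.0, 4.0]
kernel = [0.5, 0.25, 0.25]
print(causal_conv1d_time(frames, kernel))  # [0.25, 0.75, 1.75, 2.75]
```

Note that the first output sees only the first frame plus padding, which is why the same weights can encode a single image and the start of a video. The alternative mentioned above, encoding the first frame through a separate path, is not shown here.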
@Epiphqny once i circle back to this, also want to craft out a few more specialized discriminators (fourier domain as well as temporal)
@Epiphqny did you use LFQ or FSQ btw? could you share your hyperparameters?
Hi @lucidrains, thanks for your prompt response! Actually, I didn't use LFQ or FSQ. Instead, I used the quantization from CVQ-VAE https://github.com/lyndonzheng/CVQ-VAE and extended the 2D conv to a 3D causal conv, as in magvit2. For the training parameters, I followed the setup used in VQGAN and initialized the weights from a CVQ-VAE model pretrained on image data. I will train with the updated first-frame code and am looking forward to the updated discriminator!
@Epiphqny ohh i see! i didn't know you only used the causal conv. i'm not sure what the issue is then
@lucidrains Thanks for your response! I will try more modules in this implementation and update the results later.
Hi @Epiphqny, is there any progress on improving results?
Hi @lucidrains, thanks for your awesome work! I used your causal conv implementation to train a video VQGAN network. The results are as follows:
Original clip sequence:
The reconstructed clip sequence:
I've noticed that the reconstruction seems to rely heavily on the initial frame. As the sequence progresses, the images lose clarity and each subsequent frame is blurrier than the last. Could you provide any insights into this phenomenon? Thank you for your time and assistance!
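One way to put a number on the frame-by-frame degradation described above is a per-frame PSNR between the original and reconstructed clips. The sketch below is an assumption-laden illustration, not something used in this thread: the function name `per_frame_psnr`, the flat-list frame layout, and the `[0, max_value]` pixel range are all hypothetical choices.

```python
import math


def per_frame_psnr(original, reconstructed, max_value=1.0):
    """Return one PSNR value (in dB) per frame.

    original, reconstructed: lists of frames; each frame is a flat list of
    pixel values in [0, max_value] (a hypothetical layout for illustration).
    """
    scores = []
    for ref, rec in zip(original, reconstructed):
        # Mean squared error over all pixels of this frame.
        mse = sum((a - b) ** 2 for a, b in zip(ref, rec)) / len(ref)
        if mse == 0:
            scores.append(float("inf"))  # identical frames
        else:
            scores.append(10.0 * math.log10(max_value ** 2 / mse))
    return scores
```

If the values fall steadily with the frame index, that would match the observation that later frames are blurrier than the first one.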