A comprehensive collection of works on video generation/synthesis/prediction.
(Sources: MCVD, VideoFusion)
- Diffusion Models: A Comprehensive Survey of Methods and Applications
- Diffusion Models in Vision: A Survey (IEEE TPAMI 2023)
- What comprises a good talking-head video generation?: A Survey and Benchmark
- A Review on Deep Learning Techniques for Video Prediction (2020)
- InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation
- UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild
- LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs
- Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval (ICCV 2021)
- Self-Supervised Visual Planning with Temporal Skip Connections
- How2Sign: A Large-scale Multimodal Dataset for Continuous American Sign Language (CVPR 2021)
- Learning to Drive by Watching YouTube Videos: Action-Conditioned Contrastive Policy Pretraining (ECCV 2022)
- FakeCatcher: Detection of Synthetic Portrait Videos using Biological Signals
- DTVNet: Dynamic Time-lapse Video Generation via Single Still Image (ECCV 2020)
- Multi-StyleGAN: Towards Image-Based Simulation of Time-Lapse Live-Cell Microscopy
- DDH-QA: A Dynamic Digital Humans Quality Assessment Database
- TPA-Net: Generate A Dataset for Text to Physics-based Animation
- Touch and Go: Learning from Human-Collected Vision and Touch
- BVI-VFI: A Video Quality Database for Video Frame Interpolation
- Merkel Podcast Corpus: A Multimodal Dataset Compiled from 16 Years of Angela Merkel's Weekly Video Podcasts (LREC 2022)
- Controllable video generation (text-to-video, image-to-video)
  The "classic" task: create a video from scratch, i.e. starting from random noise, sometimes under simple conditions such as a text prompt or an image. Common goals include visual fidelity, temporal coherence, and logical plausibility.
- Video prediction and frame interpolation (video generation with visual constraints)
  Video prediction: predict the next N frames following a sequence of input video frames, or predict the N frames between given start and end frames; it can be viewed as a special case of video completion. Frame interpolation: improve the motion smoothness of low-frame-rate videos by inserting additional frames between existing ones. Some interpolation methods can also "insert" frames after the input frames, so they can technically perform video prediction to some extent. A minimal sketch of these task interfaces is given after this list.
- Novel view synthesis
  These methods usually reconstruct a 3D scene from some observations (e.g. a monocular video or static images) and then render the scene from new viewpoints.
- Human motion generation
  Video generation tasks specifically geared to human (or humanoid) activities.
- Talking head or face generation
  Talking head generation refers to animated video content that simulates a person's face and head movements while they are speaking.
- Video-to-video
  These include enhancing the (textural) quality of videos, style transfer, motion transfer, summarization, and various common video editing tasks (e.g. removal of a subject).
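
To make the input/output contracts of the first two task families concrete, here is a minimal, hypothetical sketch in PyTorch. The function names, tensor shapes, and trivial bodies (noise, zeros, linear blending) are illustrative placeholders, assuming a video is a tensor shaped (T, C, H, W); they are not the method of any paper listed below.

```python
import torch

def generate_video(num_frames: int = 16, frame_shape=(3, 64, 64)) -> torch.Tensor:
    """Generation from scratch: start from pure noise; a real model would
    iteratively denoise this, optionally conditioned on text or an image."""
    return torch.randn(num_frames, *frame_shape)

def predict_frames(context: torch.Tensor, n: int) -> torch.Tensor:
    """Video prediction: map T observed frames to the next n frames.
    A learned predictor goes here; zeros are a stand-in."""
    t, c, h, w = context.shape
    return torch.zeros(n, c, h, w)

def interpolate_frames(start: torch.Tensor, end: torch.Tensor, n: int) -> torch.Tensor:
    """Frame interpolation: n in-between frames. Linear blending is the
    trivial baseline; real methods estimate motion (e.g. optical flow)."""
    alphas = torch.linspace(0.0, 1.0, n + 2)[1:-1]  # n interior blend weights
    return torch.stack([(1 - a) * start + a * end for a in alphas])
```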
- Implicit Identity Representation Conditioned Memory Compensation Network for Talking Head Video Generation (ICCV 2023)
- Bidirectionally Deformable Motion Modulation For Video-based Human Pose Transfer
- Animate-A-Story: Storytelling with Retrieval-Augmented Video Generation
- GD-VDM: Generated Depth for better Diffusion-based Video Generation
- DDLP: Unsupervised Object-Centric Video Prediction with Deep Dynamic Latent Particles
- Video Diffusion Models with Local-Global Context Guidance (IJCAI 2023)
- Gen-L-Video: Multi-Text to Long Video Generation via Temporal Co-Denoising
- Large Language Models are Frame-level Directors for Zero-shot Text-to-Video Generation
- ControlVideo: Training-free Controllable Text-to-Video Generation
- VDT: An Empirical Study on Video Diffusion with Transformers
- DaGAN++: Depth-Aware Generative Adversarial Network for Talking Head Video Generation
- Sketching the Future (STF): Applying Conditional Control Techniques to Text-to-Video Models
- StyleAvatar: Real-time Photo-realistic Portrait Avatar from a Single Video
- Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models
- Generative Disco: Text-to-Video Generation for Music Visualization
- Follow Your Pose: Pose-Guided Text-to-Video Generation using Pose-Free Videos
- Sounding Video Generator: A Unified Framework for Text-guided Sounding Video Generation
- Fine-grained Audible Video Description (CVPR 2023)
- Conditional Image-to-Video Generation with Latent Flow Diffusion Models (CVPR 2023)
- Towards End-to-End Generative Modeling of Long Videos with Memory-Efficient Bidirectional Transformers (CVPR 2023)
- VideoFusion: Decomposed Diffusion Models for High-Quality Video Generation (CVPR 2023)
- MOSO: Decomposing MOtion, Scene and Object for Video Prediction (CVPR 2023)
- MotionVideoGAN: A Novel Video Generator Based on the Motion Space Learned from Image Pairs
- Time-Conditioned Generative Modeling of Object-Centric Representations for Video Decomposition and Prediction
- VIDM: Video Implicit Diffusion Models (AAAI 2023)
- MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation (CVPR 2023)
- Video Probabilistic Diffusion Models in Projected Latent Space (CVPR)
- Bidirectional Temporal Diffusion Model for Temporally Consistent Human Animation
- DisCo: Disentangled Control for Referring Human Dance Generation in Real World
- PVP: Personalized Video Prior for Editable Dynamic Portraits using StyleGAN
- BEDLAM: A Synthetic Dataset of Bodies Exhibiting Detailed Lifelike Animated Motion (CVPR)
- Envisioning a Next Generation Extended Reality Conferencing System with Efficient Photorealistic Human Rendering (CVPR)
- Reprogramming Audio-driven Talking Face Synthesis into Text-driven
- Self-supervised Learning of Event-guided Video Frame Interpolation for Rolling Shutter Frames
- VidEdit: Zero-Shot and Spatially Aware Text-Driven Video Editing
- DORSal: Diffusion for Object-centric Representations of Scenes
- Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation
- Generative Semantic Communication: Diffusion Models Beyond Bit Recovery
- Instruct-Video2Avatar: Video-to-Avatar Generation with Instructions
- MoviePuzzle: Visual Narrative Reasoning through Multimodal Order Learning
- Video Colorization with Pre-trained Text-to-Image Diffusion Models
- Temporal-controlled Frame Swap for Generating High-Fidelity Stereo Driving Data for Autonomy Analysis
- Adjustable Visual Appearance for Generalizable Novel View Synthesis
- 4DSR-GCN: 4D Video Point Cloud Upsampling using Graph Convolutional Networks
- Intelligent Grimm -- Open-ended Visual Storytelling via Latent Diffusion Models
- MammalNet: A Large-scale Video Benchmark for Mammal Recognition and Behavior Understanding
- Exploring Phonetic Context in Lip Movement for Authentic Talking Face Generation
- Video ControlNet: Towards Temporally Consistent Synthetic-to-Real Video Translation Using Conditional Image Diffusion Models
- Context-Preserving Two-Stage Video Domain Translation for Portrait Stylization
- OD-NeRF: Efficient Training of On-the-Fly Dynamic Neural Radiance Fields
- EgoVSR: Towards High-Quality Egocentric Video Super-Resolution
- NegVSR: Augmenting Negatives for Generalized Noise Modeling in Real-World Video Super-Resolution
- Video Prediction Models as Rewards for Reinforcement Learning
- Reparo: Loss-Resilient Generative Codec for Video Conferencing
- CPNet: Exploiting CLIP-based Attention Condenser and Probability Map Guidance for High-fidelity Talking Face Generation (ICME)
- InstructVid2Vid: Controllable Video Editing with Natural Language Instructions
- SlotDiffusion: Object-Centric Generative Modeling with Diffusion Models (ICLR Workshop)
- IDO-VFI: Identifying Dynamics via Optical Flow Guidance for Video Frame Interpolation with Events
- Light-VQA: A Multi-Dimensional Quality Assessment Model for Low-Light Video Enhancement
- Laughing Matters: Introducing Laughing-Face Generation using Diffusion Models
- Identity-Preserving Talking Face Generation with Landmark and Appearance Priors (CVPR)
- HumanRF: High-Fidelity Neural Radiance Fields for Humans in Motion
- Style-A-Video: Agile Diffusion for Arbitrary Text-based Video Style Transfer
- NeuralEditor: Editing Neural Radiance Fields via Manipulating Point Clouds (CVPR)
- Video Frame Interpolation with Densely Queried Bilateral Correlation (IJCAI)
- Dynamic Video Frame Interpolation with integrated Difficulty Pre-Assessment
- AMT: All-Pairs Multi-Field Transforms for Efficient Frame Interpolation (CVPR)
- Latent-Shift: Latent Diffusion with Temporal Shift for Efficient Text-to-Video Generation
- CAT-NeRF: Constancy-Aware Tx$^2$Former for Dynamic Body Modeling (CVPR Workshop)
- Boosting Video Object Segmentation via Space-time Correspondence Learning (CVPR)
- VidStyleODE: Disentangled Video Editing via StyleGAN and NeuralODEs
- MED-VT: Multiscale Encoder-Decoder Video Transformer with Application to Object Segmentation (CVPR)
- Neural Image-based Avatars: Generalizable Radiance Fields for Human Avatar Modeling
- That's What I Said: Fully-Controllable Talking Face Generation
- BiFormer: Learning Bilateral Motion Estimation via Bilateral Transformer for 4K Video Frame Interpolation (CVPR)
- TalkCLIP: Talking Head Generation with Text-Guided Expressive Speaking Styles
- FONT: Flow-guided One-shot Talking Head Generation with Natural Head Motions (ICME)
- Zero-Shot Video Editing Using Off-The-Shelf Image Diffusion Models
- Consistent View Synthesis with Pose-Guided Diffusion Models (CVPR)
- DAE-Talker: High Fidelity Speech-Driven Talking Face Generation with Diffusion Autoencoder
- Novel View Synthesis of Humans using Differentiable Rendering
- GestureDiffuCLIP: Gesture Diffusion Model with CLIP Latents (SIGGRAPH)
- NeRF-DS: Neural Radiance Fields for Dynamic Specular Objects (CVPR)
- HandNeRF: Neural Radiance Fields for Animatable Interacting Hands (CVPR)
- Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Generators
- Pre-NeRF 360: Enriching Unbounded Appearances for Neural Radiance Fields
- Tubelet-Contrastive Self-Supervision for Video-Efficient Generalization
- Confidence Attention and Generalization Enhanced Distillation for Continuous Video Domain Adaptation
- MoRF: Mobile Realistic Fullbody Avatars from a Monocular Video
- Unified Mask Embedding and Correspondence Learning for Self-Supervised Video Segmentation
- Learning Data-Driven Vector-Quantized Degradation Model for Animation Video Super-Resolution
- FateZero: Fusing Attentions for Zero-shot Text-based Video Editing
- LDMVFI: Video Frame Interpolation with Latent Diffusion Models
- Learning Physical-Spatio-Temporal Features for Video Shadow Removal
- NLUT: Neural-based 3D Lookup Tables for Video Photorealistic Style Transfer
- Blowing in the Wind: CycleNet for Human Cinemagraphs from Still Images (CVPR)
- Blind Video Deflickering by Neural Filtering with a Flawed Atlas (CVPR)
- Butterfly: Multiple Reference Frames Feature Propagation Mechanism for Neural Video Compression (DCC)
- One-Shot Video Inpainting (AAAI)
- Continuous Space-Time Video Super-Resolution Utilizing Long-Range Temporal Information
- Learning Neural Volumetric Representations of Dynamic Humans in Minutes (CVPR)
- OPT: One-shot Pose-Controllable Talking Head Generation (ICASSP)
- One-Shot Face Video Re-enactment using Hybrid Latent Spaces of StyleGAN2
- Video Waterdrop Removal via Spatio-Temporal Fusion in Driving Scenes
- Structure and Content-Guided Video Synthesis with Diffusion Models
- AV-NeRF: Learning Neural Fields for Real-World Audio-Visual Scene Synthesis
- Maximal Cliques on Multi-Frame Proposal Graph for Unsupervised Video Object Segmentation
- Optical Flow Estimation in 360$^\circ$ Videos: Dataset, Model and Application
- Regeneration Learning: A Learning Paradigm for Data Generation
- Event-Based Frame Interpolation with Ad-hoc Deblurring (CVPR)
- DiffTalk: Crafting Diffusion Models for Generalized Audio-Driven Portraits Animation (CVPR)
- Diffused Heads: Diffusion Models Beat GANs on Talking-Face Generation
- HyperReel: High-Fidelity 6-DoF Video with Ray-Conditioned Sampling (CVPR)
- StyleTalk: One-shot Talking Head Generation with Controllable Speaking Styles (AAAI)
- Detachable Novel Views Synthesis of Dynamic Scenes Using Distribution-Driven Neural Radiance Fields
- SkyGPT: Probabilistic Short-term Solar Forecasting Using Synthetic Sky Videos from Physics-constrained VideoGPT
- MovieFactory: Automatic Movie Creation from Text using Large Generative Models for Language and Images
- Emotional Talking Head Generation based on Memory-Sharing and Attention-Augmented Networks
- Neural Foundations of Mental Simulation: Future Prediction of Latent Representations on Dynamic Scenes
- Avatar Fingerprinting for Authorized Use of Synthetic Talking-Head Videos
- DynamicStereo: Consistent Dynamic Depth from Stereo Videos (CVPR)
- ActorsNeRF: Animatable Few-shot Human Rendering with Generalizable NeRFs
- Total-Recon: Deformable Scene Reconstruction for Embodied View Synthesis
- 3D-IntPhys: Towards More Generalized 3D-grounded Visual Intuitive Physics under Challenging Scenes (CVPR)
- Leveraging triplet loss for unsupervised action segmentation (CVPR)
- MonoHuman: Animatable Human Neural Field from Monocular Video (CVPR)
- Trace and Pace: Controllable Pedestrian Animation via Guided Trajectory Diffusion (CVPR)
- Seeing What You Said: Talking Face Generation Guided by a Lip Reading Expert (CVPR)
- VIVE3D: Viewpoint-Independent Video Editing using 3D-Aware GANs (CVPR)
- CAMS: CAnonicalized Manipulation Spaces for Category-Level Functional Hand-Object Manipulation Synthesis (CVPR)
- Prediction of the morphological evolution of a splashing drop using an encoder-decoder
- Dual-path Adaptation from Image to Video Transformers (CVPR)
- IntrinsicNGP: Intrinsic Coordinate based Hash Encoding for Human NeRF
- Implicit Identity Representation Conditioned Memory Compensation Network for Talking Head Video Generation (ICCV 2023)
- FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
- Bidirectionally Deformable Motion Modulation For Video-based Human Pose Transfer
- Animate-A-Story: Storytelling with Retrieval-Augmented Video Generation
- GD-VDM: Generated Depth for better Diffusion-based Video Generation
- DDLP: Unsupervised Object-Centric Video Prediction with Deep Dynamic Latent Particles
- Video Diffusion Models with Local-Global Context Guidance (IJCAI 2023)
- Gen-L-Video: Multi-Text to Long Video Generation via Temporal Co-Denoising
- Large Language Models are Frame-level Directors for Zero-shot Text-to-Video Generation
- VIDM: Video Implicit Diffusion Models (AAAI 2023)
- MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation (CVPR 2023)
- VideoFusion: Decomposed Diffusion Models for High-Quality Video Generation (CVPR 2023)
- Conditional Image-to-Video Generation with Latent Flow Diffusion Models (CVPR 2023)
- Physics-Driven Diffusion Models for Impact Sound Synthesis from Videos (CVPR 2023)
- Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models (CVPR 2023)
- 3D Cinemagraphy from a Single Image (CVPR 2023)
- MOSO: Decomposing MOtion, Scene and Object for Video Prediction (CVPR 2023)
- SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation (CVPR 2023)
- Towards End-to-End Generative Modeling of Long Videos with Memory-Efficient Bidirectional Transformers (CVPR 2023)
- Learning Universal Policies via Text-Guided Video Generation
- Tell me what happened: Unifying text-guided video completion via multimodal masked video generation (CVPR 2023)
- MAGVIT: Masked Generative Video Transformer (CVPR 2023)
- Text2video-zero: Text-to-image diffusion models are zero-shot video generators
- VarietySound: Timbre-controllable video to sound generation via unsupervised information disentanglement
- ControlVideo: Training-free Controllable Text-to-Video Generation
- Control-A-Video: Controllable Text-to-Video Generation with Diffusion Models
- VDT: An Empirical Study on Video Diffusion with Transformers
- Sketching the Future (STF): Applying Conditional Control Techniques to Text-to-Video Models
- Generative Disco: Text-to-Video Generation for Music Visualization
- Follow Your Pose: Pose-Guided Text-to-Video Generation using Pose-Free Videos
- Sounding Video Generator: A Unified Framework for Text-guided Sounding Video Generation
- MotionVideoGAN: A Novel Video Generator Based on the Motion Space Learned from Image Pairs
- Time-Conditioned Generative Modeling of Object-Centric Representations for Video Decomposition and Prediction
- VideoComposer: Compositional Video Synthesis with Motion Controllability
- Cinematic Mindscapes: High-quality Video Reconstruction from Brain Activity (May, 2023)
- VideoFactory: Swap Attention in Spatiotemporal Diffusions for Text-to-Video Generation
- Preserve Your Own Correlation: A Noise Prior for Video Diffusion Models
- Motion-Conditioned Diffusion Model for Controllable Video Synthesis
- DreamPose: Fashion Image-to-Video Synthesis via Stable Diffusion
- Seer: Language Instructed Video Prediction with Latent Diffusion Models
- Learning 3D Photography Videos via Self-supervised Diffusion on Single Images
- InterGen: Diffusion-based Multi-human Motion Generation under Complex Interactions
- Learn the Force We Can: Multi-Object Video Generation from Pixel-Level Interactions
- Make-Your-Video: Customized Video Generation Using Textual and Structural Guidance
- Make-A-Protagonist: Generic Video Editing with An Ensemble of Experts
- DaGAN++: Depth-Aware Generative Adversarial Network for Talking Head Video Generation
- LEO: Generative Latent Image Animator for Human Video Synthesis
- StyleLipSync: Style-based Personalized Lip-sync Video Generation
- High-Fidelity and Freely Controllable Talking Head Video Generation (CVPR 2023)
- DisCoHead: Audio-and-Video-Driven Talking Head Generation by Disentangled Control of Head Pose and Facial Expressions (ICASSP 2023)
- Controllable Video Generation by Learning the Underlying Dynamical System with Neural ODE
- DPE: Disentanglement of Pose and Expression for General Video Portrait Editing (CVPR 2023)
- PV3D: A 3D Generative Model for Portrait Video Generation (ICLR 2023)
- AADiff: Audio-Aligned Video Synthesis with Text-to-Image Diffusion (CVPR 2023 Workshop)
- Controllable One-Shot Face Video Synthesis With Semantic Aware Prior
- Decoupling Dynamic Monocular Videos for Dynamic View Synthesis
- Feature-Conditioned Cascaded Video Diffusion Models for Precise Echocardiogram Synthesis (MICCAI 2023)
- WALDO: Future Video Synthesis using Object Layer Decomposition and Parametric Flow Prediction
- Fast Fourier Inception Networks for Occluded Video Prediction
- Let's Think Frame by Frame: Evaluating Video Chain of Thought with Video Infilling and Prediction
- PastNet: Introducing Physical Inductive Biases for Spatio-temporal Video Prediction
- A Control-Centric Benchmark for Video Prediction (ICLR 2023)
- MS-LSTM: Exploring Spatiotemporal Multiscale Representations in Video Prediction Domain
- Forecasting localized weather impacts on vegetation as seen from space with meteo-guided video prediction
- A Dynamic Multi-Scale Voxel Flow Network for Video Prediction (CVPR 2023)
- TKN: Transformer-based Keypoint Prediction Network For Real-time Video Prediction
- MOSO: Decomposing MOtion, Scene and Object for Video Prediction (CVPR 2023)
- STDepthFormer: Predicting Spatio-temporal Depth from Video with a Self-supervised Transformer Model (IROS 2023)
- Object-Centric Video Prediction via Decoupling of Object Dynamics and Interactions
- Anti-aliasing Predictive Coding Network for Future Video Frame Prediction
- Long-horizon video prediction using a dynamic latent hierarchy
- Motion and Context-Aware Audio-Visual Conditioned Video Prediction
- MIMO Is All You Need: A Strong Multi-In-Multi-Out Baseline for Video Prediction (AAAI 2023)
- A unified model for continuous conditional video prediction (CVPR 2023 Workshop)
- PreCNet: Next-Frame Video Prediction Based on Predictive Coding (IEEE TNNLS 2023)
- NUWA-XL: Diffusion over Diffusion for eXtremely Long Video Generation
- Video Diffusion Models (NeurIPS 2022)
- MCVD: Masked Conditional Video Diffusion for Prediction, Generation, and Interpolation (NeurIPS 2022)
- Diffusion Models for Video Prediction and Infilling (TMLR 2022)
- Make-A-Video: Text-to-Video Generation without Text-Video Data (ICLR 2023)
- DaGAN: Depth-Aware Generative Adversarial Network for Talking Head Video Generation (CVPR 2022)
- Show Me What and Tell Me How: Video Synthesis via Multimodal Conditioning (CVPR 2022)
- Playable Environments: Video Manipulation in Space and Time (CVPR 2022)
- Fast-Vid2Vid: Spatial-Temporal Compression for Video-to-Video Synthesis (ECCV 2022)
- TM2T: Stochastic and Tokenized Modeling for the Reciprocal Generation of 3D Human Motions and Texts (ECCV 2022)
- Imagen Video: High Definition Video Generation with Diffusion Models
- Phenaki: Variable Length Video Generation From Open Domain Textual Description
- Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation
- Latent Video Diffusion Models for High-Fidelity Long Video Generation
- SinFusion: Training Diffusion Models on a Single Image or Video
- INR-V: A Continuous Representation Space for Video-based Generative Tasks
- StyleFaceV: Face Video Generation via Decomposing and Recomposing Pretrained StyleGAN3
- NUWA-Infinity: Autoregressive over Autoregressive Generation for Infinite Visual Synthesis
- Patch-based Object-centric Transformers for Efficient Video Generation
- CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers
- Latent Video Diffusion Models for High-Fidelity Video Generation With Arbitrary Lengths
- MagicVideo: Efficient Video Generation With Latent Diffusion Models
- Long Video Generation with Time-Agnostic VQGAN and Time-Sensitive Transformer (ECCV 2022)
- StyleHEAT: One-Shot High-Resolution Editable Talking Face Generation via Pre-trained StyleGAN
- Generating Videos with Dynamics-aware Implicit Generative Adversarial Networks (ICLR 2022)
- StyleGAN-V: A Continuous Video Generator with the Price, Image Quality and Perks of StyleGAN2 (CVPR 2022)
- Make It Move: Controllable Image-to-Video Generation with Text Descriptions (CVPR 2022)
- NeMo: 3D Neural Motion Fields from Multiple Video Instances of the Same Action (CVPR 2023)
- Cross-Resolution Flow Propagation for Foveated Video Super-Resolution (WACV 2023)
- MonoNeRF: Learning a Generalizable Dynamic Radiance Field from Monocular Videos
- Scalable Adaptive Computation for Iterative Generation (ICML 2023)
- Predictive Coding Based Multiscale Network with Encoder-Decoder LSTM for Video Prediction
- InstantAvatar: Learning Avatars from Monocular Video in 60 Seconds
- MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation (CVPR 2023)
- PointAvatar: Deformable Point-based Head Avatars from Videos
- PV3D: A 3D Generative Model for Portrait Video Generation (ICLR 2023)
- Video Prediction by Efficient Transformers (ICPR 2022)
- MAGVIT: Masked Generative Video Transformer (CVPR 2023)
- Physically Plausible Animation of Human Upper Body from a Single Image (WACV 2023)
- MIMO Is All You Need: A Strong Multi-In-Multi-Out Baseline for Video Prediction
- Audio-Driven Co-Speech Gesture Video Generation (NeurIPS 2022)
- VIDM: Video Implicit Diffusion Models (AAAI 2023)
- VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild (SIGGRAPH Asia 2022)
- Progressive Disentangled Representation Learning for Fine-Grained Controllable Talking Head Synthesis
- WALDO: Future Video Synthesis using Object Layer Decomposition and Parametric Flow Prediction
- Efficient Feature Extraction for High-resolution Video Frame Interpolation (BMVC 2022)
- Dynamic Neural Portraits (WACV 2023)
- Make-A-Story: Visual Memory Conditioned Consistent Story Generation (CVPR 2023)
- Tell Me What Happened: Unifying Text-guided Video Completion via Multimodal Masked Video Generation (CVPR 2023)
- Hand Avatar: Free-Pose Hand Animation and Rendering from Monocular Video (CVPR 2023)
- SuperTran: Reference Based Video Transformer for Enhancing Low Bitrate Streams in Real Time
- Depth-Supervised NeRF for Multi-View RGB-D Operating Room Images
- SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation (CVPR 2023)
- FLEX: Full-Body Grasping Without Full-Body Grasps (CVPR 2023)
- Blur Interpolation Transformer for Real-World Motion from Blur (CVPR 2023)
- DyNCA: Real-time Dynamic Texture Synthesis Using Neural Cellular Automata (CVPR 2023)
- H-VFI: Hierarchical Frame Interpolation for Videos with Large Motions
- AdaFNIO: Adaptive Fourier Neural Interpolation Operator for video frame interpolation
- SPACE: Speech-driven Portrait Animation with Controllable Expression
- CaDM: Codec-aware Diffusion Modeling for Neural-enhanced Video Streaming
- Advancing Learned Video Compression with In-loop Frame Prediction (IEEE T-CSVT 2022)
- Common Pets in 3D: Dynamic New-View Synthesis of Real-Life Deformable Categories
- Temporal Consistency Learning of inter-frames for Video Super-Resolution (IEEE T-CSVT 2022)
- SyncTalkFace: Talking Face Generation with Precise Lip-Syncing via Audio-Lip Memory (AAAI 2022)
- Learning Variational Motion Prior for Video-based Motion Capture
- Streaming Radiance Fields for 3D Video Synthesis (NeurIPS 2022)
- Learning to forecast vegetation greenness at fine resolution over Africa with ConvLSTMs (NeurIPS 2022)
- EpipolarNVS: leveraging on Epipolar geometry for single-image Novel View Synthesis (BMVC 2022)
- Towards Real-Time Text2Video via CLIP-Guided, Pixel-Level Optimization
- Facial Expression Video Generation Based-On Spatio-temporal Convolutional GAN: FEV-GAN (ISWA)
- Temporal and Contextual Transformer for Multi-Camera Editing of TV Shows (ECCV 2022)
- MonoNeRF: Learning Generalizable NeRFs from Monocular Videos without Camera Pose (ICML 2023)
- Pre-Avatar: An Automatic Presentation Generation Framework Leveraging Talking Avatar (ICTAI 2022)
- AniFaceGAN: Animatable 3D-Aware Face Image Generation for Video Avatars (NeurIPS 2022)
- A Generalist Framework for Panoptic Segmentation of Images and Videos
- Masked Motion Encoding for Self-Supervised Video Representation Learning (CVPR 2023)
- SlotFormer: Unsupervised Visual Dynamics Simulation with Object-Centric Models (ICLR 2023)
- Controllable Radiance Fields for Dynamic Face Synthesis (3DV 2022)
- A unified model for continuous conditional video prediction (CVPR 2023)
- DeepHS-HDRVideo: Deep High Speed High Dynamic Range Video Reconstruction (ICPR 2022)
- Self-supervised Video Representation Learning with Motion-Aware Masked Autoencoders
- Compressing Video Calls using Synthetic Talking Heads (BMVC 2022)
- Audio-Visual Face Reenactment (WACV 2023)
- Geometry Driven Progressive Warping for One-Shot Face Animation
- Cross-identity Video Motion Retargeting with Joint Transformation and Synthesis (WACV 2023)
- Real-RawVSR: Real-World Raw Video Super-Resolution with a Benchmark Dataset (ECCV 2022)
- VToonify: Controllable High-Resolution Portrait Video Style Transfer (SIGGRAPH Asia 2022)
- NeuralMarker: A Framework for Learning General Marker Correspondence (SIGGRAPH Asia 2022)
- Continuously Controllable Facial Expression Editing in Talking Face Videos
- A Deep Moving-camera Background Model (ECCV 2022)
- HARP: Autoregressive Latent Video Prediction with High-Fidelity Image Generator (ICIP 2022)
- Talking Head from Speech Audio using a Pre-trained Image Generator (ACM Multimedia 2022)
- Treating Motion as Option to Reduce Motion Dependency in Unsupervised Video Object Segmentation (WACV 2023)
- Neural Sign Reenactor: Deep Photorealistic Sign Language Retargeting (CVPR 2023)
- REMOT: A Region-to-Whole Framework for Realistic Human Motion Transfer (ACMMM 2022)
- SketchBetween: Video-to-Video Synthesis for Sprite Animation via Sketches (ACM conference on the Foundations of Digital Games)
- StableFace: Analyzing and Improving Motion Stability for Talking Face Generation
- Neural Novel Actor: Learning a Generalized Animatable Neural Representation for Human Actors
- StyleTalker: One-shot Style-based Audio-driven Talking Head Video Generation
- Towards MOOCs for Lipreading: Using Synthetic Talking Heads to Train Humans in Lipreading at Scale (WACV 2023)
- Temporal View Synthesis of Dynamic Scenes through 3D Object Motion Estimation with Multi-Plane Images (ISMAR 2022)
- Wildfire Forecasting with Satellite Images and Deep Generative Model
- Video Interpolation by Event-driven Anisotropic Adjustment of Optical Flow (ECCV 2022)
- Extreme-scale Talking-Face Video Upsampling with Audio-Visual Priors (ACMMM 2022)
- Semi-Supervised Video Inpainting with Cycle Consistency Constraints
- UAV-CROWD: Violent and non-violent crowd activity simulator from the perspective of UAV
- Cine-AI: Generating Video Game Cutscenes in the Style of Human Directors (ACMHCI)
- Language-Guided Face Animation by Recurrent StyleGAN-based Generator
- Boosting neural video codecs by exploiting hierarchical redundancy
- PS-NeRV: Patch-wise Stylized Neural Representations for Videos
- Real-time Gesture Animation Generation from Speech for Virtual Human Interaction (CHI EA 2021)
- Meta-Interpolation: Time-Arbitrary Frame Interpolation via Dual Meta-Learning
- Efficient Video Deblurring Guided by Motion Magnitude (ECCV 2022)
- Learning Dynamic Facial Radiance Fields for Few-Shot Talking Head Synthesis (ECCV 2022)
- InfiniteNature-Zero: Learning Perpetual View Generation of Natural Scenes from Single Images (ECCV 2022)
- RealFlow: EM-based Realistic Optical Flow Dataset Generation from Videos (ECCV 2022 Oral)
- Towards Interpretable Video Super-Resolution via Alternating Optimization (ECCV 2022)
- Error Compensation Framework for Flow-Guided Video Inpainting (ECCV 2022)
- Animation from Blur: Multi-modal Blur Decomposition with Motion Guidance (ECCV 2022)
- TTVFI: Learning Trajectory-Aware Transformer for Video Frame Interpolation (CVPR 2022 Oral)
- Audio Input Generates Continuous Frames to Synthesize Facial Video Using Generative Adversarial Networks
- Neighbor Correspondence Matching for Flow-based Video Frame Synthesis (ACMMM 2022)
- You Only Align Once: Bidirectional Interaction for Spatial-Temporal Video Super-Resolution (ACMMM 2022)
- CANF-VC: Conditional Augmented Normalizing Flows for Video Compression
- A Probabilistic Model Of Interaction Dynamics for Dyadic Face-to-Face Settings
- Jointly Harnessing Prior Structures and Temporal Consistency for Sign Language Video Generation
- Segmenting Moving Objects via an Object-Centric Layered Representation (NeurIPS 2022)
- Programmatic Concept Learning for Human Motion Description and Synthesis (CVPR 2022)
- Optimizing Video Prediction via Video Frame Interpolation (CVPR 2022)
- Perceptual Conversational Head Generation with Regularized Driver and Enhanced Renderer (ACMMM 2022)
- Enhanced Bi-directional Motion Estimation for Video Frame Interpolation (WACV 2023)
- Face-Dubbing++: Lip-Synchronous, Voice Preserving Translation of Videos
- STIP: A SpatioTemporal Information-Preserving and Perception-Augmented Model for High-Resolution Video Prediction (CVPR 2022)
- JNMR: Joint Non-linear Motion Regression for Video Frame Interpolation
- SimVP: Simpler yet Better Video Prediction (CVPR 2022)
- Recurrent Video Restoration Transformer with Guided Deformable Attention (NeurIPS 2022)
- Cascaded Video Generation for Videos In-the-Wild (ICPR 2022)
- D$^2$NeRF: Self-Supervised Decoupling of Dynamic and Static Objects from a Monocular Video
- TubeFormer-DeepLab: Video Mask Transformer (CVPR 2022)
- IFRNet: Intermediate Feature Refine Network for Efficient Frame Interpolation (CVPR 2022)
- Feature-Aligned Video Raindrop Removal with Temporal Constraints
- Future Transformer for Long-term Action Anticipation (CVPR 2022)
- Video2StyleGAN: Disentangling Local and Global Variations in a Video
- Automatic Generation of Synthetic Colonoscopy Videos for Domain Randomization
- Latent-space disentanglement with untrained generator networks for the isolation of different motion types in video data
- Video Frame Interpolation with Transformer (CVPR 2022)
- Multi-encoder Network for Parameter Reduction of a Kernel-based Interpolation Architecture (NTIRE)
- Diverse Video Generation from a Single Video (CVPR 2022)
- Video-ReTime: Learning Temporally Varying Speediness for Time Remapping (AICC)
- Spatial-Temporal Space Hand-in-Hand: Spatial-Temporal Video Super-Resolution via Cycle-Projected Mutual Learning
- Image2Gif: Generating Continuous Realistic Animations with Warping NODEs (CVPR 2022)
- GAN-Based Multi-View Video Coding with Spatio-Temporal EPI Reconstruction
- Video Extrapolation in Space and Time (ECCV 2022)
- Zero-Episode Few-Shot Contrastive Predictive Coding: Solving intelligence tests without prior training
- Copy Motion From One to Another: Fake Motion Video Generation
- Neural Implicit Representations for Physical Parameter Inference from a Single Video (WACV 2023)
- Talking Head Generation Driven by Speech-Related Facial Action Units and Audio- Based on Multimodal Representation Fusion (BMVC 2021)
- ClothFormer: Taming Video Virtual Try-on in All Module (CVPR 2022 Oral)
- STAU: A SpatioTemporal-Aware Unit for Video Prediction and Beyond (TPAMI)
- Sound-Guided Semantic Video Generation (ECCV 2022)
- Learning to Listen: Modeling Non-Deterministic Dyadic Facial Motion
- MUGEN: A Playground for Video-Audio-Text Multimodal Understanding and GENeration
- Controllable Video Generation through Global and Local Motion Dynamics
- Dynamic Neural Textures: Generating Talking-Face Videos with Continuously Controllable Expressions
- Self-Supervised Traffic Advisors: Distributed, Multi-view Traffic Prediction for Smart Cities (ITSC)
- Structure-Aware Motion Transfer with Deformable Anchor Model (CVPR 2022)
- HSTR-Net: High Spatio-Temporal Resolution Video Generation For Wide Area Surveillance
- SunStage: Portrait Reconstruction and Relighting using the Sun as a Light Stage (CVPR 2023)
- Many-to-many Splatting for Efficient Video Frame Interpolation (CVPR 2022)
- Video Demoireing with Relation-Based Temporal Consistency (CVPR 2022)
- Neural Rendering of Humans in Novel View and Pose from Monocular Video
- MPS-NeRF: Generalizable 3D Human Rendering from Multiview Images (TPAMI 2022)
- Foveation-based Deep Video Compression without Motion Search
- STRPM: A Spatiotemporal Residual Predictive Model for High-Resolution Video Prediction (CVPR 2022)
- High-resolution Face Swapping via Latent Semantics Disentanglement (CVPR 2022)
- VPTR: Efficient Transformers for Video Prediction (ICPR 2022)
- Long-term Video Frame Interpolation via Feature Propagation (CVPR 2022)
- Signing at Scale: Learning to Co-Articulate Signs for Large-Scale Photo-Realistic Sign Language Production
- Dressing in the Wild by Watching Dance Videos (CVPR 2022)
- Structured Local Radiance Fields for Human Avatar Modeling (CVPR 2022)
- V3GAN: Decomposing Background, Foreground and Motion for Video Generation
- VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training (NeurIPS 2022)
- Unifying Motion Deblurring and Frame Interpolation with Events (CVPR 2022)
- QS-Craft: Learning to Quantize, Scrabble and Craft for Conditional Human Motion Animation
- Sem2NeRF: Converting Single-View Semantic Masks to Neural Radiance Fields (ECCV 2022)
- Stochastic Video Prediction with Structure and Motion (TPAMI)
- Exploring Motion Ambiguity and Alignment for High-Quality Video Frame Interpolation
- Beyond a Video Frame Interpolator: A Space Decoupled Learning Approach to Continuous Image Transition
- Transframer: Arbitrary Frame Prediction with Generative Models
- Look Outside the Room: Synthesizing A Consistent Long-Term 3D Scene Video from A Single Image (CVPR 2022)
- MSPred: Video Prediction at Multiple Spatio-Temporal Scales with Hierarchical Recurrent Networks
- Latent Image Animator: Learning to Animate Images via Latent Space Navigation (ICLR 2022)
- DialogueNeRF: Towards Realistic Avatar Face-to-face Conversation Video Generation
- One-stage Video Instance Segmentation: From Frame-in Frame-out to Clip-in Clip-out
- A Novel Dual Dense Connection Network for Video Super-resolution
- Region-of-Interest Based Neural Video Compression (BMVC 2022)
- Thinking the Fusion Strategy of Multi-reference Face Reenactment (ICIP 2022)
- Neural Marionette: Unsupervised Learning of Motion Skeleton and Latent Dynamics from Volumetric Video (AAAI 2022)
- Enhancing Deformable Convolution based Video Frame Interpolation with Coarse-to-fine 3D CNN
- Exploring Discontinuity for Video Frame Interpolation (CVPR 2023)
- A new face swap method for image and video domains: a technical report
- Third Time's the Charm? Image and Video Editing with StyleGAN3
- Deep Video Prior for Video Consistency and Propagation (TPAMI 2021)
- Non-linear Motion Estimation for Video Frame Interpolation using Space-time Convolutions (CLIC, CVPR 2022)
- Splatting-based Synthesis for Video Frame Interpolation (WACV 2023)
- Towards Realistic Visual Dubbing with Heterogeneous Sources (ACMMM 2021)
- Audio-Driven Talking Face Video Generation with Dynamic Convolution Kernels (IEEE)
- Learning Temporally and Semantically Consistent Unpaired Video-to-video Translation Through Pseudo-Supervision From Synthetic Optical Flow (AAAI 2022)
- MetaDance: Few-shot Dancing Video Retargeting via Temporal-aware Meta-learning
- Music2Video: Automatic Generation of Music Video with fusion of audio and text
- MobileFaceSwap: A Lightweight Framework for Video Face Swapping (AAAI 2022)
- Structured 3D Features for Reconstructing Controllable Avatars (CVPR 2023)
- MoFusion: A Framework for Denoising-Diffusion-based Motion Synthesis (CVPR 2023)
- High-fidelity Facial Avatar Reconstruction from Monocular Video with Generative Priors
- 3DDesigner: Towards Photorealistic 3D Object Generation and Editing with Text-guided Diffusion Models
- Audio-visual video face hallucination with frequency supervision and cross modality support by speech based lip reading loss
- It Takes Two: Masked Appearance-Motion Modeling for Self-supervised Video Transformer Pre-training
- See, Plan, Predict: Language-guided Cognitive Planning with Video Prediction
- Motion Transformer for Unsupervised Image Animation (ECCV 2022)
- Neural Capture of Animatable 3D Human from Monocular Video (ECCV 2022)
- NDF: Neural Deformable Fields for Dynamic Human Modelling (ECCV 2022)
- Diverse Dance Synthesis via Keyframes with Transformer Controllers
- Unsupervised Coherent Video Cartoonization with Perceptual Motion Consistency
- Learning Multi-Object Dynamics with Compositional Neural Radiance Fields (CoRL 2022)
- Video Frame Interpolation without Temporal Priors (NeurIPS 2020)
- ST-MFNet: A Spatio-Temporal Multi-Flow Network for Frame Interpolation (CVPR 2022)
- Video Frame Interpolation Transformer (CVPR 2022)
- Improving the Perceptual Quality of 2D Animation Interpolation (ECCV 2022)
- Render In-between: Motion Guided Video Synthesis for Action Interpolation
- Flow-Guided Video Inpainting with Scene Templates (ICCV 2021)
- Asymmetric Bilateral Motion Estimation for Video Frame Interpolation (ICCV 2021)
- EA-Net: Edge-Aware Network for Flow-based Video Frame Interpolation
- Zooming SlowMo: An Efficient One-Stage Framework for Space-Time Video Super-Resolution (CVPR 2020)
- PDWN: Pyramid Deformable Warping Network for Video Interpolation
- Motion-blurred Video Interpolation and Extrapolation (AAAI 2021)
- MUGL: Large Scale Multi Person Conditional Action Generation with Locomotion (WACV 2022)
- LARNet: Latent Action Representation for Human Action Synthesis (ICLR 2022)
- Synthetic Data for Multi-Parameter Camera-Based Physiological Sensing
- Physics-based Human Motion Estimation and Synthesis from Videos (ICCV 2021)
- Deep Person Generation: A Survey from the Perspective of Face, Pose and Cloth Synthesis
- Sparse to Dense Motion Transfer for Face Image Animation (ICCV 2021)
- FLAME-in-NeRF: Neural control of Radiance Fields for Free View Face Animation
- Robust Pose Transfer with Dynamic Details using Neural Video Rendering
- Gradient Forward-Propagation for Large-Scale Temporal Video Modelling (CVPR 2021)
- Behavior-Driven Synthesis of Human Dynamics (CVPR 2021)
- AI Choreographer: Music Conditioned 3D Dance Generation with AIST++
- Neural Point Light Fields (CVPR 2022)
- Human Pose Manipulation and Novel View Synthesis using Differentiable Rendering
- Temporal-MPI: Enabling Multi-Plane Images for Dynamic Scene Modelling via Temporal Basis Learning (ECCV 2022)
- H-NeRF: Neural Radiance Fields for Rendering and Temporal Reconstruction of Humans in Motion
- Pose-guided Generative Adversarial Net for Novel View Action Synthesis (WACV 2022)
- Neural Human Performer: Learning Generalizable Radiance Fields for Human Performance Rendering
- View Synthesis of Dynamic Scenes based on Deep 3D Mask Volume (ICCV 2021)
- Target Adaptive Context Aggregation for Video Scene Graph Generation (ICCV 2021)
- Novel View Video Prediction Using a Dual Representation (ICIP 2021)
- Neural Actor: Neural Free-view Synthesis of Human Actors with Pose Control (SIGGRAPH Asia 2021)
- Stylizing 3D Scene via Implicit Representation and HyperNetwork (WACV 2022)
- LUMINOUS: Indoor Scene Generation for Embodied AI Challenges
- NeuralDiff: Segmenting 3D objects that move in egocentric videos (3DV 2021)
- Talking Head Generation with Audio and Speech Related Facial Action Units (BMVC 2021)
- FACIAL: Synthesizing Dynamic Talking Face with Implicit Attribute Learning (ICCV 2021)
- Audio2Head: Audio-driven One-shot Talking-head Generation with Natural Head Motion
- Speech2Video: Cross-Modal Distillation for Speech to Video Generation (ACCV 2020)
- LipSync3D: Data-Efficient Learning of Personalized 3D Talking Faces from Video using Pose and Lighting Normalization (CVPR 2021)
- Temporally coherent video anonymization through GAN inpainting (FG 2021)
- 3D-TalkEmo: Learning to Synthesize 3D Emotional Talking Head
- Pose-Controllable Talking Face Generation by Implicitly Modularized Audio-Visual Representation (CVPR 2021)
- MeshTalk: 3D Face Animation from Speech using Cross-Modality Disentanglement (CVPR 2021)
- Audio-Driven Emotional Video Portraits (CVPR 2021)
- AD-NeRF: Audio Driven Neural Radiance Fields for Talking Head Synthesis (ICCV 2021)
- Imitating Arbitrary Talking Style for Realistic Audio-Driven Talking Face Synthesis (MM 2021)
- Dance In the Wild: Monocular Human Animation with Neural Dynamic Appearance Synthesis
- Intelligent Video Editing: Incorporating Modern Talking Face Generation Algorithms in a Video Editor (ICVGIP 2021)
- Self-Supervised Decomposition, Disentanglement and Prediction of Video Sequences while Interpreting Dynamics: A Koopman Perspective
- Temporally Coherent Person Matting Trained on Fake-Motion Dataset
- Occlusion-Aware Video Object Inpainting (ICCV 2021)
- UniFaceGAN: A Unified Framework for Temporally Consistent Facial Video Editing (IEEE TIP 2021)
- Learning to Cut by Watching Movies (ICCV 2021)
- VCGAN: Video Colorization with Hybrid Generative Adversarial Network (IEEE TMM 2021)
- M3L: Language-based Video Editing via Multi-Modal Multi-Level Transformers (CVPR 2022)
- Personal Privacy Protection via Irrelevant Faces Tracking and Pixelation in Video Live Streaming
- HDRVideo-GAN: Deep Generative HDR Video Reconstruction (ICVGIP 2021)
- Layered Controllable Video Generation (ECCV 2022)
- Xp-GAN: Unsupervised Multi-object Controllable Video Generation
- Action2video: Generating Videos of Human 3D Actions (IJCV 2022)
- Image Comes Dancing with Collaborative Parsing-Flow Video Synthesis (TIP 2021)
- Video Autoencoder: self-supervised disentanglement of static 3D structure and motion (ICCV 2021)
- Learning Fine-Grained Motion Embedding for Landscape Animation (ACM Multimedia 2021)
- Conditional Temporal Variational AutoEncoder for Action Video Prediction (ECCV 2018)
- iButter: Neural Interactive Bullet Time Generator for Human Free-viewpoint Rendering (ACM MM 2021)
- RockGPT: Reconstructing three-dimensional digital rocks from single two-dimensional slice from the perspective of video generation
- Video Generation from Text Employing Latent Path Construction for Temporal Modeling
- StyleVideoGAN: A Temporal Generative Model using a Pretrained StyleGAN
- Cross-View Exocentric to Egocentric Video Synthesis (ACM MM 2021)
- iPOKE: Poking a Still Image for Controlled Stochastic Video Synthesis (ICCV 2021)
- Efficient training for future video generation based on hierarchical disentangled representation of latent variables
- Stochastic Image-to-Video Synthesis using cINNs (CVPR 2021)
- Strumming to the Beat: Audio-Conditioned Contrastive Video Textures (WACV 2022)
- Collaborative Learning to Generate Audio-Video Jointly (ICASSP 2021)
- Learning to compose 6-DoF omnidirectional videos using multi-sphere images
- Neural 3D Video Synthesis from Multi-view Video (CVPR 2022)
- Dual-MTGAN: Stochastic and Deterministic Motion Transfer for Image-to-Video Synthesis (ICPR 2020)
- TräumerAI: Dreaming Music with StyleGAN (NeurIPS Workshop 2020)
- ArrowGAN: Learning to Generate Videos by Learning Arrow of Time
- InMoDeGAN: Interpretable Motion Decomposition Generative Adversarial Network for Video Generation
- Two-stage Rule-induction Visual Reasoning on RPMs with an Application to Video Prediction
- FREGAN: an application of generative adversarial networks in enhancing the frame rate of videos
- TaylorSwiftNet: Taylor Driven Temporal Modeling for Swift Future Frame Prediction
- Fourier-based Video Prediction through Relational Object Motion
- A Hierarchical Variational Neural Uncertainty Model for Stochastic Video Prediction
- Unsupervised Video Prediction from a Single Frame by Estimating 3D Dynamic Scene Structure
- Conditional COT-GAN for Video Prediction with Kernel Smoothing
- Anticipative Video Transformer (ICCV 2021)
- Taylor saves for later: disentanglement for video prediction using Taylor representation
- Local Frequency Domain Transformer Networks for Video Prediction
- Hierarchical Motion Understanding via Motion Programs (CVPR 2021)
- Learning Semantic-Aware Dynamics for Video Prediction (CVPR 2021)
- Revisiting Hierarchical Approach for Persistent Long-Term Video Prediction (ICLR 2021)
- Video Prediction Recalling Long-term Motion Context via Memory Alignment Learning (CVPR 2021)
- Future Frame Prediction for Robot-assisted Surgery (IPMI 2021)
- Greedy Hierarchical Variational Autoencoders for Large-Scale Video Prediction
- MotionRNN: A Flexible Model for Video Prediction with Spacetime-Varying Motions (CVPR 2021)
- VAE^2: Preventing Posterior Collapse of Variational Video Predictions in the Wild
- A Stacking Ensemble Approach for Supervised Video Summarization
- ERA: Entity Relationship Aware Video Summarization with Wasserstein GAN
- Unsupervised Video Summarization with a Convolutional Attentive Adversarial Network
- Reconstructive Sequence-Graph Network for Video Summarization (IEEE TPAMI 2021)
- Creating and Reenacting Controllable 3D Humans with Differentiable Rendering (WACV 2022)
- I2V-GAN: Unpaired Infrared-to-Visible Video Translation (ACM MM 2021)
- Moving SLAM: Fully Unsupervised Deep Learning in Non-Rigid Scenes
- Long-Term Temporally Consistent Unpaired Video Translation from Simulated Surgical 3D Data (ICCV 2021)
- A Shape-Aware Retargeting Approach to Transfer Human Motion and Appearance in Monocular Videos (IJCV 2021)
- Non-Rigid Neural Radiance Fields: Reconstruction and Novel View Synthesis of a Dynamic Scene From Monocular Video
- NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion
- Generative Adversarial Graph Convolutional Networks for Human Action Synthesis (WACV 2022)
- Towards Using Clothes Style Transfer for Scenario-aware Person Video Generation
- Latent Image Animator: Learning to animate image via latent space navigation (ICLR 2022)
- SLAMP: Stochastic Latent Appearance and Motion Prediction (ICCV 2021)
- VirtualConductor: Music-driven Conducting Video Generation System (ICME 2021)
- Click to Move: Controlling Video Generation with Sparse Motion (ICCV 2021)
- Understanding Object Dynamics for Interactive Image-to-Video Synthesis (CVPR 2021)
- One-Shot Free-View Neural Talking-Head Synthesis for Video Conferencing (CVPR 2021)
- Flow Guided Transformable Bottleneck Networks for Motion Retargeting (CVPR 2021)
- Stable View Synthesis (CVPR 2021)
- Scene-Aware Generative Network for Human Motion Synthesis (CVPR 2021)
- Neural Scene Flow Fields for Space-Time View Synthesis of Dynamic Scenes (CVPR 2021)
- Deep Animation Video Interpolation in the Wild (CVPR 2021)
- High-Fidelity Neural Human Motion Transfer from Monocular Video (CVPR 2021)
- Flow-Guided One-Shot Talking Face Generation With a High-Resolution Audio-Visual Dataset (CVPR 2021)
- Layout-Guided Novel View Synthesis From a Single Indoor Panorama (CVPR 2021)
- Space-Time Neural Irradiance Fields for Free-Viewpoint Video (CVPR 2021)
- GeoSim: Realistic Video Simulation via Geometry-Aware Composition for Self-Driving (CVPR 2021)
- Animating Pictures With Eulerian Motion Fields (CVPR 2021)
- CCVS: Context-aware Controllable Video Synthesis (NeurIPS 2021)
- Diverse Video Generation using a Gaussian Process Trigger (ICLR 2021)
- NWT: Towards natural audio-to-video generation with representation learning
- Editable Free-viewpoint Video Using a Layered Neural Representation
- A Good Image Generator Is What You Need for High-Resolution Video Synthesis
- GODIVA: Generating Open-DomaIn Videos from nAtural Descriptions
- Text2Video: Text-driven Talking-head Video Synthesis with Personalized Phoneme-Pose Dictionary
- Write-a-speaker: Text-based Emotional and Rhythmic Talking-head Generation
- Playable Video Generation (CVPR 2021)
- Infinite Nature: Perpetual View Generation of Natural Scenes from a Single Image (ICCV 2021)
- Vid-ODE: Continuous-Time Video Generation with Neural Ordinary Differential Equation (AAAI 2021)
- Compositional Video Synthesis with Action Graphs (ICML 2021)
- Temporal Shift GAN for Large Scale Video Generation (WACV 2021)
- Learning Speech-driven 3D Conversational Gestures from Video
- SLPC: a VRNN-based approach for stochastic lidar prediction and completion in autonomous driving
- Self-Supervision by Prediction for Object Discovery in Videos
- Modulated Periodic Activations for Generalizable Local Functional Representations (ICCV 2021)
- Dynamic Texture Synthesis by Incorporating Long-range Spatial and Temporal Correlations
- GANs N' Roses: Stable, Controllable, Diverse Image to Image Translation (works for videos too!)
- Alias-Free Generative Adversarial Networks (NeurIPS 2021)
- Modeling Clothing as a Separate Layer for an Animatable Human Avatar
- CLIP-It! Language-Guided Video Summarization (NeurIPS 2021)
- Towards an Interpretable Latent Space in Structured Models for Video Prediction
- AnyoneNet: Synchronized Speech and Talking Head Generation for Arbitrary Person
- SPACE: A Simulator for Physical Interactions and Causal Learning in 3D Environments
- PIP: Physical Interaction Prediction via Mental Simulation with Span Selection
- Advancing High-Resolution Video-Language Representation with Large-Scale Video Transcriptions (CVPR 2022)
- Responsive Listening Head Generation: A Benchmark Dataset and Baseline (ECCV 2022)
- BANMo: Building Animatable 3D Neural Models from Many Casual Videos (CVPR 2022)
- Continuous-Time Video Generation via Learning Motion Dynamics with Neural ODE (BMVC 2021)
- SAGA: Stochastic Whole-Body Grasping with Contact (ECCV 2022)
- End-to-End Rate-Distortion Optimized Learned Hierarchical Bi-Directional Video Compression
- Enhanced Frame and Event-Based Simulator and Event-Based Video Interpolation Network
- Discrete neural representations for explainable anomaly detection (AAAI 2022)
- Controllable Animation of Fluid Elements in Still Images (CVPR 2022)
- One-shot Talking Face Generation from Single-speaker Audio-Visual Correlation Learning (AAAI 2022)
- Efficient Neural Radiance Fields for Interactive Free-viewpoint Video (SIGGRAPH Asia 2022)
- Dynamic View Synthesis from Dynamic Monocular Video (ICCV 2021)
- Stochastic Talking Face Generation Using Latent Distribution Matching
- Unsupervised object-centric video generation and decomposition in 3D (NeurIPS 2020)
- Novel-View Human Action Synthesis (ACCV 2020)
- Structure-Aware Human-Action Generation (ECCV 2020)
- Hierarchical Patch VAE-GAN: Generating Diverse Videos from a Single Sample (NeurIPS 2020)
- Audio-driven Talking Face Video Generation with Learning-based Personalized Head Pose
- Exploring Spatial-Temporal Multi-Frequency Analysis for High-Fidelity and Temporal-Consistency Video Prediction (CVPR 2020)
- Stochastic Latent Residual Video Prediction (ICML 2020)
- G3AN: Disentangling Appearance and Motion for Video Generation (CVPR 2020)
- Scaling Autoregressive Video Models (ICLR 2020)
- VideoFlow: A Conditional Flow-Based Model for Stochastic Video Generation (ICLR 2020)
- Music-oriented Dance Video Synthesis with Pose Perceptual Loss
- DwNet: Dense warp-based network for pose-guided human video generation
- Order Matters: Shuffling Sequence Generation for Video Prediction
- From Here to There: Video Inbetweening Using Direct 3D Convolutions
- Improved Conditional VRNNs for Video Prediction (ICCV 2019)
- Sliced Wasserstein Generative Models (CVPR 2019)
- Point-to-Point Video Generation (ICCV 2019)
- High Frame Rate Video Reconstruction based on an Event Camera
- Video Generation from Single Semantic Label Map (CVPR 2019)
- Learning to navigate image manifolds induced by generative adversarial networks for unsupervised video generation
- Animating Arbitrary Objects via Deep Motion Transfer (CVPR 2019)
- StoryGAN: A Sequential Conditional GAN for Story Visualization (CVPR 2019)
- Stochastic Adversarial Video Prediction (ICLR 2019)
- Learning Temporal Coherence via Self-Supervision for GAN-based Video Generation
- Towards High Resolution Video Generation with Progressive Growing of Sliced Wasserstein GANs
- Everybody Dance Now (ICCV 2019)
- Learning to Forecast and Refine Residual Motion for Image-to-Video Generation (ECCV 2018)
- Talking Face Generation by Conditional Recurrent Adversarial Network
- Probabilistic Video Generation using Holistic Attribute Control (ECCV 2018)
- Stochastic Video Generation with a Learned Prior (ICML 2018)
- Stochastic Variational Video Prediction (ICLR 2018)
- Hierarchical Video Generation from Orthogonal Information: Optical Flow and Texture (AAAI 2018)
- MoCoGAN: Decomposing Motion and Content for Video Generation (CVPR 2018)