A comprehensive collection of works on video generation/synthesis/prediction.
(Sources: MCVD, VideoFusion)
- Diffusion Models: A Comprehensive Survey of Methods and Applications
- Diffusion Models in Vision: A Survey (IEEE TPAMI 2023)
- What comprises a good talking-head video generation?: A Survey and Benchmark
- A Review on Deep Learning Techniques for Video Prediction (2020)
- InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation
- UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild
- LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs
- Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval (ICCV 2021)
- Self-Supervised Visual Planning with Temporal Skip Connections
- How2Sign: A Large-scale Multimodal Dataset for Continuous American Sign Language (CVPR 2021)
- Learning to Drive by Watching YouTube Videos: Action-Conditioned Contrastive Policy Pretraining (ECCV 2022)
- FakeCatcher: Detection of Synthetic Portrait Videos using Biological Signals
- DTVNet: Dynamic Time-lapse Video Generation via Single Still Image (ECCV 2020)
- Multi-StyleGAN: Towards Image-Based Simulation of Time-Lapse Live-Cell Microscopy
- DDH-QA: A Dynamic Digital Humans Quality Assessment Database
- TPA-Net: Generate A Dataset for Text to Physics-based Animation
- Touch and Go: Learning from Human-Collected Vision and Touch
- BVI-VFI: A Video Quality Database for Video Frame Interpolation
- Merkel Podcast Corpus: A Multimodal Dataset Compiled from 16 Years of Angela Merkel's Weekly Video Podcasts (LREC 2022)
- Controllable video generation (text-to-video, image-to-video)
  The "classic" task: create a video from scratch, i.e. starting from random noise, sometimes under simple conditions such as a text prompt or an image. Common goals include visual fidelity, temporal coherence, and logical plausibility.
- Video prediction and frame interpolation (video generation with visual constraints)
  Video prediction: predict the next N frames following a sequence of input video frames, or predict the N frames between given start and end frames; it can be viewed as a special case of video completion. Frame interpolation: improve the motion smoothness of low-frame-rate videos by inserting additional frames between existing ones. Some interpolation methods can also "insert" frames after the input frames, so they can technically perform video prediction to some extent. A minimal sketch of these task interfaces is given after this list.
- Novel view synthesis
  These methods usually reconstruct a 3D scene from some observations (e.g. a monocular video or static images) and then render the scene from new viewpoints.
- Human motion generation
  Video generation tasks specifically geared to human (or humanoid) activities.
- Talking head or face generation
  Talking head generation refers to animated video content that simulates a person's face and head movements while they are speaking.
- Video-to-video
  These include enhancing the (textural) quality of videos, style transfer, motion transfer, summarization, and various common video editing tasks (e.g. removal of a subject).
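
To make the input/output contracts of the first two task families concrete, here is a minimal, hypothetical sketch in PyTorch. The function names, tensor shapes, and trivial bodies (noise, zeros, linear blending) are illustrative placeholders, assuming a video is a tensor shaped (T, C, H, W); they are not the method of any paper listed below.

```python
import torch

def generate_video(num_frames: int = 16, frame_shape=(3, 64, 64)) -> torch.Tensor:
    """Generation from scratch: start from pure noise; a real model would
    iteratively denoise this, optionally conditioned on text or an image."""
    return torch.randn(num_frames, *frame_shape)

def predict_frames(context: torch.Tensor, n: int) -> torch.Tensor:
    """Video prediction: map T observed frames to the next n frames.
    A learned predictor goes here; zeros are a stand-in."""
    t, c, h, w = context.shape
    return torch.zeros(n, c, h, w)

def interpolate_frames(start: torch.Tensor, end: torch.Tensor, n: int) -> torch.Tensor:
    """Frame interpolation: n in-between frames. Linear blending is the
    trivial baseline; real methods estimate motion (e.g. optical flow)."""
    alphas = torch.linspace(0.0, 1.0, n + 2)[1:-1]  # n interior blend weights
    return torch.stack([(1 - a) * start + a * end for a in alphas])
```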
- Implicit Identity Representation Conditioned Memory Compensation Network for Talking Head Video Generation (ICCV 2023)
- Bidirectionally Deformable Motion Modulation For Video-based Human Pose Transfer
- Animate-A-Story: Storytelling with Retrieval-Augmented Video Generation
- GD-VDM: Generated Depth for better Diffusion-based Video Generation
- DDLP: Unsupervised Object-Centric Video Prediction with Deep Dynamic Latent Particles
- Video Diffusion Models with Local-Global Context Guidance (IJCAI 2023)
- Gen-L-Video: Multi-Text to Long Video Generation via Temporal Co-Denoising
- Large Language Models are Frame-level Directors for Zero-shot Text-to-Video Generation
- ControlVideo: Training-free Controllable Text-to-Video Generation
- VDT: An Empirical Study on Video Diffusion with Transformers
- DaGAN++: Depth-Aware Generative Adversarial Network for Talking Head Video Generation
- Sketching the Future (STF): Applying Conditional Control Techniques to Text-to-Video Models
- StyleAvatar: Real-time Photo-realistic Portrait Avatar from a Single Video
- Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models
- Generative Disco: Text-to-Video Generation for Music Visualization
- Follow Your Pose: Pose-Guided Text-to-Video Generation using Pose-Free Videos
- Sounding Video Generator: A Unified Framework for Text-guided Sounding Video Generation
- Fine-grained Audible Video Description (CVPR 2023)
- Conditional Image-to-Video Generation with Latent Flow Diffusion Models (CVPR 2023)
- Towards End-to-End Generative Modeling of Long Videos with Memory-Efficient Bidirectional Transformers (CVPR 2023)
- VideoFusion: Decomposed Diffusion Models for High-Quality Video Generation (CVPR 2023)
- MOSO: Decomposing MOtion, Scene and Object for Video Prediction (CVPR 2023)
- MotionVideoGAN: A Novel Video Generator Based on the Motion Space Learned from Image Pairs
- Time-Conditioned Generative Modeling of Object-Centric Representations for Video Decomposition and Prediction
- VIDM: Video Implicit Diffusion Models (AAAI 2023)
- MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation (CVPR 2023)
- Video Probabilistic Diffusion Models in Projected Latent Space (CVPR)
- Bidirectional Temporal Diffusion Model for Temporally Consistent Human Animation
- DisCo: Disentangled Control for Referring Human Dance Generation in Real World
- PVP: Personalized Video Prior for Editable Dynamic Portraits using StyleGAN
- BEDLAM: A Synthetic Dataset of Bodies Exhibiting Detailed Lifelike Animated Motion (CVPR)
- Envisioning a Next Generation Extended Reality Conferencing System with Efficient Photorealistic Human Rendering (CVPR)
- Reprogramming Audio-driven Talking Face Synthesis into Text-driven
- Self-supervised Learning of Event-guided Video Frame Interpolation for Rolling Shutter Frames
- VidEdit: Zero-Shot and Spatially Aware Text-Driven Video Editing
- DORSal: Diffusion for Object-centric Representations of Scenes
- Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation
- Generative Semantic Communication: Diffusion Models Beyond Bit Recovery
- Instruct-Video2Avatar: Video-to-Avatar Generation with Instructions
- MoviePuzzle: Visual Narrative Reasoning through Multimodal Order Learning
- Video Colorization with Pre-trained Text-to-Image Diffusion Models
- Temporal-controlled Frame Swap for Generating High-Fidelity Stereo Driving Data for Autonomy Analysis
- Adjustable Visual Appearance for Generalizable Novel View Synthesis
- 4DSR-GCN: 4D Video Point Cloud Upsampling using Graph Convolutional Networks
- Intelligent Grimm -- Open-ended Visual Storytelling via Latent Diffusion Models
- MammalNet: A Large-scale Video Benchmark for Mammal Recognition and Behavior Understanding
- Exploring Phonetic Context in Lip Movement for Authentic Talking Face Generation
- Video ControlNet: Towards Temporally Consistent Synthetic-to-Real Video Translation Using Conditional Image Diffusion Models
- Context-Preserving Two-Stage Video Domain Translation for Portrait Stylization
- OD-NeRF: Efficient Training of On-the-Fly Dynamic Neural Radiance Fields
- EgoVSR: Towards High-Quality Egocentric Video Super-Resolution
- NegVSR: Augmenting Negatives for Generalized Noise Modeling in Real-World Video Super-Resolution
- Video Prediction Models as Rewards for Reinforcement Learning
- Reparo: Loss-Resilient Generative Codec for Video Conferencing
- CPNet: Exploiting CLIP-based Attention Condenser and Probability Map Guidance for High-fidelity Talking Face Generation (ICME)
- InstructVid2Vid: Controllable Video Editing with Natural Language Instructions
- SlotDiffusion: Object-Centric Generative Modeling with Diffusion Models (ICLR Workshop)
- IDO-VFI: Identifying Dynamics via Optical Flow Guidance for Video Frame Interpolation with Events
- Light-VQA: A Multi-Dimensional Quality Assessment Model for Low-Light Video Enhancement
- Laughing Matters: Introducing Laughing-Face Generation using Diffusion Models
- Identity-Preserving Talking Face Generation with Landmark and Appearance Priors (CVPR)
- HumanRF: High-Fidelity Neural Radiance Fields for Humans in Motion
- Style-A-Video: Agile Diffusion for Arbitrary Text-based Video Style Transfer
- NeuralEditor: Editing Neural Radiance Fields via Manipulating Point Clouds (CVPR)
- Video Frame Interpolation with Densely Queried Bilateral Correlation (IJCAI)
- Dynamic Video Frame Interpolation with integrated Difficulty Pre-Assessment
- AMT: All-Pairs Multi-Field Transforms for Efficient Frame Interpolation (CVPR)
- Latent-Shift: Latent Diffusion with Temporal Shift for Efficient Text-to-Video Generation
- CAT-NeRF: Constancy-Aware Tx$^2$Former for Dynamic Body Modeling (CVPR Workshop)
- Boosting Video Object Segmentation via Space-time Correspondence Learning (CVPR)
- VidStyleODE: Disentangled Video Editing via StyleGAN and NeuralODEs
- MED-VT: Multiscale Encoder-Decoder Video Transformer with Application to Object Segmentation (CVPR)
- Neural Image-based Avatars: Generalizable Radiance Fields for Human Avatar Modeling
- That's What I Said: Fully-Controllable Talking Face Generation
- BiFormer: Learning Bilateral Motion Estimation via Bilateral Transformer for 4K Video Frame Interpolation (CVPR)
- TalkCLIP: Talking Head Generation with Text-Guided Expressive Speaking Styles
- FONT: Flow-guided One-shot Talking Head Generation with Natural Head Motions (ICME)
- Zero-Shot Video Editing Using Off-The-Shelf Image Diffusion Models
- Consistent View Synthesis with Pose-Guided Diffusion Models (CVPR)
- DAE-Talker: High Fidelity Speech-Driven Talking Face Generation with Diffusion Autoencoder
- Novel View Synthesis of Humans using Differentiable Rendering
- GestureDiffuCLIP: Gesture Diffusion Model with CLIP Latents (SIGGRAPH)
- NeRF-DS: Neural Radiance Fields for Dynamic Specular Objects (CVPR)
- HandNeRF: Neural Radiance Fields for Animatable Interacting Hands (CVPR)
- Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Generators
- Pre-NeRF 360: Enriching Unbounded Appearances for Neural Radiance Fields
- Tubelet-Contrastive Self-Supervision for Video-Efficient Generalization
- Confidence Attention and Generalization Enhanced Distillation for Continuous Video Domain Adaptation
- MoRF: Mobile Realistic Fullbody Avatars from a Monocular Video
- Unified Mask Embedding and Correspondence Learning for Self-Supervised Video Segmentation
- Learning Data-Driven Vector-Quantized Degradation Model for Animation Video Super-Resolution
- FateZero: Fusing Attentions for Zero-shot Text-based Video Editing
- LDMVFI: Video Frame Interpolation with Latent Diffusion Models
- Learning Physical-Spatio-Temporal Features for Video Shadow Removal
- NLUT: Neural-based 3D Lookup Tables for Video Photorealistic Style Transfer
- Blowing in the Wind: CycleNet for Human Cinemagraphs from Still Images (CVPR)
- Blind Video Deflickering by Neural Filtering with a Flawed Atlas (CVPR)
- Butterfly: Multiple Reference Frames Feature Propagation Mechanism for Neural Video Compression (DCC)
- One-Shot Video Inpainting (AAAI)
- Continuous Space-Time Video Super-Resolution Utilizing Long-Range Temporal Information
- Learning Neural Volumetric Representations of Dynamic Humans in Minutes (CVPR)
- OPT: One-shot Pose-Controllable Talking Head Generation (ICASSP)
- One-Shot Face Video Re-enactment using Hybrid Latent Spaces of StyleGAN2
- Video Waterdrop Removal via Spatio-Temporal Fusion in Driving Scenes
- Structure and Content-Guided Video Synthesis with Diffusion Models
- AV-NeRF: Learning Neural Fields for Real-World Audio-Visual Scene Synthesis
- Maximal Cliques on Multi-Frame Proposal Graph for Unsupervised Video Object Segmentation
- Optical Flow Estimation in 360$^\circ$ Videos: Dataset, Model and Application
- Regeneration Learning: A Learning Paradigm for Data Generation
- Event-Based Frame Interpolation with Ad-hoc Deblurring (CVPR)
- DiffTalk: Crafting Diffusion Models for Generalized Audio-Driven Portraits Animation (CVPR)
- Diffused Heads: Diffusion Models Beat GANs on Talking-Face Generation
- HyperReel: High-Fidelity 6-DoF Video with Ray-Conditioned Sampling (CVPR)
- StyleTalk: One-shot Talking Head Generation with Controllable Speaking Styles (AAAI)
- Detachable Novel Views Synthesis of Dynamic Scenes Using Distribution-Driven Neural Radiance Fields
- SkyGPT: Probabilistic Short-term Solar Forecasting Using Synthetic Sky Videos from Physics-constrained VideoGPT
- MovieFactory: Automatic Movie Creation from Text using Large Generative Models for Language and Images
- Emotional Talking Head Generation based on Memory-Sharing and Attention-Augmented Networks
- Neural Foundations of Mental Simulation: Future Prediction of Latent Representations on Dynamic Scenes
- Avatar Fingerprinting for Authorized Use of Synthetic Talking-Head Videos
- DynamicStereo: Consistent Dynamic Depth from Stereo Videos (CVPR)
- ActorsNeRF: Animatable Few-shot Human Rendering with Generalizable NeRFs
- Total-Recon: Deformable Scene Reconstruction for Embodied View Synthesis
- 3D-IntPhys: Towards More Generalized 3D-grounded Visual Intuitive Physics under Challenging Scenes (CVPR)
- Leveraging triplet loss for unsupervised action segmentation (CVPR)
- MonoHuman: Animatable Human Neural Field from Monocular Video (CVPR)
- Trace and Pace: Controllable Pedestrian Animation via Guided Trajectory Diffusion (CVPR)
- Seeing What You Said: Talking Face Generation Guided by a Lip Reading Expert (CVPR)
- VIVE3D: Viewpoint-Independent Video Editing using 3D-Aware GANs (CVPR)
- CAMS: CAnonicalized Manipulation Spaces for Category-Level Functional Hand-Object Manipulation Synthesis (CVPR)
- Prediction of the morphological evolution of a splashing drop using an encoder-decoder
- Dual-path Adaptation from Image to Video Transformers (CVPR)
- IntrinsicNGP: Intrinsic Coordinate based Hash Encoding for Human NeRF
- Implicit Identity Representation Conditioned Memory Compensation Network for Talking Head Video Generation (ICCV 2023)
- FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
- Bidirectionally Deformable Motion Modulation For Video-based Human Pose Transfer
- Animate-A-Story: Storytelling with Retrieval-Augmented Video Generation
- GD-VDM: Generated Depth for better Diffusion-based Video Generation
- DDLP: Unsupervised Object-Centric Video Prediction with Deep Dynamic Latent Particles
- Video Diffusion Models with Local-Global Context Guidance (IJCAI 2023)
- Gen-L-Video: Multi-Text to Long Video Generation via Temporal Co-Denoising
- Large Language Models are Frame-level Directors for Zero-shot Text-to-Video Generation
- VIDM: Video Implicit Diffusion Models (AAAI 2023)
- MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation (CVPR 2023)
- VideoFusion: Decomposed Diffusion Models for High-Quality Video Generation (CVPR 2023)
- Conditional Image-to-Video Generation with Latent Flow Diffusion Models (CVPR 2023)
- Physics-Driven Diffusion Models for Impact Sound Synthesis from Videos (CVPR 2023)
- Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models (CVPR 2023)
- 3D Cinemagraphy from a Single Image (CVPR 2023)
- MOSO: Decomposing MOtion, Scene and Object for Video Prediction (CVPR 2023)
- SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation (CVPR 2023)
- Towards End-to-End Generative Modeling of Long Videos with Memory-Efficient Bidirectional Transformers (CVPR 2023)
- Learning Universal Policies via Text-Guided Video Generation
- Tell me what happened: Unifying text-guided video completion via multimodal masked video generation (CVPR 2023)
- MAGVIT: Masked Generative Video Transformer (CVPR 2023)
- Text2video-zero: Text-to-image diffusion models are zero-shot video generators
- VarietySound: Timbre-controllable video to sound generation via unsupervised information disentanglement
- ControlVideo: Training-free Controllable Text-to-Video Generation
- Control-A-Video: Controllable Text-to-Video Generation with Diffusion Models
- VDT: An Empirical Study on Video Diffusion with Transformers
- Sketching the Future (STF): Applying Conditional Control Techniques to Text-to-Video Models
- Generative Disco: Text-to-Video Generation for Music Visualization
- Follow Your Pose: Pose-Guided Text-to-Video Generation using Pose-Free Videos
- Sounding Video Generator: A Unified Framework for Text-guided Sounding Video Generation
- MotionVideoGAN: A Novel Video Generator Based on the Motion Space Learned from Image Pairs
- Time-Conditioned Generative Modeling of Object-Centric Representations for Video Decomposition and Prediction
- VideoComposer: Compositional Video Synthesis with Motion Controllability
- Cinematic Mindscapes: High-quality Video Reconstruction from Brain Activity (May, 2023)
- VideoFactory: Swap Attention in Spatiotemporal Diffusions for Text-to-Video Generation
- Preserve Your Own Correlation: A Noise Prior for Video Diffusion Models
- Motion-Conditioned Diffusion Model for Controllable Video Synthesis
- DreamPose: Fashion Image-to-Video Synthesis via Stable Diffusion
- Seer: Language Instructed Video Prediction with Latent Diffusion Models
- Learning 3D Photography Videos via Self-supervised Diffusion on Single Images
- InterGen: Diffusion-based Multi-human Motion Generation under Complex Interactions
- Learn the Force We Can: Multi-Object Video Generation from Pixel-Level Interactions
- Make-Your-Video: Customized Video Generation Using Textual and Structural Guidance
- Make-A-Protagonist: Generic Video Editing with An Ensemble of Experts
- DaGAN++: Depth-Aware Generative Adversarial Network for Talking Head Video Generation
- LEO: Generative Latent Image Animator for Human Video Synthesis
- StyleLipSync: Style-based Personalized Lip-sync Video Generation
- High-Fidelity and Freely Controllable Talking Head Video Generation (CVPR 2023)
- DisCoHead: Audio-and-Video-Driven Talking Head Generation by Disentangled Control of Head Pose and Facial Expressions (ICASSP 2023)
- Controllable Video Generation by Learning the Underlying Dynamical System with Neural ODE
- DPE: Disentanglement of Pose and Expression for General Video Portrait Editing (CVPR 2023)
- PV3D: A 3D Generative Model for Portrait Video Generation (ICLR 2023)
- AADiff: Audio-Aligned Video Synthesis with Text-to-Image Diffusion (CVPR 2023 Workshop)
- Controllable One-Shot Face Video Synthesis With Semantic Aware Prior
- Decoupling Dynamic Monocular Videos for Dynamic View Synthesis
- Feature-Conditioned Cascaded Video Diffusion Models for Precise Echocardiogram Synthesis (MICCAI 2023)
- WALDO: Future Video Synthesis using Object Layer Decomposition and Parametric Flow Prediction
- Fast Fourier Inception Networks for Occluded Video Prediction
- Let's Think Frame by Frame: Evaluating Video Chain of Thought with Video Infilling and Prediction
- PastNet: Introducing Physical Inductive Biases for Spatio-temporal Video Prediction
- A Control-Centric Benchmark for Video Prediction (ICLR 2023)
- MS-LSTM: Exploring Spatiotemporal Multiscale Representations in Video Prediction Domain
- Forecasting localized weather impacts on vegetation as seen from space with meteo-guided video prediction
- A Dynamic Multi-Scale Voxel Flow Network for Video Prediction (CVPR 2023)
- TKN: Transformer-based Keypoint Prediction Network For Real-time Video Prediction
- MOSO: Decomposing MOtion, Scene and Object for Video Prediction (CVPR 2023)
- STDepthFormer: Predicting Spatio-temporal Depth from Video with a Self-supervised Transformer Model (IROS 2023)
- Object-Centric Video Prediction via Decoupling of Object Dynamics and Interactions
- Anti-aliasing Predictive Coding Network for Future Video Frame Prediction
- Long-horizon video prediction using a dynamic latent hierarchy
- Motion and Context-Aware Audio-Visual Conditioned Video Prediction
- MIMO Is All You Need: A Strong Multi-In-Multi-Out Baseline for Video Prediction (AAAI 2023)
- A unified model for continuous conditional video prediction (CVPR 2023 Workshop)
- PreCNet: Next-Frame Video Prediction Based on Predictive Coding (IEEE TNNLS 2023)
- NUWA-XL: Diffusion over Diffusion for eXtremely Long Video Generation
- Video Diffusion Models (NeurIPS 2022)
- MCVD: Masked Conditional Video Diffusion for Prediction, Generation, and Interpolation (NeurIPS 2022)
- Diffusion Models for Video Prediction and Infilling (TMLR 2022)
- Make-A-Video: Text-to-Video Generation without Text-Video Data (ICLR 2023)
- DaGAN: Depth-Aware Generative Adversarial Network for Talking Head Video Generation (CVPR 2022)
- Show Me What and Tell Me How: Video Synthesis via Multimodal Conditioning (CVPR 2022)
- Playable Environments: Video Manipulation in Space and Time (CVPR 2022)
- Fast-Vid2Vid: Spatial-Temporal Compression for Video-to-Video Synthesis (ECCV 2022)
- TM2T: Stochastic and Tokenized Modeling for the Reciprocal Generation of 3D Human Motions and Texts (ECCV 2022)
- Imagen Video: High Definition Video Generation with Diffusion Models
- Phenaki: Variable Length Video Generation From Open Domain Textual Description
- Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation
- Latent Video Diffusion Models for High-Fidelity Long Video Generation
- SinFusion: Training Diffusion Models on a Single Image or Video
- INR-V: A Continuous Representation Space for Video-based Generative Tasks
- StyleFaceV: Face Video Generation via Decomposing and Recomposing Pretrained StyleGAN3
- NUWA-Infinity: Autoregressive over Autoregressive Generation for Infinite Visual Synthesis
- Patch-based Object-centric Transformers for Efficient Video Generation
- CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers
- Latent Video Diffusion Models for High-Fidelity Video Generation With Arbitrary Lengths
- MagicVideo: Efficient Video Generation With Latent Diffusion Models
- Long Video Generation with Time-Agnostic VQGAN and Time-Sensitive Transformer (ECCV 2022)
- StyleHEAT: One-Shot High-Resolution Editable Talking Face Generation via Pre-trained StyleGAN
- Generating Videos with Dynamics-aware Implicit Generative Adversarial Networks (ICLR 2022)
- StyleGAN-V: A Continuous Video Generator with the Price, Image Quality and Perks of StyleGAN2 (CVPR 2022)
- Make It Move: Controllable Image-to-Video Generation with Text Descriptions (CVPR 2022)
- NeMo: 3D Neural Motion Fields from Multiple Video Instances of the Same Action (CVPR 2023)
- Cross-Resolution Flow Propagation for Foveated Video Super-Resolution (WACV 2023)
- MonoNeRF: Learning a Generalizable Dynamic Radiance Field from Monocular Videos
- Scalable Adaptive Computation for Iterative Generation (ICML 2023)
- Predictive Coding Based Multiscale Network with Encoder-Decoder LSTM for Video Prediction
- InstantAvatar: Learning Avatars from Monocular Video in 60 Seconds
- MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation (CVPR 2023)
- PointAvatar: Deformable Point-based Head Avatars from Videos
- PV3D: A 3D Generative Model for Portrait Video Generation (ICLR 2023)
- Video Prediction by Efficient Transformers (ICPR 2022)
- MAGVIT: Masked Generative Video Transformer (CVPR 2023)
- Physically Plausible Animation of Human Upper Body from a Single Image (WACV 2023)
- MIMO Is All You Need: A Strong Multi-In-Multi-Out Baseline for Video Prediction
- Audio-Driven Co-Speech Gesture Video Generation (NeurIPS 2022)
- VIDM: Video Implicit Diffusion Models (AAAI 2023)
- VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild (SIGGRAPH Asia 2022)
- Progressive Disentangled Representation Learning for Fine-Grained Controllable Talking Head Synthesis
- WALDO: Future Video Synthesis using Object Layer Decomposition and Parametric Flow Prediction
- Efficient Feature Extraction for High-resolution Video Frame Interpolation (BMVC 2022)
- Dynamic Neural Portraits (WACV 2023)
- Make-A-Story: Visual Memory Conditioned Consistent Story Generation (CVPR 2023)
- Tell Me What Happened: Unifying Text-guided Video Completion via Multimodal Masked Video Generation (CVPR 2023)
- Hand Avatar: Free-Pose Hand Animation and Rendering from Monocular Video (CVPR 2023)
- SuperTran: Reference Based Video Transformer for Enhancing Low Bitrate Streams in Real Time
- Depth-Supervised NeRF for Multi-View RGB-D Operating Room Images
- SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation (CVPR 2023)
- FLEX: Full-Body Grasping Without Full-Body Grasps (CVPR 2023)
- Blur Interpolation Transformer for Real-World Motion from Blur (CVPR 2023)
- DyNCA: Real-time Dynamic Texture Synthesis Using Neural Cellular Automata (CVPR 2023)
- H-VFI: Hierarchical Frame Interpolation for Videos with Large Motions
- AdaFNIO: Adaptive Fourier Neural Interpolation Operator for video frame interpolation
- SPACE: Speech-driven Portrait Animation with Controllable Expression
- CaDM: Codec-aware Diffusion Modeling for Neural-enhanced Video Streaming
- Advancing Learned Video Compression with In-loop Frame Prediction (IEEE T-CSVT 2022)
- Common Pets in 3D: Dynamic New-View Synthesis of Real-Life Deformable Categories
- Temporal Consistency Learning of inter-frames for Video Super-Resolution (IEEE T-CSVT 2022)
- SyncTalkFace: Talking Face Generation with Precise Lip-Syncing via Audio-Lip Memory (AAAI 2022)
- Learning Variational Motion Prior for Video-based Motion Capture
- Streaming Radiance Fields for 3D Video Synthesis (NeurIPS 2022)
- Learning to forecast vegetation greenness at fine resolution over Africa with ConvLSTMs (NeurIPS 2022)
- EpipolarNVS: leveraging on Epipolar geometry for single-image Novel View Synthesis (BMVC 2022)
- Towards Real-Time Text2Video via CLIP-Guided, Pixel-Level Optimization
- Facial Expression Video Generation Based-On Spatio-temporal Convolutional GAN: FEV-GAN (ISWA)
- Temporal and Contextual Transformer for Multi-Camera Editing of TV Shows (ECCV 2022)
- MonoNeRF: Learning Generalizable NeRFs from Monocular Videos without Camera Pose (ICML 2023)
- Pre-Avatar: An Automatic Presentation Generation Framework Leveraging Talking Avatar (ICTAI 2022)
- AniFaceGAN: Animatable 3D-Aware Face Image Generation for Video Avatars (NeurIPS 2022)
- A Generalist Framework for Panoptic Segmentation of Images and Videos
- Masked Motion Encoding for Self-Supervised Video Representation Learning (CVPR 2023)
- SlotFormer: Unsupervised Visual Dynamics Simulation with Object-Centric Models (ICLR 2023)
- Controllable Radiance Fields for Dynamic Face Synthesis (3DV 2022)
- A unified model for continuous conditional video prediction (CVPR 2023)
- DeepHS-HDRVideo: Deep High Speed High Dynamic Range Video Reconstruction (ICPR 2022)
- Self-supervised Video Representation Learning with Motion-Aware Masked Autoencoders
- Compressing Video Calls using Synthetic Talking Heads (BMVC 2022)
- Audio-Visual Face Reenactment (WACV 2023)
- Geometry Driven Progressive Warping for One-Shot Face Animation
- Cross-identity Video Motion Retargeting with Joint Transformation and Synthesis (WACV 2023)
- Real-RawVSR: Real-World Raw Video Super-Resolution with a Benchmark Dataset (ECCV 2022)
- VToonify: Controllable High-Resolution Portrait Video Style Transfer (SIGGRAPH Asia 2022)
- NeuralMarker: A Framework for Learning General Marker Correspondence (SIGGRAPH Asia 2022)
- Continuously Controllable Facial Expression Editing in Talking Face Videos
- A Deep Moving-camera Background Model (ECCV 2022)
- HARP: Autoregressive Latent Video Prediction with High-Fidelity Image Generator (ICIP 2022)
- Talking Head from Speech Audio using a Pre-trained Image Generator (ACM Multimedia 2022)
- Treating Motion as Option to Reduce Motion Dependency in Unsupervised Video Object Segmentation (WACV 2023)
- Neural Sign Reenactor: Deep Photorealistic Sign Language Retargeting (CVPR 2023)
- REMOT: A Region-to-Whole Framework for Realistic Human Motion Transfer (ACMMM 2022)
- SketchBetween: Video-to-Video Synthesis for Sprite Animation via Sketches (ACM conference on the Foundations of Digital Games)
- StableFace: Analyzing and Improving Motion Stability for Talking Face Generation
- Neural Novel Actor: Learning a Generalized Animatable Neural Representation for Human Actors
- StyleTalker: One-shot Style-based Audio-driven Talking Head Video Generation
- Towards MOOCs for Lipreading: Using Synthetic Talking Heads to Train Humans in Lipreading at Scale (WACV 2023)
- Temporal View Synthesis of Dynamic Scenes through 3D Object Motion Estimation with Multi-Plane Images (ISMAR 2022)
- Wildfire Forecasting with Satellite Images and Deep Generative Model
- Video Interpolation by Event-driven Anisotropic Adjustment of Optical Flow (ECCV 2022)
- Extreme-scale Talking-Face Video Upsampling with Audio-Visual Priors (ACMMM 2022)
- Semi-Supervised Video Inpainting with Cycle Consistency Constraints
- UAV-CROWD: Violent and non-violent crowd activity simulator from the perspective of UAV
- Cine-AI: Generating Video Game Cutscenes in the Style of Human Directors (ACMHCI)
- Language-Guided Face Animation by Recurrent StyleGAN-based Generator
- Boosting neural video codecs by exploiting hierarchical redundancy
- PS-NeRV: Patch-wise Stylized Neural Representations for Videos
- Real-time Gesture Animation Generation from Speech for Virtual Human Interaction (CHI EA 2021)
- Meta-Interpolation: Time-Arbitrary Frame Interpolation via Dual Meta-Learning
- Efficient Video Deblurring Guided by Motion Magnitude (ECCV 2022)
- Learning Dynamic Facial Radiance Fields for Few-Shot Talking Head Synthesis (ECCV 2022)
- InfiniteNature-Zero: Learning Perpetual View Generation of Natural Scenes from Single Images (ECCV 2022)
- RealFlow: EM-based Realistic Optical Flow Dataset Generation from Videos (ECCV 2022 Oral)
- Towards Interpretable Video Super-Resolution via Alternating Optimization (ECCV 2022)
- Error Compensation Framework for Flow-Guided Video Inpainting (ECCV 2022)
- Animation from Blur: Multi-modal Blur Decomposition with Motion Guidance (ECCV 2022)
- TTVFI: Learning Trajectory-Aware Transformer for Video Frame Interpolation (CVPR 2022 Oral)
- Audio Input Generates Continuous Frames to Synthesize Facial Video Using Generative Adversarial Networks
- Neighbor Correspondence Matching for Flow-based Video Frame Synthesis (ACMMM 2022)
- You Only Align Once: Bidirectional Interaction for Spatial-Temporal Video Super-Resolution (ACMMM 2022)
- CANF-VC: Conditional Augmented Normalizing Flows for Video Compression
- A Probabilistic Model Of Interaction Dynamics for Dyadic Face-to-Face Settings
- Jointly Harnessing Prior Structures and Temporal Consistency for Sign Language Video Generation
- Segmenting Moving Objects via an Object-Centric Layered Representation (NeurIPS 2022)
- Programmatic Concept Learning for Human Motion Description and Synthesis (CVPR 2022)
- Optimizing Video Prediction via Video Frame Interpolation (CVPR 2022)
- Perceptual Conversational Head Generation with Regularized Driver and Enhanced Renderer (ACMMM 2022)
- Enhanced Bi-directional Motion Estimation for Video Frame Interpolation (WACV 2023)
- Face-Dubbing++: Lip-Synchronous, Voice Preserving Translation of Videos
- STIP: A SpatioTemporal Information-Preserving and Perception-Augmented Model for High-Resolution Video Prediction (CVPR 2022)
- JNMR: Joint Non-linear Motion Regression for Video Frame Interpolation
- SimVP: Simpler yet Better Video Prediction (CVPR 2022)
- Recurrent Video Restoration Transformer with Guided Deformable Attention (NeurIPS 2022)
- Cascaded Video Generation for Videos In-the-Wild (ICPR 2022)
- D$^2$NeRF: Self-Supervised Decoupling of Dynamic and Static Objects from a Monocular Video
- TubeFormer-DeepLab: Video Mask Transformer (CVPR 2022)
- IFRNet: Intermediate Feature Refine Network for Efficient Frame Interpolation (CVPR 2022)
- Feature-Aligned Video Raindrop Removal with Temporal Constraints
- Future Transformer for Long-term Action Anticipation (CVPR 2022)
- Video2StyleGAN: Disentangling Local and Global Variations in a Video
- Automatic Generation of Synthetic Colonoscopy Videos for Domain Randomization
- Latent-space disentanglement with untrained generator networks for the isolation of different motion types in video data
- Video Frame Interpolation with Transformer (CVPR 2022)
- Multi-encoder Network for Parameter Reduction of a Kernel-based Interpolation Architecture (NTIRE)
- Diverse Video Generation from a Single Video (CVPR 2022)
- Video-ReTime: Learning Temporally Varying Speediness for Time Remapping (AICC)
- Spatial-Temporal Space Hand-in-Hand: Spatial-Temporal Video Super-Resolution via Cycle-Projected Mutual Learning
- Image2Gif: Generating Continuous Realistic Animations with Warping NODEs (CVPR 2022)
- GAN-Based Multi-View Video Coding with Spatio-Temporal EPI Reconstruction
- Video Extrapolation in Space and Time (ECCV 2022)
- Zero-Episode Few-Shot Contrastive Predictive Coding: Solving intelligence tests without prior training
- Copy Motion From One to Another: Fake Motion Video Generation
- Neural Implicit Representations for Physical Parameter Inference from a Single Video (WACV 2023)
- Talking Head Generation Driven by Speech-Related Facial Action Units and Audio- Based on Multimodal Representation Fusion (BMVC 2021)
- ClothFormer: Taming Video Virtual Try-on in All Module (CVPR 2022 Oral)
- STAU: A SpatioTemporal-Aware Unit for Video Prediction and Beyond (TPAMI)
- Sound-Guided Semantic Video Generation (ECCV 2022)
- Learning to Listen: Modeling Non-Deterministic Dyadic Facial Motion
- MUGEN: A Playground for Video-Audio-Text Multimodal Understanding and GENeration
- Controllable Video Generation through Global and Local Motion Dynamics
- Dynamic Neural Textures: Generating Talking-Face Videos with Continuously Controllable Expressions
- Self-Supervised Traffic Advisors: Distributed, Multi-view Traffic Prediction for Smart Cities (ITSC)
- Structure-Aware Motion Transfer with Deformable Anchor Model (CVPR 2022)
- HSTR-Net: High Spatio-Temporal Resolution Video Generation For Wide Area Surveillance
- SunStage: Portrait Reconstruction and Relighting using the Sun as a Light Stage (CVPR 2023)
- Many-to-many Splatting for Efficient Video Frame Interpolation (CVPR 2022)
- Video Demoireing with Relation-Based Temporal Consistency (CVPR 2022)
- Neural Rendering of Humans in Novel View and Pose from Monocular Video
- MPS-NeRF: Generalizable 3D Human Rendering from Multiview Images (TPAMI 2022)
- Foveation-based Deep Video Compression without Motion Search
- STRPM: A Spatiotemporal Residual Predictive Model for High-Resolution Video Prediction (CVPR 2022)
- High-resolution Face Swapping via Latent Semantics Disentanglement (CVPR 2022)
- VPTR: Efficient Transformers for Video Prediction (ICPR 2022)
- Long-term Video Frame Interpolation via Feature Propagation (CVPR 2022)
- Signing at Scale: Learning to Co-Articulate Signs for Large-Scale Photo-Realistic Sign Language Production
- Dressing in the Wild by Watching Dance Videos (CVPR 2022)
- Structured Local Radiance Fields for Human Avatar Modeling (CVPR 2022)
- V3GAN: Decomposing Background, Foreground and Motion for Video Generation
- VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training (NeurIPS 2022)
- Unifying Motion Deblurring and Frame Interpolation with Events (CVPR 2022)
- QS-Craft: Learning to Quantize, Scrabble and Craft for Conditional Human Motion Animation
- Sem2NeRF: Converting Single-View Semantic Masks to Neural Radiance Fields (ECCV 2022)
- Stochastic Video Prediction with Structure and Motion (TPAMI)
- Exploring Motion Ambiguity and Alignment for High-Quality Video Frame Interpolation
- Beyond a Video Frame Interpolator: A Space Decoupled Learning Approach to Continuous Image Transition
- Transframer: Arbitrary Frame Prediction with Generative Models
- Look Outside the Room: Synthesizing A Consistent Long-Term 3D Scene Video from A Single Image (CVPR 2022)
- MSPred: Video Prediction at Multiple Spatio-Temporal Scales with Hierarchical Recurrent Networks
- Latent Image Animator: Learning to Animate Images via Latent Space Navigation (ICLR 2022)
- DialogueNeRF: Towards Realistic Avatar Face-to-face Conversation Video Generation
- One-stage Video Instance Segmentation: From Frame-in Frame-out to Clip-in Clip-out
- A Novel Dual Dense Connection Network for Video Super-resolution
- Region-of-Interest Based Neural Video Compression (BMVC 2022)
- Thinking the Fusion Strategy of Multi-reference Face Reenactment (ICIP 2022)
- Neural Marionette: Unsupervised Learning of Motion Skeleton and Latent Dynamics from Volumetric Video (AAAI 2022)
- Enhancing Deformable Convolution based Video Frame Interpolation with Coarse-to-fine 3D CNN
- Exploring Discontinuity for Video Frame Interpolation (CVPR 2023)
- A new face swap method for image and video domains: a technical report
- Third Time's the Charm? Image and Video Editing with StyleGAN3
- Deep Video Prior for Video Consistency and Propagation (TPAMI 2021)
- Non-linear Motion Estimation for Video Frame Interpolation using Space-time Convolutions (CLIC, CVPR 2022)
- Splatting-based Synthesis for Video Frame Interpolation (WACV 2023)
- Towards Realistic Visual Dubbing with Heterogeneous Sources (ACMMM 2021)
- Audio-Driven Talking Face Video Generation with Dynamic Convolution Kernels (IEEE)
- Learning Temporally and Semantically Consistent Unpaired Video-to-video Translation Through Pseudo-Supervision From Synthetic Optical Flow (AAAI 2022)
- MetaDance: Few-shot Dancing Video Retargeting via Temporal-aware Meta-learning
- Music2Video: Automatic Generation of Music Video with fusion of audio and text
- MobileFaceSwap: A Lightweight Framework for Video Face Swapping (AAAI 2022)
- Structured 3D Features for Reconstructing Controllable Avatars (CVPR 2023)
- MoFusion: A Framework for Denoising-Diffusion-based Motion Synthesis (CVPR 2023)
- High-fidelity Facial Avatar Reconstruction from Monocular Video with Generative Priors
- 3DDesigner: Towards Photorealistic 3D Object Generation and Editing with Text-guided Diffusion Models
- Audio-visual video face hallucination with frequency supervision and cross modality support by speech based lip reading loss
- It Takes Two: Masked Appearance-Motion Modeling for Self-supervised Video Transformer Pre-training
- See, Plan, Predict: Language-guided Cognitive Planning with Video Prediction
- Motion Transformer for Unsupervised Image Animation (ECCV 2022)
- Neural Capture of Animatable 3D Human from Monocular Video (ECCV 2022)
- NDF: Neural Deformable Fields for Dynamic Human Modelling (ECCV 2022)
- Diverse Dance Synthesis via Keyframes with Transformer Controllers
- Unsupervised Coherent Video Cartoonization with Perceptual Motion Consistency
- Learning Multi-Object Dynamics with Compositional Neural Radiance Fields (CoRL 2022)
- Video Frame Interpolation without Temporal Priors (NeurIPS 2020)
- ST-MFNet: A Spatio-Temporal Multi-Flow Network for Frame Interpolation (CVPR 2022)
- Video Frame Interpolation Transformer (CVPR 2022)
- Improving the Perceptual Quality of 2D Animation Interpolation (ECCV 2022)
- Render In-between: Motion Guided Video Synthesis for Action Interpolation
- Flow-Guided Video Inpainting with Scene Templates (ICCV 2021)
- Asymmetric Bilateral Motion Estimation for Video Frame Interpolation (ICCV 2021)
- EA-Net: Edge-Aware Network for Flow-based Video Frame Interpolation
- Zooming SlowMo: An Efficient One-Stage Framework for Space-Time Video Super-Resolution (CVPR 2020)
- PDWN: Pyramid Deformable Warping Network for Video Interpolation
- Motion-blurred Video Interpolation and Extrapolation (AAAI 2021)
- MUGL: Large Scale Multi Person Conditional Action Generation with Locomotion (WACV 2022)
- LARNet: Latent Action Representation for Human Action Synthesis (ICLR 2022)
- Synthetic Data for Multi-Parameter Camera-Based Physiological Sensing
- Physics-based Human Motion Estimation and Synthesis from Videos (ICCV 2021)
- Deep Person Generation: A Survey from the Perspective of Face, Pose and Cloth Synthesis
- Sparse to Dense Motion Transfer for Face Image Animation (ICCV 2021)
- FLAME-in-NeRF: Neural control of Radiance Fields for Free View Face Animation
- Robust Pose Transfer with Dynamic Details using Neural Video Rendering
- Gradient Forward-Propagation for Large-Scale Temporal Video Modelling (CVPR 2021)
- Behavior-Driven Synthesis of Human Dynamics (CVPR 2021)
- AI Choreographer: Music Conditioned 3D Dance Generation with AIST++
- Neural Point Light Fields (CVPR 2022)
- Human Pose Manipulation and Novel View Synthesis using Differentiable Rendering
- Temporal-MPI: Enabling Multi-Plane Images for Dynamic Scene Modelling via Temporal Basis Learning (ECCV 2022)
- H-NeRF: Neural Radiance Fields for Rendering and Temporal Reconstruction of Humans in Motion
- Pose-guided Generative Adversarial Net for Novel View Action Synthesis (WACV 2022)
- Neural Human Performer: Learning Generalizable Radiance Fields for Human Performance Rendering
- View Synthesis of Dynamic Scenes based on Deep 3D Mask Volume (ICCV 2021)
- Target Adaptive Context Aggregation for Video Scene Graph Generation (ICCV 2021)
- Novel View Video Prediction Using a Dual Representation (ICIP 2021)
- Neural Actor: Neural Free-view Synthesis of Human Actors with Pose Control (SIGGRAPH Asia 2021)
- Stylizing 3D Scene via Implicit Representation and HyperNetwork (WACV 2022)
- LUMINOUS: Indoor Scene Generation for Embodied AI Challenges
- NeuralDiff: Segmenting 3D objects that move in egocentric videos (3DV 2021)
- Talking Head Generation with Audio and Speech Related Facial Action Units (BMVC 2021)
- FACIAL: Synthesizing Dynamic Talking Face with Implicit Attribute Learning (ICCV 2021)
- Audio2Head: Audio-driven One-shot Talking-head Generation with Natural Head Motion
- Speech2Video: Cross-Modal Distillation for Speech to Video Generation (ACCV 2020)
- LipSync3D: Data-Efficient Learning of Personalized 3D Talking Faces from Video using Pose and Lighting Normalization (CVPR 2021)
- Temporally coherent video anonymization through GAN inpainting (FG 2021)
- 3D-TalkEmo: Learning to Synthesize 3D Emotional Talking Head
- Pose-Controllable Talking Face Generation by Implicitly Modularized Audio-Visual Representation (CVPR 2021)
- MeshTalk: 3D Face Animation from Speech using Cross-Modality Disentanglement (CVPR 2021)
- Audio-Driven Emotional Video Portraits (CVPR 2021)
- AD-NeRF: Audio Driven Neural Radiance Fields for Talking Head Synthesis (ICCV 2021)
- Imitating Arbitrary Talking Style for Realistic Audio-Driven Talking Face Synthesis (MM 2021)
- Dance In the Wild: Monocular Human Animation with Neural Dynamic Appearance Synthesis
- Intelligent Video Editing: Incorporating Modern Talking Face Generation Algorithms in a Video Editor (ICVGIP 2021)
- Self-Supervised Decomposition, Disentanglement and Prediction of Video Sequences while Interpreting Dynamics: A Koopman Perspective
- Temporally Coherent Person Matting Trained on Fake-Motion Dataset
- Occlusion-Aware Video Object Inpainting (ICCV 2021)
- UniFaceGAN: A Unified Framework for Temporally Consistent Facial Video Editing (IEEE TIP 2021)
- Learning to Cut by Watching Movies (ICCV 2021)
- VCGAN: Video Colorization with Hybrid Generative Adversarial Network (IEEE TMM 2021)
- M3L: Language-based Video Editing via Multi-Modal Multi-Level Transformers (CVPR 2022)
- Personal Privacy Protection via Irrelevant Faces Tracking and Pixelation in Video Live Streaming
- HDRVideo-GAN: Deep Generative HDR Video Reconstruction (ICVGIP 2021)
- Layered Controllable Video Generation (ECCV 2022)
- Xp-GAN: Unsupervised Multi-object Controllable Video Generation
- Action2video: Generating Videos of Human 3D Actions (IJCV 2022)
- Image Comes Dancing with Collaborative Parsing-Flow Video Synthesis (TIP 2021)
- Video Autoencoder: self-supervised disentanglement of static 3D structure and motion (ICCV 2021)
- Learning Fine-Grained Motion Embedding for Landscape Animation (ACM Multimedia 2021)
- Conditional Temporal Variational AutoEncoder for Action Video Prediction (ECCV 2018)
- iButter: Neural Interactive Bullet Time Generator for Human Free-viewpoint Rendering (ACM MM 2021)
- RockGPT: Reconstructing three-dimensional digital rocks from single two-dimensional slice from the perspective of video generation
- Video Generation from Text Employing Latent Path Construction for Temporal Modeling
- StyleVideoGAN: A Temporal Generative Model using a Pretrained StyleGAN
- Cross-View Exocentric to Egocentric Video Synthesis (ACM MM 2021)
- iPOKE: Poking a Still Image for Controlled Stochastic Video Synthesis (ICCV 2021)
- Efficient training for future video generation based on hierarchical disentangled representation of latent variables
- Stochastic Image-to-Video Synthesis using cINNs (CVPR 2021)
- Strumming to the Beat: Audio-Conditioned Contrastive Video Textures (WACV 2022)
- Collaborative Learning to Generate Audio-Video Jointly (ICASSP 2021)
- Learning to compose 6-DoF omnidirectional videos using multi-sphere images
- Neural 3D Video Synthesis from Multi-view Video (CVPR 2022)
- Dual-MTGAN: Stochastic and Deterministic Motion Transfer for Image-to-Video Synthesis (ICPR 2020)
- TräumerAI: Dreaming Music with StyleGAN (NeurIPS Workshop 2020)
- ArrowGAN: Learning to Generate Videos by Learning Arrow of Time
- InMoDeGAN: Interpretable Motion Decomposition Generative Adversarial Network for Video Generation
- Two-stage Rule-induction Visual Reasoning on RPMs with an Application to Video Prediction
- FREGAN: an application of generative adversarial networks in enhancing the frame rate of videos
- TaylorSwiftNet: Taylor Driven Temporal Modeling for Swift Future Frame Prediction
- Fourier-based Video Prediction through Relational Object Motion
- A Hierarchical Variational Neural Uncertainty Model for Stochastic Video Prediction
- Unsupervised Video Prediction from a Single Frame by Estimating 3D Dynamic Scene Structure
- Conditional COT-GAN for Video Prediction with Kernel Smoothing
- Anticipative Video Transformer (ICCV 2021)
- Taylor saves for later: disentanglement for video prediction using Taylor representation
- Local Frequency Domain Transformer Networks for Video Prediction
- Hierarchical Motion Understanding via Motion Programs (CVPR 2021)
- Learning Semantic-Aware Dynamics for Video Prediction (CVPR 2021)
- Revisiting Hierarchical Approach for Persistent Long-Term Video Prediction (ICLR 2021)
- Video Prediction Recalling Long-term Motion Context via Memory Alignment Learning (CVPR 2021)
- Future Frame Prediction for Robot-assisted Surgery (IPMI 2021)
- Greedy Hierarchical Variational Autoencoders for Large-Scale Video Prediction
- MotionRNN: A Flexible Model for Video Prediction with Spacetime-Varying Motions (CVPR 2021)
- VAE^2: Preventing Posterior Collapse of Variational Video Predictions in the Wild
- A Stacking Ensemble Approach for Supervised Video Summarization
- ERA: Entity Relationship Aware Video Summarization with Wasserstein GAN
- Unsupervised Video Summarization with a Convolutional Attentive Adversarial Network
- Reconstructive Sequence-Graph Network for Video Summarization (IEEE TPAMI 2021)
- Creating and Reenacting Controllable 3D Humans with Differentiable Rendering (WACV 2022)
- I2V-GAN: Unpaired Infrared-to-Visible Video Translation (ACM MM 2021)
- Moving SLAM: Fully Unsupervised Deep Learning in Non-Rigid Scenes
- Long-Term Temporally Consistent Unpaired Video Translation from Simulated Surgical 3D Data (ICCV 2021)
- A Shape-Aware Retargeting Approach to Transfer Human Motion and Appearance in Monocular Videos (IJCV 2021)
- Non-Rigid Neural Radiance Fields: Reconstruction and Novel View Synthesis of a Dynamic Scene From Monocular Video
- NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion
- Generative Adversarial Graph Convolutional Networks for Human Action Synthesis (WACV 2022)
- Towards Using Clothes Style Transfer for Scenario-aware Person Video Generation
- Latent Image Animator: Learning to animate image via latent space navigation (ICLR 2022)
- SLAMP: Stochastic Latent Appearance and Motion Prediction (ICCV 2021)
- VirtualConductor: Music-driven Conducting Video Generation System (ICME 2021)
- Click to Move: Controlling Video Generation with Sparse Motion (ICCV 2021)
- Understanding Object Dynamics for Interactive Image-to-Video Synthesis (CVPR 2021)
- One-Shot Free-View Neural Talking-Head Synthesis for Video Conferencing (CVPR 2021)
- Flow Guided Transformable Bottleneck Networks for Motion Retargeting (CVPR 2021)
- Stable View Synthesis (CVPR 2021)
- Scene-Aware Generative Network for Human Motion Synthesis (CVPR 2021)
- Neural Scene Flow Fields for Space-Time View Synthesis of Dynamic Scenes (CVPR 2021)
- Deep Animation Video Interpolation in the Wild (CVPR 2021)
- High-Fidelity Neural Human Motion Transfer from Monocular Video (CVPR 2021)
- Flow-Guided One-Shot Talking Face Generation With a High-Resolution Audio-Visual Dataset (CVPR 2021)
- Layout-Guided Novel View Synthesis From a Single Indoor Panorama (CVPR 2021)
- Space-Time Neural Irradiance Fields for Free-Viewpoint Video (CVPR 2021)
- GeoSim: Realistic Video Simulation via Geometry-Aware Composition for Self-Driving (CVPR 2021)
- Animating Pictures With Eulerian Motion Fields (CVPR 2021)
- CCVS: Context-aware Controllable Video Synthesis (NeurIPS 2021)
- Diverse Video Generation using a Gaussian Process Trigger (ICLR 2021)
- NWT: Towards natural audio-to-video generation with representation learning
- Editable Free-viewpoint Video Using a Layered Neural Representation
- A Good Image Generator Is What You Need for High-Resolution Video Synthesis
- GODIVA: Generating Open-DomaIn Videos from nAtural Descriptions
- Text2Video: Text-driven Talking-head Video Synthesis with Personalized Phoneme-Pose Dictionary
- Write-a-speaker: Text-based Emotional and Rhythmic Talking-head Generation
- Playable Video Generation (CVPR 2021)
- Infinite Nature: Perpetual View Generation of Natural Scenes from a Single Image (ICCV 2021)
- Vid-ODE: Continuous-Time Video Generation with Neural Ordinary Differential Equation (AAAI 2021)
- Compositional Video Synthesis with Action Graphs (ICML 2021)
- Temporal Shift GAN for Large Scale Video Generation (WACV 2021)
- Learning Speech-driven 3D Conversational Gestures from Video
- SLPC: a VRNN-based approach for stochastic lidar prediction and completion in autonomous driving
- Self-Supervision by Prediction for Object Discovery in Videos
- Modulated Periodic Activations for Generalizable Local Functional Representations (ICCV 2021)
- Dynamic Texture Synthesis by Incorporating Long-range Spatial and Temporal Correlations
- GANs N' Roses: Stable, Controllable, Diverse Image to Image Translation (works for videos too!)
- Alias-Free Generative Adversarial Networks (NeurIPS 2021)
- Modeling Clothing as a Separate Layer for an Animatable Human Avatar
- CLIP-It! Language-Guided Video Summarization (NeurIPS 2021)
- Towards an Interpretable Latent Space in Structured Models for Video Prediction
- AnyoneNet: Synchronized Speech and Talking Head Generation for Arbitrary Person
- SPACE: A Simulator for Physical Interactions and Causal Learning in 3D Environments
- PIP: Physical Interaction Prediction via Mental Simulation with Span Selection
- Advancing High-Resolution Video-Language Representation with Large-Scale Video Transcriptions (CVPR 2022)
- Responsive Listening Head Generation: A Benchmark Dataset and Baseline (ECCV 2022)
- BANMo: Building Animatable 3D Neural Models from Many Casual Videos (CVPR 2022)
- Continuous-Time Video Generation via Learning Motion Dynamics with Neural ODE (BMVC 2021)
- SAGA: Stochastic Whole-Body Grasping with Contact (ECCV 2022)
- End-to-End Rate-Distortion Optimized Learned Hierarchical Bi-Directional Video Compression
- Enhanced Frame and Event-Based Simulator and Event-Based Video Interpolation Network
- Discrete neural representations for explainable anomaly detection (AAAI 2022)
- Controllable Animation of Fluid Elements in Still Images (CVPR 2022)
- One-shot Talking Face Generation from Single-speaker Audio-Visual Correlation Learning (AAAI 2022)
- Efficient Neural Radiance Fields for Interactive Free-viewpoint Video (SIGGRAPH Asia 2022)
- Dynamic View Synthesis from Dynamic Monocular Video (ICCV 2021)
- Stochastic Talking Face Generation Using Latent Distribution Matching
- Unsupervised object-centric video generation and decomposition in 3D (NeurIPS 2020)
- Novel-View Human Action Synthesis (ACCV 2020)
- Structure-Aware Human-Action Generation (ECCV 2020)
- Hierarchical Patch VAE-GAN: Generating Diverse Videos from a Single Sample (NeurIPS 2020)
- Audio-driven Talking Face Video Generation with Learning-based Personalized Head Pose
- Exploring Spatial-Temporal Multi-Frequency Analysis for High-Fidelity and Temporal-Consistency Video Prediction (CVPR 2020)
- Stochastic Latent Residual Video Prediction (ICML 2020)
- G3AN: Disentangling Appearance and Motion for Video Generation (CVPR 2020)
- Scaling Autoregressive Video Models (ICLR 2020)
- VideoFlow: A Conditional Flow-Based Model for Stochastic Video Generation (ICLR 2020)
- Music-oriented Dance Video Synthesis with Pose Perceptual Loss
- DwNet: Dense warp-based network for pose-guided human video generation
- Order Matters: Shuffling Sequence Generation for Video Prediction
- From Here to There: Video Inbetweening Using Direct 3D Convolutions
- Improved Conditional VRNNs for Video Prediction (ICCV 2019)
- Sliced Wasserstein Generative Models (CVPR 2019)
- Point-to-Point Video Generation (ICCV 2019)
- High Frame Rate Video Reconstruction based on an Event Camera
- Video Generation from Single Semantic Label Map (CVPR 2019)
- Learning to navigate image manifolds induced by generative adversarial networks for unsupervised video generation
- Animating Arbitrary Objects via Deep Motion Transfer (CVPR 2019)
- StoryGAN: A Sequential Conditional GAN for Story Visualization (CVPR 2019)
- Stochastic Adversarial Video Prediction (ICLR 2019)
- Learning Temporal Coherence via Self-Supervision for GAN-based Video Generation
- Towards High Resolution Video Generation with Progressive Growing of Sliced Wasserstein GANs
- Everybody Dance Now (ICCV 2019)
- Learning to Forecast and Refine Residual Motion for Image-to-Video Generation (ECCV 2018)
- Talking Face Generation by Conditional Recurrent Adversarial Network
- Probabilistic Video Generation using Holistic Attribute Control (ECCV 2018)
- Stochastic Video Generation with a Learned Prior (ICML 2018)
- Stochastic Variational Video Prediction (ICLR 2018)
- Hierarchical Video Generation from Orthogonal Information: Optical Flow and Texture (AAAI 2018)
- MoCoGAN: Decomposing Motion and Content for Video Generation (CVPR 2018)