A reading list of papers about Video Grounding.
- Charades-STA [2017][ICCV] TALL: Temporal Activity Localization via Language Query.[paper][dataset][Charades]
- ActivityNet Captions [2017][ICCV] Dense-Captioning Events in Videos.[paper][dataset]
- DiDeMo [2017][ICCV] Localizing Moments in Video with Natural Language.[paper][dataset]
- TACoS [2013][ACL] Grounding Action Descriptions in Videos.[paper][dataset]
- CD [2021][arXiv] A Closer Look at Temporal Sentence Grounding in Videos: Datasets and Metrics.[paper][dataset]
- CG [2022][CVPR] Compositional Temporal Grounding with Structured Variational Cross-Graph Correspondence Learning.[paper][dataset]
- MAD [2022][CVPR] MAD: A Scalable Dataset for Language Grounding in Videos from Movie Audio Descriptions.[paper][dataset]
- [2022][AAAI] Explore Inter-Contrast Between Videos via Composition for Weakly Supervised Temporal Sentence Grounding.[paper]
- [2022][AAAI] Exploring Motion and Appearance Information for Temporal Sentence Grounding.[paper]
- [2022][AAAI] Memory-Guided Semantic Learning Network for Temporal Sentence Grounding.[paper]
- [2022][AAAI] Negative Sample Matters: A Renaissance of Metric Learning for Temporal Grounding.[paper]
- [2022][AAAI] Unsupervised Temporal Video Grounding with Deep Semantic Clustering.[paper]
- [2022][CVPR] Compositional Temporal Grounding with Structured Variational Cross-Graph Correspondence Learning.[paper][code]
- [2022][CVPR] MAD: A Scalable Dataset for Language Grounding in Videos from Movie Audio Descriptions.[paper][code]
- [2022][IJCV] Weakly Supervised Moment Localization with Decoupled Consistent Concept Prediction.[paper]
- [2022][TIP] Video Moment Retrieval with Cross-Modal Neural Architecture Search.[paper]
- [2022][TIP] Exploring Language Hierarchy for Video Grounding.[paper]
- [2022][TMM] Cross-modal Dynamic Networks for Video Moment Retrieval with Text Query.[paper]
- [2021][ACL] Parallel Attention Network with Sequence Matching for Video Grounding.[paper]
- [2021][ACMMM] AsyNCE: Disentangling False-Positives for Weakly-Supervised Video Grounding.[paper]
- [2021][CVPR] Cascaded Prediction Network via Segment Tree for Temporal Video Grounding.[paper]
- [2021][CVPR] Context-aware Biaffine Localizing Network for Temporal Sentence Grounding.[paper][code]
- [2021][CVPR] Embracing Uncertainty: Decoupling and De-bias for Robust Temporal Grounding.[paper]
- [2021][CVPR] Interventional Video Grounding with Dual Contrastive Learning.[paper][code]
- [2021][CVPR] Multi-stage Aggregated Transformer Network for Temporal Language Localization in Videos.[paper]
- [2021][CVPR] Structured Multi-Level Interaction Network for Video Moment Localization via Language Query.[paper]
- [2021][ICCV] Zero-shot Natural Language Video Localization.[paper]
- [2021][ICCV] Boundary-sensitive Pre-training for Temporal Localization in Videos.[paper]
- [2021][ICCV] Support-Set Based Cross-Supervision for Video Grounding.[paper]
- [2021][ICCV] VLG-Net: Video-Language Graph Matching Network for Video Grounding.[paper][code]
- [2021][TMM] Weakly Supervised Temporal Adjacent Network for Language Grounding.[paper]
- [2021][arXiv] A Closer Look at Temporal Sentence Grounding in Videos: Datasets and Metrics.[paper][code]
- [2020][AAAI] Weakly-Supervised Video Moment Retrieval via Semantic Completion Network.[paper]
- [2020][AAAI] Tree-Structured Policy based Progressive Reinforcement Learning for Temporally Language Grounding in Video.[paper][code]
- [2020][AAAI] Temporally Grounding Language Queries in Videos by Contextual Boundary-Aware Prediction.[paper]
- [2020][AAAI] Learning 2D Temporal Adjacent Networks for Moment Localization with Natural Language.[paper]
- [2020][ACMMM] Fine-grained Iterative Attention Network for Temporal Language Localization in Videos.[paper]
- [2020][ECCV] Learning Modality Interaction for Temporal Sentence Localization and Event Captioning in Videos.[paper]
- [2020][CVPR] Local-Global Video-Text Interactions for Temporal Grounding.[paper][code]
- [2020][CVPR] Dense Regression Network for Video Grounding.[paper]
- [2019][AAAI] Localizing Natural Language in Videos.[paper]
- [2019][AAAI] Multilevel Language and Vision Integration for Text-to-Clip Retrieval.[paper]
- [2019][AAAI] Read, Watch, and Move: Reinforcement Learning for Temporally Grounding Natural Language Descriptions in Videos.[paper]
- [2019][AAAI] Semantic Proposal for Activity Localization in Videos via Sentence Query.[paper]
- [2019][AAAI] To Find Where You Talk: Temporal Sentence Localization in Video with Attention Based Location Regression.[paper]
- [2019][CVPR] Language-driven Temporal Activity Localization: A Semantic Matching Reinforcement Learning Model.[paper]
- [2019][CVPR] MAN: Moment Alignment Network for Natural Language Moment Retrieval via Iterative Graph Adjustment.[paper]
- [2019][CVPR] Weakly Supervised Video Moment Retrieval From Text Queries.[paper]
- [2019][EMNLP] WSLLN: Weakly Supervised Natural Language Localization Networks.[paper]
- [2019][NeurIPS] Semantic Conditioned Dynamic Modulation for Temporal Sentence Grounding in Videos.[paper]
- [2019][WACV] MAC: Mining Activity Concepts for Language-based Temporal Localization.[paper]
- [2018][EMNLP] Localizing Moments in Video with Temporal Language.[paper]
- [2018][EMNLP] Temporally Grounding Natural Sentence in Video.[paper]
- [2018][SIGIR] Attentive Moment Retrieval in Videos.[paper]
- [2017][ICCV] TALL: Temporal Activity Localization via Language Query.[paper]
- [2017][ICCV] Dense-Captioning Events in Videos.[paper]
- [2017][ICCV] Localizing Moments in Video with Natural Language.[paper]
TODO
- [2022][AAAI] End-to-End Modeling via Information Tree for One-Shot Natural Language Spatial Video Grounding.[paper]
- [2021][CVPR] Co-Grounding Networks with Semantic Attention for Referring Expression Comprehension in Videos.[paper]
- [2021][ICCV] STVGBert: A Visual-linguistic Transformer based Framework for Spatio-temporal Video Grounding.[paper]
TODO
TODO
TODO
TODO