- Zero-TPrune: Zero-Shot Token Pruning through Leveraging of the Attention Graph in Pre-Trained Transformers
- FlowerFormer: Empowering Neural Architecture Encoding using a Flow-aware Graph Transformer
- Compositional Video Understanding with Spatiotemporal Structure-based Transformers
- MeshGPT: Generating Triangle Meshes with Decoder-Only Transformers
- Bézier Everywhere All at Once: Learning Drivable Lanes as Bézier Graphs
- Adaptive Hyper-graph Aggregation for Modality-Agnostic Federated Learning
- OmniVec2 - A Novel Transformer based Network for Large Scale Multimodal and Multitask Learning
- Compositional Chain-of-Thought Prompting for Large Multimodal Models
- The Audio-Visual Conversational Graph: From an Egocentric-Exocentric Perspective
- Improving Graph Contrastive Learning via Adaptive Positive Sampling
- CLIP-Driven Open-Vocabulary 3D Scene Graph Generation via Cross-Modality Contrastive Learning
- SG-PGM: Partial Graph Matching Network with Semantic Geometric Fusion for 3D Scene Graph Alignment and Its Downstream Tasks
- Open3DSG: Open-Vocabulary 3D Scene Graphs from Point Clouds with Queryable Objects and Open-Set Relationships
- GraphDreamer: Compositional 3D Scene Synthesis from Scene Graphs
- Multi-Level Neural Scene Graphs for Dynamic Urban Environments
- Composing Object Relations and Attributes for Image-Text Matching
- Neighbor Relations Matter in Video Scene Detection
- DSGG: Dense Relation Transformer for an End-to-end Scene Graph Generation
- EGTR: Extracting Graph from Transformer for Scene Graph Generation
- HiKER-SGG: Hierarchical Knowledge Enhanced Robust Scene Graph Generation
- HIG: Hierarchical Interlacement Graph Approach to Scene Graph Generation in Video Understanding
- OED: Towards One-stage End-to-End Dynamic Scene Graph Generation
- LLM4SGG: Large Language Models for Weakly Supervised Scene Graph Generation
- From Pixels to Graphs: Open-Vocabulary Scene Graph Generation with Vision-Language Models
- Leveraging Predicate and Triplet Learning for Scene Graph Generation
- DGC-GNN: Leveraging Geometry and Color Cues for Visual Descriptor-Free 2D-3D Matching
- Denoising Point Clouds in Latent Space via Graph Convolution and Invertible Neural Network
- Object Dynamics Modeling with Hierarchical Point Cloud-based Representations
- GLiDR: Topologically Regularized Graph Generative Network for Sparse LiDAR Point Clouds
- LiDAR-based Person Re-identification
- Dynamic Graph Representation with Knowledge-aware Attention for Histopathology Whole Slide Image Analysis
- GreedyViG: Dynamic Axial Graph Construction for Efficient Vision GNNs
- Generating Handwritten Mathematical Expressions From Symbol Graphs: An End-to-End Pipeline
- HyperSDFusion: Bridging Hierarchical Structures in Language and Geometry for Enhanced 3D Text2Shape Generation
- Constrained Layout Generation with Factor Graphs
- MaskPLAN: Masked Generative Layout Planning from Partial Input
- HHMR: Holistic Hand Mesh Recovery by Enhancing the Multimodal Controllability of Graph Diffusion Models
- Dysen-VDM: Empowering Dynamics-aware Text-to-Video Diffusion with LLMs
- DiffAssemble: A Unified Graph-Diffusion Model for 2D and 3D Reassembly
- Relation Rectification in Diffusion Model
- Neural Sign Actors: A Diffusion Model for 3D Sign Language Production from Text
- Molecular Data Programming: Towards Molecule Pseudo-labeling with Systematic Weak Supervision
- Clustering for Protein Representation Learning
- Multi-agent Long-term 3D Human Pose Forecasting via Interaction-aware Trajectory Conditioning
- Higher-order Relational Reasoning for Pedestrian Trajectory Prediction
- Tumor Micro-environment Interactions Guided Graph Learning for Survival Analysis of Human Cancers from Whole-slide Pathological Images
- XFibrosis: Explicit Vessel-Fiber Modeling for Fibrosis Staging from Liver Pathology Images
- BlockGCN: Redefine Topology Awareness for Skeleton-Based Action Recognition
- Person in Place: Generating Associative Skeleton-Guidance Maps for Human-Object Interaction Image Editing
- SOK-Bench: A Situated Video Reasoning Benchmark with Aligned Open-World Knowledge
- Advancing Saliency Ranking with Human Fixations: Dataset, Models and Benchmarks
- Neural Markov Random Field for Stereo Matching
- MESA: Matching Everything by Segmenting Anything
- CURSOR: Scalable Mixed-Order Hypergraph Matching with CUR Decomposition
- MaskClustering: View Consensus based Mask Graph Clustering for Open-Vocabulary 3D Instance Segmentation
- CAGE: Controllable Articulation GEneration
- G-FARS: Gradient-Field-based Auto-Regressive Sampling for 3D Part Grouping
- 3D Feature Tracking via Event Camera
- TeMO: Towards Text-Driven 3D Stylization for Multi-Object Meshes
- Category-Level Multi-Part Multi-Joint 3D Shape Assembly
- VS: Reconstructing Clothed 3D Human from Single Image via Vertex Shift
- FC-GNN: Recovering Reliable and Accurate Correspondences from Interferences
- Domain Separation Graph Neural Networks for Saliency Object Ranking
- Improving Out-of-Distribution Generalization in Graphs via Hierarchical Semantic Environments
- SignGraph: A Sign Sequence is Worth Graphs of Nodes
- Image Processing GNN: Breaking Rigidity in Super-Resolution
- Learning Structure-from-Motion with Graph Attention Networks
- MemoNav: Working Memory Model for Visual Navigation
- Error Detection in Egocentric Procedural Task Videos
- Semantic-Aware Multi-Label Adversarial Attacks
- Learning for Transductive Threshold Calibration in Open-World Recognition