Vision
Official code for "RealFusion: 360° Reconstruction of Any Object from a Single Image" (CVPR 2023)
3D Hand Shape and Pose Estimation from a Single RGB Image
A minimal solution to hand motion capture from a single color camera at over 100fps. Easy to use, plug to run.
Direct voxel grid optimization for fast radiance field reconstruction.
Instant neural graphics primitives: lightning fast NeRF and more
MoDem: Accelerating Visual Model-Based Reinforcement Learning with Demonstrations
Official PyTorch implementation of ODISE: Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models [CVPR 2023 Highlight]
Code release for "Detecting Twenty-thousand Classes using Image-level Supervision".
Examples and guides for using the OpenAI API
End-to-End Object Detection with Transformers
Code release for "Masked-attention Mask Transformer for Universal Image Segmentation"
The repository for the largest and most comprehensive empirical study of visual foundation models for Embodied AI (EAI).
The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
A concise but complete full-attention transformer with a set of promising experimental features from various papers
[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
Instruct-NeRF2NeRF: Editing 3D Scenes with Instructions (ICCV 2023)
[TPAMI 2023] SceneDreamer: Unbounded 3D Scene Generation from 2D Image Collections
Official repo for consistency models.
PyTorch code and models for the DINOv2 self-supervised learning method.
Pointcept: a codebase for point cloud perception research. Latest works: PTv3 (CVPR'24 Oral), PPT (CVPR'24), OA-CNNs (CVPR'24), MSC (CVPR'23)
Fast and memory-efficient exact attention
Segment-anything interactively in NeRF.
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
Code release for DS-NeRF (Depth-supervised Neural Radiance Fields)
Track-Anything is a flexible and interactive tool for video object tracking and segmentation, based on Segment Anything, XMem, and E2FGVI.