Yanmin Wu1, Jiarui Meng1, Haijie Li1, Chenming Wu2*, Yahao Shi3, Xinhua Cheng1, Chen Zhao2, Haocheng Feng2, Errui Ding2, Jingdong Wang2, Jian Zhang1*
1 Peking University, 2 Baidu VIS, 3 Beihang University
The installation of OpenGaussian is similar to 3D Gaussian Splatting.
```shell
git clone https://github.com/yanmin-wu/OpenGaussian.git
```
Then install the dependencies:
```shell
conda env create --file environment.yml
conda activate gaussian_splatting

# the rasterization lib comes from DreamGaussian
cd OpenGaussian/submodules
unzip ashawkey-diff-gaussian-rasterization.zip
pip install ./ashawkey-diff-gaussian-rasterization
```
- Other additional dependencies: bitarray, scipy, pytorch3d
  ```shell
  pip install bitarray scipy
  # install a pytorch3d version compatible with your PyTorch, Python, and CUDA
  # (see the version-check sketch below)
  ```
- `simple-knn` is not required.
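If you are unsure which pytorch3d build matches your environment, the following minimal sketch prints the version tags that determine compatibility; the wheel selection itself is left to the pytorch3d installation instructions:

```python
import sys

import torch

# Print the tags that determine which pytorch3d build is compatible.
print(f"Python:  {sys.version_info.major}.{sys.version_info.minor}")
print(f"PyTorch: {torch.__version__}")
print(f"CUDA:    {torch.version.cuda}")  # None for CPU-only builds
```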
- Point feature visualization
- Data preprocessing
- Improved SAM mask extraction (extracting only one layer)
- Click to Select 3D Object
The files are as follows:
```
[DATA_ROOT]
├── [1] scannet/
│   ├── scene0000_00/
│   │   ├── color/
│   │   ├── language_features/
│   │   ├── points3d.ply
│   │   ├── transforms_train/test.json
│   │   └── *_vh_clean_2.labels.ply
│   ├── scene0062_00/
│   └── ...
├── [2] lerf_ovs/
│   ├── figurines/ & ramen/ & teatime/ & waldo_kitchen/
│   │   ├── images/
│   │   ├── language_features/
│   │   └── sparse/
│   └── label/
```
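After downloading, a quick sanity check of the expected layout can help catch unzip mistakes. This is a hedged sketch: `DATA_ROOT` and the scene name are placeholders for your local paths.

```python
from pathlib import Path

# Placeholder paths; adjust to your local setup.
DATA_ROOT = Path("/path/to/DATA_ROOT")
scene = DATA_ROOT / "scannet" / "scene0000_00"

expected = [
    "color",                  # RGB frames
    "language_features",      # SAM masks + CLIP features
    "points3d.ply",           # initial point cloud
    "transforms_train.json",  # training camera poses
    "transforms_test.json",   # test camera poses
]
for name in expected:
    status = "ok" if (scene / name).exists() else "MISSING"
    print(f"{status:8s} {scene / name}")
```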
- [1] Prepare ScanNet Data
  - You can directly download our pre-processed data: OneDrive. Please unzip the `color.zip` and `language_features.zip` files.
  - The ScanNet dataset requires permission for use; please follow the ScanNet instructions to apply for dataset permission.
  - The preprocessing script will be updated later.
- [2] Prepare lerf_ovs Data
  - You can directly download our pre-processed data: OneDrive (re-annotated by LangSplat). Please unzip the `images.zip` and `language_features.zip` files.
- Mask and Language Feature Extraction Details
  - We use the tools provided by LangSplat to extract the SAM masks and CLIP features, but we only use the large-level masks (see the reading sketch below).
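For orientation, here is a hedged sketch of reading such per-image outputs, assuming the LangSplat-style convention of an `xxx_f.npy` mask-feature file and an `xxx_s.npy` segment-map file per image; the file names, shapes, and level ordering are assumptions, so check the LangSplat tools for the exact format:

```python
import numpy as np

# Hypothetical file names following the LangSplat convention (assumption).
feats = np.load("language_features/00000_f.npy")  # CLIP feature per SAM mask
segs = np.load("language_features/00000_s.npy")   # segment maps, one per SAM level

print("mask features:", feats.shape)
print("segment maps: ", segs.shape)

# OpenGaussian uses only the large-level masks; if the levels are stacked along
# the first axis with the large level last (assumption), that would be:
large_level = segs[-1]  # (H, W) map of mask ids at the large SAM level
```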
```shell
chmod +x scripts/train_scannet.sh
./scripts/train_scannet.sh
```
- Please check the script for more details and modify the dataset path.
- You will see the following stages during training:
  ```
  [Stage 0]   Start 3dgs pre-train ...                               (step 0-30k)
  [Stage 1]   Start continuous instance feature learning ...         (step 30-50k)
  [Stage 2.1] Start coarse-level codebook discretization ...         (step 50-70k)
  [Stage 2.2] Start fine-level codebook discretization ...           (step 70-90k)
  [Stage 3]   Start 2D language feature - 3D cluster association ... (1 min)
  ```
- Intermediate results from different stages can be found in the `***/train_process/stage*` subfolders. (The intermediate results of stage 3 are best observed on the LeRF dataset.) A sketch of the stage-2 codebook discretization follows below.
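To make stage 2 concrete, here is a minimal sketch of a k-means-style codebook discretization of continuous instance features. It is an illustration only, not the repository's actual implementation; the tensor shapes and update rule are assumptions.

```python
import torch

def assign_to_codebook(features: torch.Tensor, codebook: torch.Tensor):
    """Snap each continuous instance feature to its nearest codebook entry."""
    # features: (N, D) per-Gaussian instance features; codebook: (K, D)
    dists = torch.cdist(features, codebook)  # (N, K) pairwise L2 distances
    ids = dists.argmin(dim=1)                # nearest code per Gaussian
    return codebook[ids], ids                # discretized features, cluster ids

def update_codebook(features: torch.Tensor, ids: torch.Tensor, K: int):
    """Move each code to the mean of the features assigned to it."""
    codebook = torch.zeros(K, features.shape[1], device=features.device)
    for k in range(K):
        mask = ids == k
        if mask.any():
            codebook[k] = features[mask].mean(dim=0)
    return codebook

# Toy usage: 6D instance features, a coarse codebook of 64 entries.
feats = torch.randn(10_000, 6)
codebook = feats[torch.randperm(feats.shape[0])[:64]].clone()
for _ in range(10):
    _, ids = assign_to_codebook(feats, codebook)
    codebook = update_codebook(feats, ids, 64)
```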
```shell
chmod +x scripts/train_lerf.sh
./scripts/train_lerf.sh
```
- Please check the script for more details and modify the dataset path.
- You will see the following stages during training:
  ```
  [Stage 0]   Start 3dgs pre-train ...                               (step 0-30k)
  [Stage 1]   Start continuous instance feature learning ...         (step 30-40k)
  [Stage 2.1] Start coarse-level codebook discretization ...         (step 40-50k)
  [Stage 2.2] Start fine-level codebook discretization ...           (step 50-70k)
  [Stage 3]   Start 2D language feature - 3D cluster association ... (1 min)
  ```
- Intermediate results from different stages can be found in the `***/train_process/stage*` subfolders. A sketch of the stage-3 association follows below.
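Stage 3 associates each discretized 3D cluster with a 2D language feature. The following is a minimal sketch of one plausible IoU-based association under assumed shapes; it is an illustration, not the repository's exact implementation:

```python
import torch

def associate_language_features(cluster_masks, sam_masks, sam_feats):
    """Give each 3D cluster the CLIP feature of its best-overlapping SAM mask."""
    # cluster_masks: (C, H, W) rendered binary masks of the 3D clusters
    # sam_masks:     (M, H, W) binary SAM masks for the same view
    # sam_feats:     (M, 512)  CLIP feature of each SAM mask
    c = cluster_masks.float()
    s = sam_masks.float()
    inter = torch.einsum("chw,mhw->cm", c, s)
    union = c.sum((1, 2))[:, None] + s.sum((1, 2))[None, :] - inter
    iou = inter / union.clamp(min=1e-6)  # (C, M) overlap of cluster vs. mask
    best = iou.argmax(dim=1)             # best-matching SAM mask per cluster
    return sam_feats[best]               # (C, 512) language feature per cluster

# Toy usage with random masks and features.
feats = associate_language_features(
    torch.rand(8, 64, 64) > 0.7,
    torch.rand(20, 64, 64) > 0.7,
    torch.randn(20, 512),
)
```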
- TODO
- Please install `open3d` first, then execute the following command on a system with UI support:
  ```shell
  python scripts/vis_opengs_pts_feat.py
  ```
- Please specify `ply_path` in the script as the PLY file `output/xxxxxxxx-x/point_cloud/iteration_x0000/point_cloud.ply` saved at different stages.
- During training, we save the first three dimensions of the 6D instance features as colors for visualization; see here.
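For a quick look outside the provided script, a minimal open3d viewer sketch is shown below. It assumes the saved PLY exposes positions and per-point colors in fields open3d can read (the provided script handles the 3DGS-specific layout), and the path is a placeholder:

```python
import open3d as o3d

# Placeholder path; point this at a PLY saved during training.
ply_path = "output/xxxxxxxx-x/point_cloud/iteration_50000/point_cloud.ply"

pcd = o3d.io.read_point_cloud(ply_path)
print(pcd)  # reports point count and whether colors were found
o3d.visualization.draw_geometries([pcd])
```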
- Feature maps are rendered with the same method used for rendering colors in 3DGS:
  ```shell
  python render.py -m "output/xxxxxxxx-x"
  ```
  You can find the rendered feature maps in the `renders_ins_feat1` and `renders_ins_feat2` subfolders.
Due to code optimization and the use of more suitable hyperparameters, the latest evaluation metrics may be higher than those reported in the paper.
- Evaluate text-guided segmentation performance on ScanNet for 19, 15, and 10 categories.
  ```shell
  # unzip the pre-extracted text features
  cd assets
  unzip text_features.zip

  # 1. check that `gt_file_path` and `model_path` are correct
  # 2. specify `target_id` as 19, 15, or 10 categories
  python scripts/eval_scannet.py
  ```
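For reference, the standard per-class IoU/mIoU computation over point-wise semantic labels looks like the following sketch; the label convention is an assumption, not the script's exact code:

```python
import numpy as np

def per_class_iou(gt: np.ndarray, pred: np.ndarray, num_classes: int):
    """Compute IoU per class and mIoU over point-wise semantic labels."""
    ious = []
    for c in range(num_classes):
        inter = np.sum((gt == c) & (pred == c))
        union = np.sum((gt == c) | (pred == c))
        ious.append(inter / union if union > 0 else np.nan)
    return ious, np.nanmean(ious)

# Toy usage: 19-category evaluation on random labels.
gt = np.random.randint(0, 19, size=100_000)
pred = np.random.randint(0, 19, size=100_000)
ious, miou = per_class_iou(gt, pred, num_classes=19)
print(f"mIoU: {miou:.3f}")
```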
- (1) First, render text-selected 3D Gaussians into multi-view images.
  ```shell
  # unzip the pre-extracted text features
  cd assets
  unzip text_features.zip

  # 1. specify the model path using -m
  # 2. specify the scene name: figurines, teatime, ramen, waldo_kitchen
  python render_lerf_by_text.py -m "output/xxxxxxxx-x" --scene_name "figurines"
  ```
  The object selection results are saved in `output/xxxxxxxx-x/text2obj/ours_70000/renders_cluster`.
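Under the hood, text-guided selection amounts to comparing a CLIP text feature against each cluster's associated language feature. A minimal cosine-similarity sketch follows; the tensors and the multi-select threshold are assumptions for illustration:

```python
import torch
import torch.nn.functional as F

# Assumed inputs: a pre-extracted CLIP text feature and the per-cluster
# language features produced in stage 3 (shapes are illustrative).
text_feat = torch.randn(512)          # e.g. loaded from assets/text_features
cluster_feats = torch.randn(64, 512)  # one language feature per 3D cluster

sim = F.cosine_similarity(cluster_feats, text_feat[None, :], dim=1)  # (64,)
selected = sim.argmax()  # or: (sim > 0.25).nonzero() for multi-select
print(f"best cluster: {selected.item()}, similarity: {sim[selected]:.3f}")
```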
- (2) Then, compute the evaluation metrics.
  Due to code optimization and the use of more suitable hyperparameters, the latest evaluation metrics may be higher than those reported in the paper. The metrics may also be unstable because LeRF provides only a limited number of evaluation samples.
  ```shell
  # 1. change `path_gt` and `path_pred` in the script
  # 2. specify the scene name: figurines, teatime, ramen, waldo_kitchen
  python scripts/compute_lerf_iou.py --scene_name "figurines"
  ```
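The IoU here is the standard intersection-over-union between ground-truth and predicted 2D object masks; a minimal sketch (array shapes are assumptions):

```python
import numpy as np

def mask_iou(gt: np.ndarray, pred: np.ndarray) -> float:
    """IoU between two binary masks of the same shape."""
    gt, pred = gt.astype(bool), pred.astype(bool)
    inter = np.logical_and(gt, pred).sum()
    union = np.logical_or(gt, pred).sum()
    return float(inter) / union if union > 0 else 0.0

# Toy usage with two overlapping square masks.
gt = np.zeros((64, 64)); gt[10:40, 10:40] = 1
pred = np.zeros((64, 64)); pred[15:45, 15:45] = 1
print(f"IoU: {mask_iou(gt, pred):.3f}")
```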
- TODO
We are grateful to 3DGS, LangSplat, CompGS, LEGaussians, SAGA, and SAM.
```bibtex
@article{wu2024opengaussian,
  title={OpenGaussian: Towards Point-Level 3D Gaussian-based Open Vocabulary Understanding},
  author={Wu, Yanmin and Meng, Jiarui and Li, Haijie and Wu, Chenming and Shi, Yahao and Cheng, Xinhua and Zhao, Chen and Feng, Haocheng and Ding, Errui and Wang, Jingdong and others},
  journal={arXiv preprint arXiv:2406.02058},
  year={2024}
}
```
If you have any questions about this project, please feel free to contact Yanmin Wu: wuyanminmax[AT]gmail.com