Voxel occupancy mapping from point clouds and RGB-D images using transformers.
The model combines a ResNet-based image encoder, an NDT-Net point cloud encoder, a ViLBERT-based neck, and a deconvolutional head.
To train and evaluate the models, install the following requirements on your system:
- Wandb
- PyTorch
- Open3D
- Matplotlib
- NumPy
- OpenCV
- NDT-Net
You can install all dependencies except NDT-Net by running the command
pip install -r requirements.txt
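After installing, it can help to confirm that the listed packages are actually importable. The following is a minimal sketch, not part of the repository: it maps each requirement above to its usual Python import name (for example, OpenCV is imported as `cv2`) and reports the ones that cannot be found. NDT-Net is left out because it is installed separately and its import name is not stated here.

```python
import importlib.util

# Usual import names for the pip-installable requirements listed above.
# NDT-Net is omitted on purpose: it is installed separately.
REQUIRED_MODULES = {
    "Wandb": "wandb",
    "PyTorch": "torch",
    "Open3D": "open3d",
    "Matplotlib": "matplotlib",
    "NumPy": "numpy",
    "OpenCV": "cv2",
}


def find_missing(modules=REQUIRED_MODULES):
    """Return the display names of packages whose top-level module is not found."""
    return [
        name
        for name, module in modules.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = find_missing()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed packages are importable.")
```

The check only looks the modules up without importing them, so it is quick and does not initialize any GPU context.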
- Docker
If you prefer to work inside a Docker container, build the container first and then follow the same instructions, using the container as a remote terminal.
You need to log in to your wandb account so that losses and accuracies can be logged. Run the command
wandb login
- Confirm that the dataset configuration (namely the path) in the configuration file matches your setup.
- If you want to create or use your own dataset, create a new file with the same structure.
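The path check described above can be scripted before launching a long run. This is a sketch under stated assumptions: the structure of the dataset configuration file is not shown here, so the key name `path` is hypothetical, and `load_yaml` assumes PyYAML is installed, which is not among the listed requirements.

```python
from pathlib import Path


def load_yaml(config_file):
    """Parse a YAML file. Requires PyYAML, an assumed extra dependency."""
    import yaml  # imported lazily so the check below works without it

    with open(config_file, "r", encoding="utf-8") as f:
        return yaml.safe_load(f)


def dataset_path_exists(config, key="path"):
    """Return True if config[key] names an existing directory.

    `key` is hypothetical: use whichever field your dataset file
    stores its root path in.
    """
    value = config.get(key)
    return value is not None and Path(value).is_dir()
```

For example, `dataset_path_exists(load_yaml("multitudinous/configs/datasets/carla_rgbd.yaml"))` would return `False` if the configured directory does not exist on your machine, provided the file really uses a top-level `path` key.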
Run the command
python tools/img_pretrain.py --config multitudinous/configs/pretraining/img/se_resnet50_unet.yaml --dataset multitudinous/configs/datasets/carla_rgbd.yaml --output weights/img_pretrain_5k
- The first configuration is the model configuration. You can check the other ones available in that same directory or create a new one.
- In any case of doubt, run the script with the
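If you want to pre-train with several model configurations from that directory, the command above can be assembled programmatically. The sketch below is illustrative only: it uses the three flags shown in the command (`--config`, `--dataset`, `--output`), while the helper names and the convention of naming each output after the configuration file are assumptions, not something the repository prescribes.

```python
import subprocess
import sys
from pathlib import Path


def pretrain_command(config, dataset, output):
    """Assemble the argument list for tools/img_pretrain.py."""
    return [
        sys.executable, "tools/img_pretrain.py",
        "--config", str(config),
        "--dataset", str(dataset),
        "--output", str(output),
    ]


def pretrain_commands(configs, dataset, output_root):
    """Build one command per model config.

    Assumed convention: each run writes to output_root/<config file stem>.
    """
    return [
        pretrain_command(c, dataset, Path(output_root) / Path(c).stem)
        for c in configs
    ]


def run_all(commands):
    """Run the commands one after another, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

A typical use would be to collect the configurations with `sorted(Path("multitudinous/configs/pretraining/img").glob("*.yaml"))`, pass them to `pretrain_commands`, and hand the result to `run_all` from the repository root.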
Instructions for pre-training the point cloud backbone are given in its own README.
It is strongly recommended to pre-train both backbones first so that training converges faster.
- Run the command
python tools/train.py --config multitudinous/configs/model/se_resnet50-ndtnet.yaml --img_backbone_weights /path/to/weights --point_cloud_backbone_weights /path/to/weights
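Because training depends on both sets of pre-trained weights, it can be worth verifying that the two paths exist before starting. The following is a minimal sketch, assuming only the flags shown in the command above; the helper name `train_command` is hypothetical, and the check accepts either a file or a directory since the format of the saved weights is not specified here.

```python
import sys
from pathlib import Path


def train_command(config, img_weights, point_cloud_weights):
    """Assemble the tools/train.py call after checking both weight paths exist."""
    for label, path in (
        ("image", img_weights),
        ("point cloud", point_cloud_weights),
    ):
        # Accept a file or a directory: the weights format is not assumed.
        if not Path(path).exists():
            raise FileNotFoundError(f"{label} backbone weights not found: {path}")
    return [
        sys.executable, "tools/train.py",
        "--config", str(config),
        "--img_backbone_weights", str(img_weights),
        "--point_cloud_backbone_weights", str(point_cloud_weights),
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root. Failing early with a clear message avoids discovering a mistyped path only after the data loaders have been set up.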