Website | Paper | Colab | Video
Yixuan Wang1, Guang Yin2, Binghao Huang1, Tarik Kelestemur3, Jiuguang Wang3, Yunzhu Li1
1Columbia University,
2University of Illinois Urbana-Champaign,
3Boston Dynamics AI Institute
- Install
- Generate Dataset
- Download Dataset
- Visualize Dataset
- Train
- Infer in Simulation
- Deploy in Real World
We recommend Mambaforge instead of the standard anaconda distribution for faster installation:
mamba env create -f conda_environment.yaml
conda activate gendp
pip install -e gendp/
pip install -e sapien_env/
pip install -e robomimic/
pip install -e d3fields_dev/
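After installation, a quick import check can confirm that the editable packages are on the Python path. The import names below are assumptions based on the directory names above; adjust them if your local package names differ.

# Quick sanity check; import names are assumed from the directory names above.
import gendp
import sapien_env
import robomimic
import d3fields
print("all packages import fine")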
We use SAPIEN to build the simulation environments. To generate data from the heuristic policy for a single episode, use the following command:
python gen_single_episode.py [episode_idx] [dataset_dir] [task_name] --headless --obj_name [OBJ_NAME] --mode [MODE_NAME]
For example, to generate one episode for the hang_mug task with the GUI, you could run the following command:
python gen_single_episode.py 0 data/ hang_mug --obj_name nescafe_mug # random seed is 0; save the data into data/; task name is hang_mug; object name is nescafe_mug
The meaning of each argument is shown by running python gen_single_episode.py --help.
If you want to create your own environments with different objects, please imitate sapien_env/sapien_env/sim_env/mug_collect_env.py. Note that sim_env/custom_env.py does NOT contain the robot. To add robots, please imitate sapien_env/sapien_env/rl_env/mug_collect_env.py. To adjust camera views, please change YX_TABLE_TOP_CAMERAS within sapien_env/sapien_env/gui/gui_base.py.
We noticed that the SAPIEN renderer has a memory leak during large-scale data generation. To avoid this, we use bash commands to generate large-scale data.
python gen_multi_episodes.py
Arguments can be edited within gen_multi_episodes.py.
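As a rough illustration of why a per-episode wrapper sidesteps the renderer leak: each episode runs in its own process, so any leaked memory is reclaimed when that process exits. The sketch below only assumes the single-episode CLI shown above; the shipped gen_multi_episodes.py is the authoritative version.

# Minimal per-episode process isolation (illustrative; see gen_multi_episodes.py for the real script).
import subprocess

dataset_dir = "data/"
task_name = "hang_mug"
obj_name = "nescafe_mug"

for episode_idx in range(100):
    # A fresh process per episode releases whatever memory the SAPIEN renderer leaked.
    subprocess.run(
        ["python", "gen_single_episode.py", str(episode_idx), dataset_dir, task_name,
         "--headless", "--obj_name", obj_name],
        check=True,
    )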
If you want to download a small dataset to test the whole pipeline, you can run bash scripts/download_small_data.sh. For the hanging mug and pencil insertion tasks, you can run the following commands:
bash scripts/download_hang_mug.sh
bash scripts/download_pencil_insertion.sh
If the scripts do not work, you could manually download the data from UIUC Box or Google Drive and unzip them.
To visualize image observations within hdf5 files, use the following command:
python gendp/tests/vis_data_2d.py
You could adjust the dataset path and observation keys in gendp/tests/vis_data_2d.py.
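If you are unsure what an hdf5 file contains, a short h5py script can list every group and dataset before you edit the observation keys; it makes no assumption about key names and simply prints whatever is stored. The path below is a placeholder.

# List all datasets in an hdf5 file (the path is a placeholder; point it at your downloaded data).
import h5py

dataset_path = "data/sapien_demo/hang_mug.hdf5"

def describe(name, obj):
    # Print only leaf datasets with their shapes and dtypes.
    if isinstance(obj, h5py.Dataset):
        print(f"{name}: shape={obj.shape}, dtype={obj.dtype}")

with h5py.File(dataset_path, "r") as f:
    f.visititems(describe)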
Similarly, to visualize aggregated 3D observations, use the following command:
python gendp/tests/vis_aggr_data_3d.py
This will visualize aggregated point clouds from multiple views, robot states, and actions from the dataset. You could adjust the dataset path and observation keys in gendp/tests/vis_aggr_data_3d.py.
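For intuition on how 2D observations become the aggregated 3D clouds, here is a generic pinhole back-projection sketch (depth + intrinsics -> camera-frame points, then extrinsics -> world frame). The variable names and file layout are illustrative, not the repository's API.

# Generic pinhole back-projection; names are illustrative only.
import numpy as np

def depth_to_world_points(depth, K, cam_to_world):
    """depth: (H, W) in meters; K: (3, 3) intrinsics; cam_to_world: (4, 4) extrinsics."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))  # pixel coordinates
    z = depth
    x = (u - K[0, 2]) * z / K[0, 0]
    y = (v - K[1, 2]) * z / K[1, 1]
    pts_cam = np.stack([x, y, z, np.ones_like(z)], axis=-1).reshape(-1, 4)
    pts_world = (cam_to_world @ pts_cam.T).T[:, :3]
    return pts_world[z.reshape(-1) > 0]  # drop invalid (zero-depth) pixels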
Similarly, to visualize 3D semantic fields, use the following command:
python gendp/tests/vis_semantic_fields.py
This will visualize 3D semantic fields processed by D3Fields, robot states, and actions. You could adjust the dataset path and observation keys in gendp/tests/vis_semantic_fields.py. The explanation of each entry within shape_meta can be found in Config Explanation.
To run training, we first set the environment variables.
export OMP_NUM_THREADS=1
export TOKENIZERS_PARALLELISM=true
export MKL_NUM_THREADS=1
Then, we run the following command:
cd [PATH_TO_REPO]/gendp
python train.py --config-dir=config/[TASK_NAME] --config-name=distilled_dino_N_4000.yaml training.seed=42 training.device=cuda training.device_id=0 data_root=[PATH_TO_DATA]
For example, to train on small_data on my local machine, I run the following command:
python train.py --config-dir=config/small_data --config-name=distilled_dino_N_4000.yaml training.seed=42 training.device=cuda training.device_id=0 data_root=/home/yixuan/gendp
Please wait for at least 2 epochs to make sure that all pipelines are working properly. For the hang_mug_sim and pencil_insertion_sim tasks, you could simply replace [TASK_NAME] with hang_mug_sim and pencil_insertion_sim respectively.
There are several critical entries within the config file. Here are some explanations:
- shape_meta: shape_meta contains the policy input and output information.
  - action: output information
    - shape: action dimension. In our work, it is 10 = (3 for translation, 6 for 6D rotation, 1 for gripper); see the action-packing sketch at the end of this section.
    - key: [optional] key for the action in the dataset. It could be 'eef_action' or 'joint_action'. Default is 'eef_action'.
  - obs: input information
    - ... # other input modalities if needed
    - d3fields: 3D semantic fields
      - shape: shape of the 3D semantic fields, i.e. (num_channel, num_points)
      - type: type of inputs. It should be 'spatial' for point cloud inputs.
      - info: information of the 3D semantic fields.
        - reference_frame: frame of the input semantic fields. It should be 'world' or 'robot'.
        - distill_dino: whether to add semantic information to the point cloud
        - distill_obj: the name for reference features, which are saved in `d3fields_dev/d3fields/sel_feats/[DISTILL_OBJ].npy`
        - view_keys: viewpoint keys for the semantic fields
        - N_gripper: number of points sampled from the gripper
        - boundaries: boundaries for the workspace
        - resize_ratio: our pipeline will resize images by this ratio to save time and memory
- task:
  - env_runner: the configuration for the evaluation environment during training
    - max_steps: maximum steps for each episode, which should be adjusted according to the task
    - n_test: number of testing environments
    - n_test_vis: number of testing environments that will be visualized on wandb
    - n_train: number of training environments
    - n_train_vis: number of training environments that will be visualized on wandb
    - train_obj_ls: list of objects that appear in the training environments
    - test_obj_ls: list of objects that appear in the testing environments
- training:
  - checkpoint_every: the frequency of saving checkpoints
  - rollout_every: the frequency of rolling out the policy in the env_runner
Also, some entries are repeated within the config file. Please keep them in sync manually.
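As a hedged illustration of the 10-dimensional action above, the sketch below packs translation, rotation, and gripper state into a single vector, assuming the common continuous 6D rotation representation (the first two columns of the rotation matrix). The exact ordering and rotation convention in the codebase may differ, so treat this as a sketch rather than the repository's definition.

# Illustrative packing of a 10-D end-effector action: 3 (translation) + 6 (rotation) + 1 (gripper).
import numpy as np

def pack_eef_action(translation, rotation_matrix, gripper):
    """translation: (3,); rotation_matrix: (3, 3); gripper: scalar opening."""
    # 6D rotation: flatten the first two columns of the rotation matrix (column-major).
    rot6d = rotation_matrix[:, :2].reshape(-1, order="F")
    return np.concatenate([translation, rot6d, [gripper]])  # shape (10,)

action = pack_eef_action(np.zeros(3), np.eye(3), 0.04)
assert action.shape == (10,)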
To run an existing policy in the simulator, use the following command:
cd [PATH_TO_REPO]/gendp
python eval.py --checkpoint [PATH_TO_CHECKPOINT] --output_dir [OUTPUT_DIR] --n_test [NUM_TEST] --n_train [NUM_TRAIN] --n_test_vis [NUM_TEST_VIS] --n_train_vis [NUM_TRAIN_VIS] --test_obj_ls [OBJ_NAME_1] --test_obj_ls [OBJ_NAME_2] --data_root [PATH_TO_DATA]
For example, we can run
python eval.py --checkpoint /home/yixuan/gendp/checkpoints/small_data/distilled_dino_N_4000/ckpt_00000000.pt --output_dir /home/yixuan/gendp/eval_results/small_data --n_test 10 --n_train 10 --n_test_vis 5 --n_train_vis 5 --test_obj_ls nescafe_mug --data_root /home/yixuan/gendp
To download the existing checkpoints, you could run the following commands.
bash scripts/download_ckpts.sh
You can also download them from UIUC Box or Google Drive and unzip them if the script fails.
- Aloha
- >=1 RealSense camera
mamba env create -f conda_environment_real.yaml
pip install -e gendp/
pip install -e d3fields_dev/
- If you already have ROS Noetic installed, you could run bash scripts/setup_aloha.sh outside of conda environments. Remember to put source /opt/ros/noetic/setup.sh && source ~/interbotix_ws/devel/setup.sh into ~/.bashrc after installation.
- As mentioned in the Aloha README, you need to go to ~/interbotix_ws/src/interbotix_ros_toolboxes/interbotix_xs_toolbox/interbotix_xs_modules/src/interbotix_xs_modules/arm.py, find the function publish_positions, and change self.T_sb = mr.FKinSpace(self.robot_des.M, self.robot_des.Slist, self.joint_commands) to self.T_sb = None. This prevents the code from calculating FK at every step, which delays teleoperation.
- We also need to update the USB rules for the robot. You could run the following commands to update them. You might need to change the serial numbers to your own.
sudo bash scripts/modify_usb_rules.sh
sudo udevadm control --reload && sudo udevadm trigger
- Remember to reboot the computer after the installation. If you encounter any problems, please refer to the Aloha repository.
- To test whether the robot installation is successful, you could run the following commands:
# both sides
roslaunch aloha 4arms_teleop.launch
python gendp/gendp/real_world/aloha_simple_teleop.py --left --right
# left side
roslaunch aloha 2arms_left_teleop.launch
python gendp/gendp/real_world/aloha_simple_teleop.py --left
# right side
roslaunch aloha 2arms_right_teleop.launch
python gendp/gendp/real_world/aloha_simple_teleop.py --right
We found that the raw RealSense intrinsics are accurate enough for our pipeline, but you might want to verify them before proceeding.
First, we calibrate the extrinsics between the camera and the world (i.e. calibration board) frame. We use calib.io to generate the calibration board. Please use ChArUco as the Target Type. You could select the rest of the options according to your preference and printing capability. Then you can click Save calibration board as PDF to download and print the calibration board. Then you could run
python gendp/gendp/real_world/calibrate_realsenses.py --rows [NUM_ROWS] --cols [NUM_COLS] --checker_width [CHECKER_WIDTH] --marker_width [MARKER_WIDTH]
This will keep running the calibration pipeline in a while True loop and save the calibration results in gendp/gendp/real_world/cam_extrinsics. To visualize the calibration results, you could run
python gendp/gendp/real_world/vis_cam_cali.py --iterative
Enabling --iterative will visualize each camera's point cloud iteratively and the aggregated point cloud at the end. Otherwise, it will only visualize the aggregated point cloud. You are expected to see a well-aligned point cloud of the workspace.
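A rough way to eyeball extrinsic quality beyond the provided script: bring each camera's cloud into the world frame and overlay them in different colors; well-calibrated cameras should produce overlapping geometry. The file names below are hypothetical, not the layout of cam_extrinsics.

# Overlay per-camera clouds in the world frame (file names are hypothetical).
import numpy as np
import open3d as o3d

clouds = []
colors = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0]]
for i, color in enumerate(colors):
    pts_cam = np.load(f"cam_{i}_points.npy")          # (N, 3) camera-frame points
    cam_to_world = np.load(f"cam_{i}_extrinsics.npy")  # (4, 4) calibrated pose
    pts_h = np.concatenate([pts_cam, np.ones((len(pts_cam), 1))], axis=1)
    pts_world = (cam_to_world @ pts_h.T).T[:, :3]
    pcd = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(pts_world))
    pcd.paint_uniform_color(color)
    clouds.append(pcd)
o3d.visualization.draw_geometries(clouds)  # well-aligned extrinsics give overlapping geometry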
Lastly, we calibrate the transformations between the robot bases and the world frame, which is done manually. You could adjust robot_base_in_world within gendp/gendp/real_world/calibrate_robot.py, which represents the robots' base poses in the world (i.e. calibration board) frame. You could run
python gendp/gendp/real_world/calibrate_robot.py
This will allow you to control robots and visualize the robot point cloud and the aggregated point cloud from cameras at the same time. You could adjust the robot base pose until the robot point cloud is well-aligned with the aggregated point cloud.
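The manual adjustment above boils down to choosing a 4x4 pose. As a small illustration (the numbers and helper below are placeholders, not the values in calibrate_robot.py), this is how such a pose maps robot-frame points into the world frame so they can be overlaid with the camera cloud:

# Illustrative only: a robot_base_in_world pose maps robot-frame points into the world frame.
import numpy as np

yaw = np.deg2rad(90.0)  # example rotation about z
robot_base_in_world = np.eye(4)
robot_base_in_world[:3, :3] = np.array([
    [np.cos(yaw), -np.sin(yaw), 0.0],
    [np.sin(yaw),  np.cos(yaw), 0.0],
    [0.0,          0.0,         1.0],
])
robot_base_in_world[:3, 3] = [0.30, -0.20, 0.00]  # example translation in the board frame

def robot_points_to_world(pts_robot, base_in_world):
    """pts_robot: (N, 3) points expressed in the robot base frame."""
    pts_h = np.concatenate([pts_robot, np.ones((len(pts_robot), 1))], axis=1)
    return (base_in_world @ pts_h.T).T[:, :3]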
You could collect demonstrations by running the following command:
python gendp/demo_real_aloha.py --output_dir [OUTPUT_DIR] --robot_sides [ROBOT_SIDE] --robot_sides [ROBOT_SIDE] # [ROBOT_SIDE] could be 'left' or 'right'
Press "C" to start recording. Use SpaceMouse to move the robot. Press "S" to stop recording.
The training is similar to the training in the simulator. Here are two examples:
bash scripts/download_real_data.sh # download the real data
python train.py --config-dir=config/knife_real --config-name=distilled_dino_N_1000.yaml training.seed=42 training.device=cuda training.device_id=0 data_root=/home/yixuan/gendp # train the model for pick_up_knife task
python train.py --config-dir=config/pen_real --config-name=distilled_dino_N_1000.yaml training.seed=42 training.device=cuda training.device_id=0 data_root=/home/yixuan/gendp # train the model for pick_up_pen task
Given a checkpoint, you could run the following command to infer in the real world (absolute path is recommended):
python gendp/eval_real_aloha.py -i [PATH_TO_CKPT_FILE] -o [OUTPUT_DIR] -r [ROBOT_SIDE] --vis_d3fields [true OR false]
Press "C" to start evaluation (handing control over to the policy). Press "S" to stop the current episode.
To adapt our framework to new tasks, you could follow these steps:
- You can select reference DINO features by running python d3fields_dev/d3fields/scripts/sel_features.py. This will provide an interactive interface to select the reference features given four arbitrary images. Click the left mouse button to select the reference features, press 'N' to move to the next image, and press 'Q' to quit and save the selected features.
- For the new task, you may need to update several important configuration entries (a consistency-check sketch follows the list below):
- shape_meta:
  - action:
    - shape: 10 if using a single robot and 20 for bimanual manipulation
  - obs:
    - d3fields:
      - shape: change the first number (the number of channels). It is 3 if only using the raw point cloud, or 3 + the number of reference features if using DINOv2 features.
      - info:
        - distill_dino: whether to add semantic information to the point cloud
        - distill_obj: the name for reference features, which are saved in `d3fields_dev/d3fields/sel_feats/[DISTILL_OBJ].npy`
        - bounding_box: the bounding box for the workspace
- task_name: the name for the task, which will be used in wandb and logging files
- dataset_name: the name for the training dataset, which is used to infer dataset_dir (e.g. ${data_root}/data/real_aloha_demo/${dataset_name} or ${data_root}/data/sapien_demo/${dataset_name})
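As a quick consistency check between distill_obj and the d3fields shape entry, the sketch below loads the saved reference-feature file and reports how many channels the config should declare. The array layout of the .npy file is an assumption (features along the first axis), so adjust which axis you count if your file is organized differently.

# Sanity check (illustrative): d3fields channels = 3 (xyz) + number of reference features.
import numpy as np

distill_obj = "nescafe_mug"  # placeholder name
feats = np.load(f"d3fields_dev/d3fields/sel_feats/{distill_obj}.npy")
num_ref_feats = feats.shape[0]  # assumes features are stored along the first axis
print(f"reference features: {feats.shape} -> d3fields shape should start with {3 + num_ref_feats}")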
This repository is built upon the following repositories. Thanks for their great work!