For a visual demonstration of this project, please refer to the video linked below:
This project aims to build an end-to-end system that takes a linguistic goal and executes it in a real-world scenario. The system leverages the CLEVR dataset and Transporter Networks to process images and translate them into end-effector poses. The primary objective is to handle everyday objects of different shapes, sizes, and colors without predefined object classes. Additionally, we incorporate safety constraints so that the robot avoids forbidden regions while performing tasks.
Concretely, the project extends the CLIPort model, which manipulates objects based on visual and linguistic inputs, so that the robot also avoids forbidden regions. The task is to pick up a green block and place it in a pink square while avoiding a blue forbidden region.
- Python 3.8 or higher
- TensorFlow 2.x
- PyTorch
- NumPy
- OpenCV
- CLEVR Dataset
- Transporter Networks
Download CLEVR Dataset:
- Follow the instructions provided with the CLEVR dataset to download and set it up.
Configure Environment:
- Ensure your Python environment is configured with all of the libraries listed above.
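A short check script can confirm the environment before running anything. This is a minimal sketch with hypothetical helper names: it only tests the Python version and whether each library can be located (OpenCV's import name is `cv2`). It does not verify that TensorFlow is 2.x or that the CLEVR data is in place.

```python
import importlib.util
import sys

# Import names of the required libraries (hypothetical mapping; OpenCV imports as "cv2").
REQUIRED_MODULES = {
    "tensorflow": "TensorFlow 2.x",
    "torch": "PyTorch",
    "numpy": "NumPy",
    "cv2": "OpenCV",
}


def python_version_ok(minimum=(3, 8)):
    """Return True if the running interpreter is at least `minimum`."""
    return sys.version_info[:2] >= minimum


def missing_modules(modules=REQUIRED_MODULES):
    """Return the labels of modules that cannot be found (nothing is imported)."""
    return [label for name, label in modules.items()
            if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python >= 3.8:", python_version_ok())
    missing = missing_modules()
    print("Missing:", ", ".join(missing) if missing else "none")
```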
The system uses the CLIPort model, which integrates visual and linguistic inputs to guide robot manipulation tasks. The Transporter Network identifies the best pick-up (Tpick) and place (Tplace) points based on the environment's state.
- CLIP Model: Links linguistic goals with visual inputs.
- Transporter Network: Processes images to generate end-effector poses.
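The selection of Tpick and Tplace can be illustrated in NumPy. This is a simplified sketch, not the CLIPort code: it assumes the networks have already produced a pick heatmap and two dense feature maps (`query_feats`, `key_feats`), ignores crop rotations, and chooses Tplace by cross-correlating a crop of query features around Tpick with the key features, in the spirit of Transporter Networks. All names are hypothetical.

```python
import numpy as np


def select_pick(pick_heatmap):
    """T_pick: the (row, col) pixel with the highest pick score."""
    r, c = np.unravel_index(np.argmax(pick_heatmap), pick_heatmap.shape)
    return int(r), int(c)


def select_place(query_feats, key_feats, pick_rc, crop=5):
    """T_place: cross-correlate a crop of query features centred on T_pick
    with the key features and return the best-matching (row, col) pixel.

    query_feats, key_feats: arrays of shape (H, W, C); crop must be odd.
    Rotations of the crop are ignored in this sketch.
    """
    half = crop // 2
    pad = ((half, half), (half, half), (0, 0))
    q = np.pad(query_feats, pad)  # zero padding so border crops are full size
    k = np.pad(key_feats, pad)
    r, c = pick_rc
    kernel = q[r:r + crop, c:c + crop]  # original pixel (r, c) is the kernel centre
    h, w = key_feats.shape[:2]
    scores = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            scores[i, j] = np.sum(k[i:i + crop, j:j + crop] * kernel)
    r_place, c_place = np.unravel_index(np.argmax(scores), scores.shape)
    return int(r_place), int(c_place)
```

The real model scores several rotated crops and learns the feature maps end to end; the double loop here is kept for readability rather than speed.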
The primary task is to manipulate objects while avoiding forbidden regions:
- Pick: Identify and pick up a green block.
- Place: Place the block in a pink square region.
- Avoid: Ensure the trajectory does not pass through a blue forbidden region.
The baseline Transporter Network does not account for forbidden regions, so its trajectories could pass through them. To overcome this, we implemented the Liang-Barsky line-clipping algorithm to calculate where the pick-to-place path intersects a forbidden region and to generate safe trajectories that avoid forbidden areas.
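A minimal Liang-Barsky sketch follows, assuming the forbidden region is an axis-aligned rectangle `(xmin, ymin, xmax, ymax)` in the table plane and the path is the straight segment from Tpick to Tplace. It returns the entry and exit points of the part of the segment inside the rectangle, or `None` when the segment misses it.

```python
def liang_barsky(p0, p1, rect):
    """Clip segment p0->p1 against an axis-aligned rect (xmin, ymin, xmax, ymax).

    Returns (entry_point, exit_point) of the clipped part, or None.
    """
    (x0, y0), (x1, y1) = p0, p1
    xmin, ymin, xmax, ymax = rect
    dx, dy = x1 - x0, y1 - y0
    t_enter, t_exit = 0.0, 1.0  # segment parameter range still inside the rect
    # One (p, q) pair per edge: left, right, bottom, top.
    for p, q in ((-dx, x0 - xmin), (dx, xmax - x0), (-dy, y0 - ymin), (dy, ymax - y0)):
        if p == 0:
            if q < 0:
                return None  # parallel to this edge and outside it
            continue
        t = q / p
        if p < 0:
            t_enter = max(t_enter, t)  # segment enters across this edge
        else:
            t_exit = min(t_exit, t)    # segment leaves across this edge
        if t_enter > t_exit:
            return None  # no overlap with the rectangle
    return ((x0 + t_enter * dx, y0 + t_enter * dy),
            (x0 + t_exit * dx, y0 + t_exit * dy))
```

Edges with `p < 0` can only raise the entry parameter and edges with `p > 0` can only lower the exit parameter, so the segment is rejected as soon as the entry parameter exceeds the exit parameter.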
- Tpick: Best pick-up point.
- Tplace: Best place point.
- Safety Constraint: Adjust the trajectory to avoid forbidden regions using the calculated intersection points and offsets.
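One way to combine intersection points and an offset into a safe path is sketched below. The rule is our illustrative assumption, not necessarily the project's exact adjustment: the forbidden rectangle is inflated by a safety offset, the straight Tpick-to-Tplace segment is clipped against the inflated rectangle (for example with Liang-Barsky) to get entry and exit points, and the robot then follows the inflated boundary from entry to exit in the shorter direction. It assumes Tpick and Tplace both lie outside the inflated rectangle; all function names are hypothetical.

```python
def inflate(rect, offset):
    """Grow an axis-aligned rect (xmin, ymin, xmax, ymax) by a safety offset."""
    xmin, ymin, xmax, ymax = rect
    return (xmin - offset, ymin - offset, xmax + offset, ymax + offset)


def _perimeter_coord(pt, box):
    """Position of a boundary point along the box perimeter, measured
    counter-clockwise from the bottom-left corner."""
    x0, y0, x1, y1 = box
    w, h = x1 - x0, y1 - y0
    x, y = pt
    # (distance to edge, perimeter coordinate if the point lies on that edge)
    candidates = [
        (abs(y - y0), x - x0),                # bottom edge, left -> right
        (abs(x - x1), w + (y - y0)),          # right edge, bottom -> top
        (abs(y - y1), w + h + (x1 - x)),      # top edge, right -> left
        (abs(x - x0), 2 * w + h + (y1 - y)),  # left edge, top -> bottom
    ]
    return min(candidates)[1]  # use the edge the point is closest to


def detour_waypoints(p_pick, p_place, crossing, box):
    """Waypoints from p_pick to p_place that skirt the inflated box.

    crossing: (entry, exit) points of the straight segment on the inflated
    box boundary, or None if the segment does not cross it.
    """
    if crossing is None:
        return [p_pick, p_place]
    entry, exit_ = crossing
    x0, y0, x1, y1 = box
    w, h = x1 - x0, y1 - y0
    perim = 2 * (w + h)
    corners = [((x0, y0), 0), ((x1, y0), w), ((x1, y1), w + h), ((x0, y1), 2 * w + h)]
    s_in, s_out = _perimeter_coord(entry, box), _perimeter_coord(exit_, box)
    ccw = (s_out - s_in) % perim
    forward = ccw <= perim - ccw  # True: counter-clockwise is the shorter way
    span = ccw if forward else perim - ccw

    def walked(s):
        """Distance along the perimeter from the entry point to coordinate s."""
        return ((s - s_in) if forward else (s_in - s)) % perim

    passed = sorted((walked(s), c) for c, s in corners if 0 < walked(s) < span)
    return [p_pick, entry] + [c for _, c in passed] + [exit_, p_place]
```

For example, a forbidden rectangle `(4, 2, 6, 8)` inflated by an offset of 1 becomes `(3, 1, 7, 9)`. A segment along `y = 6` from `(0, 6)` to `(10, 6)` enters it at `(3, 6)` and leaves at `(7, 6)`; the shorter way around is over the top, giving waypoints `(0, 6), (3, 6), (3, 9), (7, 9), (7, 6), (10, 6)`. Because the path follows the inflated boundary, it stays at least the offset away from the original region.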
An evaluation metric was defined to track the number of trajectory points passing through the forbidden region. This metric quantifies the robot's ability to avoid the forbidden area while completing tasks.
- Trajectory Points: Number of points passing through the forbidden region.
- Episodes: Number of successful task completions.
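The trajectory-point metric can be computed by densely sampling the executed path and counting the samples that fall inside the forbidden region. In this sketch the trajectory is a list of 2-D waypoints, the region is an axis-aligned rectangle, the sampling step is an arbitrary choice, and points exactly on the boundary are not counted; all of these are assumptions, and the function names are hypothetical.

```python
import numpy as np


def sample_polyline(waypoints, step=0.01):
    """Densely sample a piecewise-linear trajectory with spacing <= step."""
    pts = np.asarray(waypoints, dtype=float)
    samples = [pts[0]]
    for a, b in zip(pts[:-1], pts[1:]):
        n = max(1, int(np.ceil(np.linalg.norm(b - a) / step)))
        t = np.linspace(0.0, 1.0, n + 1)[1:, None]  # skip t=0 (already sampled)
        samples.extend(a + t * (b - a))
    return np.array(samples)


def points_in_region(points, rect):
    """Number of trajectory points strictly inside the axis-aligned rect."""
    xmin, ymin, xmax, ymax = rect
    x, y = points[:, 0], points[:, 1]
    inside = (x > xmin) & (x < xmax) & (y > ymin) & (y < ymax)
    return int(inside.sum())
```

For instance, the straight path from `(0, 5)` to `(10, 5)` sampled with `step=1.0` has three samples inside the rectangle `(3.5, 2, 6.5, 8)`, while a detour that crosses above it along `y = 9` has none.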
The results over 100 episodes are visualized as a graph of the number of episodes against the number of trajectory points inside the obstacle (forbidden) region. Our goal is to minimize these points; fewer points indicate better avoidance of forbidden regions.
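The data behind that plot can be tabulated with the standard library. This sketch assumes a list holding, for each episode, the number of trajectory points inside the forbidden region (100 entries for 100 episodes); the function names and the summary fields are hypothetical, and no project results are reproduced here.

```python
from collections import Counter


def episode_histogram(violations_per_episode):
    """Map 'points inside the forbidden region' -> number of episodes with that count."""
    return dict(sorted(Counter(violations_per_episode).items()))


def summarize(violations_per_episode):
    """Simple aggregates over all episodes."""
    n = len(violations_per_episode)
    clean = sum(1 for v in violations_per_episode if v == 0)
    return {
        "episodes": n,
        "clean_episodes": clean,  # episodes with no points in the region
        "clean_rate": clean / n if n else 0.0,
        "mean_points": sum(violations_per_episode) / n if n else 0.0,
    }
```

The histogram keys give the x-axis (trajectory points in the region) and the values give the y-axis (number of episodes), so a good policy concentrates its mass at zero.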
- CLIPort GitHub Repository: CLIPort GitHub
- Project Report: Project Report
- Presentation: Presentation