This project demonstrates a Deep Q-Network (DQN) implementation for training a bipedal robot in a simulated environment using PyBullet. The robot learns to navigate toward a target while maintaining balance and avoiding falls.
Make sure you have the following dependencies installed:
- Python 3.x
- PyBullet (`pip install pybullet`)
- NumPy (`pip install numpy`)
- PyTorch (`pip install torch`)
- `collections` (included in Python's standard library)
- `time` (included in Python's standard library)
- `random` (included in Python's standard library)
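The third-party packages can be installed in one step:

```bash
pip install pybullet numpy torch
```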
- DQN Model: A neural network with three fully connected layers that estimates Q-values for each action given the robot's state (see the first sketch after this list).
- Bipedal Robot: The robot is created using PyBullet's `createMultiBody` function and consists of a base and four joints.
- State Representation: The state includes the robot's base position, orientation, velocities, and joint states.
- Action Space: The action space is discretized into 16 possible joint movements.
- Reward Function: The reward is based on the robot's distance to the target, tilting penalties, and joint movement penalties (see the second sketch after this list).
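As a rough illustration, the Q-network could look like the following. The state size (21 here: base position, orientation quaternion, linear and angular velocities, and four joint position/velocity pairs) and hidden width are assumptions; the 16 outputs match the discretized action space above:

```python
import torch
import torch.nn as nn

class DQN(nn.Module):
    """Three fully connected layers mapping a state vector to per-action Q-values."""

    def __init__(self, state_dim=21, action_dim=16, hidden_dim=128):
        # state_dim and hidden_dim are illustrative assumptions;
        # action_dim=16 matches the discretized action space above.
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, action_dim),
        )

    def forward(self, state):
        return self.net(state)
```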
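The reward terms might be combined along these lines; the weights and the hypothetical inputs (`distance_to_target`, `tilt_angle`, `joint_velocities`) are assumptions, not the project's exact values:

```python
import numpy as np

def compute_reward(prev_distance, distance_to_target, tilt_angle, joint_velocities, fallen):
    # Hypothetical coefficients -- the actual values live in Robot.py.
    progress = prev_distance - distance_to_target   # reward moving toward the target
    tilt_penalty = 0.1 * abs(tilt_angle)            # penalize leaning away from upright
    effort_penalty = 0.01 * np.sum(np.square(joint_velocities))  # penalize large joint movements
    fall_penalty = 10.0 if fallen else 0.0          # large penalty for falling over
    return progress - tilt_penalty - effort_penalty - fall_penalty
```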
The training loop follows these steps:
- Environment Reset: The robot is reset to its initial position at the start of each episode.
- Action Selection: Actions are chosen using an ε-greedy strategy, with ε decaying over time (see the first sketch after this list).
- Simulation Step: The chosen action is applied to the robot, and the environment is stepped forward.
- Reward Calculation: Rewards are calculated based on the robot's progress towards the target, penalties for tilting, and penalties for falling.
- Experience Replay: A replay memory stores experiences, which are sampled to train the DQN (see the second sketch after this list).
- Target Network Update: The target network is updated every 10 episodes to stabilize training.
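A minimal ε-greedy selector with an exponentially decaying ε might look like this; the schedule constants are assumptions:

```python
import random
import torch

def select_action(policy_net, state, step, eps_start=1.0, eps_end=0.05, eps_decay=0.995):
    # Decay epsilon exponentially with the step count (assumed schedule).
    epsilon = max(eps_end, eps_start * (eps_decay ** step))
    if random.random() < epsilon:
        return random.randrange(16)  # explore: random action from the 16 discrete moves
    with torch.no_grad():            # exploit: greedy action from the Q-network
        q_values = policy_net(torch.as_tensor(state, dtype=torch.float32))
        return int(q_values.argmax().item())
```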
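And here is a sketch of the replay buffer plus one optimization step, with the target network synced every 10 episodes as described above. Buffer capacity, batch size, discount factor, and learning rate are assumed values:

```python
import random
from collections import deque

import torch
import torch.nn as nn

# Stand-ins for the networks (same shape as the DQN sketched earlier).
policy_net = nn.Sequential(nn.Linear(21, 128), nn.ReLU(), nn.Linear(128, 16))
target_net = nn.Sequential(nn.Linear(21, 128), nn.ReLU(), nn.Linear(128, 16))
target_net.load_state_dict(policy_net.state_dict())

memory = deque(maxlen=10_000)  # replay memory (assumed capacity)
optimizer = torch.optim.Adam(policy_net.parameters(), lr=1e-3)  # assumed learning rate
GAMMA = 0.99  # assumed discount factor

def optimize(batch_size=64):
    """Sample a minibatch from memory and take one gradient step."""
    if len(memory) < batch_size:
        return
    states, actions, rewards, next_states, dones = zip(*random.sample(memory, batch_size))
    states = torch.as_tensor(states, dtype=torch.float32)
    actions = torch.as_tensor(actions, dtype=torch.int64).unsqueeze(1)
    rewards = torch.as_tensor(rewards, dtype=torch.float32)
    next_states = torch.as_tensor(next_states, dtype=torch.float32)
    dones = torch.as_tensor(dones, dtype=torch.float32)

    q_values = policy_net(states).gather(1, actions).squeeze(1)  # Q(s, a) for actions taken
    with torch.no_grad():  # bootstrapped target from the frozen target network
        targets = rewards + GAMMA * target_net(next_states).max(1).values * (1.0 - dones)

    loss = nn.functional.mse_loss(q_values, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Every 10 episodes, sync the target network with the policy network:
# if episode % 10 == 0:
#     target_net.load_state_dict(policy_net.state_dict())
```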
To run the program, execute:

```bash
python Robot.py
```