Clean baseline implementation of PPO using an episodic TransformerXL memory
deep-reinforcement-learning pytorch transformer policy-gradient pomdp actor-critic proximal-policy-optimization ppo on-policy episodic-memory transformer-xl gtrxl trxl gated-transformer-xl memory-gym
Updated Jun 18, 2024 - Python