A2C with Tensorflow 2

Advantage Actor Critic (A2C) is one of the key algorithms in Reinforcement Learning. It has an actor, a neural network that learns the policy directly, and a critic, a network (or network head) that predicts the value function used to compute the advantage. Although A2C has been implemented many times before, with Stable Baselines and OpenAI Baselines being very popular, I wanted to implement A2C on my own to learn more about how Deep RL techniques can be implemented with TensorFlow and Keras.
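
As a rough illustration of this actor/critic split (this is a minimal sketch, not the exact model used in this repository), a shared network with two output heads can be written in TensorFlow 2 / Keras like this:

```python
import tensorflow as tf
from tensorflow.keras import layers

class ActorCritic(tf.keras.Model):
    """Shared trunk with two heads: policy logits (actor) and state value (critic)."""

    def __init__(self, num_actions, hidden_units=128):
        super().__init__()
        self.common = layers.Dense(hidden_units, activation="relu")
        self.actor = layers.Dense(num_actions)  # unnormalized policy logits
        self.critic = layers.Dense(1)           # state-value estimate V(s)

    def call(self, state):
        x = self.common(state)
        return self.actor(x), self.critic(x)
```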

There are many open-source implementations of the algorithm, but most of them are in TensorFlow 1.x. The code in this repository is therefore written in TensorFlow 2.x, and writing it was more about getting to know the new TensorFlow functionalities and their power than about the algorithm itself, which is actually pretty straightforward. I tested the A2C algorithm on OpenAI Gym's CartPole environment, where the agent receives a reward of +1 for every timestep the pole stays upright; if the pole's angle from the vertical exceeds a particular value, or the cart goes out of the frame, the episode terminates. The maximum reward in the environment is 200. A sketch of one training update on this environment is shown below.
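
The following is a hedged sketch of a single A2C update on CartPole-v0, assuming the `ActorCritic` model above; the hyperparameters (learning rate, discount factor, loss weighting) are illustrative, not necessarily the values used in this repository, and it uses the older Gym step/reset API:

```python
import gym

env = gym.make("CartPole-v0")
model = ActorCritic(num_actions=env.action_space.n)
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)
gamma = 0.99  # discount factor (illustrative)

def run_episode_and_update():
    state = env.reset()
    done = False
    episode_reward = 0.0
    with tf.GradientTape() as tape:
        log_probs, values, rewards = [], [], []
        while not done:
            logits, value = model(tf.convert_to_tensor(state[None], dtype=tf.float32))
            action = int(tf.random.categorical(logits, 1)[0, 0])
            log_probs.append(tf.nn.log_softmax(logits)[0, action])
            values.append(value[0, 0])
            # Older Gym API: step returns (obs, reward, done, info)
            state, reward, done, _ = env.step(action)
            rewards.append(reward)
            episode_reward += reward

        # Discounted returns, accumulated backwards over the episode
        returns, g = [], 0.0
        for r in reversed(rewards):
            g = r + gamma * g
            returns.insert(0, g)
        returns = tf.convert_to_tensor(returns, dtype=tf.float32)
        values = tf.stack(values)
        log_probs = tf.stack(log_probs)

        # Advantage = return - value estimate; actor is trained on it,
        # critic is regressed towards the returns
        advantage = returns - values
        actor_loss = -tf.reduce_sum(log_probs * tf.stop_gradient(advantage))
        critic_loss = tf.reduce_sum(tf.square(advantage))
        loss = actor_loss + 0.5 * critic_loss

    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return episode_reward
```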

After training for nearly 500 episodes, the algorithm was able to perform sufficiently well at test time, getting a reward of 200 most of the time. This is only a first draft, and I hope to optimize it to give better results.

Following are the results from training and testing (figures in the repository):

- Training results
- Test results
- Model's performance
