
Added Policy Gradient Tutorial #82

Open · wants to merge 6 commits into base: tutorials
Conversation

shreyas-kowshik (Contributor)

**Description**
Implementation of vanilla Monte Carlo Policy Gradients on the CartPole-v0 environment, added as a tutorial.

**Tests**
Run the script.
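For context, a minimal sketch of the Monte Carlo policy gradient (REINFORCE) loss this tutorial implements; the names (`policy`, `pg_loss`, the layer sizes) are illustrative, not the exact tutorial code:

```julia
using Flux, Statistics

# Illustrative policy network for CartPole-v0: 4 observations in, 2 action probabilities out.
policy = Chain(Dense(4, 128, relu), Dense(128, 2), softmax)

# REINFORCE loss: negative log-probability of the actions actually taken,
# weighted by their discounted returns G_t.
function pg_loss(states, actions, returns)
    probs = policy(states)                       # one column of action probabilities per step
    logpi = log.(sum(probs .* actions, dims=1))  # log π(a_t | s_t), with `actions` one-hot
    return mean(-logpi .* returns)
end
```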

return mean(-logpi .* A_t)
end

opt = ADAM(params(policy),η)
Contributor


`opt = ADAM(params(policy), η)` could be `opt = ADAM(η)`, in accordance with the new optimizer API.
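A sketch of the suggested change (assuming the Flux 0.8-style optimiser API; `policy` and `η` as in the tutorial):

```julia
# Old API: the optimiser is bound to the parameters at construction time.
# opt = ADAM(params(policy), η)

# New API: the optimiser only holds its hyperparameters; parameters and
# gradients are supplied later, at update time.
opt = ADAM(η)
```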


G_t = γ*G_t + r

l = l .+ loss(state,act,G_t)
Contributor


Broadcasting is not required here, since the value being added is a scalar.
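That is, since `loss(state, act, G_t)` returns a scalar, plain addition is enough:

```julia
l = l + loss(state, act, G_t)   # or simply: l += loss(state, act, G_t)
```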


l = l .+ loss(state,act,G_t)
Flux.back!(loss(state,act,G_t))
opt()
tejank10 (Contributor) commented Jan 17, 2019


WRT the new Optimizer API, this will become `update!(opt, params(model))`.

shreyas-kowshik (Contributor, Author)

@tejank10

`update!(opt, params(policy))`

does not find a matching method candidate.
Tried using

grads = Tracker.gradient(() -> loss(state,act,G_t), params(policy))

for p in params(policy)
  update!(opt, p, grads[p])
end

but even this throws errors.
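For reference, the pattern documented for the Tracker-based optimiser API at the time looked roughly like the sketch below (an assumption about the Flux version in use, roughly 0.8). One possible cause of the error is that the `update!` in scope is Tracker's two-argument method rather than the optimiser one, so qualifying the call may help:

```julia
# Sketch only: compute gradients once, then apply the optimiser per parameter.
ps = params(policy)
gs = Tracker.gradient(() -> loss(state, act, G_t), ps)

for p in ps
    # Qualified call: Flux.Optimise.update!(opt, p, grad) applies the optimiser step,
    # whereas Tracker's plain update!(x, Δ) has no (opt, p, grad) method.
    Flux.Optimise.update!(opt, p, gs[p])
end
```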

DhairyaLGandhi (Member) commented Jan 31, 2019

Please add a Project.toml and Manifest.toml as well, so it is easier to standardize the environment.
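For example, one way to generate the two files from the tutorial's directory (the exact dependency list is whatever the script imports; `Gym` below is only a placeholder for the environment package):

```julia
using Pkg
Pkg.activate(".")     # creates/uses ./Project.toml for this tutorial
Pkg.add("Flux")       # add each package the script imports
# Pkg.add("Gym")      # placeholder: whichever package provides CartPole-v0
# Pkg writes both Project.toml and Manifest.toml into the activated directory.
```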

shreyas-kowshik (Contributor, Author)

@dhairyagandhi96 Added the files

@@ -0,0 +1,195 @@
# ***Generative Adversarial Network Tutorial***
Member


Can we just use normal headings for these rather than the extra formatting / html tags?

MikeInnes (Member)

@tejank10 are you happy with the changes made here, or is there more to do?

shreyas-kowshik (Contributor, Author)

@MikeInnes Thanks for the reply. I apologize for making a few errors before. The DCGAN code should not have been included in this PR. There is a separate PR for that. I have corrected it by removing the GAN code. The changes you mentioned for the GAN part will be updated in the respective PR.

shreyas-kowshik (Contributor, Author) commented Mar 26, 2019

@tejank10 I have made the requested changes. Sorry for delaying this for so long; I got into other work and did not fix the errors that were coming up. The changes are complete now. I have also added functions to normalize the discounted rewards, which should aid in training the network.
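For reference, normalising the discounted returns typically looks something like the sketch below (illustrative only, not necessarily the exact functions added in this PR):

```julia
using Statistics

# Discounted returns G_t = r_t + γ * G_{t+1} for one episode,
# normalised to zero mean and unit variance for more stable training.
function normalised_returns(rewards, γ)
    G = zeros(length(rewards))
    running = 0.0
    for t in length(rewards):-1:1
        running = rewards[t] + γ * running
        G[t] = running
    end
    return (G .- mean(G)) ./ (std(G) + 1e-8)
end
```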

MikeInnes (Member)

It will also need to be in its own folder, and have a simple README. Otherwise this is looking good I think, but it'd be good to hear from @tejank10.

shreyas-kowshik (Contributor, Author)

@MikeInnes Sorry for the delayed response. I have made the changes. Is the README sufficient for now or is there something more to be added?

Labels: none yet · Projects: none yet · 4 participants