ElasticArraySARTSTraces does not record the trajectories of MountainCarEnv() correctly #1067

Closed
Van314159 opened this issue Apr 12, 2024 · 7 comments

Van314159 commented Apr 12, 2024

ElasticArraySARTSTraces works perfectly in the RandomWalk1D environment, but it does not record the trajectories of MountainCarEnv() correctly. It replaces the state at every step with the final state. My code is:

# Set gravity = 0.0. Then velocity = power * (action - 2).
mcenv = MountainCarEnv(; power=0.01, gravity=0.0)
agentmc = Agent(
    policy = RandomPolicy(),
    trajectory = Trajectory(
        ElasticArraySARTSTraces(;
            state = Vector{Float64} => (),
            action = Int64 => (),
            reward = Float64 => (),
            terminal = Bool => (),
        ),
        DummySampler(),
        InsertSampleRatioController(),
    )
)
run(agentmc, mcenv, StopAfterNSteps(5), TotalRewardPerEpisode());
agentmc.trajectory.container

It returns

(state = [-0.5317774841747759, -0.03], next_state = [-0.5317774841747759, -0.03], action = 1, reward = -1.0, terminal = false)

(state = [-0.5317774841747759, -0.03], next_state = [-0.5317774841747759, -0.03], action = 3, reward = -1.0, terminal = false)

(state = [-0.5317774841747759, -0.03], next_state = [-0.5317774841747759, -0.03], action = 1, reward = -1.0, terminal = false)

(state = [-0.5317774841747759, -0.03], next_state = [-0.5317774841747759, -0.03], action = 1, reward = -1.0, terminal = false)

(state = [-0.5317774841747759, -0.03], next_state = [-0.5317774841747759, -0.03], action = 1, reward = -1.0, terminal = false)

I guess the problem is in push!(agent, PostActStage(), env, action).

My Julia and package versions are

Julia Version 1.9.4
Commit 8e5136fa297 (2023-11-14 08:46 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: macOS (arm64-apple-darwin22.4.0)
  CPU: 8 × Apple M1
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, apple-m1)
  Threads: 4 on 4 virtual cores
Environment:
  JULIA_REVISE_WORKER_ONLY = 1

  [052768ef] CUDA v5.2.0
  [587475ba] Flux v0.14.15
  [8197267c] IntervalSets v0.7.10
  [91a5bcdd] Plots v1.40.3
  [7f904dfe] PlutoUI v0.7.58
  [158674fc] ReinforcementLearning v0.11.0
  [6486599b] ReinforcementLearningTrajectories v0.4.0
  [860ef19b] StableRNGs v1.0.1
  [e88e6eb3] Zygote v0.6.69
  [02a925ec] cuDNN v1.3.0
  [b77e0a4c] InteractiveUtils
  [d6f4376e] Markdown
  [44cfe95a] Pkg v1.9.2
  [9a3f8284] Random
  [2f01184e] SparseArrays
  [10745b16] Statistics v1.9.0
@HenriDeh
Member

@jeremiahpslewis
Member

I can't reproduce the error. There's a minor bug in the example code; this is what I have:

mcenv = MountainCarEnv(; power=0.01, gravity=0.0)
agentmc = Agent(
    policy = RandomPolicy(),
    trajectory = Trajectory(
        ElasticArraySARTSTraces(;
            state = Float64 => (2,),
            action = Int64 => (),
            reward = Float64 => (),
            terminal = Bool => (),
        ),
        DummySampler(),
        InsertSampleRatioController(),
    )
)
run(agentmc, mcenv, StopAfterNSteps(5), TotalRewardPerEpisode());
agentmc.trajectory.container
julia> agentmc.trajectory.container[:state]
5-element RelativeTrace:
 [-0.5277521900779774, 0.0]
 [-0.5177521900779773, 0.01]
 [-0.49775219007797733, 0.02]
 [-0.4677521900779773, 0.03]
 [-0.4377521900779773, 0.03]

@jeremiahpslewis
Member

@Van314159 Can you please try my example code, perhaps using a fresh environment?

@HenriDeh
Member

I reproduced it by copy-pasting OP's code. But yes, indeed, I had not noticed that the state trace was not correctly configured.

@HenriDeh
Member

Though, this should raise an error instead of silently failing.
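
As a rough illustration of what such a check could look like, the sketch below rejects a trace specification whose element type is itself an array. `check_trace_spec` is a hypothetical helper, not part of ReinforcementLearningTrajectories, and it assumes the `eltype => size` pair convention used in the snippets in this thread. A real fix might prefer a warning, or might handle array-valued elements differently.

```julia
# Hypothetical helper (not a library function): validate one `eltype => size` pair.
function check_trace_spec(name::Symbol, spec::Pair)
    T = spec.first
    if T isa Type && T <: AbstractArray
        throw(ArgumentError(
            "trace `$name` has array element type $T; " *
            "use the scalar element type and give the shape in the size tuple, " *
            "e.g. Float64 => (2,)",
        ))
    end
    return spec
end

check_trace_spec(:state, Float64 => (2,))            # passes and returns the pair
# check_trace_spec(:state, Vector{Float64} => ())    # would throw an ArgumentError
```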

@jeremiahpslewis
Member

Note: same issue with CircularArraySARTSTraces

@Van314159
Author

Van314159 commented Apr 12, 2024

@jeremiahpslewis
Yes. ElasticArraySARTSTraces works if I write state = Float64 => (2,) in the trajectory rather than state = Vector{Float64} => (). I'm wondering why it works.

Anyway, thank you very much!
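
A plausible explanation for the question above, offered as a sketch rather than a confirmed diagnosis: if the environment mutates a single state vector in place and hands out that same array each step, a trace whose elements are `Vector{Float64}` stores references to one object, so every entry shows the final state. A trace declared as `Float64 => (2,)` copies the numbers into its own buffer. The example uses only Base Julia. `ToyEnv` and `toy_step!` are made-up stand-ins, and the assumption that MountainCarEnv behaves this way is inferred from the output reported in this thread, not from its source.

```julia
# Made-up stand-in for an environment that updates its state in place.
mutable struct ToyEnv
    state::Vector{Float64}   # [position, velocity]
end

function toy_step!(env::ToyEnv, velocity::Float64)
    env.state[2] = velocity      # overwrite entries of the same array object
    env.state[1] += velocity
    return env.state             # returns the env's own array, not a copy
end

env = ToyEnv([-0.5, 0.0])

# Analogue of `Vector{Float64} => ()`: each element is a reference to a vector.
by_reference = Vector{Vector{Float64}}()
# Analogue of `Float64 => (2,)`: the numbers themselves are appended.
flat = Float64[]

for v in (0.01, 0.02, 0.03)
    s = toy_step!(env, v)
    push!(by_reference, s)   # stores the same object every time
    append!(flat, s)         # copies the current values
end

by_value = reshape(flat, 2, :)   # one column per recorded step

# Every stored "state" is the very same array as env.state.
println(all(s -> s === env.state, by_reference))
# The copied velocities keep their per-step values.
println(by_value[2, :])
```

With the by-reference container, calling `push!(by_reference, copy(s))` instead would also preserve the per-step values, at the cost of one allocation per step.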
