Enabling periodic evaluation #202
Unanswered
elle-miller asked this question in Q&A
Replies: 1 comment
-
Benefits of separating training from evaluation

Here is an example of an agent in the Isaac Lab Cartpole environment. You can see that the evaluation returns communicate the true learning state of the agent, without the stochasticity of the sampled actions. I was always confused by how the performance would degrade/oscillate in the training returns.

The training loop is below. You can reproduce the results with this minimal example:
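(A sketch of that loop, assuming skrl >= 1.0 with the `train()`/`act()`/`eval()` modifications described in the question; the agent setup follows skrl's documented PPO pattern, and gymnasium's Pendulum-v1 stands in for the Isaac Lab Cartpole.)

```python
import gymnasium as gym
import torch
import torch.nn as nn

from skrl.agents.torch.ppo import PPO, PPO_DEFAULT_CONFIG
from skrl.envs.wrappers.torch import wrap_env
from skrl.memories.torch import RandomMemory
from skrl.models.torch import DeterministicMixin, GaussianMixin, Model
from skrl.trainers.torch import SequentialTrainer


# standard skrl Gaussian policy and deterministic value models
class Policy(GaussianMixin, Model):
    def __init__(self, observation_space, action_space, device):
        Model.__init__(self, observation_space, action_space, device)
        GaussianMixin.__init__(self, clip_actions=False)
        self.net = nn.Sequential(nn.Linear(self.num_observations, 64), nn.ELU(),
                                 nn.Linear(64, self.num_actions))
        self.log_std_parameter = nn.Parameter(torch.zeros(self.num_actions))

    def compute(self, inputs, role):
        return self.net(inputs["states"]), self.log_std_parameter, {}


class Value(DeterministicMixin, Model):
    def __init__(self, observation_space, action_space, device):
        Model.__init__(self, observation_space, action_space, device)
        DeterministicMixin.__init__(self)
        self.net = nn.Sequential(nn.Linear(self.num_observations, 64), nn.ELU(),
                                 nn.Linear(64, 1))

    def compute(self, inputs, role):
        return self.net(inputs["states"]), {}


env = wrap_env(gym.make("Pendulum-v1"))
device = env.device

memory = RandomMemory(memory_size=16, num_envs=env.num_envs, device=device)
models = {"policy": Policy(env.observation_space, env.action_space, device),
          "value": Value(env.observation_space, env.action_space, device)}

agent = PPO(models=models, memory=memory, cfg=PPO_DEFAULT_CONFIG.copy(),
            observation_space=env.observation_space,
            action_space=env.action_space, device=device)

# 1000 total training timesteps, evaluated every 100: train -> eval, x10
trainer = SequentialTrainer(cfg={"timesteps": 100, "headless": True},
                            env=env, agents=agent)
for _ in range(10):
    trainer.train()  # 100 training timesteps (memory/rollout counter reset first)
    trainer.eval()   # 100 evaluation timesteps with mean actions, no learning
```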
-
Hi there,
I would like to use the `SequentialTrainer` in a way that enables me to periodically evaluate the agent during training. The provided example only shows evaluation post-training: https://skrl.readthedocs.io/en/latest/api/trainers/sequential.html

In this example, I want to train `num_envs` environments for 1000 timesteps each and evaluate 10 times throughout the process. This means training for 100 timesteps, then evaluating, repeated x10.

Code modifications
I have modified the:

- `train()` function, to reset the memory and rollout counter (see the first sketch below)
- `act()` function in PPO, to only return the mean action under evaluation instead of sampling (see the second sketch below)
- `eval()` method, to accumulate masked per-environment returns (see the sketch after the next paragraph)
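The `train()` reset looks roughly like this (a sketch; `Memory.reset()` and PPO's internal `_rollout` counter are my reading of the skrl source, so the exact names may differ between versions):

```python
# at the top of SequentialTrainer.train(), so repeated calls start clean
# (single-agent case; self.agents is the PPO agent passed to the trainer)
self.agents.memory.reset()   # drop transitions left over from the previous call
self.agents._rollout = 0     # restart PPO's internal rollout counter
```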
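And the `act()` change, sketched; `self.training` is the flag set by the trainer via `set_running_mode()`, and the Gaussian policy exposes its mean through the `mean_actions` output (both taken from the skrl source, so treat them as assumptions):

```python
def act(self, states, timestep, timesteps):
    """Sample stochastic actions when training, return the mean when evaluating"""
    actions, log_prob, outputs = self.policy.act(
        {"states": self._state_preprocessor(states)}, role="policy")
    if not self.training:
        # evaluation: replace the sampled actions with the distribution mean
        return outputs["mean_actions"], log_prob, outputs
    return actions, log_prob, outputs
```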
I have applied a mask to the rewards because if an environment terminates or truncates, I want the rewards for that episode to stop accumulating. However, the mean evaluation returns I am getting with this method are not "correct". For example, with the Isaac Lab Cartpole environment, the returns never go past ~155 with the mask, but if I comment the mask out they reach ~300 (the optimal policy). When I play the learned policy, it is indeed optimal.
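The masked accumulation inside `eval()` looks roughly like this (a sketch with illustrative names; `states`, `env`, and `agent` come from the setup above, and `eval_timesteps` is the number of evaluation steps):

```python
import torch

num_envs = env.num_envs
episode_returns = torch.zeros(num_envs, device=env.device)
alive = torch.ones(num_envs, device=env.device)  # 1.0 until the episode ends

for timestep in range(eval_timesteps):
    with torch.no_grad():
        actions, _, _ = agent.act(states, timestep=timestep, timesteps=eval_timesteps)
    states, rewards, terminated, truncated, infos = env.step(actions)

    # mask: once an env terminates or truncates, stop accumulating its rewards
    episode_returns += alive * rewards.squeeze(-1)
    alive *= (~(terminated | truncated)).squeeze(-1).float()

mean_eval_return = episode_returns.mean().item()
```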
Questions:

1. To use the `SequentialTrainer` in an alternating `train() -> eval() -> train()` fashion, are there any other implementation changes that should be made, e.g. environment resets?
2. Are the rewards returned by `step()` already masked out somewhere?

Thanks in advance!