Important enhancements
Training DQN and its variants with N-step returns is supported.
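As an illustration, here is a minimal sketch of a DQN agent set up for 3-step returns on CartPole, in the style of the ChainerRL quickstart. It assumes the N-step length is configured through the replay buffer's num_steps argument; the hyperparameter values are placeholders, not recommendations.

```python
import chainer
import gym
import numpy as np

import chainerrl

env = gym.make('CartPole-v0')
obs_size = env.observation_space.shape[0]
n_actions = env.action_space.n

q_func = chainerrl.q_functions.FCStateQFunctionWithDiscreteAction(
    obs_size, n_actions, n_hidden_channels=64, n_hidden_layers=2)
optimizer = chainer.optimizers.Adam(eps=1e-2)
optimizer.setup(q_func)

# num_steps > 1 (assumed argument) makes the buffer hand back N consecutive
# transitions per sample, which DQN uses to compute N-step returns.
rbuf = chainerrl.replay_buffer.ReplayBuffer(capacity=10 ** 5, num_steps=3)

explorer = chainerrl.explorers.ConstantEpsilonGreedy(
    epsilon=0.1, random_action_func=env.action_space.sample)

agent = chainerrl.agents.DQN(
    q_func, optimizer, rbuf, gamma=0.99, explorer=explorer,
    replay_start_size=500, update_interval=1, target_update_interval=100,
    phi=lambda x: x.astype(np.float32, copy=False))
```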
Resetting the env via the info dict while done=False is now supported. When env.step returns an info dict with info['needs_reset']=True, the env is reset. This feature is useful for implementing a continuing env.
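For example, a continuing env can be exposed through a small gym wrapper that never sets done=True but periodically asks the training loop to reset via the info dict. The wrapper below is a hypothetical sketch (the class name and the max_steps cutoff are illustrative only):

```python
import gym


class NeedsResetWrapper(gym.Wrapper):
    """Hypothetical wrapper for a continuing env: done stays False, but
    info['needs_reset']=True asks the training loop to reset the env."""

    def __init__(self, env, max_steps=1000):
        super().__init__(env)
        self.max_steps = max_steps
        self.t = 0

    def reset(self, **kwargs):
        self.t = 0
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.t += 1
        if self.t >= self.max_steps:
            # The transition is not terminal (done stays False), but the
            # training loop is told to reset the env before continuing.
            info['needs_reset'] = True
        return obs, reward, done, info
```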
Evaluation with a fixed number of timesteps is now supported (except in async training). This evaluation protocol is popular in Atari benchmarks.
examples/atari/dqn now implements the same evaluation protocol as the Nature DQN paper.
An example script that trains a DoubleDQN agent on a PyBullet-based robotic grasping env has been added: examples/grasping.
Important bugfixes
A bug where PPO's obs_normalizer was not saved has been fixed.
A bug where NonbiasWeightDecay did not work with newer versions of Chainer has been fixed.
A bug where the argv argument was ignored by chainerrl.experiments.prepare_output_dir has been fixed.
Important destructive changes
train_agent_with_evaluation and train_agent_batch_with_evaluation now require both eval_n_steps (the number of timesteps per evaluation phase) and eval_n_episodes (the number of episodes per evaluation phase) to be specified explicitly; exactly one of them must be None.
train_agent_with_evaluation's max_episode_len argument is renamed to train_max_episode_len.
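For illustration, here is a minimal sketch of the updated call, reusing the agent and env from the N-step DQN sketch above; the values for steps, eval_interval, and outdir are placeholders.

```python
from chainerrl import experiments

# Evaluate over a fixed number of episodes; eval_n_steps must then be None.
experiments.train_agent_with_evaluation(
    agent=agent,
    env=env,
    steps=10 ** 5,              # total training timesteps (placeholder)
    eval_n_steps=None,          # not evaluating by timesteps here...
    eval_n_episodes=10,         # ...so evaluate for 10 episodes per phase
    eval_interval=10 ** 4,      # start an evaluation phase every 10k steps
    outdir='results',
    train_max_episode_len=200,  # formerly max_episode_len
)

# For the fixed-timestep protocol popular in Atari benchmarks, swap the two,
# e.g. eval_n_steps=125000, eval_n_episodes=None.
```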
ReplayBuffer.sample now returns a list of lists of N experiences to support N-step returns.
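To make the new return shape concrete, here is a small sketch, again assuming the num_steps constructor argument; it only demonstrates the nesting of the sampled batch, not the contents of each experience.

```python
import chainerrl

# Buffer configured for 3-step returns (num_steps is an assumption here).
rbuf = chainerrl.replay_buffer.ReplayBuffer(capacity=100, num_steps=3)
for t in range(20):
    rbuf.append(state=t, action=0, reward=1.0, next_state=t + 1,
                is_state_terminal=(t == 19))

batch = rbuf.sample(4)
# batch is a list of 4 samples; each sample is itself a list of up to
# num_steps consecutive experiences, so code that previously iterated over
# flat experiences now has to iterate one level deeper.
for n_step_transitions in batch:
    print(len(n_step_transitions))  # typically 3 here
```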