
v0.6.0

@muupan released this 28 Feb 08:50 · 9d9f083

Important enhancements

  • Implicit Quantile Network (IQN, https://arxiv.org/abs/1806.06923) agent is added: chainerrl.agents.IQN. See the first sketch after this list.
  • Training DQN and its variants with N-step returns is supported.
  • Resetting the env with done=False via the info dict is supported: when env.step returns an info dict with info['needs_reset']=True, the env is reset. This is useful for implementing a continuing env; see the second sketch after this list.
  • Evaluation with a fixed number of timesteps is supported (except in async training). This evaluation protocol is popular in Atari benchmarks.
    • examples/atari/dqn now implements the same evaluation protocol as the Nature DQN paper.
  • An example script of training a DoubleDQN agent for a PyBullet-based robotic grasping env is added: examples/grasping.
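
The IQN item above only names the new agent class, so here is a minimal, hedged sketch of wiring up an IQN agent for a small vector-observation task. The psi/phi/f decomposition follows the IQN paper (psi embeds the observation, phi embeds the sampled quantile fraction, f maps their product to per-action quantile values); the network sizes, hyperparameters, and in particular the quantile_thresholds_N / quantile_thresholds_N_prime argument names are assumptions for illustration, not a definitive recipe.

```python
import chainer
import chainer.functions as F
import chainer.links as L
import numpy as np

import chainerrl
from chainerrl.agents import iqn

obs_size, n_actions = 4, 2

# ImplicitQuantileQFunction composes three parts: psi embeds the
# observation, phi embeds the sampled quantile fraction tau, and f maps
# their element-wise product to per-action quantile values.
q_func = iqn.ImplicitQuantileQFunction(
    psi=chainerrl.links.Sequence(L.Linear(obs_size, 64), F.relu),
    phi=chainerrl.links.Sequence(iqn.CosineBasisLinear(64, 64), F.relu),
    f=L.Linear(64, n_actions),
)

opt = chainer.optimizers.Adam(1e-3)
opt.setup(q_func)
rbuf = chainerrl.replay_buffer.ReplayBuffer(10 ** 5)
explorer = chainerrl.explorers.ConstantEpsilonGreedy(
    0.1, random_action_func=lambda: np.random.randint(n_actions))

# The quantile-threshold keyword names below are assumed, not guaranteed.
agent = iqn.IQN(
    q_func, opt, rbuf, gamma=0.99, explorer=explorer,
    replay_start_size=1000, minibatch_size=32,
    quantile_thresholds_N=64, quantile_thresholds_N_prime=64,
)
```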
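
And here is a toy continuing env, invented purely for illustration, that uses the new info['needs_reset'] convention: it never sets done=True, but periodically asks the training loop to reset it.

```python
import gym
import gym.spaces
import numpy as np


class ContinuingRandomWalk(gym.Env):
    """Toy continuing task: episodes never terminate, but the env requests
    a reset every segment_len steps via info['needs_reset']."""

    observation_space = gym.spaces.Box(
        low=-np.inf, high=np.inf, shape=(1,), dtype=np.float32)
    action_space = gym.spaces.Discrete(2)

    def __init__(self, segment_len=1000):
        self.segment_len = segment_len
        self.t = 0
        self.x = 0.0

    def reset(self):
        self.t = 0
        self.x = 0.0
        return np.float32([self.x])

    def step(self, action):
        self.t += 1
        self.x += 1.0 if action == 1 else -1.0
        reward = -abs(self.x)
        done = False  # a continuing task never terminates on its own
        # Ask the training loop to reset the env without treating this
        # transition as terminal.
        info = {'needs_reset': self.t >= self.segment_len}
        return np.float32([self.x]), reward, done, info
```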

Important bugfixes

  • A bug that prevented PPO's obs_normalizer from being saved is fixed.
  • A bug that made NonbiasWeightDecay fail with newer versions of Chainer is fixed.
  • A bug that caused chainerrl.experiments.prepare_output_dir to ignore its argv argument is fixed.

Important destructive changes

  • train_agent_with_evaluation and train_agent_batch_with_evaluation now require both eval_n_steps (number of timesteps per evaluation phase) and eval_n_episodes (number of episodes per evaluation phase) to be specified explicitly, exactly one of which must be None. See the first sketch after this list.
  • train_agent_with_evaluation's max_episode_len argument is renamed to train_max_episode_len.
  • ReplayBuffer.sample now returns a list of lists of N experiences to support N-step returns; see the second sketch after this list.
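
A hedged sketch of the new train_agent_with_evaluation call: exactly one of eval_n_steps and eval_n_episodes is a number and the other is None. Here agent and env stand for an already constructed ChainerRL agent and Gym env, and the remaining keyword arguments are assumed to keep their existing names.

```python
from chainerrl import experiments


def run_training(agent, env, outdir='results'):
    # Evaluate for a fixed number of timesteps (the Atari-style protocol);
    # to evaluate for a fixed number of episodes instead, swap which of
    # the two eval_* arguments is None.
    experiments.train_agent_with_evaluation(
        agent=agent,
        env=env,
        steps=10 ** 6,               # total training timesteps
        eval_n_steps=125000,         # timesteps per evaluation phase
        eval_n_episodes=None,        # must be None when eval_n_steps is set
        eval_interval=250000,
        outdir=outdir,
        train_max_episode_len=10000,  # formerly max_episode_len
    )
```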
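
And a hedged sketch of the new ReplayBuffer.sample return structure. The constructor argument that enables N-step transitions is assumed here to be num_steps, and the experience keys follow the names used by the DQN-family agents.

```python
import numpy as np
from chainerrl.replay_buffer import ReplayBuffer

# num_steps > 1 makes the buffer store sliding windows of consecutive
# experiences so that agents can compute N-step returns.
rbuf = ReplayBuffer(capacity=10 ** 5, num_steps=3)

for t in range(10):
    rbuf.append(
        state=np.float32([t]),
        action=0,
        reward=1.0,
        next_state=np.float32([t + 1]),
        is_state_terminal=False,
    )

# sample(k) now returns k lists, each holding up to num_steps consecutive
# experiences that together form one N-step transition.
for transition in rbuf.sample(4):
    rewards = [exp['reward'] for exp in transition]
    print(len(transition), rewards)
```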

All updates

Enhancements

  • Implicit quantile networks (IQN) (#288)
  • Adds N-step learning for DQN-based agents. (#317)
  • Replaywarning (#321)
  • Close envs in async training (#343)
  • Allow envs to send a 'needs_reset' signal (#356)
  • Changes variable names in train_agent_with_evaluation (#358)
  • Use chainer.dataset.concat_examples in batch_states (#366)
  • Implements Time-based evaluations (#367)

Documentation

  • Add long description for pypi (#357, thanks @ljvmiranda921!)
  • A small change to the installation documentation (#369)
  • Adds a link to the ChainerRL visualizer from the main repository (#370)
  • Adds implicit quantile networks to README (#393)
  • Fix DQN.update's docstring (#394)

Examples

  • Grasping example (#371)
  • Adds Deepmind Scores to README in DQN Example (#383)

Testing

  • Fix TestTrainAgentAsync (#363)
  • Use AbnormalExitCodeWarning for nonzero exitcode warnings (#378)
  • Avoid random test failures due to asynchronousness (#380)
  • Drop hacking (#381)
  • Avoid gym 0.11.0 in Travis (#396)
  • Stabilize and speed up A3C tests (#401)
  • Reduce ACER's test cases and maximum timesteps (#404)
  • Add tests of IQN examples (#405)

Bugfixes

  • Avoid UnicodeDecodeError in setup.py (#365)
  • Save and load obs_normalizer of PPO (#377)
  • Make NonbiasWeightDecay work again (#390)
  • Bug fix (#391, thanks @tappy27!)
  • Fix episodic training of DDPG (#399)
  • Fix PGT's training (#400)
  • Fix ResidualDQN's training (#402)