Announcement
This release will probably be the final major update under the name of ChainerRL. The development team plans to switch the backend from Chainer to PyTorch and continue development as OSS.
Important enhancements
- Soft Actor-Critic (https://arxiv.org/abs/1812.05905) with benchmark results is added.
  - Agent class: `chainerrl.agents.SoftActorCritic`
  - Example and benchmark results (MuJoCo): https://github.com/chainer/chainerrl/tree/v0.8.0/examples/mujoco/reproduction/soft_actor_critic
  - Example (Roboschool Atlas): https://github.com/chainer/chainerrl/tree/v0.8.0/examples/atlas
- Trained models of benchmark results are now downloadable. See the READMEs of the examples.
  - For Atari envs: DQN, IQN, Rainbow, A3C
  - For MuJoCo envs: DDPG, PPO, TRPO, TD3, Soft Actor-Critic
- DQN-based agents now support recurrent models in a new, more efficient interface.
- TRPO now supports recurrent models and batch training.
- A variant of IQN with double Q-learning is added.
  - Agent class: `chainerrl.agents.DoubleIQN`
  - Example: https://github.com/chainer/chainerrl/tree/v0.8.0/examples/atari/train_double_iqn.py
- IQN now supports prioritized experience replay.
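
As a rough illustration of how the last two items might be combined, the sketch below constructs a `DoubleIQN` agent on top of a prioritized replay buffer. It is a minimal sketch, not taken from this release: the network shape, the `psi`/`phi`/`f` decomposition, the use of `CosineBasisLinear`, and all sizes and hyperparameters are assumptions here, and examples/atari/train_double_iqn.py remains the reference implementation.

```python
import chainer
import chainer.functions as F
import chainer.links as L
import numpy as np
import chainerrl

n_actions = 4   # assumed small discrete action space
obs_size = 8    # assumed flat observation vector

# Implicit quantile Q-function split into an observation embedding (psi),
# a quantile-fraction embedding (phi), and an output head (f).
q_func = chainerrl.agents.iqn.ImplicitQuantileQFunction(
    psi=chainerrl.links.Sequence(
        L.Linear(obs_size, 64),
        F.relu,
    ),
    phi=chainerrl.links.Sequence(
        chainerrl.agents.iqn.CosineBasisLinear(64, 64),
        F.relu,
    ),
    f=L.Linear(64, n_actions),
)

opt = chainer.optimizers.Adam(1e-4)
opt.setup(q_func)

# New in this release: IQN-family agents can learn from a prioritized buffer.
rbuf = chainerrl.replay_buffers.PrioritizedReplayBuffer(10 ** 5)

explorer = chainerrl.explorers.ConstantEpsilonGreedy(
    0.01, random_action_func=lambda: np.random.randint(n_actions))

agent = chainerrl.agents.DoubleIQN(
    q_func, opt, rbuf,
    gamma=0.99,
    explorer=explorer,
)
```

Training would then proceed through the usual experiment loop (e.g. `chainerrl.experiments.train_agent_with_evaluation`), as in the other Atari examples.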
Important bugfixes
- The bug that the update of `CategoricalDoubleDQN` was identical to that of `CategoricalDQN` is fixed.
- The bug that batch training with N-step or episodic replay buffers did not work is fixed.
- The bug that weight normalization in `PrioritizedReplayBuffer` with `normalize_by_max == 'batch'` was wrong is fixed.
Important destructive changes
- Support for Python 2 is dropped. ChainerRL is now only tested with Python 3.5.1+.
- The interface of DQN-based agents to use recurrent models has changed. See the DRQN example: https://github.com/chainer/chainerrl/tree/v0.8.0/examples/atari/train_drqn_ale.py
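
To make the change concrete, here is a minimal sketch of the new-style interface, assuming the names used in the DRQN example above (`StatelessRecurrentSequential` plus the `recurrent` and `episodic_update_len` keyword arguments); treat it as an illustration under those assumptions rather than a specification, and refer to the example for the exact usage.

```python
import chainer
import chainer.functions as F
import chainer.links as L
import numpy as np
import chainerrl

n_actions = 4   # assumed discrete action space
obs_size = 8    # assumed flat observation vector

# Recurrent Q-function: ordinary links plus an NStepLSTM core, wrapped in
# the recurrent sequential link introduced with the new interface.
q_func = chainerrl.links.StatelessRecurrentSequential(
    L.Linear(obs_size, 64),
    F.relu,
    L.NStepLSTM(1, 64, 64, 0),
    L.Linear(64, n_actions),
    chainerrl.action_value.DiscreteActionValue,
)

opt = chainer.optimizers.Adam(1e-3)
opt.setup(q_func)

# Recurrent training replays (sub)sequences, so an episodic buffer is used
# instead of a flat transition buffer.
rbuf = chainerrl.replay_buffers.EpisodicReplayBuffer(10 ** 5)

explorer = chainerrl.explorers.ConstantEpsilonGreedy(
    0.1, random_action_func=lambda: np.random.randint(n_actions))

agent = chainerrl.agents.DQN(
    q_func, opt, rbuf,
    gamma=0.99,
    explorer=explorer,
    recurrent=True,            # assumed flag: switch DQN to the recurrent code path
    episodic_update_len=10,    # assumed: length of subsequences sampled for updates
)
```

The LSTM core carries hidden state across steps within an episode, and the episodic buffer lets updates replay contiguous subsequences so that state can be rebuilt during training.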
All updates
Enhancements
- Recurrent DQN families with a new interface (#436)
- Recurrent and batched TRPO (#446)
- Add Soft Actor-Critic agent (#457)
- Code to collect demonstrations from an agent. (#468)
- Monitor with ContinuingTimeLimit support (#491)
- Fix B007: Loop control variable not used within the loop body (#502)
- Double IQN (#503)
- Fix B006: Do not use mutable data structures for argument defaults. (#504)
- Splits Replay Buffers into separate files in a replay_buffers module (#506)
- Use chainer.grad in ACER (#511)
- Prioritized Double IQN (#518)
- Add policy loss to TD3's logged statistics (#524)
- Adds checkpoint frequencies for serial and batch Agents. (#525)
- Add a deterministic mode to IQN for stable tests (#529)
- Use Link.cleargrads instead of Link.zerograds in REINFORCE (#536)
- Use cupyx.scatter_add instead of cupy.scatter_add (#537)
- Avoid cupy.zeros_like with numpy.ndarray (#538)
- Use get_device_from_id since get_device is deprecated (#539)
- Releases trained models for all reproduced agents (#565)
Documentation
- Typo fix in Replay Buffer Docs (#507)
- Fixes typo in docstring for AsyncEvaluator (#508)
- Improve the algorithm list on README (#509)
- Add Explorers to Documentation (#514)
- Fixes syntax errors in ReplayBuffer docs. (#515)
- Adds policies to the documentation (#516)
- Adds demonstration collection to experiments docs (#517)
- Adds List of Batch Agents to the README (#543)
- Add documentation for Q-functions and some missing details in docstrings (#556)
- Add comment on environment version difference (#582)
- Adds ChainerRL Bibtex to the README (#584)
- Minor Typo Fix (#585)
Examples
- Rename examples directories (#487)
- Adds training times for reproduced Mujoco results (#497)
- Adds additional information to Grasping Example README (#501)
- Fixes a comment in PPO example (#521)
- Rainbow Scores (#546)
- Update train_a3c.py (#547, thanks @xinyuewang1!)
- Update train_a3c.py (#548, thanks @xinyuewang1!)
- Improves formatting of IQN training times (#549)
- Corrects Scores in Examples (#552)
- Removes GPU option from README (#564)
- Releases trained models for all reproduced agents (#565)
- Add an example script for RoboschoolAtlasForwardWalk-v1 (#577)
- Corrects Rainbow Results (#580)
- Adds proper A3C scores (#581)
Testing
- Add CI configs (#478)
- Specify ubuntu 16.04 for Travis CI and modify a dependency accordingly (#520)
- Remove a trailing space of DoubleIQN (#526)
- Add a deterministic mode to IQN for stable tests (#529)
- Fix import error when chainer==7.0.0b3 (#531)
- Make test_monitor.py work on flexCI (#533)
- Improve parameter distributions used in TestGaussianDistribution (#540)
- Increase flexCI's time limit to 20min (#550)
- decrease amount of decimal digits required to 4 (#554)
- Use attrs<19.2.0 with pytest (#569)
- Run slow tests with flexCI (#575)
- Typo fix in CI comment. (#576)
- Adds time to DDPG Tests (#587)
- Fix CI errors due to pyglet, zipp, mock, and gym (#592)
Bugfixes
- Fix a bug in `batch_recurrent_experiences` regarding next_action (#528)
- Fix ValueError in SARSA with GPU (#534)
- fix function call (#541)
- Pass env_id to replay_buffer methods to fix batch training (#558)
- Fixes Categorical Double DQN Error. (#567)
- Fix weight normalization inside prioritized experience replay (#570)