Important enhancements
Training DQN and its variants with N-step returns is supported.
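As an illustration, here is a minimal sketch of a DQN agent set up for 3-step returns on CartPole, in the style of the ChainerRL quickstart. It assumes the N-step length is configured through the replay buffer's num_steps argument; the hyperparameter values are placeholders, not recommendations.

```python
import chainer
import gym
import numpy as np

import chainerrl

env = gym.make('CartPole-v0')
obs_size = env.observation_space.shape[0]
n_actions = env.action_space.n

q_func = chainerrl.q_functions.FCStateQFunctionWithDiscreteAction(
    obs_size, n_actions, n_hidden_channels=64, n_hidden_layers=2)
optimizer = chainer.optimizers.Adam(eps=1e-2)
optimizer.setup(q_func)

# num_steps > 1 (assumed argument) makes the buffer hand back N consecutive
# transitions per sample, which DQN uses to compute N-step returns.
rbuf = chainerrl.replay_buffer.ReplayBuffer(capacity=10 ** 5, num_steps=3)

explorer = chainerrl.explorers.ConstantEpsilonGreedy(
    epsilon=0.1, random_action_func=env.action_space.sample)

agent = chainerrl.agents.DQN(
    q_func, optimizer, rbuf, gamma=0.99, explorer=explorer,
    replay_start_size=500, update_interval=1, target_update_interval=100,
    phi=lambda x: x.astype(np.float32, copy=False))
```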
Resetting the env via the info dict while done=False is now supported. When env.step returns an info dict with info['needs_reset']=True, the env is reset. This feature is useful for implementing a continuing env.
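For example, a continuing env can be exposed through a small gym wrapper that never sets done=True but periodically asks the training loop to reset via the info dict. The wrapper below is a hypothetical sketch (the class name and the max_steps cutoff are illustrative only):

```python
import gym


class NeedsResetWrapper(gym.Wrapper):
    """Hypothetical wrapper for a continuing env: done stays False, but
    info['needs_reset']=True asks the training loop to reset the env."""

    def __init__(self, env, max_steps=1000):
        super().__init__(env)
        self.max_steps = max_steps
        self.t = 0

    def reset(self, **kwargs):
        self.t = 0
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.t += 1
        if self.t >= self.max_steps:
            # The transition is not terminal (done stays False), but the
            # training loop is told to reset the env before continuing.
            info['needs_reset'] = True
        return obs, reward, done, info
```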
Evaluation with a fixed number of timesteps is now supported (except in async training). This evaluation protocol is popular in Atari benchmarks.
examples/atari/dqn now implements the same evaluation protocol as the Nature DQN paper.
An example script that trains a DoubleDQN agent on a PyBullet-based robotic grasping env has been added: examples/grasping.
Important bugfixes
A bug where PPO's obs_normalizer was not saved has been fixed.
A bug where NonbiasWeightDecay did not work with newer versions of Chainer has been fixed.
A bug where the argv argument was ignored by chainerrl.experiments.prepare_output_dir has been fixed.
Important destructive changes
train_agent_with_evaluation and train_agent_batch_with_evaluation now require both eval_n_steps (the number of timesteps per evaluation phase) and eval_n_episodes (the number of episodes per evaluation phase) to be specified explicitly; exactly one of them must be None.
train_agent_with_evaluation's max_episode_len argument is renamed to train_max_episode_len.
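For illustration, here is a minimal sketch of the updated call, reusing the agent and env from the N-step DQN sketch above; the values for steps, eval_interval, and outdir are placeholders.

```python
from chainerrl import experiments

# Evaluate over a fixed number of episodes; eval_n_steps must then be None.
experiments.train_agent_with_evaluation(
    agent=agent,
    env=env,
    steps=10 ** 5,              # total training timesteps (placeholder)
    eval_n_steps=None,          # not evaluating by timesteps here...
    eval_n_episodes=10,         # ...so evaluate for 10 episodes per phase
    eval_interval=10 ** 4,      # start an evaluation phase every 10k steps
    outdir='results',
    train_max_episode_len=200,  # formerly max_episode_len
)

# For the fixed-timestep protocol popular in Atari benchmarks, swap the two,
# e.g. eval_n_steps=125000, eval_n_episodes=None.
```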
ReplayBuffer.sample now returns a list of lists of N experiences to support N-step returns.
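To make the new return shape concrete, here is a small sketch, again assuming the num_steps constructor argument; it only demonstrates the nesting of the sampled batch, not the contents of each experience.

```python
import chainerrl

# Buffer configured for 3-step returns (num_steps is an assumption here).
rbuf = chainerrl.replay_buffer.ReplayBuffer(capacity=100, num_steps=3)
for t in range(20):
    rbuf.append(state=t, action=0, reward=1.0, next_state=t + 1,
                is_state_terminal=(t == 19))

batch = rbuf.sample(4)
# batch is a list of 4 samples; each sample is itself a list of up to
# num_steps consecutive experiences, so code that previously iterated over
# flat experiences now has to iterate one level deeper.
for n_step_transitions in batch:
    print(len(n_step_transitions))  # typically 3 here
```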