If an episode is truncated by the time limit wrapper, the last discount in that episode is set to 1.0 instead of 0.0. As a result, both the reward and the discount calculations spill over into the next episode and produce incorrect values. The next observation is also taken from the end of the trajectory, when it should come from the end of the episode.
Please run the following code to reproduce the issue. In this code, both "short" and "long" trajectories should give exactly the same result because the episode was truncated.
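To make the expected behavior concrete, here is a minimal, library-free sketch of the spillover described above. It is not the original reproduction script and does not use the affected library; the function `n_step_return`, the `is_last` boundary flags, and all numbers are hypothetical and chosen only for illustration. It assumes each step carries a reward, an environment discount, and a flag marking the final step of an episode.

```python
from typing import Sequence, Tuple


def n_step_return(
    rewards: Sequence[float],
    discounts: Sequence[float],
    is_last: Sequence[bool],
    gamma: float = 0.5,
    stop_at_boundary: bool = True,
) -> Tuple[float, float, int]:
    """Accumulate the discounted return starting from step 0.

    Returns (return, cumulative discount, index of the next observation).
    `is_last[t]` is a hypothetical flag marking the final step of an episode,
    whether it ended by termination or by truncation.
    """
    total = 0.0
    cum_discount = 1.0
    next_obs_index = len(rewards)  # default: end of the trajectory
    for t, (r, d, last) in enumerate(zip(rewards, discounts, is_last)):
        total += cum_discount * r
        cum_discount *= gamma * d
        if stop_at_boundary and last:
            # Stop at the end of the episode, not the end of the trajectory.
            next_obs_index = t + 1
            break
    return total, cum_discount, next_obs_index


# A two-step episode truncated by a time limit: the last discount stays 1.0.
short = dict(rewards=[1.0, 1.0], discounts=[1.0, 1.0], is_last=[False, True])

# The same episode followed by two steps of the next episode.
long = dict(
    rewards=[1.0, 1.0, 5.0, 5.0],
    discounts=[1.0, 1.0, 1.0, 1.0],
    is_last=[False, True, False, False],
)

respecting_boundary_short = n_step_return(**short)
respecting_boundary_long = n_step_return(**long)
ignoring_boundary_long = n_step_return(**long, stop_at_boundary=False)

print(respecting_boundary_short)  # (1.5, 0.25, 2)
print(respecting_boundary_long)   # (1.5, 0.25, 2)
print(ignoring_boundary_long)     # (3.375, 0.0625, 4)
```

When the boundary flag is respected, the short and long trajectories agree. When only the discount is consulted, a truncated episode with discount 1.0 lets rewards from the next episode leak into the return, and the next observation index points at the end of the trajectory.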