Hi. I'm currently developing another algorithm on top of TRPO, and training keeps failing after a certain number of epochs with a CUDA out-of-memory error. At first I thought this was my fault, but it turns out that the example code given in trpo_pendulum.py leaks memory too. I suspect it happens in the policy optimization step when the Hessian is calculated, but I'm not sure. This is the example code, modified from trpo_pendulum.py, that I used for GPU utilization:
```python
#!/usr/bin/env python3
"""This is an example to train a task with TRPO algorithm (PyTorch).

Here it runs InvertedDoublePendulum-v2 environment with 100 iterations.
"""
import torch

from garage import wrap_experiment
from garage.envs import GymEnv
from garage.experiment.deterministic import set_seed
from garage.sampler import LocalSampler
from garage.torch.algos import TRPO
from garage.torch.policies import GaussianMLPPolicy
from garage.torch.value_functions import GaussianMLPValueFunction
from garage.trainer import Trainer
from garage.torch import set_gpu_mode

set_gpu_mode(False)
torch.set_num_threads(1)
if torch.cuda.is_available():
    set_gpu_mode(True)
torch.multiprocessing.set_start_method('spawn')


@wrap_experiment
def trpo_pendulum(ctxt=None, seed=1):
    """Train TRPO with InvertedDoublePendulum-v2 environment.

    Args:
        ctxt (garage.experiment.ExperimentContext): The experiment
            configuration used by Trainer to create the snapshotter.
        seed (int): Used to seed the random number generator to produce
            determinism.
    """
    set_seed(seed)
    env = GymEnv('InvertedDoublePendulum-v2')

    trainer = Trainer(ctxt)

    policy = GaussianMLPPolicy(env.spec,
                               hidden_sizes=[32, 32],
                               hidden_nonlinearity=torch.tanh,
                               output_nonlinearity=None)

    value_function = GaussianMLPValueFunction(env_spec=env.spec,
                                              hidden_sizes=(32, 32),
                                              hidden_nonlinearity=torch.tanh,
                                              output_nonlinearity=None)

    sampler = LocalSampler(agents=policy,
                           envs=env,
                           max_episode_length=env.spec.max_episode_length)

    algo = TRPO(env_spec=env.spec,
                policy=policy,
                value_function=value_function,
                sampler=sampler,
                discount=0.99,
                center_adv=False)

    if torch.cuda.is_available():
        algo.to()

    trainer.setup(algo, env)
    trainer.train(n_epochs=100, batch_size=1024, plot=True)


trpo_pendulum(seed=1)
```
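(Before training I also print which device garage resolved, just to be sure the GPU path is actually taken; `global_device` is the helper from `garage.torch` that the `to()` method below relies on.)

```python
from garage.torch import global_device

# After set_gpu_mode(True) has run, this should report a CUDA device,
# e.g. cuda:0; otherwise it falls back to the CPU.
print('global device:', global_device())
```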
And this is the to() function I added in torch/algos/trpo.py:
```python
def to(self, device=None):
    """Put all the networks within the model on device.

    Args:
        device (str): ID of GPU or CPU.
    """
    # Assumes `from dowel import logger` at the top of trpo.py.
    from garage.torch import global_device
    if device is None:
        device = global_device()
    logger.log('Using device: ' + str(device))
    self.policy = self.policy.to(device)
    self._old_policy = self._old_policy.to(device)
    self._value_function = self._value_function.to(device)
```
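As a quick sanity check after `algo.to()` is called, I verify that the three networks actually landed on the expected device (plain PyTorch; `algo` is the TRPO instance from the example above, and the attribute names are the same ones used in `to()`):

```python
# Confirm the networks moved by to() really live on the expected device.
for name, net in [('policy', algo.policy),
                  ('old_policy', algo._old_policy),
                  ('value_function', algo._value_function)]:
    print(name, next(net.parameters()).device)
```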
I also modified the tensors in _train_once of torch/algos/vpg.py so that they are placed on the GPU.
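(I haven't pasted the exact diff here; roughly, the change amounts to a small helper like the sketch below, which is then used inside `_train_once` to push the sampled arrays onto the global device. The helper name `to_device` is mine, not a garage API.)

```python
import torch

from garage.torch import global_device


def to_device(*arrays):
    """Move numpy arrays or tensors onto garage's global device (sketch only)."""
    device = global_device()
    return tuple(torch.as_tensor(a).to(device) for a in arrays)
```

Inside `_train_once` the sampled observations, rewards, returns, and so on are then wrapped with this helper before the loss computation.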
Then I checked the memory at each epoch in the train function of torch/algos/vpg.py, like this:
```python
def train(self, trainer):
    """Obtain samplers and start actual training for each epoch.

    Args:
        trainer (Trainer): Gives the algorithm the access to
            :method:`~Trainer.step_epochs()`, which provides services
            such as snapshotting and sampler control.

    Returns:
        float: The average return in last epoch cycle.
    """
    last_return = None

    for _ in trainer.step_epochs():
        for _ in range(self._n_samples):
            eps = trainer.obtain_episodes(trainer.step_itr)
            last_return = self._train_once(trainer.step_itr, eps)
            trainer.step_itr += 1
        # Added for debugging: report allocated CUDA memory once per epoch.
        print(torch.cuda.memory_allocated())

    return last_return
```
The printed value kept gradually increasing with each epoch: 66560 -> 74752 -> 82944 -> 99328 -> 107520 -> 115712 -> ...
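A steady growth like this is what I'd expect if autograd graphs are being retained across epochs. The snippet below is plain PyTorch (not garage code) and only illustrates the mechanism I suspect: when gradients produced with `create_graph=True` (as in a Hessian-vector product) stay referenced, the whole second-order graph and its CUDA buffers remain allocated, so `memory_allocated()` keeps climbing.

```python
import torch

# Illustration only (requires a CUDA device): retaining grads that were
# created with create_graph=True keeps their graphs and buffers alive.
params = [torch.randn(1024, 1024, device='cuda', requires_grad=True)]
retained = []

for step in range(5):
    loss = (params[0] ** 2).sum()
    grads = torch.autograd.grad(loss, params, create_graph=True)
    retained.append(grads)  # simulated leak: a lingering reference
    print(step, torch.cuda.memory_allocated())
```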
If I'm doing something wrong, please help! Thanks :)
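P.S. In case it helps to narrow this down, this is the little helper I call once per epoch to list the live CUDA tensors; it is plain Python/PyTorch (gc introspection), not a garage API:

```python
import gc

import torch


def dump_cuda_tensors():
    """Print the type and shape of every live CUDA tensor (debugging aid)."""
    for obj in gc.get_objects():
        try:
            if torch.is_tensor(obj) and obj.is_cuda:
                print(type(obj).__name__, tuple(obj.shape))
        except Exception:
            # Some tracked objects raise on attribute access; skip them.
            pass
```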