Hi @Toni-SM
First of all, thanks for the great RL library! Do you have a suggested method for loading a trained agent after the training and eval process have been completed? I'm currently training a PPO agent and my understanding is that after a successful training I don't need the value model anymore and should just need the policy model for inference. What would be the best way to load just the policy model? I've used Stable Baselines3 in the past, and once the policy was trained I could load the model and interact with the env with the trained agent like this:

model = PPO.load(model_path, env=env, **kwargs)

for ep in range(n_test_episodes):
    obs = env.reset()
    done = False
    while not done:
        action, lstm_states = model.predict(obs, deterministic=True)
        obs, reward, done, infos = env.step(action)
        env.render("human")

env.close()

Thanks for your help.
Hi @chadb56
First, you can train the agent and save each model/optimizer/preprocessor separately by enabling "store_separately" in the agent configuration, as described in the skrl docs (Saving checkpoints).
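For example, a minimal sketch of enabling it (assuming the PPO agent and its default configuration dictionary; adapt it to the agent you are actually using):

from skrl.agents.torch.ppo import PPO_DEFAULT_CONFIG

cfg = PPO_DEFAULT_CONFIG.copy()
# write the policy, value, optimizer and preprocessors to separate checkpoint files
# instead of a single agent checkpoint
cfg["experiment"]["store_separately"] = True
# cfg is then passed to the agent constructor, e.g. PPO(models=..., memory=..., cfg=cfg, ...)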
Second, to load and use the checkpoints in a minimal setup it is necessary to take into account whether input preprocessors (for the states) were used or not, since the preprocessors modify the states that are passed to the models.
In the example.zip file you can find the code and trained checkpoints for 1) training a PPO agent in the IsaacGymEnvs Cartpole environment without a shared model and 2) running a minimal evaluation (only the policy and the state-preprocessor) using the code shown below. The minimal example code is divided into 3 parts:
import isaacgym
import torch
import torch.nn as nn

#=================== environment =================
# load and wrap the Isaac Gym environment
from skrl.envs.torch import wrap_env
from skrl.envs.torch import load_isaacgym_env_preview4

env = load_isaacgym_env_preview4(task_name="Cartpole")
env = wrap_env(env)
device = env.device

#=================== policy and state-preprocessor =================
# import the skrl components to build the RL system
from skrl.models.torch import Model, GaussianMixin
from skrl.resources.preprocessors.torch import RunningStandardScaler

# define only the policy
class Policy(GaussianMixin, Model):
    def __init__(self, observation_space, action_space, device, clip_actions=False,
                 clip_log_std=True, min_log_std=-20, max_log_std=2, reduction="sum"):
        Model.__init__(self, observation_space, action_space, device)
        GaussianMixin.__init__(self, clip_actions, clip_log_std, min_log_std, max_log_std, reduction)

        self.net = nn.Sequential(nn.Linear(self.num_observations, 32),
                                 nn.ELU(),
                                 nn.Linear(32, 32),
                                 nn.ELU(),
                                 nn.Linear(32, self.num_actions))
        self.log_std_parameter = nn.Parameter(torch.zeros(self.num_actions))

    def compute(self, inputs, role):
        return self.net(inputs["states"]), self.log_std_parameter, {}

# instantiate state-preprocessor and the policy
state_preprocessor = RunningStandardScaler(env.observation_space, device=device)
policy = Policy(env.observation_space, env.action_space, device=device).to(env.device)

# load checkpoints
state_preprocessor.load_state_dict(torch.load("state_preprocessor_1600.pt"))
policy.load("policy_1600.pt")  # same as policy.load_state_dict(torch.load("policy_1600.pt"))

#=================== manual interaction with the environment =================
# manual evaluation
states, infos = env.reset()
for i in range(1000):
    # state-preprocessor + policy
    with torch.no_grad():
        states = state_preprocessor(states)
        actions = policy.act({"states": states})[0]

    # step the environment
    next_states, rewards, terminated, truncated, infos = env.step(actions)

    # render the environment
    env.render()

    # check for termination/truncation
    if terminated.any() or truncated.any():
        states, infos = env.reset()
    else:
        states = next_states
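Note that policy.act(...) returns a stochastic sample from the Gaussian distribution. If you want a deterministic evaluation (the equivalent of deterministic=True in Stable Baselines3), a possible sketch (assuming a recent skrl version, where the mixin also returns the distribution mean in its outputs dictionary) is to use the mean action instead:

with torch.no_grad():
    states = state_preprocessor(states)
    # act(...) returns (actions, log_prob, outputs); outputs holds the Gaussian mean
    actions = policy.act({"states": states})[-1]["mean_actions"]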
with input preprocessors: pass the states through the preprocessor before the policy, as done in the minimal example above.
without input preprocessors: pass the raw states directly to the policy (no preprocessor checkpoint needs to be loaded), as sketched below.
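A minimal sketch of the same evaluation loop without a state-preprocessor (assuming the Policy class and the wrapped env from the example above):

policy = Policy(env.observation_space, env.action_space, device=device).to(env.device)
policy.load("policy_1600.pt")

states, infos = env.reset()
for i in range(1000):
    with torch.no_grad():
        # the raw states are passed directly to the policy
        actions = policy.act({"states": states})[0]
    next_states, rewards, terminated, truncated, infos = env.step(actions)
    env.render()
    if terminated.any() or truncated.any():
        states, infos = env.reset()
    else:
        states = next_states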