Hi @Toni-SM
First of all, thanks for the great RL library! Do you have a suggested method for loading a trained agent after the training and eval process have been completed? I'm currently training a PPO agent and my understanding is that after a successful training I don't need the value model anymore and should just need the policy model for inference. What would be the best way to load just the policy model? I've used Stable Baselines3 in the past, and once the policy was trained I could load the model and interact with the env with the trained agent like this:

model = PPO.load(model_path, env=env, **kwargs)

for ep in range(n_test_episodes):
    obs = env.reset()
    done = False
    while not done:
        action, lstm_states = model.predict(obs, deterministic=True)
        obs, reward, done, infos = env.step(action)
        env.render("human")

env.close()

Thanks for your help.
Hi @chadb56
First, you can train the agent and save each model/optimizer/preprocessor separately by enabling "store_separately" in the agent configuration, as described in the skrl docs (Saving checkpoints).
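For example, a minimal sketch of enabling it (assuming the PPO agent and its default configuration dictionary; adapt it to the agent you are actually using):

from skrl.agents.torch.ppo import PPO_DEFAULT_CONFIG

cfg = PPO_DEFAULT_CONFIG.copy()
# write the policy, value, optimizer and preprocessors to separate checkpoint files
# instead of a single agent checkpoint
cfg["experiment"]["store_separately"] = True
# cfg is then passed to the agent constructor, e.g. PPO(models=..., memory=..., cfg=cfg, ...)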
Second, to load and use the checkpoints in a minimal setup it is necessary to take into account whether input preprocessors (for the states) were used or not, since the preprocessors modify the states that are passed to the models.
In the example.zip file you can find the code and trained checkpoints for 1) training a PPO agent in the IsaacGymEnvs Cartpole environment without a shared model and 2) running a minimal evaluation (only the policy and the state-preprocessor) using the code shown below. The minimal example code is divided into 3 parts:
import isaacgym
import torch
import torch.nn as nn

#=================== environment =================
# load and wrap the Isaac Gym environment
from skrl.envs.torch import wrap_env
from skrl.envs.torch import load_isaacgym_env_preview4

env = load_isaacgym_env_preview4(task_name="Cartpole")
env = wrap_env(env)
device = env.device

#=================== policy and state-preprocessor =================
# import the skrl components to build the RL system
from skrl.models.torch import Model, GaussianMixin
from skrl.resources.preprocessors.torch import RunningStandardScaler

# define only the policy
class Policy(GaussianMixin, Model):
    def __init__(self, observation_space, action_space, device, clip_actions=False,
                 clip_log_std=True, min_log_std=-20, max_log_std=2, reduction="sum"):
        Model.__init__(self, observation_space, action_space, device)
        GaussianMixin.__init__(self, clip_actions, clip_log_std, min_log_std, max_log_std, reduction)

        self.net = nn.Sequential(nn.Linear(self.num_observations, 32),
                                 nn.ELU(),
                                 nn.Linear(32, 32),
                                 nn.ELU(),
                                 nn.Linear(32, self.num_actions))
        self.log_std_parameter = nn.Parameter(torch.zeros(self.num_actions))

    def compute(self, inputs, role):
        return self.net(inputs["states"]), self.log_std_parameter, {}

# instantiate state-preprocessor and the policy
state_preprocessor = RunningStandardScaler(env.observation_space, device=device)
policy = Policy(env.observation_space, env.action_space, device=device).to(env.device)

# load checkpoints
state_preprocessor.load_state_dict(torch.load("state_preprocessor_1600.pt"))
policy.load("policy_1600.pt")  # same as policy.load_state_dict(torch.load("policy_1600.pt"))

#=================== manual interaction with the environment =================
# manual evaluation
states, infos = env.reset()
for i in range(1000):
    # state-preprocessor + policy
    with torch.no_grad():
        states = state_preprocessor(states)
        actions = policy.act({"states": states})[0]

    # step the environment
    next_states, rewards, terminated, truncated, infos = env.step(actions)

    # render the environment
    env.render()

    # check for termination/truncation
    if terminated.any() or truncated.any():
        states, infos = env.reset()
    else:
        states = next_states
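Note that policy.act(...) returns a stochastic sample from the Gaussian distribution. If you want a deterministic evaluation (the equivalent of deterministic=True in Stable Baselines3), a possible sketch (assuming a recent skrl version, where the mixin also returns the distribution mean in its outputs dictionary) is to use the mean action instead:

with torch.no_grad():
    states = state_preprocessor(states)
    # act(...) returns (actions, log_prob, outputs); outputs holds the Gaussian mean
    actions = policy.act({"states": states})[-1]["mean_actions"]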
with input preprocessors: pass the states through the preprocessor before the policy, as done in the minimal example above.
without input preprocessors: pass the raw states directly to the policy (no preprocessor checkpoint needs to be loaded), as sketched below.
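A minimal sketch of the same evaluation loop without a state-preprocessor (assuming the Policy class and the wrapped env from the example above):

policy = Policy(env.observation_space, env.action_space, device=device).to(env.device)
policy.load("policy_1600.pt")

states, infos = env.reset()
for i in range(1000):
    with torch.no_grad():
        # the raw states are passed directly to the policy
        actions = policy.act({"states": states})[0]
    next_states, rewards, terminated, truncated, infos = env.step(actions)
    env.render()
    if terminated.any() or truncated.any():
        states, infos = env.reset()
    else:
        states = next_states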