This repository provides a PyTorch implementation and pre-trained models for Meta Motivo. For details, see the paper Zero-Shot Whole-Body Humanoid Control via Behavioral Foundation Models. The repository includes:
- 6 pre-trained FB-CPR models for controlling the humanoid model defined in HumEnv
- Fully reproducible scripts for evaluating the models in HumEnv
- Training code for the FB and FB-CPR algorithms
The project is pip installable into your environment:
pip install "metamotivo[huggingface,humenv] @ git+https://github.com/facebookresearch/metamotivo.git"
It requires Python 3.10+ and has only two dependencies: torch >= 2 and safetensors. Optional dependencies include humenv["bench"] and huggingface_hub for testing/training and for loading models from Hugging Face.
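After installation, you can sanity-check the environment with a one-liner (optional; it assumes the command above completed successfully):
python -c "import metamotivo, torch, safetensors; print(torch.__version__)"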
For reproducibility, we provide all 5 models (metamotivo-S-X) used to produce the results in the paper, each trained with a different random seed. We also provide our largest and most performant model (metamotivo-M-1), which can also be tested interactively in our demo.
Model | # of params | Download |
---|---|---|
metamotivo-S-1 | 24.5M | link |
metamotivo-S-2 | 24.5M | link |
metamotivo-S-3 | 24.5M | link |
metamotivo-S-4 | 24.5M | link |
metamotivo-S-5 | 24.5M | link |
metamotivo-M-1 | 288M | link |
Once the library is installed, you can easily create an FB-CPR agent and download a pre-trained model from the Hugging Face hub. Note that the model is an instance of torch.nn.Module and by default it is initialized in "inference" mode (no_grad and eval mode).
We provide some simple code snippets to demonstrate how to use the model below. For more detailed examples, see our tutorials on interacting with the model, running an evaluation, and training from scratch.
The following code snippet shows how to instantiate the model.
from metamotivo.fb_cpr.huggingface import FBcprModel
model = FBcprModel.from_pretrained("facebook/metamotivo-S-1")
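Because the model is a regular torch.nn.Module, the standard module operations apply. As a minimal sketch, you can move it to a device and confirm that it was loaded in eval mode:
import torch
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
print(model.training)  # False: the model is initialized in eval mode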
For each model we provide:
- The training buffer (that can be used for inference or offline training)
- A small reward inference buffer (that contains the minimum amount of information for doing reward inference)
from huggingface_hub import hf_hub_download
from metamotivo.buffers.buffers import DictBuffer
import h5py
local_dir = "metamotivo-S-1-datasets"
dataset = "buffer_inference_500000.hdf5" # a smaller buffer that can be used for reward inference
# dataset = "buffer.hdf5" # the full training buffer of the model
buffer_path = hf_hub_download(
    repo_id="facebook/metamotivo-S-1",
    filename=f"data/{dataset}",
    repo_type="model",
    local_dir=local_dir,
)
hf = h5py.File(buffer_path, "r")
print(hf.keys())
# create a DictBuffer object that can be used for sampling
data = {k: v[:] for k, v in hf.items()}
buffer = DictBuffer(capacity=data["qpos"].shape[0], device="cpu")
buffer.extend(data)
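To verify that the buffer is usable, you can draw a batch from it. This is a sketch assuming DictBuffer exposes a sample(batch_size) method returning a dictionary keyed like the HDF5 file:
batch = buffer.sample(256)
print({k: v.shape for k, v in batch.items()})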
The FB-CPR model contains several networks:
- forward net
- backward net
- critic net
- discriminator net
- actor net
We provide functions for evaluating these networks:
def backward_map(self, obs: torch.Tensor) -> torch.Tensor: ...
def forward_map(self, obs: torch.Tensor, z: torch.Tensor, action: torch.Tensor) -> torch.Tensor: ...
def actor(self, obs: torch.Tensor, z: torch.Tensor, std: float) -> torch.Tensor: ...
def critic(self, obs: torch.Tensor, z: torch.Tensor, action: torch.Tensor) -> torch.Tensor: ...
def discriminator(self, obs: torch.Tensor, z: torch.Tensor) -> torch.Tensor: ...
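As an illustration, here is a hedged sketch of querying these heads with a dummy observation. The observation size (358 for the HumEnv humanoid) is an assumption; in practice read it from env.observation_space. The inner product F(s, a, z) · z is the standard forward-backward estimate of the value of the task encoded by z:
import torch

obs_dim = 358  # assumed HumEnv observation size; prefer env.observation_space.shape[0]
device = next(model.parameters()).device
obs = torch.zeros(1, obs_dim, dtype=torch.float32, device=device)
z = model.sample_z(1)

B = model.backward_map(obs)            # embed the observation into the latent z-space
action = model.act(obs, z, mean=True)  # deterministic action for this context
F = model.forward_map(obs, z, action)  # successor features
q_fb = (F * z).sum(dim=-1)             # forward-backward value estimate F(s, a, z) . z
q = model.critic(obs, z, action)       # critic estimate
d = model.discriminator(obs, z)        # discriminator score for (obs, z)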
We also provide simple functions for prompting the model and obtaining a context vector z representing the task to execute.
# reward prompt (standard and weighted regression)
def reward_inference(self, next_obs: torch.Tensor, reward: torch.Tensor, weight: torch.Tensor | None = None) -> torch.Tensor: ...
def reward_wr_inference(self, next_obs: torch.Tensor, reward: torch.Tensor) -> torch.Tensor: ...
# goal prompt
def goal_inference(self, next_obs: torch.Tensor) -> torch.Tensor: ...
# tracking prompt
def tracking_inference(self, next_obs: torch.Tensor) -> torch.Tensor: ...
Once we have a context vector z, we can call the actor to get actions. We provide a function for acting in the environment with a standard interface.
def act(self, obs: torch.Tensor, z: torch.Tensor, mean: bool = True) -> torch.Tensor: ...
Note that these functions do not allow gradient computation and use eval mode (torch.no_grad() and model.eval()), since they are expected to be used for inference. For training, you should directly access the class attributes; we also define target networks for the forward, backward, and critic networks.
This is a minimal example of how to execute the policy with a randomly sampled context vector z:
from humenv import make_humenv
from gymnasium.wrappers import FlattenObservation, TransformObservation
import torch
from metamotivo.fb_cpr.huggingface import FBcprModel
device = "cpu"
env, _ = make_humenv(
    num_envs=1,
    wrappers=[
        FlattenObservation,
        lambda env: TransformObservation(
            env,
            lambda obs: torch.tensor(obs.reshape(1, -1), dtype=torch.float32, device=device),
            env.observation_space,  # for gymnasium < 1.0.0, remove this argument
        ),
    ],
    state_init="Default",
)
model = FBcprModel.from_pretrained("facebook/metamotivo-S-1")
model.to(device)
z = model.sample_z(1)
observation, _ = env.reset()
for i in range(10):
    action = model.act(observation, z, mean=True)
    observation, reward, terminated, truncated, info = env.step(action.cpu().numpy().ravel())
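Building on the rollout above, any observation tensor can also be turned into a goal prompt. As a sketch, the last observation from the loop is reused here as a stand-in for a goal pose:
z_goal = model.goal_inference(next_obs=observation)
action = model.act(observation, z_goal, mean=True)
observation, reward, terminated, truncated, info = env.step(action.cpu().numpy().ravel())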
To reproduce the results in the paper, we provide a way of evaluating the models using HumEnv. We provide wrappers that interface Meta Motivo with the humenv.bench reward, goal, and tracking evaluations.
Here is an example of how to use the wrappers for reward evaluation:
from metamotivo.fb_cpr.huggingface import FBcprModel
from metamotivo.wrappers.humenvbench import RewardWrapper
import humenv.bench
model = FBcprModel.from_pretrained("facebook/metamotivo-S-1")
# this enables reward relabeling and context inference
model = RewardWrapper(
    model=model,
    inference_dataset=buffer,  # see above for how to download and create a buffer
    num_samples_per_inference=100_000,
    inference_function="reward_wr_inference",
    max_workers=80,
)
# create the evaluation from humenv
reward_eval = humenv.bench.RewardEvaluation(
    tasks=["move-ego-0-0"],
    env_kwargs={
        "state_init": "Default",
    },
    num_contexts=1,
    num_envs=50,
    num_episodes=100,
)
scores = reward_eval.run(model)
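The returned scores object aggregates the metrics collected for each task (the exact schema is defined by humenv.bench); assuming it is a dictionary keyed by task name, a minimal inspection looks like:
for task, metrics in scores.items():
    print(task, metrics)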
You can do the same for the other evaluations provided in humenv.bench. Please refer to examples/humenv_evaluation.py for a full evaluation loop.
If you use Meta Motivo in your research, please cite:
@article{tirinzoni2024metamotivo,
    title={Zero-shot Whole-Body Humanoid Control via Behavioral Foundation Models},
    author={Tirinzoni, Andrea and Touati, Ahmed and Farebrother, Jesse and Guzek, Mateusz and Kanervisto, Anssi and Xu, Yingchen and Lazaric, Alessandro and Pirotta, Matteo},
    year={2024},
}
Meta Motivo is licensed under the CC BY-NC 4.0 license. See LICENSE for details.