LlamaGym: Online Reinforcement Learning for LLM-based agents. #731

irthomasthomas opened this issue Mar 16, 2024 · 1 comment
Labels: AI-Agents (Autonomous AI agents using LLMs), Code-Interpreter (OpenAI Code-Interpreter), finetuning (Tools for finetuning of LLMs, e.g. SFT or RLHF), github (gh tools like cli, Actions, Issues, Pages), llm (Large Language Models)

LlamaGym/README.md at main · KhoomeiK/LlamaGym

DESCRIPTION:

Fine-tune LLM agents with online reinforcement learning

🔗 Agents for Web Data Extraction   •   🐦 Twitter

LlamaGym

"Agents" originated in reinforcement learning, where they learn by interacting with an environment and receiving a reward signal. However, LLM-based agents today do not learn online (i.e. continuously in real time) via reinforcement.

OpenAI created Gym to standardize and simplify RL environments, but if you try dropping an LLM-based agent into a Gym environment for training, you'd find it's still quite a bit of code to handle LLM conversation context, episode batches, reward assignment, PPO setup, and more.
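For reference, the bare Gym(nasium) interaction loop below is the standard interface all of that glue sits on top of; a random policy stands in for the agent, and nothing here is LlamaGym-specific:

import gymnasium as gym

# The standard Gym(nasium) loop: reset, then step until the episode ends.
# An LLM agent would replace action_space.sample() with a model call,
# plus all the prompting, batching, and training glue around it.
env = gym.make("Blackjack-v1")
observation, info = env.reset()
done = False
while not done:
    action = env.action_space.sample()  # random placeholder policy
    observation, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated
env.close()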

LlamaGym seeks to simplify fine-tuning LLM agents with RL. Right now, it's a single Agent abstract class that handles all the issues mentioned above, letting you quickly iterate and experiment with agent prompting & hyperparameters across any Gym environment.

Usage

Fine-tuning an LLM-based agent to play in a Gym-style environment with RL has never been easier! Once you install LlamaGym...

pip install llamagym

First, implement 3 abstract methods on the Agent class:

from llamagym import Agent

class BlackjackAgent(Agent):
    def get_system_prompt(self) -> str:
        return "You are an expert blackjack player."

    def format_observation(self, observation) -> str:
        return f"Your current total is {observation[0]}"

    def extract_action(self, response: str):
        return 0 if "stay" in response else 1
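To make the contract concrete, here is a rough sketch of how these three methods could fit together into a single act step. This is illustrative only, not LlamaGym's actual implementation, which also manages chat history, episode batching, and PPO bookkeeping:

# Illustrative glue, not LlamaGym internals: build a prompt from the
# system prompt and formatted observation, generate, then parse an action.
def act_once(agent, model, tokenizer, observation, device):
    prompt = agent.get_system_prompt() + "\n" + agent.format_observation(observation)
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    output_ids = model.generate(**inputs, max_new_tokens=16)
    completion = tokenizer.decode(
        output_ids[0][inputs["input_ids"].shape[1]:],  # strip the prompt tokens
        skip_special_tokens=True,
    )
    return agent.extract_action(completion)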

Then, define your base LLM (as you would for any fine-tuning job) and instantiate your agent:

import torch
from transformers import AutoTokenizer
from trl import AutoModelForCausalLMWithValueHead  # LM with a value head, as PPO needs

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = AutoModelForCausalLMWithValueHead.from_pretrained("Llama-2-7b").to(device)
tokenizer = AutoTokenizer.from_pretrained("Llama-2-7b")
agent = BlackjackAgent(model, tokenizer, device)
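Before wiring the agent into an environment, it can be worth a quick smoke test that the checkpoint loads and generates; this uses standard transformers calls and nothing LlamaGym-specific:

# Quick smoke test: confirm the model loads and decodes sensibly
# before spending time on the RL loop.
inputs = tokenizer(
    "You are an expert blackjack player. Your current total is 14. Do you hit or stay?",
    return_tensors="pt",
).to(device)
output_ids = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))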

Finally, write your RL loop as usual and simply call your agent to act, reward, and terminate:

import gymnasium as gym
from tqdm import trange

env = gym.make("Blackjack-v1")

for episode in trange(5000):
    observation, info = env.reset()
    done = False

    while not done:
        action = agent.act(observation)  # act based on observation
        observation, reward, terminated, truncated, info = env.step(action)
        agent.assign_reward(reward)  # provide reward to agent
        done = terminated or truncated

    train_stats = agent.terminate_episode()  # trains if batch is full
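Once this runs, two things you will likely want on top (our own additions, not part of LlamaGym's API): a rolling return metric to watch for convergence, and standard save_pretrained calls to persist the fine-tuned weights:

# `episode_returns` is our own bookkeeping, not a LlamaGym feature:
# sum each episode's rewards inside the loop above and append here,
# then watch a rolling mean as a cheap convergence signal.
episode_returns = []
# ... inside the while-loop: episode_return += reward
# ... after each episode: episode_returns.append(episode_return)

if episode_returns:
    recent = episode_returns[-100:]
    print(f"mean return over last {len(recent)} episodes: {sum(recent) / len(recent):.3f}")

# Standard transformers save calls persist the fine-tuned weights.
model.save_pretrained("blackjack-agent")
tokenizer.save_pretrained("blackjack-agent")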

Some reminders:

  • the code snippets above are mildly simplified, but a fully working example is available in examples/blackjack.py
  • getting online RL to converge is notoriously difficult, so you'll have to tune hyperparameters to see improvement
    • your model may also benefit from a supervised fine-tuning stage on sampled trajectories before running RL (we may add this feature in the future; see the sketch after this list)
  • our implementation values simplicity, so it is not as compute-efficient as e.g. Lamorel, but it is easier to start playing around with
  • LlamaGym is a weekend project and still a WIP, but we love contributions!
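On the supervised fine-tuning point above: LlamaGym has no built-in API for it yet, but a warm-start on trajectories you have logged yourself can be done with plain transformers. A minimal sketch, assuming a hypothetical list of (prompt, good response) pairs, and using a plain AutoModelForCausalLM for the SFT stage (you would reload the weights with the value head afterwards for RL):

import torch
from torch.optim import AdamW
from transformers import AutoModelForCausalLM, AutoTokenizer

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
sft_model = AutoModelForCausalLM.from_pretrained("Llama-2-7b").to(device)
tokenizer = AutoTokenizer.from_pretrained("Llama-2-7b")

# Hypothetical logged trajectories: (observation prompt, known-good action text).
trajectories = [
    ("Your current total is 12. Do you hit or stay?", "hit"),
    ("Your current total is 20. Do you hit or stay?", "stay"),
]

optimizer = AdamW(sft_model.parameters(), lr=1e-5)
sft_model.train()
for prompt, response in trajectories:
    enc = tokenizer(prompt + " " + response, return_tensors="pt").to(device)
    # Causal-LM loss over the full sequence; masking prompt tokens out of
    # the loss would be the usual refinement.
    loss = sft_model(**enc, labels=enc["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

sft_model.save_pretrained("blackjack-sft")  # reload with a value head for RL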

Relevant Work

Citation

@misc{pandey2024llamagym,
  title        = {LlamaGym: Fine-tune LLM agents with Online Reinforcement Learning},
  author       = {Rohan Pandey},
  year         = {2024},
  howpublished = {GitHub},
  url          = {https://github.com/KhoomeiK/LlamaGym}
}

URL: LlamaGym README

Suggested labels


Related content

  • #681 (similarity score: 0.9)
  • #628 (similarity score: 0.89)
  • #494 (similarity score: 0.87)
  • #665 (similarity score: 0.86)
  • #499 (similarity score: 0.86)
  • #706 (similarity score: 0.86)
