LlamaGym: Online Reinforcement Learning for LLM-based agents. #731
Labels
- AI-Agents: Autonomous AI agents using LLMs
- Code-Interpreter: OpenAI Code-Interpreter
- finetuning: Tools for fine-tuning LLMs, e.g. SFT or RLHF
- github: gh tools like cli, Actions, Issues, Pages
- llm: Large Language Models
DESCRIPTION:
Fine-tune LLM agents with online reinforcement learning
LlamaGym
"Agents" originated in reinforcement learning, where they learn by interacting with an environment and receiving a reward signal. However, LLM-based agents today do not learn online (i.e. continuously in real time) via reinforcement.
OpenAI created Gym to standardize and simplify RL environments, but if you try dropping an LLM-based agent into a Gym environment for training, you'd find it's still quite a bit of code to handle LLM conversation context, episode batches, reward assignment, PPO setup, and more.
LlamaGym seeks to simplify fine-tuning LLM agents with RL. Right now, it's a single `Agent` abstract class that handles all the issues mentioned above, letting you quickly iterate and experiment with agent prompting & hyperparameters across any Gym environment.
Usage
Fine-tuning an LLM-based agent to play in a Gym-style environment with RL has never been easier! Once you install LlamaGym...
First, implement 3 abstract methods on the Agent class:
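The code snippet for this step was not captured in the issue. A minimal sketch for a Blackjack agent, assuming the three abstract methods are `get_system_prompt`, `format_observation`, and `extract_action` (names follow the README's example but should be treated as illustrative):

```python
import re
from llamagym import Agent

class BlackjackAgent(Agent):
    def get_system_prompt(self) -> str:
        # Tell the model how to play and how to format its action.
        return (
            "You are an expert blackjack player. Decide whether to stay or hit. "
            'Respond with "Action: 0" to stay or "Action: 1" to hit.'
        )

    def format_observation(self, observation) -> str:
        # Blackjack-v1 observations are (player_sum, dealer_card, usable_ace).
        return (
            f"Your current sum is {observation[0]}, the dealer is showing {observation[1]}, "
            f"and you {'have' if observation[2] else 'do not have'} a usable ace."
        )

    def extract_action(self, response: str):
        # Parse the "Action: N" pattern from the model's reply; default to staying.
        match = re.search(r"Action: (\d)", response)
        return int(match.group(1)) if match else 0
```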
Then, define your base LLM (as you would for any fine-tuning job) and instantiate your agent:
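A sketch of the instantiation step, assuming LlamaGym sits on top of TRL's value-head model for PPO; the model name, device handling, and constructor arguments are illustrative:

```python
import torch
from transformers import AutoTokenizer
from trl import AutoModelForCausalLMWithValueHead

device = "cuda" if torch.cuda.is_available() else "cpu"
model_name = "meta-llama/Llama-2-7b-chat-hf"  # any chat-tuned base model should work

# PPO training needs a value head on top of the causal LM.
model = AutoModelForCausalLMWithValueHead.from_pretrained(model_name).to(device)
tokenizer = AutoTokenizer.from_pretrained(model_name)

agent = BlackjackAgent(model, tokenizer, device)
```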
Finally, write your RL loop as usual and simply call your agent to act, reward, and terminate:
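A sketch of that loop using Gymnasium's Blackjack environment; the agent methods `act`, `assign_reward`, and `terminate_episode` follow the README's example and may differ in the current API:

```python
import gymnasium as gym

env = gym.make("Blackjack-v1")

for episode in range(1000):
    observation, info = env.reset()
    done = False

    while not done:
        action = agent.act(observation)          # LLM picks the next action
        observation, reward, terminated, truncated, info = env.step(action)
        agent.assign_reward(reward)              # credit the most recent turn
        done = terminated or truncated

    train_stats = agent.terminate_episode()      # PPO update over the finished episode
```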
Some reminders:
- A fully working example is available in examples/blackjack.py.
Relevant Work
Citation
URL: https://github.com/KhoomeiK/LlamaGym/blob/main/README.md
Suggested labels