Python 3.9+ License huggingface discord GitHub star chart

Paper | Model Instruction | Framework | Installation | Train | Benchmarks | Acknowledgement


🎉 News

This repo is for research purposes only.

Autonomous agents powered by large language models (LLMs) have garnered significant research attention. However, fully harnessing the potential of LLMs for agent-based tasks presents inherent challenges due to the heterogeneous nature of diverse data sources featuring multi-turn trajectories.

This repo introduces xLAM, which aggregates agent trajectories from distinct environments spanning a wide array of scenarios. It standardizes and unifies these trajectories into a consistent format, streamlining the creation of a generic data loader optimized for agent training. Building on this unified data, our training pipeline keeps the different data sources balanced and preserves independent randomness across devices during dataset partitioning and model training.



Model Instruction

| Model | # Total Params | Context Length | Release Date | Category | Download Model | Download GGUF files |
|---|---|---|---|---|---|---|
| xLAM-7b-r | 7.24B | 32k | Sep. 5, 2024 | General, Function-calling | 🤗 Link | -- |
| xLAM-8x7b-r | 46.7B | 32k | Sep. 5, 2024 | General, Function-calling | 🤗 Link | -- |
| xLAM-8x22b-r | 141B | 64k | Sep. 5, 2024 | General, Function-calling | 🤗 Link | -- |
| xLAM-1b-fc-r | 1.35B | 16k | July 17, 2024 | Function-calling | 🤗 Link | 🤗 Link |
| xLAM-7b-fc-r | 6.91B | 4k | July 17, 2024 | Function-calling | 🤗 Link | 🤗 Link |
| xLAM-v0.1-r | 46.7B | 32k | Mar. 18, 2024 | General, Function-calling | 🤗 Link | -- |

If you already know Mixtral, the xLAM series is a significant upgrade and performs better on many tasks, including general tasks and function calling. With the same number of parameters, the models have been fine-tuned across a wide range of agent tasks and scenarios while preserving the capabilities of the original model.

For example, xLAM-v0.1-r is version 0.1 of the Large Action Model series, with the "-r" indicating that it is tagged for research. This model is compatible with the vLLM and FastChat platforms.

Below is an example of using the older xLAM-v0.1-r model:

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Salesforce/xLAM-v0.1-r")
model = AutoModelForCausalLM.from_pretrained("Salesforce/xLAM-v0.1-r", device_map="auto")

messages = [
    {"role": "user", "content": "What is your favourite condiment?"},
    {"role": "assistant", "content": "Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!"},
    {"role": "user", "content": "Do you have mayonnaise recipes?"}
]

# Move the inputs to wherever device_map="auto" placed the model's first parameters
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Note: You may need to tune the temperature setting for different applications. Typically, a lower temperature is helpful for tasks that require deterministic outcomes. Additionally, for tasks that demand adherence to specific formats or function calls, we strongly recommend including explicit formatting instructions in the prompt.
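
To illustrate why a lower temperature gives more deterministic outputs, here is a minimal sketch of temperature-scaled softmax in plain Python. The logits are made-up values for three candidate tokens, and the helper name `softmax_with_temperature` is hypothetical; this is not code from the xLAM repo.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up scores for three candidate tokens (illustration only)
logits = [2.0, 1.0, 0.5]

for temperature in (1.0, 0.1):
    probs = softmax_with_temperature(logits, temperature)
    # At low temperature almost all probability mass sits on the top token
    print(temperature, [round(p, 3) for p in probs])
```

With Hugging Face `generate`, sampling temperature is typically set by passing `do_sample=True` together with `temperature=...`.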

Deploying and Interacting with xLAM Models

‼️ Check the latest examples and tokenizer info on interacting with xLAM models.

There are two main options for serving the xLAM model as an OpenAI-compatible chat completion API (here we use Salesforce/xLAM-8x7b-r and 4xA100 (40GB) setup as an example):

Option 1: Using vLLM (Recommended)

vLLM offers efficient serving with lower latency. To serve the model with vLLM:

vllm serve Salesforce/xLAM-8x7b-r --host 0.0.0.0 --port 8000 --tensor-parallel-size 4
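
Once the server is up, it can be queried with any OpenAI-compatible client. The sketch below uses only the Python standard library and assumes the server started by the command above is reachable at `http://localhost:8000`; the helper name `build_chat_request` is hypothetical. The network call is left commented out so the snippet runs without a live server.

```python
import json
import urllib.request


def build_chat_request(base_url, model, messages, temperature=0.0):
    """Build a POST request for an OpenAI-compatible chat completion endpoint."""
    payload = {"model": model, "messages": messages, "temperature": temperature}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_chat_request(
    "http://localhost:8000/v1",  # assumed address of the server started above
    "Salesforce/xLAM-8x7b-r",
    [{"role": "user", "content": "What's the weather like in New York?"}],
)
print(request.full_url)

# Uncomment once the server is running:
# with urllib.request.urlopen(request, timeout=60) as response:
#     reply = json.load(response)
#     print(reply["choices"][0]["message"]["content"])
```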

Option 2: Using FastChat

FastChat provides a more feature-rich serving setup. To serve with FastChat:

  1. Start the controller:
python3 -m fastchat.serve.controller --host 0.0.0.0
  2. Start the OpenAI-compatible API server:
python3 -m fastchat.serve.openai_api_server --host 0.0.0.0 --port 8000
  3. Launch the model worker:
python3 -m fastchat.serve.vllm_worker \
       --model-names "Salesforce/xLAM-8x7b-r" \
       --model-path Salesforce/xLAM-8x7b-r \
       --host 0.0.0.0 \
       --port 31005 \
       --worker-address http://localhost:31005 \
       --num-gpus 4 \
       --limit-worker-concurrency 64

Using the Chat Completion API

Once the model is served, you can use the following xLAM client to interact with it for function calling or other applications:

from xLAM.client import xLAMChatCompletion, xLAMConfig

# Configure the client
config = xLAMConfig(base_url="http://localhost:8000/v1/", model="Salesforce/xLAM-8x7b-r")
llm = xLAMChatCompletion.from_config(config)

# Example conversation
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What's the weather like in New York?"},
    {"role": "assistant", "content": "To get the weather information for New York, I'll need to use the get_weather function.", "tool_calls": {"name": "get_weather", "arguments": '{"location": "New York", "unit": "fahrenheit"}'}},
    {"role": "tool", "name": "get_weather", "content": '{"temperature": 72, "description": "Partly cloudy"}'},
    {"role": "user", "content": "Now, search for the weather in San Francisco."}
]

# Example function definition (optional)
tools = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "The city and state, e.g. San Francisco, New York"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"], "description": "The unit of temperature to return"}
            },
            "required": ["location"]
        }
    },
    {
        "name": "search",
        "description": "Search for information on the internet",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "The search query, e.g. 'latest news on AI'"}
            },
            "required": ["query"]
        }
    },
    {
        "name": "respond",
        "description": "When you are ready to respond, use this function. This function allows the assistant to formulate and deliver appropriate replies based on the input message and the context of the conversation. Generate a concise response for simple questions, and a more detailed response for complex questions.",
        "parameters": {
            "type": "object",
            "properties": {
                "message": {"type": "string", "description": "The content of the message to respond to."}
            },
            "required": ["message"]
        }
    }
]

response = llm.completion(messages, tools=tools)
print(response)
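
When the model answers with a tool call, the caller still has to execute it and feed the result back as a `tool` message. The sketch below shows one way to do that. It assumes a tool call is a dict with `name` and a JSON-encoded `arguments` string, mirroring the message format above; the actual shape returned by `llm.completion` may differ. `run_tool_call`, `REGISTRY`, and the canned `get_weather` implementation are hypothetical names for illustration.

```python
import json


def get_weather(location, unit="fahrenheit"):
    """Stand-in implementation that returns canned data instead of calling an API."""
    return {"location": location, "unit": unit, "temperature": 72, "description": "Partly cloudy"}


# Map tool names (as declared in `tools`) to local Python callables
REGISTRY = {"get_weather": get_weather}

# A trimmed copy of the tool definition used above
TOOLS = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["location"],
        },
    }
]


def run_tool_call(tool_call, tools, registry):
    """Validate required arguments, run the tool, and wrap the result as a tool message."""
    name = tool_call["name"]
    arguments = tool_call["arguments"]
    if isinstance(arguments, str):  # arguments arrive as a JSON string in the example above
        arguments = json.loads(arguments)

    schema = next((tool for tool in tools if tool["name"] == name), None)
    if schema is None or name not in registry:
        raise KeyError(f"unknown tool: {name}")

    missing = [key for key in schema["parameters"].get("required", []) if key not in arguments]
    if missing:
        raise ValueError(f"missing required arguments for {name}: {missing}")

    result = registry[name](**arguments)
    return {"role": "tool", "name": name, "content": json.dumps(result)}


call = {"name": "get_weather", "arguments": '{"location": "San Francisco", "unit": "celsius"}'}
print(run_tool_call(call, TOOLS, REGISTRY))
```

The returned dict can be appended to `messages` before the next completion request.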

Framework

A unified data formatting and streaming loader.

from fm_datasets import webshop_multi_turn_v2
from fm_utils.seed_random import init_device_seed
from fm_utils.interleave_datasets import interleave_data


sft_webshop_multi_turn = webshop_multi_turn_v2.SFTWebShopMultiTurnV2(tokenizer, script_args)

seed = init_device_seed(seed=42)

train_dataset, eval_dataset = \
    interleave_data(
        data_objects=[sft_webshop_multi_turn],
        sample_probs=[1.0],
        return_type="prompt_answer",
        seq_length=4096,
        seed=seed)
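
To make the roles of `sample_probs` and the per-device seed concrete, here is a simplified sketch of probabilistic interleaving in plain Python. It is not the repo's `interleave_data` or `init_device_seed`; the functions `device_seed` and `interleave`, the `base_seed + rank` scheme, and the toy data sources are all assumptions for illustration.

```python
import random


def device_seed(base_seed, rank):
    """Derive a distinct but reproducible seed per device (assumed scheme: base + rank)."""
    return base_seed + rank


def interleave(sources, sample_probs, seed, num_samples):
    """Draw examples from several sources, choosing the source by probability."""
    if len(sources) != len(sample_probs):
        raise ValueError("need exactly one probability per source")
    rng = random.Random(seed)  # local generator, so each device's stream is independent
    iterators = [iter(source) for source in sources]
    indices = list(range(len(sources)))
    mixed = []
    while len(mixed) < num_samples:
        chosen = rng.choices(indices, weights=sample_probs, k=1)[0]
        try:
            mixed.append(next(iterators[chosen]))
        except StopIteration:
            break  # stop as soon as a chosen source runs out
    return mixed


# Toy stand-ins for two trajectory datasets
webshop = [f"webshop-{i}" for i in range(10)]
hotpotqa = [f"hotpotqa-{i}" for i in range(10)]

for rank in range(2):
    seed = device_seed(42, rank)
    print(rank, interleave([webshop, hotpotqa], [0.7, 0.3], seed, num_samples=6))
```

Using a separate `random.Random` instance per device keeps the mixture ratio the same everywhere while letting each device see a different order.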

Supervised fine-tuning and DPO fine-tuning.

We provide two SFT trainers, v1 and v2lite. v1 is built on the trl module and optimized for LoRA, while v2lite is written from scratch on top of Accelerator and optimized for full fine-tuning. They share almost the same interface.

from xLAM.fm_utils.derived_data_collator import DataCollatorForPromptAnswer
from xLAM.fm_trainers.sft_foundation_trainer import SFTFoundationTrainer
from xLAM.train.fm_trainers.sft_foundation_trainer_lite import SFTFoundationTrainerLite, prepare_accelerator

script_args = parser.parse_args_into_dataclasses()[0]

collator = DataCollatorForPromptAnswer(
    instruction_template=instruction_template_ids,
    response_template=response_template_ids,
    tokenizer=tokenizer,
    mlm=False)

# v2 trainer

accelerator = prepare_accelerator(script_args)
trainer = SFTFoundationTrainerLite(
        args=script_args,
        accelerator=accelerator,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
        collator=collator,
    )

trainer.train()
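
A prompt-answer collator typically masks the prompt tokens so that the loss is computed only on the answer. The sketch below shows that idea for a single-turn example in plain Python. It is an assumption about what `DataCollatorForPromptAnswer` does, not its actual implementation; the token ids and the helper names `find_subsequence` and `mask_prompt_tokens` are made up.

```python
IGNORE_INDEX = -100  # default ignore_index of PyTorch's cross-entropy loss


def find_subsequence(sequence, pattern):
    """Return the start index of the first occurrence of pattern, or -1 if absent."""
    for start in range(len(sequence) - len(pattern) + 1):
        if sequence[start:start + len(pattern)] == pattern:
            return start
    return -1


def mask_prompt_tokens(input_ids, response_template_ids):
    """Copy input_ids into labels, ignoring everything up to and including the response template."""
    labels = list(input_ids)
    start = find_subsequence(input_ids, response_template_ids)
    if start < 0:
        # No answer marker found: exclude the whole example from the loss
        return [IGNORE_INDEX] * len(labels)
    answer_start = start + len(response_template_ids)
    for i in range(answer_start):
        labels[i] = IGNORE_INDEX
    return labels


# Made-up token ids: [prompt tokens] + [response template] + [answer tokens]
input_ids = [1, 10, 11, 12, 90, 91, 20, 21, 2]
response_template_ids = [90, 91]
print(mask_prompt_tokens(input_ids, response_template_ids))
```

A multi-turn collator would repeat this per turn, using the instruction template to find where each answer ends.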

Installation

You can use our preconfigured Docker environment gcr.io/salesforce-research-internal/xlam-2024-02-14; an example YAML file is provided in envs_config. Then install the package with pip install -e . --no-dependencies

Alternatively, you can install directly with pip install -e . in your own environment, although dependency errors are possible in that case.

Train

Refer to the complete example scripts for more details.

Alternatively, run the bash scripts below for a quick start with our example.

for v1:

nohup accelerate launch --config_file xLAM/train/scripts/multi_gpu.yaml xLAM/train/scripts/sft_train_model_v1.py --model_name mistralai/Mixtral-8x7B-Instruct-v0.1 --seq_length 4096 --run_name sft_mixtral8X7B_v2_02072024 --output_dir {path} > sft_mixtral8X7B_v2_02072024.nohup 2>&1 &

for v2:

source xLAM/train/scripts/model_run_v2lite_full.sh

🏆 Benchmarks

Berkeley Function-Calling Leaderboard (BFCL)



Webshop

| LLM Name | ZS | ZST | ReAct | PlanAct | PlanReAct | BOLAA |
|---|---|---|---|---|---|---|
| Llama-2-70B-chat | 0.0089 | 0.0102 | 0.4273 | 0.2809 | 0.3966 | 0.4986 |
| Vicuna-33B | 0.1527 | 0.2122 | 0.1971 | 0.3766 | 0.4032 | 0.5618 |
| Mixtral-8x7B-Instruct-v0.1 | 0.4634 | 0.4592 | 0.5638 | 0.4738 | 0.3339 | 0.5342 |
| GPT-3.5-Turbo | 0.4851 | 0.5058 | 0.5047 | 0.4930 | 0.5436 | 0.6354 |
| GPT-3.5-Turbo-Instruct | 0.3785 | 0.4195 | 0.4377 | 0.3604 | 0.4851 | 0.5811 |
| GPT-4-0613 | 0.5002 | 0.4783 | 0.4616 | 0.7950 | 0.4635 | 0.6129 |
| xLAM-v0.1-r | 0.5201 | 0.5268 | 0.6486 | 0.6573 | 0.6611 | 0.6556 |

HotpotQA

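| LLM Name | ZS | ZST | ReAct | PlanAct | PlanReAct |
|---|---|---|---|---|---|
| Mixtral-8x7B-Instruct-v0.1 | 0.3912 | 0.3971 | 0.3714 | 0.3195 | 0.3039 |
| GPT-3.5-Turbo | 0.4196 | 0.3937 | 0.3868 | 0.4182 | 0.3960 |
| GPT-4-0613 | 0.5801 | 0.5709 | 0.6129 | 0.5778 | 0.5716 |
| xLAM-v0.1-r | 0.5492 | 0.4776 | 0.5020 | 0.5583 | 0.5030 |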

Please note: All prompts provided by AgentLite are considered "unseen prompts" for xLAM-v0.1-r, meaning the model has not been trained with data related to these prompts.

Webshop

| LLM Name | Act | ReAct | BOLAA |
|---|---|---|---|
| GPT-3.5-Turbo-16k | 0.6158 | 0.6005 | 0.6652 |
| GPT-4-0613 | 0.6989 | 0.6732 | 0.7154 |
| xLAM-v0.1-r | 0.6563 | 0.6640 | 0.6854 |

HotpotQA

| LLM Name | Easy F1 Score | Easy Accuracy | Medium F1 Score | Medium Accuracy | Hard F1 Score | Hard Accuracy |
|---|---|---|---|---|---|---|
| GPT-3.5-Turbo-16k-0613 | 0.410 | 0.350 | 0.330 | 0.25 | 0.283 | 0.20 |
| GPT-4-0613 | 0.611 | 0.47 | 0.610 | 0.480 | 0.527 | 0.38 |
| xLAM-v0.1-r | 0.532 | 0.45 | 0.547 | 0.46 | 0.455 | 0.36 |

| LLM Name | Unseen Insts & Same Set | Unseen Tools & Seen Cat | Unseen Tools & Unseen Cat |
|---|---|---|---|
| TooLlama V2 | 0.4385 | 0.4300 | 0.4350 |
| GPT-3.5-Turbo-0125 | 0.5000 | 0.5150 | 0.4900 |
| GPT-4-0125-preview | 0.5462 | 0.5450 | 0.5050 |
| xLAM-v0.1-r | 0.5077 | 0.5650 | 0.5200 |

| LLM Name | 1-step | 2-step | 3-step | 4-step | 5-step |
|---|---|---|---|---|---|
| GPT-4-0613 | - | - | - | - | 69.45 |
| Claude-Instant-1 | 12.12 | 32.25 | 39.25 | 44.37 | 45.90 |
| xLAM-v0.1-r | 4.10 | 28.50 | 36.01 | 42.66 | 43.96 |
| Claude-2 | 26.45 | 35.49 | 36.01 | 39.76 | 39.93 |
| Lemur-70b-Chat-v1 | 3.75 | 26.96 | 35.67 | 37.54 | 37.03 |
| GPT-3.5-Turbo-0613 | 2.73 | 16.89 | 24.06 | 31.74 | 36.18 |
| AgentLM-70b | 6.48 | 17.75 | 24.91 | 28.16 | 28.67 |
| CodeLlama-34b | 0.17 | 16.21 | 23.04 | 25.94 | 28.16 |
| Llama-2-70b-chat | 4.27 | 14.33 | 15.70 | 16.55 | 17.92 |

| LLM Name | Success Rate | Progress Rate |
|---|---|---|
| xLAM-v0.1-r | 0.533 | 0.766 |
| DeepSeek-67B | 0.400 | 0.714 |
| GPT-3.5-Turbo-0613 | 0.367 | 0.627 |
| GPT-3.5-Turbo-16k | 0.317 | 0.591 |
| Lemur-70B | 0.283 | 0.720 |
| CodeLlama-13B | 0.250 | 0.525 |
| CodeLlama-34B | 0.133 | 0.600 |
| Mistral-7B | 0.033 | 0.510 |
| Vicuna-13B-16K | 0.033 | 0.343 |
| Llama-2-70B | 0.000 | 0.483 |

Licenses

This code is licensed under Apache 2.0. Models based on the DeepSeek model additionally require you to follow the use-based restrictions in the linked DeepSeek license. This is a research-only project.

Acknowledgement

We want to acknowledge the works that have contributed to our paper and to the agent research community! If you find our work useful, please consider citing:

@article{zhang2024agentohana,
  title={AgentOhana: Design Unified Data and Training Pipeline for Effective Agent Learning},
  author={Zhang, Jianguo and Lan, Tian and Murthy, Rithesh and Liu, Zhiwei and Yao, Weiran and Tan, Juntao and Hoang, Thai and Yang, Liangwei and Feng, Yihao and Liu, Zuxin and others},
  journal={arXiv preprint arXiv:2402.15506},
  year={2024}
}
@article{liu2024apigen,
  title={APIGen: Automated PIpeline for Generating Verifiable and Diverse Function-Calling Datasets},
  author={Liu, Zuxin and Hoang, Thai and Zhang, Jianguo and Zhu, Ming and Lan, Tian and Kokane, Shirley and Tan, Juntao and Yao, Weiran and Liu, Zhiwei and Feng, Yihao and others},
  journal={arXiv preprint arXiv:2406.18518},
  year={2024}
}
@article{zhang2024xlamfamilylargeaction,
  title={xLAM: A Family of Large Action Models to Empower AI Agent Systems},
  author={Zhang, Jianguo and Lan, Tian and Zhu, Ming and Liu, Zuxin and Hoang, Thai and Kokane, Shirley and Yao, Weiran and Tan, Juntao and Prabhakar, Akshara and Chen, Haolin and Liu, Zhiwei and Feng, Yihao and Awalgaonkar, Tulika and Murthy, Rithesh and Hu, Eric and Chen, Zeyuan and Xu, Ran and Niebles, Juan Carlos and Heinecke, Shelby and Wang, Huan and Savarese, Silvio and Xiong, Caiming},
  journal={arXiv preprint arXiv:2409.03215},
  year={2024}
}