Update blog

ag2ai · Dec 18, 2024 · 5690a7e
1 parent e65e4c4
commit 5690a7e
Showing 3 changed files with 276 additions and 277 deletions.
diff --git a/autogen/agentchat/contrib/reasoning_agent.py b/autogen/agentchat/contrib/reasoning_agent.py
@@ -50,21 +50,23 @@ def __init__(self, content: str, parent: Optional["ThinkNode"] = None) -> None:
         for traversing/visualizing the reasoning path.
 
         Args:
-            content (str): The text content/description for this reasoning step
-            parent (Optional[ThinkNode]): The parent node in the tree, if any
+            content (str): The text content/description for this reasoning step.
+            parent (Optional[ThinkNode]): The parent node in the tree, if any.
 
         Attributes:
-            content (str): The text content/description for this reasoning step
-            value (Optional[float]): A numeric score/value assigned to this node
-            parent (Optional[ThinkNode]): Reference to parent node
-            depth (int): The depth of this node in the tree (root = 0)
-            children (List[ThinkNode]): List of child nodes
-            visits (int): Number of times this node has been visited during search
+            content (str): The text content/description for this reasoning step.
+            value (Optional[float]): A numeric score/value assigned to this node.
+            parent (Optional[ThinkNode]): Reference to the parent node.
+            reflection (str): A string containing reflections on the reasoning process.
+            rating_details (str): A string providing details about the rating of this node.
+            depth (int): The depth of this node in the tree (root = 0).
+            children (List[ThinkNode]): List of child nodes.
+            visits (int): Number of times this node has been visited during search.
 
         The node automatically maintains the tree structure by:
-        - Setting its depth based on parent's depth + 1
-        - Adding itself to parent's children list if parent exists
-        - Providing trajectory utilities to get the full path from root to this node
+        - Setting its depth based on the parent's depth + 1.
+        - Adding itself to the parent's children list if the parent exists.
+        - Providing trajectory utilities to get the full path from root to this node.
         """
         self.content = content
         self.value = 0
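
The docstring above describes how a node keeps the tree linked. A minimal standalone sketch of that bookkeeping follows. `SketchNode` is a hypothetical stand-in written for illustration, not the library's `ThinkNode`, and the `backpropagate` update rule (add the reward and count one visit on every ancestor) is an assumption rather than something shown in this hunk.

```python
from typing import List, Optional


class SketchNode:
    """Hypothetical stand-in for a reasoning-tree node (not the library's ThinkNode)."""

    def __init__(self, content: str, parent: Optional["SketchNode"] = None) -> None:
        self.content = content
        self.value = 0.0
        self.visits = 0
        self.parent = parent
        # Depth is the parent's depth + 1; the root sits at depth 0.
        self.depth = parent.depth + 1 if parent else 0
        self.children: List["SketchNode"] = []
        # Register with the parent so the tree stays linked in both directions.
        if parent is not None:
            parent.children.append(self)

    def trajectory(self) -> List[str]:
        """Return the contents along the path from the root to this node."""
        path = []
        node = self
        while node is not None:
            path.append(node.content)
            node = node.parent
        return path[::-1]

    def backpropagate(self, reward: float) -> None:
        """Assumed update rule: add the reward and count a visit on every ancestor."""
        node = self
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent


root = SketchNode("question")
step = SketchNode("step 1", parent=root)
leaf = SketchNode("step 2", parent=step)
leaf.backpropagate(0.5)
```

After the last call, every node on the path from `leaf` up to `root` has one visit and an accumulated value of 0.5, which is the kind of per-node statistic the selection step reads.
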
@@ -575,6 +577,10 @@ def _mtcs_reply(self, prompt, ground_truth=""):
 
             # Selection
             while not self._is_terminal(node) and len(node.children) > 0:
+                # TODO: In the original UCT formula, child.value represents the win ratio.
+                # Here, we use the average rating rather than the win ratio.
+                # The rating might be biased from the LLM, which could affect the bounds of this vanilla UCT equation.
+                # More intensive analysis is needed in the future.
                 choices_weights = [
                     # exploitation term +
                     (child.value / (child.visits + EPSILON)) +
@@ -590,7 +596,7 @@ def _mtcs_reply(self, prompt, ground_truth=""):
                 if len(node.children) == 0:
                     self._expand(node)
                     if self._method == "lats":
-                        # In LATS: rate the quality of the current child node using the ground truth and
+                        # In LATS: rate the quality of the current child node and
                         # backpropagate the reward to update the node's value and visits.
                         reward = self.rate_node(node, ground_truth)
                         node.backpropagate(reward)
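
The TODO added to the selection loop above concerns the UCT score. As a rough illustration, the sketch below computes a vanilla UCT score that uses an average rating as the exploitation term. The function name `uct_score`, the value of `EPSILON`, and the exact form of the exploration term are assumptions made for this example; the diff itself only shows the exploitation term `child.value / (child.visits + EPSILON)`.

```python
import math

EPSILON = 1e-6  # assumed small constant to avoid division by zero


def uct_score(child_value: float, child_visits: int, parent_visits: int,
              exploration_constant: float = 1.41) -> float:
    """Vanilla UCT: average rating plus an exploration bonus (illustrative only)."""
    exploitation = child_value / (child_visits + EPSILON)
    exploration = exploration_constant * math.sqrt(
        math.log(parent_visits + 1) / (child_visits + EPSILON)
    )
    return exploitation + exploration


# (accumulated value, visits) for three hypothetical children of one parent
children = [(2.4, 3), (0.9, 1), (0.0, 0)]
parent_visits = sum(visits for _, visits in children)
scores = [uct_score(v, n, parent_visits) for v, n in children]
best = max(range(len(children)), key=scores.__getitem__)
```

In this sketch the unvisited child receives a very large exploration bonus, so it is selected first. Because the exploitation term is an average LLM rating rather than a win ratio, its scale relative to the exploration bonus depends on how ratings are normalized, which is the concern the TODO raises.
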

diff --git a/website/blog/2024-12-18-Reasoning-Update/index.mdx b/website/blog/2024-12-18-Reasoning-Update/index.mdx
@@ -0,0 +1,258 @@
+---
+title: ReasoningAgent Update - MCTS, LATS, and Beam Search for LLM Reasoning
+authors:
+  - BabyCNM
+  - Hk669
+  - sonichi
+  - qingyunwu
+tags: [LLM, GPT, research, tutorial]
+---
+
+![Tree of Thoughts](img/reasoningagent_1.png)
+
+**Update Overview:**
+* We introduce Monte Carlo Tree Search (MCTS) as an alternative to Beam Search in ReasoningAgent
+* We draw inspiration from Language Agent Tree Search (LATS), implemented here as a modified MCTS that calculates a reward at every step (similar to beam search)
+* You can control the reasoning agent setup with the `reason_config` dictionary
+* We also include a parameter `forest_size` to enable "forest of thoughts"
+* You can include a ground-truth answer in the prompt so that the reasoning agent generates thinking trajectories for LLM post-training
+
+## Introduction
+
+In our [previous post](/blog/2024-12-02-ReasoningAgent2), we introduced the ReasoningAgent, which utilized Beam Search for systematic reasoning. Today, we include MCTS (Monte Carlo Tree Search) and Language Agent Tree Search (LATS) as alternative search strategies, which present advantages in different scenarios.
+
+Our previous ReasoningAgent draws inspiration from OpenAI's 2023 paper, [Let's Verify Step by Step](https://arxiv.org/pdf/2305.20050), as well as the 2024 [o1](https://openai.com/o1/) model. Contemporary research in this area is rich, with notable works such as [DeepSeek-R1](https://api-docs.deepseek.com/news/news1120), [Marco-o1](https://github.com/AIDC-AI/Marco-o1), and [OpenR](https://github.com/openreasoner/openr).
+
+
+## Quick Start Guide
+
+Let's start with a simple example using MCTS:
+
+```python
+import os
+from autogen import UserProxyAgent, ReasoningAgent
+
+# Configure the model
+config_list = [{"model": "gpt-4", "api_key": os.environ.get("OPENAI_API_KEY")}]
+
+# Create a reasoning agent with MCTS
+mcts_agent = ReasoningAgent(
+    name="mcts_agent",
+    llm_config={"config_list": config_list},
+    reason_config={
+        "method": "mcts",  # Use MCTS instead of beam search
+        "nsim": 5,  # Number of MCTS simulations
+        "exploration_constant": 1.41  # UCT exploration parameter
+    }
+)
+
+# Create a user proxy agent
+user_proxy = UserProxyAgent(
+    name="user_proxy",
+    human_input_mode="NEVER",
+    code_execution_config={"use_docker": False}
+)
+
+prompt = "What is the expected maximum value if you roll a 6-sided die three times?"
+response = user_proxy.initiate_chat(mcts_agent, message=prompt)
+```
+
+## Key Features in the New Version
+
+### 1. Multiple Search Methods
+ReasoningAgent now supports three search strategies:
+
+As in the previous post, beam search is the default method.
+```python
+# Beam Search (default)
+beam_agent = ReasoningAgent(
+    name="beam_agent",
+    llm_config={"config_list": config_list},
+    reason_config={
+        "method": "beam_search",
+        "beam_size": 3,
+        "answer_approach": "pool"  # or "best"
+    }
+)
+```
+
+MCTS, a widely used tree search method, is also available.
+```python
+# Monte Carlo Tree Search
+mcts_agent = ReasoningAgent(
+    name="mcts_agent",
+    llm_config={"config_list": config_list},
+    reason_config={
+        "method": "mcts",
+        "nsim": 5 # number of simulations
+    }
+)
+```
+
+Note that our reasoning agent operates on the reasoning "process" itself and has no direct access to the environment, whereas the original LATS approach relies on feedback from the environment. To bridge this gap, we use our existing grader agent to generate pseudo-rewards as feedback. The major difference between our LATS and MCTS implementations is that LATS calculates a reward (using the grader) and backpropagates it along the thinking trajectory at every step. You can define an agent that uses the LATS approach as follows.
+```python
+# Language Agent Tree Search
+lats_agent = ReasoningAgent(
+    name="lats_agent",
+    llm_config={"config_list": config_list},
+    reason_config={
+        "method": "lats",
+        "nsim": 5
+    }
+)
+```
+
+
+
+### 2. Incorporating Ground Truth for Enhanced Training Data Synthesis
+You can now include ground truth in your prompts to achieve more precise evaluations (grading). This lets you use the reasoning agent to generate diverse thinking trajectories, which can then be used to fine-tune the base LLM.
+
+```python
+prompt = """Solve this calculus problem: ∫x²dx
+
+GROUND_TRUTH:
+The integral of x² is (x³/3) + C
+Steps:
+1. Use power rule: increase power by 1
+2. Divide by new power
+3. Add constant of integration
+"""
+
+response = user_proxy.initiate_chat(mcts_agent, message=prompt)
+
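+# After running queries...
+# Note: extract_sft_dataset and extract_rlhf_preference_dataset are helper
+# functions that must be defined or imported separately; they are not shown in this post.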
+sft_data = extract_sft_dataset(mcts_agent._root)
+rlhf_data = extract_rlhf_preference_dataset(mcts_agent._root)
+```
+
+### 3. Forest of Trees
+Enable ensemble reasoning with multiple independent trees:
+
+```python
+forest_agent = ReasoningAgent(
+    name="forest_agent",
+    llm_config={"config_list": config_list},
+    reason_config={
+        "method": "mcts",
+        "forest_size": 5  # Run 5 independent trees
+    }
+)
+```
+
+
+## When to Use Each Method
+
+
+### Use Beam Search when:
+- You want a deterministic search process
+- You can reliably evaluate intermediate steps
+- You need fast, memory-efficient search
+- The solution space is relatively small and structured
+- Early decisions strongly influence final outcomes
+
+### Use MCTS when:
+- You need stochastic exploration of solution paths
+- Final outcome evaluation is more reliable than intermediate steps
+- The solution space is large or complex
+- You want to balance exploration vs exploitation
+- You have computational budget for multiple simulations
+
+### Use LATS when:
+- You want MCTS-style exploration with step-by-step feedback
+- You can afford frequent LLM evaluations
+- You need to identify and prune poor paths early
+- The problem benefits from granular trajectory scoring
+- You want to combine benefits of beam search and MCTS
+
+## Advanced Features
+
+### 1. Visualization
+Visualize the reasoning tree using graphviz:
+
+```python
+from autogen.agentchat.contrib.reasoning_agent import visualize_tree
+
+# After running queries...
+visualize_tree(mcts_agent._root)
+```
+
+### 2. Custom Evaluation
+Modify the rating scale and evaluation criteria:
+
+```python
+custom_agent = ReasoningAgent(
+    name="custom_agent",
+    llm_config={"config_list": config_list},
+    reason_config={
+        "rating_scale": 100,  # Use 1-100 scale instead of default 1-10 for grading
+    }
+)
+```
+
+### 3. Save and Load Trees
+Save reasoning trees for later analysis:
+
+```python
+import json
+
+# Save tree
+data = mcts_agent._root.to_dict()
+with open("reasoning_tree.json", "w") as f:
+    json.dump(data, f)
+
+# Load tree
+from autogen.agentchat.contrib.reasoning_agent import ThinkNode
+with open("reasoning_tree.json") as f:  # Close the file handle after reading
+    loaded_tree = ThinkNode.from_dict(json.load(f))
+```
+
+## Performance Comparison
+### Variables
+- d: Maximum depth of the reasoning tree
+- b: Beam size (number of parallel paths maintained)
+- w: Branching factor (number of child nodes per parent)
+- n: Number of MCTS simulations
+
+### Time Complexity
+Each algorithm has a different computational cost, expressed here in terms of the variables above:
+- Beam Search: O(d × b × (w + 1))
+  - At each of the d depth levels, evaluates w options for each of the b beams
+  - The extra 1 accounts for generating the options
+- MCTS: O(n × d)
+  - Performs n simulations in total
+  - Each simulation traverses down to depth d
+- LATS: O(n × d), with roughly twice the constant factor of MCTS (about 2 × n × d)
+  - Similar to MCTS, but grading at each node doubles the cost
+
+### Memory Usage
+Storage requirements vary by approach:
+- Beam Search: O(b × d)
+  - Fixed memory proportional to beam size and depth
+  - Only stores active beams
+- MCTS and LATS: O(w^d)
+  - Worst case stores complete tree
+  - In practice much smaller due to selective expansion
+
+## Conclusion
+
+The new ReasoningAgent offers a flexible toolkit for systematic reasoning with LLMs. Choose between MCTS, Beam Search, and LATS based on your specific needs regarding:
+- Evaluation cost and availability
+- Time and resource constraints
+- Desired exploration vs exploitation balance
+- Training data generation requirements
+
+## Next Steps
+- Async Client Call: parallelize LLM calls to speed up search
+- Swarm Agent implementation
+- Efficient Mode: merging thinker and grader
+- Batch Norm: normalizing scores for MCTS
+
+
+## For Further Reading
+
+* [Original ReasoningAgent with Beam Search](/blog/2024-12-02-ReasoningAgent2)
+* [Documentation about ReasoningAgent](/docs/reference/agentchat/contrib/reasoning_agent)
+* [MCTS on Wikipedia](https://en.wikipedia.org/wiki/Monte_Carlo_tree_search)
+* [Example Notebook](https://ag2ai.github.io/ag2/docs/notebooks/agentchat_reasoning_agent/)
+
+
+*Join our [Discord](https://discord.com/invite/pAbnFJrkgZ) server to discuss your experiences with these approaches and suggest improvements.*
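
The cost expressions in the post's Performance Comparison section can be compared numerically. The sketch below plugs sample values into those expressions, treating each as an approximate count of LLM calls; that reading is an interpretation rather than something the post states outright, and the function names and sample values are hypothetical.

```python
def beam_search_cost(d: int, b: int, w: int) -> int:
    """d × b × (w + 1): per level, each beam generates once and evaluates w options."""
    return d * b * (w + 1)


def mcts_cost(n: int, d: int) -> int:
    """n × d: each of n simulations walks down to depth d."""
    return n * d


def lats_cost(n: int, d: int) -> int:
    """2 × n × d: as MCTS, plus a grading step at each node."""
    return 2 * n * d


d, b, w, n = 4, 3, 3, 5  # sample values chosen for illustration
costs = {
    "beam_search": beam_search_cost(d, b, w),
    "mcts": mcts_cost(n, d),
    "lats": lats_cost(n, d),
}
for method, cost in costs.items():
    print(f"{method}: {cost}")
```

With these sample values, beam search comes out at 48, MCTS at 20, and LATS at 40, so the ranking depends heavily on how `nsim` compares with the product of beam size and branching factor.
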