Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement RealtimeAgent for Real-Time Conversational AI Support in ag2 Framework #196

Merged
merged 62 commits into from
Dec 20, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
62 commits
Select commit Hold shift + click to select a range
a3f0676
WIP
sternakt Dec 12, 2024
793bf68
WIP: Draft initial version of realtime agent
sternakt Dec 12, 2024
24ea3c2
remove openai_ws from observers (half way through)
davorinrusevljan Dec 12, 2024
b79e8a1
Merge pull request #198 from davorinrusevljan/realtime-agent-rush
sternakt Dec 13, 2024
942e3d0
WIP add handover registration to realtime agent
sternakt Dec 13, 2024
efa4c8b
Merge branch 'realtime-agent' of github.com:ag2ai/ag2 into realtime-a…
sternakt Dec 13, 2024
97f12e0
Implement function calling and register handover to realtime agent
sternakt Dec 13, 2024
96cc3bd
Run pre-commit
sternakt Dec 13, 2024
6145e62
Separated client out from RealtimeAgent
davorinrusevljan Dec 13, 2024
299c862
Merge pull request #206 from davorinrusevljan/realtime-agent-rush
sternakt Dec 13, 2024
3469f13
WIP: Integrate swarm into RealtimeAgent
sternakt Dec 17, 2024
1a26555
wip: refactoring a bit
davorrunje Dec 17, 2024
0b3aa50
WIP
sternakt Dec 17, 2024
995f0fd
wip: refactoring
davorrunje Dec 17, 2024
85fa907
Rework into anyio
sternakt Dec 17, 2024
6b058a6
Merge pull request #225 from ag2ai/realtime-agent-refactoring-anyio
sternakt Dec 17, 2024
99ed595
WIP: Cleanup example notebook
sternakt Dec 17, 2024
1681bb8
WIP: Cleanup example notebook
sternakt Dec 17, 2024
75af6b8
WIP
sternakt Dec 18, 2024
5bd052d
Add question polling
sternakt Dec 18, 2024
e8fc5a2
Sync question asking
sternakt Dec 18, 2024
1201f0c
Replace prints with logging
sternakt Dec 18, 2024
aed9fd5
Merge pull request #231 from ag2ai/realtime-agent-start-swarm-at-startup
sternakt Dec 18, 2024
90e0516
Init RealtimeAgent blogpost
sternakt Dec 18, 2024
39d5a0e
Add realtime agent swarm image
sternakt Dec 18, 2024
d3e1864
WIP: blog
sternakt Dec 19, 2024
147f171
Prepare blogpost
sternakt Dec 19, 2024
ff2f47a
Update autogen/agentchat/realtime_agent/realtime_agent.py
sternakt Dec 19, 2024
e2153fb
Update website/blog/2024-12-18-RealtimeAgent/index.mdx
sternakt Dec 19, 2024
f04deb2
Update website/blog/2024-12-18-RealtimeAgent/index.mdx
sternakt Dec 19, 2024
6e4ed4f
Update website/blog/2024-12-18-RealtimeAgent/index.mdx
sternakt Dec 19, 2024
da27ec8
Update website/blog/2024-12-18-RealtimeAgent/index.mdx
sternakt Dec 19, 2024
3c5eb4d
Update website/blog/2024-12-18-RealtimeAgent/index.mdx
sternakt Dec 19, 2024
1ac6dbc
Update notebook/agentchat_realtime_swarm.ipynb
sternakt Dec 19, 2024
4b6229e
Update website/blog/2024-12-18-RealtimeAgent/index.mdx
sternakt Dec 19, 2024
71a1aa6
Update website/blog/2024-12-18-RealtimeAgent/index.mdx
sternakt Dec 19, 2024
425218c
Revise agentchat_realtime_swarm.ipynb
sternakt Dec 19, 2024
1cc619a
Refactor RealtimeAgent.ask_question
sternakt Dec 19, 2024
4cab66e
Remove RealtimeOpenAIClient mention from the blogpost
sternakt Dec 19, 2024
acef7b4
Add docs
sternakt Dec 19, 2024
c193859
Remove prints
sternakt Dec 19, 2024
bd6d640
realtime deps moved to main deps
davorrunje Dec 19, 2024
f361341
Blog polishing
sternakt Dec 19, 2024
8d5b41a
Merge branch 'realtime-agent' of https://github.com/ag2ai/ag2 into re…
sternakt Dec 19, 2024
21e0c68
Update notebook title
marklysze Dec 19, 2024
7290c06
Twilio spelling corrected
marklysze Dec 19, 2024
449ad8c
Updated setups for ag2 and autogen packages
marklysze Dec 19, 2024
ea01e71
Notebook text tweaks
marklysze Dec 19, 2024
c68633e
Update response message to match video
marklysze Dec 19, 2024
6c01a66
websocket realtime wip(1)
davorinrusevljan Dec 19, 2024
0b023a1
websocket realtime wip(2)
davorinrusevljan Dec 19, 2024
4ae306e
websocket realtime wip(3)
davorinrusevljan Dec 19, 2024
ba15132
websocket realtime wip(4)
davorinrusevljan Dec 19, 2024
972b53b
websocket realtime wip(5)
davorinrusevljan Dec 19, 2024
9d959db
websocket realtime wip(6)
davorinrusevljan Dec 19, 2024
0ec36f1
websocket realtime wip(7)
davorinrusevljan Dec 19, 2024
8344e79
Merge pull request #241 from davorinrusevljan/realtime-agent-rush
davorrunje Dec 19, 2024
6da2e00
Merge remote-tracking branch 'origin/main' into realtime-agent
davorrunje Dec 19, 2024
65c5574
pre-commit run on all files
davorrunje Dec 19, 2024
a2aefe9
Merge remote-tracking branch 'origin/main' into realtime-agent
davorrunje Dec 19, 2024
1342f75
websockets upgraded to 14.x
davorrunje Dec 19, 2024
d3fa24d
merge with main
davorrunje Dec 19, 2024
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 6 additions & 0 deletions autogen/agentchat/realtime_agent/__init__.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
from .function_observer import FunctionObserver
from .realtime_agent import RealtimeAgent
from .twilio_observer import TwilioAudioAdapter
from .websocket_observer import WebsocketAudioAdapter

__all__ = ["RealtimeAgent", "FunctionObserver", "TwilioAudioAdapter", "WebsocketAudioAdapter"]
143 changes: 143 additions & 0 deletions autogen/agentchat/realtime_agent/client.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,143 @@
# Copyright (c) 2023 - 2024, Owners of https://github.com/ag2ai
#
# SPDX-License-Identifier: Apache-2.0
#
# Portions derived from https://github.com/microsoft/autogen are under the MIT License.
# SPDX-License-Identifier: MIT

# import asyncio
import json
import logging
from typing import Any, Optional

import anyio
import websockets
from asyncer import TaskGroup, asyncify, create_task_group, syncify

from autogen.agentchat.contrib.swarm_agent import AfterWorkOption, initiate_swarm_chat

from .function_observer import FunctionObserver

logger = logging.getLogger(__name__)


class OpenAIRealtimeClient:
"""(Experimental) Client for OpenAI Realtime API."""

def __init__(self, agent, audio_adapter, function_observer: FunctionObserver):
davorrunje marked this conversation as resolved.
Show resolved Hide resolved
"""(Experimental) Client for OpenAI Realtime API.

args:
agent: Agent instance
the agent to be used for the conversation
audio_adapter: RealtimeObserver
adapter for streaming the audio from the client
function_observer: FunctionObserver
observer for handling function calls
"""
self._agent = agent
self._observers = []
self._openai_ws = None # todo factor out to OpenAIClient
self.register(audio_adapter)
self.register(function_observer)

# LLM config
llm_config = self._agent.llm_config

config = llm_config["config_list"][0]

self.model = config["model"]
self.temperature = llm_config["temperature"]
self.api_key = config["api_key"]

# create a task group to manage the tasks
self.tg: Optional[TaskGroup] = None

def register(self, observer):
"""Register an observer to the client."""
observer.register_client(self)
self._observers.append(observer)

async def notify_observers(self, message):
"""Notify all observers of a message from the OpenAI Realtime API."""
for observer in self._observers:
await observer.update(message)

async def function_result(self, call_id, result):
"""Send the result of a function call to the OpenAI Realtime API."""
result_item = {
"type": "conversation.item.create",
"item": {
"type": "function_call_output",
"call_id": call_id,
"output": result,
},
}
await self._openai_ws.send(json.dumps(result_item))
await self._openai_ws.send(json.dumps({"type": "response.create"}))

async def send_text(self, *, role: str, text: str):
"""Send a text message to the OpenAI Realtime API."""
await self._openai_ws.send(json.dumps({"type": "response.cancel"}))
text_item = {
"type": "conversation.item.create",
"item": {"type": "message", "role": role, "content": [{"type": "input_text", "text": text}]},
}
await self._openai_ws.send(json.dumps(text_item))
await self._openai_ws.send(json.dumps({"type": "response.create"}))

# todo override in specific clients
async def initialize_session(self):
"""Control initial session with OpenAI."""
session_update = {
# todo: move to config
"turn_detection": {"type": "server_vad"},
"voice": self._agent.voice,
"instructions": self._agent.system_message,
"modalities": ["audio", "text"],
"temperature": 0.8,
}
await self.session_update(session_update)

# todo override in specific clients
async def session_update(self, session_options):
"""Send a session update to the OpenAI Realtime API."""
update = {"type": "session.update", "session": session_options}
logger.info("Sending session update:", json.dumps(update))
await self._openai_ws.send(json.dumps(update))
logger.info("Sending session update finished")

async def _read_from_client(self):
"""Read messages from the OpenAI Realtime API."""
try:
async for openai_message in self._openai_ws:
response = json.loads(openai_message)
await self.notify_observers(response)
except Exception as e:
logger.warning(f"Error in _read_from_client: {e}")

async def run(self):
"""Run the client."""
async with websockets.connect(
f"wss://api.openai.com/v1/realtime?model={self.model}",
additional_headers={
"Authorization": f"Bearer {self.api_key}",
"OpenAI-Beta": "realtime=v1",
},
) as openai_ws:
self._openai_ws = openai_ws
await self.initialize_session()
# await asyncio.gather(self._read_from_client(), *[observer.run() for observer in self._observers])
async with create_task_group() as tg:
self.tg = tg
self.tg.soonify(self._read_from_client)()
for observer in self._observers:
self.tg.soonify(observer.run)()
if self._agent._start_swarm_chat:
self.tg.soonify(asyncify(initiate_swarm_chat))(
initial_agent=self._agent._initial_agent,
agents=self._agent._agents,
user_agent=self._agent,
davorrunje marked this conversation as resolved.
Show resolved Hide resolved
messages="Find out what the user wants.",
after_work=AfterWorkOption.REVERT_TO_USER,
)
72 changes: 72 additions & 0 deletions autogen/agentchat/realtime_agent/function_observer.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,72 @@
# Copyright (c) 2023 - 2024, Owners of https://github.com/ag2ai
#
# SPDX-License-Identifier: Apache-2.0
#
# Portions derived from https://github.com/microsoft/autogen are under the MIT License.
# SPDX-License-Identifier: MIT

import asyncio
import json
import logging

from asyncer import asyncify
from pydantic import BaseModel

from .realtime_observer import RealtimeObserver

logger = logging.getLogger(__name__)


class FunctionObserver(RealtimeObserver):
"""Observer for handling function calls from the OpenAI Realtime API."""

def __init__(self, agent):
"""Observer for handling function calls from the OpenAI Realtime API.

Args:
agent: Agent instance
the agent to be used for the conversation
"""
super().__init__()
self._agent = agent

async def update(self, response):
"""Handle function call events from the OpenAI Realtime API."""
if response.get("type") == "response.function_call_arguments.done":
logger.info(f"Received event: {response['type']}", response)
await self.call_function(
call_id=response["call_id"], name=response["name"], kwargs=json.loads(response["arguments"])
)

async def call_function(self, call_id, name, kwargs):
"""Call a function registered with the agent."""
if name in self._agent.realtime_functions:
_, func = self._agent.realtime_functions[name]
func = func if asyncio.iscoroutinefunction(func) else asyncify(func)
try:
result = await func(**kwargs)
except Exception:
result = "Function call failed"
logger.warning(f"Function call failed: {name}")

if isinstance(result, BaseModel):
result = result.model_dump_json()
elif not isinstance(result, str):
result = json.dumps(result)

await self._client.function_result(call_id, result)

async def run(self):
"""Run the observer.

Initialize the session with the OpenAI Realtime API.
"""
await self.initialize_session()

async def initialize_session(self):
"""Add registered tools to OpenAI with a session update."""
session_update = {
"tools": [schema for schema, _ in self._agent.realtime_functions.values()],
"tool_choice": "auto",
}
await self._client.session_update(session_update)
Loading
Loading