
X-PLUG/MM_StoryAgent


MM-StoryAgent

This repo is the official implementation of "MM-StoryAgent: Immersive Narrated Storybook Video Generation with a Multi-Agent Paradigm across Text, Image and Audio".

Introduction

MM-StoryAgent is a multi-agent framework that employs LLMs and diverse expert tools across several modalities to produce expressive storytelling videos. Its highlights are as follows:

  • MM-StoryAgent provides a reliable and customizable workflow. Users can define their own expert tools to improve the generation quality of each component.
  • MM-StoryAgent writes high-quality stories from an input story setting using a multi-agent, multi-stage pipeline.
  • The assets generated by the agents of each modality (image, speech, sound, music) are composed into an immersive storytelling video.
Framework

In addition, we provide a list of story topics and story evaluation criteria for further evaluation of story writing.

News

  • Aug 16, 2024: The initial version of MM-StoryAgent was released.

Demo Video

The demo video is available:

Installation

Install the required dependencies, then install this repo as an editable package:

pip install -r requirements.txt
pip install -e .

Quickstart

MM-StoryAgent is run from a configuration file:

python run.py -c configs/mm_story_agent.yaml

Each agent is configured in the following format:

story_writer: # agent name
    tool: qa_outline_story_writer # name registered in the definition
    cfg: # parameters for initializing the agent instance
        max_conv_turns: 3
        ...
    params: # parameters for calling the agent instance
        story_topic: "Time Management: A child learning how to manage their time effectively."
        ...

To customize a new agent, refer to music_agent.py. The agent class must implement the __init__ and call methods, as in the following example:

from typing import Dict
from mm_story_agent.base import register_tool

@register_tool("my_speech_agent")
class MySpeechAgent:
    
    def __init__(self, cfg: Dict):
        # For example, the agent needs `attr1` and `attr2` for initialization.
        # `cfg` is a dict, so its values are accessed by key.
        self.attr1 = cfg["attr1"]
        self.attr2 = cfg["attr2"]
        ...
    
    def call(self, params: Dict):
        # For example, calling the agent needs `voice` and `speed` parameters
        voice = params["voice"]
        speed = params["speed"]
        ...
    

Then the agent can be called by simply modifying the configuration like:

speech_generation:
    tool: my_speech_agent
    cfg:
        attr1: val1
        attr2: val2
    params:
        voice: en_female
        speed: 1.0
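To illustrate how a configuration entry such as speech_generation might map onto a registered class, here is a self-contained sketch. It does not import the package: TOOL_REGISTRY, the stand-in register_tool decorator and run_agent are hypothetical names, and the flow (look up the class by its tool name, construct it with cfg, then invoke call with params) is inferred from the configuration comments above rather than taken from the repository code.

```python
from typing import Any, Callable, Dict

# Hypothetical stand-in for mm_story_agent.base.register_tool:
# a module-level dict mapping a tool name to the class implementing it.
TOOL_REGISTRY: Dict[str, type] = {}


def register_tool(name: str) -> Callable[[type], type]:
    """Class decorator that records `cls` under `name` (illustrative only)."""
    def decorator(cls: type) -> type:
        if name in TOOL_REGISTRY:
            raise ValueError(f"tool {name!r} is already registered")
        TOOL_REGISTRY[name] = cls
        return cls
    return decorator


@register_tool("my_speech_agent")
class MySpeechAgent:
    def __init__(self, cfg: Dict[str, Any]):
        self.attr1 = cfg["attr1"]
        self.attr2 = cfg["attr2"]

    def call(self, params: Dict[str, Any]) -> str:
        # Placeholder: a real agent would synthesize speech here.
        return f"{self.attr1}/{self.attr2}: voice={params['voice']}, speed={params['speed']}"


def run_agent(entry: Dict[str, Any]) -> Any:
    """Build the registered tool with `cfg`, then call it with `params` (hypothetical runner)."""
    agent_cls = TOOL_REGISTRY[entry["tool"]]
    agent = agent_cls(entry.get("cfg", {}))
    return agent.call(entry.get("params", {}))


# The `speech_generation` entry from the YAML above, written as a Python dict.
config = {
    "speech_generation": {
        "tool": "my_speech_agent",
        "cfg": {"attr1": "val1", "attr2": "val2"},
        "params": {"voice": "en_female", "speed": 1.0},
    }
}

print(run_agent(config["speech_generation"]))  # → val1/val2: voice=en_female, speed=1.0
```

With a name-keyed registry of this kind, changing `tool: my_speech_agent` to another registered name swaps the implementation without editing the runner, which fits the customizable workflow described in the Introduction.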

Evaluation Data

The evaluation topics are provided in story_topics.json. The corresponding evaluation rubrics and prompts are also provided.

Story Content Evaluation

We use GPT-4 to automatically evaluate story quality on three rubrics: attractiveness, warmth, and education. Our story writing agent is compared with directly prompting an LLM to write stories. The story agent obtains a higher average score on every topic (4.65 vs. 4.47 overall), which shows the advantage of our multi-agent, multi-stage story writing pipeline.

| Topic | Method | Attractiveness | Warmth | Education | Average |
|---|---|---|---|---|---|
| Topic 1: Self-growing | Direct | 3.68 | 4.42 | 4.84 | 4.31 |
| | Story Agent | 4.10 | 4.50 | 4.80 | 4.47 |
| Topic 2: Family & Friendship | Direct | 3.94 | 5.00 | 4.72 | 4.55 |
| | Story Agent | 4.36 | 4.80 | 4.92 | 4.69 |
| Topic 3: Environments | Direct | 4.00 | 4.62 | 4.92 | 4.51 |
| | Story Agent | 4.44 | 4.68 | 4.86 | 4.66 |
| Topic 4: Knowledge Learning | Direct | 4.46 | 4.14 | 4.86 | 4.49 |
| | Story Agent | 4.84 | 4.52 | 4.90 | 4.75 |
| All | Direct | 4.02 | 4.55 | 4.84 | 4.47 |
| | Story Agent | 4.44 | 4.63 | 4.87 | 4.65 |
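The Average column is the mean of the three rubric scores in each row, and the All row is the per-rubric mean over the four topics. The sketch below recomputes these aggregates from the table's values; the helper names are illustrative only. One caveat, based purely on the arithmetic: averaging the unrounded per-rubric means gives about 4.64 for the Story Agent overall, so the table's 4.65 likely comes from averaging the rounded rubric means.

```python
from statistics import mean

# Rubric scores copied from the table above:
# (attractiveness, warmth, education) for each topic and method.
SCORES = {
    "Self-growing": {"Direct": (3.68, 4.42, 4.84), "Story Agent": (4.10, 4.50, 4.80)},
    "Family & Friendship": {"Direct": (3.94, 5.00, 4.72), "Story Agent": (4.36, 4.80, 4.92)},
    "Environments": {"Direct": (4.00, 4.62, 4.92), "Story Agent": (4.44, 4.68, 4.86)},
    "Knowledge Learning": {"Direct": (4.46, 4.14, 4.86), "Story Agent": (4.84, 4.52, 4.90)},
}


def topic_average(topic: str, method: str) -> float:
    """Mean of the three rubric scores for one topic/method row (the Average column)."""
    return mean(SCORES[topic][method])


def rubric_means(method: str) -> list:
    """Per-rubric mean over the four topics (the rubric cells of the All row)."""
    return [mean(SCORES[t][method][i] for t in SCORES) for i in range(3)]


for topic in SCORES:
    direct = topic_average(topic, "Direct")
    agent = topic_average(topic, "Story Agent")
    print(f"{topic:<20} Direct {direct:.2f}  Story Agent {agent:.2f}  gain {agent - direct:+.2f}")

for method in ("Direct", "Story Agent"):
    per_rubric = rubric_means(method)
    print(method, [round(x, 3) for x in per_rubric], f"overall {mean(per_rubric):.3f}")
```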

Citation
