normand1/HyperFeeder

HyperFeeder - Autonomous Podcast Generator


HyperFeeder generates personally tailored podcasts or tweet threads, just for you (or your audience)!

Example Podcasts:

HyperFeeder is a fully configurable and extensible multi-agent workflow for researching and producing podcasts and tweet threads. Its plugin system builds an episode step by step, from the intro and music, through individual podcast segments and news stories, to how the podcast is arranged and presented. At each step you can choose from a list of existing plugins that generate different kinds of content, or write your own plugin for that step of the podcast generation process.

With the existing tools and plugins you can currently build a podcast from Hacker News, Reddit, any podcast with transcripts in its RSS feed, and RSS newsletters such as those on Substack. The plan is to expand this tool to ingest many more configurable sources of podcast content, as well as new sources for augmenting that content. See the Issues tab for the planned and in-progress roadmap. Feel free to open new issues for feature requests, or pull requests for features you'd like to contribute back.

Agent Workflow Example

Project Goals

  • The podcasts produced by this framework are fully autonomous and need no human intervention to search for content, generate audio, compose audio segments, produce introspective metadata about podcast content, and publish to a podcast feed.
  • Anyone should be able to submit easily composable new features that can help make all autonomous podcasts better.
  • Any source of data can be a source of podcast content. (In Progress, plugin PRs welcomed!)

Autonomous Podcast Feeds (Submit yours to have it featured here!)

Prerequisites

  • The text-to-speech script currently uses say for simplicity, which is only available on macOS. Pull requests that update the script, or add versions of it for text-to-speech compatibility with other systems, are welcome.

  • For audio stitching this project uses FFmpeg, which can be downloaded for any system from the FFmpeg website.

  • You will need Python version 3.9.17 or newer. If your Python version is older, you can use pyenv to manage multiple versions of Python on your system. Here's how to do it:

    • If you don't have pyenv installed, you can install it following the instructions on the pyenv GitHub.

    • After pyenv is installed, you can install Python 3.9.17 with the following commands:

      pyenv install 3.9.17
      pyenv global 3.9.17
    • Now, check your Python version:

      python --version

      You should see Python 3.9.17 as the output.

  • To access some features, you will need an OpenAI API key. Here's how to get one:

    • Create an account on OpenAI.

    • After signing up and logging in, navigate to the API section.

    • Here, you'll find your API key. Keep this key secure and do not share it with anyone.
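
The prerequisites above can be checked before a first run with a short script. This is only a sketch and not part of the repository: the function names are hypothetical, and it assumes the tools are invoked as ffmpeg and say on your PATH.

```python
import shutil
import sys

# Minimum version taken from the prerequisites above.
MIN_PYTHON = (3, 9, 17)
# Command-line tools named in the prerequisites; `say` exists only on macOS.
REQUIRED_TOOLS = ("ffmpeg", "say")


def python_is_new_enough(version=None, minimum=MIN_PYTHON):
    """Return True if a (major, minor, micro) tuple meets the minimum."""
    if version is None:
        version = sys.version_info[:3]
    return tuple(version) >= tuple(minimum)


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    if not python_is_new_enough():
        print("Python is older than", ".".join(map(str, MIN_PYTHON)))
    for tool in missing_tools():
        print("Missing tool on PATH:", tool)
```

The script prints nothing when every prerequisite is met. It does not check for an OpenAI API key, because how the key is supplied depends on the plugins you configure.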

Installation

Clone this repository using git:

git clone <repo_url>
cd <repo_directory>

Install Build Dependencies

brew install helmfile

Configure Plugins

HyperFeeder is designed to be configurable and extensible through plugins. To use existing plugins in a different configuration, either edit the plugins used in each step of the podcast generation process manually in the .config.env file, or run the configurePlugins.sh script to apply preset plugin configurations for generating a podcast from any of the available plugins.

Different plugins require specific data sources and configuration options to be set in .config.env to work properly. Check the plugin directories for details on what each plugin requires in the .config.env file to be run.

We use Helm to configure these values and to ensure that the requirements for each plugin are met when modifying the script.

You can find installation instructions for Helm at https://helm.sh/docs/intro/install/ and for helmfile at https://github.com/helmfile/helmfile

To update general publication settings modify: podcastTextGenerationApp/charts/values/base.yaml

To update which plugins are active modify: podcastTextGenerationApp/charts/helmfile.yaml

When you have made changes then run ./configurePlugins.sh to regenerate the .config.env file based on the helmfile configuration.
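
To inspect what ended up in the regenerated .config.env, a small reader like the one below can help. It is a sketch only: it assumes the file holds plain KEY=VALUE lines with optional # comments, which this README does not spell out, and both function names are hypothetical rather than part of the app.

```python
from pathlib import Path


def parse_env_text(text):
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip optional surrounding quotes from the value.
        values[key.strip()] = value.strip().strip("\"'")
    return values


def load_config_env(path=".config.env"):
    """Read and parse the generated config file; return {} if it is missing."""
    config_path = Path(path)
    if not config_path.exists():
        return {}
    return parse_env_text(config_path.read_text(encoding="utf-8"))


if __name__ == "__main__":
    for key, value in sorted(load_config_env().items()):
        print(f"{key}={value}")
```

Printing the parsed values after running ./configurePlugins.sh is a quick way to confirm that the settings a plugin expects are present.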

Dependencies

After setting up, install the required dependencies:

Make sure you have pip-tools installed: pip install pip-tools

pip-compile requirements.in
pip install -r requirements.txt

Folder Structure

Here's a brief overview of the main folders and files in this project:

  • generatePodcast.sh: Runs all scripts in the correct sequence to build a podcast
  • podcastTextGenerationApp/: The main Python application.
  • podcastMetaInfoScripts/: Scripts to manage podcast metadata (chapters and description).
  • audioScripts/: Scripts to manage audio files.
  • audioScripts/podcast_intro_music.mp3: Default intro music

Running the Application

To generate a new podcast, run python generatePodcast.py.

If the podcast generation process fails at some point you can re-run any of the failed scripts defined in generatePodcast.py directly. Running generatePodcast.py just runs the scripts in the correct sequence.

If you want to run / debug / modify the python app for podcast text generation (this is where most of the podcast script generation logic lives) you can follow these instructions:

To run the application, navigate to the podcastTextGenerationApp directory and run podcastTextGenerator.py:

cd podcastTextGenerationApp
python podcastTextGenerator.py

Error Recovery

Every output produced by each plugin is saved in the output directory, in a folder named after the date and time at which the script was run. To retry podcast generation, pass the name of that folder to the script:

./generatePodcast.sh Podcast-Dec15-2024-05AM
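
The retry behavior can be pictured with the sketch below. It is not the project's code: the strftime pattern is inferred from the example folder name above, and resolve_run_folder is a hypothetical helper that reuses an existing run folder when a name is passed and creates a fresh one otherwise.

```python
from datetime import datetime
from pathlib import Path

# Pattern inferred from the example folder name "Podcast-Dec15-2024-05AM".
RUN_FOLDER_FORMAT = "Podcast-%b%d-%Y-%I%p"


def run_folder_name(now=None):
    """Build a run folder name from a timestamp (defaults to the current time)."""
    if now is None:
        now = datetime.now()
    return now.strftime(RUN_FOLDER_FORMAT)


def resolve_run_folder(output_dir, retry_name=None, now=None):
    """Return the folder for this run: an existing one on retry, else a new one."""
    output_dir = Path(output_dir)
    if retry_name:
        folder = output_dir / retry_name
        if not folder.is_dir():
            raise FileNotFoundError(f"No previous run found at {folder}")
        return folder
    folder = output_dir / run_folder_name(now)
    folder.mkdir(parents=True, exist_ok=True)
    return folder
```

Because earlier plugin outputs are already in the folder, a retry can pick up from the step that failed instead of starting over.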

podcastTextGenerationApp Details

The podcastTextGenerationApp directory is the heart of this framework. When modifying and creating your own podcasts this is most likely where you will want to start. The podcastTextGenerationApp uses a plugin architecture so you can extend the functionality of this app and easily contribute your own plugins!

When the app is run by the generatePodcast.py script it will proceed to generate text for a podcast by invoking plugins in the following order:

  • podcastDataSourcePlugins: These plugins will be invoked to generate the urls for a set of "Stories" that will be used to generate the rest of the podcast. Any plugin used here must return data in the form of a "Story" class. Subclasses of Story are valid outputs of this plugin as well, see the HackerNewsStory class as an example.

  • podcastIntroPlugins: These plugins will be invoked to write the Intro for the podcast based on the "stories" picked by the podcastDataSourcePlugins. This is a good place to inject some personality or branding to the podcast based on your choice of plugins or modification to existing plugins.

  • podcastScraperPlugins: These plugins will be invoked to scrape the text from the urls determined by the podcastDataSourcePlugins. This plugin adds a 'raw_text' directory filled with the raw text scraped from these urls that will be used by the next set of plugins.

  • podcastSegmentWriterPlugins: These plugins generate the final text that will be used to produce spoken audio for the podcast. The output of this plugin will be written to the podcast's segment_text directory.

  • podcastOutroWriterPlugins: These plugins generate the outro to the podcast. This is another good place to inject some personality or branding to the podcast based on your choice of plugins or modification to existing plugins.
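
The order of the plugin steps listed above can be sketched in a few lines. This is an illustration under assumptions, not the app's implementation: the Story fields are hypothetical, each plugin is modeled as a plain callable, and results are kept in memory instead of being written to the raw_text and segment_text directories.

```python
from dataclasses import dataclass


@dataclass
class Story:
    """Hypothetical stand-in for the project's Story class."""
    title: str
    url: str
    raw_text: str = ""
    segment_text: str = ""


def run_pipeline(data_source, intro_writer, scraper, segment_writer, outro_writer):
    """Invoke the plugin steps in the documented order and return the script parts."""
    stories = data_source()                       # podcastDataSourcePlugins
    intro = intro_writer(stories)                 # podcastIntroPlugins
    for story in stories:
        story.raw_text = scraper(story.url)       # podcastScraperPlugins
        story.segment_text = segment_writer(story)  # podcastSegmentWriterPlugins
    outro = outro_writer(stories)                 # podcastOutroWriterPlugins
    return [intro] + [story.segment_text for story in stories] + [outro]


if __name__ == "__main__":
    parts = run_pipeline(
        data_source=lambda: [Story("Example", "https://example.com/post")],
        intro_writer=lambda stories: f"Today: {len(stories)} story.",
        scraper=lambda url: f"text scraped from {url}",
        segment_writer=lambda story: f"{story.title}: {story.raw_text}",
        outro_writer=lambda stories: "Thanks for listening.",
    )
    print("\n".join(parts))
```

Swapping any one callable changes that step without touching the others, which is the property the plugin architecture is built around.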

The NewsletterRSSFeedPlugin fetches stories from newsletters in RSS feed format and uses Firebase to store the timestamp of the last fetched story for each feed. If a Firebase FIREBASE_DATABASE_URL environment variable is not defined then this plugin simply returns a list of the most recent newsletter items in the RSS Feed.
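
The fallback described for the NewsletterRSSFeedPlugin can be sketched as follows. The plugin's real code and its Firebase calls are not shown in this README, so everything here is an assumption: feed items are modeled as dicts with a numeric published timestamp, last_fetched stands in for the value that would be read from Firebase, and limit is an arbitrary cap.

```python
import os


def firebase_configured(env=None):
    """True when FIREBASE_DATABASE_URL is set to a non-empty value."""
    env = os.environ if env is None else env
    return bool(env.get("FIREBASE_DATABASE_URL"))


def select_items(items, last_fetched=None, limit=3):
    """Pick feed items to use, newest first.

    With a stored timestamp, keep only items published after it.
    Without one (no Firebase configured), return the most recent items.
    """
    newest_first = sorted(items, key=lambda item: item["published"], reverse=True)
    if last_fetched is None:
        return newest_first[:limit]
    return [item for item in newest_first if item["published"] > last_fetched]
```

In this model a caller would pass last_fetched only when firebase_configured() is true, so an unconfigured setup still produces stories on every run.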

Podcast Plugin Pipeline

           +-------------------+
           | podcastDataSource |
           |      Plugins      |
           +---------+---------+
                     |
                     |
           +---------v---------+
           |   podcastIntro    |
           |      Plugins      |
           +---------+---------+
                     |
                     |
           +---------v---------+
           |  podcastScraper   |
           |      Plugins      |
           +---------+---------+
                     |
                     |
           +---------v---------+
           |  podcastSegment   |
           |   WriterPlugins   |
           +---------+---------+
                     |
                     |
           +---------v---------+
           |   podcastOutro    |
           |   WriterPlugins   |
           +---------+---------+
                     |
                     |
           +---------v---------+
           |  podcastProducer  |
           |      Plugins      |
           +---------+---------+
                     |
                     |
                     v

Easy Podcast Modification Points

Background Music

  • You can easily change the intro background music by replacing the file podcast_intro_music.mp3 with your own music. This file was generated with Suno.

Introduction

  • Modify the intro by modifying the prompt for the intro here: podcastTextGenerationApp/podcastIntroWriter.py

New Story Presentation

  • Modify the way that the news story segments are presented by modifying the prompt here: podcastTextGenerationApp/storySegmentWriter.py

Contributing

Contributions, issues, and feature requests are welcome! Feel free to check the issues page. If you'd like to contribute new features, open a pull request.

Testing

To run tests the $PYTHONPATH for the current session must include the podcastTextGenerationApp directory. Here's an example of how this can be set prior to running tests:

    export PYTHONPATH=${PYTHONPATH}:/Users/<your username>/<your path to this app>/HyperFeeder/podcastTextGenerationApp

Change directory to the HyperFeeder top level directory (if you're not there already). Then you should be able to run python -m unittest successfully.
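
As an alternative to exporting PYTHONPATH by hand, the same setup can be done from a small Python runner. This is a sketch, not a script that ships with the repository: run_tests.py and both function names are hypothetical, and it relies on unittest's default discovery pattern (test*.py).

```python
import sys
import unittest
from pathlib import Path


def add_app_to_path(repo_root="."):
    """Put podcastTextGenerationApp on sys.path, as the PYTHONPATH export does."""
    app_dir = str(Path(repo_root).resolve() / "podcastTextGenerationApp")
    if app_dir not in sys.path:
        sys.path.insert(0, app_dir)
    return app_dir


def run_tests(repo_root="."):
    """Discover and run the tests under repo_root; return True if all passed."""
    add_app_to_path(repo_root)
    suite = unittest.defaultTestLoader.discover(str(repo_root))
    result = unittest.TextTestRunner(verbosity=1).run(suite)
    return result.wasSuccessful()
```

Saved as, say, run_tests.py in the top-level directory, it could be invoked with python -c "import run_tests, sys; sys.exit(0 if run_tests.run_tests() else 1)".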

License

MIT

Contact

If you have any questions, feel free to reach out to me at <david.norman.w@gmail.com>.

Enjoy your new podcast!

Contributors ✨

Thanks goes to these wonderful people (emoji key):

David Norman: 🚇 ⚠️ 💻

yukthi hettiarachchi: 💻

This project follows the all-contributors specification. Contributions of any kind welcome!
