Create pull requests for podcast websites using provided transcripts and GenAI.
This is an AWS SAM project that takes podcast transcripts generated by Podwhisperer and performs two steps:
- Use an Amazon Bedrock LLM to summarise the episode and create draft YouTube chapters
- Create a Pull Request for the podcast's website source code containing the summary and the transcript content
An example of a Pull Request generated by this project can be found here.
We currently use the Anthropic Claude V2 Large Language Model (LLM) in Amazon Bedrock.
(Source for this diagram is in template.drawio in this directory.)
- The project assumes that JSON transcripts are created upstream in an S3 Bucket by Podwhisperer.
- You will need the following build tooling installed:
  - Node.js 18.x and NPM 8.x
  - AWS SAM, used to build and deploy most of the application
  - The AWS CLI
  - esbuild
- By default, the target AWS account should have the SLIC Watch SAR Application installed. It can be installed by going to this page in the AWS Console. SLIC Watch is used to create alarms and dashboards for this application. If you want to skip this option, just remove the single line referring to the SlicWatch-v2 macro from template.yaml.
- You will need to go to the Amazon Bedrock Console and enable the Anthropic Claude v2 model in "Model Settings".
- Enable access to the website repository for your podcast with an SSM SecureString parameter in your AWS account:
| Parameter | Description | Example Value |
|---|---|---|
| /episoder/gitHubUserCredentials | Personal Access Token (PAT) for the GitHub repository | username:github_pat_123AB...xyz |
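As a rough illustration of how a `username:token` value like the one above could be used, the sketch below splits the parameter value and embeds it in an HTTPS clone URL. The helper names (`parseGitCredentials`, `authenticatedRepoUrl`) and the placeholder token are hypothetical and are not taken from this project's source; fetching the parameter from SSM is left out.

```typescript
// Hypothetical helpers for illustration only; not the project's actual code.
interface GitCredentials {
  username: string;
  token: string;
}

// Split "username:token" on the first colon only, so any later colons stay in the token.
function parseGitCredentials(value: string): GitCredentials {
  const separator = value.indexOf(":");
  if (separator <= 0 || separator === value.length - 1) {
    throw new Error("Expected credentials in the form username:token");
  }
  return {
    username: value.slice(0, separator),
    token: value.slice(separator + 1),
  };
}

// Build an HTTPS URL with the credentials embedded, percent-encoding both parts.
function authenticatedRepoUrl(repoUrl: string, creds: GitCredentials): string {
  const scheme = "https://";
  if (!repoUrl.startsWith(scheme)) {
    throw new Error("Only HTTPS repository URLs are handled in this sketch");
  }
  const user = encodeURIComponent(creds.username);
  const pass = encodeURIComponent(creds.token);
  return `${scheme}${user}:${pass}@${repoUrl.slice(scheme.length)}`;
}

// Usage with a placeholder token (not a real PAT).
const creds = parseGitCredentials("username:github_pat_EXAMPLE");
console.log(authenticatedRepoUrl("https://github.com/awsbites/aws-bites-site.git", creds));
```

A URL built this way contains the secret, so it should not be written to logs outside of a throwaway example like this one.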
To test changes to the LLM prompt, you don't have to deploy. You can run summarise.ts with a path to a JSON transcript file. A sample transcript is provided. This script calls Bedrock, so you must have AWS credentials configured for your account.
./bin/summarise.ts ./sample-transcripts/aws-bites-101.json
To tweak the prompt, edit lib/prompt-template.ts.
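For orientation, here is a minimal sketch of what a Claude V2 text-completion request to Bedrock looks like. It only builds the request object and does not call AWS. The prompt wording, the `<transcript>` tags, the `maxTokens` default and the function names are assumptions for illustration; the project's real prompt is whatever lib/prompt-template.ts contains. The `modelId` and the `prompt` / `max_tokens_to_sample` body fields follow Bedrock's Claude V2 text-completion format, which expects the prompt to start with a `Human:` turn and end with `Assistant:`.

```typescript
// Illustrative only: the actual prompt lives in lib/prompt-template.ts.
function buildPrompt(transcriptText: string): string {
  const instructions =
    "Summarise the podcast episode transcript below and propose draft YouTube chapters.";
  // Claude V2 text completions require alternating "\n\nHuman:" and "\n\nAssistant:" turns.
  return `\n\nHuman: ${instructions}\n\n<transcript>\n${transcriptText}\n</transcript>\n\nAssistant:`;
}

// Mirrors the shape of an InvokeModel request; declared locally so the sketch has no SDK dependency.
interface InvokeModelRequest {
  modelId: string;
  contentType: string;
  accept: string;
  body: string;
}

function buildClaudeV2Request(transcriptText: string, maxTokens = 1000): InvokeModelRequest {
  return {
    modelId: "anthropic.claude-v2",
    contentType: "application/json",
    accept: "application/json",
    body: JSON.stringify({
      prompt: buildPrompt(transcriptText),
      max_tokens_to_sample: maxTokens, // upper bound on generated tokens (assumed value)
      temperature: 0, // assumed setting, chosen here for repeatable output
    }),
  };
}

// Usage: inspect the request that would be sent for a short transcript.
const request = buildClaudeV2Request("Speaker 1: Welcome to the show...");
console.log(JSON.parse(request.body).prompt);
```

Keeping the prompt construction in a pure function like this makes it easy to inspect a prompt change locally before spending tokens on a real invocation.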
Using AWS SAM:
sam build --parallel
sam deploy --guided
You will be prompted for:
- The S3 Bucket where transcripts are expected to arrive
- The region to use for Bedrock, since Bedrock is currently only available in a limited number of regions
- The email address and name to use for Git commits
- The HTTPS URL of your website GitHub repository, e.g., https://github.com/awsbites/aws-bites-site.git
Once deployment has completed, you can check the Step Function that orchestrates the whole process in the AWS Console. This state machine is automatically executed when transcripts are placed in the processed-transcripts/ prefix.
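The trigger condition can be pictured as a simple key filter. The sketch below is an assumption about the rule's behaviour, not the project's actual event configuration (which is defined in template.yaml); the function name is hypothetical.

```typescript
// Prefix named in this README; everything else here is illustrative.
const TRANSCRIPT_PREFIX = "processed-transcripts/";

// Returns true for object keys that sit under the transcript prefix.
// The bare prefix itself (a "folder" placeholder object) is not treated as a transcript.
function isProcessedTranscript(objectKey: string): boolean {
  return (
    objectKey.startsWith(TRANSCRIPT_PREFIX) &&
    objectKey.length > TRANSCRIPT_PREFIX.length
  );
}

// Usage with made-up keys.
console.log(isProcessedTranscript("processed-transcripts/101.json"));
console.log(isProcessedTranscript("audio/101.mp3"));
```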
Bedrock pricing can be difficult to estimate. This repo comes with a pricing CloudWatch dashboard that shows the cost for a given period and the relationship between invocations, input tokens and output tokens. The cost is calculated from the published on-demand pricing for the Claude V2 model as of 28 October 2023. A CloudWatch alarm is also created for the total cost per hour, defaulting to breach when the cost exceeds $1 per hour for three consecutive hours.
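The arithmetic behind such a dashboard and alarm can be sketched as follows. On-demand pricing is charged per 1,000 input tokens and per 1,000 output tokens, so hourly cost is a weighted sum of the two token counts. The type and function names are hypothetical, and the rates in the usage example are placeholders rather than the published prices; the real calculation lives in the price-monitor CDK app.

```typescript
// Illustrative types; not taken from the price-monitor source.
interface TokenPricing {
  inputPer1k: number; // USD per 1,000 input tokens
  outputPer1k: number; // USD per 1,000 output tokens
}

interface HourlyUsage {
  inputTokens: number;
  outputTokens: number;
}

// Cost for one hour of usage under on-demand, per-token pricing.
function hourlyCost(usage: HourlyUsage, pricing: TokenPricing): number {
  return (
    (usage.inputTokens / 1000) * pricing.inputPer1k +
    (usage.outputTokens / 1000) * pricing.outputPer1k
  );
}

// True when cost exceeds the threshold for `consecutive` hours in a row,
// matching the default described above ($1 per hour, three consecutive hours).
function alarmBreached(
  hours: HourlyUsage[],
  pricing: TokenPricing,
  thresholdUsd = 1,
  consecutive = 3
): boolean {
  let run = 0;
  for (const hour of hours) {
    run = hourlyCost(hour, pricing) > thresholdUsd ? run + 1 : 0;
    if (run >= consecutive) {
      return true;
    }
  }
  return false;
}

// Usage with placeholder rates (NOT the published Claude V2 prices).
const pricing: TokenPricing = { inputPer1k: 0.01, outputPer1k: 0.03 };
const busyHour: HourlyUsage = { inputTokens: 50000, outputTokens: 20000 };
console.log(hourlyCost(busyHour, pricing));
console.log(alarmBreached([busyHour, busyHour, busyHour], pricing));
```

Because output tokens are typically priced higher than input tokens, long summaries can dominate the bill even when the transcripts themselves are much larger.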
The pricing dashboard can be deployed with CDK:
cd price-monitor
npm install
npx cdk deploy -c bedrockRegion=us-east-1