updating readme and lambdas
reubenson committed Jan 22, 2024
1 parent b36be6d commit 50fa9f8
Showing 4 changed files with 44 additions and 54 deletions.
63 changes: 28 additions & 35 deletions README.MD
@@ -1,67 +1,60 @@
# MIDI-Archive Lambda
This repo defines two AWS Lambda functions and contains the Jupyter notebook used to train the model. The first Lambda generates prediction tokens from an ML inference model trained on MIDI files hosted by the companion GitHub repo, [/midi-archive](https://github.com/reubenson/midi-archive). The second Lambda takes those tokens, converts them to MIDI, and saves the file to an S3 bucket. These two Lambdas are chained together in prod via an AWS Step Function.
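
For reference, a minimal sketch of how two Lambdas can be chained in a Step Function state machine — the ARNs and state names below are illustrative placeholders, not the actual resources:

```python
import json

# Hypothetical Amazon States Language definition chaining the two Lambdas;
# ARNs and state names are placeholders
state_machine = {
    "StartAt": "GenerateTokens",
    "States": {
        "GenerateTokens": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:neural-net",
            "Next": "SaveMidi",
        },
        "SaveMidi": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:midi-save",
            "End": True,
        },
    },
}
print(json.dumps(state_machine, indent=2))
```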

The ML model is a simple decoder-only transformer, built with the PyTorch library and based on Andrej Karpathy's [Neural Net lecture series](https://karpathy.ai/zero-to-hero.html).

Created while I was at [Recurse Center](https://recurse.com/) in the fall of 2023!

## About
I built this project as an exploration of low-cost deployment of ML models. I decided to use Lambda (as opposed to Azure, SageMaker, EC2, etc.) because of its resource limitations and the ease of scaling down uptime, and thought it would be interesting to use these limitations to explore tradeoffs in training the model. For example, the more sophisticated I make the model (increasing the number of parameters), the longer it takes to generate a sequence of tokens. My current plan is to have the Lambda run only once a day (staying within AWS's free tier), and since Lambda functions are limited to 15 minutes of execution time, a more sophisticated model may end up generating less than a minute of music. At the moment, I'm still balancing these tradeoffs.
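
As a rough, assumption-laden sketch of that budget (the generation speed below is a guess, not a measurement):

```python
# Back-of-envelope token budget for one Lambda run
tokens_per_second = 11    # assumed generation speed; depends on model size and Lambda CPU
lambda_limit_s = 15 * 60  # hard Lambda execution ceiling
max_tokens = tokens_per_second * lambda_limit_s
print(max_tokens)         # 9900 — in the ballpark of the ~10,000 tokens generated per run
```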

For more sophisticated approaches to machine learning on MIDI, check out [this article on using GPT-2 with MIDI](https://huggingface.co/blog/juancopi81/using-hugging-face-to-train-a-gpt-2-model-for-musi) or [this huggingface model](https://huggingface.co/krasserm/perceiver-ar-sam-giant-midi).

Also, see [MIDI Archive](https://github.com/reubenson/midi-archive) for more notes on the underlying training set used for this model. While the current state of the art is focused on text, considerable work has already been done on MIDI, including OpenAI's [Musenet](https://openai.com/research/musenet) and Magenta's [Music Transformer](https://magenta.tensorflow.org/music-transformer). Both of these models use a MIDI training set called [Maestro](https://magenta.tensorflow.org/datasets/maestro) that often comes up as a canonical MIDI archive. However, developing the training set is a critical step in the design of ML systems, and I wanted to take the opportunity to explore the design decisions that go into developing an ML model alongside an archive.

## Flowchart
The following is a general workflow diagram of the larger MIDI Archive project:
```mermaid
flowchart TD
A(Scrape MIDI files from early web) --> Z(Present model output and interactive MIDI archive on Eleventy static site)
A --> B(Process MIDI into tokens for training neural net)
B --> |Switch to Colab|C(Fetch tokens and tokenizer)
C --> D(Train neural net model)
D --> E(Export model to ONNX)
E --> |Done with Colab|F(Deploy to AWS Lambda)
F --> |Daily Cloudwatch trigger|G(Generate MIDI sequence and save to S3)
B --> I(Browser client connects via websockets to Lambda to receive MIDI broadcast)
H(Cloudwatch-triggered Lambda reads MIDI file from S3) --> I
G --> H
G --> Z
```

Note: the websockets portion has not yet been implemented!

### Training Notebook
The notebook for training and exporting the model is in this repo [at `/training-notebook`](https://github.com/reubenson/midi-archive-lambda/tree/main/training-notebook), and on [Colab](https://colab.research.google.com/drive/1hpzG6ygsn0Cv44ImhyOn13eHtSo_Lccg#scrollTo=2BEEaoBHBQ1K).
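
As a rough sketch of the tokenization step (the actual config lives in the notebook; the parameters and file path here are illustrative, and the miditok API may differ by version):

```python
from pathlib import Path

from miditok import REMI, TokenizerConfig

# Illustrative config only — the real settings are defined in the training notebook
config = TokenizerConfig(num_velocities=16, use_chords=True)
tokenizer = REMI(config)

tokens = tokenizer(Path("archive/example.mid"))  # hypothetical file path
tokenizer.save_params("tokenizer.json")          # config consumed later by the Lambda
```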

### Lambda #1
The handler and deployment script are defined in `/neural-net`.

### Lambda #2
The handler and deployment script are defined in `/midi-save`.

<!-- The Lambda will send periodic websockets messages to client browsers containing a payload of MIDI data generated by the neural net model. -->
## Workflow
- The `MIDI Archive` repo is responsible for furnishing the training set in the form of .midi files and tokenized .json files
- In this repo, run the notebook to generate the tokenizer config and tokenized dataset
- Assuming lack of access to local compute resources, run training on Colab, using the pre-tokenized dataset
- After training, export the trained model to ONNX (see the sketch after this list)
- Use the deploy script in `/midi-save` to push the model up to an AWS Lambda Layer
- Update the layer version in the Lambda function
- The updated model is now deployed!
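
The export step looks roughly like this — a sketch, assuming `model` is the trained PyTorch transformer and `block_size` and `vocab_size` match the training configuration; the input name must match what the Lambda passes to `session.run`:

```python
import torch

# Sketch of the ONNX export; `model`, `vocab_size`, and `block_size` are assumed
# to come from the training notebook
dummy_input = torch.randint(0, vocab_size, (1, block_size))  # (batch, sequence)
torch.onnx.export(
    model,
    dummy_input,
    "model.onnx",
    input_names=["input"],   # matches session.run(None, {'input': [context]}) in /neural-net
    output_names=["logits"],
)
```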

<!-- ## Installation -->
<!-- Follow instructions at https://github.com/nficano/python-lambda, which is the repo this project follows. Unfortunately, its releases are lagging, and an [important update](https://github.com/nficano/python-lambda/pull/714) has not made its way into the official package distribution. Until then, I'm running a local version of the repo: `pip install -e ../python-lambda`, in order for `lambda deploy` to work as expected -->

<!-- ## Commands -->
<!-- Zip up and deploy the lambda with `lambda deploy --requirements ./requirements.txt`. The requirements flag is useful because I've been having weird issues with the installer pulling in all kinds of things from venv -->

## Misc notes
<!-- - To get the Lambda working properly, I had to fuss with permissions a bunch [here](https://us-east-1.console.aws.amazon.com/iam/home?region=us-east-1#/roles/details/lambda_basic_execution?section=permissions) -->
- Used `docker run --rm -v $(pwd):/package highpoints/aws-lambda-layer-zip-builder miditok` to generate a .zip with the miditok dependency (https://github.com/HighPoint/aws-lambda-onnx-model)

## References
- [As an alternative to the approach taken here, Docker and ECR could instead be used to manage Lambda deployment](https://www.serverless.com/blog/deploying-pytorch-model-as-a-serverless-service)
2 changes: 1 addition & 1 deletion midi-save/lambda_function.py
@@ -89,7 +89,7 @@ def lambda_handler(event, context):
    # this one will be archived
    upload_to_s3(midi_filepath, f'neural-net/{current_date}_sequence.mid')
    # this sequence will be used to prompt subsequent token generation
    upload_to_s3(json_filepath, f'neural-net/token_sequence.json')

    return {
        'statusCode': 200,
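
The `upload_to_s3` helper itself is elided from this diff; a plausible boto3 implementation, mirroring `download_from_s3` in `neural-net/lambda_function.py`, would look something like:

```python
import logging

import boto3
from botocore.exceptions import ClientError

def upload_to_s3(file_path, key):
    # Hypothetical reconstruction of the elided helper
    bucket_name = 'midi-archive'
    s3_client = boto3.client('s3')
    try:
        s3_client.upload_file(file_path, bucket_name, key)
    except ClientError as e:
        logging.error(e)
        return False
    return True
```
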
33 changes: 15 additions & 18 deletions neural-net/lambda_function.py
@@ -17,8 +17,7 @@ def get_previous_tokens():

def download_from_s3(file_key, save_path):
    bucket_name = 'midi-archive'

    # Download the file
    s3_client = boto3.client('s3')
@@ -29,35 +28,33 @@ def download_from_s3(file_key, save_path):
        return False
    return True
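
For example, `get_previous_tokens` presumably uses this helper along these lines (the exact key is taken from `midi-save/lambda_function.py`; the local path is an assumption — `/tmp` is the only writable filesystem in Lambda):

```python
# Hypothetical call — the key matches the file written by midi-save/lambda_function.py
download_from_s3('neural-net/token_sequence.json', '/tmp/token_sequence.json')
```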

# https://medium.com/mlearning-ai/softmax-temperature-5492e4007f71
def softmax_temp(x, temperature):
    return np.exp(x / temperature) / np.exp(x / temperature).sum()

def generate_token(logit, temperature):
    # the 0.99999 factor is a hacky fix for probabilities summing to > 1 due to float math
    probability = 0.99999 * softmax_temp(logit, temperature)
    val = np.random.multinomial(1, probability)
    token = val.tolist().index(1)
    return token
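
Note that `np.exp` can overflow for large logits; a numerically stable variant (not in the original) subtracts the max logit first, which leaves the distribution unchanged:

```python
import numpy as np

def softmax_temp_stable(x, temperature):
    # mathematically identical to softmax_temp: shifting by a constant
    # before exponentiating cancels out in the normalization
    z = (x - x.max()) / temperature
    e = np.exp(z)
    return e / e.sum()
```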

def generate_tokens(session):
    input_shape = session.get_inputs()[0].shape
    batch_size, block_size = input_shape
    num_tokens = 10000  # translates roughly to 3-4 min of music

    prev_tokens = get_previous_tokens()
    context = prev_tokens[-block_size:]
    outputs = []
    for i in range(num_tokens):
        logits = session.run(None, {'input': [context]})[0]
        last_logit = logits[0, -1, :]  # grab the last timestep
        token = generate_token(last_logit, 1.15)
        outputs.append(token)
        context = np.append(context, token)
        context = context[-block_size:]
        print(f'token i: {i}')
    return outputs
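
For context, the handler body below is truncated in this diff; wiring it up with ONNX Runtime would look roughly like this (a sketch — the model path assumes the Lambda Layer convention of mounting under `/opt`):

```python
import onnxruntime as ort

# Hypothetical wiring; the actual handler body is elided below
session = ort.InferenceSession('/opt/model.onnx')
tokens = generate_tokens(session)
```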

def lambda_handler(event, context):
Binary file modified neural-net/model.onnx
