# char-mamba

This repository contains a simple script for Mamba-based Character-level Language Modeling. It can be considered the Mamba version of char-rnn. Due to its simplicity, this script can serve as a template for training Mamba models from scratch, applicable to a wide array of sequence-to-sequence problems.
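For orientation, below is a minimal sketch of what a byte-level Mamba language model looks like when built on the `mamba_ssm` package; the hyperparameters and the byte-level vocabulary are illustrative assumptions, not necessarily the exact configuration used in `main.py`:

```python
# Minimal sketch (assumed setup): a character-level LM where each byte is a
# token, built on mamba_ssm. Hyperparameters below are illustrative only.
import torch
from mamba_ssm.models.config_mamba import MambaConfig
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel

# vocab_size=256 covers all byte values; mamba_ssm kernels require CUDA.
config = MambaConfig(d_model=256, n_layer=4, vocab_size=256)
model = MambaLMHeadModel(config, device="cuda", dtype=torch.float32)

text = "First Citizen:\nBefore we proceed any further, hear me speak."
input_ids = torch.tensor([list(text.encode("utf-8"))], device="cuda")

# Next-character prediction: targets are the inputs shifted by one position.
logits = model(input_ids).logits  # (batch, seq_len, vocab_size)
loss = torch.nn.functional.cross_entropy(
    logits[:, :-1].reshape(-1, logits.size(-1)),
    input_ids[:, 1:].reshape(-1),
)
loss.backward()
```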

## Requirements

## Usage

`main.py` supports two subcommands: `train` and `generate`.
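The CLI is most likely organized with `argparse` subparsers along these lines (a sketch only; the flag names are taken from this README, and the actual argument definitions are at the end of `main.py`):

```python
# Sketch of how the two subcommands might be wired up (assumed structure).
import argparse

parser = argparse.ArgumentParser(description="Character-level Mamba LM")
subparsers = parser.add_subparsers(dest="command", required=True)

train_parser = subparsers.add_parser("train")
train_parser.add_argument("--cut-dataset", type=int, default=None)

generate_parser = subparsers.add_parser("generate")
generate_parser.add_argument("--prompt", type=str, default="")
generate_parser.add_argument("--batch", type=int, default=1)

args = parser.parse_args()
```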

### Train

To get started, use the following command to train a simple model:

```
python main.py train --cut-dataset=100
```

This command will train Mamba on the first 100 * 256 characters of the Tiny Shakespeare dataset (downloading it if necessary) for 10 epochs, save the model, and produce a sample generation. It takes about 10 seconds on a GTX 1650, and the resulting model is able to generate legitimate English words.
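The `--cut-dataset` arithmetic works out as sketched below; the chunking into 256-character sequences and the file name are assumptions for illustration:

```python
# Sketch of the --cut-dataset behaviour described above (assumed detail:
# the corpus is split into fixed 256-character training sequences).
with open("tinyshakespeare.txt", encoding="utf-8") as f:
    text = f.read()

seq_len = 256
cut = 100  # --cut-dataset=100 keeps only the first 100 sequences

chunks = [text[i:i + seq_len] for i in range(0, len(text), seq_len)]
chunks = chunks[:cut]  # 100 * 256 = 25,600 characters in total
```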

Once you make sure that it's working, you can train on the whole dataset by removing the `--cut-dataset=100` argument. For more command line arguments, see the end of `main.py`.

The training code is based on mamba-dive's fine-tuning script, which in turn is based on mamba-chat.

### Generate

After training the model, you can use the `generate` subcommand to load the saved model and generate text:

```
python main.py generate
# Generate with a prompt:
python main.py generate --prompt=First
# Generate batched:
python main.py generate --batch=4
```

The generation code is based on this script and supports most of the same arguments.
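As a rough sketch of what `generate` does under the hood (the checkpoint file name, byte-level tokenization, and decoding parameters below are assumptions, not the script's exact code):

```python
# Minimal generation sketch (assumed details: byte-level tokenization and a
# pickled model file named "model.pt"; main.py's actual loading may differ).
import torch

model = torch.load("model.pt")  # assumed: the trained MambaLMHeadModel, saved whole
model.eval().cuda()

prompt = "First"
input_ids = torch.tensor([list(prompt.encode("utf-8"))], device="cuda")

with torch.no_grad():
    # mamba_ssm's generate() returns the full token sequence, prompt included,
    # when return_dict_in_generate is left at its default.
    out = model.generate(
        input_ids=input_ids,
        max_length=200,   # total length, prompt tokens included
        temperature=1.0,
        top_k=40,
        top_p=0.9,
    )

print(bytes(out[0].tolist()).decode("utf-8", errors="replace"))
```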