
Mamba 4chan 2

About

The Kek of Destiny, the next generation of Mamba 4chan, is here.

Installation

We provide a simple setup.sh to install the Conda environment. You need to satisfy the following prerequisites:

  • Linux
  • NVIDIA GPU
  • GPU driver with CUDA 12+ support
  • Miniforge

Then, simply run source ./setup.sh to get started.

Dataset

We use the same preprocessed Raiders of the Lost Kek dataset described in the original Mamba 4chan repo, where you can also find the download link.
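If you want to prepare additional text yourself, it has to go through the same tokenizer used at inference time (GPT-NeoX-20B, see the snippet further below). Here is a minimal sketch, assuming a plain-text input file and a flat token-id array; the file names and layout are illustrative, not the repo's actual preprocessing:

from transformers import AutoTokenizer
import numpy as np

# Same tokenizer the inference example uses; tokenized data must match it.
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")

# Hypothetical input: raw thread text with the dataset's delimiters intact.
with open("threads.txt") as f:
    ids = tokenizer(f.read())["input_ids"]

# GPT-NeoX token ids fit in uint16 (vocab size < 65,536).
np.save("threads_tokenized.npy", np.asarray(ids, dtype=np.uint16))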

Fine-tuned Models

We provide the following fine-tuned models, each trained for one epoch on the tokenized dataset using a single RTX 4090, with a context size of 2,048 tokens and a batch size of 409,600 tokens (that is, 200 sequences of 2,048 tokens per optimizer step). Training used bf16 mixed precision, while the model weights are stored in fp32. We will release more models and improved versions as opportunities arise.

Name                Model Dim.  Num. of Layers  Attention Layers  Download  Fine-tuning Log
Mamba 4chan 2 780M  1536        48              None              Download  Log

Training and Inference

We provide train.py, which contains all the necessary code to train a Mamba 4chan 2 model and log the training progress. The logged parameters can be modified in model.py.
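For orientation, here is a minimal, hypothetical sketch of the kind of PyTorch Lightning loop train.py wraps; the mamba_4chan constructor signature, the batch format, and the stand-in random data are assumptions, not the repo's actual code:

import torch
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl

from model import mamba_4chan
from model_config import ssm_780m_config

# Assumed constructor, mirroring the config kwarg used by load_from_checkpoint.
model = mamba_4chan(config=ssm_780m_config())

# Stand-in data: random token ids shaped (num_sequences, context_length).
# Replace with a loader over the tokenized dataset.
data = TensorDataset(torch.randint(0, 50432, (8, 2048)))
loader = DataLoader(data, batch_size=2)

# bf16 mixed precision, single GPU, one epoch, matching the setup above.
trainer = pl.Trainer(accelerator="gpu", devices=1, precision="bf16-mixed", max_epochs=1)
trainer.fit(model, loader)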

The base model's hyperparameters are stored in model_config.py, and you can adjust them as needed. When further training our model, note that all hyperparameters are saved directly in the checkpoint file; refer to PyTorch Lightning's documentation for details. The same applies to inference, as PyTorch Lightning automatically restores all parameters when loading our model.
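For instance, because the checkpoint carries its own hyperparameters, you can inspect them after loading without passing a config (this assumes the module calls Lightning's save_hyperparameters(), as the weights-only fallback below suggests):

from model import mamba_4chan

model = mamba_4chan.load_from_checkpoint("path_to.ckpt")
print(model.hparams)  # hyperparameters restored from the checkpoint itself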

Here's a sample code snippet to run inference with Mamba 4chan 2:

from transformers import AutoTokenizer

from model import mamba_4chan

# Load a full checkpoint; hyperparameters are restored from the file itself.
model = mamba_4chan.load_from_checkpoint("path_to.ckpt")

# For a weights-only checkpoint, pass the base config explicitly:
# from model_config import ssm_780m_config
# model = mamba_4chan.load_from_checkpoint(
#     "path_to_weights_only.ckpt",
#     config=ssm_780m_config(),
# )

model.cuda()
model.eval()

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")

# The prompt follows the preprocessed dataset's thread/post delimiter format.
text = "-----\n\n--- 943264000\nOur country".strip()

# Generate a continuation of up to 512 tokens.
pred = model.generate_text(tokenizer, text, 512)
print(pred)

You can also use this Colab notebook for a quick demo.

Credits

Our work builds upon the remarkable achievement of Mamba <3.
