Hungry Hungry Hippos (H3)

This repository provides the official implementation of H3 from the following paper.

Hungry Hungry Hippos: Towards Language Modeling with State Space Models
Tri Dao*, Daniel Y. Fu*, Khaled K. Saab, Armin W. Thomas, Atri Rudra, Christopher Ré
International Conference on Learning Representations, 2023. Notable top-25% (spotlight).
Paper: https://arxiv.org/abs/2212.14052

Code & model release

Model weights are hosted on the Hugging Face Hub (under "Files and Versions" for each model); the 125M checkpoint, for example, is at https://huggingface.co/danfu09/H3-125M.

Loading weights and running inference

Examples of how to load the weights and run inference are given in benchmarks/benchmark_generation.py and examples/generate_text_h3.py.

Here's an example of how to download and run our 125M model (you may need to install FlashAttention):

git lfs install
git clone https://huggingface.co/danfu09/H3-125M

git clone https://github.com/HazyResearch/H3.git

PYTHONPATH=$(pwd)/H3 python H3/examples/generate_text_h3.py --ckpt H3-125M/model.pt --prompt "Hungry Hungry Hippos: Towards Language Modeling With State Space Models is a new language model that" --dmodel 768 --nlayer 12 --attn-layer-idx 6 --nheads=12

You should get an output like this (the exact text may differ because generation uses sampling):

Hungry Hungry Hippos: Towards Language Modeling With State Space Models is a new language model that uses state-space models to create a human-like vocabulary that can help improve human understanding and judgment of language. It takes a human's past experience of language, and tries to capture their cognitive patterns. State Space Models helps the researchers make sense of language in its own terms, which helps users learn about their language of choice. State Space Models is used to develop a set of languages for researchers in an effort to help them develop more intelligent language models. The goal is to increase and develop a human-like language model using state space models. It is hoped that it will aid people to do more work to develop a language that is more

Here's a summary of the hyperparameters for each model size:

Model	dmodel	nlayer	nheads
125M	768	12	12
355M	1024	24	16
1.3B	2048	24	16
2.7B	2560	32	20
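
As a rough sketch of how the table maps onto the flags used in the 125M example, the helper below builds an argument list for examples/generate_text_h3.py. The names MODEL_CONFIGS and build_generate_command are hypothetical and not part of this repository. The table does not list attention layer indices, so the caller has to supply them (the 125M example uses 6), and passing more than one index assumes the script accepts several values for that flag.

```python
import shlex

# Hyperparameters copied from the table above.
MODEL_CONFIGS = {
    "125M": {"dmodel": 768, "nlayer": 12, "nheads": 12},
    "355M": {"dmodel": 1024, "nlayer": 24, "nheads": 16},
    "1.3B": {"dmodel": 2048, "nlayer": 24, "nheads": 16},
    "2.7B": {"dmodel": 2560, "nlayer": 32, "nheads": 20},
}


def build_generate_command(size, ckpt, prompt, attn_layer_idx):
    """Return an argv list for examples/generate_text_h3.py (hypothetical helper).

    attn_layer_idx is a list of ints; it is not given in the table, so it must
    come from the caller. Only the value 6 for the 125M model is documented here.
    """
    cfg = MODEL_CONFIGS[size]  # raises KeyError for an unknown size
    return [
        "python", "H3/examples/generate_text_h3.py",
        "--ckpt", ckpt,
        "--prompt", prompt,
        "--dmodel", str(cfg["dmodel"]),
        "--nlayer", str(cfg["nlayer"]),
        # Assumption: several indices, if given, are passed space-separated.
        "--attn-layer-idx", *[str(i) for i in attn_layer_idx],
        "--nheads", str(cfg["nheads"]),
    ]


if __name__ == "__main__":
    cmd = build_generate_command(
        "125M", "H3-125M/model.pt", "Hungry Hungry Hippos", [6]
    )
    # Print a shell-quoted command line; nothing is executed here.
    print(shlex.join(cmd))
```

The printed command still needs PYTHONPATH set to the H3 checkout, as in the 125M example, before it can be run.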

See examples/README.md for examples of how to load all these models and run them!

Acknowledgments

Some of the files related to S4D and HiPPO initialization are adapted from https://github.com/HazyResearch/state-spaces.

Citation

If you use this codebase, or otherwise find our work valuable, please cite:

@inproceedings{dao2023hungry,
  title={Hungry {H}ungry {H}ippos: Towards Language Modeling with State Space Models},
  author={Dao, Tri and Fu, Daniel Y. and Saab, Khaled K. and Thomas, Armin W.
  and Rudra, Atri and R{\'e}, Christopher},
  booktitle={International Conference on Learning Representations},
  year={2023}
}
