[ICLR 2024] Efficient Streaming Language Models with Attention Sinks


Efficient Streaming Language Models with Attention Sinks, with Retrieval-Augmented Generation

Fork of https://github.com/mit-han-lab/streaming-llm for the MIT 6.5940 final project.

Demo

Watch the demo here

Usage

Environment Setup

conda create -yn streaming python=3.8
conda activate streaming

pip install torch torchvision torchaudio
pip install transformers==4.33.0 accelerate datasets evaluate wandb scikit-learn scipy sentencepiece

conda install -c conda-forge faiss-gpu

python setup.py develop

Run Streaming Llama Chatbot

CUDA_VISIBLE_DEVICES=0 python examples/run_streaming_llama.py --enable_streaming
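To give a sense of what the streaming mode does under the hood, here is a minimal, illustrative sketch of the attention-sink eviction policy from the paper: keep the first few tokens ("attention sinks") plus a recent window, and evict everything in between. Plain Python lists stand in for the real per-layer key/value tensors, and the function name and sizes are illustrative, not the repo's actual API.

```python
# Illustrative sketch of attention-sink KV cache eviction (not the repo's
# actual implementation): retain `start_size` sink tokens plus the
# `recent_size` most recent tokens; drop the middle of the sequence.

def evict_for_space(cache, start_size=4, recent_size=8):
    """Trim `cache` so it holds at most start_size + recent_size entries."""
    if len(cache) <= start_size + recent_size:
        return cache
    return cache[:start_size] + cache[-recent_size:]

# Stream 20 token ids through a 4-sink + 8-recent budget:
cache = []
for token_id in range(20):
    cache.append(token_id)
    cache = evict_for_space(cache)

print(cache)  # sink tokens 0-3 survive; the middle of the stream is evicted
```

The key idea is that the sink tokens at the start of the sequence are never evicted, which the paper finds is what keeps generation stable as the window slides.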

Run Streaming Llama Chatbot with RAG

CUDA_VISIBLE_DEVICES=0 python examples/run_streaming_llama.py --enable_streaming --enable_rag
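The fork's RAG internals aren't shown in this README, but as a rough, self-contained illustration of the retrieval step behind a flag like `--enable_rag`: embed the query and documents as vectors, pick the nearest document by cosine similarity, and prepend it to the prompt. Real code would use an embedding model and a FAISS index (hence the `faiss-gpu` dependency above); the fixed toy vectors and function names here are assumptions for the sketch.

```python
# Toy retrieval-augmented prompting sketch (illustrative only).
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve(query_vec, docs):
    """Return the document whose embedding is closest to the query."""
    return max(docs, key=lambda d: cosine(query_vec, d["vec"]))

docs = [
    {"text": "Attention sinks stabilize streaming LLMs.", "vec": [1.0, 0.1, 0.0]},
    {"text": "FAISS performs fast vector similarity search.", "vec": [0.0, 1.0, 0.2]},
]
query_vec = [0.1, 0.9, 0.3]
best = retrieve(query_vec, docs)

# The retrieved passage is prepended as context for the chatbot:
prompt = f"Context: {best['text']}\nQuestion: ..."
print(best["text"])  # the FAISS document is closest to this query
```

In the actual chatbot, the retrieved context is fed into the streaming model alongside the user's turn, so the sliding KV cache and the retrieval step compose.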
