Llama3-Med

Contents

  • Installation and Requirements
  • Get Started
  • Launch Demo Locally

Installation and Requirements

Please note that our environment requirements differ from LLaVA's. We strongly recommend creating the environment from scratch as follows.

  1. Clone this repository and navigate to the folder
git clone https://github.com/standardmodelbio/llama3-med.git
cd llama3-med
  2. Create a conda environment, activate it, and install packages
conda create -n <env-name> python=3.10 -y
conda activate <env-name>
pip install --upgrade pip  # enable PEP 660 support
pip install -e .
  3. Install additional packages
pip install -e ".[train]"

pip install flash-attn --no-build-isolation
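
As a quick sanity check (a minimal sketch, assuming the install succeeded and a CUDA-capable GPU is available), you can verify that the key dependencies import cleanly:

import torch        # core training dependency
import flash_attn   # installed above via pip install flash-attn
print(torch.__version__, torch.cuda.is_available())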

Upgrade to the latest code base

git pull
pip install -e .

Get Started

1. Data Preparation

Please refer to the Data Preparation section in our Documentation.
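
For orientation, here is a minimal sketch of what a single training annotation entry typically looks like in LLaVA-style pipelines. The field names and file layout below are assumptions for illustration only; follow the Documentation for the exact format.

# Hypothetical example of one annotation entry (field names assumed, not taken from this repo).
example_entry = {
    "id": "sample_0001",
    "image": "images/sample_0001.jpg",  # path relative to your image folder
    "conversations": [
        {"from": "human", "value": "<image>\nWhat does this scan show?"},
        {"from": "gpt", "value": "A frontal chest X-ray with no acute findings."},
    ],
}

import json
with open("train_data.json", "w") as f:
    json.dump([example_entry], f, indent=2)  # LLaVA-style training scripts usually expect a JSON list of such entries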

2. Train

Here's an example of training an LMM using Phi-2.

  • Replace data paths with yours in scripts/train/train_phi.sh
  • Replace output_dir with yours in scripts/train/pretrain.sh
  • Replace pretrained_model_path and output_dir with yours in scripts/train/finetune.sh
  • Adjust your GPU ids (localhost) and per_device_train_batch_size in scripts/train/pretrain.sh and scripts/train/finetune.sh
bash scripts/train/train_phi.sh

Important hyperparameters used in pretraining and finetuning are provided below.

| Training Stage | Global Batch Size | Learning rate | conv_version |
|----------------|-------------------|---------------|--------------|
| Pretraining    | 256               | 1e-3          | pretrain     |
| Finetuning     | 128               | 2e-5          | phi          |

Tips:

Global Batch Size = num of GPUs * per_device_train_batch_size * gradient_accumulation_steps. We recommend keeping the global batch size and learning rate as above, except when LoRA-tuning your model.
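
For example, with hypothetical values of 8 GPUs, per_device_train_batch_size = 4, and gradient_accumulation_steps = 8, the global batch size is 8 * 4 * 8 = 256, which matches the pretraining setting above.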

conv_version is a hyperparameter that selects the chat template for each LLM. In the pretraining stage, conv_version is the same for all LLMs: pretrain. In the finetuning stage, we use:

  • phi for Phi-2, StableLM, Qwen-1.5
  • llama for TinyLlama, OpenELM
  • gemma for Gemma

3. Evaluation

Please refer to the Evaluation section in our Documentation.

Launch Demo Locally

If you want to locally launch a model trained by yourself or by us, here's an example.

Run inference with the model trained by yourself
from tinyllava.eval.run_tiny_llava import eval_model

model_path = "/absolute/path/to/your/model/"
prompt = "What are the things I should be cautious about when I visit here?"
image_file = "https://llava-vl.github.io/static/images/view.jpg"
conv_mode = "phi"  # or llama, gemma, etc.; match the conv_version used during finetuning

# Build a lightweight namespace object holding the arguments eval_model expects.
args = type('Args', (), {
    "model_path": model_path,
    "model_base": None,
    "query": prompt,
    "conv_mode": conv_mode,
    "image_file": image_file,
    "sep": ",",
    "temperature": 0,       # 0 disables sampling (greedy decoding)
    "top_p": None,
    "num_beams": 1,
    "max_new_tokens": 512
})()

eval_model(args)

"""
Output: 
XXXXXXXXXXXXXXXXX
"""
Run inference with the model trained by us using huggingface transformers
from transformers import AutoTokenizer, AutoModelForCausalLM

hf_path = 'tinyllava/TinyLLaVA-Phi-2-SigLIP-3.1B'
model = AutoModelForCausalLM.from_pretrained(hf_path, trust_remote_code=True)
model.cuda()
config = model.config
tokenizer = AutoTokenizer.from_pretrained(
    hf_path,
    use_fast=False,
    model_max_length=config.tokenizer_model_max_length,
    padding_side=config.tokenizer_padding_side,
)
prompt = "What are these?"
image_url = "http://images.cocodataset.org/val2017/000000039769.jpg"
output_text, generation_time = model.chat(prompt=prompt, image=image_url, tokenizer=tokenizer)

print('model output:', output_text)
print('running time:', generation_time)
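
Note that trust_remote_code=True is needed here because chat is a custom method defined in the model repository's remote code rather than in the transformers library, and model.cuda() assumes a CUDA-capable GPU is available.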