
If I want to continue fine-tuning from your GPT4RoI weights, how should the parameters in train.sh be set? #15

Open
hangzeli08 opened this issue Aug 21, 2023 · 1 comment

Comments

@hangzeli08

How should I set WORKDIR and STAGE1WORKDIR if I want to continue fine-tuning from your GPT4RoI weights?

@jshilong
Owner

You should download the released GPT4RoI weights and merge them with the base LLaMA weights.
Then you can do the following:

mkdir -p exp_name/checkpoint-0

Then move the merged checkpoint files into checkpoint-0.
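A minimal sketch of that step, assuming the merged weights sit in a hypothetical path_to_merged_gpt4roi directory (substitute your actual path):

# Hypothetical path: point this at the directory holding the merged
# LLaMA + GPT4RoI weights (config, tokenizer and model shards), so they
# land in the checkpoint directory created above.
cp path_to_merged_gpt4roi/* exp_name/checkpoint-0/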

Then run:

WORKDIR=exp_name

export PYTHONPATH=`pwd`:$PYTHONPATH

torchrun --nnodes=1 --nproc_per_node=8 --master_port=25001 \
    gpt4roi/train/train_mem.py \
    --model_name_or_path path_to_vicuna-7b \
    --vision_tower openai/clip-vit-large-patch14 \
    --pretrain_mm_mlp_adapter LLaVA-7b-pretrain-projector-v0-CC3M-595K-original_caption.bin \
    --dataset_config ./gpt4roi/configs/stage2.py \
    --mm_vision_select_layer -2 \
    --mm_use_im_start_end True \
    --bf16 True \
    --output_dir $WORKDIR \
    --num_train_epochs 2 \
    --per_device_train_batch_size 2 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 1 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 3000 \
    --save_total_limit 1 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.003 \
    --warmup_steps 3000 \
    --fsdp "full_shard auto_wrap" \
    --fsdp_transformer_layer_cls_to_wrap 'LlamaDecoderLayer' \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --tf32 True \
    --model_max_length 2048 \
    --gradient_checkpointing True \
    --lazy_preprocess True \
    --report_to "none" \
    --seed 0 \
    | tee $WORKDIR/train.log

You can find this logic in the training script here:

if list(pathlib.Path(training_args.output_dir).glob('checkpoint-*')):
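In other words, training resumes automatically whenever a checkpoint-* directory already exists under --output_dir, which is why the merged weights go into exp_name/checkpoint-0. A rough shell-side sanity check before launching (a sketch, assuming the exp_name layout above):

# If this finds a checkpoint directory, the run above resumes from it;
# otherwise training starts from --model_name_or_path.
if ls -d exp_name/checkpoint-* >/dev/null 2>&1; then
    echo "checkpoint found: training will resume from it"
else
    echo "no checkpoint: training starts from --model_name_or_path"
fi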
