Hello, the model runs and then immediately exits, reporting that it ran successfully. What is the cause? #88
This is the output: (Xray) root@qzedu-NF5280M6:/data/ymj/XrayGLM-main/XrayGLM-main# bash finetune_XrayGLM.sh |
#! /bin/bash
NUM_WORKERS=1
NUM_GPUS_PER_WORKER=1
MP_SIZE=1
script_path=$(realpath $0)
script_dir=$(dirname $script_path)
main_dir=$(dirname $script_dir)
MODEL_TYPE="XrayGLM"
MODEL_ARGS="--max_source_length 64
--max_target_length 256
--lora_rank 10
--pre_seq_len 4"
#OPTIONS_SAT="SAT_HOME=$1" #"SAT_HOME=/raid/dm/sat_models"
OPTIONS_NCCL="NCCL_DEBUG=info NCCL_IB_DISABLE=0 NCCL_NET_GDR_LEVEL=2"
HOST_FILE_PATH="hostfile"
HOST_FILE_PATH="hostfile_single"
train_data="./data/Xray/openi-zh.json"
eval_data="./data/Xray/openi-zh.json"
gpt_options="
--experiment-name finetune-$MODEL_TYPE
--model-parallel-size ${MP_SIZE}
--mode finetune
--train-iters 300
--resume-dataloader
$MODEL_ARGS
--train-data ${train_data}
--valid-data ${eval_data}
--distributed-backend nccl
--lr-decay-style cosine
--warmup .02
--checkpoint-activations
--save-interval 3000
--eval-interval 10000
--save "./checkpoints"
--split 1
--eval-iters 10
--eval-batch-size 8
--zero-stage 1
--lr 0.0001
--batch-size 8
--skip-init
--fp16
--use_lora
"
run_cmd="${OPTIONS_NCCL} ${OPTIONS_SAT} deepspeed --master_port 16666 --hostfile ${HOST_FILE_PATH} finetune_XrayGLM.py ${gpt_options}"
echo ${run_cmd}
eval ${run_cmd}
set +x