Very early in training, the loss suddenly jumps to 0, and lowering the learning rate does not solve the problem. The config file is as follows:

model:
  arch: st_llm_hf
  model_type: instructblip_vicuna0
  use_grad_checkpoint: True
  max_txt_len: 256
  end_sym: "###"
  #prompt_path: "prompts/alignment.txt"
  prompt_template: '###Human: {} ###Assistant: '
  llama_model: '/root/qfs/lmm/weights/stllm/pretrained/vicuna-7b-v1.1/'
  ckpt: '/root/qfs/lmm/weights/stllm/pretrained/instruct_blip_vicuna7b_trimmed.pth'
  q_former_model: '/root/qfs/lmm/weights/stllm/pretrained/instruct_blip_vicuna7b_trimmed.pth'
  qformer_text_input: True
  freeze_LLM: False
  video_input: "residual"
  residual_size: 16
  use_mask: True
  mvm_decode: True

datasets:
  caption_体育240402_en:
    num_frames: 64

run:
  task: video_text_it
  bf16: True
  tf32: False
  output_dir: "./output/instructblipbase_stllm_conversation"
  num_train_epochs: 4
  dataloader_num_workers: 2
  per_device_train_batch_size: 2
  per_device_eval_batch_size: 2
  gradient_accumulation_steps: 1
  evaluation_strategy: "no"
  #learning_rate: 2e-5
  learning_rate: 1e-10
  weight_decay: 0.
  #warmup_ratio: 0.03
  warmup_ratio: 0.3
  lr_scheduler_type: 'cosine'
  logging_steps: 1
  model_max_length: 1024
  save_steps: 3000
  #save_strategy: "epoch"
  save_total_limit: 10
  deepspeed: 'stllm/train/zero2.json'
  #deepspeed: 'stllm/train/zero3.json'
  #deepspeed: 'stllm/train/zero3_offload.json'
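A loss that collapses to exactly 0 within a few steps often comes from batches in which every label token is masked out, from a degenerate branch in a custom loss computation, or from a numeric issue under bf16, rather than from the learning rate itself. One way to narrow this down is to log the number of supervised tokens and the raw loss value at every step. The following is a minimal, generic PyTorch sketch (not ST-LLM code); it assumes Hugging Face-style batches where positions excluded from the loss carry the label -100, so the field names may need to be adapted.

# Minimal, generic PyTorch sketch (not ST-LLM code) for localising a loss that
# collapses to 0. Assumes positions excluded from the loss are labelled -100.
import torch

def inspect_step(loss: torch.Tensor, labels: torch.Tensor, step: int) -> None:
    supervised = int((labels != -100).sum())   # tokens that actually contribute to the loss
    finite = bool(torch.isfinite(loss))
    print(f"step {step}: loss={loss.item():.6f} finite={finite} supervised_tokens={supervised}")
    if supervised == 0:
        print(f"step {step}: every label is masked -- check prompt/answer masking")
    if finite and loss.item() == 0.0 and supervised > 0:
        print(f"step {step}: zero loss despite supervised tokens -- dump logits/labels of this batch")

# Usage inside the training loop, before loss.backward():
#   inspect_step(outputs.loss, batch["labels"], global_step)

If the supervised-token counts and loss values look sane right up to the step where the loss drops, the problem is more likely on the model side (see the reply about initialization below).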
The training machine has 8× A100 40G GPUs.
Hi, you could check whether the initialization of the visual encoder, the Q-Former, or the LLM went wrong.
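One way to act on this suggestion is to reload the pretrained checkpoint with strict=False, inspect the missing/unexpected keys, and print a rough weight-scale statistic per sub-module: a component left at random initialization usually stands out. The snippet below is a generic PyTorch sketch, not ST-LLM API; the attribute names visual_encoder, Qformer, and llama_model are assumptions based on BLIP-2-style models and should be adjusted to the actual ST-LLM class.

# Generic PyTorch sketch for checking whether pretrained weights actually reached
# the visual encoder / Q-Former / LLM. Attribute names are assumptions, not ST-LLM API.
import torch
from torch import nn

def check_pretrained_loading(model: nn.Module, ckpt_path: str) -> None:
    # Load the checkpoint on CPU; some checkpoints nest the weights under "model".
    ckpt = torch.load(ckpt_path, map_location="cpu")
    state_dict = ckpt.get("model", ckpt) if isinstance(ckpt, dict) else ckpt

    # strict=False returns the keys the checkpoint did not cover / did not expect.
    missing, unexpected = model.load_state_dict(state_dict, strict=False)
    print(f"missing keys: {len(missing)}, unexpected keys: {len(unexpected)}")

    # Per-submodule weight scale: a randomly initialised module typically shows a
    # very different mean norm from a pretrained one.
    for name in ("visual_encoder", "Qformer", "llama_model"):
        module = getattr(model, name, None)
        if module is None:
            print(f"{name}: no such attribute on this model")
            continue
        norms = [p.detach().float().norm().item() for p in module.parameters()]
        print(f"{name}: {len(norms)} tensors, mean L2 norm {sum(norms) / max(len(norms), 1):.3f}")

Running this once right after the model is built (before any training step) should make it clear whether one of the three components the reply mentions failed to load its weights.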