```
[rank0]: raise ValueError(f"Some specified arguments are not used by the HfArgumentParser: {remaining_args}")
[rank0]: ValueError: Some specified arguments are not used by the HfArgumentParser: ['--lora', '--lora_target_modules', 'q_proj,k_proj,v_proj,o_proj,down_proj,up_proj,gate_proj', '--query_prefix', 'Query: ', '--passage_prefix', 'Passage: ', '--pooling', 'eos', '--append_eos_token', '--temperature', '0.01', '--train_group_size', '16', '--query_max_len', '32', '--passage_max_len', '156']
```
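For context, this ValueError is the standard leftover-argument check: the parser only accepts flags defined by the checked-out code, and anything else is collected and rejected. A minimal stdlib sketch of the same failure mode (argparse stand-in only, not tevatron code):

```python
import argparse

# A parser built for one argument set leaves flags it does not define in
# `remaining`; tevatron's wrapper then raises on any leftovers, which is
# why --lora, --pooling, etc. show up in the error above.
parser = argparse.ArgumentParser()
parser.add_argument("--output_dir")

known, remaining = parser.parse_known_args(
    ["--output_dir", "retriever-mistral", "--lora", "--pooling", "eos"]
)
if remaining:
    # Mirrors the shape of the real error message
    print(f"Some specified arguments are not used: {remaining}")
```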
```
DDP does not support such use cases in default. You can try to use _set_static_graph() as a workaround if your module graph does not change over iterations.
Parameter at index 447 with name encoder.base_model.model.layers.31.mlp.down_proj.lora_B.default.weight has been marked as ready twice.
```
These are the steps I followed to set up:
```
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
git clone https://github.com/texttron/tevatron.git
cd tevatron
git checkout tevatron-v1   # also tried: git checkout main
pip install transformers datasets peft
pip install deepspeed accelerate
pip install faiss-cpu
pip install -e .
```
Then I run the following command:
```
python -m torch.distributed.run --nproc_per_node=1 -m tevatron.driver.train \
  --output_dir retriever-mistral \
  --model_name_or_path "/Mixtral-7b-instruct" \
  --lora \
  --lora_target_modules q_proj,k_proj,v_proj,o_proj,down_proj,up_proj,gate_proj \
  --save_steps 50 \
  --dataset_name Tevatron/msmarco-passage-aug \
  --query_prefix "Query: " \
  --passage_prefix "Passage: " \
  --pooling eos \
  --append_eos_token \
  --normalize \
  --fp16 \
  --temperature 0.01 \
  --per_device_train_batch_size 4 \
  --gradient_checkpointing \
  --train_group_size 16 \
  --learning_rate 1e-4 \
  --query_max_len 32 \
  --passage_max_len 156 \
  --num_train_epochs 1 \
  --logging_steps 10 \
  --overwrite_output_dir \
  --gradient_accumulation_steps 4
```
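Taken together, the errors look consistent with a branch/entry-point mismatch: flags such as `--lora` and `--pooling` belong to the newer code on `main`, while `tevatron.driver.train` is the v1 module path. A sketch of keeping the checkout and the module path in step (the `main`-branch path `tevatron.retriever.driver.train` is my assumption from the repository layout, not verified here):

```shell
# Option A: v1 checkout with the v1 entry point
# (v1 may not accept the newer flags like --lora / --pooling)
git checkout tevatron-v1 && pip install -e .
python -m torch.distributed.run --nproc_per_node=1 -m tevatron.driver.train ...

# Option B: main checkout with the assumed newer entry point
git checkout main && pip install -e .
python -m torch.distributed.run --nproc_per_node=1 -m tevatron.retriever.driver.train ...
```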
I always get this error:

```
/opt/conda/bin/python: Error while finding module specification for 'tevatron.driver.train' (ModuleNotFoundError: No module named 'tevatron.driver')
```
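To see which driver module the installed checkout actually exposes, a small diagnostic can probe both candidate paths without importing them (the second path, `tevatron.retriever.driver.train`, is my assumption for the `main` branch, not confirmed):

```python
import importlib.util

def module_exists(name: str) -> bool:
    """Check importability without importing. A missing parent package makes
    find_spec raise ModuleNotFoundError, so treat that case as missing too."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        return False

# Probe the v1 path and the assumed main-branch path
for mod in ("tevatron.driver.train", "tevatron.retriever.driver.train"):
    print(mod, "->", "found" if module_exists(mod) else "missing")
```

Whichever path prints `found` is the one to pass to `-m`; if both print `missing`, the `pip install -e .` step did not take effect in the active environment.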