⭐ Support ⭐
- LLMs: BLOOM (e.g., BLOOM-1b7, BLOOMZ-7b1-mt), LLaMA (e.g., LLaMA-7b, LLaMA-13b), LLaMA2 (e.g., LLaMA2-7b, LLaMA2-13b), and ChatGLM (e.g., ChatGLM2-6b)
- Our proposed TIM [run_clm.py] and vanilla instruction tuning [run_clm_sft.py], with RATE set to -1
- LoRA, Tuning with Embedding Fixed, Full Parameters Tuning
- Data-streaming
- Distributed training with DeepSpeed ZeRO stage 1/2/3
- Please refer to our paper for more details.
⭐ Tips ⭐
- [20231215] We added flash-attention for faster training; set --use_flash_attention to activate it.
- [20230914] We updated the preference loss function of TIM, which makes training more stable.
- [20230914] We fixed a bug when using the data cache (i.e., --streaming=False) for training.
- When data streaming is turned on, it is recommended to shuffle the training data first.
- When training with DeepSpeed ZeRO stage 1/2, you can set --use_low_cpu_mem=True to reduce memory usage.
- After training a model with DeepSpeed ZeRO stage 3, use sft_reward_training/change_param_name.py to convert the model's parameter names before inference (see the sketch after this list).
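A minimal sketch of how these flags fit into a launch; the paths are illustrative, and the arguments passed to change_param_name.py are placeholders rather than the script's confirmed interface:

# flash-attention and reduced CPU memory usage (DeepSpeed ZeRO stage 1/2)
deepspeed run_clm.py --deepspeed deepspeed_config/ds_config_stage2.json \
  --use_flash_attention --use_low_cpu_mem=True ...   # plus the usual model/data arguments

# after ZeRO stage 3 training: convert parameter names before inference
# (placeholder arguments; see the script for its actual interface)
python sft_reward_training/change_param_name.py <stage3_checkpoint_dir> <converted_dir>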
We develop TIM with Hugging Face's transformers and DeepSpeed-Chat.
Requirements:
- Python 3.7.9
- Pytorch 1.10.0+cu111
- Transformers 4.28
- accelerate==0.19.0
- numpy==1.22.4
- deepspeed==0.9.0
- scikit-learn
- flash-attn==2.0.1
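A hedged install sketch based on the versions above (the exact transformers patch release and the CUDA 11.1 wheel index are assumptions; adjust for your environment):

# pinned dependencies as listed above
pip install torch==1.10.0+cu111 -f https://download.pytorch.org/whl/torch_stable.html
pip install transformers==4.28.0 accelerate==0.19.0 numpy==1.22.4 deepspeed==0.9.0 scikit-learn
pip install flash-attn==2.0.1   # only needed when training with --use_flash_attention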
Training data: train_data/alpaca_reward.json, train.wmt_hint_dict_revall_alpaca_lm1b.json
An essential ingredient of our method is the construction of samples that provide comparison signals for model learning. In addition to regular translation data, we construct comparison data by introducing dictionary information or translation errors.
Test data: test_data/wmt22, test_data/flores200
We modify add_noise.py in noisy-text.
We use the following setting in our paper:
python add_noise.py data/example --delete_probability 0.15 --replace_probability 0.15 --filler_token '' --permutation_range 1
Then, you can run [run_reward.sh] to get the final training data for TIM.
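Putting the two steps together, a sketch of the data-preparation pipeline (the input path is the example above, and run_reward.sh is assumed to take no arguments; see the script for its actual inputs and outputs):

# 1. introduce synthetic translation errors (settings from the paper)
python add_noise.py data/example --delete_probability 0.15 --replace_probability 0.15 \
  --filler_token '' --permutation_range 1
# 2. assemble the final TIM training data, e.g., train_data/alpaca_reward.json
bash run_reward.sh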
We modify run_clm.py and the Trainer in transformers, and the utils for LoRA in DeepSpeed-Chat.
In addition to vanilla fine-tuning of all model parameters, parameter-efficient fine-tuning methods such as prefix tuning and LoRA have been proposed specifically for large language models.
We adopt three different strategies for tuning the models, listed below from the fewest to the most fine-tuned parameters.
(1) LoRA: Tuning with Low-rank Matrices
LORA_MODULE_NAME="query_key_value" # for BLOOM
LORA_MODULE_NAME="q_proj,k_proj,v_proj,o_proj" # for Llama
--only_optimize_lora # if set, only optimize the LoRA parameters
--lora_dim 8
--lora_alpha 16
--lora_droppout 0.05
--lora_module_name ${LORA_MODULE_NAME}
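These flags plug into the training launch; a hedged sketch for LoRA-tuning a LLaMA-style model, where the model path, data file, and output directory are placeholders and the standard transformers run_clm.py arguments are assumed to apply:

LORA_MODULE_NAME="q_proj,k_proj,v_proj,o_proj"
deepspeed run_clm.py \
  --model_name_or_path /path/to/llama-7b \
  --train_file train_data/alpaca_reward.json \
  --output_dir output/tim-llama-7b-lora \
  --deepspeed deepspeed_config/ds_config.json \
  --lora_module_name ${LORA_MODULE_NAME} \
  --lora_dim 8 \
  --lora_alpha 16 \
  --lora_droppout 0.05 \
  --only_optimize_lora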
(2) FixEmb: Tuning with Embedding Fixed
--only_optimize_layers "9" "8" "7" "6" "5" "4" "3" "2" "1" "0"
(3) Full: Tuning with Full Parameters
- deepspeed_config/ds_config.json, deepspeed_config/ds_config_stage2.json, deepspeed_config/ds_config_stage3.json
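For full-parameter tuning, the ZeRO stage is selected by pointing --deepspeed at one of these config files; a sketch with illustrative paths (the three configs presumably correspond to ZeRO stages 1/2/3):

deepspeed run_clm.py \
  --model_name_or_path /path/to/bloom-1b7 \
  --train_file train_data/alpaca_reward.json \
  --output_dir output/tim-bloom-1b7-full \
  --deepspeed deepspeed_config/ds_config_stage3.json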
Inference scripts: inference/infer_bloom.py, inference/infer_llama.py
-l # using LoRA
--rootmodel # if using LoRA, the path of the foundation model
--ifhint # add a note indicating that there are no mistakes in the hypothesis
--ifsample # if true, use sampling instead of beam search for inference
--ifreranking # use the preference score to select the preferred hypothesis among the candidates
--vocab # the dictionary for dict-guided inference
--reverse # whether to reverse the source and target languages when loading the dictionary
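A hedged inference sketch combining these options for a LoRA-tuned LLaMA model; the paths are placeholders, and the checkpoint/test-file arguments that infer_llama.py additionally expects are omitted here:

# load a LoRA checkpoint on top of the foundation model given by --rootmodel,
# add the hint note, and rerank candidates with the preference score
python inference/infer_llama.py \
  -l \
  --rootmodel /path/to/llama-7b \
  --ifhint \
  --ifreranking \
  --vocab /path/to/dictionary \
  --reverse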
We evaluate TIM's performance on the WMT and FLORES-200 dev-test tasks, comprising four language pairs.
### Citation
Please kindly cite our paper if you find it helpful:
@article{zeng2023tim,
  title={TIM: Teaching LM to Translate with Comparison},
  author={Jiali Zeng and Fandong Meng and Yongjing Yin and Jie Zhou},
  journal={arXiv preprint arXiv:2307.04408},
  year={2023},
  url={https://arxiv.org/pdf/2307.04408.pdf}
}