Source code for the paper "Glancing Future for Simultaneous Machine Translation"
Our method is implemented on top of the open-source toolkit Fairseq.
- Python version = 3.6
- PyTorch version = 1.7
- Install fairseq:
git clone https://github.com/ictnlp/Glance-SiMT.git
cd Glance-SiMT
pip install --editable ./
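To verify the editable install, an optional sanity check (assumes the pip command above finished without errors):
python -c "import fairseq; print(fairseq.__version__)"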
We use the data of IWSLT15 English-Vietnamese (download here) and WMT15 German-English (download here).
For WMT15 German-English, we tokenize the corpus via mosesdecoder/scripts/tokenizer/normalize-punctuation.perl and apply BPE with 32K merge operations via subword_nmt/apply_bpe.py.
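As a concrete reference, a preprocessing sketch along the lines described above might look as follows; the file names, language codes, and the joint learn_bpe step are our assumptions, not pinned down by the repository:
# Hypothetical sketch of the tokenization/BPE step; paths and language codes are placeholders
for LANG in de en; do
    perl mosesdecoder/scripts/tokenizer/normalize-punctuation.perl -l ${LANG} < train.${LANG} \
        | perl mosesdecoder/scripts/tokenizer/tokenizer.perl -l ${LANG} > train.tok.${LANG}
done
# Learn joint BPE with 32K merges (joint codes pair naturally with --joined-dictionary below)
cat train.tok.de train.tok.en | python subword_nmt/learn_bpe.py -s 32000 > bpe.codes
for LANG in de en; do
    python subword_nmt/apply_bpe.py -c bpe.codes < train.tok.${LANG} > train.bpe.${LANG}
done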
Then, we process the data into the fairseq format, adding --joined-dictionary for WMT15 German-English:
SRC=source_language
TGT=target_language
TRAIN_DATA=path_to_training_data
VALID_DATA=path_to_valid_data
TEST_DATA=path_to_test_data
DATA=path_to_processed_data
# add --joined-dictionary for WMT15 German-English
fairseq-preprocess --source-lang ${SRC} --target-lang ${TGT} \
--trainpref ${TRAIN_DATA} --validpref ${VALID_DATA} \
--testpref ${TEST_DATA} \
--destdir ${DATA}
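For example, a WMT15 German-English instantiation could look like this (all paths are placeholders; note the extra --joined-dictionary):
fairseq-preprocess --source-lang de --target-lang en \
    --trainpref data/train.bpe --validpref data/valid.bpe \
    --testpref data/test.bpe \
    --destdir data-bin/wmt15_deen \
    --joined-dictionary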
Conduct glancing future training of HMT on the WMT15 German-English task:
export CUDA_VISIBLE_DEVICES=0,1,2,3
TGT_FILE=file_to_save_training_log
MODELFILE=dir_to_save_model
DATAFILE=dir_to_data
FIRST_READ=1         # source tokens to read before emitting the first target token
CANDS_PER_TOKEN=4    # number of candidate translating moments per target token
python train.py --ddp-backend=no_c10d ${DATAFILE} --arch transformer --share-all-embeddings \
--optimizer adam \
--adam-betas '(0.9, 0.98)' \
--clip-norm 0.0 \
--lr 5e-4 \
--lr-scheduler inverse_sqrt \
--warmup-init-lr 1e-07 \
--warmup-updates 4000 \
--dropout 0.3 \
--encoder-attention-heads 8 \
--decoder-attention-heads 8 \
--criterion label_smoothed_cross_entropy \
--label-smoothing 0.1 \
--left-pad-source False \
--save-dir ${MODELFILE} \
--first-read ${FIRST_READ} \
--cands-per-token ${CANDS_PER_TOKEN} \
--max-tokens 4096 --update-freq 2 \
--max-target-positions 200 \
--curriculum-update 160000 \
--min-future 0.05 \
--skip-invalid-size-inputs-valid-test \
--fp16 \
--save-interval-updates 5000 \
--keep-interval-updates 300 \
--log-interval 10 > train_log/${TGT_FILE} 2>&1 &
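Since the training command runs in the background and redirects all output, the log directory has to exist beforehand; you can then follow progress with tail (directory name as assumed by the redirect above):
mkdir -p train_log                # the redirect fails if this directory is missing
tail -f train_log/${TGT_FILE}     # follow the training log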
Evaluate the model with the following command:
export CUDA_VISIBLE_DEVICES=0
MODELFILE=dir_to_save_model
DATAFILE=dir_to_data
REFERENCE=path_to_reference
# Average the last 5 update-based checkpoints
python scripts/average_checkpoints.py --inputs ${MODELFILE} --num-update-checkpoints 5 --output ${MODELFILE}/average-model.pt
# Greedy decoding with the averaged model
python generate.py ${DATAFILE} --path ${MODELFILE}/average-model.pt --batch-size 200 --beam 1 --left-pad-source False --fp16 --remove-bpe > pred.out
# Keep the H- (hypothesis) lines, drop the score column, and restore corpus order by sentence id
grep ^H pred.out | cut -f1,3- | cut -c3- | sort -k1n | cut -f2- > pred.translation
# Case-insensitive BLEU with the Moses multi-bleu script
multi-bleu.perl -lc ${REFERENCE} < pred.translation
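If you need a reference file that is ordered and de-BPE'd exactly like pred.translation, the T- lines that fairseq's generate.py writes into pred.out can be extracted the same way (this is standard fairseq output, not specific to this repository):
# T-<id><TAB><reference>: strip the prefix, sort numerically by sentence id, drop the id column
grep ^T pred.out | cut -c3- | sort -k1n | cut -f2- > ref.translation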
@misc{guo2023glancing,
      title={Glancing Future for Simultaneous Machine Translation},
      author={Shoutao Guo and Shaolei Zhang and Yang Feng},
      year={2023},
      eprint={2309.06179},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}