DocMTAgent

This repository releases the code and data for the paper "DelTA: An Online Document-Level Translation Agent Based on Multi-Level Memory".


📣 News

[10/10/2024] Our code and dataset for DelTA are released!
[11/10/2024] Our paper is available on arXiv: arXiv:2410.08143!

🔗 Quick Links

About DelTA
File Structure
Requirements
Quick Start
Citation

🤖 About DelTA

DelTA, which is short for Document-levEL Translation Agent, is an online document-level translation agent based on multi-level memory. It consists of the following four memory components:

Proper Noun Records: Maintains a repository of previously encountered proper nouns and their initial translations within the document, ensuring consistency by reusing the same translation for every subsequent occurrence of the same proper noun.
Bilingual Summary: Contains summaries of both the source and target texts, capturing the core meaning and genre characteristics of the document to enhance translation coherence.
Long-Term Memory: Stores contextual sentences over a wide span, from which relevant sentences are retrieved when translating subsequent sentences.
Short-Term Memory: Stores contextual sentences over a narrow span, which serve as demonstration exemplars when translating subsequent sentences.
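As a rough illustration of how the four memory components relate, here is a minimal Python sketch. It is not DelTA's implementation: the class `MultiLevelMemory`, its method names, and the window sizes are hypothetical; the bilingual summary is kept as two plain strings that a caller would update; and long-term retrieval uses simple word overlap as a stand-in relevance score.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, List, Tuple

Pair = Tuple[str, str]  # (source sentence, translation)


@dataclass
class MultiLevelMemory:
    # Hypothetical window sizes; DelTA's actual spans are not stated here.
    long_term_span: int = 20
    short_term_span: int = 3
    proper_nouns: Dict[str, str] = field(default_factory=dict)
    source_summary: str = ""  # Bilingual Summary, source side
    target_summary: str = ""  # Bilingual Summary, target side
    long_term: Deque[Pair] = field(init=False)
    short_term: Deque[Pair] = field(init=False)

    def __post_init__(self) -> None:
        # Bounded deques drop the oldest sentence once the span is full.
        self.long_term = deque(maxlen=self.long_term_span)
        self.short_term = deque(maxlen=self.short_term_span)

    def record_proper_noun(self, noun: str, translation: str) -> str:
        # Keep the first translation seen; later occurrences reuse it.
        return self.proper_nouns.setdefault(noun, translation)

    def add_sentence(self, src: str, tgt: str) -> None:
        # Every translated sentence enters both memories.
        self.long_term.append((src, tgt))
        self.short_term.append((src, tgt))

    def retrieve(self, query: str, k: int = 2) -> List[Pair]:
        # Stand-in relevance: number of shared lowercase words with the query.
        q = set(query.lower().split())
        scored = sorted(
            self.long_term,
            key=lambda pair: len(q & set(pair[0].lower().split())),
            reverse=True,
        )
        return scored[:k]

    def demonstrations(self) -> List[Pair]:
        # The most recent sentences, used as in-context exemplars.
        return list(self.short_term)
```

The split into a wide, retrieval-based store and a narrow, always-included window mirrors the Long-Term and Short-Term Memory descriptions above; any real retriever (for example, one that asks the LLM to pick relevant sentences) could replace `retrieve`.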

The Framework of DelTA

📜 File Structure

| Directory | Contents |
|---|---|
| `data/` | Experimental data |
| `eval_consistency/` | Scripts for the LTCR-1 metric |
| `infer/` | Testing scripts |
| `prompts/` | Prompts for LLMs |
| `results/` | Testing outputs |

🛠️ Requirements

DelTA with Qwen backbone models is developed with Hugging Face's transformers, while DelTA with GPT backbone models is developed with the OpenAI API.

Python 3.9.19
PyTorch 2.4.1+cu121
transformers==4.45
accelerate==0.34.2
spacy==3.7.4
numpy==2.0.2
openai==1.51.2
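To confirm that a local environment matches the versions listed above, a small standard-library check like the following may help. The helper names are hypothetical, `torch` is assumed to be the installed distribution name for PyTorch, and a pin such as `4.45` is treated as matching any `4.45.x` release; local tags like `+cu121` are ignored.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins transcribed from the requirements list; "torch" is PyTorch's PyPI name.
PINS = {
    "torch": "2.4.1",
    "transformers": "4.45",
    "accelerate": "0.34.2",
    "spacy": "3.7.4",
    "numpy": "2.0.2",
    "openai": "1.51.2",
}


def matches_pin(installed: str, pinned: str) -> bool:
    """True if `installed` equals `pinned` or extends it with more components."""
    base = installed.split("+", 1)[0]  # drop local tags such as "+cu121"
    return base == pinned or base.startswith(pinned + ".")


def check_environment(pins=PINS):
    """Map each package to (installed version or None, whether it matches)."""
    report = {}
    for name, pinned in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        report[name] = (installed, installed is not None and matches_pin(installed, pinned))
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_environment().items():
        print(f"{name:<13} pinned={PINS[name]:<8} installed={installed or 'missing':<14} {'OK' if ok else 'MISMATCH'}")
```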

🚀 Quick Start

Installation

git clone https://github.com/YutongWang1216/DocMTAgent.git
cd DocMTAgent
pip install -r requirements.txt

Inference with DelTA

(1) GPT as backbone models

infer/run_infer_gpt.sh

Make sure to fill in the following parameters before running:

lang=en-zh                         # translation direction, choices=[en-zh,en-de,en-fr,en-ja,zh-en,de-en,fr-en,ja-en]
use_model=gpt35turbo               # GPT model, choices=[gpt35turbo,gpt4omini]
src=/path/to/src/file              # path to source document
ref=/path/to/ref/file              # path to reference document (optional, leave blank if not given)
export API_BASE=                   # base URL of the API
export API_KEY=                    # API key

(2) Qwen as backbone models

infer/run_infer_qwen.sh

Make sure to fill in the following parameters before running:

lang=en-zh                         # translation direction, choices=[en-zh,en-de,en-fr,en-ja,zh-en,de-en,fr-en,ja-en]
use_model=qwen2-7b-instruct        # Qwen model, choices=[qwen2-7b-instruct,qwen2-72b-instruct]
modelpathroot=/path/to/checkpoint  # path to Hugging Face model checkpoint
src=/path/to/src/file              # path to source document
ref=/path/to/ref/file              # path to reference document
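Before launching either inference script, it can save a failed run to check the filled-in parameters against the documented choices. The sketch below is a hypothetical pre-flight helper, not part of the repository: `validate_params` is an illustrative name, and it assumes the GPT run needs non-empty `API_BASE` and `API_KEY` and the Qwen run needs an existing `modelpathroot` directory.

```python
import os
from typing import List, Optional

# Choices transcribed from the parameter comments of both scripts.
LANG_CHOICES = {"en-zh", "en-de", "en-fr", "en-ja", "zh-en", "de-en", "fr-en", "ja-en"}
MODEL_CHOICES = {
    "gpt": {"gpt35turbo", "gpt4omini"},
    "qwen": {"qwen2-7b-instruct", "qwen2-72b-instruct"},
}


def validate_params(
    backbone: str,
    lang: str,
    use_model: str,
    src: str,
    ref: Optional[str] = None,
    modelpathroot: Optional[str] = None,
) -> List[str]:
    """Hypothetical pre-flight check; returns a list of problems (empty means OK)."""
    problems: List[str] = []
    if lang not in LANG_CHOICES:
        problems.append(f"unsupported lang: {lang}")
    if use_model not in MODEL_CHOICES.get(backbone, set()):
        problems.append(f"unsupported use_model for {backbone}: {use_model}")
    if not os.path.isfile(src):
        problems.append(f"source file not found: {src}")
    if ref and not os.path.isfile(ref):
        problems.append(f"reference file not found: {ref}")
    if backbone == "gpt":
        # Assumption: the exported API variables must be non-empty.
        for var in ("API_BASE", "API_KEY"):
            if not os.environ.get(var):
                problems.append(f"environment variable {var} is empty")
    if backbone == "qwen" and not (modelpathroot and os.path.isdir(modelpathroot)):
        problems.append(f"checkpoint directory not found: {modelpathroot}")
    return problems


if __name__ == "__main__":
    for problem in validate_params("gpt", "en-zh", "gpt35turbo", "/path/to/src/file"):
        print(problem)
```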

Calculating LTCR-1 metric scores

eval_consistency/run_eval.sh

Make sure to fill in the following parameters before running:

lang=en-zh                         # translation direction, choices=[en-zh,en-de,en-fr,en-ja,zh-en,de-en,fr-en,ja-en]
src_file=/path/to/src/file         # path to source document (.src file generated by the inference script)
hyp_file=/path/to/hyp/file         # path to hypothesis document (.hyp file generated by the inference script)
output_dir=result/                 # output path of the evaluation results
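If the evaluation complains about its inputs, a quick sanity check on the two files may help. This sketch assumes, without confirmation from the scripts, that the `.src` and `.hyp` files produced by inference are line-aligned (one segment per line); `check_eval_inputs` is a hypothetical helper.

```python
from pathlib import Path


def check_eval_inputs(src_file: str, hyp_file: str) -> int:
    """Return the number of segments, or raise if the files are not line-aligned."""
    src_lines = Path(src_file).read_text(encoding="utf-8").splitlines()
    hyp_lines = Path(hyp_file).read_text(encoding="utf-8").splitlines()
    if len(src_lines) != len(hyp_lines):
        raise ValueError(
            f"line count mismatch: {len(src_lines)} source vs {len(hyp_lines)} hypothesis lines"
        )
    return len(src_lines)
```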

📝 Citation

If you find this repo useful, please cite our paper as:

@misc{wang2024deltaonlinedocumentleveltranslation,
      title={DelTA: An Online Document-Level Translation Agent Based on Multi-Level Memory}, 
      author={Yutong Wang and Jiali Zeng and Xuebo Liu and Derek F. Wong and Fandong Meng and Jie Zhou and Min Zhang},
      year={2024},
      eprint={2410.08143},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2410.08143}, 
}
