[EMNLP 2023] CodeTransOcean: A Comprehensive Multilingual Benchmark for Code Translation

WeixiangYAN/CodeTransOcean

CodeTransOcean is a large-scale comprehensive benchmark that supports the largest variety of programming languages for code translation. It consists of three novel multilingual datasets: MultilingualTrans, which supports translation between multiple popular programming languages; NicheTrans, for translating between niche programming languages and popular ones; and LLMTrans, for evaluating the executability of code translated by large language models (LLMs). CodeTransOcean also includes a novel cross-framework dataset, DLTrans, for translating deep learning code across different frameworks.

Datasets

The datasets are available on 🤗 Hugging Face or Google Drive.

Code

Experiments on the MultilingualTrans, NicheTrans, and DLTrans datasets were run with CodeT5+; the corresponding code is in the CodeT5+ file.

Experiments on the LLMTrans dataset were run with GPT-3.5; the corresponding code is in the ChatGPT file.
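To make the idea of evaluating executability more concrete, here is a minimal sketch of what such a check could look like when the target language is Python. It is not the code in this repository: the function names `build_translation_prompt` and `runs_with_expected_output`, the prompt wording, and the pass criterion (exit code 0 and matching standard output) are all assumptions made for illustration.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def build_translation_prompt(source_lang: str, target_lang: str, source_code: str) -> str:
    """Build a plain-text translation request (hypothetical wording, not the paper's prompt)."""
    return (
        f"Translate the following {source_lang} code to {target_lang}.\n"
        f"Return only the {target_lang} code.\n\n"
        f"{source_code}\n"
    )


def runs_with_expected_output(code: str, expected_stdout: str, timeout: float = 10.0) -> bool:
    """Run `code` as a Python script and compare its stdout with the expected text.

    Assumed pass criterion: the script exits with code 0 and its stripped
    stdout equals the stripped expected output.
    """
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "translated.py"
        script.write_text(code, encoding="utf-8")
        try:
            result = subprocess.run(
                [sys.executable, str(script)],
                capture_output=True,
                text=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            # A translation that hangs is counted as not executable.
            return False
    return result.returncode == 0 and result.stdout.strip() == expected_stdout.strip()


if __name__ == "__main__":
    prompt = build_translation_prompt("Go", "Python", 'fmt.Println(1 + 2)')
    print(prompt)
    # Stand-in for a model response; a real run would send `prompt` to an LLM.
    candidate = "print(1 + 2)"
    print(runs_with_expected_output(candidate, "3"))
```

Running untrusted model output directly is risky, so a real evaluation would normally execute candidates inside a sandbox or container rather than on the host as this sketch does.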

Citation

Please cite the paper if you use the data or code from CodeTransOcean.

@article{yan2023codetransocean,
  title={CodeTransOcean: A Comprehensive Multilingual Benchmark for Code Translation},
  author={Yan, Weixiang and Tian, Yuchen and Li, Yunzhe and Chen, Qian and Wang, Wen},
  journal={arXiv preprint arXiv:2310.04951},
  year={2023}
}

Contact

For questions, please feel free to reach out via email at yanweixiang.ywx@gmail.com.
