Template and example code for the paper: Towards Cross-Lingual Generalization of Translation Gender Bias (ACM FAccT 2021)
All files are plain txt in UTF-8 encoding, and each word/sentence is separated by a newline ('\n'); a loading sketch follows the file list below.
word_list:
- occupation word list used in template (187 words): EN, KR, TL
- adjective word list used in template (62 words): EN
- noun word list used in template (68 words): KR, TL
template: sentences given to translators
- EN
- KR
- TL
gold_standard: reference sentences compared with the output sentences
- DE
- EN
- PT
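Since every file is plain UTF-8 text with one item per line, a simple line-based reader is enough. The sketch below illustrates this; the paths are placeholders, not the exact repo layout.

```python
# Hypothetical loader for the newline-separated word lists / templates.
# The file paths are placeholders; substitute the actual repo paths.
def load_lines(path):
    """Return the non-empty lines of a UTF-8 file, one item per line."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]

occupations_en = load_lines("word_list/occupation_EN.txt")  # assumed path
templates_en = load_lines("template/EN.txt")                # assumed path
print(len(occupations_en), len(templates_en))
```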
Both the bleu and bertScore evaluations were run on Linux with Python 3.6+.
bertScore:
You can find our example on Google Colab.
Note
- a GPU is usually necessary.
- the maximum length is limited to 510 tokens (512 after adding [CLS]/[SEP]), since we use bert-base-multilingual-cased as the default model.
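A minimal sketch of the BERTScore computation, assuming the bert_score Python package and placeholder file names (output.txt for system output, gold.txt for the gold standard); the Colab example above is the authoritative version.

```python
# Hedged sketch: the file names are placeholders, not the repo's actual paths.
from bert_score import score

def load_lines(path):
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]

candidates = load_lines("output.txt")  # translated sentences (assumed name)
references = load_lines("gold.txt")    # gold_standard sentences (assumed name)

# bert-base-multilingual-cased is the default model noted above; inputs are
# capped at 510 subword tokens (512 with [CLS]/[SEP]).
P, R, F1 = score(
    candidates,
    references,
    model_type="bert-base-multilingual-cased",
    verbose=True,
)
print(f"mean BERTScore F1: {F1.mean().item():.4f}")
```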
bleu:
Our bleu evaluation example can be found here
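A minimal corpus-BLEU sketch, assuming sacrebleu and the same placeholder file names as above; the repo's own example, linked above, is the reference version.

```python
# Hedged sketch: sacrebleu and the file names are assumptions.
import sacrebleu

def load_lines(path):
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]

hypotheses = load_lines("output.txt")  # translated sentences (assumed name)
references = load_lines("gold.txt")    # reference sentences (assumed name)

# sacrebleu expects a list of reference streams (one list per reference set).
bleu = sacrebleu.corpus_bleu(hypotheses, [references])
print(f"BLEU: {bleu.score:.2f}")
```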
*Won Ik Cho
*Jiwon Kim
Jaeyeong Yang
Nam Soo Kim
*: equal contribution
If you find this repo useful, please cite this:
@inproceedings{10.1145/3442188.3445907,
  author = {Cho, Won Ik and Kim, Jiwon and Yang, Jaeyeong and Kim, Nam Soo},
  title  = {Towards Cross-Lingual Generalization of Translation Gender Bias},
  year   = {2021},
  url    = {https://doi.org/10.1145/3442188.3445907},
}