- This is a personal repository for studying the Transformer model (“Attention Is All You Need”).
- This includes minor experiments.
- Due to computing resource limitations, a smaller dataset (“Multi30K”) and a smaller model size are used compared to the original paper.
- The `spacy` and `torchtext` libraries are used to generate tokens and vocabulary.
- Requirements
- torch (v2.3.0)
- torchtext
- datasets
- spacy
- wandb (optional)
To use the `spacy` tokenizer, you need to download the language pipelines manually:
python -m spacy download en_core_web_sm
python -m spacy download de_core_news_sm
- Dataset: Multi30K
- Vocabulary size: English 5,893 tokens, German 7,853 tokens.
- The vocabulary is built with a minimum token frequency of 2.
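
The minimum-frequency rule above can be sketched in plain Python. This is only an illustration, not the repository's code: the function names `build_vocab` and `encode` and the special tokens other than `<unk>` are hypothetical, and the real project builds its vocabulary with `torchtext`.

```python
from collections import Counter

# Hypothetical special tokens; only <unk> is confirmed by the sample outputs.
SPECIALS = ["<unk>", "<pad>", "<sos>", "<eos>"]


def build_vocab(tokenized_sentences, min_freq=2, specials=SPECIALS):
    """Map every token seen at least `min_freq` times to an integer id."""
    counts = Counter(tok for sent in tokenized_sentences for tok in sent)
    itos = list(specials)
    # Sort by descending frequency, then alphabetically, for a deterministic order.
    for tok, freq in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if freq >= min_freq and tok not in specials:
            itos.append(tok)
    return {tok: idx for idx, tok in enumerate(itos)}


def encode(tokens, vocab):
    """Replace tokens missing from the vocabulary with the <unk> id."""
    unk = vocab["<unk>"]
    return [vocab.get(tok, unk) for tok in tokens]


corpus = [
    ["a", "man", "in", "an", "orange", "hat"],
    ["a", "man", "with", "a", "hat"],
]
vocab = build_vocab(corpus)
ids = encode(["a", "man", "in", "a", "hat"], vocab)
print(ids)  # "in" occurs only once in the corpus, so it maps to <unk>
```

Tokens that fall below the threshold are the source of the `<unk>` symbols that can appear in translated output.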
- Model
- `d_model`=256, `d_ff`=512, `n_layers`=3, `n_heads`=8
- Number of model parameters: ~9M
- `residual_drop`=0.1, `embedding_drop`=0.1, `attention_drop`=0.0
- The paper does not apply dropout to the attention weights, but doing so is now common.
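
As a rough sanity check on the ~9M figure, the parameter count can be estimated by hand. The sketch below assumes a standard encoder-decoder layout with biases on every linear layer, separate source and target embeddings, an untied output projection, and fixed (parameter-free) positional encodings; the repository's actual layout may differ, so treat the result as an estimate only.

```python
def attention_params(d_model):
    # Query, key, value, and output projections, each with a bias.
    return 4 * (d_model * d_model + d_model)


def ffn_params(d_model, d_ff):
    # Two linear layers: d_model -> d_ff -> d_model, each with a bias.
    return (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)


def layer_norm_params(d_model):
    # Scale and shift vectors.
    return 2 * d_model


def estimate_params(src_vocab, tgt_vocab, d_model, d_ff, n_layers):
    enc_layer = (attention_params(d_model) + ffn_params(d_model, d_ff)
                 + 2 * layer_norm_params(d_model))
    # A decoder layer has self-attention plus cross-attention.
    dec_layer = (2 * attention_params(d_model) + ffn_params(d_model, d_ff)
                 + 3 * layer_norm_params(d_model))
    embeddings = (src_vocab + tgt_vocab) * d_model
    output_proj = d_model * tgt_vocab + tgt_vocab
    return n_layers * (enc_layer + dec_layer) + embeddings + output_proj


# English -> German: 5,893 source tokens and 7,853 target tokens.
total = estimate_params(src_vocab=5893, tgt_vocab=7853,
                        d_model=256, d_ff=512, n_layers=3)
print(f"{total:,}")  # 9,490,861
```

Under these assumptions `n_heads` does not change the count, because the heads split the same `d_model`-wide projections.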
- Training:
- `batch_size`=128, `optimization`='AdamW', `lr`=0.0005, `weight_decay`=5e-4, `label_smoothing`=0.1
- I additionally used an EMA (Exponential Moving Average) of the model weights.
- For further details, see the default arguments in `config.py`.
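
The EMA of the model weights mentioned above follows a simple update rule, sketched here on plain floats so it runs without any dependencies. The class name `EMA` and the decay value are hypothetical; the repository's implementation and its decay setting are not shown in this README.

```python
class EMA:
    """Keep an exponential moving average ("shadow" copy) of named weights."""

    def __init__(self, params, decay=0.999):
        self.decay = decay
        self.shadow = dict(params)  # start from the current weights

    def update(self, params):
        # shadow <- decay * shadow + (1 - decay) * current
        for name, value in params.items():
            self.shadow[name] = (self.decay * self.shadow[name]
                                 + (1.0 - self.decay) * value)


weights = {"w": 0.0}
ema = EMA(weights, decay=0.9)  # small decay only to make the effect visible
for step in range(3):
    weights["w"] += 1.0  # stand-in for an optimizer step
    ema.update(weights)

print(weights["w"], ema.shadow["w"])
```

With real models the same update would be applied to each parameter tensor after every optimizer step, and the shadow weights would be used for evaluation.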
- Base model: `ex-01`
- Use learned positional embeddings instead of sinusoidal positional encoding: `ex-02`
- Dropout on attention weights: `ex-03`
- German to English translation: `ex-04`
- No label smoothing: `ex-05`
- Increased label smoothing: `ex-06`
- Adjusted number of heads: `ex-07` (8 -> 4), `ex-08` (8 -> 16)
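
For reference, the sinusoidal positional encoding that the base model uses (and that `ex-02` replaces) is defined in the paper as PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)). Below is a minimal pure-Python sketch of that formula; the function name is hypothetical, and the repository presumably computes this with `torch` tensors instead.

```python
import math


def sinusoidal_positional_encoding(max_len, d_model):
    """Return a max_len x d_model table of fixed positional encodings."""
    pe = [[0.0] * d_model for _ in range(max_len)]
    for pos in range(max_len):
        # `i` walks over the even dimensions, so i / d_model equals 2k / d_model.
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)
    return pe


pe = sinusoidal_positional_encoding(max_len=4, d_model=8)
print(pe[0])  # position 0: sin(0) = 0 on even dims, cos(0) = 1 on odd dims
```

A learned alternative would instead look up one trainable vector per position, which adds `max_len * d_model` parameters.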
Train loss | Validation loss | Validation loss (ema) |
---|---|---|
- BLEU scores are calculated for both the model with the best validation loss and the EMA model.
BLEU Score (best valid model) | BLEU Score (ema) |
---|---|
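
To make the metric concrete, here is a minimal sketch of corpus-level BLEU with clipped n-gram precision up to 4-grams, uniform weights, a brevity penalty, and no smoothing. It assumes a single reference per hypothesis and is not the scoring code used for the table above; the function names are hypothetical, and a library implementation may differ in tokenization and smoothing.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(references, hypotheses, max_n=4):
    matches = [0] * max_n
    totals = [0] * max_n
    ref_len = hyp_len = 0
    for ref, hyp in zip(references, hypotheses):
        ref_len += len(ref)
        hyp_len += len(hyp)
        for n in range(1, max_n + 1):
            ref_counts = ngrams(ref, n)
            hyp_counts = ngrams(hyp, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g])
                                  for g, c in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())
    if min(matches) == 0:
        return 0.0  # without smoothing, any zero precision gives BLEU = 0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize hypotheses shorter than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1.0 - ref_len / hyp_len)
    return bp * math.exp(log_precision)


reference = "ein mann mit einem orangefarbenen hut , der etwas anstarrt .".split()
hypothesis = "ein mann mit einem orangefarbenen hut starrt etwas an .".split()
score = corpus_bleu([reference], [hypothesis])
print(round(100 * score, 2))
```

The example pair is the ground truth and the `ex-01` (best val) output from the samples in this README; a score on a single sentence is only illustrative, since BLEU is meant to be computed over the whole test set.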
See live charts in wandb project.
# English to German (ex-01~03)
Ground Truth: ein mann mit einem orangefarbenen hut , der etwas anstarrt .
ex-01(best val): ein mann mit einem orangefarbenen hut starrt etwas an .
ex-01(ema): ein mann mit orangefarbener mütze starrt auf etwas .
ex-02(best val): ein mann mit einem orangefarbenen hut starrt etwas an .
ex-02(ema): ein mann mit einem orangefarbenen hut starrt auf etwas zu .
ex-03(best val): ein mann mit einem orangefarbenen hut starrt auf etwas .
ex-03(ema): ein mann mit einem orangefarbenen hut starrt auf etwas .
# German to English (ex-04)
Ground Truth: a man in an orange hat starring at something .
ex-04(best val): <unk> with an orange plastic <unk> <unk> something .
ex-04(ema): <unk> with an orange <unk> cutting something to something .