A simple PyTorch implementation of the Performer.
The Performer approximates the softmax kernel with random feature maps; its attention module is intended as a drop-in replacement for the Transformer's dot-product self-attention.
Two kernel transformations are provided:
- `kernel_transformation=softmax_kernel_transformation`
- `kernel_transformation=relu_kernel_transformation`
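As a rough sketch of what these two transformations compute (the function names below are illustrative, not this repo's actual API): the softmax kernel `exp(q·k)` is approximated with positive random features `exp(wᵀx − ‖x‖²/2)/√m`, and once queries and keys are feature-mapped, attention can be computed in time linear in sequence length.

```python
import torch

def softmax_kernel_features(x, projection, eps=1e-6):
    """Positive random features approximating the softmax kernel.

    x: (..., d) queries or keys; projection: (m, d), rows drawn from N(0, I).
    Returns (..., m) features with E[phi(q) . phi(k)] ~= exp(q . k).
    """
    m = projection.shape[0]
    proj_x = x @ projection.t()                       # w^T x, shape (..., m)
    sq_norm = (x ** 2).sum(dim=-1, keepdim=True) / 2  # ||x||^2 / 2
    # eps keeps the features strictly positive for a stable normalizer
    return torch.exp(proj_x - sq_norm) / m ** 0.5 + eps

def relu_kernel_features(x, projection, eps=1e-6):
    """ReLU random features: a cheaper, non-softmax kernel variant."""
    return torch.relu(x @ projection.t()) / projection.shape[0] ** 0.5 + eps

def linear_attention(q, k, v):
    """Attention via feature-mapped q, k: O(n) instead of O(n^2) in length.

    q, k: (n, m) feature-mapped queries/keys; v: (n, d_v) values.
    """
    kv = k.t() @ v                       # (m, d_v): aggregate keys with values
    z = 1.0 / (q @ k.sum(dim=0))         # (n,): per-query normalizer
    return (q @ kv) * z.unsqueeze(-1)    # (n, d_v)
```

Because each row of the implicit attention matrix is normalized to sum to one, feeding constant values `v` returns those constants back, which is a handy sanity check for any implementation.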
Pretrain a masked language model.
- Pretrain script: `/example/train_mlm.py`
- Config file: `/example/config.json`
① Prepare the dataset and vocab you want to train on.
② Check the configuration in `config.json`.
③ Run `/example/train_mlm.py`.
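The exact schema of `config.json` is repo-specific, but a minimal MLM pretraining config might look like the sketch below. Every key here is an illustrative assumption, not this repo's actual schema; check `/example/config.json` for the real field names.

```json
{
  "vocab_path": "vocab.txt",
  "kernel_transformation": "softmax_kernel_transformation",
  "num_random_features": 256,
  "hidden_size": 512,
  "num_layers": 6,
  "num_heads": 8,
  "max_seq_len": 512,
  "batch_size": 32,
  "learning_rate": 1e-4,
  "mlm_mask_prob": 0.15
}
```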
- Performer performance test
- Write test examples
- Apply Performer attention to a language model
- Evaluate the language model