This repository contains an unofficial BERT implementation using the PyTorch framework.
- To build the vocabulary, run the following command:

  ```bash
  python build_vocab.py
  ```
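  A minimal sketch of what such a vocabulary-building step might look like, assuming a whitespace-tokenized corpus at `data/corpus.txt` and a token-per-line `vocab.txt` output (the paths, flags, and function name are illustrative, not taken from this repo):

  ```python
  # Illustrative sketch only -- the actual build_vocab.py may differ.
  from collections import Counter

  SPECIAL_TOKENS = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]"]  # standard BERT special tokens

  def build_vocab(corpus_path="data/corpus.txt", vocab_path="vocab.txt", min_freq=1):
      # Count whitespace-separated tokens in the corpus.
      counter = Counter()
      with open(corpus_path, encoding="utf-8") as f:
          for line in f:
              counter.update(line.strip().split())
      # Write special tokens first, then corpus tokens by descending frequency.
      with open(vocab_path, "w", encoding="utf-8") as f:
          for token in SPECIAL_TOKENS:
              f.write(token + "\n")
          for token, freq in counter.most_common():
              if freq >= min_freq:
                  f.write(token + "\n")

  if __name__ == "__main__":
      build_vocab()
  ```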
- To pretrain the BERT model, run the following command with options:

  ```bash
  python main.py \
      --mode MODE \
      --max_len MAX_LEN \
      --max_pred MAX_PRED \
      --num_layers NUM_LAYERS \
      --num_heads NUM_HEADS \
      --num_segments NUM_SEGMENTS \
      --hidden_dim HIDDEN_DIM \
      --ffn_dim FFN_DIM \
      --dropout DROPOUT
  ```
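  For example, an invocation using BERT-Base-style hyperparameters (the values below, and the `pretrain` mode name, are illustrative placeholders rather than settings taken from this repo):

  ```bash
  python main.py \
      --mode pretrain \
      --max_len 512 \
      --max_pred 20 \
      --num_layers 12 \
      --num_heads 12 \
      --num_segments 2 \
      --hidden_dim 768 \
      --ffn_dim 3072 \
      --dropout 0.1
  ```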
- Finish `build_iter` and `make_instance` logic on large datasets
- Apply WordPiece tokenization (see the sketch after this list)
- Add a fine-tuning stage
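A minimal sketch of greedy longest-match-first WordPiece tokenization, the standard algorithm behind the TODO item above; the `##` continuation prefix and the toy vocabulary are illustrative, and the repo's eventual implementation may differ:

```python
# Illustrative WordPiece sketch (greedy longest-match-first); not this repo's implementation.
def wordpiece_tokenize(word, vocab, unk_token="[UNK]", max_chars=100):
    # Very long words map directly to the unknown token.
    if len(word) > max_chars:
        return [unk_token]
    tokens, start = [], 0
    while start < len(word):
        end, piece = len(word), None
        # Find the longest substring starting at `start` that is in the vocabulary.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces carry the '##' prefix
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk_token]  # no sub-word matched; the whole word becomes [UNK]
        tokens.append(piece)
        start = end
    return tokens

# Toy example: "unaffable" -> ['un', '##aff', '##able']
toy_vocab = {"un", "##aff", "##able"}
print(wordpiece_tokenize("unaffable", toy_vocab))
```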