Top 2% solution (36/2060) based on a baseline model developed by Abhishek Thakur.
- Python 3.9.7
- Pytorch 1.10.1
- Transformers 4.15.0
- The solution is an ensemble of 5 transformer models, each trained on 5 folds: 2x `deberta-large`, 2x `deberta-v3-large` and 1x `longformer-large`. The two versions of the deberta models were trained with `dropout=0.1` and `dropout=0.15` and `max_len=1024`, while the longformer-large model was trained with `dropout=0.1` and `max_len=1536`.
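As context for these settings, here is a minimal sketch of how the per-model dropout and max_len might be wired into the transformers config. Treating `dropout` as the `hidden_dropout_prob`/`attention_probs_dropout_prob` config fields is an assumption about the training code, and the local paths are illustrative.

```python
from transformers import AutoConfig, AutoModel, AutoTokenizer

# Hypothetical per-model settings mirroring the ensemble description above.
MODEL_SPECS = [
    {"path": "./model/deberta-large",    "dropout": 0.10, "max_len": 1024},
    {"path": "./model/deberta-large",    "dropout": 0.15, "max_len": 1024},
    {"path": "./model/deberta-v3-large", "dropout": 0.10, "max_len": 1024},
    {"path": "./model/deberta-v3-large", "dropout": 0.15, "max_len": 1024},
    {"path": "./model/longformer-large", "dropout": 0.10, "max_len": 1536},
]

def build_backbone(spec):
    # Assumption: dropout is applied through the standard config fields.
    config = AutoConfig.from_pretrained(spec["path"])
    config.hidden_dropout_prob = spec["dropout"]
    config.attention_probs_dropout_prob = spec["dropout"]
    tokenizer = AutoTokenizer.from_pretrained(spec["path"])
    model = AutoModel.from_pretrained(spec["path"], config=config)
    return tokenizer, model, spec["max_len"]
```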
- Download the tez pytorch trainer library and put it at the root level.
- Put the `./data` directory at the root level and unzip the files downloaded from Kaggle there.
- In order to use deberta v2 or v3, you need to patch the transformers library to create a new fast tokenizer, using the data and instructions from this kaggle dataset.
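A quick sanity check one might run after patching, to confirm the fast tokenizer is actually picked up; the local path is an assumption about where the deberta-v3-large files are saved.

```python
from transformers import AutoTokenizer

# Path is illustrative; point it at wherever the deberta-v3-large files live.
tokenizer = AutoTokenizer.from_pretrained("./model/deberta-v3-large", use_fast=True)
assert tokenizer.is_fast, "deberta-v2/v3 fast tokenizer patch is not active"
```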
- Download `microsoft/deberta-large`, `microsoft/deberta-v3-large` and `allenai/longformer-large` (or any other transformer models) using nbs/download_model.ipynb and save them in the `./model` folder (see the sketch after this list).
- Create 5 training folds using nbs/creating_folds.ipynb.
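Referring to the download step above, a minimal sketch of what nbs/download_model.ipynb might do, assuming the standard transformers `save_pretrained` workflow; the longformer Hub id and the local folder names are assumptions for illustration.

```python
from transformers import AutoConfig, AutoModel, AutoTokenizer

# Hub checkpoints to fetch and the local folders to save them under.
# The longformer Hub id and all local paths are assumptions for illustration.
MODELS = {
    "microsoft/deberta-large": "./model/deberta-large",
    "microsoft/deberta-v3-large": "./model/deberta-v3-large",
    "allenai/longformer-large-4096": "./model/longformer-large",
}

for hub_id, local_dir in MODELS.items():
    AutoConfig.from_pretrained(hub_id).save_pretrained(local_dir)
    AutoTokenizer.from_pretrained(hub_id).save_pretrained(local_dir)
    AutoModel.from_pretrained(hub_id).save_pretrained(local_dir)
```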
Please make sure you run the training script from the parent directory of `./bin`:

$ sh ./bin/train.sh
To train different models on different folds (0...4), make changes inside the `train.sh` file (see the sketch below). The training of each fold should fit into 15GB of GPU memory.
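For orientation, a hypothetical Python equivalent of the loop that `train.sh` might drive; the `train.py` entry point and its flags (`--model`, `--fold`, `--dropout`, `--max_len`) are illustrative, not the actual script's interface.

```python
import subprocess

# Hypothetical: launch one training run per (model, fold) combination.
# Flag names and the entry point are illustrative; adapt to the real train.sh.
SPECS = [
    ("./model/deberta-large",    0.10, 1024),
    ("./model/deberta-large",    0.15, 1024),
    ("./model/deberta-v3-large", 0.10, 1024),
    ("./model/deberta-v3-large", 0.15, 1024),
    ("./model/longformer-large", 0.10, 1536),
]

for model_path, dropout, max_len in SPECS:
    for fold in range(5):
        subprocess.run(
            [
                "python", "train.py",
                "--model", model_path,
                "--fold", str(fold),
                "--dropout", str(dropout),
                "--max_len", str(max_len),
            ],
            check=True,
        )
```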
- For testing model ensembling per fold, use ensemble_inference_oof.ipynb (a sketch of the averaging is shown below).
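A minimal sketch of the per-fold ensembling idea, assuming each model produces probability arrays of identical shape for the same fold; the notebook remains the reference and the weighting scheme here is illustrative.

```python
import numpy as np

def ensemble_fold_predictions(prob_arrays, weights=None):
    """Weighted average of per-model probability arrays for a single fold.

    prob_arrays: list of np.ndarray, all with identical shape,
                 e.g. (num_tokens, num_classes) produced by each model.
    weights: optional list of floats, one per model; defaults to a simple mean.
    """
    stacked = np.stack(prob_arrays, axis=0)
    if weights is None:
        return stacked.mean(axis=0)
    w = np.asarray(weights, dtype=np.float64)
    w = w / w.sum()
    return np.tensordot(w, stacked, axes=1)

# Illustrative usage: blend three models' out-of-fold predictions for one fold.
# preds_deberta, preds_deberta_v3, preds_longformer = ...  # loaded from disk
# blended = ensemble_fold_predictions([preds_deberta, preds_deberta_v3, preds_longformer])
```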
- This kaggle kernel was used for the final submission.