This repo contains our code and configurations for the Kaggle - LLM Science Exam competition. A detailed summary of the solution is posted here. Please refer to the following sections for details on training and dependencies.
Models were trained on computing resources from Jarvislabs.ai, using the following instance:
- Ubuntu 20.04.5 LTS (128 GB boot disk)
- Intel(R) Xeon(R) Silver 4216 CPU @ 2.10GHz (7 vCPUs)
- 1 x NVIDIA A100 40GB GPU or 1 x NVIDIA A6000 48GB GPU
I used the PyTorch 2.0 image from Jarvislabs.ai, which comes with:
- Python 3.10.11
- CUDA 11.8
Install the required Python packages with:
pip install -r requirements.txt
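Before training, it can help to confirm that the environment matches the versions listed above. The following is a minimal sketch, assuming PyTorch has been installed from requirements.txt; the helper names and the GPU memory threshold (based on the smaller of the two listed GPUs) are assumptions, not requirements stated by the repository.

```python
import sys

# Versions listed for the Jarvislabs.ai PyTorch 2.0 image above.
EXPECTED_PYTHON = (3, 10)
EXPECTED_CUDA = "11.8"
# Assumption: treat the smaller listed GPU (A100 40GB) as the memory floor.
# Reported totals are usually a bit below the nominal size, so allow slack.
MIN_GPU_GIB = 38


def python_matches(version_info=sys.version_info) -> bool:
    """True if the interpreter's major.minor version is the expected one."""
    return tuple(version_info[:2]) == EXPECTED_PYTHON


def gpu_report():
    """Return (cuda_build, gpu_name, gpu_mem_gib); fields are None if unavailable."""
    try:
        import torch  # installed via requirements.txt
    except ImportError:
        return None, None, None
    cuda = torch.version.cuda  # CUDA version PyTorch was built against
    if not torch.cuda.is_available():
        return cuda, None, None
    props = torch.cuda.get_device_properties(0)
    return cuda, props.name, props.total_memory / 1024**3


if __name__ == "__main__":
    print("Python", sys.version.split()[0], "OK" if python_matches() else "(expected 3.10)")
    cuda, name, mem_gib = gpu_report()
    print("PyTorch CUDA build:", cuda, "OK" if cuda == EXPECTED_CUDA else f"(expected {EXPECTED_CUDA})")
    if name is None:
        print("No CUDA GPU visible")
    else:
        status = "OK" if mem_gib >= MIN_GPU_GIB else "may be too small"
        print(f"GPU: {name} ({mem_gib:.1f} GiB) {status}")
```

A mismatch is not necessarily fatal, but it is a likely first suspect if results differ from the reported ones.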
Please make sure the Kaggle API is installed, then run the following script to download the required datasets:
chmod +x ./setup.sh
./setup.sh
Please note that the script creates a datasets folder one level above the current directory (i.e. ../datasets) and downloads the external datasets into it.
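Because the datasets folder is placed relative to the directory the script is run from, running it from the wrong place puts the data somewhere the training code may not look. A minimal sketch of a location check, assuming `./setup.sh` is run from the repository root; the helper `datasets_dir` is hypothetical and not part of the repository.

```python
from pathlib import Path


def datasets_dir(repo_dir: str = ".") -> Path:
    """Return the datasets folder that setup.sh creates one level above repo_dir."""
    return Path(repo_dir).resolve().parent / "datasets"


if __name__ == "__main__":
    target = datasets_dir()
    if target.is_dir():
        n_entries = sum(1 for _ in target.iterdir())
        print(f"Found {target} with {n_entries} entries")
    else:
        print(f"{target} not found; run ./setup.sh from the repository root first")
```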
To train the models, run the following commands:
python ./code/train_e_topic.py \
--config-name conf_e_topic_bge \
use_wandb=false \
all_data=false
python ./code/train_e_ranker.py \
--config-name conf_e_ranker \
use_wandb=false \
all_data=false
python ./code/train_r_delta.py \
--config-name conf_r_delta_k1 \
use_wandb=false \
all_data=false
python ./code/train_r_delta.py \
--config-name conf_r_delta_k2_resumed \
use_wandb=false \
all_data=false
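The four training commands can also be driven from a single script. This is a hypothetical sketch, not part of the repository: it rebuilds the same commands (which appear to use Hydra-style `--config-name` plus `key=value` overrides) and only prints them unless `--run` is passed. Running the stages in the listed order is an assumption, suggested by the `_resumed` suffix on the last config.

```python
import subprocess
import sys

# The four training runs listed above, in the same order (order is assumed).
STAGES = [
    ("train_e_topic.py", "conf_e_topic_bge"),
    ("train_e_ranker.py", "conf_e_ranker"),
    ("train_r_delta.py", "conf_r_delta_k1"),
    ("train_r_delta.py", "conf_r_delta_k2_resumed"),
]

# Overrides passed to every command above.
OVERRIDES = ["use_wandb=false", "all_data=false"]


def build_commands(python: str = sys.executable) -> list[list[str]]:
    """Build one argv list per stage, mirroring the shell commands."""
    return [
        [python, f"./code/{script}", "--config-name", config, *OVERRIDES]
        for script, config in STAGES
    ]


def run_all(dry_run: bool = True) -> None:
    """Print each command; execute it too unless dry_run is True."""
    for cmd in build_commands():
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError and stops at the first failing stage.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(dry_run="--run" not in sys.argv)
```

For example, saving this as a file such as run_all.py (a hypothetical name) in the repository root, `python run_all.py` prints the four commands, and `python run_all.py --run` executes them in sequence.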