# CaseEncoder: A Knowledge-enhanced Pre-trained Model for Legal Case Encoding
Please unzip all `.blk.zip` files in the `data` directory.
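If you prefer to script this step, a minimal sketch is below; it assumes the archives sit directly under `data/` and extracts each one in place (the helper name is ours, not part of the repo):

```python
import glob
import os
import zipfile


def unzip_blk_archives(data_dir: str = "data") -> list[str]:
    """Extract every *.blk.zip archive found in data_dir, in place.

    Returns the list of extracted member file names.
    """
    extracted = []
    for path in sorted(glob.glob(os.path.join(data_dir, "*.blk.zip"))):
        with zipfile.ZipFile(path) as zf:
            zf.extractall(data_dir)
            extracted.extend(zf.namelist())
    return extracted


if __name__ == "__main__":
    print(unzip_blk_archives())
```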
A few parameters in `config/Caseformer.config` need to be specified:

- `train_data_path` and `valid_data_path` in `config/Caseformer.config` are the paths of the pre-training and validation datasets, which are not included in this repo due to the space limit. However, we provide exactly the same checkpoint of CaseEncoder reported in our paper here.
- `test_kara_dataset` is the test dataset you would like to use. We provide three choices: `lecard`, `cail-lcr21`, and `cail-lcr22`.
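For orientation, the relevant entries might look like the following sketch. This is an illustrative fragment only: the section name and paths are placeholders, not values shipped with the repo.

```ini
; Illustrative fragment of config/Caseformer.config;
; section name and paths are placeholders.
[data]
train_data_path = /path/to/pretrain_dataset
valid_data_path = /path/to/valid_dataset
test_kara_dataset = lecard
```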
Training from scratch:

```shell
torchrun --standalone --nnodes=1 --nproc_per_node=YOUR_GPU_NUMBER train.py --config config/Caseformer.config --gpu YOUR_GPU_LIST 2>&1 | tee -a log/Caseformer.log
```
Training from a checkpoint:

```shell
torchrun --standalone --nnodes=1 --nproc_per_node=YOUR_GPU_NUMBER train.py --checkpoint YOUR_CHECKPOINT_PATH --config config/Caseformer.config --gpu YOUR_GPU_LIST 2>&1 | tee -a log/Caseformer.log
```
To validate the checkpoint of CaseEncoder:

```shell
torchrun --standalone --nnodes=1 --nproc_per_node=1 test.py --checkpoint YOUR_CHECKPOINT_PATH --config config/Caseformer.config --gpu 0 --result YOUR_RESULT_STORAGE_PATH
```
- where `YOUR_CHECKPOINT_PATH` is the path of the CaseEncoder checkpoint you downloaded.