This repository accompanies the paper *Exploring the Trade-Offs: Unified Large Language Models vs. Local Fine-Tuned Models for Highly-Specific Radiology NLI Task* and contains the code and dataset needed to reproduce our results.
The paper is under revision at IEEE Transactions on Big Data (TBD).
Fig. 1: Overview of our workflow. (a) Top panel: conversion of the RadQA dataset into the RadQNLI dataset; the highlighted sentence in the context contains the answer to the question. (b) Bottom panel: use of ChatGPT to perform the Natural Language Inference (NLI) task on the generated RadQNLI dataset.

We use the litgpt repository for evaluating the LLaMA family:

```shell
pip install 'litgpt[all]'
```
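As a rough illustration of the NLI step in panel (b), the sketch below assembles an entailment-style prompt from a question and a report sentence. The template wording, label set, and function name are illustrative assumptions, not the exact prompt used in the paper.

```python
# Hypothetical prompt template for the RadQNLI entailment task.
# Wording and labels are assumptions for illustration only.
def build_nli_prompt(question: str, sentence: str) -> str:
    """Ask the model whether the report sentence answers the question."""
    return (
        "Determine whether the sentence from a radiology report "
        "contains the answer to the question.\n"
        f"Question: {question}\n"
        f"Sentence: {sentence}\n"
        "Answer with 'entailment' or 'not entailment'."
    )

prompt = build_nli_prompt(
    "Is there evidence of pneumonia?",
    "The lungs are clear without focal consolidation.",
)
print(prompt)
```

The resulting string would then be sent to the model (e.g., via the ChatGPT API) and the label parsed from its reply.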
See notebooks
See data. You can also download the dataset from Google Drive.
The rouge_02 variant has been filtered to exclude pairs whose question and context sentence have low lexical overlap (ROUGE score below 0.2).
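The filtering step can be sketched as below, using a simple unigram-overlap F1 as a ROUGE-1 proxy; the actual repository may use a different ROUGE variant or library, so treat the metric and function names here as assumptions.

```python
from collections import Counter

def rouge1_f1(question: str, sentence: str) -> float:
    """Unigram-overlap F1 between two strings (a simple ROUGE-1 proxy)."""
    q = Counter(question.lower().split())
    s = Counter(sentence.lower().split())
    overlap = sum((q & s).values())  # shared unigram count
    if overlap == 0:
        return 0.0
    precision = overlap / sum(s.values())
    recall = overlap / sum(q.values())
    return 2 * precision * recall / (precision + recall)

def filter_pairs(pairs, threshold=0.2):
    """Keep (question, sentence) pairs with enough lexical overlap."""
    return [p for p in pairs if rouge1_f1(p[0], p[1]) >= threshold]
```

Pairs below the 0.2 threshold are dropped, which removes question–sentence pairs that share almost no vocabulary.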
The models' responses are available at results. The file-name suffixes indicate the prompting setting:
- `0e`: zero-shot.
- `0ecot`: zero-shot with chain-of-thought (CoT).
- `10e`: few-shot (10 shots).
- `10ecot`: few-shot (10 shots) with CoT.
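Assuming result files follow a pattern like `<model>_<suffix>.json` (the naming pattern is an assumption for illustration), a small helper can decode the setting from a file name:

```python
# Map a results-file suffix to (shot setting, uses chain-of-thought).
# The "<model>_<suffix>.json" pattern is an assumed convention.
SUFFIXES = {
    "0e": ("zero-shot", False),
    "0ecot": ("zero-shot", True),
    "10e": ("few-shot (10 shots)", False),
    "10ecot": ("few-shot (10 shots)", True),
}

def decode_setting(filename: str):
    """Return (shot setting, uses CoT) for a results file name."""
    stem = filename.rsplit(".", 1)[0]       # drop the extension
    suffix = stem.rsplit("_", 1)[-1]        # take the last "_"-separated part
    return SUFFIXES[suffix]
```

For example, `decode_setting("chatgpt_10ecot.json")` yields the few-shot CoT setting.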
If you find this work or code useful, please cite:

```
@article{wu2023exploring,
  title={Exploring the trade-offs: Unified large language models vs local fine-tuned models for highly-specific radiology nli task},
  author={Wu, Zihao and Zhang, Lu and Cao, Chao and Yu, Xiaowei and Dai, Haixing and Ma, Chong and Liu, Zhengliang and Zhao, Lin and Li, Gang and Liu, Wei and others},
  journal={arXiv preprint arXiv:2304.09138},
  year={2023}
}
```