- BERT-version Bilinear Attn Networks on VQA-Rad
⚠️ Very quick revision (done in 5 days 😅), so the overall code structure may look ugly. Thank you for your understanding; if you find any bugs, please make a PR or open an issue.
- Bilinear Attn Networks: BERT-version
- Downstream tasks: VQA-Rad
- Revised based on sarahESL/PubMedCLIP (2021)
- Explanation: the original BAN model uses a plain `nn.Embedding` initialized with GloVe 300d vectors and a GRU as the text encoder.
- We use pretrained Bio-ClinicalBERT instead.
- We train with two optimizers, because BAN and BERT require very different learning rates (a minimal sketch follows this list).
- Our experiments show that the pretrained CLIP visual encoder `RN50x4`, combined with our BERT-BAN and the preprocessed images, outperforms the original PubMedCLIP ($71.62\% \rightarrow 73.17\%$). With the original images, it achieves $72.28\%$. Note that $71.62\%$ is our reproduced score, not the score reported in the paper ($71.8\%$).
- For more details, see the 2023 MIS: Final Presentation Slides.
⚠️ It is likely that some settings could still be tuned to improve performance.
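A minimal sketch of the two ideas above, assuming the Hugging Face `transformers` library: a Bio-ClinicalBERT question encoder in place of GloVe + GRU, trained with its own small learning rate while the BAN side gets a larger one. The class names, layer sizes, optimizer choices, and learning rates are illustrative, not the repo's actual code or config values.

```python
import torch
from torch import nn
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "emilyalsentzer/Bio_ClinicalBERT"

class BertQuestionEncoder(nn.Module):
    """Token-level question features for BAN to attend over (replaces GloVe + GRU)."""
    def __init__(self):
        super().__init__()
        self.tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
        self.bert = AutoModel.from_pretrained(MODEL_NAME)

    def forward(self, questions):
        batch = self.tokenizer(questions, padding=True, truncation=True,
                               return_tensors="pt")
        return self.bert(**batch).last_hidden_state  # [batch, seq_len, 768]

encoder = BertQuestionEncoder()
ban_head = nn.Linear(768, 500)  # stand-in for the BAN fusion + answer classifier

# Two optimizers: a small learning rate for the pretrained BERT,
# a larger one for the randomly initialized BAN side.
opt_bert = torch.optim.AdamW(encoder.parameters(), lr=2e-5)
opt_ban = torch.optim.Adamax(ban_head.parameters(), lr=2e-3)

questions = ["Is there a pneumothorax?", "Which organ is shown?"]
targets = torch.tensor([0, 1])
logits = ban_head(encoder(questions).mean(dim=1))  # toy pooling in place of bilinear attention
loss = nn.functional.cross_entropy(logits, targets)
opt_bert.zero_grad(); opt_ban.zero_grad()
loss.backward()
opt_bert.step(); opt_ban.step()
```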
- From `Awenbocc/med-vqa/data` you can find the `images` and the image pickles.
- If you'd like to pickle the data from `images` on your own (a rough sketch follows this list):
  - Open `lib/utils/run.sh`.
  - Configure the `IMAGEPATH`.
  - Run the `create_resized_images.py` lines to put the new image pickles under `DATARADPATH`.
  - The VQA script reads the image pickles from your `DATARADPATH`, so be sure they are placed correctly.
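A rough sketch of what the resizing/pickling step produces, under the assumption that each pickle is a simple dict of resized image arrays. The paths, target size, and output filename here are placeholders; check `lib/utils/run.sh` and `create_resized_images.py` for the real ones.

```python
import os
import pickle

import numpy as np
from PIL import Image

IMAGEPATH = "/path/to/images"      # placeholder: your raw VQA-Rad images
DATARADPATH = "/path/to/data_rad"  # placeholder: where the VQA script looks for pickles
SIZE = 250                         # placeholder resolution

resized = {}
for name in sorted(os.listdir(IMAGEPATH)):
    img = Image.open(os.path.join(IMAGEPATH, name)).convert("RGB")
    resized[name] = np.asarray(img.resize((SIZE, SIZE)))

out_path = os.path.join(DATARADPATH, f"images{SIZE}x{SIZE}.pkl")  # assumed naming scheme
with open(out_path, "wb") as f:
    pickle.dump(resized, f)
```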
- Open/Close question type classifier:
  - This classifier is used during validation: a question is first classified as Open or Close, then sent to the corresponding answer pool for the second-stage answer classification (a usage sketch follows this list).
  - Please download and unzip `type_classifier_rad_biobert_2023Jun03-155924.pth.zip` for a pretrained type classifier. The BERT model for this type classifier checkpoint is `emilyalsentzer/Bio_ClinicalBERT`.
  - If the type classifier is corrupted (it seems that uploading it anywhere corrupts it; only `scp` avoids the issue), run `type_classifier.py` in the repo again to train a new one.
  - ⚠️ The config you pass should be the one you will use for the VQA training. Specifically, make sure the config variable `DATASET/EMBEDDER_MODEL` is consistent with the following experiments' config so that the vocab sizes match.
  - ⚠️ If you'd like to try other BERT-based models, feel free to change the config variable `DATASET/EMBEDDER_MODEL` to another Hugging Face model name, then train and use your own type classifier.
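A hedged sketch of how such an open/close classifier can route a question at validation time, assuming a plain BERT encoder with a linear head. The real model class, checkpoint layout, and label convention live in `type_classifier.py` and `lib/lngauge/classify_question.py`, so treat every name below as an assumption.

```python
import torch
from torch import nn
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "emilyalsentzer/Bio_ClinicalBERT"  # should match DATASET/EMBEDDER_MODEL

class QuestionTypeClassifier(nn.Module):
    """Hypothetical open/close head; not the repo's actual class."""
    def __init__(self):
        super().__init__()
        self.bert = AutoModel.from_pretrained(MODEL_NAME)
        self.head = nn.Linear(self.bert.config.hidden_size, 2)  # assume 0 = open, 1 = close

    def forward(self, input_ids, attention_mask):
        pooled = self.bert(input_ids=input_ids,
                           attention_mask=attention_mask).pooler_output
        return self.head(pooled)

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
clf = QuestionTypeClassifier().eval()
# If the architectures match, the released checkpoint could be loaded with e.g.:
# clf.load_state_dict(torch.load("type_classifier_rad_biobert_2023Jun03-155924.pth"))

enc = tokenizer("Is there a pneumothorax?", return_tensors="pt")
with torch.no_grad():
    is_close = clf(enc["input_ids"], enc["attention_mask"]).argmax(-1).item() == 1
# Route to the closed-answer pool if is_close, otherwise to the open-answer pool.
```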
- Create a virtual env and then `pip install -r requirements.txt`.
- Install the `torch`-series packages following Start Locally | PyTorch.
- Open a config that you'd like to use and check (a config-loading sketch follows this checklist):
  - For `TRAIN.VISION.CLIP_PATH`, download the pretrained CLIP visual encoders here. Read `SarahESL/PubMedCLIP/PubMedCLIP/README.md` for more details.
  - Change `DATASET.DATA_DIR` to your dataset's path.
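The configs appear to be yacs-style (see `lib/config/default.py`), so a quick sanity check of the fields above might look like the following. The accessor name, the config filename, and the paths are assumptions, not verified against the repo.

```python
# Assumed yacs-style config access; adjust the import to whatever
# lib/config/default.py actually exposes.
from lib.config.default import get_cfg_defaults  # assumption

cfg = get_cfg_defaults()
cfg.merge_from_file(
    "configs/qcr_pubmedclipRN50x4_ae_rad_nondeterministic_typeatt_2lrs.yaml"  # illustrative name
)
cfg.TRAIN.VISION.CLIP_PATH = "/path/to/RN50x4.pt"  # downloaded CLIP visual encoder weights
cfg.DATASET.DATA_DIR = "/path/to/data_rad"         # your dataset path
print(cfg.DATASET.EMBEDDER_MODEL)  # should match the type classifier's BERT model
```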
- Copy the essentials to this folder from `SarahESL/PubMedCLIP/QCR_PubMedCLIP` if anything is missing.
- Run `python3 main.py --cfg={config_path}`.
  - Be sure to use the modified configs, namely `configs/qcr_pubmedclip{visual_encoder_name}_ae_rad_nondeterministic_typeatt_2lrs.yaml`.
- The files changed from BAN to BERT-BAN are:
  - `configs/`
  - `lib/config/default.py`
  - `lib/BAN/multi_level_model.py`
  - `lib/lngauge/classify_question.py`
  - `lib/lngauge/language_model.py`
  - `lib/dataset/dataset_RAD_bert.py`
  - (There may be more.)
- Beware of your disk space: one model checkpoint is roughly 3.6 GB, and training stops once your disk is full.
We haven't written the test script (which is supposed to be used for creating the validation file). `main/test.py` is used for testing in the original repo, so you could modify its eval loop by following `main/train.py`; that should work. A rough sketch of such a loop follows.
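This is only the rough shape of an eval loop one might carve out of `main/train.py`; all names here (the batch layout, the model's outputs, the label convention) are placeholders for whatever `main/train.py` actually builds, not the repo's API.

```python
import torch

@torch.no_grad()
def evaluate(model, type_classifier, loader, device="cuda"):
    """Two-stage eval: classify each question as open/close, then score the
    answer predicted from the corresponding pool. Everything here is a placeholder."""
    model.eval()
    type_classifier.eval()
    correct, total = 0, 0
    for image, question, answer_idx in loader:               # placeholder batch layout
        image, question = image.to(device), question.to(device)
        q_type = type_classifier(question).argmax(-1)         # assume 0 = open, 1 = close
        open_logits, close_logits = model(image, question)    # assumed model outputs
        pred = torch.where(q_type.bool(),
                           close_logits.argmax(-1),
                           open_logits.argmax(-1))
        correct += (pred == answer_idx.to(device)).sum().item()
        total += answer_idx.size(0)
    return correct / total
```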
Make a PR or open an issue for your questions and we may (or may not) deal with it if we find time.