Official Implementation of "Surgical-VQLA++: Adversarial Contrastive Learning for Calibrated Robust Visual Question Localized-Answering in Robotic Surgery"

longbai1006/Surgical-VQLAPlus

Surgical-VQLA++: Adversarial Contrastive Learning for Calibrated Robust VQLA in Robotic Surgery

Long Bai*, Guankun Wang*, Mobarakol Islam*, Lalithkumar Seenivasan, An Wang and Hongliang Ren


Environment

  • PyTorch
  • numpy
  • pandas
  • scipy
  • scikit-learn
  • timm
  • transformers
  • h5py
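The list above does not pin any versions. The following minimal sketch is not part of the repository. It reports which of the listed packages cannot be imported in the current environment, so it checks presence only and not versions. The helper name `missing_packages` is hypothetical. The package-to-module mapping (`torch` for PyTorch, `sklearn` for scikit-learn) follows each package's standard import name.

```python
import importlib.util

# Package names as listed above, mapped to the module name used in `import`.
REQUIRED_MODULES = {
    "PyTorch": "torch",
    "numpy": "numpy",
    "pandas": "pandas",
    "scipy": "scipy",
    "scikit-learn": "sklearn",
    "timm": "timm",
    "transformers": "transformers",
    "h5py": "h5py",
}


def missing_packages(required=REQUIRED_MODULES):
    """Return the package names whose top-level module cannot be found."""
    return [name for name, module in required.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed packages are importable.")
```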

Directory Setup

In this project, we implement our method using the PyTorch library. The directory structure is as follows:

  • checkpoints/: Contains trained weights.
  • dataset/
    • bertvocab/
      • v2: BERT tokenizer
    • EndoVis-18-VQLA/: Each sequence folder follows the same folder structure.
      • seq_1:
        • left_frames: Image frames for each sequence, which can be downloaded from the EndoVis18 challenge.
        • vqla
          • label: Q&A pairs and bounding box label.
          • img_features: Image features extracted from each frame with different patch sizes.
            • 5x5: img_features extracted with a patch size of 5x5 by ResNet18.
            • frcnn: img_features extracted by Fast-RCNN and ResNet101.
      • ....
      • seq_16
    • EndoVis-17-VQLA/: 97 frames selected from the EndoVis17 challenge for external validation.
      • left_frames
      • vqla
        • label: Q&A pairs and bounding box label.
        • img_features: Image features extracted from each frame with different patch sizes.
          • 5x5: img_features extracted with a patch size of 5x5 by ResNet18.
          • frcnn: img_features extracted by Fast-RCNN and ResNet101.
  • models/:
    • CATViLEmbedding.py: our proposed model for the VQLA task.
    • DeiTPrediction.py: DeiT encoder-based model for the VQLA task.
    • VisualBertResMLP.py: VisualBERT ResMLP encoder from Surgical-VQA.
    • visualBertPrediction.py: VisualBERT encoder-based model for the VQLA task.
    • VisualBertResMLPPrediction.py: VisualBERT ResMLP encoder-based model for the VQLA task.
  • dataloader.py
  • train.py
  • utils.py
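The feature folders must sit where the data loader expects them. A small check like the following can catch a misplaced download before training starts. This is a hypothetical helper and is not part of dataloader.py. It only tests for the directories shown in the layout above, because the file names inside them are not documented here. The caller passes in the list of sequence folders instead of the helper assuming it. For EndoVis-17-VQLA, which has no sequence sub-folders, pass `None` to check the dataset root directly.

```python
from pathlib import Path

# Sub-directories expected in each sequence folder, following the layout above.
# Only directories are checked; feature file names are not documented here.
EXPECTED_SUBDIRS = (
    "left_frames",
    "vqla/label",
    "vqla/img_features/5x5",
    "vqla/img_features/frcnn",
)


def missing_dirs(dataset_root, sequences=None):
    """Return the expected sub-directories that do not exist.

    `sequences` lists sequence folder names such as "seq_1"; pass None for a
    dataset (like EndoVis-17-VQLA) whose folders sit directly under the root.
    """
    root = Path(dataset_root)
    seq_dirs = [root] if sequences is None else [root / s for s in sequences]
    return [d / sub for d in seq_dirs for sub in EXPECTED_SUBDIRS
            if not (d / sub).is_dir()]


if __name__ == "__main__":
    # Hypothetical example: check two EndoVis-18 sequence folders.
    for path in missing_dirs("dataset/EndoVis-18-VQLA", ["seq_1", "seq_16"]):
        print("missing:", path)
```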

Dataset

EndoVis17/18-VQLA-Extended.


Run training

  • Train on EndoVis-18-VQLA-Extended
    python train.py --checkpoint_dir /CHECKPOINT_PATH/ --transformer_ver cat --batch_size 32 --epochs 80 --savelog /SAVELOG_PATH/ --detloss giou --claloss focal --uncer True
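The training command selects `--detloss giou` and `--claloss focal`. Suppose these refer to the Generalized IoU (GIoU) loss for the bounding-box output and the focal loss for the answer classifier. The sketch below then shows what each computes for a single sample in plain Python. Several details are illustrative assumptions and are not read from train.py: the box format `(x1, y1, x2, y2)`, the `gamma`/`alpha` defaults, and the function names. The actual training code works on batched PyTorch tensors.

```python
import math


def giou_loss(pred, target):
    """1 - GIoU for two boxes (x1, y1, x2, y2) with positive width and height."""
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target
    # Intersection rectangle; its area is zero when the boxes do not overlap.
    iw = max(0.0, min(px2, tx2) - max(px1, tx1))
    ih = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = iw * ih
    union = (px2 - px1) * (py2 - py1) + (tx2 - tx1) * (ty2 - ty1) - inter
    iou = inter / union
    # Area of the smallest axis-aligned box enclosing both boxes.
    enclose = (max(px2, tx2) - min(px1, tx1)) * (max(py2, ty2) - min(py1, ty1))
    giou = iou - (enclose - union) / enclose
    return 1.0 - giou


def focal_loss(logits, target, gamma=2.0, alpha=1.0):
    """Multi-class focal loss -alpha * (1 - p_t)**gamma * log(p_t) for one sample."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]  # numerically stable softmax
    p_t = exps[target] / sum(exps)
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)
```

With `gamma=0` and `alpha=1`, the focal loss reduces to ordinary cross-entropy. A larger `gamma` down-weights answers the model already predicts confidently. Unlike plain IoU, GIoU still yields a useful gradient when the predicted and ground-truth boxes do not overlap.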

Evaluation

  • Evaluate on EndoVis17/18-VQLA-Extended
    python train.py --validate True --checkpoint_dir /CHECKPOINT_PATH/ --transformer_ver cat --batch_size 32
