Recently, there has been a surge in the popularity of deep learning foundation models, particularly in computer vision and natural language processing, producing milestone works such as Vision Transformers (ViT), Generative Pre-trained Transformers (GPT), Contrastive Language-Image Pre-training (CLIP), and Segment Anything (SAM). As the size (number of parameters) of these models grows, their capacity increases, but so does the amount of training data required, following the scaling law (see the formula below). For specific domains like medicine, however, the shortage of publicly available data and high-quality annotations has been the bottleneck for training large-scale deep learning models. A variety of learning paradigms has therefore been explored to overcome this roadblock, beyond the conventional routine of fine-tuning a pre-trained model (e.g., an ImageNet pre-trained model) on labeled domain-specific data.
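For context, the scaling law mentioned above is commonly stated in the power-law form of Kaplan et al. (2020); we quote it here as general background rather than as a formula from any OpenMEDLab project. Test loss $L$ decreases predictably only when model size $N$ and dataset size $D$ grow in tandem:

```latex
% Kaplan et al. (2020) joint power law; N_c, D_c, \alpha_N, \alpha_D
% are empirically fitted constants.
L(N, D) = \left[ \left( \frac{N_c}{N} \right)^{\alpha_N / \alpha_D} + \frac{D_c}{D} \right]^{\alpha_D}
```

Holding $D$ fixed while growing $N$ saturates the first term, which is why ever-larger models demand proportionally more curated data, precisely the resource that is scarce in medicine.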
Large-scale pre-trained vision and language models have shown remarkable domain-transfer capability on natural images. However, it remains unclear whether this capability also extends to the medical image domain, given the unique characteristics of medical images. OpenMEDLab showcases the feasibility of transferring knowledge from pre-trained vision and language foundation models to the medical domain, either via well-engineered medical text prompts or by learning visual prompts during training (see projects in Foundation Model Prompting for Medical Image Analysis).
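As a concrete illustration of text prompting, below is a minimal sketch of CLIP-style zero-shot classification with hand-crafted medical prompts, using OpenAI's public `clip` package. The prompt strings, class choices, and image path are hypothetical examples, not the engineered prompts used by the OpenMEDLab projects.

```python
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Hypothetical medical text prompts; real projects engineer these carefully.
prompts = [
    "a chest X-ray showing pneumonia",
    "a chest X-ray with no abnormal findings",
]
text = clip.tokenize(prompts).to(device)

# Any chest X-ray image file; the path is a placeholder.
image = preprocess(Image.open("chest_xray.png")).unsqueeze(0).to(device)

with torch.no_grad():
    # logits_per_image: similarity of the image to each text prompt.
    logits_per_image, _ = model(image, text)
    probs = logits_per_image.softmax(dim=-1).cpu().numpy()

for prompt, p in zip(prompts, probs[0]):
    print(f"{p:.3f}  {prompt}")
```

Because CLIP was pre-trained on natural image-text pairs, the choice and phrasing of such prompts largely determines how well this zero-shot transfer holds up on medical images.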
In the field of medical image analysis, task-specific models remain the dominant approach, especially for real-world applications such as computer-aided disease diagnosis. Developing medical foundation models is challenging because of the diverse imaging modalities used in medicine. These modalities differ greatly from natural images and rest on a range of physics-based properties and energy sources, e.g., light, electrons, lasers, X-rays, ultrasound, nuclear physics, and magnetic resonance. The resulting images span multiple scales, from molecules and cells to organ systems and the whole body. It may therefore be infeasible to train a single unified, multi-scale foundation model on a combination of these multi-modality images. Instead, OpenMEDLab presents a variety of foundation models and their uses in medical image analysis, ranging from modality-specific models to organ- and task-specific models (see projects in Pre-trained Medical Image Foundation Models).
Image from "S. Zhang and D. Metaxas. On the Challenges and Perspectives of Foundation Models for Medical Image Analysis. Medical Image Analysis"
Moreover, the medical large language model PULSE is released in OpenMEDLab. Its supervised fine-tuning dataset consists of 4,000,000 samples, equivalent to approximately 9.6 billion tokens, and qualified experts additionally labeled a reinforcement learning dataset with scores and rankings of responses generated by the model. PULSE is trained on these two datasets: a self-evaluation prompt is added to the reward model training, and the standard PPO framework is further optimized for better performance. OpenMEDLab also demonstrates a version of PULSE fine-tuned for understanding the literature on SARS-CoV-2. Plugins for downstream applications are under development, and quantized, updated, and larger versions are being developed rapidly; please contact us for access (see details in Medical Large Language Models).
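For readers unfamiliar with the RLHF recipe sketched above, here is a minimal, self-contained illustration of training a reward model from expert rankings with a pairwise loss. The tiny `RewardHead` network and the synthetic tensors are stand-ins for a real language-model backbone and PULSE's actual data, and the self-evaluation-prompt technique is not reproduced here.

```python
import torch
import torch.nn as nn

# Stand-in reward model: in practice this head sits on top of the
# LLM's final hidden state for the last token of (prompt, response).
class RewardHead(nn.Module):
    def __init__(self, hidden_dim: int = 768):
        super().__init__()
        self.score = nn.Linear(hidden_dim, 1)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        return self.score(h).squeeze(-1)  # one scalar reward per example

model = RewardHead()
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

# Synthetic batch: embeddings of expert-preferred vs. rejected responses.
h_chosen = torch.randn(8, 768)
h_rejected = torch.randn(8, 768)

for _ in range(100):
    r_chosen, r_rejected = model(h_chosen), model(h_rejected)
    # Bradley-Terry pairwise loss: push preferred responses above rejected.
    loss = -torch.nn.functional.logsigmoid(r_chosen - r_rejected).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The learned scalar reward is then the signal that the PPO stage maximizes when fine-tuning the policy model.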
OpenMEDLab also encapsulates advances in the research field of protein engineering (see projects in Protein Engineering). As a pioneering work, we introduce TemPL, a novel deep learning approach for zero-shot prediction of protein stability and activity that harnesses temperature-guided large language modeling. An extensive dataset is assembled, pairing 96 million sequences with the optimal growth temperatures (OGTs) of their host bacterial strains, together with ΔTm data for point mutations measured under consistent experimental conditions. TemPL offers considerable promise for protein engineering applications, facilitating the design of mutant sequences with enhanced stability and activity.
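To make the idea of zero-shot mutation scoring concrete, the sketch below uses the generic wild-type-marginal baseline with a small public ESM-2 model from the `fair-esm` package: a point mutation is scored by the log-odds of the mutant versus the wild-type amino acid at the mutated position. This is a common baseline, not TemPL's temperature-guided method, and the sequence and mutation are made up.

```python
import torch
import esm

# Small public protein language model (not TemPL itself).
model, alphabet = esm.pretrained.esm2_t6_8M_UR50D()
model.eval()
batch_converter = alphabet.get_batch_converter()

wt_seq = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"  # made-up wild type
pos, mut_aa = 10, "W"                         # hypothetical point mutation
wt_aa = wt_seq[pos]

_, _, tokens = batch_converter([("wt", wt_seq)])
with torch.no_grad():
    logits = model(tokens)["logits"]          # shape (1, L + 2, vocab)

# Token 0 is BOS, so sequence position i lives at token index i + 1.
log_probs = torch.log_softmax(logits[0, pos + 1], dim=-1)
score = (log_probs[alphabet.get_idx(mut_aa)]
         - log_probs[alphabet.get_idx(wt_aa)]).item()
print(f"{wt_aa}{pos + 1}{mut_aa} zero-shot score: {score:.3f}")
```

TemPL's contribution is to condition this kind of likelihood modeling on temperature (OGT) signals, so that scores reflect thermostability rather than sequence plausibility alone; see the project for the actual method.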
Prompting for medical image classification
Prompting for medical image detection
Prompting for medical 3D image segmentation and localization (see the SAM-style sketch after this list)
Foundation model for pathological image staining [PathoDuet] and classification [BROW]
Foundation model for ultrasound images [DeblurringMIM]
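As a companion to the segmentation-prompting item above, here is a minimal sketch of point-prompted segmentation with the public `segment_anything` package. The checkpoint path, image path, and click coordinates are placeholders, and the medical adaptations (3D volumes, localization) go beyond this 2D example.

```python
import cv2
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

# Placeholder checkpoint; download from the official SAM release.
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
predictor = SamPredictor(sam)

# A 2D slice of a medical scan, loaded as an RGB array; path is made up.
image = cv2.cvtColor(cv2.imread("ct_slice.png"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# One foreground click (label 1) on the structure of interest.
point_coords = np.array([[256, 256]])
point_labels = np.array([1])

masks, scores, _ = predictor.predict(
    point_coords=point_coords,
    point_labels=point_labels,
    multimask_output=True,  # SAM proposes several candidate masks
)
best = masks[np.argmax(scores)]  # keep the highest-scoring proposal
print("mask pixels:", int(best.sum()))
```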