skimlit

  • Replicated the NLP model from the 2017 paper "PubMed 200k RCT: a Dataset for Sequential Sentence Classification in Medical Abstracts" to classify each sentence of a medical abstract in sequence, using the paper's dataset of ~200,000 labelled Randomised Controlled Trial (RCT) abstracts, with the goal of making literature review faster.
  • Developed and iterated through multiple model architectures, including TF-IDF classifiers, deep learning models with various embeddings, and multimodal models, culminating in a final model that labels the sentences of an abstract so it can be skimmed section by section.
  • Used spaCy to segment raw abstract text into sentences and neural network models to classify each sentence, with the aim of deploying the model in practical applications such as a browser extension that structures literature in real time.
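
As a rough illustration of the preprocessing that sequential sentence classification relies on, the sketch below turns labelled abstract text into one record per sentence, keeping each sentence's position within its abstract. It is not the repository's code. It assumes the layout used by the publicly released PubMed RCT text files (an ID line starting with `###`, one `LABEL<TAB>sentence` line per sentence, a blank line after each abstract). The sample text, the function name `parse_abstracts`, and the field names `line_number` and `total_lines` are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical sample, written in the assumed PubMed RCT file layout.
SAMPLE = (
    "###00000001\n"
    "OBJECTIVE\tTo test whether drug A lowers blood pressure .\n"
    "METHODS\tWe randomised 100 adults to drug A or placebo .\n"
    "RESULTS\tBlood pressure fell more in the drug A group .\n"
    "\n"
)


def parse_abstracts(raw: str) -> List[Dict[str, object]]:
    """Return one record per sentence with its label and position."""
    records: List[Dict[str, object]] = []
    current: List[Tuple[str, str]] = []  # sentences of the abstract being read

    def flush() -> None:
        # Emit the buffered abstract, attaching positional features.
        for i, (label, text) in enumerate(current):
            records.append(
                {
                    "label": label,
                    "text": text,
                    "line_number": i,            # position within the abstract
                    "total_lines": len(current),  # sentences in the abstract
                }
            )
        current.clear()

    for line in raw.splitlines():
        if line.startswith("###") or not line.strip():
            # An ID line or a blank line marks an abstract boundary.
            flush()
        elif "\t" in line:
            label, text = line.split("\t", 1)
            current.append((label, text.strip()))
    flush()  # handle a final abstract with no trailing blank line
    return records


if __name__ == "__main__":
    for record in parse_abstracts(SAMPLE):
        print(record["line_number"], record["label"], record["text"])
```

Records in this shape could feed either a TF-IDF baseline (using only `text`) or a model that also consumes the positional fields; which features the final model actually uses is not stated here.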