skimlit

Replicated the sequential sentence classification model from the 2017 paper "PubMed 200k RCT", which labels each sentence of a medical abstract by its role (background, objective, methods, results, or conclusions), using the paper's dataset of ~200,000 labelled Randomised Controlled Trial (RCT) abstracts to make literature review more efficient.
Developed and iterated through several model architectures, including TF-IDF classifiers, deep learning models with various embeddings, and multimodal models, culminating in a final model that labels each sentence of an abstract so that readers can skim it by section.
Used spaCy in Python to segment abstracts into sentences and neural network models to classify each sentence, with the aim of deploying the model in practical applications such as a browser extension that structures literature in real time.
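
The segmentation step can be sketched as follows. This is a minimal illustration, not the project's actual code: it assumes spaCy v3 and its rule-based `sentencizer` component rather than a trained pipeline. The function name `split_abstract` and the `line_number` and `total_lines` fields are hypothetical, added only to show how each sentence could carry its position in the abstract to a downstream classifier.

```python
from typing import Dict, List

import spacy


def split_abstract(abstract: str) -> List[Dict]:
    """Split an abstract into sentences and record each sentence's position.

    Assumption: a blank English pipeline with the rule-based sentencizer is
    good enough for illustration; the project may use a different pipeline.
    """
    nlp = spacy.blank("en")      # tokenizer only, no model download required
    nlp.add_pipe("sentencizer")  # rule-based sentence boundary detection
    doc = nlp(abstract)

    # Keep non-empty sentences with surrounding whitespace removed.
    sentences = [s.text.strip() for s in doc.sents if s.text.strip()]
    total = len(sentences)

    # Hypothetical record layout: the text plus its zero-based position.
    return [
        {"text": text, "line_number": i, "total_lines": total}
        for i, text in enumerate(sentences)
    ]


if __name__ == "__main__":
    example = (
        "The trial tested a new drug. We enrolled 100 patients. "
        "Symptoms improved after 12 weeks."
    )
    for row in split_abstract(example):
        print(row)
```

Each resulting record could then be passed to a sentence classifier; keeping the position alongside the text is one way to give the model a hint about where in the abstract a sentence sits.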
