Skip to content

Prediction if patients with symptoms have COVID-19 based on clinical variables (blood related variables, urine related variables, age, etc)

License

Notifications You must be signed in to change notification settings

lucasthim/covid19-prediction

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

22 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Interpretable Machine Learning for COVID-19 Diagnosis Through Clinical Variables

The main goal is to propose an efficient and also transparent/interpretable ML solution for the diagnosis of suspicious COVID-19 cases, based on some clinical variables.

Secondary goal will be to eliminate least important features while keeping the preditive power of the solution.

Classification algorithms utilized in the solution:

  • Random Forest
  • Logistic Regression
  • Support Vector Machine

Interpretation algorithms:

  • SHAP
  • Regression weights

Original dataset comes from a Kaggle Competition held by Einstein Data4u. It can be found at: https://www.kaggle.com/dataset/e626783d4672f182e7870b1bbe75fae66bdfb232289da0a61f08c2ceb01cab01/tasks?taskId=645

Specific steps adopted on this novel:

  • Remove the least amount of rows and columns in order to eliminate "holes" of missing values in the dataset.
  • Execute a Grid Search with Cross Validation to tune models hyperparameters and obtain preliminare results.
  • Execute a Recursive Feature Elimination (RFE) with Cross Validation. Test models after RFE-CV without tuning hyper parameters.
  • Choose variables to eliminate and execute another Grid Search with remaining variables.
  • Visualize feature weights with feature_importance of linear models and SHAP values of non-linear models.

About the original paper

Link: https://doi.org/10.48011/asba.v2i1.1590

Abstract:

This work proposes an interpretable machine learning approach to diagnose suspected COVID-19 cases based on clinical variables. Results obtained for the proposed models have F-2 measure superior to 0.80 and accuracy superior to 0.85. Interpretation of the linear model feature importance brought insights about the most relevant features. Shapley Additive Explanations were used in the non-linear models. They were able to show the difference between positive and negative patients as well as offer a global interpretability sense of the models.

If you enjoy this work, please cite as :

@article{thimoteo_vellasco_amaral_figueiredo_yokoyama_marques_2020, title={Interpretable Machine Learning for COVID-19 Diagnosis Through Clinical Variables}, DOI={10.48011/asba.v2i1.1590}, journal={Anais do Congresso Brasileiro de Automática 2020}, author={Thimoteo, Lucas M. and Vellasco, Marley M. and Amaral, Jorge M. Do and Figueiredo, Karla and Yokoyama, Cátia Lie and Marques, Erito}, year={2020}, month={Dec}}

About

Prediction if patients with symptoms have COVID-19 based on clinical variables (blood related variables, urine related variables, age, etc)

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published