CVaPR Project - Computer Vision and Pattern Recognition. Classification of tumour data.

revalew/CVaPR-Project

Classification of tumour data and metastasis

Summary

For this project I decided to create a CNN (Convolutional Neural Network). The trained classifier is used to determine whether a patient has "metastasis" (class 0) or "no metastasis" (class 1) based on 105 features.
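The README does not say how the convolution is applied to a flat vector of 105 tabular features. One common approach, assumed here and not confirmed by the repository, is to treat each patient's features as a one-channel 1D sequence and slide small kernels along it. The minimal plain-Python sketch below shows one filter with "valid" padding, a ReLU, global max pooling and a sigmoid output that is thresholded into class 0 or class 1. The kernel, bias and output weights are hypothetical placeholders, not trained values.

```python
import math

def conv1d_valid(x, kernel, bias=0.0):
    """Slide `kernel` over `x` with stride 1 and no padding ("valid" mode)."""
    k = len(kernel)
    return [
        sum(x[i + j] * kernel[j] for j in range(k)) + bias
        for i in range(len(x) - k + 1)
    ]

def relu(values):
    """Element-wise max(0, v)."""
    return [max(0.0, v) for v in values]

def predict_class(features, kernel, bias, w_out, b_out, threshold=0.5):
    """Toy forward pass: conv -> ReLU -> global max pool -> sigmoid.

    Returns 1 ("no metastasis") if the sigmoid output >= threshold,
    otherwise 0 ("metastasis"). All weights are hypothetical inputs.
    """
    feature_map = relu(conv1d_valid(features, kernel, bias))
    pooled = max(feature_map)  # global max pooling over the feature map
    prob_class_1 = 1.0 / (1.0 + math.exp(-(w_out * pooled + b_out)))
    return 1 if prob_class_1 >= threshold else 0

# Hypothetical usage: 105 synthetic feature values and a hand-picked kernel.
example = [0.01 * i for i in range(105)]
print(predict_class(example, kernel=[0.5, -0.2, 0.1], bias=0.0, w_out=0.3, b_out=-1.0))
```

As in most deep learning libraries, this "convolution" is actually a cross-correlation because the kernel is not flipped. A real model would stack many such filters and learn their weights during training.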

The CNN model is built from the output of the hyperparameter optimisation scripts, which use the optuna library. In addition, three different feature selection methods are used to TRY to improve the CNN's performance. This CNN model is compared with simple classifiers such as SVM and KNN. Later, the ranking method of feature selection was also applied to the simple classifiers to see whether their results could be improved.
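The summary only states that the CNN hyperparameters came from optuna. In Optuna the usual pattern is an `objective(trial)` function that samples values with `trial.suggest_int` / `trial.suggest_float`, trains and evaluates a model, and returns a score, which `study.optimize(objective, n_trials=...)` then optimizes in the study's direction. The sketch below reproduces that loop structure with plain random search instead of Optuna's default TPE sampler, so it runs with the standard library alone. The search space and `toy_objective` are hypothetical and stand in for "train the CNN and return validation accuracy".

```python
import math
import random

# Hypothetical search space: names and ranges are illustrative, not taken from the repo.
FILTER_CHOICES = [16, 32, 64]
KERNEL_CHOICES = [3, 5, 7]
LR_RANGE = (1e-4, 1e-2)  # learning rate, sampled log-uniformly

def sample_params(rng):
    """Draw one candidate configuration from the search space."""
    low, high = LR_RANGE
    return {
        "filters": rng.choice(FILTER_CHOICES),
        "kernel_size": rng.choice(KERNEL_CHOICES),
        "learning_rate": 10 ** rng.uniform(math.log10(low), math.log10(high)),
    }

def random_search(objective, n_trials, seed=0):
    """Evaluate `n_trials` random configurations and keep the best-scoring one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = sample_params(rng)
        score = objective(params)  # e.g. validation accuracy of a model trained with `params`
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

def toy_objective(params):
    """Hypothetical stand-in for training the CNN; higher is better."""
    return -(abs(params["filters"] - 32) / 32
             + abs(params["kernel_size"] - 5) / 5
             + abs(math.log10(params["learning_rate"]) + 3))

best, score = random_search(toy_objective, n_trials=50)
print(best, score)
```

Fixing the seed makes the search reproducible. Optuna's TPE sampler differs from this sketch because it uses the scores of earlier trials to decide where to sample next, whereas random search ignores them.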

File structure overview

Project root directory

  • The cvapr_project_instructions.txt file contains rough notes on the details of this project (in Polish),

  • The corrections_needed.png file contains a list of things that had to be corrected before the final report was submitted (in Polish),

  • The corrections_needed_2.png file contains one more correction that had to be made before the final report was submitted (in Polish).

./src/ directory

  • The first of the two main Jupyter Notebooks, cnn_classification.ipynb, contains the optimized CNN model (based on the hyperparameter optimisation scripts),

  • The second main notebook, simple_classifier_knn_svm_bayes.ipynb, contains simple classifiers: "RandomForest", "DecisionTree", "SVM", "KNN", "NeuralNet" and "NaiveBayes",

  • As an experiment, I also tried creating a simple DNN (Dense Neural Network), which can be found in the dense_network_classification.ipynb notebook,

  • The xtrain_feature_selection/ subdirectory contains scripts used to test different feature selection methods on the x_train FEATURES only (after splitting):

    • ranking_method.ipynb - ranks individual features by their impact on the model's performance, which in practice helps select an optimal subset of features for machine learning models,

    • wrapper_method.ipynb - chooses the best features by testing different feature combinations and keeping those that improve model performance,

    • embedded_method.ipynb - performs feature selection as part of model training, using learning algorithms that select features while they are being trained, such as Lasso,

    • simple_classifier_ranking_method.ipynb - the ranking method applied to the simple classifiers,

  • The all_feature_selection/ subdirectory contains scripts used to test different feature selection methods on ALL FEATURES (before splitting),

  • The optimizers/ subdirectory contains scripts used to optimize hyperparameters with the "optuna" library:

    • optimalization_test.ipynb - a test script that uses one of the models generated by the optimizers,

    • hiperparametr_optymalization.ipynb - returns the parameters that can be used to train the optimal network, along with an image (./models/optimizers/hyperparam_model.png),

    • learning_optimalization.ipynb - returns the optimal model architecture and parameters, along with an image (./models/optimizers/learning_model.png).
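The distinction between xtrain_feature_selection/ and all_feature_selection/ matters because selecting features on the full dataset lets the test rows influence which features are kept, which is generally regarded as information leakage. The sketch below shows a ranking method applied the x_train way: each feature is scored on the training split only, and the same selected column indices are then reused for the test split. The Fisher-style score (squared difference of class means divided by the sum of class variances) is an assumption chosen for illustration; the repository's ranking_method.ipynb may use a different criterion, and the toy data is hypothetical.

```python
import math
import statistics

def fisher_score(column, labels):
    """Fisher-style score for a binary label: (mean_0 - mean_1)^2 / (var_0 + var_1)."""
    class_0 = [v for v, y in zip(column, labels) if y == 0]
    class_1 = [v for v, y in zip(column, labels) if y == 1]
    mean_diff = statistics.fmean(class_0) - statistics.fmean(class_1)
    denom = statistics.pvariance(class_0) + statistics.pvariance(class_1)
    if denom == 0:
        # No within-class spread: a perfect separator if the means differ, useless otherwise.
        return math.inf if mean_diff != 0 else 0.0
    return mean_diff ** 2 / denom

def rank_features(x_train, y_train):
    """Return feature indices sorted from most to least discriminative, using x_train only."""
    n_features = len(x_train[0])
    scores = [fisher_score([row[j] for row in x_train], y_train) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)

def select_columns(rows, indices):
    """Keep only the chosen feature columns, in the given order."""
    return [[row[j] for j in indices] for row in rows]

# Toy data (hypothetical): feature 0 separates the classes, feature 2 is constant.
x_train = [[0.0, 5.0, 1.0], [0.1, 3.0, 1.0], [1.0, 4.0, 1.0], [0.9, 4.0, 1.0]]
y_train = [0, 0, 1, 1]
x_test = [[0.2, 4.5, 1.0], [0.8, 3.5, 1.0]]

top_k = rank_features(x_train, y_train)[:1]  # ranking looks at training rows only
x_train_sel = select_columns(x_train, top_k)
x_test_sel = select_columns(x_test, top_k)   # the same columns are reused for the test split
```

Running the same ranking on all rows before splitting would correspond to the all_feature_selection/ variant.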

./models/ directory

  • The trained models, both CNN and DNN, are stored in the ./models/ directory,

  • The xtrain_feature_selection/ subdirectory contains models trained with different methods of selecting features from the x_train FEATURES (after splitting),

  • The all_feature_selection/ subdirectory contains models trained with different methods of selecting features from ALL FEATURES (before splitting),

  • The optimizers/ subdirectory contains the optimized models generated with the "optuna" library.

./data/ directory

  • The data used to train the models is located in this folder:

    • labels_features.csv was the only file used; it contains both the features and the labels,

    • features.csv contains only the features,

    • labels.csv contains only the labels,

    • clinical_radiomics_imported_from_tsv.xlsx is the original spreadsheet used to generate the CSV files,

    • the ./aditional_resources/ folder contains my personal resources: code and notes I wrote while studying machine learning (notebook 7. Deep Learning for Computer Vision.ipynb), as well as backups of various scripts.
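The sketch below shows one way to read a combined features-and-labels file such as labels_features.csv and produce the x_train/x_test split that the feature selection notebooks refer to. The file layout (a header row, numeric feature columns and a single label column named `label`) and the stratified 80/20 default split are assumptions for illustration, not details confirmed by the repository. An in-memory sample with hypothetical values stands in for the real file.

```python
import csv
import io
import random

def load_features_and_labels(fileobj, label_column="label"):
    """Read a CSV with a header row and return (X, y, feature_names).

    `label_column` is a hypothetical name; adjust it to the real header.
    """
    reader = csv.DictReader(fileobj)
    feature_names = [name for name in reader.fieldnames if name != label_column]
    X, y = [], []
    for row in reader:
        X.append([float(row[name]) for name in feature_names])
        y.append(int(row[label_column]))
    return X, y, feature_names

def stratified_split(X, y, test_fraction=0.2, seed=0):
    """Shuffle each class separately so both splits roughly keep the class ratio."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(y)):
        idx = [i for i, label in enumerate(y) if label == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])

    def pick(rows, ids):
        return [rows[i] for i in ids]

    return pick(X, train_idx), pick(X, test_idx), pick(y, train_idx), pick(y, test_idx)

# In-memory stand-in for labels_features.csv (hypothetical values).
sample = "f1,f2,label\n0.5,1.0,0\n0.7,1.2,0\n0.1,3.0,1\n0.2,2.8,1\n"
X, y, names = load_features_and_labels(io.StringIO(sample))
x_train, x_test, y_train, y_test = stratified_split(X, y, test_fraction=0.5)
```

For the real file, pass `open("data/labels_features.csv", newline="")` instead of the `StringIO` object, after checking the actual name of the label column.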

./final_report/ directory

  • CVaPR_report.docx is the final report containing the results of the different classification methods; CVaPR_report.pdf is a non-editable version of the same report.
