CVaPR Project - Computer Vision and Pattern Recognition. Classification of tumour data.
For this project I built a Convolutional Neural Network (CNN). The trained classifier determines whether a patient has "metastasis" (class 0) or "no metastasis" (class 1) based on 105 features.
The CNN architecture is based on the output of hyperparameter optimisation scripts that use the optuna library. In addition, 3 different feature selection methods are used to try to improve the CNN's performance. The CNN is compared with simple classifiers such as SVM and KNN. Later, feature selection with the ranking method was also applied to the simple classifiers to see whether their results could be improved.
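The comparison against simple classifiers can be pictured with a minimal sketch like the one below. It is not the code from the notebooks: it assumes scikit-learn, uses synthetic data from `make_classification` as a stand-in for the real 105-feature table, and the classifier settings are illustrative defaults.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the tumour data (assumption): 105 features, 2 classes.
X, y = make_classification(
    n_samples=200, n_features=105, n_informative=10, random_state=0
)

# Baseline classifiers; SVM and KNN are scaled because they are distance based.
baselines = {
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "NaiveBayes": GaussianNB(),
}

# Mean 5-fold cross-validated accuracy for each baseline.
scores = {
    name: cross_val_score(model, X, y, cv=5).mean()
    for name, model in baselines.items()
}

for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {score:.3f}")
```

Scores obtained this way give a reference point that a more complex model such as the CNN should be expected to beat.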
- The cvapr_project_instructions.txt file contains the rough notes on the details of this project (in Polish),
- The corrections_needed.png file contains the list of things that needed to be corrected before the final report was submitted (in Polish),
- The corrections_needed_2.png file contains another thing that needed to be corrected before the final report was submitted (in Polish).
- One of the two main Jupyter notebooks, cnn_classification.ipynb, contains the optimized CNN model (based on the hyperparameter optimisation scripts),
- The second main notebook, simple_classifier_knn_svm_bayes.ipynb, contains simple classifiers: "RandomForest", "DecisionTree", "SVM", "KNN", "NeuralNet" and "NaiveBayes",
- As an experiment, I also created a simple DNN (Dense Neural Network), which can be found in the notebook dense_network_classification.ipynb,
- The xtrain_feature_selection/ subdirectory contains scripts used to test different feature selection methods on x_train FEATURES (after splitting):
  - ranking_method.ipynb - assesses the impact of individual features on a model's performance, which in practice helps to select an optimal set of features for machine learning models,
  - wrapper_method.ipynb - chooses the best features by testing different combinations and keeping those that enhance model performance,
  - embedded_method.ipynb - performs feature selection as part of model training, using learning algorithms that select features while they are trained, such as Lasso,
  - simple_classifier_ranking_method.ipynb - the ranking method applied to simple classifiers.
- The all_feature_selection/ subdirectory contains scripts used to test different feature selection methods on ALL FEATURES (before splitting),
- The optimizers/ subdirectory contains scripts used to optimize hyperparameters with the "optuna" library:
  - optimalization_test.ipynb - a test script using one of the models generated by the optimizers,
  - hiperparametr_optymalization.ipynb - returns the parameters that can be used to train the optimal network, and an image (./models/optimizers/hyperparam_model.png),
  - learning_optimalization.ipynb - returns the optimal model architecture and parameters, and an image (./models/optimizers/learning_model.png).
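The three feature selection families used in this project (ranking, wrapper, embedded) can be sketched as follows. This is only an illustration, not the code from the notebooks: it assumes scikit-learn, synthetic data in place of the real table, a hypothetical target of 20 features (`N_KEEP`), RFE as one possible wrapper method, and an L1-penalised linear SVM as a Lasso-style embedded selector.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, SelectFromModel, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Synthetic stand-in for the real data (assumption): 105 features, binary labels.
X, y = make_classification(
    n_samples=300, n_features=105, n_informative=10, random_state=0
)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Scale with statistics taken from the training split only.
scaler = StandardScaler().fit(x_train)
x_train_scaled = scaler.transform(x_train)

N_KEEP = 20  # hypothetical number of features to keep

selectors = {
    # Ranking: score every feature independently (ANOVA F-test) and keep the top ones.
    "ranking": SelectKBest(score_func=f_classif, k=N_KEEP),
    # Wrapper: repeatedly refit a model and drop the 5 weakest features per round.
    "wrapper": RFE(
        LogisticRegression(max_iter=1000), n_features_to_select=N_KEEP, step=5
    ),
    # Embedded: the L1 penalty zeroes out coefficients while the model is trained.
    "embedded": SelectFromModel(
        LinearSVC(penalty="l1", dual=False, C=0.1, max_iter=5000, random_state=0)
    ),
}

selected = {}
for name, selector in selectors.items():
    selector.fit(x_train_scaled, y_train)  # fitted on x_train only
    selected[name] = selector.get_support(indices=True)
    print(f"{name}: kept {len(selected[name])} features")
```

Fitting the selectors on x_train only corresponds to the "after splitting" variant. Fitting them on all rows before the split lets information from the test samples influence which features are kept, so the two variants can give different test scores.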
- The trained models are stored in the ./models/ directory. Here we can find CNN and DNN models:
  - The xtrain_feature_selection/ subdirectory contains models with different methods of selecting features from x_train FEATURES (after splitting),
  - The all_feature_selection/ subdirectory contains models with different methods of selecting features from ALL FEATURES (before splitting),
  - The optimizers/ subdirectory contains optimized models generated with the "optuna" library.
- Data used to train the model is located in this folder:
  - labels_features.csv was the only file used, and it contains both features and labels,
  - features.csv contains only features,
  - labels.csv contains only labels,
  - clinical_radiomics_imported_from_tsv.xlsx is the original spreadsheet used to generate the CSV files.
- The ./aditional_resources/ folder contains my personal resources, where I wrote the code and comments used to study machine learning (notebook 7. Deep Learning for Computer Vision.ipynb), as well as backups of various scripts.
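Since labels_features.csv holds both the features and the labels, the first step of any notebook is to separate them. A minimal sketch with pandas is shown below. The column name `label` and the helper `split_features_and_labels` are hypothetical, because the real column layout is not documented here, and the demo uses a tiny in-memory table instead of the real file.

```python
import io

import pandas as pd


def split_features_and_labels(frame, label_column):
    """Split a combined table into a feature table and a label series."""
    labels = frame[label_column]
    features = frame.drop(columns=[label_column])
    return features, labels


# Tiny in-memory stand-in for labels_features.csv; real column names may differ.
demo_csv = io.StringIO(
    "label,feature_1,feature_2\n"
    "0,0.5,1.2\n"
    "1,0.1,3.4\n"
    "0,0.9,2.2\n"
)
frame = pd.read_csv(demo_csv)

features, labels = split_features_and_labels(frame, label_column="label")
print(features.shape, labels.value_counts().to_dict())
```

With the real file, `pd.read_csv` would be given the path to labels_features.csv and `label_column` the actual name of the label column.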
- The CVaPR_report.docx is the final report containing the results of the different classification methods. The CVaPR_report.pdf is a non-editable version of this report.