Network-Based Data Analysis course project

Project for the Network-Based Data Analysis course held by Professor Mario Lauria (A.Y. 2023-2024).

Chosen disease

Colorectal cancer (CRC), which includes both colon and rectal cancer, is a major global health concern. It is the third most commonly diagnosed cancer worldwide and the second leading cause of cancer-related death, accounting for approximately 9.4% of cancer-related deaths in 2020.

Data utilized

The dataset GSE110225 (Vlachavas EI et al.) was selected from the Gene Expression Omnibus (GEO).

Aim and Workflow

This project aims to advance the understanding of colorectal cancer (CRC) by identifying a set of variables capable of discriminating between cancer and normal samples. The workflow can be summarized in three main steps:

Exploratory Analysis: In the first part of the workflow, an exploratory data analysis was performed using unsupervised methods.
- Principal Component Analysis (PCA)
- K-means clustering
- Hierarchical clustering
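
As an illustration of the unsupervised step, here is a minimal sketch of K-means (Lloyd's algorithm) in plain Python. It is not the project's code: the toy expression values, the two hypothetical genes, and the choice of initial centroids are assumptions made for the example only.

```python
import math

def kmeans(points, centroids, n_iter=20):
    """Lloyd's algorithm: alternate sample assignment and centroid update."""
    centroids = [list(c) for c in centroids]
    labels = []
    for _ in range(n_iter):
        # Assign each sample to its nearest centroid (Euclidean distance).
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Recompute each centroid as the mean of its assigned samples.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[k])  # keep the centroid of an empty cluster
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids

# Toy expression matrix (assumed values): rows are samples, columns are two hypothetical genes.
samples = [
    [1.0, 1.2], [0.8, 1.0], [1.1, 0.9],  # low-expression group
    [5.0, 5.2], [5.1, 4.8], [4.9, 5.0],  # high-expression group
]
labels, centroids = kmeans(samples, centroids=[samples[0], samples[3]])
print(labels)
```

On real expression data the initial centroids are usually chosen at random and the algorithm is restarted several times, since the result depends on the starting point.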
Supervised Learning: The second part involved utilizing supervised learning methods as classifiers to identify the most significant variables.
- Random Forest
- Linear Discriminant Analysis (LDA)
- Lasso and Ridge regression
- SCUDO (Signature-based Clustering for Diagnostic Purposes)
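
To give a feel for the rank-based idea behind SCUDO, the sketch below extracts a sample-specific signature: the most and least expressed genes of a single sample. It covers only this first step; the signature-to-signature distance and the network construction are omitted. The function name, the parameter names `n_top` and `n_bottom`, the gene names, and the values are all hypothetical and do not come from the project.

```python
def rank_signature(expression, n_top, n_bottom):
    """Return the most and least expressed genes of one sample."""
    # Sort gene names from highest to lowest expression.
    ranked = sorted(expression, key=expression.get, reverse=True)
    up = ranked[:n_top]
    down = ranked[len(ranked) - n_bottom:]
    return up, down

# Hypothetical expression values for a single sample (gene names are placeholders).
sample = {"geneA": 9.1, "geneB": 2.3, "geneC": 7.4, "geneD": 0.5, "geneE": 5.0}
up, down = rank_signature(sample, n_top=2, n_bottom=2)
print(up)    # ['geneA', 'geneC']
print(down)  # ['geneB', 'geneD']
```

Because the signature depends only on the ranking within each sample, it is unaffected by any monotonic rescaling of that sample's values.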
Functional Enrichment Analysis: Finally, the important variables found in the previous step were used as inputs for various tools that perform Over-Representation Analysis and Network-based Analysis to identify enriched terms.
- Over-Representation Analysis: gProfiler and DAVID
- Network-based Analysis: pathfindR, EnrichNet and STRING
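
Over-Representation Analysis is commonly based on a hypergeometric (one-sided Fisher-type) test. The sketch below computes that p-value for a single term. It is an illustration under stated assumptions, not the procedure of any tool listed above: individual tools use their own statistics and multiple-testing corrections, which are not reproduced here. The function name and the gene counts are hypothetical.

```python
from math import comb

def ora_p_value(n_background, n_term, n_query, n_overlap):
    """Upper-tail hypergeometric p-value: P(overlap >= n_overlap)."""
    total = comb(n_background, n_query)
    upper = min(n_term, n_query)
    favourable = sum(
        comb(n_term, k) * comb(n_background - n_term, n_query - k)
        for k in range(n_overlap, upper + 1)
    )
    return favourable / total

# Assumed toy numbers: 20 background genes, 5 annotated to the term,
# 4 genes in the query list, 3 of which are annotated to the term.
p = ora_p_value(n_background=20, n_term=5, n_query=4, n_overlap=3)
print(round(p, 4))  # 0.032
```

In practice one p-value is computed per term and the resulting set is corrected for multiple testing before any term is reported as enriched.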

iamandreatonina/Network_based_Data-Analysis-
