Microarray Expression Data Analysis for AML and Normal Cells
This project, for the "Introduction to Bioinformatics" course at Sharif University of Technology, analyzes microarray expression data. The data is available in the GEO database under accession number GSE48558.
In phase 1, we analyzed the given data and plotted the results. The data was downloaded with the GEOquery package, normalized with the limma package, and analyzed with the umap package, all in R. The results were visualized using ggplot2 and corrplot.
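To illustrate what a normalization step does to expression data, here is a minimal sketch of quantile normalization. It is written in plain Python only for illustration, since the project itself uses limma in R. Which normalization method the project applied is not stated, so the choice of quantile normalization is an assumption, and the function name and toy values are hypothetical.

```python
def quantile_normalize(columns):
    """Quantile-normalize samples given as equal-length lists of expression values.

    Ties are broken by position rather than averaged, which is a simplification.
    """
    n_genes = len(columns[0])
    # Sort each sample, then average across samples at each rank.
    sorted_cols = [sorted(col) for col in columns]
    rank_means = [
        sum(col[r] for col in sorted_cols) / len(columns) for r in range(n_genes)
    ]
    normalized = []
    for col in columns:
        # Order gene indices by their value in this sample.
        order = sorted(range(n_genes), key=lambda i: col[i])
        out = [0.0] * n_genes
        for rank, gene in enumerate(order):
            # Each gene receives the cross-sample mean for its rank.
            out[gene] = rank_means[rank]
        normalized.append(out)
    return normalized


# Toy data (hypothetical): three samples, three genes each.
samples = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0], [3.0, 4.0, 8.0]]
normalized = quantile_normalize(samples)
```

After this step every sample has the same set of values, so the samples share one distribution while each gene keeps its rank within its sample.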
We then used dimensionality reduction techniques (PCA, t-SNE, and MDS) to visualize the data. The results are shown below:
[Figure panels: PCA, t-SNE, MDS]
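As a rough illustration of the idea behind PCA, the sketch below finds the first principal component of a small data set by power iteration on the covariance matrix. It is plain Python written for this README, not the project's R code. The function name and toy data are hypothetical, and building an explicit covariance matrix is only practical for a handful of features, not for a full microarray.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (direction, scores) for the first principal component.

    rows: list of samples, each a list of feature values.
    """
    n, d = len(rows), len(rows[0])
    # Center each feature at zero mean.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [
        [sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
        for a in range(d)
    ]
    # Power iteration converges to the dominant eigenvector of cov.
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Project each centered sample onto the component.
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores


# Toy data (hypothetical): four samples lying close to the line y = 2x.
samples = [[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.0]]
direction, scores = first_principal_component(samples)
```

For this toy data the direction points roughly along the line y = 2x, and the scores give each sample's coordinate along that axis, which is what a PCA scatter plot displays for its first axis.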
Finally, we analyzed the correlation between different groups of patients. The results are shown below:
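Separately from the project's own figures, the following minimal sketch shows how a pairwise Pearson correlation matrix between group profiles can be computed. It is plain Python for illustration only, since the project uses R and corrplot. The group names and values are hypothetical placeholders, not data from GSE48558.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def correlation_matrix(profiles):
    """Map each pair of group names to the correlation of their profiles."""
    names = list(profiles)
    return {
        a: {b: pearson(profiles[a], profiles[b]) for b in names} for a in names
    }


# Hypothetical mean expression profiles, one list per group.
profiles = {
    "group_a": [1.0, 2.0, 3.0, 4.0],
    "group_b": [2.0, 4.0, 6.0, 8.0],
    "group_c": [4.0, 3.0, 2.0, 1.0],
}
corr = correlation_matrix(profiles)
```

A matrix of this shape is the kind of input a correlation heatmap is drawn from: values near 1 indicate similar expression profiles and values near -1 indicate opposite ones.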
The final report (in Persian) is presented in the documentation for phase 1. The code is available in this Jupyter notebook.
In phase 2, we used the limma package in R to find the differentially expressed genes (DEGs) between the AML and normal cells. We then used the Enricher to find the enriched pathways and gene ontology terms among the DEGs.
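Enrichment analysis of a DEG list is commonly based on an over-representation test. The sketch below computes a one-sided hypergeometric p-value for the overlap between a DEG list and a pathway. It is plain Python for illustration, and whether the tool used in this project computes exactly this statistic is an assumption. The function name and the counts are hypothetical.

```python
from math import comb


def enrichment_p_value(n_background, n_pathway, n_degs, n_overlap):
    """Probability of seeing at least n_overlap pathway genes among the DEGs.

    Assumes the DEGs are drawn without replacement from n_background genes,
    of which n_pathway belong to the pathway (hypergeometric upper tail).
    """
    upper = min(n_pathway, n_degs)
    total = comb(n_background, n_degs)
    tail = sum(
        comb(n_pathway, k) * comb(n_background - n_pathway, n_degs - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 20 background genes, 5 in the pathway,
# 4 DEGs, of which 3 fall in the pathway.
p = enrichment_p_value(n_background=20, n_pathway=5, n_degs=4, n_overlap=3)
# p = 155 / 4845, about 0.032
```

A small p-value suggests the pathway is over-represented among the DEGs. In practice many pathways are tested at once, so the p-values would also need a multiple-testing correction.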
Finally, we reviewed research papers to identify the most important pathways and genes in AML.
The final report (in Persian) is presented in the documentation for phase 2. The code is available in this Jupyter notebook. The data is available in this TSV file.