CytoTRACE2.Rmd

---
title: "Finding Dedifferentiated Cancer Cells"
output:
  html_document: default
  pdf_document: default
date: "2024-05-02"
---

# **Finding Dedifferentiated Cancer Cells**

## Overview
This pipeline identifies dedifferentiated cancer cells in single-cell expression data, using MAGIC to impute expression values and CytoTRACE2 to predict cellular potency. The steps below cover data loading, imputation, normalization and dimensionality reduction, potency prediction, and visualization.

### Step 1: Loading and Preprocessing Data

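# Load Seurat, which provides the object class and functions used throughout
library(Seurat)

# Load the Seurat object from an RDS file
seurat_obj <- readRDS("/path/to/Cancer_cell_in_house.rds")

# Extract the raw count matrix as a dense matrix (genes in rows, cells in columns).
# The @counts slot exists in Seurat v3/v4 assays; for a Seurat v5 assay, use
# LayerData(seurat_obj, assay = "RNA", layer = "counts") instead.
expression_matrix <- as.matrix(seurat_obj@assays$RNA@counts)
# Display the first 10 genes for the first 10 cells
head(expression_matrix[, 1:10], 10)

# Save the expression matrix to a CSV file for further processing
write.csv(expression_matrix, "/path/to/in_house_oc_data.csv")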

### Step 2: Imputation with MAGIC

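# Import the imputed data (cells in rows, genes in columns).
# check.names = FALSE keeps gene symbols such as "HLA-A" from being rewritten to "HLA.A".
imputed_data <- read.csv("/path/to/GSE165897_EOC_expression_matrix.csv", row.names = 1, check.names = FALSE)
head(imputed_data[, 1:10], 10)

# Transpose so that genes are rows and cells are columns, as Seurat expects
imputed_matrix <- t(imputed_data)

# Extract metadata from the original Seurat object
metadata <- seurat_obj@meta.data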
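
Because the matrix passes through CSV files in Steps 1 and 2, it is worth checking that gene and cell names survive the round trip. The sketch below uses a hypothetical toy matrix (the barcodes are made up and the values are arbitrary) to show how `read.csv()` rewrites non-syntactic column names by default and how `check.names = FALSE` avoids it.

```r
# Toy cells-by-genes matrix; barcodes and values are illustrative only
toy <- matrix(1:6, nrow = 2,
              dimnames = list(c("AAAC-1", "AAAG-1"), c("HLA-A", "ROR2", "FZD8")))

csv_path <- tempfile(fileext = ".csv")
write.csv(toy, csv_path)

# Default: column names are passed through make.names(), so "HLA-A" becomes "HLA.A"
mangled <- read.csv(csv_path, row.names = 1)
# With check.names = FALSE the original column names are kept
intact <- read.csv(csv_path, row.names = 1, check.names = FALSE)

print(colnames(mangled))
print(colnames(intact))
```

Row names taken from the first column are not rewritten, so the risk applies to whichever dimension ends up in the CSV header.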

### Step 3: Creating a New Seurat Object with Imputed Data

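# Align the cell names between imputed data and metadata
common_cells <- intersect(colnames(imputed_matrix), rownames(metadata))
# Stop early if no cell names match, e.g. because barcodes were altered on export
stopifnot(length(common_cells) > 0)
imputed_matrix <- imputed_matrix[, common_cells, drop = FALSE]
metadata <- metadata[common_cells, , drop = FALSE]

# Create a new Seurat object from the imputed data.
# Note: the values passed as "counts" here are imputed expression values, not raw counts.
new_seurat_obj <- CreateSeuratObject(counts = imputed_matrix)
new_seurat_obj <- AddMetaData(new_seurat_obj, metadata)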
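
The alignment step can silently drop cells when names differ between the two sources. As a minimal sketch, assuming a toy genes-by-cells matrix and toy metadata (all names below are hypothetical), the following shows the same `intersect()` pattern together with a report of what was dropped on each side.

```r
# Toy genes-by-cells matrix and metadata; all names are illustrative
mat <- matrix(1:8, nrow = 2,
              dimnames = list(c("ROR2", "FZD8"), c("c1", "c2", "c3", "c4")))
meta <- data.frame(sample = c("s2", "s1", "s1"),
                   row.names = c("c3", "c1", "c5"))

# Keep only cells present in both, in the matrix's column order
shared <- intersect(colnames(mat), rownames(meta))
mat_aligned <- mat[, shared, drop = FALSE]
meta_aligned <- meta[shared, , drop = FALSE]

# Report cells lost on either side
dropped_from_matrix <- setdiff(colnames(mat), shared)
dropped_from_meta <- setdiff(rownames(meta), shared)
print(list(kept = shared,
           dropped_from_matrix = dropped_from_matrix,
           dropped_from_meta = dropped_from_meta))
```

Using `drop = FALSE` keeps the matrix and data frame shapes intact even when only one cell or one metadata column remains.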

### Step 4: Normalization and Dimensionality Reduction

# Normalize data using Seurat's NormalizeData function
new_seurat_obj <- NormalizeData(new_seurat_obj)
# Identify variable features for dimensional reduction
new_seurat_obj <- FindVariableFeatures(new_seurat_obj)
# Scale data for PCA
new_seurat_obj <- ScaleData(new_seurat_obj)
# Perform PCA
new_seurat_obj <- RunPCA(new_seurat_obj)

# Run UMAP on the first 20 principal components and plot the embedding
new_seurat_obj <- RunUMAP(new_seurat_obj, dims = 1:20)
DimPlot(new_seurat_obj, reduction = "umap")
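
The choice of `dims = 1:20` is fixed in the code above. One way to sanity-check such a choice is to look at how much variance each principal component explains. The sketch below does this with base R's `prcomp()` on random toy data; the matrix, the 80% threshold, and the variable names are all assumptions for illustration. Within Seurat, `ElbowPlot()` gives a comparable view of the computed components.

```r
set.seed(1)
# Toy cells-by-features matrix standing in for scaled expression data
toy_scaled <- matrix(rnorm(200 * 30), nrow = 200, ncol = 30)

pca <- prcomp(toy_scaled, center = TRUE, scale. = FALSE)

# Proportion of variance explained by each component, and its running total
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
cum_var <- cumsum(var_explained)

# Smallest number of components reaching a hypothetical 80% threshold
n_pcs <- which(cum_var >= 0.80)[1]
print(n_pcs)
```

On real data the leading components usually carry far more variance than in this random example, so the cutoff is typically read from where the curve flattens rather than from a fixed percentage.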

### Step 5: Running CytoTRACE2 for Cellular Potency Prediction

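# Load CytoTRACE2
library(CytoTRACE2)

# Run CytoTRACE2 on the Seurat object (is_seurat = TRUE marks the input as a Seurat object)
cytotrace2_result <- cytotrace2(new_seurat_obj, is_seurat = TRUE, slot_type = "counts", species = "human", ncores = 10)

# Generate the CytoTRACE2 plots from the result
plots <- plotData(cytotrace2_result = cytotrace2_result, is_seurat = TRUE)

# Plot UMAP of potency scores
plots$CytoTRACE2_UMAP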
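
Beyond the UMAP, the per-cell potency scores can be summarized to shortlist candidate dedifferentiated cells. The sketch below is a minimal illustration on invented data. It assumes a per-cell score between 0 and 1 where higher values mean greater developmental potential, stored in a column named `CytoTRACE2_Score` (check the metadata of your own result for the actual column names). The sample labels and the top-20% cutoff are hypothetical.

```r
# Toy per-cell metadata; scores and sample labels are invented for illustration
toy_meta <- data.frame(
  sample = rep(c("sample_A", "sample_B"), each = 5),
  CytoTRACE2_Score = c(0.10, 0.15, 0.20, 0.30, 0.55,
                       0.12, 0.40, 0.60, 0.70, 0.85)
)

# Median potency score per sample
median_by_sample <- tapply(toy_meta$CytoTRACE2_Score, toy_meta$sample, median)

# Flag the highest-scoring cells as candidates (hypothetical top-20% cutoff)
cutoff <- quantile(toy_meta$CytoTRACE2_Score, probs = 0.8)
toy_meta$candidate <- toy_meta$CytoTRACE2_Score > cutoff

print(median_by_sample)
print(table(toy_meta$sample, toy_meta$candidate))
```

A quantile cutoff is only one option; a fixed score threshold or the predicted potency category could serve the same purpose, depending on the question.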

### Step 6: Data Visualization

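# Load plotting packages; stat_cor() comes from ggpubr
library(ggplot2)
library(ggpubr)

# Assumption: expression_data holds per-cell expression of the two genes,
# taken here from the imputed Seurat object
expression_data <- FetchData(new_seurat_obj, vars = c("ROR2", "FZD8"))

# Correlation between ROR2 and FZD8 expression, with a linear fit
ggplot(expression_data, aes(x = ROR2, y = FZD8)) +
  geom_point() +
  stat_cor(p.accuracy = 0.001, r.accuracy = 0.01) +
  labs(y = "FZD8 Expression", x = "ROR2 Expression") +
  geom_smooth(method = "lm", color = "red") +
  theme_classic()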
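
The statistics shown on the plot can also be computed directly, which helps when the values need to be reported or tested. As a sketch on invented expression values, the following uses base R's `cor.test()` for the Pearson correlation (the default method of `stat_cor()`) and `lm()` for the fit drawn by `geom_smooth(method = "lm")`. The data frame `toy_expr` is hypothetical.

```r
# Toy expression values; invented for illustration only
toy_expr <- data.frame(
  ROR2 = c(0.1, 0.4, 0.5, 0.9, 1.2, 1.5),
  FZD8 = c(0.2, 0.3, 0.7, 0.8, 1.1, 1.6)
)

# Pearson correlation coefficient and its p-value
ct <- cor.test(toy_expr$ROR2, toy_expr$FZD8, method = "pearson")
r <- unname(ct$estimate)
p <- ct$p.value

# Linear fit corresponding to geom_smooth(method = "lm")
fit <- lm(FZD8 ~ ROR2, data = toy_expr)
slope <- unname(coef(fit)["ROR2"])

cat(sprintf("R = %.2f, p = %.3g, slope = %.2f\n", r, p, slope))
```

Correlations computed on imputed values tend to be stronger than on raw counts, so it may be worth repeating the test on the unimputed data before interpreting the result.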