
coskunlab/Snowflake

Snowflake

This folder contains the scripts and data needed to reproduce the results in the paper "Spatial Morphoproteomic Features Predict Disease States from Tissue Architectures".

You can find the raw data here:___

(A) Raw data obtained from multiplex imaging, showing CD20 (blue), CD21 (yellow), and Ki67 (magenta) staining in tonsil and adenoid tissues with and without COVID infection. Follicles and their germinal centers are segmented using multiplex panels (n = 930 follicles and n = 775 germinal centers). Single cells are segmented (n = 8,879,749 cells), and cell neighborhood graphs are extracted based on cell spatial location. (B) SNOWFLAKE prediction pipeline, which combines morphological information with single-cell data. The pipeline has two modes: SNOWFLAKE with MorphPCA and position-aware SNOWFLAKE. In SNOWFLAKE with MorphPCA, the morphological information and the single-cell data (in the form of a graph) are processed separately; the processed outputs are then fused and passed through a 'prediction head' to obtain classification probabilities. In position-aware SNOWFLAKE, the morphological information is blended into the single-cell graph through additional node and edge features; this blended graph is processed by a GNN, and graph pooling yields the classification results. (C) Pie chart showing the tissue distribution of the follicle database. (D) Pie chart and Sankey plot showing the tissue distribution and the COVID status distribution of the NIH-COVID follicle database.
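
The neighborhood-graph step in (A) can be illustrated with a small sketch. It assumes, following the data-generation schematic for position-aware SNOWFLAKE, that two cells are neighbors when their segmentation masks touch. The integer-label mask layout (0 = background), the 4-connectivity rule, and the helper names `contact_graph` and `centroids` are hypothetical choices for illustration and are not taken from the code in "src".

```python
def contact_graph(mask):
    """Return sorted undirected edges (a, b), a < b, between touching cells.

    Hypothetical helper: `mask` is a 2D list of integer cell labels with
    0 = background; cells touch if pixels meet horizontally or vertically.
    """
    edges = set()
    rows, cols = len(mask), len(mask[0])
    for r in range(rows):
        for c in range(cols):
            a = mask[r][c]
            if a == 0:
                continue
            # Only look right and down; left/up pairs were already visited.
            for dr, dc in ((0, 1), (1, 0)):
                rr, cc = r + dr, c + dc
                if rr < rows and cc < cols:
                    b = mask[rr][cc]
                    if b != 0 and b != a:
                        edges.add((min(a, b), max(a, b)))
    return sorted(edges)


def centroids(mask):
    """Return {label: (row, col)} pixel centroids for every nonzero label."""
    sums = {}
    for r, row in enumerate(mask):
        for c, label in enumerate(row):
            if label:
                sr, sc, n = sums.get(label, (0, 0, 0))
                sums[label] = (sr + r, sc + c, n + 1)
    return {k: (sr / n, sc / n) for k, (sr, sc, n) in sums.items()}


mask = [
    [1, 1, 2, 2],
    [1, 1, 2, 2],
    [0, 3, 3, 0],
]
print(contact_graph(mask))  # [(1, 2), (1, 3), (2, 3)]
print(centroids(mask))      # {1: (0.5, 0.5), 2: (0.5, 2.5), 3: (2.0, 1.5)}
```

The pixel loop is only meant to show the rule; masks with millions of cells would call for a vectorized approach, such as comparing shifted arrays.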

(A) SNOWFLAKE with MorphPCA: the architecture processes single-cell graphs and tissue morphology in separate pathways that merge in a fusion model. The single-cell graph passes through a series of convolutional layers, each followed by ReLU activation, dropout, and layer normalization. In parallel, the tissue morphology is processed by the MorphPCA multi-layer perceptron (MLP), which consists of dense layers with ELU activation and dropout. The outputs of the graph neural network (GNN) and the MLP are concatenated in the fusion model, which applies additional dense layers with Leaky ReLU activation. A SoftMax activation produces the final classification probabilities. (B) Position-aware SNOWFLAKE: this variant adds spatial awareness by integrating tissue morphology directly into the single-cell graph. As in (A), the graph is processed by convolutional layers with ReLU activation, dropout, and layer normalization. The downstream model consists of dense layers with SELU activation and a pooling layer, ending in a SoftMax activation for classification. This architecture emphasizes integrating spatial features to improve COVID-19 status prediction.
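
To make the data flow of SNOWFLAKE with MorphPCA in (A) concrete, here is a minimal, untrained, pure-Python forward pass. It is a sketch under stated assumptions, not the repository's model. The caption does not name the convolution type, so a mean-aggregation graph convolution stands in for it. A mean readout is assumed to turn node embeddings into one graph vector. The morphology MLP is reduced to a single dense layer, and dropout is treated as the identity, as at inference time. The class name `MorphPCAFusionSketch`, the layer sizes, and the random weights are all hypothetical.

```python
import math
import random


def dense(x, w, b):
    """Affine map y = W x + b, with W stored as a list of output rows."""
    return [sum(wij * xj for wij, xj in zip(row, x)) + bi for row, bi in zip(w, b)]


def init_dense(n_out, n_in, rng):
    """Random (untrained) weights and zero biases for one dense layer."""
    w = [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out


def relu(v):
    return [max(0.0, x) for x in v]


def elu(v):
    return [x if x > 0 else math.exp(x) - 1.0 for x in v]


def leaky_relu(v, slope=0.01):
    return [x if x > 0 else slope * x for x in v]


def layer_norm(v, eps=1e-5):
    """Normalize one vector to zero mean and unit variance (no learned scale)."""
    mu = sum(v) / len(v)
    var = sum((x - mu) ** 2 for x in v) / len(v)
    return [(x - mu) / math.sqrt(var + eps) for x in v]


def softmax(v):
    m = max(v)  # subtract the max for numerical stability
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]


def graph_conv(h, edges, w, b):
    """Average each node's features with its neighbors', then apply a dense map."""
    nbrs = {i: [i] for i in range(len(h))}  # include a self-loop
    for a, c in edges:
        nbrs[a].append(c)
        nbrs[c].append(a)
    out = []
    for i in range(len(h)):
        agg = [sum(h[j][k] for j in nbrs[i]) / len(nbrs[i]) for k in range(len(h[i]))]
        out.append(dense(agg, w, b))
    return out


class MorphPCAFusionSketch:
    """Hypothetical two-branch model: graph branch + morphology MLP, then fusion."""

    def __init__(self, n_node_feat, n_morph, hidden=8, n_classes=2, n_conv=2, seed=0):
        rng = random.Random(seed)
        dims = [n_node_feat] + [hidden] * n_conv
        self.convs = [init_dense(dims[i + 1], dims[i], rng) for i in range(n_conv)]
        self.morph_mlp = init_dense(hidden, n_morph, rng)
        self.fusion = init_dense(hidden, 2 * hidden, rng)
        self.head = init_dense(n_classes, hidden, rng)

    def __call__(self, node_feats, edges, morph):
        h = node_feats
        for w, b in self.convs:
            # conv -> ReLU -> dropout (identity at inference) -> LayerNorm
            h = [layer_norm(relu(v)) for v in graph_conv(h, edges, w, b)]
        graph_vec = [sum(col) / len(h) for col in zip(*h)]  # assumed mean readout
        morph_vec = elu(dense(morph, *self.morph_mlp))
        # Concatenate both branches, then fusion dense layer with Leaky ReLU.
        fused = leaky_relu(dense(graph_vec + morph_vec, *self.fusion))
        return softmax(dense(fused, *self.head))


nodes = [[0.2, 1.0, 0.0], [0.5, 0.1, 0.9], [0.0, 0.3, 0.4]]  # per-cell features
edges = [(0, 1), (1, 2)]
morph = [0.1, -0.3, 0.7, 0.2]  # e.g. a few morphology PCA components
probs = MorphPCAFusionSketch(n_node_feat=3, n_morph=4)(nodes, edges, morph)
print(probs)  # two class probabilities that sum to 1
```

A trainable version would normally be written in a deep-learning library. The position-aware variant in (B) would instead append morphology-derived node and edge features to the graph and finish with dense SELU layers and pooling.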

(A) Schematic of the data generation pipeline for position-aware SNOWFLAKE. Each cell's spatial position is extracted, and the spatial graph is built from contacts between single-cell segmentation masks. The distance and angle to each neighboring node are calculated using a polar transformation and are added to the graph as edge feature vectors. (B) Edge importance (left), node importance (center), and cell type (right) projected into the original spatial domain for a COVID-positive patient’s follicle. (C) Edge importance (left), node importance (center), and cell type (right) projected into the original spatial domain for a COVID-negative patient’s follicle. (D) Bar plot showing the importance of each node-level feature for predicting patient-level status (left), and swarm plot showing how the expression of each node-level feature influences the SNOWFLAKE prediction (right). Each dot shows the average positive or negative prediction from a node in the graphs; the colormap shows the average expression level of the nodes. (E) Violin plots comparing the distributions of single-cell marker expression in follicle regions between COVID-positive and COVID-negative samples. P-values are calculated with a two-sided two-sample t-test.
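
The polar edge features in (A) amount to converting each neighbor's offset into a distance and an angle. The sketch below shows that computation under a few assumptions: cell positions are given as `(x, y)` pairs, edges arrive as undirected pairs, and features are emitted for both directions. The function name `polar_edge_features` is hypothetical and not taken from the code in "src".

```python
import math


def polar_edge_features(positions, edges):
    """Map each directed edge (src, dst) to [distance, angle] from src to dst.

    positions: {node_id: (x, y)}; edges: undirected pairs. Both directions are
    emitted so every node sees each of its neighbors. The angle is
    math.atan2(dy, dx) in radians, between -pi and pi.
    """
    feats = {}
    for i, j in edges:
        for src, dst in ((i, j), (j, i)):
            (x0, y0), (x1, y1) = positions[src], positions[dst]
            dx, dy = x1 - x0, y1 - y0
            feats[(src, dst)] = [math.hypot(dx, dy), math.atan2(dy, dx)]
    return feats


positions = {0: (0.0, 0.0), 1: (3.0, 4.0), 2: (0.0, 2.0)}
edges = [(0, 1), (0, 2)]
feats = polar_edge_features(positions, edges)
print(feats[(0, 2)])  # [2.0, 1.5707963267948966]
print(feats[(2, 0)])  # [2.0, -1.5707963267948966]
```

Because the raw angle wraps around at plus or minus pi, encoding it as (sin, cos) is a common alternative; the captions do not say which encoding SNOWFLAKE uses.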

Organization

Notebooks

The "notebooks" folder contains the Jupyter notebooks used.

Source code

The "src" folder contains the custom scripts used.
