Cell count variation #12

Open
alxndrkalinin opened this issue May 3, 2023 · 0 comments

alxndrkalinin commented May 3, 2023

1. Exploring cell count variability

Because cell count (CC) visualizations show strong per-batch and per-plate patterns (#7), we wanted to check whether cell count variability is related to the position-effect retrievability metrics (#9). To do so, we added the Metadata_Count_Cells column to the metadata (from jump-cellpainting/morphmap@dbbd1c3) and calculated its coefficient of variation (CoV). We then empirically chose a cutoff of CoV=0.12 to split ORFs into low and high CC variability groups.

Figures: Cell counts (cell_counts); Cell count CoV (cell_counts_cov); Cell counts split @ CoV=0.12 (cell_counts_cov_thresh)
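The CoV split described above can be sketched roughly as follows. This is a minimal illustration, not the code used for the issue: it assumes CoV is computed per ORF as sample standard deviation over mean of its per-well cell counts, and that an ORF exactly at the cutoff counts as "high". The ORF names and counts are made up.

```python
from collections import defaultdict
from statistics import mean, stdev

COV_CUTOFF = 0.12  # empirical cutoff quoted in the text


def cell_count_cov(rows):
    """Return {orf: CoV} from (orf, cell_count) pairs.

    Assumption: CoV is taken per ORF as sample std / mean over its wells.
    """
    counts = defaultdict(list)
    for orf, n_cells in rows:
        counts[orf].append(n_cells)
    # stdev needs at least two values; skip ORFs with a single well
    # or a zero mean, for which CoV is undefined.
    return {
        orf: stdev(vals) / mean(vals)
        for orf, vals in counts.items()
        if len(vals) > 1 and mean(vals) > 0
    }


def split_by_cov(cov, cutoff=COV_CUTOFF):
    """Split ORFs into (low, high) variability sets at the cutoff."""
    low = {orf for orf, c in cov.items() if c < cutoff}
    high = {orf for orf, c in cov.items() if c >= cutoff}
    return low, high


# Toy data with hypothetical ORF names and cell counts.
rows = [
    ("ORF_A", 100), ("ORF_A", 102), ("ORF_A", 98),
    ("ORF_B", 50), ("ORF_B", 100), ("ORF_B", 150),
]
cov = cell_count_cov(rows)
low, high = split_by_cov(cov)
print(cov, low, high)
```

With real data, `rows` would come from the ORF identifier and Metadata_Count_Cells columns of the metadata table.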

2. Subsetting same ORF, same well mAP based on low vs high cell count variability

Based on low/high variability, we can select ORFs from the subset that we used to calculate mAP for raw and baseline-corrected data (see #9). Because the "same ORF, different well" and "same well, different ORF" settings have few samples, we only look at "same ORF, same well". Columns are the same as in #9:

  • "subset": a subset of raw uncorrected data
  • "subset->correct": a subset of raw profiles that were then corrected by subtracting per-well mean on this subset
  • "correct->subset": a subset of corrected profiles, which were corrected by subtracting per-well mean on full data
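The difference between "subset->correct" and "correct->subset" comes down to which profiles contribute to the per-well mean. A small sketch under assumed data structures (profiles as dicts with hypothetical `well`, `orf`, and `features` keys, which are not the actual pipeline's format):

```python
from collections import defaultdict


def subtract_well_mean(profiles):
    """Subtract the per-well mean feature vector from each profile."""
    by_well = defaultdict(list)
    for p in profiles:
        by_well[p["well"]].append(p["features"])
    # Column-wise mean of the feature vectors in each well.
    well_mean = {
        well: [sum(col) / len(col) for col in zip(*feats)]
        for well, feats in by_well.items()
    }
    return [
        {**p, "features": [x - m for x, m in zip(p["features"], well_mean[p["well"]])]}
        for p in profiles
    ]


def select_orfs(profiles, orfs):
    """Keep only profiles whose ORF is in the given set."""
    return [p for p in profiles if p["orf"] in orfs]


# Toy profiles (hypothetical): one well position, two ORFs.
full = [
    {"well": "A01", "orf": "ORF_A", "features": [1.0, 2.0]},
    {"well": "A01", "orf": "ORF_A", "features": [3.0, 4.0]},
    {"well": "A01", "orf": "ORF_B", "features": [8.0, 0.0]},
]
keep = {"ORF_A"}

# "subset->correct": the well mean is estimated on the subset only.
subset_then_correct = subtract_well_mean(select_orfs(full, keep))
# "correct->subset": the well mean is estimated on the full data.
correct_then_subset = select_orfs(subtract_well_mean(full), keep)

print([p["features"] for p in subset_then_correct])
print([p["features"] for p in correct_then_subset])
```

In this toy case the two orders give different corrected profiles because ORF_B shifts the well mean only when the correction is computed on the full data.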

2.1 Low cell count variability ORFs

| Setting | Data | mmAP | Percent retrieved (p<0.05) |
|---|---|---|---|
| same well, same ORF | raw | 0.231 | 0.969 (2519/2600) |
| same well, same ORF | subset->correct | 0.177 | 0.816 (2121/2600) |
| same well, same ORF | correct->subset | 0.242 | 0.700 (1821/2600) |

Low CC CoV visualization: [figure: low_var_same_same]

2.2 High cell count variability ORFs

| Setting | Data | mmAP | Percent retrieved (p<0.05) |
|---|---|---|---|
| same well, same ORF | raw | 0.113 | 0.737 (776/1053) |
| same well, same ORF | subset->correct | 0.134 | 0.674 (710/1053) |
| same well, same ORF | correct->subset | 0.245 | 0.644 (678/1053) |

High CC CoV visualization: [figure: high_var_same_same]

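Metrics on uncorrected (raw) data differ substantially between the low and high variability subsets: mmAP is 0.231 vs 0.113 and the fraction retrieved is 0.969 vs 0.737. Per-well mean subtraction reduces this difference: for correct->subset, mmAP is 0.242 vs 0.245 and the fraction retrieved is 0.700 vs 0.644.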

2.3 All ORFs

For reference, the results for all ORFs from #9:

| Setting | Data | mmAP | Percent retrieved (p<0.05) |
|---|---|---|---|
| same well, same ORF | raw | 0.197 | 0.902 (3295/3653) |
| same well, same ORF | subset->correct | 0.165 | 0.775 (2831/3653) |
| same well, same ORF | correct->subset | 0.243 | 0.684 (2499/3653) |

[figure: same_well_diff_pert]
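As a sanity check, the low and high variability subsets should pool back to the all-ORF numbers. The sketch below uses the values reported in sections 2.1 to 2.3 and assumes mmAP is a plain mean over the counted items, so that the pooled mmAP is the size-weighted average of the two subsets. That assumption is ours, not something stated in the issue.

```python
# (mmAP, retrieved, total) copied from the tables in sections 2.1-2.3.
reported = {
    "raw": {
        "low": (0.231, 2519, 2600),
        "high": (0.113, 776, 1053),
        "all": (0.197, 3295, 3653),
    },
    "subset->correct": {
        "low": (0.177, 2121, 2600),
        "high": (0.134, 710, 1053),
        "all": (0.165, 2831, 3653),
    },
    "correct->subset": {
        "low": (0.242, 1821, 2600),
        "high": (0.245, 678, 1053),
        "all": (0.243, 2499, 3653),
    },
}


def pool(low, high):
    """Combine two subsets into (weighted mmAP, retrieved, total)."""
    mmap_low, hits_low, n_low = low
    mmap_high, hits_high, n_high = high
    n = n_low + n_high
    pooled_mmap = (mmap_low * n_low + mmap_high * n_high) / n
    return pooled_mmap, hits_low + hits_high, n


for name, r in reported.items():
    pooled_mmap, hits, n = pool(r["low"], r["high"])
    print(f"{name}: pooled mmAP={pooled_mmap:.3f}, retrieved={hits}/{n} ({hits / n:.3f})")
```

Under that assumption, the pooled counts match the all-ORF table exactly and the pooled mmAP agrees with it to within rounding.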

3. Visualizing distributional relationships between mAP and cell count variability (all ORFs)

Instead of splitting ORFs into low/high variability, we can also plot the mAP of every ORF against its CoV. Very few ORFs have both high cell count variability and high mAP values or significant p-values.

mAPs vs CoVs color-coded by p-values: [figure: map_vs_ccv]

CoVs vs p-values color-coded by mAP values: [figure: pval_vs_ccv]
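One way to put a number on the "very few ORFs" observation is to tally ORFs by CoV group and retrieval significance. The sketch below reuses the CoV=0.12 and p<0.05 thresholds from this issue. The function name and the toy records are hypothetical.

```python
from collections import Counter


def quadrant_counts(records, cov_cutoff=0.12, alpha=0.05):
    """Count ORFs per (variability group, retrieval status).

    records: iterable of (cell_count_cov, p_value) pairs, one per ORF.
    """
    counts = Counter()
    for cov, p_value in records:
        variability = "high_cov" if cov >= cov_cutoff else "low_cov"
        status = "retrieved" if p_value < alpha else "not_retrieved"
        counts[(variability, status)] += 1
    return counts


# Toy (CoV, p-value) pairs; not real results.
records = [(0.05, 0.01), (0.08, 0.20), (0.30, 0.01), (0.40, 0.50), (0.10, 0.03)]
counts = quadrant_counts(records)
for key in sorted(counts):
    print(key, counts[key])
```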
