Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Keep cell type columns when merging #656

Merged
merged 7 commits into from
Jan 17, 2024

Conversation

sjspielman
Copy link
Member

Closes #655

Now in the correct repo! This PR updated merge_sces.R to handle cell type columns better:

  • Find all the cell types that are present in any SCE
  • Define a baseline (taken from scpcaTools default) set of columns to keep as retain_coldata_columns
  • For each cell type that is present in at least 1 SCE, add the relevant cell type columns into retain_coldata_columns
    • For SingleR, I only add singler_celltype_ontology if ontology labels were used for at least one library
  • Finally, tell merge_sce_list about the retain_coldata_columns vector

Moreover, here's how I handle libraries without the given cell type method, but where that method exists in other project libraries:

  • Submitter
    • Add the column submitter_celltype_annotation with the value "Submitter-excluded"
  • SingleR
    • Add the column singler_celltype_annotation with the value "Cell type annotation not performed"
    • If ontologies were used, then I do the same for singler_celltype_ontology
  • CellAssign
    • Add the column cellassign_celltype_annotation with the value "Cell type annotation not performed"
    • Add the column cellassign_max_prediction with the value NA_real_, just to be safe. This is probably overkill since the merge function will add this column with NAs anyways, but I like that it's specifically NA_real_ here

I've tested the code with an SCE list where some, but not all, have cell types and it works as expected!

Let me know what you think of some of those string choices I made. I'll note the conclusions from this PR over in scpca-docs in the issue about merged contents - AlexsLemonade/scpca-docs#219

Copy link
Member

@allyhawkins allyhawkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is fine. At first I wasn't sure that we need the full text of "Cell type annotation not performed" and we should just leave as NA, but maybe it's good to be explicit. This also distinguishes from SingleR NA results. The only scenario where I expect this to even be an issue is with a project that has a mix of tissue and cell line samples.

@sjspielman
Copy link
Member Author

This also distinguishes from SingleR NA results.

This is the specific case I had in mind!

The only scenario where I expect this to even be an issue is with a project that has a mix of tissue and cell line samples.

Definitely agreed, I think it's unlikely in any other circumstance... unless, like, CellAssign simply refuses to run...

@sjspielman sjspielman merged commit 82ecbe0 into development Jan 17, 2024
3 checks passed
@sjspielman sjspielman deleted the sjspielman/655-keep-celltypes-during-merge branch January 17, 2024 19:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants