Bug: ADT post-processing fails when cleanTagCounts assumptions are not met #378

Closed
sjspielman opened this issue Jul 13, 2023 · 3 comments

@sjspielman
Member

After the v0.5.2 release 😞, we have observed that a small number of datasets fail during post_process_sce, specifically because of ADT quality control statistics.
In certain (infrequent!) circumstances, the assumptions of cleanTagCounts(), which uses ambientContribSparse() in the absence of isotype controls, are not met (source).
From the source:

The assumption here is that of sparsity, i.e., no more than \code{prop * nrow(y)} features should be actually present in each cell with a non-zero number of molecules.
This is reasonable for most tag-based applications where we would expect only 1-2 tags (for cell hashing) or a minority of tags (for general CITE-seq) to be present per cell.
Thus, counts for all other features must be driven by ambient contamination, allowing us to estimate a scaling factor for each cell based on the ratio to the ambient profile.
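To make the sparsity assumption concrete, here is a minimal Python sketch of the idea, not the DropletUtils implementation: the helper ambient_scale_sketch and its prop default of 0.5 are hypothetical choices for illustration. It sets aside the highest count-to-ambient ratios as possibly real tags and takes the median of the rest as the cell's ambient scaling factor. In this sketch, a cell with no features that have a non-zero ambient profile gets NaN, because no ratio can be computed.

```python
import math
import statistics
from typing import Sequence


def ambient_scale_sketch(
    counts: Sequence[float], ambient: Sequence[float], prop: float = 0.5
) -> float:
    """Hypothetical, simplified per-cell ambient scaling factor.

    Assumes sparsity: at most prop * (number of features) tags are truly
    present in the cell. The highest count/ambient ratios are set aside as
    possibly real signal, and the median of the remaining ratios is returned.
    This is an illustration, not the DropletUtils implementation.
    """
    if len(counts) != len(ambient):
        raise ValueError("counts and ambient must have the same length")
    if not 0 <= prop < 1:
        raise ValueError("prop must be in [0, 1)")
    # Ratios are only defined for features with a non-zero ambient profile.
    ratios = sorted(c / a for c, a in zip(counts, ambient) if a > 0)
    if not ratios:
        # No usable features, so no scaling factor can be estimated.
        return math.nan
    # Set aside the top prop fraction of ratios as potentially real tags,
    # always keeping at least one ratio.
    n_ambient = max(1, len(ratios) - math.floor(prop * len(ratios)))
    return statistics.median(ratios[:n_ambient])
```

For example, ambient_scale_sketch([100, 2, 4, 6], [10, 1, 2, 3]) returns 2.0: the ratios are 2, 2, 2 and 10, the top half (including the 10 from the first tag) is set aside, and the remaining ratios agree on a factor of 2.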

So, we'll need to get a patch in for this. We have discussed the following strategy:

  • If discard has NA values, instead use zero.ambient for filtering: cells get "Keep" if zero.ambient is TRUE and "Remove" if it is FALSE.
  • If both discard and zero.ambient are NA, we might just fail the post-processing altogether? Open for discussion here!

Worth noting that zero.ambient is always returned by cleanTagCounts() regardless of whether any isotype controls are present, so this strategy can be used universally.
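The proposed per-cell fallback can be sketched in Python as follows. This is a hypothetical helper, not the code merged in #379: None stands in for R's NA, the name adt_filter_labels and the fail_on_both_na switch are illustrative, and it assumes that a known discard value of TRUE maps to "Remove" and FALSE to "Keep". The zero.ambient mapping follows the rule exactly as proposed above.

```python
from typing import List, Optional, Sequence

# None stands in for R's NA in this sketch.
MaybeBool = Optional[bool]


def adt_filter_labels(
    discard: Sequence[MaybeBool],
    zero_ambient: Sequence[MaybeBool],
    fail_on_both_na: bool = True,
) -> List[Optional[str]]:
    """Assign a per-cell "Keep"/"Remove" label using the proposed fallback.

    - If discard is known, use it (assumed: True -> "Remove", False -> "Keep").
    - If discard is NA, fall back to zero.ambient as proposed in the issue
      (True -> "Keep", False -> "Remove").
    - If both are NA, either raise (fail post-processing) or return None.
    """
    if len(discard) != len(zero_ambient):
        raise ValueError("discard and zero_ambient must have one entry per cell")
    labels: List[Optional[str]] = []
    for i, (d, z) in enumerate(zip(discard, zero_ambient)):
        if d is not None:
            labels.append("Remove" if d else "Keep")
        elif z is not None:
            labels.append("Keep" if z else "Remove")
        elif fail_on_both_na:
            raise ValueError(f"cell {i}: both discard and zero.ambient are NA")
        else:
            # Leave the cell unlabeled so the caller can decide what to do.
            labels.append(None)
    return labels
```

For instance, adt_filter_labels([False, None, True, None], [False, True, False, False]) gives ["Keep", "Keep", "Remove", "Remove"]. The fail_on_both_na flag keeps the open question from the second bullet explicit rather than silently picking one behavior.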

CC @jashapiro @allyhawkins

@jashapiro
Member

Just for clarification, is this on a cell-by-cell basis? Based on your description, I would expect that if one discard value is NA, they all are, but I might still implement this so that we can accommodate NA values that occur in only some cells. Though, as I write that, I'm not confident I like it.

@sjspielman
Member Author

sjspielman commented Jul 13, 2023

Just for clarification, is this on a cell-by-cell basis?

Yes, it's cell-by-cell. My description may have been misleading, sorry! If one cell's value is NA, that doesn't guarantee the others are too; there can be NA, FALSE, and TRUE values all in one library.
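Because the decision is made per cell, a single library can mix all three cases. A small hypothetical Python helper like the one below (the name decision_source_counts is illustrative, and None again stands in for NA) could report how many cells in a library were decided by discard, by the zero.ambient fallback, or not at all, which may be handy for logging when the fallback kicks in.

```python
from collections import Counter
from typing import Optional, Sequence


def decision_source_counts(
    discard: Sequence[Optional[bool]], zero_ambient: Sequence[Optional[bool]]
) -> Counter:
    """Count which field decides each cell's label in one library.

    "discard" means discard was known, "zero.ambient" means the fallback
    was used, and "unresolved" means both values were NA (None).
    """
    if len(discard) != len(zero_ambient):
        raise ValueError("discard and zero_ambient must have one entry per cell")
    sources: Counter = Counter()
    for d, z in zip(discard, zero_ambient):
        if d is not None:
            sources["discard"] += 1
        elif z is not None:
            sources["zero.ambient"] += 1
        else:
            sources["unresolved"] += 1
    return sources
```

For a library with discard = [True, None, False, None] and zero.ambient = [False, True, False, None], this counts two cells decided by discard, one by zero.ambient, and one unresolved.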

@sjspielman
Member Author

Fixed and closed by #379
