Bug: ADT post-processing fails when cleanTagCounts assumptions are not met #378

sjspielman · 2023-07-13T15:50:08Z

After the v0.5.2 release 😞, we have observed that a small number of datasets fail during post_process_sce specifically because of ADT quality control statistics.
In certain (infrequent!) circumstances, the assumptions of cleanTagCounts(), which uses ambientContribSparse() in the absence of isotype controls, are not met (source).
From the source:

The assumption here is that of sparsity, i.e., no more than \code{prop * nrow(y)} features should be actually present in each cell with a non-zero number of molecules.
This is reasonable for most tag-based applications where we would expect only 1-2 tags (for cell hashing) or a minority of tags (for general CITE-seq) to be present per cell.
Thus, counts for all other features must be driven by ambient contamination, allowing us to estimate a scaling factor for each cell based on the ratio to the ambient profile.
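To make the sparsity assumption concrete, here is a simplified sketch of a per-cell scaling estimate. It is not the DropletUtils implementation. The helper name `estimate_ambient_scale` is hypothetical, and the sketch assumes the scale is the median of the per-feature count/ambient ratios. Under that assumption the estimate reflects contamination only when fewer than half of the tags in a cell carry genuine signal. In this sketch, an undefined estimate (no feature with a positive ambient proportion) is returned as NaN.

```python
import math
from statistics import median


def estimate_ambient_scale(counts, ambient):
    """Hypothetical sketch: estimate one cell's ambient scaling factor.

    Assumes most features in the cell contain only ambient molecules, so
    the median count/ambient ratio is driven by contamination rather than
    genuine signal. This is an illustration, not DropletUtils code.
    """
    # Ratio of observed counts to the ambient profile, skipping features
    # whose ambient proportion is zero (the ratio would be undefined).
    ratios = [c / a for c, a in zip(counts, ambient) if a > 0]
    if not ratios:
        # Nothing to estimate from: report the scale as undefined.
        return math.nan
    return median(ratios)
```

With a uniform ambient profile of `[0.25] * 4`, a cell with one strongly present tag, `[2, 3, 2, 40]`, gives `10.0`: the median comes from the three ambient-only tags. A cell where every tag is present, `[50, 60, 40, 45]`, gives `190.0`. In that case the assumption is violated and the "ambient" estimate is inflated by real signal.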

So, we'll need to get a patch in for this. We have discussed taking the following strategy:

If discard has NA values, use zero.ambient for filtering instead: cells get "Keep" if zero.ambient is TRUE and "Remove" if it is FALSE.
If both discard and zero.ambient have NA values, we might just fail the post-processing altogether? Or, open for discussion here!

Worth noting that zero.ambient is always returned by cleanTagCounts() regardless of whether any isotype controls are present, so this strategy can be used universally.
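The per-cell rule proposed above can be sketched as a small function. This is an illustration only. The name `adt_filter_label` is hypothetical, `None` stands in for R's `NA`, and the zero.ambient-to-label mapping is copied directly from the proposal. For the case where both values are NA, the sketch raises an error, which corresponds to the "fail the post-processing" option that is still open for discussion.

```python
from typing import Optional


def adt_filter_label(discard: Optional[bool], zero_ambient: Optional[bool]) -> str:
    """Hypothetical helper: assign one cell's ADT filter label.

    None stands in for R's NA. The zero.ambient mapping follows the
    strategy proposed in this issue exactly as written.
    """
    if discard is not None:
        # Normal case: cleanTagCounts() produced a discard flag for this cell.
        return "Remove" if discard else "Keep"
    if zero_ambient is not None:
        # Proposed fallback: TRUE -> "Keep", FALSE -> "Remove".
        return "Keep" if zero_ambient else "Remove"
    # Both NA: one option raised in the issue is to fail post-processing.
    raise ValueError("both discard and zero.ambient are NA for this cell")
```

For example, `adt_filter_label(True, False)` returns `"Remove"` and `adt_filter_label(None, True)` returns `"Keep"`. Checking `discard` first means the fallback only changes behavior for cells that previously broke post-processing.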

CC @jashapiro @allyhawkins


jashapiro · 2023-07-13T15:58:02Z

> So, we'll need to get a patch in for this. We have discussed taking the following strategy:
>
> If discard has NA values, use zero.ambient for filtering instead: cells get "Keep" if zero.ambient is TRUE and "Remove" if it is FALSE.
>
> If both discard and zero.ambient have NA values, we might just fail the post-processing altogether? Or, open for discussion here!

Just for clarification, is this on a cell-by-cell basis? Based on your description, I would expect that if one discard value is NA, they all are. Even so, I might implement this so that we can accommodate NA values that occur in only some cells. Though as I write that, I'm not confident I like it.

sjspielman · 2023-07-13T16:10:36Z

> Just for clarification, is this on a cell-by-cell basis?

Yes, it's cell-by-cell. My description may have been misleading, sorry! One cell having an NA value does not guarantee that the others do; NA, FALSE, and TRUE can all appear in one library.
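NA, FALSE, and TRUE can all appear in one library, so the check has to run per cell rather than once per library. As a sketch, you can tally which column each cell would be filtered on. This also shows how many cells fall into the both-NA case that is still open for discussion. The helper name `summarize_qc_sources` is hypothetical, and `None` stands in for R's `NA`.

```python
from collections import Counter
from typing import Optional, Sequence


def summarize_qc_sources(
    discard: Sequence[Optional[bool]],
    zero_ambient: Sequence[Optional[bool]],
) -> Counter:
    """Hypothetical helper: count, per library, which QC column each cell
    would be filtered on. None stands in for R's NA."""
    if len(discard) != len(zero_ambient):
        raise ValueError("discard and zero_ambient must have one value per cell")
    sources = Counter()
    for d, z in zip(discard, zero_ambient):
        if d is not None:
            sources["discard"] += 1        # usual cleanTagCounts() result
        elif z is not None:
            sources["zero.ambient"] += 1   # cell needs the fallback
        else:
            sources["neither"] += 1        # both NA: unresolved case
    return sources
```

A library with `discard = [True, False, None, None]` and `zero_ambient = [False, False, True, None]` yields two cells filtered on discard, one on zero.ambient, and one with neither. This shows why a library-wide "is any value NA?" check would be the wrong granularity.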

sjspielman · 2023-07-18T13:29:52Z

Fixed and closed by #379

sjspielman self-assigned this Jul 13, 2023

sjspielman mentioned this issue Jul 13, 2023

update ADT filtering docs AlexsLemonade/scpca-docs#124 (Closed)

sjspielman mentioned this issue Jul 14, 2023

Fix ADT filtering #379 (Merged)

sjspielman closed this as completed Jul 18, 2023
