
Skip sample metadata addition to colData if cellhash is present #664

Merged: 6 commits, merged Jan 23, 2024


allyhawkins (Member)

When running the multiplexed samples through the workflow, I was getting the following error:

Command error:
  Warning message:
  replacing previous import ‘S4Arrays::makeNindexFromArrayViewport’ by ‘DelayedArray::makeNindexFromArrayViewport’ when loading ‘SummarizedExperiment’ 
  Error in scpcaTools::metadata_to_coldata(sce, join_columns = "library_id") : 
    The specified `join_columns` are producing multiple matches, but only one match is allowed.
  Calls: format_czi -> <Anonymous>
  Execution halted

This happens because we use the sce_to_anndata.R script to convert all objects to AnnData before running CellAssign. A separate process uses the same script to produce the final AnnData output, but that process skips multiplexed samples, whereas the conversion for CellAssign keeps them. The format_czi function includes a step that merges the colData with the sample_metadata, using library_id as the join column. That join cannot work for an SCE containing multiplexed data, because a single library_id matches multiple samples.
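A minimal base-R sketch of why the join breaks for multiplexed libraries. The data frames, barcodes, and SCPCL/SCPCS identifiers are hypothetical, and base `merge()` stands in for `scpcaTools::metadata_to_coldata()`: where `merge()` silently duplicates rows, `metadata_to_coldata()` stops with the error above.

```r
# Hypothetical toy data: one multiplexed library containing two samples
cell_metadata <- data.frame(
  barcode = c("AAAC", "AAAG", "AAAT"),
  library_id = "SCPCL000001"
)
sample_metadata <- data.frame(
  library_id = c("SCPCL000001", "SCPCL000001"),
  sample_id = c("SCPCS000001", "SCPCS000002")
)

# Count how many sample_metadata rows share each library_id
matches_per_library <- table(sample_metadata$library_id)
any(matches_per_library > 1) # TRUE: a one-to-one join is not possible

# A plain merge() duplicates every cell, once per matching sample
joined <- merge(cell_metadata, sample_metadata, by = "library_id")
nrow(joined) # 6 rows for only 3 cells
```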

To get around this, I added a check in the format_czi function for a cellhash altExp in the SCE object. If one is present, no sample metadata is added to the colData. I also updated the check for converting altExp objects so that an altExp is only converted if the feature name is not cellhash.
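A hedged sketch of the guard described above, written in base R so it runs without Bioconductor. `add_sample_metadata()` is a hypothetical helper, `alt_exp_names` stands in for `altExpNames(sce)`, and `merge()` stands in for the `metadata_to_coldata()` join; this is not the actual format_czi code.

```r
# Hypothetical stand-in for the cellhash guard in format_czi()
add_sample_metadata <- function(cell_metadata, sample_metadata, alt_exp_names) {
  if ("cellhash" %in% alt_exp_names) {
    # Multiplexed library: one library_id maps to several samples, so skip the join
    return(cell_metadata)
  }
  # Non-multiplexed library: each library_id matches exactly one sample
  merge(cell_metadata, sample_metadata, by = "library_id", sort = FALSE)
}

# Hypothetical single-sample library
cells <- data.frame(barcode = c("AAAC", "AAAG"), library_id = "SCPCL000001")
samples <- data.frame(library_id = "SCPCL000001", sample_id = "SCPCS000001")

ncol(add_sample_metadata(cells, samples, c("adt"))) # 3: sample_id added
ncol(add_sample_metadata(cells, samples, c("adt", "cellhash"))) # 2: unchanged
```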

jashapiro (Member) left a comment

Looks good overall, but I suggested a tiny restructure to reduce nesting and I think you have the test backward?

bin/sce_to_anndata.R (outdated)
Comment on lines 138 to 140
if (!(opt$feature_name %in% altExpNames(sce))) {
  stop("feature_name must match name of altExp in provided SCE object.")
}
Member


I think you still want this condition first, then use `} else if (opt$feature_name == "cellhash") {` to reduce nesting depth.

)
} else {
# warn that the altExp cannot be converted
message(
Member


We should be consistent in whether we use warning() or message() here. I guess I would lean toward warning() for both?

allyhawkins and others added 2 commits January 23, 2024 13:54
Co-authored-by: Joshua Shapiro <josh.shapiro@ccdatalab.org>
allyhawkins (Member, Author)

I reordered the if statement and used warning() for both messages. This should be ready for another look.
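A rough sketch of the "fail first, then `else if`" shape discussed in this review. `check_feature_conversion()` is hypothetical, `feature_name` and `alt_exp_names` stand in for `opt$feature_name` and `altExpNames(sce)`, and the cellhash warning text is illustrative; the real script also performs the conversion itself, which is omitted here.

```r
# Hypothetical sketch of the flattened control flow (conversion step omitted)
check_feature_conversion <- function(feature_name, alt_exp_names) {
  if (!(feature_name %in% alt_exp_names)) {
    # Fail first: the requested altExp does not exist in the SCE
    stop("feature_name must match name of altExp in provided SCE object.")
  } else if (feature_name == "cellhash") {
    # Cell hashing altExps are skipped, using warning() for consistency
    warning("cellhash altExp will not be converted to AnnData.")
    return(FALSE)
  }
  # Any other existing altExp can go on to be converted
  TRUE
}

check_feature_conversion("adt", c("adt", "cellhash")) # TRUE
suppressWarnings(check_feature_conversion("cellhash", c("adt", "cellhash"))) # FALSE
```

Putting the error case first means every later branch can assume the altExp exists, which keeps the nesting to one level.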

jashapiro (Member) left a comment

LGTM, with one spacing update

bin/sce_to_anndata.R (outdated)
alt_sce,
anndata_file = opt$output_feature_h5
)
} else {
Member


Oh wait, this else is another else... we need to fix that

Member


nvm, this is fine, it is the nested if! (This is what I was saying about failing first... just helps my brain parse it)

Co-authored-by: Joshua Shapiro <josh.shapiro@ccdatalab.org>
allyhawkins merged commit e24c08b into main on Jan 23, 2024
3 checks passed
allyhawkins deleted the allyhawkins/no-sample-metadata-for-cellhash branch on January 23, 2024 at 20:19
This was referenced Jan 25, 2024

2 participants