Add CellAssign assignments to SCE object #402

Merged
allyhawkins merged 9 commits into development from allyhawkins/cellassign-assignments Aug 14, 2023

allyhawkins
Member

Closes #394

This PR adds a process that incorporates the predictions output by CellAssign into the annotated SCE object. The inputs are an SCE file, the predictions file, and the name of the reference used with CellAssign. The process is set up to take the SCE object produced by the SingleR process as input, so the final annotated object contains the cell type assignments from both SingleR and CellAssign. I mirrored the SingleR naming for both the assignments and the metadata.

The only tricky part was deciding how to pass the reference name through to this process. With SingleR, we can add the name to the model, store it in the object, and retrieve it from there. Here, however, the reference file is just a TSV of marker genes, so we can't really store the name in it. Instead, I grab it directly from the cell type metadata that we read in. This also means I had to pass it through the initial classify_cellassign process even though it isn't used there.

I also updated this to publish only the final SCE object with both annotations. Do we want to continue publishing both?

Note: This is a draft PR because I'm currently running a test to evaluate it, which takes some time to complete. I will update the PR once that test finishes. I temporarily added a publish step to the predictions process so that, if the new step I added fails, I can troubleshoot it easily. Maybe this is an argument for adding the predictions output to our checkpoints folder to help with any future issues.

allyhawkins marked this pull request as draft August 10, 2023 21:32
allyhawkins marked this pull request as ready for review August 11, 2023 16:43
allyhawkins
Member Author

This should now be ready for a formal review. I ended up changing the predictions publishing step to write to the checkpoints directory. I also added a check that the cell type assignments are matched to the correct original barcodes.
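A minimal base-R sketch of this kind of barcode check, assuming the assignments arrive as a data frame with a `barcode` column and that every cell in the object should receive exactly one assignment. The helper `align_to_barcodes()` and the toy barcodes are hypothetical, not the code added in this PR; in the real script the target barcodes would presumably come from `colnames(sce)`.

```r
# Hypothetical helper (not from the PR): reorder per-cell assignments so that
# row i describes the cell whose barcode is barcodes[i], then verify the match.
align_to_barcodes <- function(assignments, barcodes) {
  idx <- match(barcodes, assignments$barcode)
  # Assumption: every cell must have an assignment, so stop if any are missing.
  if (anyNA(idx)) {
    stop("Some barcodes have no cell type assignment")
  }
  aligned <- assignments[idx, , drop = FALSE]
  rownames(aligned) <- NULL
  # Final guard before the columns are copied into the SCE's colData.
  stopifnot(identical(aligned$barcode, barcodes))
  aligned
}

# Toy data: the assignments come back in a different order than the cells.
cell_barcodes <- c("AAAC", "AAAG", "AAAT")
assignments <- data.frame(
  barcode = c("AAAT", "AAAC", "AAAG"),
  celltype = c("T cell", "B cell", "NK cell"),
  prediction = c(0.91, 0.85, 0.77),
  stringsAsFactors = FALSE
)

aligned <- align_to_barcodes(assignments, cell_barcodes)
aligned$celltype
```

Matching by barcode first, rather than assuming both tables share an order, is what makes the final `identical()` check meaningful: a row that does not line up stops the script instead of silently mislabeling cells.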

sjspielman
Member

Major caveat: I haven't looked at the code yet, but I had an immediate thought about this part of the PR description:

> The only tricky thing here was deciding how to pass the reference name through to this process. In SingleR we can add that to the model and store it in the object and then grab it that way. However here, the reference file is just a marker genes tsv file so we can't really store the name there.

We could actually do the same thing as in SingleR with a bit of a modification to the generate_cellassign_refs.R script. Rather than exporting a TSV, we could export an RDS file which contains a list with two named items:

  • the reference name
  • the data frame (formerly the whole TSV)

This might help to simplify some of the nextflow code, but it would also involve some backtracking and I don't want that to be too tricky/time-consuming!
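A minimal sketch of that proposal in base R: bundle the reference name and the marker-gene table into one named list, save it with `saveRDS()`, and read both back with `readRDS()`. The list element names, the reference name, and the toy marker table are hypothetical placeholders, not the actual output of generate_cellassign_refs.R.

```r
# Toy marker-gene table standing in for the former TSV (hypothetical contents):
# one row per gene, one 0/1 column per cell type.
marker_genes <- data.frame(
  ensembl_id = c("gene_1", "gene_2", "gene_3"),
  `B cell` = c(1L, 0L, 0L),
  `T cell` = c(0L, 1L, 1L),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Bundle the reference name with the table, analogous to the SingleR model.
cellassign_ref <- list(
  reference_name = "example_ref",  # hypothetical reference name
  marker_genes = marker_genes
)

ref_file <- tempfile(fileext = ".rds")
saveRDS(cellassign_ref, ref_file)

# Downstream, both pieces come back from a single file.
loaded_ref <- readRDS(ref_file)
loaded_ref$reference_name
```

As the reply below explains, this route was not taken because the CellAssign step runs in Python rather than R.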

allyhawkins
Member Author

> We could actually do the same thing as in SingleR with a bit of a modification to the generate_cellassign_refs.R script. Rather than exporting a TSV, we could export an RDS file which contains a list with two named items:
>
>   • the reference name
>   • the data frame (formerly the whole TSV)
>
> This might help to simplify some of the nextflow code, but it would also involve some backtracking and I don't want that to be too tricky/time-consuming!

We actually can't do that because the predictions file is output by Python, so I don't think we can create an RDS file.

sjspielman
Member

> We actually can't do that because the predictions file is output by python, so I don't think we can create an rds file.

argh, yes, this is a python library...oh well!

Member

sjspielman left a comment

This looks pretty good to me overall! I left some higher-level comments throughout, and after this I'll go test the workflow with these changes. Let me know if you disagree with anything I commented!

names_to = "celltype",
values_to = "prediction") |>
dplyr::group_by(barcode) |>
dplyr::slice_max(prediction, n = 1) |>
Member

Oh, this is neat! I was only aware of its less exciting relative, dplyr::slice().
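For reference, a self-contained sketch of the pivot-then-`slice_max()` pattern from this hunk, run on a toy predictions table with one row per barcode and one probability column per cell type (the table shape and values are assumptions for illustration). One detail worth knowing: `slice_max()` keeps tied rows by default (`with_ties = TRUE`), so a cell whose top two predictions are equal would produce two rows. This sketch passes `with_ties = FALSE` to guarantee one row per barcode; the excerpt above only shows `n = 1`.

```r
library(dplyr)
library(tidyr)

# Toy predictions: one row per cell barcode, one probability column per cell type.
predictions <- data.frame(
  barcode = c("AAAC", "AAAG", "AAAT"),
  `B cell` = c(0.85, 0.10, 0.50),
  `T cell` = c(0.15, 0.90, 0.50),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

celltype_assignments <- predictions |>
  # Long format: one row per (barcode, cell type) pair.
  tidyr::pivot_longer(
    cols = -barcode,
    names_to = "celltype",
    values_to = "prediction"
  ) |>
  dplyr::group_by(barcode) |>
  # Keep the highest-probability cell type per barcode; with_ties = FALSE
  # returns exactly one row even when probabilities tie (as for "AAAT").
  dplyr::slice_max(prediction, n = 1, with_ties = FALSE) |>
  dplyr::ungroup()

celltype_assignments
```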

Comment on lines 84 to 85
cellassign_ref_file = "${params.cellassign_ref_dir}/${it.cellassign_ref_file}",
cellassign_ref_name = it.cellassign_ref_name
Member

I'd add a quick comment here for future us, noting that it is indeed correct that cellassign_ref_name needs to be passed in but singler_ref_name does not.

@@ -72,24 +94,33 @@ workflow annotate_celltypes {

// creates [meta, processed, SingleR reference model]
Member

This comment needs to be updated with the cellassign parts.


// creates [meta, processed hdf5, cellassign ref file, cell assign ref name]
Member

Suggested change
// creates [meta, processed hdf5, cellassign ref file, cell assign ref name]
// creates [meta, processed rds, processed hdf5, cellassign ref file, cell assign ref name]

modules/classify-celltypes.nf (resolved)
bin/classify_cellassign.R (resolved)
bin/classify_cellassign.R (outdated, resolved)
bin/classify_cellassign.R (outdated, resolved)
sce$cellassign_max_prediction <- celltype_assignments$prediction

metadata(sce)$cellassign_predictions <- predictions
metadata(sce)$cellassign_reference <- opt$reference_name
Member

Missing lines to actually save to RDS below this?

Member Author

Yikes... good catch!

@@ -28,16 +27,17 @@ process classify_singleR {

process predict_cellassign {
container params.SCPCATOOLS_CONTAINER
publishDir "${params.checkpoints_dir}/celltype/${meta.library_id}", mode: 'copy'
Member

👍 agreed to send to checkpoint!

sjspielman
Member

sjspielman commented Aug 11, 2023

> I also updated this only to publish the final SCE object with both annotations. Do we want to continue publishing both?

This is fine with me! We might want to note this somewhere so it makes it into the docs.

allyhawkins
Member Author

Thanks for the review @sjspielman! I addressed all of your comments, including adding the output file check for classify_singleR. I think I caught all of the comment updates, but let me know if I missed anything.

Member

sjspielman left a comment

All looks good to me! FYI, I suggested adding a couple of spaces for style.

I just started a run of this workflow, and I want to see that it finishes without error before approving, so I'll come back later today and approve once the cell typing has finished. That said, if you ran the workflow at the most recent commit, I can cancel this run and approve now.

bin/classify_cellassign.R (outdated, resolved)
bin/classify_cellassign.R (outdated, resolved)
bin/classify_SingleR.R (outdated, resolved)
allyhawkins
Member Author

> I just started a run through of this workflow, and I want to see that it finished without error before approving. So I'll come back later today and approve once the cell typing has finished! That said, if you ran the workflow at most recent commit, I can cancel this run and approve now.

Sorry for the delay, but yes I ran it with the most recent changes, so we should be good to go!

Co-authored-by: Stephanie <stephanie.spielman@gmail.com>
allyhawkins merged commit 9efd773 into development Aug 14, 2023
2 checks passed
allyhawkins deleted the allyhawkins/cellassign-assignments branch August 14, 2023 16:44
jashapiro mentioned this pull request Oct 3, 2023
2 participants