Add CellAssign assignments to SCE object #402

allyhawkins · 2023-08-10T21:31:31Z

Closes #394

This PR adds a process that incorporates the predictions output by CellAssign into the annotated SCE object. The inputs are an SCE file, the predictions file, and the name of the reference used with CellAssign. The process is set up so that the SCE object produced by the SingleR process is its input, so the final annotated object contains the cell type assignments from both SingleR and CellAssign. I mirrored how we named both the assignments and the metadata from SingleR.

The only tricky thing here was deciding how to pass the reference name through to this process. In SingleR we can add that to the model and store it in the object and then grab it that way. However here, the reference file is just a marker genes tsv file so we can't really store the name there. Instead, I grab it directly from the cell type metadata that we read in. This also means I had to pass it through the initial classify_cellassign process even though it isn't actively used there.

I also updated this only to publish the final SCE object with both annotations. Do we want to continue publishing both?

Note: This is a draft PR because I'm currently running a test to evaluate this, which takes some time to complete. I will update this once that test is complete. I temporarily added a publish step to the predictions step so that I can troubleshoot easily if the new step I added fails. Maybe this is an argument for adding the predictions output to our checkpoints folder for any future issues.

allyhawkins · 2023-08-11T16:45:11Z

This now should be ready for a formal review. I did end up changing the predictions publishing step to write to the checkpoints directory. I also added a check that the cell type assignments were getting assigned to the correct original barcodes.
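
The PR text does not show how the barcode check is implemented, so the following is only a minimal sketch of one way to do it in base R. The `sce_barcodes` vector stands in for the column names of the SCE object, and all barcodes and cell type labels are made up for illustration.

```r
# Hypothetical stand-ins: barcodes in SCE column order, and assignments
# that arrive in a different order
sce_barcodes <- c("GGTA", "AAAC", "CCGT")
celltype_assignments <- data.frame(
  barcode  = c("AAAC", "CCGT", "GGTA"),
  celltype = c("t_cell", "b_cell", "other")
)

# Reorder the assignments so that they follow the SCE barcode order
ordered_assignments <- celltype_assignments[
  match(sce_barcodes, celltype_assignments$barcode),
]

# Fail loudly if any barcode is missing or still out of order
stopifnot(identical(ordered_assignments$barcode, sce_barcodes))
```

If a barcode in the SCE has no matching assignment, `match()` returns `NA` for it and the `stopifnot()` check fails rather than silently misassigning cell types.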

sjspielman · 2023-08-11T17:12:29Z

Major caveat that I haven't looked at the code yet, but I did have an immediate thought about this PR comment -

> The only tricky thing here was deciding how to pass the reference name through to this process. In SingleR we can add that to the model and store it in the object and then grab it that way. However here, the reference file is just a marker genes tsv file so we can't really store the name there.

We could actually do the same thing as in SingleR with a bit of a modification to the generate_cellassign_refs.R script. Rather than exporting a TSV, we could export an RDS file which contains a list with two named items:

the reference name
the data frame (formerly the whole TSV)

This might help to simplify some of the nextflow code, but it would also involve some backtracking and I don't want that to be too tricky/time-consuming!
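
To make the proposal concrete, here is a rough sketch of what such an RDS file could hold. The element names `reference_name` and `marker_genes`, the reference name, and the toy marker table are all assumptions for illustration; none of them come from `generate_cellassign_refs.R`.

```r
# Hypothetical reference name and marker-gene table
reference <- list(
  reference_name = "example-reference",
  marker_genes = data.frame(
    gene   = c("geneA", "geneB"),
    t_cell = c(1, 0),
    b_cell = c(0, 1)
  )
)

# Write the list to a temporary RDS file and read it back
ref_file <- tempfile(fileext = ".rds")
saveRDS(reference, ref_file)
loaded <- readRDS(ref_file)

# The name now travels with the marker genes
loaded$reference_name
```

Note that this only helps steps that run in R, since RDS is an R-specific serialization format.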

allyhawkins · 2023-08-11T17:21:27Z

> We could actually do the same thing as in SingleR with a bit of a modification to the generate_cellassign_refs.R script. Rather than exporting a TSV, we could export an RDS file which contains a list with two named items:
>
> the reference name
> the data frame (formerly the whole TSV)
>
> This might help to simplify some of the nextflow code, but it would also involve some backtracking and I don't want that to be too tricky/time-consuming!

We actually can't do that because the predictions file is output by python, so I don't think we can create an rds file.

sjspielman · 2023-08-11T18:02:09Z

> We actually can't do that because the predictions file is output by python, so I don't think we can create an rds file.

argh, yes, this is a python library...oh well!

sjspielman

This looks pretty good to me overall! I left some higher-level comments throughout, and after this I'll head to test the workflow with these changes. Let me know if you disagree with anything I commented!

sjspielman · 2023-08-11T18:05:15Z

bin/classify_cellassign.R

+                      names_to = "celltype",
+                      values_to = "prediction") |>
+  dplyr::group_by(barcode) |>
+  dplyr::slice_max(prediction, n = 1) |>


oh this is neat! I was only aware of its less exciting relative dplyr::slice()
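
For readers unfamiliar with the pattern in this hunk, here is a minimal sketch of reshaping a wide prediction table and keeping the top-scoring cell type per barcode. Only `names_to`, `values_to`, `group_by(barcode)`, and `slice_max(prediction, n = 1)` come from the diff; the enclosing `tidyr::pivot_longer()` call is an assumption, and the toy `predictions` table and its labels are made up.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide table: one row per barcode, one column per cell type
predictions <- data.frame(
  barcode = c("AAAC", "CCGT", "GGTA"),
  t_cell  = c(0.90, 0.20, 0.05),
  b_cell  = c(0.08, 0.70, 0.15),
  other   = c(0.02, 0.10, 0.80)
)

celltype_assignments <- predictions |>
  # one row per barcode/cell type pair
  tidyr::pivot_longer(
    !barcode,
    names_to = "celltype",
    values_to = "prediction"
  ) |>
  dplyr::group_by(barcode) |>
  # keep the row with the highest prediction within each barcode
  dplyr::slice_max(prediction, n = 1) |>
  dplyr::ungroup()
```

One thing to keep in mind: `slice_max()` defaults to `with_ties = TRUE`, so a barcode whose top score is shared by two cell types yields two rows. Passing `with_ties = FALSE` guarantees exactly one row per barcode.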

sjspielman · 2023-08-11T18:09:05Z

modules/classify-celltypes.nf

+          cellassign_ref_file = "${params.cellassign_ref_dir}/${it.cellassign_ref_file}",
+          cellassign_ref_name = it.cellassign_ref_name


I'd add a really quick comment here for future us, that it is indeed correct that the cellassign_ref_name needs to be passed in but not the singler_ref_name.

sjspielman · 2023-08-11T18:09:36Z

modules/classify-celltypes.nf

@@ -72,24 +94,33 @@ workflow annotate_celltypes {

      // creates [meta, processed, SingleR reference model]


This comment needs to be updated with the cellassign parts

sjspielman · 2023-08-11T18:09:53Z

modules/classify-celltypes.nf


+      // creates [meta, processed hdf5, cellassign ref file, cell assign ref name]


Suggested change

-      // creates [meta, processed hdf5, cellassign ref file, cell assign ref name]
+      // creates [meta, processed rds, processed hdf5, cellassign ref file, cell assign ref name]


sjspielman · 2023-08-11T18:19:15Z

bin/classify_cellassign.R

+sce$cellassign_max_prediction <- celltype_assignments$prediction
+
+metadata(sce)$cellassign_predictions <- predictions
+metadata(sce)$cellassign_reference <- opt$reference_name


Missing lines to actually save to RDS below this?

allyhawkins · 2023-08-11

Yikes... good catch!

sjspielman · 2023-08-11T18:20:17Z

modules/classify-celltypes.nf

@@ -28,16 +27,17 @@ process classify_singleR {

 process predict_cellassign {
  container params.SCPCATOOLS_CONTAINER
+  publishDir "${params.checkpoints_dir}/celltype/${meta.library_id}", mode: 'copy'


👍 agreed to send to checkpoint!

sjspielman · 2023-08-11T18:27:55Z

> I also updated this only to publish the final SCE object with both annotations. Do we want to continue publishing both?

This is good with me! We might want to make a note of this somewhere else to make sure it gets into docs?


allyhawkins · 2023-08-11T19:58:25Z

Thanks for the review @sjspielman! I addressed all of your comments including adding in the output file check for classify singleR. I think I caught all the comment updates but let me know if I missed anything.

sjspielman

All looks good to me! I suggested a couple spaces for style FYI.

I just started a run through of this workflow, and I want to see that it finished without error before approving. So I'll come back later today and approve once the cell typing has finished! That said, if you ran the workflow at most recent commit, I can cancel this run and approve now.


allyhawkins · 2023-08-14T16:37:50Z

> I just started a run through of this workflow, and I want to see that it finished without error before approving. So I'll come back later today and approve once the cell typing has finished! That said, if you ran the workflow at most recent commit, I can cancel this run and approve now.

Sorry for the delay, but yes I ran it with the most recent changes, so we should be good to go!


initial cell type assignments script and process

8841d77

allyhawkins marked this pull request as draft August 10, 2023 21:32

allyhawkins added 3 commits August 11, 2023 10:31

switch order of inputs

e65169e

make sure cell type assignments are in the right order

9171578

publish predictions to checkpoints directory

d48ec60

allyhawkins marked this pull request as ready for review August 11, 2023 16:43

allyhawkins requested a review from sjspielman August 11, 2023 16:45

sjspielman reviewed Aug 11, 2023


allyhawkins and others added 4 commits August 11, 2023 14:50

Apply suggestions from code review

a828978

Co-authored-by: Stephanie <stephanie.spielman@gmail.com>

export rds

9719001

add output file check to classify SingleR

a3a8d52

update comments

6449ca1

allyhawkins requested a review from sjspielman August 11, 2023 19:58

sjspielman reviewed Aug 14, 2023



Apply suggestions from code review

9c97aee

Co-authored-by: Stephanie <stephanie.spielman@gmail.com>

allyhawkins requested a review from sjspielman August 14, 2023 16:38

sjspielman approved these changes Aug 14, 2023


allyhawkins merged commit 9efd773 into development Aug 14, 2023
2 checks passed

allyhawkins deleted the allyhawkins/cellassign-assignments branch August 14, 2023 16:44

allyhawkins mentioned this pull request Aug 16, 2023

Incorporate CellAssign predictions into annotated SCE object #394

Closed

sjspielman mentioned this pull request Aug 29, 2023

Change how cellassign celltypes are saved #420

Closed

jashapiro mentioned this pull request Oct 3, 2023

Add CellAssign process #479

Merged
