-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add CellAssign assignments to SCE object #402
Conversation
This now should be ready for a formal review. I did end up changing the predictions publishing step to write to the checkpoints directory. I also added a check that the cell type assignments were getting assigned to the correct original barcodes. |
Major caveat that I haven't looked at the code yet, but I did have an immediate thought about this PR comment -
We could actually do the same thing as in
This might help to simplify some of the nextflow code, but it would also involve some backtracking and I don't want that to be too tricky/time-consuming! |
We actually can't do that because the predictions file is output by python, so I don't think we can create an rds file. |
argh, yes, this is a python library...oh well! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks pretty good to me overall! I left some higher-level comments throughout, and after this I'll head to test the workflow with these changes. Let me know if you disagree with anything I commented!
names_to = "celltype", | ||
values_to = "prediction") |> | ||
dplyr::group_by(barcode) |> | ||
dplyr::slice_max(prediction, n = 1) |> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh this is neat! I was only aware of its less exciting relative dplyr::slice()
modules/classify-celltypes.nf
Outdated
cellassign_ref_file = "${params.cellassign_ref_dir}/${it.cellassign_ref_file}", | ||
cellassign_ref_name = it.cellassign_ref_name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd add a really quick comment here for future us, that it is indeed correct that the cellassign_ref_name
needs to be passed in but not the singler_ref_name
.
modules/classify-celltypes.nf
Outdated
@@ -72,24 +94,33 @@ workflow annotate_celltypes { | |||
|
|||
// creates [meta, processed, SingleR reference model] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment needs to be updated with the cellassign parts
modules/classify-celltypes.nf
Outdated
|
||
// creates [meta, processed hdf5, cellassign ref file, cell assign ref name] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// creates [meta, processed hdf5, cellassign ref file, cell assign ref name] | |
// creates [meta, processed rds, processed hdf5, cellassign ref file, cell assign ref name] |
sce$cellassign_max_prediction <- celltype_assignments$prediction | ||
|
||
metadata(sce)$cellassign_predictions <- predictions | ||
metadata(sce)$cellassign_reference <- opt$reference_name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing lines to actually save to RDS below this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yikes... good catch!
@@ -28,16 +27,17 @@ process classify_singleR { | |||
|
|||
process predict_cellassign { | |||
container params.SCPCATOOLS_CONTAINER | |||
publishDir "${params.checkpoints_dir}/celltype/${meta.library_id}", mode: 'copy' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 agreed to send to checkpoint!
This is good with me! We might want to make a note of this somewhere else to make sure it gets into docs? |
Co-authored-by: Stephanie <stephanie.spielman@gmail.com>
Thanks for the review @sjspielman! I addressed all of your comments including adding in the output file check for classify singleR. I think I caught all the comment updates but let me know if I missed anything. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All looks good to me! I suggested a couple spaces for style FYI.
I just started a run through of this workflow, and I want to see that it finished without error before approving. So I'll come back later today and approve once the cell typing has finished! That said, if you ran the workflow at most recent commit, I can cancel this run and approve now.
Sorry for the delay, but yes I ran it with the most recent changes, so we should be good to go! |
Co-authored-by: Stephanie <stephanie.spielman@gmail.com>
Closes #394
This PR adds in a process to incorporate the predictions output from running
CellAssign
to the annotated SCE object. The input here is a SCE file, the predictions file, and the name of the reference used withCellAssign
. This is set up so that the SCE object produced by theSingleR
process is the input of this process. The final annotated object then contains the cell type assignments from bothSingleR
andCellAssign
. I mirrored how we named both the assignments and the metadata fromSingleR
.The only tricky thing here was deciding how to pass the reference name through to this process. In
SingleR
we can add that to the model and store it in the object and then grab it that way. However here, the reference file is just a marker genes tsv file so we can't really store the name there. Instead, I grab it directly from the cell type metadata that we read in. This also means I had to pass it through the initialclassify_cellassign
process even though it isn't actively used there.I also updated this only to publish the final SCE object with both annotations. Do we want to continue publishing both?
Note: This is a draft PR because I'm currently running a test to evaluate this which takes some time to complete. I will update this once that test is complete. I temporarily added a publish step to the predictions step in case the new step I added failed. I want to be able to troubleshoot it easily. Maybe this is an argument for adding in the predictions output to our
checkpoints
folder for any future issues.