Add CellAssign process to main workflow #476

sjspielman · 2023-09-29T20:49:23Z

Towards #415

This PR takes the next steps towards celltyping moving into the main workflow:

- I added branching to create two channels in the subworkflow, celltype_input_ch.run and celltype_input_ch.skip, based on whether or not the singler file value is null (is there a better way to check if a value is null here? I couldn't find it!)
  - However, I am not currently checking if the file exists. This probably needs to happen somewhere before going into the process, but I wanted to get feedback on this branching first before upgrading it.
- I then prepared a channel for heading into CellAssign, and updated the CellAssign process itself. I included the same type of branch here as well: skip CellAssign if its file value is null. Again, I have no checks for the file existing, and I am also not checking the reference name. Do we think I should?
- I have written the code with the SingleR output files coming along for the ride into the CellAssign process, but I left a couple of TODO comments about where we would change code to have that not happen. In that case, we'd join these files back up with the CellAssign output for input to the final process that will add annotations to the SCE, in the next PR.
- A couple more TODOs for subsequent PRs are sprinkled in there, too. Let me know if anything about them is unclear.
- (Edit) I also updated the stub celltype file to have some NAs, to ensure the flow of the skipped branches gets "tested".

jashapiro

I started with some suggestions, but I ended up realizing we were missing something, which requires a bit more of a rewrite.

I'm recommending changing the output to be a folder, which means a few different things, discussed below. We might want to do that first as a separate PR for the singleR to test things out, even though I made all my suggestions here.

Oh, and checking for null: !thing is usually sufficient (and would be here).
It is true if thing is null, since null is treated as false in logic, unlike in R. (So are 0, "", and [], but you usually want those to be treated as false as well.)

jashapiro · 2023-09-29T20:55:13Z

modules/classify-celltypes.nf

+    # Convert SCE to AnnData
+    sce_to_anndata.R \
+        --input_sce_file ${processed_rds} \
+        --output_rna_h5 ${processed_hdf5} 


We aren't outputting this, so the "Nextflow way" would be to just give it a file name for use in the process.

Suggested change

--output_rna_h5 ${processed_hdf5}

--output_rna_h5 anndata.hdf5

jashapiro · 2023-09-29T20:55:31Z

modules/classify-celltypes.nf

+        --input_sce_file ${processed_rds} \
+        --output_rna_h5 ${processed_hdf5} 
+
+    # Run CellAssign
    predict_cellassign.py \
      --input_hdf5_file ${processed_hdf5} \


Following earlier suggestion

Suggested change

--input_hdf5_file ${processed_hdf5} \

--input_hdf5_file anndata.hdf5 \

jashapiro · 2023-09-29T20:56:32Z

modules/classify-celltypes.nf

  script:
-    cellassign_predictions = "${meta.library_id}_predictions.tsv"
+    processed_hdf5 = "${meta.library_id}_processed.hdf5"


Following other suggestions, we shouldn't need this

Suggested change

processed_hdf5 = "${meta.library_id}_processed.hdf5"

jashapiro · 2023-09-29T21:06:25Z

modules/classify-celltypes.nf

+    publishDir (
+        path: "${params.checkpoints_dir}/celltype/${meta.library_id}",
+        mode:  'copy',
+        pattern: "*{_predictions.tsv,.json}" // Only the prediction matrix (tsv) and meta


Noting that we don't actually write the metadata here, but we should. Missed this in the singleR step too.

To do this there are a couple of options: the easiest is probably to adjust the output so we are writing out a folder rather than the individual files. This makes a few things easier downstream, and removes a bunch of definitions... I will follow with suggestions implementing a version of this.

The first is to modify the pattern (sorry glob!)

Suggested change

pattern: "*{_predictions.tsv,.json}" // Only the prediction matrix (tsv) and meta

pattern: "cellassign"

jashapiro · 2023-09-29T21:35:49Z

modules/classify-celltypes.nf

+        // we only run celltyping for rows with a singler model file
+        // branch here so we have meta and processed sce in the .skip
+        .branch{
+          skip: it[2] == null
+          run: true
+        }


I think we probably want this to be done separately for singler and for cellassign? So I would move this down to be part of the singler_input_ch definition, and then mix it back before going to cellassign step, repeating the logic.

Now I'm going back to my earlier thought about when to assign the files and use NO_FILE, because we could pretty easily use that for this branch logic, and then just pass all files to the singler process.

Pseudocode below: This assumes that there are no nulls, just file() results

singler_input_ch = celltype_input_ch
  .branch{
    skip: it[2].name == "NO_FILE"
    run: true
  }

classify_singleR(singler_input_ch.run)

cellassign_input_ch = classify_singleR.out
  // add on blank file for skipped singleR results and mix back in
  .mix(singler_input_ch.skip.map{ it.asList() + [ file(empty_file) ] })
  .branch{
    skip: it[3].name == "NO_FILE"
    run: true
  }

jashapiro · 2023-09-29T21:45:29Z

modules/classify-celltypes.nf

  output:
-    tuple val(meta), path(cellassign_predictions), val(ref_name)
+    tuple val(meta), path(processed_rds), path(singler_annotations_tsv), path(singler_full_results), path(cellassign_predictions_tsv)


I'm aspirationally changing the singler annotations output here to assume you have changed that process as well to be similar. Note that the cellassign path is now a constant, which we will create below

Suggested change

tuple val(meta), path(processed_rds), path(singler_annotations_tsv), path(singler_full_results), path(cellassign_predictions_tsv)

tuple val(meta), path(processed_rds), path(singler_dir), path("cellassign")

The updated input would be:

tuple val(meta), path(processed_rds), path(singler_dir), path(cellassign_ref)

jashapiro · 2023-09-29T21:46:41Z

modules/classify-celltypes.nf

    """
+    # Convert SCE to AnnData


create an output directory

Suggested change

# Convert SCE to AnnData

# create output directory

mkdir -p cellassign

# Convert SCE to AnnData

jashapiro · 2023-09-29T21:50:17Z

modules/classify-celltypes.nf

    predict_cellassign.py \
      --input_hdf5_file ${processed_hdf5} \
-      --output_predictions ${cellassign_predictions} \
-      --reference ${cellassign_reference_mtx} \
+      --output_predictions ${cellassign_predictions_tsv} \


Constant path for output

Suggested change

--output_predictions ${cellassign_predictions_tsv} \

--output_predictions cellassign/predictions.tsv \

jashapiro · 2023-09-29T21:53:16Z

modules/classify-celltypes.nf

      --threads ${task.cpus}
    """


Suggested change

--threads ${task.cpus}

"""

--threads ${task.cpus}

# write out metadata for tracking

echo '${Utils.makeJson(meta)}' > cellassign/scpca-meta.json

"""

jashapiro · 2023-09-29T21:54:01Z

modules/classify-celltypes.nf

+    touch "${cellassign_predictions_tsv}"
+    touch "${processed_hdf5}"


You'll want to update the stub to create the output folder as well
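
A minimal sketch of what the updated stub body could look like, assuming the constant paths from the suggestions above (cellassign/predictions.tsv and cellassign/scpca-meta.json). The empty JSON object is only a placeholder assumption; the real stub may well write the actual library metadata instead.

```shell
# Stub sketch: create the output folder with empty placeholder files
mkdir -p cellassign
touch cellassign/predictions.tsv
# Placeholder metadata (assumption: the real stub may write the meta map here)
echo '{}' > cellassign/scpca-meta.json
```

Creating the folder in the stub keeps the path("cellassign") output declaration satisfiable when the workflow runs in stub mode.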

jashapiro · 2023-10-02T11:56:52Z

Ignore most of this review and look at #477 first.

sjspielman · 2023-10-02T15:47:53Z

Given changes in #477 & #478, I'm going to close out this PR and open a fresh one so that the PR is shorter without all the above "deprecated" comments. Bonus, no need to ever find out what conflict horrors await when merging those changes into this branch 😬

sjspielman added 7 commits September 29, 2023 19:12

add some barebones branching; preliminary and untested

5470106

use .run branch in next step

9152ec8

Merge branch 'development' into sjspielman/check_for_singler_reference

85a2245

prepared to enter cellassign process, and updated cellassign process …

47b10b1

…itself

cellassign branch if null

0f3e91a

add NA values into stub to test branching

824b387

some fixes and renaming, bring files along for the ride but leave TOD…

6f893b5

…Os in script for where to change strategies

sjspielman requested a review from jashapiro September 29, 2023 20:51

jashapiro reviewed Sep 29, 2023

jashapiro mentioned this pull request Sep 30, 2023

Use output folder for singler #477

Merged

sjspielman closed this Oct 2, 2023

sjspielman deleted the sjspielman/add-cellassign-process branch October 2, 2023 15:48

sjspielman restored the sjspielman/add-cellassign-process branch October 2, 2023 17:09

sjspielman deleted the sjspielman/add-cellassign-process branch October 2, 2023 19:14

sjspielman mentioned this pull request Oct 2, 2023

Add CellAssign process #479

Merged
