Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Split up cell types QC report #483

Merged
merged 29 commits into from
Oct 5, 2023

Conversation

sjspielman
Copy link
Member

Closes #481

This PR splits up the cell type report:

  • In the main report, only count tables and UMAPs (including coloring by clusters) are present
  • There is now a supplementary report templates/celltypes_report.rmd that contains diagnostic plots and all the heatmaps. This report is linked from the main QC report, using the relative link as it would appear in the ScPCA download.
  • Since we have a lot of reused code now, and since there were a bunch of functions scattered throughout the celltype subreport, I moved all the functions into a new file templates/qc_utils/celltype_functions.R, and added roxygen comments.

Notable places for feedback:

  • File organization and naming! I placed the celltype report intemplates/ since there was an integration report there too, and similarly this is where I placed qc_utils/. I am not tied to these location choices if we have other good ideas here
  • Is content in the right place? I went off of notes recorded in Move everything but UMAPs and tables for cell type results to a separate report #447, but I was out the day those plans were discussed, so let's ensure I interpreted things correctly.
  • Maybe we should beef up text in the supplementary report? What else might it need?

Copy link
Member

@jashapiro jashapiro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few initial thoughts, some of which I can probably get over:

  • My first thought is that I don't really like using source() to define functions, as I think it can make for extremely confusing namespaces. But I can probably get over that. (I looked into some ideas for creating a namespace... one idea here: https://community.rstudio.com/t/is-it-dangerous-to-create-a-namespace-for-functions-in-a-script-without-a-package/91141 But this is probably too much effort for my own discomfort.)

    • I would try to limit the sourced file to only the functions that are used in more than one place, and try to avoid code that is about formatting (kable & plotting) as I think the indirection makes it harder to do the live adjustments we are probably going to need during revisions.
    • When using sourced files, those files should contain all required library() imports, even if they are potentially redundant. (In this case, SingleCellExperiment, most likely). This is probably controversial, as it could result in overwriting of functions in a non-transparent way, but that is really more being mad at source() again.
  • The celltypes_report.rmd file should probably live inside qc_report(s) (possibly with a bit of renaming... main_qc_report.rmd?)

    • This is because when we are passing files to a process, it is easier to pass the directory that contains all needed files and know that the structure will be preserved, vs passing in each of the component files that we will need, and worrying that nextflow might rename them.
    • Since we need the qc_utils directory that should also live within qc_report or be just a bare file in that directory.
  • As I expect we will want to give the celltype report a dynamic name, we probably want to add a parameter with that value to the QC report and use that for the link.

Finally, in response to your last question: yes, I think we will want to expand the celltype report with a bit more explanation in a few places, but I think that can wait for separate PRs.

templates/qc_report/celltypes_qc.rmd Outdated Show resolved Hide resolved
templates/qc_report/qc_report.rmd Outdated Show resolved Hide resolved
templates/qc_utils/celltype_functions.R Outdated Show resolved Hide resolved
templates/qc_utils/celltype_functions.R Outdated Show resolved Hide resolved
Comment on lines 85 to 93
# kable formatting
knitr::kable(align = "r") |>
kableExtra::kable_styling(
bootstrap_options = "striped",
full_width = FALSE,
position = "left"
) |>
kableExtra::column_spec(2, monospace = TRUE)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might leave the kable formatting out. If we are reusing this style frequently across different table types, it might make sense as a separate function that we define at the top and reuse as needed. (I think we duplicate this a lot)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are reusing this style frequently across different table types, it might make sense as a separate function that we define at the top and reuse as needed. (I think we duplicate this a lot)

💯^💯

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for now I'm splitting out the kable code into its own function, but more generally, see #371 (comment).

templates/qc_utils/celltype_functions.R Outdated Show resolved Hide resolved
templates/qc_utils/celltype_functions.R Outdated Show resolved Hide resolved
templates/qc_utils/celltype_functions.R Outdated Show resolved Hide resolved
templates/qc_utils/celltype_functions.R Outdated Show resolved Hide resolved
@sjspielman
Copy link
Member Author

sjspielman commented Oct 3, 2023

Ok, we're getting some stochastic stub errors; see also - #479 (comment)

@sjspielman
Copy link
Member Author

@jashapiro this should be ready for another look! I've made quite a few changes in response to review so I won't list them all here, but the commits are each very focused 💪 to serve as that "change list." Noting there's also a for loop that made it in for this round but I think it was a judiciously-used one, let me know if you have other thoughts!

Copy link
Member

@jashapiro jashapiro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks fine to me.

I probably would have a few more suggestions if I went through every line, but for the general task of breaking things up, it should be fine.

But, I couldn't really handle the for loop, so I made a very tidy suggestion of an alternative.

templates/qc_report/celltypes_qc.rmd Outdated Show resolved Hide resolved
templates/qc_report/celltypes_qc.rmd Outdated Show resolved Hide resolved
templates/qc_report/qc_utils/celltype_functions.R Outdated Show resolved Hide resolved
#' @param df Data frame to format
#'
#' @return kable table of cell type counts
format_celltype_n_table <- function(df) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is probably the same format we are using everywhere, or at least many many places. I would think a more general title like format_counts_table... but I get that this is just the first crack at this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd favor circling back when we really upgrade tables - #371

templates/qc_report/celltypes_qc.rmd Outdated Show resolved Hide resolved
templates/qc_report/celltypes_supplemental_report.rmd Outdated Show resolved Hide resolved
templates/qc_report/celltypes_supplemental_report.rmd Outdated Show resolved Hide resolved
templates/qc_report/celltypes_supplemental_report.rmd Outdated Show resolved Hide resolved
templates/qc_report/celltypes_supplemental_report.rmd Outdated Show resolved Hide resolved
templates/qc_report/celltypes_qc.rmd Outdated Show resolved Hide resolved
@sjspielman
Copy link
Member Author

Feeling good about #487 with all these green checks!

@sjspielman sjspielman merged commit 2fe22d8 into development Oct 5, 2023
3 checks passed
@sjspielman sjspielman deleted the sjspielman/481-split-up-celltypes-qc branch October 5, 2023 16:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants