This repository has been archived by the owner on Jun 16, 2023. It is now read-only.

Updated analysis: use annotator CLI in the cnv-frequencies module #124

Closed
1 task done
logstar opened this issue Jul 20, 2021 · 5 comments

@logstar

logstar commented Jul 20, 2021

What analysis module should be updated and why?

The cnv-frequencies module should be updated to use the annotator CLI at analyses/long-format-table-utils/annotator/annotator-cli.R.

(EDIT 07/26/2021 YZ: removed the requirement to merge d3b-center/OpenPedCan-analysis#52 before working on this issue)

What changes need to be made? Please provide enough detail for another participant to make the update.

Remove annotation related code in analyses/cnv-frequencies/01-cnv-frequencies.py.

Use the long-format table annotator CLI in the cnv-frequencies module with the following steps:

  1. If "Gene_symbol", "Gene_Ensembl_ID", "Disease" (case-sensitive) are not all present in the column names of the table to be annotated, add new columns or rename existing ones to have all these required columns.
  2. Output the table that needs to be annotated in TSV format.
  3. Make sure that the working directory is OpenPedCan-analysis or a subdirectory of OpenPedCan-analysis. This allows annotator-cli.R to locate annotator-api.R.
  4. Run the annotator-cli.R script with Rscript --vanilla path/to/annotator-cli.R and the appropriate options. From Python (>= 3.5), the Rscript command can be invoked with subprocess.run, for example: import subprocess; subprocess.run("Rscript --vanilla analyses/long-format-table-utils/annotator/annotator-cli.R -h".split()). For more information about subprocess.run, see https://docs.python.org/3/library/subprocess.html#subprocess.run.
  5. Read the annotated table TSV file.
  6. Rename, select, and reorder the columns of the annotated table for output in TSV, JSON, or JSONL format.
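
For a Python module such as 01-cnv-frequencies.py, steps 1 to 5 might be sketched roughly as below. This is only an illustration under stated assumptions, not the module's actual code: the helper names, the scratch file names, and the use of the standard-library csv module (rather than whatever table library the module already uses) are all hypothetical. The -r, -v, -c, -i and -o options are copied from the R example in this issue; run the CLI with -h to confirm their meaning. The f-string requires Python 3.6 or later.

```python
import csv
import subprocess
from pathlib import Path

# Required column names (case-sensitive), as listed in step 1.
REQUIRED_COLUMNS = ("Gene_symbol", "Gene_Ensembl_ID", "Disease")
# CLI path relative to the OpenPedCan-analysis root, as given in step 4.
ANNOTATOR_CLI = "analyses/long-format-table-utils/annotator/annotator-cli.R"


def rename_columns(rows, rename_map):
    """Step 1: return copies of the row dicts with keys renamed per rename_map."""
    return [{rename_map.get(key, key): value for key, value in row.items()}
            for row in rows]


def write_tsv(rows, path):
    """Step 2: write a non-empty list of row dicts as a TSV file."""
    missing = [col for col in REQUIRED_COLUMNS if col not in rows[0]]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0]),
                                delimiter="\t", lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)


def build_annotator_command(input_tsv, output_tsv, columns="MONDO,RMTL,EFO"):
    """Step 4: build the Rscript argument list (options taken from the R example)."""
    return ["Rscript", "--vanilla", ANNOTATOR_CLI, "-r", "-v", "-c", columns,
            "-i", str(input_tsv), "-o", str(output_tsv)]


def read_tsv(path):
    """Step 5: read a TSV file into a list of row dicts (all values are strings)."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


def annotate(rows, rename_map, repo_root, scratch_dir):
    """Steps 1-5 combined; needs Rscript and an OpenPedCan-analysis checkout."""
    scratch = Path(scratch_dir).resolve()
    input_tsv = scratch / "to_annotate.tsv"   # hypothetical file name
    output_tsv = scratch / "annotated.tsv"    # hypothetical file name
    write_tsv(rename_columns(rows, rename_map), input_tsv)
    # Step 3: run from the repository root so annotator-cli.R can locate
    # annotator-api.R; check=True raises if Rscript exits with an error.
    subprocess.run(build_annotator_command(input_tsv, output_tsv),
                   check=True, cwd=repo_root)
    return read_tsv(output_tsv)
```

Passing the command as a list avoids shell quoting problems with paths, and resolving the scratch paths to absolute paths keeps them valid when the subprocess runs with a different working directory.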

The following is an example of its usage in 01-tpm-summary-stats.R of the rna-seq-expression-summary-stats module.

> getwd()
[1] "/home/rstudio/OpenPedCan-analysis/analyses/rna-seq-expression-summary-stats"
> class(m_tpm_ss_long_tbl)
[1] "tbl_df"     "tbl"        "data.frame"
> colnames(m_tpm_ss_long_tbl)
 [1] "gene_symbol"                          "gene_id"                             
 [3] "cancer_group"                         "cohort"                              
 [5] "tpm_mean"                             "tpm_sd"                              
 [7] "tpm_mean_cancer_group_wise_zscore"    "tpm_mean_gene_wise_zscore"           
 [9] "tpm_mean_cancer_group_wise_quantiles" "n_samples"                           
> 
> renamed_m_tpm_ss_long_tbl <- dplyr::rename(
+   m_tpm_ss_long_tbl, Gene_symbol = gene_symbol, Gene_Ensembl_ID = gene_id,
+   Disease = cancer_group)
> 
> readr::write_tsv(
+   renamed_m_tpm_ss_long_tbl,
+   "../../scratch/renamed_m_tpm_ss_long_tbl.tsv")
> 
> system(paste(
+   "Rscript --vanilla ../long-format-table-utils/annotator/annotator-cli.R",
+   "-r -v -c MONDO,RMTL,EFO",
+   "-i ../../scratch/renamed_m_tpm_ss_long_tbl.tsv",
+   "-o ../../scratch/annotated_renamed_m_tpm_ss_long_tbl.tsv"))
Read ../../scratch/renamed_m_tpm_ss_long_tbl.tsv...
Annotate ../../scratch/renamed_m_tpm_ss_long_tbl.tsv...
Output ../../scratch/annotated_renamed_m_tpm_ss_long_tbl.tsv...
Done.
> 
> annotated_renamed_m_tpm_ss_long_tbl <- readr::read_tsv(
+   "../../scratch/annotated_renamed_m_tpm_ss_long_tbl.tsv",
+   na = character(),
+   col_types = readr::cols(.default = readr::col_guess()))
|==================================================================================================| 100%  222 MB
> m_tpm_ss_long_tbl <- dplyr::rename(
+   annotated_renamed_m_tpm_ss_long_tbl,
+   gene_symbol = Gene_symbol, gene_id = Gene_Ensembl_ID,
+   cancer_group = Disease)
> m_tpm_ss_long_tbl <- dplyr::select(
+   m_tpm_ss_long_tbl, gene_symbol, RMTL, gene_id,
+   cancer_group, EFO, MONDO, n_samples, cohort,
+   tpm_mean, tpm_sd,
+   tpm_mean_cancer_group_wise_zscore, tpm_mean_gene_wise_zscore,
+   tpm_mean_cancer_group_wise_quantiles)
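
In a Python module, the final rename, select, and reorder step (step 6) could look something like the sketch below, which also writes JSONL. The function names are hypothetical, the column order is taken from the R example above for illustration only (the CNV table will have its own columns), and values read back from a TSV with the csv module stay as strings unless converted.

```python
import json


def select_and_reorder(rows, rename_map, column_order):
    """Rename keys, then keep only the columns in column_order, in that order."""
    result = []
    for row in rows:
        renamed = {rename_map.get(key, key): value for key, value in row.items()}
        # Dicts preserve insertion order (Python 3.7+), so this fixes the order.
        result.append({col: renamed[col] for col in column_order})
    return result


def write_jsonl(rows, path):
    """Write one JSON object per line."""
    with open(path, "w") as handle:
        for row in rows:
            handle.write(json.dumps(row) + "\n")


# Mapping back from the annotator's required names to the module's own names.
RENAME_BACK = {"Gene_symbol": "gene_symbol", "Gene_Ensembl_ID": "gene_id",
               "Disease": "cancer_group"}
# Leading columns from the R example; extend with the table's other columns.
COLUMN_ORDER = ["gene_symbol", "RMTL", "gene_id", "cancer_group", "EFO", "MONDO"]
```

A call such as write_jsonl(select_and_reorder(annotated_rows, RENAME_BACK, COLUMN_ORDER), "out.jsonl") would then produce the output file, where annotated_rows and the file name are placeholders.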

What input data should be used? Which data were used in the version being updated?

  • data/histologies.tsv
  • data/consensus_seg_annotated_cn_autosomes.tsv.gz
  • data/consensus_seg_annotated_cn_x_and_y.tsv.gz
  • analyses/independent-samples/results/independent-specimens.wgs.primary.eachcohort.tsv
  • analyses/independent-samples/results/independent-specimens.wgs.relapse.eachcohort.tsv

When do you expect the revised analysis will be completed?

1-2 days.

Who will complete the updated analysis?

@ewafula

@logstar
Author

logstar commented Jul 26, 2021

@ewafula The annotator CLI is merged to the dev branch, so it could be used for your CNV module now.

@logstar logstar removed the blocked label Jul 26, 2021
@ewafula
Contributor

ewafula commented Jul 26, 2021 via email

@ewafula
Contributor

ewafula commented Jul 27, 2021

@logstar, the long-format-table-utils annotator might need updating to use the v7 input files. It works with the updated CNV module on v6 but not on v7. I checked that my local branch is up to date.

root@ab9dc5db917d:/home/OpenPedCan-analysis# bash analyses/cnv-frequencies/run-cnv-frequencies-analysis.sh
Read analyses/cnv-frequencies/results/consensus_wgs_plus_cnvkit_wxs_autosomes_freq.tsv...
Annotate analyses/cnv-frequencies/results/consensus_wgs_plus_cnvkit_wxs_autosomes_freq.tsv...
Error: '/home/OpenPedCan-analysis/data/ensg-hugo-rmtl-v1-mapping.tsv' does not exist.
Execution halted
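
When the annotator is invoked from Python, a failure like the one above can be surfaced to the caller instead of being ignored. The sketch below is one hedged way to do that with subprocess.run; the run_checked helper is hypothetical, capture_output and text require Python 3.7 or later, and it assumes the R error message is written to standard error.

```python
import subprocess


def run_checked(command, cwd=None):
    """Run a command; raise RuntimeError carrying stderr if it exits non-zero."""
    result = subprocess.run(command, cwd=cwd, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(
            f"command failed with exit code {result.returncode}: "
            f"{' '.join(command)}\n{result.stderr.strip()}")
    return result.stdout
```

Calling run_checked with the Rscript argument list would then stop the Python script with the annotator's own error text, for example the missing mapping file reported above.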

@logstar
Author

logstar commented Jul 27, 2021

@ewafula Thank you for checking. This will be fixed in d3b-center/OpenPedCan-analysis#66. In the meantime, you could apply that PR's changes to analyses/long-format-table-utils/annotator/annotator-api.R in your local branch, without committing them, to use the annotator with v7.

@logstar
Author

logstar commented Jul 28, 2021

Closed with PR d3b-center/OpenPedCan-analysis#52 merged.

@logstar logstar closed this as completed Jul 28, 2021