This repository has been archived by the owner on Jun 16, 2023. It is now read-only.

Updated analysis: Annotate SNV table of gene-level mutation frequencies #91

Closed
1 task
logstar opened this issue Jul 7, 2021 · 3 comments

logstar commented Jul 7, 2021

What analysis module should be updated and why?

The snv-frequencies module needs to be updated to generate gene-level SNV frequency tables.

What changes need to be made? Please provide enough detail for another participant to make the update.

Generate gene-level SNV frequency tables.

Add a gene_type column that annotates genes as kinase, transcription factor (TF), oncogene, or tumor suppressor gene (TSG) per this file. The sources of this file are described at https://github.com/d3b-center/annoFuse#prerequisites-for-cohort-level-analysis.

What input data should be used? Which data were used in the version being updated?

snv-consensus-plus-hotspots.maf.tsv.gz

When do you expect the revised analysis will be completed?

1-2 days.

Who will complete the updated analysis?

@logstar

cc: @jharenza

@logstar logstar self-assigned this Jul 7, 2021

logstar commented Jul 8, 2021

@jharenza Regarding the gene types in genelistreference.txt, I wonder whether I should add one column per type with Y/N values, in order to handle gene symbols that map to more than one type, such as the following:

> library(readr)
> library(dplyr)
> gsb_gtype_df <- read_tsv('../fusion_filtering/references/genelistreference.txt',
+                          col_types = cols(.default = col_guess()))
> gsb_gtype_df %>%
+   group_by(Gene_Symbol) %>%
+   summarise(n_uniq_types = length(unique(type)),
+             types = paste(sort(unique(type)), collapse = ',')) %>%
+   filter(n_uniq_types > 1)
# A tibble: 1,166 x 3
   Gene_Symbol n_uniq_types types                       
   <chr>              <int> <chr>                       
 1 ABI1                   2 CosmicCensus,Oncogene       
 2 ABL1                   3 CosmicCensus,Kinase,Oncogene
 3 ABL2                   3 CosmicCensus,Kinase,Oncogene
 4 ACKR3                  2 CosmicCensus,Oncogene       
 5 ACSL3                  2 CosmicCensus,Oncogene       
 6 ACSL6                  2 CosmicCensus,Oncogene       
 7 ACVR1                  2 CosmicCensus,Kinase         
 8 ACVR1B                 2 Kinase,Oncogene             
 9 ACVR1C                 2 Kinase,TumorSuppressorGene  
10 ACVR2A                 3 CosmicCensus,Kinase,Oncogene
# … with 1,156 more rows

If a single gene_type column is preferred, could I use a comma-separated list of the unique types as the value for genes that have more than one type?
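The two encodings in question can be compared on a few rows. This is a minimal base-R sketch, assuming a long-format table with one row per (Gene_Symbol, type) pair as in genelistreference.txt. The rows are copied from the tibble printed above, except one duplicated ACVR1C/Kinase row, which is hypothetical and only there to show that repeated types are collapsed. The object names are illustrative and not taken from the snv-frequencies module.

```r
# Toy long-format gene type table (one row per gene/type pair).
# Rows copied from the tibble above; the second (ACVR1C, Kinase)
# row is a hypothetical duplicate to show de-duplication.
gtype_df <- data.frame(
  Gene_Symbol = c("ABI1", "ABI1",
                  "ABL1", "ABL1", "ABL1",
                  "ACVR1C", "ACVR1C", "ACVR1C"),
  type = c("CosmicCensus", "Oncogene",
           "CosmicCensus", "Kinase", "Oncogene",
           "Kinase", "TumorSuppressorGene", "Kinase"),
  stringsAsFactors = FALSE
)

# Option A: a single Gene_type column holding each gene's sorted,
# unique types as one comma-separated string.
collapse_types <- function(x) paste(sort(unique(x)), collapse = ",")
single_col <- aggregate(type ~ Gene_Symbol, data = gtype_df,
                        FUN = collapse_types)
names(single_col)[names(single_col) == "type"] <- "Gene_type"

# Option B: one Y/N column per type.
all_types <- sort(unique(gtype_df$type))
yn_cols <- data.frame(Gene_Symbol = sort(unique(gtype_df$Gene_Symbol)),
                      stringsAsFactors = FALSE)
for (gtype in all_types) {
  genes_with_type <- gtype_df$Gene_Symbol[gtype_df$type == gtype]
  yn_cols[[gtype]] <- ifelse(yn_cols$Gene_Symbol %in% genes_with_type,
                             "Y", "N")
}

print(single_col)
print(yn_cols)
```

Option A keeps the table narrow and is easy to read. Option B is easier to filter on, for example all kinases with `yn_cols$Kinase == "Y"`, at the cost of one column per type.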


logstar commented Jul 9, 2021

@jharenza I will proceed with adding a single Gene_type column to the gene-level SNV frequency table. If adding multiple Y/N columns is preferred, I will revise accordingly.
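Attaching a single Gene_type column to a gene-level table amounts to collapsing the types per gene and then left-joining on the gene symbol. The following is a minimal base-R sketch under stated assumptions and may differ from the merged implementation. The type rows are copied from the tibble printed earlier. The frequency table is hypothetical: its Mutated_samples column, its counts, and the gene GENE_X are placeholders, not output of the snv-frequencies module.

```r
# Long-format gene type table (Gene_Symbol, type); rows copied from
# the tibble printed earlier in this thread.
gtype_df <- data.frame(
  Gene_Symbol = c("ABL1", "ABL1", "ABL1", "ACVR1B", "ACVR1B"),
  type = c("CosmicCensus", "Kinase", "Oncogene", "Kinase", "Oncogene"),
  stringsAsFactors = FALSE
)

# Hypothetical gene-level SNV frequency table: Mutated_samples, the
# counts, and GENE_X are placeholders for illustration only.
freq_df <- data.frame(
  Gene_Symbol = c("ABL1", "ACVR1B", "GENE_X"),
  Mutated_samples = c(3L, 1L, 2L),
  stringsAsFactors = FALSE
)

# Collapse each gene's types into one comma-separated Gene_type value.
gene_type_df <- aggregate(
  type ~ Gene_Symbol, data = gtype_df,
  FUN = function(x) paste(sort(unique(x)), collapse = ",")
)
names(gene_type_df)[names(gene_type_df) == "type"] <- "Gene_type"

# Left join: keep every gene in the frequency table; genes that are
# absent from the reference list get NA in Gene_type.
annotated <- merge(freq_df, gene_type_df, by = "Gene_Symbol", all.x = TRUE)
print(annotated)
```

Using `all.x = TRUE` keeps mutated genes that are not in the reference list instead of silently dropping them.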


logstar commented Sep 1, 2021

Closed by merging d3b-center/OpenPedCan-analysis#45.

@logstar logstar closed this as completed Sep 1, 2021