This repository has been archived by the owner on Jun 21, 2023. It is now read-only.

removing rownames from reference files (#836)
* removing rownames from reference files

* update the document for reference lists

* add gist

* Update README.md
kgaonkar6 authored Nov 9, 2020
1 parent 9c1a57b commit ed4f9a0
Showing 3 changed files with 24,531 additions and 24,515 deletions.
20 changes: 18 additions & 2 deletions analyses/fusion_filtering/README.md
@@ -20,8 +20,24 @@ We also gather counts for recurrent fusions and fused genes found in more than 3
* pbta-gene-expression-rsem-fpkm.stranded.rds : aggregated stranded FPKM data

#### Inputs used as reference
* genelistreference.txt : known kinases, oncogenes, tumor suppressors, curated transcription factors [@doi:10.1016/j.cell.2018.01.029], and the COSMIC Cancer Gene Census list [https://cancer.sanger.ac.uk/census]. MYBL1 [@doi:10.1073/pnas.1300252110], SNCAIP [@doi:10.1038/nature11327], FOXR2 [@doi:10.1016/j.cell.2016.01.015], TTYH1 [@doi:10.1038/ng.2849], and TERT [@doi:10.1038/ng.3438; @doi:10.1002/gcc.22110; @doi:10.1016/j.canlet.2014.11.057; @doi:10.1007/s11910-017-0722-5] were added to the oncogene list, and BCOR [@doi:10.1016/j.cell.2016.01.015] and QKI [@doi:10.1038/ng.3500] were added to the tumor suppressor gene list, based on pediatric cancer literature review. IGH-@, IGH@, IGL-@, and IGL@ were also added to the reference list as oncogenes because STAR-Fusion output uses these gene symbols rather than the IGH/IGL symbols used in public databases.
* fusionreference.txt : known TCGA fusions
* genelistreference.txt and fusionreference.txt : reference gene and fusion lists, formatted with the code [here](https://gist.github.com/kgaonkar6/02b3fbcfeeddfa282a1cdf4803704794) from the following sources:

| Annotation | File | Source |
| ------ | ---------- | --------- |
| pfamID | http://hgdownload.soe.ucsc.edu/goldenPath/hg38/database/pfamDesc.txt.gz | UCSC pfamID Description database |
| Domain Location | http://hgdownload.soe.ucsc.edu/goldenPath/hg38/database/ucscGenePfam.txt.gz | UCSC pfamID Description database |
| TCGA fusions | https://tumorfusions.org/PanCanFusV2/downloads/pancanfus.txt.gz | TumorFusions: an integrative resource for cancer-associated transcript fusions PMID: 29099951 |
| Transcription Factors | Table S1 https://ars.els-cdn.com/content/image/1-s2.0-S0092867418301065-mmc2.xlsx | @doi:10.1016/j.cell.2018.01.029 |
| Oncogenes | http://www.bushmanlab.org/assets/doc/allOnco_Feb2017.tsv | www.bushmanlab.org |
| Tumor suppressor genes (TSGs) | https://bioinfo.uth.edu/TSGene/Human_TSGs.txt?csrt=5027697123997809089 | Tumor Suppressor Gene Database 2.0 PMIDs: 23066107, 26590405 |
| Kinases | http://kinase.com/human/kinome/tables/Kincat_Hsap.08.02.xls | The protein kinase complement of the human genome PMID: 12471243 |
| COSMIC genes | https://cancer.sanger.ac.uk/census | Catalogue of Somatic Mutations in Cancer |
| Pediatric-specific oncogenes | _MYBL1, SNCAIP, FOXR2, TTYH1, TERT_ | doi:10.1073/pnas.1300252110, doi:10.1038/nature11327, doi:10.1016/j.cell.2016.01.015, doi:10.1038/ng.2849, doi:10.1038/ng.3438, doi:10.1002/gcc.22110, doi:10.1016/j.canlet.2014.11.057, doi:10.1007/s11910-017-0722-5 |
| Pediatric-specific TSGs | _BCOR_, _QKI_ | doi:10.1016/j.cell.2016.01.015, doi:10.1038/ng.3500 |

IGH-@, IGH@, IGL-@, and IGL@ were also added to the reference list as oncogenes because STAR-Fusion output uses these gene symbols rather than the IGH/IGL symbols used in public databases.


* Brain_FPKM_hg38_matrix.txt.zip : GTEx brain samples FPKM data

The code to generate genelistreference.txt and fusionreference.txt is available here: https://gist.github.com/kgaonkar6/02b3fbcfeeddfa282a1cdf4803704794#file-format_reference_gene_list-r
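
As a rough illustration of what such a formatting step involves, the sketch below merges several per-category gene lists into one tab-delimited reference table whose first column is the gene symbol rather than a row name. It is not the code from the gist (which is written in R). The column names `Gene_Symbol` and `type`, the category labels, and the placeholder symbol `TOYGENE1` are assumptions made for this example only.

```python
import csv
import io


def build_gene_reference(sources):
    """Merge per-category gene lists into sorted (gene, types) rows.

    `sources` maps a category label to a list of gene symbols. A gene that
    appears in several lists gets one row with its categories joined by ", ".
    """
    types_by_gene = {}
    for gene_type, genes in sources.items():
        for gene in genes:
            types_by_gene.setdefault(gene, set()).add(gene_type)
    return [
        (gene, ", ".join(sorted(types)))
        for gene, types in sorted(types_by_gene.items())
    ]


def write_reference(rows, handle):
    """Write rows as tab-delimited text with a header and no row-name column."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["Gene_Symbol", "type"])  # hypothetical column names
    writer.writerows(rows)


# Toy input: MYBL1, TERT, BCOR and QKI are taken from the table above;
# TOYGENE1 is a made-up symbol used only to show how overlaps are merged.
example_sources = {
    "Oncogene": ["MYBL1", "TERT", "TOYGENE1"],
    "TumorSuppressorGene": ["BCOR", "QKI"],
    "Kinase": ["TOYGENE1"],
}

buffer = io.StringIO()
write_reference(build_gene_reference(example_sources), buffer)
print(buffer.getvalue())
```

Writing the gene symbol as an ordinary first column means the header has as many fields as each data row, so downstream readers do not need to treat the first column as row names.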
