03_Tutorial_Creating_Masked_References
All script data is stored on RBL_NCI (/data/RBL_NCI/Pipelines/Talon_Flair/), which includes a directory for each current pipeline version.
The pipeline requires a single input, a gene_list.txt file:
Usage: create_masked_refs.sh -g genelist
-g Text file location of single gene or multiple genes separated by rows
The gene_list file should be a plain text file with each gene name (as it appears in GENCODE v30) listed on a separate row. An example file is located at /data/RBL_NCI/Pipelines/Talon_Flair/testing/example_gene_input.txt.
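As a minimal sketch of the input format described above, the following writes a two-gene list and runs a basic sanity check on it. The output location and the variable names are illustrative assumptions, and the whitespace check is only a convenience added here; it is not something the pipeline is known to enforce.

```shell
# Hypothetical location; any writable path works
genelist="${TMPDIR:-/tmp}/genelist.txt"

# One gene name per row, no header line
printf '%s\n' DGCR8 IGF2 > "$genelist"

# Count the rows and warn about blank rows or stray whitespace
n_genes=$(wc -l < "$genelist" | tr -d ' ')
if grep -qE '^$|[[:space:]]' "$genelist"; then
    echo "genelist has blank rows or stray whitespace" >&2
fi
echo "$n_genes gene(s) listed in $genelist"
```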
Determine which version is the latest, and move into this directory:
#review available versions, select the version you want to run
ls /data/RBL_NCI/Pipelines/Talon_Flair/v*
#Move to the directory
cd /data/RBL_NCI/Pipelines/Talon_Flair/[version selected]/RBL_RBL3/build/create_masked_refs/
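If the version directories follow a numeric naming scheme such as v1.2 or v1.10 (an assumption; check the actual listing first), the newest one can be picked with a version-aware sort. The helper name latest_version_dir is hypothetical, and sort -V requires GNU coreutils.

```shell
# Hypothetical helper: print the highest-numbered v* directory under a base path
latest_version_dir() {
    base="$1"
    # -d lists the directories themselves; sort -V orders v1.9 before v1.10
    ls -d "$base"/v* | sort -V | tail -n 1
}
```

For example, `cd "$(latest_version_dir /data/RBL_NCI/Pipelines/Talon_Flair)/RBL_RBL3/build/create_masked_refs/"` would move into the build directory of whichever version sorts last.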
From the tagged, stored pipeline directory, run the script with your gene_list.txt, then review the output.
#Start the pipeline
sh create_masked_refs.sh -g /path/to/gene/list/genelist.txt
#Review the output
cd /data/RBL_NCI/Pipelines/Talon_Flair/dependencies/masked
Your output files are saved in a directory named after your gene(s). For example, if one gene is used, the output dir will be named Gene1 (e.g. DGCR8). If two genes are used, the name will follow the format Gene1_Gene2 (e.g. DGCR8_IGF2). Within this output dir, you will find the following files:
- hg38_cleanheader.genome
- GeneNames.bed
- GeneNames.fa
- GeneNames_complement.bed
- GeneNames.gtf
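The naming rule described above can be sketched by joining the rows of the gene list with underscores. This assumes the directory name follows the order of the rows in the file, which the examples suggest but which has not been confirmed against the script; the temporary gene list and variable names are for illustration only.

```shell
# Illustrative gene list; substitute the file you passed with -g
genelist=$(mktemp)
printf '%s\n' DGCR8 IGF2 > "$genelist"

# Join all rows into a single underscore-separated name
out_name=$(paste -s -d '_' "$genelist")
masked_dir="/data/RBL_NCI/Pipelines/Talon_Flair/dependencies/masked/$out_name"
echo "Expected output dir: $masked_dir"

# List the directory contents if it exists on this system
if [ -d "$masked_dir" ]; then
    ls -l "$masked_dir"
else
    echo "Not found (has the pipeline finished?): $masked_dir" >&2
fi
```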
To use these masked files, edit the following entries in the snakemake_config.yaml stored in your /path/to/output/dir:
- annotationGTFmasked: "/data/RBL_NCI/Pipelines/Talon_Flair/dependencies/masked/GeneName/GeneNames.gtf"
- annotationFAmasked: "/data/RBL_NCI/Pipelines/Talon_Flair/dependencies/masked/GeneName/GeneNames.fa"
After editing these two paths, the pipeline will pull and use these files when the maskedReference flag is set to "Y".
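Before launching a run, it can help to confirm that both edited paths point at real files. The helper below is a hypothetical sketch, not part of the pipeline; it assumes each key sits on its own line in the form key: "value", as in the entries shown above.

```shell
# Hypothetical helper: verify the masked reference paths in a config file
check_masked_config() {
    config="$1"
    status=0
    for key in annotationGTFmasked annotationFAmasked; do
        # Extract the value after "key:" and strip the surrounding quotes
        path=$(sed -n "s/^[[:space:]]*$key:[[:space:]]*//p" "$config" | tr -d '"' | head -n 1)
        if [ -z "$path" ]; then
            echo "$key: not set in $config" >&2
            status=1
        elif [ ! -f "$path" ]; then
            echo "$key: file not found: $path" >&2
            status=1
        else
            echo "$key: OK ($path)"
        fi
    done
    return "$status"
}
```

A call such as `check_masked_config /path/to/output/dir/snakemake_config.yaml` prints one line per key and returns a non-zero status if either path is missing.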