
03_Tutorial_Creating_Masked_References

Samantha edited this page Jul 28, 2021 · 1 revision

Overview

All pipeline scripts and data are stored on RBL_NCI under /data/RBL_NCI/Pipelines/Talon_Flair/, which also contains the directories for the current pipeline versions.

Prepare the required inputs

The pipeline requires a single input, a gene_list.txt file:

Usage: create_masked_refs.sh -g genelist
	-g Text file location of single gene or multiple genes separated by rows

The gene_list file should be a plain text file with one gene name per row, written as it appears in GENCODE v30. An example file is located at /data/RBL_NCI/Pipelines/Talon_Flair/testing/example_gene_input.txt.
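As a minimal sketch, a gene list in this format could be created and sanity-checked as follows. The file name my_genes.txt is a hypothetical example; the two gene names are the ones used as examples later on this page.

```shell
# Write one gene name per row (my_genes.txt is a hypothetical file name)
printf '%s\n' DGCR8 IGF2 > my_genes.txt

# Sanity check: print the number of rows, then flag any row that is
# empty or contains whitespace
wc -l < my_genes.txt
if grep -nE '^$|[[:space:]]' my_genes.txt; then
    echo "Fix the rows listed above: one gene name per row, no spaces" >&2
fi
```

This check only looks at the layout of the file. It does not confirm that the names exist in GENCODE v30.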

Access the directory

Determine which version is the latest, and move into this directory:

#review available versions, select the version you want to run
ls /data/RBL_NCI/Pipelines/Talon_Flair/v*

#Move to the directory
cd /data/RBL_NCI/Pipelines/Talon_Flair/[version selected]/RBL_RBL3/build/create_masked_refs/
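If you prefer not to pick the version by eye, the following sketch selects the highest-sorting version directory. It assumes that version directories are named v followed by a number (for example v1.2), which this page does not confirm, and that a sort with -V (version sort) is available. The function name latest_version is hypothetical.

```shell
# Assumption: version directories are named v<number>, e.g. v1.2 or v1.10
latest_version() {
    # $1 = pipeline base directory; prints the highest-sorting v* directory name
    for d in "$1"/v*/; do
        if [ -d "$d" ]; then
            basename "$d"
        fi
    done | sort -V | tail -n 1
}

base=/data/RBL_NCI/Pipelines/Talon_Flair
version=$(latest_version "$base")
echo "Latest version found: ${version:-none}"

# Then move into the build directory for that version:
# cd "$base/$version/RBL_RBL3/build/create_masked_refs/"
```

Version sort is used so that v1.10 ranks above v1.9, which a plain alphabetical sort would get wrong.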

Run the script

From the tagged pipeline directory selected above, run the script with your gene_list.txt, then review the output.

#Start the pipeline
sh create_masked_refs.sh -g /path/to/gene/list/genelist.txt

#Review the output
cd /data/RBL_NCI/Pipelines/Talon_Flair/dependencies/masked

Your output files are saved in a directory named after your gene(s). For example, if one gene is used, the output dir will be Gene1 (DGCR8). If two genes are used, it will follow the format Gene1_Gene2 (DGCR8_IGF2). Within this output dir, you will find the following files:

  • hg38_cleanheader.genome
  • GeneNames.bed
  • GeneNames.fa
  • GeneNames_complement.bed
  • GeneNames.gtf
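The directory naming described above can be sketched as below, to work out where to look for the results. Joining names with an underscore in file order is inferred from the two examples on this page; whether the script does the same for three or more genes is an assumption. The function name masked_dir_name and the file my_genes.txt are hypothetical.

```shell
# Join the gene names in a gene list with "_" in file order
# (assumed naming rule, inferred from the DGCR8 and DGCR8_IGF2 examples)
masked_dir_name() {
    paste -s -d '_' "$1"
}

printf '%s\n' DGCR8 IGF2 > my_genes.txt
out_name=$(masked_dir_name my_genes.txt)
echo "$out_name"   # prints DGCR8_IGF2

# List the files produced for this gene set:
# ls "/data/RBL_NCI/Pipelines/Talon_Flair/dependencies/masked/$out_name"
```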

Use the masked files

To use these masked files, edit the following entries in the snakemake_config.yaml stored in your /path/to/output/dir:

  • annotationGTFmasked: "/data/RBL_NCI/Pipelines/Talon_Flair/dependencies/masked/GeneName/GeneNames.gtf"
  • annotationFAmasked: "/data/RBL_NCI/Pipelines/Talon_Flair/dependencies/masked/GeneName/GeneNames.fa"

After editing these two paths, the pipeline will pull and use these files when the maskedReference flag is set to "Y".
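The edit can also be scripted. The sketch below assumes each setting sits on its own line in the form key: "value", which may not match the real layout of snakemake_config.yaml. The helper set_config_value is hypothetical, and the demo runs against a scratch stand-in file rather than a real config.

```shell
# Assumption: each setting is on its own line as  key: "value"
set_config_value() {
    # $1 = config file, $2 = key, $3 = new value (must not contain | or &)
    sed -E "s|^([[:space:]]*$2:[[:space:]]*).*|\1\"$3\"|" "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

# Scratch stand-in for the config, for demonstration only
demo=$(mktemp)
printf '%s\n' 'maskedReference: "N"' 'annotationGTFmasked: ""' 'annotationFAmasked: ""' > "$demo"

# Example output dir for two genes; file names are written as listed on this
# page, so substitute the actual names found in your output dir
masked_dir=/data/RBL_NCI/Pipelines/Talon_Flair/dependencies/masked/DGCR8_IGF2
set_config_value "$demo" annotationGTFmasked "$masked_dir/GeneNames.gtf"
set_config_value "$demo" annotationFAmasked "$masked_dir/GeneNames.fa"
set_config_value "$demo" maskedReference "Y"
cat "$demo"
```

To apply this to a real run, point the first argument at the snakemake_config.yaml in your output dir, and keep a backup copy of the file before editing it.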