atac-seq-pipeline-test-data

Test data for ENCODE atac-seq-pipeline

The single-ended dataset (ENCSR889WQX) and the paired-end dataset (ENCSR356KRQ) are subsampled down to 1/400 of their reads.

For the genome data (/genome_data), the sequences for chr19 and chrM are extracted from hg38 and mm10, and bowtie2 indices are built on the extracted sequences.

How to extract chr19 and chrM from original fasta

$ cd scripts
$ ./subsample_ref_fasta.sh
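
The contents of subsample_ref_fasta.sh are not reproduced here. As a rough illustration of the extraction step only, the sketch below filters an uncompressed FASTA file down to a chosen set of sequences with awk. The function name extract_chroms is hypothetical, and the sketch assumes that each header line starts with the chromosome name (for example ">chr19").

```shell
# Hypothetical helper (not part of this repository): print only the FASTA
# records whose sequence name (first word of the header, without ">") is
# listed in the second argument.
extract_chroms() {
    fasta="$1"   # uncompressed input FASTA
    names="$2"   # space-separated sequence names, e.g. "chr19 chrM"
    awk -v names="$names" '
        # Build a lookup table of the wanted sequence names.
        BEGIN { n = split(names, a, " "); for (i = 1; i <= n; i++) want[a[i]] = 1 }
        # On each header line, decide whether the following record is kept.
        /^>/  { name = substr($1, 2); keep = (name in want) }
        # Print header and sequence lines of kept records only.
        keep  { print }
    ' "$fasta"
}
```

With placeholder file names, this could be used as `extract_chroms hg38.fa "chr19 chrM" > hg38_chr19_chrM.fa`, followed by `bowtie2-build hg38_chr19_chrM.fa hg38_chr19_chrM` to build the index on the reduced reference.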

How to generate reference outputs

Make sure that you have an executable cromwell-30.1.jar in your PATH.
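
One way to verify this requirement before starting is sketched below. The function name require_in_path is hypothetical; it relies only on the POSIX `command -v` builtin.

```shell
# Hypothetical helper (not part of this repository): succeed only if the
# given name resolves to something runnable from the current PATH.
require_in_path() {
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "error: $1 not found as an executable in PATH" >&2
    return 1
}
```

For example, `require_in_path cromwell-30.1.jar || exit 1` at the top of a wrapper script would stop early if the jar is missing or not executable.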
Specify the correct file paths in subsample_fastq.sh and run it to subsample the test samples.

$ cd test_sample
$ ./subsample_fastq.sh
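
The method used by subsample_fastq.sh is not shown here. As an illustrative assumption only, the sketch below keeps one read out of every n from an uncompressed FASTQ file by treating each block of four lines as one read. The function name subsample_fastq_every_nth is hypothetical. Because the selection is deterministic, applying it with the same n to both files of a paired-end sample keeps the mates in sync.

```shell
# Hypothetical helper (not part of this repository): keep the 1st,
# (n+1)th, (2n+1)th, ... read of a FASTQ file, where one read is a
# 4-line record.
subsample_fastq_every_nth() {
    fastq="$1"   # uncompressed input FASTQ
    n="$2"       # keep 1 read out of every n
    awk -v n="$n" '
        # rec is the 0-based index of the read the current line belongs to.
        { rec = int((NR - 1) / 4) }
        # Print all four lines of every nth read.
        rec % n == 0 { print }
    ' "$fastq"
}
```

With a placeholder file name, `subsample_fastq_every_nth rep1.fastq 400 > rep1.subsampled.fastq` would keep roughly 1/400 of the reads.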

Generate base reference outputs by running the following shell scripts. These test samples are subsampled down to 1/200 reads.

$ cd scripts
$ ./ENCSR356KRQ.sh
$ ./ENCSR889WQX.sh

Wait until the base runs (ENCSR356KRQ.sh and ENCSR889WQX.sh) are done. Link the outputs of those runs to JSON files in test_sample/*.sh, then run the other shell scripts.

$ cd scripts
$ ./ENCSR356KRQ_disable_tn5_shift.sh
$ ./ENCSR356KRQ_no_dup_removal.sh
$ ./ENCSR356KRQ_no_multimapping.sh
$ ./ENCSR356KRQ_subsample.sh
$ ./ENCSR356KRQ_subsample_xcor.sh
$ ./ENCSR889WQX_disable_tn5_shift.sh
$ ./ENCSR889WQX_no_dup_removal.sh
$ ./ENCSR889WQX_no_multimapping.sh
$ ./ENCSR889WQX_subsample.sh
$ ./ENCSR889WQX_subsample_xcor.sh
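
If running the scripts one by one is inconvenient, a small loop can run them in order and stop at the first failure. This is only a convenience sketch; the function name run_in_order is hypothetical, and it assumes each script is executable and returns a non-zero exit status on failure.

```shell
# Hypothetical helper (not part of this repository): run each given
# script in order and stop at the first one that fails.
run_in_order() {
    for script in "$@"; do
        echo "running $script" >&2
        "$script" || { echo "failed: $script" >&2; return 1; }
    done
}
```

From the scripts directory this could be called as `run_in_order ./ENCSR356KRQ_disable_tn5_shift.sh ./ENCSR356KRQ_no_dup_removal.sh`, and so on, listing the scripts above explicitly.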
