golden-datasets

Download, extract and format golden datasets to benchmark pipelines

Requirements

GNU make
miniconda3

Usage

Every step described below is launched with the GNU make command-line tool. To run a command in parallel, use the -j N option, where N is the maximum number of jobs that make will run at once.

make COMMAND -j NCPUS [VARS]
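As a minimal sketch of choosing NCPUS, the snippet below derives a job count from the number of online cores and prints the resulting command instead of running it. The helper name `ncpus` is hypothetical and not part of this repository; `getconf _NPROCESSORS_ONLN` is assumed to be available, with a fallback to 1 when it is not.

```shell
# Hypothetical helper (not part of this repository): number of online cores,
# falling back to 1 when getconf is unavailable.
ncpus() {
  getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1
}

# Print the command so it can be reviewed before it is run.
echo "make download -j $(ncpus)"
```

For downloads, network bandwidth rather than core count is often the limit, so a smaller value may work just as well.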

| Command | Description |
|---|---|
| `download` | Download datasets in `data/datasets.tsv` |
| `test` | Download test data in `data/datasets.test.tsv` |
| `bam2fastq` | Convert downloaded `bam` files into `fastq` files |
| `compress` | Compress downloaded `fastq` files with `pigz` |
| `clean` | Remove downloaded data |

Tutorial

Download

To download the datasets listed in data/datasets.tsv, use the make command below. In this example, up to 4 downloads run in parallel.

make download -j 4 INSTALL_DIR=$HOME/golden-datasets

By default, the installation directory is the data folder of the main repository, alongside the dataset tsv files. This behaviour can be changed as in the example above, where the installation directory is set through the INSTALL_DIR variable. Variables can be passed to the make command-line tool as NAME=value arguments.
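The override mechanism can be illustrated with a throwaway Makefile. This is only a sketch: the `?=` default and the `show` target are assumptions made for the demonstration, and the repository's actual Makefile may define INSTALL_DIR differently. The function name `demo_override` is hypothetical.

```shell
# Hypothetical demo (not part of this repository): a NAME=value argument on the
# make command line overrides a default assigned inside the Makefile.
demo_override() {
  tmp=$(mktemp -d)
  # Assumed default for illustration; the recipe line must start with a tab.
  printf 'INSTALL_DIR ?= data\nshow:\n\t@echo $(INSTALL_DIR)\n' > "$tmp/Makefile"
  make -s -f "$tmp/Makefile" show                                  # prints: data
  make -s -f "$tmp/Makefile" show INSTALL_DIR=/tmp/golden-datasets # prints: /tmp/golden-datasets
  rm -rf "$tmp"
}

demo_override
```

A variable given on the command line takes precedence over assignments in the Makefile, which is why the second call prints the overridden path.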

Several files in these datasets require a Synapse account to download. Create an account at https://www.synapse.org/. You will be asked for these credentials the first time the make command is launched.

Convert and compress downloaded files

Mapping (bam) files located in the INSTALL_DIR folder can be converted to fastq with the command below. The conversion uses the Picard SamToFastq tool.

make bam2fastq -j 4
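For orientation, a Picard SamToFastq call for a single paired-end file might look like the command built below. This is a sketch, not the invocation used by the Makefile: the `picard` wrapper on the PATH, the paired-end layout, the `_R1`/`_R2` output names, and the helper name `sam_to_fastq_cmd` are all assumptions. The command is printed rather than executed.

```shell
# Hypothetical helper (not part of this repository): build, but do not run,
# a Picard SamToFastq command for one paired-end BAM file.
sam_to_fastq_cmd() {
  bam=$1
  prefix=${bam%.bam}   # strip the .bam extension
  # Output naming (_R1/_R2) is an assumption made for illustration.
  echo "picard SamToFastq I=$bam FASTQ=${prefix}_R1.fastq SECOND_END_FASTQ=${prefix}_R2.fastq"
}

sam_to_fastq_cmd sample.bam
# prints: picard SamToFastq I=sample.bam FASTQ=sample_R1.fastq SECOND_END_FASTQ=sample_R2.fastq
```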

After the download and conversion, fastq files can be compressed with pigz using the command below:

make compress -j 4
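The three steps of this tutorial run in a fixed order: download, bam2fastq, compress. The sketch below prints them for a given job count and an optional installation directory. The helper name `golden_steps` is hypothetical, and passing INSTALL_DIR to every step is an assumption based on the conversion step reading from the INSTALL_DIR folder.

```shell
# Hypothetical helper (not part of this repository): print the tutorial steps
# in order, without running them.
golden_steps() {
  njobs=$1
  dir=${2:-}   # optional installation directory
  for target in download bam2fastq compress; do
    if [ -n "$dir" ]; then
      # Assumption: each step needs the same INSTALL_DIR as the download.
      echo "make $target -j $njobs INSTALL_DIR=$dir"
    else
      echo "make $target -j $njobs"
    fi
  done
}

golden_steps 4 "$HOME/golden-datasets"
```

Printing the commands first makes it easy to check the variables before starting a long download.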

License

Licensed under the GNU General Public License v3.0.

hartwigmedical/golden-datasets

Folders and files

Latest commit

History

Repository files navigation

golden-datasets

Requirements

Usage

Tutorial

Download

Convert and compress downloaded files

License

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages