System Requirements

Operating system

A modern 64-bit Linux system is required. Ubuntu 16.04 or later and RHEL / CentOS 7 or later are supported; Ubuntu 18.04 or later and RHEL / CentOS 8 or later are recommended.
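To check a host against these version rules, here is a minimal sketch (not part of BioGraph) that reads the standard `/etc/os-release` file. It assumes the distribution reports `ID` as `ubuntu`, `rhel`, or `centos` and a numeric `VERSION_ID`. That is typical, but confirm it on your own systems.

```python
import re
from pathlib import Path

# Thresholds from the text above, keyed by the os-release ID value (assumed).
SUPPORTED = {"ubuntu": (16, 4), "rhel": (7,), "centos": (7,)}
RECOMMENDED = {"ubuntu": (18, 4), "rhel": (8,), "centos": (8,)}


def parse_os_release(text):
    """Parse KEY=value lines from /etc/os-release into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key] = value.strip().strip('"')
    return info


def classify(info):
    """Return 'recommended', 'supported', 'unsupported', or 'unknown'."""
    distro = info.get("ID", "").lower()
    if distro not in SUPPORTED:
        return "unknown"
    # VERSION_ID looks like "18.04" on Ubuntu or "7" / "8.4" on RHEL/CentOS.
    version = tuple(int(p) for p in re.findall(r"\d+", info.get("VERSION_ID", "")))
    if not version:
        return "unknown"
    if version >= RECOMMENDED[distro]:
        return "recommended"
    if version >= SUPPORTED[distro]:
        return "supported"
    return "unsupported"


if __name__ == "__main__":
    path = Path("/etc/os-release")
    if path.exists():
        print(classify(parse_os_release(path.read_text())))
    else:
        print("unknown")
```

Versions are compared as integer tuples, so Ubuntu 16.10 `(16, 10)` correctly counts as newer than 16.04 `(16, 4)`.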

It is possible to run BioGraph on Windows 10 using the Windows Subsystem for Linux (WSL). However, I/O performance is quite poor on WSL, making it impractical to use for anything but the smallest datasets.

Hardware

BioGraph commands are run on a single server. For a 30x human whole-genome sequencing (WGS) sample, the server should have at least 100GB of memory and 200GB of temporary space. Larger datasets require more resources: a 100x WGS human works best with at least 120GB of RAM and about 400GB of temporary space.

BioGraph will take advantage of all system CPUs for processing. No particular number of CPUs is required, but performance improves significantly with additional CPUs (provided there is enough RAM to accommodate the extra threads). Unless otherwise noted, all benchmarks in the Spiral Genetics documentation were run on a 64-core system.
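The memory and temporary-space figures above can be checked before starting a run. This is a hypothetical pre-flight helper, not a BioGraph command. It assumes the GB figures are decimal gigabytes and that `/tmp` (or whatever directory you pass) is where temporary files will go. Note that the kernel reports slightly less physical memory than is installed.

```python
import os
import shutil

GB = 1000 ** 3  # assumption: the GB figures above are decimal gigabytes

# Minimum memory and temporary space per coverage level, from the text above.
REQUIREMENTS = {
    "30x": {"ram_gb": 100, "tmp_gb": 200},
    "100x": {"ram_gb": 120, "tmp_gb": 400},
}


def total_ram_bytes():
    """Total physical memory as reported by POSIX sysconf (Linux)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def check_host(coverage, tmp_dir, ram_bytes=None, free_tmp_bytes=None):
    """Return a list of problems; an empty list means the host looks adequate."""
    req = REQUIREMENTS[coverage]
    if ram_bytes is None:
        ram_bytes = total_ram_bytes()
    if free_tmp_bytes is None:
        free_tmp_bytes = shutil.disk_usage(tmp_dir).free
    problems = []
    if ram_bytes < req["ram_gb"] * GB:
        problems.append("RAM: %.0f GB < %d GB" % (ram_bytes / GB, req["ram_gb"]))
    if free_tmp_bytes < req["tmp_gb"] * GB:
        problems.append("temp space: %.0f GB < %d GB" % (free_tmp_bytes / GB, req["tmp_gb"]))
    return problems


if __name__ == "__main__":
    # More CPUs help, as long as RAM keeps up with the extra threads.
    print("CPUs:", os.cpu_count())
    for issue in check_host("30x", "/tmp") or ["looks adequate for a 30x WGS human"]:
        print(issue)
```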

BioGraph works with most cluster task managers.

See Optimizing Performance for tips on how to optimize processing time.

Network

Once installed, BioGraph itself does not "phone home" and does not require a connection to the Internet.

A few open source Python packages are required for installation. In environments that do not permit direct Internet access, these can be installed in offline mode.
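One common way to do an offline install is pip's own download-then-install workflow. The sketch below only builds and runs those pip commands. The file names `requirements.txt` and `wheelhouse` are hypothetical placeholders, and the actual package list comes from Installing BioGraph. Download on a machine with the same OS and Python version as the offline host so the downloaded wheels match.

```python
import subprocess
import sys

# Hypothetical names; substitute the package list from the installation guide.
REQUIREMENTS = "requirements.txt"
WHEEL_DIR = "wheelhouse"


def download_cmd(requirements=REQUIREMENTS, wheel_dir=WHEEL_DIR):
    """On a host WITH Internet access: fetch packages into wheel_dir."""
    return [sys.executable, "-m", "pip", "download",
            "-r", requirements, "-d", wheel_dir]


def offline_install_cmd(requirements=REQUIREMENTS, wheel_dir=WHEEL_DIR):
    """On the host WITHOUT Internet access: install only from wheel_dir."""
    return [sys.executable, "-m", "pip", "install",
            "--no-index", "--find-links", wheel_dir, "-r", requirements]


if __name__ == "__main__":
    # Step 1 (online host):  python offline_pip.py download
    # Step 2 (offline host, after copying wheel_dir and the requirements file):
    #                        python offline_pip.py install
    step = sys.argv[1] if len(sys.argv) > 1 else ""
    if step == "download":
        subprocess.run(download_cmd(), check=True)
    elif step == "install":
        subprocess.run(offline_install_cmd(), check=True)
    else:
        print("usage: offline_pip.py download|install")
```

`--no-index` stops pip from contacting PyPI, and `--find-links` points it at the copied directory instead.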

Inputs (such as reads, the genetic reference, additional input VCFs, and BioGraph files) should be placed on fast local storage whenever possible. Inputs may be read from network file storage, but this will reduce BioGraph performance, so network performance is critical if network-attached storage is used.

Network-attached storage should be avoided for temporary scratch space whenever possible. See Optimizing Performance for additional details.
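To find out whether an input or scratch directory actually sits on network storage, you can look up its mount in `/proc/mounts` on Linux. This is a sketch, not a BioGraph feature. The set of "network" filesystem types is an illustrative assumption and is not exhaustive.

```python
import os

# Filesystem types that are normally network-attached (illustrative, not exhaustive).
NETWORK_FS = {"nfs", "nfs4", "cifs", "smb3", "ceph", "lustre", "fuse.sshfs"}


def mount_fstype(path, mounts_text):
    """Return the filesystem type of the deepest mount point containing path."""
    best, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        # /proc/mounts escapes spaces in mount points as \040
        mount_point = fields[1].replace("\\040", " ")
        fstype = fields[2]
        prefix = mount_point.rstrip("/") + "/"
        inside = path == mount_point or path.startswith(prefix)
        # >= so a later mount on the same point (which shadows earlier ones) wins
        if inside and len(mount_point) >= len(best):
            best, best_type = mount_point, fstype
    return best_type


def is_network_path(path, mounts_file="/proc/mounts"):
    """True if path resolves to a mount whose type is in NETWORK_FS."""
    with open(mounts_file) as f:
        fstype = mount_fstype(os.path.realpath(path), f.read())
    return fstype in NETWORK_FS


if __name__ == "__main__":
    import sys
    for p in sys.argv[1:] or ["/tmp"]:
        print(p, "network" if is_network_path(p) else "local (or unknown)")
```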

The VDB

The optional BioGraph VDB uses AWS cloud services. All biograph vdb commands require Internet connectivity and specific access to AWS services configured by Spiral.

Cloud instance types

For AWS, we recommend using an r5d.16xlarge or larger, with the NVMe ephemeral SSDs configured as a striped RAID0 for temporary storage.

For GCP, we recommend using an n1-standard-64 or larger, with two or more local SSDs configured as a striped RAID0 for temporary storage.

For Azure, we recommend using a Standard_D64_v3 or Standard_F64s_v2 (or larger), with two or more premium SSD volumes configured as a striped RAID0 for temporary storage.
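The striped RAID0 setup recommended above is usually done with Linux software RAID (`mdadm`). The sketch below only prints the commands for review; run them yourself as root. The device names, `/dev/md0`, the `/scratch` mount point, and the choice of XFS are hypothetical. List the instance's real local disks with `lsblk` first, and remember that ephemeral or local SSD contents do not survive instance stop or replacement.

```python
import shlex


def raid0_commands(devices, md_device="/dev/md0", mount_point="/scratch", fs="xfs"):
    """Build commands that stripe local SSDs into one RAID0 scratch volume."""
    if len(devices) < 2:
        raise ValueError("RAID0 striping needs at least two devices")
    return [
        # Create a level-0 (striped) array across all given devices.
        ["mdadm", "--create", md_device, "--level=0",
         "--raid-devices=%d" % len(devices)] + list(devices),
        # Put a filesystem on the array and mount it for temporary space.
        ["mkfs." + fs, md_device],
        ["mkdir", "-p", mount_point],
        ["mount", md_device, mount_point],
    ]


if __name__ == "__main__":
    # Hypothetical device names; they differ by cloud and instance type.
    for cmd in raid0_commands(["/dev/nvme1n1", "/dev/nvme2n1"]):
        print(" ".join(shlex.quote(arg) for arg in cmd))
```

Striping spreads each write across every SSD, which is why it suits BioGraph's temporary files. The trade-off is that losing one device loses the whole volume, which is acceptable only for scratch data.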

Python

Python v3.6 or later is required. We recommend managing your Python installation using virtualenv. See Installing BioGraph for details.
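A quick way to confirm that the interpreter inside your virtualenv meets the minimum version is a check like the following. It is a small sketch using only the standard library, and it is written to run on Python 3.6 itself.

```python
import sys

MIN_PYTHON = (3, 6)  # minimum version stated above


def python_ok(version_info=None):
    """True if the given (or running) interpreter meets the minimum version."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= MIN_PYTHON


if __name__ == "__main__":
    if not python_ok():
        sys.exit("Python %d.%d found; %d.%d or later is required"
                 % (sys.version_info[0], sys.version_info[1], *MIN_PYTHON))
    print("Python version OK:", sys.version.split()[0])
```

Comparing `(major, minor)` tuples handles two-digit minor versions correctly, e.g. 3.10 is newer than 3.6.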

Docker support

We can provide a Docker image for use in your private Docker registry. Contact Spiral Genetics for details.

Additional software

These additional open source tools are also required to run the full BioGraph pipeline. They may be installed system-wide or placed in any directory in your PATH.

- vcf-sort (from VCFtools)
  - Recommended: v0.1.16
- bgzip (from samtools)
- tabix (from samtools)
- bcftools (from samtools)
  - Recommended: v1.12
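Before running the pipeline, you can confirm that these tools are on your PATH. This sketch uses `shutil.which` for all four tools and checks the version only for bcftools, via `bcftools --version`, whose first output line has the form `bcftools 1.12`. Other tools are not version-checked here because their version flags vary. The helper names are hypothetical.

```python
import re
import shutil
import subprocess

# Tools from the list above, with the recommended version where one is given.
TOOLS = {"vcf-sort": "0.1.16", "bgzip": None, "tabix": None, "bcftools": "1.12"}


def parse_version(text):
    """Extract the first dotted version number (e.g. '1.12') as a tuple of ints."""
    match = re.search(r"\d+(?:\.\d+)+", text)
    return tuple(int(p) for p in match.group(0).split(".")) if match else None


def missing_tools(which=shutil.which):
    """Return the listed tools that cannot be found on PATH."""
    return [tool for tool in TOOLS if which(tool) is None]


def bcftools_version():
    """Run `bcftools --version` and parse its first line."""
    out = subprocess.run(["bcftools", "--version"], stdout=subprocess.PIPE,
                         universal_newlines=True, check=True).stdout
    return parse_version(out.splitlines()[0])


if __name__ == "__main__":
    missing = missing_tools()
    print("missing from PATH:", ", ".join(missing) or "none")
    if "bcftools" not in missing:
        found = bcftools_version()
        if found and found < parse_version(TOOLS["bcftools"]):
            print("bcftools %s is older than the recommended %s"
                  % (".".join(map(str, found)), TOOLS["bcftools"]))
```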

Generating a BioGraph reference requires a FASTA file indexed by BWA. You can also download prebuilt genetic references from AWS S3 at s3://spiral-public/references/.
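Running `bwa index ref.fa` writes five index files next to the FASTA (`.amb`, `.ann`, `.bwt`, `.pac`, `.sa`). This sketch checks that they are all present. It assumes the default index prefix, i.e. `bwa index` was run without `-p`.

```python
from pathlib import Path

# Files produced by `bwa index <fasta>` with the default prefix.
BWA_SUFFIXES = (".amb", ".ann", ".bwt", ".pac", ".sa")


def missing_bwa_index_files(fasta):
    """Return the expected BWA index files that do not exist for this FASTA."""
    fasta = Path(fasta)
    expected = [fasta.with_name(fasta.name + suffix) for suffix in BWA_SUFFIXES]
    return [str(p) for p in expected if not p.exists()]


if __name__ == "__main__":
    import sys
    for fasta in sys.argv[1:]:
        missing = missing_bwa_index_files(fasta)
        print(fasta, "indexed" if not missing else "missing: " + ", ".join(missing))
```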
