Rob Flickenger edited this page Aug 9, 2021 · 1 revision

The biograph stats command calculates various statistics on a BioGraph. Supplying a reference with --reference adds insert size and coverage estimates.

Note that stats can take a few minutes to calculate, especially for large datasets.

(bg7)$ biograph stats -b HG002.bg -r hs37d5/
Sample:            HG002
NumReads:          643,122,884
NumBases:          94,917,484,152
MaxReadLength:     148
MinReadLength:     103
NumPairedReads:    624,242,066
NumUnpairedReads:  18,880,818
NumPairedBases:    92,140,858,048
NumUnpairedBases:  2,776,626,104
MeanInsertSize:    575.49
MedianInsertSize:  570.00
SDInsertSize:      114.50
EstimatedCoverage: 32.73
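
If you want to use these numbers in a script, the `Key: value` layout shown above is easy to parse. The following is a minimal sketch, not part of BioGraph itself: `parse_stats` and `parse_value` are hypothetical helper names, and the sketch assumes the output keeps the one-statistic-per-line format of the example.

```python
import re

# Matches lines such as "NumReads:          643,122,884".
# The shell prompt line does not match because it starts with "(".
STAT_LINE = re.compile(r"^(\w+):\s+(\S.*)$")


def parse_value(raw):
    """Turn '643,122,884' into an int and '575.49' into a float.

    Anything that is not numeric is returned unchanged.
    """
    cleaned = raw.replace(",", "")
    for cast in (int, float):
        try:
            return cast(cleaned)
        except ValueError:
            pass
    return raw


def parse_stats(text):
    """Parse `biograph stats` output (assumed format) into a dict."""
    stats = {}
    for line in text.splitlines():
        match = STAT_LINE.match(line.strip())
        if not match:
            continue
        key, raw = match.group(1), match.group(2).strip()
        # Keep the sample ID as text even if it happens to look numeric.
        stats[key] = raw if key == "Sample" else parse_value(raw)
    return stats


# A few lines copied from the example output above.
EXAMPLE_OUTPUT = """\
Sample:            HG002
NumReads:          643,122,884
NumBases:          94,917,484,152
MeanInsertSize:    575.49
EstimatedCoverage: 32.73
"""

if __name__ == "__main__":
    for key, value in parse_stats(EXAMPLE_OUTPUT).items():
        print(key, value)
```

Lines that do not look like a statistic are skipped, so the function can be given the whole terminal capture, prompt line included.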

As with all biograph commands, you can get additional help by using the --help switch.

(bg7)$ biograph stats --help
usage: stats [-h] -b BG [-r REF] [-s SAMPLE]

Basic QC Stats for a BioGraph file

optional arguments:
  -h, --help            show this help message and exit
  -b BG, --biograph BG  BioGraph file containing an individual
  -r REF, --reference REF
                        Reference genome folder. If not provided, insert size
                        and coverage are not estimated
  -s SAMPLE, --sample SAMPLE
                        Accession id of sample to use. (default=all)
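
The options in the usage text can also be assembled from a script. This is a rough sketch only: `build_stats_command` and `run_stats` are hypothetical names, the flags are the ones listed in the help output above, and the sketch assumes that `biograph` is on your PATH and that the statistics are written to standard output.

```python
import subprocess


def build_stats_command(biograph, reference=None, sample=None):
    """Build the argument list for `biograph stats`.

    Only -b is always passed; -r and -s are added when given,
    mirroring the usage line `stats [-h] -b BG [-r REF] [-s SAMPLE]`.
    """
    cmd = ["biograph", "stats", "-b", biograph]
    if reference is not None:
        cmd += ["-r", reference]
    if sample is not None:
        cmd += ["-s", sample]
    return cmd


def run_stats(biograph, reference=None, sample=None):
    """Run the command and return its captured standard output.

    Assumption: the statistics are printed to stdout. A non-zero
    exit status raises subprocess.CalledProcessError.
    """
    result = subprocess.run(
        build_stats_command(biograph, reference, sample),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Print the command only; call run_stats() to actually execute it.
    print(" ".join(build_stats_command("HG002.bg", reference="hs37d5/")))
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.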

Definitions of the Statistics

  • Sample: The Accession ID of the sample in the BioGraph
  • NumReads: Number of reads in the BioGraph
  • NumBases: Number of bases in the BioGraph
  • MaxReadLength: The number of bases in the longest read
  • MinReadLength: The number of bases in the shortest read
  • NumPairedReads: Number of reads that are paired
  • NumUnpairedReads: Number of reads that are unpaired
  • NumPairedBases: Number of bases in paired reads
  • NumUnpairedBases: Number of bases in unpaired reads
  • MeanInsertSize: Estimated mean insert size (calculated only when --reference is provided)
  • MedianInsertSize: Estimated median insert size (calculated only when --reference is provided)
  • SDInsertSize: Standard deviation of insert sizes (calculated only when --reference is provided)
  • EstimatedCoverage: Estimated sequence coverage (calculated only when --reference is provided)
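
The definitions above imply a few relationships between the statistics that can serve as a quick sanity check: paired and unpaired counts should add up to the totals, and the total base count should lie between NumReads × MinReadLength and NumReads × MaxReadLength. The sketch below checks these against the HG002 example. `check_consistency` is a hypothetical helper, and the relationships are inferred from the definitions and the example numbers rather than documented guarantees.

```python
def check_consistency(stats):
    """Return a list of human-readable problems (empty if none found)."""
    problems = []

    # Paired + unpaired should account for every read and every base.
    if stats["NumPairedReads"] + stats["NumUnpairedReads"] != stats["NumReads"]:
        problems.append("paired + unpaired reads != NumReads")
    if stats["NumPairedBases"] + stats["NumUnpairedBases"] != stats["NumBases"]:
        problems.append("paired + unpaired bases != NumBases")

    # The shortest read cannot be longer than the longest read.
    if stats["MinReadLength"] > stats["MaxReadLength"]:
        problems.append("MinReadLength > MaxReadLength")

    # Every read has between MinReadLength and MaxReadLength bases.
    low = stats["NumReads"] * stats["MinReadLength"]
    high = stats["NumReads"] * stats["MaxReadLength"]
    if not low <= stats["NumBases"] <= high:
        problems.append("NumBases outside the range implied by read lengths")

    return problems


# Values copied from the HG002 example output.
HG002 = {
    "NumReads": 643_122_884,
    "NumBases": 94_917_484_152,
    "MaxReadLength": 148,
    "MinReadLength": 103,
    "NumPairedReads": 624_242_066,
    "NumUnpairedReads": 18_880_818,
    "NumPairedBases": 92_140_858_048,
    "NumUnpairedBases": 2_776_626_104,
}

if __name__ == "__main__":
    print(check_consistency(HG002) or "no problems found")
```

For the example values every check passes, so the script reports no problems.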