bart

bacterial read typer 🧬 🛹 🦠

By Tom Stanton (he/him) 🧑‍🔬

Issues/queries/advice? email me!

Introduction 📖

bart is a bacterial MLST tool for NGS reads, designed to be fast and very easy to use. It uses heuristics to choose the best scheme for your reads and prints results in a standard tab-separated format.

If you found bart helpful, please cite:

bart - BActerial Read Typer
Thomas David Stanton, 2021
https://github.com/tomdstanton/bart

Dependencies 🧰

python >=3.7
kma (use conda)
refseq_masher (use conda)

Installation ⚙️

The latest release can be installed via Bioconda:

conda install -c bioconda bart

Manual installation

git clone --recursive https://github.com/tomdstanton/bart && cd bart && python setup.py install
conda install -c bioconda kma refseq_masher

Usage 💻

usage: bart input.fq.gz [options] > outfile.tab

--options [defaults]:
  -r {pe,se,ont,int}  read-type (paired/single/nanopore/interleaved)
  -s [scheme]         force scheme, see bart-update -s
  -p [95]             template percent identity cutoff
  -o [input path]     export alleles to fasta
  -k, --keep          keep temporary files
  -a, --alt           consider alternative hits when assigning ST
  -amr [90]           screen for AMR genes, add percid
  -l [cwd]            create logfile
  -t [4]              number of threads
  -v, --verbose       print allele and alt-hits if different from profile
  -vv, --verboser     verbose with percid, coverage and depth
  -q, --quiet         silence messages
  -h, --help          show this help message and exit

I like to test bart on SRA reads like so:

fastq-dump SRR14224855 --split-files --gzip && bart SRR14224855*

MLST of these reads completed in 9.6 seconds on a 4-core laptop.

If you already know the species of your reads, or the specific scheme you would like to use, you can bypass the scheme-choosing heuristics.

For example, if you have Staphylococcus reads, check whether a scheme is included:

$ bart-update -s | grep Staphylococcus
Staphylococcus_aureus
Staphylococcus_chromogenes
Staphylococcus_epidermidis
Staphylococcus_haemolyticus
Staphylococcus_hominis
Staphylococcus_lugdunensis
Staphylococcus_pseudintermedius

Now you can run:

bart SRR14224855* -s Staphylococcus_aureus
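When typing many samples, it can help to assemble the command programmatically. The sketch below only builds an argument list from options documented in the usage text (-s, -t, -r). The helper name build_bart_command is hypothetical and not part of bart, and the read file names assume the usual fastq-dump --split-files --gzip naming.

```python
import shlex
from typing import List, Optional, Sequence


def build_bart_command(
    reads: Sequence[str],
    scheme: Optional[str] = None,
    threads: Optional[int] = None,
    read_type: Optional[str] = None,
) -> List[str]:
    """Assemble a bart argument list (hypothetical helper, not part of bart)."""
    if not reads:
        raise ValueError("at least one read file is required")
    cmd = ["bart", *reads]
    if scheme is not None:
        cmd += ["-s", scheme]        # force a scheme, skipping the heuristics
    if threads is not None:
        cmd += ["-t", str(threads)]  # number of threads
    if read_type is not None:
        if read_type not in {"pe", "se", "ont", "int"}:
            raise ValueError(f"unknown read type: {read_type}")
        cmd += ["-r", read_type]     # paired/single/nanopore/interleaved
    return cmd


# Assumed file names for the SRA run used in this README.
cmd = build_bart_command(
    ["SRR14224855_1.fastq.gz", "SRR14224855_2.fastq.gz"],
    scheme="Staphylococcus_aureus",
)
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The list can be handed to subprocess.run(cmd, capture_output=True, text=True, check=True) to capture the tab-separated result from stdout.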

Output is a single tab-separated line. Alleles are presented as gene(allele), where the allele is from the matching or nearest-matching profile:
'?' indicates a non-perfect hit
'~' indicates a potential novel hit
'-' indicates no hit.

SRR14224855	Staphylococcus_aureus	9	arcC(3)	aroE(3)	glpF(1)	gmk(1)	pta(1)	tpi(1)	yqiL(10)	clonal_complex(CC1)
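For downstream scripts, the default output line can be split into its parts with a few lines of Python. This is a minimal sketch based only on the example line above: the names parse_bart_line and uncertain_loci are hypothetical, and it assumes the '?', '~' and '-' markers appear inside the parentheses, which the example does not confirm.

```python
import re
from typing import Dict, List, NamedTuple, Optional

# Gene name followed by the allele field in parentheses, e.g. "arcC(3)".
FIELD = re.compile(r"^(?P<gene>[^()\s]+)\((?P<allele>[^()]*)\)")


class Typing(NamedTuple):
    sample: str
    scheme: str
    st: str
    alleles: Dict[str, str]
    clonal_complex: Optional[str]


def parse_bart_line(line: str) -> Typing:
    """Split one default-mode line into sample, scheme, ST and alleles."""
    sample, scheme, st, *fields = line.rstrip("\n").split("\t")
    alleles = {}
    for field in fields:
        match = FIELD.match(field)
        if match:  # ignore anything that is not gene(allele)
            alleles[match["gene"]] = match["allele"]
    # clonal_complex(...) has the same shape as a locus, so pull it out.
    clonal_complex = alleles.pop("clonal_complex", None)
    return Typing(sample, scheme, st, alleles, clonal_complex)


def uncertain_loci(alleles: Dict[str, str]) -> List[str]:
    """List loci flagged with '?', '~' or '-' (assumed marker placement)."""
    return [g for g, a in alleles.items() if a == "-" or "?" in a or "~" in a]


line = "\t".join([
    "SRR14224855", "Staphylococcus_aureus", "9",
    "arcC(3)", "aroE(3)", "glpF(1)", "gmk(1)", "pta(1)", "tpi(1)",
    "yqiL(10)", "clonal_complex(CC1)",
])
result = parse_bart_line(line)
print(result.st, result.clonal_complex, uncertain_loci(result.alleles))
```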

Verbose -v prints the top hit allele in square brackets next to the allele number if different from the profile allele. Alternative allele hits that were found will also be printed. This means you can make an informed decision about the ST if there are several near-profile assignments.

SRR14224855	Staphylococcus_aureus	9	arcC(3)346,616	aroE(3)260,415	glpF(1)	gmk(1)85	pta(1)777	tpi(1)269	yqiL(10)816	clonal_complex(CC1)

"Verboser" -vv does the same, but also prints mapping data for the top hit in the following format: gene(allele: %identity, %coverage, depth) alternative alleles

or, if the top allele hit differs from the allele in the assigned profile: gene(allele)[top hit allele: %identity, %coverage, depth] alternative alleles

SRR14224855	Staphylococcus_aureus	9	arcC(3: 100.00 100.00 40.52)346,616	aroE(3: 100.00 100.00 27.58)260,415	glpF(1: 100.00 100.00 27.84)	gmk(1: 100.00 100.00 24.42)85	pta(1: 100.00 100.00 36.66)777	tpi(1: 100.00 100.00 52.26)269	yqiL(10: 100.00 100.00 44.92)816	clonal_complex(CC1)
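The -vv fields can be unpacked in a similar way. The sketch below is written against the example line above, where the three numbers are separated by spaces; it also accepts commas, and it tries to handle the bracketed top-hit form from the description, although that form has not been checked against real output. The names VerboseHit and parse_vv_field are hypothetical.

```python
import re
from typing import List, NamedTuple, Optional

VV_FIELD = re.compile(
    r"^(?P<gene>[^()\[\]\s]+)"
    r"\((?P<allele>[^():]+)(?::\s*(?P<stats>[^()]+))?\)"
    r"(?:\[(?P<top>[^\[\]:]+):\s*(?P<top_stats>[^\[\]]+)\])?"
    r"\s*(?P<alts>.*)$"
)


class VerboseHit(NamedTuple):
    gene: str
    allele: str        # allele from the assigned profile
    top_allele: str    # best-mapping allele (same as allele unless bracketed)
    identity: float
    coverage: float
    depth: float
    alternatives: List[str]


def parse_vv_field(field: str) -> Optional[VerboseHit]:
    """Parse one -vv field; return None if it carries no mapping data."""
    m = VV_FIELD.match(field)
    if m is None:
        return None
    stats = m["top_stats"] or m["stats"]
    if stats is None:  # e.g. clonal_complex(CC1)
        return None
    try:
        numbers = [float(x) for x in re.split(r"[,\s]+", stats.strip()) if x]
    except ValueError:
        return None
    if len(numbers) != 3:
        return None
    identity, coverage, depth = numbers
    alternatives = [a for a in m["alts"].split(",") if a]
    allele = m["allele"].strip()
    top_allele = (m["top"] or allele).strip()
    return VerboseHit(m["gene"], allele, top_allele,
                      identity, coverage, depth, alternatives)


hit = parse_vv_field("arcC(3: 100.00 100.00 40.52)346,616")
print(hit)
```

Fields without mapping data, such as clonal_complex(CC1), return None, so they can be filtered out when looping over a whole line.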

The -amr option screens your reads for genes from the NCBI AMRFinderPlus database. This is performed instead of MLST.

The results are printed in a tab-separated format and can be piped to a file:

bart SRR14224855* -amr > SRR14224855_amr.tab

sample	gene	description	length	identity	coverage	depth
SRR14224855	sel27	staphylococcal enterotoxin type 27	753	98.41	100.00	39.25
SRR14224855	sel28	staphylococcal enterotoxin type 28	726	98.90	100.00	20.16
SRR14224855	hlgA	bi-component gamma-hemolysin HlgAB subunit A	930	99.68	100.00	48.64
SRR14224855	icaC	polysaccharide intercellular adhesin biosynthesis/export protein IcaC	1053	99.15	100.00	46.09
SRR14224855	mepA	multidrug efflux MATE transporter MepA	1356	99.78	100.00	56.13
SRR14224855	arsR_pI258	As(III)-sensing metalloregulatory transcriptional repressor ArsR	315	99.68	100.00	40.15
SRR14224855	arsB_pI258	arsenite efflux transporter membrane subunit ArsB	1290	99.84	100.00	53.85
SRR14224855	arsC_thio	thioredoxin-dependent arsenate reductase	396	98.74	100.00	58.92
SRR14224855	mco	multi-copper oxidase Mco	1389	99.71	100.00	36.77
SRR14224855	selX	staphylococcal enterotoxin-like toxin X	612	96.41	100.00	37.06
SRR14224855	aur	zinc metalloproteinase aureolysin	1530	99.15	100.00	82.74
SRR14224855	mecA	PBP2a family beta-lactam-resistant peptidoglycan transpeptidase MecA	2007	99.95	100.00	47.73
SRR14224855	blaZ	penicillin-hydrolyzing class A beta-lactamase BlaZ	846	97.99	100.00	17.73

output truncated

This completed in 3.6 seconds on a 4-core laptop.
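Because the -amr output is a plain tab-separated table with a header row, it can be read with Python's csv module. The sketch below loads three of the rows shown above and keeps hits above an identity and depth threshold. The function names and the threshold values are illustrative assumptions, not bart defaults.

```python
import csv
import io
from typing import Dict, Iterable, List


def read_amr_table(handle: Iterable[str]) -> List[Dict[str, str]]:
    """Read a bart -amr table (header row plus tab-separated hits)."""
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return list(reader)


def filter_hits(rows, min_identity: float = 99.0, min_depth: float = 20.0):
    """Keep hits at or above both thresholds (illustrative values)."""
    return [
        row for row in rows
        if float(row["identity"]) >= min_identity
        and float(row["depth"]) >= min_depth
    ]


# Three rows copied from the example output above.
example = "\n".join([
    "\t".join(["sample", "gene", "description", "length",
               "identity", "coverage", "depth"]),
    "\t".join(["SRR14224855", "sel27", "staphylococcal enterotoxin type 27",
               "753", "98.41", "100.00", "39.25"]),
    "\t".join(["SRR14224855", "mecA",
               "PBP2a family beta-lactam-resistant peptidoglycan "
               "transpeptidase MecA", "2007", "99.95", "100.00", "47.73"]),
    "\t".join(["SRR14224855", "blaZ",
               "penicillin-hydrolyzing class A beta-lactamase BlaZ",
               "846", "97.99", "100.00", "17.73"]),
]) + "\n"

rows = read_amr_table(io.StringIO(example))
kept = filter_hits(rows)
print([row["gene"] for row in kept])
```

With a real file, open SRR14224855_amr.tab and pass the file handle to read_amr_table instead of the in-memory example.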

bart-update

usage: bart-update [options]

--options [defaults]:
  -s            print available MLST schemes
  -S            -s with genes
  -p            update pubMLST schemes
  -a [ [ ...]]  path to custom scheme fasta and csv
  -r [ [ ...]]  name of scheme(s) to remove
  -amr          update AMR index
  -h            show this help message and exit

You can even add your own schemes to the database! You just need to provide an allele fasta and a corresponding tab-separated profile mapping file in the PubMLST format. Check out an example fasta and mapping file.

bart-update -a scheme.fna scheme.tab
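Before adding a custom scheme, it may be worth checking that the two files agree with each other. The sketch below assumes the usual PubMLST layout: fasta headers of the form >gene_number, and a profile table whose header row is ST, one column per locus, and optionally clonal_complex. Any further requirements bart places on these files are not covered here. The function names and the toy scheme (abcA, xyzB) are invented for illustration.

```python
import csv
import io
from typing import Dict, Iterable, List, Set

NON_LOCUS_COLUMNS = {"ST", "clonal_complex"}


def alleles_in_fasta(handle: Iterable[str]) -> Dict[str, Set[str]]:
    """Collect allele numbers per gene from headers like '>abcA_1'."""
    alleles: Dict[str, Set[str]] = {}
    for line in handle:
        if not line.startswith(">"):
            continue
        fields = line[1:].split()
        if not fields:
            continue
        # Split on the last underscore so genes like 23S_rRNA still work.
        gene, _, number = fields[0].rpartition("_")
        alleles.setdefault(gene, set()).add(number)
    return alleles


def missing_alleles(fasta_handle, profile_handle) -> List[str]:
    """Report profile entries that have no matching fasta record."""
    known = alleles_in_fasta(fasta_handle)
    problems = []
    for row in csv.DictReader(profile_handle, delimiter="\t"):
        for column, value in row.items():
            if column in NON_LOCUS_COLUMNS:
                continue
            if value not in known.get(column, set()):
                problems.append(f"ST {row['ST']}: {column}_{value} not in fasta")
    return problems


# Invented toy scheme: xyzB_2 is referenced by ST 2 but absent from the fasta.
fasta = ">abcA_1\nACGT\n>abcA_2\nACGA\n>xyzB_1\nTTGA\n"
profiles = "ST\tabcA\txyzB\tclonal_complex\n1\t1\t1\t\n2\t2\t2\t\n"

problems = missing_alleles(io.StringIO(fasta), io.StringIO(profiles))
print(problems)
```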

Sometimes there are two schemes for a species, which is problematic because the heuristics will pick the same one every time. For A. baumannii, I don't want the Oxford scheme to be considered, so I simply remove it:

bart-update -r Acinetobacter_baumannii#1

bart-profile

bart-profile is an interactive script which returns the ST or closest ST(s) for a combination of alleles in a scheme.

usage: bart-profile [scheme] [ST]

$ bart-profile Helicobacter_cinaedi
enter allele for 23S_rRNA: 4
enter allele for ppa: 2
enter allele for aspA: 2
enter allele for aroE: 2
enter allele for atpA: 2
enter allele for tkt: 1
enter allele for cdtB: 2
scheme: Helicobacter_cinaedi	ST: 10	23S_rRNA(4)	ppa(2)	aspA(2)	aroE(2)	atpA(2)	tkt(1)	cdtB(2)	clonal_complex(9)

Alternatively, type STs after the scheme to display the allelic profiles.

$ bart-profile Helicobacter_cinaedi 10 11 12
scheme: Helicobacter_cinaedi	ST: 10	23S_rRNA(4)	ppa(2)	aspA(2)	aroE(2)	atpA(2)	tkt(1)	cdtB(2)	clonal_complex(9)
scheme: Helicobacter_cinaedi	ST: 11	23S_rRNA(2)	ppa(2)	aspA(2)	aroE(2)	atpA(2)	tkt(1)	cdtB(2)	clonal_complex(9)
scheme: Helicobacter_cinaedi	ST: 12	23S_rRNA(5)	ppa(5)	aspA(2)	aroE(5)	atpA(5)	tkt(1)	cdtB(3)	clonal_complex(12)
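To illustrate what "closest ST(s)" can mean, the sketch below counts mismatching loci between a query and each known profile and returns the profiles with the fewest mismatches. This is only one plausible way to do the lookup and is not taken from bart's source. The profiles are the three Helicobacter_cinaedi STs listed above, and closest_sts is a hypothetical name.

```python
from typing import Dict, List, Sequence, Tuple

GENES = ["23S_rRNA", "ppa", "aspA", "aroE", "atpA", "tkt", "cdtB"]

# Allelic profiles copied from the bart-profile output above.
PROFILES: Dict[str, List[int]] = {
    "10": [4, 2, 2, 2, 2, 1, 2],
    "11": [2, 2, 2, 2, 2, 1, 2],
    "12": [5, 5, 2, 5, 5, 1, 3],
}


def closest_sts(
    query: Sequence[int], profiles: Dict[str, List[int]]
) -> Tuple[int, List[str]]:
    """Return (mismatch count, STs) for the nearest profile(s)."""
    if not profiles:
        raise ValueError("no profiles to compare against")
    distances = {}
    for st, alleles in profiles.items():
        if len(alleles) != len(query):
            raise ValueError(f"ST {st} has {len(alleles)} loci, "
                             f"query has {len(query)}")
        # Count loci where the query allele differs from the profile allele.
        distances[st] = sum(q != p for q, p in zip(query, alleles))
    best = min(distances.values())
    return best, sorted(st for st, d in distances.items() if d == best)


exact = closest_sts([4, 2, 2, 2, 2, 1, 2], PROFILES)
near = closest_sts([3, 2, 2, 2, 2, 1, 2], PROFILES)
print(exact)
print(near)
```

An exact match comes back with zero mismatches and a single ST. The second query differs from ST 10 and ST 11 at one locus each, so both are returned as equally close.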
