parameters

Jorge edited this page Dec 19, 2021 · 1 revision

Parameters


Type python bigscape.py -h to display a list of all available parameters. See also the following sections:

help

Type -h or --help to display all currently available options.

Label

--label <string>

By default, BiG-SCAPE runs will have a name such as YYYY-MM-DD_HH-MM-SS_[extra], where the extra string is hybrids (if activated) and the mode (e.g. glocal). With the --label option, it is possible to add an extra string to the run name. This will be reflected in the dropdown menu in the visualization page.

Input folder

-i INPUTDIR, --inputdir INPUTDIR

Specify a path with the starting point to look for .gbk files. If empty, the search will start where the BiG-SCAPE files are located. The search is recursive. See more information here

Output folder

-o OUTPUTDIR, --outputdir OUTPUTDIR

Output directory, this will contain all output data files. See its structure and more details about each type of result here

Include string

--include_gbk_str INCLUDE_GBK_STR

If any string in this list occurs in the filename, this file will be included in the analysis. (default: cluster, region)

Exclude string

--exclude_gbk_str EXCLUDE_GBK_STR

If any string in this list occurs in the filename, this file will not be used for the analysis. (default: final)
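
The include and exclude filters can be pictured together as a plain substring test on the filename. The sketch below is not BiG-SCAPE's own code: `select_gbk` is a hypothetical helper, and it assumes that exclusion wins when a filename matches both lists.

```python
def select_gbk(filenames, include=("cluster", "region"), exclude=("final",)):
    """Keep .gbk files whose name contains any include string and no exclude string.

    Hypothetical helper; the defaults mirror the documented defaults of
    --include_gbk_str and --exclude_gbk_str.
    """
    selected = []
    for name in filenames:
        if not name.endswith(".gbk"):
            continue  # only GenBank files are considered
        if any(s in name for s in exclude):
            continue  # assumption: an exclude match overrides an include match
        if any(s in name for s in include):
            selected.append(name)
    return selected


files = [
    "strainA.cluster001.gbk",
    "strainA.final.gbk",
    "strainB.region002.gbk",
    "notes.txt",
]
print(select_gbk(files))
```

With the default strings, only the `cluster` and `region` files from the example list are kept.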

Pfam database

--pfam_dir PFAM_DIR

Location of Pfam files. Default is same location of BiG-SCAPE. See how to prepare these files in the installation instructions

Cores

-c CORES, --cores CORES

BiG-SCAPE will try to parallelize some steps in the analysis, such as domain prediction and distance calculation. Use this option to set the number of cores the script may use. If not specified, BiG-SCAPE will use all available cores.

Verbose

-v, --verbose

Prints more detailed information about each step in the analysis. Toggle to activate. Because of the amount of information, it might be a good idea to redirect the output to a file, e.g.:

$> python bigscape.py <options> --verbose > run.log &

Include singletons

--include_singletons

Toggle to activate. This will include singletons, i.e. BGCs whose distance to every other BGC is above the specified cutoff.

Domain overlap

-d DOMAIN_OVERLAP_CUTOFF, --domain_overlap_cutoff DOMAIN_OVERLAP_CUTOFF

Specify the overlap percentage at which two domains are considered to overlap. When they do, the domain with the best score is kept (default=0.1). See also [domain prediction](domain prediction).
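
As an illustration of the idea, here is a minimal sketch of overlap filtering. It is not taken from BiG-SCAPE: `filter_overlapping` is a hypothetical function, and measuring overlap as a fraction of the shorter domain is an assumption made for this example only.

```python
def filter_overlapping(domains, cutoff=0.1):
    """Drop lower-scoring domains that overlap a better-scoring one.

    domains: list of (start, end, score) tuples on a single protein.
    Assumption: overlap is measured relative to the shorter of the two domains.
    """
    kept = []
    # Visit the best-scoring domains first so that they win any conflict
    for start, end, score in sorted(domains, key=lambda d: d[2], reverse=True):
        clash = False
        for k_start, k_end, _ in kept:
            overlap = min(end, k_end) - max(start, k_start)
            if overlap <= 0:
                continue  # the two domains do not touch
            shorter = min(end - start, k_end - k_start)
            if overlap / shorter > cutoff:
                clash = True
                break
        if not clash:
            kept.append((start, end, score))
    return sorted(kept)


domains = [(10, 120, 50.0), (100, 200, 80.0), (210, 300, 30.0)]
print(filter_overlapping(domains))
```

In this example the first two domains share 20 positions, which is above the 0.1 cutoff, so the lower-scoring one is dropped.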

Minimum size

-m MIN_BGC_SIZE, --min_bgc_size MIN_BGC_SIZE

Provide the minimum size of a BGC to be included in the analysis. Default is 0 base pairs. This includes the sum of all loci in a multi-record GenBank file.

Mix

--mix

By default, BiG-SCAPE separates the analysis according to the BGC product and will create network directories for each class (see [BiG-SCAPE classes](BiG-SCAPE classes)). Toggle to include an analysis mixing all classes. As BiG-SCAPE needs to calculate an all-vs-all distance network, this might use a lot of memory.

No classify

--no_classify

By default, BiG-SCAPE classifies the output files based on the BGC product. Toggle to deactivate. Note that if the --mix parameter is not activated, BiG-SCAPE will not create any network file, but all intermediate files will still be processed.

Filter classes

--banned_classes {PKSI, PKSother, NRPS, RiPPs, Saccharides, Terpene, PKS-NRP_Hybrids, Others}

BiG-SCAPE Classes that should NOT be included in the classification. E.g. "--banned_classes PKSI PKSOther". Strings in lowercase are also allowed.

Cutoffs

--cutoffs {0.0-1.0}

Generate networks using multiple raw distance cutoff values, separated by spaces. Example: --cutoffs 0.1 0.25 0.5 1.0. Default: c=0.3. For every cutoff value, a different network file will be generated. In the interactive visualization, only the highest cutoff will be shown. Automatic clustering of Gene Cluster Families will be done using each cutoff.
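
To make the "one network per cutoff" behaviour concrete, here is a small sketch. `networks_by_cutoff` is a hypothetical function, not part of BiG-SCAPE, and it assumes that a pair is kept when its distance is equal to or lower than the cutoff.

```python
def networks_by_cutoff(distances, cutoffs=(0.3,)):
    """Build one edge list per cutoff value.

    distances: dict mapping (bgc_a, bgc_b) to a raw distance in [0.0, 1.0].
    Assumption: an edge is kept when distance <= cutoff.
    """
    return {
        cutoff: sorted(pair for pair, dist in distances.items() if dist <= cutoff)
        for cutoff in cutoffs
    }


distances = {("A", "B"): 0.2, ("A", "C"): 0.45, ("B", "C"): 0.9}
for cutoff, edges in networks_by_cutoff(distances, cutoffs=(0.25, 0.5)).items():
    print(cutoff, edges)
```

Each higher cutoff yields a network that contains all edges of the lower ones plus the pairs that newly fall under the threshold.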

Clans

--clans-off

By default, BiG-SCAPE will perform a second layer of clustering to group GCFs into GCCs. Toggle to deactivate this.

--clan_cutoff {0.0-1.0} {0.0-1.0}

Cutoff parameters, in raw distance, used when clustering families into clans. The first value is the cutoff used for finding the GCFs that will be used for clan calling (default: 0.3). If this GCF cutoff value is not included within --cutoffs, it will be added automatically. The second value is the GCC cutoff used for clustering families into clans (default: 0.7). The average distance between the members of each pair of GCFs is used as the inter-GCF distance. Every pair of GCFs linked with a distance equal to or lower than the GCC cutoff will be taken into account. Example: --clan_cutoff 0.5 0.8

Learn more about [GCFs and GCCs](GCFs and GCCs).
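
The inter-GCF distance described above (the average distance between members of two families) can be sketched as follows. The function names and the example data are hypothetical; this only illustrates the averaging and the GCC cutoff test, not the clustering step that BiG-SCAPE applies afterwards.

```python
from itertools import product


def inter_gcf_distance(family_a, family_b, bgc_distance):
    """Average raw distance over all member pairs of two families."""
    pairs = list(product(family_a, family_b))
    return sum(bgc_distance(a, b) for a, b in pairs) / len(pairs)


def linked_gcf_pairs(families, bgc_distance, gcc_cutoff=0.7):
    """Return (gcf_a, gcf_b, distance) for pairs at or below the GCC cutoff."""
    names = sorted(families)
    links = []
    for i, name_a in enumerate(names):
        for name_b in names[i + 1:]:
            dist = inter_gcf_distance(families[name_a], families[name_b], bgc_distance)
            if dist <= gcc_cutoff:
                links.append((name_a, name_b, dist))
    return links


# Made-up raw distances between four BGCs, for illustration only
raw = {
    frozenset(("b1", "b3")): 0.5,
    frozenset(("b2", "b3")): 0.7,
    frozenset(("b1", "b4")): 0.9,
    frozenset(("b2", "b4")): 1.0,
    frozenset(("b3", "b4")): 0.8,
}
families = {"GCF1": ["b1", "b2"], "GCF2": ["b3"], "GCF3": ["b4"]}
links = linked_gcf_pairs(families, lambda a, b: raw[frozenset((a, b))])
print(links)
```

Here only GCF1 and GCF2 end up linked, because their average member distance is the only one that does not exceed the default GCC cutoff of 0.7.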

Hybrids

--hybrids-off

By default, BGCs with hybrid predicted products from the PKS/NRPS Hybrids and Others classes will also be included in each matching subclass (e.g. a terpene-nrps BGC that would usually be classified in Others is added to both the Terpene and NRPS classes). This means that the same cluster may appear in different classes. Toggle to deactivate.

Alignment Mode

--mode {global,glocal,auto}

Alignment mode used when comparing each pair of gene clusters. global: the whole list of domains of each BGC is compared. glocal (default): Longest Common Subcluster mode, in which the subset of domains used to calculate distance is redefined by finding the longest slice of common domain content per gene in both BGCs and then expanding each slice. auto: use glocal mode when at least one of the BGCs in a pair has the contig_edge annotation from antiSMASH v4+; otherwise use global mode on that pair. Learn more about the alignment modes here
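
The per-pair decision made in auto mode can be summarised in a few lines. `pick_mode` is a hypothetical helper written for this page, and the boolean arguments stand in for the contig_edge annotation of each BGC.

```python
def pick_mode(mode, contig_edge_a, contig_edge_b):
    """Return the alignment mode to use for one pair of BGCs.

    mode: "global", "glocal" or "auto" (the value given to --mode).
    contig_edge_a / contig_edge_b: True if that BGC carries the
    contig_edge annotation.
    """
    if mode == "auto":
        # glocal as soon as at least one BGC of the pair sits on a contig edge
        return "glocal" if (contig_edge_a or contig_edge_b) else "global"
    return mode


print(pick_mode("auto", True, False))
print(pick_mode("auto", False, False))
```

Explicit global or glocal settings are returned unchanged; only auto looks at the annotations.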

Anchorfile

--anchorfile ANCHORFILE

Point to a custom anchor file. Default is anchor_domains.txt, included with the repository. Learn more about the anchor file [here](anchor file).

Force hmmscan

--force_hmmscan

Force domain prediction using hmmscan even if BiG-SCAPE finds processed domtable files (e.g. to use a new version of the Pfam database).

Skip alignment

--skip_ma

Skip multiple alignment of domain sequences. Use if alignments have been generated in a previous run. Domain sequence alignment will also be skipped if BiG-SCAPE reuses an output directory and no new BGCs are found within the input folder.

MIBiG

--mibig, --mibig14, --mibig13

Use included BGCs from the MIBiG database. Currently, versions 2.1 (--mibig), 1.4 (--mibig14) and 1.3 (--mibig13) of the database are included in the BiG-SCAPE project as a compressed file, which will be unzipped the first time these options are used.

Note that these sets are different from the bundle found in the downloads section of the MIBiG site; these GenBank files have been processed by antiSMASH to annotate the BGC type. Additionally, the latest version is an unofficial version of the current content of the MIBiG 'repository' page (i.e. it contains a few more BGCs than the official 2.0 bundle).

Query BGC mode

--query_bgc QUERY_BGC

Instead of making an all-VS-all comparison of all the input BGCs, choose one BGC to compare with the rest of the set (one-VS-all). The query BGC does not have to be within inputdir. The distances used for the GCF and GCC analysis are all those equal to or lower than the maximum cutoff value. Only the BiG-SCAPE class(es) that the query BGC belongs to will be taken into account.

Domain includelist

--domain_includelist

Only include BGCs which include (any) domains with the Pfam accessions found in the domain_includelist.txt file. In this file, each line contains a single Pfam accession (with an optional comment, separated by a tab). Lines starting with "#" are ignored. See the file for an example using the P450 domain. Pfam accessions are case-sensitive.
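
The file format described above is simple enough to sketch a reader for it. Both functions below are hypothetical and only follow the description on this page; in particular, they assume that accessions are compared as exact, case-sensitive strings.

```python
def read_domain_includelist(text):
    """Parse includelist content into a set of Pfam accessions.

    Each line holds one accession, optionally followed by a tab and a comment.
    Lines starting with "#" and blank lines are ignored.
    """
    accessions = set()
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        accessions.add(line.split("\t")[0].strip())
    return accessions


def bgc_passes(bgc_domains, accessions):
    """True if the BGC contains any of the listed accessions (case-sensitive)."""
    return any(domain in accessions for domain in bgc_domains)


example = "# accession\tcomment\nPF00067\tp450\n"
wanted = read_domain_includelist(example)
print(wanted)
print(bgc_passes(["PF00109", "PF00067"], wanted))
```

A BGC needs only one matching domain to pass, and a lowercase accession such as pf00067 would not match under the case-sensitivity assumption.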

Version

--version

Show program's version number and exit