Jorge edited this page Dec 19, 2021 · 1 revision

Output

Output folder structure

  • cache: Holds precalculated data from the analysis. If BiG-SCAPE is run again and pointed to the same output folder, it will try to read and re-use files from this directory.
    • domains: For each domain found in the analysis, three files are generated:
      • fasta file: Contains the sequences of that domain from all proteins in all BGCs.
      • stk file: Alignment of those sequences in Stockholm format, produced with hmmalign.
      • algn file: The aligned domain sequences in fasta format (parsed from the Stockholm file). These are the sequences used to calculate the DSS (domain sequence similarity) index.
    • domtable: Raw output of domain prediction with hmmscan on each BGC's protein sequences.
    • fasta: Protein sequences from each BGC, extracted from the CDS features in the GenBank file.
    • pfd: Parsed results from the domtable file in tab-separated format, already filtered for overlapping domains. Columns: cluster name, (per-domain) score, gene id (if present), envelope coordinate from, envelope coordinate to (of the domain prediction, in amino acids), Pfam id, Pfam descriptor, start coordinate gene, end coordinate gene, internal CDS header.
    • pfs: A list of predicted domains for each BGC file.
    • .dict files: Internal files.
  • html_content: All the code necessary for the interactive visualization
  • logs: Currently holds only the parameters and the run time of each run that points to this output folder.
  • network files: See more information in the next section.
  • SVG: Arrow figures in svg format for every BGC in the analysis. Each figure has boxes representing the predicted domains. These are assigned random colors, which the user can change by editing the domains_color_file.tsv file.
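
The pfd column layout listed under cache can be read with a few lines of Python. The following is a minimal sketch, not part of BiG-SCAPE: the names PfdRow and parse_pfd are hypothetical, and it assumes that pfd files have no header row, that coordinates are integers, and that every line has exactly the ten columns listed above.

```python
from typing import Iterable, List, NamedTuple


class PfdRow(NamedTuple):
    """One domain hit, in the column order listed for pfd files (hypothetical type)."""
    cluster_name: str
    score: float
    gene_id: str          # may be empty when the GenBank file has no gene id
    env_from: int         # envelope start, in amino acids
    env_to: int           # envelope end, in amino acids
    pfam_id: str
    pfam_descriptor: str
    gene_start: int
    gene_end: int
    cds_header: str


def parse_pfd(lines: Iterable[str]) -> List[PfdRow]:
    """Parse tab-separated pfd lines; assumes no header row (an assumption)."""
    rows = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 10:
            continue  # skip blank lines or lines with an unexpected layout
        rows.append(PfdRow(
            fields[0], float(fields[1]), fields[2],
            int(fields[3]), int(fields[4]),
            fields[5], fields[6],
            int(fields[7]), int(fields[8]), fields[9],
        ))
    return rows


# Made-up example line, for illustration only.
example = ["cluster_1\t85.3\tgeneA\t12\t240\tPF00000\texample domain\t1500\t2900\tcds_header_1\n"]
rows = parse_pfd(example)

# With a real file (the path is a placeholder):
# with open("path/to/file.pfd") as handle:
#     rows = parse_pfd(handle)
```

Each row can then be accessed by field name, for example rows[0].pfam_id or rows[0].env_to - rows[0].env_from for the envelope length.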

Results

Network files

Each run generates its own set of output files, which can be analyzed with other tools (e.g. Cytoscape):

  • Network_Annotations_Full.tsv: a tab-separated file with information about each BGC in the input that was successfully processed. This includes: the BGC name, the original accession ID from the GenBank file, the description from the original GenBank file, the antiSMASH product prediction, the [BiG-SCAPE class](BiG-SCAPE classes), the organism tag from the original GenBank file, and the taxonomy string, also from the GenBank file.
  • Folders for each BiG-SCAPE class which contain:
    • The .network file. One file for each cutoff selected.
    • The Network Annotation file with the BGCs used for this particular class
    • The clustering files. For each cutoff, these contain a first column with the BGC name and a second, tab-separated column with the label of the cluster (GCF number) that the BGC was assigned to.
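
The two-column layout of the clustering files makes it easy to collect the members of each GCF. The sketch below is illustrative only: group_by_family is a hypothetical helper, and skipping lines that start with "#" is an assumption about how a header line, if present, would be marked.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_by_family(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each GCF label to the list of BGC names assigned to it."""
    families = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # assumption: header or comment lines start with '#'
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # not a "BGC name <tab> label" line
        bgc_name, label = fields[0], fields[1]
        families[label].append(bgc_name)
    return dict(families)


# Made-up example content, for illustration only.
example = ["#header line\n", "bgc_a\t1\n", "bgc_b\t1\n", "bgc_c\t7\n"]
families = group_by_family(example)
```

Labels are kept as strings so that the function makes no assumption about how GCF numbers are formatted.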

All of these files can be opened in a text editor.
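
Because the files are plain tab-separated text, they can also be loaded with a script. As a rough sketch, the following counts how often each value occurs in one column of a Network Annotation file using Python's csv module. It assumes the file starts with a header row; count_by_column is a hypothetical helper, and the column names in the example are placeholders rather than the real headers.

```python
import csv
from collections import Counter
from typing import Iterable


def count_by_column(lines: Iterable[str], column: str) -> Counter:
    """Count the values in one named column of a tab-separated file with a header row."""
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    return Counter(row[column] for row in reader)


# Made-up content with placeholder column names, for illustration only.
example = [
    "name\tclass\n",
    "bgc_a\tclass_x\n",
    "bgc_b\tclass_y\n",
    "bgc_c\tclass_x\n",
]
counts = count_by_column(example, "class")

# With a real file (the path is a placeholder; check the header for the column name):
# with open("path/to/Network_Annotations_Full.tsv", newline="") as handle:
#     counts = count_by_column(handle, "name of the class column")
```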

Interactive visualization

Launch the interactive output by clicking on the index.html file or opening the file with any web browser. This file is located in the root of the output folder.
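
On systems where double-clicking is inconvenient (for example, when working from a terminal), the page can also be opened from Python. This is a small sketch using the standard webbrowser module; index_uri and open_visualization are hypothetical helper names, and the output folder path is whatever was passed to BiG-SCAPE.

```python
import webbrowser
from pathlib import Path


def index_uri(output_dir: str) -> str:
    """Build a file:// URI for index.html in the root of the output folder."""
    return (Path(output_dir) / "index.html").resolve().as_uri()


def open_visualization(output_dir: str) -> bool:
    """Ask the default web browser to open the interactive output."""
    return webbrowser.open(index_uri(output_dir))


# Usage (the folder name is a placeholder):
# open_visualization("path/to/output_folder")
```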

The visualization opens on an overview page.

Click on the dropdown menu on the right to select the run that you want to visualize.
