
GCFs and GCCs

Jorge edited this page Dec 19, 2021 · 1 revision

Once the distance matrix is calculated for the data set, Gene Cluster Family (GCF) assignment is performed for every cutoff distance selected by the --cutoffs parameter.

The interactive visualization of BiG-SCAPE output shows the GCF assignment for the largest of the selected cutoff values.

For every cutoff, BiG-SCAPE creates a network using all distances lower than or equal to the current cutoff. The Affinity Propagation clustering algorithm is then applied to each connected component (subnetwork) that emerges from this procedure. The similarity matrix for Affinity Propagation includes all distances between members of the subnetwork, i.e. it also includes those with a distance greater than the current cutoff.
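The network-building step can be illustrated with a minimal, standard-library sketch. It is not BiG-SCAPE's own code: the function names, the toy BGC names and distances, and the conversion of distance to similarity as `1 - distance` are assumptions made for illustration.

```python
from collections import defaultdict


def connected_components(names, distances, cutoff):
    """Group BGCs into connected components using edges with distance <= cutoff.

    `distances` maps frozenset({a, b}) -> pairwise distance (hypothetical layout).
    """
    neighbors = defaultdict(set)
    for pair, dist in distances.items():
        if dist <= cutoff:
            a, b = pair
            neighbors[a].add(b)
            neighbors[b].add(a)

    seen, components = set(), []
    for start in names:
        if start in seen:
            continue
        # Depth-first traversal from every BGC not visited yet.
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for nxt in neighbors[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        components.append(sorted(component))
    return components


def similarity_matrix(component, distances):
    """Full similarity matrix for one component.

    Every pair is included, also those whose distance exceeds the cutoff.
    Assumption: similarity = 1 - distance; missing pairs count as distance 1.0.
    """
    return [
        [1.0 if a == b else 1.0 - distances.get(frozenset((a, b)), 1.0)
         for b in component]
        for a in component
    ]


# Toy data (invented for this example).
names = ["bgc1", "bgc2", "bgc3", "bgc4"]
distances = {
    frozenset(("bgc1", "bgc2")): 0.2,
    frozenset(("bgc2", "bgc3")): 0.25,
    frozenset(("bgc1", "bgc3")): 0.6,   # above the cutoff, so not an edge
    frozenset(("bgc1", "bgc4")): 0.9,
    frozenset(("bgc2", "bgc4")): 0.85,
    frozenset(("bgc3", "bgc4")): 0.8,
}

components = connected_components(names, distances, cutoff=0.3)
for component in components:
    print(component, similarity_matrix(component, distances))
```

Here bgc1 and bgc3 end up in the same component through bgc2, and their distance of 0.6 still enters the similarity matrix even though it is above the cutoff of 0.3.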

The Gene Cluster Clan (GCC) setting (enabled by default) performs a second layer of clustering on the GCFs. For this, Affinity Propagation is applied again (i.e. on each connected component of a network), but the network nodes now represent the GCFs defined at the cutoff level specified by the first value of the --clan_cutoff parameter (default: 0.3). Clustering is applied to the network of all GCFs connected by a distance lower than or equal to the GCC cutoff, which is the second value of the --clan_cutoff parameter (default: 0.7); larger distances are discarded. The inter-GCF distance is calculated as the average distance between the BGCs of the two families.
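As a rough sketch of the GCC network, the snippet below averages the BGC-to-BGC distances across two families and keeps only family pairs at or below the GCC cutoff. Reading "average distance" as the mean over all cross-family BGC pairs is an interpretation, and the function names, family names and distances are hypothetical.

```python
from itertools import combinations, product


def inter_gcf_distance(family_a, family_b, bgc_distances):
    """Mean distance over all BGC pairs with one member in each family.

    Assumption: missing pairs count as distance 1.0.
    """
    pairs = list(product(family_a, family_b))
    total = sum(bgc_distances.get(frozenset(pair), 1.0) for pair in pairs)
    return total / len(pairs)


def gcc_edges(families, bgc_distances, gcc_cutoff=0.7):
    """Edges of the GCF network: family pairs with distance <= gcc_cutoff."""
    edges = {}
    for (name_a, members_a), (name_b, members_b) in combinations(families.items(), 2):
        dist = inter_gcf_distance(members_a, members_b, bgc_distances)
        if dist <= gcc_cutoff:  # larger distances are discarded
            edges[(name_a, name_b)] = dist
    return edges


# Toy data (invented): families as they might be defined at the first
# --clan_cutoff value, plus pairwise BGC distances.
families = {
    "GCF_1": ["bgc1", "bgc2"],
    "GCF_2": ["bgc3"],
    "GCF_3": ["bgc4"],
}
bgc_distances = {
    frozenset(("bgc1", "bgc2")): 0.2,
    frozenset(("bgc2", "bgc3")): 0.25,
    frozenset(("bgc1", "bgc3")): 0.6,
    frozenset(("bgc1", "bgc4")): 0.9,
    frozenset(("bgc2", "bgc4")): 0.85,
    frozenset(("bgc3", "bgc4")): 0.8,
}

edges = gcc_edges(families, bgc_distances, gcc_cutoff=0.7)
print(edges)
```

Only GCF_1 and GCF_2 stay connected in this toy network; the connected components of the resulting graph would then be clustered with Affinity Propagation, as in the first layer.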

Affinity propagation parameters used in both clustering layers: damping=0.9, max_iter=1000, convergence_iter=200
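These parameter names match the constructor of scikit-learn's `AffinityPropagation`, so a call with the same settings might look like the sketch below. The use of `affinity="precomputed"`, the fixed `random_state`, and the toy similarity matrix are assumptions for illustration and are not stated on this page.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

# Toy similarity matrix (invented): two tight pairs of BGCs.
similarity = np.array([
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.9],
    [0.2, 0.1, 0.9, 1.0],
])

model = AffinityPropagation(
    damping=0.9,
    max_iter=1000,
    convergence_iter=200,
    affinity="precomputed",  # assumption: the input is already a similarity matrix
    random_state=0,          # assumption: fixed only to make the sketch repeatable
)
labels = model.fit_predict(similarity)
print(labels)  # one cluster label per row of the similarity matrix
```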
