gSpan is an algorithm for mining frequent subgraphs.
Header-only in include directory
$ ./example/gspan --help
Usage: gspan [options]
Graph-based substructure pattern mining.
Depending on the graph count in input, there are two modes:
1. input contains one graph. Mined patterns belong to this one;
in this case only --mincount=NUM option is used
2. input contains many graphs. Mined patterns belong to some graph in input;
in this case --minsupp=NUM option is used, as more useful.
Options:
-i, --input FILE file to read, default stdin
-o, --output FILE file to write, default stdout
-c, --mincount NUM minimal count, integer value, default 1
-s, --minsupp NUM minimal support, 0..1
-l, --legacy use tgf format for input and output (slower!)
-e, --embeddings [opts] none, autgrp, all. default is none
-h, --help this help
two formats supported:
Input data is a one or many input graphs. Output data is one or many patterns with mappings to input graph. # is comment line
#
# begin section with input graph (or transaction)
#
t <graph_id>
v <vertex_id> <value>
...[repeat v]...
e <edge_id> <vertex_id> <vertex_id> <value>
...[repeat e]...
...[repeat t]..
#
# begin section with mined pattern, it is also graph
#
p <pattern_id> # occurence: <num>
v <vertex_id> <value>
...[repeat v]...
e <edge_id> <vertex_id> <vertex_id> <value>
...[repeat e]...
#
# each pattern has multiple mappings to original input graph
#
m <mapping_id>
v <vertex_id> <graph_id> <vertex_id> # map pattern vertex to original vertex
...[repeat v]..
e <edge_id> <graph_id> <edge_id> # map pattern edge to original edge
...[repeat e]...
...[repeat m]...
...[repeat p]...
#
# input
#
t # <graph_id>
v <vertex_id> <value>
e <vertex_id> <vertex_id> <value>
#
# output
#
t # <pattern_id> * <support>
parent : <pattern_id>
v <vertex_id> <value>
e <vertex_id> <vertex_id> <value>
x: <graph_id> ...
gSpan: Graph-Based Substructure Pattern Mining, by X. Yan and J. Han. Proc. 2002 of Int. Conf. on Data Mining (ICDM'02).