This repository is a collection of programs complementing [paper], which can be used for generating affiliation networks and calculating their properties (such as degree and size distributions, clustering coefficient, assortativity, average distance, Q-analysis metrics, etc.)
The programs are written in Unix style: they are command-line-friendly, and in most cases the output of one program (e.g. network generation) can be piped into another program (e.g. computing the degree distribution).
-
Hypergraphs (HG), where every affiliation is a hyperedge (i.e. a non-empty set of nodes). Example in the plain-text format we use:
%hg {1, 2, 3} {3, 4} {3, 4, 5}
-
Abstract Simplicial Complexes (SC), where every affiliation is a facet (i.e. maximal face of the complex). This in practice means that no affiliation is a subset of another affiliation. Therefore, the hypergraph example above is not a valid simplicial complex networks, b.c.
{3 ,4}
is a subset of{3, 4, 5}
, and so the corresponding simplicial complex network will be instead represented as:%sc {1, 2, 3} {3, 4, 5}
-
(GNU)
make
,gcc
-
ocaml
compiler -
ocamlfind
(library manager for OCaml).
Both ocaml
and ocamlfind
can be installed either with the OCaml package manager OPAM, or using your system package manager.
See more info on installing OCaml here: https://ocaml.org/docs/up-and-running.
Usage:
./growth OPTIONS ./growth OPTIONS initial-network-filename
Examples:
$ ./growth --type=sc --model=acl:2:0.5,1,1 --any --n=100
$ cat init.sc
%sc
{ 1 2 3 }
{ 2 3 4 }
$ ./growth --type=sc --model=acl:2:0.5,1,1 --any --n=6 init.sc
%sc
{ 1 2 3 6 }
{ 2 3 4 }
{ 5 }
Mandatory options:
--type=[sc|hg] Simplicial complex or hypergraph
--model=acl:[1|2|3]:{num},{num},{num}
acdistl:[1|2|3]:{num},{filename},{num}
sizedist:{filename}
aclsizedist:[1|2|3]:{num},{num},{num}:{filename}
--any OR
--best=metric Grow randomly or maximize a metric
--n=num OR
--lcc=num OR
--i=num Termination condition. Grow until reached:
n: nodes
lcc: largest connected component size (# of nodes)
i: number of growth operations
Non-mandatory options:
--seed=num Random seed
--safe "Safe" reading the initial network file.
In the SC case, automatically subsumes facets
that are subsets of other facets.
In the HG case, it does nothing.
Recommended / most common usage:
./growth --type={type} --model=acl:2:{alpha}:{c}:{l} --any --n={N}
where replace:
-
{type}
with eitherhg
orsc
-
{alpha}
,{c}
, and{l}
with the corresponding model parameters 0 < α < 1, positive integers c and l -
{N}
with the desired number of nodes in the produced network (alternatively, consider using--i
).
Advanced usage:
-
--model=acl:1:{alpha}:{c}:{l}
and -
--model=acl:3:{alpha}:{c}:{l}
(note:1:
and:3:
instead of:2:
) are two alternative variants of the model (See Section [FIXME] in [paper]) -
--model=acdistl:[1|2|3]:{alpha},{c-dist-file},{l}
is the same as--model=acl
, but instead of using a constant value for c, every growth operation samples a random value from the supplied distribution stored as a file like:
5 0.25
6 0.5
7 0.125
8 0.125
-
--model=sizedist:{filename}
[FIXME] -
--model=aclsizedist:[1|2|3]:{alpha}:{c}:{l}
[FIXME] -
--best={metric}
(instead of--any
), each growth operation is not chosen at random, but attempts to increase certain "metric". Specifically, the algorithm samples three growth operations and branches the network into three corresponding future states. Then it chooses the one future state that maximizes the desired metric. Replace{metric}
with the integer number corresponding to the metric (see source code filemetric.ml
for details), some good options:
101 Maximum facet size
102 Average facet size
103 Maximum facet degree
104 Average facet degree
105 Maximum edge degree
106 Average edge degree
107 Number of nodes
108 Number of facets
109 Number of connected components
Reads a SC or a HG network from STDIN and outputs its degree distribution into STDOUT.
Usage:
cat filename | ./fd_dist
cat filename | ./fd_dist --safe
./fd_dist < filename
./fd_dist --safe < filename
Examples:
echo -e '%sc\n {1,2,3} {3,4}' | ./fd_dist
1 0.75
2 0.25
(i.e. 75% of the nodes have degree 1 and 25% have degree 2)
$ ./growth --type=sc --model=acl:2:0.5,1,1 --any --n=100 | ./fd_dist
1 0.73
2 0.18
3 0.04
4 0.02
5 0.01
9 0.01
12 0.01
Reads a SC or a HG network from STDIN and outputs its affiliation size distribution into STDOUT.
Usage:
cat filename | ./fs_dist
cat filename | ./fs_dist --safe
./fs_dist < filename
./fs_dist --safe < filename
Examples:
$ echo -e '%sc\n {1,2,3} {3,4}' | ./fs_dist
2 0.5
3 0.5
(i.e. 50% of the affiliations have size 2 and 50% have size 3.)
$ ./growth --type=sc --model=acl:2:0.5,1,1 --any --n=100 | ./fs_dist
1 0.0945946
2 0.364865
3 0.405405
4 0.121622
5 0.0135135
Reads a SC or a HG network from STDIN, computes the degree distribution of the skeleton graph of the network and outputs it into STDOUT. (One can see it as a reduction of the network to the corresponding simple graph, and then computing its degree distribution.)
Example:
$ echo -e '%sc\n {1,2,3} {2,3,4} {4,5}' | ./ed_dist
1 0.2
2 0.2
3 0.6
Explanation of the above example:
The skeleton graph:
2
/ | \
1 | 4 - 5
\ | /
3
Nodes of degree=1: 5 (i.e. 1/5 = 20%)
Nodes of degree=2: 1 (i.e. 1/5 = 20%)
Nodes of degree=3: 2, 3, 4 (i.e. 3/5 = 60%)