-
Notifications
You must be signed in to change notification settings - Fork 11
/
README
436 lines (356 loc) · 19 KB
/
README
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
***************
Vite (ˈviːte/)
***************
*******
-------
ABOUT
-------
*******
`Vite` is an MPI+OpenMP implementation of Louvain method for (undirected)
graph clustering or community detection, that supports a number of heuristics
to improve the scalability. Vite can use both real-world and synthetic graph
(uses an in-memory random geometric graph generator).
Please refer to the following paper that discusses the
distributed implementation and associated heuristics:
https://ieeexplore.ieee.org/abstract/document/8425242/
We recently added an option to improve the distribution to
be more edge-balanced rather than standard vertex-based ("-b"),
which can avoid communication and improve execution time.
Relevant discussion in the following paper:
https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=8916299
If quality is important for you, please consider the coloring
options. Coloring will increase the overall execution time and
communication, but it is possible to combine coloring with
other heuristics to improve the performance. Relevant discussion
in the following paper:
https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=8547534
Vite can be downloaded from (please don't use the past
PNNL/PNL link to download Vite mentioned in the paper, the
current GitHub link is the correct one):
https://github.com/Exa-Graph/vite
This code requires an MPI library (preferably MPI-3 compatible)
and C++11 compliant compiler for building.
Please contact the following for any queries or support:
Sayan Ghosh, PNNL (sg0 at pnnl dot gov)
Mahantesh Halappanavar, PNNL (hala at pnnl dot gov)
Antonino Tumeo, PNNL (antonino.tumeo at pnnl dot gov)
Please '*' this repository on GitHub if the code is useful to you.
*************
-------------
COMPILATION
-------------
*************
Pass -DDEBUG_PRINTF if detailed diagonostics is required along
program run. This program requires OpenMP and C++11 support,
so pass -fopenmp (for g++)/-qopenmp (for icpc) and -std=c++11/
-std=c++0x. For Cray systems, use CC.
Pass -DUSE_32_BIT_GRAPH if number of nodes in the graph are
within 32-bit range (2 x 10^9), else 64-bit range is assumed.
Pass -DDONT_CREATE_DIAG_FILES if you dont want to create 2 files
per process with detail diagonostics.
Upon building, the program will generate two binaries:
bin/graphClustering (parallel) and bin/fileConvert (serial).
Please use bin/fileConvert for input graph conversion from
native formats to a binary format that bin/graphClustering will
be able to read.
Compiling on Intel KNL:
We made some modifications to the code to port it to Cray XC systems
with Intel KNL. We use lib-memkind to allocate some heavily used data
structures to KNL MCDRAM (for usage in flat-mode).
Please ensure lib-memkind module is loaded and pass
-DUSE_AUTOHBW_MEMALLOC, and pass -xmic-AVX512 instead of -xHost in
the Makefile.
Experimental device offload:
The modularity computation is the only arithmetic logic in this code.
Even though, due to the limited computation associated with the modularity
computation, it is doubtful how much GPU can help in improving the
performance relative to using OpenMP host pragmas. However, we have added
an option to enable OpenMP offload only for the modularity computation
kernel. Vite must be built with -DOMP_TARGET_OFFLOAD and a compiler that
supports OpenMP 4.5. The default compiler options in the Makefile needs to
be updated as well, for instance, on Summit, for enabling OpenMP offload
in the XL compiler, one would need to pass: "-qsmp=omp -qoffload" currently.
Initial testing suggests that for certain graphs, it is possible to achieve
about 10-15% improvement in performance relative to host OpenMP.
************************
------------------------
INPUT GRAPH CONVERSION
------------------------
************************
Convert input files to binary from various formats.
Possible options (can be combined):
1. -n : Input graph in SNAP format
2. -m : Input graph in Matrix-market format
3. -e : Input graph in METIS format
4. -p : Input graph in Pajek format
5. -d : Input graph in DIMACS format. Pass 0 or 1
to indicate undirected/directed graph
6. -s : Input graph is directed edge list
7. -u : Input graph is undirected edge list
(Can be used for Graph Challenge official
datasets - http://graphchallenge.mit.edu/)
8. -x : Read a number of files with edge list
information, usage e.g.:
-x "<file-path> <start-chunk> <end-chunk>"
Requires conformant file names.
9. -i : Accept weights as is from the file. If this
option is not passed, then by default the
absolute value of the original weight is
chosen.
10. -r : Create random edge weights
11. -w : Initialize edge weights to 1.0
12. -o : Output binary file with full path
13. -z : If the index of input graph is 1-based,
then this option makes it 0-based
14. -f : Option preceding input graph file
------------------------
File conversion related
------------------------
Typical example:
bin/./fileConvert -n -f karate.txt -o karate.bin
If your input file does not have edge weights, and you
want random edge weights, then pass option "-r".
bin/./fileConvert -m -r -f karate.mtx -o karate.bin
Note: DIMACS file could be directed or undirected, so
pass 0 if input is directed, or a number > 0 input
graph is undirected. Also, we expect DIMACS inputs to
have weights, therefore passing -r has no effect.
bin/./fileConvert -d 0 -f sample.gr -o sample.bin
Same as the one before, except initializes all weights
to 1, no matter what is in the input file.
bin/./fileConvert -d 0 -w -f sample.gr -o sample.bin
Note: If the input file is index 1-based, then indicate
that by passing -z, such that it will be converted to
0-based internally.
bin/./fileConvert -s -z -f network.dat -o lfrtest.bin
Note: We consider files containing edge list, as 'simple'
formatted files (option "-s" for directed, and option "-u"
for undirected). If you have a weighted simple format file
(i.e., is `u v w` format) then you have to additionally
pass "-i".
If there are weights in such an edge-list file and for some
reason you do not want to consider them, pass "-w" or "-r"
in addition to "-s" or "-u" such that we can assign one-weights
or random-weights internally.
By default, we assume edge list files to not have any weights
(as in, just `u v` per line).
bin/./fileConvert -f uk2007-edges.txt -s -w -o uk2007.bin
Note on `-s` and `-u` options
-----------------------------
The `-s` or `-u` option is just to tell the implementation about
your native graph format (as in if you pass `-s`, we assume in
your edge list just `u v` exists and not its complement; whereas
when you pass `-u`, we will assume that both `u v` and `v u`
exists).
Regardless of the native format, internally, we always use an
undirected graph representation, because the algorithm itself
assumes an undirected input, and accordingly prints the number of
edges as double if your native graph was directed (on the other
hand, if your native graph was undirected, and for e.g., you used
`-u` to convert to binary, the #edges printed by the implementation
should exactly match with the #edges of your native graph).
Apart from the simple edge list formats and matrix-market formats,
we do not actively check the correctness of the other formats.
****************************
----------------------------
SYNTHETIC GRAPH GENERATION
----------------------------
****************************
RGG
---
Apart from real world graphs, users can use specific options
to generate a Random Geometric Graph (RGG) in parallel.
RGGs have been known to have good community structure:
https://arxiv.org/pdf/1604.03993.pdf
The way we have implemented a parallel RGG generator, vertices
owned by a process will only have cross edges with its logical
neighboring processes (each process owning 1x1/p chunk of the
1x1 unit square). If MPI process mapping is such that consecutive
processes (for e.g., p and p+1) are physically close to each other,
then there is not much communication stress in the application.
Therefore, we allow an option to add extra edges between randomly
chosen vertices, whose owners may be physically far apart. Relevant
discussion in paper:
https://ieeexplore.ieee.org/abstract/document/8641631
We require the total number of processes to be a power of 2 and
total number of vertices to be perfectly divisible by the number of
processes when parallel RGG generation options are used.
An n-D random geometric graph (RGG), is generated by randomly placing N
vertices in an n-D space and connecting pairs of vertices whose Euclidean
distance is less than or equal to d. We only consider 2D RGGs contained
within a unit square, [0,1]^2. We distribute the domain such that each
process receives N/p vertices (where p is the total number of processes).
Each process owns (1 * 1/p) portion of the unit square and d is computed
as (please refer to Section 4 of above paper for details):
d = (dc + dt)/2;
where, dc = sqrt(ln(N) / pi*N); dt = sqrt(2.0736 / pi*N)
Therefore, the number of vertices (N) passed during miniVite execution on p
processes must satisfy the condition -- 1/p > d.
Please note, the default distribution of graph generated from the in-built random
geometric graph generator causes a process to only communicate with its two
immediate neighbors. If you want to increase the communication intensity for
generated graphs, please use the "-e" option to specify an extra percentage of edges
that will be generated, linking random vertices. As a side-effect, this option
significantly increases the time required to generate the graph.
E.g.:
mpiexec -n 2 bin/./minivite -f karate.bin
mpiexec -n 2 bin/./minivite -l -n 100
mpiexec -n 2 bin/./minivite -n 100
mpiexec -n 2 bin/./minivite -e 2 -n 100
NetworKit bindings
------------------
We also have bindings to NetworKit[*], that can help generate random and scale
free graphs, but this option is actively supported and may be purged from future
Vite versions. This is mostly serial, but NetworKit internally may use OpenMP.
Note: We have only tested with NetworKit GitHub main (circa Oct 2023), and have no
plans on supporting other NetworKit versions (we will most probably remove this option
from future releases).
[*] https://networkit.github.io/
Networkit can be built from GitHub:
git clone --recursive https://github.com/networkit/networkit.git
Possible options (can be combined):
1. -e : Generate Erdos-Renyi random (ER) graph
2. -c : Generate Clustered Random Graph (CRG)
3. -b : Generate Barabasi-Albert graph
4. -h : Generate hyperbolic graph
5. -p <...> : Probability for edge creation (for ER graph)
6. -n <...> : Number of (max) nodes/vertices
7. -k <...> : Number of clusters for CRG (each unit square is
divided into clusters where edges are created
within) OR attachments per
node for Barabasi-Albert graph
8. -m <...> : Initial nodes attached for Barabasi-Albert
9. -r : Create random edge weights
***********************
-----------------------
EXECUTING THE PROGRAM
-----------------------
***********************
0. Input file conversion to Vite binary format from the respective native
file formats: convert the input file to binary format using fileConvert.
1. Set the desired number of processes and threads.
2. Run distributed parallel Louvain algorithm (graphClustering binary)
using the binary file created in Step #0:
E.g., pass a binary file using "-f" option and include other heuristics:
mpiexec -n 2 bin/./graphClustering -i -f karate.bin
E.g., generate a random geometric graph by passing #vertices=64:
mpiexec -n 2 bin/./graphClustering -n 64
E.g., generate a random geometric graph by passing #vertices=64
and store the resultant binary graph:
mpiexec -n 2 bin/./graphClustering -n 64 -s rgg64.bin
E.g., read a previously generated random geometric binary graph:
mpiexec -n 2 bin/./graphClustering -f rgg64.bin
Possible options (can be combined):
1. -f <input> : Input real-world binary file (read #1 above).
2. -c "<ncolors> <single-color-iteration?>" :
Enable coloring, adjacent vertices are processed
in different order. Synchronization happens once
per <ncolors>, so this option can significantly
increase execution time.
We use incomplete iterative
coloring algorithm (Jones-Plassmann), so passed colors
is just a hint, also pass whether a single-color-iteration
is expected, by default multiple iterations are invoked.
3. -d "<ncolors> <single-color-iteration?>" :
Enable coloring, adjacent vertices are processes
in different order. No communication overhead like
above, only local overhead of coloring subgraph.
4. -i : Threshold cycling, threshold changes dynamically
in every phase of Louvain algorithm.
5. -t 1 : If a vertex is in the same community for past 3
iterations, then consider it inactive.
6. -t 3 : If a vertex is in the same community for past 3
iterations, then consider it inactive. Also,
adds a communication step to gather inactive
vertices and terminate Louvain if >= 90% vertices
at an iteration are inactive.
7. -t 2 -a <0-1> : Early termination* using probability alpha.
8. -t 4 -a <0-1> : Early termination* using probability alpha. Also,
adds a communication step to gather inactive
vertices and terminate Louvain if >= 90% vertices
at an iteration are inactive.
9. -b : Only valid for real-world inputs. Attempts to
distribute approximately equal number of edges among
processes. Irregular number of vertices owned by a
particular process. Increases the distributed graph
creation time due to serial overheads, but may
significantly improve the overall execution time.
10. -o : Output communities into a file. This option will result
in Vite dumping the communities (community-per-vertex in
each line, total number of lines == number of vertices)
in a text file named <input-binary-file>.communities in
the same path as the input binary file.
11. -r <nranks> : This is used to control the number of aggregators in MPI
I/O and is meaningful when an input binary graph file is
passed with option "-f".
naggr := (nranks > 1) ? (nprocs/nranks) : nranks;
12. -g <gfile> : Pass a ground truth file for community comparison. We
expect the ground truth file to contain N lines (equal to
the total #vertices in the graph), while each line containing
a distinct vertex ID and associated community ID, separated by
a space or tab. Ground truth community comparison is performed
in a single node, and it uses OpenMP to parallelize. It may take
a substantial amount of time for large files.
13. -z : Only applicable if "-g <gfile>" option is passed. This tells us
that the passed ground truth file is 1-based. If this option is
not passed, we assume the ground truth to the 0-based.
14. -n <|V|> : Generate graph in memory (uses a parallel Random Geometric Graph
generator).
15. -e <%> : Used in conjunction with the "-n <|V|>" option to generate RGG.
This option tells the percentage of edges to be added, randomly
connecting vertices across processes. Currently, the maximum number
of randomly added edges to RGG cannot exceed INT_MAX (there is no
check to determine this, so exercise caution).
16. -p : Run a single phase of Louvain.
17. -s <output> : Only applicable to the generated graphs (for e.g., -n <|V|>),
output is a binary file as per the output file path.
18: -j : Just process (read or generate) the graph and exit without running
community detection.
Coloring:
mpiexec -n 2 bin/graphClustering -c "4" -f karate.bin
mpiexec -n 2 bin/graphClustering -c "8 false" -i -f karate.bin
By default, the coloring method is run for multiple iterations, to make it run for
just a single iteration, pass "true". For e.g.:
mpiexec -n 2 bin/graphClustering -c "4 true" -f karate.bin
*Note: Option to update inactive vertices percentage is defined as
macro ET_CUTOFF in louvain.hpp, and the default is 2%.
**********************************************
----------------------------------------------
COMPARING COMMUNITIES WITH GROUND TRUTH DATA
----------------------------------------------
**********************************************
1. Generate benchmark graphs with ground truth
community information. For e.g., we have tested
with LFR benchmark.
(https://sites.google.com/site/santofortunato/inthepress2)
LFR generates an edge list (index 1-based), along with community
membership of each vertex.
2. Convert the generated network to binary using
fileConvert binary.
3. Execute the program as shown in the e.g. below:
mpiexec -n 2 bin/./graphClustering -r 1 -f lfrtest.bin -z -g community.dat
The difference between standard way to execute the program and this way
is that one needs to pass the ground truth vertex-community mapping
using the "-g" option. Also, the "-z" option tells the program that
the input ground truth file -- `community.dat` is 1-based. If "-z" is
not passed, then we assume that the file is 0-based.
We return some statistics such as F-score, Gini-coefficient,
false positives/negatives and true positives. We expect that the isolated
vertices will be placed into their respective communities, and not into
a single one.
However, extra communication per phase is required when these metrics
are to be computed, so the program execution time will increase
significantly. Plus, the community comparison part is currently
multithreaded (uses OpenMP), so everything is brought to a single
node/PE and then communities are compared.
***************
---------------
OUTPUT RESULT
---------------
***************
If -DDONT_CREATE_DIAG_FILES is passed during compilation,
then output is send to stdout.
Otherwise, the output result is dumped per process on files
named as dat.out.<process-id>. Check dat.out.0 to review
program diagonostics.
Output files are cleared with: `make clean`.