Manual: Running Benchmark
To benchmark a graph-processing platform with Graphalytics, go through the following steps:
1. Obtain the platform driver (platform-specific).
2. Download the benchmark resources.
3. Verify the necessary prerequisites (platform-specific).
4. Adjust the benchmark configurations (platform-specific).
5. Test the benchmark execution.
6. Execute the benchmark.
7. Examine the benchmark report.
Note that Steps 1, 3, and 4 are platform-specific: also follow the detailed instructions in the README file of the corresponding platform.
There are three possible ways to obtain a platform driver:
- Recommended: build the platform driver from source code. Find the corresponding GitHub repository on our website and build it from source; see Software Build for more details (a build sketch follows this list).
- Download a prebuilt Graphalytics platform driver. Graphalytics maintains a list of publicly available prebuilt platform driver distributions, which can be downloaded from our website.
- Develop a platform driver for a new platform. Graphalytics can easily be extended by developing platform drivers for platforms that are not yet supported; see Implementing Driver for more details.
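For the recommended build-from-source route, the overall shape of the process is sketched below. The repository URL is a placeholder and the Maven command is an assumption; the driver's own README and the Software Build page are the authoritative instructions.

```sh
# Hypothetical sketch: obtain and build a platform driver from source.
git clone <platform-driver-repository-url>   # repository listed on the Graphalytics website
cd <platform-driver-repository>
mvn clean package                            # assumes a Maven-based build; see Software Build for details
```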
To execute the benchmark, the necessary benchmark resources must be available in the cluster environment:
- Input datasets: real-world and synthetic graphs selected for the benchmark.
- Validation datasets: reference outputs cross-validated by multiple platforms.
Download the required benchmark resources from the datasets page into your cluster environment.
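As a rough sketch, fetching one dataset archive might look like the following; the URL, archive name, and target directories are placeholders, and the archive format on the datasets page may differ.

```sh
# Hypothetical example: download a graph dataset and its validation data into the cluster.
# Replace <dataset-archive-url> with a link from the Graphalytics datasets page.
wget <dataset-archive-url> -P /data/graphalytics/downloads
# Unpack into the directory that will later be configured as graphs.root-directory.
tar -xf /data/graphalytics/downloads/<dataset-archive> -C /data/graphalytics/graphs
```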
Large-scale graph-processing platforms are usually complex distributed or parallel systems, which might require various platform-specific dependencies. Follow the detailed instructions in the README file of each platform to configure the cluster environment properly.
The Graphalytics distribution includes a config-template directory containing template configuration files. Before editing any configuration files, it is recommended to create a copy of the config-template directory and name it config.
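For example, from the root of the Graphalytics distribution:

```sh
# Keep config-template/ as a pristine reference and make all edits in config/.
cp -r config-template config
```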
Select one of the three types of benchmark (test, standard, custom) by editing config/benchmark.properties (only include the benchmark type you need). More fine-grained configuration of each benchmark type can be adjusted in the corresponding benchmark properties file (config/benchmarks/*.properties).
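A minimal sketch of what this selection could look like is shown below; the key names are assumptions for illustration only, and the actual layout of your copied config/benchmark.properties (from config-template) is authoritative.

```properties
# Illustrative only -- consult your copy of config-template for the exact syntax.
# Keep only the benchmark type you need (test, standard, or custom); assumed key name:
benchmark.type = test
# Fine-grained settings per type live in config/benchmarks/*.properties,
# e.g. config/benchmarks/test.properties for the test benchmark.
```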
Large-scale graph datasets can require a large amount of storage. Place the downloaded datasets on a suitable storage device and set the data directories in config/benchmark.properties:
- graphs.root-directory: input graph datasets (dataset.v and dataset.e).
- graphs.cache-directory: formatted graph datasets (with only the essential edge properties).
- graphs.validation-directory: validation datasets (reference outputs).
- graphs.output-directory: output datasets (results of executing the algorithms).
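For example, the directory settings in config/benchmark.properties could look like the following; the property names are taken from the list above, while the paths are placeholders for your own storage locations.

```properties
# Example paths only -- point these at the storage devices holding your datasets.
graphs.root-directory = /data/graphalytics/graphs
graphs.cache-directory = /data/graphalytics/graphs/cache
graphs.validation-directory = /data/graphalytics/validation
graphs.output-directory = /data/graphalytics/output
```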
Find more details regarding the data flow during the benchmark execution in Chapter 3 Benchmark Process of the technical specification.
The performance of the system-under-test (platform + environment) can be greatly impacted by proper tuning and optimization. Follow the detailed instructions in the README file of each platform driver to fine-tune the system configuration.
Executing the benchmark can be a very time-consuming process. To verify that Step 3 (verify the necessary prerequisites) and Step 4 (adjust the benchmark configurations) have been completed properly, the benchmark suite provides an optional test benchmark, which executes the 6 core algorithms on 2 tiny graph datasets, example-directed and example-undirected. This makes it likely that configuration errors are caught before the actual benchmark starts.
After completing the benchmark configuration, compile and run the benchmark:
- If applicable, run bin/sh/compile-benchmark.sh to compile the source code. This is usually required for C++ platforms and can be omitted for Java platforms.
- Run bin/sh/run-benchmark.sh to execute the benchmark. The benchmark suite summarizes the targeted benchmark execution, submits a list of benchmark jobs to the platform, and generates the benchmark report.
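For example, from the root of the benchmark distribution, using the scripts named above:

```sh
# Compile the platform driver if needed (typically for C++ platforms), then run the benchmark.
bin/sh/compile-benchmark.sh   # skip for platforms that do not require compilation, e.g. Java
bin/sh/run-benchmark.sh       # executes the configured benchmark and generates the report
```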
Find more details regarding the benchmark process in Chapter 3 Benchmark Process of the technical specification.
After the benchmark has completed successfully, the report for each benchmark can be found in the report directory. Each report contains the following elements:
- report.htm and the html directory: the (human-readable) HTML report summarizing the benchmark results.
- the json directory: the (machine-readable) data archive of the benchmark results. To submit your benchmark results to the Global Competition, see Submitting Results for more detailed instructions.
- the archive directory (optional): performance archives of each platform run, for finer-grained analysis.