
Efficiency

Siran Yang edited this page Jun 4, 2019 · 1 revision

Experiment environment

We test Euler's peak performance on Alibaba's search-advertising data using the standard GraphSAGE algorithm. Each node carries a few dozen sparse features and no dense features. We set the number of neighbor-aggregation layers to 1 and the number of sampled neighbors to 10. The test graph contains about 200 million nodes and 4 billion edges. TensorFlow is used as the machine learning framework.
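The sampling setup described above (one aggregation layer, 10 sampled neighbors per node) can be illustrated with a plain-Python sketch. This is not Euler's actual API; the adjacency dict, feature dict, and function names are all hypothetical, and sampling is uniform with replacement:

```python
import random

def sample_neighbors(adj, node, num_samples=10):
    """Uniformly sample a fixed number of neighbors, with replacement
    (a node with fewer than num_samples neighbors is over-sampled)."""
    neighbors = adj.get(node, [])
    if not neighbors:
        return []
    return [random.choice(neighbors) for _ in range(num_samples)]

def mean_aggregate(features, sampled):
    """One layer of mean aggregation over the sampled neighbor features."""
    if not sampled:
        return []
    dim = len(features[sampled[0]])
    agg = [0.0] * dim
    for n in sampled:
        for i, v in enumerate(features[n]):
            agg[i] += v
    return [v / len(sampled) for v in agg]

# Toy graph: node 0 has neighbors 1 and 2, each with a 2-dim feature.
adj = {0: [1, 2]}
features = {1: [1.0, 2.0], 2: [3.0, 4.0]}

sampled = sample_neighbors(adj, 0, num_samples=10)
print(len(sampled))  # 10
print(mean_aggregate(features, sampled))
```

In the real benchmark the features are sparse and the aggregation runs inside the TensorFlow graph; this sketch only mirrors the sampling fan-out and the single aggregation layer.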

To measure Euler's serving capability precisely, we adopt a heterogeneous deployment: Euler runs on separate machines, physically isolated from the TensorFlow processes.

We test Euler's serving capability with one, two, and five instances respectively. Each instance runs on a machine with an Intel(R) Xeon(R) Platinum 8163 CPU @ 2.50GHz, 96 cores, and 512 GB of memory. All TF workers run independently on the same type of hardware; we use Docker to allocate 16 CPU cores and 50 GB of memory to each worker.
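The per-worker resource limits map directly onto Docker's standard resource flags. A minimal sketch; the image name and training command are placeholders, not part of the benchmark setup:

```shell
# Launch one TF worker capped at 16 CPU cores and 50 GB of memory.
# "tf-worker-image" and "train.py" are hypothetical names.
docker run --cpus=16 --memory=50g tf-worker-image python train.py
```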

Experiment results

For each of the three deployments, we vary the number of TF workers and measure the QPS (samples trained per second; 1w = 10,000 samples per second). The results are below.

| TF workers | one machine | two machines | five machines |
| --- | --- | --- | --- |
| 100 | 112w | 112w | 112w |
| 200 | 145w | 220w | 220w |
| 300 | 148w | 285w | 298w |
| 400 | 151w | 290w | 410w |
| 500 | 152w | 300w | 505w |
| 600 | 155w | 310w | 596w |

From the above results, we can see that:

  • As Euler's machine resources increase, its peak serving capability scales linearly.
  • As the number of TF workers increases, QPS grows linearly until it reaches the limit of the hardware resources allocated to Euler.
  • Deploying Euler on 5 machines drives 600 TF workers and delivers about 600w training QPS, while the Euler deployment itself occupies only 5% of the total compute resources.
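Converting the table's "w" figures into absolute samples per second (assuming 1w = 10,000, the usual reading of this notation) makes the scaling easy to check. A minimal sketch over the numbers reported above:

```python
# QPS figures from the table, in units of "w" (assumed: 1w = 10,000 samples/sec).
qps_w = {
    1: {100: 112, 200: 145, 300: 148, 400: 151, 500: 152, 600: 155},
    2: {100: 112, 200: 220, 300: 285, 400: 290, 500: 300, 600: 310},
    5: {100: 112, 200: 220, 300: 298, 400: 410, 500: 505, 600: 596},
}

W = 10_000  # conversion factor: 1w = 10,000 samples per second

# Peak QPS per deployment, in absolute samples/sec.
peaks = {machines: max(col.values()) * W for machines, col in qps_w.items()}
print(peaks)  # {1: 1550000, 2: 3100000, 5: 5960000}

# Scaling relative to the single-machine peak.
for machines, peak in peaks.items():
    print(machines, round(peak / peaks[1], 2))
```

The two-machine deployment saturates at exactly twice the single-machine peak; the five-machine deployment reaches 3.85x with 600 workers, consistent with it not yet being saturated at that worker count.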