
Efficiency

Siran Yang edited this page Jun 4, 2019 · 1 revision

Experiment environment

We test Euler's peak performance on Alibaba's search-advertising data using the standard GraphSAGE algorithm. Each node carries a few dozen sparse features and no dense features. We set the number of neighbor-aggregation layers to 1 and the number of sampled neighbors to 10. The test graph contains about 200 million nodes and 4 billion edges. TensorFlow is used as the machine learning framework.
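The sampling setup described above (one aggregation layer, 10 sampled neighbors per node) can be illustrated with a plain-Python sketch. This is not Euler's actual API; the adjacency dict, feature dict, and function names are all hypothetical, and sampling is uniform with replacement:

```python
import random

def sample_neighbors(adj, node, num_samples=10):
    """Uniformly sample a fixed number of neighbors, with replacement
    (a node with fewer than num_samples neighbors is over-sampled)."""
    neighbors = adj.get(node, [])
    if not neighbors:
        return []
    return [random.choice(neighbors) for _ in range(num_samples)]

def mean_aggregate(features, sampled):
    """One layer of mean aggregation over the sampled neighbor features."""
    if not sampled:
        return []
    dim = len(features[sampled[0]])
    agg = [0.0] * dim
    for n in sampled:
        for i, v in enumerate(features[n]):
            agg[i] += v
    return [v / len(sampled) for v in agg]

# Toy graph: node 0 has neighbors 1 and 2, each with a 2-dim feature.
adj = {0: [1, 2]}
features = {1: [1.0, 2.0], 2: [3.0, 4.0]}

sampled = sample_neighbors(adj, 0, num_samples=10)
print(len(sampled))  # 10
print(mean_aggregate(features, sampled))
```

In the real benchmark the features are sparse and the aggregation runs inside the TensorFlow graph; this sketch only mirrors the sampling fan-out and the single aggregation layer.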

To measure Euler's serving capability precisely, we adopt a heterogeneous deployment: Euler runs on separate machines, physically isolated from the TensorFlow processes.

We test Euler's serving capability with one, two, and five instances respectively. Each instance runs on a machine with an Intel(R) Xeon(R) Platinum 8163 CPU @ 2.50GHz, 96 cores, and 512 GB of memory. All TF workers run independently on the same type of hardware; we use Docker to allocate 16 CPU cores and 50 GB of memory to each worker.
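The per-worker resource limits map directly onto Docker's standard resource flags. A minimal sketch; the image name and training command are placeholders, not part of the benchmark setup:

```shell
# Launch one TF worker capped at 16 CPU cores and 50 GB of memory.
# "tf-worker-image" and "train.py" are hypothetical names.
docker run --cpus=16 --memory=50g tf-worker-image python train.py
```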

Experiment results

For each of the three deployments, we vary the number of TF workers and measure the QPS (samples trained per second; 1w = 10,000 samples per second). The results are below.

| TF workers | one machine | two machines | five machines |
| --- | --- | --- | --- |
| 100 | 112w | 112w | 112w |
| 200 | 145w | 220w | 220w |
| 300 | 148w | 285w | 298w |
| 400 | 151w | 290w | 410w |
| 500 | 152w | 300w | 505w |
| 600 | 155w | 310w | 596w |

From the above results, we can see that:

  • As Euler's machine resources increase, its peak serving capability scales linearly.
  • As the number of TF workers increases, QPS grows linearly until it reaches the limit of the hardware resources allocated to Euler.
  • Deploying Euler on 5 machines drives 600 TF workers and delivers about 600w training QPS, while the Euler deployment itself occupies only 5% of the total compute resources.
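Converting the table's "w" figures into absolute samples per second (assuming 1w = 10,000, the usual reading of this notation) makes the scaling easy to check. A minimal sketch over the numbers reported above:

```python
# QPS figures from the table, in units of "w" (assumed: 1w = 10,000 samples/sec).
qps_w = {
    1: {100: 112, 200: 145, 300: 148, 400: 151, 500: 152, 600: 155},
    2: {100: 112, 200: 220, 300: 285, 400: 290, 500: 300, 600: 310},
    5: {100: 112, 200: 220, 300: 298, 400: 410, 500: 505, 600: 596},
}

W = 10_000  # conversion factor: 1w = 10,000 samples per second

# Peak QPS per deployment, in absolute samples/sec.
peaks = {machines: max(col.values()) * W for machines, col in qps_w.items()}
print(peaks)  # {1: 1550000, 2: 3100000, 5: 5960000}

# Scaling relative to the single-machine peak.
for machines, peak in peaks.items():
    print(machines, round(peak / peaks[1], 2))
```

The two-machine deployment saturates at exactly twice the single-machine peak; the five-machine deployment reaches 3.85x with 600 workers, consistent with it not yet being saturated at that worker count.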