[Feature Request] Add GPU Query acceleration #4678
Labels
area/performance
kind/feature
Experience Report
When working with large datasets, query and analytics performance quickly becomes a pain point.
What you wanted to do
On another graph database, query performance bogged down quite early. On Dgraph, performance was roughly 10x better, but still not great, mainly because of CPU-bound operations.
What you actually did
Data sampling. I sampled a smaller subset of the data, developed my queries and data pre-processing on it, and only once that pipeline was finalized did I run it on the full dataset.
Why that wasn't great, with examples
Data sampling comes with multiple issues:
What would be a truly great way to solve this?
GPU-accelerated queries and analytic functions, which can be up to 1000x faster than CPU-based queries.
BrytlytDB processes 1.1 billion rows (500 GB) in between 0.005(!) and 0.188 seconds on a 5-node IBM cluster equipped with 20 Nvidia P100 GPUs.
MapD (now OmniSciDB) completes the same task slightly more slowly, but still in under 1 second, using 8 Pascal Titan X cards.
Both are roughly 250x faster than Postgres, which shows that GPU-accelerated queries deliver very real performance gains. To the best of my knowledge, this is enough of a speedup to run complex queries and analytic tasks on a full dataset and a complete graph.
However, Blazegraph, one of the very few available GPU-accelerated graph databases, was acquired by Amazon and is said to be the foundation for Amazon Neptune.
For graphs, a multi-GPU solution is about 700-1800x faster than a CPU on analytics and, on average, 156x faster than the non-GPU version on selected queries. This suggests that an existing graph database could see a speedup on the order of 150x just by adding a GPU. With the reasonably priced T4 GPU offered by all leading cloud providers, affordable hardware is readily available. Considering the expected performance bump from the upcoming Ampere / GeForce 30 series, even consumer GPUs should deliver substantial acceleration.
Performance gains of this magnitude would be an excellent addition to the Enterprise edition.
Any external references to support your case
Benchmarks:
https://tech.marksblogg.com/benchmarks.html
MapD / OmniSciDB
https://github.com/omnisci/omniscidb
https://tech.marksblogg.com/billion-nyc-taxi-rides-nvidia-pascal-titan-x-mapd.html
BrytlytDB
https://www.brytlyt.com/
Graphs on GPUs
https://devblogs.nvidia.com/gpus-graph-predictive-analytics/
Blazegraph (GPU-accelerated graph DB)
https://blazegraph.com/