This repository contains the code necessary to reproduce the results for the Trillion Entity demonstration that was part of the NODES 2021 Keynote presentation. It contains the store generation code we used, the orchestration scripts for the AWS instances that are needed to run the setup, the queries we executed, and the client that performs the latency measurements. Please read this README in its entirety before proceeding, to make sure you have an understanding of the necessary steps.
Blog post with more behind-the-scenes information: Behind the Scenes of Creating the World’s Biggest Graph Database.
The NODES 2021 Keynote recording showing the Trillion Graph Demo live:
A Twitter thread summary of the demo:
What you'll need:
- An AWS account with sufficient capacity for the number and type of EC2 instances you'll create, including access to S3. AWS is the default provider this application uses; it should be possible to modify it to use the cloud provider of your choice.
- Access to Neo4j Enterprise. Fabric is a Neo4j Enterprise feature, which is distributed under a different license. Neo4j Enterprise needs to be properly installed in your local Maven repository; you can find detailed instructions in the Neo4j documentation.
The directory structure is as follows:
- cypher: contains the individual Cypher queries that were used in the demo
- server: contains the data generation code and the instance orchestration
- client: contains the client for the latency measurements
- guide: contains a Neo4j Browser guide which explains the LDBC schema and queries
Here we'll describe the basic steps you'll need to take. Detailed instructions are provided further down.
The code provided should be straightforward to understand. Take some time to familiarize yourself with it, since you'll need to provide information specific to your environment. The two main files to look at are FabricDataGenerator and AmazonController, both of which you can find under the server directory. The first creates the stores both locally and remotely, and the second orchestrates the AWS Neo4j instances. They are structured as scripts, so you can modify them as you like. You will need to edit the code to execute the various steps and configure the setup to your requirements.
You should first create the Person and Template databases. The former is the full Person shard and the latter is the basis for the Forum shards. Typically, you will create these two locally, upload them to S3, and then orchestrate EC2 instances with the AmazonController to generate the Forum shards en masse. Of course, with minimal changes, you can do everything locally, in one step, and then move the databases to the Fabric shards however you prefer.
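To keep the order of the typical workflow in view while editing the scripts, the steps can be written down as a small checklist. The sketch below is purely illustrative: the class StoreGenerationPlan, its Step and Location enums, and the describe() method are hypothetical names that do not exist in this repository, and the code does not call FabricDataGenerator or AmazonController. It only encodes the sequence described above and where each step runs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the repository: it only lists the
// store generation steps from the README in the order they are performed.
public class StoreGenerationPlan {

    // Where a step is executed in the typical (non-local-only) workflow.
    public enum Location { LOCAL, EC2 }

    public enum Step {
        CREATE_PERSON_STORE("Create the full Person shard", Location.LOCAL),
        CREATE_TEMPLATE_STORE("Create the Template store, the basis for the Forum shards", Location.LOCAL),
        UPLOAD_TO_S3("Upload the Person and Template stores to S3", Location.LOCAL),
        GENERATE_FORUM_SHARDS("Generate the Forum shards from the Template store", Location.EC2);

        final String description;
        final Location location;

        Step(String description, Location location) {
            this.description = description;
            this.location = location;
        }
    }

    // Returns one numbered line per step, in execution order.
    public static List<String> describe() {
        List<String> lines = new ArrayList<>();
        for (Step step : Step.values()) {
            lines.add((step.ordinal() + 1) + ". [" + step.location + "] " + step.description);
        }
        return lines;
    }

    public static void main(String[] args) {
        describe().forEach(System.out::println);
    }
}
```

If you choose to do everything locally instead, the last step would simply run on your own machine and the upload step could be skipped.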
The AmazonController class can be used to install and configure Neo4j and the shards. You will need to modify the code to execute the appropriate commands for your setup, but the basic AWS orchestration steps will be the same as for the store generation.
The last step is to locally build and run the UI for the demo. With that, you'll be able to take latency measurements and explore the schema you built.