From 1068ac58488062faee91fb19d8504dc5207986be Mon Sep 17 00:00:00 2001
From: Juan Cubeddu
Date: Mon, 31 Jul 2023 19:54:28 -0500
Subject: [PATCH] updates to graph_notebook_config for Amazon Neptune Proxy Connection (#504)

* updates to graph_notebook_config for Amazon Neptune Proxy Connection
* no more markdown warnings on README
* Typo
* Update README.md
---
 README.md | 108 ++++++++++++++++++++++++++++++++++++++----------------
 1 file changed, 76 insertions(+), 32 deletions(-)

diff --git a/README.md b/README.md
index b9695c1c..4418d613 100644
--- a/README.md
+++ b/README.md
@@ -1,19 +1,18 @@
-## Graph Notebook: easily query and visualize graphs
+# Graph Notebook: easily query and visualize graphs
 The graph notebook provides an easy way to interact with graph databases using Jupyter notebooks. Using this open-source Python package, you can connect to any graph database that supports the [Apache TinkerPop](https://tinkerpop.apache.org/), [openCypher](https://github.com/opencypher/openCypher) or the [RDF SPARQL](https://www.w3.org/TR/rdf-sparql-query/) graph models. These databases could be running locally on your desktop or in the cloud. Graph databases can be used to explore a variety of use cases including [knowledge graphs](https://aws.amazon.com/neptune/knowledge-graphs-on-aws/) and [identity graphs](https://aws.amazon.com/neptune/identity-graphs-on-aws/).
 ![A colorful graph picture](./images/ColorfulGraph.png)
-
-### Visualizing Gremlin queries:
+## Visualizing Gremlin queries
 ![Gremlin query and graph](./images/GremlinQueryGraph.png)
-### Visualizing openCypher queries
+## Visualizing openCypher queries
 ![openCypher query and graph](./images/OCQueryGraph.png)
-### Visualizing SPARQL queries:
+## Visualizing SPARQL queries
 ![SPARL query and graph](./images/SPARQLQueryGraph.png)
@@ -30,7 +29,8 @@ We encourage others to contribute configurations they find useful.
There is an [
 ## Features
-#### Notebook cell 'magic' extensions in the IPython 3 kernel
+### Notebook cell 'magic' extensions in the IPython 3 kernel
+
 `%%sparql` - Executes a SPARQL query against your configured database endpoint. [Documentation](https://docs.aws.amazon.com/neptune/latest/userguide/notebooks-magics.html#notebooks-cell-magics-sparql)
 `%%gremlin` - Executes a Gremlin query against your database using web sockets. The results are similar to those a Gremlin console would return. [Documentation](https://docs.aws.amazon.com/neptune/latest/userguide/notebooks-magics.html#notebooks-cell-magics-gremlin)
@@ -48,6 +48,7 @@ We encourage others to contribute configurations they find useful. There is an [
 **TIP** :point_right: There is syntax highlighting for language query magic cells to help you structure your queries more easily.
 #### Notebook line 'magic' extensions in the IPython 3 kernel
+
 `%gremlin_status` - Obtain the status of Gremlin queries. [Documentation](https://docs.aws.amazon.com/neptune/latest/userguide/gremlin-api-status.html)
 `%sparql_status` - Obtain the status of SPARQL queries. [Documentation](https://docs.aws.amazon.com/neptune/latest/userguide/sparql-api-status.html)
@@ -84,9 +85,11 @@ We encourage others to contribute configurations they find useful. There is an [
 **TIP** :point_right: Many of the magic commands support a `--help` option in order to provide additional information.
 ## Example notebooks
+
 This project includes many example Jupyter notebooks. It is recommended to explore them. All of the commands and features supported by `graph-notebook` are explained in detail with examples within the sample notebooks. You can find them [here](./src/graph_notebook/notebooks/). As this project has evolved, many new features have been added.
If you are already familiar with graph-notebook but want a quick summary of new features added, a good place to start is the Air-Routes notebooks in the [02-Visualization](./src/graph_notebook/notebooks/02-Visualization) folder.
 ## Keeping track of new features
+
 It is recommended to check the [ChangeLog.md](ChangeLog.md) file periodically to keep up to date as new features are added.
 ## Prerequisites
@@ -95,22 +98,22 @@ You will need:
 * [Python](https://www.python.org/downloads/) 3.7.x-3.10.11
 * A graph database that provides one or more of:
- * A SPARQL 1.1 endpoint
- * An Apache TinkerPop Gremlin Server compatible endpoint
- * An endpoint compatible with openCypher
+ * A SPARQL 1.1 endpoint
+ * An Apache TinkerPop Gremlin Server compatible endpoint
+ * An endpoint compatible with openCypher
 ## Installation
 Begin by installing `graph-notebook` and its prerequisites, then follow the remaining instructions for either Jupyter Classic Notebook or JupyterLab.
-```
+``` bash
 # install the package
 pip install graph-notebook
 ```
 ### Jupyter Classic Notebook
-```
+``` bash
 # Enable the visualization widget
 jupyter nbextension enable --py --sys-prefix graph_notebook.widgets
@@ -131,7 +134,7 @@ python -m graph_notebook.start_notebook --notebooks-dir ~/notebook/destination/d
 ### JupyterLab 3.x
-```
+``` bash
 # install jupyterlab
 pip install "jupyterlab>=3,<4"
@@ -145,23 +148,26 @@ python -m graph_notebook.start_jupyterlab --jupyter-dir ~/notebook/destination/d
 #### Loading magic extensions in JupyterLab
 When attempting to run a line/cell magic on a new notebook in JupyterLab, you may encounter the error:
-```
+
+``` bash
 UsageError: Cell magic `%%graph_notebook_config` not found.
 ```
-To fix this, run the following command, then restart JupyterLab.
-```
+To fix this, run the following command, then restart JupyterLab.
+
+``` bash
 python -m graph_notebook.ipython_profile.configure_ipython_profile
 ```
 Alternatively, the magic extensions can be manually reloaded for a single notebook by running the following command in any empty cell.
-```
+
+``` python
 %load_ext graph_notebook.magics
 ```
 ## Upgrading an existing installation
-```
+``` bash
 # upgrade graph-notebook
 pip install graph-notebook --upgrade
 ```
@@ -170,11 +176,28 @@ After the above command completes, rerun the commands given at [Jupyter Classic
 ## Connecting to a graph database
+Configuration options can be set using the `%%graph_notebook_config` cell magic. The magic takes a JSON object as the cell body, and that object can contain any of the configuration options listed below. The magic can be run multiple times to change the configuration. The configuration is stored in the notebook's metadata and will be used for all subsequent queries.
+
+| Configuration Option | Description | Default Value | Type |
+| --- | --- | --- | --- |
+| auth_mode | The authentication mode to use for Amazon Neptune connections | DEFAULT | string |
+| aws_region | The AWS region to use for Amazon Neptune connections | your-region-1 | string |
+| host | The host URL to form a connection with | localhost | string |
+| load_from_s3_arn | The ARN of the S3 bucket to load data from [Amazon Neptune only] | | string |
+| port | The port to use when creating a connection | 8182 | number |
+| proxy_host | The proxy host URL to route a connection through [Amazon Neptune only] | | string |
+| proxy_port | The proxy port to use when creating a proxy connection [Amazon Neptune only] | 8182 | number |
+| ssl | Whether to make connections to the created endpoint with SSL or not [True/False] | False | boolean |
+| ssl_verify | Whether to verify the server's TLS certificate or not [True/False] | True | boolean |
+| sparql | SPARQL connection object | ``` { "path": "sparql" } ``` | object |
+| gremlin | Gremlin connection object | ``` { "username": "",
"password": "", "traversal_source": "g", "message_serializer": "graphsonv3" } ``` | object |
+| neo4j | Neo4J connection object | ``` { "username": "neo4j", "password": "password", "auth": true, "database": null } ``` | object |
+
 ### Gremlin Server
 In a new cell in the Jupyter notebook, change the configuration using `%%graph_notebook_config` and modify the fields for `host`, `port`, and `ssl`. Optionally, modify `traversal_source` if your graph traversal source name differs from the default value, `username` and `password` if required by the graph store, or `message_serializer` for a specific data transfer format.
 For a local Gremlin server (HTTP or WebSockets), you can use the following command:
-```
+``` python
 %%graph_notebook_config
 {
   "host": "localhost",
@@ -195,7 +218,7 @@ To setup a new local Gremlin Server for use with the graph notebook, check out [
 Change the configuration using `%%graph_notebook_config` and modify the fields for `host`, `port`, and `ssl`. For a local Blazegraph database, you can use the following command:
-```
+``` python
 %%graph_notebook_config
 {
   "host": "localhost",
@@ -209,7 +232,7 @@ Change the configuration using `%%graph_notebook_config` and modify the fields f
 You can also make use of namespaces for Blazegraph by specifying the path `graph-notebook` should use when querying your SPARQL like below:
-```
+``` python
 %%graph_notebook_config
 {
@@ -230,7 +253,7 @@ To setup a new local Blazegraph database for use with the graph notebook, check
 Change the configuration using `%%graph_notebook_config` and modify the defaults as they apply to your Neptune cluster:
-```
+``` python
 %%graph_notebook_config
 {
   "host": "your-neptune-endpoint",
@@ -242,15 +265,36 @@ Change the configuration using `%%graph_notebook_config` and modify the defaults
   "aws_region": "your-neptune-region"
 }
 ```
+
 To setup a new Amazon Neptune cluster, check out the [Amazon Web Services
documentation](https://docs.aws.amazon.com/neptune/latest/userguide/manage-console-launch.html). When connecting the graph notebook to Neptune, make sure you have a network setup to communicate to the VPC that Neptune runs on. If not, you can follow [this guide](https://github.com/aws/graph-notebook/tree/main/additional-databases/neptune).
+In addition to the configuration options above, you can specify the following proxy options:
+
+### Amazon Neptune Proxy Connection
+
+``` python
+%%graph_notebook_config
+{
+  "host": "clustername.cluster-ididididid.us-east-1.neptune.amazonaws.com",
+  "port": 8182,
+  "ssl": true,
+  "proxy_port": 8182,
+  "proxy_host": "host.proxy.com",
+  "auth_mode": "IAM",
+  "aws_region": "us-east-1",
+  "load_from_s3_arn": ""
+}
+```
+
+For details, see [Connecting to Amazon Neptune from clients outside the Neptune VPC using AWS Network Load Balancer](https://aws-samples.github.io/aws-dbs-refarch-graph/src/connecting-using-a-load-balancer/#connecting-to-amazon-neptune-from-clients-outside-the-neptune-vpc-using-aws-network-load-balancer).
+
 ## Authentication (Amazon Neptune)
 If you are running a SigV4 authenticated endpoint, ensure that your configuration has `auth_mode` set to `IAM`:
-```
+``` python
 %%graph_notebook_config
 {
   "host": "your-neptune-endpoint",
@@ -265,10 +309,10 @@ If you are running a SigV4 authenticated endpoint, ensure that your configuratio
 Additionally, you should have the following Amazon Web Services credentials available in a location accessible to Boto3:
-- Access Key ID
-- Secret Access Key
-- Default Region
-- Session Token (OPTIONAL. Use if you are using temporary credentials)
+* Access Key ID
+* Secret Access Key
+* Default Region
+* Session Token (OPTIONAL.
Use if you are using temporary credentials)
 These variables must follow a specific naming convention, as listed in the [Boto3 documentation](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/configuration.html#using-environment-variables)
@@ -276,13 +320,13 @@ A list of all locations checked for Amazon Web Services credentials can also be
 ### Neo4J
-Change the configuration using `%%graph_notebook_config` and modify the fields for `host`, `port`, `ssl`, and `neo4j` authentication.
+Change the configuration using `%%graph_notebook_config` and modify the fields for `host`, `port`, `ssl`, and `neo4j` authentication.
 If your Neo4J instance supports [multiple databases](https://neo4j.com/developer/manage-multiple-databases/), you can specify a database name via the `database` field. Otherwise, leave the `database` field blank to query the default database.
 For a local Neo4j Desktop database, you can use the following command:
-```
+``` python
 %%graph_notebook_config
 {
   "host": "localhost",
@@ -305,7 +349,7 @@ To setup a new local Neo4J Desktop database for use with the graph notebook, che
 A pre-release distribution can be built from the graph-notebook repository via the following steps:
-```
+``` bash
 # 1) Clone the repository and navigate into the clone directory
 git clone https://github.com/aws/graph-notebook.git
 cd graph-notebook
@@ -336,16 +380,16 @@ You should now be able to find the built distribution at
 And use it by following the [installation](https://github.com/aws/graph-notebook#installation) steps, replacing
-```
+``` bash
 pip install graph-notebook
 ```
 with
-```
+``` bash
 pip install ./dist/graph_notebook-3.8.2-py3-none-any.whl
-```
+```
 ## Contributing Guidelines