diff --git a/ChangeLog.md b/ChangeLog.md
index d1203102..4e437966 100644
--- a/ChangeLog.md
+++ b/ChangeLog.md
@@ -3,6 +3,8 @@
Starting with v1.31.6, this file will contain a record of major features and updates made in each release of graph-notebook.
## Upcoming
+- New Gremlin Language Tutorial notebooks ([Link to PR](https://github.com/aws/graph-notebook/pull/533))
+ - Path: 06-Language-Tutorials > 03-Gremlin
- Added `--explain-type` option to `%%gremlin` ([Link to PR](https://github.com/aws/graph-notebook/pull/503))
- Added general documentation for `%%graph_notebook_config` options ([Link to PR](https://github.com/aws/graph-notebook/pull/504))
- Modified Dockerfile to support Python 3.10 ([Link to PR](https://github.com/aws/graph-notebook/pull/519))
diff --git a/src/graph_notebook/notebooks/06-Language-Tutorials/03-Gremlin/01-Basic-Read-Queries.ipynb b/src/graph_notebook/notebooks/06-Language-Tutorials/03-Gremlin/01-Basic-Read-Queries.ipynb
new file mode 100644
index 00000000..9d789c85
--- /dev/null
+++ b/src/graph_notebook/notebooks/06-Language-Tutorials/03-Gremlin/01-Basic-Read-Queries.ipynb
@@ -0,0 +1,1088 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "eab505f3",
+ "metadata": {},
+ "source": [
+ "Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.\n",
+ "SPDX-License-Identifier: Apache-2.0\n",
+ "\n",
+ "# Learning Gremlin - Basic Read Queries\n",
+ "\n",
+ "This notebook is the first in a series of notebooks that walk through how to write queries using Gremlin. In this notebook, we will examine the basics of Gremlin read queries and how these queries fit into the \"Find\", \"Filter\", \"Format\" paradigm. Let's begin by loading some sample data into our Neptune cluster. \n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "## Getting Started\n",
+ "\n",
+ "For these notebooks, we will be leveraging a dataset from the book [Graph Databases in Action](https://www.manning.com/books/graph-databases-in-action?a_aid=bechberger) from Manning Publications. \n",
+ "\n",
+ "\n",
+ "**Note** These notebooks do not cover data modeling or building a data loading pipeline. If you would like a more detailed description of how this dataset is constructed and where the design of the data model came from, please read the book.\n",
+ "\n",
+ "To get started, the first step is to load data into the cluster. Assuming the cluster is empty, this can be accomplished by running the cell below which will load our Dining By Friends data."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "b515ec7d",
+ "metadata": {},
+ "source": [
+ "### Before you begin\n",
+ "\n",
+ "Throughout all the **Learning Gremlin on Neptune** notebooks, you will notice that each code block starts with either a `%` or `%%` command. These are called *workbench magic* commands, and are essentially shortcuts to specific Neptune APIs. For example:\n",
+ "\n",
+ "* `%%gremlin` - issues a Gremlin query to the Neptune endpoint using WebSockets\n",
+ "* `%seed` - provides a convenient way to add sample data to your Neptune endpoint\n",
+ "* `%load` - generates a form that you can use to submit a bulk load request to Neptune\n",
+ "\n",
+ "For more information on workbench magics, and to see all the supported commands, refer to the [Using Neptune workbench magics in your notebooks](https://docs.aws.amazon.com/neptune/latest/userguide/notebooks-magics.html) user guide."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "8c569a38",
+ "metadata": {},
+ "source": [
+ "### Loading Data\n",
+ "\n",
+ "Run the following command to load the sample data set that we'll be using. You only need to run this once, and you should make sure your database is empty before doing so."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "03dd0507",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%seed --model property_graph --language gremlin --dataset dining_by_friends --run"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "dfa24286",
+ "metadata": {},
+ "source": [
+ "### Setting up the visualizations\n",
+ "\n",
+ "Run the next two cells to configure various display options for our notebook, which we will use later on to display our results in a visually pleasing way."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "6e655017",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%graph_notebook_vis_options\n",
+ "{\n",
+ " \"groups\": { \n",
+ " \"person\": {\n",
+ " \"color\": \"#9ac7bf\"\n",
+ " },\n",
+ " \"review\": {\n",
+ " \"color\": \"#f8cecc\"\n",
+ " },\n",
+ " \"city\": {\n",
+ " \"color\": \"#d5e8d4\"\n",
+ " },\n",
+ " \"state\": {\n",
+ " \"color\": \"#dae8fc\"\n",
+ " },\n",
+ " \"review_rating\": {\n",
+ " \"color\": \"#e1d5e7\"\n",
+ " },\n",
+ " \"restaurant\": {\n",
+ " \"color\": \"#ffe6cc\"\n",
+ " },\n",
+ " \"cuisine\": {\n",
+ " \"color\": \"#fff2cc\"\n",
+ " }\n",
+ " }\n",
+ "}"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "de66a832",
+ "metadata": {},
+ "source": [
+ "The following cell creates the `node_labels` object which we use to tell the Notebook which property we want to display when creating graphical visualisations."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "d5c80800",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "node_labels = '{\"person\":\"first_name\",\"city\":\"name\",\"state\":\"name\",\"restaurant\":\"name\",\"cuisine\":\"name\"}'"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "3e51c509",
+ "metadata": {},
+ "source": [
+ "We'll be using the `node_labels` variable to provide a nicer visualisation when running the queries in this notebook. To use it, we need to pass it along with the query itself, as follows:\n",
+ "\n",
+ "`%%gremlin -d $node_labels`\n",
+ "\n",
+ "The `-d` option tells the notebook which property to display for each specified node label, and the `$` prefix passes in the contents of the `node_labels` variable."
+ ]
+ },
+ {
+ "attachments": {
+ "dining-by-friends.png": {
+ "image/png": ""
+ }
+ },
+ "cell_type": "markdown",
+ "id": "48e9a85f",
+ "metadata": {},
+ "source": [
+ "### Looking at our graph data\n",
+ "\n",
+ "Now that we have loaded our data, let's take a moment to look at what our data model looks like:\n",
+ "\n",
+ "\n",
+ "![dining-by-friends.png](attachment:dining-by-friends.png)\n",
+ "\n",
+ "**Element (Node/Edge) Counts**\n",
+ "\n",
+ "|Node Label|Count|\n",
+ "|:--|:--|\n",
+ "|review|109|\n",
+ "|restaurant|40|\n",
+ "|cuisine|24|\n",
+ "|person|8|\n",
+ "|state|2|\n",
+ "|city|2|\n",
+ "\n",
+ "|Edge Label|Count|\n",
+ "|:--|:--|\n",
+ "|wrote|218|\n",
+ "|about|218|\n",
+ "|within|84|\n",
+ "|serves|80|\n",
+ "|friends|20|\n",
+ "|lives|16|\n",
+ "\n",
+ "This dataset represents a fictitious, but realistic, restaurant recommendation application that contains:\n",
+ "\n",
+ "* Users, represented by `person` nodes\n",
+ "* Users connected to Users via `friends` edges\n",
+ "* Restaurants and their associated information (`city`, `state`, `cuisine`)\n",
+ "* Reviews, which include a body and a rating\n",
+ "* Ratings of reviews (helpful/not helpful)\n",
+ "\n",
+ "This application contains three main aspects to the data it collects. First, it contains a social network consisting of `person` nodes connected to other `person` nodes via a `friends` edge. Second, it contains a restaurant review aspect consisting of `restaurant` nodes, information about those restaurants (`city`/`state`/`cuisine`), and `review` nodes for that restaurant. The third, and final aspect, consists of a personalization component where a `person` can rate a `review`, which allows for better recommendations based on a person's preferences.\n",
+ "\n",
+ "Throughout this set of notebooks, we will leverage the different aspects of this data to highlight different fundamental types of common property graph queries, namely neighborhood traversals, hierarchies, paths, and collaborative filtering.\n",
+ "\n",
+ "Now let's get started."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "ab986dac",
+ "metadata": {},
+ "source": [
+ "\n",
+ "## Finding your Data\n",
+ "\n",
+ "When working with Gremlin, the most common usage of the language is to find data. Gremlin accomplishes this using the following constructs:\n",
+ "\n",
+ "* `V()` - used to access nodes in the graph\n",
+ "* `E()` - used to access edges in the graph\n",
+ "* `has()` - used to filter for objects with a property of a specific value\n",
+ "* `hasLabel()` - used to filter for objects with a specific label or labels.\n",
+ "\n",
+ "To access nodes and edges, you must first get access to the graph itself. In Neptune, this is prebound to a variable called `g`, which is used as the first step in any Gremlin query.\n",
+ "\n",
+ "Gremlin supports a number of steps that help us traverse the graph. Some of these steps are listed below:\n",
+ "\n",
+ "#### Gremlin Steps\n",
+ "\n",
+ "| Gremlin Steps|Description|\n",
+ "|:--|:--|\n",
+ "|`both()`|Follow edges in either direction|\n",
+ "|`outE()`|Include the outgoing edges in the query (to check a label or property for example)|\n",
+ "|`inE()`|Include the incoming edges in the query (to check a label or property for example)|\n",
+ "|`bothE()` |Include edges in either direction in the query|\n",
+ "|`outV()`|From an edge, the node that the edge comes out of (its source)|\n",
+ "|`inV()`|From an edge, the node that the edge points into (its target)|\n",
+ "|`otherV()`|From an edge, the node at the opposite end from the one the traversal arrived from|\n",
+ "\n",
+ "\n",
+ "Now that we have a basic understanding of Gremlin's traversal steps, let's take a look at how this is applied to answer some common graph query patterns.\n",
+ "\n",
+ "### Finding Nodes\n",
+ "\n",
+ "The simplest traversal you can do in Gremlin is to search for nodes. In Gremlin traversals, nodes are accessed using `V()`. In our example, *review*, *restaurant*, *cuisine*, *person*, *state* and *city* are represented as nodes.\n",
+ "\n",
+ "Execute the query below to search for all nodes and return them, but limit the number returned to 10."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "deabe58e",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%gremlin\n",
+ "g.V() //find me all reviews, restaurants, cuisines, persons, states and cities\n",
+ ".limit(10) //return only 10 results\n",
+ ".elementMap()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "4eee92c8",
+ "metadata": {},
+ "source": [
+ "### Finding Edges\n",
+ "\n",
+ "The example above works, however it does not leverage the connections within the data, represented by edges in our graph, that make graph databases powerful. \n",
+ "\n",
+ "To perform a search across multiple nodes and edges, we need to use our traversal to specify how the nodes and edges are related using the following syntax:\n",
+ "\n",
+ "\n",
+ "| Gremlin Step|Description|\n",
+ "|:--|:--|\n",
+ "|`outE()`|Include the outgoing edges in the query (to check a label or property for example)|\n",
+ "|`inE()`|Include the incoming edges in the query (to check a label or property for example)|\n",
+ "|`bothE()` |Include edges in either direction in the query|\n",
+ "\n",
+ "Execute the query below to search for `node<-edge-node` patterns described by the `V().inE().outV()` steps, and return 10 results."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "168bcc76",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%gremlin\n",
+ "g.V() //find me all nodes\n",
+ ".inE() //traverse to the incoming edge\n",
+ ".outV() //find the node the edge comes out of\n",
+ ".limit(10) //return only 10 results\n",
+ ".elementMap()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "6d36e1dd",
+ "metadata": {},
+ "source": [
+ "In the example above we specified using incoming edges, using the `inE()` step, but we could have also chosen to look for patterns using only outgoing edges, `outE()`, or ignoring edge direction, `bothE()`. \n",
+ "\n",
+ "We can link these basic constructs together across multiple levels of connections to find more complex patterns. In the example below, we have extended our previous query to follow a `node-edge->node<-edge-node` pattern and return 10 of the distinct nodes found at the far end."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "63d548a0",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%gremlin\n",
+ "g.V() //find me all nodes\n",
+ ".outE() //traverse the outgoing edges\n",
+ ".inV() //find the node at the end of the edge\n",
+ ".inE() //traverse the incoming edge\n",
+ ".outV() //find the node at the start of that edge\n",
+ ".dedup() //remove any duplicates\n",
+ ".limit(10) //return only 10 results\n",
+ ".elementMap()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "a3093ad2",
+ "metadata": {},
+ "source": [
+ "We can also do the same using the combination of `bothE()` and `otherV()`, instead of explicitly stating whether to traverse outgoing or incoming edges."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "55f2aaf9",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%gremlin\n",
+ "g.V() //find me all nodes\n",
+ ".bothE() //traverse both outgoing and incoming edges\n",
+ ".otherV() //find the node at the end of the edge\n",
+ ".dedup() //remove any duplicates\n",
+ ".limit(10) //return only 10 results\n",
+ ".elementMap()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "6f5ef9ca",
+ "metadata": {},
+ "source": [
+ "In the example above, we have returned matches based on a series of connected nodes and edges. When working with graphs, a series of connected nodes and edges may also be referred to as a 'path'. Often when we are looking for patterns within our graph we would like to return not just a node or edge within the pattern but the path containing how these items are connected.\n",
+ "\n",
+ "\n",
+ "### Finding Paths\n",
+ "\n",
+ "To find paths within our graph we can use the constructs we have already learned to specify that we want the path returned. In our previous queries, we started at every node and traversed to the adjacent node using the incoming edge using `inE()`, the outgoing edge using `outE()`, or disregarded the edge direction using `bothE()`. However, we only returned the adjacent node, and not the path.\n",
+ "\n",
+ "To return the path, we use the `path()` step. This returns the full history of each traverser, that is, every node and edge it passed through to reach its current position."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "1c3736ac",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%gremlin\n",
+ "g.V() //find me all nodes\n",
+ ".bothE() //traverse both outgoing and incoming edges\n",
+ ".otherV() //find the node at the end of the edge\n",
+ ".dedup() //remove any duplicates\n",
+ ".path() // <-- now return the path\n",
+ ".by(elementMap())\n",
+ ".limit(10) //return only 10 results"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "df52d879",
+ "metadata": {},
+ "source": [
+ "## Filtering your Data\n",
+ "\n",
+ "So far, we have learned how to find specific patterns within our graph based on how the nodes and edges connect. However, most of the time you will want to use attributes of the nodes and edges to filter the results to return a more specific subset of data. \n",
+ "\n",
+ "We accomplish this using the `has()` or `where()` steps. Using the `has()` step filters the traversal based on the existence of a property with a specific value. We can use the `where()` step to filter the traversal based on the existence of a matching traversal. \n",
+ "\n",
+ "This is an **important** differentiation between the two filtering steps. For example, if you wanted to filter based on the existence of a property, or if a property value matched an arbitrary value, you would use `has()`:\n",
+ "\n",
+ "`g.V().has('name','Dave')`\n",
+ "\n",
+ "Alternatively, if you wanted to filter based on a traversal, you would use `where()` instead. For example, if you wanted all nodes with more than 100 outgoing connections, you could use a query such as that below:\n",
+ "\n",
+ "`g.V().where(out().count().is(gt(100)))`\n",
+ "\n",
+ "Within both the `has()` and `where()` steps, there are a variety of predicates available to perform logical operations and comparisons on the data.\n",
+ "\n",
+ "### Predicate Functions ###\n",
+ "\n",
+ "Predicates are functions used to compare values based on equality, ranges or certain patterns. Below is a list of some of the predicates supported by Gremlin. These are implemented in either the [TextP](https://tinkerpop.apache.org/javadocs/current/core/org/apache/tinkerpop/gremlin/process/traversal/TextP.html) or [P](https://tinkerpop.apache.org/javadocs/current/core/org/apache/tinkerpop/gremlin/process/traversal/P.html) traversal classes.\n",
+ "\n",
+ "|Type|Predicate|\n",
+ "| ----------- | ----------- |\n",
+ "|General|`within()`, `without()`, `between()`|\n",
+ "|Math|`eq()`, `neq()`, `gt()`, `lt()`, `gte()`, `lte()`|\n",
+ "|String|`startingWith()`, `endingWith()`, `notStartingWith()`, `notEndingWith()`, `containing()`, `notContaining()`|\n",
+ "|Boolean|`and()`, `or()`, `not()`|\n",
+ "|Regex|`regex()`, `notRegex()`|\n",
+ "\n",
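+ "As a brief illustration of how these predicates slot into `has()`, here is a minimal sketch against the sample dataset. The property values (`'Dave'`, `'Josh'`, `'B'`) are placeholders chosen for illustration and may not match any data. Run each query in its own `%%gremlin` cell:\n",
+ "\n",
+ "```\n",
+ "// within(): match any of the listed values\n",
+ "g.V().hasLabel('person').has('first_name', within('Dave','Josh')).elementMap()\n",
+ "\n",
+ "// TextP.startingWith(): match string values by prefix\n",
+ "g.V().hasLabel('restaurant').has('name', TextP.startingWith('B')).elementMap()\n",
+ "```\n",
+ "\n",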
+ "### Filtering Steps ###\n",
+ "\n",
+ "In addition to the predicate functions listed above, Gremlin also supports steps which provide filtering functionality. Below is a list of some of the supported steps:\n",
+ "\n",
+ "|Gremlin Step|Description|Example|\n",
+ "| ----------- | ----------- | ----------- |\n",
+ "|`is()`|Filter scalar values|`g.V().values('age').is(32)`|\n",
+ "\n",
+ "### Reducing Barrier Steps ###\n",
+ "\n",
+ "Finally, Gremlin supports steps that are defined as `reducing barrier steps`. So what is a reducing barrier step? A full definition can be read in the [official Apache TinkerPop documentation](https://tinkerpop.apache.org/docs/3.7.0/reference/#a-note-on-barrier-steps), but a shorter explanation is provided below:\n",
+ "\n",
+ "_\"Gremlin is a lazy stream processing language. This means it will not evaluate data within a traversal until it reaches a step (called a `reducing barrier step`) that requires all the previous traversers to be processed, and a single 'reduced value' traverser to be emitted to the next step.\"_\n",
+ "\n",
+ "Some examples of `reducing barrier steps` are as follows:\n",
+ "\n",
+ "|Type|Step|\n",
+ "| ----------- | ----------- |\n",
+ "|List|`fold()`|\n",
+ "|Math|`count()`, `sum()`, `max()`, `min()`|\n",
+ "|Aggregation|`group()`,`groupCount()`|\n",
+ "\n",
+ "\n",
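+ "To make this concrete, the following sketch shows two reducing barrier steps applied to the sample data. Both queries need every incoming traverser before they can emit their single result. The label names are taken from the data model described earlier, and each query is meant to be run in its own `%%gremlin` cell:\n",
+ "\n",
+ "```\n",
+ "// count(): consumes all person traversers and emits one number\n",
+ "g.V().hasLabel('person').count()\n",
+ "\n",
+ "// groupCount(): consumes all nodes and emits one map of label -> count\n",
+ "g.V().groupCount().by(label)\n",
+ "```\n",
+ "\n",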
+ "In the next sections, we will look at some common ways to apply filters using predicates and filtering steps, as well as using barrier steps to modify our return values."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "20cf71b1",
+ "metadata": {},
+ "source": [
+ "### Filtering Nodes by Label\n",
+ "\n",
+ "One of the most common items you will want to filter on will be the label(s) associated with a node. This can be accomplished by using the `hasLabel()` step."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "651a0f21",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%gremlin\n",
+ "g.V()\n",
+ ".hasLabel('person') // <-- find all person nodes\n",
+ ".limit(10)\n",
+ ".elementMap()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "61f35676",
+ "metadata": {},
+ "source": [
+ "#### Filtering by multiple labels using `hasLabel()`\n",
+ "\n",
+ "In a property graph, nodes can have multiple labels associated with them, so you may need to filter across more than one label. Passing several labels to `hasLabel()` matches nodes that have any of them, as in the example below:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "0460223d",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%gremlin\n",
+ "g.V()\n",
+ ".hasLabel('person','restaurant') // <-- filter on the node label, e.g. find all person and restaurant nodes\n",
+ ".limit(10)\n",
+ ".elementMap()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "51893308",
+ "metadata": {},
+ "source": [
+ "### Filtering Edges by Label\n",
+ "Another common item to filter on is the type or label associated with an edge. As with nodes, you can apply the `hasLabel()` step to an edge."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "3b673615",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%gremlin -d $node_labels\n",
+ "g.V()\n",
+ ".hasLabel('person') // <-- find all person nodes\n",
+ ".inE()\n",
+ " .hasLabel('friends') // <-- filter on the edge label\n",
+ ".outV()\n",
+ ".path()\n",
+ ".by(elementMap())\n",
+ ".limit(10)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "52375922",
+ "metadata": {},
+ "source": [
+ "What if we only wanted to include people who have 2 or more connected friends? We can use some of the predicates we mentioned earlier:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "a76e67d6",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%gremlin -d $node_labels\n",
+ "g.V()\n",
+ ".hasLabel('person')\n",
+ ".where(out().hasLabel('person').count().is(gte(2))) // <-- filter only people who have 2 or more friend connections\n",
+ ".outE()\n",
+ " .hasLabel('friends')\n",
+ ".inV()\n",
+ ".path()\n",
+ ".by(elementMap())\n",
+ ".limit(10)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "c5abc463",
+ "metadata": {},
+ "source": [
+ "What if we wanted to get a list of all the restaurants in order to find out which cuisines they serve? After all, all this learning has made me hungry!"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "7a6a9575",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%gremlin -d $node_labels\n",
+ "g.V()\n",
+ ".hasLabel('restaurant') // <-- get all the restaurant nodes\n",
+ ".out()\n",
+ " .hasLabel('cuisine') // <-- traverse outwards to the cuisine nodes\n",
+ ".path() // <-- get the path\n",
+ ".by(values('name')) // <-- return the 'name' property for all nodes in the traversals"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "ff908e2b",
+ "metadata": {},
+ "source": [
+ "You'll have noticed we used `by()` and `values()` in the above step. This was to be able to format the results to show the `name` property of each of the nodes in the traversal. We'll be explaining how these work in more detail in the **Formatting Results** section below."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "2d59628b",
+ "metadata": {},
+ "source": [
+ "### Finding by Property\n",
+ "\n",
+ "The next common use case for filtering is to be able to filter on attribute values. \n",
+ "\n",
+ "This can be accomplished by using the `has()` step as described above, which applies to both nodes and edges.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "e755b029",
+ "metadata": {
+ "scrolled": false
+ },
+ "outputs": [],
+ "source": [
+ "%%gremlin\n",
+ "g.V() // <-- start with all nodes\n",
+ ".has('first_name','Dave') // <-- filter nodes which have a 'first_name' property value of 'Dave'\n",
+ ".limit(10)\n",
+ ".elementMap()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "8b998cd1",
+ "metadata": {},
+ "source": [
+ "Because there are no properties associated with any of our edges, running the following query won't return any records. However, you can use it to see how the same concept of filtering nodes based on properties can be applied to edges."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "098151bb",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%gremlin\n",
+ "g.V() // <-- start with all nodes\n",
+ ".outE() // <-- traverse the outbound edge, landing on that edge\n",
+ ".has('weight', 1) // <-- filter edges which have a 'weight' property value of '1'\n",
+ ".elementMap()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "2081d7b4",
+ "metadata": {},
+ "source": [
+ "## Formatting Results\n",
+ "\n",
+ "Having gone through the basics of finding and filtering data with Gremlin, let's take a look at the last step, formatting our results. Almost all Gremlin queries will return a value. How this is formatted depends on the traversal and formatting steps.\n",
+ "\n",
+ "\n",
+ "### Returning all values\n",
+ "\n",
+ "By default, Gremlin will only return the object id in the result set, in the format of `v[id]` for nodes and `e[id]` for edges. Run the following code to see an example:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "2d1af3bf",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%gremlin\n",
+ "g.V()\n",
+ ".hasLabel('person')\n",
+ ".limit(10)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "953027a0",
+ "metadata": {},
+ "source": [
+ "This is great, but isn't particularly useful unless we're pairing it up with data stored elsewhere. Instead, we'd like to retrieve properties about each returned node, and we have three options for doing this:\n",
+ "\n",
+ "* `valueMap()` - returns a map of all the non-internal property values. Use `valueMap().with(WithOptions.tokens)` to include internal properties such as id and label.\n",
+ "* `values()` - returns each non-internal property as an individual row.\n",
+ "* `elementMap()` - returns a map of all properties, including the internal id and label\n",
+ "\n",
+ "`valueMap()`, `values()` and `elementMap()` also accept a property name, or list of properties to return, as shown below:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "ce3ea852",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%gremlin\n",
+ "g.V()\n",
+ ".hasLabel('person')\n",
+ ".values('first_name')"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "a0442191",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%gremlin\n",
+ "g.V()\n",
+ ".hasLabel('person')\n",
+ ".elementMap('first_name')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "8713cbba",
+ "metadata": {},
+ "source": [
+ "**Note** If you only need to return specific properties from a query, it's recommended that you provide the names of the required properties, so the query doesn't return more data than you need."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "dae29842",
+ "metadata": {},
+ "source": [
+ "It's important to understand the difference between `valueMap()` and `elementMap()`. Whilst both return properties as a map, they do so in fundamentally different ways.\n",
+ "\n",
+ "With `valueMap()` only non-internal properties are returned, unless `.with(WithOptions.tokens)` is specified, in which case the id and label are included too. In addition, all property values are represented as lists, even if there is only a single property value.\n",
+ "\n",
+ "With `elementMap()` the id, label and all properties are returned, however unlike `valueMap()` the values are not returned as list members. Where you have a list or set property containing multiple values, **only the first member is returned**. If you need to return these types of properties, you should use `valueMap()` instead. \n",
+ "\n",
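+ "The differences can be seen by running the three variations below against the same node, each in its own `%%gremlin` cell. This is a sketch rather than one of the original examples. Compare how property values are wrapped in lists by `valueMap()` but not by `elementMap()`:\n",
+ "\n",
+ "```\n",
+ "// non-internal properties only, each value wrapped in a list\n",
+ "g.V().hasLabel('person').limit(1).valueMap()\n",
+ "\n",
+ "// same as above, plus the internal id and label\n",
+ "g.V().hasLabel('person').limit(1).valueMap().with(WithOptions.tokens)\n",
+ "\n",
+ "// id, label and properties, with single (unwrapped) values\n",
+ "g.V().hasLabel('person').limit(1).elementMap()\n",
+ "```\n",
+ "\n",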
+ "In addition, when using `elementMap()` with edges, additional information regarding the attached vertices is also returned. The following query demonstrates this."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "81c472e9",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%gremlin\n",
+ "\n",
+ "g.V().limit(1).outE().limit(1).elementMap()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "05332a49",
+ "metadata": {},
+ "source": [
+ "### Returning property values\n",
+ "\n",
+ "Most often, you want to be specific about the data elements (node/edges), attributes, or a combination of both, that a query returns. This provides for efficient processing, both at the database and client level, and efficient data transmission, since we are only retrieving, processing, and sending what is needed. \n",
+ "\n",
+ "As we've already seen, Gremlin will only return the results at the end of the traversal, so how do we obtain details of objects that are specified higher up in the traversal? To accomplish this, we can use the `select()` and `project()` steps.\n",
+ "\n",
+ "#### Selecting and Aliasing\n",
+ "\n",
+ "We can use `select()` to refer to objects that have been aliased previously in the traversal using the `as()` step. Below is an example of how this is achieved:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "9ba582e4",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%gremlin\n",
+ "g.V()\n",
+ ".has('first_name','Dave').as('dave')\n",
+ ".out()\n",
+ ".hasLabel('person').as('friend')\n",
+ ".select('dave','friend')\n",
+ ".by('first_name')\n",
+ ".by('first_name')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "dc834aed",
+ "metadata": {},
+ "source": [
+ "The above query is quite a jump from our previous examples, so let's break down the steps:\n",
+ "\n",
+ "* `g.V()` - start by looking at all the nodes\n",
+ "* `.has('first_name','Dave').as('dave')` - find all nodes with a `first_name` property value of `Dave`. Store these nodes under the alias of `dave`.\n",
+ "* `.out().hasLabel('person').as('friend')` - traverse the outgoing edge to an adjacent `person` node. Store these nodes under the alias of `friend`.\n",
+ "* `.select('dave','friend')` - refer to the previously aliased traversals using the `select()` step.\n",
+ "* `.by('first_name')` - this determines how to format the output of each of the aliased objects. In this case, we're only outputting the `first_name` property for all nodes in the `dave` and `friend` variables."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "0452038d",
+ "metadata": {},
+ "source": [
+ "**Note** When using `by()` after a `select()`, you should generally specify the same number of `by()` modulators as there are variables in the `select()`. If you specify fewer, Gremlin reuses the `by()` modulators that have been specified in round-robin order, starting with the first one. This may not always be a problem, as we can see in the next example:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "ac2eff7d",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%gremlin\n",
+ "g.V()\n",
+ ".has('first_name','Dave').as('Me')\n",
+ ".out()\n",
+ ".hasLabel('person').as('MyFriends')\n",
+ ".select('Me','MyFriends')\n",
+ ".by('first_name')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "bc56a6d2",
+ "metadata": {},
+ "source": [
+ "The above query is the same as before apart from the alias names. Because we want to return the `first_name` property from nodes in both the `Me` and `MyFriends` aliases, we only need to specify one `by()` modulator.\n",
+ "\n",
+ "#### Projection\n",
+ "\n",
+ "Unlike `select()`, which refers back to steps aliased earlier in the traversal, the `project()` step builds its result from the current traverser and moves forward with it. The following example shows how to use `project()` to return the same results as `select()`:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "04be1fd9",
+ "metadata": {
+ "scrolled": true
+ },
+ "outputs": [],
+ "source": [
+ "%%gremlin\n",
+ "g.V()\n",
+ ".has('first_name','Dave').as(\"Me\")\n",
+ ".out()\n",
+ " .hasLabel('person').values('first_name').as(\"MyFriend\")\n",
+ ".project('Me','MyFriend')\n",
+ " .by(select(\"Me\").values('first_name'))\n",
+ " .by(select(\"MyFriend\"))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "943bdceb",
+ "metadata": {},
+ "source": [
+ "Notice in the above query how we've combined `project()` and `select()` to provide us with the same results. This is because we needed to alias specific portions of the incoming traversal, e.g. the node representing *Dave*, and the nodes representing Dave's *friends*.\n",
+ "\n",
+ "If you run the following query, you'll notice something odd about the results."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "a8c78c07",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%gremlin\n",
+ "g.V()\n",
+ ".has('first_name','Dave')\n",
+ ".project('Me','MyFriends')\n",
+ ".by(values('first_name'))\n",
+ ".by(out().hasLabel('person').values('first_name'))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "d8628c06",
+ "metadata": {},
+ "source": [
+ "The query looks like it should return the `first_name` value of the node representing *Dave*, and then return the `first_name` property of all the outbound `person` nodes associated with Dave. So why didn't it?\n",
+ "\n",
+ "It's because the `.has('first_name','Dave')` step created a single traverser, which was the input to the `project()` step. From here, the `.by(out().hasLabel('person').values('first_name'))` modulator is essentially executing a **sub-query** for that traverser, and a `by()` modulator only uses the first result its traversal produces, which is why only one friend is returned. If you've ever written scalar sub-queries in SQL, which must return a single value, you'll recognise the pattern.\n",
+ "\n",
+ "So how do we solve this? We can end the sub-query with the `fold()` step, which bundles all the `first_name` values into a single list, and return that instead."
+ ]
+ },
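+ {
+ "cell_type": "markdown",
+ "id": "a3f9c1d2",
+ "metadata": {},
+ "source": [
+ "As a minimal sketch of the `fold()` approach described above (assuming the same Dining By Friends data, and reusing the `Me` and `MyFriends` keys from the earlier queries), the second `by()` modulator can end with `fold()` so that the sub-traversal produces a single list:\n",
+ "\n",
+ "```\n",
+ "g.V()\n",
+ ".has('first_name','Dave')\n",
+ ".project('Me','MyFriends')\n",
+ ".by(values('first_name'))\n",
+ ".by(out().hasLabel('person').values('first_name').fold())\n",
+ "```\n",
+ "\n",
+ "Because `fold()` collects every `first_name` into one list before `by()` takes its first (and now only) result, the `MyFriends` key should hold all of Dave's friends rather than just one. To try it, paste the query into a `%%gremlin` cell."
+ ]
+ },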
+ {
+ "cell_type": "markdown",
+ "id": "4f89c0ba",
+ "metadata": {},
+ "source": [
+ "### Returning unique values\n",
+ "\n",
+ "To return unique values in the results, we can use the `dedup()` step. This can be used in two ways:\n",
+ "\n",
+ "* to remove duplicates from the incoming traversal\n",
+ "* to only return unique values based on a `by()` modulation\n",
+ "\n",
+ "Both applications are shown below:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "c6cf5578",
+ "metadata": {
+ "scrolled": true
+ },
+ "outputs": [],
+ "source": [
+ "%%gremlin\n",
+ "g.V()\n",
+ ".hasLabel('person')\n",
+ ".both().hasLabel('person')\n",
+ ".dedup()\n",
+ ".values('first_name')\n",
+ ".limit(10)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "c8ab0463",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%gremlin\n",
+ "g.V()\n",
+ ".hasLabel('person')\n",
+ ".dedup()\n",
+ ".by('first_name')\n",
+ ".values('first_name')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "809468ee",
+ "metadata": {},
+ "source": [
+ "### Returning custom values\n",
+ "\n",
+ "In addition to returning simple key-value pairs, we can construct more complex responses. This is a common requirement, especially when returning aggregations or when returning attributes from different variables in the matched patterns.\n",
+ "\n",
+ "These custom responses are built using the `by()` step modulator (which is discussed more in the Loops-Repeats notebook). As we've previously seen, `by()` modulators are applied in order to the objects they modulate; for `path()`, that means the first `by()` is applied to the first object in the path, the second to the second, and so on. The example below shows how we can return a custom string with the statement \"*person* is friends with *person*\"."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "b6d4d8c3",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%gremlin\n",
+ "g.V()\n",
+ ".hasLabel('person')\n",
+ ".outE('friends').inV().hasLabel('person')\n",
+ ".path()\n",
+ ".by('first_name')\n",
+ ".by(constant(' is friends with '))\n",
+ ".by('first_name')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "870e2989",
+ "metadata": {},
+ "source": [
+ "## Exercises\n",
+ "\n",
+ "Now that we have gone through the basics of writing Gremlin read queries, it's time to put it into practice! Below are several exercises you can complete to verify your understanding of the material covered in this notebook. As practice for what you have learned, please write the Gremlin queries specified below.\n",
+ "\n",
+ "Using the social network portion (`person` and `friends`) of our Dining By Friends graph, let's answer the following questions:\n",
+ "\n",
+ "\n",
+ "### Exercise 1: Find the first name of Dave's friends\n",
+ "\n",
+ "Using the data model above, write a query that will:\n",
+ "\n",
+ "* Find a `person` node(s) with a `first_name` of \"Dave\"\n",
+ "* Find the friends of Dave (i.e. traverse the `friends` edge)\n",
+ "* Return the friends' `first_name`\n",
+ "\n",
+ "The correct answer is four results: \"Josh\", \"Hank\", \"Kelly\", \"Jim\""
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "ee77c9c6",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%gremlin\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "f4b6049f",
+ "metadata": {},
+ "source": [
+ "### Exercise 2: Find the first name of the friends of Dave's friends\n",
+ "\n",
+ "Using the data model above, write a query that will:\n",
+ "\n",
+ "* Find a `person` node(s) with a `first_name` of \"Dave\"\n",
+ "* Find the friends of Dave (i.e. traverse the `friends` edge)\n",
+ "* Find the friends of that person (i.e. traverse the `friends` edge)\n",
+ "* Return the friends' `first_name`\n",
+ "\n",
+ "The correct answer contains three results: \"Hank\", \"Denise\", \"Paras\""
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "e6cc0978",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%gremlin\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "91433ad6",
+ "metadata": {},
+ "source": [
+ "### Exercise 3: Find out how the friends of Dave's friends are connected\n",
+ "\n",
+ "Using the data model above, write a query that will:\n",
+ "\n",
+ "* Find a `person` node(s) with a `first_name` of \"Dave\"\n",
+ "* Find the friends of Dave (i.e. traverse the `friends` edge)\n",
+ "* Find the friends of that person (i.e. traverse the `friends` edge)\n",
+ "* Return the path\n",
+ "\n",
+ "The correct answer contains three results:\n",
+ "\n",
+ "- `Dave` -> `Josh` -> `Hank`\n",
+ "- `Dave` -> `Kelly` -> `Denise`\n",
+ "- `Dave` -> `Jim` -> `Paras`"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "0e7b488b",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%gremlin\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "c80aba29",
+ "metadata": {},
+ "source": [
+ "### Exercise 4: Which friends should we recommend for Dave?\n",
+ "\n",
+ "A common use case for graphs in social networks is to recommend new connections. There is a significant amount of research in this area (example [here](https://www.science.org/doi/10.1126/sciadv.aax7310#:~:text=The%20triadic%20closure%20mechanism%20uses,features%20of%20empirical%20social%20networks)) but mainly there are two prevailing mechanisms at work in social networks that we can leverage to help provide efficient recommendations to a user. The first of these mechanisms is called homophily, which is the tendency of similar people to be connected. Homophily is a driving factor in many social networks, with an important outcome being that people connected to you, or connected to people that are connected to you, tend to be similar to you. This leads to the second mechanism in a graph, the concept of a triadic closure. Triadic closure is a way to create or recommend new connections based on common friends or acquaintances. \n",
+ "\n",
+ "\n",
+ "In this exercise, we are going to leverage triadic closure to recommend friends for Dave. To accomplish this, we will need to leverage the previously written queries but extend them to:\n",
+ "\n",
+ "* Find all the friends of friends that do not have a connection to Dave\n",
+ "\n",
+ "The correct answer contains three results: \"Hank\", \"Denise\", \"Paras\""
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "fb26d6fd",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%gremlin\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "d1f7523f",
+ "metadata": {},
+ "source": [
+ "## Conclusion\n",
+ "\n",
+ "In this notebook, we explored the basics of writing Gremlin queries and how they are represented in the \"Find\", \"Filter\", \"Format\" paradigm. First, we learned the basics of how to specify the steps used to match on data in our queries. Next, we learned several different mechanisms for how to filter the data found by our queries to return the correct results. Finally, we learned how to specify the format of the data being returned from a query to make for efficient use of database and application resources.\n",
+ "\n",
+ "In the next notebook, we will take what we have learned in this notebook and extend it to show how to answer questions where the length of the patterns is variable or unknown."
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.10.8"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/src/graph_notebook/notebooks/06-Language-Tutorials/03-Gremlin/02-Loops-Repeats.ipynb b/src/graph_notebook/notebooks/06-Language-Tutorials/03-Gremlin/02-Loops-Repeats.ipynb
new file mode 100644
index 00000000..372f4299
--- /dev/null
+++ b/src/graph_notebook/notebooks/06-Language-Tutorials/03-Gremlin/02-Loops-Repeats.ipynb
@@ -0,0 +1,704 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "eab505f3",
+ "metadata": {},
+ "source": [
+ "Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.\n",
+ "SPDX-License-Identifier: Apache-2.0\n",
+ "\n",
+ "# Learning Gremlin - Loops and Repeat Queries\n",
+ "\n",
+ "This notebook is the second in a series of notebooks that walk through how to write queries using Gremlin. In this notebook, we will examine the basics of how to perform looping and repeating queries in Gremlin. \n",
+ "\n",
+ "\n",
+ "This notebook assumes that you have already completed the previous notebook \"01-Basic-Read-Queries\" so we will continue our lessons from the end of the previous notebook and assume that the data has been loaded into the cluster. "
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "0c12469c",
+ "metadata": {},
+ "source": [
+ "### Setting up the visualizations\n",
+ "\n",
+ "Run the next two cells to configure various display options for our notebook, which we will use later on to display our results in a pleasing visual way. "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "6e655017",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%graph_notebook_vis_options\n",
+ "{\n",
+ " \"groups\": { \n",
+ " \"person\": {\n",
+ " \"color\": \"#9ac7bf\"\n",
+ " },\n",
+ " \"review\": {\n",
+ " \"color\": \"#f8cecc\"\n",
+ " },\n",
+ " \"city\": {\n",
+ " \"color\": \"#d5e8d4\"\n",
+ " },\n",
+ " \"state\": {\n",
+ " \"color\": \"#dae8fc\"\n",
+ " },\n",
+ " \"review_rating\": {\n",
+ " \"color\": \"#e1d5e7\"\n",
+ " },\n",
+ " \"restaurant\": {\n",
+ " \"color\": \"#ffe6cc\"\n",
+ " },\n",
+ "    \"cuisine\": {\n",
+ " \"color\": \"#fff2cc\"\n",
+ " }\n",
+ " }\n",
+ "}"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "d5c80800",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "node_labels = '{\"person\":\"first_name\",\"city\":\"name\",\"state\":\"name\",\"restaurant\":\"name\",\"cuisine\":\"name\"}'"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "e5bebb15",
+ "metadata": {},
+ "source": [
+ "We'll be using the `node_labels` variable to provide a nicer visualisation when running the queries in this notebook. To use it, we need to pass it along with the query itself, as follows:\n",
+ "\n",
+ "`%%gremlin -d $node_labels`\n",
+ "\n",
+ "The `-d` option tells the notebook which property should be displayed for each specified node label, and the `$` prefix passes in the value of the `node_labels` variable defined in the previous cell."
+ ]
+ },
+ {
+ "attachments": {
+ "dining-by-friends.png": {
+ "image/png": ""
+ }
+ },
+ "cell_type": "markdown",
+ "id": "48e9a85f",
+ "metadata": {},
+ "source": [
+ "### Looking at our graph data\n",
+ "\n",
+ "As we examined the data model in the previous notebook, we won't examine it again here; however, the data schema is included below for reference.\n",
+ "\n",
+ "![dining-by-friends.png](attachment:dining-by-friends.png)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "ab986dac",
+ "metadata": {},
+ "source": [
+ "\n",
+ "## Looping\n",
+ "\n",
+ "When working with any property graph, some of the most powerful queries you can write are ones where the number of connections between a source and a target entity is not known. These types of queries are so common that property graph query languages, such as Gremlin, have first class support as a key piece of the query language. In Gremlin, these queries are written using a mechanism known as Looping and Repeating. Loops allow us to specify a sequence of nodes and relationships, whilst Repeats allow us to specify the number of times to repeat the relationship in the pattern matching syntax, or until an additional pattern has been matched.\n",
+ "\n",
+ "In Gremlin, a basic loop query to find all nodes exactly 3 hops away from each starting node looks like:\n",
+ "\n",
+ "```\n",
+ " g.V().repeat( out() ).times(3)\n",
+ "```\n",
+ "\n",
+ "Examining this query, we see that there are two defined parts to a loop in Gremlin. The first is the `repeat()` step, which acts as a wrapper for the traversal pattern that we'd like to repeat. The second part defines the *limit* to be applied to the repeat (we don't want to keep traversing indefinitely!). The *limit* portion can be expressed using the following steps:\n",
+ "\n",
+ "* `times()` - used to specify the exact number of times a `repeat()` pattern is to be executed\n",
+ "* `until()` - used to specify a traversal pattern that, once satisfied, will stop the `repeat()` for a traversal\n",
+ "* `loops()` - used to extract the number of times a traversal has gone through the current loop"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "14b59d03",
+ "metadata": {},
+ "source": [
+ "### Diving deeper into `repeat()` ###\n",
+ "\n",
+ "The `repeat()` step also supports two 'modulators', `until()` and `emit()`, both of which can be used before or after the `repeat()` step. Using the `until()` step before the `repeat()` is similar to the common [`while...do`](https://www.w3schools.com/java/java_while_loop.asp) programming paradigm, whereas using the `until()` _after_ the `repeat()` is similar to the [`do...while`](https://www.w3schools.com/cpp/cpp_do_while_loop.asp) concept.\n",
+ "\n",
+ "The `emit()` modulator works by returning the results of a traversal as it is executed, and can be useful when used in conjunction with other loop-limiting steps such as `times()`. An example of this is the query below, where we want to limit the `repeat()` to two hops but also want to return paths that include only one hop."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "37564e65",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%gremlin\n",
+ "g.V()\n",
+ ".repeat(\n",
+ " out()\n",
+ ")\n",
+ ".emit()\n",
+ ".times(2)\n",
+ ".path()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "7553a654",
+ "metadata": {},
+ "source": [
+ "We can also place the `emit()` modulator _before_ the `repeat()` step. This causes the traverser coming from the previous step in the query (the starting node) to be emitted as well, ahead of the results produced by the loop.\n",
+ "\n",
+ "Run the following example, and notice that `Dave` is returned ahead of the results from the `repeat()`."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "caef7b44",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%gremlin\n",
+ "g.V()\n",
+ " .hasLabel(\"person\")\n",
+ " .has(\"first_name\",\"Dave\")\n",
+ " .emit()\n",
+ " .repeat(\n",
+ " out().hasLabel(\"person\")\n",
+ " )\n",
+ " .times(3)\n",
+ " .limit(10)\n",
+ " .path()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "5bfe8e31",
+ "metadata": {},
+ "source": [
+ "Compare this with the output of the following query, which doesn't use `emit()` prior to the `repeat()`. You'll notice that `Dave` is no longer returned as a path on its own; only paths of exactly three hops, if any exist, are returned."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "034bcde1",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%gremlin\n",
+ "g.V()\n",
+ " .hasLabel(\"person\")\n",
+ " .has(\"first_name\",\"Dave\")\n",
+ " .repeat(\n",
+ " out().hasLabel(\"person\")\n",
+ " )\n",
+ " .times(3)\n",
+ " .limit(10)\n",
+ " .path()"
+ ]
+ },
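+ {
+ "cell_type": "markdown",
+ "id": "b7e4d015",
+ "metadata": {},
+ "source": [
+ "To make the `until()` placement discussed earlier concrete, here is a hedged sketch of both forms. It assumes the `friends` edge label and the `Denise` node used in the exercises, and that no directed cycle of `friends` edges is reachable from the start node (otherwise add `simplePath()` inside the `repeat()`). Run each query in its own `%%gremlin` cell.\n",
+ "\n",
+ "```\n",
+ "// do...while: the repeat() body runs once before the condition is first checked\n",
+ "g.V().has('first_name','Dave')\n",
+ ".repeat(out('friends'))\n",
+ ".until(has('first_name','Denise'))\n",
+ ".path().by('first_name')\n",
+ "\n",
+ "// while...do: the condition is checked first, so a start node that\n",
+ "// already matches is returned without traversing any edge\n",
+ "g.V().has('first_name','Denise')\n",
+ ".until(has('first_name','Denise'))\n",
+ ".repeat(out('friends'))\n",
+ ".path().by('first_name')\n",
+ "```\n",
+ "\n",
+ "In the second query the path should contain only the start node, because the `until()` condition is already satisfied before the first iteration."
+ ]
+ },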
+ {
+ "cell_type": "markdown",
+ "id": "f28802bc",
+ "metadata": {},
+ "source": [
+ "Now that we have a basic understanding of Gremlin's loop syntax, let's look at how this is applied to answer some common graph query patterns.\n",
+ "\n",
+ "### Static Number of Hops\n",
+ "\n",
+ "The simplest looping pattern you can do in Gremlin is to specify a fixed number of hops for your pattern. This is accomplished using the `times()` step. Let's execute the query below to traverse outwards by 2 hops, and return the path."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "deabe58e",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%gremlin\n",
+ "g.V()\n",
+ ".repeat(\n",
+ " out()\n",
+ ")\n",
+ ".times(2)\n",
+ ".path()\n",
+ ".by(elementMap())\n",
+ ".limit(10)"
+ ]
+ },
+ {
+ "attachments": {
+ "looping-example-v2.gif": {
+ "image/gif": ""
+ }
+ },
+ "cell_type": "markdown",
+ "id": "ec458d20",
+ "metadata": {},
+ "source": [
+ "### Explaining the previous Gremlin query\n",
+ "\n",
+ "Using the `repeat()` step, we told Gremlin to traverse all **outgoing** edges **2 times**. The graphic below demonstrates how Gremlin creates additional traversers when there are multiple outgoing edges to follow.\n",
+ "\n",
+ "![looping-example-v2.gif](attachment:looping-example-v2.gif)"
+ ]
+ },
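+ {
+ "cell_type": "markdown",
+ "id": "c91a6f3e",
+ "metadata": {},
+ "source": [
+ "For a small, fixed number of hops, the `repeat()` query above is equivalent to writing the steps out by hand. The sketch below is for comparison only; the `.by(elementMap())` and `.limit(10)` steps are carried over from that query:\n",
+ "\n",
+ "```\n",
+ "g.V()\n",
+ ".out()\n",
+ ".out()\n",
+ ".path()\n",
+ ".by(elementMap())\n",
+ ".limit(10)\n",
+ "```\n",
+ "\n",
+ "Both forms walk the same two-hop paths, although result order isn't guaranteed, so `limit(10)` may pick a different ten. The `repeat()` form becomes preferable as soon as the hop count grows or needs to change, since only the argument to `times()` has to be edited."
+ ]
+ },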
+ {
+ "cell_type": "markdown",
+ "id": "c2178818",
+ "metadata": {},
+ "source": [
+ "### Variable Number of Hops\n",
+ "\n",
+ "While the example above works on a static number of hops, sometimes we do not know the number of connections we need to traverse to answer a question. In this case, we can use the `until()` step to specify an additional pattern that will stop a traverser once the condition is met."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "df52d879",
+ "metadata": {},
+ "source": [
+ "**Note**. The performance of graph queries depends on how much of the graph needs to be traversed. It's important to have a graph data model that keeps fan-out to a minimum, so that large portions of your graph aren't traversed when they don't need to be.\n",
+ "\n",
+ "Execute the query below to see how many paths are connected via any number of `friends` edges. "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "651a0f21",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%gremlin -d $node_labels\n",
+ "g.V()\n",
+ ".hasLabel(\"person\")\n",
+ ".repeat(\n",
+ " bothE()\n",
+ " .hasLabel(\"friends\")\n",
+ " .otherV()\n",
+ " .hasLabel(\"person\")\n",
+ ")\n",
+ ".until(\n",
+ " not(out().hasLabel(\"person\"))\n",
+ ")\n",
+ ".path()\n",
+ ".by(elementMap())\n",
+ ".limit(10)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "81777d83",
+ "metadata": {},
+ "source": [
+ "We can also do the same using `outE()` and `inV()` steps, ensuring we're only traversing in one direction."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "64e01107",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%gremlin -d $node_labels\n",
+ "g.V()\n",
+ ".hasLabel(\"person\")\n",
+ ".repeat(\n",
+ " outE()\n",
+ " .hasLabel(\"friends\")\n",
+ " .inV()\n",
+ " .hasLabel(\"person\")\n",
+ ")\n",
+ ".until(\n",
+ " not(out().hasLabel(\"person\"))\n",
+ ")\n",
+ ".path()\n",
+ ".by(elementMap())\n",
+ ".limit(10)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "07ab8a08",
+ "metadata": {},
+ "source": [
+ "Now execute the following query to limit the number of times we repeat our loop along the `friends` edges:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "2b417d5e",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%gremlin -d $node_labels\n",
+ "g.V()\n",
+ ".hasLabel(\"person\")\n",
+ ".repeat(\n",
+ " bothE()\n",
+ " .hasLabel(\"friends\")\n",
+ " .otherV()\n",
+ " .hasLabel(\"person\")\n",
+ ")\n",
+ ".until(\n",
+ " loops().is(2)\n",
+ ")\n",
+ ".path()\n",
+ ".by(elementMap())\n",
+ ".limit(10)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "807eef5a",
+ "metadata": {},
+ "source": [
+ "There is a 'gotcha' when combining filtering using `has()` and looping using `until()` or `loops()`. \n",
+ "\n",
+ "As we saw in the `01-Basic-Read-Queries` notebook, `has()` provides the functionality to filter on the existence of a specific property, or match based on a property value. We can use this when looping through our graph to stop when we match the specified criteria. For example:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "154f043a",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%gremlin -d $node_labels\n",
+ "g.V()\n",
+ ".hasLabel(\"person\")\n",
+ ".repeat(\n",
+ " bothE()\n",
+ " .hasLabel(\"friends\")\n",
+ " .otherV()\n",
+ " .hasLabel(\"person\")\n",
+ ")\n",
+ ".until(\n",
+ "    or(has(\"first_name\",\"Dave\"),\n",
+ "       loops().is(2))\n",
+ ")\n",
+ ".path()\n",
+ ".by(elementMap())"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "8213e885",
+ "metadata": {},
+ "source": [
+ "What we're asking in the above query is:\n",
+ "\n",
+ "*\"traverse from every person node across the friends edge to another person node, and loop until the first_name property matches \"Dave\" or we've completed 2 iterations\"*\n",
+ "\n",
+ "However, it doesn't quite work in the way that we might expect. This is a common misconception. Whilst the `has()` condition in the `until()` step stops traversers that reach *Dave*, the `loops().is(2)` condition stops every remaining traverser after two iterations regardless of which node it is on, so the results include paths that don't end at *Dave*. To mitigate this, we need to apply the same `has()` filter to the output of the `until()` as follows:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "6b0ff391",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%gremlin -d $node_labels\n",
+ "g.V()\n",
+ ".hasLabel(\"person\")\n",
+ ".repeat(\n",
+ " bothE()\n",
+ " .hasLabel(\"friends\")\n",
+ " .otherV()\n",
+ " .hasLabel(\"person\")\n",
+ ")\n",
+ ".until(\n",
+ "    or(has(\"first_name\",\"Dave\"),\n",
+ "       loops().is(2))\n",
+ ")\n",
+ ".has(\"first_name\",\"Dave\")\n",
+ ".path()\n",
+ ".by(elementMap())"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "ad636a93",
+ "metadata": {},
+ "source": [
+ "### Cyclic Paths ###\n",
+ "\n",
+ "When repeating a traversal in Gremlin using `repeat()`, it's common to come across a pattern whereby the path loops back on itself. This is called a cyclic path, and can lead to your Gremlin queries looping forever.\n",
+ "\n",
+ "To stop this from occurring, it's good practice to include the `simplePath()` step. This removes paths with repeated objects, thus ensuring cyclic paths are not traversed.\n",
+ "\n",
+ "**Important**. `simplePath()` checks the whole path a traverser has taken so far, so place it inside the `repeat()`, after the step that moves to a new object, such as `in()` or `out()`.\n",
+ "\n",
+ "The following query provides an example of combining `simplePath()` with the `out()` step to filter on all connected `person` vertices.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "0e1c9425",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%gremlin -d $node_labels\n",
+ "g.V()\n",
+ ".hasLabel(\"person\")\n",
+ ".repeat(\n",
+ " out()\n",
+ " .hasLabel(\"person\")\n",
+ " .simplePath()\n",
+ ")\n",
+ ".until(\n",
+ " not(out().hasLabel(\"person\"))\n",
+ ")\n",
+ ".path()\n",
+ ".limit(10)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "974bbde5",
+ "metadata": {},
+ "source": [
+ "\n",
+ "### Visualising Results in a Neptune Notebook\n",
+ "\n",
+ "A key part of using any graph database is being able to visualise the way the objects stored within it are connected to each other. We've already shown how to do this in previous examples, however it's important to understand which of the Gremlin steps support this type of functionality.\n",
+ "\n",
+ "* `path()` - used to provide access to all nodes and edges within each unique path traversed\n",
+ "* `simplePath()` - used to ensure we don't repeat a traversal across an object we've already covered (this can lead to infinite looping if the model supports circular references)\n",
+ "\n",
+ "If you're running this in a Neptune Notebook, ending a query with the `path()` step tells the notebook to automatically present a visualisation of the query's output. The following query returns 10 paths visualising the connections between `person`, `city`, `restaurant` and `cuisine`. Run it, and a graphical visualisation will automatically appear."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "4a54b895",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%gremlin -d $node_labels\n",
+ "g.V()\n",
+ " .hasLabel(\"person\") // start with all person nodes\n",
+ " .out(\"lives\") // traverse the outbound \"lives\" edge to city\n",
+ " .in(\"within\") // traverse the inbound edge from city to restaurant\n",
+ " .where(__.inE(\"about\")) // filter on restaurants where at least one review exists\n",
+ " .out(\"serves\") // traverse the outbound edge from restaurant to cuisine\n",
+ " .path() // return the path\n",
+ " .limit(10) // only return 10 results"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "dcfe37a1",
+ "metadata": {},
+ "source": [
+ "Additionally, you can combine `path()` with the `by()` modulator along with the `valueMap()` or `values()` steps to return some or all of the non-internal property values stored against the objects within a path. The following query builds upon what we've already run, by returning all non-internal values as a map."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "859ca5de",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%gremlin -d $node_labels\n",
+ "g.V()\n",
+ " .hasLabel(\"person\") // start with all person nodes\n",
+ " .out(\"lives\") // traverse the outbound \"lives\" edge to city\n",
+ " .in(\"within\") // traverse the inbound edge from city to restaurant\n",
+ " .where(__.inE(\"about\")) // filter on restaurants where at least one review exists\n",
+ " .out(\"serves\") // traverse the outbound edge from restaurant to cuisine\n",
+ " .path() // return the path\n",
+ " .by(\n",
+ " valueMap() // return all the non-internal properties of all vertices within the path\n",
+ " )\n",
+ " .limit(10) // only return 10 results"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "ddc1ee5b",
+ "metadata": {},
+ "source": [
+ "The following query returns only a single property value by using the `values()` step instead of `valueMap()`."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "26bf25b5",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%gremlin -d $node_labels\n",
+ "g.V()\n",
+ " .hasLabel(\"person\") // start with all person nodes\n",
+ " .out(\"lives\") // traverse the outbound \"lives\" edge to city\n",
+ " .in(\"within\") // traverse the inbound edge from city to restaurant\n",
+ " .where(__.inE(\"about\")) // filter on restaurants where at least one review exists\n",
+ " .out(\"serves\") // traverse the outbound edge from restaurant to cuisine\n",
+ " .path() // return the path\n",
+ " .by(\n",
+ " values('first_name','name') // return only the first_name or name property value (whichever is applicable)\n",
+ " )\n",
+ " .limit(10) // only return 10 results"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "ec12aeca",
+ "metadata": {},
+ "source": [
+ "**Important**. It's worth noting that in the above query, specifying just `first_name` or just `name` will result in no records being returned. This is because neither property exists on **all** vertices in our data model. For example, the `person` vertex uses `first_name`, and all other vertices use `name` to store the name of the object. In this case, we can list both properties and Gremlin will use whichever one is present on each vertex.\n",
+ "\n",
+ "We will dive more into using `valueMap()` and `values()` in the next section."
+ ]
+ },
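+ {
+ "cell_type": "markdown",
+ "id": "d2085b7a",
+ "metadata": {},
+ "source": [
+ "An alternative that makes the fallback explicit is the `coalesce()` step, which returns the result of the first child traversal that produces a value. This is a sketch rather than part of the original walkthrough; it assumes, as stated above, that `person` vertices carry `first_name` and all other vertices in the path carry `name`:\n",
+ "\n",
+ "```\n",
+ "g.V()\n",
+ "  .hasLabel('person')\n",
+ "  .out('lives')\n",
+ "  .in('within')\n",
+ "  .where(__.inE('about'))\n",
+ "  .out('serves')\n",
+ "  .path()\n",
+ "  .by(coalesce(values('first_name'), values('name'))) // try first_name, then fall back to name\n",
+ "  .limit(10)\n",
+ "```\n",
+ "\n",
+ "The order of the arguments to `coalesce()` sets the priority, which can be useful if a vertex ever carries both properties."
+ ]
+ },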
+ {
+ "cell_type": "markdown",
+ "id": "85de2ce8",
+ "metadata": {},
+ "source": [
+ "## Exercises\n",
+ "\n",
+ "Now that we have gone through the basics of looping and repeating queries in Gremlin, it's time to put it into practice. Below are several exercises you can complete to verify your understanding of the material covered in this notebook. As practice for what you have learned, please write the Gremlin queries specified below.\n",
+ "\n",
+ "### Exercise 1: Find the friends of Dave's Friends using a loop\n",
+ "\n",
+ "Using the data model above, write a query that will:\n",
+ "\n",
+ "* Find a `person` node(s) with a `first_name` of \"Dave\"\n",
+ "* Find the friends of Dave (i.e. traverse the `friends` edge)\n",
+ "* Find the friends of that person (i.e. traverse the `friends` edge)\n",
+ "* Return the friends' `first_name`\n",
+ "\n",
+ "The correct answer is three results: \"Hank\", \"Denise\", \"Paras\""
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "3d1cc0f0",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%gremlin\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "6521e66f",
+ "metadata": {},
+ "source": [
+ "### Exercise 2: Find all `person` nodes connected to Dave\n",
+ "\n",
+ "Starting at a single node and trying to find all connected children (a.k.a. root to leaf) or trying to find the parent of any child node (a.k.a. leaf to root) are two very common hierarchical graph query patterns. Commonly, these queries support bill of materials, information organization, or compliance use cases.\n",
+ "\n",
+ "In this exercise, we will be applying that same query pattern to find the hierarchy of people within our social network. We'll accomplish this by writing a \"root to leaf\" type query where the root node is our `Dave` node in the social network.\n",
+ "\n",
+ "Using the data model above, write a query that will:\n",
+ "\n",
+ "* Find a `person` node(s) with a `first_name` of \"Dave\"\n",
+ "* Keep traversing the outgoing `friends` edge until there are no more outgoing `friends` edges\n",
+ "* Return all the paths\n",
+ "\n",
+ "The correct answer has 5 results"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "10b4aa1f",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%gremlin\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "0ce0b6c8",
+ "metadata": {},
+ "source": [
+ "### Exercise 3: Find all the ways Dave and Denise are connected\n",
+ "\n",
+ "A common extension to the path traversal query we wrote in the previous exercise is to return not just \"if\" someone is connected but \"how\" they are connected.\n",
+ "\n",
+ "In this exercise, we will be making a slight modification to the previous query to return \"how\" Dave and Denise are connected, not just that they are.\n",
+ "\n",
+ "Using the data model above, write a query that will:\n",
+ "\n",
+ "* Find a `person` node(s) with a `first_name` of \"Dave\"\n",
+ "* Find the friends of Dave (i.e. traverse the `friends` edge)\n",
+ "* Keep traversing the `friends` edge until you find `Denise`\n",
+ "* Return the path\n",
+ "\n",
+ "The correct answer has 3 results"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "007a9efd",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%gremlin\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "b5acefc5",
+ "metadata": {},
+ "source": [
+ "## Conclusion\n",
+ "\n",
+ "In this notebook, we explored writing looping and repeat queries in Gremlin. These queries are a powerful and common way to explore connected data to answer questions, especially those where the exact number of connections is unknown. \n",
+ "\n",
+ "In the next notebook we will take what we have learned in this notebook and extend it to demonstrate how to order, group, and aggregate values in queries."
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.10.8"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/src/graph_notebook/notebooks/06-Language-Tutorials/03-Gremlin/03-Ordering-Functions-Grouping.ipynb b/src/graph_notebook/notebooks/06-Language-Tutorials/03-Gremlin/03-Ordering-Functions-Grouping.ipynb
new file mode 100644
index 00000000..6083bf87
--- /dev/null
+++ b/src/graph_notebook/notebooks/06-Language-Tutorials/03-Gremlin/03-Ordering-Functions-Grouping.ipynb
@@ -0,0 +1,824 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "eab505f3",
+ "metadata": {},
+ "source": [
+ "Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.\n",
+ "SPDX-License-Identifier: Apache-2.0\n",
+ "\n",
+ "# Learning Gremlin - Ordering, Functions, and Grouping\n",
+ "\n",
+ "This notebook is the third in a series of notebooks that walk through how to write queries using Gremlin. \n",
+ "\n",
+ "In this notebook, we will examine the basics of how to perform ordering, grouping, and aggregation in Gremlin. This notebook assumes that you have already completed \"01-Basic-Read-Queries\", so we will continue from where the earlier lessons ended and assume that the data has been loaded into the cluster."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "0c12469c",
+ "metadata": {},
+ "source": [
+ "### Setting up the visualizations\n",
+ "\n",
+ "Run the next two cells to configure various display options for our notebook, which we will use later on to display our results in a pleasing visual way. "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "6e655017",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%graph_notebook_vis_options\n",
+ "{\n",
+ " \"groups\": { \n",
+ " \"person\": {\n",
+ " \"color\": \"#9ac7bf\"\n",
+ " },\n",
+ " \"review\": {\n",
+ " \"color\": \"#f8cecc\"\n",
+ " },\n",
+ " \"city\": {\n",
+ " \"color\": \"#d5e8d4\"\n",
+ " },\n",
+ " \"state\": {\n",
+ " \"color\": \"#dae8fc\"\n",
+ " },\n",
+ " \"review_rating\": {\n",
+ " \"color\": \"#e1d5e7\"\n",
+ " },\n",
+ " \"restaurant\": {\n",
+ " \"color\": \"#ffe6cc\"\n",
+ " },\n",
+ " \"cuisine\": {\n",
+ " \"color\": \"#fff2cc\"\n",
+ " }\n",
+ " }\n",
+ "}"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "d5c80800",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "node_labels = '{\"person\":\"first_name\",\"city\":\"name\",\"state\":\"name\",\"restaurant\":\"name\",\"cuisine\":\"name\"}'"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "111cfabb",
+ "metadata": {},
+ "source": [
+ "We'll be using the `node_labels` variable to provide a nicer visualisation when running the queries in this notebook. To use it, we need to pass it along with the query itself, as follows:\n",
+ "\n",
+ "`%%gremlin -d $node_labels`\n",
+ "\n",
+ "The `-d` option tells the notebook which property to display for each specified node label."
+ ]
+ },
+ {
+ "attachments": {
+ "dining-by-friends.png": {
+ "image/png": ""
+ }
+ },
+ "cell_type": "markdown",
+ "id": "48e9a85f",
+ "metadata": {},
+ "source": [
+ "### Looking at our graph data\n",
+ "\n",
+ "We examined the data model in an earlier notebook, so we will not walk through it again here. The data schema is included below for reference.\n",
+ "\n",
+ "![dining-by-friends.png](attachment:dining-by-friends.png)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "ab986dac",
+ "metadata": {},
+ "source": [
+ "\n",
+ "## Ordering Results\n",
+ "\n",
+ "When working with data, one common requirement is to return that data in a consistent and ordered fashion. \n",
+ "\n",
+ "By default, data returned from a Gremlin query does not have a specified order, and a consistent order cannot be assumed across multiple executions of the same query. To give our data a consistent order we must use the combination of the `order()` and `by()` steps. These enable you to sort your results using the values that a query can return, such as nodes/edges and ID values, as well as many expressions. \n",
+ "\n",
+ "**Note** When the data being ordered contains a `null` value, these will be sorted to the end of the results for ascending sort order and the beginning of the list for descending sort order.\n",
+ "\n",
+ "\n",
+ "### Ordering by a property\n",
+ "\n",
+ "The simplest ordering in Gremlin is to specify a single property. This is accomplished using the `order().by()` syntax. By default, items are sorted in ascending order; descending order can be specified using `order().by(<property>, desc)`. \n",
+ "\n",
+ "Let's first look at our unordered data by finding all the `restaurant` nodes in our graph and returning the `name` property."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "f7a0df81",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%gremlin\n",
+ "g.V()\n",
+ ".hasLabel('restaurant')\n",
+ ".values('name')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "91e99c64",
+ "metadata": {},
+ "source": [
+ "As we see, there is no discernible order to the values returned. \n",
+ "\n",
+ "Let's see how to order our data by executing the query below to find all the `restaurant` nodes in our graph and order them by the `name` property in descending order."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "deabe58e",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%gremlin\n",
+ "g.V()\n",
+ ".hasLabel('restaurant')\n",
+ ".order()\n",
+ ".by('name', desc)\n",
+ ".values('name')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "4eee92c8",
+ "metadata": {},
+ "source": [
+ "As we see, with the addition of the `order().by()` steps we get our data returned in a nice organized manner. \n",
+ "\n",
+ "### Ordering by multiple properties\n",
+ "\n",
+ "A common need when ordering data is to use multiple properties as the ordering criteria. In Gremlin, this is achieved by adding multiple `by()` steps to the `order()` step. When multiple properties are specified, the results are first ordered by the first property, then for equal values, the next property, and so on for all the specified properties. \n",
+ "\n",
+ "Let's see how this works by executing the query below to find all the `restaurant` nodes in our graph and order them by the `name` property, then by the `address` property."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "168bcc76",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%gremlin\n",
+ "g.V()\n",
+ ".hasLabel('restaurant')\n",
+ ".order()\n",
+ ".by('name',desc)\n",
+ ".by('address',asc)\n",
+ ".valueMap('name','address')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "6d36e1dd",
+ "metadata": {},
+ "source": [
+ "You can also use internal properties such as `id` and `label` as the ordering criteria. In the examples below, we first show how to order by the object `id`, and then show how to order by the object `label`."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "55f2aaf9",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%gremlin\n",
+ "g.V()\n",
+ ".hasLabel(\"restaurant\")\n",
+ ".order().by(id)\n",
+ ".values('name')"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "c63b78f6",
+ "metadata": {
+ "scrolled": true
+ },
+ "outputs": [],
+ "source": [
+ "%%gremlin\n",
+ "g.V()\n",
+ ".hasLabel(\"restaurant\",\"state\")\n",
+ ".order().by(label, desc)\n",
+ ".values('name')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "55c2ef57",
+ "metadata": {},
+ "source": [
+ "**Important**: Ordering a small set of data (<1M records) in Neptune should be performant. However, when ordering larger data sets (>1M records), you are likely to experience high latency. In this scenario, the recommendation is to query for the results, then use a caching layer such as [Redis Sorted Sets](https://docs.aws.amazon.com/AmazonElastiCache/latest/red-ug/elasticache-use-cases.html#elasticache-for-redis-use-cases-gaming) to perform the ordering and return the data back to the client."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "9b78432a",
+ "metadata": {},
+ "source": [
+ "### Pagination\n",
+ "\n",
+ "One of the most common requirements for applications is the ability to return the data in chunks, or pages, in the response. Gremlin supports pagination through the use of three steps: `range()`, `skip()`, and `limit()`. \n",
+ "\n",
+ "We have already used the `limit()` step to specify the maximum number of entities returned. When used with the `skip()` step, which specifies the number of records to ignore at the beginning of the result set, we can create an effective pagination mechanism. One important thing to note about pagination is that we need to explicitly order the results to retrieve a consistent set of data in our pages. Without ordering the results, we have no guarantee that results will be returned in a constant order, meaning that the data shown for a specific \"page\" may differ between calls.\n",
+ "\n",
+ "Let's take a look at how we could use `skip()` and `limit()` to present a paginated view of the restaurants in our graph by retrieving the first page of results."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "0335cb50",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%gremlin\n",
+ "g.V()\n",
+ ".hasLabel(\"restaurant\")\n",
+ ".order().by(\"name\")\n",
+ ".skip(0)\n",
+ ".limit(10)\n",
+ ".values(\"name\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "70d12ad6",
+ "metadata": {},
+ "source": [
+ "Let's see what it looks like to retrieve the second page of data. To accomplish this, we set the value in the `skip()` step to the number of records to skip, which here is one page of 10 results."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "7371ac06",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%gremlin\n",
+ "g.V()\n",
+ ".hasLabel(\"restaurant\")\n",
+ ".order().by(\"name\")\n",
+ ".skip(10)\n",
+ ".limit(10)\n",
+ ".values(\"name\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "978fff4c",
+ "metadata": {},
+ "source": [
+ "As we see, the data we retrieve from the second query represents the second page of results returned from our query. Feel free to try other values in the `skip()` and `limit()` steps to see how the results change.\n",
+ "\n",
+ "Let's now take a look at how we can use the `range()` step to perform pagination in Gremlin."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "6cd418c3",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%gremlin\n",
+ "g.V()\n",
+ ".hasLabel(\"restaurant\")\n",
+ ".order().by(\"name\")\n",
+ ".range(0,10)\n",
+ ".values(\"name\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "48c15eac",
+ "metadata": {},
+ "source": [
+ "Let's see what it looks like to retrieve the second page of data. To accomplish this, we set the `range()` step to the start (inclusive) and end (exclusive) positions of the page we would like to return, here `range(10,20)`."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "b34b445a",
+ "metadata": {
+ "scrolled": false
+ },
+ "outputs": [],
+ "source": [
+ "%%gremlin\n",
+ "g.V()\n",
+ ".hasLabel(\"restaurant\")\n",
+ ".order().by(\"name\")\n",
+ ".range(10,20)\n",
+ ".values(\"name\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "a0c9db04",
+ "metadata": {},
+ "source": [
+ "When should you use one approach over the other? In some cases, both `limit()` and `skip()` are shorthand alternatives to `range()`. For example, to retrieve all restaurants from the 6th record to the end of the list, we can use either of the following approaches:\n",
+ "\n",
+ "`g.V().hasLabel(\"restaurant\").skip(5)`\n",
+ "\n",
+ "or\n",
+ "\n",
+ "`g.V().hasLabel(\"restaurant\").range(5,-1)`\n",
+ "\n",
+ "The results of both queries are the same, as is the amount of time to execute each query.\n",
+ "\n",
+ "**Important**: When using pagination it's important to understand that even though you're returning a range of records, the query must still retrieve all records and then filter out those outside of the range you're asking for. Therefore, pagination does not improve query performance. To speed up repeated requests for pages, you should instead use the [Gremlin query results cache](https://docs.aws.amazon.com/neptune/latest/userguide/gremlin-results-cache.html).\n",
+ "\n",
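+ "As a minimal sketch of how the two approaches relate (assuming 1-based page numbers and a page size of 10, neither of which is prescribed by Gremlin itself), page `n` starts at offset `(n - 1) * 10`. The third page could therefore be requested with either of the following queries, which should return the same records as long as the same ordering is applied:\n",
+ "\n",
+ "```\n",
+ "g.V().hasLabel('restaurant').order().by('name').skip(20).limit(10).values('name')\n",
+ "\n",
+ "g.V().hasLabel('restaurant').order().by('name').range(20,30).values('name')\n",
+ "```\n",
+ "\n",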
+ "Now that we have looked at the ordering and pagination in Gremlin, it's time to take a look at another major set of functionality in formatting Gremlin results, grouping."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "952c7825",
+ "metadata": {},
+ "source": [
+ "## Grouping Results\n",
+ "\n",
+ "Grouping results in Gremlin is done by explicitly calling the `group()` or `groupCount()` steps, similar to the `GROUP BY` clause used in SQL. In Gremlin, grouping is also controlled by the use of aggregating expressions containing one or more aggregating functions (`mean()`, `count()`, `max()`, `min()`, `sum()`).\n",
+ "\n",
+ "Groups are determined by the use of the `by` modulator step that follows either a `group()` or `groupCount()` step. Let's look at an example to understand how this works.\n",
+ "\n",
+ "**Example**\n",
+ "\n",
+ "|id|first_name|\n",
+ "|---|---|\n",
+ "|1|Dave|\n",
+ "|2|Josh|\n",
+ "|3|Kelly|\n",
+ "|4|Dave|\n",
+ "\n",
+ "```\n",
+ "g.V()\n",
+ ".groupCount()\n",
+ ".by('first_name')\n",
+ ".unfold()\n",
+ "```\n",
+ "Results:\n",
+ "\n",
+ "|result|\n",
+ "|---|\n",
+ "|{'Dave': 2}|\n",
+ "|{'Josh': 1}|\n",
+ "|{'Kelly': 1}|\n",
+ "\n",
+ "In this example, we're counting the number of occurrences of each `first_name` property value, and then returning that count alongside the `first_name` value itself. Similarly, we could use the `group()` step instead, and combine it with the `count()` aggregation function in a second `by` modulator to do the same thing:\n",
+ "\n",
+ "```\n",
+ "g.V()\n",
+ ".group()\n",
+ ".by('first_name')\n",
+ ".by(count())\n",
+ ".unfold()\n",
+ "```\n",
+ "Results:\n",
+ "\n",
+ "|result|\n",
+ "|---|\n",
+ "|{'Dave': 2}|\n",
+ "|{'Josh': 1}|\n",
+ "|{'Kelly': 1}|"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "57de99aa",
+ "metadata": {},
+ "source": [
+ "Before we start on grouping, let's learn about `fold()` and `unfold()` steps, as no doubt the keen eyed amongst you will have noticed that we've been using the `unfold()` step in some of the previous examples.\n",
+ "\n",
+ "### Fold and Unfold\n",
+ "\n",
+ "The `fold()` step converts individual rows into a list in a single row. The following query is an example of returning all restaurants as a list, rather than individual records."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "dcd6555c",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%gremlin\n",
+ "g.V().hasLabel('restaurant')\n",
+ ".values('name')\n",
+ ".fold()\n",
+ "\n",
+ "// Results in:\n",
+ "// [\"Perryman's\", 'Spicy Heat', 'Rare Choice', 'Super Delish', 'Eastern Winds', 'Saucy-Cheesy-Saucers', 'With Pasta', 'With Brine', 'With Wine', 'U-S-A', 'Pick & Go', 'Rare Bull', 'Satiated', 'Good Bull', 'Southern Fire', 'With Salsa', 'With Curry', 'With Shell', 'Taters', 'Awesome Suace', 'Prancing Pony', 'Mexican Hut', 'Rabbitfood', 'Hand Roll', 'Northern Quench', 'Western Granola', 'With Noodles', 'With Sauce', 'Without Chaser', 'With Rice', 'Food For Thought', \"Dave's Big Deluxe\", 'Quick N Greasy', 'Lonely Grape', 'Breaded & Fried', 'All Night Long', 'Black Pit of Des Pair', 'Without Heat', 'With Ginger', 'Fat Fried Fast']"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "507d044c",
+ "metadata": {},
+ "source": [
+ "The `unfold()` step does the opposite, converting a single row of list values into individual rows."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "9db81493",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%gremlin\n",
+ "g.V().hasLabel('restaurant')\n",
+ ".values('name')\n",
+ ".fold()\n",
+ ".unfold()\n",
+ "\n",
+ "// Results in:\n",
+ "//1 Perryman's\n",
+ "//2 Spicy Heat\n",
+ "//3 Rare Choice\n",
+ "//4 Super Delish\n",
+ "//5 Eastern Winds\n",
+ "//6 Saucy-Cheesy-Saucers\n",
+ "//7 With Pasta\n",
+ "//8 With Brine\n",
+ "//9 With Wine\n",
+ "//10 U-S-A"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "821a71e4",
+ "metadata": {},
+ "source": [
+ "### Group by a property\n",
+ "\n",
+ "The query below returns each `first_name` value found on the `person` nodes along with the number of nodes that have it, ordered by the `first_name` value in **ascending** order."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "f66d367b",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%gremlin -d $node_labels\n",
+ "g.V().hasLabel('person')\n",
+ ".groupCount()\n",
+ ".by('first_name')\n",
+ ".unfold()\n",
+ ".order()\n",
+ ".by(keys)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "2c92cf2e",
+ "metadata": {},
+ "source": [
+ "### Group on a pattern match\n",
+ " \n",
+ "Another common need is to use multiple different elements in a pattern to perform a grouping/aggregation query. To accomplish this, we combine what we know about traversing and filtering with what we have just learned about grouping.\n",
+ "\n",
+ "Let's take a look at what it would look like to find the average rating of the restaurants in our graph."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "f646e881",
+ "metadata": {
+ "scrolled": true
+ },
+ "outputs": [],
+ "source": [
+ "%%gremlin\n",
+ "g.V().hasLabel('restaurant')\n",
+ ".group()\n",
+ ".by('name')\n",
+ ".by(in('about').values('rating').mean())\n",
+ ".unfold()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "60b2eab1",
+ "metadata": {},
+ "source": [
+ "## Combining Queries\n",
+ "\n",
+ "Now that we have learned about all the major features (finding, filtering, formatting, ordering, functions, and grouping) of Gremlin, we have one more topic to discuss in this notebook, how to combine traversals together to create more complex traversals. In Gremlin, we can achieve this by using the `union()` step.\n",
+ "\n",
+ "The `union()` step runs two or more traversals and returns the combined results from all of them.\n",
+ "\n",
+ "Let's see what an example `union` step looks like:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "190af502",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%gremlin\n",
+ "g.V()\n",
+ ".hasLabel('restaurant')\n",
+ ".union(\n",
+ " has('name','With Pasta'),\n",
+ " has('name','With Wine')\n",
+ ")\n",
+ ".values('name')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "89545b7b",
+ "metadata": {},
+ "source": [
+ "We can also combine objects from previous traversals into a `union` step using aliases."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "6328a541",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%gremlin\n",
+ "g.V()\n",
+ ".hasLabel('restaurant').has('name','With Pasta').as('a')\n",
+ ".union(\n",
+ " select('a'),\n",
+ " __.V().hasLabel('restaurant').has('name','With Wine')\n",
+ ")\n",
+ ".values('name')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "e31b0932",
+ "metadata": {},
+ "source": [
+ "An important point to note is that `union` works in the same way as other Gremlin steps in that it uses the *incoming traversal* as its starting point. \n",
+ "\n",
+ "The following query traverses to the `With Pasta` restaurant and then uses a `union` step to try to combine that result with a traversal to the `With Wine` restaurant. Let's take a look at the results."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "9b049f90",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%gremlin\n",
+ "g.V()\n",
+ ".hasLabel('restaurant').has('name','With Pasta').as('a')\n",
+ ".union(\n",
+ " select('a'),\n",
+ " hasLabel('restaurant').has('name','With Wine')\n",
+ ")\n",
+ ".values('name')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "f0c9a4e2",
+ "metadata": {},
+ "source": [
+ "Only one result was returned! This is because the second traversal in the `union` step *starts* from the **With Pasta** restaurant, and looks for a `restaurant` node with a `name` property value of **With Wine**. To resolve this, we need to start the second traversal with `__.V()`, as in the earlier example. This uses the `__` **anonymous traversal** technique to search the entire graph, irrespective of the previous traversal."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "d29116ed",
+ "metadata": {},
+ "source": [
+ "## Optimizing Query Performance using Caching\n",
+ "\n",
+ "**Note:** Whilst not wholly specific to Gremlin, this section is useful to understand the different approaches to optimizing your query performance within Neptune.\n",
+ "\n",
+ "Amazon Neptune is a fully managed, memory-optimized graph database. As a result, it will try to store as much of your graph in the local memory of the instance that is executing the query in `Buffer Cache` - a specific local instance cache type - for fast query performance. When the required data is not in the buffer cache, Neptune must retrieve it from shared storage before adding it to the buffer cache, which adds to the latency of your queries.\n",
+ "\n",
+ "There are several caching mechanisms supported by Neptune:\n",
+ "\n",
+ "* `Buffer Cache` - This is an always-on caching technique, whereby Neptune allocates two-thirds of the memory of your instance for storing requested data. It works on a `FIFO` (First-In-First-Out) basis, meaning older cached data pages are removed first. The ratio of buffer 'hits' (queries that retrieve data from memory rather than shared storage) should always be >= 99.9%. You can monitor this using the [`BufferCacheHitRatio` CloudWatch metric](https://docs.aws.amazon.com/neptune/latest/userguide/cw-metrics.html).\n",
+ "\n",
+ "* `Results Cache` - (Gremlin only) This provides a mechanism to cache the results from a specific query on a per-instance basis. It is [disabled by default](https://docs.aws.amazon.com/neptune/latest/userguide/gremlin-results-cache.html#gremlin-results-cache-enabling), and works on an `LRU` (Least Recently Used) basis, meaning the least recently used cached results are removed first. See [Paginated cached query results](https://docs.aws.amazon.com/neptune/latest/userguide/gremlin-results-cache.html#gremlin-results-cache-paginating) for an example of combining the results cache with the `range()` step. Note that this cannot be enabled on T instance types.\n",
+ "\n",
+ "* `Lookup Cache` - This is an always-on caching technique, but is **only available for D instances, e.g. R5d, and not Serverless**. It uses local instance SSD storage to store property values (strings) or RDF literals for fast retrieval. This can be useful when frequently returning or filtering on a large number of property values.\n",
+ "\n",
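+ "As an illustrative sketch, and assuming the results cache has already been enabled on the instance, a paginated query can opt in to the cache with the `Neptune#enableResultCache` query hint described in the Neptune documentation linked above. The restaurant query here is only an example reused from earlier in this notebook:\n",
+ "\n",
+ "```\n",
+ "g.with('Neptune#enableResultCache', true)\n",
+ ".V().hasLabel('restaurant')\n",
+ ".order().by('name')\n",
+ ".values('name')\n",
+ ".range(0,10)\n",
+ "```\n",
+ "\n",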
+ "As a general rule, you should look to optimise your queries by filtering on, and returning, only the properties that you need. \n",
+ "\n",
+ "In addition, monitoring your cluster and instance health using [CloudWatch metrics](https://docs.aws.amazon.com/neptune/latest/userguide/cw-metrics.html) can alert you to causes for query performance degradation.\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "f5d352e5",
+ "metadata": {},
+ "source": [
+ "## Exercises\n",
+ "\n",
+ "Now that we have gone through the main concepts of Gremlin read queries, it's time to put it into practice. Below are several exercises you can complete to verify your understanding of the material covered in this notebook. As practice for what you have learned, please write the Gremlin queries specified below.\n",
+ "\n",
+ "For these exercises, we will be leveraging the majority of the different entities in our data to show how we would build a common graph pattern known as \"collaborative filtering\", which is often used to provide recommendations to users based on others' reviews. Collaborative filtering works on the idea that if two people share the same opinion on a topic, such as a restaurant, then they are more likely to share similar opinions on other topics. With a graph we can leverage these connections to help provide recommendations based on these patterns of connections. In these exercises, we will be recommending restaurants to our users based upon reviews.\n",
+ "\n",
+ "\n",
+ "### Exercise 1: What are the 3 highest rated restaurants?\n",
+ "\n",
+ "Using the data model above, write a query that will:\n",
+ "\n",
+ "* Find the 3 highest average restaurant ratings\n",
+ "* Find the associated `cuisine`\n",
+ "* Return the restaurant name, the cuisine name, and the average rating\n",
+ "* Order the results by average rating descending\n",
+ "\n",
+ "The results for this query are:\n",
+ "\n",
+ "|Restaurant name|Cuisine|Avg Rating|\n",
+ "|---|---|---|\n",
+ "|Lonely Grape|bar|5.0|\n",
+ "|Perryman's|bar|4.5|\n",
+ "|Rare Bull|steakhouse|4.333333|\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "36adacaf",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%gremlin"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "04ab5b59",
+ "metadata": {},
+ "source": [
+ "### Exercise 2: Find the top 3 highest rated restaurants in the city where Dave lives\n",
+ "\n",
+ "Using the data model above, write a query that will:\n",
+ "\n",
+ "* Find a `person` node(s) with a `first_name` of \"Dave\"\n",
+ "* Find the `city` that Dave lives in\n",
+ "* Find the average rating of restaurants in that city\n",
+ "* Find the top 3 average ratings\n",
+ "* Return the restaurant name, address, and average rating\n",
+ "* Order by the average rating descending\n",
+ "\n",
+ "The results for this query are:\n",
+ "\n",
+ "|Restaurant name|Address|Avg Rating|\n",
+ "|---|---|---|\n",
+ "|Dave's Big Deluxe|490 Ivan Cape|4.0|\n",
+ "|Pick & Go|4881 Upton Falls|3.75|\n",
+ "|Without Chaser|01511 Casper Fall|3.5|"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "dae9d211",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%gremlin"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "678a243d",
+ "metadata": {},
+ "source": [
+ "### Exercise 3: Which Mexican or Chinese restaurant near Dave is the highest rated?\n",
+ "\n",
+ "Using the data model above, write a query that will:\n",
+ "\n",
+ "* Find a `person` node(s) with a `first_name` of \"Dave\"\n",
+ "* Find the `city` that Dave lives in\n",
+ "* Find the restaurants in that city that serve 'Mexican' or 'Chinese' food\n",
+ "* Find the average rating of those restaurants\n",
+ "* Return the restaurant name, address, and average rating\n",
+ "* Order by the average rating descending\n",
+ "* Return the top 1 result\n",
+ "\n",
+ "The results for this query are:\n",
+ "\n",
+ "|Restaurant name|Address|Avg Rating|\n",
+ "|---|---|---|\n",
+ "|With Salsa|24320 Williamson Causeway|3.5|"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "f96b91e5",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%gremlin\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "e93f266d",
+ "metadata": {},
+ "source": [
+ "### Exercise 4: What are the top 3 restaurants, recommended by his friends, where Dave lives? (Personalized Recommendation)\n",
+ "\n",
+ "Using the data model above, write a query that will:\n",
+ "\n",
+ "* Find a `person` node(s) with a `first_name` of \"Dave\"\n",
+ "* Find the `city` that Dave lives in\n",
+ "* Find Dave's friends\n",
+ "* Find reviews written by Dave's friends in the city \"Dave\" lives in\n",
+ "* Find the average rating of those restaurants\n",
+ "* Return the restaurant name, address, and average rating\n",
+ "* Order by the average rating descending\n",
+ "* Return the top 3\n",
+ "\n",
+ "The results for this query are:\n",
+ "\n",
+ "|Restaurant name|Address|Avg Rating|\n",
+ "|---|---|---|\n",
+ "|Dave's Big Deluxe|490 Ivan Cape|4.0|\n",
+ "|With Salsa|24320 Williamson Causeway|4.0|\n",
+ "|Satiated|370 Hills Estates|3.666667|"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "81e5d4bc",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%gremlin"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "b5acefc5",
+ "metadata": {},
+ "source": [
+ "## Conclusion\n",
+ "\n",
+ "In this notebook, we explored ordering, functions, and grouping in Gremlin queries. These queries are a powerful and common way to format and mutate data within your graph. This is also the last notebook in the set dedicated to writing read queries. In the next notebook we will take a look at how to write queries that mutate data through insert, update, and delete operations."
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.10.8"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/src/graph_notebook/notebooks/06-Language-Tutorials/03-Gremlin/04-Creating-Updating-Deleting-Queries.ipynb b/src/graph_notebook/notebooks/06-Language-Tutorials/03-Gremlin/04-Creating-Updating-Deleting-Queries.ipynb
new file mode 100644
index 00000000..6bf47551
--- /dev/null
+++ b/src/graph_notebook/notebooks/06-Language-Tutorials/03-Gremlin/04-Creating-Updating-Deleting-Queries.ipynb
@@ -0,0 +1,824 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "eab505f3",
+ "metadata": {},
+ "source": [
+ "Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.\n",
+ "SPDX-License-Identifier: Apache-2.0\n",
+ "\n",
+ "# Learning Gremlin - Create, Update, and Delete Queries\n",
+ "\n",
+ "This notebook is the fourth in a series of notebooks that walk through how to write queries using Gremlin. In this notebook, we will examine the basics of how to perform mutation operations (create, update, and delete) in Gremlin. This notebook assumes that you have already completed \"01-Basic-Read-Queries\", so we will continue from where the earlier lessons ended and assume that the data has been loaded into the cluster."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "0c12469c",
+ "metadata": {},
+ "source": [
+ "### Setting up the visualizations\n",
+ "\n",
+ "Run the next two cells to configure various display options for our notebook, which we will use later on to display our results in a pleasing visual way. "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "6e655017",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%graph_notebook_vis_options\n",
+ "{\n",
+ " \"groups\": { \n",
+ " \"person\": {\n",
+ " \"color\": \"#9ac7bf\"\n",
+ " },\n",
+ " \"review\": {\n",
+ " \"color\": \"#f8cecc\"\n",
+ " },\n",
+ " \"city\": {\n",
+ " \"color\": \"#d5e8d4\"\n",
+ " },\n",
+ " \"state\": {\n",
+ " \"color\": \"#dae8fc\"\n",
+ " },\n",
+ " \"review_rating\": {\n",
+ " \"color\": \"#e1d5e7\"\n",
+ " },\n",
+ " \"restaurant\": {\n",
+ " \"color\": \"#ffe6cc\"\n",
+ " },\n",
+ " \"cuisine\": {\n",
+ " \"color\": \"#fff2cc\"\n",
+ " }\n",
+ " }\n",
+ "}"
+ ]
+ },
+ {
+ "attachments": {
+ "dining-by-friends.png": {
+ "image/png": ""
+ }
+ },
+ "cell_type": "markdown",
+ "id": "48e9a85f",
+ "metadata": {},
+ "source": [
+ "### Looking at our graph data\n",
+ "\n",
+ "We examined the data model in an earlier notebook, so we will not walk through it again here. The data schema is included below for reference.\n",
+ "\n",
+ "![dining-by-friends.png](attachment:dining-by-friends.png)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "ab986dac",
+ "metadata": {},
+ "source": [
+ "\n",
+ "## Creating Data\n",
+ "\n",
+ "When working with any database, one of the most common tasks is adding new data. To add new nodes in Gremlin we use the [`addV()`](https://tinkerpop.apache.org/docs/current/reference/#addvertex-step) step; edges, and therefore paths, are added with the `addE()` step. \n",
+ "\n",
+ "\n",
+ "### Creating a node with a label and properties\n",
+ "The simplest way to create a node in Gremlin is with a query similar to this:\n",
+ "\n",
+ "```\n",
+ "g.addV()\n",
+ "```\n",
+ "This query will create a node with a default label (`vertex`) and no properties. If we wanted to return the newly created element, we could do so by adding a [`next()`](https://tinkerpop.apache.org/docs/current/reference/#terminal-steps) step, as shown here:\n",
+ "\n",
+ "```\n",
+ "g.addV().next()\n",
+ "```\n",
+ "\n",
+ "We can also create multiple elements in a single traversal by chaining `addV()` steps, as seen here:\n",
+ "\n",
+ "```\n",
+ "g.addV().addV().next()\n",
+ "```\n",
+ "\n",
+ "While these examples help in understanding the basic syntax, they are not very realistic. In most scenarios you will not want to add a bare node. Instead, you will want to add a node with a specific label and associated properties.\n",
+ "\n",
+ "Let's look at a query that creates a new `person` node with a first name of `John` and a last name of `Doe`."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "f7a0df81",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%gremlin\n",
+ "g.addV('person').property('first_name','John').property('last_name','Doe').next()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "8364e41a",
+ "metadata": {},
+ "source": [
+ "In the example above, the first and last name properties were added by calling the `property()` step after the node is created."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "67c4a339",
+ "metadata": {},
+ "source": [
+ "### Creating multiple elements\n",
+ "\n",
+ "As previously mentioned, you can chain `addV()` steps together to create multiple records in the same statement. This is shown in the query below:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "cc64a6da",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%gremlin\n",
+ "g.addV('person').property('first_name','Fred').property('last_name','Doe')\n",
+ ".addV('person').property('first_name','Jane').property('last_name','Doe')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "91e99c64",
+ "metadata": {},
+ "source": [
+ "### Creating edges\n",
+ "\n",
+ "Another common task is to create edges between nodes. To create edges, we use the [`addE()`](https://tinkerpop.apache.org/docs/current/reference/#addedge-step) step, to which we supply the label we'd like to use, for example `friends`.\n",
+ "\n",
+ "```\n",
+ "g.addE('friends')\n",
+ "```\n",
+ "\n",
+ "As part of the edge creation process, we must also supply the `from` and `to` nodes that the edge will connect. We do this by providing a traversal for each. In the query below, we find the nodes we created above for `John Doe` and `Jane Doe` and connect them with a `friends` edge."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "deabe58e",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%gremlin\n",
+ "g.addE('friends')\n",
+ ".from(__.V().hasLabel('person').has('first_name','John').has('last_name','Doe'))\n",
+ ".to(__.V().hasLabel('person').has('first_name','Jane').has('last_name','Doe'))\n",
+ ".next()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "6d36e1dd",
+ "metadata": {},
+ "source": [
+ "### Creating paths\n",
+ "The last major task when adding data to a graph is creating an entire path, containing both the nodes and the edges that connect them. Using what we have already learned, we can accomplish this with a query like this:\n",
+ "\n",
+ "```\n",
+ "g.addV('person').property('first_name','Jim').property('last_name','Doe')\n",
+ ".addV('person').property('first_name','Joe').property('last_name','Doe')\n",
+ ".addE('friends')\n",
+ " .from(__.V().hasLabel('person').has('first_name','Jim').has('last_name','Doe'))\n",
+ " .to(__.V().hasLabel('person').has('first_name','Joe').has('last_name','Doe'))\n",
+ "```\n",
+ "\n",
+ "You will no doubt have noticed that we used `__` in the edge creation process to locate the `from` and `to` nodes. The `__` prefix starts an **anonymous traversal**: a traversal that is not started from `g` but is instead evaluated as a child of the main traversal, in this case to find the node(s) matching the specified filter.\n",
+ "\n",
+ "**Note:** The `__` prefix is not required if you're running this type of query within a notebook using the `%%gremlin` magic."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "1d716202",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%gremlin\n",
+ "g.addV('person').property('first_name','Jim').property('last_name','Doe')\n",
+ ".addV('person').property('first_name','Joe').property('last_name','Doe')\n",
+ ".addE('friends')\n",
+ " .from(__.V().hasLabel('person').has('first_name','Jim').has('last_name','Doe'))\n",
+ " .to(__.V().hasLabel('person').has('first_name','Joe').has('last_name','Doe'))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "e0d3236d",
+ "metadata": {},
+ "source": [
+ "Another way of performing the above is to use aliases, which are labels assigned with the `as()` step. This can save us time and effort (and potentially costly typos), because we no longer need to re-type the node filters inside the `from()` and `to()` steps."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "5774c284",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%gremlin\n",
+ "g.addV('person').property('first_name','Laura').property('last_name','Kirk').as('a') //alias our first person as 'a'\n",
+ ".addV('person').property('first_name','Peter').property('last_name','Jackson').as('b') //alias our second person as 'b'\n",
+ ".addE('friends')\n",
+ " .from('a') //add the edge from 'a'\n",
+ " .to('b') //to 'b'"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "9b78432a",
+ "metadata": {},
+ "source": [
+ "## Updating Data\n",
+ "\n",
+ "After creating data, the next most common task is to update data within the graph. Lucky for us, we have already learned the building blocks we need to know to accomplish this task. In Gremlin, we can use the same principles that we used when creating and filtering on objects. In the example below, let's update the `first_name` of the `Joe Doe` node we created in the previous step."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "0335cb50",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%gremlin\n",
+ "g.V()\n",
+ "//filter for the Joe Doe node\n",
+ ".hasLabel('person').has('first_name','Joe').has('last_name','Doe')\n",
+ "//update it using the property() step\n",
+ ".property('first_name','Joseph')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "b4116def",
+ "metadata": {},
+ "source": [
+ "Now let's run the following query to take a look at our `person` nodes with a `last_name` of `Doe`."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "8e088afc",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%gremlin\n",
+ "g.V()\n",
+ ".hasLabel('person').has('last_name','Doe')\n",
+ ".valueMap('first_name')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "58f5dfe0",
+ "metadata": {},
+ "source": [
+ "Among the results of the query above, you should see the following entries:\n",
+ "\n",
+ "```\n",
+ "{'first_name': ['Jim']}\n",
+ "{'first_name': ['Joe', 'Joseph']}\n",
+ "```\n",
+ "\n",
+ "Hang on, that's not what we meant to do! We wanted to update the `first_name` property to `Joseph`, not append the new value to the existing one. Why has this happened? It's because of something called property cardinality, which we'll discuss below."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "70d12ad6",
+ "metadata": {},
+ "source": [
+ "### Single valued properties and multi-valued properties\n",
+ "\n",
+ "Gremlin supports both `Single` and `Set` properties. Single properties are those that will only have one value at any given time, for example `age` or `first_name`. Set properties are those that can have multiple values, for example `favourite_sports`, where it makes sense to store the values together under one property instead of as individual properties.\n",
+ "\n",
+ "When creating or updating a property, you can tell Gremlin the *type* of property you're using with either the `single` or `set` keyword. If you specify neither, Neptune will assume it is a `set` property.\n",
+ "\n",
+ "In the following example, we're going to create a single `age` property for the `Joseph Doe` node, but first we need to ensure we've only got one `first_name` value."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "8ebce174",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%gremlin\n",
+ "g.V()\n",
+ ".hasLabel('person').has('last_name','Doe')\n",
+ ".has('first_name',within('Joe','Joseph'))\n",
+ ".property(single,'first_name','Joseph')\n",
+ ".valueMap('first_name')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "a9b2ec8e",
+ "metadata": {},
+ "source": [
+ "The above query will have updated the `first_name` property for `[Joe, Joseph] Doe` to `Joseph` as follows:\n",
+ "\n",
+ "```\n",
+ "{'first_name': ['Joseph']}\n",
+ "```\n",
+ "\n",
+ "Now let's create the `age` property."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "2c0fdc01",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%gremlin\n",
+ "g.V()\n",
+ ".hasLabel('person').has('first_name','Joseph').has('last_name','Doe')\n",
+ ".property(single, 'age',32)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "ff20b063",
+ "metadata": {},
+ "source": [
+ "In the following example, we're going to use the `set` keyword to specify that the new `favourite_sports` property will be used to store multiple values:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "7371ac06",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%gremlin\n",
+ "g.V()\n",
+ ".hasLabel('person').has('first_name','Joseph').has('last_name','Doe')\n",
+ ".property(set,'favourite_sports','soccer')\n",
+ ".property(set,'favourite_sports','tennis')\n",
+ ".property(set,'favourite_sports','baseball')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "32c06e2f",
+ "metadata": {},
+ "source": [
+ "Finally, we can confirm our properties have been updated correctly by using the following query:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "6aaa7fdb",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%gremlin\n",
+ "g.V().hasLabel('person').has('first_name','Joseph').has('last_name','Doe').valueMap()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "e299d1a4",
+ "metadata": {},
+ "source": [
+ "## Upserting Data\n",
+ "\n",
+ "We have learned how to create and update data in our graph. However, there is another important mutation operation that we want to cover: the upsert, where data is created if it doesn't exist or updated if it does. In Gremlin, this operation can be performed using two approaches. The first is combining the `coalesce`, `fold` and `unfold` steps. The second, and more recent, approach is using the new `mergeV` and `mergeE` steps.\n",
+ "\n",
+ "Let's start with exploring the new approach:"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "eedeb3a0",
+ "metadata": {},
+ "source": [
+ "### Upserting Nodes using `mergeV()`\n",
+ "\n",
+ "With Neptune [supporting Apache TinkerPop 3.6.x](https://aws.amazon.com/blogs/database/exploring-new-features-of-apache-tinkerpop-3-6-x-in-amazon-neptune/) in version 1.2.1.0 and above, you now have access to the new `mergeV()` Gremlin step, which simplifies the upsert pattern."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "43384470",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%gremlin\n",
+ "g.mergeV([(T.id): 'jamie-1'])\n",
+ " .option(onCreate, [(T.label): 'person', first_name: 'Jamie'])\n",
+ " .option(onMatch, [age: 39])\n",
+ ".id() //not necessary, but helps to optimise the serialization of the output"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "9487e727",
+ "metadata": {},
+ "source": [
+ "`mergeV` lets us take different actions depending on whether the node was created or already exists. For example, using the `option` modulator, we specify the key/value pairs that should be added when the node is created (`onCreate`) or updated (`onMatch`).\n",
+ "\n",
+ "The map passed to `mergeV` is used to perform the matching. In the query above we specify that the `id` of the node must be `jamie-1`, but we can provide as many additional key/value pairs as required. Note, though, that for a match to exist, **all** values in the map must exist on a node."
+ ]
+ },
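+ {
+ "cell_type": "markdown",
+ "id": "5b3e9a17",
+ "metadata": {},
+ "source": [
+ "The matching rule can be sketched with a query that matches on a label and two properties instead of an `id`. The names `Morgan` and `Lee` are hypothetical placeholders and are not part of the sample dataset.\n",
+ "\n",
+ "```\n",
+ "//match on the label and both properties (placeholder values)\n",
+ "g.mergeV([(T.label): 'person', first_name: 'Morgan', last_name: 'Lee'])\n",
+ ".id()\n",
+ "```\n",
+ "\n",
+ "With no `option` modulators, `mergeV` uses the same map for matching and for creation, so a new node with this label and both properties is created only when no existing node matches all three entries."
+ ]
+ },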
+ {
+ "cell_type": "markdown",
+ "id": "d624367a",
+ "metadata": {},
+ "source": [
+ "### Upserting Nodes using `fold`, `coalesce` and `unfold`\n",
+ "\n",
+ "Prior to the availability of `mergeV`, the `fold`, `coalesce` and `unfold` approach was used to perform upserts to both nodes and edges.\n",
+ "\n",
+ "There are four sections to an upsert in Gremlin:\n",
+ "\n",
+ "* `fold` - This combines the objects of the incoming traversal into a single row\n",
+ "* `coalesce` - This accepts the incoming traversal and checks if the pattern exists\n",
+ "    * `unfold` - This converts a single row of values into individual rows\n",
+ "    * `addV` / `addE` - This adds the object as specified\n",
+ "\n",
+ "Let's take a look at what a simple upsert statement looks like with a single node pattern match."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "2ce3f5fb",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%gremlin\n",
+ "g.V().hasLabel('person').has('first_name','Mike')\n",
+ ".fold()\n",
+ ".coalesce(\n",
+ " unfold(),\n",
+ " addV('person').property('first_name','Mike')\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "8715cd5f",
+ "metadata": {},
+ "source": [
+ "In this case, we created a new node as there are no matches for the specified pattern. As the `addV` step is part of the *create if it doesn't exist* process, we can also specify additional properties at the point of creation."
+ ]
+ },
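+ {
+ "cell_type": "markdown",
+ "id": "c7e1a9d4",
+ "metadata": {},
+ "source": [
+ "As a minimal sketch of adding properties at the point of creation, the query below extends the `Mike` upsert so that the creation branch also sets a `last_name`. The value `Smith` is a hypothetical placeholder and is not part of the sample dataset.\n",
+ "\n",
+ "```\n",
+ "g.V().hasLabel('person').has('first_name','Mike')\n",
+ ".fold()\n",
+ ".coalesce(\n",
+ "    unfold(), //the node already exists, so return it unchanged\n",
+ "    addV('person').property('first_name','Mike').property('last_name','Smith') //otherwise create it with both properties\n",
+ ")\n",
+ "```\n",
+ "\n",
+ "Because the extra `property()` step sits inside the second branch of `coalesce`, it is only applied when a new node is created. An existing `Mike` node is returned as it is."
+ ]
+ },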
+ {
+ "cell_type": "markdown",
+ "id": "b1858d5f",
+ "metadata": {},
+ "source": [
+ "As you can see, the `mergeV` approach is much clearer and far less complicated.\n",
+ "\n",
+ "It also lets us apply different properties, such as `CreateDate` or `UpdateDate`, depending on whether we create a new node or update an existing one."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "8f4538aa",
+ "metadata": {},
+ "source": [
+ "### Upserting Edges using `mergeE()`\n",
+ "\n",
+ "In the same way that `mergeV()` helps reduce the complexity of writing upserts for nodes, `mergeE()` offers the same functionality for edges.\n",
+ "\n",
+ "Let's first create two friends, *Jamie* and *Peter* using `mergeV`:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "97753e2c",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%gremlin\n",
+ "g.mergeV([(T.id):'person-1', (T.label):'person', first_name:'Jamie'])\n",
+ " .mergeV([(T.id):'person-2', (T.label):'person', first_name:'Peter'])"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "024935b1",
+ "metadata": {},
+ "source": [
+ "Now, we can use `mergeE` to create a `friend` edge between the two nodes. When creating the edge, we use the `option` modulator to specify the label, as well as the `from` and `to` nodes. If the edge already existed, we would apply the `strength` property to the edge."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "d2b10575",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%gremlin\n",
+ "g.mergeE([(T.id):'friend-1'])\n",
+ " .option(onCreate, [(T.label): 'friend', (from):'person-1',(to):'person-2'])\n",
+ " .option(onMatch, [strength:100])"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "afa87073",
+ "metadata": {},
+ "source": [
+ "The `coalesce` pattern can also be used to upsert edges. Let's use the `Jamie` and `Peter` nodes we've just created to upsert a `friend` edge between them."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "f66d367b",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%gremlin\n",
+ "g.V().hasLabel('person').has('first_name','Jamie')\n",
+ ".out('friend')\n",
+ ".hasLabel('person').has('first_name','Peter')\n",
+ ".fold()\n",
+ ".coalesce(\n",
+ " unfold(),\n",
+ " addE('friend')\n",
+ " .from(__.V().hasLabel('person').has('first_name','Jamie'))\n",
+ " .to(__.V().hasLabel('person').has('first_name','Peter'))\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "663dc3ea",
+ "metadata": {},
+ "source": [
+ "### Conditional Upserts using `mergeV()`\n",
+ "\n",
+ "Sometimes you may only want to update a property depending on its current value. An example of this is `last_updated_at`, which you only want to update if its current value is earlier than the new value. By combining the `onMatch` option with `sideEffect`, you can check the existing value of a property and choose whether or not to update it."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "a32fd5e7",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%gremlin\n",
+ "\n",
+ "g.mergeV([(T.id): 'person-3']).\n",
+ " option(onCreate, [(T.label): 'person', first_name: 'Kevin', last_updated_at: datetime('2023-09-11')]). // when creating the object, set first_name and last_updated_at properties\n",
+ " option(onMatch, // when updating the object\n",
+ " sideEffect( // use sideEffect to execute a standalone traversal\n",
+ " __.V('person-3') // find the person-3 vertex\n",
+ " .choose( // use the choose step to perform an if-else \n",
+ " values('last_updated_at').is(lt(datetime('2023-09-12'))), // check if the value of the last_updated_at property is less than the new value\n",
+ " property(single,['last_updated_at':datetime('2023-09-12')]), // if true, set the last_updated_at property to the new value\n",
+ " constant([:]) // if false, return an empty map\n",
+ " )\n",
+ " ).constant([:]) // finally return an empty map\n",
+ " )\n",
+ " .id() // not necessary, but helps to optimise the serialization of the output"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "60b2eab1",
+ "metadata": {},
+ "source": [
+ "## Deleting Data\n",
+ "\n",
+ "Now that we have learned how to add and update data in our graph, the final operation we need to learn is how to delete data. In Gremlin, data is deleted with the `drop` step, which removes nodes, edges and properties.\n",
+ "\n",
+ "### Removing a Node\n",
+ "\n",
+ "To remove one or more nodes in Gremlin, we first need to match the items we want to delete, using the filtering steps we saw in the 01-Basic-Read-Queries notebook, and then remove them using the `drop` step. In the example below, we will remove any nodes with the `first_name` of `Steve` from our graph."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "190af502",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%gremlin\n",
+ "g.V().hasLabel('person').has('first_name','Steve').drop()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "a68c7a83",
+ "metadata": {},
+ "source": [
+ "### Removing an Edge\n",
+ "Removing one or more edges in Gremlin is very similar to removing a node, except that we need to pass the edges to the `drop` step. In the example below, we will remove any edges associated with nodes with the `first_name` of `Joseph` from our graph."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "2beb5c72",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%gremlin\n",
+ "g.V().hasLabel('person').has('first_name','Joseph').bothE().drop()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "fd55000f",
+ "metadata": {},
+ "source": [
+ "### Deleting Nodes and Edges\n",
+ "\n",
+ "A point to note when comparing the process of deleting objects in Gremlin and other languages such as openCypher is that if you attempt to drop a node that is still attached to an edge in Gremlin, it **will work**. Unlike openCypher, where an error will be raised, Gremlin removes all the attached edges for you."
+ ]
+ },
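+ {
+ "cell_type": "markdown",
+ "id": "9d2f4b60",
+ "metadata": {},
+ "source": [
+ "This behaviour can be sketched with two throwaway nodes. The names `Temp-A` and `Temp-B` are hypothetical placeholders, not part of the sample dataset, and each query would be run in its own `%%gremlin` cell.\n",
+ "\n",
+ "```\n",
+ "//create two nodes joined by a friends edge\n",
+ "g.addV('person').property('first_name','Temp-A').as('a')\n",
+ ".addV('person').property('first_name','Temp-B').as('b')\n",
+ ".addE('friends').from('a').to('b')\n",
+ "\n",
+ "//drop one node without removing its edge first\n",
+ "g.V().hasLabel('person').has('first_name','Temp-A').drop()\n",
+ "\n",
+ "//count the edges still attached to the other node\n",
+ "g.V().hasLabel('person').has('first_name','Temp-B').bothE().count()\n",
+ "```\n",
+ "\n",
+ "Assuming `Temp-B` has no other edges, the final count should be zero, because the `friends` edge was removed together with `Temp-A`."
+ ]
+ },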
+ {
+ "cell_type": "markdown",
+ "id": "2c9d4fb8",
+ "metadata": {},
+ "source": [
+ "### Deleting Properties\n",
+ "\n",
+ "As we've seen in the previous examples, we can append the `drop` step to any traversal and it will delete all the objects in that traversal. We can do the same to drop properties by specifying them in the traversal pattern. In the following example, we're going to delete the `age` property from the `person` node with a `first_name` of `Jamie`."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "b32e6979",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%gremlin\n",
+ "g.V().hasLabel('person').has('first_name','Jamie').properties('age').drop()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "eb0cd8de",
+ "metadata": {},
+ "source": [
+ "And to confirm we have successfully removed the `age` property (and not the node itself), let's run the following code:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "f517eb8e",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%gremlin\n",
+ "g.V().hasLabel('person').has('first_name','Jamie').valueMap()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "f5d352e5",
+ "metadata": {},
+ "source": [
+ "## Exercises\n",
+ "\n",
+ "Now that we have gone through the concepts of Gremlin mutation queries, it's time to put them into practice. Below are several exercises you can complete to verify your understanding of the material covered in this notebook. For each exercise, please write the Gremlin query specified.\n",
+ "\n",
+ "### Exercise 1: Create a new person `Leonhard Euler` and connect them to `Dave`.\n",
+ "\n",
+ "Using the data model above, write a query that will:\n",
+ "\n",
+ "* Create a new `person` node with a name of `Leonhard Euler` \n",
+ "* Connect the new node to `Dave` via a `friends` edge\n",
+ "* Return the new connection\n",
+ "\n",
+ "The result of this query is the ID of the new edge."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "36adacaf",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%gremlin\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "04ab5b59",
+ "metadata": {},
+ "source": [
+ "### Exercise 2: Upsert a list of `follows` and add an edge to `Dave`.\n",
+ "\n",
+ "Using the data model above, write a query that will:\n",
+ "\n",
+ "* Given the following list:\n",
+ " ```\n",
+ " [{first_name: 'Taylor', last_name: 'Hall'},\n",
+ " {first_name: 'Kelvin', last_name: 'Fernsby'},\n",
+ " {first_name: 'Ian', last_name: 'Rochester'}]\n",
+ " ```\n",
+ "* Add or update `person` nodes for each item in the list\n",
+ "* Add or update a `follows` relationship between each new node and `Dave`\n",
+ "* If the edge is created write a property `creation` with a value `Created`\n",
+ "* If the edge already exists write a property `creation` with a value `Updated`\n",
+ "* Return the new edge elements\n",
+ "* This query should be re-runnable without creating new nodes or edges\n",
+ "\n",
+ "The results for this query are the three edge elements"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "dae9d211",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%gremlin\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "678a243d",
+ "metadata": {},
+ "source": [
+ "### Exercise 3: Delete all `follows` edges and remove any connected nodes with no other edges.\n",
+ "\n",
+ "Using the data model above, write a query that will:\n",
+ "\n",
+ "* Find all the `follows` edges and connected nodes and remove the edges\n",
+ "* For each of the connected nodes see if they have any other edges\n",
+ "* If they have edges then ignore them\n",
+ "* If they have no edges then remove them"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "f96b91e5",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%gremlin\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "b5acefc5",
+ "metadata": {},
+ "source": [
+ "## Conclusion\n",
+ "\n",
+ "In this notebook, we explored how to write queries to mutate data in Gremlin. In the next notebook, we'll be discovering how to read explain and profile outputs from your Gremlin queries in order to understand the data contained within each section, and how to use that to write performant queries."
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.10.8"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/src/graph_notebook/notebooks/06-Language-Tutorials/03-Gremlin/Gremlin-Exercises-Answer-Sheet.ipynb b/src/graph_notebook/notebooks/06-Language-Tutorials/03-Gremlin/Gremlin-Exercises-Answer-Sheet.ipynb
new file mode 100644
index 00000000..0d0cd9cd
--- /dev/null
+++ b/src/graph_notebook/notebooks/06-Language-Tutorials/03-Gremlin/Gremlin-Exercises-Answer-Sheet.ipynb
@@ -0,0 +1,662 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "856cb409",
+ "metadata": {},
+ "source": [
+ "# Worksheet 1 - Basic Read Queries"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "355b2b35",
+ "metadata": {},
+ "source": [
+ "### Exercise 1: Find the first name of Dave's friends\n",
+ "\n",
+ "Using the data model above, write a query that will:\n",
+ "\n",
+ "* Find a `person` node(s) with a `first_name` of \"Dave\"\n",
+ "* Find the friends of Dave (i.e. traverse the `friends` edge)\n",
+ "* Return the friends' `first_name`\n",
+ "\n",
+ "The correct answer is four results: \"Jim\", \"Josh\", \"Hank\", \"Kelly\""
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "4c6c741d",
+ "metadata": {
+ "scrolled": true
+ },
+ "outputs": [],
+ "source": [
+ "%%gremlin\n",
+ "\n",
+ "g.V().hasLabel('person')\n",
+ ".has('first_name','Dave')\n",
+ ".out('friends')\n",
+ ".values('first_name')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "76bbc174",
+ "metadata": {},
+ "source": [
+ "### Exercise 2: Find the first name of the friends of Dave's friends\n",
+ "\n",
+ "Using the data model above, write a query that will:\n",
+ "\n",
+ "* Find a `person` node(s) with a `first_name` of \"Dave\"\n",
+ "* Find the friends of Dave (i.e. traverse the `friends` edge)\n",
+ "* Find the friends of that person (i.e. traverse the `friends` edge)\n",
+ "* Return the friends' `first_name`\n",
+ "\n",
+ "The correct answer contains three results: \"Hank\", \"Denise\", \"Paras\""
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "6680e1d4",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%gremlin\n",
+ "\n",
+ "g.V().hasLabel('person')\n",
+ ".has('first_name','Dave')\n",
+ ".out('friends')\n",
+ ".out('friends')\n",
+ ".dedup()\n",
+ ".values('first_name')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "886f9699",
+ "metadata": {},
+ "source": [
+ "### Exercise 3: Find out how the friends of Dave's friends are connected\n",
+ "\n",
+ "Using the data model above, write a query that will:\n",
+ "\n",
+ "* Find a `person` node(s) with a `first_name` of \"Dave\"\n",
+ "* Find the friends of Dave (i.e. traverse the `friends` edge)\n",
+ "* Find the friends of that person (i.e. traverse the `friends` edge)\n",
+ "* Return the path\n",
+ "\n",
+ "The correct answer contains three results:\n",
+ "\n",
+ "- `Dave` -> `Josh` -> `Hank`\n",
+ "- `Dave` -> `Kelly` -> `Denise`\n",
+ "- `Dave` -> `Jim` -> `Paras`"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "d4191ccb",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%gremlin\n",
+ "\n",
+ "g.V().hasLabel('person')\n",
+ ".has('first_name','Dave')\n",
+ ".out('friends')\n",
+ ".out('friends')\n",
+ ".dedup()\n",
+ ".path()\n",
+ ".by(elementMap())"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "2042c3f5",
+ "metadata": {},
+ "source": [
+ "### Exercise 4: Which friends should we recommend for Dave?\n",
+ "\n",
+ "A common use case for graphs in social networks is to recommend new connections. There is a significant amount of research in this area (example [here](https://www.science.org/doi/10.1126/sciadv.aax7310#:~:text=The%20triadic%20closure%20mechanism%20uses,features%20of%20empirical%20social%20networks)) but mainly there are two prevailing mechanisms at work in social networks that we can leverage to help provide efficient recommendations to a user. The first of these mechanisms is called homophily, which is the tendency of similar people to be connected. Homophily is a driving factor in many social networks, with an important outcome being that people connected to you, or connected to people that are connected to you, tend to be similar to you. This leads to the second mechanism in a graph, the concept of a triadic closure. Triadic closure is a way to create or recommend new connections based on common friends or acquaintances. \n",
+ "\n",
+ "\n",
+ "In this exercise, we are going to leverage triadic closure to recommend friends for Dave. To accomplish this, we will need to leverage the previously written queries but extend them to:\n",
+ "\n",
+ "* Find all the friends of friends that do not have a connection to Dave\n",
+ "\n",
+ "The correct answer contains three results: \"Hank\", \"Denise\", \"Paras\""
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "1a068ca6",
+ "metadata": {
+ "scrolled": true
+ },
+ "outputs": [],
+ "source": [
+ "%%gremlin\n",
+ "\n",
+ "g.V().hasLabel('person')\n",
+ ".has('first_name','Dave').as('dave')\n",
+ ".out('friends')\n",
+ ".out('friends')\n",
+ ".where(neq('dave'))\n",
+ ".dedup()\n",
+ ".values('first_name')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "4e8077f9",
+ "metadata": {},
+ "source": [
+ "# Worksheet 2 - Loops and Repeat Queries"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "d0551517",
+ "metadata": {},
+ "source": [
+ "### Exercise 1: Find the friends of Dave's Friends using a loop.\n",
+ "\n",
+ "Using the data model above, write a query that will:\n",
+ "\n",
+ "* Find a `person` node(s) with a `first_name` of \"Dave\"\n",
+ "* Find the friends of Dave (i.e. traverse the `friends` edge)\n",
+ "* Find the friends of that person (i.e. traverse the `friends` edge)\n",
+ "* Return the friends' `first_name`\n",
+ "\n",
+ "The correct answer contains three results: \"Hank\", \"Denise\", \"Paras\""
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "e2ddde36",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%gremlin\n",
+ "\n",
+ "g.V().hasLabel('person')\n",
+ ".has('first_name','Dave')\n",
+ ".repeat(\n",
+ " out('friends')\n",
+ " .simplePath()\n",
+ ")\n",
+ ".times(2)\n",
+ ".dedup()\n",
+ ".values('first_name')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "84d8182d",
+ "metadata": {},
+ "source": [
+ "### Exercise 2: Find all `person` nodes connected to Dave.\n",
+ "\n",
+ "Starting at a single node and trying to find all connected children (a.k.a. root to leaf) or trying to find the parent of any child node (a.k.a. leaf to root) are two very common hierarchical graph query patterns. Commonly, these queries support bill of materials, information organization, or compliance use cases.\n",
+ "\n",
+ "In this exercise, we will be applying that same query pattern to find the hierarchy of people within our social network. We'll accomplish this by writing a \"root to leaf\" type query where the root node is our `Dave` node in the social network.\n",
+ "\n",
+ "Using the data model above, write a query that will:\n",
+ "\n",
+ "* Find a `person` node(s) with a `first_name` of \"Dave\"\n",
+ "* Keep traversing the outgoing `friends` edge until there are no more outgoing `friends` edges\n",
+ "* Return all the paths\n",
+ "\n",
+ "The correct answer has 5 results"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "6367292c",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%gremlin\n",
+ "\n",
+ "g.V().hasLabel('person')\n",
+ ".has('first_name','Dave')\n",
+ ".repeat(\n",
+ " out('friends')\n",
+ ")\n",
+ ".until(not(out('friends')))\n",
+ ".path()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "a6fba6ab",
+ "metadata": {},
+ "source": [
+ "### Exercise 3: Find all the ways Dave and Denise are connected.\n",
+ "\n",
+ "A common extension to the path traversal query we wrote in Loop-3 is to return not just \"if\" someone is connected but \"how\" they are connected.\n",
+ "\n",
+ "In this exercise, we will be making a slight modification to the previous query to return \"how\" Dave and Denise are connected, not just that they are.\n",
+ "\n",
+ "Using the data model above, write a query that will:\n",
+ "\n",
+ "* Find a `person` node(s) with a `first_name` of \"Dave\"\n",
+ "* Find the friends of Dave (i.e. traverse the `friends` edge)\n",
+ "* Keep traversing the `friends` edge until you find `Denise`\n",
+ "* Return the path\n",
+ "\n",
+ "The correct answer has 3 results"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "fa0b467b",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%gremlin\n",
+ "\n",
+ "g.V().hasLabel('person')\n",
+ ".has('first_name','Dave')\n",
+ ".repeat(\n",
+ " out('friends')\n",
+ " .simplePath()\n",
+ ")\n",
+ ".until(\n",
+ " has('first_name','Denise')\n",
+ ")\n",
+ ".path()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "c3e24581",
+ "metadata": {},
+ "source": [
+ "# Worksheet 3 - Ordering, Functions, and Grouping"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "b033e716",
+ "metadata": {},
+ "source": [
+ "### Exercise 1: What are the 3 highest rated restaurants?\n",
+ "\n",
+ "Using the data model above, write a query that will:\n",
+ "\n",
+ "* Find the 3 restaurants with the highest average rating\n",
+ "* Find the associated `cuisine`\n",
+ "* Return the restaurant name, the cuisine name, and the average rating\n",
+ "* Order the results by average rating descending\n",
+ "\n",
+ "The results for this query are:\n",
+ "\n",
+ "|Restaurant name|Cuisine|Avg Rating|\n",
+ "|---|---|---|\n",
+ "|Lonely Grape|bar|5.0|\n",
+ "|Perryman's|bar|4.5|\n",
+ "|Rare Bull|steakhouse|4.333333|"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "d3004eb3",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%gremlin\n",
+ "\n",
+ "g.V()\n",
+ ".hasLabel('cuisine')\n",
+ ".in('serves')\n",
+ ".group()\n",
+ ".by(identity())\n",
+ ".by(in('about').values('rating').mean())\n",
+ ".unfold()\n",
+ ".order()\n",
+ ".by(values, desc)\n",
+ ".limit(3)\n",
+ ".unfold()\n",
+ ".project('restaurant name','cuisine','avg rating')\n",
+ ".by(select(keys).values('name'))\n",
+ ".by(select(keys).out('serves').values('name'))\n",
+ ".by(select(values))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "854c9109",
+ "metadata": {},
+ "source": [
+ "### Exercise 2: Find the 3 highest rated restaurants in the city where Dave lives.\n",
+ "\n",
+ "Using the data model above, write a query that will:\n",
+ "\n",
+ "* Find a `person` node(s) with a `first_name` of \"Dave\"\n",
+ "* Find the `city` that Dave lives in\n",
+ "* Find the average rating of restaurants in that city\n",
+ "* Find the top 3 average ratings\n",
+ "* Return the restaurant name, address, and average rating\n",
+ "* Order by the average rating descending\n",
+ "\n",
+ "The results for this query are:\n",
+ "\n",
+ "|Restaurant name|Address|Avg Rating|\n",
+ "|---|---|---|\n",
+ "|Dave's Big Deluxe|490 Ivan Cape|4.0|\n",
+ "|Pick & Go|4881 Upton Falls|3.75|\n",
+ "|Without Chaser|01511 Casper Fall|3.5|"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "ea31dd30",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%gremlin\n",
+ "\n",
+ "g.V().has('person','first_name','Dave')\n",
+ ".out('lives')\n",
+ ".in('within')\n",
+ ".where(inE('about'))\n",
+ ".group()\n",
+ ".by(identity())\n",
+ ".by(in('about').values('rating').mean())\n",
+ ".unfold()\n",
+ ".order()\n",
+ ".by(values,desc)\n",
+ ".limit(3)\n",
+ ".unfold()\n",
+ ".project('restaurant name','address','avg rating')\n",
+ ".by(select(keys).values('name'))\n",
+ ".by(select(keys).values('address'))\n",
+ ".by(select(values))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "825631b7",
+ "metadata": {},
+ "source": [
+ "### Exercise 3: Which Mexican or Chinese restaurant near Dave is the highest rated?\n",
+ "\n",
+ "Using the data model above, write a query that will:\n",
+ "\n",
+ "* Find a `person` node(s) with a `first_name` of \"Dave\"\n",
+ "* Find the `city` that Dave lives in\n",
+ "* Find the restaurants in that city that serve 'Mexican' or 'Chinese' food\n",
+ "* Find the average rating of those restaurants\n",
+ "* Return the restaurant name, address, and average rating\n",
+ "* Order by the average rating descending\n",
+ "* Return the top 1 result\n",
+ "\n",
+ "The results for this query are:\n",
+ "\n",
+ "|Restaurant name|Address|Avg Rating|\n",
+ "|---|---|---|\n",
+ "|With Salsa|24320 Williamson Causeway|3.5|"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "3cd2eae2",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%gremlin\n",
+ "\n",
+ "g.V().has('person','first_name','Dave')\n",
+ ".out('lives')\n",
+ ".in('within')\n",
+ ".where(out('serves').has('name',within('Mexican','Chinese')))\n",
+ ".where(inE('about'))\n",
+ ".group()\n",
+ ".by(identity())\n",
+ ".by(in('about').values('rating').mean())\n",
+ ".unfold()\n",
+ ".order()\n",
+ ".by(values,desc)\n",
+ ".limit(1)\n",
+ ".unfold()\n",
+ ".project('restaurant name','address','avg rating')\n",
+ ".by(select(keys).values('name'))\n",
+ ".by(select(keys).values('address'))\n",
+ ".by(select(values))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "a195b2d2",
+ "metadata": {},
+ "source": [
+ "### Exercise 4: What are the top 3 restaurants, recommended by his friends, where Dave lives? (Personalized Recommendation)\n",
+ "\n",
+ "Using the data model above, write a query that will:\n",
+ "\n",
+ "* Find a `person` node(s) with a `first_name` of \"Dave\"\n",
+ "* Find the `city` that Dave lives in\n",
+ "* Find Dave's friends\n",
+ "* Find reviews written by Dave's friends in the city \"Dave\" lives in\n",
+ "* Find the average rating of those restaurants\n",
+ "* Return the restaurant name, address, and average rating\n",
+ "* Order by the average rating descending\n",
+ "* Return the top 3\n",
+ "\n",
+ "The results for this query are:\n",
+ "\n",
+ "|Restaurant name|Address|Avg Rating|\n",
+ "|---|---|---|\n",
+ "|Dave's Big Deluxe|490 Ivan Cape|4.0|\n",
+ "|With Salsa|24320 Williamson Causeway|4.0|\n",
+ "|Satiated|370 Hills Estates|3.666667|"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "2b8b31f2",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%gremlin\n",
+ "\n",
+ "g.V().has('person','first_name','Dave').as('dave')\n",
+ ".out('lives')\n",
+ ".in('within')\n",
+ ".where(in('about').in('wrote').both('friends').where(eq('dave')))\n",
+ ".group()\n",
+ ".by(identity())\n",
+ ".by(in('about').values('rating').mean())\n",
+ ".unfold()\n",
+ ".order()\n",
+ ".by(values,desc)\n",
+ ".limit(3)\n",
+ ".unfold()\n",
+ ".project('restaurant name','address','avg rating')\n",
+ ".by(select(keys).values('name'))\n",
+ ".by(select(keys).values('address'))\n",
+ ".by(select(values))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "897b88c4",
+ "metadata": {},
+ "source": [
+ "# Worksheet 4 - Create, Update and Delete Queries"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "8d71b8ed",
+ "metadata": {},
+ "source": [
+ "### Exercise 1: Create a new person `Leonhard Euler` and connect them to `Dave`.\n",
+ "\n",
+ "Using the data model above, write a query that will:\n",
+ "\n",
+ "* Create a new `person` node with a name of `Leonhard Euler` \n",
+ "* Connect the new node to \"Dave\" via a `friends` edge\n",
+ "* Return the new connection\n",
+ "\n",
+ "The result of this query is the ID of the new edge"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "49e157a8",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%gremlin\n",
+ "\n",
+ "g.addV('person').property('name','Leonhard Euler')\n",
+ " .addE('friends').to(__.V().has('person','first_name','Dave'))\n",
+ " \n",
+ "//OR\n",
+ "\n",
+ "//g\n",
+ "// .mergeV([(T.id):'leo', (T.label):'person', name: 'Leonhard Euler'])\n",
+ "// .mergeE([(T.label):'friends',(from):Merge.outV,(to):Merge.inV])\n",
+ "// .option(Merge.outV, [(T.label): 'person', name: 'Leonhard Euler'])\n",
+ "// .option(Merge.inV, [(T.label): 'person', first_name: 'Dave', last_name: 'Bech'])"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "8abdb992",
+ "metadata": {},
+ "source": [
+ "### Exercise 2: Upsert a list of followers and add an edge to `Dave`.\n",
+ "\n",
+ "Using the data model above, write a query that will:\n",
+ "\n",
+ "* Given the following list:\n",
+ " ```\n",
+ " [{first_name: 'Taylor', last_name: 'Hall'},\n",
+ " {first_name: 'Kelvin', last_name: 'Fernsby'},\n",
+ " {first_name: 'Ian', last_name: 'Rochester'}]\n",
+ " ```\n",
+ "* Add or update `person` nodes for each item in the list\n",
+ "* Add or update a `follows` relationship between each new node and \"Dave\"\n",
+ "* If the edge is created, write a property `creation` with a value of `created`\n",
+ "* If the edge already exists, write a property `creation` with a value of `updated`\n",
+ "* Return the new edge elements\n",
+ "* This query should be re-runnable without creating new nodes or edges\n",
+ "\n",
+ "The results for this query are the three edge elements"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "4d82cd3e",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%gremlin\n",
+ "\n",
+ "g.V().hasLabel(\"person\")\n",
+ ".has(\"first_name\",\"Taylor\").has(\"last_name\",\"Hall\")\n",
+ ".fold().coalesce(unfold(),addV('person').property('first_name','Taylor').property('last_name','Hall'))\n",
+ ".V().hasLabel(\"person\")\n",
+ " .has(\"first_name\",\"Kelvin\").has(\"last_name\",\"Fernsby\")\n",
+ " .fold().coalesce(unfold(),addV('person').property('first_name','Kelvin').property('last_name','Fernsby'))\n",
+ ".V().hasLabel(\"person\")\n",
+ " .has(\"first_name\",\"Ian\").has(\"last_name\",\"Rochester\")\n",
+ " .fold().coalesce(unfold(),addV('person').property('first_name','Ian').property('last_name','Rochester'))\n",
+ "\n",
+ ".V().hasLabel(\"person\")\n",
+ " .has(\"first_name\",\"Taylor\").has(\"last_name\",\"Hall\")\n",
+ " .outE('follows')\n",
+ " .where(inV().has('person','first_name','Dave'))\n",
+ " .fold().coalesce(unfold().property('creation','updated'), \n",
+ " addE('follows').from(__.V().has('person','first_name','Taylor')).to(__.V().has('person','first_name','Dave')).property('creation','created')\n",
+ " )\n",
+ ".V().hasLabel(\"person\")\n",
+ " .has(\"first_name\",\"Kelvin\").has(\"last_name\",\"Fernsby\")\n",
+ " .outE('follows')\n",
+ " .where(inV().has('person','first_name','Dave'))\n",
+ " .fold().coalesce(unfold().property('creation','updated'), \n",
+ " addE('follows').from(__.V().has('person','first_name','Kelvin')).to(__.V().has('person','first_name','Dave')).property('creation','created')\n",
+ " )\n",
+ ".V().hasLabel(\"person\")\n",
+ " .has(\"first_name\",\"Ian\").has(\"last_name\",\"Rochester\")\n",
+ " .outE('follows')\n",
+ " .where(inV().has('person','first_name','Dave'))\n",
+ " .fold().coalesce(unfold().property('creation','updated'), \n",
+ " addE('follows').from(__.V().has('person','first_name','Ian')).to(__.V().has('person','first_name','Dave')).property('creation','created')\n",
+ " )\n",
+ ".V().hasLabel('person')\n",
+ " .outE('follows').elementMap()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "9abe4c7e",
+ "metadata": {},
+ "source": [
+ "### Exercise 3: Delete all `follows` edges and remove any connected nodes with no other edges.\n",
+ "\n",
+ "Using the data model above, write a query that will:\n",
+ "\n",
+ "* Find all the `follows` edges and connected nodes and remove the edges\n",
+ "* For each of the connected nodes see if they have any other edges\n",
+ "* If they have edges then ignore them\n",
+ "* If they have no edges then remove them"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "6a3ca71d",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%gremlin\n",
+ "\n",
+ "g.E().hasLabel('follows').aggregate('edges')\n",
+ ".bothV()\n",
+ ".hasLabel('person')\n",
+ ".where(not(bothE().not(hasLabel('follows')))).aggregate('nodes')\n",
+ ".fold()\n",
+ ".sideEffect(select('edges').unfold().drop())\n",
+ ".sideEffect(select('nodes').unfold().drop())"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.10.8"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/src/graph_notebook/notebooks/06-Language-Tutorials/03-Gremlin/__init__.py b/src/graph_notebook/notebooks/06-Language-Tutorials/03-Gremlin/__init__.py
new file mode 100644
index 00000000..e69de29b
diff --git a/test/unit/notebooks/test_validate_notebooks.py b/test/unit/notebooks/test_validate_notebooks.py
index a9b39338..1955c837 100644
--- a/test/unit/notebooks/test_validate_notebooks.py
+++ b/test/unit/notebooks/test_validate_notebooks.py
@@ -62,7 +62,12 @@ def test_no_extra_notebooks(self):
f'{NOTEBOOK_BASE_DIR}/06-Language-Tutorials/02-openCypher/02-Variable-Length-Paths.ipynb',
f'{NOTEBOOK_BASE_DIR}/06-Language-Tutorials/02-openCypher/03-Ordering-Functions-Grouping.ipynb',
f'{NOTEBOOK_BASE_DIR}/06-Language-Tutorials/02-openCypher/04-Creating-Updating-Delete-Queries.ipynb',
- f'{NOTEBOOK_BASE_DIR}/06-Language-Tutorials/02-openCypher/openCypher-Exercises-Answer-Key.ipynb']
+ f'{NOTEBOOK_BASE_DIR}/06-Language-Tutorials/02-openCypher/openCypher-Exercises-Answer-Key.ipynb',
+ f'{NOTEBOOK_BASE_DIR}/06-Language-Tutorials/03-Gremlin/01-Basic-Read-Queries.ipynb',
+ f'{NOTEBOOK_BASE_DIR}/06-Language-Tutorials/03-Gremlin/02-Loops-Repeats.ipynb',
+ f'{NOTEBOOK_BASE_DIR}/06-Language-Tutorials/03-Gremlin/03-Ordering-Functions-Grouping.ipynb',
+ f'{NOTEBOOK_BASE_DIR}/06-Language-Tutorials/03-Gremlin/04-Creating-Updating-Deleting-Queries.ipynb',
+ f'{NOTEBOOK_BASE_DIR}/06-Language-Tutorials/03-Gremlin/Gremlin-Exercises-Answer-Sheet.ipynb']
notebook_paths = get_all_notebooks_paths()
expected_paths.sort()
notebook_paths.sort()