Commit

Merge remote-tracking branch 'origin/v2docs' into gh-534-initial-docs-for-federated-poc
tb06904 committed Oct 30, 2024
2 parents 9f4ab21 + 0040521 commit a05e0f5
Showing 11 changed files with 202 additions and 72 deletions.
1 change: 1 addition & 0 deletions docs/administration-guide/gaffer-deployment/gremlin.md
@@ -85,6 +85,7 @@ A full breakdown of the available properties is as follows:
| `gaffer.schemas` | The path to the directory containing the graph schema files. | No |
| `gaffer.userId` | The default user ID for the Tinkerpop graph. | No (User is always set via the [`UserFactory`](../security/user-control.md).) |
| `gaffer.dataAuths` | The default data auths for the user to specify what operations can be performed | No |
| `gaffer.rest.timeout` | The timeout, in milliseconds, for Gremlin queries submitted to the REST API. Defaults to 2 minutes if not specified. | Yes |
| `gaffer.operation.options` | Default `Operation` options in the form `key:value` (this can be overridden per query see [here](../../user-guide/query/gremlin/custom-features.md)) | Yes |
| `gaffer.elements.getalllimit` | The default limit for unseeded queries e.g. `g.V()`. | Yes |
| `gaffer.elements.hasstepfilterstage` | The default stage to apply any `has()` steps e.g. `PRE_AGGREGATION` | Yes |
65 changes: 64 additions & 1 deletion docs/administration-guide/gaffer-stores/accumulo-store.md
@@ -326,4 +326,67 @@ If you have the accumulo cluster shell running, you can set these scan auths dir

```sh
setauths -u root -s vis1,vis2,publicVisibility,privateVisibility,public,private
```
```

#### Unexpected MatchedVertex on Edges

`MatchedVertex` is sometimes included on edges when you might not expect it.
When you seed with a mixture of `EdgeSeed`s and `EntitySeed`s, `MatchedVertex` is always included on edges, whether or not they were matched by a vertex. Edges that were matched by an `EdgeSeed` rather than by a vertex always have `MatchedVertex` equal to `SOURCE`.
This is a peculiarity of the Accumulo store.

!!! example "Example Query"
``` mermaid
graph TD
1 --> 2
2 --> 3
3 --> 4
```

=== "JSON"
```json
{
  "class": "GetElements",
  "input": [
    {
      "class": "EdgeSeed",
      "source": "1",
      "destination": "2"
    },
    {
      "class": "EntitySeed",
      "vertex": "4"
    }
  ],
  "view": {
    "allEdges": true
  }
}
```

Results:

=== "JSON"
```json
[
  {
    "class": "uk.gov.gchq.gaffer.data.element.Edge",
    "group": "example",
    "source": "3",
    "destination": "4",
    "directed": true,
    "matchedVertex": "DESTINATION",
    "properties": {}
  },
  {
    "class": "uk.gov.gchq.gaffer.data.element.Edge",
    "group": "example",
    "source": "1",
    "destination": "2",
    "directed": true,
    "matchedVertex": "SOURCE",
    "properties": {}
  }
]
```
The 1 -> 2 edge has MatchedVertex=SOURCE even though the source wasn't matched by an EntitySeed.
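
One way to tell the two cases apart on the client side is to compare each edge's matched vertex against the EntitySeeds that were supplied. The following is a minimal sketch under stated assumptions: `classify_matched_vertex` is a hypothetical helper (not part of Gaffer or gafferpy) and it works on plain dictionaries shaped like the JSON results above.

```python
def classify_matched_vertex(edges, entity_seeds):
    """Report, for each edge, whether its matchedVertex was really an EntitySeed.

    Hypothetical helper: `edges` are dicts shaped like the JSON results above,
    `entity_seeds` is the list of vertex IDs that were passed as EntitySeeds.
    """
    seeds = set(entity_seeds)
    # Map the matchedVertex value to the edge field that holds that vertex.
    field_for_side = {"SOURCE": "source", "DESTINATION": "destination"}
    report = []
    for edge in edges:
        side = edge.get("matchedVertex")
        field = field_for_side.get(side)
        genuinely_matched = field is not None and edge[field] in seeds
        report.append((edge["source"], edge["destination"], side, genuinely_matched))
    return report


results = [
    {"source": "3", "destination": "4", "matchedVertex": "DESTINATION"},
    {"source": "1", "destination": "2", "matchedVertex": "SOURCE"},
]
report = classify_matched_vertex(results, entity_seeds=["4"])
for row in report:
    print(row)
```

In this sketch the 3 -> 4 edge is flagged as genuinely matched (vertex 4 was an EntitySeed), while the 1 -> 2 edge is not, because it was returned by the EdgeSeed.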

18 changes: 9 additions & 9 deletions docs/administration-guide/named-views.md
@@ -2,7 +2,7 @@

This guide walks through configuring Gaffer to use Named Views and how to run them.

Named Views allow users to store a [View](../user-guide/query/gaffer-syntax/filtering.md)
Named Views allow users to store a [View](../user-guide/query/gaffer-syntax/filtering.md)
in the cache, which can then be referenced in OperationChains or in NamedOperations.

The benefit of using Named Views is that it allows you to store lengthy or complex Views
@@ -18,7 +18,7 @@ For details on potential caches and how to configure them, see the [Stores Guide

Named Views are enabled by default. To disable this feature the [store property](../administration-guide/gaffer-stores/store-guide.md#all-general-store-properties) `gaffer.store.namedview.enabled` should be set to false.

There are [three operations](../reference/operations-guide/named.md#addnamedview) which manage Named Views.
There are [three operations](../reference/operations-guide/named.md#addnamedview) which manage Named Views.
These are `AddNamedView`, `GetAllNamedViews` and `DeleteNamedView`.

The examples below use the following graph:
@@ -70,7 +70,7 @@ graph LR

``` json
{
"class" : "AddNamedVieww",
"class" : "AddNamedView",
"name" : "exampleNamedView",
"description" : "Example Named View",
"view": {
@@ -100,7 +100,7 @@ graph LR
=== "Python"

``` python
g.AddNamedView(
g.AddNamedView(
view = g.View(
edges = [
g.ElementDefinition(
@@ -117,7 +117,7 @@ graph LR
)
]
),
overwrite_flag=True
overwrite_flag=True
)
```

@@ -139,7 +139,7 @@ graph LR

=== "JSON"

```json
```json
{
"class": "GetElements",
"input": [
@@ -149,7 +149,7 @@ graph LR
}
],
"view": {
"class": "NamedView",
"class": "NamedView",
"name": "exampleNamedView"
}
}
@@ -171,8 +171,8 @@ graph LR
```

!!! example "Delete a NamedView"
This removes the NamedView from the cache. Note that if you delete a Named View
any Operation Chains or Named Operations which reference it will fail.
This removes the NamedView from the cache. Note that if you delete a Named View
any Operation Chains or Named Operations which reference it will fail.

=== "Java"

4 changes: 2 additions & 2 deletions docs/reference/glossary.md
@@ -8,8 +8,8 @@ hide:
| Term | Description |
| :---------------- | :----------------------------------- |
| Entity | An entity represents a point in a graph |
| Edge | An edge is a connection between two entities |
| Vertex | In Gaffer, a vertex is the id of an entity |
| Edge | An edge is a connection between two vertices |
| Vertex | In Gaffer, a vertex is the id of an entity. Note that a vertex can exist on an edge without an associated entity. Queries will only show such vertices on their associated edges |
| Node | A node is what Gaffer calls an entity |
| Properties | A property is a key/value pair that stores data on both edges and entities |
| Element | The word is used to describe edges or entities |
8 changes: 4 additions & 4 deletions docs/reference/gremlin-guide/gaffer-options.md
@@ -33,19 +33,19 @@ Note that any options should be passed as a list or dictionary.
g.with_("operationOptions", {"gaffer.federatedstore.operation.graphIds": "graphA"}).V().to_list()
```

## GetAllElements Limit
## GetElements Limit

Key `getAllElementsLimit`
Key `getElementsLimit`

Limits the amount of elements returned if performing an unseeded query e.g. a
Limits the number of elements returned when performing a query that returns a large number of elements, e.g. a
`GetAllElements` operation. This will override the default for the current
query, see the [admin guide](../../administration-guide/gaffer-deployment/gremlin.md#configuring-the-gafferpop-library)
for more detail on setting up defaults.

!!! example

```groovy
g.with("getAllElementsLimit", 100).V().toList()
g.with("getElementsLimit", 100).V().toList()
```

## Has Step Filter Stage
2 changes: 1 addition & 1 deletion docs/reference/properties-guide/type.md
@@ -1,6 +1,6 @@
# Type Properties

The `TypeValue` ([Javadoc](https://gchq.github.io/Gaffer/uk/gov/gchq/gaffer/types/TypeValue.html)) and `TypeSubTypeValue` ([Javadoc]()) are special properties which are similar to `String`, but also store a secondary string ('type') or secondary and tertiary strings ('type' & 'subtype').
The `TypeValue` ([Javadoc](https://gchq.github.io/Gaffer/uk/gov/gchq/gaffer/types/TypeValue.html)) and `TypeSubTypeValue` ([Javadoc](https://gchq.github.io/Gaffer/uk/gov/gchq/gaffer/types/TypeSubTypeValue.html)) are special properties which are similar to `String`, but also store a secondary string ('type') or secondary and tertiary strings ('type' & 'subtype').

## Predicate Support

106 changes: 106 additions & 0 deletions docs/user-guide/apis/gremlin-api.md
@@ -0,0 +1,106 @@
# Gremlin API

!!! warning
The Gremlin API is still under development and has some [limitations](../query/gremlin/gremlin-limits.md).
The implementation may not allow some advanced features of Gremlin, and its
performance is unknown in comparison to standard Gaffer `OperationChains`.

## What is Gremlin?

[Gremlin](https://tinkerpop.apache.org/gremlin.html) is a query language for
traversing graphs. It is a core component of the Apache Tinkerpop library and
allows users to easily express more complex graph queries.

GafferPop is a lightweight Gaffer implementation of the [TinkerPop framework](https://tinkerpop.apache.org/),
where TinkerPop methods are delegated to Gaffer graph operations.

The addition of Gremlin as a query language in Gaffer allows users to represent
complex graph queries in a simpler language, akin to other querying languages
used in traditional and NoSQL databases. It also has wide support for various
languages; for example you can write queries in Python via the [`gremlinpython`](https://pypi.org/project/gremlinpython/)
library.

!!! tip
In-depth tutorials on Gremlin as a query language and its associated libraries
can be found in the [Apache Tinkerpop Gremlin docs](https://tinkerpop.apache.org/gremlin.html).

## How to Query a Graph

There are two main methods of using Gremlin in Gaffer: via a websocket,
similar to a typical [Gremlin Server](https://tinkerpop.apache.org/docs/current/reference/#connecting-gremlin-server),
or by submitting queries via the REST endpoints like standard Gaffer Operations.
Once connected, the [Gremlin in Gaffer](../query/gremlin/gremlin.md) page
provides a simple comparison of Gremlin and Gaffer Operations.

!!! note
Both methods require a running [Gaffer REST API](./rest-api.md) instance.

### Websocket API

The websocket provides the most _standard_ way to use the Gremlin API. The
Gaffer REST API provides a Gremlin server-like experience via a websocket at
`/gremlin`. Connecting to it provides a graph traversal source for
spawning queries.

The websocket should support all standard Gremlin tooling and uses GraphSONv3
serialisation for communication. To connect a tool like [`gremlinpython`](https://pypi.org/project/gremlinpython/)
we can do something similar to [`gafferpy`](./python-api.md). First import the
required libraries (many of these will be needed later for queries):

```python
from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
from gremlin_python.driver.serializer import GraphSONSerializersV3d0
from gremlin_python.process.graph_traversal import __
```

We can then establish a connection to the Gremlin server and save the reference
(typically called `g`):

```python
# Setup a connection with the REST API running on localhost
g = traversal().with_remote(DriverRemoteConnection('ws://localhost:8080/gremlin', 'g', message_serializer=GraphSONSerializersV3d0()))
```

Now that we have the traversal reference this can be used to spawn graph traversals
and get results back.

### REST API Endpoints

The Gremlin endpoints provide a similar interface to running Gaffer Operations.
They accept a plaintext Gremlin Groovy or OpenCypher query and will return
the results in [GraphSONv3](https://tinkerpop.apache.org/docs/current/dev/io/#graphson-3d0)
format.

The two endpoints are:

- `/rest/gremlin/execute` - Runs a Gremlin Groovy script and outputs the result
as GraphSONv3 JSON.
- `/rest/gremlin/cypher/execute` - Translates a Cypher query to Gremlin and
executes it, returning a GraphSONv3 JSON result. Note that `.toList()` is
always appended to the translation.
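
Because both endpoints return GraphSONv3, a client that does not use a Gremlin driver has to unwrap the `@type`/`@value` envelopes itself. The following is a minimal sketch, assuming the response body is a GraphSON 3.0 document; `decode_graphson` is a hypothetical helper that flattens typed values into plain Python objects and deliberately discards the type tags.

```python
import json


def decode_graphson(node):
    """Recursively unwrap GraphSON 3.0 `@type`/`@value` envelopes.

    Hypothetical helper for illustration: type tags are dropped, so a
    `g:Vertex` becomes a plain dict and numeric types become int/float.
    """
    if isinstance(node, list):
        return [decode_graphson(item) for item in node]
    if isinstance(node, dict):
        if "@type" in node and "@value" in node:
            kind, value = node["@type"], node["@value"]
            if kind == "g:Map":
                # g:Map is a flat list of alternating keys and values.
                # Keys are assumed to decode to hashable values here.
                items = [decode_graphson(item) for item in value]
                return dict(zip(items[0::2], items[1::2]))
            # g:List, g:Set, numbers, vertices, edges: decode the payload.
            return decode_graphson(value)
        return {key: decode_graphson(value) for key, value in node.items()}
    return node


sample = json.loads(
    '{"@type": "g:List", "@value": ['
    '{"@type": "g:Vertex", "@value": {"id": "1", "label": "person"}},'
    '{"@type": "g:Map", "@value": ["age", {"@type": "g:Int32", "@value": 29}]}'
    ']}'
)
decoded = decode_graphson(sample)
print(decoded)
```

A full client would normally rely on a GraphSON reader from a Gremlin language driver instead; this sketch is only meant to show the shape of the data.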

A query can be submitted via the Swagger UI or a simple POST request such as:

```bash
curl -X 'POST' \
'http://localhost:8080/rest/gremlin/execute' \
-H 'accept: application/x-ndjson' \
-H 'Content-Type: text/plain' \
-d 'g.V().hasLabel('\''something'\'').toList()'
```
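
The same POST request can be sketched in Python using only the standard library. This is an illustration under stated assumptions: `build_gremlin_request` and `run_gremlin` are hypothetical helper names, the base URL matches the local example above, and the client-side timeout is an arbitrary choice rather than a Gaffer setting.

```python
from urllib.request import Request, urlopen


def build_gremlin_request(base_url, query):
    """Build (but do not send) a POST to the Gremlin execute endpoint."""
    return Request(
        url=f"{base_url.rstrip('/')}/gremlin/execute",
        data=query.encode("utf-8"),  # plaintext Gremlin Groovy script
        headers={
            "accept": "application/x-ndjson",
            "Content-Type": "text/plain",
        },
        method="POST",
    )


def run_gremlin(base_url, query, timeout_s=120.0):
    """Send the query and return the raw response body as text.

    Requires a running Gaffer REST API; timeout_s is a client-side assumption.
    """
    with urlopen(build_gremlin_request(base_url, query), timeout=timeout_s) as response:
        return response.read().decode("utf-8")


request = build_gremlin_request(
    "http://localhost:8080/rest", "g.V().hasLabel('something').toList()"
)
print(request.get_method(), request.full_url)
```

Building the request separately from sending it keeps the example inspectable without a running server; call `run_gremlin` once the REST API is available.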

You can also utilise [Gafferpy](./python-api.md) to connect and run queries
using the endpoints.

```python
from gafferpy import gaffer_connector

gc = gaffer_connector.GafferConnector("http://localhost:8080/rest")

# Execute and return gremlin
gremlin_result = gc.execute_gremlin("g.V('1').toList()")

# Execute and return cypher
cypher_result = gc.execute_cypher("MATCH (n) WHERE ID(n) = '1' RETURN n")
```
3 changes: 3 additions & 0 deletions docs/user-guide/query/gremlin/custom-features.md
@@ -229,3 +229,6 @@ data in that reaches the last step then that step will be missing from the expla
in the Graph schema.
- All submitted Cypher explains will be translated to Gremlin first and have a `.toList()`
appended to the translation so it is actually executed.
- An explanation of a Gremlin `project()` step will not include all the Operations called.
As a Gremlin `project()` is essentially a for-each loop, the explain will only include the
last iteration of the loop.
8 changes: 5 additions & 3 deletions docs/user-guide/query/gremlin/gremlin-limits.md
@@ -22,14 +22,16 @@ Current known limitations or bugs:
- Edge IDs in GafferPop are not the same as in standard Gremlin. Instead of `g.E(11)`
edge IDs take the format `g.E("[source, dest]")` or `g.E("[source, label, dest]")`.
- The entity group `id` is reserved for an empty group containing only the
vertex ID, this is currently used as a workaround for other limitations.
vertex ID. This is currently used as a workaround for other limitations. One such
use is holding 'orphaned' vertices, which are vertices on an edge that do not
have a Gaffer entity associated with them.
- Chaining `hasLabel()` calls together like `hasLabel("label1").hasLabel("label2")`
will act like an OR rather than an AND in standard Gremlin. This means you
may get results back when you realistically shouldn't.
- Input seeds to Gaffer operations are deduplicated.
Therefore, the results of a query against a GafferPop graph may be different than a standard Gremlin graph.
Therefore, the results of a query against a GafferPop graph may be different than a standard Gremlin graph.
For example, for the Tinkerpop Modern graph:
```
```text
(Gremlin) g.V().out() = [v2, v3, v3, v3, v4, v5]
(GafferPop) g.V().out() = [v2, v3, v4, v5]
```
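
The two result lists above can be reproduced from the edges of the TinkerPop Modern graph. The following toy sketch models only the observable output: the standard result keeps one entry per outgoing edge, while the GafferPop result is modelled simply as the distinct set of adjacent vertices. It is an assumption-laden illustration (the function names are hypothetical) and does not reproduce how GafferPop deduplicates seeds internally.

```python
# Directed edges of the TinkerPop Modern graph as (source, destination) pairs.
MODERN_EDGES = [
    ("v1", "v2"), ("v1", "v4"), ("v1", "v3"),
    ("v4", "v3"), ("v4", "v5"), ("v6", "v3"),
]
MODERN_VERTICES = ["v1", "v2", "v3", "v4", "v5", "v6"]


def out_standard(vertices, edges):
    """One result per outgoing edge per starting vertex (standard Gremlin)."""
    return sorted(dst for v in vertices for src, dst in edges if src == v)


def out_deduplicated(vertices, edges):
    """Toy model of the GafferPop output: each adjacent vertex appears once."""
    return sorted(set(out_standard(vertices, edges)))


standard = out_standard(MODERN_VERTICES, MODERN_EDGES)
deduplicated = out_deduplicated(MODERN_VERTICES, MODERN_EDGES)
print("(Gremlin)   g.V().out() =", standard)
print("(GafferPop) g.V().out() =", deduplicated)
```

Any downstream step that depends on multiplicity, such as a count, can therefore give a different answer on GafferPop than on a standard Gremlin graph.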