At this point, readers should be familiar with the Introduction to this Reference Documentation and will likely be thinking about implementation details specific to the graph provider they have selected as well as the programming language they intend to use. The choice of programming language could have implications for the architecture and design of the application, and the choice itself may be limited by the chosen graph provider. For example, a Remote Gremlin Provider will require the selection of a driver to interact with it. On the other hand, a graph system that is designed for embedded use, like TinkerGraph, needs the Java Virtual Machine (JVM) environment, which is easily accessed with a JVM programming language. If, however, the programming language is not built for the JVM, then it will require Gremlin Server in the architecture as well.
TinkerPop provides an array of drivers in different programming languages as a way to connect to a remote Gremlin Server or Remote Gremlin Provider. Drivers allow the developer to make requests to that remote system and get back results from the TinkerPop-enabled graphs hosted within. A driver can submit Gremlin strings and Gremlin bytecode over this sub-protocol. Gremlin strings are written in the scripting language made available by the remote system that the driver is connecting to (typically, Groovy-based). This connection approach is quite similar to what developers are likely familiar with when using JDBC and SQL.
The preferred approach is to use bytecode-based requests, which essentially allow Gremlin to be crafted directly in the programming language of choice. Because Gremlin makes use of two fundamental programming constructs, function composition and function nesting, it is possible to embed the Gremlin language in any modern programming language. This is a far more natural way to program, because it enables IDE interaction, compile-time checks, and language-level checks that can help prevent errors prior to execution. The differences between these two approaches were outlined in the Connecting Via Drivers Section, which applies to Gremlin Server, but also to Remote Gremlin Providers.
In addition to the languages and drivers that TinkerPop supports, there are also third-party implementations, as well as extensions to the Gremlin language that might be specific to a particular graph provider. That listing can be found on the TinkerPop home page. Their description is beyond the scope of this documentation.
Tip: When possible, it is typically best to align the version of TinkerPop used on the client with the version supported on the server. While it is not impossible to have a different version between client and server, it may require additional configuration and/or a deeper knowledge of the changes introduced between versions. It’s simply safer to avoid the conflict, when allowed to do so.
Important: Gremlin-Java is the canonical representation of Gremlin and any (proper) Gremlin language variant will emulate its structure as best as possible given the constructs of the host language. A strong correspondence between variants ensures that the general Gremlin reference documentation is applicable to all variants and that users moving between development languages can easily adopt the Gremlin variant for that language.
The following sections describe each language variant and driver that is officially a part of the TinkerPop project, providing more detailed information about usage, configuration and known limitations.
Apache TinkerPop’s Gremlin-Java implements Gremlin within the Java language and can be used by any Java Virtual Machine. Gremlin-Java is considered the canonical, reference implementation of Gremlin and serves as the model that all other Gremlin language variants should emulate. As the Gremlin Traversal Machine that processes Gremlin queries is also written in Java, it can be used in all three connection methods described in the Connecting Gremlin Section.
<dependency>
    <groupId>org.apache.tinkerpop</groupId>
    <artifactId>gremlin-core</artifactId>
    <version>x.y.z</version>
</dependency>
<!-- when using Gremlin Server or Remote Gremlin Provider a driver is required -->
<dependency>
    <groupId>org.apache.tinkerpop</groupId>
    <artifactId>gremlin-driver</artifactId>
    <version>x.y.z</version>
</dependency>
<!--
alternatively the driver is packaged as an uberjar with shaded non-optional dependencies including gremlin-core and
tinkergraph-gremlin which are not shaded.
-->
<dependency>
    <groupId>org.apache.tinkerpop</groupId>
    <artifactId>gremlin-driver</artifactId>
    <version>x.y.z</version>
    <classifier>shaded</classifier>
    <!-- The shaded JAR uses the original POM, therefore conflicts may still need resolution -->
    <exclusions>
        <exclusion>
            <groupId>io.netty</groupId>
            <artifactId>*</artifactId>
        </exclusion>
    </exclusions>
</dependency>
The pattern for connecting is described in Connecting Gremlin and it basically distills down to creating a GraphTraversalSource. For embedded mode, this involves first creating a Graph and then spawning the GraphTraversalSource:
Graph graph = ...;
GraphTraversalSource g = traversal().withEmbedded(graph);
Using "g" it is then possible to start writing Gremlin. The "g" allows for the setting of many configuration options which affect traversal execution. The Traversal Section describes some of these options and some are only suitable with embedded style usage. For remote options however there are some added configurations to consider and this section looks to address those.
When connecting to Gremlin Server or Remote Gremlin Providers it is possible to configure the DriverRemoteConnection manually as shown in earlier examples where the host and port are provided as follows:
GraphTraversalSource g = traversal().withRemote(DriverRemoteConnection.using("localhost",8182,"g"));
It is also possible to create it from a configuration. The most basic way to do so involves the following line of code:
GraphTraversalSource g = traversal().withRemote("conf/remote-graph.properties");
The remote-graph.properties file simply provides connection information to the GraphTraversalSource which is used to configure a RemoteConnection. That file looks like this:
gremlin.remote.remoteConnectionClass=org.apache.tinkerpop.gremlin.driver.remote.DriverRemoteConnection
gremlin.remote.driver.clusterFile=conf/remote-objects.yaml
gremlin.remote.driver.sourceName=g
The RemoteConnection is an interface that provides the transport mechanism for "g" and makes it possible for that mechanism to be altered (typically by graph providers who have their own protocols). TinkerPop provides one such implementation called the DriverRemoteConnection which enables transport over Gremlin Server protocols using the TinkerPop driver. The driver is configured by the specified gremlin.remote.driver.clusterFile and the local "g" is bound to the GraphTraversalSource on the remote end with gremlin.remote.driver.sourceName which in this case is also "g".
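As a rough illustration of how the three keys in the properties file fit together, the following JDK-only sketch parses the same text and reports any of the keys that are absent. The class `RemoteGraphPropertiesSketch` and its `missingKeys` helper are hypothetical and not part of TinkerPop, and treating all three keys as expected is an assumption made for the example; the sketch only inspects the text and does not open a connection.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper, not part of TinkerPop: checks that properties text defines
// the three keys used in the remote-graph.properties example.
final class RemoteGraphPropertiesSketch {

    // Assumption for this sketch: all three keys from the example are expected.
    static final List<String> EXPECTED_KEYS = List.of(
            "gremlin.remote.remoteConnectionClass",
            "gremlin.remote.driver.clusterFile",
            "gremlin.remote.driver.sourceName");

    // Parses properties-file text with the JDK's standard Properties loader.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Returns the expected keys that are absent or blank, in declaration order.
    static List<String> missingKeys(Properties props) {
        List<String> missing = new ArrayList<>();
        for (String key : EXPECTED_KEYS) {
            String value = props.getProperty(key);
            if (value == null || value.trim().isEmpty()) {
                missing.add(key);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        String text = String.join("\n",
                "gremlin.remote.remoteConnectionClass=org.apache.tinkerpop.gremlin.driver.remote.DriverRemoteConnection",
                "gremlin.remote.driver.clusterFile=conf/remote-objects.yaml",
                "gremlin.remote.driver.sourceName=g");
        Properties props = parse(text);
        System.out.println("sourceName = " + props.getProperty("gremlin.remote.driver.sourceName"));
        System.out.println("missing keys = " + missingKeys(props));
    }
}
```

A check of this kind can run before the file path is handed to withRemote(), so that a misspelled key surfaces as a readable message rather than as a connection failure.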
There are other ways to configure the traversal using withRemote() as it has other overloads. It can take an Apache Commons Configuration object which would have keys similar to those shown in the properties file and it can also take a RemoteConnection instance directly. The latter is interesting in that it means it is possible to programmatically construct all aspects of the RemoteConnection. For TinkerPop usage, that might mean directly constructing the DriverRemoteConnection and the driver instance that supplies the transport mechanism. For example, the command shown above could be re-written using programmatic construction as follows:
Cluster cluster = Cluster.open();
GraphTraversalSource g = traversal().withRemote(DriverRemoteConnection.using(cluster, "g"));
Please consider the following example:
// Gremlin-Groovy
g = traversal().withRemote('conf/remote-graph.properties')
g.V().elementMap()
g.close()
// Gremlin-Java
GraphTraversalSource g = traversal().withRemote("conf/remote-graph.properties");
List<Map<Object, Object>> list = g.V().elementMap().toList();
g.close();
Note the call to close() above. The call to withRemote() internally instantiates a connection via the driver that can only be released by "closing" the GraphTraversalSource. It is important to take that step to release network resources associated with g.
If working with multiple remote TraversalSource instances it is more efficient to construct Cluster and Client objects and then re-use them.
cluster = Cluster.open('conf/remote-objects.yaml')
client = cluster.connect()
g = traversal().withRemote(DriverRemoteConnection.using(client, "g"))
g.V().elementMap()
g.close()
client.close()
cluster.close()
If the Client instance is supplied externally, as is shown above, then it is not closed implicitly by the close of "g". Closing "g" will have no effect on "client" or "cluster". When supplying them externally, the Client and Cluster objects must also be closed explicitly. It’s worth noting that the close of a Cluster will close all Client instances spawned by the Cluster.
Some connection options can also be set on individual requests made through the Java driver using the with() step on the TraversalSource. For instance, to set the request timeout to 500 milliseconds:
GraphTraversalSource g = traversal().withRemote(conf);
List<Vertex> vertices = g.with(Tokens.ARGS_EVAL_TIMEOUT, 500L).V().out("knows").toList();
The following options are allowed on a per-request basis in this fashion: batchSize, requestId, userAgent and evaluationTimeout (formerly scriptEvaluationTimeout which is also supported but now deprecated). Use of Tokens to reference these options is preferred.
There are a number of classes, functions and tokens that are typically used with Gremlin. The following imports provide most of the common functionality required to use Gremlin:
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
import org.apache.tinkerpop.gremlin.process.traversal.IO;
import static org.apache.tinkerpop.gremlin.process.traversal.AnonymousTraversalSource.traversal;
import static org.apache.tinkerpop.gremlin.process.traversal.Operator.*;
import static org.apache.tinkerpop.gremlin.process.traversal.Order.*;
import static org.apache.tinkerpop.gremlin.process.traversal.P.*;
import static org.apache.tinkerpop.gremlin.process.traversal.Pop.*;
import static org.apache.tinkerpop.gremlin.process.traversal.SackFunctions.*;
import static org.apache.tinkerpop.gremlin.process.traversal.Scope.*;
import static org.apache.tinkerpop.gremlin.process.traversal.TextP.*;
import static org.apache.tinkerpop.gremlin.structure.Column.*;
import static org.apache.tinkerpop.gremlin.structure.Direction.*;
import static org.apache.tinkerpop.gremlin.structure.T.*;
import static org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.__.*;
The following table describes the various configuration options for the Gremlin Driver:
Key | Description | Default |
---|---|---|
connectionPool.channelizer | The fully qualified classname of the client | |
connectionPool.enableSsl | Determines if SSL should be enabled or not. If enabled on the server then it must be enabled on the client. | false |
connectionPool.keepAliveInterval | Length of time in milliseconds to wait on an idle connection before sending a keep-alive request. Set to zero to disable this feature. | 180000 |
connectionPool.keyStore | The private key in JKS or PKCS#12 format. | none |
connectionPool.keyStorePassword | The password of the | none |
connectionPool.keyStoreType | | none |
connectionPool.maxContentLength | The maximum length in bytes of a message that can be sent to the server. This number can be no greater than the setting of the same name in the server configuration. | 65536 |
connectionPool.maxInProcessPerConnection | The maximum number of in-flight requests that can occur on a connection. | 4 |
connectionPool.maxSimultaneousUsagePerConnection | The maximum number of times that a connection can be borrowed from the pool simultaneously. | 16 |
connectionPool.maxSize | The maximum size of a connection pool for a host. | 8 |
connectionPool.maxWaitForConnection | The amount of time in milliseconds to wait for a new connection before timing out. | 3000 |
connectionPool.maxWaitForClose | The amount of time in milliseconds to wait for pending messages to be returned from the server before closing the connection. | 3000 |
connectionPool.minInProcessPerConnection | The minimum number of in-flight requests that can occur on a connection. | 1 |
connectionPool.minSimultaneousUsagePerConnection | The minimum number of times that a connection can be borrowed from the pool simultaneously. | 8 |
connectionPool.minSize | The minimum size of a connection pool for a host. | 2 |
connectionPool.reconnectInterval | The amount of time in milliseconds to wait before trying to reconnect to a dead host. | 1000 |
connectionPool.resultIterationBatchSize | The override value for the size of the result batches to be returned from the server. | 64 |
connectionPool.sslCipherSuites | The list of JSSE ciphers to support for SSL connections. If specified, only the ciphers that are listed and supported will be enabled. If not specified, the JVM default is used. | none |
connectionPool.sslEnabledProtocols | The list of SSL protocols to support for SSL connections. If specified, only the protocols that are listed and supported will be enabled. If not specified, the JVM default is used. | none |
connectionPool.sslSkipCertValidation | Configures the | false |
connectionPool.trustStore | File location for a SSL Certificate Chain to use when SSL is enabled. If this value is not provided and SSL is enabled, the default | none |
connectionPool.trustStorePassword | The password of the | none |
connectionPool.validationRequest | A script that is used to test server connectivity. A good script to use is one that evaluates quickly and returns no data. The default simply returns an empty string, but if a graph is required by a particular provider, a good traversal might be | '' |
connectionPool.connectionSetupTimeoutMillis | Duration of time in milliseconds provided for connection setup to complete which includes WebSocket protocol handshake and SSL handshake. | 15000 |
hosts | The list of hosts that the driver will connect to. | localhost |
jaasEntry | Sets the | none |
nioPoolSize | Size of the pool for handling request/response operations. | available processors |
password | The password to submit on requests that require authentication. | none |
path | The URL path to the Gremlin Server. | /gremlin |
port | The port of the Gremlin Server to connect to. The same port will be applied for all hosts. | 8182 |
protocol | Sets the | none |
serializer.className | The fully qualified class name of the | none |
serializer.config | A | none |
username | The username to submit on requests that require authentication. | none |
workerPoolSize | Size of the pool for handling background work. | available processors * 2 |
Please see the Cluster.Builder javadoc to get more information on these settings.
Transactions with Java are best described in The Traversal - Transactions section of this documentation as Java covers both embedded and remote use cases.
Remote systems like Gremlin Server and Remote Gremlin Providers accept requests made in a particular serialization format and respond by serializing results to some format to be interpreted by the client. For JVM-based languages, there are three options for serialization: Gryo, GraphSON and GraphBinary. It is important that the client and server have the same serializers configured in the same way or else one or the other will experience serialization exceptions and fail to communicate reliably. Discrepancy in serializer registration between client and server can happen fairly easily as different graph systems may automatically include serializers on the server-side, thus leaving the client to be configured manually. As an example:
IoRegistry registry = ...; // an IoRegistry instance exposed by a specific graph provider
TypeSerializerRegistry typeSerializerRegistry = TypeSerializerRegistry.build().addRegistry(registry).create();
MessageSerializer serializer = new GraphBinaryMessageSerializerV1(typeSerializerRegistry);
Cluster cluster = Cluster.build().
serializer(serializer).
create();
Client client = cluster.connect();
GraphTraversalSource g = traversal().withRemote(DriverRemoteConnection.using(client, "g"));
The IoRegistry tells the serializer what classes from the graph provider to auto-register during serialization. Gremlin Server roughly uses this same approach when it configures its serializers, so using this same model will ensure compatibility when making requests. Obviously, it is possible to switch to GraphSON or Gryo by using the appropriate MessageSerializer (e.g. GraphSONMessageSerializerV3d0 or GryoMessageSerializerV3d0 respectively) in the same way and building that into the Cluster object.
Note: Gryo is no longer the preferred binary serialization format for Gremlin Server - please prefer GraphBinary.
Supporting anonymous functions across languages is difficult as most languages do not support lambda introspection and thus, code analysis. In Gremlin-Java and with embedded usage, lambdas can be leveraged directly:
g.V().out("knows").map(t -> t.get().value("name") + " is the friend name") (1)
g.V().out("knows").sideEffect(System.out::println) (2)
g.V().as("a").out("knows").as("b").select("b").by((Function<Vertex, Integer>) v -> v.<String>value("name").length()) (3)
(1) A Java Function is used to map a Traverser<S> to an object E.
(2) Gremlin steps that take consumer arguments can be passed Java method references.
(3) Gremlin-Java may sometimes require explicit lambda typing when types cannot be automatically inferred.
When sending traversals remotely to Gremlin Server or Remote Gremlin Providers, the static methods of Lambda should be used and should denote a particular JSR-223 ScriptEngine that is available on the remote end (typically, this is Groovy). Lambda creates a string-based lambda that is then converted into a lambda/closure/anonymous-function/etc. by the respective lambda language’s JSR-223 ScriptEngine implementation.
g.V().out("knows").map(Lambda.function("it.get().value('name') + ' is the friend name'"))
g.V().out("knows").sideEffect(Lambda.consumer("println it"))
g.V().as("a").out("knows").as("b").select("b").by(Lambda.<Vertex,Integer>function("it.value('name').length()"))
Finally, Gremlin Bytecode that includes lambdas requires that the traversal be processed by the ScriptEngine. To avoid continued recompilation costs, it supports the encoding of bindings, which allow Gremlin Server to cache traversals that will be reused over and over again even though some parameters may change. Thus, instead of translating, compiling, and then executing each submitted bytecode request, it is possible to simply execute. To express bindings in Java, use Bindings.
b = Bindings.instance()
g.V(b.of('id',1)).out('created').values('name').map{t -> "name: " + t.get() }
g.V(b.of('id',4)).out('created').values('name').map{t -> "name: " + t.get() }
g.V(b.of('id',4)).out('created').values('name').getBytecode()
g.V(b.of('id',4)).out('created').values('name').getBytecode().getBindings()
cluster.close()
Both traversals are abstractly defined as g.V(id).out('created').values('name').map{t -> "name: " + t.get() } and thus, the first submission can be cached for faster evaluation on the next submission.
Warning: It is generally advised to avoid lambda usage. Please consider A Note On Lambdas for more information.
TinkerPop comes equipped with a reference client for Java-based applications. It is referred to as gremlin-driver, which enables applications to send requests to Gremlin Server and get back results.
Gremlin scripts are sent to the server from a Client instance. A Client is created as follows:
Cluster cluster = Cluster.open(); (1)
Client client = cluster.connect(); (2)
(1) Opens a reference to localhost - note that there are many configuration options available in defining a Cluster object.
(2) Creates a Client given the configuration options of the Cluster.
Once a Client instance is ready, it is possible to issue some Gremlin Groovy scripts:
ResultSet results = client.submit("[1,2,3,4]"); (1)
results.stream().map(i -> i.get(Integer.class) * 2); (2)
CompletableFuture<List<Result>> results = client.submit("[1,2,3,4]").all(); (3)
CompletableFuture<ResultSet> future = client.submitAsync("[1,2,3,4]"); (4)
Map<String,Object> params = new HashMap<>();
params.put("x",4);
client.submit("[1,2,3,x]", params); (5)
(1) Submits a script that simply returns a List of integers. This method blocks until the request is written to the server and a ResultSet is constructed.
(2) Even though the ResultSet is constructed, it does not mean that the server has sent back the results (or even evaluated the script potentially). The ResultSet is just a holder that is awaiting the results from the server. In this case, they are streamed from the server as they arrive.
(3) Submit a script, get a ResultSet, then return a CompletableFuture that will be completed when all results have been returned.
(4) Submit a script asynchronously without waiting for the request to be written to the server.
(5) Parameterized requests are considered the most efficient way to send Gremlin to the server as they can be cached, which will boost performance and reduce resources required on the server.
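The caching benefit of parameterized requests can be pictured with a small model. The sketch below is a hypothetical stand-in, not Gremlin Server code: it assumes a cache keyed only by script text and counts how often a "compilation" would be needed. Inlining values produces a new script text each time, while a parameterized script keeps the text constant.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of a script cache keyed by script text; it is not Gremlin Server code.
final class ScriptCacheSketch {

    private final Set<String> compiledScripts = new HashSet<>();
    private int compilations = 0;

    // Records a submission. The script is "compiled" only the first time its text is seen;
    // the parameter values are deliberately not part of the cache key.
    void submit(String script, Map<String, ?> params) {
        if (compiledScripts.add(script)) {
            compilations++;
        }
    }

    int compilations() {
        return compilations;
    }

    public static void main(String[] args) {
        // Values inlined into the script: every distinct value yields a new script text.
        ScriptCacheSketch inlined = new ScriptCacheSketch();
        inlined.submit("[1,2,3,4]", Map.of());
        inlined.submit("[1,2,3,5]", Map.of());
        System.out.println("inlined compilations: " + inlined.compilations());

        // Values passed as parameters: the script text stays the same across requests.
        ScriptCacheSketch parameterized = new ScriptCacheSketch();
        parameterized.submit("[1,2,3,x]", Map.of("x", 4));
        parameterized.submit("[1,2,3,x]", Map.of("x", 5));
        System.out.println("parameterized compilations: " + parameterized.compilations());
    }
}
```

In this model the inlined variant compiles twice and the parameterized variant compiles once, which is the effect the parameterized form of submit() is meant to achieve.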
There are a number of overloads to Client.submit() that accept a RequestOptions object. The RequestOptions provide a way to include options that are specific to the request made with the call to submit(). A good use-case for this feature is to set a per-request override to the evaluationTimeout so that it only applies to the current request.
Cluster cluster = Cluster.open();
Client client = cluster.connect();
RequestOptions options = RequestOptions.build().timeout(500).create();
List<Result> result = client.submit("g.V().repeat(both()).times(100)", options).all().get();
The preferred method for setting a per-request timeout for scripts is demonstrated above, but those familiar with bytecode may try g.with(EVALUATION_TIMEOUT, 500) within a script. Gremlin Server will respect timeouts set this way in scripts as well. With scripts of course, it is possible to send multiple traversals at once in the same script. In such events, the timeout for the request is interpreted as a sum of all timeouts identified in the script.
RequestOptions options = RequestOptions.build().timeout(500).create();
List<Result> result = client.submit("g.with(EVALUATION_TIMEOUT, 500).addV().iterate();" +
                                    "g.addV().iterate();" +
                                    "g.with(EVALUATION_TIMEOUT, 500).addV();", options).all().get();
In the above example, RequestOptions defines a timeout of 500 milliseconds, but the script has three traversals with two internal settings for the timeout using with(). The request timeout used by the server will therefore be 1000 milliseconds (overriding the 500 which itself was an override for whatever configuration was on the server).
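The arithmetic described above can be written out as a small function. This is only a sketch of the rule as stated in this section, with hypothetical names (`RequestTimeoutSketch`, `effectiveTimeoutMillis`); it assumes that timeouts set inside the script are summed when present and that the RequestOptions value applies otherwise. It is not the server's implementation.

```java
import java.util.List;

// Hypothetical model of the timeout rule described in the text; not Gremlin Server code.
final class RequestTimeoutSketch {

    // Sums the timeouts set inside the script when there are any;
    // otherwise falls back to the timeout supplied through RequestOptions.
    static long effectiveTimeoutMillis(long requestOptionMillis, List<Long> scriptTimeoutsMillis) {
        if (scriptTimeoutsMillis.isEmpty()) {
            return requestOptionMillis;
        }
        long sum = 0;
        for (long timeout : scriptTimeoutsMillis) {
            sum += timeout;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Three traversals, two of which set a 500 ms timeout with with().
        System.out.println(effectiveTimeoutMillis(500L, List.of(500L, 500L)));
        // No timeouts inside the script: the RequestOptions value is used.
        System.out.println(effectiveTimeoutMillis(500L, List.of()));
    }
}
```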
Scripts submitted to Gremlin Server automatically have the globally configured Graph and TraversalSource instances made available to them. Therefore, if Gremlin Server configures two TraversalSource instances called "g1" and "g2" a script can simply reference them directly as:
client.submit("g1.V()")
client.submit("g2.V()")
While this is an acceptable way to submit scripts, it has the downside of forcing the client to encode the server-side variable name directly into the script being sent. If the server configuration ever changed such that "g1" became "g100", the client-side code might have to see a significant amount of change. Decoupling the script code from the server configuration can be managed by the alias method on Client as follows:
Client g1Client = client.alias("g1")
Client g2Client = client.alias("g2")
g1Client.submit("g.V()")
g2Client.submit("g.V()")
The above code demonstrates how the alias method can be used such that the script need only contain a reference to "g", while "g1" and "g2" are automatically rebound into "g" on the server-side.
Creating a Domain Specific Language (DSL) in Java requires the @GremlinDsl Java annotation in gremlin-core. This annotation should be applied to a "DSL interface" that extends GraphTraversal.Admin:
@GremlinDsl
public interface SocialTraversalDsl<S, E> extends GraphTraversal.Admin<S, E> {
}
Important: The name of the DSL interface should be suffixed with "TraversalDsl". All characters in the interface name before that become the "name" of the DSL.
In this interface, define the methods that the DSL will be composed of:
@GremlinDsl
public interface SocialTraversalDsl<S, E> extends GraphTraversal.Admin<S, E> {
    public default GraphTraversal<S, Vertex> knows(String personName) {
        return out("knows").hasLabel("person").has("name", personName);
    }

    public default <E2 extends Number> GraphTraversal<S, E2> youngestFriendsAge() {
        return out("knows").hasLabel("person").values("age").min();
    }

    public default GraphTraversal<S, Long> createdAtLeast(int number) {
        return outE("created").count().is(P.gte(number));
    }
}
Important: Follow the TinkerPop convention of using <S,E> in naming generics as those conventions are taken into account when generating the anonymous traversal class. The processor attempts to infer the appropriate type parameters when generating the anonymous traversal class. If it cannot do it correctly, it is possible to avoid the inference by using the GremlinDsl.AnonymousMethod annotation on the DSL method. It allows explicit specification of the types to use.
The @GremlinDsl annotation is used by the Java Annotation Processor to generate the boilerplate class structure required to properly use the DSL within the TinkerPop framework. These classes can be generated and maintained by hand, but it would be time consuming, monotonous and error-prone to do so. Typically, the Java compilation process is automatically configured to detect annotation processors on the classpath and will automatically use them when found. If that does not happen, it may be necessary to make configuration changes to the build to allow for the compilation process to be aware of the following javax.annotation.processing.Processor implementation:
org.apache.tinkerpop.gremlin.process.traversal.dsl.GremlinDslProcessor
The annotation processor will generate several classes for the DSL:
- SocialTraversal - A Traversal interface that extends the SocialTraversalDsl proxying methods to its underlying interfaces (such as GraphTraversal) to instead return a SocialTraversal
- DefaultSocialTraversal - A default implementation of SocialTraversal (typically not used directly by the user)
- SocialTraversalSource - Spawns DefaultSocialTraversal instances.
- __ - Spawns anonymous DefaultSocialTraversal instances.
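The naming pattern behind the generated classes listed above can be summarized in a few lines of plain Java. This sketch is illustrative only: `DslNamingSketch` and its methods are hypothetical and are not part of the annotation processor. It simply derives the DSL name by stripping the suffix used by `SocialTraversalDsl` and then builds the four class names in the order they are listed.

```java
import java.util.List;

// Hypothetical illustration of the naming pattern; not the annotation processor itself.
final class DslNamingSketch {

    static final String SUFFIX = "TraversalDsl";

    // Returns the DSL "name": everything in the interface name before the suffix.
    static String dslName(String interfaceName) {
        if (!interfaceName.endsWith(SUFFIX) || interfaceName.length() == SUFFIX.length()) {
            throw new IllegalArgumentException(
                    "Expected a name of the form <Name>" + SUFFIX + " but got: " + interfaceName);
        }
        return interfaceName.substring(0, interfaceName.length() - SUFFIX.length());
    }

    // Returns the generated class names in the order they are listed in the text.
    static List<String> generatedClassNames(String interfaceName) {
        String name = dslName(interfaceName);
        return List.of(
                name + "Traversal",
                "Default" + name + "Traversal",
                name + "TraversalSource",
                "__");
    }

    public static void main(String[] args) {
        System.out.println(dslName("SocialTraversalDsl"));
        System.out.println(generatedClassNames("SocialTraversalDsl"));
    }
}
```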
Using the DSL then just involves telling the Graph to use it:
SocialTraversalSource social = traversal(SocialTraversalSource.class).withEmbedded(graph);
social.V().has("name","marko").knows("josh");
The SocialTraversalSource can also be customized with DSL functions. As an additional step, include a class that extends from GraphTraversalSource and with a name that is suffixed with "TraversalSourceDsl". Include in this class, any custom methods required by the DSL:
public class SocialTraversalSourceDsl extends GraphTraversalSource {

    public SocialTraversalSourceDsl(Graph graph, TraversalStrategies traversalStrategies) {
        super(graph, traversalStrategies);
    }

    public SocialTraversalSourceDsl(Graph graph) {
        super(graph);
    }

    public SocialTraversalSourceDsl(RemoteConnection connection) {
        super(connection);
    }

    public GraphTraversal<Vertex, Vertex> persons(String... names) {
        GraphTraversalSource clone = this.clone();

        // Manually add a "start" step for the traversal in this case the equivalent of V(). GraphStep is marked
        // as a "start" step by passing "true" in the constructor.
        clone.getBytecode().addStep(GraphTraversal.Symbols.V);
        GraphTraversal<Vertex, Vertex> traversal = new DefaultGraphTraversal<>(clone);
        traversal.asAdmin().addStep(new GraphStep<>(traversal.asAdmin(), Vertex.class, true));

        traversal = traversal.hasLabel("person");
        if (names.length > 0) traversal = traversal.has("name", P.within(names));

        return traversal;
    }
}
Then, back in the SocialTraversalDsl interface, update the GremlinDsl annotation with the traversalSource argument to point to the fully qualified class name of the SocialTraversalSourceDsl:
@GremlinDsl(traversalSource = "com.company.SocialTraversalSourceDsl")
public interface SocialTraversalDsl<S, E> extends GraphTraversal.Admin<S, E> {
...
}
It is then possible to use the persons() method to start traversals:
SocialTraversalSource social = traversal(SocialTraversalSource.class).withEmbedded(graph);
social.persons("marko").knows("josh");
Note: Using Maven, as shown in the gremlin-archetype-dsl module, makes developing DSLs with the annotation processor straightforward in that it sets up appropriate paths to the generated code automatically.
The available Maven archetypes are as follows:
- gremlin-archetype-dsl - An example project that demonstrates how to build Domain Specific Languages with Gremlin in Java.
- gremlin-archetype-server - An example project that demonstrates the basic structure of a Gremlin Server project, how to connect with the Gremlin Driver, and how to embed Gremlin Server in a testing framework.
- gremlin-archetype-tinkergraph - A basic example of how to structure a TinkerPop project with Maven.
Use Maven to generate these example projects with a command like:
$ mvn archetype:generate -DarchetypeGroupId=org.apache.tinkerpop -DarchetypeArtifactId=gremlin-archetype-server \
-DarchetypeVersion=x.y.z -DgroupId=com.my -DartifactId=app -Dversion=0.1 -DinteractiveMode=false
This command will generate a new Maven project in a directory called "app" with a pom.xml specifying a groupId of com.my. Please see the README.asciidoc in the root of each generated project for information on how to build and execute it.
Apache TinkerPop’s Gremlin-Groovy implements Gremlin within the Apache Groovy language. As a JVM-based language variant, Gremlin-Groovy is backed by Gremlin-Java constructs. Moreover, given its scripting nature, Gremlin-Groovy serves as the language of Gremlin Console and Gremlin Server.
compile group: 'org.apache.tinkerpop', name: 'gremlin-core', version: 'x.y.z'
compile group: 'org.apache.tinkerpop', name: 'gremlin-driver', version: 'x.y.z'
In Groovy, as, in, and not are reserved words. Gremlin-Groovy does not allow these steps to be called statically from the anonymous traversal __ and therefore, they must always be prefixed with __. For instance:
g.V().as('a').in().as('b').where(__.not(__.as('a').out().as('b')))
Since Groovy has access to the full JVM as Java does, it is possible to construct Date-like objects directly, but the Gremlin language does offer a datetime() function that is exposed in the Gremlin Console and as a function for Gremlin scripts sent to Gremlin Server. The function accepts the following forms of dates and times using a default time zone offset of UTC(+00:00):
- 2018-03-22
- 2018-03-22T00:35:44
- 2018-03-22T00:35:44Z
- 2018-03-22T00:35:44.741
- 2018-03-22T00:35:44.741Z
- 2018-03-22T00:35:44.741+1600
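For readers who want to see how the forms listed above relate to one another, the following JDK-only sketch parses each of them with java.time and applies UTC when no offset is given. It is not the implementation of datetime(); the class `DatetimeFormsSketch` and its `parse` method are hypothetical names used only for illustration.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.TemporalAccessor;

// Hypothetical JDK-only sketch; it mirrors the listed input forms but is not datetime() itself.
final class DatetimeFormsSketch {

    // A date, optionally followed by 'T' and a time, optionally followed by "Z" or an offset like +1600.
    private static final DateTimeFormatter FLEXIBLE = new DateTimeFormatterBuilder()
            .append(DateTimeFormatter.ISO_LOCAL_DATE)
            .optionalStart()
            .appendLiteral('T')
            .append(DateTimeFormatter.ISO_LOCAL_TIME)
            .optionalStart()
            .appendOffset("+HHMM", "Z")
            .optionalEnd()
            .optionalEnd()
            .toFormatter();

    // Parses any of the listed forms, defaulting to midnight and to UTC when parts are missing.
    static Instant parse(String text) {
        TemporalAccessor parsed = FLEXIBLE.parseBest(
                text, OffsetDateTime::from, LocalDateTime::from, LocalDate::from);
        if (parsed instanceof OffsetDateTime) {
            return ((OffsetDateTime) parsed).toInstant();
        }
        if (parsed instanceof LocalDateTime) {
            return ((LocalDateTime) parsed).toInstant(ZoneOffset.UTC);
        }
        return ((LocalDate) parsed).atStartOfDay().toInstant(ZoneOffset.UTC);
    }

    public static void main(String[] args) {
        String[] samples = {
            "2018-03-22",
            "2018-03-22T00:35:44",
            "2018-03-22T00:35:44Z",
            "2018-03-22T00:35:44.741",
            "2018-03-22T00:35:44.741Z",
            "2018-03-22T00:35:44.741+1600"
        };
        for (String sample : samples) {
            System.out.println(sample + " -> " + parse(sample));
        }
    }
}
```

Under the UTC default, a value without an offset and the same value with a trailing Z resolve to the same instant, while an explicit offset such as +1600 shifts the instant accordingly.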
Apache TinkerPop’s Gremlin-Python implements Gremlin within the Python language and can be used on any Python virtual machine including the popular CPython machine. Python’s syntax has the same constructs as Java including "dot notation" for function chaining (a.b.c), round bracket function arguments (a(b,c)), and support for global namespaces (a(b()) vs a(__.b())). As such, anyone familiar with Gremlin-Java will immediately be able to work with Gremlin-Python. Moreover, there are a few added constructs to Gremlin-Python that make traversals a bit more succinct.
To install Gremlin-Python, use Python’s pip package manager.
pip install gremlinpython
pip install gremlinpython[kerberos] # Optional, not available on Microsoft Windows
The pattern for connecting is described in Connecting Gremlin and it basically distills down to
creating a GraphTraversalSource
. A GraphTraversalSource
is created from the anonymous traversal()
method where
the "g" provided to the DriverRemoteConnection
corresponds to the name of a GraphTraversalSource
on the remote end.
g = traversal().withRemote(DriverRemoteConnection('ws://localhost:8182/gremlin','g'))
If you need to send additional headers in the websockets connection, you can pass an optional headers
parameter
to the DriverRemoteConnection
constructor.
g = traversal().withRemote(DriverRemoteConnection(
'ws://localhost:8182/gremlin', 'g', headers={'Header':'Value'}))
Gremlin-Python supports plain text and Kerberos SASL authentication, which can be set through the connection options.
# Plain text authentication
g = traversal().withRemote(DriverRemoteConnection(
'ws://localhost:8182/gremlin', 'g', username='stephen', password='password'))
# Kerberos authentication
g = traversal().withRemote(DriverRemoteConnection(
'ws://localhost:8182/gremlin', 'g', kerberized_service='gremlin@hostname.your.org'))
The value specified for the kerberized_service should correspond to the first part of the principal name configured for the gremlin service, but with the slash replaced by an at sign. The Gremlin-Python client reads the kerberos configurations from your system. It finds the KDC’s hostname and port from the krb5.conf file at the default location or as indicated in the KRB5_CONFIG environment variable. It finds credentials from the credential cache or a keytab file at the default locations or as indicated in the KRB5CCNAME or KRB5_KTNAME environment variables.
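The rule for deriving the kerberized_service value can be sketched as a small string transformation. The helper name to_kerberized_service and the realm YOUR.ORG are hypothetical and used only for illustration; the sketch assumes a principal of the form service/host, optionally followed by @REALM.

```python
def to_kerberized_service(principal):
    """Turn 'service/host[@REALM]' into 'service@host' (hypothetical helper)."""
    # Keep only the first part of the principal, i.e. drop the realm if present.
    first_part = principal.split("@", 1)[0]
    if "/" not in first_part:
        raise ValueError("expected a principal of the form service/host")
    # Replace the slash between service and host with an at sign.
    return first_part.replace("/", "@", 1)


if __name__ == "__main__":
    # YOUR.ORG is an assumed realm name used only for this example.
    print(to_kerberized_service("gremlin/hostname.your.org@YOUR.ORG"))
```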
If you authenticate to a remote Gremlin Server or Remote Gremlin Provider, this server normally has SSL activated and the websockets url will start with 'wss://'. If Gremlin-Server uses a self-signed certificate for SSL, Gremlin-Python needs access to a local copy of the CA certificate file (in openssl .pem format), to be specified in the SSL_CERT_FILE environment variable.
Note
|
If connecting from an inherently single-threaded Python process where blocking while waiting for Gremlin
traversals to complete is acceptable, it might be helpful to set pool_size and max_workers parameters to 1.
See the Configuration section just below. Examples where this could apply are serverless cloud functions or WSGI
worker processes.
|
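As a rough sketch of the note above, the two parameters can be kept in one place and passed as keyword arguments. The names SINGLE_THREADED_OPTIONS and connect_single_threaded are hypothetical; the Gremlin-Python imports are deferred into the function so the module still loads where the package is not installed.

```python
# Keyword arguments suggested by the note for single-threaded processes.
SINGLE_THREADED_OPTIONS = {"pool_size": 1, "max_workers": 1}


def connect_single_threaded(url="ws://localhost:8182/gremlin", traversal_source="g"):
    """Create a traversal source backed by one connection and one worker."""
    # Imported lazily: only needed when a connection is actually opened.
    from gremlin_python.process.anonymous_traversal import traversal
    from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

    connection = DriverRemoteConnection(url, traversal_source, **SINGLE_THREADED_OPTIONS)
    return traversal().withRemote(connection)
```

Calling connect_single_threaded() requires a reachable Gremlin Server at the given URL.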
Some connection options can also be set on individual requests using the with_() step on the TraversalSource. For instance, to set the request timeout to 500 milliseconds:
vertices = g.with_('evaluationTimeout', 500).V().out('knows').toList()
The following options are allowed on a per-request basis in this fashion: batchSize
, requestId
, userAgent
and
evaluationTimeout
(formerly scriptEvaluationTimeout
which is also supported but now deprecated).
There are a number of classes, functions and tokens that are typically used with Gremlin. The following imports provide most of the typical functionality required to use Gremlin:
from gremlin_python import statics
from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.process.graph_traversal import __
from gremlin_python.process.strategies import *
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
from gremlin_python.process.traversal import T
from gremlin_python.process.traversal import Order
from gremlin_python.process.traversal import Cardinality
from gremlin_python.process.traversal import Column
from gremlin_python.process.traversal import Direction
from gremlin_python.process.traversal import Operator
from gremlin_python.process.traversal import P
from gremlin_python.process.traversal import TextP
from gremlin_python.process.traversal import Pop
from gremlin_python.process.traversal import Scope
from gremlin_python.process.traversal import Barrier
from gremlin_python.process.traversal import Bindings
from gremlin_python.process.traversal import WithOptions
These can be used analogously to how they are used in Gremlin-Java.
>>> g.V().hasLabel('person').has('age',P.gt(30)).order().by('age',Order.desc).toList()
[v[6], v[4]]
Moreover, by importing the statics
of Gremlin-Python, the class prefixes can be omitted.
>>> statics.load_statics(globals())
With statics loaded, it’s possible to represent the above traversal as below.
>>> g.V().hasLabel('person').has('age',gt(30)).order().by('age',desc).toList()
[v[6], v[4]]
Statics includes all the __
-methods and thus, anonymous traversals like __.out()
can be expressed as below.
That is, without the __
-prefix.
>>> g.V().repeat(out()).times(2).name.fold().toList()
[['ripple', 'lop']]
There may be situations where certain graphs may want a more exact data type than what Python will allow as a language.
To support these situations gremlin-python
has a few special type classes that can be imported from statics
. They
include:
from gremlin_python.statics import long # Java long
from gremlin_python.statics import timestamp # Java timestamp
from gremlin_python.statics import SingleByte # Java byte type
from gremlin_python.statics import SingleChar # Java char type
from gremlin_python.statics import GremlinType # Java Class
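The need for such type classes follows from the difference in integer ranges: a Python int has arbitrary precision, while a Java int is a signed 32-bit value and a Java long a signed 64-bit value. The sketch below only illustrates those ranges; the helper name jvm_integer_type is hypothetical and it does not describe how Gremlin-Python serializes numbers.

```python
JAVA_INT_MIN, JAVA_INT_MAX = -2**31, 2**31 - 1
JAVA_LONG_MIN, JAVA_LONG_MAX = -2**63, 2**63 - 1


def jvm_integer_type(value):
    """Name the smallest Java integer type that can hold a Python int."""
    if JAVA_INT_MIN <= value <= JAVA_INT_MAX:
        return "int"
    if JAVA_LONG_MIN <= value <= JAVA_LONG_MAX:
        return "long"
    # Beyond 64 bits only an arbitrary-precision type can hold the value.
    return "BigInteger"


if __name__ == "__main__":
    for number in (1, 2**31, 2**63):
        print(number, "->", jvm_integer_type(number))
```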
The following table describes the various configuration options for the Gremlin-Python Driver. They
can be passed to the Client
or DriverRemoteConnection
instance as keyword arguments:
Key | Description | Default |
---|---|---|
headers | Additional headers that will be added to each request message. | |
max_workers | Maximum number of worker threads. | Number of CPUs * 5 |
message_serializer | The message serializer implementation. | |
password | The password to submit on requests that require authentication. | "" |
pool_size | The number of connections used by the pool. | 4 |
protocol_factory | A callable that returns an instance of | |
transport_factory | A callable that returns an instance of | |
username | The username to submit on requests that require authentication. | "" |
kerberized_service | The first part of the principal name configured for the gremlin service. | "" |
session | A unique string-based identifier (typically a UUID) to enable a session-based connection. This is not a valid configuration for | None |
Note that the transport_factory
can allow for additional configuration of the AiohttpTransport
, which allows
pass through of the named parameters available in
AIOHTTP’s ws_connect,
and the ability to call the api from an event loop:
import ssl
...
g = traversal().withRemote(
    DriverRemoteConnection('ws://localhost:8182/gremlin','g',
                           transport_factory=lambda: AiohttpTransport(read_timeout=10,
                                                                      write_timeout=10,
                                                                      heartbeat=1.0,
                                                                      call_from_event_loop=True,
                                                                      max_content_length=100*1024*1024,
                                                                      ssl_options=ssl.create_default_context(ssl.Purpose.CLIENT_AUTH))))
Compression configuration options are described in the zlib documentation. By default, compression settings are configured as shown in the above example.
In order to add and remove traversal strategies from a traversal source, Gremlin-Python has a
TraversalStrategy
class along with a collection of subclasses that mirror the standard Gremlin-Java strategies.
>>> g = g.withStrategies(SubgraphStrategy(vertices=hasLabel('person'),edges=has('weight',gt(0.5))))
>>> g.V().name.toList()
['marko', 'vadas', 'josh', 'peter']
>>> g.V().outE().elementMap().toList()
[{<T.id: 1>: 8, <T.label: 4>: 'knows', <Direction.IN: 2>: {<T.id: 1>: 4, <T.label: 4>: 'person'}, <Direction.OUT: 3>: {<T.id: 1>: 1, <T.label: 4>: 'person'}, 'weight': 1.0}]
>>> g = g.withoutStrategies(SubgraphStrategy)
>>> g.V().name.toList()
['marko', 'vadas', 'lop', 'josh', 'ripple', 'peter']
>>> g.V().outE().elementMap().toList()
[{<T.id: 1>: 9, <T.label: 4>: 'created', <Direction.IN: 2>: {<T.id: 1>: 3, <T.label: 4>: 'software'}, <Direction.OUT: 3>: {<T.id: 1>: 1, <T.label: 4>: 'person'}, 'weight': 0.4}, {<T.id: 1>: 7, <T.label: 4>: 'knows', <Direction.IN: 2>: {<T.id: 1>: 2, <T.label: 4>: 'person'}, <Direction.OUT: 3>: {<T.id: 1>: 1, <T.label: 4>: 'person'}, 'weight': 0.5}, {<T.id: 1>: 8, <T.label: 4>: 'knows', <Direction.IN: 2>: {<T.id: 1>: 4, <T.label: 4>: 'person'}, <Direction.OUT: 3>: {<T.id: 1>: 1, <T.label: 4>: 'person'}, 'weight': 1.0}, {<T.id: 1>: 10, <T.label: 4>: 'created', <Direction.IN: 2>: {<T.id: 1>: 5, <T.label: 4>: 'software'}, <Direction.OUT: 3>: {<T.id: 1>: 4, <T.label: 4>: 'person'}, 'weight': 1.0}, {<T.id: 1>: 11, <T.label: 4>: 'created', <Direction.IN: 2>: {<T.id: 1>: 3, <T.label: 4>: 'software'}, <Direction.OUT: 3>: {<T.id: 1>: 4, <T.label: 4>: 'person'}, 'weight': 0.4}, {<T.id: 1>: 12, <T.label: 4>: 'created', <Direction.IN: 2>: {<T.id: 1>: 3, <T.label: 4>: 'software'}, <Direction.OUT: 3>: {<T.id: 1>: 6, <T.label: 4>: 'person'}, 'weight': 0.2}]
>>> g = g.withComputer(workers=2,vertices=has('name','marko'))
>>> g.V().name.toList()
['marko']
>>> g.V().outE().valueMap().with_(WithOptions.tokens).toList()
[{<T.id: 1>: 9, <T.label: 4>: 'created', 'weight': 0.4}, {<T.id: 1>: 7, <T.label: 4>: 'knows', 'weight': 0.5}, {<T.id: 1>: 8, <T.label: 4>: 'knows', 'weight': 1.0}]
Note
|
Many of the TraversalStrategy classes in Gremlin-Python are proxies to the respective strategy on
Apache TinkerPop’s JVM-based Gremlin traversal machine. As such, their apply(Traversal) method does nothing. However,
the strategy is encoded in the Gremlin-Python bytecode and transmitted to the Gremlin traversal machine for
re-construction machine-side.
|
Supporting anonymous functions across languages is difficult as most languages do not support lambda introspection and thus, code analysis. In Gremlin-Python, a Gremlin lambda should be represented as a zero-arg callable that returns a string representation of the lambda expected for use in the traversal. The lambda should be written as a Gremlin-Groovy string. When the lambda is represented in Bytecode, its language is encoded such that the remote connection host can infer which translator and ultimate execution engine to use.
>>> g.V().out().map(lambda: "it.get().value('name').length()").sum().toList()
[24]
Tip
|
When running into situations where Groovy cannot properly discern a method signature based on the Lambda
instance created, it will help to fully define the closure in the lambda expression - so rather than
lambda: ("it.get().value('name')", "gremlin-groovy") , prefer lambda: ("x -> x.get().value('name')", "gremlin-groovy") .
|
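The two shapes of lambda seen above, a zero-arg callable returning a plain string and one returning a (script, language) tuple, can be normalized with a few lines of Python. This is only a sketch of the idea and not the driver's code; the helper name describe_lambda is hypothetical, and treating gremlin-groovy as the language for the plain string form is an assumption of this sketch.

```python
# Assumed language for lambdas given as a bare string.
DEFAULT_LAMBDA_LANGUAGE = "gremlin-groovy"


def describe_lambda(func):
    """Call a zero-arg callable and return a (script, language) pair."""
    result = func()
    if isinstance(result, str):
        # Plain string form: fall back to the assumed default language.
        return result, DEFAULT_LAMBDA_LANGUAGE
    # Tuple form: the callable names the language explicitly.
    script, language = result
    return script, language


if __name__ == "__main__":
    print(describe_lambda(lambda: "it.get().value('name').length()"))
    print(describe_lambda(lambda: ("x -> x.get().value('name')", "gremlin-groovy")))
```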
Finally, Gremlin Bytecode that includes lambdas requires that the traversal be processed by the ScriptEngine. To avoid continued recompilation costs, it supports the encoding of bindings, which allow a remote engine to cache traversals that will be reused over and over again save that some parameterization may change. Thus, instead of translating, compiling, and then executing each submitted bytecode, it is possible to simply execute.
>>> g.V(Bindings.of('x',1)).out('created').map(lambda: "it.get().value('name').length()").sum().toList()
[3]
>>> g.V(Bindings.of('x',4)).out('created').map(lambda: "it.get().value('name').length()").sum().toList()
[9]
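Why bindings help can be illustrated with a stand-in for a script engine. The sketch below uses Python's own compile() and eval() in place of the remote engine and is not how Gremlin Server is implemented; the class name ScriptCache is hypothetical. Because the script text stays identical while only the bound value changes, the compiled form is reused.

```python
class ScriptCache:
    """Toy script engine that compiles each distinct script text only once."""

    def __init__(self):
        self._compiled = {}
        self.compilations = 0

    def evaluate(self, script, bindings):
        compiled = self._compiled.get(script)
        if compiled is None:
            # First time this exact text is seen: pay the compilation cost.
            compiled = compile(script, "<script>", "eval")
            self._compiled[script] = compiled
            self.compilations += 1
        # The bindings supply the values for variables named in the script.
        return eval(compiled, {}, dict(bindings))


if __name__ == "__main__":
    engine = ScriptCache()
    print(engine.evaluate("x * 10", {"x": 1}))
    print(engine.evaluate("x * 10", {"x": 4}))
    print("compilations:", engine.compilations)
```

Had the values been inlined as "1 * 10" and "4 * 10", the two texts would differ and each would be compiled separately.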
Warning
|
As explained throughout the documentation, when possible avoid lambdas. |
The Client
class implementation/interface is based on the Java Driver, with some restrictions. Most notably,
Gremlin-Python does not yet implement the Cluster
class. Instead, Client
is instantiated directly.
Usage is as follows:
from gremlin_python.driver import client (1)
client = client.Client('ws://localhost:8182/gremlin', 'g') (2)
1. Import the Gremlin-Python client module.
2. Open a reference to localhost. Note that there are various configuration options that can be passed to the Client object upon instantiation as keyword arguments.
Once a Client
instance is ready, it is possible to issue some Gremlin:
result_set = client.submit('[1,2,3,4]') (1)
future_results = result_set.all() (2)
results = future_results.result() (3)
assert results == [1, 2, 3, 4] (4)
future_result_set = client.submitAsync('[1,2,3,4]') (5)
result_set = future_result_set.result() (6)
result = result_set.one() (7)
assert result == [1, 2, 3, 4] (8)
assert result_set.done.done() (9)
client.close() (10)
1. Submit a script that simply returns a List of integers. This method blocks until the request is written to the server and a ResultSet is constructed.
2. Even though the ResultSet is constructed, it does not mean that the server has sent back the results (or even evaluated the script potentially). The ResultSet is just a holder that is awaiting the results from the server. The all method returns a concurrent.futures.Future that resolves to a list when it is complete.
3. Block until the script is evaluated and results are sent back by the server.
4. Verify the result.
5. Submit the same script to the server but don’t block.
6. Wait until the request is written to the server and the ResultSet is constructed.
7. Read a single result off the result stream.
8. Again, verify the result.
9. Verify that all results have been read and the stream is closed.
10. Close the client and underlying pool connections.
The client.submit()
functions accept a request_options
which expects a dictionary. The request_options
provide a way to include options that are specific to the request made with the call to submit()
. A good use-case for
this feature is to set a per-request override to the evaluationTimeout
so that it only applies to the current
request.
result_set = client.submit('g.V().repeat(both()).times(100)', request_options={'evaluationTimeout': 5000})
The following options are allowed on a per-request basis in this fashion: batchSize
, requestId
, userAgent
and
evaluationTimeout
(formerly scriptEvaluationTimeout
which is also supported but now deprecated).
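A request_options dictionary can be checked against the documented option names before it is passed to submit(). The following is a small sketch; the helper name build_request_options is hypothetical, and rewriting the deprecated scriptEvaluationTimeout key to evaluationTimeout on the client side is a choice made by this sketch, not something the driver requires.

```python
ALLOWED_REQUEST_OPTIONS = {"batchSize", "requestId", "userAgent", "evaluationTimeout"}
DEPRECATED_ALIASES = {"scriptEvaluationTimeout": "evaluationTimeout"}


def build_request_options(**options):
    """Return a dictionary that only holds the documented per-request options."""
    normalized = {}
    for key, value in options.items():
        key = DEPRECATED_ALIASES.get(key, key)  # map the deprecated name
        if key not in ALLOWED_REQUEST_OPTIONS:
            raise ValueError("unsupported per-request option: " + key)
        normalized[key] = value
    return normalized


if __name__ == "__main__":
    print(build_request_options(evaluationTimeout=5000, batchSize=32))
```

The result can then be supplied as the request_options argument of client.submit().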
Important
|
The preferred method for setting a per-request timeout for scripts is demonstrated above, but those familiar
with bytecode may try g.with(EVALUATION_TIMEOUT, 500) within a script. Scripts with multiple traversals and multiple
timeouts will be interpreted as a sum of all timeouts identified in the script for that request.
|
RequestOptions options = RequestOptions.build().timeout(500).create();
List<Result> result = client.submit("g.with(EVALUATION_TIMEOUT, 500).addV().iterate();" +
    "g.addV().iterate();" +
    "g.with(EVALUATION_TIMEOUT, 500).addV();", options).all().get();
In the above example, RequestOptions
defines a timeout of 500 milliseconds, but the script has three traversals with
two internal settings for the timeout using with()
. The request timeout used by the server will therefore be 1000
milliseconds (overriding the 500 which itself was an override for whatever configuration was on the server).
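The arithmetic in this example can be reproduced with a short sketch. It only models the rule stated above (in-script timeouts are summed and take precedence over the request-level value) and does not reflect how the server actually parses scripts; the helper name effective_timeout and the regular expression are assumptions of the sketch.

```python
import re

# Matches with(EVALUATION_TIMEOUT, <milliseconds>) inside a script.
_TIMEOUT_PATTERN = re.compile(r"with\(\s*EVALUATION_TIMEOUT\s*,\s*(\d+)\s*\)")


def effective_timeout(script, request_timeout):
    """Sum the in-script timeouts, else fall back to the request timeout."""
    in_script = [int(ms) for ms in _TIMEOUT_PATTERN.findall(script)]
    return sum(in_script) if in_script else request_timeout


if __name__ == "__main__":
    script = ("g.with(EVALUATION_TIMEOUT, 500).addV().iterate();"
              "g.addV().iterate();"
              "g.with(EVALUATION_TIMEOUT, 500).addV();")
    print(effective_timeout(script, 500))
```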
Writing a Gremlin Domain Specific Language (DSL) in Python simply requires direct extension of several classes:
- GraphTraversal - which exposes the various steps used in traversal writing
- __ - which spawns anonymous traversals from steps
- GraphTraversalSource - which spawns GraphTraversal instances
The Social DSL based on the "modern" toy graph might look like this:
class SocialTraversal(GraphTraversal):
    def knows(self, person_name):
        return self.out('knows').hasLabel('person').has('name', person_name)

    def youngestFriendsAge(self):
        return self.out('knows').hasLabel('person').values('age').min()

    def createdAtLeast(self, number):
        return self.outE('created').count().is_(P.gte(number))

class __(AnonymousTraversal):
    graph_traversal = SocialTraversal

    @classmethod
    def knows(cls, *args):
        return cls.graph_traversal(None, None, Bytecode()).knows(*args)

    @classmethod
    def youngestFriendsAge(cls, *args):
        return cls.graph_traversal(None, None, Bytecode()).youngestFriendsAge(*args)

    @classmethod
    def createdAtLeast(cls, *args):
        return cls.graph_traversal(None, None, Bytecode()).createdAtLeast(*args)

class SocialTraversalSource(GraphTraversalSource):
    def __init__(self, *args, **kwargs):
        super(SocialTraversalSource, self).__init__(*args, **kwargs)
        self.graph_traversal = SocialTraversal

    def persons(self, *args):
        traversal = self.get_graph_traversal()
        traversal.bytecode.add_step('V')
        traversal.bytecode.add_step('hasLabel', 'person')
        if len(args) > 0:
            traversal.bytecode.add_step('has', 'name', P.within(args))
        return traversal
Note
|
The AnonymousTraversal class above is just an alias for __ as in
from gremlin_python.process.graph_traversal import __ as AnonymousTraversal
|
Using the DSL is straightforward and just requires that the graph instance know the SocialTraversalSource
should
be used:
social = traversal(SocialTraversalSource).withRemote(DriverRemoteConnection('ws://localhost:8182/gremlin','g'))
social.persons('marko').knows('josh')
social.persons('marko').youngestFriendsAge()
social.persons().filter(__.createdAtLeast(2)).count()
Python supports meta-programming and operator overloading. There are three uses of these techniques in Gremlin-Python that make traversals a bit more concise.
>>> g.V().both()[1:3].toList()
[v[2], v[4]]
>>> g.V().both()[1].toList()
[v[2]]
>>> g.V().both().name.toList()
['lop', 'lop', 'lop', 'vadas', 'josh', 'josh', 'josh', 'marko', 'marko', 'marko', 'peter', 'ripple']
In situations where Python reserved words and global functions overlap with standard Gremlin steps and tokens, those bits of conflicting Gremlin get an underscore appended as a suffix:
Steps - and_(), as_(), filter_(), from_(), id_(), is_(), in_(), max_(), min_(), not_(), or_(), range_(), sum_(), with_()
Tokens - Scope.global_
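The naming rule can be captured as a lookup over the steps listed above. In this sketch the list is taken as data from the text rather than derived, since not every Gremlin step that shares a name with a Python built-in is necessarily renamed; the helper names python_step_name and conflict_kind are hypothetical.

```python
import builtins
import keyword

# Steps documented above as receiving an underscore suffix.
SUFFIXED_STEPS = ("and", "as", "filter", "from", "id", "is", "in",
                  "max", "min", "not", "or", "range", "sum", "with")


def python_step_name(gremlin_step):
    """Return the Gremlin-Python spelling of a canonical step name."""
    return gremlin_step + "_" if gremlin_step in SUFFIXED_STEPS else gremlin_step


def conflict_kind(name):
    """Say why a name conflicts with Python, or return None if it does not."""
    if keyword.iskeyword(name):
        return "reserved word"
    if hasattr(builtins, name):
        return "global function"
    return None


if __name__ == "__main__":
    for step in ("in", "sum", "out"):
        print(step, "->", python_step_name(step), "|", conflict_kind(step))
```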
- Traversals that return a Set might be coerced to a List in Python. In the case of Python, number equality is different from JVM languages which produces different Set results when those types are in use. When this case is detected during deserialization, the Set is coerced to a List so that traversals return consistent results within a collection across different languages. If a Set is needed then convert List results to Set manually.
- Gremlin is capable of returning Dictionary results that use non-hashable keys (e.g. Dictionary as a key) and Python does not support that at a language level. Using GraphSON 3.0 or GraphBinary (after 3.5.0) makes it possible to return such results. In all other cases, Gremlin that returns such results will need to be re-written to avoid that sort of key.
- The subgraph()-step is not supported by any variant that is not running on the Java Virtual Machine as there is no Graph instance to deserialize a result into on the client-side. A workaround is to replace the step with aggregate(local) and then convert those results to something the client can use locally.
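The first two limitations stem from plain Python behavior, which the following standard-library sketch demonstrates. In Python, 1 and 1.0 compare equal and hash alike, so a set keeps only one of them, whereas on the JVM an Integer 1 and a Double 1.0 are not equal under equals(). The helper name is_hashable is hypothetical.

```python
# Three values as they might arrive from a traversal returning a Set.
mixed_numbers = [1, 1.0, 2]

# In Python 1 == 1.0, so building a set silently drops one element.
as_python_set = set(mixed_numbers)


def is_hashable(value):
    """Report whether a value could be used as a dict key or set member."""
    try:
        hash(value)
    except TypeError:
        return False
    return True


if __name__ == "__main__":
    print(len(mixed_numbers), "values,", len(as_python_set), "after set()")
    print("dict usable as key:", is_hashable({"name": "marko"}))
    print("tuple usable as key:", is_hashable(("name", "marko")))
```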
The TinkerPop source code contains a simple Python script that shows a basic example of how gremlinpython works. It can be found in the TinkerPop GitHub repository and is designed to work best with a running Gremlin Server configured with the default conf/gremlin-server.yaml file as included with the standard release packaging.
pip install gremlinpython
pip install aiohttp
python example.py
Apache TinkerPop’s Gremlin.Net implements Gremlin within the C# language. It targets .NET Standard and can therefore be used on different operating systems and with different .NET frameworks, such as .NET Framework and .NET Core. Since the C# syntax is very similar to that of Java, it should be easy to switch between Gremlin-Java and Gremlin.Net. The only major syntactical difference is that all method names in Gremlin.Net use PascalCase as opposed to camelCase in Gremlin-Java in order to comply with .NET conventions.
nuget install Gremlin.Net
The pattern for connecting is described in Connecting Gremlin and it basically distills down to
creating a GraphTraversalSource
. A GraphTraversalSource
is created from the AnonymousTraversalSource.traversal()
method where the "g" provided to the DriverRemoteConnection
corresponds to the name of a GraphTraversalSource
on
the remote end.
link:../../../gremlin-dotnet/test/Gremlin.Net.IntegrationTest/Docs/Reference/GremlinVariantsTests.cs[role=include]
Some connection options can also be set on individual requests using the With()
step on the TraversalSource
.
For instance to set request timeout to 500 milliseconds:
var l = g.With(Tokens.ArgsEvalTimeout, 500).V().Out("knows").Count().ToList();
The following options are allowed on a per-request basis in this fashion: batchSize
, requestId
, userAgent
and
evaluationTimeout
(formerly scriptEvaluationTimeout
which is also supported but now deprecated). These options are
available as constants on the Gremlin.Net.Driver.Tokens
class.
There are a number of classes, functions and tokens that are typically used with Gremlin. The following imports provide most of the typical functionality required to use Gremlin:
link:../../../gremlin-dotnet/test/Gremlin.Net.IntegrationTest/Docs/Reference/GremlinVariantsTests.cs[role=include]
The connection properties for the Gremlin.Net driver can be passed to the GremlinServer
instance as keyword arguments:
Key | Description | Default |
---|---|---|
hostname | The hostname that the driver will connect to. | localhost |
port | The port on which Gremlin Server can be reached. | 8182 |
enableSsl | Determines if SSL should be enabled or not. If enabled on the server then it must be enabled on the client. | false |
username | The username to submit on requests that require authentication. | none |
password | The password to submit on requests that require authentication. | none |
It is also possible to configure the ConnectionPool
of the Gremlin.Net driver.
These configuration options can be set as properties
on the ConnectionPoolSettings
instance that can be passed to the GremlinClient
:
Key | Description | Default |
---|---|---|
PoolSize | The size of the connection pool. | 4 |
MaxInProcessPerConnection | The maximum number of in-flight requests that can occur on a connection. | 32 |
ReconnectionAttempts | The number of attempts to get an open connection from the pool to submit a request. | 4 |
ReconnectionBaseDelay | The base delay used for the exponential backoff for the reconnection attempts. | 1 s |
A NoConnectionAvailableException
is thrown if all connections have reached the MaxInProcessPerConnection
limit
when a new request comes in.
A ServerUnavailableException
is thrown if no connection is available to the server to submit a request after
ReconnectionAttempts
retries.
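How ReconnectionAttempts and ReconnectionBaseDelay interact can be pictured with a language-neutral sketch, written here in Python. The exact schedule used by Gremlin.Net is not stated in this section; the sketch assumes the delay doubles with every attempt, and the function name reconnection_delays is hypothetical.

```python
def reconnection_delays(base_delay_seconds=1.0, attempts=4):
    """List the wait before each attempt, assuming the delay doubles each time."""
    # Assumption of this sketch: delay = base * 2 ** attempt_index.
    return [base_delay_seconds * (2 ** attempt) for attempt in range(attempts)]


if __name__ == "__main__":
    delays = reconnection_delays()
    print("delays in seconds:", delays)
    print("worst-case total wait:", sum(delays))
```

Under that assumption, the defaults from the table would add up to a worst-case wait of 15 seconds before the exception is raised.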
The Gremlin.Net driver uses GraphSON 3.0 by default, but it is also possible to use another serialization format by passing a message serializer when creating the GremlinClient.
GraphBinary can be configured like this:
link:../../../gremlin-dotnet/test/Gremlin.Net.IntegrationTest/Docs/Reference/GremlinVariantsTests.cs[role=include]
and GraphSON 2.0 like this:
link:../../../gremlin-dotnet/test/Gremlin.Net.IntegrationTest/Docs/Reference/GremlinVariantsTests.cs[role=include]
In order to add and remove traversal strategies from a traversal source, Gremlin.Net has an AbstractTraversalStrategy
class along with a collection of subclasses that mirror the standard Gremlin-Java strategies.
link:../../../gremlin-dotnet/test/Gremlin.Net.IntegrationTest/Docs/Reference/GremlinVariantsTests.cs[role=include]
Note
|
Many of the TraversalStrategy classes in Gremlin.Net are proxies to the respective strategy on Apache TinkerPop’s
JVM-based Gremlin traversal machine. As such, their Apply(ITraversal) method does nothing. However, the strategy is
encoded in the Gremlin.Net bytecode and transmitted to the Gremlin traversal machine for re-construction machine-side.
|
To get a full understanding of this section, it would be good to start by reading the Transactions section of this documentation, which discusses transactions in the general context of TinkerPop itself. This section builds on that content by demonstrating the transactional syntax for C#.
link:../../../gremlin-dotnet/test/Gremlin.Net.IntegrationTest/Docs/Reference/GremlinVariantsTests.cs[role=include]
Supporting anonymous functions across languages is difficult as
most languages do not support lambda introspection and thus, code analysis. While Gremlin.Net doesn’t support C# lambdas, it
is still able to represent lambdas in other languages. When the lambda is represented in Bytecode
its language is encoded
such that the remote connection host can infer which translator and ultimate execution engine to use.
g.V().Out().Map<int>(Lambda.Groovy("it.get().value('name').length()")).Sum<int>().ToList(); (1)
g.V().Out().Map<int>(Lambda.Python("lambda x: len(x.get().value('name'))")).Sum<int>().ToList(); (2)
-
Lambda.Groovy()
can be used to create a Groovy lambda. -
Lambda.Python()
can be used to create a Python lambda.
The ILambda
interface returned by these two methods inherits interfaces like IFunction
and IPredicate
that mirror
their Java counterparts which makes it possible to use lambdas with Gremlin.Net for the same steps as in Gremlin-Java.
Tip
|
When running into situations where Groovy cannot properly discern a method signature based on the Lambda
instance created, it will help to fully define the closure in the lambda expression - so rather than
Lambda.Groovy("it.get().value('name')") , prefer Lambda.Groovy("x -> x.get().value('name')") .
|
Gremlin scripts are sent to the server from an IGremlinClient instance. An IGremlinClient is created as follows:
link:../../../gremlin-dotnet/test/Gremlin.Net.IntegrationTest/Docs/Reference/GremlinVariantsTests.cs[role=include]
If the remote system has authentication and SSL enabled, then the GremlinServer
object can be configured as follows:
link:../../../gremlin-dotnet/test/Gremlin.Net.IntegrationTest/Docs/Reference/GremlinVariantsTests.cs[role=include]
It is also possible to initialize the Client
to use sessions:
var gremlinServer = new GremlinServer("localhost", 8182);
var client = new GremlinClient(gremlinServer, sessionId: Guid.NewGuid().ToString());
The GremlinClient.Submit()
functions accept an option to build a raw RequestMessage
. A good use-case for this
feature is to set a per-request override to the evaluationTimeout
so that it only applies to the current request.
link:../../../gremlin-dotnet/test/Gremlin.Net.IntegrationTest/Docs/Reference/GremlinVariantsTests.cs[role=include]
The following options are allowed on a per-request basis in this fashion: batchSize
, requestId
, userAgent
and
evaluationTimeout
(formerly scriptEvaluationTimeout
which is also supported but now deprecated). These options are
available as constants on the Gremlin.Net.Driver.Tokens
class.
Important
|
The preferred method for setting a per-request timeout for scripts is demonstrated above, but those familiar
with bytecode may try g.with(EVALUATION_TIMEOUT, 500) within a script. Scripts with multiple traversals and multiple
timeouts will be interpreted as a sum of all timeouts identified in the script for that request.
|
Developing a Domain Specific Language (DSL) for .NET is most easily implemented using
Extension Methods
as they don’t require direct extension of classes in the TinkerPop hierarchy. Extension Method classes simply need to
be constructed for the GraphTraversal
and the GraphTraversalSource
. Unfortunately, anonymous traversals (spawned
from __
) can’t use the Extension Method approach as they do not work for static classes and static classes can’t be
extended. The only option is to re-implement the methods of __
as a wrapper in the anonymous traversal for the DSL
or to simply create a static class for the DSL and use the two anonymous traversal creators independently. The
following example uses the latter approach as it saves a lot of boilerplate code with the minor annoyance of having a
second static class to deal with when writing traversals rather than just calling __
for everything.
link:../../../gremlin-dotnet/test/Gremlin.Net.IntegrationTest/Docs/Reference/GremlinVariantsDsl.cs[role=include]
Note the creation of __Social
as the Social DSL’s "extension" to the available ways in which to spawn anonymous
traversals. The use of the double underscore prefix in the name is just a convention to consider using and is not a
requirement. To use the DSL, bring it into scope with the using
directive:
link:../../../gremlin-dotnet/test/Gremlin.Net.IntegrationTest/Docs/Reference/GremlinVariantsDslTests.cs[role=include]
and then it can be called from the application as follows:
link:../../../gremlin-dotnet/test/Gremlin.Net.IntegrationTest/Docs/Reference/GremlinVariantsDslTests.cs[role=include]
The biggest difference between Gremlin in .NET and the canonical version in Java is the casing of steps. Canonical
Gremlin utilizes camelCase
as is typical in Java for function names, but C# utilizes PascalCase
as it is more
typical in that language. Therefore, when viewing a typical Gremlin example written in Gremlin Console, the conversion
to C# usually just requires capitalization of the first letter in the step name, thus the following example in Groovy:
g.V().has('person','name','marko').
out('knows').
elementMap().toList()
would become the following in C#:
g.V().Has("person","name","marko").
Out("knows").
ElementMap().ToList();
In addition to the uppercase change, also note the conversion of the single quotes to double quotes as is expected for declaring string values in C# and the addition of the semi-colon at the end of the line. In short, don’t forget to apply the common syntax expectations for C# when trying to convert an example of Gremlin from a different language.
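The casing rule for step names is mechanical enough to express as a one-line transformation. The sketch below is written in Python purely for illustration; the helper name to_pascal_case is hypothetical, and it handles only the step name itself, not quoting, semicolons, or generics.

```python
def to_pascal_case(step_name):
    """Capitalize the first letter of a camelCase Gremlin step name."""
    return step_name[:1].upper() + step_name[1:]


if __name__ == "__main__":
    for step in ("has", "out", "elementMap", "toList"):
        print(step, "->", to_pascal_case(step))
```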
Another common conversion issue lies in having to explicitly define generics, which can make canonical Gremlin appear much more complex in C# where type erasure is not a feature of the language. For example, the following example in Groovy:
g.V().repeat(__.out()).times(2).values('name')
must be written as:
g.V().Repeat(__.Out()).Times(2).Values<string>("name");
Gremlin allows for Map
instances to include null
keys, but null
keys in C# Dictionary
instances are not allowed.
It is therefore necessary to rewrite a traversal such as:
g.V().groupCount().by('age')
where "age" is not a valid key for all vertices in a way that will remove the need for a null
to be returned.
g.V().has('age').groupCount().by('age')
g.V().hasLabel('person').groupCount().by('age')
Either of the above two options accomplishes the desired goal as both prevent groupCount()
from having to process
the possibility of null
.
- The subgraph()-step is not supported by any variant that is not running on the Java Virtual Machine as there is no Graph instance to deserialize a result into on the client-side. A workaround is to replace the step with aggregate(local) and then convert those results to something the client can use locally.
This dotnet template helps getting started with Gremlin.Net. It creates a new C# console project that shows how to connect to a Gremlin Server with Gremlin.Net.
You can install the template with the dotnet CLI tool:
dotnet new -i Gremlin.Net.Template
After the template is installed, a new project based on this template can be created:
dotnet new gremlin
Specify the output directory for the new project which will then also be used as the name of the created project:
dotnet new gremlin -o MyFirstGremlinProject
Apache TinkerPop’s Gremlin-JavaScript implements Gremlin within the JavaScript language. It targets Node.js runtime and can be used on different operating systems on any Node.js 6 or above. Since the JavaScript naming conventions are very similar to that of Java, it should be very easy to switch between Gremlin-Java and Gremlin-JavaScript.
npm install gremlin
The pattern for connecting is described in Connecting Gremlin and it basically distills down to
creating a GraphTraversalSource
. A GraphTraversalSource
is created from the AnonymousTraversalSource.traversal()
method where the "g" provided to the DriverRemoteConnection
corresponds to the name of a GraphTraversalSource
on
the remote end.
const g = traversal().withRemote(new DriverRemoteConnection('ws://localhost:8182/gremlin'));
Gremlin-JavaScript supports plain text SASL authentication, which can be set through the connection options.
const authenticator = new gremlin.driver.auth.PlainTextSaslAuthenticator('myuser', 'mypassword');
const g = traversal().withRemote(new DriverRemoteConnection('ws://localhost:8182/gremlin', { authenticator }));
Given that I/O operations in Node.js are asynchronous by default, Terminal Steps return a Promise:
- Traversal.toList(): Returns a Promise with an Array as result value.
- Traversal.next(): Returns a Promise with a { value, done } tuple as result value, according to the async iterator proposal.
- Traversal.iterate(): Returns a Promise without a value.
For example:
g.V().hasLabel('person').values('name').toList()
.then(names => console.log(names));
When using async functions it is possible to await the promises:
const names = await g.V().hasLabel('person').values('name').toList();
console.log(names);
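As a minimal sketch of working with the { value, done } result of next(), the helper below resolves to the first result, or to a fallback when the traversal is exhausted. The name firstOrDefault is hypothetical and not part of Gremlin-JavaScript; it assumes only that the object passed in exposes next() as described above.

```javascript
// Hypothetical helper (not part of the gremlin package): resolves to the first
// result of a traversal, or to `fallback` when the traversal yields nothing.
// It relies only on next() returning a Promise of { value, done }.
async function firstOrDefault(traversal, fallback = null) {
  const { value, done } = await traversal.next();
  return done ? fallback : value;
}
```

With a connected g, it could be called as await firstOrDefault(g.V().hasLabel('person').values('name'), 'nobody').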
Some connection options can also be set on individual requests using the with() step on the TraversalSource. For instance, to set the request timeout to 500 milliseconds:
const vertices = await g.with_('evaluationTimeout', 500).V().out('knows').toList()
The following options are allowed on a per-request basis in this fashion: batchSize, requestId, userAgent and evaluationTimeout (formerly scriptEvaluationTimeout which is also supported but now deprecated).
There are a number of classes, functions and tokens that are typically used with Gremlin. The following imports provide most of the typical functionality required to use Gremlin:
const gremlin = require('gremlin');
const traversal = gremlin.process.AnonymousTraversalSource.traversal;
const __ = gremlin.process.statics;
const DriverRemoteConnection = gremlin.driver.DriverRemoteConnection;
const column = gremlin.process.column;
const direction = gremlin.process.direction;
const p = gremlin.process.P;
const textp = gremlin.process.TextP;
const pick = gremlin.process.pick;
const pop = gremlin.process.pop;
const order = gremlin.process.order;
const scope = gremlin.process.scope;
const t = gremlin.process.t;
By defining these imports it becomes possible to write Gremlin in the more shorthand, canonical style that is demonstrated in most examples found here in the documentation:
const { P: { gt } } = gremlin.process;
const { order: { desc } } = gremlin.process;
g.V().hasLabel('person').has('age',gt(30)).order().by('age',desc).toList()
To get a full understanding of this section, it would be good to start by reading the Transactions section of this documentation, which discusses transactions in the general context of TinkerPop itself. This section builds on that content by demonstrating the transactional syntax for JavaScript.
const g = traversal().withRemote(new DriverRemoteConnection('ws://localhost:8182/gremlin'));
const tx = g.tx(); // create a Transaction
// spawn a new GraphTraversalSource binding all traversals established from it to tx
const gtx = tx.begin();
// traversals executed using gtx occur within the scope of the transaction held by tx. the
// tx is closed after calls to commit or rollback and cannot be re-used. simply spawn a
// new Transaction from g.tx() to create a new one as needed. the g context remains
// accessible through all this as a sessionless connection.
Promise.all([
  gtx.addV("person").property("name", "jorge").iterate(),
  gtx.addV("person").property("name", "josh").iterate()
]).then(() => {
  return tx.commit();
}).catch(() => {
  return tx.rollback();
});
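The same commit-or-rollback flow can be written with async functions. The following is a sketch rather than a driver feature: inTransaction is a hypothetical helper name, and it relies only on the g.tx(), begin(), commit() and rollback() calls shown above. Unlike the example above, it rethrows the error after rolling back so that the caller can see the failure.

```javascript
// Hypothetical helper (not part of the gremlin package). It uses only the calls
// shown above: g.tx(), tx.begin(), tx.commit() and tx.rollback().
async function inTransaction(g, work) {
  const tx = g.tx();       // a fresh Transaction for this unit of work
  const gtx = tx.begin();  // traversal source bound to that transaction
  let result;
  try {
    result = await work(gtx);
  } catch (err) {
    await tx.rollback();   // the work failed, so discard its changes
    throw err;             // and let the caller see the original error
  }
  await tx.commit();       // reached only when the work succeeded
  return result;
}
```

The work argument is any function that receives gtx and returns a Promise, for example one that returns Promise.all([...]) over several iterate() calls.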
Supporting anonymous functions across languages is difficult as most languages do not support lambda introspection and thus, code analysis. In Gremlin-JavaScript, a Gremlin lambda should be represented as a zero-arg callable that returns a string representation of the lambda expected for use in the traversal. The returned lambda should be written as a Gremlin-Groovy string. When the lambda is represented in Bytecode its language is encoded such that the remote connection host can infer which translator and ultimate execution engine to use.
g.V().out().
  map(() => "it.get().value('name').length()").
  sum().
  toList().then(total => console.log(total))
Tip: When running into situations where Groovy cannot properly discern a method signature based on the Lambda instance created, it will help to fully define the closure in the lambda expression - so rather than () => "it.get().value('name')", prefer () => "x -> x.get().value('name')".
Warning: As explained throughout the documentation, when possible avoid lambdas.
It is possible to submit parametrized Gremlin scripts to the server as strings, using the Client class:
const gremlin = require('gremlin');
const client = new gremlin.driver.Client('ws://localhost:8182/gremlin', { traversalSource: 'g' });
const result1 = await client.submit('g.V(vid)', { vid: 1 });
const vertex = result1.first();
const result2 = await client.submit('g.V().hasLabel(label).tail(n)', { label: 'person', n: 3 });
// ResultSet is an iterable
for (const vertex of result2) {
  console.log(vertex.id);
}
It is also possible to initialize the Client to use sessions:
const client = new gremlin.driver.Client('ws://localhost:8182/gremlin', { traversalSource: 'g', 'session': 'unique-string-id' });
With this configuration, the state of variables within scripts is preserved between requests.
The client.submit() function accepts a requestOptions argument, which expects a dictionary. The requestOptions provide a way to include options that are specific to the request made with the call to submit(). A good use-case for this feature is to set a per-request override to the evaluationTimeout so that it only applies to the current request.
const result = await client.submit("g.V().repeat(both()).times(100)", null, { evaluationTimeout: 5000 })
The following options are allowed on a per-request basis in this fashion: batchSize, requestId, userAgent and evaluationTimeout (formerly scriptEvaluationTimeout which is also supported but now deprecated).
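A small wrapper can keep the per-request timeout in one place. This is only a sketch: submitWithTimeout is a hypothetical helper name, and it assumes that submit() takes the script, the bindings and the requestOptions in that order, as in the example above.

```javascript
// Hypothetical helper (not part of the gremlin package): submits a script with
// an evaluationTimeout that applies to this request only.
function submitWithTimeout(client, script, bindings, timeoutMs) {
  // The third argument is the requestOptions dictionary described above.
  return client.submit(script, bindings, { evaluationTimeout: timeoutMs });
}
```

For instance, submitWithTimeout(client, 'g.V(vid)', { vid: 1 }, 5000) would submit the script with a five second limit for that request.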
Important: The preferred method for setting a per-request timeout for scripts is demonstrated above, but those familiar with bytecode may try g.with(EVALUATION_TIMEOUT, 500) within a script. Scripts with multiple traversals and multiple timeouts will be interpreted as a sum of all timeouts identified in the script for that request.
Developing Gremlin DSLs in JavaScript largely requires extension of existing core classes with use of standalone functions for anonymous traversal spawning. The pattern is demonstrated in the following example:
// the core classes come from gremlin.process (gremlin is required as shown earlier)
const { GraphTraversal, GraphTraversalSource, Bytecode } = gremlin.process;

class SocialTraversal extends GraphTraversal {
  constructor(graph, traversalStrategies, bytecode) {
    super(graph, traversalStrategies, bytecode);
  }
  aged(age) {
    return this.has('person', 'age', age);
  }
}
class SocialTraversalSource extends GraphTraversalSource {
  constructor(graph, traversalStrategies, bytecode) {
    super(graph, traversalStrategies, bytecode, SocialTraversalSource, SocialTraversal);
  }
  person(name) {
    return this.V().has('person', 'name', name);
  }
}
function anonymous() {
  return new SocialTraversal(null, null, new Bytecode());
}
function aged(age) {
  return anonymous().aged(age);
}
SocialTraversal extends the core GraphTraversal class and has a three argument constructor which is immediately proxied to the GraphTraversal constructor. New DSL steps are then added to this class using available steps to construct the underlying traversal to execute as demonstrated in the aged() step.
The SocialTraversal is spawned from a SocialTraversalSource which is extended from GraphTraversalSource. Steps added here are meant to be start steps. In the above case, the person() start step finds a "person" vertex to begin the traversal from.
Typically, steps that are made available on a GraphTraversal (i.e. SocialTraversal in this example) should also be made available as spawns for anonymous traversals. The recommendation is that these steps be exposed in the module as standalone functions. In the example above, the standalone aged() step creates an anonymous traversal through an anonymous() utility function. The method for creating these standalone functions can be handled in other ways if desired.
To use the DSL, simply initialize the g as follows:
const g = traversal(SocialTraversalSource).withRemote(connection);
g.person('marko').aged(29).values('name').toList().
then(names => console.log(names));
In situations where JavaScript reserved words and global functions overlap with standard Gremlin steps and tokens, those bits of conflicting Gremlin get an underscore appended as a suffix, as with the with_() step used earlier in this section.
Gremlin allows for Map instances to include null keys, but null keys in JavaScript have some interesting behavior as in:
> var a = { null: 'something', 'b': 'else' };
> JSON.stringify(a)
'{"null":"something","b":"else"}'
> JSON.parse(JSON.stringify(a))
{ null: 'something', b: 'else' }
> a[null]
'something'
> a['null']
'something'
This behavior needs to be considered when using Gremlin to return such results. A typical situation where this might happen is with group() or groupCount() as in:
g.V().groupCount().by('age')
where "age" is not a valid key for all vertices. In these cases, it will return null for that key and group on that. It may be better in JavaScript to filter away those vertices to avoid the return of null in the returned Map:
g.V().has('age').groupCount().by('age')
g.V().hasLabel('person').groupCount().by('age')
Either of the above two options accomplishes the desired goal as both prevent groupCount() from having to process the possibility of null.
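When filtering on the server is not an option, the null entry can also be separated on the client. The following is a sketch under stated assumptions: it assumes the groupCount() result arrives as a JavaScript Map, and splitNullKey is a hypothetical helper name that is not part of Gremlin-JavaScript.

```javascript
// Assumption: `counts` stands in for a groupCount() result received as a Map.
// Hypothetical helper: splits off the entry grouped under null so the rest can
// be converted to a plain object without a 'null' string key appearing in it.
function splitNullKey(counts) {
  const withoutNull = new Map(counts);  // copy, so the input is left untouched
  const nullCount = withoutNull.has(null) ? withoutNull.get(null) : 0;
  withoutNull.delete(null);
  return { nullCount, byKey: Object.fromEntries(withoutNull) };
}
```

The count for vertices without the key stays available as nullCount, while byKey can be serialized with JSON.stringify() without the null key being turned into the string "null".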
- The subgraph()-step is not supported by any variant that is not running on the Java Virtual Machine as there is no Graph instance to deserialize a result into on the client-side. A workaround is to replace the step with aggregate(local) and then convert those results to something the client can use locally.