Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add batch query support for drop step [tp-tests] #4456

Merged
merged 1 commit into from
Oct 30, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
26 changes: 26 additions & 0 deletions docs/changelog.md
Original file line number Diff line number Diff line change
Expand Up @@ -128,6 +128,32 @@ Example: `storage.berkeleyje.ext.je.lock.timeout=5000 ms`
For simplicity JSON schema initialization options has been added into JanusGraph.
See [documentation](./schema/schema-init-strategies.md) to learn more about JSON schema initialization process.

##### Batched Queries Enhancement: Introduction of `JanusGraphNoOpBarrierVertexOnlyStep`

In previous versions, when a query that could benefit from batch-query optimization (multi-query) was executed without
a user-defined barrier step, JanusGraph would inject a `NoOpBarrierStep` by default. This approach allowed batching
for edges and properties, which do not gain advantages from multi-query optimization.

Starting with JanusGraph 1.1.0, this behavior has been improved. The system now injects a
`JanusGraphNoOpBarrierVertexOnlyStep` instead of the standard `NoOpBarrierStep` when no barrier steps are detected.
This change ensures that batching is applied exclusively to vertices, which do benefit from batch queries,
while excluding edges and properties from the batching process.

If a user explicitly defines a `.barrier()` step in the query, the system will continue to use the `NoOpBarrierStep` as expected.

##### Batch Query Optimizations Now Support Traversals Containing the `drop()` Step

Starting with JanusGraph 1.1.0, batch optimizations for vertex removal have been introduced in the `drop()` step and
are enabled by default. Previously, any batch optimization would be skipped for queries containing at least one
`drop()` step. However, with this update, such queries are now eligible for batch query optimization (multi-query).

Please note that the `LazyBarrierStrategy` (a TinkerPop strategy) is disabled for any query that includes at least one `drop()` step.

To disable the `drop()` step optimization and maintain the previous behavior, users can set the following configuration:
```
query.batch.drop-step-mode=none
```

### Version 1.0.1 (Release Date: ???)

/// tab | Maven
Expand Down
1 change: 1 addition & 0 deletions docs/configs/janusgraph-cfg.md
Original file line number Diff line number Diff line change
Expand Up @@ -366,6 +366,7 @@ Configuration options to configure batch queries optimization behavior

| Name | Description | Datatype | Default Value | Mutability |
| ---- | ---- | ---- | ---- | ---- |
| query.batch.drop-step-mode | Batching mode for `drop()` step. Used only when `query.batch.enabled` is `true`.<br>Supported modes:<br>- `all` - Drops all vertices in a batch.<br>- `none` - Skips drop batching optimization.<br> | String | all | MASKABLE |
| query.batch.enabled | Whether traversal queries should be batched when executed against the storage backend. This can lead to significant performance improvement if there is a non-trivial latency to the backend. If `false` then all other configuration options under `query.batch` namespace are ignored. | Boolean | true | MASKABLE |
| query.batch.has-step-mode | Properties pre-fetching mode for `has` step. Used only when `query.batch.enabled` is `true`.<br>Supported modes:<br>- `all_properties` - Pre-fetch all vertex properties on any property access (fetches all vertex properties in a single slice query)<br>- `required_properties_only` - Pre-fetch necessary vertex properties for the whole chain of foldable `has` steps (uses a separate slice query per each required property)<br>- `required_and_next_properties` - Prefetch the same properties as with `required_properties_only` mode, but also prefetch<br>properties which may be needed in the next properties access step like `values`, `properties,` `valueMap`, `elementMap`, or `propertyMap`.<br>In case the next step is not one of those properties access steps then this mode behaves same as `required_properties_only`.<br>In case the next step is one of the properties access steps with limited scope of properties, those properties will be<br>pre-fetched together in the same multi-query.<br>In case the next step is one of the properties access steps with unspecified scope of property keys then this mode<br>behaves same as `all_properties`.<br>- `required_and_next_properties_or_all` - Prefetch the same properties as with `required_and_next_properties`, but in case the next step is not<br>`values`, `properties,` `valueMap`, `elementMap`, or `propertyMap` then acts like `all_properties`.<br>- `none` - Skips `has` step batch properties pre-fetch optimization.<br> | String | required_and_next_properties | MASKABLE |
| query.batch.label-step-mode | Labels pre-fetching mode for `label()` step. Used only when `query.batch.enabled` is `true`.<br>Supported modes:<br>- `all` - Pre-fetch labels for all vertices in a batch.<br>- `none` - Skips vertex labels pre-fetching optimization.<br> | String | all | MASKABLE |
Expand Down
3 changes: 2 additions & 1 deletion docs/operations/batch-processing.md
Original file line number Diff line number Diff line change
Expand Up @@ -159,7 +159,7 @@ Batched query processing takes into account two types of steps:

1. Batch compatible step. This is the step which will execute batch requests. Currently, the list of such steps
is the next: `out()`, `in()`, `both()`, `inE()`, `outE()`, `bothE()`, `has()`, `values()`, `properties()`, `valueMap()`,
`propertyMap()`, `elementMap()`, `label()`.
`propertyMap()`, `elementMap()`, `label()`, `drop()`.
2. Parent step. This is a parent step which has local traversals with the same start. Such parent steps also implement the
interface `TraversalParent`. There are many such steps, but as for an example those could be: `and(...)`, `or(...)`,
`not(...)`, `order().by(...)`, `project("valueA", "valueB", "valueC").by(...).by(...).by(...)`, `union(..., ..., ...)`,
Expand Down Expand Up @@ -331,3 +331,4 @@ See configuration option `query.batch.has-step-mode` to control properties pre-f
See configuration option `query.batch.properties-mode` to control properties pre-fetching behaviour for `values`,
`properties`, `valueMap`, `propertyMap`, and `elementMap` steps.
See configuration option `query.batch.label-step-mode` to control labels pre-fetching behaviour for `label` step.
See configuration option `query.batch.drop-step-mode` to control drop batching behaviour for `drop` step.
Original file line number Diff line number Diff line change
Expand Up @@ -31,6 +31,7 @@
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversal;
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.__;
import org.apache.tinkerpop.gremlin.process.traversal.step.filter.DropStep;
import org.apache.tinkerpop.gremlin.process.traversal.step.filter.HasStep;
import org.apache.tinkerpop.gremlin.process.traversal.step.util.WithOptions;
import org.apache.tinkerpop.gremlin.process.traversal.strategy.decoration.SubgraphStrategy;
Expand Down Expand Up @@ -60,6 +61,7 @@
import org.janusgraph.core.PropertyKey;
import org.janusgraph.core.RelationType;
import org.janusgraph.core.SchemaViolationException;
import org.janusgraph.core.Transaction;
import org.janusgraph.core.VertexLabel;
import org.janusgraph.core.VertexList;
import org.janusgraph.core.attribute.Cmp;
Expand Down Expand Up @@ -138,10 +140,12 @@
import org.janusgraph.graphdb.relations.StandardVertexProperty;
import org.janusgraph.graphdb.serializer.SpecialInt;
import org.janusgraph.graphdb.serializer.SpecialIntSerializer;
import org.janusgraph.graphdb.tinkerpop.optimize.step.JanusGraphDropStep;
import org.janusgraph.graphdb.tinkerpop.optimize.step.JanusGraphElementMapStep;
import org.janusgraph.graphdb.tinkerpop.optimize.step.JanusGraphHasStep;
import org.janusgraph.graphdb.tinkerpop.optimize.step.JanusGraphPropertiesStep;
import org.janusgraph.graphdb.tinkerpop.optimize.step.JanusGraphPropertyMapStep;
import org.janusgraph.graphdb.tinkerpop.optimize.strategy.MultiQueryDropStepStrategyMode;
import org.janusgraph.graphdb.tinkerpop.optimize.strategy.MultiQueryHasStepStrategyMode;
import org.janusgraph.graphdb.tinkerpop.optimize.strategy.MultiQueryLabelStepStrategyMode;
import org.janusgraph.graphdb.tinkerpop.optimize.strategy.MultiQueryPropertiesStrategyMode;
Expand Down Expand Up @@ -220,6 +224,7 @@
import static org.janusgraph.graphdb.configuration.GraphDatabaseConfiguration.DB_CACHE;
import static org.janusgraph.graphdb.configuration.GraphDatabaseConfiguration.DB_CACHE_CLEAN_WAIT;
import static org.janusgraph.graphdb.configuration.GraphDatabaseConfiguration.DB_CACHE_TIME;
import static org.janusgraph.graphdb.configuration.GraphDatabaseConfiguration.DROP_STEP_BATCH_MODE;
import static org.janusgraph.graphdb.configuration.GraphDatabaseConfiguration.FORCE_INDEX_USAGE;
import static org.janusgraph.graphdb.configuration.GraphDatabaseConfiguration.HARD_MAX_LIMIT;
import static org.janusgraph.graphdb.configuration.GraphDatabaseConfiguration.HAS_STEP_BATCH_MODE;
Expand Down Expand Up @@ -10074,11 +10079,7 @@ public void testMultiQueryDropsVertices() {

int verticesAmount = 42;

for (int i = 0; i < verticesAmount; i++) {
Vertex vertex = tx.addVertex("id", i);
vertex.property("name", "name_test");
vertex.property("details", "details_" + i);
}
addVerticesForDropTest(verticesAmount, tx);

clopen();

Expand All @@ -10090,20 +10091,161 @@ public void testMultiQueryDropsVertices() {
.map(v -> (JanusGraphVertex) v)
.collect(Collectors.toList());

int actualCount = tx.multiQuery(vertices).drop();
int actualCount = tx.multiQuery(vertices).drop().size();
clopen();

assertEquals(verticesAmount, actualCount);

int afterDropCount = tx.traversal()
.V()
.has("name", "name_test")
.toList()
.size();
long afterDropCount = getVerticesForDropTestCount(tx.traversal());

assertEquals(0, afterDropCount);
}

@Test
public void testMultiQueryDropsStrategyModes() {

mgmt.makePropertyKey("id").dataType(Integer.class).cardinality(Cardinality.SINGLE).make();
PropertyKey nameProp = mgmt.makePropertyKey("name").dataType(String.class).cardinality(Cardinality.SINGLE).make();
mgmt.makePropertyKey("details").dataType(String.class).cardinality(Cardinality.SINGLE).make();
mgmt.buildIndex("nameIndex", Vertex.class).addKey(nameProp).buildCompositeIndex();

finishSchema();

long verticesAmount = 42;

// Mode: NONE

addVerticesForDropTest(verticesAmount);
graph.tx().commit();
clopen(option(DROP_STEP_BATCH_MODE), MultiQueryDropStepStrategyMode.NONE.getConfigName());
assertEquals(verticesAmount, getVerticesForDropTestCount());
TraversalMetrics profileT = graph.traversal().V().drop().profile().next();
assertTrue(profileT.getMetrics().stream().anyMatch(metrics -> metrics.getName().equals(DropStep.class.getSimpleName())));
graph.tx().commit();
assertEquals(0, getVerticesForDropTestCount());

// Mode: ALL

addVerticesForDropTest(verticesAmount);
graph.tx().commit();
clopen(option(DROP_STEP_BATCH_MODE), MultiQueryDropStepStrategyMode.ALL.getConfigName());
assertEquals(verticesAmount, getVerticesForDropTestCount());
profileT = graph.traversal().V().drop().profile().next();
assertEquals("true", profileT.getMetrics().stream().filter(metrics -> metrics.getName().equals(JanusGraphDropStep.class.getSimpleName())).findAny().get().getAnnotation("multi"));
graph.tx().commit();
assertEquals(0, getVerticesForDropTestCount());

// `limit` with `drop` step.

addVerticesForDropTest(verticesAmount);
graph.tx().commit();
clopen(option(DROP_STEP_BATCH_MODE), MultiQueryDropStepStrategyMode.NONE.getConfigName());
assertEquals(verticesAmount, getVerticesForDropTestCount());
int limitSize = 2;
profileT = graph.traversal().V().limit(limitSize).drop().profile().next();
assertTrue(profileT.getMetrics().stream().anyMatch(metrics -> metrics.getName().equals(DropStep.class.getSimpleName())));
graph.tx().commit();
long afterDropCount = getVerticesForDropTestCount();
assertEquals(verticesAmount-limitSize, afterDropCount);
graph.traversal().V().drop().iterate();
graph.tx().commit();
addVerticesForDropTest(verticesAmount);
graph.tx().commit();
clopen(option(DROP_STEP_BATCH_MODE), MultiQueryDropStepStrategyMode.ALL.getConfigName());
assertEquals(verticesAmount, getVerticesForDropTestCount());
profileT = graph.traversal().V().limit(limitSize).drop().profile().next();
assertEquals("true", profileT.getMetrics().stream().filter(metrics -> metrics.getName().equals(JanusGraphDropStep.class.getSimpleName())).findAny().get().getAnnotation("multi"));
graph.tx().commit();
afterDropCount = getVerticesForDropTestCount();
assertEquals(verticesAmount-limitSize, afterDropCount);
}

@Test
public void testMetaPropertiesDrop(){
mgmt.makePropertyKey("id").dataType(Integer.class).cardinality(Cardinality.SINGLE).make();
PropertyKey nameProp = mgmt.makePropertyKey("name").dataType(String.class).cardinality(Cardinality.SINGLE).make();
mgmt.makePropertyKey("details").dataType(String.class).cardinality(Cardinality.SINGLE).make();
mgmt.buildIndex("nameIndex", Vertex.class).addKey(nameProp).buildCompositeIndex();

finishSchema();

long verticesAmount = 42;

for (int i = 0; i < verticesAmount; i++) {
Vertex vertex = tx.addVertex("id", i);
VertexProperty<String> property = vertex.property("name", "name_test");
property.property("details", "details_" + i);
}
graph.tx().commit();

clopen(option(DROP_STEP_BATCH_MODE), MultiQueryDropStepStrategyMode.ALL.getConfigName());

assertEquals(verticesAmount, graph.traversal().V().properties("name").properties("details").count().next());

graph.traversal().V().properties("name").properties("details").drop().hasNext();

assertEquals(0, graph.traversal().V().properties("name").properties("details").count().next());

graph.tx().commit();

assertEquals(0, graph.traversal().V().properties("name").properties("details").count().next());
assertEquals(verticesAmount, graph.traversal().V().has("name").count().next());
}

@Test
public void testEdgePropertiesDrop(){
mgmt.makePropertyKey("id").dataType(Integer.class).cardinality(Cardinality.SINGLE).make();
mgmt.makePropertyKey("name").dataType(String.class).cardinality(Cardinality.SINGLE).make();
mgmt.makeEdgeLabel("relate").make();

finishSchema();

long verticesAmount = 42;

for (int i = 0; i < verticesAmount; i++) {
Vertex vertex = tx.addVertex("id", i);
Vertex vertex2 = tx.addVertex("id", i+verticesAmount);
vertex.addEdge("relate", vertex2).property("name", "name_"+i);
}

graph.tx().commit();

clopen(option(DROP_STEP_BATCH_MODE), MultiQueryDropStepStrategyMode.ALL.getConfigName());

assertEquals(verticesAmount, graph.traversal().E().properties("name").count().next());

graph.traversal().E().properties("name").drop().hasNext();

assertEquals(0, graph.traversal().E().properties("name").count().next());

graph.tx().commit();

assertEquals(0, graph.traversal().E().properties("name").count().next());
assertEquals(verticesAmount, graph.traversal().E().count().next());
}

private void addVerticesForDropTest(long verticesAmount){
addVerticesForDropTest(verticesAmount, graph);
}

private long getVerticesForDropTestCount(){
return getVerticesForDropTestCount(graph.traversal());
}

private void addVerticesForDropTest(long verticesAmount, Transaction tx){
for (int i = 0; i < verticesAmount; i++) {
Vertex vertex = tx.addVertex("id", i);
vertex.property("name", "name_test");
vertex.property("details", "details_" + i);
}
}

private long getVerticesForDropTestCount(GraphTraversalSource g){
return g.V()
.has("name", "name_test")
.count().next();
}

@ParameterizedTest
@ValueSource(booleans = {true, false})
public void testParallelBackendOps(boolean parallelBackendOpsEnabled) {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -27,6 +27,7 @@
import org.janusgraph.diskstorage.configuration.WriteConfiguration;
import org.janusgraph.diskstorage.cql.CQLConfigOptions;
import org.janusgraph.graphdb.configuration.GraphDatabaseConfiguration;
import org.janusgraph.graphdb.tinkerpop.optimize.strategy.MultiQueryDropStepStrategyMode;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
Expand Down Expand Up @@ -65,6 +66,7 @@ public WriteConfiguration getConfiguration() {
config.set(GraphDatabaseConfiguration.STORAGE_BACKEND,"cql");
config.set(CQLConfigOptions.LOCAL_DATACENTER, "dc1");
config.set(GraphDatabaseConfiguration.USE_MULTIQUERY, true);
config.set(GraphDatabaseConfiguration.DROP_STEP_BATCH_MODE, MultiQueryDropStepStrategyMode.NONE.getConfigName());
return config.getConfiguration();
}

Expand Down Expand Up @@ -103,7 +105,7 @@ public Integer dropVertices() {
.map(v -> (JanusGraphVertex) v)
.collect(Collectors.toList());

dropCount = tx.multiQuery(vertices).drop();
dropCount = tx.multiQuery(vertices).drop().size();
} else {
dropCount = tx.traversal()
.V()
Expand All @@ -117,6 +119,27 @@ public Integer dropVertices() {
return dropCount;
}

@Benchmark
public Integer dropVerticesGremlinQuery() {

JanusGraphTransaction tx;
if (isMultiDrop) {
tx = graph.buildTransaction().setDropStepStrategyMode(MultiQueryDropStepStrategyMode.ALL).start();
} else {
tx = graph.buildTransaction().setDropStepStrategyMode(MultiQueryDropStepStrategyMode.NONE).start();
}

Integer dropCount = tx.traversal()
.V()
.has("name", "name_test")
.drop()
.toList()
.size();

tx.rollback();
return dropCount;
}

private void addVertices() {
for (int i = 0; i < verticesAmount; i++) {
Vertex vertex = graph.addVertex("id", i);
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -60,7 +60,6 @@ public interface JanusGraphMultiVertexQuery<Q extends JanusGraphMultiVertexQuery
*/
JanusGraphMultiVertexQuery addAllVertices(Collection<? extends Vertex> vertices);


@Override
Q adjacent(Vertex vertex);

Expand Down Expand Up @@ -156,7 +155,8 @@ public interface JanusGraphMultiVertexQuery<Q extends JanusGraphMultiVertexQuery
/**
* Drops all vertices that match this query
*
* @return Count of dropped vertices
* @return Map of vertices and their relations which were dropped
*/
Integer drop();
Map<JanusGraphVertex, Iterable<JanusGraphRelation>> drop();

}
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,7 @@
package org.janusgraph.core;

import org.janusgraph.graphdb.configuration.GraphDatabaseConfiguration;
import org.janusgraph.graphdb.tinkerpop.optimize.strategy.MultiQueryDropStepStrategyMode;
import org.janusgraph.graphdb.tinkerpop.optimize.strategy.MultiQueryHasStepStrategyMode;
import org.janusgraph.graphdb.tinkerpop.optimize.strategy.MultiQueryLabelStepStrategyMode;
import org.janusgraph.graphdb.tinkerpop.optimize.strategy.MultiQueryPropertiesStrategyMode;
Expand Down Expand Up @@ -186,6 +187,15 @@ public interface TransactionBuilder {
*/
TransactionBuilder setLabelsStepStrategyMode(MultiQueryLabelStepStrategyMode labelStepStrategyMode);

/**
* Sets `drop` step strategy mode.
* <p>
* Doesn't have any effect if multi-query was disabled via config `query.batch.enabled = false`.
*
* @return Object with the set drop strategy mode settings
*/
TransactionBuilder setDropStepStrategyMode(MultiQueryDropStepStrategyMode dropStepStrategyMode);

/**
* Sets the group name for this transaction which provides a way for gathering
* reporting on multiple transactions into one group.
Expand Down
Loading
Loading