Commit

Merge branch 'feature/property-map-multi-query'
PR #3803

Signed-off-by: Oleksandr Porunov <alexandr.porunov@gmail.com>
porunov committed Jun 16, 2023
2 parents af4173b + 813da90 commit e7b0eea
Showing 24 changed files with 1,976 additions and 96 deletions.
6 changes: 6 additions & 0 deletions docs/changelog.md
@@ -189,6 +189,12 @@ GraphBinary is now used as the default MessageSerializer.hb
use `query.batch.has-step-mode = none` as replacement for `query.batch-property-prefetch = false` or use
`query.batch.has-step-mode = all_properties` as replacement for `query.batch-property-prefetch = true`.

`query.fast-property` no longer influences the `values`, `properties`, `valueMap`, `propertyMap`, and `elementMap` steps when
`query.batch.enabled` is `true`. By default, those steps fetch only the required properties
(with a separate query per property), but this behaviour can be changed with the `query.batch.properties-mode` configuration option.
To restore the previous behaviour, use `query.batch.properties-mode = required_properties_only` if you had `query.fast-property = false`,
or use `query.batch.properties-mode = all_properties` if you had `query.fast-property = true`.
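
The two replacement rules described above can be written as a small lookup. The following is only a sketch: the class `BatchConfigMigration` and its methods are hypothetical names invented for illustration and are not part of JanusGraph. Only the option names and mode values come from the changelog text.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical migration helper for illustration; not part of JanusGraph. */
final class BatchConfigMigration {

    /** Replacement value for the removed `query.batch-property-prefetch` flag. */
    static String hasStepModeFor(boolean batchPropertyPrefetch) {
        return batchPropertyPrefetch ? "all_properties" : "none";
    }

    /** Value that keeps the behaviour previously selected by `query.fast-property`. */
    static String propertiesModeFor(boolean fastProperty) {
        return fastProperty ? "all_properties" : "required_properties_only";
    }

    /** Builds the new option values from the two old boolean settings. */
    static Map<String, String> migrate(boolean batchPropertyPrefetch, boolean fastProperty) {
        Map<String, String> config = new LinkedHashMap<>();
        config.put("query.batch.has-step-mode", hasStepModeFor(batchPropertyPrefetch));
        config.put("query.batch.properties-mode", propertiesModeFor(fastProperty));
        return config;
    }

    public static void main(String[] args) {
        // Example: prefetch was disabled and fast-property was enabled.
        migrate(false, true).forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

The resulting key/value pairs would then be copied into the graph configuration by hand; the sketch does not read or write any JanusGraph configuration itself.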

[Batch processing](https://docs.janusgraph.org/operations/batch-processing/) allows JanusGraph to fetch a batch of
vertices from the storage backend together instead of requesting each vertex individually which leads to a high number
of backend queries.
14 changes: 9 additions & 5 deletions docs/configs/janusgraph-cfg.md
@@ -346,7 +346,10 @@ Configuration options for query processing

| Name | Description | Datatype | Default Value | Mutability |
| ---- | ---- | ---- | ---- | ---- |
| query.fast-property | Whether to pre-fetch all properties on first singular vertex property access. This can eliminate backend calls on subsequent property access for the same vertex at the expense of retrieving all properties at once. This can be expensive for vertices with many properties | Boolean | true | MASKABLE |
| query.fast-property | Whether to pre-fetch all properties on first singular vertex property access. This can eliminate backend calls on subsequent property access for the same vertex at the expense of retrieving all properties at once. This can be expensive for vertices with many properties.
This setting applies to direct vertex property access (such as `vertex.properties("foo")`, but not `vertex.properties("foo","bar")`, because the latter is not a singular property access).
This setting does not apply to the following Gremlin steps: `valueMap`, `propertyMap`, `elementMap`, `properties`, `values` (use the configuration option `query.batch.properties-mode` to configure their behavior).
When `true`, this setting overrides `query.batch.has-step-mode` to `all_properties` unless `none` mode is used. | Boolean | true | MASKABLE |
| query.force-index | Whether JanusGraph should throw an exception if a graph query cannot be answered using an index. Doing so limits the functionality of JanusGraph's graph queries but ensures that slow graph queries are avoided on large graphs. Recommended for production use of JanusGraph. | Boolean | false | MASKABLE |
| query.hard-max-limit | If smart-limit is disabled and no limit is given in the query, query optimizer adds a limit in light of possibly large result sets. It works in the same way as smart-limit except that hard-max-limit is usually a large number. Default value is Integer.MAX_VALUE which effectively disables this behavior. This option does not take effect when smart-limit is enabled. | Integer | 2147483647 | MASKABLE |
| query.ignore-unknown-index-key | Whether to ignore undefined types encountered in user-provided index queries | Boolean | false | MASKABLE |
@@ -362,17 +365,18 @@ Configuration options to configure batch queries optimization behavior
| Name | Description | Datatype | Default Value | Mutability |
| ---- | ---- | ---- | ---- | ---- |
| query.batch.enabled | Whether traversal queries should be batched when executed against the storage backend. This can lead to significant performance improvement if there is a non-trivial latency to the backend. If `false` then all other configuration options under `query.batch` namespace are ignored. | Boolean | true | MASKABLE |
| query.batch.has-step-mode | Properties pre-fetching mode for `has` step. Used only when query.batch.enabled is `true`.<br>Supported modes:<br>- `all_properties` Pre-fetch all vertex properties on any property access<br>- `required_properties_only` Pre-fetch necessary vertex properties for the whole chain of foldable `has` steps<br>- `required_and_next_properties` Prefetch the same properties as with `required_properties_only` mode, but also prefetch
| query.batch.has-step-mode | Properties pre-fetching mode for `has` step. Used only when `query.batch.enabled` is `true`.<br>Supported modes:<br>- `all_properties` - Pre-fetch all vertex properties on any property access (fetches all vertex properties in a single slice query)<br>- `required_properties_only` - Pre-fetch necessary vertex properties for the whole chain of foldable `has` steps (uses a separate slice query per each required property)<br>- `required_and_next_properties` - Prefetch the same properties as with `required_properties_only` mode, but also prefetch
properties which may be needed in the next property access step, such as `values`, `properties`, `valueMap`, `elementMap`, or `propertyMap`.
If the next step is not one of those property access steps, this mode behaves the same as `required_properties_only`.
If the next step is one of the property access steps with a limited scope of properties, those properties will be
pre-fetched together in the same multi-query.
If the next step is one of the property access steps with an unspecified scope of property keys, this mode
behaves same as `all_properties`.<br>- `required_and_next_properties_or_all` Prefetch the same properties as with `required_and_next_properties`, but in case the next step is not
`values`, `properties,` `valueMap`, `elementMap`, or `propertyMap` then acts like `all_properties`.<br>- `none` Skips `has` step batch properties pre-fetch optimization.<br> | String | required_and_next_properties | MASKABLE |
behaves the same as `all_properties`.<br>- `required_and_next_properties_or_all` - Prefetch the same properties as with `required_and_next_properties`, but if the next step is not
`values`, `properties`, `valueMap`, `elementMap`, or `propertyMap`, it acts like `all_properties`.<br>- `none` - Skips `has` step batch properties pre-fetch optimization.<br> | String | required_and_next_properties | MASKABLE |
| query.batch.limited | Configure a maximum batch size for queries against the storage backend. This can be used to ensure responsiveness if batches tend to grow very large. The used batch size is equivalent to the barrier size of a preceding `barrier()` step. If a step has no preceding `barrier()`, the default barrier of TinkerPop will be inserted. This option only takes effect if `query.batch.enabled` is `true`. | Boolean | true | MASKABLE |
| query.batch.limited-size | Default batch size (barrier() step size) for queries. This size is applied only for cases where `LazyBarrierStrategy` strategy didn't apply `barrier` step and where user didn't apply barrier step either. This option is used only when `query.batch.limited` is `true`. Notice, value `2147483647` is considered to be unlimited. | Integer | 2500 | MASKABLE |
| query.batch.repeat-step-mode | Batch mode for `repeat` step. Used only when query.batch.enabled is `true`.<br>These modes are controlling how the child steps with batch support are behaving if they placed to the start of the `repeat`, `emit`, or `until` traversals.<br>Supported modes:<br>- `closest_repeat_parent` Child start steps are receiving vertices for batching from the closest `repeat` step parent only.<br>- `all_repeat_parents` Child start steps are receiving vertices for batching from all `repeat` step parents.<br>- `starts_only_of_all_repeat_parents` Child start steps are receiving vertices for batching from the closest `repeat` step parent (both for the parent start and for next iterations) and also from all `repeat` step parents for the parent start. | String | all_repeat_parents | MASKABLE |
| query.batch.properties-mode | Properties pre-fetching mode for `values`, `properties`, `valueMap`, `propertyMap`, `elementMap` steps. Used only when `query.batch.enabled` is `true`.<br>Supported modes:<br>- `all_properties` - Pre-fetch all vertex properties on non-singular property access (fetches all vertex properties in a single slice query). On single property access this mode behaves the same as `required_properties_only` mode.<br>- `required_properties_only` - Pre-fetch necessary vertex properties only (uses a separate slice query per each required property)<br>- `none` - Skips vertex properties pre-fetching optimization.<br> | String | required_properties_only | MASKABLE |
| query.batch.repeat-step-mode | Batch mode for `repeat` step. Used only when `query.batch.enabled` is `true`.<br>These modes control how child steps with batch support behave when they are placed at the start of the `repeat`, `emit`, or `until` traversals.<br>Supported modes:<br>- `closest_repeat_parent` - Child start steps receive vertices for batching from the closest `repeat` step parent only.<br>- `all_repeat_parents` - Child start steps receive vertices for batching from all `repeat` step parents.<br>- `starts_only_of_all_repeat_parents` - Child start steps receive vertices for batching from the closest `repeat` step parent (both for the parent start and for next iterations) and also from all `repeat` step parents for the parent start. | String | all_repeat_parents | MASKABLE |

### schema
Schema related configuration options
23 changes: 22 additions & 1 deletion docs/operations/batch-processing.md
@@ -158,7 +158,8 @@ when the query is accessing many vertices.
Batched query processing takes into account two types of steps:

1. Batch compatible step. This is the step which will execute batch requests. Currently, the list of such steps
is the next: `out()`, `in()`, `both()`, `inE()`, `outE()`, `bothE()`, `has()`, `values()`, `properties()`.
is the following: `out()`, `in()`, `both()`, `inE()`, `outE()`, `bothE()`, `has()`, `values()`, `properties()`, `valueMap()`,
`propertyMap()`, `elementMap()`.
2. Parent step. This is a parent step which has local traversals with the same start. Such parent steps also implement the
interface `TraversalParent`. There are many such steps; some examples are `and(...)`, `or(...)`,
`not(...)`, `order().by(...)`, `project("valueA", "valueB", "valueC").by(...).by(...).by(...)`, `union(..., ..., ...)`,
@@ -309,3 +310,23 @@ Currently, JanusGraph supports vertices registration for batch processing inside
step, but not between those local traversals. Also, JanusGraph doesn't register start of the `match` step with any
of the local traversals of the `match` step. Thus, performance for `match` step might be limited. This is a temporary
limitation until this feature is implemented ([see issue #3788](https://github.com/JanusGraph/janusgraph/issues/3788)).

#### Batch processing for properties

Some Gremlin steps with this optimization enabled may prefetch vertex properties in batches.
For now, JanusGraph uses slice queries to query part of the row data. A single slice query contains
a start key and an end key that define the slice of data JanusGraph is interested in.
Because JanusGraph doesn't currently support multi-range slice queries, it can fetch either a single property
per slice query or all properties in a single slice query. Thus, users have to choose between the two
property fetching approaches: fetch all properties in a single slice query
(which is usually faster, but unnecessary properties might be fetched) or fetch only the requested properties with
a separate slice query per property (which might be slightly slower, but fetches only the requested properties).
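
To make the tradeoff concrete, here is a toy cost model for a single vertex. It is a sketch under simplifying assumptions, not JanusGraph code: the class `SliceQueryCostModel` and its methods are hypothetical, it counts property keys rather than values, and it ignores latency, batching across vertices, and caching.

```java
/** Toy per-vertex cost model; names are hypothetical and not part of JanusGraph. */
final class SliceQueryCostModel {
    final int sliceQueries;        // number of slice queries issued for the vertex
    final int propertyKeysFetched; // number of property keys read from storage

    private SliceQueryCostModel(int sliceQueries, int propertyKeysFetched) {
        this.sliceQueries = sliceQueries;
        this.propertyKeysFetched = propertyKeysFetched;
    }

    /** One slice query covering the whole property range: every key on the vertex is read. */
    static SliceQueryCostModel allProperties(int totalKeysOnVertex) {
        return new SliceQueryCostModel(1, totalKeysOnVertex);
    }

    /** One slice query per requested key: only the requested keys are read. */
    static SliceQueryCostModel requiredOnly(int requestedKeys) {
        return new SliceQueryCostModel(requestedKeys, requestedKeys);
    }

    public static void main(String[] args) {
        // Assumed example: a vertex with 50 property keys, of which 2 are requested.
        SliceQueryCostModel all = allProperties(50);
        SliceQueryCostModel required = requiredOnly(2);
        System.out.println("all properties: " + all.sliceQueries + " slice query, "
                + all.propertyKeysFetched + " keys read");
        System.out.println("required only:  " + required.sliceQueries + " slice queries, "
                + required.propertyKeysFetched + " keys read");
    }
}
```

In this model the single-slice approach always issues fewer queries but may read many unneeded keys, while the per-property approach reads exactly what was requested at the cost of more queries. Which one is faster in practice depends on the backend and the vertex, so it is worth measuring.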

See [issue #3816](https://github.com/JanusGraph/janusgraph/issues/3816), which will allow fetching only the requested
properties via a single slice query.

See the configuration option `query.fast-property`, which may be used to pre-fetch all properties on the first singular property
access when direct vertex properties are requested (for example, `vertex.properties("foo")`).
See the configuration option `query.batch.has-step-mode` to control property pre-fetching behaviour for the `has` step.
See the configuration option `query.batch.properties-mode` to control property pre-fetching behaviour for the `values`,
`properties`, `valueMap`, `propertyMap`, and `elementMap` steps.
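
As a quick reference, the mapping from Gremlin step to governing option can be sketched as a lookup. The class `PrefetchOptionLookup` below is a hypothetical illustration and not a JanusGraph API. It covers only the batched steps named above and returns an empty result for anything else.

```java
import java.util.Optional;
import java.util.Set;

/** Hypothetical lookup for illustration; not part of JanusGraph. */
final class PrefetchOptionLookup {

    // Property access steps governed by `query.batch.properties-mode`.
    private static final Set<String> PROPERTY_STEPS =
            Set.of("values", "properties", "valueMap", "propertyMap", "elementMap");

    /** Returns the option controlling property pre-fetching for the given step name, if any. */
    static Optional<String> optionForStep(String stepName) {
        if ("has".equals(stepName)) {
            return Optional.of("query.batch.has-step-mode");
        }
        if (PROPERTY_STEPS.contains(stepName)) {
            return Optional.of("query.batch.properties-mode");
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        for (String step : new String[] {"has", "valueMap", "out"}) {
            System.out.println(step + " -> " + optionForStep(step).orElse("(no property pre-fetching option)"));
        }
    }
}
```

Direct vertex access such as `vertex.properties("foo")` is not a traversal step and is therefore not in this lookup; it is governed by `query.fast-property`.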

1 comment on commit e7b0eea

@github-actions

Benchmark

| Benchmark suite | Current: e7b0eea | Previous: 3a7ba53 | Ratio |
| ---- | ---- | ---- | ---- |
| org.janusgraph.JanusGraphSpeedBenchmark.basicAddAndDelete | 14198.69670050245 ms/op | 20132.68260361965 ms/op | 0.71 |
| org.janusgraph.GraphCentricQueryBenchmark.getVertices | 1378.3106312802922 ms/op | 1617.8956294193085 ms/op | 0.85 |
| org.janusgraph.MgmtOlapJobBenchmark.runClearIndex | 219.74754926086956 ms/op | 222.76279991304347 ms/op | 0.99 |
| org.janusgraph.MgmtOlapJobBenchmark.runReindex | 467.9631437512122 ms/op | 541.8996275500001 ms/op | 0.86 |
| org.janusgraph.JanusGraphSpeedBenchmark.basicCount | 376.61225927174604 ms/op | 368.0788711730627 ms/op | 1.02 |
| org.janusgraph.CQLMultiQueryBenchmark.getIdToOutVerticesProjection | 417.61720202726184 ms/op | | |
| org.janusgraph.CQLMultiQueryBenchmark.getElementsWithUsingEmitRepeatSteps | 30485.29176928849 ms/op | | |
| org.janusgraph.CQLMultiQueryBenchmark.getAllElementsTraversedFromOuterVertex | 15035.469636326356 ms/op | | |
| org.janusgraph.CQLMultiQueryBenchmark.getNeighborNames | 15012.990749081673 ms/op | 19824.388658859913 ms/op | 0.76 |
| org.janusgraph.CQLMultiQueryBenchmark.getVerticesWithDoubleUnion | 614.4273777856383 ms/op | | |
| org.janusgraph.CQLMultiQueryBenchmark.getElementsWithUsingRepeatUntilSteps | 15829.198928864762 ms/op | | |
| org.janusgraph.CQLMultiQueryBenchmark.getAdjacentVerticesLocalCounts | 15474.56831800046 ms/op | | |
| org.janusgraph.CQLMultiQueryBenchmark.getNames | 14795.141353185782 ms/op | 19643.66567439762 ms/op | 0.75 |
| org.janusgraph.CQLMultiQueryBenchmark.getVerticesFilteredByAndStep | 686.2781792317747 ms/op | | |
| org.janusgraph.CQLMultiQueryBenchmark.getVerticesFromMultiNestedRepeatStepStartingFromSingleVertex | 21260.009435516116 ms/op | | |
| org.janusgraph.CQLMultiQueryBenchmark.getVerticesWithCoalesceUsage | 579.0983303444897 ms/op | | |

This comment was automatically generated by workflow using github-action-benchmark.
