Enable multiQuery optimization for PropertyMapStep and ElementMapStep [cql-tests] [tp-tests] #3803

Merged (1 commit) on Jun 16, 2023
6 changes: 6 additions & 0 deletions docs/changelog.md
@@ -189,6 +189,12 @@ GraphBinary is now used as the default MessageSerializer.hb
use `query.batch.has-step-mode = none` as replacement for `query.batch-property-prefetch = false` or use
`query.batch.has-step-mode = all_properties` as replacement for `query.batch-property-prefetch = true`.

`query.fast-property` no longer influences the `values`, `properties`, `valueMap`, `propertyMap`, and `elementMap` steps when
`query.batch.enabled` is `true`. By default, those steps fetch only the required properties
(with a separate query per property), but this behavior can be changed with the configuration option `query.batch.properties-mode`.
To keep the previous behavior, use `query.batch.properties-mode = required_properties_only` in place of `query.fast-property = false`,
or use `query.batch.properties-mode = all_properties` in place of `query.fast-property = true`.
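
The migration rule in this changelog entry can be sketched as a small helper. This is only an illustration under stated assumptions: the class `PropertiesModeMigration` and its methods are hypothetical and not part of JanusGraph; only the option names and values come from the text above. The defaults (`true` for both `query.batch.enabled` and `query.fast-property`) are taken from the configuration reference.

```java
import java.util.Properties;

/** Hypothetical helper (not a JanusGraph class) that mirrors a legacy fast-property value. */
final class PropertiesModeMigration {

    // Option names as they appear in the changelog entry.
    static final String FAST_PROPERTY = "query.fast-property";
    static final String BATCH_ENABLED = "query.batch.enabled";
    static final String PROPERTIES_MODE = "query.batch.properties-mode";

    /** Returns the properties-mode that reproduces the old behavior of the given fast-property value. */
    static String equivalentPropertiesMode(boolean fastProperty) {
        return fastProperty ? "all_properties" : "required_properties_only";
    }

    /** Copies the configuration and adds an explicit properties-mode if one is needed and missing. */
    static Properties migrate(Properties legacy) {
        Properties migrated = new Properties();
        migrated.putAll(legacy);
        // Both options default to true according to the configuration reference.
        boolean batchEnabled = Boolean.parseBoolean(legacy.getProperty(BATCH_ENABLED, "true"));
        if (batchEnabled && !legacy.containsKey(PROPERTIES_MODE)) {
            boolean fastProperty = Boolean.parseBoolean(legacy.getProperty(FAST_PROPERTY, "true"));
            migrated.setProperty(PROPERTIES_MODE, equivalentPropertiesMode(fastProperty));
        }
        return migrated;
    }

    public static void main(String[] args) {
        Properties legacy = new Properties();
        legacy.setProperty(FAST_PROPERTY, "false");
        System.out.println(migrate(legacy).getProperty(PROPERTIES_MODE));
    }
}
```

When `query.batch.enabled` is `false` the helper leaves the configuration untouched, since all options under the `query.batch` namespace are ignored in that case.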

[Batch processing](https://docs.janusgraph.org/operations/batch-processing/) allows JanusGraph to fetch a batch of
vertices from the storage backend together instead of requesting each vertex individually which leads to a high number
of backend queries.
14 changes: 9 additions & 5 deletions docs/configs/janusgraph-cfg.md
@@ -346,7 +346,10 @@ Configuration options for query processing

| Name | Description | Datatype | Default Value | Mutability |
| ---- | ---- | ---- | ---- | ---- |
| query.fast-property | Whether to pre-fetch all properties on first singular vertex property access. This can eliminate backend calls on subsequent property access for the same vertex at the expense of retrieving all properties at once. This can be expensive for vertices with many properties | Boolean | true | MASKABLE |
| query.fast-property | Whether to pre-fetch all properties on first singular vertex property access. This can eliminate backend calls on subsequent property access for the same vertex at the expense of retrieving all properties at once. This can be expensive for vertices with many properties.
This setting applies to direct vertex property access (like `vertex.properties("foo")`, but not `vertex.properties("foo","bar")`, because the latter is not a singular property access).
This setting does not apply to the following Gremlin steps: `valueMap`, `propertyMap`, `elementMap`, `properties`, `values` (use the configuration option `query.batch.properties-mode` to configure their behavior).
When `true`, this setting overrides `query.batch.has-step-mode` with `all_properties` unless the `none` mode is used. | Boolean | true | MASKABLE |
| query.force-index | Whether JanusGraph should throw an exception if a graph query cannot be answered using an index. Doing so limits the functionality of JanusGraph's graph queries but ensures that slow graph queries are avoided on large graphs. Recommended for production use of JanusGraph. | Boolean | false | MASKABLE |
| query.hard-max-limit | If smart-limit is disabled and no limit is given in the query, query optimizer adds a limit in light of possibly large result sets. It works in the same way as smart-limit except that hard-max-limit is usually a large number. Default value is Integer.MAX_VALUE which effectively disables this behavior. This option does not take effect when smart-limit is enabled. | Integer | 2147483647 | MASKABLE |
| query.ignore-unknown-index-key | Whether to ignore undefined types encountered in user-provided index queries | Boolean | false | MASKABLE |
@@ -362,17 +365,18 @@ Configuration options to configure batch queries optimization behavior
| Name | Description | Datatype | Default Value | Mutability |
| ---- | ---- | ---- | ---- | ---- |
| query.batch.enabled | Whether traversal queries should be batched when executed against the storage backend. This can lead to significant performance improvement if there is a non-trivial latency to the backend. If `false` then all other configuration options under `query.batch` namespace are ignored. | Boolean | true | MASKABLE |
| query.batch.has-step-mode | Properties pre-fetching mode for `has` step. Used only when query.batch.enabled is `true`.<br>Supported modes:<br>- `all_properties` Pre-fetch all vertex properties on any property access<br>- `required_properties_only` Pre-fetch necessary vertex properties for the whole chain of foldable `has` steps<br>- `required_and_next_properties` Prefetch the same properties as with `required_properties_only` mode, but also prefetch
| query.batch.has-step-mode | Properties pre-fetching mode for `has` step. Used only when query.batch.enabled is `true`.<br>Supported modes:<br>- `all_properties` - Pre-fetch all vertex properties on any property access (fetches all vertex properties in a single slice query)<br>- `required_properties_only` - Pre-fetch necessary vertex properties for the whole chain of foldable `has` steps (uses a separate slice query per each required property)<br>- `required_and_next_properties` - Prefetch the same properties as with `required_properties_only` mode, but also prefetch
properties which may be needed in the next properties access step like `values`, `properties`, `valueMap`, `elementMap`, or `propertyMap`.
In case the next step is not one of those properties access steps then this mode behaves the same as `required_properties_only`.
In case the next step is one of the properties access steps with limited scope of properties, those properties will be
pre-fetched together in the same multi-query.
In case the next step is one of the properties access steps with unspecified scope of property keys then this mode
behaves the same as `all_properties`.<br>- `required_and_next_properties_or_all` Prefetch the same properties as with `required_and_next_properties`, but in case the next step is not
`values`, `properties`, `valueMap`, `elementMap`, or `propertyMap` then acts like `all_properties`.<br>- `none` Skips `has` step batch properties pre-fetch optimization.<br> | String | required_and_next_properties | MASKABLE |
behaves the same as `all_properties`.<br>- `required_and_next_properties_or_all` - Prefetch the same properties as with `required_and_next_properties`, but in case the next step is not
`values`, `properties`, `valueMap`, `elementMap`, or `propertyMap` then acts like `all_properties`.<br>- `none` - Skips `has` step batch properties pre-fetch optimization.<br> | String | required_and_next_properties | MASKABLE |
| query.batch.limited | Configure a maximum batch size for queries against the storage backend. This can be used to ensure responsiveness if batches tend to grow very large. The used batch size is equivalent to the barrier size of a preceding `barrier()` step. If a step has no preceding `barrier()`, the default barrier of TinkerPop will be inserted. This option only takes effect if `query.batch.enabled` is `true`. | Boolean | true | MASKABLE |
| query.batch.limited-size | Default batch size (barrier() step size) for queries. This size is applied only for cases where `LazyBarrierStrategy` strategy didn't apply `barrier` step and where user didn't apply barrier step either. This option is used only when `query.batch.limited` is `true`. Notice, value `2147483647` is considered to be unlimited. | Integer | 2500 | MASKABLE |
| query.batch.repeat-step-mode | Batch mode for `repeat` step. Used only when query.batch.enabled is `true`.<br>These modes control how child steps with batch support behave if they are placed at the start of the `repeat`, `emit`, or `until` traversals.<br>Supported modes:<br>- `closest_repeat_parent` Child start steps receive vertices for batching from the closest `repeat` step parent only.<br>- `all_repeat_parents` Child start steps receive vertices for batching from all `repeat` step parents.<br>- `starts_only_of_all_repeat_parents` Child start steps receive vertices for batching from the closest `repeat` step parent (both for the parent start and for next iterations) and also from all `repeat` step parents for the parent start. | String | all_repeat_parents | MASKABLE |
| query.batch.properties-mode | Properties pre-fetching mode for `values`, `properties`, `valueMap`, `propertyMap`, `elementMap` steps. Used only when query.batch.enabled is `true`.<br>Supported modes:<br>- `all_properties` - Pre-fetch all vertex properties on non-singular property access (fetches all vertex properties in a single slice query). On single property access this mode behaves the same as `required_properties_only` mode.<br>- `required_properties_only` - Pre-fetch necessary vertex properties only (uses a separate slice query per each required property)<br>- `none` - Skips vertex properties pre-fetching optimization.<br> | String | required_properties_only | MASKABLE |
| query.batch.repeat-step-mode | Batch mode for `repeat` step. Used only when query.batch.enabled is `true`.<br>These modes control how child steps with batch support behave if they are placed at the start of the `repeat`, `emit`, or `until` traversals.<br>Supported modes:<br>- `closest_repeat_parent` - Child start steps receive vertices for batching from the closest `repeat` step parent only.<br>- `all_repeat_parents` - Child start steps receive vertices for batching from all `repeat` step parents.<br>- `starts_only_of_all_repeat_parents` - Child start steps receive vertices for batching from the closest `repeat` step parent (both for the parent start and for next iterations) and also from all `repeat` step parents for the parent start. | String | all_repeat_parents | MASKABLE |
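
The `query.batch.has-step-mode` rules in the table above can be read as a small decision function. The sketch below is a hypothetical model of those documented rules, not JanusGraph code: the class `HasStepPrefetchModel`, the method `keysToPrefetch`, and the `ALL` marker are names invented for illustration, and an empty `nextKeys` list is assumed to mean "unspecified scope of property keys".

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical model of the documented has-step-mode rules; not JanusGraph code. */
final class HasStepPrefetchModel {

    /** Marker result meaning "fetch every property with one slice query". */
    static final Set<String> ALL = Collections.singleton("*");

    /** Property access steps named in the option description. */
    static final Set<String> PROPERTY_STEPS =
            Set.of("values", "properties", "valueMap", "elementMap", "propertyMap");

    /**
     * @param mode     value of query.batch.has-step-mode
     * @param hasKeys  property keys used by the chain of foldable has steps
     * @param nextStep name of the step following the has chain, or null if there is none
     * @param nextKeys keys requested by the next step; empty means unspecified scope (assumption)
     */
    static Set<String> keysToPrefetch(String mode, List<String> hasKeys,
                                      String nextStep, List<String> nextKeys) {
        boolean nextIsPropertyStep = nextStep != null && PROPERTY_STEPS.contains(nextStep);
        switch (mode) {
            case "none":
                return Collections.emptySet();          // optimization skipped
            case "all_properties":
                return ALL;
            case "required_properties_only":
                return new LinkedHashSet<>(hasKeys);
            case "required_and_next_properties":
            case "required_and_next_properties_or_all":
                if (!nextIsPropertyStep) {
                    // Only the "_or_all" variant falls back to fetching everything.
                    return mode.endsWith("_or_all") ? ALL : new LinkedHashSet<>(hasKeys);
                }
                if (nextKeys.isEmpty()) {
                    return ALL;                         // unspecified scope behaves like all_properties
                }
                Set<String> keys = new LinkedHashSet<>(hasKeys);
                keys.addAll(nextKeys);                  // fetched together in the same multi-query
                return keys;
            default:
                throw new IllegalArgumentException("Unknown mode: " + mode);
        }
    }

    public static void main(String[] args) {
        // Models a traversal shaped like has("name", ...).values("age").
        System.out.println(keysToPrefetch("required_and_next_properties",
                List.of("name"), "values", List.of("age")));
    }
}
```

The model ignores the interaction with `query.fast-property`, which per the `query.fast-property` description forces `all_properties` unless the `none` mode is used.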

### schema
Schema related configuration options
23 changes: 22 additions & 1 deletion docs/operations/batch-processing.md
@@ -158,7 +158,8 @@ when the query is accessing many vertices.
Batched query processing takes into account two types of steps:

1. Batch compatible step. This is the step which will execute batch requests. Currently, the list of such steps
is the following: `out()`, `in()`, `both()`, `inE()`, `outE()`, `bothE()`, `has()`, `values()`, `properties()`.
is the following: `out()`, `in()`, `both()`, `inE()`, `outE()`, `bothE()`, `has()`, `values()`, `properties()`, `valueMap()`,
`propertyMap()`, `elementMap()`.
2. Parent step. This is a parent step which has local traversals with the same start. Such parent steps also implement the
interface `TraversalParent`. There are many such steps; examples include `and(...)`, `or(...)`,
`not(...)`, `order().by(...)`, `project("valueA", "valueB", "valueC").by(...).by(...).by(...)`, `union(..., ..., ...)`,
@@ -309,3 +310,23 @@ Currently, JanusGraph supports vertices registration for batch processing inside
step, but not between those local traversals. Also, JanusGraph doesn't register start of the `match` step with any
of the local traversals of the `match` step. Thus, performance for `match` step might be limited. This is a temporary
limitation until this feature is implemented ([see issue #3788](https://github.com/JanusGraph/janusgraph/issues/3788)).

#### Batch processing for properties

Some Gremlin steps may prefetch vertex properties in batches when this optimization is enabled.
For now, JanusGraph uses slice queries to query part of the row data. A single slice query contains
the start key and the end key to define the slice of data JanusGraph is interested in.
As JanusGraph doesn't support multi-range slice queries right now, it can either fetch a single property
in a single slice query or all properties in a single slice query. Thus, users have to decide the tradeoff between
different properties fetching approaches and decide when they want to fetch all properties in a single slice query
(which is usually faster, but unnecessary properties might be fetched) or to fetch only the requested properties with
a separate slice query per property (might be slightly slower, but fetches only the requested properties).

See [issue #3816](https://github.com/JanusGraph/janusgraph/issues/3816), which would allow fetching only the requested
properties via a single slice query.

See configuration option `query.fast-property`, which may be used to pre-fetch all properties on the first singular property
access when direct vertex properties are requested (for example `vertex.properties("foo")`).
See configuration option `query.batch.has-step-mode` to control properties pre-fetching behavior for the `has` step.
See configuration option `query.batch.properties-mode` to control properties pre-fetching behavior for the `values`,
`properties`, `valueMap`, `propertyMap`, and `elementMap` steps.
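
The tradeoff described in this section can be illustrated with a toy model in which a vertex row is a sorted map of columns and a slice query is a key range over it. Everything here is an assumption made for illustration: `SliceQueryModel` is a hypothetical class, not JanusGraph's storage API, and it uses inclusive bounds and plain string keys for simplicity.

```java
import java.util.List;
import java.util.NavigableMap;
import java.util.SortedMap;
import java.util.TreeMap;

/** Toy model of one vertex row; hypothetical, not JanusGraph's storage API. */
final class SliceQueryModel {

    /** Property columns of the row, kept sorted by column key. */
    private final NavigableMap<String, String> row = new TreeMap<>();
    private int sliceQueries = 0;

    void put(String column, String value) {
        row.put(column, value);
    }

    /** One slice query: returns a copy of the columns with start <= key <= end. */
    SortedMap<String, String> slice(String start, String end) {
        sliceQueries++;
        return new TreeMap<>(row.subMap(start, true, end, true));
    }

    /** all_properties style: a single slice spanning the whole row. */
    SortedMap<String, String> fetchAll() {
        if (row.isEmpty()) {
            sliceQueries++;
            return new TreeMap<>();
        }
        return slice(row.firstKey(), row.lastKey());
    }

    /** required_properties_only style: one slice query per requested key. */
    SortedMap<String, String> fetchOnly(List<String> keys) {
        SortedMap<String, String> result = new TreeMap<>();
        for (String key : keys) {
            result.putAll(slice(key, key));
        }
        return result;
    }

    int sliceQueryCount() {
        return sliceQueries;
    }

    public static void main(String[] args) {
        SliceQueryModel vertex = new SliceQueryModel();
        vertex.put("age", "30");
        vertex.put("bio", "long text");
        vertex.put("name", "Ann");
        System.out.println(vertex.fetchOnly(List.of("age", "name")) + " using "
                + vertex.sliceQueryCount() + " slice queries");
    }
}
```

In this model `fetchAll` always costs one slice query but returns every column, while `fetchOnly` costs one slice query per requested key and returns nothing extra, which is the choice that `query.batch.properties-mode` exposes.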