describe optimizer hints in SPARQL Service Description #181

Open
VladimirAlexiev opened this issue Jan 4, 2023 · 0 comments

VladimirAlexiev commented Jan 4, 2023

Why?

See motivation in #71.
Additionally, it would be useful to have a standardized catalog of hints (if such a thing is possible), and for a SPARQL endpoint to advertise the hints it supports in its Service Description.

Previous work

Stardog advertises hints in the following format (e.g. at https://energy.ld.admin.ch/query; I don't know which Stardog version is installed there):

[ rdf:type                       sd:Service ;
  sd:queryHint                   [ sd:description  "Disables orderby.limit optimizer" ;
                                   sd:name         "optimizer.orderby.limit"
                                 ] ;
  • The properties sd:queryHint, sd:description and sd:name are custom, i.e. they are not defined in the SPARQL Service Description (SD) vocabulary
  • A class sd:QueryHint also needs to be added
  • I like the hierarchical, dot-separated names they use (e.g. the 3-part optimizer.orderby.limit)
  • Maybe a separate property should be added for classifying the hints
  • Negative hints such as join.directhash ("Disables DirectHash joins") need to be marked explicitly (e.g. with a flag sd:isNegatedHint)
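
To make the last two points concrete, here is a minimal Python sketch that renders one hint as Turtle using the terms proposed above. None of sd:QueryHint, sd:name, sd:description or sd:isNegatedHint exist in SD today; they are this issue's suggestions. The rule "a description starting with 'Disables' means a negative hint" is my own assumption for illustration, not something Stardog documents.

```python
# Sketch only: sd:QueryHint, sd:name, sd:description and sd:isNegatedHint are
# proposals from this issue, not terms of the current SD vocabulary.


def is_negated(description: str) -> bool:
    """Heuristic (assumption): a hint is negative if its description starts with 'Disables'."""
    return description.strip().startswith("Disables")


def turtle_literal(text: str) -> str:
    """Quote a single-line string as a Turtle literal, escaping backslashes and double quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def hint_to_turtle(name: str, description: str) -> str:
    """Render one hint as a blank node using the proposed properties."""
    negated = "true" if is_negated(description) else "false"
    return (
        "[ a sd:QueryHint ;\n"
        f"  sd:name {turtle_literal(name)} ;\n"
        f"  sd:description {turtle_literal(description)} ;\n"
        f"  sd:isNegatedHint {negated}\n"
        "]"
    )


print(hint_to_turtle("join.directhash", "Disables DirectHash joins"))
```

With an explicit flag, a client would not need to parse the free-text description to learn that supplying the hint turns a feature off.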

The full list of Stardog hints follows:

| name | description |
|---|---|
| cached.dataset.name | Specifies name of the cached dataset to use |
| cardinality | Suggests the cardinality of the following graph pattern |
| describe.strategy | Specifies the strategy to evaluate DESCRIBE queries (either built-in or custom) |
| edge.properties.lookup | Specifies how edge property patterns like '<<?s :p ?o>> :q ?t' should be evaluated: - direct: first by :p (embedded predicate), then by :q (outside predicate) - inverse: first by :q (outside predicate), then by :p (embedded predicate) |
| equality.identity | Enumerates variables for which equality (i.e. ==) should be treated as identity (i.e. sameTerm in SPARQL) |
| evaluate | Suggests to evaluate the scope at the query optimization time |
| evaluate.limit | Limits the number of results that a pattern evaluated at optimization time is allowed to produce |
| from_named.inline.limit | Specifies the limit of the number of graphs in FROM and FROM NAMED which could be inlined in the query string using UNION |
| group.joins | Enables grouping of graph patterns for join order optimization: all patterns in the group will be joined together before joining with other patterns |
| join.bind | Disables Bind joins |
| join.choice.strategy | Strategy to select join algorithms during optimization - standard: the join order optimizer will decide. - economic: the join order optimizer should try to avoid joins with large memory footprints. - aggressive: the join order optimizer should select the fastest joins pretending there won't be memory issues. - streaming: the join order optimizer should avoid pipeline breaking operations. |
| join.choice.strategy.streaming.limit | Maximum value of query limit for considering streaming join strategy - For queries with higher limit, the optimizer will not try to eliminate pipeline breaker operators (unless explicitly requested with join.choice.strategy hint) |
| join.directhash | Disables DirectHash joins |
| join.gracehash | Disables GraceHash joins |
| join.hash | Disables Hash joins |
| join.merge | Disables Merge joins |
| join.nestedloop | Disables NestedLoop joins |
| join.service | Disables Service joins |
| join.sortmerge | Disables SortMerge joins |
| literal.index | Specifies if the optimizer should try to use the literal index, e.g. for numbers, if available |
| optimizer.binds.flatten | Disables binds.flatten optimizer |
| optimizer.binds.placement | Disables binds.placement optimizer |
| optimizer.constants | Disables constants optimizer |
| optimizer.duplicates.eliminate | Disables duplicates.eliminate optimizer |
| optimizer.empty.propagate | Disables empty.propagate optimizer |
| optimizer.evaluator | Disables evaluator optimizer |
| optimizer.filters.exists | Disables filters.exists optimizer |
| optimizer.filters.in | Disables filters.in optimizer |
| optimizer.filters.notexists | Disables filters.notexists optimizer |
| optimizer.filters.or | Disables filters.or optimizer |
| optimizer.filters.pull | Disables filters.pull optimizer |
| optimizer.filters.push | Disables filters.push optimizer |
| optimizer.hints.cardinality | Disables hints.cardinality optimizer |
| optimizer.inline.bind.values | Disables inline.bind.values optimizer |
| optimizer.inline.equality | Disables inline.equality optimizer |
| optimizer.inline.from | Disables inline.from optimizer |
| optimizer.join.variables | Disables join.variables optimizer |
| optimizer.joins.merge | Disables joins.merge optimizer |
| optimizer.minus.type | Disables minus.type optimizer |
| optimizer.optionals.eliminate | Disables optionals.eliminate optimizer |
| optimizer.orderby.limit | Disables orderby.limit optimizer |
| optimizer.patterns.pull | Disables patterns.pull optimizer |
| optimizer.patterns.push | Disables patterns.push optimizer |
| optimizer.property.paths.reachability | Disables property.paths.reachability optimizer |
| optimizer.property.paths.star | Disables property.paths.star optimizer |
| optimizer.property.paths.start | Disables property.paths.start optimizer |
| optimizer.reorder.solution.modifiers | Disables reorder.solution.modifiers optimizer |
| optimizer.services.push | Disables services.push optimizer |
| optimizer.values.rewrite | Disables values.rewrite optimizer |
| optimizer.virtual.coalesce | Disables virtual.coalesce optimizer |
| optimizer.virtual.prune.joins | Disables virtual.prune.joins optimizer |
| paths.evaluation | Specifies how paths should be traversed: - lazy: search is stopped after each new path is found so it can be returned (not available for PATHS ALL queries) - eager: all reachable nodes are reached first |
| plan.cache | Toggles plan cache for this query - If Off, the query will be optimized without any plan cache lookup - |
| push.filters | Specifies how the optimizer tries to push filters down the query plan: - aggressive: every filter will be pushed as deep down as possible - default: the optimizer will decide how deep to push based on other factors, e.g. estimated cardinality - off: the optimization is off |
| push.reasoning | Specifies how the optimizer should try to patterns under reasoning operators like '?s rdf:type ?type': - aggressive: always push the one which looks the most selective - default: the optimizer will decide based on any available criteria - off: the optimization is off |
| query.decomposition | Specifies query decomposition strategy: - aggressive: always decompose joins on the subject variable - default: let the optimizer decide based on other factors, e.g. available indexes - off: don't decompose group graph patterns |
| query.join.decomposition | Enables the optimizer to decompose query patterns in order to prevent non-selective joins before main optimization phase: - aggressive: always prefer to prevent non-selective joins - default: let the optimizer decide based on join selectivity - off: don't decompose group graph patterns along non-selective joins |
| reasoning | Specifies whether query rewriting is on or off for this group graph pattern |
| reasoning.rewriting | Specifies how query rewriting should be done: - per_scope: all triple patterns as a group (default) - per_pattern: each triple pattern individually (results in smaller rewritings in some cases) |
| search.cardinality.threshold | Threshold of maximum cardinality of patterns that can be selected by the optimizer to evaluate before a full-text search pattern |
| search.max.subset.size | Maximum number of patterns that can be selected by the optimizer to evaluate before a full-text search pattern |
| search.push.threshold | Cardinality estimation threshold to detect a non-selective full-text search pattern so that more selective parts of the query can be evaluated first. |
| service.batch | Specifies how many SPARQL SERVICE results will be parsed from the response before processing |
| values.id.lazy | If On, the query engine will try to postpone dictionary encoding for nodes generated during query evaluation until necessary, for example, till a join condition requires it for evaluation. |
| values.rewrite.limit | Maximum number of elements in VALUES nodes - VALUES with more elements will not be rewritten into UNIONs |
| vg.union.strategy | How to translate to SQL when the fields for the returned solution can come from multiple mappings: - coalesced: returns the fields from each mapping in its own field and null for all other fields. - embedded: creates two fields for each result, one for the field’s type and one for the field’s value. |
| virtual.transparency | Enables or disables the Virtual Transparency feature |
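
As a rough illustration of how the hierarchical names could feed a classification property, the following Python sketch groups a handful of names copied from the list above by their leading segment. Taking the first dot-separated segment as the category is my assumption; a standardized catalog might well classify hints differently.

```python
from collections import defaultdict

# A few names copied from the Stardog list above (not the full list).
SAMPLE_NAMES = [
    "cardinality",
    "join.bind",
    "join.choice.strategy",
    "join.hash",
    "optimizer.filters.push",
    "optimizer.orderby.limit",
    "push.filters",
]


def group_by_category(names):
    """Group hint names by their first dot-separated segment (assumed category)."""
    groups = defaultdict(list)
    for name in names:
        category = name.split(".", 1)[0]
        groups[category].append(name)
    return dict(groups)


# Print each category with the number of sample hints that fall under it.
for category, members in sorted(group_by_category(SAMPLE_NAMES).items()):
    print(category, len(members))
```

Note that the "join" group mixes negative hints (join.bind, join.hash) with a strategy selector (join.choice.strategy), so a classification by prefix would not replace an explicit negation flag.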

cc @HolgerKnublauch
