Minor: Improve documentation for registering AnalyzerRule #9520

Merged
merged 4 commits into from
Mar 10, 2024
21 changes: 12 additions & 9 deletions datafusion/core/src/execution/context/mod.rs
@@ -1587,7 +1587,7 @@ impl SessionState {
self
}

- /// Replace the default query planner
+ /// override default query planner with `query_planner`
pub fn with_query_planner(
mut self,
query_planner: Arc<dyn QueryPlanner + Send + Sync>,
@@ -1596,7 +1596,7 @@ impl SessionState {
self
}

- /// Replace the analyzer rules
+ /// Override the [`AnalyzerRule`]s optimizer plan rules.
pub fn with_analyzer_rules(
mut self,
rules: Vec<Arc<dyn AnalyzerRule + Send + Sync>>,
@@ -1605,7 +1605,7 @@ impl SessionState {
self
}

- /// Replace the optimizer rules
+ /// Replace the entire list of [`OptimizerRule`]s used to optimize plans
pub fn with_optimizer_rules(
mut self,
rules: Vec<Arc<dyn OptimizerRule + Send + Sync>>,
@@ -1614,7 +1614,7 @@ impl SessionState {
self
}

- /// Replace the physical optimizer rules
+ /// Replace the entire list of [`PhysicalOptimizerRule`]s used to optimize plans
pub fn with_physical_optimizer_rules(
mut self,
physical_optimizers: Vec<Arc<dyn PhysicalOptimizerRule + Send + Sync>>,
@@ -1623,7 +1623,8 @@ impl SessionState {
self
}

- /// Adds a new [`AnalyzerRule`]
+ /// Add `analyzer_rule` to the end of the list of
+ /// [`AnalyzerRule`]s used to rewrite queries.
pub fn add_analyzer_rule(
mut self,
analyzer_rule: Arc<dyn AnalyzerRule + Send + Sync>,
@@ -1632,7 +1633,8 @@ impl SessionState {
self
}

- /// Adds a new [`OptimizerRule`]
+ /// Add `optimizer_rule` to the end of the list of
+ /// [`OptimizerRule`]s used to rewrite queries.
pub fn add_optimizer_rule(
mut self,
optimizer_rule: Arc<dyn OptimizerRule + Send + Sync>,
@@ -1641,12 +1643,13 @@ impl SessionState {
self
}

- /// Adds a new [`PhysicalOptimizerRule`]
+ /// Add `physical_optimizer_rule` to the end of the list of
+ /// [`PhysicalOptimizerRule`]s used to rewrite queries.
pub fn add_physical_optimizer_rule(
mut self,
- optimizer_rule: Arc<dyn PhysicalOptimizerRule + Send + Sync>,
+ physical_optimizer_rule: Arc<dyn PhysicalOptimizerRule + Send + Sync>,
Review comment (Contributor): 👍

) -> Self {
- self.physical_optimizers.rules.push(optimizer_rule);
+ self.physical_optimizers.rules.push(physical_optimizer_rule);
self
}
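
The doc comments in this file distinguish two kinds of builder method: `with_*` methods replace an entire rule list, while `add_*` methods append one rule to the end of it. A minimal sketch of that distinction follows. It is a toy model, not DataFusion's API: `Rule`, `Named`, `ToyState`, and the default rule names are all hypothetical stand-ins invented for illustration.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a rule trait such as `AnalyzerRule`.
/// Only a name is modelled here; the rewrite method is omitted.
trait Rule {
    fn name(&self) -> &str;
}

/// Hypothetical rule that carries nothing but its name.
struct Named(&'static str);

impl Rule for Named {
    fn name(&self) -> &str {
        self.0
    }
}

/// Toy state holding an ordered rule list (an assumption, not `SessionState`).
struct ToyState {
    rules: Vec<Arc<dyn Rule + Send + Sync>>,
}

impl ToyState {
    /// Start with two made-up default rules.
    fn new() -> Self {
        let rules: Vec<Arc<dyn Rule + Send + Sync>> =
            vec![Arc::new(Named("default_a")), Arc::new(Named("default_b"))];
        Self { rules }
    }

    /// `with_*` style: discard the current list and use `rules` instead.
    fn with_rules(mut self, rules: Vec<Arc<dyn Rule + Send + Sync>>) -> Self {
        self.rules = rules;
        self
    }

    /// `add_*` style: keep the current list and push `rule` at the end.
    fn add_rule(mut self, rule: Arc<dyn Rule + Send + Sync>) -> Self {
        self.rules.push(rule);
        self
    }

    /// Names of the registered rules, in execution order.
    fn rule_names(&self) -> Vec<&str> {
        self.rules.iter().map(|r| r.name()).collect()
    }
}
```

Under these assumptions, `ToyState::new().add_rule(Arc::new(Named("custom")))` keeps both defaults and runs `custom` last, whereas `with_rules(vec![Arc::new(Named("custom"))])` leaves `custom` as the only rule.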

7 changes: 4 additions & 3 deletions datafusion/core/src/lib.rs
@@ -132,9 +132,9 @@
//!
//! ## Customization and Extension
//!
- //! DataFusion is a "disaggregated" query engine. This
+ //! DataFusion is a "disaggregated" query engine. This
//! means developers can start with a working, full featured engine, and then
- //! extend the parts of DataFusion they need to specialize for their usecase. For example,
+ //! extend the areas they need to specialize for their usecase. For example,
//! some projects may add custom [`ExecutionPlan`] operators, or create their own
//! query language that directly creates [`LogicalPlan`] rather than using the
//! built in SQL planner, [`SqlToRel`].
@@ -145,7 +145,7 @@
//! * define your own catalogs, schemas, and table lists ([`CatalogProvider`])
//! * build your own query language or plans ([`LogicalPlanBuilder`])
//! * declare and use user-defined functions ([`ScalarUDF`], and [`AggregateUDF`], [`WindowUDF`])
- //! * add custom optimizer rewrite passes ([`OptimizerRule`] and [`PhysicalOptimizerRule`])
+ //! * add custom plan rewrite passes ([`AnalyzerRule`], [`OptimizerRule`] and [`PhysicalOptimizerRule`])
Review comment (Contributor Author): Contribution 1: Mention AnalyzerRule in this list

//! * extend the planner to use user-defined logical and physical nodes ([`QueryPlanner`])
//!
//! You can find examples of each of them in the [datafusion-examples] directory.
@@ -158,6 +158,7 @@
//! [`WindowUDF`]: crate::logical_expr::WindowUDF
//! [`QueryPlanner`]: execution::context::QueryPlanner
//! [`OptimizerRule`]: datafusion_optimizer::optimizer::OptimizerRule
+ //! [`AnalyzerRule`]: datafusion_optimizer::analyzer::AnalyzerRule
//! [`PhysicalOptimizerRule`]: crate::physical_optimizer::optimizer::PhysicalOptimizerRule
//!
//! # Architecture
8 changes: 6 additions & 2 deletions datafusion/core/src/physical_optimizer/optimizer.rs
@@ -34,8 +34,12 @@ use crate::physical_optimizer::topk_aggregation::TopKAggregation;
use crate::{error::Result, physical_plan::ExecutionPlan};

/// `PhysicalOptimizerRule` transforms one ['ExecutionPlan'] into another which
- /// computes the same results, but in a potentially more efficient
- /// way.
+ /// computes the same results, but in a potentially more efficient way.
+ ///
+ /// Use [`SessionState::add_physical_optimizer_rule`] to register additional
+ /// `PhysicalOptimizerRule`s.
+ ///
+ /// [`SessionState::add_physical_optimizer_rule`]: https://docs.rs/datafusion/latest/datafusion/execution/context/struct.SessionState.html#method.add_physical_optimizer_rule
Review comment (Contributor Author): Contribution 2: Add links to how to register these rules with SessionContext (which was the API I didn't find initially)

pub trait PhysicalOptimizerRule {
/// Rewrite `plan` to an optimized form
fn optimize(
15 changes: 10 additions & 5 deletions datafusion/optimizer/src/analyzer/mod.rs
@@ -46,13 +46,18 @@ use self::rewrite_expr::OperatorToFunction;
/// [`AnalyzerRule`]s transform [`LogicalPlan`]s in some way to make
/// the plan valid prior to the rest of the DataFusion optimization process.
///
- /// For example, it may resolve [`Expr`]s into more specific forms such
- /// as a subquery reference, to do type coercion to ensure the types
+ /// This is different than an [`OptimizerRule`](crate::OptimizerRule)
+ /// which must preserve the semantics of the `LogicalPlan`, while computing
+ /// results in a more optimal way.
+ ///
+ /// For example, an `AnalyzerRule` may resolve [`Expr`]s into more specific
+ /// forms such as a subquery reference, or do type coercion to ensure the types
/// of operands are correct.
///
- /// This is different than an [`OptimizerRule`](crate::OptimizerRule)
- /// which should preserve the semantics of the LogicalPlan but compute
- /// it the same result in some more optimal way.
+ /// Use [`SessionState::add_analyzer_rule`] to register additional
+ /// `AnalyzerRule`s.
+ ///
+ /// [`SessionState::add_analyzer_rule`]: https://docs.rs/datafusion/latest/datafusion/execution/context/struct.SessionState.html#method.add_analyzer_rule
pub trait AnalyzerRule {
/// Rewrite `plan`
fn analyze(&self, plan: LogicalPlan, config: &ConfigOptions) -> Result<LogicalPlan>;
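
The updated doc comment separates analyzer rules, which may rewrite a plan to make it valid (for example by type coercion), from optimizer rules, which must preserve the plan's semantics. A toy illustration of that split follows. It does not use DataFusion's types: the `Expr` enum and the functions `coerce_types`, `fold_constants`, and `plan` are hypothetical, and the coercion only handles literal operands.

```rust
/// Toy expression tree; a hypothetical stand-in, not DataFusion's `Expr`.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Int(i64),
    Float(f64),
    Add(Box<Expr>, Box<Expr>),
}

/// Analyzer-style pass: makes an `Add` of mixed literal types valid by
/// converting the `Int` literal to a `Float` literal.
fn coerce_types(expr: Expr) -> Expr {
    match expr {
        Expr::Add(l, r) => match (coerce_types(*l), coerce_types(*r)) {
            (Expr::Int(i), r @ Expr::Float(_)) => {
                Expr::Add(Box::new(Expr::Float(i as f64)), Box::new(r))
            }
            (l @ Expr::Float(_), Expr::Int(i)) => {
                Expr::Add(Box::new(l), Box::new(Expr::Float(i as f64)))
            }
            (l, r) => Expr::Add(Box::new(l), Box::new(r)),
        },
        other => other,
    }
}

/// Optimizer-style pass: folds an `Add` of two same-typed literals into one
/// literal. The value computed is unchanged; anything else is returned as is.
fn fold_constants(expr: Expr) -> Expr {
    match expr {
        Expr::Add(l, r) => match (fold_constants(*l), fold_constants(*r)) {
            (Expr::Int(a), Expr::Int(b)) => Expr::Int(a + b),
            (Expr::Float(a), Expr::Float(b)) => Expr::Float(a + b),
            (l, r) => Expr::Add(Box::new(l), Box::new(r)),
        },
        other => other,
    }
}

/// Run the analyzer-style pass first, then the optimizer-style pass.
fn plan(expr: Expr) -> Expr {
    fold_constants(coerce_types(expr))
}
```

In this sketch the ordering matters: `fold_constants` alone leaves `Add(Int(1), Float(2.5))` untouched because the operand types differ, while `plan` first coerces the `Int` and can then fold the sum.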
11 changes: 10 additions & 1 deletion datafusion/optimizer/src/optimizer.rs
@@ -57,7 +57,16 @@ use log::{debug, warn};
/// `OptimizerRule` transforms one [`LogicalPlan`] into another which
/// computes the same results, but in a potentially more efficient
/// way. If there are no suitable transformations for the input plan,
- /// the optimizer can simply return it as is.
+ /// the optimizer should simply return it unmodified.
+ ///
+ /// To change the semantics of a `LogicalPlan`, see [`AnalyzerRule`]
+ ///
+ /// Use [`SessionState::add_optimizer_rule`] to register additional
+ /// `OptimizerRule`s.
+ ///
+ /// [`AnalyzerRule`]: crate::analyzer::AnalyzerRule
+ /// [`SessionState::add_optimizer_rule`]: https://docs.rs/datafusion/latest/datafusion/execution/context/struct.SessionState.html#method.add_optimizer_rule
+
pub trait OptimizerRule {
/// Try and rewrite `plan` to an optimized form, returning None if the plan cannot be
/// optimized by this rule.