Make OptimizerConfig a trait (#4631) (#4638) #4645

Merged
tustvold merged 4 commits into apache:master on Dec 16, 2022

@tustvold tustvold commented Dec 15, 2022

Which issue does this PR close?

Closes #4631
Closes #4638

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@tustvold tustvold added the api change Changes the API exposed to users of the crate label Dec 15, 2022
@github-actions github-actions bot added core Core DataFusion crate optimizer Optimizer rules labels Dec 15, 2022
@@ -1557,14 +1557,6 @@ impl SessionState {
.register_catalog(config.default_catalog.clone(), default_catalog);
}

let optimizer_config = OptimizerConfig::new().filter_null_keys(
Contributor Author

Moving this into the OptimizerConfig passed at optimize time is the fix for #4638

Contributor

👍

];
if config.filter_null_keys {
Contributor Author

This is the other half of the fix for #4638 - determine the rules dynamically
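
As a rough illustration of why deciding the rule list at optimize time matters (this is a sketch, not code from this PR): the `Settings` struct, both optimizer types, and the rule name strings below are hypothetical. Only the `filter_null_keys` flag appears in the diff.

```rust
/// Hypothetical session settings; only `filter_null_keys` comes from the PR.
#[derive(Clone, Copy)]
pub struct Settings {
    pub filter_null_keys: bool,
}

/// Problematic shape: the rule list is fixed when the optimizer is built.
pub struct EagerOptimizer {
    rules: Vec<&'static str>,
}

impl EagerOptimizer {
    pub fn new(settings: Settings) -> Self {
        let mut rules = vec!["simplify_expressions"];
        if settings.filter_null_keys {
            rules.push("filter_null_join_keys");
        }
        Self { rules }
    }

    /// Later changes to the settings are never observed here.
    pub fn rules(&self) -> Vec<&'static str> {
        self.rules.clone()
    }
}

/// Alternative shape: the rule list is computed each time a plan is optimized.
pub struct LazyOptimizer;

impl LazyOptimizer {
    pub fn rules(&self, settings: &Settings) -> Vec<&'static str> {
        let mut rules = vec!["simplify_expressions"];
        if settings.filter_null_keys {
            rules.push("filter_null_join_keys");
        }
        rules
    }
}
```

With the eager shape, flipping `filter_null_keys` after construction has no effect. The lazy shape picks the change up on the next call.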

fn query_execution_start_time(&self) -> DateTime<Utc>;

/// Returns false if the given rule should be skipped
fn rule_enabled(&self, name: &str) -> bool;
Contributor Author

I felt this formulation was more future-proof than adding a specific function for enabling/disabling FilterNullJoins
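
A minimal sketch of how a name-based `rule_enabled` hook could be used, assuming a simplified trait with only that one method. The `DenyList` type, the `enabled_rules` helper, and the rule names are made up for illustration and are not part of this PR.

```rust
use std::collections::HashSet;

/// Simplified stand-in for the trait; only this method is taken from the diff.
pub trait OptimizerConfig {
    /// Returns false if the given rule should be skipped
    fn rule_enabled(&self, name: &str) -> bool;
}

/// Hypothetical config that disables any rule listed by name.
pub struct DenyList {
    disabled: HashSet<String>,
}

impl DenyList {
    pub fn new(disabled: &[&str]) -> Self {
        Self {
            disabled: disabled.iter().map(|s| s.to_string()).collect(),
        }
    }
}

impl OptimizerConfig for DenyList {
    fn rule_enabled(&self, name: &str) -> bool {
        !self.disabled.contains(name)
    }
}

/// Hypothetical helper: returns the names of the rules that would run.
pub fn enabled_rules<'a>(config: &dyn OptimizerConfig, rules: &[&'a str]) -> Vec<&'a str> {
    rules
        .iter()
        .copied()
        .filter(|name| config.rule_enabled(name))
        .collect()
}
```

Because the hook takes a rule name rather than a dedicated flag, a new toggle needs no new trait method.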

/// Create optimizer config
pub fn new() -> Self {
    Self {
        query_execution_start_time: Utc::now(),
-       next_id: 0, // useful for generating things like unique subquery aliases
+       next_id: AtomicUsize::new(1),
Contributor Author

We start at 1 because the previous implementation returned the value after incrementing it, which the atomics don't provide (fetch_add returns the previous value)
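
To make the off-by-one reasoning concrete, here is a small sketch. The shape of the old counter is inferred from the comment above rather than copied from the codebase, and `MutCounter`, `AtomicCounter`, and the choice of `Ordering::Relaxed` are assumptions.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Old style (assumed): a plain counter that returns the value after the increment.
pub struct MutCounter {
    next_id: usize,
}

impl MutCounter {
    pub fn new() -> Self {
        Self { next_id: 0 }
    }

    pub fn next_id(&mut self) -> usize {
        self.next_id += 1;
        self.next_id
    }
}

/// New style: `fetch_add` returns the value before the add, so the counter
/// starts at 1 to hand out the same sequence as the old one.
pub struct AtomicCounter {
    next_id: AtomicUsize,
}

impl AtomicCounter {
    pub fn new() -> Self {
        Self {
            next_id: AtomicUsize::new(1),
        }
    }

    pub fn next_id(&self) -> usize {
        self.next_id.fetch_add(1, Ordering::Relaxed)
    }
}
```

Both counters yield 1, 2, 3, and so on. The only visible difference is that the atomic one needs `&self` rather than `&mut self`.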

  // TODO this should not be on the config,
  // it should be its own 'OptimizerState' or something)
- next_id: usize,
+ next_id: AtomicUsize,
Contributor Author

I think this limited interior mutability is better than the potential confusion around rules taking a mutable OptimizerConfig

Contributor

I agree -- cc @waynexia

Contributor

It is also good that OptimizerContext doesn't implement Clone, which will discourage making copies that could get out of date

Member

Looks good 👍
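
A sketch of the interior-mutability point in this thread, under the assumption that the trait exposes `next_id(&self)`. The two rule functions and the alias formats are hypothetical. The point is that rules only ever need a shared reference, and that a non-`Clone` context keeps a single source of ids.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Simplified stand-in for the trait discussed in this PR.
pub trait OptimizerConfig {
    /// Assumed signature: hands out a fresh id through a shared reference.
    fn next_id(&self) -> usize;
}

/// Deliberately does not derive `Clone`, so there is one source of ids.
pub struct OptimizerContext {
    next_id: AtomicUsize,
}

impl OptimizerContext {
    pub fn new() -> Self {
        Self {
            next_id: AtomicUsize::new(1),
        }
    }
}

impl OptimizerConfig for OptimizerContext {
    fn next_id(&self) -> usize {
        self.next_id.fetch_add(1, Ordering::Relaxed)
    }
}

/// Hypothetical rule helper: mints a unique subquery alias.
pub fn alias_for_subquery(config: &dyn OptimizerConfig) -> String {
    format!("__sq_{}", config.next_id())
}

/// Hypothetical second rule helper sharing the same counter.
pub fn alias_for_cte(config: &dyn OptimizerConfig) -> String {
    format!("__cte_{}", config.next_id())
}
```

Neither helper takes `&mut`, so a caller cannot mistake them for functions that rewrite the configuration.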

.with_query_execution_start_time(
self.execution_props.query_execution_start_time,
);
// TODO: Implement OptimizerContext directly on DataFrame (#4631) (#4626)
Contributor Author

This is the ultimate motivation for this change: to allow a single config container that simply implements the traits needed by the various sub-systems

Contributor

I think having the sub crates depend on traits that are implemented in the core crate sounds like a good idea to me
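
One way to picture the "single config container" idea, as a hedged sketch: `SessionSettings`, `ExecutionConfig`, `batch_size`, and the two `*_sees` functions are invented for illustration. Only the `OptimizerConfig` trait name and `rule_enabled` come from this PR.

```rust
/// Trait a hypothetical optimizer crate would define and depend on.
pub trait OptimizerConfig {
    fn rule_enabled(&self, name: &str) -> bool;
}

/// Trait a hypothetical execution crate would define and depend on.
pub trait ExecutionConfig {
    fn batch_size(&self) -> usize;
}

/// Single container owned by the core crate (hypothetical fields).
pub struct SessionSettings {
    pub filter_null_keys: bool,
    pub batch_size: usize,
}

impl OptimizerConfig for SessionSettings {
    fn rule_enabled(&self, name: &str) -> bool {
        // Illustrative rule name; every other rule stays enabled.
        name != "filter_null_join_keys" || self.filter_null_keys
    }
}

impl ExecutionConfig for SessionSettings {
    fn batch_size(&self) -> usize {
        self.batch_size
    }
}

/// Each sub-system sees only the trait it cares about.
pub fn optimizer_sees(config: &dyn OptimizerConfig) -> bool {
    config.rule_enabled("filter_null_join_keys")
}

pub fn executor_sees(config: &dyn ExecutionConfig) -> usize {
    config.batch_size()
}
```

The sub-crates never name `SessionSettings`, so the dependency points from the core crate to the sub-crates' traits and not the other way around.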

@tustvold tustvold requested a review from alamb December 15, 2022 11:48
@jackwener
Member

jackwener commented Dec 15, 2022

I have some PRs touching the optimizer 😂, and it looks like they will conflict with this one.

I will wait for this PR to merge, and then do them.

Contributor

@alamb alamb left a comment

Looks like a (really) nice improvement to me -- thank you @tustvold 🏆

I think we should gather some more input and leave this open for a while before merging, even though I think the API changes required downstream are relatively small

cc @andygrove @jackwener @yahoNanJing @Dandandan @thinkharderdev @xudong963

Member

@xudong963 xudong963 left a comment

The change makes sense to me, thanks @tustvold

Contributor

@thinkharderdev thinkharderdev left a comment

Makes sense to me

@tustvold
Contributor Author

I plan to merge this in the next few hours unless anybody objects / needs more time to review

@tustvold tustvold merged commit ca8985e into apache:master Dec 16, 2022
@ursabot

ursabot commented Dec 16, 2022

Benchmark runs are scheduled for baseline = 8944581 and contender = ca8985e. ca8985e is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

@alamb
Contributor

alamb commented Dec 16, 2022

🎉

really nice work @tustvold

Labels
api change Changes the API exposed to users of the crate core Core DataFusion crate optimizer Optimizer rules
Development

Successfully merging this pull request may close these issues.

Filter Null Keys Update Not Taking Effect
Make OptimizerConfig a Trait
7 participants