Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

opt: add exploration rule for constraining scans #24355

Merged
merged 1 commit into from
Mar 31, 2018

Conversation

RaduBerinde
Copy link
Member

New "constrain scan" explore rule tries to push filters into scans,
as constraints. The result is either just the constrained scan (if the
filter was fully converted into a constraint), or a select on top of
the constrained scan.

Along with the rule, we needed some minor changes on the execution
side to support scan constraints.

Other fixes that were necessary:

  • optgen: the generated code for explore rules was adding expressions
    to the wrong group (because of shadowing of the original _eid).

  • idxconstraint support for FiltersOp.

  • move the idxconstraint test to idxconstraint_test package to avoid
    import loop.

  • test catalog fix: the primary index was returning the wrong ordinal
    value.

  • adding some cost to the select operator, to prefer a constrained
    scan with no remaining filter (vs a constrained scan with a
    remaining filter).

Release note: None

@RaduBerinde RaduBerinde requested review from rytaft, andy-kimball and a team March 30, 2018 00:12
@RaduBerinde RaduBerinde requested a review from a team as a code owner March 30, 2018 00:12
@RaduBerinde RaduBerinde requested a review from a team March 30, 2018 00:12
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@andy-kimball
Copy link
Contributor

:lgtm:

Very nice, mostly nits from me. It's exciting to see our 2nd exploration rule come online, especially with as big of an impact as this rule will have.

Review status: 0 of 39 files reviewed at latest revision, all discussions resolved, all commit checks successful.


pkg/sql/opt/exec/execbuilder/testdata/select, line 11 at r1 (raw file):


exec-explain
SELECT * FROM t.a WHERE x > 1

I think it'd be good to have a few exec tests that are testing data types other than int, just to make sure they're working end-to-end.


pkg/sql/opt/idxconstraint/index_constraints.go, line 669 at r1 (raw file):

}

// makeSpansForAndcalculates spans for an AndOp or FiltersOp.

NIT: need space before "calculates"


pkg/sql/opt/norm/testdata/bool, line 201 at r1 (raw file):

SELECT * FROM a WHERE false OR false
----
scan a,constrained

It seems redundant to show "constrained" here when the constraint is also shown below.


pkg/sql/opt/optgen/cmd/optgen/explorer_gen.go, line 88 at r1 (raw file):

// explore rule defined. The code is similar to this:
//
//   func (_e *explorer) exploreScan(_state *exploreState, _root memo.ExprID) (_fullyExplored bool) {

NIT: It'd be more consistent if we also renamed _state to _rootState, and _scanExpr to _rootExpr.


pkg/sql/opt/xform/explorer.go, line 198 at r1 (raw file):

}

//func (e *explorer) exploreSelect(selectState *exploreState, sel memo.ExprID) (fullyExplored bool) {

You can delete this, since you got Optgen generating the code correctly...


pkg/sql/opt/xform/explorer.go, line 288 at r1 (raw file):

// ----------------------------------------------------------------------

func (e *explorer) isUnconstrainedScan(def memo.PrivateID) bool {

NIT: could use a header comment


pkg/sql/opt/xform/explorer.go, line 293 at r1 (raw file):

}

func (e *explorer) constrainScan(filterGroup memo.GroupID, scanDef memo.PrivateID) []memo.Expr {

NIT: could use a header comment that gives overview of what's going on in this logic


pkg/sql/opt/xform/explorer.go, line 298 at r1 (raw file):

	scanOpDef := e.mem.LookupPrivate(scanDef).(*memo.ScanOpDef)

	md := e.mem.Metadata()

NIT: I'd add a comment at the top of this block: // Fill out data structures needed to initialize the idxconstraint library.


pkg/sql/opt/xform/rules/select.opt, line 10 at r1 (raw file):

# constrained Scan.
[ConstrainScan, Explore]
(Select (Scan $def:* & (IsUnconstrainedScan $def)) $filter:*)

NIT: I usually put ops with multiple args on multiple lines, just so they're easier to read:

(Select
  (Scan $def:* & (IsUnconstrainedScan $def))
  $filter:*
)

pkg/sql/opt/xform/testdata/physprops/ordering, line 72 at r1 (raw file):

# --------------------------------------------------

# Pass through ordering to scan operator that can support it.

These tests need to be updated so that the SELECT doesn't get pushed into the scan, since their purpose is to make sure we pass ordering through a SELECT op. Maybe update condition to something like x=y that won't get pushed down.


pkg/sql/opt/xform/testdata/physprops/ordering, line 151 at r1 (raw file):

# --------------------------------------------------

# Pass through ordering to scan operator that can support it.

Same goes with these tests.


pkg/sql/opt/xform/testdata/physprops/presentation, line 31 at r1 (raw file):

SELECT a.y, a.x, a.y y2 FROM a WHERE x=1
----
scan a,constrained

And these tests as well.


pkg/sql/opt/xform/testdata/rules/scan, line 135 at r1 (raw file):

 └── 1: (scan a)

optsteps

NIT: I've generally avoided checking in optsteps output, just because it's going to be very "fragile", meaning that almost any change will cause a diff, which is painful. I've tried to keep a few optsteps tests in "combo" for that reason, where I combine a bunch of operators and rules in just a few larger queries. I'm thinking we can have an interesting query there that pushes down Project, Select, and Limit into scans.


Comments from Reviewable

@andy-kimball
Copy link
Contributor

Review status: 0 of 39 files reviewed at latest revision, 13 unresolved discussions, all commit checks successful.


pkg/sql/opt/memo/private_storage.go, line 179 at r1 (raw file):

	if def.Constraint != nil {
		// Use a delimiter since the col set is variable-length.
		ps.keyBuf.writeUvarint(0)

Why not write this value before writeColSet? If the pointer is nil, the uint64 will be 0, which will take up just 1 byte. We'll also eventually have HardLimit/SoftLimit fields, and I don't think we want to worry about trying to combine multiple optional fields. Let's just write them all to the buffer each time.


Comments from Reviewable

@petermattis
Copy link
Collaborator

Review status: 0 of 39 files reviewed at latest revision, 14 unresolved discussions, all commit checks successful.


pkg/sql/opt/xform/testdata/rules/select, line 38 at r1 (raw file):

SELECT k FROM a WHERE k = 1
----
[9: "p:k:1"]

Drive-by comment: it took me a while to track down what these lines were indicating. Can the root of the memo be indicated inline below? For example, ├── 9*: .... Or perhaps * ├── 9: ..., though you'd have to indent all of the memo lines then. The required props could then be output as required: "...".


Comments from Reviewable

@RaduBerinde
Copy link
Member Author

pkg/sql/opt/xform/testdata/rules/select, line 38 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Drive-by comment: it took me a while to track down what these lines were indicating. Can the root of the memo be indicated inline below? For example, ├── 9*: .... Or perhaps * ├── 9: ..., though you'd have to indent all of the memo lines then. The required props could then be output as required: "...".

I think that for display purposes, we should topologically sort the memo and hide orphan groups (like 10 here). I agree the physical prop strings could be less cryptic too. I will explore these improvements in a separate PR.


Comments from Reviewable

@RaduBerinde
Copy link
Member Author

pkg/sql/opt/norm/testdata/bool, line 201 at r1 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

It seems redundant to show "constrained" here when the constraint is also shown below.

Right, but it's definitely needed in the memo view to distinguish from the unconstrained versions.. Should I add a flag for formatPrivate?


Comments from Reviewable

@petermattis
Copy link
Collaborator

Review status: 0 of 39 files reviewed at latest revision, 15 unresolved discussions, all commit checks successful.


pkg/sql/opt/xform/testdata/rules/select, line 38 at r1 (raw file):

Previously, RaduBerinde wrote…

I think that for display purposes, we should topologically sort the memo and hide orphan groups (like 10 here). I agree the physical prop strings could be less cryptic too. I will explore these improvements in a separate PR.

Ack. Definitely not for this PR.


Comments from Reviewable

@andy-kimball
Copy link
Contributor

Review status: 0 of 39 files reviewed at latest revision, 15 unresolved discussions, all commit checks successful.


pkg/sql/opt/norm/testdata/bool, line 201 at r1 (raw file):

Previously, RaduBerinde wrote…

Right, but it's definitely needed in the memo view to distinguish from the unconstrained versions.. Should I add a flag for formatPrivate?

Hm, I see. Let's leave as-is for now, and see how we like it, or if we have other ideas later.


Comments from Reviewable

@RaduBerinde
Copy link
Member Author

TFTR! Updated.


Review status: 0 of 39 files reviewed at latest revision, 15 unresolved discussions, some commit checks failed.


pkg/sql/opt/exec/execbuilder/testdata/select, line 11 at r1 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

I think it'd be good to have a few exec tests that are testing data types other than int, just to make sure they're working end-to-end.

Done.


pkg/sql/opt/idxconstraint/index_constraints.go, line 669 at r1 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

NIT: need space before "calculates"

Done.


pkg/sql/opt/memo/private_storage.go, line 179 at r1 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

Why not write this value before writeColSet? If the pointer is nil, the uint64 will be 0, which will take up just 1 byte. We'll also eventually have HardLimit/SoftLimit fields, and I don't think we want to worry about trying to combine multiple optional fields. Let's just write them all to the buffer each time.

Done.


pkg/sql/opt/optgen/cmd/optgen/explorer_gen.go, line 88 at r1 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

NIT: It'd be more consistent if we also renamed _state to _rootState, and _scanExpr to _rootExpr.

Done.


pkg/sql/opt/xform/explorer.go, line 198 at r1 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

You can delete this, since you got Optgen generating the code correctly...

Done.


pkg/sql/opt/xform/explorer.go, line 288 at r1 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

NIT: could use a header comment

Done.


pkg/sql/opt/xform/explorer.go, line 293 at r1 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

NIT: could use a header comment that gives overview of what's going on in this logic

Done.


pkg/sql/opt/xform/explorer.go, line 298 at r1 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

NIT: I'd add a comment at the top of this block: // Fill out data structures needed to initialize the idxconstraint library.

Done.


pkg/sql/opt/xform/rules/select.opt, line 10 at r1 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

NIT: I usually put ops with multiple args on multiple lines, just so they're easier to read:

(Select
  (Scan $def:* & (IsUnconstrainedScan $def))
  $filter:*
)

Done.


pkg/sql/opt/xform/testdata/physprops/ordering, line 72 at r1 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

These tests need to be updated so that the SELECT doesn't get pushed into the scan, since their purpose is to make sure we pass ordering through a SELECT op. Maybe update condition to something like x=y that won't get pushed down.

Done.


pkg/sql/opt/xform/testdata/physprops/ordering, line 151 at r1 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

Same goes with these tests.

Done.


pkg/sql/opt/xform/testdata/physprops/presentation, line 31 at r1 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

And these tests as well.

Done.


pkg/sql/opt/xform/testdata/rules/scan, line 135 at r1 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

NIT: I've generally avoided checking in optsteps output, just because it's going to be very "fragile", meaning that almost any change will cause a diff, which is painful. I've tried to keep a few optsteps tests in "combo" for that reason, where I combine a bunch of operators and rules in just a few larger queries. I'm thinking we can have an interesting query there that pushes down Project, Select, and Limit into scans.

Removed.


Comments from Reviewable

@rytaft
Copy link
Collaborator

rytaft commented Mar 30, 2018

:lgtm_strong:


Reviewed 27 of 39 files at r1, 13 of 13 files at r2.
Review status: all files reviewed at latest revision, 15 unresolved discussions, some commit checks failed.


pkg/sql/opt/exec/factory.go, line 45 at r1 (raw file):

	// ConstructScan returns a node that represents a scan of the given table.
	// Only the given set of needed columns are part of the result.
	// Note: indexConstraint can be nil.

[nit] add a comment about the purpose of indexConstraint if it's not nil


pkg/sql/opt/exec/execbuilder/testdata/select, line 90 at r2 (raw file):

scan  0  scan  ·      ·                  (n, str)  ·
·     0  ·     table  c@str              ·         ·
·     0  ·     spans  -/"moo"/PrefixEnd  ·         ·

this span looks like it represents str < 'moo'


pkg/sql/opt/norm/testdata/bool, line 326 at r2 (raw file):

 └── constraint: /1: contradiction

# Don't fold.

update comment


pkg/sql/opt/xform/explorer.go, line 248 at r2 (raw file):

// ----------------------------------------------------------------------

// isUnconstrainedScan returns true if thegiven expression is a Scan

thegiven -> the given


pkg/sql/opt/xform/testdata/rules/select, line 44 at r2 (raw file):

 │    └── "p:k:1" [cost=100.00]
 │         └── best: (scan a,constrained)
 ├── 8: (scan a) (scan a@u) (scan a@v)

Probably a dumb question: Why is a new group created for (scan a) when group 1 already exists?


Comments from Reviewable

@andy-kimball
Copy link
Contributor

Review status: all files reviewed at latest revision, 20 unresolved discussions, some commit checks failed.


pkg/sql/opt/xform/testdata/rules/select, line 44 at r2 (raw file):

Previously, rytaft wrote…

Probably a dumb question: Why is a new group created for (scan a) when group 1 already exists?

We don't show the "needed columns" in the memo, but if we did, you'd see that the needed cols for group #8 are different than group #1. Changing the output columns changes the logical equivalence of the expression, so it has to go into a different group.

We do need work on the memo output format to try and make this and other things more clear. It's challenging to balance completeness with clarity, though.


Comments from Reviewable

@rytaft
Copy link
Collaborator

rytaft commented Mar 30, 2018

pkg/sql/opt/xform/testdata/rules/select, line 44 at r2 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

We don't show the "needed columns" in the memo, but if we did, you'd see that the needed cols for group #8 are different than group #1. Changing the output columns changes the logical equivalence of the expression, so it has to go into a different group.

We do need work on the memo output format to try and make this and other things more clear. It's challenging to balance completeness with clarity, though.

Got it - thanks for the explanation!


Comments from Reviewable

@andy-kimball
Copy link
Contributor

Review status: all files reviewed at latest revision, 9 unresolved discussions, some commit checks failed.


pkg/sql/opt/xform/testdata/rules/scan, line 135 at r1 (raw file):

Previously, RaduBerinde wrote…

Removed.

It's fine to leave this as an opt test, if you were trying to test something here.


Comments from Reviewable

New "constrain scan" explore rule tries to push filters into scans,
as constraints. The result is either just the constrained scan (if the
filter was fully converted into a constraint), or a select on top of
the constrained scan.

Along with the rule, we needed some minor changes on the execution
side to support scan constraints.

Other fixes that were necessary:

 - optgen: the generated code for explore rules was adding expressions
   to the wrong group (because of shadowing of the original `_eid`).

 - idxconstraint support for `FiltersOp`.

 - move the idxconstraint test to `idxconstraint_test` package to avoid
   import loop.

 - test catalog fix: the primary index was returning the wrong ordinal
   value.

 - adding some cost to the select operator, to prefer a constrained
   scan with no remaining filter (vs a constrained scan with a
   remaining filter).

Release note: None
@RaduBerinde
Copy link
Member Author

bors r+

@craig
Copy link
Contributor

craig bot commented Mar 31, 2018

Build failed

@RaduBerinde
Copy link
Member Author

bors r+

craig bot added a commit that referenced this pull request Mar 31, 2018
24355: opt: add exploration rule for constraining scans r=RaduBerinde a=RaduBerinde

New "constrain scan" explore rule tries to push filters into scans,
as constraints. The result is either just the constrained scan (if the
filter was fully converted into a constraint), or a select on top of
the constrained scan.

Along with the rule, we needed some minor changes on the execution
side to support scan constraints.

Other fixes that were necessary:

 - optgen: the generated code for explore rules was adding expressions
   to the wrong group (because of shadowing of the original `_eid`).

 - idxconstraint support for `FiltersOp`.

 - move the idxconstraint test to `idxconstraint_test` package to avoid
   import loop.

 - test catalog fix: the primary index was returning the wrong ordinal
   value.

 - adding some cost to the select operator, to prefer a constrained
   scan with no remaining filter (vs a constrained scan with a
   remaining filter).

Release note: None
@craig
Copy link
Contributor

craig bot commented Mar 31, 2018

Build succeeded

@craig craig bot merged commit a4d44ac into cockroachdb:master Mar 31, 2018
@RaduBerinde RaduBerinde deleted the constrained-scan branch March 31, 2018 11:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants