-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
opt, parser: add support for nulls first, nulls last #71083
Conversation
a3b3747
to
04bb2ce
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice! Can you add a few logictests to show that we actually produce the correct output when NULLS FIRST
and NULLS LAST
are used? You'll want to add some tests in logictest/testdata/logic_test/order_by
as well as tests to show the plans in execbuilder/testdata/orderby
.
You should also add a release note describing the change, and maybe "Fixes #xxx" with the issue number.
Reviewed 2 of 4 files at r1, 3 of 3 files at r2, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nehageorge)
pkg/sql/opt/optbuilder/testdata/orderby, line 1065 at r2 (raw file):
# Should be different from above plan. build SELECT a, b FROM abcd ORDER BY b NULLS LAST, c
how about another test where you use the non-default nulls ordering for both columns?
pkg/sql/parser/parse_test.go, line 514 at r2 (raw file):
{`CREATE INDEX a ON b(a NULLS LAST)`, 6224, ``, ``}, {`CREATE INDEX a ON b(a ASC NULLS LAST)`, 6224, ``, ``}, {`CREATE INDEX a ON b(a DESC NULLS FIRST)`, 6224, ``, ``},
is this actually supported now? I think these cases should probably still error
pkg/sql/parser/parse_test.go, line 536 at r2 (raw file):
{`SELECT a FROM t ORDER BY a NULLS LAST`, 6224, ``, ``}, {`SELECT a FROM t ORDER BY a ASC NULLS LAST`, 6224, ``, ``}, {`SELECT a FROM t ORDER BY a DESC NULLS FIRST`, 6224, ``, ``},
looks like you need to remove these
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nehageorge)
pkg/sql/opt/optbuilder/orderby.go, line 162 at r2 (raw file):
// Set NULLS order. nullsDefaultOrder := true
nit: explain that "default" means nulls first and ascending or nulls last and descending
pkg/sql/opt/optbuilder/orderby.go, line 197 at r2 (raw file):
// (table.*). The resulting typed expression(s) are added to extraColsScope. func (b *Builder) analyzeExtraArgument( expr tree.Expr, inScope, projectionsScope, extraColsScope *scope, nullsDefaultOrder bool,
nit: explain what this argument does in the function comment.
pkg/sql/opt/optbuilder/orderby.go, line 260 at r2 (raw file):
ensureColumnOrderable(e) if !nullsDefaultOrder { extraColsScope.addColumn(scopeColName(""), tree.NewTypedIsNullExpr(e))
nit: you can attach a metadata name to a scope column to give a column a specific name in expression trees rather than columnX
with something like scopeColName("").WithMetadataName("null_ordering")
. You'll still want to pass an empty string to scopeColName
to ensure that no other expressions can reference the column (e.g., imagine that a user had a table with a column with the same name of "null_ordering" - references to "null_ordering" in a query could be ambiguous).
04bb2ce
to
ec3f847
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @mgartner and @rytaft)
pkg/sql/opt/optbuilder/orderby.go, line 162 at r2 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
nit: explain that "default" means nulls first and ascending or nulls last and descending
Done
pkg/sql/opt/optbuilder/orderby.go, line 197 at r2 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
nit: explain what this argument does in the function comment.
Done
pkg/sql/opt/optbuilder/orderby.go, line 260 at r2 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
nit: you can attach a metadata name to a scope column to give a column a specific name in expression trees rather than
columnX
with something likescopeColName("").WithMetadataName("null_ordering")
. You'll still want to pass an empty string toscopeColName
to ensure that no other expressions can reference the column (e.g., imagine that a user had a table with a column with the same name of "null_ordering" - references to "null_ordering" in a query could be ambiguous).
Done
pkg/sql/opt/optbuilder/testdata/orderby, line 1065 at r2 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
how about another test where you use the non-default nulls ordering for both columns?
Done
pkg/sql/parser/parse_test.go, line 514 at r2 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
is this actually supported now? I think these cases should probably still error
You're right, I removed the wrong lines. Thanks.
pkg/sql/parser/parse_test.go, line 536 at r2 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
looks like you need to remove these
Done
ec3f847
to
5d3ca25
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 5 of 6 files at r3, 1 of 1 files at r4, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @nehageorge)
pkg/sql/logictest/testdata/logic_test/order_by, line 451 at r4 (raw file):
statement ok INSERT INTO xy VALUES (2, NULL), (NULL, 6), (2, 5), (4, 8)
would be good to also include (NULL, NULL)
pkg/sql/opt/exec/execbuilder/testdata/orderby, line 1329 at r4 (raw file):
# index on column b. We should eliminate this sort in a future PR. query T EXPLAIN (VERBOSE) SELECT a, b FROM t ORDER BY b NULLS LAST
nit: add a matching test case in logic_test/order_by
pkg/sql/opt/optbuilder/orderby.go, line 263 at r4 (raw file):
if !nullsDefaultOrder { extraColsScope.addColumn( scopeColName("").WithMetadataName("nulls_ordering"),
nit: to distinguish when there is more than one column, you could append the column name, e.g., nulls_ordering_b
or nulls_ordering_c
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 4 files at r1, 5 of 6 files at r3, 1 of 1 files at r4, all commit messages.
Reviewable status: complete! 2 of 0 LGTMs obtained (waiting on @nehageorge and @rytaft)
-- commits, line 12 at r4:
nit: You can move "Partially fixes ..." above the release note and typically "Informs #XXX" is used to link to an issue without closing it.
pkg/sql/opt/exec/execbuilder/testdata/orderby, line 1327 at r4 (raw file):
# TODO(nehageorge): Right now the optimizer does a sort even though there is an # index on column b. We should eliminate this sort in a future PR.
Our indexes order NULLS
always first so it can't simply be eliminated.
If we had a UNION-like expression that preserved ordering and always ordered the left before the right inputs, we could perform two scans, one for non-null and one for NULL columns, and union the results.
If we implement NULLS FIRST/LAST
syntax for indexes, then a scan on INDEX (b ASC NULLS LAST)
could be used in a case like this.
pkg/sql/opt/optbuilder/orderby.go, line 263 at r4 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
nit: to distinguish when there is more than one column, you could append the column name, e.g.,
nulls_ordering_b
ornulls_ordering_c
+1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 2 of 0 LGTMs obtained (waiting on @nehageorge and @rytaft)
pkg/sql/opt/exec/execbuilder/testdata/orderby, line 1327 at r4 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
Our indexes order
NULLS
always first so it can't simply be eliminated.If we had a UNION-like expression that preserved ordering and always ordered the left before the right inputs, we could perform two scans, one for non-null and one for NULL columns, and union the results.
If we implement
NULLS FIRST/LAST
syntax for indexes, then a scan onINDEX (b ASC NULLS LAST)
could be used in a case like this.
I was talking to @rytaft about potentially implementing a transformation rule that might improve performance by using splitScanIntoUnionScans
in pkg/sql/opt/xform/general_funcs.go
and then treating the null and non-null values separately. This would allow the index to be used if it is present for the non-null values. I don't believe a sort would be required then? Let me know if I'm missing something. I can also make the comment more comprehensive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 2 of 0 LGTMs obtained (waiting on @nehageorge and @rytaft)
pkg/sql/logictest/testdata/logic_test/order_by, line 451 at r4 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
would be good to also include (NULL, NULL)
Hm... I'm thinking this would lead to non-deterministic test case results. For example, in the first case, (NULL, NULL) could be either the first row or the second in a valid ordering. Is there a way to accept two valid result sets?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 2 of 0 LGTMs obtained (waiting on @mgartner, @nehageorge, and @rytaft)
pkg/sql/logictest/testdata/logic_test/order_by, line 451 at r4 (raw file):
Previously, nehageorge wrote…
Hm... I'm thinking this would lead to non-deterministic test case results. For example, in the first case, (NULL, NULL) could be either the first row or the second in a valid ordering. Is there a way to accept two valid result sets?
you could insert it before the cases below where you order by both.
I think there may also be a rowsort
option that only sorts some of the columns, but I don't remember what it is
pkg/sql/opt/exec/execbuilder/testdata/orderby, line 1327 at r4 (raw file): Previously, nehageorge wrote…
AFAIK unions aren't guaranteed to preserve the ordering of their inputs, and they don't order everything from the left input before (or after) everything from the right. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 2 of 0 LGTMs obtained (waiting on @mgartner, @nehageorge, and @rytaft)
pkg/sql/logictest/testdata/logic_test/order_by, line 451 at r4 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
you could insert it before the cases below where you order by both.
I think there may also be a
rowsort
option that only sorts some of the columns, but I don't remember what it is
Helpful, thanks!
pkg/sql/opt/exec/execbuilder/testdata/orderby, line 1327 at r4 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
AFAIK unions aren't guaranteed to preserve the ordering of their inputs, and they don't order everything from the left input before (or after) everything from the right.
Hm if that is the case then this wouldn't work. I think what you said about NULLS FIRST/LAST
syntax for indexes would be an appropriate solution to the problem if the index was created with nulls in the right place. Then I believe we wouldn't need the sort. I could leave this in the to do with the issue #6224.
pkg/sql/opt/exec/execbuilder/testdata/orderby, line 1327 at r4 (raw file): Previously, nehageorge wrote…
SGTM. |
Previously, we only supported the default Postgres ordering for ORDER BY. That is, NULLS come first for ascending order and NULLS come last for descending order. Postgres also supports the keywords "NULLS FIRST" and "NULLS LAST", which explicitly places all the NULLS first or last, respectively. This PR adds support for this non-default ordering. Informs cockroachdb#6224. Release note (sql change): NULLS FIRST and NULLS LAST specifiers now supported for ORDER BY.
5d3ca25
to
8565595
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @mgartner and @rytaft)
Previously, mgartner (Marcus Gartner) wrote…
nit: You can move "Partially fixes ..." above the release note and typically "Informs #XXX" is used to link to an issue without closing it.
Done.
pkg/sql/opt/exec/execbuilder/testdata/orderby, line 1329 at r4 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
nit: add a matching test case in
logic_test/order_by
Done
pkg/sql/opt/optbuilder/orderby.go, line 263 at r4 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
+1
Done
pkg/sql/opt/exec/execbuilder/testdata/orderby, line 1327 at r4 (raw file): Previously, mgartner (Marcus Gartner) wrote…
I think it would work if you created a constant projection column in each child of the UNION ALL. So in one child the constant column would be |
pkg/sql/opt/exec/execbuilder/testdata/orderby, line 1327 at r4 (raw file): Previously, rytaft (Rebecca Taft) wrote…
This is similar to how |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 4 of 4 files at r5, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @mgartner)
bors r+ |
Build succeeded: |
as of this PR, are we still NULLS FIRST by default? can we have a cluster setting that makes this NULLS LAST by default? |
Yes. Given that our indexes are always "nulls first", setting |
filed #71381 |
Previously, we only supported the default ordering for
ORDER BY. That is, NULLS come first for ascending order and NULLS
come last for descending order. Postgres also supports the keywords
"NULLS FIRST" and "NULLS LAST", which explicitly places all the NULLS
first or last, respectively. This PR adds support for this non-default
ordering. Note that our default is the opposite of Postgres.
Informs #6224.
Release note (sql change): NULLS FIRST and NULLS LAST specifiers now
supported for ORDER BY.