Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

opt, parser: add support for nulls first, nulls last #71083

Merged
merged 1 commit into from
Oct 7, 2021

Conversation

nehageorge
Copy link

@nehageorge nehageorge commented Oct 4, 2021

Previously, we only supported the default ordering for
ORDER BY. That is, NULLS come first for ascending order and NULLS
come last for descending order. Postgres also supports the keywords
"NULLS FIRST" and "NULLS LAST", which explicitly places all the NULLS
first or last, respectively. This PR adds support for this non-default
ordering. Note that our default is the opposite of Postgres.

Informs #6224.

Release note (sql change): NULLS FIRST and NULLS LAST specifiers now
supported for ORDER BY.

@nehageorge nehageorge requested a review from a team as a code owner October 4, 2021 16:59
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@nehageorge nehageorge linked an issue Oct 4, 2021 that may be closed by this pull request
@nehageorge nehageorge force-pushed the nulls-first-last branch 3 times, most recently from a3b3747 to 04bb2ce Compare October 4, 2021 19:23
Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice! Can you add a few logictests to show that we actually produce the correct output when NULLS FIRST and NULLS LAST are used? You'll want to add some tests in logictest/testdata/logic_test/order_by as well as tests to show the plans in execbuilder/testdata/orderby.

You should also add a release note describing the change, and maybe "Fixes #xxx" with the issue number.

Reviewed 2 of 4 files at r1, 3 of 3 files at r2, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nehageorge)


pkg/sql/opt/optbuilder/testdata/orderby, line 1065 at r2 (raw file):

# Should be different from above plan.
build
SELECT a, b FROM abcd ORDER BY b NULLS LAST, c

how about another test where you use the non-default nulls ordering for both columns?


pkg/sql/parser/parse_test.go, line 514 at r2 (raw file):

		{`CREATE INDEX a ON b(a NULLS LAST)`, 6224, ``, ``},
		{`CREATE INDEX a ON b(a ASC NULLS LAST)`, 6224, ``, ``},
		{`CREATE INDEX a ON b(a DESC NULLS FIRST)`, 6224, ``, ``},

is this actually supported now? I think these cases should probably still error


pkg/sql/parser/parse_test.go, line 536 at r2 (raw file):

		{`SELECT a FROM t ORDER BY a NULLS LAST`, 6224, ``, ``},
		{`SELECT a FROM t ORDER BY a ASC NULLS LAST`, 6224, ``, ``},
		{`SELECT a FROM t ORDER BY a DESC NULLS FIRST`, 6224, ``, ``},

looks like you need to remove these

Copy link
Collaborator

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nehageorge)


pkg/sql/opt/optbuilder/orderby.go, line 162 at r2 (raw file):

	// Set NULLS order.
	nullsDefaultOrder := true

nit: explain that "default" means nulls first and ascending or nulls last and descending


pkg/sql/opt/optbuilder/orderby.go, line 197 at r2 (raw file):

// (table.*). The resulting typed expression(s) are added to extraColsScope.
func (b *Builder) analyzeExtraArgument(
	expr tree.Expr, inScope, projectionsScope, extraColsScope *scope, nullsDefaultOrder bool,

nit: explain what this argument does in the function comment.


pkg/sql/opt/optbuilder/orderby.go, line 260 at r2 (raw file):

		ensureColumnOrderable(e)
		if !nullsDefaultOrder {
			extraColsScope.addColumn(scopeColName(""), tree.NewTypedIsNullExpr(e))

nit: you can attach a metadata name to a scope column to give a column a specific name in expression trees rather than columnX with something like scopeColName("").WithMetadataName("null_ordering"). You'll still want to pass an empty string to scopeColName to ensure that no other expressions can reference the column (e.g., imagine that a user had a table with a column with the same name of "null_ordering" - references to "null_ordering" in a query could be ambiguous).

Copy link
Author

@nehageorge nehageorge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @mgartner and @rytaft)


pkg/sql/opt/optbuilder/orderby.go, line 162 at r2 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

nit: explain that "default" means nulls first and ascending or nulls last and descending

Done


pkg/sql/opt/optbuilder/orderby.go, line 197 at r2 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

nit: explain what this argument does in the function comment.

Done


pkg/sql/opt/optbuilder/orderby.go, line 260 at r2 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

nit: you can attach a metadata name to a scope column to give a column a specific name in expression trees rather than columnX with something like scopeColName("").WithMetadataName("null_ordering"). You'll still want to pass an empty string to scopeColName to ensure that no other expressions can reference the column (e.g., imagine that a user had a table with a column with the same name of "null_ordering" - references to "null_ordering" in a query could be ambiguous).

Done


pkg/sql/opt/optbuilder/testdata/orderby, line 1065 at r2 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

how about another test where you use the non-default nulls ordering for both columns?

Done


pkg/sql/parser/parse_test.go, line 514 at r2 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

is this actually supported now? I think these cases should probably still error

You're right, I removed the wrong lines. Thanks.


pkg/sql/parser/parse_test.go, line 536 at r2 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

looks like you need to remove these

Done

Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 5 of 6 files at r3, 1 of 1 files at r4, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @nehageorge)


pkg/sql/logictest/testdata/logic_test/order_by, line 451 at r4 (raw file):


statement ok
INSERT INTO xy VALUES (2, NULL), (NULL, 6), (2, 5), (4, 8)

would be good to also include (NULL, NULL)


pkg/sql/opt/exec/execbuilder/testdata/orderby, line 1329 at r4 (raw file):

# index on column b. We should eliminate this sort in a future PR.
query T
EXPLAIN (VERBOSE) SELECT a, b FROM t ORDER BY b NULLS LAST

nit: add a matching test case in logic_test/order_by


pkg/sql/opt/optbuilder/orderby.go, line 263 at r4 (raw file):

		if !nullsDefaultOrder {
			extraColsScope.addColumn(
				scopeColName("").WithMetadataName("nulls_ordering"),

nit: to distinguish when there is more than one column, you could append the column name, e.g., nulls_ordering_b or nulls_ordering_c

Copy link
Collaborator

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 1 of 4 files at r1, 5 of 6 files at r3, 1 of 1 files at r4, all commit messages.
Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained (waiting on @nehageorge and @rytaft)


-- commits, line 12 at r4:
nit: You can move "Partially fixes ..." above the release note and typically "Informs #XXX" is used to link to an issue without closing it.


pkg/sql/opt/exec/execbuilder/testdata/orderby, line 1327 at r4 (raw file):


# TODO(nehageorge): Right now the optimizer does a sort even though there is an
# index on column b. We should eliminate this sort in a future PR.

Our indexes order NULLS always first so it can't simply be eliminated.

If we had a UNION-like expression that preserved ordering and always ordered the left before the right inputs, we could perform two scans, one for non-null and one for NULL columns, and union the results.

If we implement NULLS FIRST/LAST syntax for indexes, then a scan on INDEX (b ASC NULLS LAST) could be used in a case like this.


pkg/sql/opt/optbuilder/orderby.go, line 263 at r4 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

nit: to distinguish when there is more than one column, you could append the column name, e.g., nulls_ordering_b or nulls_ordering_c

+1

Copy link
Author

@nehageorge nehageorge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained (waiting on @nehageorge and @rytaft)


pkg/sql/opt/exec/execbuilder/testdata/orderby, line 1327 at r4 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Our indexes order NULLS always first so it can't simply be eliminated.

If we had a UNION-like expression that preserved ordering and always ordered the left before the right inputs, we could perform two scans, one for non-null and one for NULL columns, and union the results.

If we implement NULLS FIRST/LAST syntax for indexes, then a scan on INDEX (b ASC NULLS LAST) could be used in a case like this.

I was talking to @rytaft about potentially implementing a transformation rule that might improve performance by using splitScanIntoUnionScans in pkg/sql/opt/xform/general_funcs.go and then treating the null and non-null values separately. This would allow the index to be used if it is present for the non-null values. I don't believe a sort would be required then? Let me know if I'm missing something. I can also make the comment more comprehensive.

Copy link
Author

@nehageorge nehageorge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained (waiting on @nehageorge and @rytaft)


pkg/sql/logictest/testdata/logic_test/order_by, line 451 at r4 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

would be good to also include (NULL, NULL)

Hm... I'm thinking this would lead to non-deterministic test case results. For example, in the first case, (NULL, NULL) could be either the first row or the second in a valid ordering. Is there a way to accept two valid result sets?

Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained (waiting on @mgartner, @nehageorge, and @rytaft)


pkg/sql/logictest/testdata/logic_test/order_by, line 451 at r4 (raw file):

Previously, nehageorge wrote…

Hm... I'm thinking this would lead to non-deterministic test case results. For example, in the first case, (NULL, NULL) could be either the first row or the second in a valid ordering. Is there a way to accept two valid result sets?

you could insert it before the cases below where you order by both.

I think there may also be a rowsort option that only sorts some of the columns, but I don't remember what it is

@mgartner
Copy link
Collaborator

mgartner commented Oct 6, 2021


pkg/sql/opt/exec/execbuilder/testdata/orderby, line 1327 at r4 (raw file):

Previously, nehageorge wrote…

I was talking to @rytaft about potentially implementing a transformation rule that might improve performance by using splitScanIntoUnionScans in pkg/sql/opt/xform/general_funcs.go and then treating the null and non-null values separately. This would allow the index to be used if it is present for the non-null values. I don't believe a sort would be required then? Let me know if I'm missing something. I can also make the comment more comprehensive.

AFAIK unions aren't guaranteed to preserve the ordering of their inputs, and they don't order everything from the left input before (or after) everything from the right.

Copy link
Author

@nehageorge nehageorge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained (waiting on @mgartner, @nehageorge, and @rytaft)


pkg/sql/logictest/testdata/logic_test/order_by, line 451 at r4 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

you could insert it before the cases below where you order by both.

I think there may also be a rowsort option that only sorts some of the columns, but I don't remember what it is

Helpful, thanks!


pkg/sql/opt/exec/execbuilder/testdata/orderby, line 1327 at r4 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

AFAIK unions aren't guaranteed to preserve the ordering of their inputs, and they don't order everything from the left input before (or after) everything from the right.

Hm if that is the case then this wouldn't work. I think what you said about NULLS FIRST/LAST syntax for indexes would be an appropriate solution to the problem if the index was created with nulls in the right place. Then I believe we wouldn't need the sort. I could leave this in the to do with the issue #6224.

@mgartner
Copy link
Collaborator

mgartner commented Oct 6, 2021


pkg/sql/opt/exec/execbuilder/testdata/orderby, line 1327 at r4 (raw file):

Previously, nehageorge wrote…

Hm if that is the case then this wouldn't work. I think what you said about NULLS FIRST/LAST syntax for indexes would be an appropriate solution to the problem if the index was created with nulls in the right place. Then I believe we wouldn't need the sort. I could leave this in the to do with the issue #6224.

SGTM.

Previously, we only supported the default Postgres ordering for
ORDER BY. That is, NULLS come first for ascending order and NULLS
come last for descending order. Postgres also supports the keywords
"NULLS FIRST" and "NULLS LAST", which explicitly places all the NULLS
first or last, respectively. This PR adds support for this non-default
ordering.

Informs cockroachdb#6224.

Release note (sql change): NULLS FIRST and NULLS LAST specifiers now
supported for ORDER BY.
Copy link
Author

@nehageorge nehageorge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @mgartner and @rytaft)


-- commits, line 12 at r4:

Previously, mgartner (Marcus Gartner) wrote…

nit: You can move "Partially fixes ..." above the release note and typically "Informs #XXX" is used to link to an issue without closing it.

Done.


pkg/sql/opt/exec/execbuilder/testdata/orderby, line 1329 at r4 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

nit: add a matching test case in logic_test/order_by

Done


pkg/sql/opt/optbuilder/orderby.go, line 263 at r4 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

+1

Done

@rytaft
Copy link
Collaborator

rytaft commented Oct 6, 2021


pkg/sql/opt/exec/execbuilder/testdata/orderby, line 1327 at r4 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

SGTM.

I think it would work if you created a constant projection column in each child of the UNION ALL. So in one child the constant column would be false, and in the other child the constant column would be true. Then you'd need the UNION ALL operator to maintain the ordering on this first column, followed by the other columns.

@rytaft
Copy link
Collaborator

rytaft commented Oct 6, 2021


pkg/sql/opt/exec/execbuilder/testdata/orderby, line 1327 at r4 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

I think it would work if you created a constant projection column in each child of the UNION ALL. So in one child the constant column would be false, and in the other child the constant column would be true. Then you'd need the UNION ALL operator to maintain the ordering on this first column, followed by the other columns.

This is similar to how SplitGroupByScanIntoUnionScans works.

Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 4 of 4 files at r5, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @mgartner)

@nehageorge
Copy link
Author

bors r+

@craig craig bot merged commit 5b1bd56 into cockroachdb:master Oct 7, 2021
@craig
Copy link
Contributor

craig bot commented Oct 7, 2021

Build succeeded:

@otan
Copy link
Contributor

otan commented Oct 10, 2021

as of this PR, are we still NULLS FIRST by default? can we have a cluster setting that makes this NULLS LAST by default?

@RaduBerinde
Copy link
Member

RaduBerinde commented Oct 10, 2021

Yes. Given that our indexes are always "nulls first", setting NULLS LAST by default has the potential to regress a lot of queries. I don't see a problem with a session setting though.

@otan
Copy link
Contributor

otan commented Oct 10, 2021

filed #71381

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

sql: support NULLS FIRST/LAST modifiers for indexes
6 participants