Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql/sem/tree: add FmtForFingerprint functions to collapse long lists #120430

Merged
merged 1 commit into from
Mar 20, 2024

Conversation

xinhaoz
Copy link
Member

@xinhaoz xinhaoz commented Mar 13, 2024

Please note only the latest commit should be reviewed here.


Add functions for the FmtForFingerprint flag to shorten long lists
to a representative two element list where the second element is simply
__more__ to represent the rest of the list.

Specifically, where FmtHideConstants would shorten tuples, arrays and
VALUES clauses to its first two elements if it coontains only literals
and placeholders, FmtCollapseLists shortens these lists to its first
element if all items in the list are any combination of placeholders,
literals, or a simple subexpressions containing only literals and/or
placeholders.

Enabling this behaviour for statement fingerprint generation can be done
via setting the cluster setting sql.stats.statement_fingerprint.format_mask
to include FmtCollapseLists.

FmtHideConstants (previous behaviour):
VALUES ((1, 2), (3, 4)), ((5, 6), (7, 8)), ((9, 10), (11, 12)) ->
VALUES ((_, _), (_, _)), ((_, _), (_, _)), ((_, _), (_, _))

ARRAY[1, 2, 3+4] ->
ARRAY[_, _, _+4]

SELECT * FROM foo WHERE a IN (1, 2, 3, 4, 5) ->
SELECT * FROM foo WHERE a IN (1, 2, __more1_10__)

---

FmtHideConstants | FmtCollapseLists:
VALUES ((1, 2), (3, 4)), ((5, 6), (7, 8)), ((9, 10), (11, 12)) ->
VALUES ((_, __more__), __more__), (__more__)

ARRAY[1, 2, 3+4] ->
ARRAY[_, __more__]

SELECT * FROM foo WHERE a IN (1, 2, 3, 4, 5) ->
SELECT * FROM foo WHERE a IN (1, __more__)

Epic: none
Part of: #120409
Release note: none

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@xinhaoz xinhaoz force-pushed the stmt-fingerprint-shorten-consts branch 3 times, most recently from 54210db to f897a4d Compare March 13, 2024 19:05
@xinhaoz xinhaoz changed the title sql: introduce new fmt flag for stmt fingerprinting sql/sem/tree: add FmtForFingerprint functions to collapse long lists Mar 13, 2024
@xinhaoz xinhaoz force-pushed the stmt-fingerprint-shorten-consts branch 2 times, most recently from f05bfe5 to fbc3ffb Compare March 14, 2024 13:52
@xinhaoz xinhaoz marked this pull request as ready for review March 14, 2024 13:53
@xinhaoz xinhaoz requested review from a team as code owners March 14, 2024 13:53
@xinhaoz xinhaoz requested a review from a team March 14, 2024 13:53
@xinhaoz xinhaoz requested a review from a team as a code owner March 14, 2024 13:53
@xinhaoz xinhaoz requested review from abarganier, mgartner and a team and removed request for a team March 14, 2024 13:53
Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems good to me, I only looked at the 2nd commit here.

Reviewed 4 of 4 files at r2, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @abarganier, @mgartner, and @xinhaoz)


-- commits line 38 at r2:
Hm, shouldn't these become something like VALUES ((_, _), (_, _)), ((_, _), (_, _)), (__more1_10__) and ARRAY[_, _, __more1_10__]?


-- commits line 50 at r2:
Based on the description I expected this to be VALUES ((_, __more__)), (__more__).


pkg/sql/sem/tree/format_fingerprint.go line 16 at r2 (raw file):

// values that would be scrubbed for statement fingerprinting such as literals, placeholders
// or datums. This can be used to determine if an expression can be shortened since it
// doe snot contain any interesting information.

nit: s/doe snot/does not/.


pkg/sql/sem/tree/format_fingerprint.go line 30 at r2 (raw file):

		return false, expr
	case *Tuple, *Array, *CastExpr, *BinaryExpr, *UnaryExpr:
		return true, expr

nit: in order to optimize the walk a bit we could do return !v.containsNonConstExpr, expr (meaning that if we already found a non-const expr, there is no point in continuing the recursion).


pkg/sql/sem/tree/format_fingerprint.go line 63 at r2 (raw file):

func (v *exprContainsOnlyConstantsAndPlaceholders) VisitPost(expr Expr) Expr { return expr }

// formatNodeForFingerprint recurses into a node to format it for a normalized // statement fingerprinting.

nit: this should be two line comment.


pkg/sql/sem/tree/format_fingerprint.go line 91 at r2 (raw file):

func (node *ValuesClause) formatForFingerprint(ctx *FmtCtx) {
	ctx.WriteString("VALUES (")
	ok := canCollapseExprs(node.Rows[0])

Are we guaranteed that len(node.Rows) > 0? In other words, what will happen with VALUES ()?

I'm guessing things should be ok given that we do access node.Rows[0] in ValuesClause.formatHideConstants without the length check.


ditto below for node.Rows[0][0] access - what happens with VALUES (())?


pkg/sql/sem/tree/format_fingerprint.go line 117 at r2 (raw file):

}

// formatForFingerprint formats tuples > 1 element containing only

nit: "tuples > 1 element" seems off - perhaps use words to describe what you mean.


pkg/sql/sem/tree/format_fingerprint.go line 122 at r2 (raw file):

// Examples:
//
//	 (1)               -> (_)

nit: indentation is a bit off.


pkg/sql/sem/tree/format_fingerprint.go line 138 at r2 (raw file):

	v2 := *node
	exprs := Exprs{node.Exprs[0], arityIndicatorNoBounds()}

Similar question - any problems with len(node.Exprs) == 0? We don't have a similar access in formatHideConstants.

ditto for arrays with 0 elements below.


pkg/sql/sem/tree/format_fingerprint.go line 153 at r2 (raw file):

// literals or placeholders that are longer than 1 element as an array
// expression of its first element, scrubbed.
// e.g. array[1]             -> array[_]

nit: identation is a bit off.


pkg/sql/sqlstats/sslocal/sql_stats_test.go line 1138 at r2 (raw file):

}

func TestEnhancedFingerprintCreation(t *testing.T) {

nit: having this end-to-end test is nice, but consider also adding a unit test for FmtForFingerprint in format_test.go. In particular, it'd be good to also test arrays.


pkg/sql/sqlstats/sslocal/sql_stats_test.go line 1166 at r2 (raw file):

				{stmt: "SELECT * FROM t WHERE a IN (1,2,3,4,5,6,7,8,9,10,11,12,13,14)"},
				{stmt: "SELECT * FROM t WHERE a IN (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23)"},
				{stmt: `SELECT * FROM t WHERE a IN (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,

nit: consider adding a helper method that would generate the IN clause with the specified number of constants and placeholders.

Copy link
Contributor

@abarganier abarganier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @yuzefovich that it'd be nice to have some unit tests specific to format_fingerprint.go in here. Otherwise, I don't have much to add that Yahor hasn't already addressed (for AST stuff, I think he's a much more capable reviewer than me anyway 😅 )

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @mgartner and @xinhaoz)

@xinhaoz xinhaoz force-pushed the stmt-fingerprint-shorten-consts branch 3 times, most recently from f199b32 to 722e9af Compare March 19, 2024 18:41
Copy link
Member Author

@xinhaoz xinhaoz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some more testing to cover the cases commented on. I adjusted the examples given since scrubbinGGg constants/placeholders as the same char has now been split into its own flag for more granular control. For fingerprint generation the minimum mask will be FmtHideConstants so I'll make sure to document that in the PR that enables these features by default.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @mgartner and @yuzefovich)


-- commits line 38 at r2:

Previously, yuzefovich (Yahor Yuzefovich) wrote…

Hm, shouldn't these become something like VALUES ((_, _), (_, _)), ((_, _), (_, _)), (__more1_10__) and ARRAY[_, _, __more1_10__]?

Fixed values clause. For the array, accidentally left an unscrubbed value in there but otherwise FmtHideConstants doesn't currently collapse the list due to the 3+4 expr.


-- commits line 50 at r2:

Previously, yuzefovich (Yahor Yuzefovich) wrote…

Based on the description I expected this to be VALUES ((_, __more__)), (__more__).

We apply formatting to the first element individually from the rest of the values list and the first element is a tuple of pairs. In retrospect demoing this case isn't super helpful with VALUES clause since it'll rarely appear so I included a separate case with IN to demonstrate.


pkg/sql/sem/tree/format_fingerprint.go line 16 at r2 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: s/doe snot/does not/.

Done.


pkg/sql/sem/tree/format_fingerprint.go line 30 at r2 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: in order to optimize the walk a bit we could do return !v.containsNonConstExpr, expr (meaning that if we already found a non-const expr, there is no point in continuing the recursion).

Done.


pkg/sql/sem/tree/format_fingerprint.go line 91 at r2 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

Are we guaranteed that len(node.Rows) > 0? In other words, what will happen with VALUES ()?

I'm guessing things should be ok given that we do access node.Rows[0] in ValuesClause.formatHideConstants without the length check.


ditto below for node.Rows[0][0] access - what happens with VALUES (())?

Seems like we're guaranteed to have 1 row with 1 item. VALUES (()) is handled accordingly - added test coverage for that.


pkg/sql/sem/tree/format_fingerprint.go line 117 at r2 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: "tuples > 1 element" seems off - perhaps use words to describe what you mean.

Done.


pkg/sql/sem/tree/format_fingerprint.go line 122 at r2 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: indentation is a bit off.

Done.


pkg/sql/sem/tree/format_fingerprint.go line 138 at r2 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

Similar question - any problems with len(node.Exprs) == 0? We don't have a similar access in formatHideConstants.

ditto for arrays with 0 elements below.

The canCollapse check above will return false if the expr is fewer than 2 elements, so we're guaranteed to have at leats 2 expressions here. Added tests cases for coverage.


pkg/sql/sem/tree/format_fingerprint.go line 153 at r2 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: identation is a bit off.

Done.


pkg/sql/sqlstats/sslocal/sql_stats_test.go line 1138 at r2 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: having this end-to-end test is nice, but consider also adding a unit test for FmtForFingerprint in format_test.go. In particular, it'd be good to also test arrays.

Done.


pkg/sql/sqlstats/sslocal/sql_stats_test.go line 1166 at r2 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: consider adding a helper method that would generate the IN clause with the specified number of constants and placeholders.

I'll defer this to a later PR.

Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2nd commit :LGTM:

Reviewed 5 of 9 files at r4, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @mgartner and @xinhaoz)


pkg/sql/sem/tree/format_fingerprint.go line 138 at r2 (raw file):

Previously, xinhaoz (Xin Hao Zhang) wrote…

The canCollapse check above will return false if the expr is fewer than 2 elements, so we're guaranteed to have at leats 2 expressions here. Added tests cases for coverage.

Oh, right, nvm.


pkg/sql/sem/tree/format_fingerprint.go line 105 at r4 (raw file):

// Examples:
//
//	  ()               -> ()

nit: the indentation is still a bit off here and in the comment for arrays below. I recently was struggling with the indentation too in a very similar setting - I think GoLand applies some formatter, so IIRC I ended up using another editor just to fix the comments.

@xinhaoz xinhaoz force-pushed the stmt-fingerprint-shorten-consts branch from 722e9af to 8640a3c Compare March 20, 2024 14:08
Add `FmtCollapseLists` flag to shorten long lists to a representative
two element list where the second element is simply `__more__` to
represent the rest of the list.

Specifically, where `FmtHideConstants` would shorten tuples, arrays and
VALUES clauses to its first two elements if it coontains only literals
and placeholders, `FmtCollapseLists` shortens these lists to its first
element if all items in the list are any combination of placeholders,
literals, or a simple subexpressions containing only literals and/or
placeholders.

Enabling this behaviour for statement fingerprint generation can be done
via setting the cluster setting `sql.stats.statement_fingerprint.format_mask`
to include `FmtCollapseLists`.

```
FmtHideConstants (previous behaviour):
VALUES ((1, 2), (3, 4)), ((5, 6), (7, 8)), ((9, 10), (11, 12)) ->
VALUES ((_, _), (_, _)), (__more1_10__)

SELECT * FROM foo WHERE f IN ((1, 2), (3, 4), (5, 6)) ->
SELECT * FROM foo WHERE f IN ((_, _), (_, _), (_, _))

ARRAY[1, 2, 3+4] ->
ARRAY[_, _, _+_]

SELECT * FROM foo WHERE a IN (1, 2, 3, 4, 5) ->
SELECT * FROM foo WHERE a IN (1, 2, __more1_10__)

---

FmtHideConstants | FmtCollapseLists:
VALUES ((1, 2), (3, 4)), ((5, 6), (7, 8)), ((9, 10), (11, 12)) ->
VALUES ((_, __more__), __more__), (__more__)

SELECT * FROM foo WHERE f IN ((1, 2), (3, 4), (5, 6)) ->
SELECT * FROM foo WHERE f IN ((_, __more__), __more__)

ARRAY[1, 2, 3+4] ->
ARRAY[_, __more__]

SELECT * FROM foo WHERE a IN (1, 2, 3, 4, 5) ->
SELECT * FROM foo WHERE a IN (1, __more__)
```

Epic: none
Part of: cockroachdb#120409

Release note: none
@xinhaoz xinhaoz force-pushed the stmt-fingerprint-shorten-consts branch from 8640a3c to f26b816 Compare March 20, 2024 14:56
@xinhaoz
Copy link
Member Author

xinhaoz commented Mar 20, 2024

tftr!
bors r+

@craig
Copy link
Contributor

craig bot commented Mar 20, 2024

@craig craig bot merged commit aa2e1fb into cockroachdb:master Mar 20, 2024
34 of 36 checks passed
@xinhaoz xinhaoz deleted the stmt-fingerprint-shorten-consts branch April 1, 2024 16:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants