-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql/sem/tree: add FmtForFingerprint functions to collapse long lists #120430
sql/sem/tree: add FmtForFingerprint functions to collapse long lists #120430
Conversation
54210db
to
f897a4d
Compare
f05bfe5
to
fbc3ffb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems good to me, I only looked at the 2nd commit here.
Reviewed 4 of 4 files at r2, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @abarganier, @mgartner, and @xinhaoz)
-- commits
line 38 at r2:
Hm, shouldn't these become something like VALUES ((_, _), (_, _)), ((_, _), (_, _)), (__more1_10__)
and ARRAY[_, _, __more1_10__]
?
-- commits
line 50 at r2:
Based on the description I expected this to be VALUES ((_, __more__)), (__more__)
.
pkg/sql/sem/tree/format_fingerprint.go
line 16 at r2 (raw file):
// values that would be scrubbed for statement fingerprinting such as literals, placeholders // or datums. This can be used to determine if an expression can be shortened since it // doe snot contain any interesting information.
nit: s/doe snot/does not/
.
pkg/sql/sem/tree/format_fingerprint.go
line 30 at r2 (raw file):
return false, expr case *Tuple, *Array, *CastExpr, *BinaryExpr, *UnaryExpr: return true, expr
nit: in order to optimize the walk a bit we could do return !v.containsNonConstExpr, expr
(meaning that if we already found a non-const expr, there is no point in continuing the recursion).
pkg/sql/sem/tree/format_fingerprint.go
line 63 at r2 (raw file):
func (v *exprContainsOnlyConstantsAndPlaceholders) VisitPost(expr Expr) Expr { return expr } // formatNodeForFingerprint recurses into a node to format it for a normalized // statement fingerprinting.
nit: this should be two line comment.
pkg/sql/sem/tree/format_fingerprint.go
line 91 at r2 (raw file):
func (node *ValuesClause) formatForFingerprint(ctx *FmtCtx) { ctx.WriteString("VALUES (") ok := canCollapseExprs(node.Rows[0])
Are we guaranteed that len(node.Rows) > 0
? In other words, what will happen with VALUES ()
?
I'm guessing things should be ok given that we do access node.Rows[0]
in ValuesClause.formatHideConstants
without the length check.
ditto below for node.Rows[0][0]
access - what happens with VALUES (())
?
pkg/sql/sem/tree/format_fingerprint.go
line 117 at r2 (raw file):
} // formatForFingerprint formats tuples > 1 element containing only
nit: "tuples > 1 element" seems off - perhaps use words to describe what you mean.
pkg/sql/sem/tree/format_fingerprint.go
line 122 at r2 (raw file):
// Examples: // // (1) -> (_)
nit: indentation is a bit off.
pkg/sql/sem/tree/format_fingerprint.go
line 138 at r2 (raw file):
v2 := *node exprs := Exprs{node.Exprs[0], arityIndicatorNoBounds()}
Similar question - any problems with len(node.Exprs) == 0
? We don't have a similar access in formatHideConstants
.
ditto for arrays with 0 elements below.
pkg/sql/sem/tree/format_fingerprint.go
line 153 at r2 (raw file):
// literals or placeholders that are longer than 1 element as an array // expression of its first element, scrubbed. // e.g. array[1] -> array[_]
nit: identation is a bit off.
pkg/sql/sqlstats/sslocal/sql_stats_test.go
line 1138 at r2 (raw file):
} func TestEnhancedFingerprintCreation(t *testing.T) {
nit: having this end-to-end test is nice, but consider also adding a unit test for FmtForFingerprint
in format_test.go
. In particular, it'd be good to also test arrays.
pkg/sql/sqlstats/sslocal/sql_stats_test.go
line 1166 at r2 (raw file):
{stmt: "SELECT * FROM t WHERE a IN (1,2,3,4,5,6,7,8,9,10,11,12,13,14)"}, {stmt: "SELECT * FROM t WHERE a IN (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23)"}, {stmt: `SELECT * FROM t WHERE a IN (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,
nit: consider adding a helper method that would generate the IN clause with the specified number of constants and placeholders.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @yuzefovich that it'd be nice to have some unit tests specific to format_fingerprint.go
in here. Otherwise, I don't have much to add that Yahor hasn't already addressed (for AST stuff, I think he's a much more capable reviewer than me anyway 😅 )
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @mgartner and @xinhaoz)
f199b32
to
722e9af
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added some more testing to cover the cases commented on. I adjusted the examples given since scrubbinGGg constants/placeholders as the same char has now been split into its own flag for more granular control. For fingerprint generation the minimum mask will be FmtHideConstants so I'll make sure to document that in the PR that enables these features by default.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @mgartner and @yuzefovich)
Previously, yuzefovich (Yahor Yuzefovich) wrote…
Hm, shouldn't these become something like
VALUES ((_, _), (_, _)), ((_, _), (_, _)), (__more1_10__)
andARRAY[_, _, __more1_10__]
?
Fixed values clause. For the array, accidentally left an unscrubbed value in there but otherwise FmtHideConstants
doesn't currently collapse the list due to the 3+4
expr.
Previously, yuzefovich (Yahor Yuzefovich) wrote…
Based on the description I expected this to be
VALUES ((_, __more__)), (__more__)
.
We apply formatting to the first element individually from the rest of the values list and the first element is a tuple of pairs. In retrospect demoing this case isn't super helpful with VALUES clause since it'll rarely appear so I included a separate case with IN
to demonstrate.
pkg/sql/sem/tree/format_fingerprint.go
line 16 at r2 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit:
s/doe snot/does not/
.
Done.
pkg/sql/sem/tree/format_fingerprint.go
line 30 at r2 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: in order to optimize the walk a bit we could do
return !v.containsNonConstExpr, expr
(meaning that if we already found a non-const expr, there is no point in continuing the recursion).
Done.
pkg/sql/sem/tree/format_fingerprint.go
line 91 at r2 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
Are we guaranteed that
len(node.Rows) > 0
? In other words, what will happen withVALUES ()
?I'm guessing things should be ok given that we do access
node.Rows[0]
inValuesClause.formatHideConstants
without the length check.
ditto below for
node.Rows[0][0]
access - what happens withVALUES (())
?
Seems like we're guaranteed to have 1 row with 1 item. VALUES (())
is handled accordingly - added test coverage for that.
pkg/sql/sem/tree/format_fingerprint.go
line 117 at r2 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: "tuples > 1 element" seems off - perhaps use words to describe what you mean.
Done.
pkg/sql/sem/tree/format_fingerprint.go
line 122 at r2 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: indentation is a bit off.
Done.
pkg/sql/sem/tree/format_fingerprint.go
line 138 at r2 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
Similar question - any problems with
len(node.Exprs) == 0
? We don't have a similar access informatHideConstants
.ditto for arrays with 0 elements below.
The canCollapse
check above will return false if the expr is fewer than 2 elements, so we're guaranteed to have at leats 2 expressions here. Added tests cases for coverage.
pkg/sql/sem/tree/format_fingerprint.go
line 153 at r2 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: identation is a bit off.
Done.
pkg/sql/sqlstats/sslocal/sql_stats_test.go
line 1138 at r2 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: having this end-to-end test is nice, but consider also adding a unit test for
FmtForFingerprint
informat_test.go
. In particular, it'd be good to also test arrays.
Done.
pkg/sql/sqlstats/sslocal/sql_stats_test.go
line 1166 at r2 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: consider adding a helper method that would generate the IN clause with the specified number of constants and placeholders.
I'll defer this to a later PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 5 of 9 files at r4, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @mgartner and @xinhaoz)
pkg/sql/sem/tree/format_fingerprint.go
line 138 at r2 (raw file):
Previously, xinhaoz (Xin Hao Zhang) wrote…
The
canCollapse
check above will return false if the expr is fewer than 2 elements, so we're guaranteed to have at leats 2 expressions here. Added tests cases for coverage.
Oh, right, nvm.
pkg/sql/sem/tree/format_fingerprint.go
line 105 at r4 (raw file):
// Examples: // // () -> ()
nit: the indentation is still a bit off here and in the comment for arrays below. I recently was struggling with the indentation too in a very similar setting - I think GoLand applies some formatter, so IIRC I ended up using another editor just to fix the comments.
722e9af
to
8640a3c
Compare
Add `FmtCollapseLists` flag to shorten long lists to a representative two element list where the second element is simply `__more__` to represent the rest of the list. Specifically, where `FmtHideConstants` would shorten tuples, arrays and VALUES clauses to its first two elements if it coontains only literals and placeholders, `FmtCollapseLists` shortens these lists to its first element if all items in the list are any combination of placeholders, literals, or a simple subexpressions containing only literals and/or placeholders. Enabling this behaviour for statement fingerprint generation can be done via setting the cluster setting `sql.stats.statement_fingerprint.format_mask` to include `FmtCollapseLists`. ``` FmtHideConstants (previous behaviour): VALUES ((1, 2), (3, 4)), ((5, 6), (7, 8)), ((9, 10), (11, 12)) -> VALUES ((_, _), (_, _)), (__more1_10__) SELECT * FROM foo WHERE f IN ((1, 2), (3, 4), (5, 6)) -> SELECT * FROM foo WHERE f IN ((_, _), (_, _), (_, _)) ARRAY[1, 2, 3+4] -> ARRAY[_, _, _+_] SELECT * FROM foo WHERE a IN (1, 2, 3, 4, 5) -> SELECT * FROM foo WHERE a IN (1, 2, __more1_10__) --- FmtHideConstants | FmtCollapseLists: VALUES ((1, 2), (3, 4)), ((5, 6), (7, 8)), ((9, 10), (11, 12)) -> VALUES ((_, __more__), __more__), (__more__) SELECT * FROM foo WHERE f IN ((1, 2), (3, 4), (5, 6)) -> SELECT * FROM foo WHERE f IN ((_, __more__), __more__) ARRAY[1, 2, 3+4] -> ARRAY[_, __more__] SELECT * FROM foo WHERE a IN (1, 2, 3, 4, 5) -> SELECT * FROM foo WHERE a IN (1, __more__) ``` Epic: none Part of: cockroachdb#120409 Release note: none
8640a3c
to
f26b816
Compare
tftr! |
Please note only the latest commit should be reviewed here.
Add functions for the
FmtForFingerprint
flag to shorten long liststo a representative two element list where the second element is simply
__more__
to represent the rest of the list.Specifically, where
FmtHideConstants
would shorten tuples, arrays andVALUES clauses to its first two elements if it coontains only literals
and placeholders,
FmtCollapseLists
shortens these lists to its firstelement if all items in the list are any combination of placeholders,
literals, or a simple subexpressions containing only literals and/or
placeholders.
Enabling this behaviour for statement fingerprint generation can be done
via setting the cluster setting
sql.stats.statement_fingerprint.format_mask
to include
FmtCollapseLists
.Epic: none
Part of: #120409
Release note: none