Normalize SQL IN(?, ?, ...) statements to "in(?)" to reduce cardinality of db.statement attribute #10564
…ity of span metrics using db.statement as an attribute
Can this be done without a regular expression? (My concern is performance.)

It's precompiled, and it doesn't use pathological patterns.

My bad, this actually causes a stack overflow for an IN statement with 2000 values. I'll see if there's a different approach.

I switched to a simpler regular expression that a ReDoS checker reports as linear in complexity. It no longer causes a stack overflow, and I added a test case for that. I also profiled an IN statement with 10k values to compare performance and didn't see a noticeable difference, for what it's worth.
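To make the "it's precompiled" point concrete, here is a minimal sketch. It assumes the pattern from the diff below and the `" in(?)"` replacement that appears in a later revision; `PrecompiledInNormalizer` and its methods are hypothetical names, not part of the sanitizer. Per the `String.replaceAll` Javadoc, `s.replaceAll(regex, repl)` yields the same result as `Pattern.compile(regex).matcher(s).replaceAll(repl)`, so calling it per statement would compile the regex every time, while a `static final Pattern` is compiled once.

```java
import java.util.regex.Pattern;

// Hypothetical helper class illustrating a precompiled pattern.
class PrecompiledInNormalizer {
  // Compiled once, when the class is initialized, and reused by every call.
  private static final Pattern IN_STATEMENT_PATTERN =
      Pattern.compile("\\sin\\s*\\(\\s*\\?[\\s?,]*?\\)", Pattern.CASE_INSENSITIVE);

  static String normalize(String sql) {
    return IN_STATEMENT_PATTERN.matcher(sql).replaceAll(" in(?)");
  }

  // Same result, but String.replaceAll compiles the regex on every invocation.
  // The inline (?i) flag stands in for Pattern.CASE_INSENSITIVE.
  static String normalizeRecompiling(String sql) {
    return sql.replaceAll("(?i)\\sin\\s*\\(\\s*\\?[\\s?,]*?\\)", " in(?)");
  }

  public static void main(String[] args) {
    String sql = "SELECT * FROM users WHERE id IN (?, ?, ?)";
    System.out.println(normalize(sql));            // SELECT * FROM users WHERE id in(?)
    System.out.println(normalizeRecompiling(sql)); // SELECT * FROM users WHERE id in(?)
  }
}
```

With the pattern hoisted into a constant, only the matching work remains per call.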
```diff
@@ -52,6 +54,9 @@ WHITESPACE = [ \t\r\n]+
// max length of the sanitized statement - SQLs longer than this will be trimmed
static final int LIMIT = 32 * 1024;

private static final Pattern IN_STATEMENT_PATTERN = Pattern.compile("\\sin\\s*\\(\\s*\\?[\\s?,]*?\\)", Pattern.CASE_INSENSITIVE);
```
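A rough sketch of the kind of regression check described above. It assumes the pattern in this revision of the diff, a hypothetical `inStatement(n)` helper that generates `n` placeholders, and the `" in(?)"` replacement from a later revision. It simply normalizes IN lists with 2,000 and 10,000 values; it is not the PR's actual test.

```java
import java.util.regex.Pattern;

// Hypothetical check in the spirit of the stack-overflow test case mentioned above.
class LongInListCheck {
  // The pattern from this revision of the diff.
  private static final Pattern IN_STATEMENT_PATTERN =
      Pattern.compile("\\sin\\s*\\(\\s*\\?[\\s?,]*?\\)", Pattern.CASE_INSENSITIVE);

  // Hypothetical helper: "select * from t where id in (?, ?, ..., ?)" with n placeholders.
  static String inStatement(int n) {
    StringBuilder sb = new StringBuilder("select * from t where id in (");
    for (int i = 0; i < n; i++) {
      sb.append(i == 0 ? "?" : ", ?");
    }
    return sb.append(')').toString();
  }

  static String normalize(String sql) {
    return IN_STATEMENT_PATTERN.matcher(sql).replaceAll(" in(?)");
  }

  public static void main(String[] args) {
    for (int n : new int[] {1, 2000, 10_000}) {
      // In the JDK's java.util.regex, a lazily repeated single character class is
      // matched in a loop rather than with one recursive call per repetition
      // (an implementation detail, not a documented guarantee).
      System.out.println(n + " -> " + normalize(inStatement(n)));
    }
  }
}
```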
I think this also matches inputs like `in (?,,,???)`. Perhaps using `"(\\sin\\s*)\\(\\s*\\?\\s*(,\\s*\\?\\s*)*\\)"` and replacing with `"$1(?)"` would be better. These regular expressions are hard to parse; maybe we should document them to make them easier to understand?
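A small sketch of the difference the reviewer describes, using made-up statements against a table `t`: the character class in the first pattern accepts any run of whitespace, `?` and `,`, while the suggested pattern requires `?` items separated by commas. `StrictVsLenient` is a hypothetical name. It only uses short inputs, since the greedy group in the suggestion is the part reported below to overflow the stack.

```java
import java.util.regex.Pattern;

class StrictVsLenient {
  // Pattern from the first revision: the character class [\s?,] accepts
  // any mix of whitespace, '?' and ',' before the closing bracket.
  static final Pattern LENIENT =
      Pattern.compile("\\sin\\s*\\(\\s*\\?[\\s?,]*?\\)", Pattern.CASE_INSENSITIVE);

  // Reviewer's suggestion: one '?', then zero or more ", ?" items.
  // Only used on short inputs here; the thread reports that this greedy
  // group overflows the stack on very long IN lists.
  static final Pattern STRICT =
      Pattern.compile("(\\sin\\s*)\\(\\s*\\?\\s*(,\\s*\\?\\s*)*\\)", Pattern.CASE_INSENSITIVE);

  public static void main(String[] args) {
    String wellFormed = "select * from t where id in (?, ?)";
    String malformed = "select * from t where id in (?,,,???)";
    System.out.println(LENIENT.matcher(wellFormed).find()); // true
    System.out.println(STRICT.matcher(wellFormed).find());  // true
    System.out.println(LENIENT.matcher(malformed).find());  // true
    System.out.println(STRICT.matcher(malformed).find());   // false
    // With "$1(?)", the captured " in " prefix is kept as written.
    System.out.println(STRICT.matcher(wellFormed).replaceAll("$1(?)")); // select * from t where id in (?)
  }
}
```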
Unfortunately, the `(,\\s*\\?\\s*)*` part of that pattern causes a stack overflow for `IN` statements with many values.

Let me know if it's OK to also match invalid syntax like `in (?,,,???)`, given that normalizing it would hide information that helps with debugging bad queries. I'd guess that information is available in most SQL library stack traces, though.

I also added some documentation.
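One way to document these patterns, sketched under the assumption that readability is the goal: `Pattern.COMMENTS` lets the same regex be split over several annotated lines. The pattern mirrors the possessive version suggested later in this review, with a non-capturing inner group; `DocumentedInPattern` is a hypothetical class, and the compact copy exists only to show that the two behave the same.

```java
import java.util.regex.Pattern;

class DocumentedInPattern {
  // COMMENTS mode: unescaped whitespace in the pattern is ignored and '#'
  // starts a comment that runs to the end of the line.
  static final Pattern IN_STATEMENT_PATTERN =
      Pattern.compile(
          String.join(
              "\n",
              "(\\s in \\s*)            # group 1: whitespace, IN keyword, optional whitespace",
              "\\( \\s* \\? \\s*        # opening bracket and the first placeholder",
              "(?:, \\s* \\? \\s*)*+    # further ', ?' items, possessive (never backtracks)",
              "\\)                      # closing bracket"),
          Pattern.CASE_INSENSITIVE | Pattern.COMMENTS);

  // The same pattern on one line, for comparison.
  static final Pattern COMPACT =
      Pattern.compile(
          "(\\sin\\s*)\\(\\s*\\?\\s*(?:,\\s*\\?\\s*)*+\\)", Pattern.CASE_INSENSITIVE);

  static String normalize(Pattern p, String sql) {
    return p.matcher(sql).replaceAll("$1(?)");
  }

  public static void main(String[] args) {
    String[] inputs = {
      "select * from t where id in (?, ?, ?)",
      "SELECT * FROM t WHERE id IN(?,?)",
      "select * from t where id in (?,,,???)", // malformed: left unchanged
    };
    for (String sql : inputs) {
      System.out.println(normalize(IN_STATEMENT_PATTERN, sql));
    }
  }
}
```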
> Unfortunately the `(,\s*\?\s*)*` part of that pattern causes a stack overflow for IN statements with many values

Using a possessive quantifier should fix this. Try `"(\\sin\\s*)\\(\\s*\\?\\s*(,\\s*\\?\\s*)*+\\)"`.
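A minimal sketch of why the possessive form helps, assuming the reviewer's pattern verbatim, the `"$1(?)"` replacement proposed earlier, and a hypothetical `inStatement(n)` generator. `*+` repeats the `, ?` group without keeping backtracking state; in the JDK's `java.util.regex` this repetition is matched in a loop, so a 10,000-value list does not deepen the call stack. That last part is an implementation detail of the JDK, not something the `Pattern` documentation promises.

```java
import java.util.regex.Pattern;

class PossessiveInPattern {
  // The reviewer's suggestion: "*+" makes the repeated ", ?" group possessive.
  private static final Pattern IN_STATEMENT_PATTERN =
      Pattern.compile(
          "(\\sin\\s*)\\(\\s*\\?\\s*(,\\s*\\?\\s*)*+\\)", Pattern.CASE_INSENSITIVE);

  static String normalize(String sql) {
    // "$1" re-inserts group 1 (whitespace, IN, optional whitespace) unchanged.
    return IN_STATEMENT_PATTERN.matcher(sql).replaceAll("$1(?)");
  }

  // Hypothetical helper: "select * from t where id IN (?, ?, ..., ?)" with n placeholders.
  static String inStatement(int n) {
    StringBuilder sb = new StringBuilder("select * from t where id IN (");
    for (int i = 0; i < n; i++) {
      sb.append(i == 0 ? "?" : ", ?");
    }
    return sb.append(')').toString();
  }

  public static void main(String[] args) {
    System.out.println(normalize(inStatement(3)));      // select * from t where id IN (?)
    System.out.println(normalize(inStatement(10_000))); // select * from t where id IN (?)
    // Malformed lists are still left alone.
    System.out.println(normalize("select * from t where id IN (?,,,???)"));
  }
}
```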
Thanks, that solves the problem.

Closing and reopening to trigger checks.
```diff
@@ -52,6 +54,10 @@ WHITESPACE = [ \t\r\n]+
// max length of the sanitized statement - SQLs longer than this will be trimmed
static final int LIMIT = 32 * 1024;

// Match on "IN(?, ?, ...)"
private static final Pattern IN_STATEMENT_PATTERN = Pattern.compile("(\\sin\\s*)\\(\\s*\\?\\s*(,\\s*\\?\\s*)*+\\)", Pattern.CASE_INSENSITIVE);
private static final String IN_STATEMENT_NORMALIZED = " in(?)";
```
```diff
-private static final String IN_STATEMENT_NORMALIZED = " in(?)";
+private static final String IN_STATEMENT_NORMALIZED = "$1(?)";
```

The sanitizer does not change case or remove whitespace from the original query. Let's keep `in` as it was in the original query. `longInStatementDoesntCauseStackOverflow` will break after this change, since there is a space between `in` and `(` that is currently removed.
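The reviewer's point about case and whitespace, sketched with the possessive pattern from the diff above and a made-up statement; `ReplacementComparison` and its methods are hypothetical. The constant replacement rewrites everything that was matched, including `IN` and the space before `(`, while `$1` re-inserts group 1 exactly as it appeared in the query.

```java
import java.util.regex.Pattern;

class ReplacementComparison {
  private static final Pattern IN_STATEMENT_PATTERN =
      Pattern.compile(
          "(\\sin\\s*)\\(\\s*\\?\\s*(,\\s*\\?\\s*)*+\\)", Pattern.CASE_INSENSITIVE);

  // Constant replacement: lower-cases IN and drops the space before "(".
  static String withConstant(String sql) {
    return IN_STATEMENT_PATTERN.matcher(sql).replaceAll(" in(?)");
  }

  // Back-reference: group 1 re-inserts the original " IN " unchanged.
  static String withBackReference(String sql) {
    return IN_STATEMENT_PATTERN.matcher(sql).replaceAll("$1(?)");
  }

  public static void main(String[] args) {
    String sql = "SELECT * FROM t WHERE id IN (?, ?)";
    System.out.println(withConstant(sql));      // SELECT * FROM t WHERE id in(?)
    System.out.println(withBackReference(sql)); // SELECT * FROM t WHERE id IN (?)
  }
}
```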
Got it. I updated it to preserve case and whitespace, and updated the test cases to check for that. I didn't add a test case for more than one space between `IN` and `(`, since a string like `IN   (?)` already gets sanitized to `IN (?)`.

I also switched to a non-capturing group for the part between the brackets, as a small optimization.
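A sketch of the updated pattern, assuming the non-capturing group described above is `(?:,\s*\?\s*)*+`; the exact pattern in the final commit may differ, and `NonCapturingInPattern` is a hypothetical name. Because the inner group no longer captures, the pattern has exactly one group, so `$1` still refers to the whitespace-plus-`IN` prefix and the original case and spacing are preserved.

```java
import java.util.regex.Pattern;

class NonCapturingInPattern {
  // Assumed shape of the updated pattern: the repeated ", ?" group is now
  // non-capturing "(?:...)", so only the IN prefix is captured.
  static final Pattern IN_STATEMENT_PATTERN =
      Pattern.compile(
          "(\\sin\\s*)\\(\\s*\\?\\s*(?:,\\s*\\?\\s*)*+\\)", Pattern.CASE_INSENSITIVE);

  static String normalize(String sql) {
    return IN_STATEMENT_PATTERN.matcher(sql).replaceAll("$1(?)");
  }

  public static void main(String[] args) {
    // Only group 1 exists, so "$1" still refers to the whitespace + IN prefix.
    System.out.println(IN_STATEMENT_PATTERN.matcher("").groupCount()); // 1
    String[] inputs = {
      "select * from t where id in (?, ?, ?)",          // -> ... id in (?)
      "SELECT * FROM t WHERE id IN(?,?)",               // -> ... id IN(?)
      "select * from t where a In (?) and b iN (?, ?)", // -> ... a In (?) and b iN (?)
    };
    for (String sql : inputs) {
      System.out.println(normalize(sql));
    }
  }
}
```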
Hi @laurit, let me know if anything else should be done here. It would be great to have this in 2.2.0.
Went for a simple implementation, but let me know if any of the following would be preferred:
Closes #10442