-
Notifications
You must be signed in to change notification settings - Fork 287
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Modify toQueryString to prevent SQLite expression tree from exceeding depth of 1000 #2565
base: master
Are you sure you want to change the base?
Conversation
ae8187f
to
6b3d4c4
Compare
"(${it.condition})" | ||
} else { | ||
it.condition | ||
this.chunked(50) { conditionParams -> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MJ1998 do you have thoughts on what the chunk size should be?
* This is to prevent SQLite expression tree exceeding max depth of 1000 See | ||
* https://www.sqlite.org/limits.html for Maximum Depth Of An Expression Tree | ||
*/ | ||
const val CONDITION_PARAMS_CHUNK_SIZE = 50 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
another question on this value, don't we want this to be as high as possible? I'd think fewer chunks are easier to process, and fewer checks definitely means fewer iterations. Why not set this to the max of 1000 as a default?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you're right, the chunk size could be higher. Most of the filters for FhirEngine#search are implemented in subqueries, it seems subqueries add to the depth expression tree generated. Will try to figure out a higher number that could be suitable...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be hard to find a number that might be suitable for most applications because of joins and subqueries that depend on the search. I'm thinking of setting the size to a lower number and maybe allow it to be configurable
80c50de
to
0264816
Compare
d9e6d03
to
d764115
Compare
judging purely by the error message, my hypothesis is that the expression tree depth is O(n) for nested OR operators because the expression tree is constructed naively by parsing the OR operators sequentially. For example, for this expression
if the expression tree is constructed naively you'd get:
where each But what you really want is actually this:
where the tree is more "balanced" and this has depth 4. In other words, this is O(log(n)). If my hypothesis of what causes the problem is correct above, instead of trying to break the OR statements into chunks (and having to come up with a value), all you actually have to do is keep the tree balanced by splitting the top level OR statment at the middle of the list of params. Does this make sense? |
Yeah, it makes sense. |
there's no guarantee what i said is true @LZRS - i've done no testing or verification and depending on sqlite's implementation what i said could be complete garbage... so pls test and see if this is true :) (for example you could try to see if you'll hit the depth limit a bit later to bifurcate the tree instead of chunking the parameters) |
Alright, no problem. I'll test it out and get back |
@jingtang10 I tested this out for 1000, 2000 and upto 5000 parameters, and it worked perfectly. I also went ahead a drafted an implementation in the PR |
af4d912
to
821feab
Compare
} | ||
private fun List<ConditionParam<*>>.toQueryString(operation: Operation): String { | ||
if (this.size <= 1) { | ||
return map { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why use map
if the size is <= 1 anyway? i think you can just do the lamba inside the map
function directly since you know there's only one (or zero) item.
val left = this.subList(0, mid).toQueryString(operation) | ||
val right = this.subList(mid, this.size).toQueryString(operation) | ||
|
||
return listOf(left, right) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you just join left and right with the separator without going through all this? in line 89 you have guarantee that the list has 2 or more items. that means neither left nor right can have size 0.
@@ -2752,6 +2754,43 @@ class SearchTest { | |||
.inOrder() | |||
} | |||
|
|||
@Test | |||
fun `search CarePlan filter with large list of patient reference`() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think you do want to also test base cases... for example an empty list, and a list with 1, 2, (maybe add a 3 just to check when the list is not an even number and power of 2), 4 and 8 items.
by that point we can be confident the algorithm works as intended.
we can have a huge one like this - but it's actually not as useful in my view.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The base case for 1 is majorly covered in other test cases in the class, I guess I could add for the other cases
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well - I know that the base case for 1 is covered via other test cases. BUT it's much better to write unit tests for base cases than complex cases.
I would say that tests cases for 0, 1, 2, 3, 4 together is much better than a single test case for 10.
This is because you really want to isolate any cases and make sure tests are easy to debug, easy to fix. If the test case for 10 starts to fail, you'll spend a long time trying to debug. But if the test case fails for either of 1, 2, 3, or 4, you can almost immediately pin down the cause for failure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a DatabaseImplTest with
@@ -2752,6 +2754,43 @@ class SearchTest { | |||
.inOrder() | |||
} | |||
|
|||
@Test | |||
fun `search CarePlan filter with large list of patient reference`() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should add this test in DatabaseImplTest to make sure that it always compiles and runs on the actual database.
@@ -2752,6 +2754,43 @@ class SearchTest { | |||
.inOrder() | |||
} | |||
|
|||
@Test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would it be possible to test the depth after the tree balancing? for example, before the change, the depth was 8. but now, after the change, the depth is 4.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really sure if it's easily possible, but checking it out...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not really sure that this is a requirement for our API. This is an implementation detail in the db and our APIs don't really promise to the app how many patients the appp can join... i think maybe we should improve our documentation here but i am ok with not having this as a test. just my opinion.
FORK - With unmerged PR #9 - WUP #13 SDK - WUP google#2178 - WUP google#2650 - WUP google#2663 PERF - WUP google#2669 - WUP google#2565 - WUP google#2561 - WUP google#2535
if (it.params.size > 1) { | ||
"(${it.condition})" | ||
} else { | ||
it.condition | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you make this a function in the class ConditionParam
actually? will look this code look better.
it.condition | ||
} | ||
private fun List<ConditionParam<*>>.toQueryString(operation: Operation): String { | ||
if (this.size <= 1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i wonder if you can make this a bit more explicit - i think we should never have the 0 case? and for the 1 case you can make the code look tidier.
@@ -2752,6 +2754,43 @@ class SearchTest { | |||
.inOrder() | |||
} | |||
|
|||
@Test | |||
fun `search CarePlan filter with large list of patient reference`() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well - I know that the base case for 1 is covered via other test cases. BUT it's much better to write unit tests for base cases than complex cases.
I would say that tests cases for 0, 1, 2, 3, 4 together is much better than a single test case for 10.
This is because you really want to isolate any cases and make sure tests are easy to debug, easy to fix. If the test case for 10 starts to fail, you'll spend a long time trying to debug. But if the test case fails for either of 1, 2, 3, or 4, you can almost immediately pin down the cause for failure.
FORK - With unmerged PR #9 - WUP #13 SDK - WUP google#2178 - WUP google#2650 - WUP google#2663 PERF - WUP google#2669 - WUP google#2565 - WUP google#2561 - WUP google#2535
@@ -5169,6 +5170,29 @@ class DatabaseImplTest { | |||
assertThat(localChangeResourceReferences.size).isEqualTo(locallyCreatedPatients.size) | |||
} | |||
|
|||
@Test | |||
fun searchTasksForManyPatientsReturnCorrectly() = runBlocking { | |||
val patients = (0..5001).map { Patient().apply { id = "task-patient-index-$it" } } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are 5002 patients and 5001 tasks. Is this test case deliberately checking this? I feel like if you really want to test the filter clause, it's better to query tasks linked to patient 0-4999 so at least you're filtering out 1 patient to know that the filter clause works.
} | ||
database.insert(*tasks.toTypedArray()) | ||
|
||
val patientsSearchIdList = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can also limit this list to 2000 or 3000.
val right = this.subList(mid, this.size).toQueryString(operation) | ||
when { | ||
this.isEmpty() -> | ||
return "true" // In case empty, return true to match other conditions in the where clause |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
first of all, we should never get to this point i think.. something is wrong if we are here...
but let's say we are here, i think we should return "true" if the oeprator is and but "false" if the operator is or... for this to be a no-op.
this is because if the operator is or, and you return a true here, then the whole thing will evaluate to true... and that doesn't seem right to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually - can you change the code calling this function on line 61 in this file. so that when the list is empty, you don't call this function.
and then you can safely remove this case.
and then, you need to update the test case in searchtest.kt file on line 2807 to remove the AND true
.
println(getQuery().query) | ||
println(getQuery().args) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove
val result1 = | ||
fhirEngine.search<Patient> { | ||
filter(TokenClientParam("_id"), { value = of(patient3.logicalId) }) | ||
} | ||
assertThat(result1).isNotEmpty() | ||
assertThat(result1.size).isEqualTo(1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure this block is useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good test cases in this file that really test what is being done in this pr - thanks lazarus!
IMPORTANT: All PRs must be linked to an issue (except for extremely trivial and straightforward changes).
Fixes #2561
Description
Recursively bifurcates the conditional params expressions to prevent occurences of SQLite expression tree exceeding depth of 1000, as suggested in this comment
Alternative(s) considered
Chunking large expression list to limit 50 within parantheses to avoid crashing with
Expression tree is too large (maximum depth 1000)
, as described hereType
Enhancement Feature
Screenshots (if applicable)
Checklist
./gradlew spotlessApply
and./gradlew spotlessCheck
to check my code follows the style guide of this project../gradlew check
and./gradlew connectedCheck
to test my changes locally.