sql: incorrect results due to sort in-between paired joins #89603
Incorrect results test case:
Session flag. Test case is using internal column.
@mgartner I further reduced this, and bisected it to:
I can't say that I understand why this change should cause the problem, but there is a plan change.
It returns 37 rows at this commit, but should return 36 rows.
New EXPLAIN (sort is no longer the last step):
There are two problems.
I think the plan change is probably due to the change made here, which shouldn't cause any correctness issues; rather, it will just prevent us from redundantly optimizing the expression with no required ordering. The plan change is likely just because the cost perturbation is different after the extra sort gets considered. So, (2) seems like the thing to worry about here.
The incorrect results seem to be because the 2nd join in a paired join expects its input to arrive in the same order that the first join produced it, for proper interpretation of the continuation column.
Fixes cockroachdb#89603

This fixes an issue where an illegal paired join plan is created by the optimizer where the first join in the pair is sorted. This breaks the assumption required in paired joiner logic that the first join in the pair is never sorted or distributed for proper interpretation of the continuation column:

```
// ContinuationCol is the column ID of the continuation column when
// IsFirstJoinInPairedJoiner is true. The continuation column is a
// boolean column that indicates whether an output row is a
// continuation of a group corresponding to a single left input row.
ContinuationCol opt.ColumnID
```

Function `lookupJoinCanProvideOrdering` allows the required ordering to be passed to the child of the first join in a paired join if only the input columns are projected (no lookup columns). This is not sufficient because only when the first join can further pass the required ordering on to its input can a sort be avoided.

The solution is to modify `lookupJoinCanProvideOrdering` to indicate the second join in a paired join can provide an ordering if both the first join and the first join's child can provide the ordering.

Release note (bug fix): This patch fixes possible, but rare, incorrect results from paired lookup joins where the first join in the pair is sorted.
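The difference between the old and the proposed check can be sketched as follows. This is a simplified model and not the actual `lookupJoinCanProvideOrdering` implementation: the `join` type, `canPassToInput`, `oldCheck`, and `newCheck` are hypothetical names, and "can provide the ordering" is reduced to "every ordering column comes from the join's input".

```go
package main

import "fmt"

// join is a hypothetical, simplified lookup join. It preserves the order of
// its input rows, so it can pass a required ordering down to its input only
// when every ordering column is one of its input columns.
type join struct {
	inputCols map[string]bool
}

// canPassToInput reports whether all ordering columns are input columns of j.
func canPassToInput(j join, ordering []string) bool {
	for _, col := range ordering {
		if !j.inputCols[col] {
			return false
		}
	}
	return true
}

// oldCheck models the insufficient rule: only the second join is asked
// whether it can hand the ordering to its input (the first join).
func oldCheck(second join, ordering []string) bool {
	return canPassToInput(second, ordering)
}

// newCheck models the proposed rule: the first join must also be able to hand
// the ordering on to its own input, so no sort lands between the two joins.
func newCheck(first, second join, ordering []string) bool {
	return canPassToInput(second, ordering) && canPassToInput(first, ordering)
}

func main() {
	// Column "a" comes from the left input; column "c" is assumed to be
	// produced by the first join's lookup.
	first := join{inputCols: map[string]bool{"a": true}}
	second := join{inputCols: map[string]bool{"a": true, "c": true}}

	for _, ordering := range [][]string{{"a"}, {"c"}} {
		fmt.Println(ordering, "old:", oldCheck(second, ordering),
			"new:", newCheck(first, second, ordering))
	}
}
```

With an ordering on `c`, the old check accepts the plan even though the first join cannot forward the ordering, which forces a sort on the first join's output. The new check rejects it, so in this model any sort would have to sit above the second join instead.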
This commit adds an assertion that a sort on the first join of a paired join is never enforced.

Fixes cockroachdb#89603

Release note: None
Prior to my change we were optimizing with an empty ordering enforcer, but now we're not because we return

And it sounds like #86443 didn't introduce a bug, but it changed the query plan in a way that reveals an existing bug? Is that correct?
I think your change was correct, it just removed duplicate work.
Yeah, I think all it did was slightly change the ordering/number of expressions in the memo, which would significantly change costing because of the perturbation.
Had a chance to take another look at this with @rytaft - PR coming soon.
92632: opt: fix rare incorrect results due to sort between paired joins r=DrewKimball a=DrewKimball

Previously, it was possible for paired joins to produce incorrect results in the case when an ordering was required of their output, and a sort was added between the paired joins to enforce the ordering. This patch prevents a sort from being added to the output of the first join in a set of paired joins. This is necessary because the continuation column that is used to indicate false positives matched by the first join relies on the ordering being maintained between the joins.

Fixes #89603

Release note: None

92669: roachtest/cdc: export stats for initial scan test to roachperf r=jayshrivastava a=jayshrivastava

This change updates the cdc/initial_scan_only test to produce a `stats.json` artifact to be consumed by roachprod. This file contains stats for p99 foreground latency, changefeed throughput, and CPU usage.

Release note: None
Epic: None

92693: dev: add rewritable paths for ccl execbuilder tests r=rharding6373 a=rharding6373

There are some ccl tests that use test files in `/pkg/sql/opt/exec/execbuilder`. This commit adds this as a rewritable path so that we can use the `--rewrite` flag with `dev`.

Release note: None
Epic: None

92695: sqlstats: record idle latency for transactions r=matthewtodd a=matthewtodd

Part of #86667
Follows #91098

Release note (sql change): A new NumericStat, idleLat, was introduced to the statistics column of crdb_internal.transaction_statistics, reporting the time spent waiting for the client to send statements while holding a transaction open.

92760: streamclient: replace usage of deprecated ioutil.ReadFile function r=stevendanna a=andyyang890

This patch fixes a lint error resulting from a usage of the deprecated ioutil.ReadFile function.

Fixes #92761

Release note: None

92763: jobsprotectedtsccl: unskip TestJobsProtectedTimestamp r=ajwerner a=ajwerner

It was fixed by #92692.

Fixes #91865.

Release note: None

Co-authored-by: Drew Kimball <drewk@cockroachlabs.com>
Co-authored-by: Jayant Shrivastava <jayants@cockroachlabs.com>
Co-authored-by: rharding6373 <rharding6373@users.noreply.github.com>
Co-authored-by: Matthew Todd <todd@cockroachlabs.com>
Co-authored-by: Andy Yang <yang@cockroachlabs.com>
Co-authored-by: Andrew Werner <awerner32@gmail.com>
Previously, it was possible for paired joins to produce incorrect results in the case when an ordering was required of their output, and a sort was added between the paired joins to enforce the ordering. This patch prevents a sort from being added to the output of the first join in a set of paired joins. This is necessary because the continuation column that is used to indicate false positives matched by the first join relies on the ordering being maintained between the joins.

Fixes cockroachdb#89603

Release note: None
roachtest.costfuzz failed with artifacts on release-22.2 @ 64049e4b9210de3af4a1d814a9af0123b59a055f:
Parameters: ROACHTEST_cloud=gce, ROACHTEST_cpu=4, ROACHTEST_encrypted=false, ROACHTEST_ssd=0
Jira issue: CRDB-20329