bug: MemTable error: Inconsistent operation with empty Delete Key #12140
Comments
I'm thinking that we should enable another layer of checks in debug mode: call state_table.get before state_table.insert.
Otherwise we only find the inconsistent op on flush; furthermore, at that point we don't have the actual key and value in their original encoding form.
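To illustrate the "get before insert" idea, here is a minimal sketch. It assumes a toy in-memory table backed by a `BTreeMap`; `ToyStateTable`, `InconsistentOp`, and the `sanity_check` flag are hypothetical names for this example and are not RisingWave's actual API. The point is only that checking at write time reports the offending key and value immediately, instead of at flush.

```rust
use std::collections::BTreeMap;

/// Hypothetical error type for this sketch (not RisingWave's real error).
#[derive(Debug, PartialEq)]
enum InconsistentOp {
    /// An insert hit a key that already holds a value.
    InsertExisting { key: Vec<u8>, old: Vec<u8>, new: Vec<u8> },
    /// A delete hit a key that is not present.
    DeleteMissing { key: Vec<u8> },
}

/// Toy stand-in for a state table: just a sorted map of encoded rows.
struct ToyStateTable {
    rows: BTreeMap<Vec<u8>, Vec<u8>>,
    /// When true, every write first reads the key it is about to touch.
    sanity_check: bool,
}

impl ToyStateTable {
    fn new(sanity_check: bool) -> Self {
        Self { rows: BTreeMap::new(), sanity_check }
    }

    /// Enables the extra read only in debug builds.
    fn new_debug_checked() -> Self {
        Self::new(cfg!(debug_assertions))
    }

    fn insert(&mut self, key: &[u8], value: &[u8]) -> Result<(), InconsistentOp> {
        if self.sanity_check {
            // The "get before insert": fail while key and value are still at hand.
            if let Some(old) = self.rows.get(key) {
                return Err(InconsistentOp::InsertExisting {
                    key: key.to_vec(),
                    old: old.clone(),
                    new: value.to_vec(),
                });
            }
        }
        self.rows.insert(key.to_vec(), value.to_vec());
        Ok(())
    }

    fn delete(&mut self, key: &[u8]) -> Result<(), InconsistentOp> {
        if self.sanity_check && !self.rows.contains_key(key) {
            return Err(InconsistentOp::DeleteMissing { key: key.to_vec() });
        }
        self.rows.remove(key);
        Ok(())
    }
}
```

The extra read on every write is the cost of this approach, which is why the sketch gates it behind a flag that defaults to debug builds only.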
Alternatively, we could add a post-check executor after each executor to check for inconsistent ops. This post-check keeps a buffer of the stream, so if ops within the buffer are inconsistent, it will detect them. Currently it's a little hard to find the source of the inconsistency.
We have hit this kind of error many times in different executors.
That may not necessarily find the issue. For instance, suppose some upstream executor emitted two duplicate rows or some other inconsistent chunk. The error only surfaces when the downstream executor processes it, but the source of the error is actually the upstream executor. Edit: I guess your suggestion can still help with the case where the inconsistency is caused by the executor itself, so it's complementary.
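A rough sketch of the buffered post-check discussed above, assuming each buffered op is just an `(Op, key)` pair identified by its primary key. The `Op` enum and `find_inconsistent_ops` are hypothetical names for this example, not existing RisingWave code. Placed right after an executor, a check like this would flag duplicate rows at the executor that emitted them rather than at whichever downstream executor trips over them.

```rust
use std::collections::HashMap;

/// Simplified stream op for this sketch (updates are modelled as Delete + Insert).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Op {
    Insert,
    Delete,
}

/// Returns the indices of ops that contradict an earlier op on the same key
/// within the buffer: an Insert following an Insert, or a Delete following a Delete.
///
/// A Delete seen first for a key is NOT flagged, since the row may have been
/// inserted before the buffer started; the check only sees what is buffered.
fn find_inconsistent_ops(buffer: &[(Op, &str)]) -> Vec<usize> {
    let mut last_op: HashMap<&str, Op> = HashMap::new();
    let mut bad = Vec::new();
    for (idx, (op, key)) in buffer.iter().enumerate() {
        // Same op twice in a row on one key means the stream is inconsistent.
        if last_op.get(key) == Some(op) {
            bad.push(idx);
        }
        last_op.insert(*key, *op);
    }
    bad
}
```

Because the check is limited to the buffer, it can miss inconsistencies that span a longer window, so it complements rather than replaces the state-table check.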
Oh, I misunderstood this issue... In fact, we have three places to check for inconsistent input
Adding some more findings: the error is emitted here: risingwave/src/storage/src/hummock/utils.rs, lines 417 to 422 in 8bd524b.
The delete key is missing from the local state store. Maybe it got cleaned up somehow? Or the delete key is wrong?
Would you please explain the streaming SQL and find the upstream's plan? The issue must come from the upstream.
Shrunk it further again:
CREATE MATERIALIZED VIEW m8 AS
SELECT
  first_value(DISTINCT t_0.c9 ORDER BY t_0.c9 ASC NULLS LAST)
FROM
  alltypes1 AS t_0
  FULL JOIN hop (alltypes1, alltypes1.c11, INTERVAL '93', INTERVAL '279') AS hop_1 ON t_0.c4 = hop_1.c4
WHERE
  hop_1.c1;
Plan:
After I removed |
#8084 may be related.
After investigation, the root cause is: When there's a distinct agg, e.g., Here's a minimal reproducible example:
In this example, The solution is just remove the |
Describe the bug
MemTable inconsistent operation error. Sometimes this causes the cluster to hang, though it still undergoes recovery. Other times the cluster crashes. Likely caused by the executor itself or by some upstream executor.
Workload is only INSERTs (you can see the sql further down).
Error message/log
To Reproduce
SQL (already provided in the branch above):
You should see the error either in the CN logs or the meta node logs.
Expected behavior
No response
How did you deploy RisingWave?
No response
The version of RisingWave
No response
Additional context
Found in CI: https://buildkite.com/risingwavelabs/sqlsmith-tests/builds/1935#018a52c7-7fc9-41c7-b14a-320987144a67.
Although the error given was "timeout", the actual failure is in the CN logs: https://buildkite.com/organizations/risingwavelabs/pipelines/sqlsmith-tests/builds/1935/jobs/018a52c7-7fc9-41c7-b14a-320987144a67/artifacts/018a52cf-2eac-40fd-820e-6ae08b2cbb3f.