roachtest: kv/contention/nodes=4 failed #36089
|
SHA: https://github.com/cockroachdb/cockroach/commits/a1e6e9decc9dec15a32bbb6d30efc67ca45a532a Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1204585&tab=buildLog
|
Fixes cockroachdb#36089. We've seen that this test can occasionally spend more than 5% of its time below the minimum QPS. This doesn't seem to be indicative of a full QPS stall, and this pass criterion was somewhat arbitrary, so relax it. Release note: None
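For readers unfamiliar with this kind of pass criterion, here is a minimal sketch of a "fraction of time below minimum QPS" check. It is not the roachtest code: the function names, the per-second sampling, and the threshold values are all assumptions made for illustration, and the commit above does not say what the relaxed threshold is.

```go
package main

import "fmt"

// fractionBelow returns the fraction of QPS samples that fall below minQPS.
// Hypothetical helper; each sample is assumed to cover an equal time slice.
func fractionBelow(samples []float64, minQPS float64) float64 {
	if len(samples) == 0 {
		return 0
	}
	below := 0
	for _, s := range samples {
		if s < minQPS {
			below++
		}
	}
	return float64(below) / float64(len(samples))
}

// passes reports whether the run spent at most maxFraction of its time
// below minQPS. Relaxing the criterion means raising maxFraction.
func passes(samples []float64, minQPS, maxFraction float64) bool {
	return fractionBelow(samples, minQPS) <= maxFraction
}

func main() {
	// Illustrative samples only: one of ten seconds dips below 50 QPS.
	samples := []float64{120, 95, 110, 40, 130, 125, 115, 105, 100, 90}
	fmt.Println("strict (5%): ", passes(samples, 50, 0.05))
	fmt.Println("relaxed (20%):", passes(samples, 50, 0.20))
}
```

A brief dip fails the strict check but passes the relaxed one, while a full stall (most samples near zero) would still fail either way.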
SHA: https://github.com/cockroachdb/cockroach/commits/83de585d331b05a4aa02a65b353bed6bf829b696 Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1247383&tab=buildLog
|
SHA: https://github.com/cockroachdb/cockroach/commits/ec4728ae986b46d4f57009233b86971198b275ed Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1255121&tab=buildLog
|
SHA: https://github.com/cockroachdb/cockroach/commits/99306ec3e9fcbba01c05431cbf496e8b5b8954b4 Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1260033&tab=buildLog
|
SHA: https://github.com/cockroachdb/cockroach/commits/dff4132a80e62c6c5ad603ff6c608b09419d4e3e Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1264632&tab=buildLog
|
The increased flakiness looks somewhat correlated with #36748. I wonder if there are cases where we were relying on liveness to expire for transactions to be unstuck. I'm going to run this with a huge txn liveness expiration and see if things get even worse. |
That did the trick. I can clearly see transactions waiting on other transactions for 30+ seconds after bumping |
I see the transactions that are failing to roll back their transaction records hitting errors like |
So actually that doesn't say very much. |
On the I'm going to make the following changes:
|
Another option is to have a cluster setting/env var that disables transaction expiration entirely, so that a failure to roll a transaction back causes everyone else to get stuck. The benefit of this is that it would be very loud. A compromise would be to have a knob that allows us to set |
Informs cockroachdb#36089. Before this commit, requests could get stuck repeatedly attempting to push a transaction only to repeatedly find that they themselves were already aborted. The error would not propagate up to the transaction coordinator and the request would get stuck. This commit fixes this behavior by correctly propagating errors observed by the contentionQueue. Release note: None
…hold Fixes cockroachdb#36089. This commit bumps the TxnLivenessThreshold for clusters running `kv/contention/nodes=4` to 10 minutes. This is sufficiently large such that if at any point a transaction is abandoned then all other transactions will begin waiting for it and the test will fail to achieve its minimum QPS requirement. Release note: None
SHA: https://github.com/cockroachdb/cockroach/commits/efb45869b242137e5c178b10c646c3ed025fff36 Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1266041&tab=buildLog
|
I like it. |
SHA: https://github.com/cockroachdb/cockroach/commits/24feca7a4106f08c73534e16ebb79d949a479f35 Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1268176&tab=buildLog
|
SHA: https://github.com/cockroachdb/cockroach/commits/ff969dd6cb0e087327f0b210f728f588b2f5aeb0 Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1270738&tab=buildLog
|
SHA: https://github.com/cockroachdb/cockroach/commits/856ba9108f112f85d406bbe88d2208651859336e Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1274175&tab=buildLog
|
Fixes cockroachdb#36089. Informs cockroachdb#37199. This commit addresses the second concern from cockroachdb#37199 (comment) by implementing its suggestion #1. It augments the TxnMeta struct to begin storing the transaction's minimum timestamp. This allows pushers to have perfect accuracy into whether an intent is part of a transaction that can eventually be committed or whether it has been aborted by any other pusher and is uncommittable. This allows us to get rid of the false positive cases where a pusher incorrectly detects that a transaction is still active and begins waiting on it. In the worst case, this could lead to transactions waiting for the entire transaction expiration for a contending txn. Release note: None
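One way to picture the false positive is the toy model below. It is a deliberately simplified sketch, not the CockroachDB code: `txnMeta`, `canCreateRecord`, and the integer "abort marker" are hypothetical, and the model assumes that an abort by a pusher blocks record creation for any transaction whose minimum timestamp is at or below the marker.

```go
package main

import "fmt"

// txnMeta is a toy stand-in for the metadata carried by an intent.
type txnMeta struct {
	// writeTS is the intent's current timestamp, which may have been
	// pushed forward since the transaction began.
	writeTS int
	// minTS is assumed to be the earliest timestamp the transaction
	// could have operated at; it never moves forward.
	minTS int
}

// canCreateRecord reports whether a transaction that began at ts could
// still write its transaction record, given that some pusher left an
// abort marker at abortMarker. Hypothetical rule for this sketch.
func canCreateRecord(ts, abortMarker int) bool {
	return abortMarker < ts
}

// looksActiveWithoutMinTS is what a pusher can conclude when the intent
// only carries its (possibly pushed) write timestamp.
func looksActiveWithoutMinTS(m txnMeta, abortMarker int) bool {
	return canCreateRecord(m.writeTS, abortMarker)
}

// looksActiveWithMinTS uses the minimum timestamp stored in the intent.
func looksActiveWithMinTS(m txnMeta, abortMarker int) bool {
	return canCreateRecord(m.minTS, abortMarker)
}

func main() {
	// The transaction began at 10, its intent was later pushed to 25,
	// and another pusher aborted it with a marker at 15.
	m := txnMeta{writeTS: 25, minTS: 10}
	fmt.Println("without min timestamp, looks active:", looksActiveWithoutMinTS(m, 15))
	fmt.Println("with min timestamp, looks active:   ", looksActiveWithMinTS(m, 15))
}
```

In this model the pusher without the minimum timestamp concludes the transaction might still commit and starts waiting, which is the false positive the commit message describes.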
38782: storage: persist minimum transaction timestamps in intents r=nvanbenschoten a=nvanbenschoten Fixes #36089. Informs #37199. This commit addresses the second concern from #37199 (comment) by implementing its suggestion #1. It augments the TxnMeta struct to begin storing the transaction's minimum timestamp. This allows pushers to have perfect accuracy into whether an intent is part of a transaction that can eventually be committed or whether it has been aborted by any other pusher and is uncommittable. This allows us to get rid of the false positive cases where a pusher incorrectly detects that a transaction is still active and begins waiting on it. In the worst case, this could lead to transactions waiting for the entire transaction expiration for a contending txn. @tbg I'm assigning you because you reviewed most of the lazy transaction record stuff (especially #33523), but let me know if you'd like me to find a different reviewer. Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
release-19.1 is susceptible to the issues described in cockroachdb#36089, so it won't reliably pass this test. Release note: None
39149: roachtest: skip kv/contention/nodes=4 for release-19.1 r=tbg a=nvanbenschoten Fixes #39116. release-19.1 is susceptible to the issues described in #36089, so it won't reliably pass this test. 39160: storage: add DisableRaftLogQueue to StoreTestingKnobs r=tbg a=nvanbenschoten Pulled from #38954, which I want to keep focused, especially with the PR's new secondary focus on refactoring entry application to be easier to mock and test. Release note: None 39161: storage: address TODO in TestPushTxnHeartbeatTimeout r=tbg a=nvanbenschoten Pulled from #38954, which I want to keep focused, especially with the PR's new secondary focus on refactoring entry application to be easier to mock and test. Release note: None Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
…stamp Fixes "Bug 1" from cockroachdb#36089 (comment). This commit fixes the bug described in the referenced issue where MVCCScan can read committed MVCC values at timestamps larger than a scan's read timestamp if it finds an intent for the same transaction but from a previous epoch at a timestamp larger than the scan's read timestamp. Fixing this bug resolves the current issue in cockroachdb#36089. Release note (bug fix): Fix bug where MVCC value at future timestamp is returned after a transaction restart.
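The scenario in this commit message can be sketched with a toy single-key MVCC model. This is not the real MVCCScan code: the `version` type and both read functions are hypothetical, and the "buggy" variant only imitates the described symptom (skipping a stale intent and then returning the next value without re-checking the read timestamp).

```go
package main

import "fmt"

// version is one MVCC entry for a single key. Hypothetical type.
type version struct {
	// ts is the write timestamp of this entry.
	ts int
	// value is the stored value.
	value string
	// intent is true if this entry is an uncommitted intent written by
	// the reading transaction.
	intent bool
	// epoch is the transaction epoch that wrote the intent.
	epoch int
}

// readBuggy scans versions (ordered newest first) at readTS for a
// transaction at the given epoch. On a stale intent from an earlier
// epoch it returns the next version without checking its timestamp.
func readBuggy(versions []version, readTS, epoch int) (string, bool) {
	for i, v := range versions {
		if v.intent && v.epoch < epoch {
			if i+1 < len(versions) {
				return versions[i+1].value, true // bug: ts not checked
			}
			return "", false
		}
		if v.ts <= readTS {
			return v.value, true
		}
	}
	return "", false
}

// readFixed ignores the stale intent and keeps applying the read
// timestamp bound to every remaining version.
func readFixed(versions []version, readTS, epoch int) (string, bool) {
	for _, v := range versions {
		if v.intent && v.epoch < epoch {
			continue
		}
		if v.ts <= readTS {
			return v.value, true
		}
	}
	return "", false
}

func main() {
	versions := []version{
		{ts: 30, value: "c", intent: true, epoch: 0}, // stale intent above readTS
		{ts: 20, value: "b"},                         // committed, above readTS
		{ts: 10, value: "a"},                         // committed, visible at readTS
	}
	buggy, _ := readBuggy(versions, 15, 1)
	fixed, _ := readFixed(versions, 15, 1)
	fmt.Println("buggy read at ts=15:", buggy)
	fmt.Println("fixed read at ts=15:", fixed)
}
```

Here the restarted transaction (epoch 1) reads at timestamp 15. The buggy variant returns the value committed at timestamp 20, which lies in the scan's future; the fixed variant returns the value committed at timestamp 10.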
SHA: https://github.com/cockroachdb/cockroach/commits/c59f5347d5424edb90575fb0fd50bad677953752
Parameters:
To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1195732&tab=buildLog