fix: dataraces #6407

Merged


@PastaPastaPasta (Member) commented Nov 19, 2024

Issue being fixed or feature implemented

See each commit; this PR fixes two bugs, both discovered while running feature_llmq_chainlocks.py with TSan / debug.

One is a data race in net_processing.cpp; the other is a race in feature_llmq_chainlocks.py itself, the test I was using to verify that the first fix was correct.

What was done?

net_processing.cpp

You can see the data race here: https://gist.github.com/PastaPastaPasta/c966a9f805758b34524085e3d52ea7f8

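We guard the racing field, m_can_tx_relay, with an existing mutex, m_tx_relay_mutex, which is always locked in close proximity to its accesses. The field is also made private and gains additional thread-safety annotations.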
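A minimal Python analogue of the locking discipline, for illustration only: a hypothetical `PeerTxRelay` class keeps the flag private and reads or writes it only while holding the same lock that already protects the neighbouring relay state. In the real change the field is `m_can_tx_relay` and the lock is `m_tx_relay_mutex` in net_processing.cpp; the C++ code base expresses "only touch this under that lock" with Clang thread-safety annotations (for example `GUARDED_BY`), which Python has no equivalent for. The `_relay_txs` field and all method names below are assumptions, and Python's GIL hides many races on simple assignments, so this shows the pattern rather than reproducing the TSan report.

```python
import threading


class PeerTxRelay:
    """Hypothetical stand-in for per-peer tx-relay state (not Dash's real class)."""

    def __init__(self) -> None:
        # Existing lock that already protects the other relay fields.
        self._tx_relay_mutex = threading.Lock()
        # Private flag: only read or written while holding _tx_relay_mutex.
        self._can_tx_relay = False
        # Hypothetical neighbouring field protected by the same lock.
        self._relay_txs = False

    def set_can_tx_relay(self, value: bool) -> None:
        with self._tx_relay_mutex:
            self._can_tx_relay = value

    def can_tx_relay(self) -> bool:
        with self._tx_relay_mutex:
            return self._can_tx_relay

    def enable_relay_if_allowed(self) -> bool:
        # Check and update under one lock acquisition, so no other thread
        # can flip the flag between the read and the write.
        with self._tx_relay_mutex:
            if self._can_tx_relay:
                self._relay_txs = True
            return self._relay_txs
```

Reusing the lock that is already held nearby, rather than adding a new one, keeps lock ordering unchanged and avoids introducing new deadlock risk.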

feature_llmq_chainlocks.py

Most of the time, generating the cycle quorum leaves enough time for a chainlock to form; however, this is racy, and I have observed locally that the block sometimes gets generated before the chainlock is present, so test_coinbase_best_cl fails. The test should instead wait for the chainlock first and then mine the block. This way, we can ensure the mined block includes that chainlock.

This failure was observed locally in roughly 1 out of 10 runs.
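A self-contained sketch of the ordering change, assuming a hypothetical `FakeNode` whose chainlock arrives asynchronously and a local `wait_until` polling helper. Both are stand-ins: Dash's functional test framework has its own polling and chainlock-wait helpers, whose exact names and signatures are not reproduced here. The point is that the block is mined only after the node already reports the expected best chainlock, so the block can include it; mining first is the racy order that made `test_coinbase_best_cl` fail intermittently.

```python
import threading
import time


def wait_until(predicate, timeout=5.0, poll=0.01):
    """Poll `predicate` until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return
        time.sleep(poll)
    raise AssertionError("condition not met within timeout")


class FakeNode:
    """Hypothetical node: the chainlock for the tip arrives on another thread."""

    def __init__(self):
        self._lock = threading.Lock()
        self.best_chainlock = None  # hash of the best chainlocked block, if any
        self.blocks = []            # list of (height, included_chainlock)

    def sign_chainlock_later(self, block_hash, delay):
        def worker():
            time.sleep(delay)
            with self._lock:
                self.best_chainlock = block_hash
        threading.Thread(target=worker, daemon=True).start()

    def get_best_chainlock(self):
        with self._lock:
            return self.best_chainlock

    def mine_block(self):
        # A new block can only include a chainlock the node already has.
        with self._lock:
            self.blocks.append((len(self.blocks) + 1, self.best_chainlock))
            return self.blocks[-1]


node = FakeNode()
node.sign_chainlock_later("tip_hash", delay=0.05)
# Racy order: calling node.mine_block() here would likely record None.
wait_until(lambda: node.get_best_chainlock() == "tip_hash")
height, cl = node.mine_block()
assert cl == "tip_hash"
```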

How Has This Been Tested?

Ran feature_llmq_chainlocks.py ~40 times locally with TSan / debug.

Breaking Changes

None

Checklist:

Go over all the following points, and put an x in all the boxes that apply.

  • [x] I have performed a self-review of my own code
  • [x] I have commented my code, particularly in hard-to-understand areas
  • [ ] I have added or updated relevant unit/integration/functional/e2e tests
  • [ ] I have made corresponding changes to the documentation
  • [x] I have assigned this pull request to a milestone (for repository code-owners and collaborators only)

Commit note (net_processing.cpp fix): the data race was originally introduced in PR 6365. It was discovered using TSan locally while running feature_llmq_chainlocks 5 times; 1 of those 5 runs failed.
@kwvg (Collaborator) left a comment

utACK 5078bae

@PastaPastaPasta requested a review from knst on November 20, 2024 17:35

@knst (Collaborator) left a comment

utACK 5078bae

@UdjinM6 left a comment

utACK 5078bae

@PastaPastaPasta merged commit 8a14482 into dashpay:develop on Nov 21, 2024 (33 checks passed)
knst pushed a commit to knst/dash that referenced this pull request Nov 26, 2024
5078bae fix(test): wait for chainlock before mining a block we expect to include said chainlock (pasta)
f39c1e6 fix: guard m_can_tx_relay behind m_tx_relay_mutex; make it private; add additional annotations (pasta)

ACKs for top commit:
  kwvg:
    utACK 5078bae
  knst:
    utACK 5078bae
  UdjinM6:
    utACK 5078bae

Tree-SHA512: b346fc60809df72d0161f625073dce7062bd2641d35e4f80160fac9afeec63707de552e2856940ac2604875908ae3b98a225d352de36bfbfc6ee3fbe1e1538ff
PastaPastaPasta added a commit that referenced this pull request Nov 26, 2024
8b88ff7 Merge #6414: chore: bump seeds for v22 (pasta)
02ad523 Merge #6411: chore: update nMinimumChainWork, defaultAssumeValid, checkpointData, chainTxData for mainnet and testnet (pasta)
3bbcd3d Merge #6393: docs: mention building for some HOSTs only in `release-process.md` (pasta)
18f636f Merge #6426: fix: respect SENDDSQUEUE message, move DSQ relay into net processing / peerman (pasta)
9fed456 Merge #6407: fix: dataraces (pasta)
86105da Merge #6408: refactor: removed pre-MN_RR logic of validation of CL (pasta)
a1f7e96 Merge #6406: ci: use `actions/cache` to manage depends cache (pasta)
90a3807 Merge #6402: ci: cache built (pasta)
66f6787 Merge #6401: ci: deduplicate depends building (pasta)
7ca5663 Merge #6397: ci: add powerpc64 to GH Guix job matrix (pasta)

Pull request description:

  ## What was done?
  See commits for each particular change

  ## How Has This Been Tested?
  To be deployed on testnet

  ## Breaking Changes
  N/A

  ## Checklist:
  - [ ] I have performed a self-review of my own code
  - [ ] I have commented my code, particularly in hard-to-understand areas
  - [ ] I have added or updated relevant unit/integration/functional/e2e tests
  - [ ] I have made corresponding changes to the documentation
  - [x] I have assigned this pull request to a milestone _(for repository code-owners and collaborators only)_

ACKs for top commit:
  UdjinM6:
    utACK 8b88ff7
  PastaPastaPasta:
    utACK 8b88ff7

Tree-SHA512: f7fac62996873503e7de875cc96d9cdf5675674345f1bb1df4a16bf19bddc17bc395a80cc761363a0121022d42c46fb313b0973b9cc71f568ef55c6b3d9e29d8
PastaPastaPasta added a commit to PastaPastaPasta/dash that referenced this pull request Dec 14, 2024
1c7bfcb chore: set release true (pasta)
c90339e Merge dashpay#6459: docs: add release notes for v22.0.0 (pasta)
a6f1fc5 Merge dashpay#6475: chore: bumped chain assumed sizes based on latest usage (pasta)
d7cd9f1 Merge dashpay#6464: chore: update man pages for v22 (pasta)
212f91c Merge dashpay#6461: docs: update supported versions in SECURITY.md (pasta)
9a8b685 Merge dashpay#6460: chore: Translations 2024-12 (pasta)
2f71f4d Merge dashpay#6458: chore: bump MIN_MASTERNODE_PROTO_VERSION to latest proto (pasta)
fa29ed5 Merge dashpay#6456: fix(qt): allow refreshing wallet data without crashing (pasta)
758cd64 Merge dashpay#6452: fix: store ready queues on the mixing masternode (pasta)
395447b Merge dashpay#6451: depends: update 'src/dashbls' to dashpay/bls-signatures@7e747e8a as 62fa665 (pasta)
c7b0d80 Merge dashpay#6441: fix: hold wallet shared pointer in CJ Manager/Sessions to prevent concurrent unload (pasta)
c074e09 Merge dashpay#6444: fix: add platform transfer to "most common" filter (pasta)
cb04114 Merge dashpay#6442: fix: coin selection with `include_unsafe` option should respect `nCoinType` (pasta)
db5b53a Merge dashpay#6434: fix: early EHF and buried EHF are indistinguish (pasta)
8b88ff7 Merge dashpay#6414: chore: bump seeds for v22 (pasta)
02ad523 Merge dashpay#6411: chore: update nMinimumChainWork, defaultAssumeValid, checkpointData, chainTxData for mainnet and testnet (pasta)
3bbcd3d Merge dashpay#6393: docs: mention building for some HOSTs only in `release-process.md` (pasta)
18f636f Merge dashpay#6426: fix: respect SENDDSQUEUE message, move DSQ relay into net processing / peerman (pasta)
9fed456 Merge dashpay#6407: fix: dataraces (pasta)
86105da Merge dashpay#6408: refactor: removed pre-MN_RR logic of validation of CL (pasta)
a1f7e96 Merge dashpay#6406: ci: use `actions/cache` to manage depends cache (pasta)
90a3807 Merge dashpay#6402: ci: cache built (pasta)
66f6787 Merge dashpay#6401: ci: deduplicate depends building (pasta)
7ca5663 Merge dashpay#6397: ci: add powerpc64 to GH Guix job matrix (pasta)

Pull request description:

  ## Issue being fixed or feature implemented

  ## What was done?
  Suppressed changes from 1c7bfcb and resolved merge conflicts.

  ```
  Auto-merging .github/workflows/build.yml
  Auto-merging configure.ac
  Auto-merging src/chainparams.cpp
  Auto-merging src/coinjoin/client.cpp
  CONFLICT (content): Merge conflict in src/coinjoin/client.cpp
  Auto-merging src/coinjoin/client.h
  CONFLICT (content): Merge conflict in src/coinjoin/client.h
  Auto-merging src/coinjoin/util.cpp
  CONFLICT (content): Merge conflict in src/coinjoin/util.cpp
  Auto-merging src/coinjoin/util.h
  CONFLICT (content): Merge conflict in src/coinjoin/util.h
  Auto-merging src/evo/specialtxman.cpp
  Auto-merging src/init.cpp
  Auto-merging src/net_processing.cpp
  CONFLICT (content): Merge conflict in src/net_processing.cpp
  Auto-merging src/net_processing.h
  Auto-merging src/qt/transactiontablemodel.cpp
  Auto-merging src/wallet/wallet.cpp
  Auto-merging src/wallet/wallet.h
  CONFLICT (content): Merge conflict in src/wallet/wallet.h
  Auto-merging test/functional/feature_llmq_chainlocks.py
  CONFLICT (content): Merge conflict in test/functional/feature_llmq_chainlocks.py
  ```

  ## How Has This Been Tested?

  ## Breaking Changes

  ## Checklist:
  - [ ] I have performed a self-review of my own code
  - [ ] I have commented my code, particularly in hard-to-understand areas
  - [ ] I have added or updated relevant unit/integration/functional/e2e tests
  - [ ] I have made corresponding changes to the documentation
  - [ ] I have assigned this pull request to a milestone _(for repository code-owners and collaborators only)_

ACKs for top commit:
  PastaPastaPasta:
    utACK d108579; no diff to develop

Tree-SHA512: 3f063011224880fee35edb04ce265dff33a52273c3d45ef1dbcebcecb22c25d8ad7c91b83514f36142716a6fbd0ddd3a8a3f2a9b59ce78ce975bbce69a2a13b5
Labels
None yet
Projects
None yet
4 participants