Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix flaky tests #1570

Merged
merged 5 commits into from
Jun 27, 2023
Merged

Fix flaky tests #1570

merged 5 commits into from
Jun 27, 2023

Conversation

NingLin-P
Copy link
Member

This PR tries to fix some flaky tests:

#1399
I have dug into prune_known() and seems what it does is indeed what we expected, and I have another guess about the failure:
https://github.com/subspace/subspace/blob/317f83a58670e51298703d9e480a004f6986df5b/domains/client/domain-executor/src/tests.rs#L917-L920

namely, after the block that contains the submit_bundle tx is imported and before the call of prune_tx_from_pool, the background worker txpool-notifications handled the imported block, which will ban all the tx of the block, thus the tx is already banned before we calling prune_tx_from_pool.

the solution is to wait 1ms before calling clear_stale in prune_tx_from_pool, because we have already set ban_time to 0, wait a few more times will ensure the bans will be removed in clear_stale as the ban time must be passed.

#1219
This PR is not a complete fix for this issue as we still not figure out why the last K blocks will be pruned such fast (may be related to configuration, local storage of the node, etc).

But this PR should workaround most of the damage as the pruned block will be skipped as the chain grows, and things will back to normal.

Code contributor checklist:

As we have set ban_time to 0, explicitly wait for 1ms to ensure bans will be removed
in the later call of clear_stale

Signed-off-by: linning <linningde25@gmail.com>
We only keep track of the state of the last K blocks, if there are more than
K blocks left to process we can skip some blocks as their result will simply
be dumpped after processing, this also help to prevent the bundle validator
from stuck on some pruned blocks because these block will be skip as the chain
grows.

Signed-off-by: linning <linningde25@gmail.com>
@nazar-pc
Copy link
Member

This PR is not a complete fix for this issue as we still not figure out why the last K blocks will be pruned such fast (may be related to configuration, local storage of the node, etc).

In CI confirmation depth is 5, but in practice none of the blocks should be pruned because they will not be deep enough after changes #1541, none of the blocks should be finalized and pruned unless you have a few gigabytes worth of blockchain history.

@NingLin-P
Copy link
Member Author

none of the blocks should be finalized and pruned unless you have a few gigabytes worth of blockchain history.

There are some differences between finality and canonicalization:
https://github.com/paritytech/substrate/blob/fe1f8ba1c4f23931ae89c1ada35efb3d908b50f5/client/state-db/src/lib.rs#L32-L42

and the blocks of non-canonical branches may get pruned upon canonicalization:
https://github.com/paritytech/substrate/blob/fe1f8ba1c4f23931ae89c1ada35efb3d908b50f5/client/state-db/src/noncanonical.rs#L384-L387

Copy link
Contributor

@liuchengxu liuchengxu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reasoning of banned tx makes sense to me. I'm not pretty sure about the second commit, but the code change is fine. BTW, I think we can at least improve the log to include the block number so that next time we can have more context to investigate.

https://github.com/subspace/subspace/blob/317f83a58670e51298703d9e480a004f6986df5b/crates/subspace-transaction-pool/src/bundle_validator.rs#L278-L281

test/subspace-test-service/src/lib.rs Outdated Show resolved Hide resolved
Signed-off-by: linning <linningde25@gmail.com>
@NingLin-P NingLin-P enabled auto-merge June 27, 2023 04:53
@NingLin-P NingLin-P merged commit d184d21 into main Jun 27, 2023
@NingLin-P NingLin-P deleted the fix-flaky-tests branch June 27, 2023 07:20
ParthDesai added a commit to autonomys/subspace-pulsar-sdk that referenced this pull request Jun 27, 2023
jfrank-summit pushed a commit to autonomys/subspace-pulsar-sdk that referenced this pull request Jun 30, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants