feat: concurrent recheckTx #163
Conversation
First of all, I'd like to mention what we're dealing with. In "Concurrency in Go: Tools and Techniques for Developers" (Katherine Cox-Buday, O'Reilly Media), the author says the following:
I hope this answers those questions: we can treat goroutines as if they were (nearly) free resources.
I had thought a goroutine was similar to an (OS-native) thread, but now I understand that it more closely resembles a coroutine (as the name suggests). I think this is the same as a parallel processing model that uses a job queue and workers. As computer resources are finite, however, I think "free resource" is a somewhat idealized notion. Assuming a parallel processing model of such a job-queue/workers type, if more jobs arrive than the workers can handle, the rest have to wait. The term "thread pool" in the previous comment is a Java expression, but the same idea applies here.
I think the same way about thread pools and resource management in terms of handling parallel execution, but it doesn't affect this PR at all.
To my understanding, all functions run on goroutines.
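The job-queue/workers model discussed above could be sketched roughly like this in Go (a toy illustration, not code from this PR; the worker count, queue size, and `job` type are made up): with N workers, at most N jobs run at once and the rest wait in the queue.

```go
package main

import (
	"fmt"
	"sync"
)

// job is a placeholder unit of work; in the mempool analogy it would be a tx to (re)check.
type job int

func worker(id int, jobs <-chan job, wg *sync.WaitGroup) {
	defer wg.Done()
	for j := range jobs {
		// With numWorkers workers, at most numWorkers jobs are processed
		// concurrently; the remaining jobs wait in the channel (the "job queue").
		fmt.Printf("worker %d processed job %d\n", id, j)
	}
}

func main() {
	const numWorkers = 4
	jobs := make(chan job, 100)

	var wg sync.WaitGroup
	for i := 0; i < numWorkers; i++ {
		wg.Add(1)
		go worker(i, jobs, &wg)
	}

	// Submit more jobs than there are workers; the extras simply queue up.
	for j := 0; j < 20; j++ {
		jobs <- job(j)
	}
	close(jobs)
	wg.Wait()
}
```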
@@ -266,5 +273,6 @@ func newLocalReqRes(req *types.Request, res *types.Response) *ReqRes {
 	reqRes := NewReqRes(req)
 	reqRes.Response = res
 	reqRes.SetDone()
+	reqRes.Done()
I'm just asking out of curiosity: did you add this code because there was a problem with its absence before, or is it just a formality?
There was no buggy behavior before, but as you can see, it's not good in terms of consistency.
		Tx:   memTx.tx,
		Type: abci.CheckTxType_Recheck,
	})
	reqRes.SetCallback(func(res *abci.Response) {
Where will this `reqRes.cb` be called?
I'd like to refactor this code because it's not intuitive and has many assumptions. If I refactored it within this PR, I think it would make the PR more complicated, so I didn't do it here. I'd like to refactor it if I have a chance.
Oh, I missed calling the `callback` when `abci` is `async` and `non-blocking` when I reverted the refactored code. I'll revise it, thanks.
Thanks, fixed at 38df3b3
As you can see in `client.go`, there's a kind of bug where `cb` is called many times if `SetCallback()` is called after the `ReqRes` is done. I'd still like to refactor it if I have a chance.
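A heavily simplified toy sketch of the pattern being described (this is not the real `client.go` or `ReqRes`; the types and the `markDone` helper are made up for illustration): when registering a callback after completion triggers an immediate invocation, and the completion path can also fire the callback, it can end up being called more than once.

```go
package main

import "fmt"

// reqRes is a toy stand-in for the real ReqRes type; names and behavior are
// illustrative only, not the actual implementation being reviewed.
type reqRes struct {
	done bool
	cb   func()
}

// SetCallback stores the callback and, if the request is already done,
// invokes it immediately.
func (r *reqRes) SetCallback(cb func()) {
	r.cb = cb
	if r.done {
		cb()
	}
}

// markDone marks completion and invokes the callback if one is registered.
func (r *reqRes) markDone() {
	r.done = true
	if r.cb != nil {
		r.cb()
	}
}

func main() {
	r := &reqRes{}
	calls := 0

	r.markDone()                      // the request completes first
	r.SetCallback(func() { calls++ }) // registered after completion: invoked immediately...
	r.markDone()                      // ...and invoked again if the completion path runs once more

	fmt.Println("callback invoked", calls, "times") // 2 in this toy example
}
```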
I fully agree with the improvement direction.
LGTM
LGTM 👍
The reactor_test fails.
@@ -174,7 +174,7 @@ func (memR *Reactor) Receive(chID byte, src p2p.Peer, msgBytes []byte) {
 	if src != nil {
 		txInfo.SenderP2PID = src.ID()
 	}
-	err := memR.mempool.CheckTx(msg.Tx, nil, txInfo)
+	err := memR.mempool.CheckTxAsync(msg.Tx, txInfo, nil)
I think the `reactor_test` failed because of this code. If the reactor does `CheckTxAsync`, the order in which txs enter the `mempool` may change. If this is intended, there should be some kind of defense in the test case.
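One possible "defense" along the lines suggested above, sketched as a Go test helper (hypothetical code, not from this repository): when `CheckTxAsync` makes the arrival order into the mempool non-deterministic, the test can compare the tx sets order-independently instead of index by index.

```go
package mempool_test

import (
	"sort"
	"testing"
)

// assertSameTxSet checks that got and want contain the same txs, ignoring order.
// With CheckTxAsync, the arrival order into the mempool is no longer
// deterministic, so an index-by-index comparison would be flaky.
func assertSameTxSet(t *testing.T, got, want [][]byte) {
	t.Helper()
	if len(got) != len(want) {
		t.Fatalf("tx count mismatch: got %d, want %d", len(got), len(want))
	}
	g := make([]string, len(got))
	w := make([]string, len(want))
	for i := range got {
		g[i] = string(got[i])
		w[i] = string(want[i])
	}
	sort.Strings(g)
	sort.Strings(w)
	for i := range g {
		if g[i] != w[i] {
			t.Fatalf("tx sets differ at %d: got %q, want %q", i, g[i], w[i])
		}
	}
}
```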
I'm already investigating it, and I think we need `CheckTxAsync` here. Thanks.
After investigating, I think it's a legacy issue that could probably occur even before. Because `abci.localClient.mtx` was removed from `abci.localClient.CheckTxXXX()`, the probability of it occurring just increased.
I'll deal with it in #167 because it's not directly related to this PR, and it could also be a bit complicated.
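A minimal, self-contained illustration (not the actual `localClient` code) of why concurrent `CheckTx` submissions can enter the mempool out of order: goroutines finish in no particular order unless something serializes them end to end.

```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	var (
		mu    sync.Mutex
		order []int
		wg    sync.WaitGroup
	)

	// Submit "txs" 0..4 concurrently; the order in which they are appended
	// (i.e. "enter the mempool") depends on goroutine scheduling.
	for i := 0; i < 5; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			mu.Lock()
			order = append(order, i)
			mu.Unlock()
		}(i)
	}
	wg.Wait()
	fmt.Println("arrival order:", order) // may differ from 0 1 2 3 4
}
```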
* feat: more prometheus metrics for monitoring performance (#146) (#175)
* chore: config timeout and connection pool (#150) (#171)
* fix: use line/tm-db instead of tendermint/tm-db
* bump up tm-db, iavl; re-apply #201
* chore: use default db backend among the available ones (#212)
* chore: use default db backend among the available ones
* chore: bump up iavl, tm-db
* feat: concurrent checkTx #213; fix tm-db call
* fix: rename TM to OC
* fix: modify key name; tendermint -> ostracon
* chore: rename tendermint to ostracon
* chore: remove mempool.postCheck (#158) (#217)
* fix: error handling after check tx
* fix: typo
* chore: (mempool) remove postCheck and impl reserve
* chore: fix tests
* chore: revise log (remove checkTx.Code)
* chore: add `CONTRACT` for `mem.proxyAppConn.CheckTxAsync()`
* chore: revise numTxs, txsBytes for `ErrMempoolIsFull` in reserve()
* chore: revise to remove redundant `isFull()`
* fix: remove tx from cache when `app errors` or `failed to reserve`
* Revert "chore: revise to remove redundant `isFull()`" — This reverts commit 55990ec.
* fix: revise to call Begin/EndRecheck even though mem.Size() is 0 (#219)
* fix: revise to call Begin/EndRecheck even though `mem.Size()` is 0
* chore: revise local_client.go
* fix: lint error
* chore: recheckTxs() just return if mem.Size() == 0
* feat: concurrent recheckTx (#163) (#221)
* chore: increase the value of maxPerPage (#223)
* chore: fix the type of consensus_block_interval_seconds from histogram to gauge (#224)
* feat: impl checkTxAsyncReactor() (#168) (#225)
* feat: impl checkTxAsyncReactor() (#168)
* fix: tests
* fix: lint errors
* chore: revise abci.Client, Async() interfaces (#169) (#226)
* chore: revise abci.Client, Async() interfaces
* chore: regen mock w/ mockery 2.7.4
* fix: lint error
* fix: test_race
* mempool.Flush() flushes all txs from mempool so it should get `Lock()` instead of `RLock()`
* chore: remove iavl dependency (#228)
* chore: remove iavl dependency
* chore: fix lint error
* fix: add more fixing for abci.Client, Async()
* feat: revise metric for measuring performance
* build: remove needless build tag `!libsecp256k1` (#246) — The build tag makes disable go implementation of secp256k1. Cause there is no C implementation, a build error will occur when using tag `libsecp256k1`.
* feat: add duration metrics of gauge type (#256)
* perf: optimize checking the txs size (#264)
* perf: optimize checking the txs size
* ci: add GOPRIVATE to workflows
* test: add a unit test
* fix: fix lint errors
* perf: do not flush wal when receive consensus msgs (#273)
* perf: do not flush wal when receive consensus msgs
* fix: ci failure
* fix: lint failure
* fix: ci-e2e build failure
* fix: bump up tm-db
* fix: missing abci api
* fix: bump up tm-db; use memdb
* test: add test case to raise test coverage
* fix: race error
* fix: race error
* fix: race error
* fix: increase e2e test timeout
* fix: add test case for coverage
* fix: e2e docker file
* fix: apply comments
* fix: a Ostracon to an Ostracon
Related to: https://github.com/line/link/issues/1152
Description
To optimize performance, we need to increase concurrency. After implementing concurrent `checkTx` (#160), I'd like to implement concurrent `recheckTx`.

The key change is decomposing `application.CheckTx()` into `CheckTxSync()` and `CheckTxAsync()`. `abci.CheckTxSync()` and `abci.CheckTxAsync()` actually use the same application interface, `application.CheckTx()`, so both of them are blocking functions. With `application.CheckTxSync()` and `application.CheckTxAsync()`, I intend to implement these as `sync and blocking` and `async and non-blocking` in `cosmos-sdk`.

Please note the reason why `application.CheckTxSync()` is still needed: `rpc.broadcastTxSync()` (and `Commit()`) already has its own goroutine from the HTTP server, so in that case it's better not to create additional unnecessary goroutines.
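Roughly, the sync/blocking vs. async/non-blocking split described above could look like the sketch below (simplified, hypothetical signatures, not the actual abci or application interfaces): both variants share the same validation logic; the sync one blocks, while the async one hands the result to a callback on another goroutine.

```go
package main

import "fmt"

// checkTxResult is a simplified stand-in for the real CheckTx response type.
type checkTxResult struct {
	Code uint32
}

// checkTx stands for the shared application-side validation logic that both
// variants delegate to.
func checkTx(tx []byte) checkTxResult {
	return checkTxResult{Code: 0}
}

// CheckTxSync blocks until validation finishes and returns the result.
func CheckTxSync(tx []byte) checkTxResult {
	return checkTx(tx)
}

// CheckTxAsync returns immediately; the result is delivered to the callback
// from a separate goroutine, so the caller is not blocked.
func CheckTxAsync(tx []byte, cb func(checkTxResult)) {
	go func() {
		cb(checkTx(tx))
	}()
}

func main() {
	res := CheckTxSync([]byte("tx1")) // sync and blocking
	fmt.Println("sync result code:", res.Code)

	done := make(chan struct{})
	CheckTxAsync([]byte("tx2"), func(r checkTxResult) { // async and non-blocking
		fmt.Println("async result code:", r.Code)
		close(done)
	})
	<-done // wait only so this example program doesn't exit early
}
```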