mysql (ticdc): Improve the performance of the mysql sink by refining the transaction event batching logic #10466
Conversation
if txn.txnEvent != nil {
	needFlush = w.onEvent(txn)
	if !needFlush {
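The fragment above is truncated in the review view. As a minimal, self-contained sketch of the batching idea it belongs to — accumulate transaction events until the batch is large enough to flush — the following uses hypothetical stand-in names (worker, txnEvent, maxRows), not the actual tiflow types:

```go
package main

import "fmt"

// txnEvent is a hypothetical stand-in for a batched transaction event.
type txnEvent struct {
	rows int
}

// worker batches events and decides when a flush is needed.
type worker struct {
	batch    []txnEvent
	maxRows  int
	buffered int
}

// onEvent appends the event to the current batch and reports whether
// the batch has grown large enough to be flushed.
func (w *worker) onEvent(e txnEvent) (needFlush bool) {
	w.batch = append(w.batch, e)
	w.buffered += e.rows
	return w.buffered >= w.maxRows
}

// flush writes out the batch and resets the buffer, returning the
// number of rows flushed.
func (w *worker) flush() int {
	n := w.buffered
	w.batch = w.batch[:0]
	w.buffered = 0
	return n
}

func main() {
	w := &worker{maxRows: 100}
	flushed := 0
	for i := 0; i < 25; i++ {
		if w.onEvent(txnEvent{rows: 10}) {
			flushed += w.flush()
		}
	}
	fmt.Println(flushed) // prints 200; the last 50 rows stay buffered until a timer fires
}
```

Events that never reach the size threshold stay buffered, which is why the surrounding loop also needs a time-based flush trigger — the subject of the discussion below.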
It seems that the core idea here is to ensure that the flush interval is greater than 10ms. Maybe we could record lastFlushTime at the end of each flush and check it on each tick, for example:
case <-ticker.C:
if time.Since(lastFlushTime) >= w.flushInterval {
needFlush = true
}
I believe both methods are feasible, but I wonder if the current code might be a bit more straightforward and easier to understand?
In general, the nested logic is more complex.
I think it's ok here orz
Codecov Report
Additional details and impacted files
Flags with carried forward coverage won't be shown.

@@             Coverage Diff              @@
##             master    #10466      +/-  ##
================================================
- Coverage   63.4069%  57.4151%   -5.9918%
================================================
  Files           392       849       +457
  Lines         51067    125864     +74797
================================================
+ Hits          32380     72265     +39885
- Misses        16385     48192     +31807
- Partials       2302      5407      +3105
/test verify
/retest
…the transaction event batching logic (pingcap#10466) ref pingcap#10457
/cherry-pick release-6.5
@hongyunyan: new pull request created to branch In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.
Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
/cherry-pick 7.1
/cherry-pick release-7.1
@hongyunyan: cannot checkout In response to this:
@hongyunyan: new pull request created to branch In response to this:
In response to a cherrypick label: new pull request could not be created: failed to create pull request against pingcap/tiflow#release-7.1 from head ti-chi-bot:cherry-pick-10466-to-release-7.1: status code 422 not one of [201], body: {"message":"Validation Failed","errors":[{"resource":"PullRequest","code":"custom","message":"A pull request already exists for ti-chi-bot:cherry-pick-10466-to-release-7.1."}],"documentation_url":"https://docs.github.com/rest/pulls/pulls#create-a-pull-request"}
In response to a cherrypick label: new pull request created to branch
commit c092599 (Ti Chi Robot, Wed Jun 12 00:26:59 2024 +0800): pkg/config, sink(ticdc): support output raw change event for mq and cloud storage sink (pingcap#11226) (pingcap#11290) close pingcap#11211
commit 3426e46 (Ti Chi Robot, Tue Jun 11 19:40:29 2024 +0800): puller(ticdc): fix wrong update splitting behavior after table scheduling (pingcap#11269) (pingcap#11282) close pingcap#11219
commit 2a28078 (Ti Chi Robot, Tue Jun 11 16:40:37 2024 +0800): mysql(ticdc): remove error filter when check isTiDB in backend init (pingcap#11214) (pingcap#11261) close pingcap#11213
commit 2425d54 (Ti Chi Robot, Tue Jun 11 16:40:30 2024 +0800): log(ticdc): Add more error query information to the returned error to facilitate users to know the cause of the failure (pingcap#10945) (pingcap#11257) close pingcap#11254
commit 053cdaf (Ti Chi Robot, Tue Jun 11 15:34:30 2024 +0800): cdc: log slow conflict detect every 60s (pingcap#11251) (pingcap#11287) close pingcap#11271
commit 327ba7b (Ti Chi Robot, Tue Jun 11 11:42:00 2024 +0800): redo(ticdc): return internal error in redo writer (pingcap#11011) (pingcap#11091) close pingcap#10124
commit d82ae89 (Ti Chi Robot, Mon Jun 10 22:28:29 2024 +0800): ddl_puller (ticdc): handle dorp pk/uk ddl correctly (pingcap#10965) (pingcap#10981) close pingcap#10890
commit f15bec9 (Ti Chi Robot, Fri Jun 7 16:16:28 2024 +0800): redo(ticdc): enable pprof and set memory limit for redo applier (pingcap#10904) (pingcap#10996) close pingcap#10900
commit ba50a0e (Ti Chi Robot, Wed Jun 5 19:58:26 2024 +0800): test(ticdc): enable sequence test (pingcap#11023) (pingcap#11037) close pingcap#11015
commit 94b9897 (Ti Chi Robot, Wed Jun 5 17:08:56 2024 +0800): mounter(ticdc): timezone fill default value should also consider tz. (pingcap#10932) (pingcap#10946) close pingcap#10931
commit a912d33 (Ti Chi Robot, Wed Jun 5 10:49:25 2024 +0800): mysql (ticdc): Improve the performance of the mysql sink by refining the transaction event batching logic (pingcap#10466) (pingcap#11242) close pingcap#11241
commit 6277d9a (dongmen, Wed May 29 20:13:22 2024 +0800): kvClient (ticdc): revert e5999e3 to remove useless metrics (pingcap#11184) close pingcap#11073
commit 54e93ed (dongmen, Wed May 29 17:43:22 2024 +0800): syncpoint (ticdc): make syncpoint support base64 encoded password (pingcap#11162) close pingcap#10516
commit 0ba9329 (Ti Chi Robot, Wed May 29 09:07:21 2024 +0800): (redo)ticdc: fix the event orderliness in redo log (pingcap#11117) (pingcap#11180) close pingcap#11096
Signed-off-by: qupeng <qupeng@pingcap.com>
What problem does this PR solve?
Issue Number: close #11241
What is changed and how it works?
The original calculation of the worker-busy-ratio can deviate noticeably when the flush time is long (for example, hundreds of milliseconds or more), and the longer the flush time, the larger the deviation. The cause is that when the flush time is long, the interval between successive increments of the worker-busy-ratio counter clearly exceeds 1s, so the measured per-second growth rate of the worker-busy-ratio falls below the actual rate.
Performance Test Result
We apply a heavy workload upstream to keep the worker-busy-ratio at 100%, and compare the sink performance before and after the optimization.
After optimization: 33339 rows/s (+14%); before optimization: 29075 rows/s
After optimization: 20528 rows/s (+13%); before optimization: 18129 rows/s
After optimization: 7273 rows/s (+20%); before optimization: 6048 rows/s
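As a sanity check, the quoted gains follow directly from the raw throughput numbers (the description rounds them; the first works out to roughly 14.7%):

```go
package main

import "fmt"

// improvement returns the relative throughput gain in percent, given
// before/after rows-per-second figures from the PR description.
func improvement(after, before float64) float64 {
	return (after - before) / before * 100
}

func main() {
	fmt.Printf("%.1f%%\n", improvement(33339, 29075)) // ≈14.7%
	fmt.Printf("%.1f%%\n", improvement(20528, 18129)) // ≈13.2%
	fmt.Printf("%.1f%%\n", improvement(7273, 6048))   // ≈20.3%
}
```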
New panel of worker busy ratio
Check List
Tests
Questions
Will it cause performance regression or break compatibility?
Do you need to update user documentation, design documentation or monitoring documentation?
Release note