
Need a better way for changefeed retry on some errors such as CDC:ErrJSONCodecRowTooLarge #3329

Closed
amyangfei opened this issue Nov 8, 2021 · 2 comments · Fixed by #4262
Assignees
Labels
area/ticdc Issues or PRs related to TiCDC. severity/moderate type/bug The issue is confirmed as a bug.

Comments


amyangfei commented Nov 8, 2021

Is your feature request related to a problem?

  1. Set up a TiCDC cluster and create a changefeed with the command cdc cli changefeed create -c test-cf --sink-uri="kafka://172.18.0.2:9092/cdc-test?kafka-version=2.7.0&max-batch-size=1&max-message-bytes=5000"
  2. Execute a SQL statement in the upstream TiDB that changes a wide row whose encoded data size is larger than 5000 bytes

TiCDC then reports the following error:

[2021/11/08 15:36:38.998 +08:00] [WARN] [json.go:433] ["Single message too large"] [max-message-size=5000] [length=10457] [table=test.t1]
[2021/11/08 15:36:39.176 +08:00] [ERROR] [processor.go:313] ["error on running processor"] [capture=127.0.0.1:8300] [changefeed=test-cf] [error="[CDC:ErrJSONCodecRowTooLarge]json codec single row too large"] [errorVerbose="[CDC:ErrJSONCodecRowTooLarge]json codec single row too large\ngithub.com/pingcap/errors.AddStack\n\tgithub.com/pingcap/errors@v0.11.5-0.20210513014640-40f9a1999b3b/errors.go:174\ngithub.com/pingcap/errors.(*Error).GenWithStackByArgs\n\tgithub.com/pingcap/errors@v0.11.5-0.20210513014640-40f9a1999b3b/normalize.go:159\ngithub.com/pingcap/ticdc/cdc/sink/codec.(*JSONEventBatchEncoder).AppendRowChangedEvent\n\tgithub.com/pingcap/ticdc/cdc/sink/codec/json.go:435\ngithub.com/pingcap/ticdc/cdc/sink.(*mqSink).runWorker\n\tgithub.com/pingcap/ticdc/cdc/sink/mq.go:345\ngithub.com/pingcap/ticdc/cdc/sink.(*mqSink).run.func1\n\tgithub.com/pingcap/ticdc/cdc/sink/mq.go:275\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\tgolang.org/x/sync@v0.0.0-20210220032951-036812b2e83c/errgroup/errgroup.go:57\nruntime.goexit\n\truntime/asm_amd64.s:1371"]
[2021/11/08 15:36:39.176 +08:00] [ERROR] [processor.go:150] ["run processor failed"] [changefeed=test-cf] [capture=127.0.0.1:8300] [error="[CDC:ErrJSONCodecRowTooLarge]json codec single row too large"] [errorVerbose="[CDC:ErrJSONCodecRowTooLarge]json codec single row too large\ngithub.com/pingcap/errors.AddStack\n\tgithub.com/pingcap/errors@v0.11.5-0.20210513014640-40f9a1999b3b/errors.go:174\ngithub.com/pingcap/errors.(*Error).GenWithStackByArgs\n\tgithub.com/pingcap/errors@v0.11.5-0.20210513014640-40f9a1999b3b/normalize.go:159\ngithub.com/pingcap/ticdc/cdc/sink/codec.(*JSONEventBatchEncoder).AppendRowChangedEvent\n\tgithub.com/pingcap/ticdc/cdc/sink/codec/json.go:435\ngithub.com/pingcap/ticdc/cdc/sink.(*mqSink).runWorker\n\tgithub.com/pingcap/ticdc/cdc/sink/mq.go:345\ngithub.com/pingcap/ticdc/cdc/sink.(*mqSink).run.func1\n\tgithub.com/pingcap/ticdc/cdc/sink/mq.go:275\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\tgolang.org/x/sync@v0.0.0-20210220032951-036812b2e83c/errgroup/errgroup.go:57\nruntime.goexit\n\truntime/asm_amd64.s:1371"]

The changefeed is resumed every few seconds and fails again (the cdc owner applies a rate limit to retries).

Note that in a production environment, frequently resuming a changefeed can add significant load to the cluster.

Describe the feature you'd like

When TiCDC meets such an error, it should pause the changefeed and not retry resuming it until the changefeed config is updated and the changefeed is resumed manually.

Describe alternatives you've considered

No response

Teachability, Documentation, Adoption, Migration Strategy

No response

@amyangfei amyangfei added the subject/new-feature Denotes an issue or pull request adding a new feature. label Nov 8, 2021

amyangfei commented Nov 12, 2021

Some discussion records and solution candidates:

  1. Add more errors to the fast-fail error list below. The drawback of this approach is that we can't enumerate all fast-fail errors.
    https://github.com/pingcap/ticdc/blob/083d6b0f88b98df7ab3235f14a06d974afcb96f9/pkg/errors/helper.go#L37-L39
  2. Add a better backoff mechanism when the owner tries to restart an errored changefeed, e.g. retry after 1 min, 2 min, 4 min, and so on. This reduces the overhead introduced by changefeed initialization and lets users discover changefeeds in an error state via the changefeed query command.
  3. Add another changefeed state, such as an unrecoverable-error state. A changefeed in this state will not be restarted by the owner, but it will still contribute to the service GC safepoint calculation. We don't recommend this solution, since adding a new changefeed state introduces more complexity.

@amyangfei amyangfei added type/bug The issue is confirmed as a bug. area/ticdc Issues or PRs related to TiCDC. and removed subject/new-feature Denotes an issue or pull request adding a new feature. labels Nov 18, 2021
@amyangfei
Contributor Author

Solution 2 is a short-term candidate.
