Dumpling failed with Failed to register MPP Task #43426

Closed
lilinghai opened this issue Apr 26, 2023 · 7 comments
Labels
affects-5.0 This bug affects 5.0.x versions. affects-5.1 This bug affects 5.1.x versions. affects-5.2 This bug affects 5.2.x versions. affects-5.3 This bug affects 5.3.x versions. affects-5.4 This bug affects 5.4.x versions. affects-6.0 affects-6.1 affects-6.2 affects-6.3 affects-6.4 affects-6.5 component/tiflash report/community The community has encountered this bug. severity/major sig/execution SIG execution type/bug The issue is confirmed as a bug.

Comments

@lilinghai
Contributor

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

Prepare TPC-H 10 data with a TiFlash replica.
Run Dumpling against the TPC-H data; it fails with the error below. To ensure Dumpling's queries run on TiFlash, set --params "tidb_isolation_read_engines=tiflash,tidb_enforce_mpp=on".

[2023/04/26 05:44:11.845 +00:00] [ERROR] [main.go:77] ["dump failed error stack info"] [error="sql: SELECT * FROM `tpch`.`lineitem` LIMIT 1, args: []: Error 1105: DB::TiFlashException: Failed to register MPP Task MPP<query:441054094890893314,task:6>, reason: query is being aborted, error message = Receive cancel request from TiDB"] [errorVerbose="Error 1105: DB::TiFlashException: Failed to register MPP Task MPP<query:441054094890893314,task:6>, reason: query is being aborted, error message = Receive cancel request from TiDB\nsql: SELECT * FROM `tpch`.`lineitem` LIMIT 1, args: []\ngithub.com/pingcap/tidb/dumpling/export.simpleQueryWithArgs\n\tgithub.com/pingcap/tidb/dumpling/export/sql.go:1147\ngithub.com/pingcap/tidb/dumpling/export.(*BaseConn).QuerySQL.func1\n\tgithub.com/pingcap/tidb/dumpling/export/conn.go:42\ngithub.com/pingcap/tidb/br/pkg/utils.WithRetry\n\tgithub.com/pingcap/tidb/br/pkg/utils/retry.go:52\ngithub.com/pingcap/tidb/dumpling/export.(*BaseConn).QuerySQL\n\tgithub.com/pingcap/tidb/dumpling/export/conn.go:34\ngithub.com/pingcap/tidb/dumpling/export.GetColumnTypes\n\tgithub.com/pingcap/tidb/dumpling/export/sql.go:492\ngithub.com/pingcap/tidb/dumpling/export.dumpTableMeta\n\tgithub.com/pingcap/tidb/dumpling/export/dump.go:1187\ngithub.com/pingcap/tidb/dumpling/export.(*Dumper).dumpDatabases\n\tgithub.com/pingcap/tidb/dumpling/export/dump.go:423\ngithub.com/pingcap/tidb/dumpling/export.(*Dumper).Dump\n\tgithub.com/pingcap/tidb/dumpling/export/dump.go:295\nmain.main\n\t./main.go:74\nruntime.main\n\truntime/proc.go:250\nruntime.goexit\n\truntime/asm_amd64.s:1594"]

tidb log

[2023/04/26 06:24:12.535 +00:00] [INFO] [conn.go:1181] ["command dispatched failed"] [conn=2410989306177915227] [connInfo="id:2410989306177915227, addr:10.233.77.100:41080 status:11, collation:utf8mb4_general_ci, user:root"] [command=Query] [status="inTxn:1, autocommit:1"] [sql="SELECT * FROM `tpch`.`lineitem` LIMIT 1"] [txn_mode=PESSIMISTIC] [timestamp=441054724704698379] [err="DB::TiFlashException: Failed to register MPP Task MPP<query:441054724704698379,task:6>, reason: query is being aborted, error message = Receive cancel request from TiDB\ngithub.com/pingcap/tidb/store/copr.(*mppIterator).handleDispatchReq\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/store/copr/mpp.go:314\ngithub.com/pingcap/tidb/store/copr.(*mppIterator).run.func2\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/store/copr/mpp.go:191\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1594"]

However, the same query, SELECT * FROM tpch.lineitem LIMIT 1, succeeds when executed directly against TiDB.

2. What did you expect to see? (Required)

3. What did you see instead (Required)

4. What is your TiDB version? (Required)

6.5.1

@lilinghai lilinghai added the type/bug The issue is confirmed as a bug. label Apr 26, 2023
@ti-chi-bot ti-chi-bot bot added may-affects-5.1 This bug maybe affects 5.1.x versions. may-affects-5.2 This bug maybe affects 5.2.x versions. may-affects-5.3 This bug maybe affects 5.3.x versions. may-affects-5.4 This bug maybe affects 5.4.x versions. may-affects-6.1 may-affects-6.5 may-affects-7.1 labels Apr 26, 2023
@jebter jebter added affects-6.5 and removed may-affects-5.1 This bug maybe affects 5.1.x versions. may-affects-5.2 This bug maybe affects 5.2.x versions. may-affects-5.3 This bug maybe affects 5.3.x versions. may-affects-5.4 This bug maybe affects 5.4.x versions. may-affects-6.1 may-affects-6.5 may-affects-7.1 labels Apr 26, 2023
@windtalker
Contributor

The root cause is in Dumpling: it continuously sends queries like select * from xxx limit 1 to TiFlash inside a single transaction. Because this SQL has a LIMIT clause, once TiDB has received enough rows, it sends a cancel request to cancel the current MPP query.
TiFlash uses start_ts as the unique key of an MPP query. Unfortunately, start_ts is the same for every query inside a transaction, so it is not actually unique, and a cancel request sent by TiDB affects all queries in that transaction. As a result, the currently running query can be hit by a cancel request that was aimed at the previous query, which produces the error.
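
The collision described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not TiDB or TiFlash code: `TaskRegistry`, `run_limit_queries`, `by_start_ts`, and `by_query_id` are hypothetical names, and the real failure is a timing race, whereas here the abort state simply stays set so the effect is deterministic.

```python
class TaskRegistry:
    """Toy stand-in for the TiFlash-side bookkeeping of MPP queries (hypothetical)."""

    def __init__(self):
        # Keys of queries that have received a cancel request and are still aborting.
        self.aborting = set()

    def register(self, key, task_id):
        # A task cannot register while the query it belongs to is being aborted.
        if key in self.aborting:
            raise RuntimeError(
                f"Failed to register MPP Task {task_id}: query is being aborted"
            )

    def cancel(self, key):
        # Assumption: the abort has not finished cleaning up when the next query arrives.
        self.aborting.add(key)


def by_start_ts(start_ts, seq):
    # Key used before the fix, per the comment above: the transaction's start_ts only.
    return start_ts


def by_query_id(start_ts, seq):
    # Hypothetical per-query key: start_ts plus a sequence number unique to each query.
    return (start_ts, seq)


def run_limit_queries(make_key, start_ts, n_queries=2):
    """Run n LIMIT 1 queries in one transaction and return each query's outcome."""
    registry = TaskRegistry()
    outcomes = []
    for seq in range(n_queries):
        key = make_key(start_ts, seq)
        try:
            registry.register(key, task_id=6)
            outcomes.append("ok")
        except RuntimeError:
            outcomes.append("aborted")
        # TiDB has enough rows for LIMIT 1, so it cancels the query it just ran.
        registry.cancel(key)
    return outcomes


if __name__ == "__main__":
    ts = 441054094890893314
    print(run_limit_queries(by_start_ts, ts))  # second query hits the stale cancel
    print(run_limit_queries(by_query_id, ts))  # distinct keys, no interference
```

With the start_ts-only key, the second query in the transaction is rejected because of the cancel aimed at the first one; with a key that differs per query, the cancel stays scoped to the query it was sent for.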

This bug is fixed by #40048; that is, the bug only exists in TiDB versions lower than v6.6.0.

@Ivy-YinSu Ivy-YinSu removed the component/dumpling This is related to Dumpling of TiDB. label May 31, 2023
@jebter

jebter commented Jun 1, 2023

@ti-chi-bot ti-chi-bot bot added may-affects-5.1 This bug maybe affects 5.1.x versions. may-affects-5.2 This bug maybe affects 5.2.x versions. may-affects-5.3 This bug maybe affects 5.3.x versions. may-affects-5.4 This bug maybe affects 5.4.x versions. may-affects-6.1 may-affects-7.1 labels Jun 1, 2023
@windtalker windtalker added affects-5.0 This bug affects 5.0.x versions. affects-5.1 This bug affects 5.1.x versions. affects-5.2 This bug affects 5.2.x versions. affects-5.3 This bug affects 5.3.x versions. affects-5.4 This bug affects 5.4.x versions. affects-6.0 affects-6.1 affects-6.2 affects-6.3 affects-6.4 and removed may-affects-5.1 This bug maybe affects 5.1.x versions. may-affects-5.2 This bug maybe affects 5.2.x versions. may-affects-5.3 This bug maybe affects 5.3.x versions. may-affects-5.4 This bug maybe affects 5.4.x versions. may-affects-6.1 may-affects-7.1 labels Jun 1, 2023
@windtalker
Contributor

Close as won't fix in earlier versions.

@mstao

mstao commented Jul 13, 2023

When will this be fixed in the LTS 6.5 version?

@windtalker
Contributor

There is no plan to fix it in v6.5.x, because the fix is actually a total refactor of MPPQueryID that involves a lot of changes in both TiDB and TiFlash, so it is not easy to backport to earlier versions.

@jebter jebter added the sig/execution SIG execution label Aug 4, 2023
@Qiuchi0918

When will this be fixed in the LTS 6.5 version?

As a workaround, you can add session-level engine isolation to the Dumpling options to prevent it from querying TiFlash.

tiup dumpling --params "tidb_isolation_read_engines=tikv,tidb"

@seiya-annie

/found community

@ti-chi-bot ti-chi-bot bot added the report/community The community has encountered this bug. label Jun 13, 2024