ctrl-c can't kill tidb-server #7545

Closed
crazycs520 opened this issue Aug 29, 2018 · 17 comments
Labels
component/server severity/moderate sig/sql-infra SIG: SQL Infra type/bug The issue is confirmed as a bug.

Comments

@crazycs520
Contributor

Please answer these questions before submitting your issue. Thanks!

  1. What did you do?
    If possible, provide a recipe for reproducing the error.
2018/08/29 23:07:43.646 update.go:658: [info] [stats] auto analyze table `test`.`t2` now

^C2018/08/29 23:07:44.396 main.go:445: [info] Got signal [interrupt] to exit.
2018/08/29 23:07:44.396 delete_range.go:110: [info] [ddl] closing delRange session pool
2018/08/29 23:07:44.396 ddl_worker.go:108: [info] [ddl-worker 1, tp general] close DDL worker
2018/08/29 23:07:44.396 delete_range.go:110: [info] [ddl] closing delRange session pool
2018/08/29 23:07:44.396 http_status.go:96: [info] http: Server closed
2018/08/29 23:07:44.396 ddl_worker.go:108: [info] [ddl-worker 2, tp add index] close DDL worker
2018/08/29 23:07:44.396 ddl.go:398: [info] [ddl] closing DDL:d3738981-354e-4b95-86f0-881ded2767c8 takes time 109.158µs
2018/08/29 23:07:44.396 ddl.go:351: [info] [ddl] stop DDL:d3738981-354e-4b95-86f0-881ded2767c8
2018/08/29 23:07:49.777 coprocessor.go:674: [info] [TIME_COP_PROCESS] resp_time:6.130569525s txn_start_ts:18446744073709551615 region_id:3 store_addr:store1
  2. What did you expect to see?
    tidb-server exits.

  3. What did you see instead?
    tidb-server does not exit.

  4. What version of TiDB are you using (tidb-server -V or run select tidb_version(); on TiDB)?
    Release Version: v2.1.0-rc.1-6-g1d0d0a666
    Git Commit Hash: 1d0d0a6
    Git Branch: master
    UTC Build Time: 2018-08-29 03:06:25
    GoVersion: go version go1.10.3 darwin/amd64
    Race Enabled: false
    TiKV Min Version: 2.1.0-alpha.1-ff3dd160846b7d1aed9079c389fc188f7f5ea13e
    Check Table Before Drop: false

@coocood
Member

coocood commented Aug 30, 2018

@crazycs520
Is it reproducible?

@tiancaiamao
Contributor

I also run into this error sometimes.

@shenli
Member

shenli commented Aug 30, 2018

I guess it is caused by the running auto-analyze task.

@crazycs520
Contributor Author

@coocood Yes, it is reproducible.

@birdstorm
Contributor

I think this issue is a duplicate of #7221.

@tiancaiamao
Contributor

@birdstorm I closed that issue.
Now @crazycs520 and @coocood are working on this one.

@crazycs520
Contributor Author

@shenli @tiancaiamao
Background: there is a big table that contains 50,000 rows and has no index.
The cause has already been found by @coocood.

  • Reason 1: the auto-analyze trigger condition, fixed by @lamxTyler in stats: fix auto analyze trigger condition (#7550) #7556
  • Reason 2, the main one: in domain.Close(), do.sysSessionPool.Close() blocks because sysSessionPool is recycling resources in SetCapacity() of vendor/github.com/ngaut/pools/resource_pool.go, while the session resource is still in use by the auto analyze from reason 1. A rough sketch of this blocking pattern follows.
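A minimal, illustrative sketch of that blocking pattern, assuming a channel-backed session pool (this is not the actual ngaut/pools code; sessionPool, Get, Put, and Close here are hypothetical names): Close drains every slot, so it hangs for as long as one session is still checked out by a borrower such as the auto-analyze worker.

```go
package main

import (
	"fmt"
	"time"
)

type session struct{ id int }

type sessionPool struct {
	resources chan *session // each slot holds an idle session
}

func newSessionPool(capacity int) *sessionPool {
	p := &sessionPool{resources: make(chan *session, capacity)}
	for i := 0; i < capacity; i++ {
		p.resources <- &session{id: i}
	}
	return p
}

// Get hands a session to the caller; its slot stays empty until Put is called.
func (p *sessionPool) Get() *session { return <-p.resources }

// Put returns a borrowed session to the pool.
func (p *sessionPool) Put(s *session) { p.resources <- s }

// Close drains every slot, so it blocks until all borrowed sessions come back.
func (p *sessionPool) Close() {
	for i := 0; i < cap(p.resources); i++ {
		<-p.resources // hangs forever if a borrower never calls Put
	}
}

func main() {
	pool := newSessionPool(2)
	_ = pool.Get() // the auto-analyze worker borrows a session and keeps it

	done := make(chan struct{})
	go func() {
		pool.Close() // stuck on the slot that is never refilled
		close(done)
	}()

	select {
	case <-done:
		fmt.Println("pool closed")
	case <-time.After(time.Second):
		fmt.Println("Close is still blocked after 1s, like domain.Close() in this issue")
	}
}
```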

@zhexuany zhexuany added the type/bug The issue is confirmed as a bug. label Sep 5, 2018
@zhexuany
Contributor

zhexuany commented Sep 5, 2018

@crazycs520 If you have already fixed this issue, please close it ;)

@tiancaiamao
Contributor

I think we should redesign the resource pool: when a resource is taken out of the pool, its ownership should move out of the pool with it. A rough sketch of the idea is below.
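A sketch of that ownership-transfer idea, assuming a mutex-guarded pool (hypothetical names, not a concrete TiDB proposal): Get removes the resource from the pool's bookkeeping entirely, so Close only releases idle resources and never blocks on borrowers.

```go
package pool

import "sync"

// Resource is anything the pool hands out, e.g. a session.
type Resource interface{ Close() }

// Pool tracks only idle resources; a borrowed resource is owned by its caller.
type Pool struct {
	mu     sync.Mutex
	idle   []Resource
	closed bool
}

// Get transfers ownership of an idle resource to the caller (nil if none).
func (p *Pool) Get() Resource {
	p.mu.Lock()
	defer p.mu.Unlock()
	if len(p.idle) == 0 {
		return nil
	}
	r := p.idle[len(p.idle)-1]
	p.idle = p.idle[:len(p.idle)-1]
	return r
}

// Put transfers ownership back; if the pool is already closed, the resource
// is released immediately instead of being re-added.
func (p *Pool) Put(r Resource) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.closed {
		r.Close()
		return
	}
	p.idle = append(p.idle, r)
}

// Close releases only the idle resources; borrowed ones are cleaned up by
// their current owners via Put, so Close cannot block on them.
func (p *Pool) Close() {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.closed = true
	for _, r := range p.idle {
		r.Close()
	}
	p.idle = nil
}
```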

@shenli
Member

shenli commented Sep 5, 2018

@tiancaiamao Agree. Please provide a proposal doc.

@tiancaiamao
Contributor

tiancaiamao commented Nov 27, 2018

Session pool refactor here #8456

TiDB version 487bb50

Now I have found another problem:

  1. Add index on a large table
  2. Kill TiKV
  3. Kill TiDB ... TiDB hang ...

TiDB can't exit because the DDL worker is blocked waiting for the add-index worker:

goroutine 1 [semacquire, 37 minutes]:
sync.runtime_Semacquire(0xc000395790)
	/media/genius/OS/project/go/src/runtime/sema.go:56 +0x39
sync.(*WaitGroup).Wait(0xc000395788)
	/media/genius/OS/project/go/src/sync/waitgroup.go:130 +0x64
github.com/pingcap/tidb/ddl.(*worker).close(0xc000395770)
	/media/genius/OS/project/src/github.com/pingcap/tidb/ddl/ddl_worker.go:110 +0x6e
github.com/pingcap/tidb/ddl.(*ddl).close(0xc0066be400)
	/media/genius/OS/project/src/github.com/pingcap/tidb/ddl/ddl.go:408 +0x181
github.com/pingcap/tidb/ddl.(*ddl).Stop(0xc0066be400, 0x0, 0x0)
	/media/genius/OS/project/src/github.com/pingcap/tidb/ddl/ddl.go:354 +0x71
github.com/pingcap/tidb/domain.(*Domain).Close(0xc006682240)
	/media/genius/OS/project/src/github.com/pingcap/tidb/domain/domain.go:459 +0x146
main.closeDomainAndStorage()
	/media/genius/OS/project/src/github.com/pingcap/tidb/tidb-server/main.go:508 +0x2d
main.cleanup()
	/media/genius/OS/project/src/github.com/pingcap/tidb/tidb-server/main.go:517 +0x2e
main.main()
	/media/genius/OS/project/src/github.com/pingcap/tidb/tidb-server/main.go:147 +0x8d

And the index worker is blocked here, waiting on <-taskCh:

goroutine 459 [chan receive, 44 minutes]:
github.com/pingcap/tidb/ddl.(*addIndexWorker).run(0xc0074b1320, 0xc0003765a0)
	/media/genius/OS/project/src/github.com/pingcap/tidb/ddl/index.go:853 +0x1a6
created by github.com/pingcap/tidb/ddl.(*worker).addPhysicalTableIndex
	/media/genius/OS/project/src/github.com/pingcap/tidb/ddl/index.go:1113 +0x20e

The taskCh should have been written to by addPhysicalTableIndex in index.go, but addPhysicalTableIndex is itself blocked in buildIndexForReorgInfo.

The stack looks like this:

goroutine 429 [select]:
github.com/pingcap/tidb/store/tikv.NewBackoffFn.func1(0x163ec80, 0xc0084fcd50, 0x0)
	/media/genius/OS/project/src/github.com/pingcap/tidb/store/tikv/backoff.go:68 +0x1c9
github.com/pingcap/tidb/store/tikv.(*Backoffer).Backoff(0xc0142ac060, 0x0, 0x1629080, 0xc007ce02a0, 0x2, 0x1629080)
	/media/genius/OS/project/src/github.com/pingcap/tidb/store/tikv/backoff.go:231 +0x21b
github.com/pingcap/tidb/store/tikv.(*RegionRequestSender).onSendFail(0xc00c365458, 0xc0142ac060, 0xc0084d4a80, 0x16290c0, 0xc007ce0200, 0xc0572c8240, 0x4a817c800)
	/media/genius/OS/project/src/github.com/pingcap/tidb/store/tikv/region_request.go:163 +0x18f
github.com/pingcap/tidb/store/tikv.(*RegionRequestSender).sendReqToRegion(0xc00c365458, 0xc0142ac060, 0xc0084d4a80, 0xc0572c8240, 0x4a817c800, 0xc0084d4a80, 0x0, 0x0, 0x1234fe0)
	/media/genius/OS/project/src/github.com/pingcap/tidb/store/tikv/region_request.go:132 +0x10d
github.com/pingcap/tidb/store/tikv.(*RegionRequestSender).SendReq(0xc00c365458, 0xc0142ac060, 0xc0572c8240, 0x4, 0x1, 0x2, 0x4a817c800, 0x0, 0x0, 0x0)
	/media/genius/OS/project/src/github.com/pingcap/tidb/store/tikv/region_request.go:100 +0xdd
github.com/pingcap/tidb/store/tikv.(*tikvSnapshot).get(0xc006fe0700, 0xc0142ac060, 0xc0067b3d00, 0x2d, 0x32, 0xc013dbaae0, 0x163ec80, 0xc0084fcd50, 0xc00c365510, 0xb171a1)
	/media/genius/OS/project/src/github.com/pingcap/tidb/store/tikv/snapshot.go:242 +0x1e4
github.com/pingcap/tidb/store/tikv.(*tikvSnapshot).Get(0xc006fe0700, 0xc0067b3d00, 0x2d, 0x32, 0x0, 0x0, 0x0, 0x16290c0, 0xc006fe0780)
	/media/genius/OS/project/src/github.com/pingcap/tidb/store/tikv/snapshot.go:213 +0x142
github.com/pingcap/tidb/kv.(*unionStore).Get(0xc0084fccf0, 0xc0067b3d00, 0x2d, 0x32, 0x241d800, 0xc000306540, 0x8, 0xc00c365640, 0x442c07)
	/media/genius/OS/project/src/github.com/pingcap/tidb/kv/union_store.go:183 +0x251
github.com/pingcap/tidb/store/tikv.(*tikvTxn).Get(0xc009feb050, 0xc0067b3d00, 0x2d, 0x32, 0x0, 0x0, 0x0, 0x0, 0x0)
	/media/genius/OS/project/src/github.com/pingcap/tidb/store/tikv/txn.go:94 +0x11b
github.com/pingcap/tidb/structure.(*TxStructure).loadHashValue(0xc007fea800, 0xc0067b3d00, 0x2d, 0x32, 0xc013dbaac8, 0x8, 0x8, 0xc0067b3d00, 0x2d)
	/media/genius/OS/project/src/github.com/pingcap/tidb/structure/hash.go:293 +0x61
github.com/pingcap/tidb/structure.(*TxStructure).updateHash(0xc007fea800, 0x2144a00, 0xb, 0xb, 0xc013dbaac8, 0x8, 0x8, 0xc00c3657e8, 0xc013dbaad0, 0x7)
	/media/genius/OS/project/src/github.com/pingcap/tidb/structure/hash.go:110 +0xd5
github.com/pingcap/tidb/structure.(*TxStructure).HSet(0xc007fea800, 0x2144a00, 0xb, 0xb, 0xc013dbaac8, 0x8, 0x8, 0xc013dbaad8, 0x7, 0x8, ...)
	/media/genius/OS/project/src/github.com/pingcap/tidb/structure/hash.go:50 +0xd5
github.com/pingcap/tidb/meta.(*Meta).UpdateDDLReorgHandle(0xc0084fcd20, 0xc0002fcea0, 0x5cbf1f, 0x989680, 0x5bd, 0xc0084fcd20, 0x0)
	/media/genius/OS/project/src/github.com/pingcap/tidb/meta/meta.go:719 +0x112
github.com/pingcap/tidb/ddl.(*reorgInfo).UpdateReorgMeta(0xc007279e30, 0x1664ae0, 0xc009feb050, 0x5cbf1f, 0x989680, 0x5bd, 0x0, 0xc009feb050)
	/media/genius/OS/project/src/github.com/pingcap/tidb/ddl/reorg.go:391 +0x83
github.com/pingcap/tidb/ddl.(*worker).handleReorgTasks.func1(0x1664ae0, 0xc009feb050, 0xc009feb050, 0x0)
	/media/genius/OS/project/src/github.com/pingcap/tidb/ddl/index.go:994 +0x5d
github.com/pingcap/tidb/kv.RunInNewTxn(0x164bac0, 0xc0002d4d20, 0x241d801, 0xc00c365b18, 0xb, 0xc0064d3d98)
	/media/genius/OS/project/src/github.com/pingcap/tidb/kv/txn.go:49 +0xdb
github.com/pingcap/tidb/ddl.(*worker).handleReorgTasks(0xc000395770, 0xc007279e30, 0xc00c365d98, 0xc006f30f00, 0x10, 0x10, 0xc006d45700, 0xb, 0x10, 0x0, ...)
	/media/genius/OS/project/src/github.com/pingcap/tidb/ddl/index.go:993 +0x2a6
github.com/pingcap/tidb/ddl.(*worker).sendRangeTaskToWorkers(0xc000395770, 0x1664ba0, 0xc0002ef860, 0xc006f30f00, 0x10, 0x10, 0xc007279e30, 0xc0064d3d98, 0xc0064b9b00, 0xb, ...)
	/media/genius/OS/project/src/github.com/pingcap/tidb/ddl/index.go:1040 +0x2fb
github.com/pingcap/tidb/ddl.(*worker).buildIndexForReorgInfo(0xc000395770, 0x7f00fbcaf5d0, 0xc0002ef860, 0xc006f30f00, 0x10, 0x10, 0xc0002fcea0, 0xc007279e30, 0xc0074b17a0, 0x243d6f8)
	/media/genius/OS/project/src/github.com/pingcap/tidb/ddl/index.go:1067 +0x2a1
github.com/pingcap/tidb/ddl.(*worker).addPhysicalTableIndex(0xc000395770, 0x7f00fbcaf5d0, 0xc0002ef860, 0xc000475980, 0xc007279e30, 0x0, 0x0)
	/media/genius/OS/project/src/github.com/pingcap/tidb/ddl/index.go:1116 +0x396
github.com/pingcap/tidb/ddl.(*worker).addTableIndex(0xc000395770, 0x1664ba0, 0xc0002ef860, 0xc000475980, 0xc007279e30, 0xc007057860, 0xc0065707a0)
	/media/genius/OS/project/src/github.com/pingcap/tidb/ddl/index.go:1140 +0x313
github.com/pingcap/tidb/ddl.(*worker).onCreateIndex.func1(0xc000000008, 0x150af90)
	/media/genius/OS/project/src/github.com/pingcap/tidb/ddl/index.go:327 +0x4e
github.com/pingcap/tidb/ddl.(*worker).runReorgJob.func1(0xc000395770, 0xc00748e8d0)
	/media/genius/OS/project/src/github.com/pingcap/tidb/ddl/reorg.go:118 +0x57
created by github.com/pingcap/tidb/ddl.(*worker).runReorgJob
	/media/genius/OS/project/src/github.com/pingcap/tidb/ddl/reorg.go:116 +0x5c9

It turns out that when TiKV is killed, the TiDB snapshot read keeps backing off for a very long time, repeating this sequence:

[tikvRPC,regionMiss,tikvRPC,regionMiss,tikvRPC,regionMiss,tikvRPC,regionMiss,tikvRPC,regionMiss, ...]

@winkyao
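For illustration, a minimal sketch of a context-aware backoff loop (not the actual store/tikv code; sendReq and getWithBackoff are made-up stand-ins): if the retry sleep also watches a cancellation signal, a shutdown can interrupt the tikvRPC/regionMiss retry loop instead of waiting for the whole backoff budget to run out.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

var errRegionMiss = errors.New("region miss")

// sendReq stands in for a TiKV RPC that keeps failing while TiKV is down.
func sendReq(ctx context.Context) error { return errRegionMiss }

// getWithBackoff retries like the snapshot get, but the backoff sleep also
// watches ctx so a shutdown can interrupt the retry loop.
func getWithBackoff(ctx context.Context, maxTotal time.Duration) error {
	deadline := time.Now().Add(maxTotal)
	sleep := 10 * time.Millisecond
	for {
		if err := sendReq(ctx); err == nil {
			return nil
		}
		if time.Now().After(deadline) {
			return errors.New("backoff exhausted")
		}
		select {
		case <-time.After(sleep): // normal retry path
		case <-ctx.Done(): // shutdown cancels the wait
			return ctx.Err()
		}
		if sleep < time.Second {
			sleep *= 2 // exponential backoff, capped at 1s
		}
	}
}

func main() {
	ctx, cancel := context.WithCancel(context.Background())
	go func() {
		time.Sleep(50 * time.Millisecond)
		cancel() // simulate the ctrl-c shutdown path
	}()
	fmt.Println(getWithBackoff(ctx, time.Minute)) // returns context.Canceled quickly
}
```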

@SunRunAway
Contributor

mysql> select bike_number, start_station, end_station,  end_date, from_unixtime(lead(unix_timestamp(start_date), 1) over (partition by bike_number order by start_date)) as next_trip_gap from trips where start_date >= '2016-01-01' and start_date < '2017-01-01' limit 10;
^C^C -- query aborted
^C^C -- query aborted
^C^C -- query aborted
^C^C -- query aborted
^C^C -- query aborted
^C^C -- query aborted
+-----------------------+--------------------------------+--------------------------------+---------------------+----------------------------+
| bike_number           | start_station                  | end_station                    | end_date            | next_trip_gap              |
+-----------------------+--------------------------------+--------------------------------+---------------------+----------------------------+
+-----------------------+--------------------------------+--------------------------------+---------------------+----------------------------+
10 rows in set (16.55 sec)

The problem still exists. :(

@tiancaiamao
Contributor

Could you provide the goroutine profile files so we can debug it? @SunRunAway

@tiancaiamao
Contributor

By the way, I mean ctrl-c in the terminal here, not in the MySQL client.
The MySQL client sends kill connection sessionID, which TiDB ignores.
TiDB accepts something like kill tidb ...

@ghost

ghost commented Jul 15, 2020

See #10260 for another case

@ghost ghost added the component/server label Jul 29, 2020
@jebter jebter added the sig/transaction SIG:Transaction label Nov 16, 2020
@wjhuang2016 wjhuang2016 added sig/sql-infra SIG: SQL Infra and removed sig/transaction SIG:Transaction labels Dec 10, 2020
@xiongjiwei
Contributor

See #10260 for another case

Closing this one.

@ti-srebot
Contributor

Please edit this comment or add a new comment to complete the following information

Not a bug

  1. Remove the 'type/bug' label
  2. Add notes to indicate why it is not a bug

Duplicate bug

  1. Add the 'type/duplicate' label
  2. Add the link to the original bug

Bug

Note: Make sure that the 'component' and 'severity' labels are added
Example for how to fill out the template: #20100

1. Root Cause Analysis (RCA) (optional)

2. Symptom (optional)

3. All Trigger Conditions (optional)

4. Workaround (optional)

5. Affected versions

6. Fixed versions
