This repository has been archived by the owner on Dec 8, 2021. It is now read-only.

lightning stuck when disk is full #463

Open
glorv opened this issue Nov 10, 2020 · 1 comment
Labels
severity/moderate, type/bug (This issue is a bug report)

Comments

@glorv
Contributor

glorv commented Nov 10, 2020

Bug Report

Please answer these questions before submitting your issue. Thanks!

  1. What did you do? If possible, provide a recipe for reproducing the error.
    Use the Lightning local backend to import a large amount of data onto a disk that is too small to hold it.

  2. What did you expect to see?
    Lightning reports an error and exits when the disk is full.

  3. What did you see instead?
    Lightning hung for more than an hour instead of exiting.

  4. Versions of the cluster

    • TiDB-Lightning version (run tidb-lightning -V):

      Release Version: v4.0.8-10-g51d0e7b
      Git Commit Hash: 51d0e7bf56f7b5007fbdb310be7b7ef5e68ffcc9
      Git Branch: master
      UTC Build Time: 2020-11-10 04:12:15
      Go Version: go version go1.15.4 linux/amd64
      
  5. Operation logs
    lightning logs:

[2020/11/10 12:37:44.202 +08:00] [INFO] [restore.go:1932] ["restore file start"] [table=`dbgen_test`.`dbgen`] [engineNumber=0] [fileIndex=31] [path=dbgen_test.dbgen.032.sql:0]
[2020/11/10 12:42:13.564 +08:00] [INFO] [restore.go:625] [progress] [files="24/100 (24.0%)"] [tables="0/1 (0.0%)"] [chunks=24/100] [speed(MiB/s)=41.08332682534434] [state=writing] [remaining=47m30s]
[2020/11/10 12:47:13.564 +08:00] [INFO] [restore.go:625] [progress] [files="24/100 (24.0%)"] [tables="0/1 (0.0%)"] [chunks=24/100] [speed(MiB/s)=30.81188652093718] [state=writing] [remaining=1h3m20s]
[2020/11/10 12:52:13.564 +08:00] [INFO] [restore.go:625] [progress] [files="24/100 (24.0%)"] [tables="0/1 (0.0%)"] [chunks=24/100] [speed(MiB/s)=24.649245719237936] [state=writing] [remaining=1h19m10s]
[2020/11/10 12:57:13.564 +08:00] [INFO] [restore.go:625] [progress] [files="24/100 (24.0%)"] [tables="0/1 (0.0%)"] [chunks=24/100] [speed(MiB/s)=20.540876743789468] [state=writing] [remaining=1h35m0s]
[2020/11/10 13:02:13.564 +08:00] [INFO] [restore.go:625] [progress] [files="24/100 (24.0%)"] [tables="0/1 (0.0%)"] [chunks=24/100] [speed(MiB/s)=17.60637397937989] [state=writing] [remaining=1h50m50s]
...

related stack trace:

goroutine 744 [sync.Cond.Wait]:
runtime.goparkunlock(...)
	/usr/local/go/src/runtime/proc.go:312
sync.runtime_notifyListWait(0xc0008b7248, 0x6cd)
	/usr/local/go/src/runtime/sema.go:513 +0xf8
sync.(*Cond).Wait(0xc0008b7238)
	/usr/local/go/src/sync/cond.go:56 +0x9d
github.com/cockroachdb/pebble.(*DB).makeRoomForWrite(0xc0008b7000, 0xc00026c1e0, 0x9a30, 0x10000)
	/root/go/pkg/mod/github.com/cockroachdb/pebble@v0.0.0-20201023120638-f1224da22976/db.go:1329 +0x12a
github.com/cockroachdb/pebble.(*DB).commitWrite(0xc0008b7000, 0xc00026c1e0, 0x0, 0x0, 0xc000426778, 0xc00026c1e0, 0xc000426778)
	/root/go/pkg/mod/github.com/cockroachdb/pebble@v0.0.0-20201023120638-f1224da22976/db.go:628 +0xc5
github.com/cockroachdb/pebble.(*commitPipeline).prepare(0xc000425900, 0xc00026c1e0, 0xc000c2f500, 0xc000e305c8, 0xa8ee70b1f7b7bd01, 0xa80000000040f9f0)
	/root/go/pkg/mod/github.com/cockroachdb/pebble@v0.0.0-20201023120638-f1224da22976/commit.go:377 +0x175
github.com/cockroachdb/pebble.(*commitPipeline).Commit(0xc000425900, 0xc00026c1e0, 0x1d54200, 0x48e205, 0x2)
	/root/go/pkg/mod/github.com/cockroachdb/pebble@v0.0.0-20201023120638-f1224da22976/commit.go:253 +0x73
github.com/cockroachdb/pebble.(*DB).Apply(0xc0008b7000, 0xc00026c1e0, 0xc000c2f64f, 0x13, 0xc02dda1400)
	/root/go/pkg/mod/github.com/cockroachdb/pebble@v0.0.0-20201023120638-f1224da22976/db.go:556 +0x110
github.com/cockroachdb/pebble.(*Batch).Commit(...)
	/root/go/pkg/mod/github.com/cockroachdb/pebble@v0.0.0-20201023120638-f1224da22976/batch.go:727
github.com/pingcap/tidb-lightning/lightning/backend.(*local).WriteRows(0xc000c1a3c0, 0x240a0e0, 0xc000e22c00, 0x355b12470a3ffea0, 0xb3eba3d6e59f5cbc, 0xc000681c20, 0x14, 0x0, 0x0, 0x0, ...)
	/home/centos/gl/tidb-lightning/lightning/backend/local.go:1218 +0x2c8
github.com/pingcap/tidb-lightning/lightning/backend.(*OpenedEngine).WriteRows(0xc0008cd4c0, 0x240a0e0, 0xc000e22c00, 0x0, 0x0, 0x0, 0x23ed0e0, 0xc01677bc80, 0x0, 0x0)
	/home/centos/gl/tidb-lightning/lightning/backend/backend.go:273 +0x173
github.com/pingcap/tidb-lightning/lightning/restore.(*chunkRestore).deliverLoop(0xc024059040, 0x240a0e0, 0xc000e22c00, 0xc0007f1620, 0xc000cf4a20, 0xc000000000, 0xc0008cd4c0, 0xc0008cd040, 0xc00056c100, 0x0, ...)
	/home/centos/gl/tidb-lightning/lightning/restore/restore.go:1721 +0xa53
github.com/pingcap/tidb-lightning/lightning/restore.(*chunkRestore).restore.func2(0xc0087b8000, 0xc024059040, 0x240a0e0, 0xc000e22c00, 0xc0007f1620, 0xc000cf4a20, 0xc000000000, 0xc0008cd4c0, 0xc0008cd040, 0xc00056c100)
	/home/centos/gl/tidb-lightning/lightning/restore/restore.go:1921 +0xdd
created by github.com/pingcap/tidb-lightning/lightning/restore.(*chunkRestore).restore
	/home/centos/gl/tidb-lightning/lightning/restore/restore.go:1919 +0x245

goroutine 728 [semacquire, 58 minutes]:
sync.runtime_SemacquireMutex(0xc000426934, 0x407c00, 0x1)
	/usr/local/go/src/runtime/sema.go:71 +0x47
sync.(*Mutex).lockSlow(0xc000426930)
	/usr/local/go/src/sync/mutex.go:138 +0x105
sync.(*Mutex).Lock(...)
	/usr/local/go/src/sync/mutex.go:81
github.com/cockroachdb/pebble.(*commitPipeline).prepare(0xc000425900, 0xc00026c690, 0xc0007bd500, 0xc000e305c8, 0xa8ee70b1f7b7bd01, 0xa80000000040f9f0)
	/root/go/pkg/mod/github.com/cockroachdb/pebble@v0.0.0-20201023120638-f1224da22976/commit.go:364 +0x205
github.com/cockroachdb/pebble.(*commitPipeline).Commit(0xc000425900, 0xc00026c690, 0x1d54200, 0x48e205, 0x5)
	/root/go/pkg/mod/github.com/cockroachdb/pebble@v0.0.0-20201023120638-f1224da22976/commit.go:253 +0x73
github.com/cockroachdb/pebble.(*DB).Apply(0xc0008b7000, 0xc00026c690, 0xc0007bd64f, 0x13, 0xc02934de00)
	/root/go/pkg/mod/github.com/cockroachdb/pebble@v0.0.0-20201023120638-f1224da22976/db.go:556 +0x110
github.com/cockroachdb/pebble.(*Batch).Commit(...)
	/root/go/pkg/mod/github.com/cockroachdb/pebble@v0.0.0-20201023120638-f1224da22976/batch.go:727
github.com/pingcap/tidb-lightning/lightning/backend.(*local).WriteRows(0xc000c1a3c0, 0x240a0e0, 0xc000e22c00, 0x355b12470a3ffea0, 0xb3eba3d6e59f5cbc, 0xc000681c20, 0x14, 0x0, 0x0, 0x0, ...)
	/home/centos/gl/tidb-lightning/lightning/backend/local.go:1218 +0x2c8
github.com/pingcap/tidb-lightning/lightning/backend.(*OpenedEngine).WriteRows(0xc0008cd4c0, 0x240a0e0, 0xc000e22c00, 0x0, 0x0, 0x0, 0x23ed0e0, 0xc02953fca0, 0x0, 0x0)
	/home/centos/gl/tidb-lightning/lightning/backend/backend.go:273 +0x173
github.com/pingcap/tidb-lightning/lightning/restore.(*chunkRestore).deliverLoop(0xc01256b780, 0x240a0e0, 0xc000e22c00, 0xc0007f1a40, 0xc000cf4a20, 0xc000000000, 0xc0008cd4c0, 0xc0008cd040, 0xc00056c100, 0x0, ...)
	/home/centos/gl/tidb-lightning/lightning/restore/restore.go:1721 +0xa53
github.com/pingcap/tidb-lightning/lightning/restore.(*chunkRestore).restore.func2(0xc020439b00, 0xc01256b780, 0x240a0e0, 0xc000e22c00, 0xc0007f1a40, 0xc000cf4a20, 0xc000000000, 0xc0008cd4c0, 0xc0008cd040, 0xc00056c100)
	/home/centos/gl/tidb-lightning/lightning/restore/restore.go:1921 +0xdd
created by github.com/pingcap/tidb-lightning/lightning/restore.(*chunkRestore).restore
	/home/centos/gl/tidb-lightning/lightning/restore/restore.go:1919 +0x245

goroutine 7987 [runnable]:
syscall.Syscall(0x1, 0x1e, 0xc02b7ec000, 0x1000, 0x1000, 0x1000, 0x0)
	/usr/local/go/src/syscall/asm_linux_amd64.s:18 +0x5
syscall.write(0x1e, 0xc02b7ec000, 0x1000, 0x1000, 0x1000, 0x0, 0x0)
	/usr/local/go/src/syscall/zsyscall_linux_amd64.go:914 +0x5a
syscall.Write(...)
	/usr/local/go/src/syscall/syscall_unix.go:212
internal/poll.(*FD).Write.func1(0x7ffff80000000000, 0x4, 0xc0005e612c)
	/usr/local/go/src/internal/poll/fd_unix.go:267 +0x77
internal/poll.ignoringEINTR(0xc0059025d8, 0x77, 0x1, 0x0)
	/usr/local/go/src/internal/poll/fd_unix.go:567 +0x27
internal/poll.(*FD).Write(0xc0005e6120, 0xc02b7ec000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
	/usr/local/go/src/internal/poll/fd_unix.go:267 +0x19c
os.(*File).write(...)
	/usr/local/go/src/os/file_posix.go:48
os.(*File).Write(0xc01e54c028, 0xc02b7ec000, 0x1000, 0x1000, 0xc0059026f8, 0x4dd706, 0x5faa2778)
	/usr/local/go/src/os/file.go:173 +0x77
github.com/cockroachdb/pebble/vfs.(*diskHealthCheckingFile).Write.func1()
	/root/go/pkg/mod/github.com/cockroachdb/pebble@v0.0.0-20201023120638-f1224da22976/vfs/disk_health.go:92 +0x63
github.com/cockroachdb/pebble/vfs.(*diskHealthCheckingFile).timeDiskOp(0xc0007f2140, 0xc005902758)
	/root/go/pkg/mod/github.com/cockroachdb/pebble@v0.0.0-20201023120638-f1224da22976/vfs/disk_health.go:123 +0xb6
github.com/cockroachdb/pebble/vfs.(*diskHealthCheckingFile).Write(0xc0007f2140, 0xc02b7ec000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
	/root/go/pkg/mod/github.com/cockroachdb/pebble@v0.0.0-20201023120638-f1224da22976/vfs/disk_health.go:91 +0xab
github.com/cockroachdb/pebble/vfs.(*syncingFile).Write(0xc0005e6180, 0xc02b7ec000, 0x1000, 0x1000, 0xc02a044340, 0xc019be0140, 0x13)
	/root/go/pkg/mod/github.com/cockroachdb/pebble@v0.0.0-20201023120638-f1224da22976/vfs/syncing_file.go:81 +0x6c
github.com/cockroachdb/pebble.(*compactionFile).Write(0xc02e7dc100, 0xc02b7ec000, 0x1000, 0x1000, 0xc000e3d708, 0x2, 0x6f6)
	/root/go/pkg/mod/github.com/cockroachdb/pebble@v0.0.0-20201023120638-f1224da22976/compaction.go:406 +0x55
bufio.(*Writer).Flush(0xc0007f2240, 0xc02b7ed000, 0xa1b)
	/usr/local/go/src/bufio/bufio.go:607 +0x7b
bufio.(*Writer).Write(0xc0007f2240, 0xc02b7ed000, 0xfb3, 0x1000, 0x35, 0xeaa2273b, 0xc0053dd300)
	/usr/local/go/src/bufio/bufio.go:643 +0xfc
github.com/cockroachdb/pebble/sstable.(*Writer).writeBlock(0xc02a044000, 0xc02b7ed000, 0xfb3, 0x1000, 0x2, 0x5b, 0x5b, 0xc02a0442a8, 0x1000)
	/root/go/pkg/mod/github.com/cockroachdb/pebble@v0.0.0-20201023120638-f1224da22976/sstable/writer.go:476 +0x144
github.com/cockroachdb/pebble/sstable.(*Writer).maybeFlush(0xc02a044000, 0xc004bcc100, 0x13, 0x20, 0x12672a4801, 0x7fd0065a313f, 0x5b, 0x5b, 0xc005902a48, 0x177b8b9)
	/root/go/pkg/mod/github.com/cockroachdb/pebble@v0.0.0-20201023120638-f1224da22976/sstable/writer.go:345 +0xdf
github.com/cockroachdb/pebble/sstable.(*Writer).addPoint(0xc02a044000, 0xc004bcc100, 0x13, 0x20, 0x12672a4801, 0x7fd0065a313f, 0x5b, 0x5b, 0x12672a47, 0x12672a48)
	/root/go/pkg/mod/github.com/cockroachdb/pebble@v0.0.0-20201023120638-f1224da22976/sstable/writer.go:233 +0xaa
github.com/cockroachdb/pebble/sstable.(*Writer).Add(0xc02a044000, 0xc004bcc100, 0x13, 0x20, 0x12672a4801, 0x7fd0065a313f, 0x5b, 0x5b, 0x0, 0x0)
	/root/go/pkg/mod/github.com/cockroachdb/pebble@v0.0.0-20201023120638-f1224da22976/sstable/writer.go:217 +0x114
github.com/cockroachdb/pebble.(*DB).runCompaction(0xc0008b7000, 0x6f5, 0xc029a48000, 0x23c9ce0, 0x3ad03e0, 0xc000ee6050, 0xc00096e070, 0x1, 0x1, 0x0, ...)
	/root/go/pkg/mod/github.com/cockroachdb/pebble@v0.0.0-20201023120638-f1224da22976/compaction.go:2279 +0x1876
github.com/cockroachdb/pebble.(*DB).flush1(0xc0008b7000, 0xc01398f710, 0x81ca68)
	/root/go/pkg/mod/github.com/cockroachdb/pebble@v0.0.0-20201023120638-f1224da22976/compaction.go:1328 +0x2b9
github.com/cockroachdb/pebble.(*DB).flush.func1(0x240a0e0, 0xc0008a0ae0)
	/root/go/pkg/mod/github.com/cockroachdb/pebble@v0.0.0-20201023120638-f1224da22976/compaction.go:1262 +0x77
runtime/pprof.Do(0x240a0e0, 0xc0008a0ae0, 0xc00011c580, 0x1, 0x1, 0xc01398f7b8)
	/usr/local/go/src/runtime/pprof/runtime.go:40 +0xcc
github.com/cockroachdb/pebble.(*DB).flush(0xc0008b7000)
	/root/go/pkg/mod/github.com/cockroachdb/pebble@v0.0.0-20201023120638-f1224da22976/compaction.go:1259 +0x89
created by github.com/cockroachdb/pebble.(*DB).maybeScheduleFlush
	/root/go/pkg/mod/github.com/cockroachdb/pebble@v0.0.0-20201023120638-f1224da22976/compaction.go:1208 +0x17f

  6. Configuration of the cluster and the task

    • tidb-lightning.toml for TiDB-Lightning if possible
    • tikv-importer.toml for TiKV-Importer if possible
    • inventory.ini if deployed by Ansible
  7. Screenshot/exported-PDF of Grafana dashboard or metrics' graph in Prometheus for TiDB-Lightning if possible

@glorv added the type/bug label on Nov 10, 2020
@kennytm
Collaborator

kennytm commented Nov 10, 2020

cc #446.

@glorv If there is still not enough space left even after flushing keys to TiKV, would you still prefer to quit rather than hang? What if we log warnings?
