Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Lightning]: Panic on invalid Mertic countter operation #37338

Closed
Rustin170506 opened this issue Aug 24, 2022 · 7 comments · Fixed by #53444
Closed

[Lightning]: Panic on invalid Mertic countter operation #37338

Rustin170506 opened this issue Aug 24, 2022 · 7 comments · Fixed by #53444
Labels
affects-5.3 This bug affects 5.3.x versions. affects-5.4 This bug affects the 5.4.x(LTS) versions. affects-6.1 This bug affects the 6.1.x(LTS) versions. affects-6.2 affects-6.5 This bug affects the 6.5.x(LTS) versions. affects-7.1 This bug affects the 7.1.x(LTS) versions. affects-7.5 This bug affects the 7.5.x(LTS) versions. affects-8.1 This bug affects the 8.1.x(LTS) versions. component/lightning This issue is related to Lightning of TiDB. report/customer Customers have encountered this bug. severity/major type/bug The issue is confirmed as a bug.

Comments

@Rustin170506
Copy link
Member

Bug Report

Version: 6.1.0

Lightning panic on https://github.com/pingcap/tidb/blob/master/br/pkg/lightning/restore/restore.go#L2333

I think it's because float64(currOffset - startOffset) < 0.

panic: counter cannot decrease in value

goroutine 202374 [running]:
github.com/prometheus/client_golang/prometheus.(*counter).Add(0x0?, 0xc069674c80?)
/go/pkg/mod/github.com/prometheus/client_golang@v1.11.0/prometheus/counter.go:109 +0xf4
github.com/pingcap/tidb/br/pkg/lightning/restore.(*chunkRestore).deliverLoop(0xc0618805a0,
{0x445dbd8, 0xc0017c6040}

, 0xc06c80a3c0, 0xc0012af740, 0x3, 0xc002a8ff18?, 0xc002a8ffd0?, 0xc0016e5080)
/mnt/tidb/sql/br/pkg/lightning/restore/restore.go:2255 +0x7a3
github.com/pingcap/tidb/br/pkg/lightning/restore.(*chunkRestore).restore.func1()
/mnt/tidb/sql/br/pkg/lightning/restore/restore.go:2509 +0xab
created by github.com/pingcap/tidb/br/pkg/lightning/restore.(*chunkRestore).restore
/mnt/tidb/sql/br/pkg/lightning/restore/restore.go:2507 +0x31b
@Rustin170506 Rustin170506 added type/bug The issue is confirmed as a bug. component/lightning This issue is related to Lightning of TiDB. labels Aug 24, 2022
@D3Hunter
Copy link
Contributor

D3Hunter commented Aug 24, 2022

value of currOffset comes from parser.pos which increase monotonically. the init value of parser.pos comes from chunk.Chunk.Offset. so it should happen that currOffset - startOffset < 0

need to reproduce it.

@D3Hunter D3Hunter added the affects-6.1 This bug affects the 6.1.x(LTS) versions. label Aug 24, 2022
@D3Hunter
Copy link
Contributor

user changed lightning code, cannot find code related to hash b3ee7aef716e39dd5c6744e4dac0fbbb3529aca7

not sure the root cause is from 6.1 code base or from user's customization

@ti-chi-bot ti-chi-bot added may-affects-4.0 This bug maybe affects 4.0.x versions. may-affects-5.0 This bug maybe affects 5.0.x versions. may-affects-5.1 This bug maybe affects 5.1.x versions. may-affects-5.2 This bug maybe affects 5.2.x versions. may-affects-5.3 This bug maybe affects 5.3.x versions. may-affects-5.4 This bug maybe affects 5.4.x versions. may-affects-6.0 may-affects-6.2 labels Aug 26, 2022
@D3Hunter D3Hunter added severity/minor and removed may-affects-4.0 This bug maybe affects 4.0.x versions. may-affects-5.1 This bug maybe affects 5.1.x versions. may-affects-5.2 This bug maybe affects 5.2.x versions. may-affects-5.3 This bug maybe affects 5.3.x versions. may-affects-5.4 This bug maybe affects 5.4.x versions. may-affects-5.0 This bug maybe affects 5.0.x versions. may-affects-6.0 severity/major labels Aug 26, 2022
@D3Hunter
Copy link
Contributor

update: the error happens when importing fixed-width text format file, not csv but pretend it as csv. in the datafile there're binary data which contains csv terminator which causes error.

cannot reproduce it now.

@lance6716
Copy link
Contributor

Reproduced in a ONCALL ticket

@lance6716 lance6716 reopened this May 21, 2024
@lance6716
Copy link
Contributor

I printed newOffset and cr.chunk.Chunk.EndOffset

if kvSize >= minDeliverBytes || len(kvPacket) >= maxKvPairsCnt || newOffset == cr.chunk.Chunk.EndOffset {

there's an off-by-1 error, the chunk reader does not stop

[2024/05/21 15:55:36.744 +08:00] [INFO] [chunk_process.go:485] ["lance test3"] [table=`test`.`gla_glis_quarter`] [engineNumber=0] [fileIndex=7] [path=test.gla_glis_quarter.csv:469762831] [offset=536871695] [EndOffset=536871696]
...
[2024/05/21 15:55:36.744 +08:00] [INFO] [chunk_process.go:485] ["lance test3"] [table=`test`.`gla_glis_quarter`] [engineNumber=0] [fileIndex=7] [path=test.gla_glis_quarter.csv:469762831] [offset=536871902] [EndOffset=536871696]

@lance6716
Copy link
Contributor

lance6716 commented May 21, 2024

root cause:

For data with lint terminator \r\n and don't set terminator in configuration and strict-format = true, we try to seek every 256MiB and read the next line terminator as the chunk end position

if err = parser.SetPos(endOffset, 0); err != nil {
return nil, nil, err
}
_, pos, err := parser.ReadUntilTerminator()

But unfortunately we seek to \r and ReadUntilTerminator then found \n. So end position of a chunk is set to \n. But elsewhere we use \r as the position. This causes the off-by-1 error

workaround is, set terminator to '\r\n' in configuration. ReadUntilTerminator will work fine because it knows \n is not terminator.

@lance6716 lance6716 added affects-6.5 This bug affects the 6.5.x(LTS) versions. affects-7.1 This bug affects the 7.1.x(LTS) versions. affects-7.5 This bug affects the 7.5.x(LTS) versions. affects-8.1 This bug affects the 8.1.x(LTS) versions. severity/major and removed severity/minor labels May 21, 2024
@ti-chi-bot ti-chi-bot bot added the may-affects-5.4 This bug maybe affects 5.4.x versions. label May 21, 2024
@lance6716 lance6716 added affects-5.3 This bug affects 5.3.x versions. affects-5.4 This bug affects the 5.4.x(LTS) versions. labels May 21, 2024
@ti-chi-bot ti-chi-bot bot removed the may-affects-5.4 This bug maybe affects 5.4.x versions. label May 21, 2024
@seiya-annie
Copy link

/found customer

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
affects-5.3 This bug affects 5.3.x versions. affects-5.4 This bug affects the 5.4.x(LTS) versions. affects-6.1 This bug affects the 6.1.x(LTS) versions. affects-6.2 affects-6.5 This bug affects the 6.5.x(LTS) versions. affects-7.1 This bug affects the 7.1.x(LTS) versions. affects-7.5 This bug affects the 7.5.x(LTS) versions. affects-8.1 This bug affects the 8.1.x(LTS) versions. component/lightning This issue is related to Lightning of TiDB. report/customer Customers have encountered this bug. severity/major type/bug The issue is confirmed as a bug.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants