Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sweep increased allocation count with v1.3.4 #214

Closed
Roasbeef opened this issue Apr 3, 2020 · 16 comments · Fixed by #220
Closed

sweep increased allocation count with v1.3.4 #214

Roasbeef opened this issue Apr 3, 2020 · 16 comments · Fixed by #220

Comments

@Roasbeef
Copy link

Roasbeef commented Apr 3, 2020

After updating our project to the latest version of bbolt which includes #201, we've started running into this runtime error in our tests:

runtime: nelems=21 nalloc=3 previous allocCount=2 nfreed=65535
fatal error: sweep increased allocation count
runtime stack:
runtime.throw(0xc2ecf1, 0x20)
	/home/travis/.gimme/versions/go1.13.linux.amd64/src/runtime/panic.go:774 +0x72
runtime.(*mspan).sweep(0x7f399ffced10, 0x7f399ffced00, 0x455700)
	/home/travis/.gimme/versions/go1.13.linux.amd64/src/runtime/mgcsweep.go:328 +0x8e1
runtime.(*mcentral).uncacheSpan(0x11e9d60, 0x7f399ffced10)
	/home/travis/.gimme/versions/go1.13.linux.amd64/src/runtime/mcentral.go:197 +0x79
runtime.(*mcache).releaseAll(0x7f39a00cd008)
	/home/travis/.gimme/versions/go1.13.linux.amd64/src/runtime/mcache.go:158 +0x6b
runtime.(*mcache).prepareForSweep(0x7f39a00cd008)
	/home/travis/.gimme/versions/go1.13.linux.amd64/src/runtime/mcache.go:185 +0x46
runtime.acquirep(0xc000042000)
	/home/travis/.gimme/versions/go1.13.linux.amd64/src/runtime/proc.go:4114 +0x3d
runtime.stopm()
	/home/travis/.gimme/versions/go1.13.linux.amd64/src/runtime/proc.go:1930 +0xe8
runtime.gcstopm()
	/home/travis/.gimme/versions/go1.13.linux.amd64/src/runtime/proc.go:2125 +0xb4
runtime.schedule()
	/home/travis/.gimme/versions/go1.13.linux.amd64/src/runtime/proc.go:2481 +0x473
runtime.exitsyscall0(0xc000001680)
	/home/travis/.gimme/versions/go1.13.linux.amd64/src/runtime/proc.go:3141 +0x116
runtime.mcall(0x800000)
	/home/travis/.gimme/versions/go1.13.linux.amd64/src/runtime/asm_amd64.s:318 +0x5b

So far, it only seems to happen about every 1 in 5 test runs. At first we were running the tests with Go 1.14, then tried to go back to version 1.13, but the issue would still pop up. Only after downgrading to bbolt v1.3.3 does the issue seem to disappear. Given that the issue seems to persist using the last two major versions of Go, it may be the case that #201 uncovered another underlying issue, or the changes in the PR are actually triggering this new error.

We're working on creating a reproducible program to trigger this warning.

The full trace of this error in our tests can be found here: https://paste.ubuntu.com/p/XNgrZ49dbw/

@mdogan
Copy link

mdogan commented Apr 4, 2020

I have started to see the same issue after upgrading to v1.3.4. And it's appearing on my Github workflow builds very frequently. But I was able to reproduce on my dev machine only once.

 runtime: nelems=73 nalloc=5 previous allocCount=4 nfreed=65535
 fatal error: sweep increased allocation count

 runtime stack:
 runtime.throw(0x78eb89, 0x20)
 	/opt/hostedtoolcache/go/1.14.1/x64/src/runtime/panic.go:1114 +0x72
 runtime.(*mspan).sweep(0x7f09586fc868, 0x7f09586fc800, 0x432300)
 	/opt/hostedtoolcache/go/1.14.1/x64/src/runtime/mgcsweep.go:328 +0x8b8
 runtime.(*mcentral).uncacheSpan(0xa50b78, 0x7f09586fc868)
 	/opt/hostedtoolcache/go/1.14.1/x64/src/runtime/mcentral.go:197 +0x79
 runtime.(*mcache).releaseAll(0x7f0982675108)
 	/opt/hostedtoolcache/go/1.14.1/x64/src/runtime/mcache.go:158 +0x6b
 runtime.(*mcache).prepareForSweep(0x7f0982675108)
 	/opt/hostedtoolcache/go/1.14.1/x64/src/runtime/mcache.go:185 +0x46
 runtime.acquirep(0xc000024000)
 	/opt/hostedtoolcache/go/1.14.1/x64/src/runtime/proc.go:4278 +0x3d
 runtime.stopm()
 	/opt/hostedtoolcache/go/1.14.1/x64/src/runtime/proc.go:1830 +0xe8
 runtime.gcstopm()
 	/opt/hostedtoolcache/go/1.14.1/x64/src/runtime/proc.go:2028 +0xb4
 runtime.schedule()
 	/opt/hostedtoolcache/go/1.14.1/x64/src/runtime/proc.go:2469 +0x497
 runtime.goschedImpl(0xc000001680)
 	/opt/hostedtoolcache/go/1.14.1/x64/src/runtime/proc.go:2705 +0xd6
 runtime.gopreempt_m(0xc000001680)
 	/opt/hostedtoolcache/go/1.14.1/x64/src/runtime/proc.go:2733 +0x34
 runtime.newstack()
 	/opt/hostedtoolcache/go/1.14.1/x64/src/runtime/stack.go:1025 +0x1ae
 runtime.morestack()
 	/opt/hostedtoolcache/go/1.14.1/x64/src/runtime/asm_amd64.s:449 +0x8f

 goroutine 39 [runnable]:
 go.etcd.io/bbolt.(*Tx).write(0xc001dc2000, 0x882cc4d2, 0xa281a0)
 	/home/runner/go/pkg/mod/go.etcd.io/bbolt@v1.3.4/tx.go:589 +0x3f7
 go.etcd.io/bbolt.(*Tx).Commit(0xc001dc2000, 0x0, 0x0)
 	/home/runner/go/pkg/mod/go.etcd.io/bbolt@v1.3.4/tx.go:186 +0x204
 go.etcd.io/bbolt.(*DB).Update(0xc0002a9800, 0xc000337ec0, 0x0, 0x0)
 	/home/runner/go/pkg/mod/go.etcd.io/bbolt@v1.3.4/db.go:701 +0xf6
 github.com/xyz/qwe/persistence.(*Store).Delete(0xc0003549c0, 0xc001d63710, 0x6, 0xc001d63708, 0x3)
 	/home/runner/work/xyz/qwe/persistence/bolt.go:186 +0x6b
 github.com/xyz/qwe/persistence.Test_Store_DataSize(0xc0000cafc0)
 	/home/runner/work/xyz/qwe/persistence/bolt_test.go:199 +0x239
 testing.tRunner(0xc0000cafc0, 0x79e3f8)
 	/opt/hostedtoolcache/go/1.14.1/x64/src/testing/testing.go:992 +0xdc
 created by testing.(*T).Run
 	/opt/hostedtoolcache/go/1.14.1/x64/src/testing/testing.go:1043 +0x357

 goroutine 6 [runnable]:
 go.etcd.io/bbolt.(*Tx).checkBucket.func1(0x7f095898b000, 0x1)
 	/home/runner/go/pkg/mod/go.etcd.io/bbolt@v1.3.4/tx.go:477 +0xed
 go.etcd.io/bbolt.(*Tx).forEachPage(0xc000210700, 0xd, 0x1, 0xc000040c30)
 	/home/runner/go/pkg/mod/go.etcd.io/bbolt@v1.3.4/tx.go:637 +0x80
 go.etcd.io/bbolt.(*Tx).forEachPage(0xc000210700, 0x1d, 0x0, 0xc000040c30)
 	/home/runner/go/pkg/mod/go.etcd.io/bbolt@v1.3.4/tx.go:643 +0xd8
 go.etcd.io/bbolt.(*Tx).checkBucket(0xc000210700, 0xc000012440, 0xc000040ed0, 0xc000040f00, 0xc0000801e0)
 	/home/runner/go/pkg/mod/go.etcd.io/bbolt@v1.3.4/tx.go:466 +0xd4
 go.etcd.io/bbolt.(*Tx).checkBucket.func2(0x7f095899d020, 0x1, 0x1, 0x0, 0x0, 0x0, 0x0, 0x0)
 	/home/runner/go/pkg/mod/go.etcd.io/bbolt@v1.3.4/tx.go:491 +0xc0
 go.etcd.io/bbolt.(*Bucket).ForEach(0xc000210718, 0xc000040d58, 0x0, 0xc000035d88)
 	/home/runner/go/pkg/mod/go.etcd.io/bbolt@v1.3.4/bucket.go:390 +0xfe
 go.etcd.io/bbolt.(*Tx).checkBucket(0xc000210700, 0xc000210718, 0xc000035ed0, 0xc000035f00, 0xc0000801e0)
 	/home/runner/go/pkg/mod/go.etcd.io/bbolt@v1.3.4/tx.go:489 +0x146
 go.etcd.io/bbolt.(*Tx).check(0xc000210700, 0xc0000801e0)
 	/home/runner/go/pkg/mod/go.etcd.io/bbolt@v1.3.4/tx.go:445 +0x556
 created by go.etcd.io/bbolt.(*Tx).Check
 	/home/runner/go/pkg/mod/go.etcd.io/bbolt@v1.3.4/tx.go:415 +0x67

@tmm1
Copy link
Contributor

tmm1 commented Apr 4, 2020

Since the crash seems to happen inside the golang runtime, I think it would be wise to open an issue on golang/go as well to put it on their radar

@liggitt
Copy link

liggitt commented Apr 4, 2020

cc @jpbetz

@mxpv
Copy link

mxpv commented Apr 5, 2020

We can confirm this happening in containerd after recent upgrade to 1.3.4 as well

@jrick
Copy link
Contributor

jrick commented Apr 6, 2020

Can others test https://github.com/jrick/bbolt/tree/memfix? It has fixed these issues for some of our usage, but I'm unsure if it's a complete fix. I'm also not entirely sure what the problem was to begin with and how or why this branch behaves differently.

@liggitt
Copy link

liggitt commented Apr 20, 2020

@gyuho, should #201 be reverted while this issue remains? if I'm reading this correctly, master and the latest release are still known bad against go 1.14.x (which #201 was fixing) and go1.13 ("At first we were running the tests with Go 1.14, then tried to go back to version 1.13, but the would still pop up.")

@Roasbeef
Copy link
Author

if I'm reading this correctly, master and the latest release are still known bad against go 1.14.x (which #201 was fixing) and go1.13 ("At first we were running the tests with Go 1.14, then tried to go back to version 1.13, but the would still pop up.")

Yes, what's stable for us is: Go 1.13.x and bbolt v1.3.3 (so the prior tag). We're running @jrick's possible fix branch on our continuous integration (against Go 1.4.2) now to see if it resolves the issues we were running into at least. Will report back in a day or two.

@gouthamve
Copy link

@Roasbeef Sorry to ping you here. I was curious how the experiment went. Our upgrade to go1.14 is blocked on this and we'd like to help if we can.

@jrick
Copy link
Contributor

jrick commented Apr 27, 2020

@gouthamve you can try out my branch as well if you're interested in testing. I'm still looking for feedback on it before I open another pull request.

This command will apply the replacement for the test:

go mod edit -replace=go.etcd.io/bbolt=github.com/jrick/bbolt@memfix

@Roasbeef
Copy link
Author

@jrick's memfix branch appears to have resolved the issues we were running into in our tests! I think a draft PR for this fix should be created now so those with knowledge of both the bbolt codebase, as well as the inner details of safe unsafe usage within the runtime can vet it to ensure it doesn't break anything else before we declare this our fix (looks good so far though).

liggitt referenced this issue in etcd-io/etcd May 19, 2020
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
@gyuho
Copy link
Contributor

gyuho commented May 19, 2020

@Roasbeef could you share the tests? Is that something that can be added to bbolt as regression testing?

@Roasbeef
Copy link
Author

@gyuho I think this is the test that was used to repro: https://github.com/matheusd/bbolt-sweep-bug/blob/master/my_test.go

@jrick
Copy link
Contributor

jrick commented May 23, 2020

@gyuho I've committed a modified version of that test to my PR.

@arjunsingri
Copy link

Is it okay if we use go 1.14.x and bbolt 1.3.3? Has anyone experimented with that? Please let me know.

@andrewpmartinez
Copy link
Contributor

This is currently cropping up in one of my projects. I see that two days ago #220 was merged to correct this. I am not familiar with this project's timeline for patch releases: when can we expected this to be tagged for a version?

andrewpmartinez added a commit to openziti/foundation that referenced this issue Jun 17, 2020
- need fix from etcd-io/bbolt#214 and
  etcd-io/bbolt#220 for full go 1.14 compat
@gyuho
Copy link
Contributor

gyuho commented Jun 17, 2020

AkihiroSuda added a commit to AkihiroSuda/containerd that referenced this issue Jun 22, 2020
We had once updated bbolt from v1.3.3 to v1.3.4 in containerd#4134,
but reverted to v1.3.3 in containerd#4156 due to "fatal error: sweep increased
allocation count" (etcd-io/bbolt#214).

The issue was fixed in bbolt v1.3.5 (etcd-io/bbolt#220).

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
fahedouch pushed a commit to fahedouch/containerd that referenced this issue Aug 7, 2020
We had once updated bbolt from v1.3.3 to v1.3.4 in containerd#4134,
but reverted to v1.3.3 in containerd#4156 due to "fatal error: sweep increased
allocation count" (etcd-io/bbolt#214).

The issue was fixed in bbolt v1.3.5 (etcd-io/bbolt#220).

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
tussennet pushed a commit to tussennet/containerd that referenced this issue Sep 11, 2020
We had once updated bbolt from v1.3.3 to v1.3.4 in containerd#4134,
but reverted to v1.3.3 in containerd#4156 due to "fatal error: sweep increased
allocation count" (etcd-io/bbolt#214).

The issue was fixed in bbolt v1.3.5 (etcd-io/bbolt#220).

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
AkihiroSuda added a commit to AkihiroSuda/containerd that referenced this issue Nov 10, 2020
We had once updated bbolt from v1.3.3 to v1.3.4 in containerd#4134,
but reverted to v1.3.3 in containerd#4156 due to "fatal error: sweep increased
allocation count" (etcd-io/bbolt#214).

The issue was fixed in bbolt v1.3.5 (etcd-io/bbolt#220).

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
(cherry picked from commit bebfbab)
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
plorenz pushed a commit to openziti/transport that referenced this issue Mar 30, 2022
- need fix from etcd-io/bbolt#214 and
  etcd-io/bbolt#220 for full go 1.14 compat
plorenz pushed a commit to openziti/storage that referenced this issue Mar 30, 2022
- need fix from etcd-io/bbolt#214 and
  etcd-io/bbolt#220 for full go 1.14 compat
plorenz pushed a commit to openziti/identity that referenced this issue Jun 29, 2022
- need fix from etcd-io/bbolt#214 and
  etcd-io/bbolt#220 for full go 1.14 compat
plorenz pushed a commit to openziti/agent that referenced this issue Jun 29, 2022
- need fix from etcd-io/bbolt#214 and
  etcd-io/bbolt#220 for full go 1.14 compat
plorenz pushed a commit to openziti/metrics that referenced this issue Jun 29, 2022
- need fix from etcd-io/bbolt#214 and
  etcd-io/bbolt#220 for full go 1.14 compat
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging a pull request may close this issue.

10 participants