-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
upon restart after crash 100% cpu is being used #908
Comments
Thanks for the report, @dwoske We'll investigate the issue soon, but in the meantime, I wanted to propose a quick fix by catching I know that doesn't guarantee that the program won't ever crash without closing the database, so we will still make sure the bug is fixed too. |
Looks like |
@dwoske Can you share your badger directory? I would like to understand why |
This has a pprof dump and my badger directory. I open the DB like this:
I have also seen a problem where if the process exits without doing a DB.Close() -- then upon process restart, open the database, then do a DB.Close() - that DB close never returns. This only happens after a process exit without doing DB.Close() ... open/close, open/close all work fine if they happen as they are supposed to during a single process lifetime. Only when a close is missed, do subsequent closes never return. |
Hi @dwoske, that does seem like a different issue. I tried to write a little program that gives the option to crash without closing the database and everything work correctly with v1.6.0. Could you confirm your problem still occurs with that one and, if so, file a new issue? Thanks! |
I do still see the problem with 1.6 .. I've published my little test program here : https://github.com/dwoske/btest with the pprof. The behavior I saw with 1.6, which I didn't notice with the current master was that the first time stopping the program without a close the cpu pegged for about 30 seconds, I ran it again without cleaning my badger directory, it pegged for what seemed a little longer - but cpu still dropped down to normal (at this point I was thinking 1.6 was fine), the third time I started the cpu pegged and didn't drop after about 5 minutes, and I stopped the program - and that is the profile currently in my repo. So, I think the problem exists (for me) in 1.6 as well. I'll file a new issue. I wonder if this is something with my particular laptop configuration. I can test this tonight when I get home on a different Mac. |
FWIW, this did happen on a different Mac. Same kind of behavior, the first couple of times the CPU dropped to normal after some 30+ seconds, and the third time it stayed pegged for several minutes before I stopped the process. Sounds like you may have recreated this as well since the issue is accepted. Thanks. |
When doneUntil is very large, process loop in watermark can end up in a busy-wait loop. Fix that by choosing to iterate either over integer range or over waiters, depending upon whichever produces a shorter loop. Fixes #908 . Changes: * When doneUntil is very large, process loop in watermark can end up in a busy-wait loop. Fix that. * Only call notifyAndRemove if a waiter is present. * Move updating of w.doneUntil before notifying waiters.
When doneUntil is very large, process loop in watermark can end up in a busy-wait loop. Fix that by choosing to iterate either over integer range or over waiters, depending upon whichever produces a shorter loop. Fixes #908 . Changes: * When doneUntil is very large, process loop in watermark can end up in a busy-wait loop. Fix that. * Only call notifyAndRemove if a waiter is present. * Move updating of w.doneUntil before notifying waiters. (cherry picked from commit 1f8c43a)
What version of Go are you using (
go version
)?What version of Badger are you using?
github.com/dgraph-io/badger v2.0.0-rc.2.0.20190626232749-b116882676f2+incompatible
Does this issue reproduce with the latest master?
Yes, as far as I can tell. I did an update a few days ago.
What are the hardware specifications of the machine (RAM, OS, Disk)?
MacBook Pro (13-inch, 2016, Four Thunderbolt 3 Ports)
2.9 GHz Intel Core i5
16 GB 2133 MHz LPDDR3
500 GB Flash Storage
What did you do?
I'm adding a web-api interface on top of badger, so there is an http listener and a program which does not stop running, except in event of a crash.
I open a badger database and commit some some keys and values. Commit works fine, keys and values can be read from the DB as expected.
If I ctrl-c out of the program without doing a db.Close() - then upon restart the CPU is 100% utilized after db.OpenManaged()
Badger is doing something in the background using all the CPU.
As a test, if I do a db.Close() before exiting the program, cpu usage does not peg like that.
However, I can't ensure that db.Close() will always be called - as my machine or process might crash. I never Close the DB during normal operations of the service.
The pprof during this is :
I've let this run for several minutes and the CPU does not drop down to normal. There can be any (small) number of key/values in the database.
After the restart, Badger appears to be working fine, I can insert/read entries from the database, but the CPU is pegged.
What did you expect to see?
I expect to see Badger recover somehow (quickly) after a process crash, and not peg the CPU utilization.
What did you see instead?
CPU usage pegged at 100%+
Is there something I can call immediately after Commit which puts the DB in a "good" state, so that if the process exits unexpectedly without calling DB.Close() that a restart won't peg the CPU. I tried calling Sync() directly after each commit without any effect.
The text was updated successfully, but these errors were encountered: