CC doesn't always stop #420
We have created an issue in Pivotal Tracker to manage this. You can view the current status of your issue at: https://www.pivotaltracker.com/story/show/102210084.
Are there long-running queries that are holding the connection between the CC and the DB? If so, do you know what they are? This will help us reproduce the issue.
Yes, the queries are for /v2/events/..., on a DB with … 6 million records. In the DB, this is an example of one of many queries taking minutes to complete:
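One way to confirm which queries are pinning CC's connections, assuming the CCDB is PostgreSQL, is to ask the database itself. This is a diagnostic sketch, not anything from the CC codebase; the column names assume PostgreSQL 9.2 or later (older versions use `procpid` and `current_query` instead of `pid` and `query`):

```sql
-- Diagnostic only: list connections whose current query has been
-- running for more than one minute, longest-running first.
SELECT pid,
       usename,
       state,
       now() - query_start AS runtime,
       query
FROM pg_stat_activity
WHERE state <> 'idle'
  AND now() - query_start > interval '1 minute'
ORDER BY runtime DESC;
```

Connections that survive a CC restart would show up here still attributed to the old process's queries.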
Hey, we've been doing a bit of investigation over here. We found that according to this line in …

We tried reproducing this by adding the following query to an endpoint and slamming it: …

We also tried coming at it from the other side by lowering max_connections for postgres to a very small number to see if we could max out connections that way. We were still unable to reproduce.

We're a little stumped on what to attempt next to reproduce this. Any thoughts? Thanks,
In doing more debugging, the only logical conclusion is that utils.sh is exiting on the call to the regular kill. |
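One plausible mechanism for that (an assumption — the actual utils.sh would need checking): if the script runs under `set -e`, a bare `kill` that fails, for example because the pid has already exited, returns non-zero and silently aborts the whole script before any later escalation to `kill -9` can run. A minimal sketch of the pitfall and a guard; `safe_kill` is an illustrative name, not a function from utils.sh:

```shell
#!/bin/sh
# Minimal sketch of the suspected pitfall, not the actual utils.sh:
# under `set -e`, a bare `kill` that fails aborts the whole script.
set -e

# Guarded kill: failure is tolerated, so the script keeps going and
# can still escalate to SIGKILL afterwards.
safe_kill() {
  kill "$1" 2>/dev/null || true
}

sleep 60 &
pid=$!
safe_kill "$pid"
wait "$pid" 2>/dev/null || true
safe_kill "$pid"   # pid is gone; a bare `kill` here would abort the script
echo "reached the end of the script"
```

With the guard in place the script reaches its final line; replace the second `safe_kill` with a bare `kill` and, under `set -e`, it dies there instead.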
@fraenkel We just merged this fix: cloudfoundry-attic/shared-release-packages#4
This is actually on the other end. The process never stops. |
@fraenkel Do you have any ideas on what else we could look at to try to reproduce the issue?
So I mimicked the behavior by commenting out all the kills in common/utils.sh.
If you then go back onto api_z1, monit stop fails but monit start succeeds. It causes the same behavior we saw, except that we originally hit it with the scripts unmodified.
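With the kills disabled, monit's own timeout is the only backstop, and a CC blocked in a long DB call can outlive it. A stop sequence that cannot hang indefinitely needs its own escalation. Here is a hedged sketch of that pattern; the names (`stop_process`, `STOP_TIMEOUT`) and the 3-second timeout are illustrative, not taken from utils.sh:

```shell
#!/bin/sh
# Illustrative stop sequence: SIGTERM first, poll briefly, then
# escalate to SIGKILL, which cannot be caught or ignored -- even by a
# process stuck in a long-running DB call.
STOP_TIMEOUT=3   # seconds; illustrative value

stop_process() {
  pid="$1"
  kill "$pid" 2>/dev/null || true
  i=0
  while kill -0 "$pid" 2>/dev/null && [ "$i" -lt "$STOP_TIMEOUT" ]; do
    sleep 1
    i=$((i + 1))
  done
  if kill -0 "$pid" 2>/dev/null; then
    kill -9 "$pid" 2>/dev/null || true
  fi
}

# Demo: a process that ignores SIGTERM is still stopped.
sh -c 'trap "" TERM; sleep 60' &
victim=$!
stop_process "$victim"
wait "$victim" 2>/dev/null || true   # reap; a real ctl script stops a non-child pid
kill -0 "$victim" 2>/dev/null && echo "still running" || echo "stopped"
```

The demo process traps SIGTERM away entirely, so the polling loop runs out and the SIGKILL branch fires; the final check then reports that the process is gone.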
Closing this issue as it looks like it was addressed in the story referenced above. |
We have been hitting situations where CCDB holds CC hostage and prevents the kills from actually stopping CC. We instead see monit restart CC, and then we run into trouble: we eventually run out of DB connections because the previous CC instance is still holding many of them.
Our monit log shows the following:
Notice the Killing at 3:02:26 with no Stopped or Timed out.
We haven't been able to find any additional details other than the CC ctl.log, which only shows the above.