Cannot stop gateway #328

Closed
UnderGreen opened this issue Mar 11, 2015 · 6 comments

Comments

@UnderGreen

I started a local cluster on Ubuntu 14.04 with managers, 1 storage node, and 1 gateway. When I tried to stop the gateway, the command hung (it could only be interrupted with Ctrl-C):

ubuntu@leofs1:/usr/local/leofs/1.2.7$ leo_gateway/bin/leo_gateway stop
ok

And the status after Ctrl-C:

ubuntu@leofs1:/usr/local/leofs/1.2.7$ leofs-adm status
 [System Confiuration]
---------------------------------+----------
 Item                            | Value    
---------------------------------+----------
 Basic/Consistency level
---------------------------------+----------
                  system version | 1.2.7
                      cluster Id | clid
                           DC Id | n1
                  Total replicas | 1
        number of successes of R | 1
        number of successes of W | 1
        number of successes of D | 1
 number of DC-awareness replicas | 0
                       ring size | 2^128
---------------------------------+----------
 Multi DC replication settings
---------------------------------+----------
      max number of joinable DCs | 3
         number of replicas a DC | 1
---------------------------------+----------
 Manager RING hash
---------------------------------+----------
               current ring-hash | 28e8d0fe
              previous ring-hash | 28e8d0fe
---------------------------------+----------

 [State of Node(s)]
-------+-----------------------------+--------------+----------------+----------------+----------------------------
 type  |            node             |    state     |  current ring  |   prev ring    |          updated at         
-------+-----------------------------+--------------+----------------+----------------+----------------------------
  S    | storage_0@10.54.254.24      | running      | 28e8d0fe       | 28e8d0fe       | 2015-03-11 15:23:57 +0600
  G    | gateway_0@10.54.254.24      | stop         | -1             | -1             | 2015-03-11 16:20:37 +0600
-------+-----------------------------+--------------+----------------+----------------+----------------------------

I tried to start the gateway again, but got this error: Node is already running!

@yosukehara
Member

It seems the LeoFS gateway failed to stop its own process. You can check whether the process is still alive with the ps command. If it is, you need to terminate it with the kill command.

$ ps aux | grep leo_gateway

We'll check LeoFS gateway's stop command.
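
For reference, a minimal sketch of that check-and-kill sequence as a shell function. The name `stop_pid` and the 10-second default timeout are assumptions for illustration, not part of LeoFS. It sends SIGTERM first and escalates to SIGKILL only if the process is still alive after the timeout.

```shell
# Hypothetical helper (not part of LeoFS): stop a process by PID.
# Usage: stop_pid PID [TIMEOUT_SECONDS]
stop_pid() {
    pid=$1
    timeout=${2:-10}

    # Ask the process to exit; if the signal cannot be delivered,
    # the process is already gone and there is nothing to do.
    kill -TERM "$pid" 2>/dev/null || return 0

    waited=0
    # "kill -0" sends no signal; it only tests whether the PID exists.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            # Still alive after the timeout: force termination.
            kill -KILL "$pid" 2>/dev/null || true
            break
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Example (commented out): take the PID from the ps output, then stop it.
#   ps aux | grep '[l]eo_gateway'
#   stop_pid <PID from the second column> 10
```

The `[l]eo_gateway` pattern in the commented example only keeps the grep process itself out of the listing.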

@UnderGreen
Author

Yes, the process is still alive. I don't see anything about this problem in the logs. Maybe that information will help you.

@mocchira
Member

@UnderGreen @yosukehara

Googling "erlang init stop hang" showed me what happened.

Here are the useful results.

In short,
if an application fails to declare kernel and stdlib as dependencies,
kernel and stdlib may be unloaded before that application is, resulting in a deadlock.

As with Riak,
LeoFS depends on bear, which doesn't declare kernel and stdlib as dependencies.
This is the root cause.
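
OTP application resource files list an application's dependencies under the `applications` key. A rough way to spot other dependencies with the same omission is sketched below. The function name `missing_base_deps` is hypothetical, and the check is a plain text match rather than a real parse of the Erlang terms, so treat its output as a hint only.

```shell
# Hypothetical helper: print every given .app.src file that does not
# mention both kernel and stdlib. This is a text heuristic only; it does
# not parse the Erlang terms or look inside the applications list.
missing_base_deps() {
    for f in "$@"; do
        if ! grep -q 'kernel' "$f" || ! grep -q 'stdlib' "$f"; then
            echo "$f"
        fi
    done
}

# Example (commented out; the deps/ layout is an assumption):
#   missing_base_deps deps/*/src/*.app.src
```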

@yosukehara should we apply the same method as Riak (fork bear and modify bear.app.src to add kernel and stdlib as dependencies), or send a PR to boundary?

@mocchira
Member

So this could happen on leo_(gateway|storage|manager).

We will go forward with the Riak way for now.

mocchira added a commit to leo-project/folsom that referenced this issue Mar 13, 2015
mocchira added a commit to leo-project/savanna_commons that referenced this issue Mar 13, 2015
yosukehara modified the milestones: 1.2.8, 1.4.0 Mar 17, 2015
@yosukehara
Member

We've fixed this issue by changing the dependency library as follows. The fix will be included in LeoFS v1.2.8.
https://github.com/leo-project/savanna_commons/blob/develop/rebar.config#L28

@yosukehara
Member

Using observer and a script, I've found the root cause: ranch's processes cannot stop.

[Screenshot: 2015-04-07 at 14:56:51]
