
Marking a plasma manager as dead does not mark its local scheduler as dead. #569

Closed
robertnishihara opened this issue May 19, 2017 · 1 comment

@robertnishihara (Collaborator)

The file `monitor-008015.err` on the head node looks like this.

WARNING:root:Timed out b'plasma_manager'
WARNING:root:Removed b'plasma_manager', client ID 00fb29d393f227ce044542f05065560325fb72fd
WARNING:root:Marked 1274 objects as lost.

The entry in `ray.global_state.client_table()` for this node is the following.

'172.31.30.57': [
  {'ClientType': 'plasma_manager',
   'DBClientID': '00fb29d393f227ce044542f05065560325fb72fd',
   'Deleted': True},
  {'AuxAddress': '172.31.30.57:11227',
   'ClientType': 'local_scheduler',
   'DBClientID': '46139b8d82494ce2480dfd37d98b05fea6da1984',
   'Deleted': False,
   'LocalSchedulerSocketName': '/tmp/scheduler40743926',
   'NumCPUs': 8.0,
   'NumGPUs': 0.0}]

So the plasma manager has been marked as dead, but the local scheduler on the same node has not.
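The inconsistency above can be detected mechanically. Here is a minimal sketch that scans a dict shaped like the `ray.global_state.client_table()` output shown above and reports nodes where a `plasma_manager` is marked `Deleted` while a `local_scheduler` on the same node is not; the helper name `find_inconsistent_nodes` is ours, not part of Ray's API.

```python
def find_inconsistent_nodes(client_table):
    """Return node IPs whose plasma_manager is marked Deleted while a
    local_scheduler on the same node is still marked alive.

    `client_table` is assumed to have the shape shown above:
    {ip: [ {ClientType, DBClientID, Deleted, ...}, ... ]}.
    """
    inconsistent = []
    for ip, clients in client_table.items():
        dead_manager = any(
            c["ClientType"] == "plasma_manager" and c.get("Deleted")
            for c in clients
        )
        live_scheduler = any(
            c["ClientType"] == "local_scheduler" and not c.get("Deleted")
            for c in clients
        )
        if dead_manager and live_scheduler:
            inconsistent.append(ip)
    return inconsistent


# Data copied from the client_table() entry in this issue (trimmed).
client_table = {
    "172.31.30.57": [
        {"ClientType": "plasma_manager",
         "DBClientID": "00fb29d393f227ce044542f05065560325fb72fd",
         "Deleted": True},
        {"ClientType": "local_scheduler",
         "DBClientID": "46139b8d82494ce2480dfd37d98b05fea6da1984",
         "Deleted": False},
    ]
}

print(find_inconsistent_nodes(client_table))  # → ['172.31.30.57']
```

Such a check could run in the monitor after it marks a manager dead, so the matching local scheduler gets marked dead in the same pass.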

When I run new workloads, tasks appear to be scheduled on the node with the "dead" plasma manager. Note that when I run `ps aux | grep "plasma_manager "` on the relevant node, the manager process still seems to be alive.

What is the intended behavior here? If Ray thinks the manager is dead, shouldn't we stop assigning work to that node?

@robertnishihara (Collaborator, Author)

No longer relevant.
