Distributed deployment #14

Open · khalidpandit opened this issue Apr 26, 2018 · 9 comments

@khalidpandit

How do I distribute the layers of a neural network across different machines?

@tverbele
Member

In a nutshell:

  • Make sure you have multiple DIANNE runtimes and that they are connected together. You can connect them from the command line interface by typing connect <ip of the machine running another runtime> (see the example after this list).
  • Then, in the UI deploy tab, there should be a coloured box for each runtime on the left. First click the box of the runtime you want to deploy to, then click the NN modules you want to deploy there. Repeat until all modules are deployed.
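For example, to link the local runtime to a second one running at 10.0.0.12 (the address is illustrative):

```
connect 10.0.0.12
```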

If you use DIANNE programmatically, you can use the List<ModuleInstanceDTO> deployModules(UUID nnId, List<ModuleDTO> modules, UUID runtimeId, String... tags) method from the DiannePlatform service to deploy modules on whichever runtime is available.
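For illustration, here is a minimal sketch of splitting a network over two runtimes with that method. Only the deployModules signature above comes from the API; the class name, the variable names, and the way you obtain the DiannePlatform reference and the module DTOs (e.g. via OSGi service injection) are assumptions.

```java
import java.util.List;
import java.util.UUID;

// Imports for DiannePlatform, ModuleDTO and ModuleInstanceDTO come from the
// DIANNE API bundles; the exact package names depend on your setup (assumption).
public class DeploySplitExample {

    // Deploy the first half of a network's modules on runtimeA and the
    // second half on runtimeB, using only the deployModules method quoted above.
    public static void deploySplit(DiannePlatform platform, UUID nnId,
                                   List<ModuleDTO> modules,
                                   UUID runtimeA, UUID runtimeB) {
        int half = modules.size() / 2;

        List<ModuleInstanceDTO> onA =
                platform.deployModules(nnId, modules.subList(0, half), runtimeA);
        List<ModuleInstanceDTO> onB =
                platform.deployModules(nnId, modules.subList(half, modules.size()), runtimeB);

        System.out.println("Deployed " + onA.size() + " modules on " + runtimeA
                + " and " + onB.size() + " on " + runtimeB);
    }
}
```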

@khalidpandit
Author

[screenshot: lenet_dist]
Why does it always show one device as idle?

@tverbele
Member

tverbele commented May 1, 2018

Only one evaluation job is running, which by default uses only one device.

@khalidpandit
Author

Exception in thread "pool-9-thread-1" java.lang.OutOfMemoryError

I am getting this error when trying to train the CNN on the CIFAR dataset.
Is the problem with the neural network settings or with system memory?

@tverbele
Member

It depends on your system. The default CIFAR implementation tries to load the entire dataset into (heap) memory, so it might be that you don't have enough memory, or maybe you just need to increase the maximum heap size of the JVM.
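For example, assuming you launch the runtime with a plain java command (the jar name below is a placeholder, not the actual DIANNE launcher):

```
java -Xmx4g -jar dianne-runtime.jar
```

-Xmx4g raises the maximum heap to 4 GB; pick a value that fits your machine.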

@khalidpandit
Author

How can I add two learners on two different devices for distributed learning? It seems both learners end up on the same device.

@tverbele
Member

This is indeed not supported from the builder UI. For such a setup you'll need to submit a learn job, either programmatically or via the command line / dashboard UI, and add an option, e.g. targetCount=2, if you want to run the job on two targets.
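As a rough illustration only: a CLI submission could look like the line below, but the command name, the network/dataset names, and the argument order here are assumptions; check the configuration documentation linked in the next comment for the actual syntax.

```
learn lenet cifar-10 targetCount=2
```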

@khalidpandit
Author

Could you please provide any documentation for that? That would indeed be very helpful.

@tverbele
Member

There is some documentation available on the different configuration options at https://github.com/ibcn-cloudlet/dianne/blob/e62daeb3cd5febd4624f80ba965946c84e78206b/doc/configuration.md
