WIP/RFC: parallel population optimizer #25
base: master
Interesting. So this is a classic master/slave optimization scheme; it might be useful and should be able to speed up convergence (in wall-clock time). However, it currently fails the Travis CI build (a Julia 0.3 issue?), so I'm not sure how we should handle it right now.
Indeed, it does speed up convergence (interestingly, contrary to what R's …). I guess it will never run on 0.3, because 0.3 lacks the key functionality. Since the PR uses the master branch of MessageUtils, which has no tagged version, it is not straightforward to run on 0.4 either. So it's not ready to merge, but I thought it was worth publishing as a PR to provide a starting point for parallelization for anyone who might be interested.
Parallelization is great. The functions I optimize generally use a lot of temporary memory, so I usually pass a set of preallocated arrays as an extra argument. When optimizing the function, I preallocate the arrays and then pass them in:

```julia
pa = PreallocateArray()
bboptimize(x -> f(x, pa))
```

I'm not sure how closures work with parallel workers. One solution would be to recreate the arrays at every iteration, `bboptimize(x -> f(x, PreallocateArray()))`, but maybe there's a better way. It would be great if this parallel solution handled these cases.
@matthieugomez I think we should be able to safely combine preallocation and parallelization; we just need to make sure that each worker receives its own copy of the "functional object". In v0.4 it could be even simpler. You can define …
Great. Thanks for the explanation.
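One way to keep preallocation safe under parallelism is to bundle the scratch buffer into a callable object, so that whatever copy of the object a worker holds also carries a private buffer. The sketch below assumes Julia 1.x syntax; `BufferedFitness` and its toy objective are hypothetical and are not BlackBoxOptim API. When such an object is sent to another process it is serialized, so the receiving worker gets its own copy of `buf`. Within a single process, however, copies of the struct share the same array, hence the `deepcopy` per task.

```julia
# Hypothetical fitness object that owns its scratch buffer, so each copy of it
# (e.g. the one deserialized on a worker process) has private temporary memory.
struct BufferedFitness
    buf::Vector{Float64}   # preallocated scratch space, reused on every call
end

BufferedFitness(n::Integer) = BufferedFitness(zeros(n))

# Make instances callable, so they can be used wherever a function is expected.
function (f::BufferedFitness)(x::AbstractVector{<:Real})
    length(x) == length(f.buf) ||
        throw(DimensionMismatch("expected $(length(f.buf)) elements, got $(length(x))"))
    f.buf .= (x .- 1.0) .^ 2   # intermediate values go into the buffer, no new array
    return sum(f.buf)          # toy objective: squared distance to (1, 1, ..., 1)
end

fitness = BufferedFitness(3)
fitness([1.0, 2.0, 3.0])       # 0 + 1 + 4 = 5.0

# Within one process, copies of the struct share `buf`, so give each
# concurrently running task its own deep copy.
per_task = [deepcopy(fitness) for _ in 1:4]
```

Because the object behaves like a function, it can be passed in place of the `x -> f(x, pa)` closure, without recreating the arrays at every evaluation.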
Suggestion: allow multiple different WorkerMethods. That might increase robustness on some problems (since no single optimizer is best for all problems) and might also help convergence (weaker optimizers might escape local optima through "inspiration" from a better optimizer, etc.). It would be very nice to also include non-population optimizers, since some of them are much quicker on "simple" problems, and it is well known that DE benefits from early "help" in the form of a few good solutions in the population.
That would definitely be very nice to have. But it requires more advanced communication to solve the load-balancing problems, because the current scheme assumes that all workers have the same migration rate. Ultimately, it would be nice to have something like a parallelized version of Amalgam.
OK, yes. Amalgam is very nice, but then the parallelism would be at a lower level. At some point we should experiment with a parallel evaluation "block" size that we run `pmap` on. If we only send a vector of `numdims` floats per candidate, I guess the overhead should be acceptable for all but very simple fitness functions.
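The block-size idea can be sketched with `pmap` from Julia's standard `Distributed` library, whose `batch_size` keyword groups several candidates into one remote call to amortize messaging overhead. This is a minimal sketch assuming Julia 1.x; `sphere` and `evaluate_population` are hypothetical names, not part of BlackBoxOptim. With no worker processes added, `pmap` simply runs on the master process, so real speedups require calling `addprocs(n)` before the `@everywhere` line.

```julia
using Distributed

# Hypothetical fitness function. @everywhere defines it on every process;
# call addprocs(n) before this line to get actual worker processes.
@everywhere sphere(x) = sum(abs2, x)

# Hypothetical helper: evaluate all candidates with pmap, sending them to
# workers in blocks of `block_size` candidates per remote call.
function evaluate_population(fitness, population::Vector{Vector{Float64}};
                             block_size::Integer = 4)
    return pmap(fitness, population; batch_size = block_size)
end

population = [randn(5) for _ in 1:20]   # 20 candidates, 5 dimensions each
fits = evaluate_population(sphere, population; block_size = 5)
```

Larger blocks mean fewer messages but coarser load balancing, so the best `block_size` likely depends on how expensive a single fitness evaluation is.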
Hi, I am wondering whether an MPI implementation of the above idea would be welcome. If so: I tried to find the `ParallelPopulationOptimizer` code, but somehow failed after forking Alexey Stukalov's repo. How can I get a copy/clone of the relevant files?
@steven-varga Hi, if you want to distribute the calculations over several machines, MPI could be a way to go (you could also consider ZeroMQ or non-local cluster managers; the latter should require minimal modifications to the current code). For a single machine I think the current approach is better, because it uses native Julia task scheduling and data messaging. If you want to test this pull request in Julia, you can try:

```julia
Pkg.add("MessageUtils")
Pkg.checkout("MessageUtils") # parallel optimizer requires the master of MessageUtils
Pkg.clone("https://github.com/alyst/BlackBoxOptim.jl.git")
Pkg.checkout("BlackBoxOptim", branch="parallel_pop_opt")
```

But be warned that the code is still experimental. Also, I've rebased it on top of the new BlackBoxOptim API, but haven't tested it yet. Please let me know if you have further questions.
I've updated the code to use …
Branch updated from 520bebd to 20a2a69.
Updated to use the recently introduced type-parameterized `RemoteRef`s and parallel exception handling.
Branch updated from 9f09ea9 to 8310746.
Branch updated from 8310746 to ed6702d.
@alyst This can now be closed, right, since it was superseded by the ParallelEvaluator code?
@robertfeldt ParallelEvaluator covers methods that generate multiple candidates at once, e.g. NES. This master/slave optimizer is a better fit for DE-like algorithms, so both can coexist. I would leave it open as a PR; at least it gives a starting point for parallelizing multiple different optimizers.
OK, yes, I remember now. Let's leave this as a PR for now. It would be great if you could take a look at the parallel evaluation example so we can help guide people on its use. I guess we should use an NES algorithm in it, then.
This is an attempt to implement some parallelization. Of course, the most straightforward way would be to parallelize the fitness calculation in `rank_by_fitness!()`, but since Julia doesn't have threads [yet], that would mean a lot of data-sending and synchronization overhead. So the goal was to have something that requires less messaging and fewer synchronization points.

`ParallelPopulationOptimizer` basically starts N independent population optimizers on worker processes. After `migrationPeriod` steps, each worker randomly selects `migrationSize` individuals and sends them to the master process. The master process constantly listens for incoming immigrants and distributes them among the other (N-1) workers (and collects the best ones). Workers accept immigrants through the standard `tell!()` interface: an immigrant replaces a random "aborigine" if it is fitter.

Parallelization is implemented using the native Julia `RemoteRef`, `Channel` and `Task` mechanisms. `Channel`s were added only in Julia v0.4, so it would not work on v0.3.

It's definitely not for production use yet, because there's no handling of exceptions in the worker processes. Also, I don't know how efficient this scheme is in terms of convergence, or whether there's a way to improve it. At least in my experiments I can run 8 workers in parallel, and the immigrant acceptance rate is around 20%.
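To make the migration scheme concrete, here is a single-process sketch assuming Julia 1.x. It is not the PR's implementation: `Task`s stand in for worker processes, local `Channel`s stand in for `RemoteRef`s, and a toy random-perturbation step stands in for a real population optimizer. `Individual`, `accept_immigrant!` and `run_islands` are hypothetical names. The acceptance rule mirrors the description: an immigrant replaces a randomly chosen aborigine only if it is fitter.

```julia
using Random

# Hypothetical individual: a candidate vector plus its cached fitness
# (toy sphere objective, lower is better).
struct Individual
    x::Vector{Float64}
    fitness::Float64
end
Individual(x::Vector{Float64}) = Individual(x, sum(abs2, x))

# Acceptance rule: the immigrant replaces a randomly chosen "aborigine"
# only if it is fitter. Returns true when the immigrant was accepted.
function accept_immigrant!(pop::Vector{Individual}, imm::Individual, rng::AbstractRNG)
    i = rand(rng, eachindex(pop))
    if imm.fitness < pop[i].fitness
        pop[i] = imm
        return true
    end
    return false
end

# Single-process sketch: Tasks stand in for worker processes and Channels
# for RemoteRefs; a toy random-perturbation step stands in for a real optimizer.
function run_islands(; nworkers = 4, popsize = 10, ndim = 3, nsteps = 50,
                     migration_period = 10, migration_size = 2, seed = 1)
    init_rng = MersenneTwister(seed)
    pops = [[Individual(randn(init_rng, ndim)) for _ in 1:popsize] for _ in 1:nworkers]
    to_master = Channel{Tuple{Int,Individual}}(Inf)           # (sender, emigrant)
    inboxes = [Channel{Individual}(Inf) for _ in 1:nworkers]   # immigrants per worker
    accepted = zeros(Int, nworkers)                            # accepted immigrants per worker
    best_seen = Ref(Inf)                                       # best immigrant fitness seen by master

    # Master: forward every emigrant to all workers except its sender.
    master = @async for (sender, ind) in to_master
        for w in 1:nworkers
            w == sender || put!(inboxes[w], ind)
        end
        best_seen[] = min(best_seen[], ind.fitness)
    end

    tasks = map(1:nworkers) do w
        @async begin
            rng = MersenneTwister(seed + w)
            pop = pops[w]
            for step in 1:nsteps
                # Toy optimizer step: perturb one individual, keep it if fitter.
                i = rand(rng, eachindex(pop))
                trial = Individual(pop[i].x .+ 0.1 .* randn(rng, ndim))
                trial.fitness < pop[i].fitness && (pop[i] = trial)
                # Try to accept every immigrant that has arrived so far.
                while isready(inboxes[w])
                    accepted[w] += accept_immigrant!(pop, take!(inboxes[w]), rng)
                end
                # Every migration_period steps, emigrate migration_size distinct individuals.
                if step % migration_period == 0
                    for j in randperm(rng, length(pop))[1:migration_size]
                        put!(to_master, (w, pop[j]))
                    end
                end
                yield()   # give the master and the other workers a chance to run
            end
        end
    end

    foreach(wait, tasks)   # wait for all workers to finish
    close(to_master)       # lets the master's loop end once the channel is drained
    wait(master)
    return pops, accepted, best_seen[]
end

pops, accepted, best = run_islands()
println("accepted immigrants per worker: ", accepted, ", best immigrant fitness: ", best)
```

The channels are unbounded, so neither the workers nor the master block on `put!`. A cross-process version in current Julia would presumably replace them with `Distributed.RemoteChannel`s.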