
parallel MCMC on 0.5 #109

Open
tpapp opened this issue Nov 29, 2016 · 23 comments

@tpapp
Contributor

tpapp commented Nov 29, 2016

Currently pmap seems to be disabled for v"0.5.0". I am wondering if there is a workaround, e.g. running the chains separately and then somehow merging the results. I usually run 3–5 chains, so the speed gains would be significant.
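A split-and-merge workaround along these lines might look like the sketch below. The `mcmc` keyword arguments and the assumption that `ModelChains` results can be concatenated with `cat` along the chain dimension are guesses, not verified against Mamba's API:

```julia
# Hypothetical sketch: run each chain as its own single-chain call,
# then merge the results. `model`, `inputs`, and `inits` are assumed
# to be set up as in the usual Mamba examples.
sims = map(1:3) do k
    # one chain per call; these calls could also be farmed out
    # to workers with @spawn/fetch
    mcmc(model, inputs, inits[k:k], 10000, burnin=2500, thin=2, chains=1)
end

# merge along the chain dimension (dim 3), assuming ModelChains
# supports cat like an ordinary 3-d array
sim = cat(3, sims...)
```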

@bdeonovic
Contributor

The issue is with Julia's pmap; we are waiting on some updates to it: JuliaLang/julia#19000

@amitmurthy

What exactly is the issue? While the keyword arguments to pmap have changed, the behavior has not.

@bdeonovic
Contributor

I think @brian-j-smith will be better able to answer this question, but if I recall correctly, the issue was that not all anonymous functions were being sent out to the workers.

@amitmurthy

AFAIK the issue is with capturing globals in the anonymous function, which existed prior to 0.5 too. A workaround for that issue is to wrap the closure creation in a let block:

julia> a=1
1

julia> remotecall_fetch(()->a, 2)
ERROR: On worker 2:
UndefVarError: a not defined
 in #1 at ./REPL[3]:1
.....

julia> let a=a; remotecall_fetch(()->a, 2); end
1

@brian-j-smith
Owner

The pmap errors we are getting now in 0.5 do not occur in 0.4. Below is the 0.5 error, for what it's worth. When I get some free time, I'll try to come up with a stand-alone test case that reproduces the error.

ERROR: LoadError: On worker 2:
UndefVarError: ##298#299 not defined
 in deserialize_datatype at .\serialize.jl:823
 in handle_deserialize at .\serialize.jl:571
 in deserialize at .\serialize.jl:881
 in deserialize_datatype at .\serialize.jl:835
 in handle_deserialize at .\serialize.jl:571
 in deserialize at .\serialize.jl:894
 in deserialize_datatype at .\serialize.jl:835
 in handle_deserialize at .\serialize.jl:571
 in deserialize at .\serialize.jl:881
 in deserialize_datatype at .\serialize.jl:835
 in handle_deserialize at .\serialize.jl:571
 in deserialize_array at .\serialize.jl:709
 in handle_deserialize at .\serialize.jl:569
 in deserialize at .\serialize.jl:541
 in ntuple at .\tuple.jl:65
 in handle_deserialize at .\serialize.jl:562
 in deserialize_msg at .\multi.jl:120
 in message_handler_loop at .\multi.jl:1317
 in process_tcp_streams at .\multi.jl:1276
 in #620 at .\event.jl:68
 in #remotecall_fetch#608(::Array{Any,1}, ::Function, ::Function, ::Base.Worker, ::Array{Any,1}, ::Vararg{Array{Any,1},N}) at .\multi.jl:1070
 in remotecall_fetch(::Function, ::Base.Worker, ::Array{Any,1}, ::Vararg{Array{Any,1},N}) at .\multi.jl:1062
 in #remotecall_fetch#611(::Array{Any,1}, ::Function, ::Function, ::Int64, ::Array{Any,1}, ::Vararg{Array{Any,1},N}) at .\multi.jl:1080
 in remotecall_fetch(::Function, ::Int64, ::Array{Any,1}, ::Vararg{Array{Any,1},N}) at .\multi.jl:1080
 in #remotecall_pool#691(::Array{Any,1}, ::Function, ::Function, ::Function, ::WorkerPool, ::Array{Any,1}, ::Vararg{Array{Any,1},N}) at .\workerpool.jl:93
 in remotecall_pool(::Function, ::Function, ::WorkerPool, ::Array{Any,1}, ::Vararg{Array{Any,1},N}) at .\workerpool.jl:91
 in #remotecall_fetch#694(::Array{Any,1}, ::Function, ::Function, ::WorkerPool, ::Array{Any,1}, ::Vararg{Array{Any,1},N}) at .\workerpool.jl:124
 in remotecall_fetch(::Function, ::WorkerPool, ::Array{Any,1}, ::Vararg{Array{Any,1},N}) at .\workerpool.jl:124
 in (::Base.###699#700#702{WorkerPool,Mamba.#mcmc_worker!})(::Array{Any,1}, ::Function, ::Array{Any,1}, ::Vararg{Array{Any,1},N}) at .\workerpool.jl:151
 in (::Base.##699#701)(::Array{Any,1}, ::Vararg{Array{Any,1},N}) at .\workerpool.jl:151
 in macro expansion at .\asyncmap.jl:63 [inlined]
 in (::Base.##757#759{Base.AsyncCollector,Base.AsyncCollectorState})() at .\task.jl:360

@amitmurthy

This is the reduced case:

julia> c = x->x
(anonymous function)

julia> d = x->c(x)
(anonymous function)

On 0.4:

julia> pmap(d, 1:2)
2-element Array{Any,1}:
 1
 2

On 0.5:

julia> pmap(d, 1:2)
ERROR: On worker 2:
UndefVarError: c not defined
 in #3 at ./REPL[2]:1

@amitmurthy

Ref: JuliaLang/julia#19456

@brian-j-smith
Owner

Great! Thanks for coming up with a reduced case. I'll keep an eye on your other thread and possible developments in Julia to address this. A workaround on our package side may be challenging, since some anonymous functions are user-supplied.

@bdeonovic
Contributor

@amitmurthy still having pmap issues in v0.6

julia> include("../deonovib\\.julia/v0.6\\Mamba\\doc/examples/rats.jl")
MCMC Simulation of 10000 Iterations x 2 Chains...

ERROR: LoadError: On worker 2:
UndefVarError: ##332#333 not defined
deserialize_datatype at .\serialize.jl:969
handle_deserialize at .\serialize.jl:674
deserialize at .\serialize.jl:634
handle_deserialize at .\serialize.jl:681
deserialize at .\serialize.jl:1075
handle_deserialize at .\serialize.jl:687
deserialize at .\serialize.jl:1088
handle_deserialize at .\serialize.jl:687
deserialize at .\serialize.jl:1075
handle_deserialize at .\serialize.jl:687
deserialize_array at .\serialize.jl:871
handle_deserialize at .\serialize.jl:672
deserialize at .\serialize.jl:634
ntuple at .\tuple.jl:108
handle_deserialize at .\serialize.jl:664
deserialize_msg at .\distributed\messages.jl:98
message_handler_loop at .\distributed\process_messages.jl:161
process_tcp_streams at .\distributed\process_messages.jl:118
#99 at .\event.jl:73
Stacktrace:
 [1] #570 at .\asyncmap.jl:178 [inlined]
 [2] foreach(::Base.##570#572, ::Array{Any,1}) at .\abstractarray.jl:1731
 [3] maptwice(::Function, ::Channel{Any}, ::Array{Any,1}, ::Array{Array{Any,1},1}, ::Vararg{Array{Array{Any,1},1},N} where N) at .\asyncmap.jl:178
 [4] wrap_n_exec_twice(::Channel{Any}, ::Array{Any,1}, ::Base.Distributed.##204#207{WorkerPool}, ::Function, ::Array{Array{Any,1},1}, ::Vararg{Array{Array{Any,1},1},N} where N) at .\asyncmap.jl:154
 [5] #async_usemap#555(::Function, ::Void, ::Function, ::Base.Distributed.##188#190, ::Array{Array{Any,1},1}, ::Vararg{Array{Array{Any,1},1},N} where N) at .\asyncmap.jl:103
 [6] (::Base.#kw##async_usemap)(::Array{Any,1}, ::Base.#async_usemap, ::Function, ::Array{Array{Any,1},1}, ::Vararg{Array{Array{Any,1},1},N} where N) at .\<missing>:0
 [7] (::Base.#kw##asyncmap)(::Array{Any,1}, ::Base.#asyncmap, ::Function, ::Array{Array{Any,1},1}) at .\<missing>:0
 [8] #pmap#203(::Bool, ::Int64, ::Void, ::Array{Any,1}, ::Void, ::Function, ::WorkerPool, ::Function, ::Array{Array{Any,1},1}) at .\distributed\pmap.jl:126
 [9] pmap(::WorkerPool, ::Function, ::Array{Array{Any,1},1}) at .\distributed\pmap.jl:101
 [10] #pmap#213(::Array{Any,1}, ::Function, ::Function, ::Array{Array{Any,1},1}) at .\distributed\pmap.jl:156
 [11] pmap2(::Function, ::Array{Array{Any,1},1}) at C:\Users\deonovib\.julia\v0.6\Mamba\src\utils.jl:97
 [12] mcmc_master!(::Mamba.Model, ::UnitRange{Int64}, ::Int64, ::Int64, ::UnitRange{Int64}, ::Bool) at C:\Users\deonovib\.julia\v0.6\Mamba\src\model\mcmc.jl:52
 [13] #mcmc#29(::Int64, ::Int64, ::Int64, ::Bool, ::Function, ::Mamba.Model, ::Dict{Symbol,Any}, ::Array{Dict{Symbol,Any},1}, ::Int64) at C:\Users\deonovib\.julia\v0.6\Mamba\src\model\mcmc.jl:32
 [14] (::Mamba.#kw##mcmc)(::Array{Any,1}, ::Mamba.#mcmc, ::Mamba.Model, ::Dict{Symbol,Any}, ::Array{Dict{Symbol,Any},1}, ::Int64) at .\<missing>:0
 [15] include_from_node1(::String) at .\loading.jl:569
 [16] include(::String) at .\sysimg.jl:14
while loading C:\Users\deonovib\.julia\v0.6\Mamba\doc\examples\rats.jl, in expression starting on line 121

@amitmurthy

Does your code serialize the same anonymous function across multiple workers? I.e., it is initially defined on wrkr_1, which sends it to wrkr_2, which sends it to wrkr_3, and so on (see the code example in JuliaLang/julia#22675 (comment)). If so, this should be fixed by JuliaLang/julia#22675, which is pending backport onto 0.6.
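Concretely, the relay pattern in question might be reproduced with something like this sketch (worker ids assumed; on 0.6 these functions live in Base):

```julia
addprocs(3)

f = x -> x + 1   # anonymous function defined on the master

# Worker 2 receives `f` and must re-serialize it when it forwards
# the call to worker 3. The `let` avoids the global-capture issue
# mentioned earlier in this thread.
result = let f = f
    remotecall_fetch(2) do
        remotecall_fetch(g -> g(1), 3, f)
    end
end
```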

@bdeonovic
Contributor

bdeonovic commented Jul 18, 2017

I don't think that is the issue. The relevant code seems like a straightforward use of pmap:

  lsts = [
    Any[m, states[k], window, burnin, thin, ChainProgress(frame, k, N)]
    for k in chains
  ]
  results = pmap(mcmc_worker!, lsts)

where m is a type that contains anonymous functions. I think m is the real issue. It is:

  type Model
    nodes::Dict{Symbol, Any}
    samplers::Vector{Sampler}
    states::Vector{ModelState}
    iter::Int
    burnin::Int
    hasinputs::Bool
    hasinits::Bool
  end

and Sampler is

  type Sampler{T}
    params::Vector{Symbol}
    eval::Function
    tune::T
    targets::Vector{Symbol}
  end

where a Sampler might look something like:

function RWM{T<:Real}(params::ElementOrVector{Symbol},
                      scale::ElementOrVector{T}; args...)
  samplerfx = function(model::Model, block::Integer)
    block = SamplingBlock(model, block, true)
    v = SamplerVariate(block, scale; args...)
    sample!(v, x -> logpdf!(block, x))
    relist(block, v)
  end
  Sampler(params, samplerfx, RWMTune())
end

so I think we are having trouble passing m to workers because it has quite a heavy nesting of types, named functions, and anonymous functions.

@amitmurthy

Is there a sample code that I can test with locally? I assume all dependent packages are publicly available?

@bdeonovic
Contributor

bdeonovic commented Jul 18, 2017

Currently we have it hacked such that only sequential processing is possible. Simply Pkg.add("Mamba") and then modify the file .julia/v0.6/Mamba/src/utils.jl by adding a comment character before the version check:

...
  if (nprocs() > 1) #& (VERSION < v"0.5-")
...

Run any of the examples that utilize multiple chains and you will get the error.

e.g.

 include(".julia/v0.6/Mamba/doc/examples/rats.jl")

or after doing the include you can also replicate the error with

remotecall_fetch(fieldnames, 2, model)

which is a little more direct and shows that the issue is sending model out to the different workers.

@amitmurthy

On OSX I had trouble installing Mamba (Julia 0.6): Gadfly-related errors, and it does not work even with a checkout of Gadfly master. Will try again after a few days.

@bdeonovic
Contributor

What was the error? That seems strange, since Travis shows it installing and all tests passing on OSX: https://travis-ci.org/brian-j-smith/Mamba.jl

@amitmurthy

Fails with

WARNING: could not import Multimedia.@try_display into Gadfly
ERROR: LoadError: UndefVarError: @try_display not defined
Stacktrace:
 [1] include_from_node1(::String) at ./loading.jl:569
 [2] include(::String) at ./sysimg.jl:14
 [3] anonymous at ./<missing>:2
while loading /Users/amitm/.julia/v0.6/Gadfly/src/Gadfly.jl, in expression starting on line 1052
ERROR: LoadError: Failed to precompile Gadfly to /Users/amitm/.julia/lib/v0.6/Gadfly.ji.
Stacktrace:
 [1] compilecache(::String) at ./loading.jl:703
 [2] _require(::Symbol) at ./loading.jl:490

@amitmurthy

I'll delete the v0.6 packages directory and try again.

@amitmurthy

The immediate culprit appears to be model.nodes[:s2_alpha].eval with

julia> typeof(model.nodes[:s2_alpha].eval)
Mamba.##338#339

It is not being serialized because it is defined under Mamba. Where and how is it being initialized?

@amitmurthy

julia> addprocs(2)
julia> @everywhere module Foo
           type Bar
               x
               Bar() = new(()->1) 
           end
       end

WARNING: deprecated syntax "type" at REPL[2]:2.
Use "mutable struct" instead.

julia> bar = Foo.Bar()
Foo.Bar(Foo.#1)

julia> remotecall_fetch(typeof, 2, bar.x)
Foo.##1#2

However, if the anonymous function is explicitly created under module Foo

julia> bar.x = eval(Foo, :(()->1))
(::#3) (generic function with 1 method)

julia> remotecall_fetch(typeof, 2, bar.x)
ERROR: On worker 2:
UndefVarError: ##3#4 not defined
deserialize_datatype at ./serialize.jl:986
handle_deserialize at ./serialize.jl:687
deserialize at ./serialize.jl:647

@amitmurthy

julia> addprocs(2);

julia> @everywhere module Foo
           mutable struct Bar
               x
               Bar() = new(()->1) 
           end

           make_bar() = (b=Bar(); b.x=()->1; b)
       end;


julia> bar = Foo.Bar();

julia> typeof(bar.x)
Foo.##1#2

julia> remotecall_fetch(typeof, 2, bar.x); # OK

julia> bar = Foo.make_bar();

julia> typeof(bar.x)
Foo.##3#4

julia> remotecall_fetch(typeof, 2, bar.x); # OK

julia> bar.x = eval(Foo, :(()->1));

julia> typeof(bar.x)
Foo.##5#6

julia> remotecall_fetch(typeof, 2, bar.x); # Not OK
ERROR: On worker 2:
UndefVarError: ##5#6 not defined
deserialize_datatype at ./serialize.jl:986
handle_deserialize at ./serialize.jl:687

Only when the anonymous function is explicitly defined under Foo does there seem to be a problem.

@bdeonovic
Contributor

Thanks for the extensive analysis, @amitmurthy. What is the next step? Is this something that needs to be addressed in our package, or is this something on the JuliaLang side that could be addressed?

@amitmurthy

Both, I think.

You can check if you can avoid an eval under module Mamba. I'll open an issue on JuliaLang for addressing this.

@amitmurthy

amitmurthy commented Jul 21, 2017

You can follow the discussion on the linked issue above.

Meanwhile, I could run rats.jl in parallel on 0.6 by eval'ing into Main.

diff --git a/src/utils.jl b/src/utils.jl
index 438dbd3..aee2a00 100644
--- a/src/utils.jl
+++ b/src/utils.jl
@@ -7,7 +7,7 @@ end
 function modelfxsrc(literalargs::Vector{Tuple{Symbol, DataType}}, f::Function)
   args = Expr(:tuple, map(arg -> Expr(:(::), arg[1], arg[2]), literalargs)...)
   expr, src = modelexprsrc(f, literalargs)
-  fx = eval(Expr(:function, args, expr))
+  fx = eval(Main, Expr(:function, args, expr))
   (fx, src)
 end
 
@@ -92,7 +92,7 @@ end
 ## called and will apply its error processing.
 
 function pmap2(f::Function, lsts::AbstractArray)
-  if (nprocs() > 1) & (VERSION < v"0.5-")
+  if (nprocs() > 1) #& (VERSION < v"0.5-")
     @everywhere importall Mamba
     pmap(f, lsts)
   else
diff --git a/src/variate.jl b/src/variate.jl
index 8875ad0..1c6b91c 100644
--- a/src/variate.jl
+++ b/src/variate.jl
@@ -112,7 +112,7 @@ const BinaryScalarMethods = [
 ]
 
 for op in BinaryScalarMethods
-  @eval ($op)(x::ScalarVariate, y::ScalarVariate) = ($op)(x.value, y.value)
+  @eval Main ($op)(x::Mamba.ScalarVariate, y::Mamba.ScalarVariate) = ($op)(x.value, y.value)
 end
 
 const RoundScalarMethods = [
@@ -123,8 +123,8 @@ const RoundScalarMethods = [
 ]
 
 for op in RoundScalarMethods
-  @eval ($op)(x::ScalarVariate) = ($op)(x.value)
-  @eval ($op){T}(::Type{T}, x::ScalarVariate) = ($op)(T, x.value)
+  @eval Main ($op)(x::Mamba.ScalarVariate) = ($op)(x.value)
+  @eval Main ($op){T}(::Type{T}, x::Mamba.ScalarVariate) = ($op)(T, x.value)
 end
 
 const UnaryScalarMethods = [
@@ -142,5 +142,5 @@ const UnaryScalarMethods = [
 ]
 
 for op in UnaryScalarMethods
-  @eval ($op)(x::ScalarVariate) = ($op)(x.value)
+  @eval Main ($op)(x::Mamba.ScalarVariate) = ($op)(x.value)
 end
