-
Notifications
You must be signed in to change notification settings - Fork 52
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
parallel MCMC on 0.5 #109
Comments
Issue is with Julia's |
What exactly is the issue? While the keyword arguments to |
I think @brian-j-smith will be better able to answer this question, but if I recall the issue was that not all anonymous functions were being sent out to the workers. |
AFAIK the issue is with capturing globals in the anonymous function. Which existed prior to 0.5 too. A workaround for that issue is to wrap the closure creation in a
|
The pmap errors we are getting now in 0.5 do not occur in 0.4. Below is the 0.5 error, for what its worth. When I get some free time, I'll try to come up with a stand-alone test case that reproduces the error. ERROR: LoadError: On worker 2:
UndefVarError: ##298#299 not defined
in deserialize_datatype at .\serialize.jl:823
in handle_deserialize at .\serialize.jl:571
in deserialize at .\serialize.jl:881
in deserialize_datatype at .\serialize.jl:835
in handle_deserialize at .\serialize.jl:571
in deserialize at .\serialize.jl:894
in deserialize_datatype at .\serialize.jl:835
in handle_deserialize at .\serialize.jl:571
in deserialize at .\serialize.jl:881
in deserialize_datatype at .\serialize.jl:835
in handle_deserialize at .\serialize.jl:571
in deserialize_array at .\serialize.jl:709
in handle_deserialize at .\serialize.jl:569
in deserialize at .\serialize.jl:541
in ntuple at .\tuple.jl:65
in handle_deserialize at .\serialize.jl:562
in deserialize_msg at .\multi.jl:120
in message_handler_loop at .\multi.jl:1317
in process_tcp_streams at .\multi.jl:1276
in #620 at .\event.jl:68
in #remotecall_fetch#608(::Array{Any,1}, ::Function, ::Function, ::Base.Worker,
::Array{Any,1}, ::Vararg{Array{Any,1},N}) at .\multi.jl:1070
in remotecall_fetch(::Function, ::Base.Worker, ::Array{Any,1}, ::Vararg{Array{A
ny,1},N}) at .\multi.jl:1062
in #remotecall_fetch#611(::Array{Any,1}, ::Function, ::Function, ::Int64, ::Arr
ay{Any,1}, ::Vararg{Array{Any,1},N}) at .\multi.jl:1080
in remotecall_fetch(::Function, ::Int64, ::Array{Any,1}, ::Vararg{Array{Any,1},
N}) at .\multi.jl:1080
in #remotecall_pool#691(::Array{Any,1}, ::Function, ::Function, ::Function, ::W
orkerPool, ::Array{Any,1}, ::Vararg{Array{Any,1},N}) at .\workerpool.jl:93
in remotecall_pool(::Function, ::Function, ::WorkerPool, ::Array{Any,1}, ::Vara
rg{Array{Any,1},N}) at .\workerpool.jl:91
in #remotecall_fetch#694(::Array{Any,1}, ::Function, ::Function, ::WorkerPool,
::Array{Any,1}, ::Vararg{Array{Any,1},N}) at .\workerpool.jl:124
in remotecall_fetch(::Function, ::WorkerPool, ::Array{Any,1}, ::Vararg{Array{An
y,1},N}) at .\workerpool.jl:124
in (::Base.###699#700#702{WorkerPool,Mamba.#mcmc_worker!})(::Array{Any,1}, ::Fu
nction, ::Array{Any,1}, ::Vararg{Array{Any,1},N}) at .\workerpool.jl:151
in (::Base.##699#701)(::Array{Any,1}, ::Vararg{Array{Any,1},N}) at .\workerpool
.jl:151
in macro expansion at .\asyncmap.jl:63 [inlined]
in (::Base.##757#759{Base.AsyncCollector,Base.AsyncCollectorState})() at .\task
.jl:360 |
This is the reduced case:
On 0.4:
On 0.5:
|
Great! Thanks for coming up with a reduced case. I'll keep an eye on your other thread and possible developments in julia to address this. A work-around on our package side may be challenging since some anonymous functions are user-supplied. |
@amitmurthy still having pmap issues in v0.6 julia> include("../deonovib\\.julia/v0.6\\Mamba\\doc/examples/rats.jl")
MCMC Simulation of 10000 Iterations x 2 Chains...
ERROR: LoadError: On worker 2:
UndefVarError: ##332#333 not defined
deserialize_datatype at .\serialize.jl:969
handle_deserialize at .\serialize.jl:674
deserialize at .\serialize.jl:634
handle_deserialize at .\serialize.jl:681
deserialize at .\serialize.jl:1075
handle_deserialize at .\serialize.jl:687
deserialize at .\serialize.jl:1088
handle_deserialize at .\serialize.jl:687
deserialize at .\serialize.jl:1075
handle_deserialize at .\serialize.jl:687
deserialize_array at .\serialize.jl:871
handle_deserialize at .\serialize.jl:672
deserialize at .\serialize.jl:634
ntuple at .\tuple.jl:108
handle_deserialize at .\serialize.jl:664
deserialize_msg at .\distributed\messages.jl:98
message_handler_loop at .\distributed\process_messages.jl:161
process_tcp_streams at .\distributed\process_messages.jl:118
#99 at .\event.jl:73
Stacktrace:
[1] #570 at .\asyncmap.jl:178 [inlined]
[2] foreach(::Base.##570#572, ::Array{Any,1}) at .\abstractarray.jl:1731
[3] maptwice(::Function, ::Channel{Any}, ::Array{Any,1}, ::Array{Array{Any,1},1}, ::Vararg{Array{Array{Any,1},1},N} where N) at .\asyncmap.jl:178
[4] wrap_n_exec_twice(::Channel{Any}, ::Array{Any,1}, ::Base.Distributed.##204#207{WorkerPool}, ::Function, ::Array{Array{Any,1},1}, ::Vararg{Array{Array{Any,1},1},N} where N) at .\asyncmap.jl:154
[5] #async_usemap#555(::Function, ::Void, ::Function, ::Base.Distributed.##188#190, ::Array{Array{Any,1},1}, ::Vararg{Array{Array{Any,1},1},N} where N) at .\asyncmap.jl:103
[6] (::Base.#kw##async_usemap)(::Array{Any,1}, ::Base.#async_usemap, ::Function, ::Array{Array{Any,1},1}, ::Vararg{Array{Array{Any,1},1},N} where N) at .\<missing>:0
[7] (::Base.#kw##asyncmap)(::Array{Any,1}, ::Base.#asyncmap, ::Function, ::Array{Array{Any,1},1}) at .\<missing>:0
[8] #pmap#203(::Bool, ::Int64, ::Void, ::Array{Any,1}, ::Void, ::Function, ::WorkerPool, ::Function, ::Array{Array{Any,1},1}) at .\distributed\pmap.jl:126
[9] pmap(::WorkerPool, ::Function, ::Array{Array{Any,1},1}) at .\distributed\pmap.jl:101
[10] #pmap#213(::Array{Any,1}, ::Function, ::Function, ::Array{Array{Any,1},1}) at .\distributed\pmap.jl:156
[11] pmap2(::Function, ::Array{Array{Any,1},1}) at C:\Users\deonovib\.julia\v0.6\Mamba\src\utils.jl:97
[12] mcmc_master!(::Mamba.Model, ::UnitRange{Int64}, ::Int64, ::Int64, ::UnitRange{Int64}, ::Bool) at C:\Users\deonovib\.julia\v0.6\Mamba\src\model\mcmc.jl:52
[13] #mcmc#29(::Int64, ::Int64, ::Int64, ::Bool, ::Function, ::Mamba.Model, ::Dict{Symbol,Any}, ::Array{Dict{Symbol,Any},1}, ::Int64) at C:\Users\deonovib\.julia\v0.6\Mamba\src\model\mcmc.jl:32
[14] (::Mamba.#kw##mcmc)(::Array{Any,1}, ::Mamba.#mcmc, ::Mamba.Model, ::Dict{Symbol,Any}, ::Array{Dict{Symbol,Any},1}, ::Int64) at .\<missing>:0
[15] include_from_node1(::String) at .\loading.jl:569
[16] include(::String) at .\sysimg.jl:14
while loading C:\Users\deonovib\.julia\v0.6\Mamba\doc\examples\rats.jl, in expression starting on line 121
|
Does your code serialize the same anonymous function across multiple workers? |
I don't think that is the issue. The relevant code seems like a straight forward use of lsts = [
Any[m, states[k], window, burnin, thin, ChainProgress(frame, k, N)]
for k in chains
]
results = pmap(mcmc_worker!, lsts) where type Model
nodes::Dict{Symbol, Any}
samplers::Vector{Sampler}
states::Vector{ModelState}
iter::Int
burnin::Int
hasinputs::Bool
hasinits::Bool
end and type Sampler{T}
params::Vector{Symbol}
eval::Function
tune::T
targets::Vector{Symbol}
end where a Sampler might look something like: function RWM{T<:Real}(params::ElementOrVector{Symbol},
scale::ElementOrVector{T}; args...)
samplerfx = function(model::Model, block::Integer)
block = SamplingBlock(model, block, true)
v = SamplerVariate(block, scale; args...)
sample!(v, x -> logpdf!(block, x))
relist(block, v)
end
Sampler(params, samplerfx, RWMTune())
end so I think we are having trouble passing |
Is there a sample code that I can test with locally? I assume all dependent packages are publicly available? |
Currently we have it hacked such that only sequential processing is possible. Simply ...
if (nprocs() > 1) #& (VERSION < v"0.5-")
... Run any of the examples that utilize multiple chains you will get the error. e.g. include(".julia/v0.6/Mamba/doc/examples/rats.jl") or after doing the include you can also replicate error with remotecall_fetch(fieldnames, 2, model) which is a little more direct and shows that the issue is sending model out to the different workers. |
On OSX I had trouble installing |
What was the error? Seems strange since Travis shows it installing and all tests pass on OSX https://travis-ci.org/brian-j-smith/Mamba.jl |
Fails with
|
I'll delete the v0.6 packages directory and try again. |
The immediate culprit appears to be
It is not being serialized as it is defined under |
However, if the anonymous function is explicitly created under module
|
Only when the anonymous function is explicitly defined under |
Thanks for the extensive analysis @amitmurthy what is the next step? Is this something that needs to be addressed in our package or is this something on the julialang side that could be addressed? |
Both I think. You can check if you can avoid an |
You can follow the discussion on the linked issue above. Meanwhile, I could run rats.jl in parallel on 0.6 by eval'ing into Main.
|
Currently pmap seems to be disabled for
v"0.5.0"
. I am wondering if there is a workaround, eg running the samples separately and then somehow merging. Usually run 3–5 chains and the speed gains would be significant.The text was updated successfully, but these errors were encountered: