EOFError when multi-processing and orbit is variable #308
Comments
The error goes away if I change the sampling from
This gives an EOFError:
This works with no problem. Notice the X in
Good find! I expect this has something to do with these multiprocessing hacks. PyMC3 has always had some serious issues with multiprocessing on Macs, and I never loved this "solution", but it typically seems to work!
This bug persists for me on an old Mac (Intel, 2019, macOS 13.3.1) even when using pymc3_ext.sample with more than one core. I believe this isn't an issue on my M1 Mac. starry v1.2.0
@catrionamurray Bummer! I'm not sure what to recommend, and I don't have an Intel Mac to test this locally. One short-term option might be to run multiple copies of your script (each with 1 CPU) and then combine the chains afterwards. Unfortunately I'm about to go on leave and am working at limited capacity, so I can't be much help in the short term. Sorry!
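The short-term workaround described above (one single-core run per chain, combined afterwards) could look roughly like the sketch below. It uses only the standard library, and `run_single_chain` is a hypothetical stand-in for one run of the real script with `chains=1`, `cores=1` and its own `random_seed`; the toy sampler is a random-walk Metropolis on a standard normal, not a starry model.

```python
import math
import random


def run_single_chain(seed, draws=500, tune=250):
    """Hypothetical stand-in for one single-core sampling run.

    Random-walk Metropolis targeting a standard normal distribution.
    """
    rng = random.Random(seed)  # each chain gets its own seeded generator
    x = 0.0
    samples = []
    for step in range(tune + draws):
        proposal = x + rng.gauss(0.0, 1.0)
        # Log acceptance ratio for a N(0, 1) target.
        log_ratio = 0.5 * (x * x - proposal * proposal)
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            x = proposal
        if step >= tune:  # discard the tuning steps
            samples.append(x)
    return samples


# One independent run per chain, each with a distinct seed.
seeds = [11, 12, 13, 14]
chains = [run_single_chain(seed) for seed in seeds]

# Combine afterwards into a (chain, draw) layout.
combined = {"seeds": seeds, "draws": chains}
```

In practice each chain would come from a separate invocation of the script that saves its trace to disk, and the combining step would load those files. Distinct seeds matter: identical seeds would produce identical chains.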
Actually it seems I still get this issue on my M1 chip, and changing to pmx.sample doesn't seem to solve it for me...
Describe the bug
I had an issue with sampling in starry on multiple cores when the orbit is variable. This had me running everything more slowly on one core for a while. The error traceback I got was:
Error traceback
---------------------------------------------------------------------------
EOFError Traceback (most recent call last)
Input In [24], in ()
1 with model:
2 #trace = pmx.sample(
----> 3 trace = pm.sample(
4 tune=250,
5 draws=500,
6 start=map_soln,
7 chains=4,
8 cores=4,
9 target_accept=0.9,
10 )
File ~/miniconda3/envs/ABATE/lib/python3.9/site-packages/pymc3/sampling.py:559, in sample(draws, step, init, n_init, start, trace, chain_idx, chains, cores, tune, progressbar, model, random_seed, discard_tuned_samples, compute_convergence_checks, callback, jitter_max_retries, return_inferencedata, idata_kwargs, mp_ctx, pickle_backend, **kwargs)
557 _print_step_hierarchy(step)
558 try:
--> 559 trace = _mp_sample(**sample_args, **parallel_args)
560 except pickle.PickleError:
561 _log.warning("Could not pickle model, sampling singlethreaded.")
File ~/miniconda3/envs/ABATE/lib/python3.9/site-packages/pymc3/sampling.py:1477, in _mp_sample(draws, tune, step, chains, cores, chain, random_seed, start, progressbar, trace, model, callback, discard_tuned_samples, mp_ctx, pickle_backend, **kwargs)
1475 try:
1476 with sampler:
-> 1477 for draw in sampler:
1478 trace = traces[draw.chain - chain]
1479 if trace.supports_sampler_stats and draw.stats is not None:
File ~/miniconda3/envs/ABATE/lib/python3.9/site-packages/pymc3/parallel_sampling.py:479, in ParallelSampler.__iter__(self)
476 self._progress.update(self._total_draws)
478 while self._active:
--> 479 draw = ProcessAdapter.recv_draw(self._active)
480 proc, is_last, draw, tuning, stats, warns = draw
481 self._total_draws += 1
File ~/miniconda3/envs/ABATE/lib/python3.9/site-packages/pymc3/parallel_sampling.py:351, in ProcessAdapter.recv_draw(processes, timeout)
349 idxs = {id(proc._msg_pipe): proc for proc in processes}
350 proc = idxs[id(ready[0])]
--> 351 msg = ready[0].recv()
353 if msg[0] == "error":
354 warns, old_error = msg[1:]
File ~/miniconda3/envs/ABATE/lib/python3.9/multiprocessing/connection.py:255, in _ConnectionBase.recv(self)
253 self._check_closed()
254 self._check_readable()
--> 255 buf = self._recv_bytes()
256 return _ForkingPickler.loads(buf.getbuffer())
File ~/miniconda3/envs/ABATE/lib/python3.9/multiprocessing/connection.py:419, in Connection._recv_bytes(self, maxsize)
418 def _recv_bytes(self, maxsize=None):
--> 419 buf = self._recv(4)
420 size, = struct.unpack("!i", buf.getvalue())
421 if size == -1:
File ~/miniconda3/envs/ABATE/lib/python3.9/multiprocessing/connection.py:388, in Connection._recv(self, size, read)
386 if n == 0:
387 if remaining == size:
--> 388 raise EOFError
389 else:
390 raise OSError("got end of file during message")
EOFError:
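For anyone reading the traceback: it ends in `Connection._recv`, which raises a bare `EOFError` when it reads zero bytes at the very start of a message. That generally means the other end of the pipe was closed, for example because a worker process exited before sending anything. The sketch below is not PyMC3 code; it only uses the standard library to show that behaviour, and the helper name `recv_or_report` is made up for illustration.

```python
from multiprocessing import Pipe


def recv_or_report(conn):
    """Receive one message, or report that the sending end has gone away."""
    try:
        return ("message", conn.recv())
    except EOFError:
        # Nothing left to read and the other end of the pipe is closed.
        return ("eof", None)


# A healthy exchange: the "worker" end sends a draw before closing.
parent_end, worker_end = Pipe()
worker_end.send({"chain": 0, "draw": 1})
worker_end.close()
first = recv_or_report(parent_end)   # the queued message is still delivered
second = recv_or_report(parent_end)  # pipe is drained and closed

# A worker that goes away before sending anything looks like this.
parent_end2, worker_end2 = Pipe()
worker_end2.close()
dead_worker = recv_or_report(parent_end2)
```

So the `EOFError` here is most likely a symptom rather than the cause: the real failure probably happened inside a worker process, and its own error output (if any) is the place to look.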
To Reproduce
Minimal-ish example adapted from the "Hot jupiter phase curve example"
Expected behavior
Should sample the posterior and calculate a trace object. Instead, I get the error.
Your setup (please complete the following information):
Additional context
This happened to me in a fairly specific set of circumstances:
However, this is the set of circumstances in which I primarily use starry. I have found a solution/workaround and wanted to share it in case anyone else gets EOFError.