
use mp_context=multiprocessing.get_context("spawn") in ProcessPoolExecutor will crash #126

Closed
qindazhu opened this issue Nov 13, 2020 · 15 comments
Labels
bug Something isn't working

Comments

@qindazhu

With this PR (k2-fsa/snowfall#5), I get the error below:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/linuxbrew/.linuxbrew/opt/python@3.8/lib/python3.8/multiprocessing/spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "/home/linuxbrew/.linuxbrew/opt/python@3.8/lib/python3.8/multiprocessing/spawn.py", line 125, in _main
    prepare(preparation_data)
  File "/home/linuxbrew/.linuxbrew/opt/python@3.8/lib/python3.8/multiprocessing/spawn.py", line 236, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "/home/linuxbrew/.linuxbrew/opt/python@3.8/lib/python3.8/multiprocessing/spawn.py", line 287, in _fixup_main_from_path
    main_content = runpy.run_path(main_path,
  File "/home/linuxbrew/.linuxbrew/opt/python@3.8/lib/python3.8/runpy.py", line 265, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "/home/linuxbrew/.linuxbrew/opt/python@3.8/lib/python3.8/runpy.py", line 97, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/home/linuxbrew/.linuxbrew/opt/python@3.8/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/ceph-hw/snowfall/egs/librispeech/asr/simple_v1/prepare.py", line 47, in <module>
    cut_set = CutSet.from_manifests(
  File "/ceph-hw/lhotse/lhotse/cut.py", line 1319, in compute_and_store_features
    executor.submit(
  File "/home/linuxbrew/.linuxbrew/Cellar/python@3.8/3.8.6_1/lib/python3.8/concurrent/futures/process.py", line 645, in submit
    self._start_queue_management_thread()
  File "/home/linuxbrew/.linuxbrew/Cellar/python@3.8/3.8.6_1/lib/python3.8/concurrent/futures/process.py", line 584, in _start_queue_management_thread
    self._adjust_process_count()
  File "/home/linuxbrew/.linuxbrew/Cellar/python@3.8/3.8.6_1/lib/python3.8/concurrent/futures/process.py", line 608, in _adjust_process_count
Traceback (most recent call last):
  File "./prepare.py", line 47, in <module>
    cut_set = CutSet.from_manifests(
  File "/ceph-hw/lhotse/lhotse/cut.py", line 1328, in compute_and_store_features
    cut_set = CutSet.from_cuts(f.result() for f in futures)
  File "/ceph-hw/lhotse/lhotse/cut.py", line 989, in from_cuts
    return CutSet({cut.id: cut for cut in cuts})
  File "/ceph-hw/lhotse/lhotse/cut.py", line 989, in <dictcomp>
    return CutSet({cut.id: cut for cut in cuts})
  File "/ceph-hw/lhotse/lhotse/cut.py", line 1328, in <genexpr>
    cut_set = CutSet.from_cuts(f.result() for f in futures)
  File "/home/linuxbrew/.linuxbrew/Cellar/python@3.8/3.8.6_1/lib/python3.8/concurrent/futures/_base.py", line 439, in result
    return self.__get_result()
  File "/home/linuxbrew/.linuxbrew/Cellar/python@3.8/3.8.6_1/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
    raise self._exception
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.
/home/linuxbrew/.linuxbrew/opt/python@3.8/lib/python3.8/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 10 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

BTW, since we pass spawn, it starts a fresh Python interpreter process for each worker, which then prints these logs many times over:
https://github.com/k2-fsa/snowfall/blob/7201fdebd18231df4c3a6a4c198e1d0a7d7c7d22/egs/librispeech/asr/simple_v1/prepare.py#L17-L21
This is a little annoying; it would be great if you could fix that together with the error above, but it's not urgent.
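To make the failure mode concrete, here is a minimal sketch of the pattern involved (illustrative only; the function names and sizes are made up, this is not the actual lhotse/snowfall code): a ProcessPoolExecutor created with the "spawn" context, with work submitted from unguarded module-level code.

import multiprocessing
from concurrent.futures import ProcessPoolExecutor

def compute_features(i):
    return i * i  # placeholder for the real feature-extraction work

# Unguarded module-level code: when run as a script, each spawned worker
# re-executes this module during bootstrap, tries to build its own pool,
# and dies, which the parent reports as BrokenProcessPool, as in the
# traceback above.
executor = ProcessPoolExecutor(
    max_workers=4,
    mp_context=multiprocessing.get_context("spawn"),
)
futures = [executor.submit(compute_features, i) for i in range(10)]
print([f.result() for f in futures])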

@pzelasko
Collaborator

pzelasko commented Nov 13, 2020 via email

@mthrok

mthrok commented Nov 13, 2020

@pzelasko

Further investigating pytorch/audio#1021 and following up on your comment about OpenMP, I disabled OpenMP at sox compilation time (pytorch/audio#1026), and the test seems to get unstuck without using a multiprocessing context with spawn. I am still not sure whether this works on other OSes too, and I still have to talk with the team, but if it does, we might be able to fix it on the torchaudio side.

@pzelasko
Collaborator

Thanks @mthrok - let me know when the torchaudio conda/pip packages have the fix; I will then revert the "spawn" thing. Anyway, I expect that the Dask executor is immune to this issue.

@qindazhu
Author

@pzelasko, with distributed, I get the runtime error below.

File "/ceph-hw/.local/lib/python3.8/site-packages/distributed/process.py", line 33, in _call_and_set_future
    res = func(*args, **kwargs)
  File "/ceph-hw/.local/lib/python3.8/site-packages/distributed/process.py", line 203, in _start
    process.start()
  File "/home/linuxbrew/.linuxbrew/opt/python@3.8/lib/python3.8/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/home/linuxbrew/.linuxbrew/opt/python@3.8/lib/python3.8/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/home/linuxbrew/.linuxbrew/opt/python@3.8/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/home/linuxbrew/.linuxbrew/opt/python@3.8/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/home/linuxbrew/.linuxbrew/opt/python@3.8/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 42, in _launch
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "/home/linuxbrew/.linuxbrew/opt/python@3.8/lib/python3.8/multiprocessing/spawn.py", line 154, in get_preparation_data
    _check_not_importing_main()
  File "/home/linuxbrew/.linuxbrew/opt/python@3.8/lib/python3.8/multiprocessing/spawn.py", line 134, in _check_not_importing_main
    raise RuntimeError('''
RuntimeError: 
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.

@pzelasko
Collaborator

Ha, it could actually be the same reason "spawn" didn't work for you... Could you wrap the script's code into a function (e.g. def main():) and add the following at the end of the script:

if __name__ == '__main__': 
    main()

That should solve these issues. Basically I think the problem is that the new Python process executes the whole script again while initializing, and the if __name__ == '__main__' idiom prevents that.
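For illustration, here is a sketch of that layout (names are illustrative, not the exact prepare.py code):

import multiprocessing
from concurrent.futures import ProcessPoolExecutor

def compute_features(i):
    return i * i  # placeholder for the real feature-extraction work

def main():
    # All work lives inside main(), so a spawned worker that re-imports this
    # module during bootstrap does not re-create the pool or re-submit jobs.
    with ProcessPoolExecutor(
        max_workers=4,
        mp_context=multiprocessing.get_context("spawn"),
    ) as executor:
        futures = [executor.submit(compute_features, i) for i in range(10)]
        print([f.result() for f in futures])

if __name__ == '__main__':
    main()

Note that module-level statements (e.g. the logging lines linked above) still run when each spawned worker re-imports the script, so moving them into main() should also get rid of the duplicated logs.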

@qindazhu
Author

@pzelasko I tried this yesterday, and it runs successfully now, many thanks! (And sorry for forgetting to tell you this yesterday, as we were busy.)
However, there is an issue now: the script in k2-fsa/snowfall#11 takes much longer to prepare train-clean-100 of LibriSpeech (more than 1 hour). I wonder what we do in augmentation now, as I don't think preparation should take this long (before we added augmentation, it took only about 10-15 minutes to prepare train-clean-100).

@danpovey
Collaborator

danpovey commented Nov 16, 2020 via email

@jimbozhang
Contributor

Haowen, can you please make a PR for this fix?

I'm doing this.

@qindazhu
Author

qindazhu commented Nov 16, 2020

Thanks @jimbozhang. I'm also wondering how long it takes you to prepare train-clean-100 with the latest scripts, just to make sure it's not a local issue on my side.

@jimbozhang
Contributor

jimbozhang commented Nov 16, 2020

Thanks @jimbozhang. I'm also wondering how long it takes you to prepare train-clean-100 with the latest scripts, just to make sure it's not a local issue on my side.

I just started running it on our shared machine (IP: 10.**.*.72) about 10 minutes ago. I'll let you know when the preparation finishes.

@qindazhu
Author

OK, thanks

@pzelasko
Collaborator

Since it helped, I'm closing the issue.

@pzelasko pzelasko added the bug Something isn't working label Nov 16, 2020
@mthrok

mthrok commented Nov 16, 2020

@qindazhu

Just to clarify: with the "spawn" method, you are not facing a crash, but you are experiencing the slowdown, right?

@pzelasko
Collaborator

@mthrok they fixed the slowdown; it was a matter of setting torch's num threads and interop num threads to 1.

@qindazhu
Author

@mthrok they fixed the slowdown; it was a matter of setting torch's num threads and interop num threads to 1.

Yes, see @danpovey's experiment here: k2-fsa/snowfall#18. You can just check the latest prepare.py in snowfall.
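For reference, the kind of change being described is roughly the following (a sketch only; the latest prepare.py in snowfall is the authoritative version):

import torch

# Keep each feature-extraction worker single-threaded so the worker
# processes do not oversubscribe the CPU and slow each other down.
torch.set_num_threads(1)
torch.set_num_interop_threads(1)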
