This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

segfault on mx1.8-cu110 with python3.7 #19556

Open
waytrue17 opened this issue Nov 18, 2020 · 0 comments

Description

Running the MXNet-Horovod example incubator-mxnet/example/distributed_training-horovod/gluon_mnist.py on mxnet1.8-cuda11.0 with Python 3.7 ends in a segfault. The segfault occurs after the example script has finished.
The same script works fine on mxnet1.8-cuda10.2 with Python 3.7 and on mxnet1.8-cuda11.0 with Python 3.6.

To Reproduce

Steps to reproduce

  1. Launch an EC2 p3.8x GPU instance with the DLAMI ami-02440419a5afe47ab
  2. Build mx1.8-cu110 from source
  3. Install Horovod: python3 -m pip install horovod
  4. To reproduce the error, run: LD_LIBRARY_PATH=/usr/local/cuda-11.0/lib64:$LD_LIBRARY_PATH python3 incubator-mxnet/example/distributed_training-horovod/gluon_mnist.py
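
Because the crash happens only after the script has finished, it can be easy to miss in the training output. The following is a minimal sketch, not part of the original report, of one way to confirm that the process was killed by SIGSEGV at exit: run the command from step 4 as a child process and inspect its return code. The helper name run_and_classify is hypothetical, and the check relies on the POSIX convention that Python's subprocess module reports a signal-terminated child with a negative return code.

```python
import signal
import subprocess
import sys


def run_and_classify(cmd, env=None):
    """Run cmd as a child process and describe how it terminated.

    Hypothetical helper for illustration only. On POSIX, subprocess
    reports a child killed by signal N with return code -N.
    """
    proc = subprocess.run(cmd, env=env)
    if proc.returncode < 0:
        # Negative return code: the child was terminated by a signal.
        sig = signal.Signals(-proc.returncode)
        return "killed by " + sig.name
    return "exited with code {}".format(proc.returncode)
```

For the case described here, cmd would be the python3 invocation of gluon_mnist.py from step 4, with LD_LIBRARY_PATH set through the env argument. A result of "killed by SIGSEGV" would match the reported behavior.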

What have you tried to solve it?

  1. Backporting "Remove cleanup on side threads" (#19378) to v1.8.x solved the issue.