This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

MKLDNN header missing in mxnet-cu102==1.7.0 pip package #19575

Closed
jasperzhong opened this issue Nov 22, 2020 · 22 comments

Comments

@jasperzhong

I encountered a compilation error reporting that <mkldnn.h> was not found when I tried to compile BytePS against mxnet-cu102==1.7.0. I checked the mxnet package and it indeed contains no MKLDNN header.

Interestingly, though, the package does support MKLDNN, and I later found that MKLDNN is enabled by default in 1.7.0 (#16899).


So I suggest including the headers in the pip package.
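For anyone who wants to reproduce the check described above, here is a minimal sketch that reports which headers are absent from an installed package. It assumes the headers ship under an `include` directory inside the package (as in the paths quoted later in this thread); `find_missing_headers` and `package_include_dir` are hypothetical helper names, not part of MXNet.

```python
import importlib.util
from pathlib import Path


def find_missing_headers(include_dir, required):
    """Return the names in `required` not found anywhere under include_dir."""
    root = Path(include_dir)
    if not root.is_dir():
        # No include directory at all: every required header is missing.
        return list(required)
    present = {p.name for p in root.rglob("*") if p.is_file()}
    return [name for name in required if name not in present]


def package_include_dir(package="mxnet"):
    """Locate <package>/include without importing the package.

    Assumption: the wheel places its headers in an `include` subdirectory.
    Returns None when the package is not installed.
    """
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        return None
    return Path(list(spec.submodule_search_locations)[0]) / "include"


if __name__ == "__main__":
    include_dir = package_include_dir()
    if include_dir is None:
        print("mxnet is not installed in this environment")
    else:
        print("missing:", find_missing_headers(include_dir, ["mkldnn.h"]))
```

The lookup matches by file name only, so it does not depend on which subdirectory the headers land in.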

@bartekkuncer
Contributor

bartekkuncer commented Nov 23, 2020

Hi @vycezhong,
I saw that there was a similar issue in the past, #18120, which was fixed in #18310 and #18355. I will check why it is occurring again.
Thanks for reporting.

@bartekkuncer
Contributor

I have checked: the fix was applied to the v1.7.x branch but not to the 1.7.0 tag, which is apparently what the pip package is built from.

@bartekkuncer
Contributor

Please try using: https://repo.mxnet.io/dist/python/cu102/mxnet_cu102-1.7.0b20200813-py2.py3-none-manylinux2014_x86_64.whl. I believe it has all necessary headers.

@jasperzhong
Author

> Please try using: https://repo.mxnet.io/dist/python/cu102/mxnet_cu102-1.7.0b20200813-py2.py3-none-manylinux2014_x86_64.whl. I believe it has all necessary headers.

Thanks for your help.

@bartekkuncer
Contributor

> > Please try using: https://repo.mxnet.io/dist/python/cu102/mxnet_cu102-1.7.0b20200813-py2.py3-none-manylinux2014_x86_64.whl. I believe it has all necessary headers.
>
> Thanks for your help.

@vycezhong So the package I proposed works for you? :)

@szha @leezu Should we release a v1.7.1 patch that adds the missing headers, or just wait for the official v1.8.0 minor release?

@jasperzhong
Author

Yes it works for me. Thanks.

@szha
Member

szha commented Nov 25, 2020

@bartekkuncer I think we should. At least it should be fixed with a post-release.

@bartekkuncer
Contributor

bartekkuncer commented Nov 26, 2020

> @bartekkuncer I think we should. At least it should be fixed with a post-release.

@szha Ok, I will look into it. What is the post-release flow?

@szha
Member

szha commented Nov 26, 2020

@bartekkuncer It's just a matter of repackaging and releasing on PyPI as another version. Since there is an ongoing license issue that needs to be sorted out first, we are putting PyPI changes on hold. We will revisit once the issue is resolved.

@EnricoMi

EnricoMi commented Jan 26, 2021

Is there an ETA on releasing v1.7.1 or v1.7.0.post1 for cu101 and cu102?

@bartekkuncer
Contributor

> Is there an ETA on releasing v1.7.1 or v1.7.0.post1 for cu101 and cu102?

@szha

@szha
Member

szha commented Jan 26, 2021

Thanks for the ping. We resolved the license issue and I can get to it this week.

@EnricoMi

Awesome, thanks for picking it up!

@szha
Member

szha commented Feb 1, 2021

@EnricoMi

EnricoMi commented Feb 2, 2021

@szha Thanks for the cu101 and cu102 releases. Is a release with MKLDNN headers planned for mxnet as well? Version mxnet-1.7.0.post1 does not include the headers.

@EnricoMi

EnricoMi commented Feb 2, 2021

With mxnet-cu101==1.7.0.post0, I get the following error:

    In file included from /usr/local/lib/python3.8/dist-packages/mxnet/include/mkldnn/mkldnn.hpp:24:0,
                     from /usr/local/lib/python3.8/dist-packages/mxnet/include/mxnet/ndarray.h:41,
                     from /tmp/pip-req-build-n16mfgkd/horovod/mxnet/mpi_ops.h:24,
                     from /tmp/pip-req-build-n16mfgkd/horovod/mxnet/mpi_ops.cc:21:
    /usr/local/lib/python3.8/dist-packages/mxnet/include/mkldnn/dnnl.hpp:23:10: fatal error: dnnl_config.h: No such file or directory

With mxnet-cu101==1.7.0b20200813, everything worked fine. Are there any changes between those two versions that I have to incorporate?
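The failure above is a shipped header (`dnnl.hpp`) including a file that was not shipped (`dnnl_config.h`). A rough sketch for spotting this kind of gap before compiling: scan the packaged headers for `#include` directives and list targets that do not exist anywhere in the tree. The `dnnl`/`mkldnn` prefix filter and the name-only matching (rather than real include-path resolution) are simplifying assumptions, and `unresolved_includes` is a hypothetical helper name.

```python
import re
from pathlib import Path

# Matches `#include "x"` and `#include <x>` at the start of a line.
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*[<"]([^>"]+)[>"]', re.MULTILINE)


def unresolved_includes(include_root, prefixes=("dnnl", "mkldnn")):
    """List (including_header, missing_file_name) pairs under include_root.

    Only include targets whose file name starts with one of `prefixes`
    are checked, so system headers such as <vector> are ignored.
    """
    root = Path(include_root)
    if not root.is_dir():
        return []
    headers = [
        p for p in root.rglob("*")
        if p.is_file() and p.suffix in {".h", ".hpp"}
    ]
    available = {p.name for p in headers}
    missing = set()
    for header in headers:
        text = header.read_text(errors="ignore")
        for target in INCLUDE_RE.findall(text):
            name = Path(target).name
            if name.startswith(tuple(prefixes)) and name not in available:
                missing.add((header.relative_to(root).as_posix(), name))
    return sorted(missing)
```

Pointing it at the package's `include` directory (for example the `.../dist-packages/mxnet/include` path in the error above) would be expected to flag `dnnl_config.h` on a wheel with this problem.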

@leezu
Contributor

leezu commented Feb 2, 2021

It's a bug in post0. I'm reopening the issue.

@leezu leezu reopened this Feb 2, 2021
@szha
Member

szha commented Feb 5, 2021

Sorry, the two dnnl_* files were missed in post0. I posted 1.7.0.post1 for the cu101/cu102 wheels, which should include the necessary mkldnn headers (thanks @leezu for verifying). Let me look into including the headers for the CPU versions.

@EnricoMi

EnricoMi commented Feb 6, 2021

I can confirm 1.7.0.post1 for cu101 works for Horovod. Is a release of 1.7.0.post1 for cu110 also planned for the near future?

@szha
Member

szha commented Feb 7, 2021

@EnricoMi Our 1.8.0 release is in the pipeline, so I think a cu110 release is more likely to come with that.

@szha
Member

szha commented Feb 7, 2021

I uploaded the post2 versions of the CPU wheels to include mkldnn headers.

@szha szha closed this as completed Feb 7, 2021
@EnricoMi

@szha that post2 version for CPU works for Horovod, thanks!
