Skip to content
This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[FEATURE] AdaBelief operator #20065

Merged
merged 5 commits into from
Apr 30, 2021
Merged

[FEATURE] AdaBelief operator #20065

merged 5 commits into from
Apr 30, 2021

Conversation

khaotik
Copy link
Contributor

@khaotik khaotik commented Mar 20, 2021

Description

#19950
Hello, this is my first PR to apache MXNet. Glad to work with the community.

Checklist

Essentials

  • PR's title starts with a category (e.g. [BUGFIX], [MODEL], [TUTORIAL], [FEATURE], [DOC], etc)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage
  • Code is well-documented

Changes

  • added C++ operator under src/operator/contrib/adabelief*
  • briefly modified src/operator/contrib/adamw* to avoid symbol name clashes
  • src/operator/contrib/adabelief*

Comments

  • This is mostly a copycat of AdamW optimizer, and therefore fair amount of code duplication. It might be a good idea to perform a refactor regarding adam-like optimizers later.

@mxnet-bot
Copy link

Hey @khaotik , Thanks for submitting the PR
All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands:

  • To trigger all jobs: @mxnet-bot run ci [all]
  • To trigger specific jobs: @mxnet-bot run ci [job1, job2]

CI supported jobs: [website, windows-gpu, unix-cpu, edge, miscellaneous, sanity, centos-cpu, clang, windows-cpu, centos-gpu, unix-gpu]


Note:
Only following 3 categories can trigger CI :PR Author, MXNet Committer, Jenkins Admin.
All CI tests must pass before the PR can be merged.

@szha
Copy link
Member

szha commented Mar 21, 2021

@khaotik thanks for the contribution! It looks like CI fails to start due to a merge failure. Perhaps you could squash the commits into a single commit, rebase to the latest master, and then do a force push?

@lanking520 lanking520 added pr-awaiting-testing PR is reviewed and waiting CI build and test pr-work-in-progress PR is still work in progress and removed pr-awaiting-testing PR is reviewed and waiting CI build and test labels Mar 21, 2021
@szha
Copy link
Member

szha commented Mar 21, 2021

OK the CI is working for this PR now. There are some formatting issues to address in the sanity check: https://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Fsanity/detail/PR-20065/2/pipeline#step-75-log-169

@lanking520 lanking520 added pr-awaiting-testing PR is reviewed and waiting CI build and test pr-work-in-progress PR is still work in progress and removed pr-work-in-progress PR is still work in progress pr-awaiting-testing PR is reviewed and waiting CI build and test labels Mar 21, 2021
@lanking520 lanking520 added pr-awaiting-testing PR is reviewed and waiting CI build and test pr-work-in-progress PR is still work in progress and removed pr-work-in-progress PR is still work in progress pr-awaiting-testing PR is reviewed and waiting CI build and test labels Mar 21, 2021
@lanking520 lanking520 added pr-awaiting-testing PR is reviewed and waiting CI build and test pr-work-in-progress PR is still work in progress and removed pr-work-in-progress PR is still work in progress pr-awaiting-testing PR is reviewed and waiting CI build and test labels Mar 21, 2021
@khaotik
Copy link
Contributor Author

khaotik commented Mar 21, 2021

Hmm, test failed due to gcc "internal compiler error" during build phase. I find this a bit puzzling.

[2021-03-21T15:52:36.650Z] FAILED: CMakeFiles/mxnet.dir/src/operator/numpy/np_kron.cc.o 

[2021-03-21T15:52:36.650Z] /usr/local/bin/ccache /usr/bin/g++-7  -DDMLC_CORE_USE_CMAKE -DDMLC_LOG_FATAL_THROW=1 -DDMLC_LOG_STACK_TRACE_SIZE=0 -DDMLC_MODERN_THREAD_LOCAL=0 -DDMLC_STRICT_CXX11 -DDMLC_USE_CXX11 -DDMLC_USE_CXX11=1 -DDMLC_USE_CXX14 -DMSHADOW_FORCE_STREAM -DMSHADOW_INT64_TENSOR_SIZE=1 -DMSHADOW_IN_CXX11 -DMSHADOW_USE_CBLAS=1 -DMSHADOW_USE_CUDA=1 -DMSHADOW_USE_CUDNN -DMSHADOW_USE_MKL=0 -DMSHADOW_USE_SSE -DMXNET_USE_BLAS_OPEN=1 -DMXNET_USE_CUDA=1 -DMXNET_USE_ILP64_LAPACKE=1 -DMXNET_USE_INTGEMM=1 -DMXNET_USE_LAPACK=1 -DMXNET_USE_LAPACKE_INTERFACE=1 -DMXNET_USE_LIBJPEG_TURBO=0 -DMXNET_USE_NVTX=1 -DMXNET_USE_ONEDNN=1 -DMXNET_USE_OPENCV=1 -DMXNET_USE_OPENMP=1 -DMXNET_USE_OPERATOR_TUNING=1 -DMXNET_USE_SIGNAL_HANDLER=1 -DUSE_CUDNN -D__USE_XOPEN2K8 -Dmxnet_EXPORTS -I/work/mxnet/3rdparty/onednn/include -I3rdparty/onednn/include -I/work/mxnet/include -I/work/mxnet/src -I/work/mxnet/3rdparty/tvm/nnvm/include -I/work/mxnet/3rdparty/tvm/include -I/work/mxnet/3rdparty/dmlc-core/include -I/work/mxnet/3rdparty/dlpack/include -I/work/mxnet/3rdparty/mshadow -I3rdparty/intgemm -I/work/mxnet/3rdparty/intgemm -I/work/mxnet/3rdparty/onednn/src/../include -I/work/mxnet/3rdparty/miniz -I3rdparty/dmlc-core/include -isystem /usr/include/opencv4 -isystem /usr/local/cuda/include -D_GLIBCXX_ASSERTIONS  -Wall -Wno-sign-compare -O3 -g -fopenmp -O2 -g -DNDEBUG -fPIC   -Werror -Wno-error=unused-variable -Wno-unused-parameter -Wno-unknown-pragmas -Wno-unused-local-typedefs -msse3 -mf16c -fopenmp -std=gnu++1z -MD -MT CMakeFiles/mxnet.dir/src/operator/numpy/np_kron.cc.o -MF CMakeFiles/mxnet.dir/src/operator/numpy/np_kron.cc.o.d -o CMakeFiles/mxnet.dir/src/operator/numpy/np_kron.cc.o -c /work/mxnet/src/operator/numpy/np_kron.cc

[2021-03-21T15:52:36.650Z] g++-7: internal compiler error: Killed (program cc1plus)

@szha
Copy link
Member

szha commented Mar 21, 2021

@khaotik that might have been the host running out of memory.

There are some additional errors related to AMP: https://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Fcentos-gpu/detail/PR-20065/1/pipeline/#step-170-log-891.

The way to fix it is to include the new operators in this list, which signifies that when AMP is enabled, these operators need not to be casted.

@lanking520 lanking520 added pr-awaiting-testing PR is reviewed and waiting CI build and test and removed pr-work-in-progress PR is still work in progress labels Mar 22, 2021
@lanking520 lanking520 added pr-work-in-progress PR is still work in progress and removed pr-awaiting-testing PR is reviewed and waiting CI build and test labels Mar 22, 2021
@khaotik
Copy link
Contributor Author

khaotik commented Mar 22, 2021

Here are failed tests:

tests/python/unittest/test_operator.py::test_layer_norm
tests/python/unittest/test_operator.py::test_norm

I can't see immediate relationship with my PR here. Do they just fail at random?

@mxnet-bot run ci [centos-cpu, unix-cpu]

@mxnet-bot
Copy link

Jenkins CI successfully triggered : [centos-cpu, unix-cpu]

@lanking520 lanking520 added pr-awaiting-testing PR is reviewed and waiting CI build and test pr-awaiting-review PR is waiting for code review and removed pr-work-in-progress PR is still work in progress pr-awaiting-testing PR is reviewed and waiting CI build and test labels Mar 22, 2021
@szha
Copy link
Member

szha commented Mar 22, 2021

Here are failed tests:


tests/python/unittest/test_operator.py::test_layer_norm

tests/python/unittest/test_operator.py::test_norm

I can't see immediate relationship with my PR here. Do they just fail at random?

@mxnet-bot run ci [centos-cpu, unix-cpu]

Yes I believe those are just flaky and we can overcome with rerunning the failed pipelines for now.

@mxnet-bot
Copy link

Jenkins CI successfully triggered : [unix-cpu, centos-cpu]

@lanking520 lanking520 added pr-awaiting-testing PR is reviewed and waiting CI build and test pr-work-in-progress PR is still work in progress and removed pr-awaiting-review PR is waiting for code review pr-awaiting-testing PR is reviewed and waiting CI build and test pr-work-in-progress PR is still work in progress labels Mar 22, 2021
@khaotik
Copy link
Contributor Author

khaotik commented Mar 22, 2021

@mxnet-bot run ci [centos-cpu]

@mxnet-bot
Copy link

Jenkins CI successfully triggered : [centos-cpu]

@lanking520 lanking520 added pr-awaiting-testing PR is reviewed and waiting CI build and test pr-awaiting-review PR is waiting for code review and removed pr-work-in-progress PR is still work in progress pr-awaiting-testing PR is reviewed and waiting CI build and test labels Mar 22, 2021
Copy link
Member

@szha szha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@szha szha merged commit 059a055 into apache:master Apr 30, 2021
@szha
Copy link
Member

szha commented Apr 30, 2021

@khaotik thanks for the contribution!

@pribadihcr
Copy link

Hi, How to enable AdaBelief or AdamW in c++ here https://github.com/apache/mxnet/blob/v2.0.0.beta1/cpp-package/include/mxnet-cpp/optimizer.hpp. Thanks

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
pr-awaiting-review PR is waiting for code review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants