This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[1.7] Backport #17885 #18127

Closed
wants to merge 3 commits into from


samskalicky
Contributor

Backport #17885

ptrendx and others added 3 commits April 18, 2020 16:24
apache#18095)

* Vectorized loads for binary elemwise kernel

* More generalization

* Add backwardusenone

* Remove the unused _backward_add op

* Add vectorized backwardusein

* Extending vectorization to more binary ops, binary ops with scalar and unary ops

* Handling ElementwiseSum

* Get rid of half2 in mshadow

* Remove backward_elemwiseaddex

* Revert "Remove the unused _backward_add op"

This reverts commit f86da86.

* Revert "Remove backward_elemwiseaddex"

This reverts commit 7729114.

* Add back the backward_add since C++ test relies on it

* Test bcast implementations

* First version of vectorized bcast

* Adding single side vectorized bcast kernel

* Removing debug prints

* Actually run the single side kernel

* Move the default implementation of bcast to the vectorized one

* Limit the new implementation to GPU only

* Enabling vectorization when broadcast does not actually do broadcast

* Cleaning

* Cleaning part 2

* Fix for numpy ops using stuff from broadcast

* Fix

* Fix lint

* Try to debug pinv numpy test

* Fix

* Fix the vectorized broadcast implementation for misaligned input pointers

* Added tests

* Added docs to cuda_vectorization.cuh

* Another fix for broadcast and fix INT64 compilation

* Optimize for aligned=true

* 1 more addition to test

* Reverting the change to Numpy op test

* Trying mcmodel=medium to fix the failure in CMake static build

* Revert "Trying mcmodel=medium to fix the failure in CMake static build"

This reverts commit 1af684c.

* Limiting the PR to just elementwise ops

* add debug prints to debug error in CI

* add debug prints to debug error in CI

* remove prints

* initial commit

* enabled calling create for selector

* connected selector to call external class

* added code to remove temp graph attrs

* fixed build issues

* changed shape inference to use different attr names

* fixed selector class

* cleaned up APIs

* fixed sanity

* updated build for extensions

* sanity fix

* refactored MXLoadLib into separate functions

* undo rebase

* finished merge

* enabled verbose in library loading

* fixed example

* added passing args/aux down to graph pass

* added creating new args/aux for graph passes

* fixed return args/aux

* fixed sanity

* whitespace

* fixed lint

* updated perl API, README, added pass_lib to cmake build flow

* fixed mistake with relu example lib

* fixed perl syntax

* addressed comments

* addressed more comments

* fixed compile issues

Co-authored-by: Ubuntu <ubuntu@ip-172-31-31-148.us-west-2.compute.internal>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-31-217.us-west-2.compute.internal>
@mxnet-bot

Hey @samskalicky, thanks for submitting the PR.
All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands:

  • To trigger all jobs: @mxnet-bot run ci [all]
  • To trigger specific jobs: @mxnet-bot run ci [job1, job2]

CI supported jobs: [edge, windows-cpu, windows-gpu, miscellaneous, sanity, unix-cpu, centos-cpu, clang, website, unix-gpu, centos-gpu]


Note:
Only the following 3 categories can trigger CI: PR Author, MXNet Committer, Jenkins Admin.
All CI tests must pass before the PR can be merged.

@samskalicky
Contributor Author

#18128


4 participants