Add operator for dot(dns, csr) = csr #8938

anirudh2290 · 2017-12-04T17:38:47Z

Description

Adds operator for dot(dns, csr) = csr. Backward pass will fallback to default implementations.
The performance is better than dot(dns, dns) for sparsity less than 0.5% (c4.8xlarge). Below are the results for tests on c4.8xlarge with OMP_NUM_THREADS set to 32.

========================================================
  mxnet sparse dot benchmark: dot(default, csr) = csr
  (matrix multiplication: (m x k) * (k x n) = m x n)
========================================================
 lhs_density(%)  rhs_density(%)    context        m        k        n  t_sparse(ms)   t_dense(ms)  speedup
            1.0             0.1     cpu(0)     1000      128  1000000        337.74       1810.42     5.36
            1.0             0.1     cpu(0)     1000       64  1000000        172.71       1653.84     9.58
            1.0             0.1     cpu(0)     1000      128  1000000        345.05       1810.87     5.25
            1.0             0.1     cpu(0)      256      128  1000000         89.88        466.65     5.19
            1.0             0.1     cpu(0)     1000      128  1000000        335.76       1785.21     5.32
            0.1             0.1     cpu(0)     1000      128  1000000        332.17       1815.71     5.47
            0.5             0.5     cpu(0)     1000      128  1000000       1718.80       1764.55     1.03
            1.0             1.0     cpu(0)     1000      128  1000000       3434.07       1681.34     0.49
            2.0             2.0     cpu(0)     1000      128  1000000       7621.03       1689.58     0.22
            5.0             5.0     cpu(0)     1000      128  1000000      15715.25       1738.18     0.11
           10.0            10.0     cpu(0)     1000      128  1000000      22354.31       1602.44     0.07
           20.0            20.0     cpu(0)     1000      128  1000000      26913.53       1735.68     0.06

Checklist

Essentials

Passed code style checking (make lint)
Changes are complete (i.e. I finished coding on this PR)
All changes have test coverage
For user-facing API changes, API doc string has been updated. For new C++ functions in header files, their functionalities and arguments are well-documented.
To my best knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

dot(dns, csr) = csr, tests, (and when applicable, API doc)

rahul003 · 2017-12-05T00:37:39Z

benchmark/python/sparse/dot.py

        if rhs_density > 1 or rhs_density < 0:
-            raise ValueError("rhs_density has to be between 0 and 1")
+            raise ValueError("Value other than csr for lhs not supported")


The check and the error statement don't seem to match?

Good catch! Fixed.

eric-haibin-lin · 2017-12-05T01:57:33Z

src/operator/tensor/dot-inl.h

+ * \brief CPU Kernel of PopulateCsrForNNC
+ * Parallelization by individual rows
+ */
+struct PopulateCsrForNNC {


Can you add brief description on what this kernel is for?

eric-haibin-lin · 2017-12-08T21:26:11Z

src/operator/tensor/dot-inl.h

@@ -231,6 +231,12 @@ inline bool DotForwardInferStorageType(const nnvm::NodeAttrs& attrs,
    dispatched = storage_type_assign(&out_stype, kDefaultStorage,
                                     dispatch_mode, DispatchMode::kFComputeEx);
  }
+  if (!dispatched && lhs_stype == kDefaultStorage && rhs_stype == kCSRStorage &&


Is the implementation only available on CPU? No fallback on GPU ctx?

I have added a check for CPU. will fallback to default storage for gpu

eric-haibin-lin · 2017-12-08T21:27:28Z

src/operator/tensor/dot-inl.h

+                             const OpReqType req, NDArray* ret) {
+  if (kNullOp == req) return;
+  CHECK_EQ(rhs.storage_type(), kCSRStorage);
+  if (!rhs.storage_initialized()) return;


Shall we set the result to be ZerosCsrImpl before return?

eric-haibin-lin · 2017-12-08T21:28:07Z

src/operator/tensor/dot-inl.h

+inline void DotDnsCsrCsrImpl(const OpContext& ctx, const cpu& cpu_dev,
+                             const TBlob& lhs, const NDArray& rhs,
+                             const OpReqType req, NDArray* ret) {
+  if (kNullOp == req) return;


Is kAddTo and kWriteInplace not checked?

Thanks for pointing this. Fixed.

eric-haibin-lin · 2017-12-08T21:30:05Z

src/operator/tensor/dot-inl.h

+          return;
+        }
+
+        dim_t num_threads = mxnet_op::get_num_threads<cpu>(num_rows_l);


nit: const for both

eric-haibin-lin · 2017-12-08T21:32:42Z

src/operator/tensor/dot-inl.h

+            s, num_rows_l, nnc_idx, indptr_out, col_idx_out, nnc, num_rows_l);
+        mxnet_op::Kernel<mxnet_op::set_zero, cpu>::Launch(s, nnz, data_out);
+
+        if (nnc == 0) {


Shouldn't nnc never be 0 here?

Why should nnc never be 0 ? This is possible when number of non zero columns are zero(matrix with all zeros) in the rhs. In this case we return the output correctly.

Because you already checked rhs.storage_initialized() in line 922?

I have removed the if and also added some documentation for storage_initialized

…ense

eric-haibin-lin · 2017-12-12T19:32:32Z

src/operator/tensor/dot-inl.h

+            s, num_rows_l, nnc_idx, indptr_out, col_idx_out, nnc, num_rows_l);
+        mxnet_op::Kernel<mxnet_op::set_zero, cpu>::Launch(s, nnz, data_out);
+
+        if (nnc == 0) {


Because you already checked rhs.storage_initialized() in line 922?

eric-haibin-lin · 2017-12-12T19:36:50Z

src/operator/tensor/dot-inl.h

+    // dns, csr -> csr
+    if (dev_mask == mshadow::cpu::kDevMask) {
+      dispatched = storage_type_assign(&out_stype, kCSRStorage, dispatch_mode,
+                                       DispatchMode::kFComputeEx);


Is output stype consistent on cpu and gpu? The output stype should be consistent to avoid confusion to users (see https://github.com/apache/incubator-mxnet/blob/d2a856a3a2abb4e72edc301b8b821f0b75f30722/src/operator/tensor/matrix_op-inl.h#L400-L418)
The only difference is that on GPU, it performs fallback. If the output stype infers sparse, then it first produce dense output, then cast it to sparse. The fallback is handled in executor already

Thanks! Fixed.

…ense

eric-haibin-lin

LGTM in general. A few minor comments

eric-haibin-lin · 2018-01-03T19:44:49Z

include/mxnet/ndarray.h

@@ -305,7 +305,10 @@ class NDArray {
  bool fresh_out_grad() const;
  /*! \return updated grad state in entry_ */
  void set_fresh_out_grad(bool state) const;
-  // returns true if a sparse ndarray's aux_data and storage are initialized
+  /*! \brief Returns true if a sparse ndarray's aux_data and storage are initialized
+   * Returns false if the indices array shape is inconsistent


"Returns false if the indices array shape is inconsistent" -> it actually throws an exception without returning false

eric-haibin-lin · 2018-01-03T19:51:15Z

src/operator/tensor/dot-inl.h

+    const auto dispatch_ex = invalid_ctx ? DispatchMode::kFComputeFallback
+                                         : DispatchMode::kFComputeEx;
+    dispatched = storage_type_assign(&out_stype, kCSRStorage, dispatch_mode,
+                                     dispatch_ex);
  }


Hmm. we should log storage fallback as long as dispatch mode is dispatch_fallback:
https://github.com/apache/incubator-mxnet/blob/d2a856a3a2abb4e72edc301b8b821f0b75f30722/src/operator/elemwise_op_common.h#L79-L81

Maybe I should move this logic to the common path instead of letting developers specify that in operators
https://github.com/apache/incubator-mxnet/blob/master/src/executor/infer_graph_attr_pass.cc#L45-L54

Yes moving the logic to common path would be nice. I see multiple places where we don't have this check. For example, https://github.com/apache/incubator-mxnet/blob/d2a856a3a2abb4e72edc301b8b821f0b75f30722/src/operator/tensor/elemwise_unary_op_basic.cc#L65 and https://github.com/apache/incubator-mxnet/blob/d2a856a3a2abb4e72edc301b8b821f0b75f30722/src/operator/tensor/elemwise_binary_scalar_op_basic.cc#L68 . These also need to be fixed right ?

Yes. We can fix that in a separate PR.

eric-haibin-lin · 2018-01-03T19:54:41Z

tests/python/unittest/test_sparse_operator.py

@@ -1248,10 +1273,12 @@ def test_sparse_dot_zero_output(lhs_shape, trans_lhs, rhs_num_cols):
        test_dot_csr(lhs_shape, (lhs_shape[0], 1), 'default', True,  lhs_d, rhs_d)  # (vector kernel)
        test_dot_csr(lhs_shape, (lhs_shape[1], rnd.randint(5, 10)), 'default', False, lhs_d, rhs_d)  # test gpu SpMM
        test_dot_csr(lhs_shape, (lhs_shape[0], rnd.randint(5, 10)), 'default', True, lhs_d, rhs_d)  # (scalar kernel)
+        test_dot_dns_csr(lhs_shape, (lhs_shape[1], rnd.randint(500, 1000)), lhs_d, lhs_d)


randint(50,200) is large (and slow) enough for testing. No need to increase the dim to 1000.

eric-haibin-lin · 2018-01-03T19:55:34Z

src/operator/tensor/dot-inl.h

+  using namespace mshadow::expr;
+  using nnvm::dim_t;
+
+  /*Initialize data structures*/


nit: space after /*

eric-haibin-lin · 2018-01-03T19:59:40Z

src/operator/tensor/dot-inl.h

+    const CType start_idx = i * nnc;
+    nnvm::dim_t cur = 0;
+    indptr_out[i] = start_idx;
+    if (i == static_cast<int>(num_rows_l - 1)) indptr_out[i + 1] = indptr_out[i] + nnc;


As we are adding large array support in the future, it's more appropriate to cast i up to dim_t instead of cast num_rows_l down to int.

…ense

anirudh2290 · 2018-01-04T00:26:56Z

@eric-haibin-lin Thank you for reviewing! I have made the necessary changes.

eric-haibin-lin · 2018-01-04T00:30:09Z

Is the operator documentation not updated?

anirudh2290 · 2018-01-04T00:40:55Z

Added dot(dns, csr) = csr to operator doc

* Add operator for dot(dns, csr) = csr * Fix whitespace * Add comments * Add comments and fix error message * Fixes for dot dns csr * Fixes * Remove non required statements * Add fallback for GPU * Remove unused if * Fix comments and casting * Add operator to the documentation

anirudh2290 added 3 commits December 4, 2017 17:18

Add operator for dot(dns, csr) = csr

cffb161

Fix whitespace

73a9f3d

Add comments

362af57

eric-haibin-lin self-assigned this Dec 4, 2017

rahul003 reviewed Dec 5, 2017

View reviewed changes

eric-haibin-lin reviewed Dec 5, 2017

View reviewed changes

Add comments and fix error message

a1b3db0

eric-haibin-lin reviewed Dec 8, 2017

View reviewed changes

anirudh2290 added 5 commits December 9, 2017 19:58

Fixes for dot dns csr

3fad020

Fixes

2521627

Merge branch 'master' of https://github.com/dmlc/mxnet into dot_csr_d…

4afbd3a

…ense

Remove non required statements

7077c53

Merge branch 'master' of https://github.com/dmlc/mxnet into dot_csr_d…

0747ada

…ense

eric-haibin-lin reviewed Dec 12, 2017

View reviewed changes

anirudh2290 added 3 commits December 13, 2017 01:20

Add fallback for GPU

fb20f45

Remove unused if

44789d2

Merge branch 'master' of https://github.com/dmlc/mxnet into dot_csr_d…

1525405

…ense

eric-haibin-lin reviewed Jan 3, 2018

View reviewed changes

anirudh2290 added 2 commits January 3, 2018 22:20

Merge branch 'master' of https://github.com/dmlc/mxnet into dot_csr_d…

d5d3216

…ense

Fix comments and casting

d6f13a3

Add operator to the documentation

9a81b78

eric-haibin-lin merged commit 8505442 into apache:master Jan 4, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add operator for dot(dns, csr) = csr #8938

Add operator for dot(dns, csr) = csr #8938

anirudh2290 commented Dec 4, 2017 •

edited

Loading

rahul003 Dec 5, 2017

anirudh2290 Dec 5, 2017

eric-haibin-lin Dec 5, 2017

anirudh2290 Dec 5, 2017

eric-haibin-lin Dec 8, 2017

anirudh2290 Dec 10, 2017

eric-haibin-lin Dec 8, 2017

anirudh2290 Dec 10, 2017

eric-haibin-lin Dec 8, 2017

anirudh2290 Dec 10, 2017

eric-haibin-lin Dec 8, 2017

eric-haibin-lin Dec 8, 2017

anirudh2290 Dec 10, 2017

eric-haibin-lin Dec 12, 2017

anirudh2290 Dec 13, 2017

eric-haibin-lin Dec 12, 2017

eric-haibin-lin Dec 12, 2017

anirudh2290 Dec 13, 2017

eric-haibin-lin left a comment •

edited

Loading

eric-haibin-lin Jan 3, 2018

eric-haibin-lin Jan 3, 2018

anirudh2290 Jan 3, 2018 •

edited

Loading

eric-haibin-lin Jan 4, 2018

eric-haibin-lin Jan 3, 2018

eric-haibin-lin Jan 3, 2018

eric-haibin-lin Jan 3, 2018

anirudh2290 commented Jan 4, 2018

eric-haibin-lin commented Jan 4, 2018

anirudh2290 commented Jan 4, 2018

Add operator for dot(dns, csr) = csr #8938

Add operator for dot(dns, csr) = csr #8938

Conversation

anirudh2290 commented Dec 4, 2017 • edited Loading

Description

Checklist

Essentials

Changes

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

eric-haibin-lin left a comment • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

anirudh2290 Jan 3, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

anirudh2290 commented Jan 4, 2018

eric-haibin-lin commented Jan 4, 2018

anirudh2290 commented Jan 4, 2018

anirudh2290 commented Dec 4, 2017 •

edited

Loading

eric-haibin-lin left a comment •

edited

Loading

anirudh2290 Jan 3, 2018 •

edited

Loading