[REVIEW] libcudf: generic `reduction` and `scan` support #1005

kovaltan · 2019-02-20T13:00:24Z

Merge gdf_sum, gdf_product, gdf_sum_of_squares, gdf_min, gdf_max into a single cudf::reduction API.
Rename gdf_prefixsum to cudf::scan and support the 'min', 'max', and 'product' operations.
Add min and max to cudf::reduction to support non-arithmetic types, e.g., GDF_CATEGORY, GDF_DATE32, GDF_DATE64, GDF_TIMESTAMP.
Use single-step reduction for cudf::reduction and remove gdf_reduce_optimal_output_size.
Use gdf_scalar for the output of gdf_reduction.
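To make the "single API" idea concrete, here is a minimal host-side sketch of a reduction whose operator is a runtime argument. It is not the code in this PR: the `ReductionOp` enum, the `reduce` function, the use of `std::vector` as a column, and throwing on an empty column are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical operator enum for illustration; not the enum defined in this PR.
enum class ReductionOp { Sum, Product, SumOfSquares, Min, Max };

// One entry point stands in for five separate functions:
// the operator is selected at run time instead of by function name.
template <typename T>
T reduce(const std::vector<T>& col, ReductionOp op) {
  // Assumption of this sketch: an empty column is an error.
  if (col.empty()) throw std::invalid_argument("empty column");

  // Seed the accumulator with the first element (squared for SumOfSquares).
  T acc = (op == ReductionOp::SumOfSquares) ? col[0] * col[0] : col[0];

  for (std::size_t i = 1; i < col.size(); ++i) {
    const T v = col[i];
    switch (op) {
      case ReductionOp::Sum:          acc = acc + v;          break;
      case ReductionOp::Product:      acc = acc * v;          break;
      case ReductionOp::SumOfSquares: acc = acc + v * v;      break;
      case ReductionOp::Min:          acc = std::min(acc, v); break;
      case ReductionOp::Max:          acc = std::max(acc, v); break;
    }
  }
  return acc;
}
```

Seeding from the first element avoids needing a per-operator identity value, which is convenient for min and max on types that have an ordering but no natural "zero".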

Closes #443 (Add unit tests for functionality in reductions.cu)
Closes #446 (CUDA reductions should have independent input and output types)
Closes #954 (libcudf reductions should support non-arithmetic types for some operations)
Closes #978 (libcudf should provide a generic scan function)
Related #1224 (Calling sum on a boolean column returns a boolean)

reduction tasks:

scan tasks:

Add min, max, product for scan
Rename gdf_prefixsum to gdf_scan (Issue #978: [FEA] libcudf should provide a generic scan function)
Add gtest for gdf_scan
Rename Python binding for gdf_scan
Update Python test to use gdf_scan
Use the GDF_EXPECTS macro and throw exceptions on failure.
Migrate Python tests: cpp/python/libgdf_cffi/tests -> python/tests
Add "dtype" parameter to Python Series.min, max, sum, product, ...
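For readers unfamiliar with generalizing a prefix sum to other operators, the following host-side sketch shows an inclusive scan parameterized by operator. The names `ScanOp`, `combine`, and `scan_inclusive` are hypothetical and chosen for this example; the actual gdf_scan works on device columns and also takes an `inclusive` flag, which this sketch does not model.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical operator enum for illustration only.
enum class ScanOp { Sum, Product, Min, Max };

// Apply the selected binary operator to two values.
template <typename T>
T combine(ScanOp op, T a, T b) {
  switch (op) {
    case ScanOp::Sum:     return a + b;
    case ScanOp::Product: return a * b;
    case ScanOp::Min:     return std::min(a, b);
    case ScanOp::Max:     return std::max(a, b);
  }
  return a;  // not reached for valid enum values
}

// Inclusive scan: out[i] is the operator applied across in[0..i].
template <typename T>
std::vector<T> scan_inclusive(const std::vector<T>& in, ScanOp op) {
  std::vector<T> out(in.size());
  for (std::size_t i = 0; i < in.size(); ++i) {
    out[i] = (i == 0) ? in[0] : combine(op, out[i - 1], in[i]);
  }
  return out;
}
```

An exclusive variant would additionally need an identity value per operator (for example, the largest representable value for min), which is one reason the inclusive form is the simpler one to sketch.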

This PR also includes these macros: CUDF_FAIL and CUDF_EXPECT_NO_THROW.
In addition, RMM_TRY has been changed to throw an exception on failure.

Examples:

switch (e) {  // e is a value of some enum type
  case Enum_1: ...
  case Enum_2: ...
  default:
    CUDF_FAIL("Invalid enum input");
}

CUDF_FAIL(msg) is equivalent to CUDF_EXPECTS(false, msg).

CUDF_EXPECT_NO_THROW( gdf_scan(raw_input, raw_output, op, inclusive));

CUDF_EXPECT_NO_THROW is used only in gtests; it is a utility macro for debugging.
It tests the same thing as gtest's EXPECT_NO_THROW, but it also prints the error message from CUDF_FAIL and CUDF_EXPECTS.
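As a rough illustration of how a CUDF_FAIL-style macro can be layered on a CUDF_EXPECTS-style macro, here is a self-contained sketch. `SKETCH_EXPECTS`, `SKETCH_FAIL`, `Op`, and `op_name` are hypothetical stand-ins written for this example; the real macros are defined in libcudf and may differ in exception type and message format.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-ins for illustration; not the libcudf definitions.
// Throws std::logic_error carrying `msg` when `cond` is false.
#define SKETCH_EXPECTS(cond, msg)                                   \
  do {                                                              \
    if (!(cond)) {                                                  \
      throw std::logic_error(std::string("failure: ") + (msg));     \
    }                                                               \
  } while (0)

// Unconditional failure, expressed in terms of the macro above.
#define SKETCH_FAIL(msg) SKETCH_EXPECTS(false, msg)

enum class Op { Sum, Min };

// The default branch turns an unexpected enum value into an exception
// instead of silently falling through.
inline const char* op_name(Op op) {
  switch (op) {
    case Op::Sum: return "sum";
    case Op::Min: return "min";
    default: SKETCH_FAIL("Invalid enum input");
  }
  return nullptr;  // not reached: the default branch always throws
}
```

Because the failure is an exception rather than an error code, a test helper can catch it and print `what()`, which is the kind of extra diagnostic output described above for CUDF_EXPECT_NO_THROW.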


felipeblazing

Can we add some unit tests that perform aggregations for non-arithmetic types?


cpp/src/reductions/reductions.cu

jrhemstad · 2019-02-27T18:58:47Z

See https://github.com/rapidsai/cudf/pull/892/files#diff-4b0b6cd3d7dabc1501cc00b5b13d9370R93 for gdf_scalar which should be used for the return value of a reduction.


kovaltan · 2019-03-06T09:31:15Z

Hi @jrhemstad
Regarding the memory location of the result of gdf_reduction:
If the gdf_scalar is on the host, gdf_scalar.data is also in host memory.

cudf/cpp/include/cudf/types.h, lines 110 to 114 in 78a3cc8:

    typedef struct {
      gdf_data  data;      /**< Pointer to the scalar data */
      gdf_dtype dtype;     /**< The datatype of the scalar's data */
      bool      is_valid;  /**< False if the value is null */
    } gdf_scalar;

I'd like to confirm whether you mean that gdf_reduction should write the result back into host memory.
If the result should stay in device memory, using a pointer to gdf_data in device memory seems sufficient.


eyalroz · 2019-03-07T08:39:04Z

How does this new reduction code handle overflows?

cpp/src/reductions/reductions.cu

cpp/src/reductions/scan.cu

cpp/src/reductions/reduction_operators.cuh

kovaltan · 2019-03-08T05:17:59Z

Hi @eyalroz

> How does this new reduction code handle overflows?

It does not; there is no plan to handle overflows.
Instead, this PR will support an independent output type (Issue #446).
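To show why an independent output type matters when overflow is not checked, here is a small host-side sketch. The function `sum_as` is hypothetical and not part of this PR; it only demonstrates accumulating in a caller-chosen type rather than in the input type.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: sum a column of T while accumulating in Out.
// No overflow detection is performed; choosing a wide enough Out is
// the caller's responsibility.
template <typename Out, typename T>
Out sum_as(const std::vector<T>& col) {
  Out acc = 0;
  for (T v : col) {
    // The addition result is converted back to Out on every step,
    // so a narrow Out loses information as soon as the sum exceeds its range.
    acc = static_cast<Out>(acc + static_cast<Out>(v));
  }
  return acc;
}
```

With three int8 values of 100, `sum_as<std::int64_t>` yields 300, while `sum_as<std::int8_t>` cannot represent 300 at all. The same mechanism explains the behavior in the related issue #1224: summing a boolean column into a boolean output can only ever give true or false, whereas an integer output type gives a count.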

GPUtester · 2019-03-08T05:18:00Z

Can one of the admins verify this patch?

eyalroz · 2019-03-08T10:42:43Z

@kovaltan :

> No, there is no plan to handle overflows.

So, I suggest that we should use the opportunity of touching this code to clearly document this fact in the relevant Doxygen comments.

kovaltan · 2019-03-08T11:30:11Z

> So, I suggest that we should use the opportunity of touching this code to clearly document this fact in the relevant Doxygen comments.

OK, I will document it in the API doc after I implement output precision support.


kkraus14

Print statements need to be removed, and I have a question regarding overflows.

python/cudf/bindings/reduce.pyx

python/cudf/tests/test_reductions.py


Addressed

Min/Max reduction support for non-arithmetic types

22ac4c0

Supported in `ReduceDispatcher` Not tested yet.

kkraus14 added the "2 - In Progress" and "libcudf" labels Feb 20, 2019

felipeblazing suggested changes Feb 25, 2019


Separate device operators to a header file

71a1c81

New file: reduction_operators.cuh

jrhemstad reviewed Feb 27, 2019

cpp/src/reductions/reductions.cu (outdated)

kovaltan changed the title ~~[WIP] libcudf reductions should support non-arithmetic types for some operations~~ [WIP] libcudf: generic reduction and scan support Feb 28, 2019

kovaltan added 7 commits February 28, 2019 20:46

Update operator

20fb844

Fake single step

dc5ec61

This is an intermediate implementation. The grid size is 1 at this point. TODO: work with larger grids using atomic operations.

Merge remote-tracking branch 'upstream/branch-0.6' into bug_reduction…

a338295

…_non_arithmetic

Simplify the code

8aad3c6

Rename: ReduceOp::launch() -> Reduce()
Remove: ReduceOp::launch_onece()

Implement: single step reduction

cc3a457

Single step reduction is performed by using `atomicCAS`.
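The idea of building an arbitrary atomic update out of compare-and-swap can be sketched on the host with `std::atomic`. This is only an analogue: the commit uses CUDA's `atomicCAS` on the device, and the names `atomic_combine` and `single_step_reduce` below are hypothetical. The loop over `partials` stands in for thread blocks that would each contribute one partial result.

```cpp
#include <atomic>
#include <vector>

// Generic atomic read-modify-write built from compare-and-swap:
// retry until the value was not changed by another writer between
// our read and our write.
template <typename T, typename BinaryOp>
void atomic_combine(std::atomic<T>& target, T value, BinaryOp op) {
  T observed = target.load();
  T desired = op(observed, value);
  while (!target.compare_exchange_weak(observed, desired)) {
    // On failure, compare_exchange_weak stores the current value in
    // `observed`, so the update is recomputed from fresh data.
    desired = op(observed, value);
  }
}

// Each partial result is folded directly into one shared output,
// so no second reduction pass over an intermediate buffer is needed.
template <typename T, typename BinaryOp>
T single_step_reduce(const std::vector<T>& partials, T identity, BinaryOp op) {
  std::atomic<T> result(identity);
  for (const T& p : partials) {
    atomic_combine(result, p, op);
  }
  return result.load();
}
```

The same CAS loop works for any associative operator (min, max, product), including those with no dedicated hardware atomic, which is presumably why this approach suits a generic reduction.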

Add: specialized functions for operators

54a5219

Remove dev_result_size for internal calls

3ce2526

fondaing changed the base branch from branch-0.6 to branch-0.7 March 5, 2019 18:13

kovaltan added 2 commits March 6, 2019 13:33

Add: constexpr is_nonarithmetic_op

e012acb

Use `constexpr bool is_nonarithmetic_op` instead of `DeviceForNonArithmetic`

Add: candidate of generic scan function

e68dc43

kovaltan mentioned this pull request Mar 7, 2019

gdf_reduce_optimal_output_size() is a misnomer #610

Closed

Merge remote-tracking branch 'upstream/branch-0.7' into bug_reduction…

70e0d64

…_non_arithmetic

eyalroz reviewed Mar 7, 2019

cpp/src/reductions/reductions.cu (outdated)

cpp/src/reductions/scan.cu

jrhemstad reviewed Mar 7, 2019

cpp/src/reductions/reduction_operators.cuh (outdated)

jrhemstad mentioned this pull request Mar 8, 2019

[REVIEW] Update to use type_dispatcher #1076

Merged

6 tasks

This was referenced Apr 9, 2019

[FEA] Series level cumulative min #1274

Closed

[FEA] Series level cumulative sum #1269

Closed

kovaltan added 4 commits April 10, 2019 12:50

Use scalar for retval of aepply_reduction

5685064

Use scalar retval instead of numpy array. Remove [0] suffix from min() for gpu_scale. Use np.float64( ) instead of astype(f8) because it's not a numpy array.

Fix flake8 error

eac4893

Update from review

4edc1ba

- changed the behavior if input column is empty
- minor correction

Add test case for empty column

1d4f0aa

kovaltan requested review from thomcom and kkraus14 April 10, 2019 08:36

kovaltan added 4 commits April 10, 2019 17:50

Add: nullptr check as in/out columns for scan

f22e268

DO_NOT_MERGE: enum class for cython

d035194

This commit fails at cython

Revert "DO_NOT_MERGE: enum class for cython"

1ac55a3

This reverts commit d035194.

Move reduction scan operator enum

f78b95c

Move enums from cudf/types.h to reduction.hpp

kovaltan mentioned this pull request Apr 10, 2019

[BUG] Issues with device atomic overloads for wrapper types #1398

Closed

kovaltan and others added 2 commits April 12, 2019 12:37

Remove stream synchronization

b8a4e23

Merge branch 'branch-0.7' into bug_reduction_non_arithmetic

d5c7105

jrhemstad approved these changes Apr 12, 2019


kkraus14 requested changes Apr 12, 2019


kovaltan added 3 commits April 12, 2019 23:39

Remove print from python test

aa675f4

Move `get_scalar_value` into `cudf_cpp`

Merge remote-tracking branch 'upstream/branch-0.7' into bug_reduction…

7218c2e

…_non_arithmetic

Merge remote-tracking branch 'origin/bug_reduction_non_arithmetic' in…

8f5609c

…to bug_reduction_non_arithmetic

kkraus14 approved these changes Apr 12, 2019


kkraus14 added the "5 - Ready to Merge" and "Python" labels and removed the "0 - Waiting on Author" label Apr 12, 2019

kkraus14 merged commit d92a980 into rapidsai:branch-0.7 Apr 12, 2019

harrism mentioned this pull request Apr 13, 2019

[FEA] Implement .min() and .max() of a column #953

Closed

beckernick mentioned this pull request Apr 17, 2019

[REVIEW] Add Series level cumulative ops (sum, min, max, prod) in python layer #1441

Merged

4 tasks
