Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[REVIEW] libcudf: generic reduction and scan support #1005

Merged
merged 81 commits into from
Apr 12, 2019

Conversation

kovaltan
Copy link
Contributor

@kovaltan kovaltan commented Feb 20, 2019

Merge gdf_sum, gdf_product, gdf_sum_of_squares, gdf_min, gdf_max into a single cudf::reduction API.
Reneme gdf_prefixsum into cudf::scan and support 'min', 'max', 'product' operation.
Add min and max to cudf::reduction for support non-arithmetic types, e.g., GDF_CATEGORY, GDF_DATE32, GDF_DATE64, GDF_TIMESTAMP.
Use single step reduction for cudf::reduction and remove gdf_reduce_optimal_output_size.
Use gdf_scalar for output of gdf_reduction.

Closes #443 (Add unit tests for functionality in reductions.cu)
Closes #446 (CUDA reductions should have independent input and output types)
Closes #954 (libcudf reductions should support non-arithmetic types for some operations)
Closes #978 (libcudf should provide a generic scan function)
Related #1224 (Calling sum on a boolean column returns a boolean)

reduction tasks:

scan tasks:

  • Add min, max, product for scan
  • Rename gdf_prefixsum into gdf_scan (Issue [FEA] libcudf should provide a generic scan function #978)
  • Add gtest for gdf_scan
  • Rename python binding for gdf_scan
  • Update python test by using gdf_scan
  • Use GDF_EXPECTS macro and throw exceptions if failed.
  • migrate python test cpp/python/libgdf_cffi/tests -> python/tests
  • add "dtype" paramter to python Series.min, max, sum. product, ...

This PL also included these macros: CUDF_FAIL CUDF_EXPECT_NO_THROW
Then, RMM_TRY has been changed to throw exception at failed.

Examples:

switch(enum e){
case Enum_1: ...
case Enum_2: ...
default:
  CUDF_FAIL("Invalid enum input");
}

CUDF_FAIL(msg) is same with CUDF_EXPECTS(false, msg)

CUDF_EXPECT_NO_THROW( gdf_scan(raw_input, raw_output, op, inclusive));

CUDF_EXPECT_NO_THROW is used only in gtests, and it is the utility macro for debugging.
The testing is same with EXPECT_NO_THROW in gtests, but it prints out the error message from CUDF_FAIL and CUDF_EXPECTS.

Supported in `ReduceDispatcher`
Not tested yet.
@kkraus14 kkraus14 added 2 - In Progress Currently a work in progress libcudf Affects libcudf (C++/CUDA) code. labels Feb 20, 2019
Copy link

@felipeblazing felipeblazing left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add some unit tests that perform aggregations for non-arithmetic types?

New file: reduction_operators.cuh
@jrhemstad
Copy link
Contributor

See https://github.com/rapidsai/cudf/pull/892/files#diff-4b0b6cd3d7dabc1501cc00b5b13d9370R93 for gdf_scalar which should be used for the return value of a reduction.

@kovaltan kovaltan changed the title [WIP] libcudf reductions should support non-arithmetic types for some operations [WIP] libcudf: generic reduction and scan support Feb 28, 2019
This is intermediate implementation.
The grid size = 1 at this point.
ToDo: work with grids using atomic operation.
Rename: ReduceOp::launch() -> Reduce()
Remove: ReduceOp::launch_onece()
Single step reduction is performed by using `atomicCAS`.
@fondaing fondaing changed the base branch from branch-0.6 to branch-0.7 March 5, 2019 18:13
kovaltan added 2 commits March 6, 2019 13:33
Use `constexpr bool is_nonarithmetic_op` instead of `DeviceForNonArithmetic`
@kovaltan
Copy link
Contributor Author

kovaltan commented Mar 6, 2019

Hi @jrhemstad
Regarding to the memory location of the result ofgdf_reduction.
If gdf_scalar is at host, gdf_scalar.data is also at host memory.

typedef struct {
gdf_data data; /**< Pointer to the scalar data */
gdf_dtype dtype; /**< The datatype of the scalar's data */
bool is_valid; /**< False if the value is null */
} gdf_scalar;

I'd like to make sure if you mean gdf_reduction should write back the result into host memory.
If the result should be at device memory, using pointer of gdf_data in device memory seems enough.

@eyalroz
Copy link
Collaborator

eyalroz commented Mar 7, 2019

How does this new reduction code handle overflows?

cpp/src/reductions/reductions.cu Outdated Show resolved Hide resolved
cpp/src/reductions/scan.cu Show resolved Hide resolved
@kovaltan
Copy link
Contributor Author

kovaltan commented Mar 8, 2019

Hi @eyalroz

How does this new reduction code handle overflows?

No, there is no plan to handle overflows.
Instead, this PL will support output type (Issue #446).

@GPUtester
Copy link
Collaborator

Can one of the admins verify this patch?

@eyalroz
Copy link
Collaborator

eyalroz commented Mar 8, 2019

@kovaltan :

No, there is no plan to handle overflows.

So, I suggest that we should use the opportunity of touching this code to clearly document this fact in the relevant Doxygen comments.

@kovaltan
Copy link
Contributor Author

kovaltan commented Mar 8, 2019

So, I suggest that we should use the opportunity of touching this code to clearly document this fact in the relevant Doxygen comments.

OK, I will document it at API doc after I implement output precision support.

Use scalar retval instead of numpy array
Remove [0] suffix from min() for gpu_scale
use np.float64( ) instead of astype(f8) because it's not numpy array.
- changed the behavior if input column is empty
- minor correction
@kovaltan kovaltan requested review from thomcom and kkraus14 April 10, 2019 08:36
Copy link
Collaborator

@kkraus14 kkraus14 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Print statements need to be removed and then question regarding overflows

python/cudf/bindings/reduce.pyx Outdated Show resolved Hide resolved
python/cudf/tests/test_reductions.py Outdated Show resolved Hide resolved
python/cudf/tests/test_reductions.py Outdated Show resolved Hide resolved
python/cudf/tests/test_reductions.py Outdated Show resolved Hide resolved
python/cudf/tests/test_reductions.py Outdated Show resolved Hide resolved
python/cudf/tests/test_reductions.py Show resolved Hide resolved
python/cudf/tests/test_reductions.py Outdated Show resolved Hide resolved
python/cudf/tests/test_reductions.py Outdated Show resolved Hide resolved
python/cudf/tests/test_reductions.py Outdated Show resolved Hide resolved
@kkraus14 kkraus14 added 5 - Ready to Merge Testing and reviews complete, ready to merge Python Affects Python cuDF API. and removed 0 - Waiting on Author Waiting for author to respond to review labels Apr 12, 2019
@kkraus14 kkraus14 merged commit d92a980 into rapidsai:branch-0.7 Apr 12, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
5 - Ready to Merge Testing and reviews complete, ready to merge libcudf Affects libcudf (C++/CUDA) code. Python Affects Python cuDF API.
Projects
None yet
9 participants