Implement concatenate_rows for list type (#8049)
This PR closes #7767. It implements `lists::concatenate_rows`, which concatenates, row by row, the list elements found at the same row index across all columns of the given table of lists columns.

For example:
```
s1 = [{0, 1}, {2, 3, 4}, {5}, {}, {6, 7}]
s2 = [{8}, {9}, {}, {10, 11, 12}, {13, 14, 15, 16}]
r = lists::concatenate_rows( table_view{s1, s2} )
r is now [{0, 1, 8}, {2, 3, 4, 9}, {5}, {10, 11, 12}, {6, 7, 13, 14, 15, 16}]
```

Currently, only lists columns with a single level of nesting (lists whose entries are not themselves nested types) are supported.
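
To make the row-wise semantics of the example above concrete, here is a minimal host-side sketch in plain C++. It is only a model of the behavior described in this commit message, not the libcudf implementation: `list_row`, `lists_col`, `concatenate_rows_model`, and `commit_message_example` are hypothetical names introduced for illustration, a lists column is represented as a `std::vector` of `std::vector<int>`, and null list elements are not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side model only: a "lists column" is a vector of rows, each row a list of ints.
using list_row  = std::vector<int>;
using lists_col = std::vector<list_row>;

// Hypothetical helper (not part of libcudf): concatenates the lists found at the same
// row index across all columns. Assumes every column has the same number of rows.
lists_col concatenate_rows_model(std::vector<lists_col> const& columns)
{
  if (columns.empty()) { return {}; }
  lists_col result(columns.front().size());
  for (auto const& col : columns) {
    assert(col.size() == result.size());
    for (std::size_t row = 0; row < col.size(); ++row) {
      // Append this column's list to whatever has been gathered for the row so far.
      result[row].insert(result[row].end(), col[row].begin(), col[row].end());
    }
  }
  return result;
}

// Reproduces the s1/s2 example from the commit message and checks the result.
bool commit_message_example()
{
  lists_col const s1{{0, 1}, {2, 3, 4}, {5}, {}, {6, 7}};
  lists_col const s2{{8}, {9}, {}, {10, 11, 12}, {13, 14, 15, 16}};
  lists_col const expected{
    {0, 1, 8}, {2, 3, 4, 9}, {5}, {10, 11, 12}, {6, 7, 13, 14, 15, 16}};
  return concatenate_rows_model({s1, s2}) == expected;
}
```

Note that an empty list contributes nothing to its row, which is why rows 2 and 3 of the result are simply the non-empty operand.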

Authors:
  - Nghia Truong (https://github.com/ttnghia)

Approvers:
  - Robert Maynard (https://github.com/robertmaynard)
  - Devavret Makkar (https://github.com/devavret)
  - Ray Douglass (https://github.com/raydouglass)

URL: #8049
ttnghia authored May 3, 2021
1 parent 36eaa06 commit 1debb96
Showing 7 changed files with 908 additions and 1 deletion.
1 change: 1 addition & 0 deletions conda/recipes/libcudf/meta.yaml
@@ -137,6 +137,7 @@ test:
- test -f $PREFIX/include/cudf/lists/detail/drop_list_duplicates.hpp
- test -f $PREFIX/include/cudf/lists/detail/interleave_columns.hpp
- test -f $PREFIX/include/cudf/lists/detail/sorting.hpp
- test -f $PREFIX/include/cudf/lists/concatenate_rows.hpp
- test -f $PREFIX/include/cudf/lists/count_elements.hpp
- test -f $PREFIX/include/cudf/lists/explode.hpp
- test -f $PREFIX/include/cudf/lists/drop_list_duplicates.hpp
1 change: 1 addition & 0 deletions cpp/CMakeLists.txt
@@ -263,6 +263,7 @@ add_library(cudf
src/join/join.cu
src/join/semi_join.cu
src/lists/contains.cu
src/lists/concatenate_rows.cu
src/lists/copying/concatenate.cu
src/lists/copying/copying.cu
src/lists/copying/gather.cu
68 changes: 68 additions & 0 deletions cpp/include/cudf/lists/concatenate_rows.hpp
@@ -0,0 +1,68 @@
/*
* Copyright (c) 2021, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#pragma once

#include <cudf/column/column.hpp>
#include <cudf/lists/lists_column_view.hpp>

namespace cudf {
namespace lists {
/**
* @addtogroup lists_concatenate_rows
* @{
* @file
*/

/**
* @brief Flag to specify whether a null list element is ignored during concatenation, or whether
* any output row whose concatenation involves a null list element becomes a null element.
*/
enum class concatenate_null_policy { IGNORE, NULLIFY_OUTPUT_ROW };

/**
* @brief Concatenate multiple lists columns row-wise into a single lists column.
*
* Each output row is generated by concatenating the list elements in the corresponding row of
* the input table. If a row of the input table contains null list elements, the concatenation
* either ignores those null elements or sets the entire resulting row to null, depending on
* `null_policy`.
*
* @code{.pseudo}
* s1 = [{0, 1}, {2, 3, 4}, {5}, {}, {6, 7}]
* s2 = [{8}, {9}, {}, {10, 11, 12}, {13, 14, 15, 16}]
* r = lists::concatenate_rows(table_view{s1, s2})
* r is now [{0, 1, 8}, {2, 3, 4, 9}, {5}, {10, 11, 12}, {6, 7, 13, 14, 15, 16}]
* @endcode
*
* @throws cudf::logic_error if any column of the input table is not a lists column.
* @throws cudf::logic_error if any lists column contains entries of a nested type.
* @throws cudf::logic_error if the lists columns do not all have the same entry type.
*
* @param input Table of lists to be concatenated.
* @param null_policy Specifies whether a null list element is ignored during concatenation, or
* whether any concatenation involving a null list element results in a null list.
* @param mr Device memory resource used to allocate the returned column's device memory.
* @return A new column in which each row is a list resulting from concatenating all list elements
* in the corresponding row of the input table.
*/
std::unique_ptr<column> concatenate_rows(
table_view const& input,
concatenate_null_policy null_policy = concatenate_null_policy::IGNORE,
rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());

/** @} */ // end of group
} // namespace lists
} // namespace cudf
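
The two `concatenate_null_policy` values documented in the header above can be illustrated with a small host-side sketch. This is a model under stated assumptions, not the libcudf implementation: `maybe_list`, `lists_col`, `null_policy_model`, `concatenate_rows_model`, and `null_policy_example` are hypothetical names, and `std::nullopt` stands in for a null list element. The header does not say what `IGNORE` produces when every list in a row is null; this sketch returns an empty list in that case, which is purely an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

// Host-side model only: std::nullopt stands in for a null list element.
using maybe_list = std::optional<std::vector<int>>;
using lists_col  = std::vector<maybe_list>;

// Mirrors the two enumerators declared in the header; redeclared so the sketch is standalone.
enum class null_policy_model { IGNORE, NULLIFY_OUTPUT_ROW };

// Hypothetical helper (not part of libcudf). Assumes all columns have the same row count.
lists_col concatenate_rows_model(std::vector<lists_col> const& columns, null_policy_model policy)
{
  if (columns.empty()) { return {}; }
  std::size_t const num_rows = columns.front().size();
  lists_col result;
  result.reserve(num_rows);
  for (std::size_t row = 0; row < num_rows; ++row) {
    std::vector<int> joined;
    bool saw_null = false;
    for (auto const& col : columns) {
      maybe_list const& item = col.at(row);
      if (!item.has_value()) {
        saw_null = true;  // remember the null; under IGNORE it contributes nothing
        continue;
      }
      joined.insert(joined.end(), item->begin(), item->end());
    }
    if (saw_null && policy == null_policy_model::NULLIFY_OUTPUT_ROW) {
      result.push_back(std::nullopt);  // any null input nullifies the whole output row
    } else {
      // Assumption: an all-null row under IGNORE yields an empty list in this model.
      result.push_back(std::move(joined));
    }
  }
  return result;
}

// Runs both policies on the same input and checks the modeled results.
bool null_policy_example()
{
  lists_col const a{std::vector<int>{1, 2}, std::nullopt, std::vector<int>{}};
  lists_col const b{std::vector<int>{3}, std::vector<int>{4, 5}, std::vector<int>{6}};

  lists_col const expected_ignore{
    std::vector<int>{1, 2, 3}, std::vector<int>{4, 5}, std::vector<int>{6}};
  lists_col const expected_nullify{
    std::vector<int>{1, 2, 3}, std::nullopt, std::vector<int>{6}};

  return concatenate_rows_model({a, b}, null_policy_model::IGNORE) == expected_ignore &&
         concatenate_rows_model({a, b}, null_policy_model::NULLIFY_OUTPUT_ROW) == expected_nullify;
}
```

The sketch keeps an empty list distinct from a null list: the empty list in the last row of `a` never triggers nullification, whereas the null in the middle row does under `NULLIFY_OUTPUT_ROW`.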
1 change: 1 addition & 0 deletions cpp/include/doxygen_groups.h
@@ -143,6 +143,7 @@
* @}
* @defgroup lists_apis Lists
* @{
* @defgroup lists_concatenate_rows Combining
* @defgroup lists_extract Extracting
* @defgroup lists_contains Searching
* @defgroup lists_gather Gathering