Update join to use experimental row hasher and comparator #12787

Merged on Apr 6, 2023 (45 commits)

Changes from 35 commits (of 45 total)

Commits:
4a8085a building equality::self_comparator (divyegala, Feb 2, 2023)
f71d161 two table comp (divyegala, Feb 2, 2023)
3ca298c copyright years (divyegala, Feb 2, 2023)
7c167a7 centralizing repeated logic (divyegala, Feb 2, 2023)
0ceb79e address review to create functors (divyegala, Feb 3, 2023)
37e7326 updating has_nested_columns docs (divyegala, Feb 3, 2023)
b44f603 Merge remote-tracking branch 'upstream/branch-23.04' into equality-co… (divyegala, Feb 3, 2023)
c2ff1fc address review for underscore prefixes in structs (divyegala, Feb 7, 2023)
c2ca8ee Merge remote-tracking branch 'upstream/branch-23.04' into equality-co… (divyegala, Feb 7, 2023)
ffdf10c Merge remote-tracking branch 'upstream/branch-23.04' into equality-co… (divyegala, Feb 8, 2023)
53e918f add rank (divyegala, Feb 8, 2023)
65e2bce fix compile times for rank (divyegala, Feb 8, 2023)
c6bc7f5 Merge remote-tracking branch 'upstream/branch-23.04' into equality-co… (divyegala, Feb 8, 2023)
1344e33 Apply suggestions from code review (divyegala, Feb 11, 2023)
4123379 address review (divyegala, Feb 11, 2023)
26f38b3 Merge remote-tracking branch 'upstream/branch-23.04' into equality-co… (divyegala, Feb 11, 2023)
9d0f7a6 address review, mark members of functors as private (divyegala, Feb 11, 2023)
fe41be8 removing partitioning (divyegala, Feb 11, 2023)
02dd5c5 simplify lists/contains since it already has a nested-type dispatch m… (divyegala, Feb 12, 2023)
5db4d03 Merge branch 'branch-23.04' into equality-comp-fast-path (divyegala, Feb 13, 2023)
9aa23a5 Merge branch 'branch-23.04' into equality-comp-fast-path (divyegala, Feb 15, 2023)
73adabc trying to figure if build and probe switched (divyegala, Feb 16, 2023)
02edad7 figured out index inversion (divyegala, Feb 16, 2023)
fa8f639 trying legacy again (divyegala, Feb 19, 2023)
38464ef Revert "trying legacy again" (divyegala, Feb 20, 2023)
0113589 fix slower times in small tables (divyegala, Feb 20, 2023)
36fc5e9 copyright years (divyegala, Feb 20, 2023)
5d75db8 Merge remote-tracking branch 'upstream/branch-23.04' into join-row-op… (divyegala, Feb 20, 2023)
9a787c6 add lists tests (divyegala, Feb 21, 2023)
a1cb220 explicitly instantiate shared function template (divyegala, Feb 21, 2023)
d3d4bb6 copyright year (divyegala, Feb 21, 2023)
cf92f34 address review (divyegala, Mar 21, 2023)
a6bbf8b merge upstream (divyegala, Mar 21, 2023)
47fe8d2 address review (divyegala, Mar 23, 2023)
45fd37a merge upstream (divyegala, Mar 23, 2023)
10b0406 Apply suggestions from code review (divyegala, Mar 23, 2023)
9c4bdfa address review (divyegala, Mar 23, 2023)
e7fb4cd address review (divyegala, Mar 28, 2023)
0fe1354 Merge remote-tracking branch 'upstream/branch-23.04' into join-row-op… (divyegala, Mar 28, 2023)
06d8d43 address review (divyegala, Mar 29, 2023)
6dd1200 fix distance call (divyegala, Mar 30, 2023)
f3257c0 Merge branch 'branch-23.06' into join-row-operators (divyegala, Mar 30, 2023)
e0b0837 fix indentation in docs (divyegala, Apr 5, 2023)
ead9ebf Merge remote-tracking branch 'upstream/branch-23.06' into join-row-op… (divyegala, Apr 5, 2023)
ff5c934 Merge branch 'join-row-operators' of github.com:divyegala/cudf into j… (divyegala, Apr 5, 2023)

cpp/include/cudf/detail/join.hpp: 19 changes (9 additions, 10 deletions)
@@ -36,8 +36,8 @@
 template <typename T>
 class default_allocator;
 
-namespace cudf::structs::detail {
-class flattened_table;
+namespace cudf::experimental::row::equality {
+class preprocessed_table;
 }
 
 namespace cudf {
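This hunk changes which class `join.hpp` forward-declares (`preprocessed_table` instead of `flattened_table`), so the header can name the type without including its definition. A minimal standalone sketch of that pattern follows; the `demo` namespaces, `join_like`, `make_join`, and the toy `preprocessed_table` are hypothetical and are not the cudf classes.

```cpp
#include <memory>

namespace demo::row::equality {
class preprocessed_table;  // forward declaration: the type is incomplete from here on
}

namespace demo {
// Hypothetical owner that, like hash_join, stores the preprocessed build side by pointer.
struct join_like {
  // Declaring a shared_ptr to an incomplete type is allowed; the deleter is
  // captured where the object is created, at which point the type is complete.
  std::shared_ptr<row::equality::preprocessed_table> preprocessed_build;

  int num_rows() const;  // defined after the full class definition below
};
}  // namespace demo

namespace demo::row::equality {
// Toy definition standing in for the real preprocessed table.
class preprocessed_table {
 public:
  explicit preprocessed_table(int rows) : rows_(rows) {}
  int rows() const { return rows_; }

 private:
  int rows_;
};
}  // namespace demo::row::equality

namespace demo {
int join_like::num_rows() const
{
  // An empty pointer means nothing has been preprocessed yet.
  return preprocessed_build ? preprocessed_build->rows() : 0;
}

join_like make_join(int rows)
{
  // The type is complete here, so make_shared can construct it and bind its deleter.
  return join_like{std::make_shared<row::equality::preprocessed_table>(rows)};
}
}  // namespace demo
```

A `std::shared_ptr` member tolerates the incomplete type because its deleter is bound at construction; a `std::unique_ptr` member instead needs the complete type wherever the owner's destructor is instantiated.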
@@ -77,9 +77,9 @@ struct hash_join {
   rmm::device_buffer const _composite_bitmask;  ///< Bitmask to denote whether a row is valid
   cudf::null_equality const _nulls_equal;       ///< whether to consider nulls as equal
   cudf::table_view _build;                      ///< input table to build the hash map
-  std::unique_ptr<cudf::structs::detail::flattened_table>
-    _flattened_build_table;  ///< flattened data structures for `_build`
-  map_type _hash_table;      ///< hash table built on `_build`
+  std::shared_ptr<cudf::experimental::row::equality::preprocessed_table>
+    _preprocessed_build;  ///< input table preprocessed for row operators
+  map_type _hash_table;   ///< hash table built on `_build`
 
  public:
   /**
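`_hash_table` is built from `_build`. A row hasher and a row equality comparator then decide which build rows match each probe row, and the header now documents `_preprocessed_build` as the input table preprocessed for those row operators. The host-only sketch below illustrates that build/probe flow for an inner join. The row layout (rows of `int`, no nulls), the hash combination, `std::unordered_multimap`, and all names are assumptions for illustration. This is not how cudf implements the join on the GPU.

```cpp
#include <cstddef>
#include <functional>
#include <unordered_map>
#include <utility>
#include <vector>

using row_t   = std::vector<int>;  // assumption: one int per key column, no nulls
using table_t = std::vector<row_t>;

// Row hasher: combines per-column hashes (boost-style hash_combine).
struct row_hasher {
  std::size_t operator()(row_t const& r) const
  {
    std::size_t seed = 0;
    for (int v : r) { seed ^= std::hash<int>{}(v) + 0x9e3779b9 + (seed << 6) + (seed >> 2); }
    return seed;
  }
};

// Row comparator: element-wise equality between a probe row and a build row.
struct row_equal {
  bool operator()(row_t const& a, row_t const& b) const { return a == b; }
};

// Inner join: build a hash table on `build`, probe it with `probe`.
// Returns (probe row indices, build row indices) for every matching pair.
std::pair<std::vector<int>, std::vector<int>> inner_join_indices(table_t const& build,
                                                                 table_t const& probe)
{
  row_hasher hash;
  row_equal equal;
  std::unordered_multimap<std::size_t, int> table;  // row hash -> build row index
  for (int i = 0; i < static_cast<int>(build.size()); ++i) {
    table.emplace(hash(build[i]), i);
  }

  std::vector<int> probe_out;
  std::vector<int> build_out;
  for (int j = 0; j < static_cast<int>(probe.size()); ++j) {
    auto [first, last] = table.equal_range(hash(probe[j]));
    for (auto it = first; it != last; ++it) {
      // Equal hashes can still come from different rows, so confirm with the comparator.
      if (equal(probe[j], build[it->second])) {
        probe_out.push_back(j);
        build_out.push_back(it->second);
      }
    }
  }
  return {probe_out, build_out};
}
```

The order of matches within one hash bucket is unspecified here, so callers should not rely on the order of build indices for a given probe row.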
@@ -152,21 +152,20 @@ struct hash_join {
    * i.e. if full join is specified as the join type then left join is called. Behavior
    * is undefined if the provided `output_size` is smaller than the actual output size.
    *
-   * @throw cudf::logic_error if build table is empty and `JoinKind == INNER_JOIN`.
-   *
-   * @tparam JoinKind The type of join to be performed.
+   * @throw cudf::logic_error if build table is empty and `join == INNER_JOIN`.
    *
    * @param probe_table Table of probe side columns to join.
+   * @param join The type of join to be performed.
    * @param output_size Optional value which allows users to specify the exact output size.
    * @param stream CUDA stream used for device memory operations and kernel launches.
    * @param mr Device memory resource used to allocate the returned vectors.
    *
    * @return Join output indices vector pair.
    */
-  template <cudf::detail::join_kind JoinKind>
   std::pair<std::unique_ptr<rmm::device_uvector<size_type>>,
             std::unique_ptr<rmm::device_uvector<size_type>>>
   probe_join_indices(cudf::table_view const& probe_table,
+                     join_kind join,
                      std::optional<std::size_t> output_size,
                      rmm::cuda_stream_view stream,
                      rmm::mr::device_memory_resource* mr) const;
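With this change `probe_join_indices` takes the join type as a runtime `join_kind join` argument instead of a `JoinKind` template parameter, and the doc comment moves from `@tparam` to `@param`. The sketch below contrasts the two forms. It uses a hypothetical three-value `join_kind` enum and hypothetical `probe_kind` helpers that encode only the two rules stated in the doc comment: throw for an inner join against an empty build table, and probe a full join as a left join.

```cpp
#include <stdexcept>

// Hypothetical stand-in for the join kind enum (only the kinds used here).
enum class join_kind { INNER_JOIN, LEFT_JOIN, FULL_JOIN };

// Old style: the kind is a template parameter, fixed at compile time,
// with one instantiation per kind that callers name.
template <join_kind JoinKind>
join_kind probe_kind_templated(bool build_is_empty)
{
  if (build_is_empty && JoinKind == join_kind::INNER_JOIN) {
    throw std::logic_error("build table is empty for an inner join");
  }
  // A full join is probed as a left join.
  return JoinKind == join_kind::FULL_JOIN ? join_kind::LEFT_JOIN : JoinKind;
}

// New style: the kind is an ordinary argument, so one function serves every kind.
join_kind probe_kind(join_kind join, bool build_is_empty)
{
  if (build_is_empty && join == join_kind::INNER_JOIN) {
    throw std::logic_error("build table is empty for an inner join");
  }
  // A full join is probed as a left join.
  return join == join_kind::FULL_JOIN ? join_kind::LEFT_JOIN : join;
}
```

The templated form is compiled once per join kind a caller uses. The runtime form is compiled once and branches on the value, which also lets callers pick the join type from data available only at run time.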
@@ -179,10 +178,10 @@ struct hash_join {
    * @throw cudf::logic_error if the number of columns in build table and probe table do not match.
    * @throw cudf::logic_error if the column data types in build table and probe table do not match.
    */
-  template <cudf::detail::join_kind JoinKind>
   std::pair<std::unique_ptr<rmm::device_uvector<size_type>>,
             std::unique_ptr<rmm::device_uvector<size_type>>>
   compute_hash_join(cudf::table_view const& probe,
+                    join_kind join,
                     std::optional<std::size_t> output_size,
                     rmm::cuda_stream_view stream,
                     rmm::mr::device_memory_resource* mr) const;
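`compute_hash_join` documents two input checks: the build and probe tables must have the same number of columns, and their column data types must match. A host-side sketch of those checks follows. It reduces each table to a hypothetical list of `demo_type` column types, and the helper name and error messages are illustrative only.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a column data type id.
enum class demo_type { INT32, FLOAT64, STRING };

// Hypothetical minimal table description: one data type per column.
using schema = std::vector<demo_type>;

// Checks the same two conditions listed in the @throw notes above.
void validate_join_inputs(schema const& build, schema const& probe)
{
  if (build.size() != probe.size()) {
    throw std::logic_error("build and probe tables have different numbers of columns");
  }
  for (std::size_t i = 0; i < build.size(); ++i) {
    if (build[i] != probe[i]) {
      throw std::logic_error("build and probe column data types do not match");
    }
  }
}
```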