Clean up and simplify `gpuDecideCompression` #13202

vuule · 2023-04-22T02:13:58Z

Description

Changed the block size to a single warp, since the kernel only uses 32 threads.
Simplified the kernel logic a bit and removed unnecessary atomic operations.

FWIW, the kernel is faster now, though that is not important since it is a tiny part of the end-to-end time.

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.

bdice · 2023-04-26T05:06:23Z

cpp/src/io/parquet/page_enc.cu

-        compressed_data_size += comp_res->bytes_written;
-        if (comp_res->status != compression_status::SUCCESS) { atomicAdd(&error_count, 1); }
-      }
+  __syncwarp();


Can we write this kernel in a block-size agnostic way? Unlike `__syncthreads()`, using `__syncwarp()` assumes that `block_size == warp_size == 32`.

That depends on how we would scale the parallelism with multiple warps. If several warps worked together on a single `chunks` element, then yes, we would need to sync all threads in the block. But with multiple warps, IMO this kernel should have each warp work on a separate `chunks` element. In that case we don't need to synchronize different warps, and `__syncwarp` is still the right option.
I understand that my change left this ambiguous, because the warp size is used interchangeably for the block size and the warp size. I'll try to make this clearer.

Modified the kernel to work with any number of warps in a block. The size can be adjusted via the constexpr `decide_compression_warps_in_block`. Used `warp_size` as well, so we should be magic number-free now :)
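The one-warp-per-chunk mapping discussed in this thread can be sketched on the host in plain C++. This is an illustration under stated assumptions, not the kernel from the PR: `warps_in_block`, `chunk_index_for`, and `blocks_needed` are hypothetical names, the warp size of 32 is assumed, and block and thread indices are passed in explicitly instead of being read from `blockIdx.x` and `threadIdx.x`.

```cpp
#include <cassert>
#include <cstddef>

// Assumed values for illustration; the PR uses cudf::detail::warp_size and
// decide_compression_warps_in_block in these roles.
constexpr std::size_t warp_size      = 32;
constexpr std::size_t warps_in_block = 4;
constexpr std::size_t block_size     = warps_in_block * warp_size;

// Index of the chunk handled by the warp that contains `thread_idx` in block `block_idx`.
// Every lane of a warp computes the same index, so lanes of one warp share one chunk
// and different warps never touch the same chunk.
constexpr std::size_t chunk_index_for(std::size_t block_idx, std::size_t thread_idx)
{
  return block_idx * warps_in_block + thread_idx / warp_size;
}

// Number of blocks needed so that every chunk gets its own warp (rounded up).
constexpr std::size_t blocks_needed(std::size_t num_chunks)
{
  return (num_chunks + warps_in_block - 1) / warps_in_block;
}
```

With this kind of mapping, the last block may contain warps whose chunk index is past the end of the input, so a kernel built this way would presumably need a bounds check before doing any work.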

vuule · 2023-04-27T02:20:28Z

cpp/src/io/parquet/page_enc.cu

+  auto const lane_id  = threadIdx.x % cudf::detail::warp_size;
+  auto const warp_id  = threadIdx.x / cudf::detail::warp_size;


Question for the reviewers: Are there maybe helper functions for this? Looks very generic.

Not that I am aware of.
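Since no existing helper is identified in the thread, the following is only a sketch of what one might look like. The names `warp_position` and `position_in_block` are hypothetical, the warp size of 32 is assumed, and the code is host-side C++ that takes the thread index as an argument rather than reading `threadIdx.x`.

```cpp
#include <cassert>

constexpr unsigned int warp_size = 32;  // assumed; stands in for cudf::detail::warp_size

// Position of a thread inside its block, split into warp and lane.
struct warp_position {
  unsigned int warp_id;  // which warp within the block
  unsigned int lane_id;  // which thread within that warp
};

// Hypothetical helper: derives both indices from a one-dimensional thread index.
constexpr warp_position position_in_block(unsigned int thread_idx)
{
  return warp_position{thread_idx / warp_size, thread_idx % warp_size};
}
```

Returning both values from one function keeps the division and the modulo next to each other, which makes it harder to compute them from different constants by mistake.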

bdice · 2023-05-01T18:46:43Z

One more fix for warpSize. Otherwise I think this is better!

ttnghia · 2023-05-01T19:47:13Z

cpp/src/io/parquet/page_enc.cu

+  __shared__ __align__(8) EncColumnChunk ck_g[decide_compression_warps_in_block];
+  __shared__ __align__(4) unsigned int compression_error[decide_compression_warps_in_block];


Why do we align them manually? And why do we need to align them?

It allows more efficient access, at least in theory. I'm not the one who added the alignment, and I also haven't tested how this alignment impacts performance in practice.
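As a rough host-side analogue of what the manual alignment requests, standard C++ `alignas` can raise the alignment of an array above what its element type needs on its own. This is a sketch only: `chunk_stub`, `chunk_slots`, `error_slots`, and `is_aligned` are hypothetical names, `chunk_stub` is not the real `EncColumnChunk`, and the example says nothing about the performance effect in shared memory, which the thread notes was not measured.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a small per-warp record; not the real EncColumnChunk.
struct chunk_stub {
  std::uint32_t compressed_size;
  std::uint8_t is_compressed;
};

constexpr int warps_in_block = 4;  // assumed value for illustration

// alignas(N) requests that the array start at an address that is a multiple of N.
alignas(8) chunk_stub chunk_slots[warps_in_block];
alignas(4) unsigned int error_slots[warps_in_block];

// Returns true if `ptr` is a multiple of `alignment` bytes.
inline bool is_aligned(void const* ptr, std::size_t alignment)
{
  return reinterpret_cast<std::uintptr_t>(ptr) % alignment == 0;
}
```

Only the start of each array is guaranteed to meet the requested alignment; later elements are aligned according to the element size.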

vuule · 2023-05-02T00:09:53Z

/merge

clean up and simplify gpuDecideCompression

678b371

vuule added the cuIO, improvement, and non-breaking labels Apr 22, 2023

vuule self-assigned this Apr 22, 2023

github-actions bot added the libcudf label Apr 22, 2023

vuule added 5 commits April 24, 2023 10:27

Merge branch 'branch-23.06' into clean-up-decide-compression

2f1e493

Merge branch 'branch-23.06' of https://github.com/rapidsai/cudf into clean-up-decide-compression

2a63e92

Merge branch 'clean-up-decide-compression' of https://github.com/vuule/cudf into clean-up-decide-compression

6660143

oops fix

02fd2a3

Merge branch 'branch-23.06' into clean-up-decide-compression

8da94ca

vuule marked this pull request as ready for review April 25, 2023 20:38

vuule requested a review from a team as a code owner April 25, 2023 20:38

vuule requested review from bdice and karthikeyann April 25, 2023 20:38

ttnghia approved these changes Apr 26, 2023

bdice requested changes Apr 26, 2023

karthikeyann reviewed Apr 26, 2023

vuule added 3 commits April 26, 2023 11:01

Merge branch 'branch-23.06' of https://github.com/rapidsai/cudf into clean-up-decide-compression

f30861b

make gpuDecideCompression block size adjustable

df95f7b

Merge branch 'branch-23.06' of https://github.com/rapidsai/cudf into clean-up-decide-compression

0e541c0

vuule commented Apr 27, 2023

vuule requested a review from bdice April 27, 2023 02:24

Merge branch 'branch-23.06' into clean-up-decide-compression

618f070

vuule requested a review from karthikeyann May 1, 2023 17:57

vuule changed the base branch from branch-23.06 to branch-23.04 May 1, 2023 18:07

vuule requested review from a team as code owners May 1, 2023 18:07

vuule requested a review from a team as a code owner May 1, 2023 18:07

vuule requested review from isVoid and removed request for a team May 1, 2023 18:07

vuule changed the base branch from branch-23.04 to branch-23.06 May 1, 2023 18:07

bdice approved these changes May 1, 2023

ajschmidt8 removed the request for review from a team May 1, 2023 18:55

missed warp_size

4acb28f

Co-authored-by: Bradley Dice <bdice@bradleydice.com>

ttnghia removed request for a team May 1, 2023 19:46

ttnghia reviewed May 1, 2023

ttnghia reviewed May 1, 2023

vuule added 2 commits May 1, 2023 13:28

Merge branch 'branch-23.06' of https://github.com/rapidsai/cudf into clean-up-decide-compression

2b18f54

bit of clean up

aba6ef2

vuule added the 5 - Ready to Merge label May 1, 2023

rapids-bot bot merged commit ef4ebce into rapidsai:branch-23.06 May 2, 2023

vuule deleted the clean-up-decide-compression branch May 2, 2023 00:10
