-
Notifications
You must be signed in to change notification settings - Fork 30
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
EXSWHTEC-272 - Implement tests for warp shfl_up and shfl_down functions #193
EXSWHTEC-272 - Implement tests for warp shfl_up and shfl_down functions #193
Conversation
nives-vukovic
commented
Mar 3, 2023
- Add tests for warp __shfl_up function
- Add tests for warp __shfl_down function
Change-Id: I66f0c09e9c7405ec7430b1883e0e89542fdb87a0
Change-Id: I212b82b1b3a78a368b85ea64e338371a34b405f9
Change-Id: Ib455f72b5be77e1a81137d15c07ea41161b16a3e
Change-Id: Ief96e274f4143e80ceb3e40f04d38ae217777583
Change-Id: I9c03cde09b42c8e3726153c2a177359efc8d6d29
- Add tests for warp __shfl_up function - Add tests for warp __shfl_down function
catch/unit/warp/warp_shfl_down.cc
Outdated
} | ||
|
||
const auto grid = cg::this_grid(); | ||
T var = static_cast<T>(grid.thread_rank() % warpSize); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
var
is going to be a small integer. One concern here is that shfl is going to be shuffling zeros most of the time. For larger data types, the higher order bits will always be zeros. The test could be improved by using better quality data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should avoid generating the input data and the output data (in the validate
function) on the fly because this would reduce the usefulness of this unit test. We need to have separation of concern in input/expected output data generation and in testing of the actual functionality.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@scchan Generating input data has been added (more detailed explanation is added in the comment below). Input data is not generated on the fly anymore.
catch/unit/warp/warp_shfl_down.cc
Outdated
|
||
const auto grid = cg::this_grid(); | ||
T var = static_cast<T>(grid.thread_rank() % warpSize); | ||
out[grid.thread_rank()] = __shfl_down(var, delta, width); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need a version of this test that has a non-uniform delta within a single wrap.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@scchan Test has been changed to have non-uniform delta in each warp that has random value from 0 to width.
catch/include/cpu_grid.h
Outdated
unsigned int thread_count_; | ||
}; | ||
|
||
inline dim3 GenerateThreadDimensions() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs comments on how the Generate*Dimensions() functions work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@scchan Added basic comments, if more extensive comment is required it can be added. It is mostly self-explanatory, it uses Catch2 GENERATE_COPY to generate different dimensions for blocks of threads to cover a range of dimensions, some depending on warp size, including dimensions that are smaller than one warp size or not a multiple of warp size, and some arbitrary values that have been randomly chosen to make unit testing more robust.
catch/unit/warp/warp_common.hh
Outdated
const auto block_rank = (blockIdx.z * gridDim.y + blockIdx.y) * gridDim.x + blockIdx.x; | ||
const auto idx = block_rank * warps_per_block + block.thread_rank() / warpSize; | ||
|
||
return !(active_masks[idx] & (static_cast<uint64_t>(1) << warp.thread_rank())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest implementing a generic bitmap and use the grid thread rank as the index to retrieve the active bit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@scchan We agree that is also a possible approach, but this current approach is more in line with the current implementation.
catch/unit/warp/warp_common.hh
Outdated
warps_in_grid * sizeof(uint64_t)); | ||
active_masks_.resize(warps_in_grid); | ||
std::generate(active_masks_.begin(), active_masks_.end(), | ||
[] { return GenerateRandomInteger(0ul, std::numeric_limits<uint64_t>().max()); }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should avoid using only random values as input because that hurts reproducibility.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should avoid using only random values as input because that hurts reproducibility.
Agree that not only random patterns need to be checked. Do you have particular patterns in mind though?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@scchan @b-sumner Test has been expanded to have a version that uses random inputs and random active_masks, and version that uses predefined active masks and inputs (5 different patterns for active masks have been chosen and thread id has been used as input data, changed from warp id to include larger input values).
catch/unit/warp/warp_common.hh
Outdated
return dist(GetRandomGenerator()); | ||
} | ||
|
||
inline uint64_t get_predicate_mask(unsigned int test_case, unsigned int warp_size) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not being used anywhere so what is this for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@scchan All warp related PRs have a common warp_base branch, this is the function that is used in ballot, all and any tests.
PR has been merged into develop branch via commit 26a5250. |
…ns (#193) Change-Id: I3013d16f48ad5f607ee0f252b497fde24c7b9164