Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EXSWHTEC-272 - Implement tests for warp shfl_up and shfl_down functions #193

Closed

Conversation

nives-vukovic
Copy link
Contributor

  • Add tests for warp __shfl_up function
  • Add tests for warp __shfl_down function

gargrahul and others added 30 commits October 26, 2022 03:59
Change-Id: I66f0c09e9c7405ec7430b1883e0e89542fdb87a0
Change-Id: I212b82b1b3a78a368b85ea64e338371a34b405f9
Change-Id: Ib455f72b5be77e1a81137d15c07ea41161b16a3e
Change-Id: Ief96e274f4143e80ceb3e40f04d38ae217777583
Change-Id: I9c03cde09b42c8e3726153c2a177359efc8d6d29
- Add tests for warp __shfl_up function
- Add tests for warp __shfl_down function
@nives-vukovic nives-vukovic marked this pull request as ready for review March 3, 2023 16:04
@chrispaquot chrispaquot requested review from yxsamliu and b-sumner March 9, 2023 03:17
@searlmc1 searlmc1 requested a review from scchan April 13, 2023 18:40
}

const auto grid = cg::this_grid();
T var = static_cast<T>(grid.thread_rank() % warpSize);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

var is going to be a small integer. One concern here is that shfl is going to be shuffling zeros most of the time. For larger data types, the higher order bits will always be zeros. The test could be improved by using better quality data.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should avoid generating the input data and the output data (in the validate function) on the fly because this would reduce the usefulness of this unit test. We need to have separation of concern in input/expected output data generation and in testing of the actual functionality.

Copy link
Contributor Author

@nives-vukovic nives-vukovic May 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@scchan Generating input data has been added (more detailed explanation is added in the comment below). Input data is not generated on the fly anymore.


const auto grid = cg::this_grid();
T var = static_cast<T>(grid.thread_rank() % warpSize);
out[grid.thread_rank()] = __shfl_down(var, delta, width);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need a version of this test that has a non-uniform delta within a single wrap.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@scchan Test has been changed to have non-uniform delta in each warp that has random value from 0 to width.

unsigned int thread_count_;
};

inline dim3 GenerateThreadDimensions() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs comments on how the Generate*Dimensions() functions work.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@scchan Added basic comments, if more extensive comment is required it can be added. It is mostly self-explanatory, it uses Catch2 GENERATE_COPY to generate different dimensions for blocks of threads to cover a range of dimensions, some depending on warp size, including dimensions that are smaller than one warp size or not a multiple of warp size, and some arbitrary values that have been randomly chosen to make unit testing more robust.

const auto block_rank = (blockIdx.z * gridDim.y + blockIdx.y) * gridDim.x + blockIdx.x;
const auto idx = block_rank * warps_per_block + block.thread_rank() / warpSize;

return !(active_masks[idx] & (static_cast<uint64_t>(1) << warp.thread_rank()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest implementing a generic bitmap and use the grid thread rank as the index to retrieve the active bit.

Copy link
Contributor Author

@nives-vukovic nives-vukovic May 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@scchan We agree that is also a possible approach, but this current approach is more in line with the current implementation.

warps_in_grid * sizeof(uint64_t));
active_masks_.resize(warps_in_grid);
std::generate(active_masks_.begin(), active_masks_.end(),
[] { return GenerateRandomInteger(0ul, std::numeric_limits<uint64_t>().max()); });
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should avoid using only random values as input because that hurts reproducibility.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should avoid using only random values as input because that hurts reproducibility.

Agree that not only random patterns need to be checked. Do you have particular patterns in mind though?

Copy link
Contributor Author

@nives-vukovic nives-vukovic May 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@scchan @b-sumner Test has been expanded to have a version that uses random inputs and random active_masks, and version that uses predefined active masks and inputs (5 different patterns for active masks have been chosen and thread id has been used as input data, changed from warp id to include larger input values).

return dist(GetRandomGenerator());
}

inline uint64_t get_predicate_mask(unsigned int test_case, unsigned int warp_size) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not being used anywhere so what is this for?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@scchan All warp related PRs have a common warp_base branch, this is the function that is used in ballot, all and any tests.

@rakesroy
Copy link
Contributor

PR has been merged into develop branch via commit 26a5250.

@rakesroy rakesroy closed this Feb 20, 2024
rocm-ci pushed a commit that referenced this pull request Feb 26, 2024
…ns (#193)

Change-Id: I3013d16f48ad5f607ee0f252b497fde24c7b9164
rocm-ci pushed a commit that referenced this pull request Feb 26, 2024
- #154
- #438
- #425
- #424
- #423
- #365
- #356
- #279
- #274
- #190
- #189
- #188
- #156
- #49
- #439
- #437
- #436
- #435
- #193

Change-Id: I2529d0baf0f8d47d6215863321720cde2b1a846c
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants