Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adjust block size for reduce-then-scan based on input type #1782

Conversation

adamfidel
Copy link
Contributor

Targeting to be included in #1762. This PR adjusts the block size to be bigger or smaller based on the input type.

@adamfidel adamfidel force-pushed the dev/adamfidel/transform_reduce_then_scan_block_size branch from 7865493 to deb9be3 Compare August 15, 2024 14:53
@adamfidel adamfidel force-pushed the dev/adamfidel/transform_reduce_then_scan_block_size branch from deb9be3 to 4dd3068 Compare August 22, 2024 20:01
Comment on lines 741 to 742
constexpr std::uint8_t __block_size_factor = sizeof(_ValueType) > sizeof(std::uint32_t) ? sizeof(_ValueType) / sizeof(std::uint32_t) : sizeof(std::uint32_t) / sizeof(_ValueType);
constexpr std::uint16_t __max_inputs_per_item = sizeof(_ValueType) > sizeof(std::uint32_t) ? 128 / __block_size_factor : 128 * __block_size_factor;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we adjust this so our baseline is for sizeof(double) types: 64 elements, then scale it by however much smaller the ValueType is?

std::uint8_t __scale = std::min(1, sizeof(double) / sizeof(_ValueType));
std::uing16_t __max_inputs_per_item = 64 * __scale; 

We could also limit it to 4x scale if we don't think think 1 byte types will benefit from 512.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this is a cleaner way to write this expression. Although I believe we want to use std::max here instead of std::min.

@mmichel11
Copy link
Contributor

Can we also adjust the __max_inputs_per_item template parameters to be std::uint16_t?

Copy link
Contributor

@danhoeflinger danhoeflinger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@adamfidel adamfidel merged commit 8a36d5a into dev/shared/transform_reduce_then_scan Aug 22, 2024
@adamfidel adamfidel deleted the dev/adamfidel/transform_reduce_then_scan_block_size branch August 22, 2024 22:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants