Add a one workgroup argmax benchmark #49

Merged: 1 commit, merged Aug 2, 2024.

@angelz913 (Contributor) commented Aug 2, 2024

This PR is based on #47. I opened a new one because the old one got stale.


google-cla bot commented Aug 2, 2024

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@angelz913 (Contributor, Author) commented:

@kuhar @antiagainst Could you please review? Thanks

@kuhar (Collaborator) left a comment:

Looks good modulo copyright headers. For code that is primarily based on some existing benchmarks, we use dual copyright (the original author + the new one). You can see an example here: https://github.com/google/uVkCompute/blob/main/benchmarks/vmt/vmt_main.cc.

Review threads on `benchmarks/argmax/CMakeLists.txt` and `benchmarks/argmax/one_workgroup_argmax_main.cc` (outdated, resolved).

Review thread on the GLSL compute shader:

```glsl
#extension GL_KHR_shader_subgroup_arithmetic : enable
#extension GL_KHR_shader_subgroup_ballot : enable

layout(local_size_x = 16, local_size_y = 1, local_size_z = 1) in;
```

A collaborator commented:

This workgroup size (16) is smaller than the native subgroup size on desktop GPUs. How does it perform if we increase it to 32 or 64?

The PR author replied:

[Two images; not preserved in this copy.]

The PR author replied:

I did not see any performance improvement with the larger workgroup sizes.
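To make the workgroup-size question concrete, here is a minimal CPU-side sketch in C++ of how a single-workgroup argmax could be structured. Each of `W` invocations strides over the input, and the per-invocation results are then combined the way `subgroupMax` and a masked `subgroupMin` would combine them. `OneWorkgroupArgmax` and `ArgmaxResult` are hypothetical names. This is not the benchmark's shader, and the tie-breaking rule (smallest input index) is an assumption. Phases 2 and 3 also assume the whole workgroup fits in one subgroup; a workgroup spanning several subgroups would need an extra shared-memory step.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical CPU emulation of a single-workgroup argmax (not the
// benchmark's actual shader). Each of `workgroup_size` "invocations"
// strides over the input; the per-invocation partial results are then
// combined as a subgroup reduction would combine them.
struct ArgmaxResult {
  float value;
  uint32_t index;  // UINT32_MAX if no element beat -infinity
};

ArgmaxResult OneWorkgroupArgmax(const std::vector<float>& input,
                                uint32_t workgroup_size) {
  assert(workgroup_size > 0 && "a zero stride would loop forever");
  const float kNegInf = -std::numeric_limits<float>::infinity();
  const uint32_t kNoIndex = std::numeric_limits<uint32_t>::max();
  const uint32_t n = static_cast<uint32_t>(input.size());

  std::vector<float> lane_max(workgroup_size, kNegInf);
  std::vector<uint32_t> lane_idx(workgroup_size, kNoIndex);

  // Phase 1: invocation `lane` scans input[lane], input[lane + W], ...
  // The strict '>' keeps the first occurrence seen by each lane.
  for (uint32_t lane = 0; lane < workgroup_size; ++lane) {
    for (uint32_t i = lane; i < n; i += workgroup_size) {
      if (input[i] > lane_max[lane]) {
        lane_max[lane] = input[i];
        lane_idx[lane] = i;
      }
    }
  }

  // Phase 2: emulates subgroupMax over the per-lane maxima.
  float wg_max = kNegInf;
  for (uint32_t lane = 0; lane < workgroup_size; ++lane) {
    if (lane_max[lane] > wg_max) wg_max = lane_max[lane];
  }

  // Phase 3: emulates subgroupMin over the indices of lanes that hold the
  // maximum, so ties resolve to the first occurrence in the whole input.
  uint32_t wg_idx = kNoIndex;
  for (uint32_t lane = 0; lane < workgroup_size; ++lane) {
    if (lane_max[lane] == wg_max && lane_idx[lane] < wg_idx) {
      wg_idx = lane_idx[lane];
    }
  }
  return {wg_max, wg_idx};
}
```

For example, with 1000 elements where `input[i] = i % 97`, both `OneWorkgroupArgmax(input, 16)` and `OneWorkgroupArgmax(input, 64)` return value 96 at index 96. Only the per-invocation loop length changes, at roughly N/W iterations. On a GPU whose subgroup is wider than 16, a 16-wide workgroup presumably leaves some lanes idle. That is the trade-off the review question probes.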

@angelz913 force-pushed the argmax branch 2 times, most recently from e0a8a84 to 6604dbd, on August 2, 2024 at 20:41.
@kuhar (Collaborator) left a comment:

LGTM

@kuhar merged commit 32f2f57 into google:main on Aug 2, 2024. 6 checks passed.