Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Erase Functionality for static_map #142

Merged
merged 28 commits into from
May 6, 2022

Conversation

Nicolas-Iskos
Copy link

@Nicolas-Iskos Nicolas-Iskos commented Mar 10, 2022

Erase functionality is now supported for static_map via the use of an additional sentinel value. Erased slots can be reused during future insertions. Users can specify that they want erase functionality by providing an erased_key_sentinel during construction. Included below are some plots showing the performance of erase and demonstrating that neither insert nor find incur a performance regression as a result of adding erase support. For each plot, num_keys=1E8.
Insert throughput vs  load_factor
Search-all throughput vs  load_factor
Erase-all Throughput vs  Load factor (Gaussian 32-bit K_V pairs)
Erase-all Throughput vs  Load factor (Gaussian 64-bit K_V pairs)
Erase-none vs  Load Factor (unique 64-bit K_V pairs)
Erase-none vs  Load Factor (unique 32-bit K_V pairs)

@GPUtester
Copy link

Can one of the admins verify this patch?

@jrhemstad
Copy link
Collaborator

add to whitelist

@jrhemstad
Copy link
Collaborator

ok to test

@jrhemstad
Copy link
Collaborator

@Nicolas-Iskos could you include some of the performance graphs in the PR description?

Copy link
Collaborator

@jrhemstad jrhemstad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you update the summary documentation above the static_map class to include a discussion of erase? Include:

  • Limitations about concurrent insert/find/erase
  • Limitations for erase when a erased sentinel was not provided
    • Enforced for bulk API, but not device API

@PointKernel PointKernel added type: feature request New feature request topic: static_map Issue related to the static_map labels Mar 10, 2022
@PointKernel
Copy link
Member

First commit from auto formatter! 🎉

benchmarks/hash_table/static_map_bench.cu Show resolved Hide resolved
include/cuco/detail/static_map.inl Outdated Show resolved Hide resolved
include/cuco/detail/static_map.inl Outdated Show resolved Hide resolved
include/cuco/detail/static_map.inl Show resolved Hide resolved
include/cuco/detail/static_map.inl Outdated Show resolved Hide resolved
include/cuco/detail/static_map.inl Show resolved Hide resolved
include/cuco/detail/static_map.inl Outdated Show resolved Hide resolved
include/cuco/detail/static_map_kernels.cuh Outdated Show resolved Hide resolved
include/cuco/detail/static_map_kernels.cuh Outdated Show resolved Hide resolved
include/cuco/static_map.cuh Outdated Show resolved Hide resolved
Copy link
Collaborator

@sleeepyjack sleeepyjack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for implementing this feature!
I've stepped through the core functionality and compared it to WarpCore's SingleValueHashTable implementation. tl;dr: looks good. 👍

include/cuco/detail/static_map.inl Outdated Show resolved Hide resolved
benchmarks/hash_table/static_map_bench.cu Show resolved Hide resolved
tests/static_map/erase_test.cu Outdated Show resolved Hide resolved
tests/static_map/erase_test.cu Outdated Show resolved Hide resolved
@PointKernel
Copy link
Member

We can eliminate the stride parameter by directly inferring the optimal grid_size for a given architecture:

Depending on the use case, one way is not always better than the other. If I remember correctly, detail::get_grid_size is really efficient when map occupancy is low. cudaOccupancyMaxActiveBlocksPerMultiprocessor tends to return a relatively small grid size so one thread will handle multiple input elements as opposed to one per thread in manual calculations.

include/cuco/static_map.cuh Outdated Show resolved Hide resolved
tests/static_map/erase_test.cu Outdated Show resolved Hide resolved
Copy link
Member

@PointKernel PointKernel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Nicolas-Iskos Thanks for your care and persistency being put into this PR! Looks great to me! Depending on how you see it, we can leave the benchmark refactoring to a separate PR.

@Nicolas-Iskos
Copy link
Author

@Nicolas-Iskos Thanks for your care and persistency being put into this PR! Looks great to me! Depending on how you see it, we can leave the benchmark refactoring to a separate PR.

Of course, and thank you guys for all the feedback! I'd be fine leaving the benchmark refactoring to a separate PR. I could even do that with the PR for dynamic_map::erase, where we could refactor the dynamic map benchmark as well.

@PointKernel
Copy link
Member

I could even do that with the PR for dynamic_map::erase, where we could refactor the dynamic map benchmark as well.

We can do this once NVIDIA/nvbench#80 is merged. zip_axes can simplify the code a lot.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic: static_map Issue related to the static_map type: feature request New feature request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants