WIP: Add PARTED kernels #382

Open. Wants to merge 39 commits into base: develop

Commits on Jan 19, 2024

  1. Add TRIAD_PARTED kernel

    This does the same thing as TRIAD but breaks it into multiple
    for loops over the data instead of a single for loop over the data.
    MrBurmark committed Jan 19, 2024
    027d6f0
  2. 51057e3
  3. Use direct dispatch in RAJA TRIAD_PARTED_FUSED

    Leave the other dispatch options in as comments.
    MrBurmark committed Jan 19, 2024
    08e5c9f
  4. Add Geometric partition

    This makes each partition a multiple of the size of the
    previous partition
    MrBurmark committed Jan 19, 2024
    f1dc134
  5. Add reuse tuning of TRIAD_PARTED_FUSED

    This tuning provides a best case scenario where the overhead
    of capturing the state and synchronizing per rep is removed.
    MrBurmark committed Jan 19, 2024
    84e1e6c
  6. 36fb292
  7. 8c06b66
  8. Add len to triad_holder and add gpu tuning

    The new gpu tuning is an AOS version using triad_holder.
    This is now in addition to the SOA tuning.
    MrBurmark committed Jan 19, 2024
    7c94b83
  9. Add a smart memory pool tuning

    This copies the basic mempool from RAJA and adds
    a capability to synchronize as necessary to avoid host-device
    race conditions when memory is needed on the host
    but all of the memory has been used on the device.
    MrBurmark committed Jan 19, 2024
    1bbd169
  10. Add SOA reuse tuning

    MrBurmark committed Jan 19, 2024
    b1d4e24
  11. Add option to shuffle_partition_sizes

    Default is on so the sizes of partitions are not always
    in non-decreasing order.
    MrBurmark committed Jan 19, 2024
    00fb6cf
  12. fixup part_type

    MrBurmark committed Jan 19, 2024
    cf08b3e
  13. 72fd10c
  14. fixup part_type

    MrBurmark committed Jan 19, 2024
    94bd68f
  15. fixup part_size_order

    MrBurmark committed Jan 19, 2024
    37e4475
  16. Add scanAOSreuse tuning

    This uses a scan and binary search to schedule work to blocks
    instead of a 2d grid. Thus it avoids blocks with no work.
    MrBurmark committed Jan 19, 2024
    c2de49f
  17. Add block wide search impl to triad_parted_fused_scan_aos

    This is faster for cuda but slower for hip.
    MrBurmark committed Jan 19, 2024
    a072b3f
  18. 876a16b
  19. Use device memory for hip triad parted fused

    This has a minimal effect
    MrBurmark committed Jan 19, 2024
    431c80e
  20. Use cuda managed device preferred host accessed

    with triad parted fused
    This has a large effect and makes a block size of 256
    as good as or better than 1024
    MrBurmark committed Jan 19, 2024
    9141573
  21. Remove block wide search code

    always use binary search code
    MrBurmark committed Jan 19, 2024
    bbe8272
  22. Add some missing includes

    MrBurmark committed Jan 19, 2024
    8f884f5
  23. 0e567b1
  24. add TRIAD_PARTED stream (non-omp) tuning

    reorder TRIAD_PARTED gpu tuning declarations
    MrBurmark committed Jan 19, 2024
    ddf9c9d
  25. 5926c63
  26. Add gpu event tunings of TRIAD_PARTED

    These tunings use events to "fork-join" the streams, as would be
    required in more realistic code, though it would not always
    have to be done as frequently.
    MrBurmark committed Jan 19, 2024
    54d8094
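
To illustrate the "Add TRIAD_PARTED kernel" and "Add Geometric partition" commits above, here is a minimal host-side C++ sketch. It is not the code from this PR. It assumes the usual TRIAD update `a[i] = b[i] + alpha * c[i]`, and the names `geometric_partition` and `triad_parted`, the integer `ratio` parameter, and the rounding scheme are all hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: split [0, n) into num_parts contiguous partitions whose
// sizes grow by about a factor of `ratio` from one partition to the next.
// Returns num_parts + 1 boundaries; partition p covers [bounds[p], bounds[p+1]).
// Assumes num_parts >= 1 and that n * (sum of ratio^p) does not overflow.
std::vector<std::size_t> geometric_partition(std::size_t n,
                                             std::size_t num_parts,
                                             std::size_t ratio)
{
  // Total weight of the series 1 + ratio + ratio^2 + ... + ratio^(num_parts-1).
  std::size_t total = 0;
  std::size_t term = 1;
  for (std::size_t p = 0; p < num_parts; ++p) {
    total += term;
    term *= ratio;
  }

  std::vector<std::size_t> bounds(num_parts + 1, 0);
  std::size_t cumulative = 0;
  term = 1;
  for (std::size_t p = 0; p < num_parts; ++p) {
    cumulative += term;
    term *= ratio;
    // Integer scaling keeps the boundaries monotonic and ends exactly at n.
    bounds[p + 1] = n * cumulative / total;
  }
  return bounds;
}

// TRIAD written as one loop per partition instead of a single loop over [0, n).
void triad_parted(std::vector<double>& a,
                  const std::vector<double>& b,
                  const std::vector<double>& c,
                  double alpha,
                  const std::vector<std::size_t>& bounds)
{
  for (std::size_t p = 0; p + 1 < bounds.size(); ++p) {
    for (std::size_t i = bounds[p]; i < bounds[p + 1]; ++i) {
      a[i] = b[i] + alpha * c[i];
    }
  }
}
```

For example, `geometric_partition(15, 4, 2)` gives partitions of sizes 1, 2, 4, and 8. When `n` is not an exact multiple of the series total, the sizes are only approximately geometric.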

Commits on Jan 22, 2024

  1. Rename parted_fused tunings

    MrBurmark committed Jan 22, 2024
    5ba0d3b
  2. 9c696d9

Commits on Jan 23, 2024

  1. 489f23f
  2. 18da3c7
  3. Add dataspace_allocator

    MrBurmark committed Jan 23, 2024
    c75aa00
  4. 5337794
  5. b82f291
  6. fixup dataspace_allocator

    MrBurmark committed Jan 23, 2024
    1f82e28
  7. fixup includes

    MrBurmark committed Jan 23, 2024
    457b829
  8. Add openmp compile guards

    MrBurmark committed Jan 23, 2024
    2a469a3
  9. 4f143c3
  10. 455baee
  11. f1d0120
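
The "Add scanAOSreuse tuning" commit (c2de49f) describes using a scan and a binary search to schedule work to blocks so that no block is left without work. The idea can be sketched on the host in plain C++ as follows. This is an assumption-laden illustration, not the GPU kernel from this PR: the names `scan_block_offsets` and `find_partition` are hypothetical, and `block_id` stands in for what a GPU kernel would read from its block index.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Exclusive scan over the number of blocks each partition needs.
// offsets[p] is the first flat block id assigned to partition p, and
// offsets.back() is the total number of blocks to launch.
std::vector<std::size_t> scan_block_offsets(const std::vector<std::size_t>& lens,
                                            std::size_t block_size)
{
  std::vector<std::size_t> offsets(lens.size() + 1, 0);
  for (std::size_t p = 0; p < lens.size(); ++p) {
    std::size_t blocks = (lens[p] + block_size - 1) / block_size;  // ceiling divide
    offsets[p + 1] = offsets[p] + blocks;
  }
  return offsets;
}

// Map a flat block id to (partition index, block index within that partition).
// Precondition: block_id < offsets.back().
std::pair<std::size_t, std::size_t>
find_partition(const std::vector<std::size_t>& offsets, std::size_t block_id)
{
  // Binary search for the first offset strictly greater than block_id;
  // the owning partition is the entry just before it. Partitions that need
  // zero blocks are skipped automatically.
  auto it = std::upper_bound(offsets.begin(), offsets.end(), block_id);
  std::size_t part = static_cast<std::size_t>(it - offsets.begin()) - 1;
  return {part, block_id - offsets[part]};
}
```

With a 2d grid of (max blocks per partition) x (number of partitions), small partitions leave most of their row idle. With the scan, exactly `offsets.back()` blocks are launched and each one finds its partition in O(log P) steps.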