Adds benchmarks for DeviceMemcpy::Batched #11

Merged on Jan 13, 2023 (7 commits)


elstehle
Contributor

This PR adds benchmarks for DeviceMemcpy::Batched (see NVIDIA/cub#359).

-- Sample benchmark output on a V100 --

| AtomicT | Buffer Order | Min. buffer size | Max. buffer size |     Elements     |    data     | buffer src offsets | buffer dst offsets | buffer sizes |    data     | Samples |  CPU Time  | Noise |  GPU Time  | Noise |  Elem/s  | GlobalMem BW | BWUtil | Samples | Batch GPU  |
|---------|--------------|------------------|------------------|------------------|-------------|--------------------|--------------------|--------------|-------------|---------|------------|-------|------------|-------|----------|--------------|--------|---------|------------|
|      U8 |       Random |                1 |                8 |  2^25 = 33554432 |  32.001 MiB |         28.444 MiB |         28.444 MiB |   28.444 MiB |  32.001 MiB |    517x |   4.092 ms | 0.54% |   4.084 ms | 0.50% |   8.216G |  38.342 GB/s |  4.27% |    518x |   4.085 ms |
|      U8 |       Random |                1 |               64 |  2^25 = 33554432 |  31.990 MiB |          3.938 MiB |          3.938 MiB |    3.938 MiB |  31.990 MiB |    655x | 771.500 us | 1.13% | 763.795 us | 0.44% |  43.917G | 104.055 GB/s | 11.59% |    689x | 761.105 us |
|      U8 |       Random |                1 |              256 |  2^25 = 33554432 |  31.989 MiB |       1020.012 KiB |       1020.012 KiB | 1020.012 KiB |  31.989 MiB |   1216x | 422.961 us | 6.55% | 414.632 us | 0.69% |  80.897G | 169.352 GB/s | 18.86% |   1253x | 412.009 us |
|      U8 |       Random |                1 |             1024 |  2^25 = 33554432 |  31.959 MiB |        255.750 KiB |        255.750 KiB |  255.750 KiB |  31.959 MiB |   1440x | 355.732 us | 2.28% | 348.211 us | 0.66% |  96.239G | 194.733 GB/s | 21.68% |   1489x | 345.318 us |
|      U8 |       Random |                1 |            65536 |  2^25 = 33554432 |  32.237 MiB |          3.996 KiB |          3.996 KiB |    3.996 KiB |  32.237 MiB |   2464x | 211.271 us | 3.76% | 203.847 us | 0.86% | 165.824G | 331.708 GB/s | 36.94% |   2519x | 202.225 us |
|      U8 |       Random |            65536 |            65536 |  2^25 = 33554432 |  32.000 MiB |          2.000 KiB |          2.000 KiB |    2.000 KiB |  32.000 MiB |   4624x | 115.669 us | 7.17% | 108.148 us | 1.21% | 310.263G | 620.582 GB/s | 69.10% |   4722x | 105.905 us |
|      U8 |       Random |                1 |                8 | 2^27 = 134217728 | 128.003 MiB |        113.778 MiB |        113.778 MiB |  113.778 MiB | 128.003 MiB |     28x |  18.518 ms | 0.12% |  18.510 ms | 0.11% |   7.251G |  33.838 GB/s |  3.77% |     29x |  18.504 ms |
|      U8 |       Random |                1 |               64 | 2^27 = 134217728 | 127.998 MiB |         15.754 MiB |         15.754 MiB |   15.754 MiB | 127.998 MiB |    140x |   3.599 ms | 0.28% |   3.592 ms | 0.17% |  37.369G |  88.537 GB/s |  9.86% |    146x |   3.588 ms |
|      U8 |       Random |                1 |              256 | 2^27 = 134217728 | 127.957 MiB |          3.984 MiB |          3.984 MiB |    3.984 MiB | 127.957 MiB |    277x |   1.814 ms | 0.50% |   1.807 ms | 0.27% |  74.267G | 155.471 GB/s | 17.31% |    291x |   1.805 ms |
|      U8 |       Random |                1 |             1024 | 2^27 = 134217728 | 127.962 MiB |       1023.000 KiB |       1023.000 KiB | 1023.000 KiB | 127.962 MiB |    624x | 830.058 us | 1.15% | 822.539 us | 0.69% | 163.126G | 330.073 GB/s | 36.75% |    634x | 821.350 us |
|      U8 |       Random |                1 |            65536 | 2^27 = 134217728 | 127.406 MiB |         15.996 KiB |         15.996 KiB |   15.996 KiB | 127.406 MiB |    944x | 546.140 us | 2.46% | 538.377 us | 1.95% | 248.145G | 496.380 GB/s | 55.27% |    951x | 537.380 us |
|      U8 |       Random |            65536 |            65536 | 2^27 = 134217728 | 128.000 MiB |          8.000 KiB |          8.000 KiB |    8.000 KiB | 128.000 MiB |   1248x | 409.992 us | 2.68% | 402.381 us | 1.85% | 333.559G | 667.178 GB/s | 74.29% |   1250x | 400.153 us |
|      U8 |       Random |                1 |                8 | 2^29 = 536870912 | 512.018 MiB |        455.111 MiB |        455.111 MiB |  455.111 MiB | 512.018 MiB |     11x |  77.477 ms | 0.27% |  77.469 ms | 0.27% |   6.930G |  32.341 GB/s |  3.60% |     12x |  77.412 ms |
|      U8 |       Random |                1 |               64 | 2^29 = 536870912 | 511.959 MiB |         63.015 MiB |         63.015 MiB |   63.015 MiB | 511.959 MiB |    877x |  14.988 ms | 0.82% |  14.977 ms | 0.50% |  35.843G |  84.921 GB/s |  9.46% |    878x |  14.968 ms |
|      U8 |       Random |                1 |              256 | 2^29 = 536870912 | 511.953 MiB |         15.938 MiB |         15.938 MiB |   15.938 MiB | 511.953 MiB |     68x |   7.408 ms | 0.15% |   7.400 ms | 0.11% |  72.541G | 151.857 GB/s | 16.91% |     71x |   7.399 ms |
|      U8 |       Random |                1 |             1024 | 2^29 = 536870912 | 511.826 MiB |          3.996 MiB |          3.996 MiB |    3.996 MiB | 511.826 MiB |    147x |   3.421 ms | 0.41% |   3.413 ms | 0.33% | 157.245G | 318.173 GB/s | 35.43% |    153x |   3.412 ms |
|      U8 |       Random |                1 |            65536 | 2^29 = 536870912 | 512.119 MiB |         63.996 KiB |         63.996 KiB |   63.996 KiB | 512.119 MiB |   1328x |   1.871 ms | 1.15% |   1.863 ms | 1.06% | 288.169G | 576.444 GB/s | 64.19% |   1329x |   1.860 ms |
|      U8 |       Random |            65536 |            65536 | 2^29 = 536870912 | 512.000 MiB |         32.000 KiB |         32.000 KiB |   32.000 KiB | 512.000 MiB |    544x |   1.639 ms | 1.40% |   1.631 ms | 1.32% | 329.132G | 658.324 GB/s | 73.31% |    545x |   1.629 ms |
|      U8 |  Consecutive |                1 |                8 |  2^25 = 33554432 |  32.001 MiB |         28.444 MiB |         28.444 MiB |   28.444 MiB |  32.001 MiB |    921x | 551.061 us | 1.51% | 543.437 us | 0.47% |  61.747G | 288.146 GB/s | 32.09% |    966x | 541.166 us |
|      U8 |  Consecutive |                1 |               64 |  2^25 = 33554432 |  31.990 MiB |          3.938 MiB |          3.938 MiB |    3.938 MiB |  31.990 MiB |   1872x | 275.374 us | 3.07% | 267.692 us | 0.85% | 125.307G | 296.896 GB/s | 33.06% |   1939x | 264.682 us |
|      U8 |  Consecutive |                1 |              256 |  2^25 = 33554432 |  31.989 MiB |       1020.012 KiB |       1020.012 KiB | 1020.012 KiB |  31.989 MiB |   1680x | 306.972 us | 2.72% | 299.444 us | 0.93% | 112.016G | 234.497 GB/s | 26.11% |   1724x | 297.211 us |
|      U8 |  Consecutive |                1 |             1024 |  2^25 = 33554432 |  31.959 MiB |        255.750 KiB |        255.750 KiB |  255.750 KiB |  31.959 MiB |   1632x | 315.254 us | 2.66% | 307.674 us | 0.84% | 108.918G | 220.390 GB/s | 24.54% |   1690x | 304.732 us |
|      U8 |  Consecutive |                1 |            65536 |  2^25 = 33554432 |  32.237 MiB |          3.996 KiB |          3.996 KiB |    3.996 KiB |  32.237 MiB |   2432x | 213.354 us | 3.93% | 205.773 us | 1.11% | 164.271G | 328.603 GB/s | 36.59% |   2482x | 204.049 us |
|      U8 |  Consecutive |            65536 |            65536 |  2^25 = 33554432 |  32.000 MiB |          2.000 KiB |          2.000 KiB |    2.000 KiB |  32.000 MiB |   4576x | 117.007 us | 7.10% | 109.494 us | 1.24% | 306.450G | 612.956 GB/s | 68.25% |   4630x | 108.126 us |
|      U8 |  Consecutive |                1 |                8 | 2^27 = 134217728 | 128.003 MiB |        113.778 MiB |        113.778 MiB |  113.778 MiB | 128.003 MiB |    243x |   2.073 ms | 0.42% |   2.065 ms | 0.19% |  64.984G | 303.256 GB/s | 33.77% |    254x |   2.063 ms |
|      U8 |  Consecutive |                1 |               64 | 2^27 = 134217728 | 127.998 MiB |         15.754 MiB |         15.754 MiB |   15.754 MiB | 127.998 MiB |    548x | 920.746 us | 0.89% | 913.146 us | 0.25% | 146.981G | 348.234 GB/s | 38.78% |    576x | 912.896 us |
|      U8 |  Consecutive |                1 |              256 | 2^27 = 134217728 | 127.957 MiB |          3.984 MiB |          3.984 MiB |    3.984 MiB | 127.957 MiB |    512x |   1.129 ms | 0.91% |   1.121 ms | 0.58% | 119.663G | 250.504 GB/s | 27.89% |    513x |   1.120 ms |
|      U8 |  Consecutive |                1 |             1024 | 2^27 = 134217728 | 127.962 MiB |       1023.000 KiB |       1023.000 KiB | 1023.000 KiB | 127.962 MiB |    832x | 616.784 us | 1.51% | 609.174 us | 0.81% | 220.262G | 445.682 GB/s | 49.63% |    848x | 606.870 us |
|      U8 |  Consecutive |                1 |            65536 | 2^27 = 134217728 | 127.406 MiB |         15.996 KiB |         15.996 KiB |   15.996 KiB | 127.406 MiB |    944x | 545.238 us | 2.18% | 537.587 us | 1.63% | 248.509G | 497.110 GB/s | 55.35% |    960x | 536.930 us |
|      U8 |  Consecutive |            65536 |            65536 | 2^27 = 134217728 | 128.000 MiB |          8.000 KiB |          8.000 KiB |    8.000 KiB | 128.000 MiB |   1248x | 412.070 us | 2.71% | 404.448 us | 1.88% | 331.854G | 663.769 GB/s | 73.91% |   1314x | 402.786 us |
|      U8 |  Consecutive |                1 |                8 | 2^29 = 536870912 | 512.018 MiB |        455.111 MiB |        455.111 MiB |  455.111 MiB | 512.018 MiB |   1136x |   8.255 ms | 0.51% |   8.247 ms | 0.50% |  65.099G | 303.790 GB/s | 33.83% |   1137x |   8.245 ms |
|      U8 |  Consecutive |                1 |               64 | 2^29 = 536870912 | 511.959 MiB |         63.015 MiB |         63.015 MiB |   63.015 MiB | 511.959 MiB |    143x |   3.529 ms | 0.40% |   3.521 ms | 0.31% | 152.485G | 361.276 GB/s | 40.23% |    149x |   3.534 ms |
|      U8 |  Consecutive |                1 |              256 | 2^29 = 536870912 | 511.953 MiB |         15.938 MiB |         15.938 MiB |   15.938 MiB | 511.953 MiB |    118x |   4.279 ms | 0.39% |   4.271 ms | 0.34% | 125.682G | 263.101 GB/s | 29.30% |    122x |   4.264 ms |
|      U8 |  Consecutive |                1 |             1024 | 2^29 = 536870912 | 511.826 MiB |          3.996 MiB |          3.996 MiB |    3.996 MiB | 511.826 MiB |   2192x |   2.502 ms | 0.79% |   2.494 ms | 0.71% | 215.200G | 435.441 GB/s | 48.49% |   2193x |   2.492 ms |
|      U8 |  Consecutive |                1 |            65536 | 2^29 = 536870912 | 512.119 MiB |         63.996 KiB |         63.996 KiB |   63.996 KiB | 512.119 MiB |    816x |   1.852 ms | 0.84% |   1.845 ms | 0.73% | 291.088G | 582.282 GB/s | 64.84% |    817x |   1.845 ms |
|      U8 |  Consecutive |            65536 |            65536 | 2^29 = 536870912 | 512.000 MiB |         32.000 KiB |         32.000 KiB |   32.000 KiB | 512.000 MiB |    448x |   1.657 ms | 1.73% |   1.649 ms | 1.66% | 325.521G | 651.102 GB/s | 72.50% |    449x |   1.651 ms |
|     U32 |       Random |                1 |                8 |  2^25 = 33554432 |  25.600 MiB |         21.333 MiB |         21.333 MiB |   21.333 MiB |  25.600 MiB |    173x |   2.910 ms | 0.31% |   2.902 ms | 0.12% |   9.250G |  41.625 GB/s |  4.64% |    181x |   2.900 ms |
|     U32 |       Random |                1 |               64 |  2^25 = 33554432 |  30.600 MiB |          3.765 MiB |          3.765 MiB |    3.765 MiB |  30.600 MiB |    689x | 733.819 us | 1.13% | 726.298 us | 0.44% |  44.179G | 104.663 GB/s | 11.65% |    721x | 723.095 us |
|     U32 |       Random |                1 |              256 |  2^25 = 33554432 |  31.621 MiB |       1008.246 KiB |       1008.246 KiB | 1008.246 KiB |  31.621 MiB |   1280x | 401.438 us | 2.07% | 393.978 us | 0.78% |  84.160G | 176.183 GB/s | 19.62% |   1304x | 391.656 us |
|     U32 |       Random |                1 |             1024 |  2^25 = 33554432 |  31.864 MiB |        255.000 KiB |        255.000 KiB |  255.000 KiB |  31.864 MiB |   1536x | 333.816 us | 2.48% | 326.264 us | 0.75% | 102.408G | 207.218 GB/s | 23.07% |   1571x | 323.548 us |
|     U32 |       Random |                1 |            65536 |  2^25 = 33554432 |  32.236 MiB |          3.996 KiB |          3.996 KiB |    3.996 KiB |  32.236 MiB |   2496x | 208.359 us | 4.07% | 200.752 us | 0.82% | 168.377G | 336.816 GB/s | 37.51% |   2519x | 198.631 us |
|     U32 |       Random |            65536 |            65536 |  2^25 = 33554432 |  32.000 MiB |          2.000 KiB |          2.000 KiB |    2.000 KiB |  32.000 MiB |   4624x | 115.714 us | 7.23% | 108.143 us | 1.18% | 310.279G | 620.615 GB/s | 69.11% |   4721x | 105.911 us |
|     U32 |       Random |                1 |                8 | 2^27 = 134217728 | 102.404 MiB |         85.333 MiB |         85.333 MiB |   85.333 MiB | 102.404 MiB |     37x |  13.533 ms | 0.11% |  13.525 ms | 0.09% |   7.940G |  35.727 GB/s |  3.98% |     38x |  13.518 ms |
|     U32 |       Random |                1 |               64 | 2^27 = 134217728 | 122.443 MiB |         15.059 MiB |         15.059 MiB |   15.059 MiB | 122.443 MiB |    147x |   3.416 ms | 0.28% |   3.408 ms | 0.15% |  37.675G |  89.249 GB/s |  9.94% |    154x |   3.405 ms |
|     U32 |       Random |                1 |              256 | 2^27 = 134217728 | 126.487 MiB |          3.938 MiB |          3.938 MiB |    3.938 MiB | 126.487 MiB |    293x |   1.714 ms | 0.50% |   1.707 ms | 0.25% |  77.711G | 162.681 GB/s | 18.11% |    308x |   1.705 ms |
|     U32 |       Random |                1 |             1024 | 2^27 = 134217728 | 127.583 MiB |       1020.012 KiB |       1020.012 KiB | 1020.012 KiB | 127.583 MiB |    640x | 798.392 us | 1.24% | 790.806 us | 0.75% | 169.169G | 342.301 GB/s | 38.12% |    658x | 789.990 us |
|     U32 |       Random |                1 |            65536 | 2^27 = 134217728 | 127.405 MiB |         15.996 KiB |         15.996 KiB |   15.996 KiB | 127.405 MiB |    960x | 534.503 us | 1.92% | 526.907 us | 1.23% | 253.543G | 507.179 GB/s | 56.48% |    973x | 525.767 us |
|     U32 |       Random |            65536 |            65536 | 2^27 = 134217728 | 128.000 MiB |          8.000 KiB |          8.000 KiB |    8.000 KiB | 128.000 MiB |   1248x | 410.341 us | 2.68% | 402.738 us | 1.88% | 333.263G | 666.587 GB/s | 74.23% |   1288x | 400.597 us |
|     U32 |       Random |                1 |                8 | 2^29 = 536870912 | 409.605 MiB |        341.333 MiB |        341.333 MiB |  341.333 MiB | 409.605 MiB |     38x |  57.039 ms | 0.50% |  57.031 ms | 0.50% |   7.531G |  33.889 GB/s |  3.77% |     39x |  56.961 ms |
|     U32 |       Random |                1 |               64 | 2^29 = 536870912 | 489.777 MiB |         60.235 MiB |         60.235 MiB |   60.235 MiB | 489.777 MiB |     36x |  14.211 ms | 0.10% |  14.204 ms | 0.08% |  36.157G |  85.655 GB/s |  9.54% |     37x |  14.198 ms |
|     U32 |       Random |                1 |              256 | 2^29 = 536870912 | 506.106 MiB |         15.754 MiB |         15.754 MiB |   15.754 MiB | 506.106 MiB |   2131x |   7.006 ms | 0.92% |   6.998 ms | 0.64% |  75.840G | 158.762 GB/s | 17.68% |   2132x |   6.994 ms |
|     U32 |       Random |                1 |             1024 | 2^29 = 536870912 | 510.337 MiB |          3.984 MiB |          3.984 MiB |    3.984 MiB | 510.337 MiB |    155x |   3.249 ms | 0.42% |   3.241 ms | 0.33% | 165.125G | 334.118 GB/s | 37.20% |    161x |   3.238 ms |
|     U32 |       Random |                1 |            65536 | 2^29 = 536870912 | 512.143 MiB |         63.996 KiB |         63.996 KiB |   63.996 KiB | 512.143 MiB |    480x |   1.814 ms | 0.70% |   1.806 ms | 0.55% | 297.302G | 594.713 GB/s | 66.22% |    481x |   1.804 ms |
|     U32 |       Random |            65536 |            65536 | 2^29 = 536870912 | 512.000 MiB |         32.000 KiB |         32.000 KiB |   32.000 KiB | 512.000 MiB |   1072x |   1.645 ms | 1.64% |   1.637 ms | 1.57% | 327.917G | 655.895 GB/s | 73.04% |   1073x |   1.630 ms |
|     U32 |  Consecutive |                1 |                8 |  2^25 = 33554432 |  25.600 MiB |         21.333 MiB |         21.333 MiB |   21.333 MiB |  25.600 MiB |   1200x | 426.991 us | 1.92% | 419.402 us | 0.52% |  64.004G | 288.019 GB/s | 32.07% |   1253x | 416.851 us |
|     U32 |  Consecutive |                1 |               64 |  2^25 = 33554432 |  30.600 MiB |          3.765 MiB |          3.765 MiB |    3.765 MiB |  30.600 MiB |   2000x | 258.648 us | 3.13% | 251.166 us | 0.83% | 127.752G | 302.655 GB/s | 33.70% |   2055x | 248.213 us |
|     U32 |  Consecutive |                1 |              256 |  2^25 = 33554432 |  31.621 MiB |       1008.246 KiB |       1008.246 KiB | 1008.246 KiB |  31.621 MiB |   1760x | 292.521 us | 2.76% | 285.065 us | 0.84% | 116.315G | 243.495 GB/s | 27.11% |   1809x | 282.731 us |
|     U32 |  Consecutive |                1 |             1024 |  2^25 = 33554432 |  31.864 MiB |        255.000 KiB |        255.000 KiB |  255.000 KiB |  31.864 MiB |   1696x | 303.994 us | 2.64% | 296.563 us | 0.76% | 112.665G | 227.971 GB/s | 25.39% |   1772x | 293.705 us |
|     U32 |  Consecutive |                1 |            65536 |  2^25 = 33554432 |  32.236 MiB |          3.996 KiB |          3.996 KiB |    3.996 KiB |  32.236 MiB |   2496x | 208.106 us | 4.01% | 200.558 us | 1.11% | 168.541G | 337.143 GB/s | 37.54% |   2557x | 198.809 us |
|     U32 |  Consecutive |            65536 |            65536 |  2^25 = 33554432 |  32.000 MiB |          2.000 KiB |          2.000 KiB |    2.000 KiB |  32.000 MiB |   4576x | 116.968 us | 6.97% | 109.539 us | 1.26% | 306.325G | 612.706 GB/s | 68.23% |   4627x | 108.081 us |
|     U32 |  Consecutive |                1 |                8 | 2^27 = 134217728 | 102.404 MiB |         85.333 MiB |         85.333 MiB |   85.333 MiB | 102.404 MiB |    316x |   1.595 ms | 0.55% |   1.587 ms | 0.25% |  67.657G | 304.449 GB/s | 33.90% |    330x |   1.584 ms |
|     U32 |  Consecutive |                1 |               64 | 2^27 = 134217728 | 122.443 MiB |         15.059 MiB |         15.059 MiB |   15.059 MiB | 122.443 MiB |    584x | 863.644 us | 0.93% | 856.206 us | 0.30% | 149.953G | 355.232 GB/s | 39.56% |    613x | 853.353 us |
|     U32 |  Consecutive |                1 |              256 | 2^27 = 134217728 | 126.487 MiB |          3.938 MiB |          3.938 MiB |    3.938 MiB | 126.487 MiB |    544x |   1.081 ms | 0.90% |   1.074 ms | 0.56% | 123.518G | 258.575 GB/s | 28.79% |    545x |   1.072 ms |
|     U32 |  Consecutive |                1 |             1024 | 2^27 = 134217728 | 127.583 MiB |       1020.012 KiB |       1020.012 KiB | 1020.012 KiB | 127.583 MiB |    848x | 607.563 us | 1.42% | 600.037 us | 0.65% | 222.953G | 451.128 GB/s | 50.23% |    872x | 599.604 us |
|     U32 |  Consecutive |                1 |            65536 | 2^27 = 134217728 | 127.405 MiB |         15.996 KiB |         15.996 KiB |   15.996 KiB | 127.405 MiB |    944x | 537.607 us | 1.86% | 529.964 us | 1.12% | 252.080G | 504.252 GB/s | 56.15% |    975x | 529.114 us |
|     U32 |  Consecutive |            65536 |            65536 | 2^27 = 134217728 | 128.000 MiB |          8.000 KiB |          8.000 KiB |    8.000 KiB | 128.000 MiB |   1248x | 411.161 us | 2.64% | 403.555 us | 1.80% | 332.588G | 665.237 GB/s | 74.08% |   1249x | 402.649 us |
|     U32 |  Consecutive |                1 |                8 | 2^29 = 536870912 | 409.605 MiB |        341.333 MiB |        341.333 MiB |  341.333 MiB | 409.605 MiB |     80x |   6.278 ms | 0.18% |   6.270 ms | 0.13% |  68.499G | 308.245 GB/s | 34.32% |     83x |   6.268 ms |
|     U32 |  Consecutive |                1 |               64 | 2^29 = 536870912 | 489.777 MiB |         60.235 MiB |         60.235 MiB |   60.235 MiB | 489.777 MiB |   3216x |   3.284 ms | 0.83% |   3.277 ms | 0.79% | 156.742G | 371.315 GB/s | 41.35% |   3217x |   3.283 ms |
|     U32 |  Consecutive |                1 |              256 | 2^29 = 536870912 | 506.106 MiB |         15.754 MiB |         15.754 MiB |   15.754 MiB | 506.106 MiB |    123x |   4.102 ms | 0.39% |   4.094 ms | 0.34% | 129.634G | 271.374 GB/s | 30.22% |    128x |   4.091 ms |
|     U32 |  Consecutive |                1 |             1024 | 2^29 = 536870912 | 510.337 MiB |          3.984 MiB |          3.984 MiB |    3.984 MiB | 510.337 MiB |    205x |   2.458 ms | 0.54% |   2.451 ms | 0.44% | 218.363G | 441.841 GB/s | 49.20% |    214x |   2.450 ms |
|     U32 |  Consecutive |                1 |            65536 | 2^29 = 536870912 | 512.143 MiB |         63.996 KiB |         63.996 KiB |   63.996 KiB | 512.143 MiB |    576x |   1.820 ms | 0.75% |   1.812 ms | 0.61% | 296.392G | 592.893 GB/s | 66.02% |    577x |   1.811 ms |
|     U32 |  Consecutive |            65536 |            65536 | 2^29 = 536870912 | 512.000 MiB |         32.000 KiB |         32.000 KiB |   32.000 KiB | 512.000 MiB |    544x |   1.671 ms | 2.07% |   1.663 ms | 2.00% | 322.867G | 645.792 GB/s | 71.91% |    545x |   1.651 ms |
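
The derived columns can be cross-checked against the raw ones. The sketch below is a minimal check, not the benchmark's own code: it assumes that `Elem/s` is the `Elements` column divided by `GPU Time`, and that `GlobalMem BW` is the sum of the five memory-size columns in a row divided by `GPU Time`. That reading is inferred from the numbers in the table rather than stated in the PR, and the names `RowMetrics` and `derive_metrics` are hypothetical.

```cpp
#include <cassert>

// Derived metrics for one benchmark row, computed from its raw columns.
struct RowMetrics {
  double elements_per_second; // compare with the "Elem/s" column
  double bytes_per_second;    // compare with the "GlobalMem BW" column
};

// elements:         value of the "Elements" column
// total_bytes:      ASSUMPTION: sum of the five memory-size columns of the row
//                   (data + src offsets + dst offsets + sizes + data), in bytes
// gpu_time_seconds: value of the "GPU Time" column, converted to seconds
RowMetrics derive_metrics(double elements, double total_bytes, double gpu_time_seconds) {
  assert(gpu_time_seconds > 0.0); // a zero or negative time makes the rates meaningless
  return RowMetrics{elements / gpu_time_seconds, total_bytes / gpu_time_seconds};
}
```

For the `U8`, `Random`, 65536-byte row with 2^25 elements, `derive_metrics(33554432.0, 2 * 32.0 * 1024 * 1024 + 3 * 2.0 * 1024, 108.148e-6)` gives roughly 310.26 G elements/s and 620.58 GB/s, in line with the 310.263G and 620.582 GB/s reported in the table.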

gevtushenko self-requested a review on January 11, 2023, 15:06
Collaborator

gevtushenko left a comment

A few comments aimed at simplifying the code.

benches/cub/device/memcpy/basic.cu (7 review comments, all outdated and resolved)
gevtushenko merged commit baa92ff into alliepiper:main on Jan 13, 2023