Prune CUB's ChainedPolicy by __CUDA_ARCH_LIST__ #2154

Open: bernhardmgruber wants to merge 4 commits into main


@bernhardmgruber bernhardmgruber commented Jul 31, 2024

Motivated by @gevtushenko and @elstehle explaining to me why CUB instantiates so many kernels and why having many tuning policies is costly, here is a mitigation: when the macro __CUDA_ARCH_LIST__ is available, we know at compile time which values the PTX version can take at runtime, so we can prune the dispatches CUB generates from the tuning policies down to only those versions. This should give us faster compilation and allow us to use tuning policies more liberally.
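To make the idea concrete, here is a minimal host-only C++ sketch of dispatching over a compile-time architecture list. It is not the code from this PR: `DEMO_ARCH_LIST` is a hypothetical stand-in for `__CUDA_ARCH_LIST__` (which nvcc defines as a comma-separated list of `__CUDA_ARCH__` values), and `ChainedPolicy`, `PolicyFor`, `ArchList`, `dispatch`, and the policy levels 500/600/800/900 are simplified illustrations, not CUB's actual definitions.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for nvcc's __CUDA_ARCH_LIST__ (assumed: a build for 600 and 860).
#define DEMO_ARCH_LIST 600, 860

template <int... Archs>
struct ArchList {};

// Simplified chain link: `Policy` applies from `PtxVersion` upwards,
// lower architectures fall through to `Prev`.
template <int PtxVersion, typename Policy, typename Prev>
struct ChainedPolicy {
  template <int Arch>
  using PolicyFor = typename std::conditional<(Arch < PtxVersion),
                                              typename Prev::template PolicyFor<Arch>,
                                              Policy>::type;
};

// Terminal link (Prev == Policy): accepts every remaining architecture.
template <int PtxVersion, typename Policy>
struct ChainedPolicy<PtxVersion, Policy, Policy> {
  template <int>
  using PolicyFor = Policy;
};

// Illustrative tuning policies; `tag` only identifies which one was chosen.
struct Policy500 : ChainedPolicy<500, Policy500, Policy500> { static constexpr int tag = 500; };
struct Policy600 : ChainedPolicy<600, Policy600, Policy500> { static constexpr int tag = 600; };
struct Policy800 : ChainedPolicy<800, Policy800, Policy600> { static constexpr int tag = 800; };
struct Policy900 : ChainedPolicy<900, Policy900, Policy800> { static constexpr int tag = 900; };
using MaxPolicy = Policy900;

// The policy for a listed architecture is resolved entirely at compile time.
static_assert(std::is_same<MaxPolicy::PolicyFor<860>, Policy800>::value,
              "860 should select the 800 tuning");

// Base case: the runtime PTX version is not in the list, nothing to invoke.
template <typename Chain, typename Op>
int dispatch(int, Op&, ArchList<>) { return -1; }

// One runtime branch, and one instantiation of `op`, per listed architecture.
template <typename Chain, typename Op, int Arch, int... Rest>
int dispatch(int runtime_ptx, Op& op, ArchList<Arch, Rest...>) {
  if (runtime_ptx == Arch) {
    return op.template invoke<typename Chain::template PolicyFor<Arch>>();
  }
  return dispatch<Chain>(runtime_ptx, op, ArchList<Rest...>{});
}

// Stand-in for a kernel launch: reports which policy it was instantiated with.
struct ReportPolicy {
  template <typename Policy>
  int invoke() const { return Policy::tag; }
};

inline int policy_tag_for(int runtime_ptx) {
  ReportPolicy op;
  return dispatch<MaxPolicy>(runtime_ptx, op, ArchList<DEMO_ARCH_LIST>{});
}

// Usage: only the listed PTX versions are dispatchable.
inline void check_dispatch() {
  assert(policy_tag_for(600) == 600);
  assert(policy_tag_for(860) == 800);
  assert(policy_tag_for(700) == -1);  // 700 is not in DEMO_ARCH_LIST
}
```

In this sketch `ReportPolicy::invoke` is instantiated only for `Policy600` and `Policy800`; `Policy500` and `Policy900` never reach the functor, which is the kind of saving the pruning aims for.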

Compile time and binary size of cub.example.device.radix_sort before and after:

before:
    ARCH=50;60;70;80;90: 23.826s 3886008B
    ARCH=86:              8.462s 1685520B
after:
    ARCH=50;60;70;80;90: 23.646s 3877904B
    ARCH=86:              6.095s 1232912B
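
Expressed as relative savings, the single-architecture build (ARCH=86) takes roughly 28% less time to compile and produces a binary roughly 27% smaller, while the five-architecture build changes by less than 1% on both measures. The helper below is only a small sketch of that arithmetic; the function names are made up for illustration and the inputs are the figures quoted above.

```cpp
#include <cassert>

// Relative reduction in percent when going from `before` to `after`.
inline double percent_reduction(double before, double after) {
  assert(before > 0.0);
  return 100.0 * (before - after) / before;
}

// Figures copied from the measurements above (seconds and bytes).
inline double single_arch_time_saving() { return percent_reduction(8.462, 6.095); }
inline double single_arch_size_saving() { return percent_reduction(1685520.0, 1232912.0); }
inline double multi_arch_time_saving()  { return percent_reduction(23.826, 23.646); }
inline double multi_arch_size_saving()  { return percent_reduction(3886008.0, 3877904.0); }
```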

Review thread: cub/cub/util_device.cuh (outdated)
@bernhardmgruber bernhardmgruber force-pushed the chained_policy_prune branch 2 times, most recently from 71a0f9a to 04e42c9 Compare August 2, 2024 13:55
@bernhardmgruber

The unit tests now list all virtual architectures, since the list was shorter than I expected.

@bernhardmgruber bernhardmgruber marked this pull request as ready for review August 2, 2024 13:56
@bernhardmgruber bernhardmgruber requested review from a team as code owners August 2, 2024 13:56
Review thread: cub/cub/util_device.cuh (outdated)
@bernhardmgruber

I reworked the feature so that it now only ever instantiates for the PTX versions that appear in __CUDA_ARCH_LIST__, which is even better.
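
A rough way to reason about the effect is to count how many distinct tuning policies a given architecture list would select. The sketch below is a toy model, not CUB code: it assumes each listed architecture picks the highest tuning level that does not exceed it, and the function names and levels used in the usage note are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Highest tuning level that does not exceed `arch`.
// Assumes `levels` is non-empty and sorted in ascending order.
inline int select_level(const std::vector<int>& levels, int arch) {
  assert(!levels.empty());
  int chosen = levels.front();  // fall back to the lowest (terminal) level
  for (int level : levels) {
    if (level <= arch) {
      chosen = level;
    }
  }
  return chosen;
}

// Number of distinct tuning levels selected by the architectures in `arch_list`.
inline std::size_t count_selected_levels(const std::vector<int>& levels,
                                         const std::vector<int>& arch_list) {
  std::set<int> selected;
  for (int arch : arch_list) {
    selected.insert(select_level(levels, arch));
  }
  return selected.size();
}
```

With hypothetical levels `{500, 600, 800, 900}`, the list `{860}` selects a single level, whereas `{500, 600, 700, 800, 900}` still selects all four, so in this model a broad architecture list leaves little to prune.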

@bernhardmgruber bernhardmgruber force-pushed the chained_policy_prune branch 3 times, most recently from f6522ab to 75d0f10 Compare August 14, 2024 21:30

🟨 CI finished in 6h 48m: Pass: 80%/250 | Total: 6d 04h | Avg: 35m 38s | Max: 1h 27m | Hits: 64%/17277
  • 🟨 cub: Pass: 74%/131 | Total: 3d 23h | Avg: 43m 53s | Max: 1h 27m | Hits: 41%/4272

    🔍 cudacxx_family: nvcc 🔍
      🟩 ClangCUDA          Pass: 100%/2   | Total: 44m 00s | Avg: 22m 00s | Max: 22m 58s
      🔍 nvcc               Pass:  74%/129 | Total:  3d 23h | Avg: 44m 13s | Max:  1h 27m | Hits:  41%/4272  
    🟨 ctk
      🟩 11.1               Pass: 100%/15  | Total: 11h 09m | Avg: 44m 39s | Max: 53m 39s | Hits:  41%/712   
      🟥 11.8               Pass:   0%/3   | Total:  4h 01m | Avg:  1h 20m | Max:  1h 27m
      🟨 12.5               Pass:  73%/113 | Total:  3d 08h | Avg: 42m 49s | Max:  1h 11m | Hits:  41%/3560  
    🟨 cudacxx
      🟩 ClangCUDA17        Pass: 100%/2   | Total: 44m 00s | Avg: 22m 00s | Max: 22m 58s
      🟩 nvcc11.1           Pass: 100%/15  | Total: 11h 09m | Avg: 44m 39s | Max: 53m 39s | Hits:  41%/712   
      🟥 nvcc11.8           Pass:   0%/3   | Total:  4h 01m | Avg:  1h 20m | Max:  1h 27m
      🟨 nvcc12.5           Pass:  72%/111 | Total:  3d 07h | Avg: 43m 11s | Max:  1h 11m | Hits:  41%/3560  
    🟨 cxx
      🟨 Clang9             Pass:  50%/6   | Total:  5h 22m | Avg: 53m 48s | Max:  1h 06m
      🟩 Clang10            Pass: 100%/3   | Total:  2h 26m | Avg: 48m 59s | Max: 49m 04s
      🟩 Clang11            Pass: 100%/4   | Total:  3h 20m | Avg: 50m 12s | Max: 51m 58s
      🟩 Clang12            Pass: 100%/4   | Total:  3h 16m | Avg: 49m 07s | Max: 49m 26s
      🟩 Clang13            Pass: 100%/4   | Total:  3h 14m | Avg: 48m 32s | Max: 49m 17s
      🟨 Clang14            Pass:  75%/4   | Total:  3h 32m | Avg: 53m 13s | Max:  1h 05m
      🟥 Clang15            Pass:   0%/4   | Total:  4h 13m | Avg:  1h 03m | Max:  1h 05m
      🟥 Clang16            Pass:   0%/4   | Total:  4h 08m | Avg:  1h 02m | Max:  1h 04m
      🟩 Clang17            Pass: 100%/26  | Total: 12h 50m | Avg: 29m 38s | Max: 57m 40s
      🟩 GCC6               Pass: 100%/2   | Total:  1h 28m | Avg: 44m 17s | Max: 44m 28s
      🟩 GCC7               Pass: 100%/6   | Total:  4h 44m | Avg: 47m 20s | Max: 52m 29s
      🟩 GCC8               Pass: 100%/6   | Total:  4h 44m | Avg: 47m 23s | Max: 52m 42s
      🟩 GCC9               Pass: 100%/6   | Total:  4h 42m | Avg: 47m 03s | Max: 51m 27s
      🟥 GCC10              Pass:   0%/4   | Total:  4h 06m | Avg:  1h 01m | Max:  1h 07m
      🟥 GCC11              Pass:   0%/7   | Total:  8h 00m | Avg:  1h 08m | Max:  1h 27m
      🟨 GCC12              Pass:  25%/4   | Total:  4h 07m | Avg:  1h 01m | Max:  1h 04m
      🟨 GCC13              Pass:  75%/28  | Total: 12h 47m | Avg: 27m 24s | Max: 55m 08s
      🟩 Intel2023.2.0      Pass: 100%/3   | Total:  2h 37m | Avg: 52m 29s | Max: 53m 22s
      🟩 MSVC14.16          Pass: 100%/1   | Total: 53m 39s | Avg: 53m 39s | Max: 53m 39s | Hits:  41%/712   
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 55m | Avg: 57m 47s | Max: 59m 10s | Hits:  41%/1424  
      🟩 MSVC14.39          Pass: 100%/3   | Total:  3h 15m | Avg:  1h 05m | Max:  1h 11m | Hits:  41%/2136  
    🟨 cxx_family
      🟨 Clang              Pass:  79%/59  | Total:  1d 18h | Avg: 43m 09s | Max:  1h 06m
      🟨 GCC                Pass:  66%/63  | Total:  1d 20h | Avg: 42m 33s | Max:  1h 27m
      🟩 Intel              Pass: 100%/3   | Total:  2h 37m | Avg: 52m 29s | Max: 53m 22s
      🟩 MSVC               Pass: 100%/6   | Total:  6h 04m | Avg:  1h 00m | Max:  1h 11m | Hits:  41%/4272  
    🟨 jobs
      🟨 Build              Pass:  68%/99  | Total:  3d 13h | Avg: 52m 05s | Max:  1h 27m | Hits:  41%/4272  
      🟩 DeviceLaunch       Pass: 100%/8   | Total:  2h 31m | Avg: 18m 55s | Max: 20m 53s
      🟩 GraphCapture       Pass: 100%/8   | Total:  2h 05m | Avg: 15m 41s | Max: 17m 29s
      🟨 HostLaunch         Pass:  87%/8   | Total:  2h 23m | Avg: 17m 57s | Max: 20m 41s
      🟨 TestGPU            Pass:  87%/8   | Total:  2h 52m | Avg: 21m 33s | Max: 27m 35s
    🟨 gpu
      🟨 v100               Pass:  74%/131 | Total:  3d 23h | Avg: 43m 53s | Max:  1h 27m | Hits:  41%/4272  
    🟨 cpu
      🟨 amd64              Pass:  73%/123 | Total:  3d 17h | Avg: 43m 29s | Max:  1h 27m | Hits:  41%/4272  
      🟨 arm64              Pass:  87%/8   | Total:  6h 40m | Avg: 50m 01s | Max: 55m 08s
    🟥 sm
      🟥 60;70;80;90        Pass:   0%/3   | Total:  4h 01m | Avg:  1h 20m | Max:  1h 27m
      🟥 90a                Pass:   0%/4   | Total:  1h 17m | Avg: 19m 17s | Max: 19m 45s
    🟨 std
      🟨 11                 Pass:  79%/34  | Total:  1d 01h | Avg: 44m 24s | Max:  1h 27m
      🟨 14                 Pass:  75%/37  | Total:  1d 03h | Avg: 45m 07s | Max:  1h 16m | Hits:  41%/2136  
      🟨 17                 Pass:  72%/36  | Total:  1d 02h | Avg: 44m 27s | Max:  1h 17m | Hits:  41%/1424  
      🟨 20                 Pass:  70%/24  | Total: 16h 09m | Avg: 40m 23s | Max:  1h 11m | Hits:  41%/712   
    
  • 🟨 thrust: Pass: 87%/118 | Total: 2d 04h | Avg: 26m 47s | Max: 56m 29s | Hits: 71%/13005

    🔍 cpu: amd64 🔍
      🔍 amd64              Pass:  86%/110 | Total:  2d 01h | Avg: 26m 50s | Max: 56m 29s | Hits:  71%/13005 
      🟩 arm64              Pass: 100%/8   | Total:  3h 28m | Avg: 26m 02s | Max: 30m 21s
    🔍 jobs: Build 🔍
      🔍 Build              Pass:  84%/99  | Total:  2d 00h | Avg: 29m 38s | Max: 56m 29s | Hits:  57%/8670  
      🟩 TestCPU            Pass: 100%/11  | Total:  1h 48m | Avg:  9m 51s | Max: 19m 20s | Hits:  99%/4335  
      🟩 TestGPU            Pass: 100%/8   | Total:  1h 58m | Avg: 14m 50s | Max: 17m 31s
    🟨 ctk
      🟩 11.1               Pass: 100%/15  | Total:  6h 52m | Avg: 27m 29s | Max: 56m 29s | Hits:  57%/1445  
      🟨 11.8               Pass:  66%/3   | Total:  2h 11m | Avg: 43m 42s | Max: 46m 18s
      🟨 12.5               Pass:  86%/100 | Total:  1d 19h | Avg: 26m 10s | Max: 55m 23s | Hits:  73%/11560 
    🟨 cudacxx
      🟥 ClangCUDA17        Pass:   0%/2   | Total:  1h 12m | Avg: 36m 25s | Max: 37m 46s
      🟩 nvcc11.1           Pass: 100%/15  | Total:  6h 52m | Avg: 27m 29s | Max: 56m 29s | Hits:  57%/1445  
      🟨 nvcc11.8           Pass:  66%/3   | Total:  2h 11m | Avg: 43m 42s | Max: 46m 18s
      🟨 nvcc12.5           Pass:  87%/98  | Total:  1d 18h | Avg: 25m 57s | Max: 55m 23s | Hits:  73%/11560 
    🟨 cxx
      🟩 Clang9             Pass: 100%/6   | Total:  2h 35m | Avg: 25m 53s | Max: 29m 20s
      🟩 Clang10            Pass: 100%/3   | Total:  1h 22m | Avg: 27m 37s | Max: 28m 55s
      🟩 Clang11            Pass: 100%/4   | Total:  1h 47m | Avg: 26m 51s | Max: 27m 45s
      🟩 Clang12            Pass: 100%/4   | Total:  1h 48m | Avg: 27m 04s | Max: 28m 26s
      🟩 Clang13            Pass: 100%/4   | Total:  1h 50m | Avg: 27m 35s | Max: 30m 09s
      🟩 Clang14            Pass: 100%/4   | Total:  1h 52m | Avg: 28m 11s | Max: 29m 22s
      🟩 Clang15            Pass: 100%/4   | Total:  1h 57m | Avg: 29m 15s | Max: 30m 15s
      🟩 Clang16            Pass: 100%/4   | Total:  1h 52m | Avg: 28m 00s | Max: 31m 23s
      🟨 Clang17            Pass:  88%/18  | Total:  6h 10m | Avg: 20m 34s | Max: 37m 46s
      🟩 GCC6               Pass: 100%/2   | Total: 50m 26s | Avg: 25m 13s | Max: 26m 27s
      🟨 GCC7               Pass:  50%/6   | Total:  1h 11m | Avg: 11m 54s | Max: 24m 40s
      🟨 GCC8               Pass:  83%/6   | Total:  2h 15m | Avg: 22m 35s | Max: 28m 42s
      🟩 GCC9               Pass: 100%/6   | Total:  2h 46m | Avg: 27m 40s | Max: 30m 27s
      🟨 GCC10              Pass:  50%/4   | Total:  2h 32m | Avg: 38m 03s | Max: 39m 18s
      🟨 GCC11              Pass:  71%/7   | Total:  4h 43m | Avg: 40m 31s | Max: 46m 18s
      🟨 GCC12              Pass:  75%/4   | Total:  2h 32m | Avg: 38m 03s | Max: 38m 36s
      🟨 GCC13              Pass:  80%/20  | Total:  6h 26m | Avg: 19m 19s | Max: 32m 02s
      🟩 Intel2023.2.0      Pass: 100%/3   | Total:  1h 59m | Avg: 39m 46s | Max: 39m 47s
      🟩 MSVC14.16          Pass: 100%/1   | Total: 56m 29s | Avg: 56m 29s | Max: 56m 29s | Hits:  57%/1445  
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 39m | Avg: 49m 47s | Max: 50m 20s | Hits:  57%/2890  
      🟩 MSVC14.39          Pass: 100%/6   | Total:  3h 30m | Avg: 35m 08s | Max: 55m 23s | Hits:  78%/8670  
    🟨 cxx_family
      🟨 Clang              Pass:  96%/51  | Total: 21h 16m | Avg: 25m 01s | Max: 37m 46s
      🟨 GCC                Pass:  76%/55  | Total: 23h 18m | Avg: 25m 25s | Max: 46m 18s
      🟩 Intel              Pass: 100%/3   | Total:  1h 59m | Avg: 39m 46s | Max: 39m 47s
      🟩 MSVC               Pass: 100%/9   | Total:  6h 06m | Avg: 40m 45s | Max: 56m 29s | Hits:  71%/13005 
    🟨 gpu
      🟨 v100               Pass:  87%/118 | Total:  2d 04h | Avg: 26m 47s | Max: 56m 29s | Hits:  71%/13005 
    🟨 cudacxx_family
      🟥 ClangCUDA          Pass:   0%/2   | Total:  1h 12m | Avg: 36m 25s | Max: 37m 46s
      🟨 nvcc               Pass:  88%/116 | Total:  2d 03h | Avg: 26m 37s | Max: 56m 29s | Hits:  71%/13005 
    🟨 sm
      🟨 60;70;80;90        Pass:  66%/3   | Total:  2h 11m | Avg: 43m 42s | Max: 46m 18s
      🟥 90a                Pass:   0%/4   | Total:  1h 19m | Avg: 19m 58s | Max: 26m 20s
    🟨 std
      🟨 11                 Pass:  83%/30  | Total: 12h 31m | Avg: 25m 02s | Max: 44m 12s
      🟨 14                 Pass:  85%/34  | Total: 15h 38m | Avg: 27m 36s | Max: 56m 29s | Hits:  67%/5780  
      🟨 17                 Pass:  90%/33  | Total: 15h 22m | Avg: 27m 57s | Max: 50m 13s | Hits:  71%/4335  
      🟨 20                 Pass:  90%/21  | Total:  9h 08m | Avg: 26m 06s | Max: 49m 32s | Hits:  78%/2890  
    
  • 🟥 pycuda: Pass: 0%/1

    🟥 cpu
      🟥 amd64              Pass:   0%/1  
    🟥 ctk
      🟥 12.5               Pass:   0%/1  
    🟥 cudacxx
      🟥 nvcc12.5           Pass:   0%/1  
    🟥 cudacxx_family
      🟥 nvcc               Pass:   0%/1  
    🟥 cxx
      🟥 GCC13              Pass:   0%/1  
    🟥 cxx_family
      🟥 GCC                Pass:   0%/1  
    🟥 gpu
      🟥 v100               Pass:   0%/1  
    🟥 jobs
      🟥 Test               Pass:   0%/1  
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
+/- CUB
Thrust
CUDA Experimental
pycuda

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
+/- pycuda

🏃‍ Runner counts (total jobs: 250)

# Runner
178 linux-amd64-cpu16
41 linux-amd64-gpu-v100-latest-1
16 linux-arm64-cpu16
15 windows-amd64-cpu16


🟨 CI finished in 1d 15h: Pass: 98%/250 | Total: 4d 21h | Avg: 28m 17s | Max: 1h 11m | Hits: 64%/17277
  • 🟨 cub: Pass: 96%/131 | Total: 2d 21h | Avg: 31m 58s | Max: 1h 11m | Hits: 41%/4272

    🔍 cpu: amd64 🔍
      🔍 amd64              Pass:  96%/123 | Total:  2d 14h | Avg: 30m 41s | Max:  1h 11m | Hits:  41%/4272  
      🟩 arm64              Pass: 100%/8   | Total:  6h 53m | Avg: 51m 41s | Max: 55m 08s
    🔍 ctk: 12.5 🔍
      🟩 11.1               Pass: 100%/15  | Total: 11h 09m | Avg: 44m 39s | Max: 53m 39s | Hits:  41%/712   
      🟩 11.8               Pass: 100%/3   | Total: 14m 02s | Avg:  4m 40s | Max:  5m 13s
      🔍 12.5               Pass:  96%/113 | Total:  2d 10h | Avg: 31m 00s | Max:  1h 11m | Hits:  41%/3560  
    🔍 cudacxx: nvcc12.5 🔍
      🟩 ClangCUDA17        Pass: 100%/2   | Total: 44m 00s | Avg: 22m 00s | Max: 22m 58s
      🟩 nvcc11.1           Pass: 100%/15  | Total: 11h 09m | Avg: 44m 39s | Max: 53m 39s | Hits:  41%/712   
      🟩 nvcc11.8           Pass: 100%/3   | Total: 14m 02s | Avg:  4m 40s | Max:  5m 13s
      🔍 nvcc12.5           Pass:  96%/111 | Total:  2d 09h | Avg: 31m 10s | Max:  1h 11m | Hits:  41%/3560  
    🔍 cudacxx_family: nvcc 🔍
      🟩 ClangCUDA          Pass: 100%/2   | Total: 44m 00s | Avg: 22m 00s | Max: 22m 58s
      🔍 nvcc               Pass:  96%/129 | Total:  2d 21h | Avg: 32m 07s | Max:  1h 11m | Hits:  41%/4272  
    🔍 cxx: GCC13 🔍
      🟩 Clang9             Pass: 100%/6   | Total:  2h 26m | Avg: 24m 20s | Max: 45m 57s
      🟩 Clang10            Pass: 100%/3   | Total:  2h 26m | Avg: 48m 59s | Max: 49m 04s
      🟩 Clang11            Pass: 100%/4   | Total:  3h 20m | Avg: 50m 12s | Max: 51m 58s
      🟩 Clang12            Pass: 100%/4   | Total:  3h 16m | Avg: 49m 07s | Max: 49m 26s
      🟩 Clang13            Pass: 100%/4   | Total:  3h 14m | Avg: 48m 32s | Max: 49m 17s
      🟩 Clang14            Pass: 100%/4   | Total:  2h 31m | Avg: 37m 54s | Max: 50m 49s
      🟩 Clang15            Pass: 100%/4   | Total: 18m 17s | Avg:  4m 34s | Max:  4m 58s
      🟩 Clang16            Pass: 100%/4   | Total: 17m 49s | Avg:  4m 27s | Max:  4m 39s
      🟩 Clang17            Pass: 100%/26  | Total: 12h 50m | Avg: 29m 38s | Max: 57m 40s
      🟩 GCC6               Pass: 100%/2   | Total:  1h 28m | Avg: 44m 17s | Max: 44m 28s
      🟩 GCC7               Pass: 100%/6   | Total:  4h 44m | Avg: 47m 20s | Max: 52m 29s
      🟩 GCC8               Pass: 100%/6   | Total:  4h 44m | Avg: 47m 23s | Max: 52m 42s
      🟩 GCC9               Pass: 100%/6   | Total:  4h 42m | Avg: 47m 03s | Max: 51m 27s
      🟩 GCC10              Pass: 100%/4   | Total: 16m 46s | Avg:  4m 11s | Max:  4m 22s
      🟩 GCC11              Pass: 100%/7   | Total: 31m 10s | Avg:  4m 27s | Max:  5m 13s
      🟩 GCC12              Pass: 100%/4   | Total:  1h 18m | Avg: 19m 36s | Max:  1h 04m
      🔍 GCC13              Pass:  85%/28  | Total: 12h 37m | Avg: 27m 03s | Max: 55m 08s
      🟩 Intel2023.2.0      Pass: 100%/3   | Total:  2h 37m | Avg: 52m 29s | Max: 53m 22s
      🟩 MSVC14.16          Pass: 100%/1   | Total: 53m 39s | Avg: 53m 39s | Max: 53m 39s | Hits:  41%/712   
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 55m | Avg: 57m 47s | Max: 59m 10s | Hits:  41%/1424  
      🟩 MSVC14.39          Pass: 100%/3   | Total:  3h 15m | Avg:  1h 05m | Max:  1h 11m | Hits:  41%/2136  
    🔍 cxx_family: GCC 🔍
      🟩 Clang              Pass: 100%/59  | Total:  1d 06h | Avg: 31m 14s | Max: 57m 40s
      🔍 GCC                Pass:  93%/63  | Total:  1d 06h | Avg: 28m 56s | Max:  1h 04m
      🟩 Intel              Pass: 100%/3   | Total:  2h 37m | Avg: 52m 29s | Max: 53m 22s
      🟩 MSVC               Pass: 100%/6   | Total:  6h 04m | Avg:  1h 00m | Max:  1h 11m | Hits:  41%/4272  
    🔍 jobs: Build 🔍
      🔍 Build              Pass:  95%/99  | Total:  2d 11h | Avg: 35m 59s | Max:  1h 11m | Hits:  41%/4272  
      🟩 DeviceLaunch       Pass: 100%/8   | Total:  2h 31m | Avg: 18m 55s | Max: 20m 53s
      🟩 GraphCapture       Pass: 100%/8   | Total:  2h 05m | Avg: 15m 41s | Max: 17m 29s
      🟩 HostLaunch         Pass: 100%/8   | Total:  2h 34m | Avg: 19m 19s | Max: 20m 41s
      🟩 TestGPU            Pass: 100%/8   | Total:  3h 12m | Avg: 24m 05s | Max: 27m 35s
    🚨 sm: 90a 🚨
      🟩 60;70;80;90        Pass: 100%/3   | Total: 14m 02s | Avg:  4m 40s | Max:  5m 13s
      🔥 90a                Pass:   0%/4   | Total: 22m 54s | Avg:  5m 43s | Max:  6m 41s
    🟨 gpu
      🟨 v100               Pass:  96%/131 | Total:  2d 21h | Avg: 31m 58s | Max:  1h 11m | Hits:  41%/4272  
    🟨 std
      🟨 11                 Pass:  97%/34  | Total: 18h 46m | Avg: 33m 08s | Max:  1h 04m
      🟨 14                 Pass:  97%/37  | Total: 20h 58m | Avg: 34m 00s | Max:  1h 05m | Hits:  41%/2136  
      🟨 17                 Pass:  97%/36  | Total: 18h 33m | Avg: 30m 56s | Max: 58m 15s | Hits:  41%/1424  
      🟨 20                 Pass:  95%/24  | Total: 11h 28m | Avg: 28m 42s | Max:  1h 11m | Hits:  41%/712   
    
  • 🟨 thrust: Pass: 99%/118 | Total: 1d 23h | Avg: 24m 20s | Max: 56m 29s | Hits: 71%/13005

    🔍 cpu: amd64 🔍
      🔍 amd64              Pass:  99%/110 | Total:  1d 20h | Avg: 24m 12s | Max: 56m 29s | Hits:  71%/13005 
      🟩 arm64              Pass: 100%/8   | Total:  3h 28m | Avg: 26m 02s | Max: 30m 21s
    🔍 ctk: 12.5 🔍
      🟩 11.1               Pass: 100%/15  | Total:  6h 52m | Avg: 27m 29s | Max: 56m 29s | Hits:  57%/1445  
      🟩 11.8               Pass: 100%/3   | Total:  1h 30m | Avg: 30m 07s | Max: 46m 18s
      🔍 12.5               Pass:  99%/100 | Total:  1d 15h | Avg: 23m 41s | Max: 55m 23s | Hits:  73%/11560 
    🔍 cudacxx: nvcc12.5 🔍
      🟩 ClangCUDA17        Pass: 100%/2   | Total:  8m 18s | Avg:  4m 09s | Max:  4m 12s
      🟩 nvcc11.1           Pass: 100%/15  | Total:  6h 52m | Avg: 27m 29s | Max: 56m 29s | Hits:  57%/1445  
      🟩 nvcc11.8           Pass: 100%/3   | Total:  1h 30m | Avg: 30m 07s | Max: 46m 18s
      🔍 nvcc12.5           Pass:  98%/98  | Total:  1d 15h | Avg: 24m 05s | Max: 55m 23s | Hits:  73%/11560 
    🔍 cudacxx_family: nvcc 🔍
      🟩 ClangCUDA          Pass: 100%/2   | Total:  8m 18s | Avg:  4m 09s | Max:  4m 12s
      🔍 nvcc               Pass:  99%/116 | Total:  1d 23h | Avg: 24m 41s | Max: 56m 29s | Hits:  71%/13005 
    🔍 cxx: GCC13 🔍
      🟩 Clang9             Pass: 100%/6   | Total:  2h 35m | Avg: 25m 53s | Max: 29m 20s
      🟩 Clang10            Pass: 100%/3   | Total:  1h 22m | Avg: 27m 37s | Max: 28m 55s
      🟩 Clang11            Pass: 100%/4   | Total:  1h 47m | Avg: 26m 51s | Max: 27m 45s
      🟩 Clang12            Pass: 100%/4   | Total:  1h 48m | Avg: 27m 04s | Max: 28m 26s
      🟩 Clang13            Pass: 100%/4   | Total:  1h 50m | Avg: 27m 35s | Max: 30m 09s
      🟩 Clang14            Pass: 100%/4   | Total:  1h 52m | Avg: 28m 11s | Max: 29m 22s
      🟩 Clang15            Pass: 100%/4   | Total:  1h 57m | Avg: 29m 15s | Max: 30m 15s
      🟩 Clang16            Pass: 100%/4   | Total:  1h 52m | Avg: 28m 00s | Max: 31m 23s
      🟩 Clang17            Pass: 100%/18  | Total:  5h 05m | Avg: 16m 59s | Max: 28m 45s
      🟩 GCC6               Pass: 100%/2   | Total: 50m 26s | Avg: 25m 13s | Max: 26m 27s
      🟩 GCC7               Pass: 100%/6   | Total:  2h 31m | Avg: 25m 16s | Max: 29m 26s
      🟩 GCC8               Pass: 100%/6   | Total:  2h 42m | Avg: 27m 04s | Max: 28m 42s
      🟩 GCC9               Pass: 100%/6   | Total:  2h 46m | Avg: 27m 40s | Max: 30m 27s
      🟩 GCC10              Pass: 100%/4   | Total:  1h 26m | Avg: 21m 31s | Max: 39m 18s
      🟩 GCC11              Pass: 100%/7   | Total:  2h 57m | Avg: 25m 20s | Max: 46m 18s
      🟩 GCC12              Pass: 100%/4   | Total: 51m 08s | Avg: 12m 47s | Max: 38m 15s
      🔍 GCC13              Pass:  95%/20  | Total:  5h 28m | Avg: 16m 26s | Max: 32m 02s
      🟩 Intel2023.2.0      Pass: 100%/3   | Total:  1h 59m | Avg: 39m 46s | Max: 39m 47s
      🟩 MSVC14.16          Pass: 100%/1   | Total: 56m 29s | Avg: 56m 29s | Max: 56m 29s | Hits:  57%/1445  
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 39m | Avg: 49m 47s | Max: 50m 20s | Hits:  57%/2890  
      🟩 MSVC14.39          Pass: 100%/6   | Total:  3h 30m | Avg: 35m 08s | Max: 55m 23s | Hits:  78%/8670  
    🔍 cxx_family: GCC 🔍
      🟩 Clang              Pass: 100%/51  | Total: 20h 12m | Avg: 23m 45s | Max: 31m 23s
      🔍 GCC                Pass:  98%/55  | Total: 19h 33m | Avg: 21m 20s | Max: 46m 18s
      🟩 Intel              Pass: 100%/3   | Total:  1h 59m | Avg: 39m 46s | Max: 39m 47s
      🟩 MSVC               Pass: 100%/9   | Total:  6h 06m | Avg: 40m 45s | Max: 56m 29s | Hits:  71%/13005 
    🔍 jobs: Build 🔍
      🔍 Build              Pass:  98%/99  | Total:  1d 20h | Avg: 26m 43s | Max: 56m 29s | Hits:  57%/8670  
      🟩 TestCPU            Pass: 100%/11  | Total:  1h 48m | Avg:  9m 51s | Max: 19m 20s | Hits:  99%/4335  
      🟩 TestGPU            Pass: 100%/8   | Total:  1h 58m | Avg: 14m 50s | Max: 17m 31s
    🔍 sm: 90a 🔍
      🟩 60;70;80;90        Pass: 100%/3   | Total:  1h 30m | Avg: 30m 07s | Max: 46m 18s
      🔍 90a                Pass:  75%/4   | Total: 22m 01s | Avg:  5m 30s | Max: 11m 41s
    🔍 std: 20 🔍
      🟩 11                 Pass: 100%/30  | Total: 10h 09m | Avg: 20m 19s | Max: 39m 47s
      🟩 14                 Pass: 100%/34  | Total: 15h 07m | Avg: 26m 41s | Max: 56m 29s | Hits:  67%/5780  
      🟩 17                 Pass: 100%/33  | Total: 14h 25m | Avg: 26m 13s | Max: 50m 13s | Hits:  71%/4335  
      🔍 20                 Pass:  95%/21  | Total:  8h 09m | Avg: 23m 18s | Max: 49m 32s | Hits:  78%/2890  
    🟨 gpu
      🟨 v100               Pass:  99%/118 | Total:  1d 23h | Avg: 24m 20s | Max: 56m 29s | Hits:  71%/13005 
    
  • 🟩 pycuda: Pass: 100%/1 | Total: 11m 13s | Avg: 11m 13s | Max: 11m 13s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 11m 13s | Avg: 11m 13s | Max: 11m 13s
    🟩 ctk
      🟩 12.5               Pass: 100%/1   | Total: 11m 13s | Avg: 11m 13s | Max: 11m 13s
    🟩 cudacxx
      🟩 nvcc12.5           Pass: 100%/1   | Total: 11m 13s | Avg: 11m 13s | Max: 11m 13s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 11m 13s | Avg: 11m 13s | Max: 11m 13s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 11m 13s | Avg: 11m 13s | Max: 11m 13s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 11m 13s | Avg: 11m 13s | Max: 11m 13s
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total: 11m 13s | Avg: 11m 13s | Max: 11m 13s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 11m 13s | Avg: 11m 13s | Max: 11m 13s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
+/- CUB
Thrust
CUDA Experimental
pycuda

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
+/- pycuda

🏃‍ Runner counts (total jobs: 250)

# Runner
178 linux-amd64-cpu16
41 linux-amd64-gpu-v100-latest-1
16 linux-arm64-cpu16
15 windows-amd64-cpu16


@elstehle elstehle left a comment

Thanks for the great work, I love the idea! 💚
Since the mechanism is at the core of CUB, I want to make sure everything works as expected. I left a few comments that I hope will further improve test coverage.

Review thread: cub/cub/util_device.cuh (outdated)
Review thread: cub/test/catch2_test_util_device.cu (outdated)
Review thread: cub/test/catch2_test_util_device.cu (outdated)
Review thread: cub/cub/util_device.cuh
Review thread: cub/cub/util_device.cuh (outdated)
Review thread: cub/test/catch2_test_util_device.cu (outdated)
Review thread: cub/test/catch2_test_util_device.cu (outdated)
Review thread: cub/test/catch2_test_util_device.cu (outdated)
@bernhardmgruber bernhardmgruber force-pushed the chained_policy_prune branch 3 times, most recently from 9c627c6 to 8766281 Compare August 16, 2024 18:52

🟨 CI finished in 9h 27m: Pass: 83%/250 | Total: 1d 23h | Avg: 11m 19s | Max: 48m 33s | Hits: 98%/16565
  • 🟨 cub: Pass: 68%/131 | Total: 1d 07h | Avg: 14m 30s | Max: 48m 33s | Hits: 97%/3560

    🔍 cpu: amd64 🔍
      🔍 amd64              Pass:  66%/123 | Total:  1d 06h | Avg: 15m 04s | Max: 48m 33s | Hits:  97%/3560  
      🟩 arm64              Pass: 100%/8   | Total: 46m 16s | Avg:  5m 47s | Max:  7m 42s
    🟨 ctk
      🟥 11.1               Pass:   0%/15  | Total:  6h 51m | Avg: 27m 24s | Max: 30m 32s
      🟩 11.8               Pass: 100%/3   | Total: 17m 17s | Avg:  5m 45s | Max:  8m 23s
      🟨 12.5               Pass:  76%/113 | Total:  1d 00h | Avg: 13m 01s | Max: 48m 33s | Hits:  97%/3560  
    🟨 cudacxx
      🟥 ClangCUDA17        Pass:   0%/2   | Total: 19m 42s | Avg:  9m 51s | Max:  9m 59s
      🟥 nvcc11.1           Pass:   0%/15  | Total:  6h 51m | Avg: 27m 24s | Max: 30m 32s
      🟩 nvcc11.8           Pass: 100%/3   | Total: 17m 17s | Avg:  5m 45s | Max:  8m 23s
      🟨 nvcc12.5           Pass:  78%/111 | Total:  1d 00h | Avg: 13m 04s | Max: 48m 33s | Hits:  97%/3560  
    🟨 cxx
      🟨 Clang9             Pass:  50%/6   | Total:  1h 44m | Avg: 17m 20s | Max: 28m 45s
      🟩 Clang10            Pass: 100%/3   | Total: 20m 52s | Avg:  6m 57s | Max:  7m 14s
      🟩 Clang11            Pass: 100%/4   | Total: 25m 07s | Avg:  6m 16s | Max:  6m 37s
      🟩 Clang12            Pass: 100%/4   | Total: 25m 27s | Avg:  6m 21s | Max:  6m 51s
      🟩 Clang13            Pass: 100%/4   | Total: 25m 36s | Avg:  6m 24s | Max:  6m 54s
      🟩 Clang14            Pass: 100%/4   | Total: 19m 56s | Avg:  4m 59s | Max:  6m 24s
      🟩 Clang15            Pass: 100%/4   | Total: 20m 44s | Avg:  5m 11s | Max:  6m 27s
      🟩 Clang16            Pass: 100%/4   | Total: 20m 49s | Avg:  5m 12s | Max:  6m 47s
      🟨 Clang17            Pass:  46%/26  | Total:  8h 44m | Avg: 20m 09s | Max: 41m 58s
      🟥 GCC6               Pass:   0%/2   | Total: 56m 49s | Avg: 28m 24s | Max: 29m 21s
      🟨 GCC7               Pass:  50%/6   | Total:  1h 42m | Avg: 17m 01s | Max: 29m 58s
      🟨 GCC8               Pass:  50%/6   | Total:  1h 37m | Avg: 16m 16s | Max: 28m 36s
      🟨 GCC9               Pass:  50%/6   | Total:  1h 42m | Avg: 17m 03s | Max: 30m 32s
      🟩 GCC10              Pass: 100%/4   | Total:  1h 01m | Avg: 15m 28s | Max: 48m 33s
      🟩 GCC11              Pass: 100%/7   | Total: 38m 04s | Avg:  5m 26s | Max:  8m 23s
      🟩 GCC12              Pass: 100%/4   | Total: 20m 12s | Avg:  5m 03s | Max:  6m 37s
      🟨 GCC13              Pass:  57%/28  | Total:  8h 49m | Avg: 18m 54s | Max: 48m 21s
      🟩 Intel2023.2.0      Pass: 100%/3   | Total: 23m 23s | Avg:  7m 47s | Max:  8m 12s
      🟥 MSVC14.16          Pass:   0%/1   | Total: 16m 49s | Avg: 16m 49s | Max: 16m 49s
      🟩 MSVC14.29          Pass: 100%/2   | Total: 25m 15s | Avg: 12m 37s | Max: 13m 37s | Hits:  97%/1424  
      🟩 MSVC14.39          Pass: 100%/3   | Total: 39m 28s | Avg: 13m 09s | Max: 14m 18s | Hits:  97%/2136  
    🟨 cxx_family
      🟨 Clang              Pass:  71%/59  | Total: 13h 06m | Avg: 13m 20s | Max: 41m 58s
      🟨 GCC                Pass:  63%/63  | Total: 16h 48m | Avg: 16m 00s | Max: 48m 33s
      🟩 Intel              Pass: 100%/3   | Total: 23m 23s | Avg:  7m 47s | Max:  8m 12s
      🟨 MSVC               Pass:  83%/6   | Total:  1h 21m | Avg: 13m 35s | Max: 16m 49s | Hits:  97%/3560  
    🟨 jobs
      🟨 Build              Pass:  82%/99  | Total: 21h 41m | Avg: 13m 08s | Max: 48m 33s | Hits:  97%/3560  
      🟥 DeviceLaunch       Pass:   0%/8   | Total:  2h 24m | Avg: 18m 01s | Max: 27m 45s
      🟥 GraphCapture       Pass:   0%/8   | Total:  2h 05m | Avg: 15m 41s | Max: 19m 17s
      🟥 HostLaunch         Pass:   0%/8   | Total:  2h 18m | Avg: 17m 19s | Max: 18m 27s
      🟩 TestGPU            Pass: 100%/8   | Total:  3h 10m | Avg: 23m 51s | Max: 26m 54s
    🟨 gpu
      🟨 v100               Pass:  68%/131 | Total:  1d 07h | Avg: 14m 30s | Max: 48m 33s | Hits:  97%/3560  
    🟨 cudacxx_family
      🟥 ClangCUDA          Pass:   0%/2   | Total: 19m 42s | Avg:  9m 51s | Max:  9m 59s
      🟨 nvcc               Pass:  69%/129 | Total:  1d 07h | Avg: 14m 34s | Max: 48m 33s | Hits:  97%/3560  
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total: 17m 17s | Avg:  5m 45s | Max:  8m 23s
      🟩 90a                Pass: 100%/4   | Total: 46m 10s | Avg: 11m 32s | Max: 15m 50s
    🟨 std
      🟨 11                 Pass:  67%/34  | Total:  9h 13m | Avg: 16m 16s | Max: 48m 33s
      🟨 14                 Pass:  67%/37  | Total:  8h 50m | Avg: 14m 19s | Max: 45m 33s | Hits:  97%/1424  
      🟨 17                 Pass:  69%/36  | Total:  8h 05m | Avg: 13m 28s | Max: 45m 48s | Hits:  97%/1424  
      🟨 20                 Pass:  70%/24  | Total:  5h 31m | Avg: 13m 49s | Max: 48m 21s | Hits:  97%/712   
    
  • 🟨 thrust: Pass: 99%/118 | Total: 15h 21m | Avg: 7m 48s | Max: 24m 11s | Hits: 99%/13005

    🔍 cpu: amd64 🔍
      🔍 amd64              Pass:  99%/110 | Total: 14h 50m | Avg:  8m 05s | Max: 24m 11s | Hits:  99%/13005 
      🟩 arm64              Pass: 100%/8   | Total: 30m 17s | Avg:  3m 47s | Max:  4m 28s
    🔍 ctk: 12.5 🔍
      🟩 11.1               Pass: 100%/15  | Total:  1h 03m | Avg:  4m 14s | Max: 13m 52s | Hits:  99%/1445  
      🟩 11.8               Pass: 100%/3   | Total: 13m 32s | Avg:  4m 30s | Max:  5m 31s
      🔍 12.5               Pass:  99%/100 | Total: 14h 03m | Avg:  8m 26s | Max: 24m 11s | Hits:  99%/11560 
    🔍 cudacxx: nvcc12.5 🔍
      🟩 ClangCUDA17        Pass: 100%/2   | Total:  8m 11s | Avg:  4m 05s | Max:  4m 08s
      🟩 nvcc11.1           Pass: 100%/15  | Total:  1h 03m | Avg:  4m 14s | Max: 13m 52s | Hits:  99%/1445  
      🟩 nvcc11.8           Pass: 100%/3   | Total: 13m 32s | Avg:  4m 30s | Max:  5m 31s
      🔍 nvcc12.5           Pass:  98%/98  | Total: 13h 55m | Avg:  8m 31s | Max: 24m 11s | Hits:  99%/11560 
    🔍 cudacxx_family: nvcc 🔍
      🟩 ClangCUDA          Pass: 100%/2   | Total:  8m 11s | Avg:  4m 05s | Max:  4m 08s
      🔍 nvcc               Pass:  99%/116 | Total: 15h 13m | Avg:  7m 52s | Max: 24m 11s | Hits:  99%/13005 
    🔍 cxx: GCC13 🔍
      🟩 Clang9             Pass: 100%/6   | Total: 24m 59s | Avg:  4m 09s | Max:  5m 06s
      🟩 Clang10            Pass: 100%/3   | Total: 14m 48s | Avg:  4m 56s | Max:  5m 24s
      🟩 Clang11            Pass: 100%/4   | Total: 16m 38s | Avg:  4m 09s | Max:  4m 18s
      🟩 Clang12            Pass: 100%/4   | Total: 15m 45s | Avg:  3m 56s | Max:  4m 04s
      🟩 Clang13            Pass: 100%/4   | Total: 15m 47s | Avg:  3m 56s | Max:  4m 21s
      🟩 Clang14            Pass: 100%/4   | Total: 15m 53s | Avg:  3m 58s | Max:  4m 21s
      🟩 Clang15            Pass: 100%/4   | Total: 16m 52s | Avg:  4m 13s | Max:  4m 29s
      🟩 Clang16            Pass: 100%/4   | Total: 16m 42s | Avg:  4m 10s | Max:  4m 21s
      🟩 Clang17            Pass: 100%/18  | Total:  2h 57m | Avg:  9m 51s | Max: 20m 35s
      🟩 GCC6               Pass: 100%/2   | Total:  7m 11s | Avg:  3m 35s | Max:  3m 47s
      🟩 GCC7               Pass: 100%/6   | Total: 44m 35s | Avg:  7m 25s | Max: 14m 17s
      🟩 GCC8               Pass: 100%/6   | Total: 55m 04s | Avg:  9m 10s | Max: 18m 49s
      🟩 GCC9               Pass: 100%/6   | Total:  1h 01m | Avg: 10m 11s | Max: 21m 30s
      🟩 GCC10              Pass: 100%/4   | Total: 21m 31s | Avg:  5m 22s | Max:  9m 36s
      🟩 GCC11              Pass: 100%/7   | Total: 35m 23s | Avg:  5m 03s | Max: 10m 09s
      🟩 GCC12              Pass: 100%/4   | Total: 46m 13s | Avg: 11m 33s | Max: 13m 30s
      🔍 GCC13              Pass:  95%/20  | Total:  3h 04m | Avg:  9m 12s | Max: 24m 11s
      🟩 Intel2023.2.0      Pass: 100%/3   | Total: 15m 42s | Avg:  5m 14s | Max:  6m 05s
      🟩 MSVC14.16          Pass: 100%/1   | Total: 13m 52s | Avg: 13m 52s | Max: 13m 52s | Hits:  99%/1445  
      🟩 MSVC14.29          Pass: 100%/2   | Total: 25m 14s | Avg: 12m 37s | Max: 12m 58s | Hits:  99%/2890  
      🟩 MSVC14.39          Pass: 100%/6   | Total:  1h 36m | Avg: 16m 02s | Max: 18m 44s | Hits:  99%/8670  
    🔍 cxx_family: GCC 🔍
      🟩 Clang              Pass: 100%/51  | Total:  5h 14m | Avg:  6m 10s | Max: 20m 35s
      🔍 GCC                Pass:  98%/55  | Total:  7h 35m | Avg:  8m 16s | Max: 24m 11s
      🟩 Intel              Pass: 100%/3   | Total: 15m 42s | Avg:  5m 14s | Max:  6m 05s
      🟩 MSVC               Pass: 100%/9   | Total:  2h 15m | Avg: 15m 02s | Max: 18m 44s | Hits:  99%/13005 
    🔍 jobs: Build 🔍
      🔍 Build              Pass:  98%/99  | Total: 11h 45m | Avg:  7m 07s | Max: 24m 11s | Hits:  99%/8670  
      🟩 TestCPU            Pass: 100%/11  | Total:  1h 47m | Avg:  9m 49s | Max: 18m 44s | Hits:  99%/4335  
      🟩 TestGPU            Pass: 100%/8   | Total:  1h 47m | Avg: 13m 25s | Max: 15m 56s
    🔍 sm: 90a 🔍
      🟩 60;70;80;90        Pass: 100%/3   | Total: 13m 32s | Avg:  4m 30s | Max:  5m 31s
      🔍 90a                Pass:  75%/4   | Total: 14m 41s | Avg:  3m 40s | Max:  4m 01s
    🔍 std: 20 🔍
      🟩 11                 Pass: 100%/30  | Total:  3h 02m | Avg:  6m 05s | Max: 17m 09s
      🟩 14                 Pass: 100%/34  | Total:  4h 40m | Avg:  8m 15s | Max: 23m 14s | Hits:  99%/5780  
      🟩 17                 Pass: 100%/33  | Total:  4h 42m | Avg:  8m 32s | Max: 24m 11s | Hits:  99%/4335  
      🔍 20                 Pass:  95%/21  | Total:  2h 55m | Avg:  8m 22s | Max: 20m 35s | Hits:  99%/2890  
    🟨 gpu
      🟨 v100               Pass:  99%/118 | Total: 15h 21m | Avg:  7m 48s | Max: 24m 11s | Hits:  99%/13005 
    
  • 🟩 pycuda: Pass: 100%/1 | Total: 11m 39s | Avg: 11m 39s | Max: 11m 39s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 11m 39s | Avg: 11m 39s | Max: 11m 39s
    🟩 ctk
      🟩 12.5               Pass: 100%/1   | Total: 11m 39s | Avg: 11m 39s | Max: 11m 39s
    🟩 cudacxx
      🟩 nvcc12.5           Pass: 100%/1   | Total: 11m 39s | Avg: 11m 39s | Max: 11m 39s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 11m 39s | Avg: 11m 39s | Max: 11m 39s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 11m 39s | Avg: 11m 39s | Max: 11m 39s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 11m 39s | Avg: 11m 39s | Max: 11m 39s
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total: 11m 39s | Avg: 11m 39s | Max: 11m 39s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 11m 39s | Avg: 11m 39s | Max: 11m 39s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
+/- CUB
Thrust
CUDA Experimental
pycuda

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
+/- pycuda

🏃‍ Runner counts (total jobs: 250)

# Runner
178 linux-amd64-cpu16
41 linux-amd64-gpu-v100-latest-1
16 linux-arm64-cpu16
15 windows-amd64-cpu16

@bernhardmgruber force-pushed the chained_policy_prune branch 2 times, most recently from bd97bb1 to bedf081, August 19, 2024 16:22
Contributor

🟩 CI finished in 1d 00h: Pass: 100%/250 | Total: 6d 00h | Avg: 34m 43s | Max: 1h 26m | Hits: 64%/17355
  • 🟩 cub: Pass: 100%/131 | Total: 3d 22h | Avg: 43m 22s | Max: 1h 26m | Hits: 41%/4278

    🟩 cpu
      🟩 amd64              Pass: 100%/123 | Total:  3d 15h | Avg: 42m 40s | Max:  1h 26m | Hits:  41%/4278  
      🟩 arm64              Pass: 100%/8   | Total:  7h 12m | Avg: 54m 04s | Max: 57m 15s
    🟩 ctk
      🟩 11.1               Pass: 100%/15  | Total: 11h 20m | Avg: 45m 20s | Max: 54m 06s | Hits:  41%/713   
      🟩 11.8               Pass: 100%/3   | Total:  3h 20m | Avg:  1h 06m | Max:  1h 10m
      🟩 12.5               Pass: 100%/113 | Total:  3d 08h | Avg: 42m 29s | Max:  1h 26m | Hits:  41%/3565  
    🟩 cudacxx
      🟩 ClangCUDA17        Pass: 100%/2   | Total: 49m 17s | Avg: 24m 38s | Max: 25m 19s
      🟩 nvcc11.1           Pass: 100%/15  | Total: 11h 20m | Avg: 45m 20s | Max: 54m 06s | Hits:  41%/713   
      🟩 nvcc11.8           Pass: 100%/3   | Total:  3h 20m | Avg:  1h 06m | Max:  1h 10m
      🟩 nvcc12.5           Pass: 100%/111 | Total:  3d 07h | Avg: 42m 49s | Max:  1h 26m | Hits:  41%/3565  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 49m 17s | Avg: 24m 38s | Max: 25m 19s
      🟩 nvcc               Pass: 100%/129 | Total:  3d 21h | Avg: 43m 40s | Max:  1h 26m | Hits:  41%/4278  
    🟩 cxx
      🟩 Clang9             Pass: 100%/6   | Total:  4h 52m | Avg: 48m 42s | Max: 55m 29s
      🟩 Clang10            Pass: 100%/3   | Total:  2h 33m | Avg: 51m 00s | Max: 53m 10s
      🟩 Clang11            Pass: 100%/4   | Total:  3h 25m | Avg: 51m 29s | Max: 52m 53s
      🟩 Clang12            Pass: 100%/4   | Total:  3h 26m | Avg: 51m 31s | Max: 54m 47s
      🟩 Clang13            Pass: 100%/4   | Total:  3h 25m | Avg: 51m 29s | Max: 53m 44s
      🟩 Clang14            Pass: 100%/4   | Total:  3h 26m | Avg: 51m 38s | Max: 54m 23s
      🟩 Clang15            Pass: 100%/4   | Total:  3h 19m | Avg: 49m 53s | Max: 53m 35s
      🟩 Clang16            Pass: 100%/4   | Total:  3h 35m | Avg: 53m 51s | Max: 57m 39s
      🟩 Clang17            Pass: 100%/26  | Total: 12h 50m | Avg: 29m 39s | Max: 54m 32s
      🟩 GCC6               Pass: 100%/2   | Total:  1h 28m | Avg: 44m 27s | Max: 46m 11s
      🟩 GCC7               Pass: 100%/6   | Total:  4h 38m | Avg: 46m 20s | Max: 51m 14s
      🟩 GCC8               Pass: 100%/6   | Total:  4h 51m | Avg: 48m 36s | Max: 52m 41s
      🟩 GCC9               Pass: 100%/6   | Total:  4h 46m | Avg: 47m 43s | Max: 50m 46s
      🟩 GCC10              Pass: 100%/4   | Total:  3h 18m | Avg: 49m 38s | Max: 50m 16s
      🟩 GCC11              Pass: 100%/7   | Total:  6h 41m | Avg: 57m 22s | Max:  1h 10m
      🟩 GCC12              Pass: 100%/4   | Total:  3h 23m | Avg: 50m 53s | Max: 51m 39s
      🟩 GCC13              Pass: 100%/28  | Total: 15h 31m | Avg: 33m 16s | Max:  1h 26m
      🟩 Intel2023.2.0      Pass: 100%/3   | Total:  2h 44m | Avg: 54m 55s | Max: 56m 08s
      🟩 MSVC14.16          Pass: 100%/1   | Total: 54m 06s | Avg: 54m 06s | Max: 54m 06s | Hits:  41%/713   
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 10m | Avg:  1h 05m | Max:  1h 05m | Hits:  41%/1426  
      🟩 MSVC14.39          Pass: 100%/3   | Total:  3h 17m | Avg:  1h 05m | Max:  1h 08m | Hits:  41%/2139  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/59  | Total:  1d 16h | Avg: 41m 37s | Max: 57m 39s
      🟩 GCC                Pass: 100%/63  | Total:  1d 20h | Avg: 42m 32s | Max:  1h 26m
      🟩 Intel              Pass: 100%/3   | Total:  2h 44m | Avg: 54m 55s | Max: 56m 08s
      🟩 MSVC               Pass: 100%/6   | Total:  6h 22m | Avg:  1h 03m | Max:  1h 08m | Hits:  41%/4278  
    🟩 gpu
      🟩 v100               Pass: 100%/131 | Total:  3d 22h | Avg: 43m 22s | Max:  1h 26m | Hits:  41%/4278  
    🟩 jobs
      🟩 Build              Pass: 100%/99  | Total:  3d 10h | Avg: 50m 08s | Max:  1h 10m | Hits:  41%/4278  
      🟩 DeviceLaunch       Pass: 100%/8   | Total:  2h 34m | Avg: 19m 17s | Max: 29m 34s
      🟩 GraphCapture       Pass: 100%/8   | Total:  3h 25m | Avg: 25m 43s | Max:  1h 26m
      🟩 HostLaunch         Pass: 100%/8   | Total:  2h 44m | Avg: 20m 30s | Max: 29m 21s
      🟩 TestGPU            Pass: 100%/8   | Total:  3h 14m | Avg: 24m 18s | Max: 36m 11s
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total:  3h 20m | Avg:  1h 06m | Max:  1h 10m
      🟩 90a                Pass: 100%/4   | Total:  1h 30m | Avg: 22m 35s | Max: 24m 25s
    🟩 std
      🟩 11                 Pass: 100%/34  | Total:  1d 00h | Avg: 43m 58s | Max:  1h 26m
      🟩 14                 Pass: 100%/37  | Total:  1d 03h | Avg: 44m 52s | Max:  1h 10m | Hits:  41%/2139  
      🟩 17                 Pass: 100%/36  | Total:  1d 01h | Avg: 43m 17s | Max:  1h 06m | Hits:  41%/1426  
      🟩 20                 Pass: 100%/24  | Total: 16h 08m | Avg: 40m 21s | Max:  1h 08m | Hits:  41%/713   
    
  • 🟩 thrust: Pass: 100%/118 | Total: 2d 01h | Avg: 25m 17s | Max: 53m 32s | Hits: 71%/13077

    🟩 cpu
      🟩 amd64              Pass: 100%/110 | Total:  1d 22h | Avg: 25m 16s | Max: 53m 32s | Hits:  71%/13077 
      🟩 arm64              Pass: 100%/8   | Total:  3h 24m | Avg: 25m 37s | Max: 29m 04s
    🟩 ctk
      🟩 11.1               Pass: 100%/15  | Total:  6h 32m | Avg: 26m 08s | Max: 47m 31s | Hits:  57%/1453  
      🟩 11.8               Pass: 100%/3   | Total:  1h 45m | Avg: 35m 12s | Max: 37m 29s
      🟩 12.5               Pass: 100%/100 | Total:  1d 17h | Avg: 24m 52s | Max: 53m 32s | Hits:  73%/11624 
    🟩 cudacxx
      🟩 ClangCUDA17        Pass: 100%/2   | Total: 52m 10s | Avg: 26m 05s | Max: 26m 27s
      🟩 nvcc11.1           Pass: 100%/15  | Total:  6h 32m | Avg: 26m 08s | Max: 47m 31s | Hits:  57%/1453  
      🟩 nvcc11.8           Pass: 100%/3   | Total:  1h 45m | Avg: 35m 12s | Max: 37m 29s
      🟩 nvcc12.5           Pass: 100%/98  | Total:  1d 16h | Avg: 24m 51s | Max: 53m 32s | Hits:  73%/11624 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 52m 10s | Avg: 26m 05s | Max: 26m 27s
      🟩 nvcc               Pass: 100%/116 | Total:  2d 00h | Avg: 25m 17s | Max: 53m 32s | Hits:  71%/13077 
    🟩 cxx
      🟩 Clang9             Pass: 100%/6   | Total:  2h 33m | Avg: 25m 38s | Max: 29m 46s
      🟩 Clang10            Pass: 100%/3   | Total:  1h 25m | Avg: 28m 20s | Max: 29m 46s
      🟩 Clang11            Pass: 100%/4   | Total:  1h 48m | Avg: 27m 02s | Max: 30m 41s
      🟩 Clang12            Pass: 100%/4   | Total:  1h 50m | Avg: 27m 33s | Max: 29m 02s
      🟩 Clang13            Pass: 100%/4   | Total:  1h 44m | Avg: 26m 12s | Max: 29m 35s
      🟩 Clang14            Pass: 100%/4   | Total:  1h 45m | Avg: 26m 28s | Max: 28m 55s
      🟩 Clang15            Pass: 100%/4   | Total:  1h 46m | Avg: 26m 37s | Max: 28m 33s
      🟩 Clang16            Pass: 100%/4   | Total:  1h 46m | Avg: 26m 30s | Max: 29m 56s
      🟩 Clang17            Pass: 100%/18  | Total:  5h 30m | Avg: 18m 21s | Max: 30m 26s
      🟩 GCC6               Pass: 100%/2   | Total: 48m 17s | Avg: 24m 08s | Max: 27m 44s
      🟩 GCC7               Pass: 100%/6   | Total:  2h 31m | Avg: 25m 11s | Max: 27m 40s
      🟩 GCC8               Pass: 100%/6   | Total:  2h 36m | Avg: 26m 09s | Max: 28m 44s
      🟩 GCC9               Pass: 100%/6   | Total:  2h 39m | Avg: 26m 32s | Max: 32m 34s
      🟩 GCC10              Pass: 100%/4   | Total:  1h 47m | Avg: 26m 46s | Max: 30m 48s
      🟩 GCC11              Pass: 100%/7   | Total:  3h 34m | Avg: 30m 42s | Max: 37m 29s
      🟩 GCC12              Pass: 100%/4   | Total:  1h 53m | Avg: 28m 20s | Max: 32m 25s
      🟩 GCC13              Pass: 100%/20  | Total:  5h 59m | Avg: 17m 59s | Max: 30m 46s
      🟩 Intel2023.2.0      Pass: 100%/3   | Total:  1h 37m | Avg: 32m 30s | Max: 35m 03s
      🟩 MSVC14.16          Pass: 100%/1   | Total: 47m 31s | Avg: 47m 31s | Max: 47m 31s | Hits:  57%/1453  
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 42m | Avg: 51m 26s | Max: 51m 29s | Hits:  57%/2906  
      🟩 MSVC14.39          Pass: 100%/6   | Total:  3h 35m | Avg: 35m 59s | Max: 53m 32s | Hits:  78%/8718  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/51  | Total: 20h 10m | Avg: 23m 44s | Max: 30m 41s
      🟩 GCC                Pass: 100%/55  | Total: 21h 50m | Avg: 23m 49s | Max: 37m 29s
      🟩 Intel              Pass: 100%/3   | Total:  1h 37m | Avg: 32m 30s | Max: 35m 03s
      🟩 MSVC               Pass: 100%/9   | Total:  6h 06m | Avg: 40m 42s | Max: 53m 32s | Hits:  71%/13077 
    🟩 gpu
      🟩 v100               Pass: 100%/118 | Total:  2d 01h | Avg: 25m 17s | Max: 53m 32s | Hits:  71%/13077 
    🟩 jobs
      🟩 Build              Pass: 100%/99  | Total:  1d 22h | Avg: 27m 58s | Max: 53m 32s | Hits:  57%/8718  
      🟩 TestCPU            Pass: 100%/11  | Total:  1h 55m | Avg: 10m 31s | Max: 20m 57s | Hits:  99%/4359  
      🟩 TestGPU            Pass: 100%/8   | Total:  1h 39m | Avg: 12m 25s | Max: 15m 38s
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total:  1h 45m | Avg: 35m 12s | Max: 37m 29s
      🟩 90a                Pass: 100%/4   | Total:  1h 00m | Avg: 15m 12s | Max: 16m 08s
    🟩 std
      🟩 11                 Pass: 100%/30  | Total: 10h 25m | Avg: 20m 51s | Max: 30m 46s
      🟩 14                 Pass: 100%/34  | Total: 15h 21m | Avg: 27m 05s | Max: 51m 29s | Hits:  67%/5812  
      🟩 17                 Pass: 100%/33  | Total: 15h 04m | Avg: 27m 25s | Max: 51m 24s | Hits:  71%/4359  
      🟩 20                 Pass: 100%/21  | Total:  8h 53m | Avg: 25m 25s | Max: 53m 32s | Hits:  78%/2906  
    
  • 🟩 pycuda: Pass: 100%/1 | Total: 11m 25s | Avg: 11m 25s | Max: 11m 25s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 11m 25s | Avg: 11m 25s | Max: 11m 25s
    🟩 ctk
      🟩 12.5               Pass: 100%/1   | Total: 11m 25s | Avg: 11m 25s | Max: 11m 25s
    🟩 cudacxx
      🟩 nvcc12.5           Pass: 100%/1   | Total: 11m 25s | Avg: 11m 25s | Max: 11m 25s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 11m 25s | Avg: 11m 25s | Max: 11m 25s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 11m 25s | Avg: 11m 25s | Max: 11m 25s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 11m 25s | Avg: 11m 25s | Max: 11m 25s
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total: 11m 25s | Avg: 11m 25s | Max: 11m 25s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 11m 25s | Avg: 11m 25s | Max: 11m 25s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
pycuda

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
+/- pycuda

🏃‍ Runner counts (total jobs: 250)

# Runner
178 linux-amd64-cpu16
41 linux-amd64-gpu-v100-latest-1
16 linux-arm64-cpu16
15 windows-amd64-cpu16

@bernhardmgruber
Contributor Author

@gevtushenko: @elstehle said he wouldn't want to merge without your approval, so we are waiting for your approval to merge this PR.

Contributor

🟨 CI finished in 12h 04m: Pass: 99%/250 | Total: 5d 23h | Avg: 34m 21s | Max: 1h 11m | Hits: 64%/17373
  • 🟨 cub: Pass: 99%/131 | Total: 3d 20h | Avg: 42m 22s | Max: 1h 11m | Hits: 42%/4296

    🔍 cpu: amd64 🔍
      🔍 amd64              Pass:  99%/123 | Total:  3d 13h | Avg: 41m 31s | Max:  1h 11m | Hits:  42%/4296  
      🟩 arm64              Pass: 100%/8   | Total:  7h 24m | Avg: 55m 32s | Max:  1h 03m
    🔍 ctk: 12.5 🔍
      🟩 11.1               Pass: 100%/15  | Total: 11h 23m | Avg: 45m 32s | Max: 53m 50s | Hits:  42%/716   
      🟩 11.8               Pass: 100%/3   | Total:  3h 24m | Avg:  1h 08m | Max:  1h 11m
      🔍 12.5               Pass:  99%/113 | Total:  3d 05h | Avg: 41m 16s | Max:  1h 04m | Hits:  42%/3580  
    🔍 cudacxx: nvcc12.5 🔍
      🟩 ClangCUDA17        Pass: 100%/2   | Total: 47m 19s | Avg: 23m 39s | Max: 25m 27s
      🟩 nvcc11.1           Pass: 100%/15  | Total: 11h 23m | Avg: 45m 32s | Max: 53m 50s | Hits:  42%/716   
      🟩 nvcc11.8           Pass: 100%/3   | Total:  3h 24m | Avg:  1h 08m | Max:  1h 11m
      🔍 nvcc12.5           Pass:  99%/111 | Total:  3d 04h | Avg: 41m 35s | Max:  1h 04m | Hits:  42%/3580  
    🔍 cudacxx_family: nvcc 🔍
      🟩 ClangCUDA          Pass: 100%/2   | Total: 47m 19s | Avg: 23m 39s | Max: 25m 27s
      🔍 nvcc               Pass:  99%/129 | Total:  3d 19h | Avg: 42m 40s | Max:  1h 11m | Hits:  42%/4296  
    🔍 cxx: GCC7 🔍
      🟩 Clang9             Pass: 100%/6   | Total:  4h 50m | Avg: 48m 20s | Max: 51m 37s
      🟩 Clang10            Pass: 100%/3   | Total:  2h 42m | Avg: 54m 14s | Max: 58m 05s
      🟩 Clang11            Pass: 100%/4   | Total:  3h 34m | Avg: 53m 35s | Max: 55m 54s
      🟩 Clang12            Pass: 100%/4   | Total:  3h 21m | Avg: 50m 15s | Max: 52m 27s
      🟩 Clang13            Pass: 100%/4   | Total:  3h 23m | Avg: 50m 47s | Max: 51m 44s
      🟩 Clang14            Pass: 100%/4   | Total:  3h 17m | Avg: 49m 22s | Max: 51m 21s
      🟩 Clang15            Pass: 100%/4   | Total:  3h 24m | Avg: 51m 01s | Max: 53m 13s
      🟩 Clang16            Pass: 100%/4   | Total:  3h 15m | Avg: 48m 57s | Max: 49m 20s
      🟩 Clang17            Pass: 100%/26  | Total: 13h 01m | Avg: 30m 04s | Max: 59m 36s
      🟩 GCC6               Pass: 100%/2   | Total:  1h 26m | Avg: 43m 23s | Max: 43m 38s
      🔍 GCC7               Pass:  83%/6   | Total:  4h 38m | Avg: 46m 23s | Max: 51m 23s
      🟩 GCC8               Pass: 100%/6   | Total:  4h 53m | Avg: 48m 57s | Max: 52m 54s
      🟩 GCC9               Pass: 100%/6   | Total:  4h 43m | Avg: 47m 19s | Max: 53m 01s
      🟩 GCC10              Pass: 100%/4   | Total:  3h 18m | Avg: 49m 42s | Max: 50m 11s
      🟩 GCC11              Pass: 100%/7   | Total:  6h 44m | Avg: 57m 49s | Max:  1h 11m
      🟩 GCC12              Pass: 100%/4   | Total:  3h 24m | Avg: 51m 10s | Max: 54m 33s
      🟩 GCC13              Pass: 100%/28  | Total: 13h 35m | Avg: 29m 07s | Max:  1h 03m
      🟩 Intel2023.2.0      Pass: 100%/3   | Total:  2h 43m | Avg: 54m 31s | Max: 55m 41s
      🟩 MSVC14.16          Pass: 100%/1   | Total: 53m 50s | Avg: 53m 50s | Max: 53m 50s | Hits:  42%/716   
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 08m | Avg:  1h 04m | Max:  1h 04m | Hits:  42%/1432  
      🟩 MSVC14.39          Pass: 100%/3   | Total:  3h 09m | Avg:  1h 03m | Max:  1h 04m | Hits:  42%/2148  
    🔍 cxx_family: GCC 🔍
      🟩 Clang              Pass: 100%/59  | Total:  1d 16h | Avg: 41m 32s | Max: 59m 36s
      🔍 GCC                Pass:  98%/63  | Total:  1d 18h | Avg: 40m 44s | Max:  1h 11m
      🟩 Intel              Pass: 100%/3   | Total:  2h 43m | Avg: 54m 31s | Max: 55m 41s
      🟩 MSVC               Pass: 100%/6   | Total:  6h 11m | Avg:  1h 01m | Max:  1h 04m | Hits:  42%/4296  
    🔍 jobs: Build 🔍
      🔍 Build              Pass:  98%/99  | Total:  3d 10h | Avg: 49m 56s | Max:  1h 11m | Hits:  42%/4296  
      🟩 DeviceLaunch       Pass: 100%/8   | Total:  2h 27m | Avg: 18m 27s | Max: 19m 48s
      🟩 GraphCapture       Pass: 100%/8   | Total:  1h 59m | Avg: 14m 58s | Max: 16m 30s
      🟩 HostLaunch         Pass: 100%/8   | Total:  2h 21m | Avg: 17m 43s | Max: 19m 41s
      🟩 TestGPU            Pass: 100%/8   | Total:  3h 18m | Avg: 24m 46s | Max: 32m 55s
    🔍 std: 11 🔍
      🔍 11                 Pass:  97%/34  | Total:  1d 00h | Avg: 42m 22s | Max:  1h 09m
      🟩 14                 Pass: 100%/37  | Total:  1d 02h | Avg: 43m 45s | Max:  1h 11m | Hits:  42%/2148  
      🟩 17                 Pass: 100%/36  | Total:  1d 01h | Avg: 42m 53s | Max:  1h 04m | Hits:  42%/1432  
      🟩 20                 Pass: 100%/24  | Total: 15h 47m | Avg: 39m 28s | Max:  1h 03m | Hits:  42%/716   
    🟨 gpu
      🟨 v100               Pass:  99%/131 | Total:  3d 20h | Avg: 42m 22s | Max:  1h 11m | Hits:  42%/4296  
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total:  3h 24m | Avg:  1h 08m | Max:  1h 11m
      🟩 90a                Pass: 100%/4   | Total:  1h 25m | Avg: 21m 23s | Max: 22m 06s
    
  • 🟨 thrust: Pass: 99%/118 | Total: 2d 02h | Avg: 25m 38s | Max: 58m 54s | Hits: 71%/13077

    🔍 cpu: amd64 🔍
      🔍 amd64              Pass:  99%/110 | Total:  1d 22h | Avg: 25m 36s | Max: 58m 54s | Hits:  71%/13077 
      🟩 arm64              Pass: 100%/8   | Total:  3h 28m | Avg: 26m 05s | Max: 29m 56s
    🔍 ctk: 12.5 🔍
      🟩 11.1               Pass: 100%/15  | Total:  6h 39m | Avg: 26m 36s | Max: 49m 19s | Hits:  57%/1453  
      🟩 11.8               Pass: 100%/3   | Total:  1h 40m | Avg: 33m 32s | Max: 36m 35s
      🔍 12.5               Pass:  99%/100 | Total:  1d 18h | Avg: 25m 15s | Max: 58m 54s | Hits:  73%/11624 
    🔍 cudacxx: nvcc12.5 🔍
      🟩 ClangCUDA17        Pass: 100%/2   | Total: 54m 18s | Avg: 27m 09s | Max: 27m 14s
      🟩 nvcc11.1           Pass: 100%/15  | Total:  6h 39m | Avg: 26m 36s | Max: 49m 19s | Hits:  57%/1453  
      🟩 nvcc11.8           Pass: 100%/3   | Total:  1h 40m | Avg: 33m 32s | Max: 36m 35s
      🔍 nvcc12.5           Pass:  98%/98  | Total:  1d 17h | Avg: 25m 13s | Max: 58m 54s | Hits:  73%/11624 
    🔍 cudacxx_family: nvcc 🔍
      🟩 ClangCUDA          Pass: 100%/2   | Total: 54m 18s | Avg: 27m 09s | Max: 27m 14s
      🔍 nvcc               Pass:  99%/116 | Total:  2d 01h | Avg: 25m 36s | Max: 58m 54s | Hits:  71%/13077 
    🔍 cxx: GCC13 🔍
      🟩 Clang9             Pass: 100%/6   | Total:  2h 35m | Avg: 25m 50s | Max: 30m 48s
      🟩 Clang10            Pass: 100%/3   | Total:  1h 18m | Avg: 26m 05s | Max: 29m 47s
      🟩 Clang11            Pass: 100%/4   | Total:  1h 41m | Avg: 25m 25s | Max: 27m 28s
      🟩 Clang12            Pass: 100%/4   | Total:  1h 44m | Avg: 26m 12s | Max: 28m 15s
      🟩 Clang13            Pass: 100%/4   | Total:  1h 44m | Avg: 26m 07s | Max: 29m 02s
      🟩 Clang14            Pass: 100%/4   | Total:  1h 45m | Avg: 26m 24s | Max: 30m 38s
      🟩 Clang15            Pass: 100%/4   | Total:  1h 44m | Avg: 26m 08s | Max: 27m 32s
      🟩 Clang16            Pass: 100%/4   | Total:  1h 46m | Avg: 26m 42s | Max: 30m 22s
      🟩 Clang17            Pass: 100%/18  | Total:  5h 53m | Avg: 19m 37s | Max: 28m 20s
      🟩 GCC6               Pass: 100%/2   | Total: 46m 50s | Avg: 23m 25s | Max: 26m 17s
      🟩 GCC7               Pass: 100%/6   | Total:  2h 35m | Avg: 25m 57s | Max: 30m 11s
      🟩 GCC8               Pass: 100%/6   | Total:  2h 34m | Avg: 25m 49s | Max: 29m 14s
      🟩 GCC9               Pass: 100%/6   | Total:  2h 38m | Avg: 26m 26s | Max: 28m 57s
      🟩 GCC10              Pass: 100%/4   | Total:  1h 52m | Avg: 28m 02s | Max: 32m 49s
      🟩 GCC11              Pass: 100%/7   | Total:  3h 37m | Avg: 31m 04s | Max: 36m 35s
      🟩 GCC12              Pass: 100%/4   | Total:  1h 51m | Avg: 27m 49s | Max: 30m 49s
      🔍 GCC13              Pass:  95%/20  | Total:  6h 14m | Avg: 18m 43s | Max: 32m 23s
      🟩 Intel2023.2.0      Pass: 100%/3   | Total:  1h 35m | Avg: 31m 41s | Max: 34m 19s
      🟩 MSVC14.16          Pass: 100%/1   | Total: 49m 19s | Avg: 49m 19s | Max: 49m 19s | Hits:  57%/1453  
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 40m | Avg: 50m 28s | Max: 51m 38s | Hits:  57%/2906  
      🟩 MSVC14.39          Pass: 100%/6   | Total:  3h 54m | Avg: 39m 01s | Max: 58m 54s | Hits:  78%/8718  
    🔍 cxx_family: GCC 🔍
      🟩 Clang              Pass: 100%/51  | Total: 20h 14m | Avg: 23m 48s | Max: 30m 48s
      🔍 GCC                Pass:  98%/55  | Total: 22h 11m | Avg: 24m 12s | Max: 36m 35s
      🟩 Intel              Pass: 100%/3   | Total:  1h 35m | Avg: 31m 41s | Max: 34m 19s
      🟩 MSVC               Pass: 100%/9   | Total:  6h 24m | Avg: 42m 42s | Max: 58m 54s | Hits:  71%/13077 
    🔍 jobs: Build 🔍
      🔍 Build              Pass:  98%/99  | Total:  1d 22h | Avg: 27m 58s | Max: 58m 54s | Hits:  57%/8718  
      🟩 TestCPU            Pass: 100%/11  | Total:  1h 55m | Avg: 10m 31s | Max: 22m 04s | Hits:  99%/4359  
      🟩 TestGPU            Pass: 100%/8   | Total:  2h 20m | Avg: 17m 37s | Max: 19m 48s
    🔍 sm: 90a 🔍
      🟩 60;70;80;90        Pass: 100%/3   | Total:  1h 40m | Avg: 33m 32s | Max: 36m 35s
      🔍 90a                Pass:  75%/4   | Total: 52m 23s | Avg: 13m 05s | Max: 15m 16s
    🔍 std: 20 🔍
      🟩 11                 Pass: 100%/30  | Total: 10h 33m | Avg: 21m 06s | Max: 29m 29s
      🟩 14                 Pass: 100%/34  | Total: 15h 25m | Avg: 27m 12s | Max: 57m 04s | Hits:  67%/5812  
      🟩 17                 Pass: 100%/33  | Total: 15h 15m | Avg: 27m 45s | Max: 53m 38s | Hits:  71%/4359  
      🔍 20                 Pass:  95%/21  | Total:  9h 11m | Avg: 26m 15s | Max: 58m 54s | Hits:  78%/2906  
    🟨 gpu
      🟨 v100               Pass:  99%/118 | Total:  2d 02h | Avg: 25m 38s | Max: 58m 54s | Hits:  71%/13077 
    
  • 🟩 pycuda: Pass: 100%/1 | Total: 13m 13s | Avg: 13m 13s | Max: 13m 13s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 13m 13s | Avg: 13m 13s | Max: 13m 13s
    🟩 ctk
      🟩 12.5               Pass: 100%/1   | Total: 13m 13s | Avg: 13m 13s | Max: 13m 13s
    🟩 cudacxx
      🟩 nvcc12.5           Pass: 100%/1   | Total: 13m 13s | Avg: 13m 13s | Max: 13m 13s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 13m 13s | Avg: 13m 13s | Max: 13m 13s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 13m 13s | Avg: 13m 13s | Max: 13m 13s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 13m 13s | Avg: 13m 13s | Max: 13m 13s
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total: 13m 13s | Avg: 13m 13s | Max: 13m 13s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 13m 13s | Avg: 13m 13s | Max: 13m 13s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
pycuda

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
+/- pycuda

🏃‍ Runner counts (total jobs: 250)

# Runner
178 linux-amd64-cpu16
41 linux-amd64-gpu-v100-latest-1
16 linux-arm64-cpu16
15 windows-amd64-cpu16

Contributor

🟨 CI finished in 4h 01m: Pass: 98%/250 | Total: 5d 21h | Avg: 33m 57s | Max: 1h 17m | Hits: 65%/16657
  • 🟨 cub: Pass: 96%/131 | Total: 3d 19h | Avg: 42m 06s | Max: 1h 17m | Hits: 42%/3580

    🔍 cpu: amd64 🔍
      🔍 amd64              Pass:  96%/123 | Total:  3d 12h | Avg: 41m 16s | Max:  1h 17m | Hits:  42%/3580  
      🟩 arm64              Pass: 100%/8   | Total:  7h 18m | Avg: 54m 46s | Max: 59m 14s
    🔍 ctk: 12.5 🔍
      🟩 11.1               Pass: 100%/15  | Total: 11h 09m | Avg: 44m 37s | Max: 55m 04s | Hits:  42%/716   
      🟩 11.8               Pass: 100%/3   | Total:  3h 20m | Avg:  1h 06m | Max:  1h 09m
      🔍 12.5               Pass:  96%/113 | Total:  3d 05h | Avg: 41m 06s | Max:  1h 17m | Hits:  42%/2864  
    🔍 cudacxx: nvcc12.5 🔍
      🟩 ClangCUDA17        Pass: 100%/2   | Total: 46m 08s | Avg: 23m 04s | Max: 23m 54s
      🟩 nvcc11.1           Pass: 100%/15  | Total: 11h 09m | Avg: 44m 37s | Max: 55m 04s | Hits:  42%/716   
      🟩 nvcc11.8           Pass: 100%/3   | Total:  3h 20m | Avg:  1h 06m | Max:  1h 09m
      🔍 nvcc12.5           Pass:  96%/111 | Total:  3d 04h | Avg: 41m 26s | Max:  1h 17m | Hits:  42%/2864  
    🔍 cudacxx_family: nvcc 🔍
      🟩 ClangCUDA          Pass: 100%/2   | Total: 46m 08s | Avg: 23m 04s | Max: 23m 54s
      🔍 nvcc               Pass:  96%/129 | Total:  3d 19h | Avg: 42m 23s | Max:  1h 17m | Hits:  42%/3580  
    🟨 cxx
      🟩 Clang9             Pass: 100%/6   | Total:  4h 33m | Avg: 45m 39s | Max: 49m 23s
      🟩 Clang10            Pass: 100%/3   | Total:  2h 25m | Avg: 48m 34s | Max: 48m 56s
      🟩 Clang11            Pass: 100%/4   | Total:  3h 18m | Avg: 49m 32s | Max: 54m 03s
      🟩 Clang12            Pass: 100%/4   | Total:  3h 15m | Avg: 48m 56s | Max: 51m 04s
      🟩 Clang13            Pass: 100%/4   | Total:  3h 16m | Avg: 49m 09s | Max: 50m 30s
      🟩 Clang14            Pass: 100%/4   | Total:  3h 17m | Avg: 49m 23s | Max: 50m 30s
      🟩 Clang15            Pass: 100%/4   | Total:  3h 28m | Avg: 52m 09s | Max: 56m 34s
      🟩 Clang16            Pass: 100%/4   | Total:  3h 12m | Avg: 48m 06s | Max: 49m 05s
      🟨 Clang17            Pass:  96%/26  | Total: 12h 57m | Avg: 29m 53s | Max: 59m 14s
      🟩 GCC6               Pass: 100%/2   | Total:  1h 27m | Avg: 43m 40s | Max: 44m 27s
      🟨 GCC7               Pass:  83%/6   | Total:  4h 39m | Avg: 46m 37s | Max: 50m 54s
      🟩 GCC8               Pass: 100%/6   | Total:  4h 42m | Avg: 47m 09s | Max: 53m 13s
      🟩 GCC9               Pass: 100%/6   | Total:  4h 49m | Avg: 48m 15s | Max: 53m 35s
      🟩 GCC10              Pass: 100%/4   | Total:  3h 22m | Avg: 50m 30s | Max: 52m 01s
      🟩 GCC11              Pass: 100%/7   | Total:  6h 41m | Avg: 57m 18s | Max:  1h 09m
      🟩 GCC12              Pass: 100%/4   | Total:  3h 31m | Avg: 52m 57s | Max: 55m 23s
      🟨 GCC13              Pass:  96%/28  | Total: 14h 31m | Avg: 31m 07s | Max:  1h 17m
      🟩 Intel2023.2.0      Pass: 100%/3   | Total:  2h 34m | Avg: 51m 35s | Max: 53m 12s
      🟩 MSVC14.16          Pass: 100%/1   | Total: 55m 04s | Avg: 55m 04s | Max: 55m 04s | Hits:  42%/716   
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 58m | Avg: 59m 18s | Max:  1h 00m | Hits:  42%/1432  
      🟨 MSVC14.39          Pass:  66%/3   | Total:  2h 55m | Avg: 58m 26s | Max:  1h 01m | Hits:  42%/1432  
    🟨 cxx_family
      🟨 Clang              Pass:  98%/59  | Total:  1d 15h | Avg: 40m 26s | Max: 59m 14s
      🟨 GCC                Pass:  96%/63  | Total:  1d 19h | Avg: 41m 40s | Max:  1h 17m
      🟩 Intel              Pass: 100%/3   | Total:  2h 34m | Avg: 51m 35s | Max: 53m 12s
      🟨 MSVC               Pass:  83%/6   | Total:  5h 48m | Avg: 58m 09s | Max:  1h 01m | Hits:  42%/3580  
    🟨 jobs
      🟨 Build              Pass:  97%/99  | Total:  3d 08h | Avg: 48m 51s | Max:  1h 09m | Hits:  42%/3580  
      🟨 DeviceLaunch       Pass:  75%/8   | Total:  2h 04m | Avg: 15m 32s | Max: 26m 32s
      🟩 GraphCapture       Pass: 100%/8   | Total:  2h 18m | Avg: 17m 18s | Max: 23m 53s
      🟩 HostLaunch         Pass: 100%/8   | Total:  3h 30m | Avg: 26m 16s | Max:  1h 17m
      🟩 TestGPU            Pass: 100%/8   | Total:  3h 25m | Avg: 25m 42s | Max: 31m 35s
    🟨 std
      🟨 11                 Pass:  94%/34  | Total: 23h 18m | Avg: 41m 08s | Max:  1h 03m
      🟨 14                 Pass:  97%/37  | Total:  1d 02h | Avg: 43m 19s | Max:  1h 09m | Hits:  42%/2148  
      🟩 17                 Pass: 100%/36  | Total:  1d 02h | Avg: 44m 14s | Max:  1h 17m | Hits:  42%/1432  
      🟨 20                 Pass:  95%/24  | Total: 15h 21m | Avg: 38m 23s | Max: 57m 04s
    🟨 gpu
      🟨 v100               Pass:  96%/131 | Total:  3d 19h | Avg: 42m 06s | Max:  1h 17m | Hits:  42%/3580  
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total:  3h 20m | Avg:  1h 06m | Max:  1h 09m
      🟩 90a                Pass: 100%/4   | Total:  1h 22m | Avg: 20m 32s | Max: 21m 09s
    
  • 🟨 thrust: Pass: 99%/118 | Total: 2d 01h | Avg: 25m 07s | Max: 57m 18s | Hits: 71%/13077

    🔍 cpu: amd64 🔍
      🔍 amd64              Pass:  99%/110 | Total:  1d 22h | Avg: 25m 05s | Max: 57m 18s | Hits:  71%/13077 
      🟩 arm64              Pass: 100%/8   | Total:  3h 24m | Avg: 25m 31s | Max: 28m 57s
    🔍 ctk: 12.5 🔍
      🟩 11.1               Pass: 100%/15  | Total:  6h 30m | Avg: 26m 02s | Max: 50m 42s | Hits:  57%/1453  
      🟩 11.8               Pass: 100%/3   | Total:  1h 41m | Avg: 33m 41s | Max: 36m 23s
      🔍 12.5               Pass:  99%/100 | Total:  1d 17h | Avg: 24m 43s | Max: 57m 18s | Hits:  73%/11624 
    🔍 cudacxx: nvcc12.5 🔍
      🟩 ClangCUDA17        Pass: 100%/2   | Total: 52m 15s | Avg: 26m 07s | Max: 27m 40s
      🟩 nvcc11.1           Pass: 100%/15  | Total:  6h 30m | Avg: 26m 02s | Max: 50m 42s | Hits:  57%/1453  
      🟩 nvcc11.8           Pass: 100%/3   | Total:  1h 41m | Avg: 33m 41s | Max: 36m 23s
      🔍 nvcc12.5           Pass:  98%/98  | Total:  1d 16h | Avg: 24m 42s | Max: 57m 18s | Hits:  73%/11624 
    🔍 cudacxx_family: nvcc 🔍
      🟩 ClangCUDA          Pass: 100%/2   | Total: 52m 15s | Avg: 26m 07s | Max: 27m 40s
      🔍 nvcc               Pass:  99%/116 | Total:  2d 00h | Avg: 25m 06s | Max: 57m 18s | Hits:  71%/13077 
    🔍 cxx: GCC13 🔍
      🟩 Clang9             Pass: 100%/6   | Total:  2h 27m | Avg: 24m 33s | Max: 28m 12s
      🟩 Clang10            Pass: 100%/3   | Total:  1h 20m | Avg: 26m 42s | Max: 29m 22s
      🟩 Clang11            Pass: 100%/4   | Total:  1h 42m | Avg: 25m 38s | Max: 28m 20s
      🟩 Clang12            Pass: 100%/4   | Total:  1h 43m | Avg: 25m 57s | Max: 30m 03s
      🟩 Clang13            Pass: 100%/4   | Total:  1h 44m | Avg: 26m 10s | Max: 28m 51s
      🟩 Clang14            Pass: 100%/4   | Total:  1h 44m | Avg: 26m 09s | Max: 28m 15s
      🟩 Clang15            Pass: 100%/4   | Total:  1h 45m | Avg: 26m 16s | Max: 28m 57s
      🟩 Clang16            Pass: 100%/4   | Total:  1h 42m | Avg: 25m 44s | Max: 28m 02s
      🟩 Clang17            Pass: 100%/18  | Total:  5h 47m | Avg: 19m 18s | Max: 27m 41s
      🟩 GCC6               Pass: 100%/2   | Total: 45m 17s | Avg: 22m 38s | Max: 25m 59s
      🟩 GCC7               Pass: 100%/6   | Total:  2h 38m | Avg: 26m 21s | Max: 33m 26s
      🟩 GCC8               Pass: 100%/6   | Total:  2h 35m | Avg: 25m 55s | Max: 29m 12s
      🟩 GCC9               Pass: 100%/6   | Total:  2h 32m | Avg: 25m 23s | Max: 30m 22s
      🟩 GCC10              Pass: 100%/4   | Total:  1h 50m | Avg: 27m 41s | Max: 30m 10s
      🟩 GCC11              Pass: 100%/7   | Total:  3h 35m | Avg: 30m 50s | Max: 36m 23s
      🟩 GCC12              Pass: 100%/4   | Total:  1h 55m | Avg: 28m 58s | Max: 31m 16s
      🔍 GCC13              Pass:  95%/20  | Total:  5h 51m | Avg: 17m 34s | Max: 32m 40s
      🟩 Intel2023.2.0      Pass: 100%/3   | Total:  1h 34m | Avg: 31m 25s | Max: 34m 34s
      🟩 MSVC14.16          Pass: 100%/1   | Total: 50m 42s | Avg: 50m 42s | Max: 50m 42s | Hits:  57%/1453  
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 36m | Avg: 48m 20s | Max: 48m 58s | Hits:  57%/2906  
      🟩 MSVC14.39          Pass: 100%/6   | Total:  3h 39m | Avg: 36m 32s | Max: 57m 18s | Hits:  78%/8718  
    🔍 cxx_family: GCC 🔍
      🟩 Clang              Pass: 100%/51  | Total: 19h 58m | Avg: 23m 30s | Max: 30m 03s
      🔍 GCC                Pass:  98%/55  | Total: 21h 45m | Avg: 23m 43s | Max: 36m 23s
      🟩 Intel              Pass: 100%/3   | Total:  1h 34m | Avg: 31m 25s | Max: 34m 34s
      🟩 MSVC               Pass: 100%/9   | Total:  6h 06m | Avg: 40m 43s | Max: 57m 18s | Hits:  71%/13077 
    🔍 jobs: Build 🔍
      🔍 Build              Pass:  98%/99  | Total:  1d 21h | Avg: 27m 39s | Max: 57m 18s | Hits:  57%/8718  
      🟩 TestCPU            Pass: 100%/11  | Total:  1h 50m | Avg: 10m 04s | Max: 19m 44s | Hits:  99%/4359  
      🟩 TestGPU            Pass: 100%/8   | Total:  1h 56m | Avg: 14m 33s | Max: 22m 02s
    🔍 sm: 90a 🔍
      🟩 60;70;80;90        Pass: 100%/3   | Total:  1h 41m | Avg: 33m 41s | Max: 36m 23s
      🔍 90a                Pass:  75%/4   | Total: 53m 25s | Avg: 13m 21s | Max: 15m 49s
    🔍 std: 20 🔍
      🟩 11                 Pass: 100%/30  | Total: 10h 21m | Avg: 20m 43s | Max: 29m 18s
      🟩 14                 Pass: 100%/34  | Total: 15h 27m | Avg: 27m 17s | Max: 57m 18s | Hits:  67%/5812  
      🟩 17                 Pass: 100%/33  | Total: 14h 55m | Avg: 27m 07s | Max: 49m 38s | Hits:  71%/4359  
      🔍 20                 Pass:  95%/21  | Total:  8h 40m | Avg: 24m 47s | Max: 54m 28s | Hits:  78%/2906  
    🟨 gpu
      🟨 v100               Pass:  99%/118 | Total:  2d 01h | Avg: 25m 07s | Max: 57m 18s | Hits:  71%/13077 
    
  • 🟩 pycuda: Pass: 100%/1 | Total: 10m 59s | Avg: 10m 59s | Max: 10m 59s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 10m 59s | Avg: 10m 59s | Max: 10m 59s
    🟩 ctk
      🟩 12.5               Pass: 100%/1   | Total: 10m 59s | Avg: 10m 59s | Max: 10m 59s
    🟩 cudacxx
      🟩 nvcc12.5           Pass: 100%/1   | Total: 10m 59s | Avg: 10m 59s | Max: 10m 59s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 10m 59s | Avg: 10m 59s | Max: 10m 59s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 10m 59s | Avg: 10m 59s | Max: 10m 59s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 10m 59s | Avg: 10m 59s | Max: 10m 59s
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total: 10m 59s | Avg: 10m 59s | Max: 10m 59s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 10m 59s | Avg: 10m 59s | Max: 10m 59s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
pycuda

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
+/- pycuda

🏃‍ Runner counts (total jobs: 250)

# Runner
178 linux-amd64-cpu16
41 linux-amd64-gpu-v100-latest-1
16 linux-arm64-cpu16
15 windows-amd64-cpu16


🟨 CI finished in 7h 42m: Pass: 99%/251 | Total: 5d 23h | Avg: 34m 12s | Max: 1h 05m | Hits: 64%/17373
  • 🟨 cub: Pass: 99%/132 | Total: 3d 20h | Avg: 41m 59s | Max: 1h 05m | Hits: 42%/4296

    🔍 cpu: amd64 🔍
      🔍 amd64              Pass:  99%/124 | Total:  3d 12h | Avg: 40m 59s | Max:  1h 04m | Hits:  42%/4296  
      🟩 arm64              Pass: 100%/8   | Total:  7h 39m | Avg: 57m 28s | Max:  1h 05m
    🔍 ctk: 12.5 🔍
      🟩 11.1               Pass: 100%/15  | Total: 11h 12m | Avg: 44m 48s | Max: 54m 14s | Hits:  42%/716   
      🟩 11.8               Pass: 100%/3   | Total:  3h 12m | Avg:  1h 04m | Max:  1h 04m
      🔍 12.5               Pass:  99%/114 | Total:  3d 05h | Avg: 41m 02s | Max:  1h 05m | Hits:  42%/3580  
    🔍 cudacxx: nvcc12.5 🔍
      🟩 ClangCUDA17        Pass: 100%/2   | Total: 47m 53s | Avg: 23m 56s | Max: 25m 02s
      🟩 nvcc11.1           Pass: 100%/15  | Total: 11h 12m | Avg: 44m 48s | Max: 54m 14s | Hits:  42%/716   
      🟩 nvcc11.8           Pass: 100%/3   | Total:  3h 12m | Avg:  1h 04m | Max:  1h 04m
      🔍 nvcc12.5           Pass:  99%/112 | Total:  3d 05h | Avg: 41m 20s | Max:  1h 05m | Hits:  42%/3580  
    🔍 cudacxx_family: nvcc 🔍
      🟩 ClangCUDA          Pass: 100%/2   | Total: 47m 53s | Avg: 23m 56s | Max: 25m 02s
      🔍 nvcc               Pass:  99%/130 | Total:  3d 19h | Avg: 42m 16s | Max:  1h 05m | Hits:  42%/4296  
    🔍 cxx: Clang17 🔍
      🟩 Clang9             Pass: 100%/6   | Total:  4h 49m | Avg: 48m 15s | Max: 54m 39s
      🟩 Clang10            Pass: 100%/3   | Total:  2h 40m | Avg: 53m 39s | Max:  1h 00m
      🟩 Clang11            Pass: 100%/4   | Total:  3h 26m | Avg: 51m 30s | Max: 53m 44s
      🟩 Clang12            Pass: 100%/4   | Total:  3h 20m | Avg: 50m 14s | Max: 51m 04s
      🟩 Clang13            Pass: 100%/4   | Total:  3h 16m | Avg: 49m 04s | Max: 50m 23s
      🟩 Clang14            Pass: 100%/4   | Total:  3h 22m | Avg: 50m 40s | Max: 52m 19s
      🟩 Clang15            Pass: 100%/4   | Total:  3h 29m | Avg: 52m 15s | Max: 54m 03s
      🟩 Clang16            Pass: 100%/4   | Total:  3h 19m | Avg: 49m 50s | Max: 54m 02s
      🔍 Clang17            Pass:  96%/26  | Total: 12h 55m | Avg: 29m 48s | Max:  1h 03m
      🟩 GCC6               Pass: 100%/2   | Total:  1h 32m | Avg: 46m 08s | Max: 48m 19s
      🟩 GCC7               Pass: 100%/6   | Total:  4h 36m | Avg: 46m 00s | Max: 50m 38s
      🟩 GCC8               Pass: 100%/6   | Total:  4h 35m | Avg: 45m 52s | Max: 48m 40s
      🟩 GCC9               Pass: 100%/6   | Total:  4h 55m | Avg: 49m 17s | Max: 56m 14s
      🟩 GCC10              Pass: 100%/4   | Total:  3h 26m | Avg: 51m 41s | Max: 54m 50s
      🟩 GCC11              Pass: 100%/7   | Total:  6h 27m | Avg: 55m 23s | Max:  1h 04m
      🟩 GCC12              Pass: 100%/4   | Total:  3h 20m | Avg: 50m 11s | Max: 54m 01s
      🟩 GCC13              Pass: 100%/29  | Total: 14h 00m | Avg: 28m 59s | Max:  1h 05m
      🟩 Intel2023.2.0      Pass: 100%/3   | Total:  2h 43m | Avg: 54m 25s | Max: 56m 59s
      🟩 MSVC14.16          Pass: 100%/1   | Total: 54m 14s | Avg: 54m 14s | Max: 54m 14s | Hits:  42%/716   
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 01m | Avg:  1h 00m | Max:  1h 02m | Hits:  42%/1432  
      🟩 MSVC14.39          Pass: 100%/3   | Total:  3h 09m | Avg:  1h 03m | Max:  1h 03m | Hits:  42%/2148  
    🔍 cxx_family: Clang 🔍
      🔍 Clang              Pass:  98%/59  | Total:  1d 16h | Avg: 41m 21s | Max:  1h 03m
      🟩 GCC                Pass: 100%/64  | Total:  1d 18h | Avg: 40m 14s | Max:  1h 05m
      🟩 Intel              Pass: 100%/3   | Total:  2h 43m | Avg: 54m 25s | Max: 56m 59s
      🟩 MSVC               Pass: 100%/6   | Total:  6h 04m | Avg:  1h 00m | Max:  1h 03m | Hits:  42%/4296  
    🔍 jobs: GraphCapture 🔍
      🟩 Build              Pass: 100%/99  | Total:  3d 10h | Avg: 49m 45s | Max:  1h 05m | Hits:  42%/4296  
      🟩 DeviceLaunch       Pass: 100%/8   | Total:  2h 22m | Avg: 17m 48s | Max: 19m 52s
      🔍 GraphCapture       Pass:  87%/8   | Total:  1h 50m | Avg: 13m 49s | Max: 16m 48s
      🟩 HostLaunch         Pass: 100%/8   | Total:  2h 17m | Avg: 17m 08s | Max: 19m 02s
      🟩 SmallGMem          Pass: 100%/1   | Total: 31m 16s | Avg: 31m 16s | Max: 31m 16s
      🟩 TestGPU            Pass: 100%/8   | Total:  3h 15m | Avg: 24m 25s | Max: 30m 55s
    🔍 std: 14 🔍
      🟩 11                 Pass: 100%/34  | Total: 23h 28m | Avg: 41m 25s | Max:  1h 02m
      🔍 14                 Pass:  97%/37  | Total:  1d 02h | Avg: 42m 57s | Max:  1h 04m | Hits:  42%/2148  
      🟩 17                 Pass: 100%/37  | Total:  1d 02h | Avg: 43m 11s | Max:  1h 04m | Hits:  42%/1432  
      🟩 20                 Pass: 100%/24  | Total: 15h 46m | Avg: 39m 26s | Max:  1h 05m | Hits:  42%/716   
    🟨 gpu
      🟨 v100               Pass:  99%/132 | Total:  3d 20h | Avg: 41m 59s | Max:  1h 05m | Hits:  42%/4296  
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total:  3h 12m | Avg:  1h 04m | Max:  1h 04m
      🟩 90a                Pass: 100%/4   | Total:  1h 21m | Avg: 20m 26s | Max: 21m 19s
    
  • 🟨 thrust: Pass: 99%/118 | Total: 2d 02h | Avg: 25m 40s | Max: 56m 46s | Hits: 71%/13077

    🔍 cpu: amd64 🔍
      🔍 amd64              Pass:  99%/110 | Total:  1d 23h | Avg: 25m 41s | Max: 56m 46s | Hits:  71%/13077 
      🟩 arm64              Pass: 100%/8   | Total:  3h 24m | Avg: 25m 31s | Max: 28m 40s
    🔍 ctk: 12.5 🔍
      🟩 11.1               Pass: 100%/15  | Total:  6h 34m | Avg: 26m 16s | Max: 56m 19s | Hits:  57%/1453  
      🟩 11.8               Pass: 100%/3   | Total:  1h 44m | Avg: 34m 40s | Max: 39m 16s
      🔍 12.5               Pass:  99%/100 | Total:  1d 18h | Avg: 25m 19s | Max: 56m 46s | Hits:  73%/11624 
    🔍 cudacxx: nvcc12.5 🔍
      🟩 ClangCUDA17        Pass: 100%/2   | Total: 52m 45s | Avg: 26m 22s | Max: 27m 20s
      🟩 nvcc11.1           Pass: 100%/15  | Total:  6h 34m | Avg: 26m 16s | Max: 56m 19s | Hits:  57%/1453  
      🟩 nvcc11.8           Pass: 100%/3   | Total:  1h 44m | Avg: 34m 40s | Max: 39m 16s
      🔍 nvcc12.5           Pass:  98%/98  | Total:  1d 17h | Avg: 25m 17s | Max: 56m 46s | Hits:  73%/11624 
    🔍 cudacxx_family: nvcc 🔍
      🟩 ClangCUDA          Pass: 100%/2   | Total: 52m 45s | Avg: 26m 22s | Max: 27m 20s
      🔍 nvcc               Pass:  99%/116 | Total:  2d 01h | Avg: 25m 39s | Max: 56m 46s | Hits:  71%/13077 
    🔍 cxx: GCC13 🔍
      🟩 Clang9             Pass: 100%/6   | Total:  2h 29m | Avg: 24m 54s | Max: 28m 52s
      🟩 Clang10            Pass: 100%/3   | Total:  1h 26m | Avg: 28m 55s | Max: 32m 23s
      🟩 Clang11            Pass: 100%/4   | Total:  1h 44m | Avg: 26m 05s | Max: 29m 15s
      🟩 Clang12            Pass: 100%/4   | Total:  1h 47m | Avg: 26m 47s | Max: 30m 23s
      🟩 Clang13            Pass: 100%/4   | Total:  1h 45m | Avg: 26m 22s | Max: 28m 20s
      🟩 Clang14            Pass: 100%/4   | Total:  1h 48m | Avg: 27m 04s | Max: 30m 22s
      🟩 Clang15            Pass: 100%/4   | Total:  1h 46m | Avg: 26m 34s | Max: 29m 27s
      🟩 Clang16            Pass: 100%/4   | Total:  1h 43m | Avg: 25m 54s | Max: 28m 48s
      🟩 Clang17            Pass: 100%/18  | Total:  5h 57m | Avg: 19m 50s | Max: 29m 44s
      🟩 GCC6               Pass: 100%/2   | Total: 46m 38s | Avg: 23m 19s | Max: 24m 46s
      🟩 GCC7               Pass: 100%/6   | Total:  2h 32m | Avg: 25m 23s | Max: 30m 26s
      🟩 GCC8               Pass: 100%/6   | Total:  2h 31m | Avg: 25m 15s | Max: 28m 48s
      🟩 GCC9               Pass: 100%/6   | Total:  2h 38m | Avg: 26m 20s | Max: 30m 02s
      🟩 GCC10              Pass: 100%/4   | Total:  1h 54m | Avg: 28m 38s | Max: 32m 18s
      🟩 GCC11              Pass: 100%/7   | Total:  3h 40m | Avg: 31m 28s | Max: 39m 16s
      🟩 GCC12              Pass: 100%/4   | Total:  1h 54m | Avg: 28m 33s | Max: 33m 08s
      🔍 GCC13              Pass:  95%/20  | Total:  6h 05m | Avg: 18m 15s | Max: 31m 04s
      🟩 Intel2023.2.0      Pass: 100%/3   | Total:  1h 37m | Avg: 32m 39s | Max: 38m 10s
      🟩 MSVC14.16          Pass: 100%/1   | Total: 56m 19s | Avg: 56m 19s | Max: 56m 19s | Hits:  57%/1453  
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 44m | Avg: 52m 16s | Max: 54m 21s | Hits:  57%/2906  
      🟩 MSVC14.39          Pass: 100%/6   | Total:  3h 39m | Avg: 36m 39s | Max: 56m 46s | Hits:  78%/8718  
    🔍 cxx_family: GCC 🔍
      🟩 Clang              Pass: 100%/51  | Total: 20h 28m | Avg: 24m 05s | Max: 32m 23s
      🔍 GCC                Pass:  98%/55  | Total: 22h 02m | Avg: 24m 03s | Max: 39m 16s
      🟩 Intel              Pass: 100%/3   | Total:  1h 37m | Avg: 32m 39s | Max: 38m 10s
      🟩 MSVC               Pass: 100%/9   | Total:  6h 20m | Avg: 42m 18s | Max: 56m 46s | Hits:  71%/13077 
    🔍 jobs: Build 🔍
      🔍 Build              Pass:  98%/99  | Total:  1d 22h | Avg: 28m 01s | Max: 56m 46s | Hits:  57%/8718  
      🟩 TestCPU            Pass: 100%/11  | Total:  1h 55m | Avg: 10m 27s | Max: 21m 39s | Hits:  99%/4359  
      🟩 TestGPU            Pass: 100%/8   | Total:  2h 20m | Avg: 17m 34s | Max: 19m 42s
    🔍 sm: 90a 🔍
      🟩 60;70;80;90        Pass: 100%/3   | Total:  1h 44m | Avg: 34m 40s | Max: 39m 16s
      🔍 90a                Pass:  75%/4   | Total: 52m 52s | Avg: 13m 13s | Max: 15m 15s
    🔍 std: 20 🔍
      🟩 11                 Pass: 100%/30  | Total: 10h 35m | Avg: 21m 11s | Max: 29m 46s
      🟩 14                 Pass: 100%/34  | Total: 15h 25m | Avg: 27m 14s | Max: 56m 19s | Hits:  67%/5812  
      🟩 17                 Pass: 100%/33  | Total: 15h 17m | Avg: 27m 48s | Max: 54m 21s | Hits:  71%/4359  
      🔍 20                 Pass:  95%/21  | Total:  9h 10m | Avg: 26m 12s | Max: 56m 46s | Hits:  78%/2906  
    🟨 gpu
      🟨 v100               Pass:  99%/118 | Total:  2d 02h | Avg: 25m 40s | Max: 56m 46s | Hits:  71%/13077 
    
  • 🟩 pycuda: Pass: 100%/1 | Total: 11m 40s | Avg: 11m 40s | Max: 11m 40s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 11m 40s | Avg: 11m 40s | Max: 11m 40s
    🟩 ctk
      🟩 12.5               Pass: 100%/1   | Total: 11m 40s | Avg: 11m 40s | Max: 11m 40s
    🟩 cudacxx
      🟩 nvcc12.5           Pass: 100%/1   | Total: 11m 40s | Avg: 11m 40s | Max: 11m 40s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 11m 40s | Avg: 11m 40s | Max: 11m 40s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 11m 40s | Avg: 11m 40s | Max: 11m 40s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 11m 40s | Avg: 11m 40s | Max: 11m 40s
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total: 11m 40s | Avg: 11m 40s | Max: 11m 40s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 11m 40s | Avg: 11m 40s | Max: 11m 40s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
pycuda

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
+/- pycuda

🏃‍ Runner counts (total jobs: 251)

# Runner
178 linux-amd64-cpu16
42 linux-amd64-gpu-v100-latest-1
16 linux-arm64-cpu16
15 windows-amd64-cpu16

bernhardmgruber and others added 4 commits August 30, 2024 12:23
Co-authored-by: Elias Stehle <3958403+elstehle@users.noreply.github.com>
```
/home/coder/cccl/thrust/thrust/cmake/../../thrust/iterator/detail/transform_input_output_iterator.inl:68:9: error: writing 1 byte into a region of size 0 [-Werror=stringop-overflow=]
     68 |     *io = output_function(x);
        |     ~~~~^~~~~~~~~~~~~~~~~~~~~
```

🟨 CI finished in 4h 12m: Pass: 99%/251 | Total: 3d 15h | Avg: 21m 01s | Max: 1h 40m | Hits: 72%/17373
  • 🟨 cub: Pass: 98%/132 | Total: 2d 08h | Avg: 25m 35s | Max: 1h 40m | Hits: 59%/4296

    🔍 cpu: amd64 🔍
      🔍 amd64              Pass:  98%/124 | Total:  2d 04h | Avg: 25m 16s | Max:  1h 40m | Hits:  59%/4296  
      🟩 arm64              Pass: 100%/8   | Total:  4h 05m | Avg: 30m 37s | Max: 55m 28s
    🔍 ctk: 12.5 🔍
      🟩 11.1               Pass: 100%/15  | Total:  5h 03m | Avg: 20m 14s | Max: 52m 23s | Hits:  59%/716   
      🟩 11.8               Pass: 100%/3   | Total:  1h 29m | Avg: 29m 48s | Max:  1h 11m
      🔍 12.5               Pass:  98%/114 | Total:  2d 01h | Avg: 26m 11s | Max:  1h 40m | Hits:  59%/3580  
    🔍 cudacxx: nvcc12.5 🔍
      🟩 ClangCUDA17        Pass: 100%/2   | Total:  7m 07s | Avg:  3m 33s | Max:  3m 40s
      🟩 nvcc11.1           Pass: 100%/15  | Total:  5h 03m | Avg: 20m 14s | Max: 52m 23s | Hits:  59%/716   
      🟩 nvcc11.8           Pass: 100%/3   | Total:  1h 29m | Avg: 29m 48s | Max:  1h 11m
      🔍 nvcc12.5           Pass:  98%/112 | Total:  2d 01h | Avg: 26m 35s | Max:  1h 40m | Hits:  59%/3580  
    🔍 cudacxx_family: nvcc 🔍
      🟩 ClangCUDA          Pass: 100%/2   | Total:  7m 07s | Avg:  3m 33s | Max:  3m 40s
      🔍 nvcc               Pass:  98%/130 | Total:  2d 08h | Avg: 25m 55s | Max:  1h 40m | Hits:  59%/4296  
    🔍 cxx: Clang17 🔍
      🟩 Clang9             Pass: 100%/6   | Total:  2h 03m | Avg: 20m 39s | Max: 53m 29s
      🟩 Clang10            Pass: 100%/3   | Total:  1h 06m | Avg: 22m 06s | Max: 50m 46s
      🟩 Clang11            Pass: 100%/4   | Total:  1h 53m | Avg: 28m 18s | Max: 50m 10s
      🟩 Clang12            Pass: 100%/4   | Total:  1h 54m | Avg: 28m 39s | Max: 50m 12s
      🟩 Clang13            Pass: 100%/4   | Total:  2h 04m | Avg: 31m 04s | Max: 54m 35s
      🟩 Clang14            Pass: 100%/4   | Total:  2h 03m | Avg: 30m 54s | Max: 54m 39s
      🟩 Clang15            Pass: 100%/4   | Total:  1h 53m | Avg: 28m 26s | Max: 51m 24s
      🟩 Clang16            Pass: 100%/4   | Total:  1h 56m | Avg: 29m 03s | Max: 52m 03s
      🔍 Clang17            Pass:  92%/26  | Total:  8h 55m | Avg: 20m 35s | Max: 53m 48s
      🟩 GCC6               Pass: 100%/2   | Total:  6m 55s | Avg:  3m 27s | Max:  3m 31s
      🟩 GCC7               Pass: 100%/6   | Total:  1h 49m | Avg: 18m 18s | Max: 47m 17s
      🟩 GCC8               Pass: 100%/6   | Total:  2h 00m | Avg: 20m 02s | Max: 52m 47s
      🟩 GCC9               Pass: 100%/6   | Total:  2h 37m | Avg: 26m 18s | Max: 51m 09s
      🟩 GCC10              Pass: 100%/4   | Total:  1h 57m | Avg: 29m 17s | Max: 51m 53s
      🟩 GCC11              Pass: 100%/7   | Total:  3h 27m | Avg: 29m 37s | Max:  1h 11m
      🟩 GCC12              Pass: 100%/4   | Total:  1h 58m | Avg: 29m 33s | Max: 51m 53s
      🟩 GCC13              Pass: 100%/29  | Total: 11h 37m | Avg: 24m 03s | Max:  1h 40m
      🟩 Intel2023.2.0      Pass: 100%/3   | Total:  1h 02m | Avg: 20m 58s | Max: 52m 53s
      🟩 MSVC14.16          Pass: 100%/1   | Total: 52m 23s | Avg: 52m 23s | Max: 52m 23s | Hits:  59%/716   
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 00m | Avg:  1h 00m | Max:  1h 02m | Hits:  59%/1432  
      🟩 MSVC14.39          Pass: 100%/3   | Total:  2h 56m | Avg: 58m 45s | Max: 59m 26s | Hits:  59%/2148  
    🔍 cxx_family: Clang 🔍
      🔍 Clang              Pass:  96%/59  | Total: 23h 51m | Avg: 24m 15s | Max: 54m 39s
      🟩 GCC                Pass: 100%/64  | Total:  1d 01h | Avg: 23m 59s | Max:  1h 40m
      🟩 Intel              Pass: 100%/3   | Total:  1h 02m | Avg: 20m 58s | Max: 52m 53s
      🟩 MSVC               Pass: 100%/6   | Total:  5h 48m | Avg: 58m 08s | Max:  1h 02m | Hits:  59%/4296  
    🔍 std: 20 🔍
      🟩 11                 Pass: 100%/34  | Total:  6h 05m | Avg: 10m 44s | Max: 41m 23s
      🟩 14                 Pass: 100%/37  | Total:  8h 16m | Avg: 13m 24s | Max: 58m 20s | Hits:  59%/2148  
      🟩 17                 Pass: 100%/37  | Total:  1d 03h | Avg: 44m 07s | Max:  1h 40m | Hits:  59%/1432  
      🔍 20                 Pass:  91%/24  | Total: 14h 44m | Avg: 36m 51s | Max: 59m 26s | Hits:  59%/716   
    🟨 jobs
      🟩 Build              Pass: 100%/99  | Total:  1d 20h | Avg: 27m 09s | Max:  1h 11m | Hits:  59%/4296  
      🟨 DeviceLaunch       Pass:  87%/8   | Total:  2h 10m | Avg: 16m 21s | Max: 19m 55s
      🟩 GraphCapture       Pass: 100%/8   | Total:  2h 13m | Avg: 16m 42s | Max: 32m 53s
      🟨 HostLaunch         Pass:  87%/8   | Total:  2h 03m | Avg: 15m 28s | Max: 18m 33s
      🟩 SmallGMem          Pass: 100%/1   | Total: 31m 38s | Avg: 31m 38s | Max: 31m 38s
      🟩 TestGPU            Pass: 100%/8   | Total:  4h 29m | Avg: 33m 39s | Max:  1h 40m
    🟨 gpu
      🟨 v100               Pass:  98%/132 | Total:  2d 08h | Avg: 25m 35s | Max:  1h 40m | Hits:  59%/4296  
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total:  1h 29m | Avg: 29m 48s | Max:  1h 11m
      🟩 90a                Pass: 100%/4   | Total: 52m 24s | Avg: 13m 06s | Max: 21m 43s
    
  • 🟩 thrust: Pass: 100%/118 | Total: 1d 07h | Avg: 16m 00s | Max: 52m 53s | Hits: 76%/13077

    🟩 cpu
      🟩 amd64              Pass: 100%/110 | Total:  1d 05h | Avg: 16m 04s | Max: 52m 53s | Hits:  76%/13077 
      🟩 arm64              Pass: 100%/8   | Total:  2h 00m | Avg: 15m 05s | Max: 27m 02s
    🟩 ctk
      🟩 11.1               Pass: 100%/15  | Total:  3h 12m | Avg: 12m 50s | Max: 48m 28s | Hits:  65%/1453  
      🟩 11.8               Pass: 100%/3   | Total: 43m 46s | Avg: 14m 35s | Max: 35m 56s
      🟩 12.5               Pass: 100%/100 | Total:  1d 03h | Avg: 16m 31s | Max: 52m 53s | Hits:  78%/11624 
    🟩 cudacxx
      🟩 ClangCUDA17        Pass: 100%/2   | Total:  8m 41s | Avg:  4m 20s | Max:  4m 34s
      🟩 nvcc11.1           Pass: 100%/15  | Total:  3h 12m | Avg: 12m 50s | Max: 48m 28s | Hits:  65%/1453  
      🟩 nvcc11.8           Pass: 100%/3   | Total: 43m 46s | Avg: 14m 35s | Max: 35m 56s
      🟩 nvcc12.5           Pass: 100%/98  | Total:  1d 03h | Avg: 16m 46s | Max: 52m 53s | Hits:  78%/11624 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  8m 41s | Avg:  4m 20s | Max:  4m 34s
      🟩 nvcc               Pass: 100%/116 | Total:  1d 07h | Avg: 16m 12s | Max: 52m 53s | Hits:  76%/13077 
    🟩 cxx
      🟩 Clang9             Pass: 100%/6   | Total:  1h 12m | Avg: 12m 09s | Max: 28m 39s
      🟩 Clang10            Pass: 100%/3   | Total: 41m 59s | Avg: 13m 59s | Max: 31m 52s
      🟩 Clang11            Pass: 100%/4   | Total:  1h 03m | Avg: 15m 45s | Max: 29m 03s
      🟩 Clang12            Pass: 100%/4   | Total:  1h 04m | Avg: 16m 13s | Max: 28m 53s
      🟩 Clang13            Pass: 100%/4   | Total:  1h 02m | Avg: 15m 37s | Max: 27m 09s
      🟩 Clang14            Pass: 100%/4   | Total:  1h 02m | Avg: 15m 34s | Max: 28m 02s
      🟩 Clang15            Pass: 100%/4   | Total:  1h 05m | Avg: 16m 28s | Max: 28m 48s
      🟩 Clang16            Pass: 100%/4   | Total:  1h 06m | Avg: 16m 35s | Max: 29m 46s
      🟩 Clang17            Pass: 100%/18  | Total:  3h 30m | Avg: 11m 41s | Max: 28m 29s
      🟩 GCC6               Pass: 100%/2   | Total: 11m 34s | Avg:  5m 47s | Max:  8m 21s
      🟩 GCC7               Pass: 100%/6   | Total:  1h 10m | Avg: 11m 45s | Max: 29m 45s
      🟩 GCC8               Pass: 100%/6   | Total:  1h 13m | Avg: 12m 12s | Max: 31m 52s
      🟩 GCC9               Pass: 100%/6   | Total:  1h 10m | Avg: 11m 46s | Max: 30m 33s
      🟩 GCC10              Pass: 100%/4   | Total:  1h 10m | Avg: 17m 34s | Max: 31m 09s
      🟩 GCC11              Pass: 100%/7   | Total:  1h 51m | Avg: 15m 52s | Max: 35m 56s
      🟩 GCC12              Pass: 100%/4   | Total:  1h 08m | Avg: 17m 04s | Max: 30m 58s
      🟩 GCC13              Pass: 100%/20  | Total:  4h 53m | Avg: 14m 41s | Max: 52m 53s
      🟩 Intel2023.2.0      Pass: 100%/3   | Total: 47m 03s | Avg: 15m 41s | Max: 36m 38s
      🟩 MSVC14.16          Pass: 100%/1   | Total: 48m 28s | Avg: 48m 28s | Max: 48m 28s | Hits:  65%/1453  
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 43m | Avg: 51m 36s | Max: 51m 56s | Hits:  65%/2906  
      🟩 MSVC14.39          Pass: 100%/6   | Total:  3h 29m | Avg: 34m 55s | Max: 51m 37s | Hits:  82%/8718  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/51  | Total: 11h 50m | Avg: 13m 55s | Max: 31m 52s
      🟩 GCC                Pass: 100%/55  | Total: 12h 49m | Avg: 13m 59s | Max: 52m 53s
      🟩 Intel              Pass: 100%/3   | Total: 47m 03s | Avg: 15m 41s | Max: 36m 38s
      🟩 MSVC               Pass: 100%/9   | Total:  6h 01m | Avg: 40m 07s | Max: 51m 56s | Hits:  76%/13077 
    🟩 gpu
      🟩 v100               Pass: 100%/118 | Total:  1d 07h | Avg: 16m 00s | Max: 52m 53s | Hits:  76%/13077 
    🟩 jobs
      🟩 Build              Pass: 100%/99  | Total:  1d 03h | Avg: 16m 27s | Max: 51m 56s | Hits:  65%/8718  
      🟩 TestCPU            Pass: 100%/11  | Total:  1h 53m | Avg: 10m 21s | Max: 20m 22s | Hits:  99%/4359  
      🟩 TestGPU            Pass: 100%/8   | Total:  2h 25m | Avg: 18m 12s | Max: 52m 53s
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total: 43m 46s | Avg: 14m 35s | Max: 35m 56s
      🟩 90a                Pass: 100%/4   | Total: 38m 12s | Avg:  9m 33s | Max: 16m 00s
    🟩 std
      🟩 11                 Pass: 100%/30  | Total:  2h 24m | Avg:  4m 48s | Max: 14m 42s
      🟩 14                 Pass: 100%/34  | Total:  5h 25m | Avg:  9m 35s | Max: 51m 16s | Hits:  74%/5812  
      🟩 17                 Pass: 100%/33  | Total: 14h 41m | Avg: 26m 43s | Max: 51m 56s | Hits:  76%/4359  
      🟩 20                 Pass: 100%/21  | Total:  8h 56m | Avg: 25m 32s | Max: 52m 53s | Hits:  82%/2906  
    
  • 🟩 pycuda: Pass: 100%/1 | Total: 11m 36s | Avg: 11m 36s | Max: 11m 36s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 11m 36s | Avg: 11m 36s | Max: 11m 36s
    🟩 ctk
      🟩 12.5               Pass: 100%/1   | Total: 11m 36s | Avg: 11m 36s | Max: 11m 36s
    🟩 cudacxx
      🟩 nvcc12.5           Pass: 100%/1   | Total: 11m 36s | Avg: 11m 36s | Max: 11m 36s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 11m 36s | Avg: 11m 36s | Max: 11m 36s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 11m 36s | Avg: 11m 36s | Max: 11m 36s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 11m 36s | Avg: 11m 36s | Max: 11m 36s
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total: 11m 36s | Avg: 11m 36s | Max: 11m 36s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 11m 36s | Avg: 11m 36s | Max: 11m 36s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
pycuda

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
+/- pycuda

🏃‍ Runner counts (total jobs: 251)

# Runner
178 linux-amd64-cpu16
42 linux-amd64-gpu-v100-latest-1
16 linux-arm64-cpu16
15 windows-amd64-cpu16


🟨 CI finished in 4h 49m: Pass: 99%/251 | Total: 3d 16h | Avg: 21m 03s | Max: 1h 40m | Hits: 72%/17373
  • 🟨 cub: Pass: 99%/132 | Total: 2d 08h | Avg: 25m 39s | Max: 1h 40m | Hits: 59%/4296

    🔍 cpu: amd64 🔍
      🔍 amd64              Pass:  99%/124 | Total:  2d 04h | Avg: 25m 20s | Max:  1h 40m | Hits:  59%/4296  
      🟩 arm64              Pass: 100%/8   | Total:  4h 05m | Avg: 30m 37s | Max: 55m 28s
    🔍 ctk: 12.5 🔍
      🟩 11.1               Pass: 100%/15  | Total:  5h 03m | Avg: 20m 14s | Max: 52m 23s | Hits:  59%/716   
      🟩 11.8               Pass: 100%/3   | Total:  1h 29m | Avg: 29m 48s | Max:  1h 11m
      🔍 12.5               Pass:  99%/114 | Total:  2d 01h | Avg: 26m 15s | Max:  1h 40m | Hits:  59%/3580  
    🔍 cudacxx: nvcc12.5 🔍
      🟩 ClangCUDA17        Pass: 100%/2   | Total:  7m 07s | Avg:  3m 33s | Max:  3m 40s
      🟩 nvcc11.1           Pass: 100%/15  | Total:  5h 03m | Avg: 20m 14s | Max: 52m 23s | Hits:  59%/716   
      🟩 nvcc11.8           Pass: 100%/3   | Total:  1h 29m | Avg: 29m 48s | Max:  1h 11m
      🔍 nvcc12.5           Pass:  99%/112 | Total:  2d 01h | Avg: 26m 40s | Max:  1h 40m | Hits:  59%/3580  
    🔍 cudacxx_family: nvcc 🔍
      🟩 ClangCUDA          Pass: 100%/2   | Total:  7m 07s | Avg:  3m 33s | Max:  3m 40s
      🔍 nvcc               Pass:  99%/130 | Total:  2d 08h | Avg: 25m 59s | Max:  1h 40m | Hits:  59%/4296  
    🔍 cxx: Clang17 🔍
      🟩 Clang9             Pass: 100%/6   | Total:  2h 03m | Avg: 20m 39s | Max: 53m 29s
      🟩 Clang10            Pass: 100%/3   | Total:  1h 06m | Avg: 22m 06s | Max: 50m 46s
      🟩 Clang11            Pass: 100%/4   | Total:  1h 53m | Avg: 28m 18s | Max: 50m 10s
      🟩 Clang12            Pass: 100%/4   | Total:  1h 54m | Avg: 28m 39s | Max: 50m 12s
      🟩 Clang13            Pass: 100%/4   | Total:  2h 04m | Avg: 31m 04s | Max: 54m 35s
      🟩 Clang14            Pass: 100%/4   | Total:  2h 03m | Avg: 30m 54s | Max: 54m 39s
      🟩 Clang15            Pass: 100%/4   | Total:  1h 53m | Avg: 28m 26s | Max: 51m 24s
      🟩 Clang16            Pass: 100%/4   | Total:  1h 56m | Avg: 29m 03s | Max: 52m 03s
      🔍 Clang17            Pass:  96%/26  | Total:  9h 04m | Avg: 20m 55s | Max: 53m 48s
      🟩 GCC6               Pass: 100%/2   | Total:  6m 55s | Avg:  3m 27s | Max:  3m 31s
      🟩 GCC7               Pass: 100%/6   | Total:  1h 49m | Avg: 18m 18s | Max: 47m 17s
      🟩 GCC8               Pass: 100%/6   | Total:  2h 00m | Avg: 20m 02s | Max: 52m 47s
      🟩 GCC9               Pass: 100%/6   | Total:  2h 37m | Avg: 26m 18s | Max: 51m 09s
      🟩 GCC10              Pass: 100%/4   | Total:  1h 57m | Avg: 29m 17s | Max: 51m 53s
      🟩 GCC11              Pass: 100%/7   | Total:  3h 27m | Avg: 29m 37s | Max:  1h 11m
      🟩 GCC12              Pass: 100%/4   | Total:  1h 58m | Avg: 29m 33s | Max: 51m 53s
      🟩 GCC13              Pass: 100%/29  | Total: 11h 37m | Avg: 24m 03s | Max:  1h 40m
      🟩 Intel2023.2.0      Pass: 100%/3   | Total:  1h 02m | Avg: 20m 58s | Max: 52m 53s
      🟩 MSVC14.16          Pass: 100%/1   | Total: 52m 23s | Avg: 52m 23s | Max: 52m 23s | Hits:  59%/716   
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 00m | Avg:  1h 00m | Max:  1h 02m | Hits:  59%/1432  
      🟩 MSVC14.39          Pass: 100%/3   | Total:  2h 56m | Avg: 58m 45s | Max: 59m 26s | Hits:  59%/2148  
    🔍 cxx_family: Clang 🔍
      🔍 Clang              Pass:  98%/59  | Total:  1d 00h | Avg: 24m 24s | Max: 54m 39s
      🟩 GCC                Pass: 100%/64  | Total:  1d 01h | Avg: 23m 59s | Max:  1h 40m
      🟩 Intel              Pass: 100%/3   | Total:  1h 02m | Avg: 20m 58s | Max: 52m 53s
      🟩 MSVC               Pass: 100%/6   | Total:  5h 48m | Avg: 58m 08s | Max:  1h 02m | Hits:  59%/4296  
    🔍 jobs: DeviceLaunch 🔍
      🟩 Build              Pass: 100%/99  | Total:  1d 20h | Avg: 27m 09s | Max:  1h 11m | Hits:  59%/4296  
      🔍 DeviceLaunch       Pass:  87%/8   | Total:  2h 09m | Avg: 16m 14s | Max: 19m 55s
      🟩 GraphCapture       Pass: 100%/8   | Total:  2h 13m | Avg: 16m 42s | Max: 32m 53s
      🟩 HostLaunch         Pass: 100%/8   | Total:  2h 13m | Avg: 16m 41s | Max: 18m 33s
      🟩 SmallGMem          Pass: 100%/1   | Total: 31m 38s | Avg: 31m 38s | Max: 31m 38s
      🟩 TestGPU            Pass: 100%/8   | Total:  4h 29m | Avg: 33m 39s | Max:  1h 40m
    🔍 std: 20 🔍
      🟩 11                 Pass: 100%/34  | Total:  6h 05m | Avg: 10m 44s | Max: 41m 23s
      🟩 14                 Pass: 100%/37  | Total:  8h 16m | Avg: 13m 24s | Max: 58m 20s | Hits:  59%/2148  
      🟩 17                 Pass: 100%/37  | Total:  1d 03h | Avg: 44m 07s | Max:  1h 40m | Hits:  59%/1432  
      🔍 20                 Pass:  95%/24  | Total: 14h 53m | Avg: 37m 13s | Max: 59m 26s | Hits:  59%/716   
    🟨 gpu
      🟨 v100               Pass:  99%/132 | Total:  2d 08h | Avg: 25m 39s | Max:  1h 40m | Hits:  59%/4296  
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total:  1h 29m | Avg: 29m 48s | Max:  1h 11m
      🟩 90a                Pass: 100%/4   | Total: 52m 24s | Avg: 13m 06s | Max: 21m 43s
    
  • 🟩 thrust: Pass: 100%/118 | Total: 1d 07h | Avg: 16m 00s | Max: 52m 53s | Hits: 76%/13077

    🟩 cpu
      🟩 amd64              Pass: 100%/110 | Total:  1d 05h | Avg: 16m 04s | Max: 52m 53s | Hits:  76%/13077 
      🟩 arm64              Pass: 100%/8   | Total:  2h 00m | Avg: 15m 05s | Max: 27m 02s
    🟩 ctk
      🟩 11.1               Pass: 100%/15  | Total:  3h 12m | Avg: 12m 50s | Max: 48m 28s | Hits:  65%/1453  
      🟩 11.8               Pass: 100%/3   | Total: 43m 46s | Avg: 14m 35s | Max: 35m 56s
      🟩 12.5               Pass: 100%/100 | Total:  1d 03h | Avg: 16m 31s | Max: 52m 53s | Hits:  78%/11624 
    🟩 cudacxx
      🟩 ClangCUDA17        Pass: 100%/2   | Total:  8m 41s | Avg:  4m 20s | Max:  4m 34s
      🟩 nvcc11.1           Pass: 100%/15  | Total:  3h 12m | Avg: 12m 50s | Max: 48m 28s | Hits:  65%/1453  
      🟩 nvcc11.8           Pass: 100%/3   | Total: 43m 46s | Avg: 14m 35s | Max: 35m 56s
      🟩 nvcc12.5           Pass: 100%/98  | Total:  1d 03h | Avg: 16m 46s | Max: 52m 53s | Hits:  78%/11624 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  8m 41s | Avg:  4m 20s | Max:  4m 34s
      🟩 nvcc               Pass: 100%/116 | Total:  1d 07h | Avg: 16m 12s | Max: 52m 53s | Hits:  76%/13077 
    🟩 cxx
      🟩 Clang9             Pass: 100%/6   | Total:  1h 12m | Avg: 12m 09s | Max: 28m 39s
      🟩 Clang10            Pass: 100%/3   | Total: 41m 59s | Avg: 13m 59s | Max: 31m 52s
      🟩 Clang11            Pass: 100%/4   | Total:  1h 03m | Avg: 15m 45s | Max: 29m 03s
      🟩 Clang12            Pass: 100%/4   | Total:  1h 04m | Avg: 16m 13s | Max: 28m 53s
      🟩 Clang13            Pass: 100%/4   | Total:  1h 02m | Avg: 15m 37s | Max: 27m 09s
      🟩 Clang14            Pass: 100%/4   | Total:  1h 02m | Avg: 15m 34s | Max: 28m 02s
      🟩 Clang15            Pass: 100%/4   | Total:  1h 05m | Avg: 16m 28s | Max: 28m 48s
      🟩 Clang16            Pass: 100%/4   | Total:  1h 06m | Avg: 16m 35s | Max: 29m 46s
      🟩 Clang17            Pass: 100%/18  | Total:  3h 30m | Avg: 11m 41s | Max: 28m 29s
      🟩 GCC6               Pass: 100%/2   | Total: 11m 34s | Avg:  5m 47s | Max:  8m 21s
      🟩 GCC7               Pass: 100%/6   | Total:  1h 10m | Avg: 11m 45s | Max: 29m 45s
      🟩 GCC8               Pass: 100%/6   | Total:  1h 13m | Avg: 12m 12s | Max: 31m 52s
      🟩 GCC9               Pass: 100%/6   | Total:  1h 10m | Avg: 11m 46s | Max: 30m 33s
      🟩 GCC10              Pass: 100%/4   | Total:  1h 10m | Avg: 17m 34s | Max: 31m 09s
      🟩 GCC11              Pass: 100%/7   | Total:  1h 51m | Avg: 15m 52s | Max: 35m 56s
      🟩 GCC12              Pass: 100%/4   | Total:  1h 08m | Avg: 17m 04s | Max: 30m 58s
      🟩 GCC13              Pass: 100%/20  | Total:  4h 53m | Avg: 14m 41s | Max: 52m 53s
      🟩 Intel2023.2.0      Pass: 100%/3   | Total: 47m 03s | Avg: 15m 41s | Max: 36m 38s
      🟩 MSVC14.16          Pass: 100%/1   | Total: 48m 28s | Avg: 48m 28s | Max: 48m 28s | Hits:  65%/1453  
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 43m | Avg: 51m 36s | Max: 51m 56s | Hits:  65%/2906  
      🟩 MSVC14.39          Pass: 100%/6   | Total:  3h 29m | Avg: 34m 55s | Max: 51m 37s | Hits:  82%/8718  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/51  | Total: 11h 50m | Avg: 13m 55s | Max: 31m 52s
      🟩 GCC                Pass: 100%/55  | Total: 12h 49m | Avg: 13m 59s | Max: 52m 53s
      🟩 Intel              Pass: 100%/3   | Total: 47m 03s | Avg: 15m 41s | Max: 36m 38s
      🟩 MSVC               Pass: 100%/9   | Total:  6h 01m | Avg: 40m 07s | Max: 51m 56s | Hits:  76%/13077 
    🟩 gpu
      🟩 v100               Pass: 100%/118 | Total:  1d 07h | Avg: 16m 00s | Max: 52m 53s | Hits:  76%/13077 
    🟩 jobs
      🟩 Build              Pass: 100%/99  | Total:  1d 03h | Avg: 16m 27s | Max: 51m 56s | Hits:  65%/8718  
      🟩 TestCPU            Pass: 100%/11  | Total:  1h 53m | Avg: 10m 21s | Max: 20m 22s | Hits:  99%/4359  
      🟩 TestGPU            Pass: 100%/8   | Total:  2h 25m | Avg: 18m 12s | Max: 52m 53s
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total: 43m 46s | Avg: 14m 35s | Max: 35m 56s
      🟩 90a                Pass: 100%/4   | Total: 38m 12s | Avg:  9m 33s | Max: 16m 00s
    🟩 std
      🟩 11                 Pass: 100%/30  | Total:  2h 24m | Avg:  4m 48s | Max: 14m 42s
      🟩 14                 Pass: 100%/34  | Total:  5h 25m | Avg:  9m 35s | Max: 51m 16s | Hits:  74%/5812  
      🟩 17                 Pass: 100%/33  | Total: 14h 41m | Avg: 26m 43s | Max: 51m 56s | Hits:  76%/4359  
      🟩 20                 Pass: 100%/21  | Total:  8h 56m | Avg: 25m 32s | Max: 52m 53s | Hits:  82%/2906  
    
  • 🟩 pycuda: Pass: 100%/1 | Total: 11m 36s | Avg: 11m 36s | Max: 11m 36s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 11m 36s | Avg: 11m 36s | Max: 11m 36s
    🟩 ctk
      🟩 12.5               Pass: 100%/1   | Total: 11m 36s | Avg: 11m 36s | Max: 11m 36s
    🟩 cudacxx
      🟩 nvcc12.5           Pass: 100%/1   | Total: 11m 36s | Avg: 11m 36s | Max: 11m 36s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 11m 36s | Avg: 11m 36s | Max: 11m 36s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 11m 36s | Avg: 11m 36s | Max: 11m 36s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 11m 36s | Avg: 11m 36s | Max: 11m 36s
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total: 11m 36s | Avg: 11m 36s | Max: 11m 36s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 11m 36s | Avg: 11m 36s | Max: 11m 36s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
pycuda

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
+/- pycuda

🏃‍ Runner counts (total jobs: 251)

# Runner
178 linux-amd64-cpu16
42 linux-amd64-gpu-v100-latest-1
16 linux-arm64-cpu16
15 windows-amd64-cpu16

Contributor

🟩 CI finished in 5h 24m: Pass: 100%/251 | Total: 3d 16h | Avg: 21m 08s | Max: 1h 40m | Hits: 72%/17373
  • 🟩 cub: Pass: 100%/132 | Total: 2d 08h | Avg: 25m 48s | Max: 1h 40m | Hits: 59%/4296

    🟩 cpu
      🟩 amd64              Pass: 100%/124 | Total:  2d 04h | Avg: 25m 29s | Max:  1h 40m | Hits:  59%/4296  
      🟩 arm64              Pass: 100%/8   | Total:  4h 05m | Avg: 30m 37s | Max: 55m 28s
    🟩 ctk
      🟩 11.1               Pass: 100%/15  | Total:  5h 03m | Avg: 20m 14s | Max: 52m 23s | Hits:  59%/716   
      🟩 11.8               Pass: 100%/3   | Total:  1h 29m | Avg: 29m 48s | Max:  1h 11m
      🟩 12.5               Pass: 100%/114 | Total:  2d 02h | Avg: 26m 26s | Max:  1h 40m | Hits:  59%/3580  
    🟩 cudacxx
      🟩 ClangCUDA17        Pass: 100%/2   | Total:  7m 07s | Avg:  3m 33s | Max:  3m 40s
      🟩 nvcc11.1           Pass: 100%/15  | Total:  5h 03m | Avg: 20m 14s | Max: 52m 23s | Hits:  59%/716   
      🟩 nvcc11.8           Pass: 100%/3   | Total:  1h 29m | Avg: 29m 48s | Max:  1h 11m
      🟩 nvcc12.5           Pass: 100%/112 | Total:  2d 02h | Avg: 26m 50s | Max:  1h 40m | Hits:  59%/3580  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  7m 07s | Avg:  3m 33s | Max:  3m 40s
      🟩 nvcc               Pass: 100%/130 | Total:  2d 08h | Avg: 26m 09s | Max:  1h 40m | Hits:  59%/4296  
    🟩 cxx
      🟩 Clang9             Pass: 100%/6   | Total:  2h 03m | Avg: 20m 39s | Max: 53m 29s
      🟩 Clang10            Pass: 100%/3   | Total:  1h 06m | Avg: 22m 06s | Max: 50m 46s
      🟩 Clang11            Pass: 100%/4   | Total:  1h 53m | Avg: 28m 18s | Max: 50m 10s
      🟩 Clang12            Pass: 100%/4   | Total:  1h 54m | Avg: 28m 39s | Max: 50m 12s
      🟩 Clang13            Pass: 100%/4   | Total:  2h 04m | Avg: 31m 04s | Max: 54m 35s
      🟩 Clang14            Pass: 100%/4   | Total:  2h 03m | Avg: 30m 54s | Max: 54m 39s
      🟩 Clang15            Pass: 100%/4   | Total:  1h 53m | Avg: 28m 26s | Max: 51m 24s
      🟩 Clang16            Pass: 100%/4   | Total:  1h 56m | Avg: 29m 03s | Max: 52m 03s
      🟩 Clang17            Pass: 100%/26  | Total:  9h 23m | Avg: 21m 41s | Max: 53m 48s
      🟩 GCC6               Pass: 100%/2   | Total:  6m 55s | Avg:  3m 27s | Max:  3m 31s
      🟩 GCC7               Pass: 100%/6   | Total:  1h 49m | Avg: 18m 18s | Max: 47m 17s
      🟩 GCC8               Pass: 100%/6   | Total:  2h 00m | Avg: 20m 02s | Max: 52m 47s
      🟩 GCC9               Pass: 100%/6   | Total:  2h 37m | Avg: 26m 18s | Max: 51m 09s
      🟩 GCC10              Pass: 100%/4   | Total:  1h 57m | Avg: 29m 17s | Max: 51m 53s
      🟩 GCC11              Pass: 100%/7   | Total:  3h 27m | Avg: 29m 37s | Max:  1h 11m
      🟩 GCC12              Pass: 100%/4   | Total:  1h 58m | Avg: 29m 33s | Max: 51m 53s
      🟩 GCC13              Pass: 100%/29  | Total: 11h 37m | Avg: 24m 03s | Max:  1h 40m
      🟩 Intel2023.2.0      Pass: 100%/3   | Total:  1h 02m | Avg: 20m 58s | Max: 52m 53s
      🟩 MSVC14.16          Pass: 100%/1   | Total: 52m 23s | Avg: 52m 23s | Max: 52m 23s | Hits:  59%/716   
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 00m | Avg:  1h 00m | Max:  1h 02m | Hits:  59%/1432  
      🟩 MSVC14.39          Pass: 100%/3   | Total:  2h 56m | Avg: 58m 45s | Max: 59m 26s | Hits:  59%/2148  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/59  | Total:  1d 00h | Avg: 24m 44s | Max: 54m 39s
      🟩 GCC                Pass: 100%/64  | Total:  1d 01h | Avg: 23m 59s | Max:  1h 40m
      🟩 Intel              Pass: 100%/3   | Total:  1h 02m | Avg: 20m 58s | Max: 52m 53s
      🟩 MSVC               Pass: 100%/6   | Total:  5h 48m | Avg: 58m 08s | Max:  1h 02m | Hits:  59%/4296  
    🟩 gpu
      🟩 v100               Pass: 100%/132 | Total:  2d 08h | Avg: 25m 48s | Max:  1h 40m | Hits:  59%/4296  
    🟩 jobs
      🟩 Build              Pass: 100%/99  | Total:  1d 20h | Avg: 27m 09s | Max:  1h 11m | Hits:  59%/4296  
      🟩 DeviceLaunch       Pass: 100%/8   | Total:  2h 29m | Avg: 18m 42s | Max: 23m 29s
      🟩 GraphCapture       Pass: 100%/8   | Total:  2h 13m | Avg: 16m 42s | Max: 32m 53s
      🟩 HostLaunch         Pass: 100%/8   | Total:  2h 13m | Avg: 16m 41s | Max: 18m 33s
      🟩 SmallGMem          Pass: 100%/1   | Total: 31m 38s | Avg: 31m 38s | Max: 31m 38s
      🟩 TestGPU            Pass: 100%/8   | Total:  4h 29m | Avg: 33m 39s | Max:  1h 40m
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total:  1h 29m | Avg: 29m 48s | Max:  1h 11m
      🟩 90a                Pass: 100%/4   | Total: 52m 24s | Avg: 13m 06s | Max: 21m 43s
    🟩 std
      🟩 11                 Pass: 100%/34  | Total:  6h 05m | Avg: 10m 44s | Max: 41m 23s
      🟩 14                 Pass: 100%/37  | Total:  8h 16m | Avg: 13m 24s | Max: 58m 20s | Hits:  59%/2148  
      🟩 17                 Pass: 100%/37  | Total:  1d 03h | Avg: 44m 07s | Max:  1h 40m | Hits:  59%/1432  
      🟩 20                 Pass: 100%/24  | Total: 15h 13m | Avg: 38m 03s | Max: 59m 26s | Hits:  59%/716   
    
  • 🟩 thrust: Pass: 100%/118 | Total: 1d 07h | Avg: 16m 00s | Max: 52m 53s | Hits: 76%/13077

    🟩 cpu
      🟩 amd64              Pass: 100%/110 | Total:  1d 05h | Avg: 16m 04s | Max: 52m 53s | Hits:  76%/13077 
      🟩 arm64              Pass: 100%/8   | Total:  2h 00m | Avg: 15m 05s | Max: 27m 02s
    🟩 ctk
      🟩 11.1               Pass: 100%/15  | Total:  3h 12m | Avg: 12m 50s | Max: 48m 28s | Hits:  65%/1453  
      🟩 11.8               Pass: 100%/3   | Total: 43m 46s | Avg: 14m 35s | Max: 35m 56s
      🟩 12.5               Pass: 100%/100 | Total:  1d 03h | Avg: 16m 31s | Max: 52m 53s | Hits:  78%/11624 
    🟩 cudacxx
      🟩 ClangCUDA17        Pass: 100%/2   | Total:  8m 41s | Avg:  4m 20s | Max:  4m 34s
      🟩 nvcc11.1           Pass: 100%/15  | Total:  3h 12m | Avg: 12m 50s | Max: 48m 28s | Hits:  65%/1453  
      🟩 nvcc11.8           Pass: 100%/3   | Total: 43m 46s | Avg: 14m 35s | Max: 35m 56s
      🟩 nvcc12.5           Pass: 100%/98  | Total:  1d 03h | Avg: 16m 46s | Max: 52m 53s | Hits:  78%/11624 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  8m 41s | Avg:  4m 20s | Max:  4m 34s
      🟩 nvcc               Pass: 100%/116 | Total:  1d 07h | Avg: 16m 12s | Max: 52m 53s | Hits:  76%/13077 
    🟩 cxx
      🟩 Clang9             Pass: 100%/6   | Total:  1h 12m | Avg: 12m 09s | Max: 28m 39s
      🟩 Clang10            Pass: 100%/3   | Total: 41m 59s | Avg: 13m 59s | Max: 31m 52s
      🟩 Clang11            Pass: 100%/4   | Total:  1h 03m | Avg: 15m 45s | Max: 29m 03s
      🟩 Clang12            Pass: 100%/4   | Total:  1h 04m | Avg: 16m 13s | Max: 28m 53s
      🟩 Clang13            Pass: 100%/4   | Total:  1h 02m | Avg: 15m 37s | Max: 27m 09s
      🟩 Clang14            Pass: 100%/4   | Total:  1h 02m | Avg: 15m 34s | Max: 28m 02s
      🟩 Clang15            Pass: 100%/4   | Total:  1h 05m | Avg: 16m 28s | Max: 28m 48s
      🟩 Clang16            Pass: 100%/4   | Total:  1h 06m | Avg: 16m 35s | Max: 29m 46s
      🟩 Clang17            Pass: 100%/18  | Total:  3h 30m | Avg: 11m 41s | Max: 28m 29s
      🟩 GCC6               Pass: 100%/2   | Total: 11m 34s | Avg:  5m 47s | Max:  8m 21s
      🟩 GCC7               Pass: 100%/6   | Total:  1h 10m | Avg: 11m 45s | Max: 29m 45s
      🟩 GCC8               Pass: 100%/6   | Total:  1h 13m | Avg: 12m 12s | Max: 31m 52s
      🟩 GCC9               Pass: 100%/6   | Total:  1h 10m | Avg: 11m 46s | Max: 30m 33s
      🟩 GCC10              Pass: 100%/4   | Total:  1h 10m | Avg: 17m 34s | Max: 31m 09s
      🟩 GCC11              Pass: 100%/7   | Total:  1h 51m | Avg: 15m 52s | Max: 35m 56s
      🟩 GCC12              Pass: 100%/4   | Total:  1h 08m | Avg: 17m 04s | Max: 30m 58s
      🟩 GCC13              Pass: 100%/20  | Total:  4h 53m | Avg: 14m 41s | Max: 52m 53s
      🟩 Intel2023.2.0      Pass: 100%/3   | Total: 47m 03s | Avg: 15m 41s | Max: 36m 38s
      🟩 MSVC14.16          Pass: 100%/1   | Total: 48m 28s | Avg: 48m 28s | Max: 48m 28s | Hits:  65%/1453  
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 43m | Avg: 51m 36s | Max: 51m 56s | Hits:  65%/2906  
      🟩 MSVC14.39          Pass: 100%/6   | Total:  3h 29m | Avg: 34m 55s | Max: 51m 37s | Hits:  82%/8718  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/51  | Total: 11h 50m | Avg: 13m 55s | Max: 31m 52s
      🟩 GCC                Pass: 100%/55  | Total: 12h 49m | Avg: 13m 59s | Max: 52m 53s
      🟩 Intel              Pass: 100%/3   | Total: 47m 03s | Avg: 15m 41s | Max: 36m 38s
      🟩 MSVC               Pass: 100%/9   | Total:  6h 01m | Avg: 40m 07s | Max: 51m 56s | Hits:  76%/13077 
    🟩 gpu
      🟩 v100               Pass: 100%/118 | Total:  1d 07h | Avg: 16m 00s | Max: 52m 53s | Hits:  76%/13077 
    🟩 jobs
      🟩 Build              Pass: 100%/99  | Total:  1d 03h | Avg: 16m 27s | Max: 51m 56s | Hits:  65%/8718  
      🟩 TestCPU            Pass: 100%/11  | Total:  1h 53m | Avg: 10m 21s | Max: 20m 22s | Hits:  99%/4359  
      🟩 TestGPU            Pass: 100%/8   | Total:  2h 25m | Avg: 18m 12s | Max: 52m 53s
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total: 43m 46s | Avg: 14m 35s | Max: 35m 56s
      🟩 90a                Pass: 100%/4   | Total: 38m 12s | Avg:  9m 33s | Max: 16m 00s
    🟩 std
      🟩 11                 Pass: 100%/30  | Total:  2h 24m | Avg:  4m 48s | Max: 14m 42s
      🟩 14                 Pass: 100%/34  | Total:  5h 25m | Avg:  9m 35s | Max: 51m 16s | Hits:  74%/5812  
      🟩 17                 Pass: 100%/33  | Total: 14h 41m | Avg: 26m 43s | Max: 51m 56s | Hits:  76%/4359  
      🟩 20                 Pass: 100%/21  | Total:  8h 56m | Avg: 25m 32s | Max: 52m 53s | Hits:  82%/2906  
    
  • 🟩 pycuda: Pass: 100%/1 | Total: 11m 36s | Avg: 11m 36s | Max: 11m 36s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 11m 36s | Avg: 11m 36s | Max: 11m 36s
    🟩 ctk
      🟩 12.5               Pass: 100%/1   | Total: 11m 36s | Avg: 11m 36s | Max: 11m 36s
    🟩 cudacxx
      🟩 nvcc12.5           Pass: 100%/1   | Total: 11m 36s | Avg: 11m 36s | Max: 11m 36s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 11m 36s | Avg: 11m 36s | Max: 11m 36s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 11m 36s | Avg: 11m 36s | Max: 11m 36s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 11m 36s | Avg: 11m 36s | Max: 11m 36s
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total: 11m 36s | Avg: 11m 36s | Max: 11m 36s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 11m 36s | Avg: 11m 36s | Max: 11m 36s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
pycuda

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
+/- pycuda

🏃‍ Runner counts (total jobs: 251)

# Runner
178 linux-amd64-cpu16
42 linux-amd64-gpu-v100-latest-1
16 linux-arm64-cpu16
15 windows-amd64-cpu16

@bernhardmgruber
Contributor Author

@gevtushenko: @elstehle said he wouldn't want to merge without your approval, so we are waiting for you before merging this PR.

@gevtushenko ping.

@davidwendt

Per @elstehle's request, I tested this successfully with libcudf 24.10 without our current scan-tuning patch, and it worked well for us.

@bernhardmgruber
Contributor Author

@gevtushenko ping.

// we instantiate invoke_static for each CudaArches, but only call the one matching device_ptx_version
cudaError_t e = cudaSuccess;
const cudaError_t dummy[] = {
(device_ptx_version == CudaArches ? (e = invoke_static<CudaArches>(op, ::cuda::std::true_type{}))
Collaborator

suggestion: this approach works because device_ptx_version is not the CC of the current device but rather the PTX version closest to the current device's CC from the set of PTX versions that cub::EmptyKernel was compiled for. When namespace magic is enabled, this property guarantees that device_ptx_version is one of the CudaArches, because cub::EmptyKernel was compiled for CudaArches. At some point we discussed switching to querying the CC of the current device directly instead of using the empty kernel. This change would be one of the reasons for us not to do that, because device_ptx_version could then be outside of CudaArches, leaving the algorithm not executed when someone compiled for, say, Ampere but tried running the code on Ada. I'd suggest adding a note somewhere on PtxVersion saying that we should always query the CC from the empty kernel for that reason.
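
To illustrate the point above, here is a rough host-only sketch. It assumes a simplified selection rule (the highest compiled PTX version that does not exceed the device's CC), which is only a model of what the runtime reports for the empty kernel. The function name `closest_ptx_version` is made up for this example and is not part of CUB.

```cpp
#include <initializer_list>

// Simplified model (assumption): among the PTX versions a kernel was compiled
// for, pick the highest one that does not exceed the device's compute
// capability. Returns 0 when no compiled version is compatible.
inline int closest_ptx_version(std::initializer_list<int> compiled_archs, int device_cc)
{
  int best = 0;
  for (int arch : compiled_archs)
  {
    if (arch <= device_cc && arch > best)
    {
      best = arch;
    }
  }
  return best;
}
```

With code compiled only for Ampere (800) and run on Ada (890), this model yields 800, which is a member of the compiled list, whereas the raw CC of 890 is not. That is why a dispatch keyed on the raw CC could miss every entry in CudaArches.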

Collaborator

Here was the issue for the alternative approach: #898

CUB_RUNTIME_FUNCTION _CCCL_FORCEINLINE static cudaError_t runtime_to_compiletime(int device_ptx_version, FunctorT& op)
{
// we instantiate invoke_static for each CudaArches, but only call the one matching device_ptx_version
cudaError_t e = cudaSuccess;
Collaborator

important: there are a few potential situations in which device_ptx_version could end up outside of CudaArches, for instance the change to the PtxVersion function I described below, or disabled namespace magic. In that case this function would return cudaSuccess without invoking any algorithm. Given that this would be a corrupted use case that we want to report to the user, I'd prefer having something other than cudaSuccess as the default value here.
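
A minimal host-only sketch of that idea, not the PR's actual code: `sketch_error`, `record_arch`, and `pruned_dispatch` are hypothetical stand-ins (the real code uses `cudaError_t` and `invoke_static`), and the error code name is invented for illustration. It reuses the array-expansion trick from the diff, but starts from an error value so that a miss is visible to the caller.

```cpp
// Stand-in for cudaError_t so the sketch builds without the CUDA toolkit (assumption).
enum class sketch_error
{
  success,
  no_matching_arch // hypothetical error code; the thread does not name one
};

// Hypothetical functor that records which architecture it was invoked for.
struct record_arch
{
  int invoked_arch = 0;

  template <int Arch>
  sketch_error invoke()
  {
    invoked_arch = Arch;
    return sketch_error::success;
  }
};

template <int... Arches>
struct pruned_dispatch
{
  // Instantiates op.invoke<Arch>() for every Arch in Arches, but calls only the
  // one equal to device_ptx_version. The default is an error, so the caller is
  // told when no architecture matched and nothing was invoked.
  template <typename FunctorT>
  static sketch_error run(int device_ptx_version, FunctorT& op)
  {
    sketch_error e = sketch_error::no_matching_arch;
    const int dummy[] = {
      0, (device_ptx_version == Arches ? ((e = op.template invoke<Arches>()), 0) : 0)...};
    (void) dummy; // only used to drive the pack expansion
    return e;
  }
};
```

For example, `pruned_dispatch<800, 900>::run(800, op)` invokes the 800 instantiation and returns success, while passing 890 invokes nothing and returns the error instead of silently succeeding.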

@@ -635,18 +636,79 @@ struct ChainedPolicy
template <typename FunctorT>
CUB_RUNTIME_FUNCTION _CCCL_FORCEINLINE static cudaError_t Invoke(int device_ptx_version, FunctorT& op)
{
// __CUDA_ARCH_LIST__ is only available from CTK 11.5 onwards
#ifdef __CUDA_ARCH_LIST__
Collaborator

suggestion: as described below, this is safe only when we have namespace magic. Although linking TUs compiled for different architecture sets leads to UB, the old chained policy would at least find a policy to execute. The new chained policy would not. So, I'd prefer something along the lines of:

Suggested change
- #ifdef __CUDA_ARCH_LIST__
+ #if defined(__CUDA_ARCH_LIST__) && !defined(CUB_DISABLE_NAMESPACE_MAGIC)

However, RAPIDS is going to disable namespace magic, so a change like that would make this PR less useful. I guess if we change cudaError_t e = cudaSuccess; to return an actual error as suggested below, we'll at least catch the problem at runtime. I'd recommend adding a note next to the error returned when no arch from CudaArches matches the device PTX version, suggesting that users re-enable namespace magic or wrap the namespace.
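
For readers unfamiliar with the macro under discussion: nvcc defines `__CUDA_ARCH_LIST__` as a comma-separated list of architecture values (for example `800,900`), so it can be expanded directly into a template argument pack. The sketch below shows that expansion. `SKETCH_ARCH_LIST` and `arch_list` are names invented for this example, and the fallback list is an assumption so that the snippet also builds with a plain host compiler.

```cpp
#include <cstddef>

// Host-only builds do not define __CUDA_ARCH_LIST__, so fall back to a
// made-up list purely for illustration.
#ifdef __CUDA_ARCH_LIST__
#  define SKETCH_ARCH_LIST __CUDA_ARCH_LIST__
#else
#  define SKETCH_ARCH_LIST 800, 900 // hypothetical fallback
#endif

// Hypothetical holder for the architectures a TU was compiled for.
template <int... Arches>
struct arch_list
{
  static constexpr std::size_t size = sizeof...(Arches);

  // True when version is one of the compiled architectures.
  // Assumes a non-empty list.
  static bool contains(int version)
  {
    const int archs[] = {Arches...};
    for (int arch : archs)
    {
      if (arch == version)
      {
        return true;
      }
    }
    return false;
  }
};

// The list this translation unit would dispatch over.
using compiled_archs = arch_list<SKETCH_ARCH_LIST>;
```

A runtime check such as `compiled_archs::contains(device_ptx_version)` is one place where the recommended note could go: if it returns false, the message could suggest re-enabling namespace magic or wrapping the namespace.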

Labels
cub For all items related to CUB
Projects
Status: In Review
6 participants