chore: marking layers as deprecated #856

Merged
merged 5 commits into from
Aug 27, 2024

@avik-pal avik-pal commented Aug 24, 2024

These layers are not being removed; they are simply being moved to Boltz.jl.

Needs LuxDL/Boltz.jl#48. The symbolic optimal control tutorial has also been moved to Boltz.jl.

Base automatically changed from ap/docs_improve to main August 24, 2024 09:03

github-actions bot commented Aug 24, 2024

Benchmark Results (ASV)

benchmark          main                  c5f0256...           main/c5f025665d3f66...
basics/overhead    0.0877 ± 0.0073 μs    0.0914 ± 0.003 μs    0.96
time_to_load       1.06 ± 0.0095 s       1.06 ± 0.0019 s      1
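
As a reading aid, the ratio column in the table above is consistent with dividing the first timing by the second and rounding to two decimals. The following is a minimal Python sketch of that arithmetic, not the benchmark bot's actual implementation. The names `ratio` and `classify` and the 20% tolerance are hypothetical, and the ± uncertainties are ignored.

```python
def ratio(first: float, second: float) -> float:
    """Divide the first timing by the second and round to two decimals."""
    return round(first / second, 2)


def classify(r: float, tolerance: float = 0.2) -> str:
    """Label a ratio relative to 1.0; the tolerance is an illustrative choice."""
    if r > 1 + tolerance:
        return "first is slower"
    if r < 1 - tolerance:
        return "first is faster"
    return "within tolerance"


# Timings copied from the table above (same unit within each row).
rows = {
    "basics/overhead": (0.0877, 0.0914),  # microseconds
    "time_to_load": (1.06, 1.06),         # seconds
}

for name, (first, second) in rows.items():
    r = ratio(first, second)
    print(f"{name}: ratio={r} ({classify(r)})")
```

Both rows land within the illustrative tolerance, which matches the reported ratios of 0.96 and 1.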

Benchmark Plots

A plot of the benchmark results has been uploaded as an artifact to the workflow run for this PR.
Go to "Actions"->"Benchmark a pull request"->[the most recent run]->"Artifacts" (at the bottom).

@github-actions github-actions bot left a comment

Lux Benchmarks

Benchmark suite Current: c5f0256 Previous: e6dea49 Ratio
Dense(512 => 512, identity)(512 x 128)/forward/CPU/2 thread(s) 413958 ns 411750 ns 1.01
Dense(512 => 512, identity)(512 x 128)/forward/CPU/4 thread(s) 323416.5 ns 243959 ns 1.33
Dense(512 => 512, identity)(512 x 128)/forward/CPU/8 thread(s) 322291.5 ns 323604 ns 1.00
Dense(512 => 512, identity)(512 x 128)/forward/CPU/1 thread(s) 740750 ns 741209 ns 1.00
Dense(512 => 512, identity)(512 x 128)/forward/GPU/CUDA 43875 ns 44008 ns 1.00
Dense(512 => 512, identity)(512 x 128)/zygote/CPU/2 thread(s) 1342958 ns 1392458 ns 0.96
Dense(512 => 512, identity)(512 x 128)/zygote/CPU/4 thread(s) 2458917 ns 1249333 ns 1.97
Dense(512 => 512, identity)(512 x 128)/zygote/CPU/8 thread(s) 13993208 ns 14034875 ns 1.00
Dense(512 => 512, identity)(512 x 128)/zygote/CPU/1 thread(s) 2209625 ns 2247000 ns 0.98
Dense(512 => 512, identity)(512 x 128)/zygote/GPU/CUDA 206353.5 ns 206485 ns 1.00
Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/2 thread(s) 1417667 ns 1411375 ns 1.00
Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/4 thread(s) 897333 ns 949209 ns 0.95
Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/8 thread(s) 1693834 ns 1539667 ns 1.10
Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/1 thread(s) 2244375 ns 2262146 ns 0.99
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) 1712709 ns 1751333.5 ns 0.98
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) 1090208 ns 1096875 ns 0.99
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) 1547604.5 ns 1541583 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) 2820208.5 ns 3026749.5 ns 0.93
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/GPU/CUDA 209139 ns 209127 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) 12169250.5 ns 12111771 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) 8816354.5 ns 8833083 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) 9245937.5 ns 9198584 ns 1.01
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) 18567708 ns 18601167 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/GPU/CUDA 1486840 ns 1480357.5 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) 17317312.5 ns 17231270.5 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) 13996500 ns 13987541.5 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) 14568291 ns 14519729 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) 21838000 ns 21836292 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) 251187416.5 ns 250395646 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) 148580500 ns 148855375 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) 116013292 ns 115834062 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) 448755542 ns 446839208 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/GPU/CUDA 5477446 ns 5444163 ns 1.01
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) 1140829041 ns 1176608458 ns 0.97
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) 978530333 ns 976012000 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) 848602646 ns 837397979.5 ns 1.01
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) 1776494083 ns 1759902458 ns 1.01
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/GPU/CUDA 31494023 ns 31490812.5 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) 1044068833 ns 1129305209 ns 0.92
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) 991054895.5 ns 991324229.5 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) 1313072395.5 ns 1295080375.5 ns 1.01
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) 1738222791.5 ns 1730828646 ns 1.00
lenet(28, 28, 1, 32)/forward/CPU/2 thread(s) 1127458.5 ns 1075249.5 ns 1.05
lenet(28, 28, 1, 32)/forward/CPU/4 thread(s) 1634354.5 ns 1662353.5 ns 0.98
lenet(28, 28, 1, 32)/forward/CPU/8 thread(s) 3595625 ns 3521959 ns 1.02
lenet(28, 28, 1, 32)/forward/CPU/1 thread(s) 782958.5 ns 782750 ns 1.00
lenet(28, 28, 1, 32)/forward/GPU/CUDA 272294 ns 268581 ns 1.01
lenet(28, 28, 1, 32)/zygote/CPU/2 thread(s) 3037041.5 ns 3020312.5 ns 1.01
lenet(28, 28, 1, 32)/zygote/CPU/4 thread(s) 4153667 ns 4174708 ns 0.99
lenet(28, 28, 1, 32)/zygote/CPU/8 thread(s) 9203584 ns 11483792 ns 0.80
lenet(28, 28, 1, 32)/zygote/CPU/1 thread(s) 3247042 ns 3174584 ns 1.02
lenet(28, 28, 1, 32)/zygote/GPU/CUDA 1198821 ns 1187325 ns 1.01
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) 2254208 ns 2334458.5 ns 0.97
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) 1432542 ns 1326625 ns 1.08
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) 1674750 ns 1671667 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) 4207458 ns 4228083 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/GPU/CUDA 209578 ns 208877 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) 19418625 ns 19371042 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) 16120875 ns 16106687.5 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) 17399667 ns 17334333 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) 25810458.5 ns 25864812.5 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/GPU/CUDA 1594538 ns 1587675 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) 33877854.5 ns 33974334 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) 30947292 ns 30652312 ns 1.01
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) 31018375 ns 30965958 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) 36694917 ns 36591917 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) 4534479 ns 4502750 ns 1.01
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) 2770187 ns 2520667 ns 1.10
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) 2920334 ns 2914750 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) 8399916.5 ns 8397770.5 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/GPU/CUDA 425224 ns 422071 ns 1.01
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) 38714000 ns 38880875 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) 32095750 ns 32118083 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) 32358750 ns 32210354 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) 51813250 ns 51886833.5 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/GPU/CUDA 2632544 ns 2617174 ns 1.01
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) 88904417 ns 88740458.5 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) 113781750 ns 114655499.5 ns 0.99
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) 222885375 ns 222624292 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) 74248875 ns 74153520.5 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) 269072833 ns 267012709 ns 1.01
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) 159341750 ns 156293291 ns 1.02
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) 126660708 ns 126425563 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) 484935416 ns 484968208 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/GPU/CUDA 7018614 ns 7022844 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) 1469565458.5 ns 1472853458 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) 1174323562 ns 1171430875 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) 1065769708 ns 1066813500 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) 2006501479 ns 2007065229.5 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/GPU/CUDA 34548410 ns 34464520 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) 1691647542 ns 1687201334 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) 1555581333.5 ns 1531380729 ns 1.02
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) 1715862792 ns 1779981833 ns 0.96
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) 2207662959 ns 2205561250 ns 1.00
lenet(28, 28, 1, 128)/forward/CPU/2 thread(s) 2042375 ns 2055417 ns 0.99
lenet(28, 28, 1, 128)/forward/CPU/4 thread(s) 2954917 ns 3039333 ns 0.97
lenet(28, 28, 1, 128)/forward/CPU/8 thread(s) 8015209 ns 6418334 ns 1.25
lenet(28, 28, 1, 128)/forward/CPU/1 thread(s) 2319979 ns 2491084 ns 0.93
lenet(28, 28, 1, 128)/forward/GPU/CUDA 278639.5 ns 270182.5 ns 1.03
lenet(28, 28, 1, 128)/zygote/CPU/2 thread(s) 9745000 ns 9710917 ns 1.00
lenet(28, 28, 1, 128)/zygote/CPU/4 thread(s) 12103020.5 ns 12102375 ns 1.00
lenet(28, 28, 1, 128)/zygote/CPU/8 thread(s) 23834834 ns 24324021 ns 0.98
lenet(28, 28, 1, 128)/zygote/CPU/1 thread(s) 11772250 ns 11813792 ns 1.00
lenet(28, 28, 1, 128)/zygote/GPU/CUDA 1277190 ns 1260525.5 ns 1.01
vgg16(32, 32, 3, 32)/forward/CPU/2 thread(s) 380424041 ns 379862541 ns 1.00
vgg16(32, 32, 3, 32)/forward/CPU/4 thread(s) 283990583 ns 310947959 ns 0.91
vgg16(32, 32, 3, 32)/forward/CPU/8 thread(s) 240880125 ns 239644500 ns 1.01
vgg16(32, 32, 3, 32)/forward/CPU/1 thread(s) 456341687.5 ns 453270542 ns 1.01
vgg16(32, 32, 3, 32)/forward/GPU/CUDA 4854782 ns 4854774.5 ns 1.00
vgg16(32, 32, 3, 32)/zygote/CPU/2 thread(s) 1456049250 ns 1326926500 ns 1.10
vgg16(32, 32, 3, 32)/zygote/CPU/4 thread(s) 989980833 ns 962218875 ns 1.03
vgg16(32, 32, 3, 32)/zygote/CPU/8 thread(s) 904220334 ns 954450208 ns 0.95
vgg16(32, 32, 3, 32)/zygote/CPU/1 thread(s) 1524555125 ns 1593232541 ns 0.96
vgg16(32, 32, 3, 32)/zygote/GPU/CUDA 17744221 ns 19082921 ns 0.93
lenet(28, 28, 1, 64)/forward/CPU/2 thread(s) 1508542 ns 1392292 ns 1.08
lenet(28, 28, 1, 64)/forward/CPU/4 thread(s) 2075209 ns 1700416 ns 1.22
lenet(28, 28, 1, 64)/forward/CPU/8 thread(s) 7759292 ns 5764584 ns 1.35
lenet(28, 28, 1, 64)/forward/CPU/1 thread(s) 1386458 ns 1353979 ns 1.02
lenet(28, 28, 1, 64)/forward/GPU/CUDA 284088 ns 270953.5 ns 1.05
lenet(28, 28, 1, 64)/zygote/CPU/2 thread(s) 6872666.5 ns 6765209 ns 1.02
lenet(28, 28, 1, 64)/zygote/CPU/4 thread(s) 12476229 ns 13257604.5 ns 0.94
lenet(28, 28, 1, 64)/zygote/CPU/8 thread(s) 19674500 ns 19997334 ns 0.98
lenet(28, 28, 1, 64)/zygote/CPU/1 thread(s) 6112292 ns 6085271 ns 1.00
lenet(28, 28, 1, 64)/zygote/GPU/CUDA 1328629 ns 1315018 ns 1.01
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) 70588458 ns 70450771.5 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) 43527917 ns 43794458 ns 0.99
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) 39548208 ns 39565125 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) 132831042 ns 132519812 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/GPU/CUDA 1868833 ns 1877581 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) 381802812.5 ns 383421896 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) 295420250 ns 297391833.5 ns 0.99
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) 285141625 ns 282075208 ns 1.01
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) 534057167 ns 534360167 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/GPU/CUDA 12293921 ns 12294712.5 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) 413538875 ns 407452917 ns 1.01
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) 392708542 ns 368882167 ns 1.06
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) 667775416 ns 664901875 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) 712637708 ns 711106916 ns 1.00
vgg16(32, 32, 3, 128)/forward/CPU/2 thread(s) 1189256750 ns 1188807000 ns 1.00
vgg16(32, 32, 3, 128)/forward/CPU/4 thread(s) 687269895.5 ns 829881458 ns 0.83
vgg16(32, 32, 3, 128)/forward/CPU/8 thread(s) 637915292 ns 629069625 ns 1.01
vgg16(32, 32, 3, 128)/forward/CPU/1 thread(s) 1878390292 ns 1864484709 ns 1.01
vgg16(32, 32, 3, 128)/forward/GPU/CUDA 12530160.5 ns 12531429 ns 1.00
vgg16(32, 32, 3, 128)/zygote/CPU/2 thread(s) 3548301438 ns 3583219562.5 ns 0.99
vgg16(32, 32, 3, 128)/zygote/CPU/4 thread(s) 2747173416 ns 2743701542 ns 1.00
vgg16(32, 32, 3, 128)/zygote/CPU/8 thread(s) 2829751417 ns 2801027834 ns 1.01
vgg16(32, 32, 3, 128)/zygote/CPU/1 thread(s) 4995040500 ns 5095250291 ns 0.98
vgg16(32, 32, 3, 128)/zygote/GPU/CUDA 49469809 ns 49598783 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) 3416896 ns 3396375 ns 1.01
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) 2068542 ns 2056770.5 ns 1.01
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) 2543625 ns 2516292 ns 1.01
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) 6022208 ns 6032625 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/GPU/CUDA 289314 ns 288270 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) 25418875 ns 25431417 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) 18550687.5 ns 18519687.5 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) 18952542 ns 18816834 ns 1.01
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) 39001458 ns 38902042 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/GPU/CUDA 2470580.5 ns 2461496 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) 54526646 ns 53959750 ns 1.01
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) 78866000 ns 80411625 ns 0.98
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) 171786292 ns 170419166.5 ns 1.01
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) 45676500 ns 45563604 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) 1742916 ns 1774500 ns 0.98
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) 1095625 ns 1086541.5 ns 1.01
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) 1589625 ns 1585833 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) 3036062.5 ns 3036958.5 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/GPU/CUDA 212504.5 ns 210199.5 ns 1.01
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) 12549437.5 ns 12515229.5 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) 9241625 ns 9203541 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) 9648666 ns 9648916 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) 18955146 ns 18975125 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/GPU/CUDA 1539516.5 ns 1537145.5 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) 17677812.5 ns 17611666 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) 14341395.5 ns 14341209 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) 14612542 ns 14587625.5 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) 22175250 ns 22164499.5 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) 70504729 ns 70184479 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) 43425979 ns 43685354 ns 0.99
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) 39549458 ns 39447354 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) 132689042 ns 132435229.5 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/GPU/CUDA 1905038.5 ns 1876651 ns 1.02
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) 358037958 ns 363077416 ns 0.99
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) 290593291.5 ns 286830229.5 ns 1.01
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) 289418292 ns 287419458 ns 1.01
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) 625098708.5 ns 619680250 ns 1.01
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/GPU/CUDA 13385298 ns 13399859 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) 419809562.5 ns 417202479 ns 1.01
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) 420935208 ns 427236167 ns 0.99
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) 705973458.5 ns 702264812 ns 1.01
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) 718107000 ns 716642625 ns 1.00
mlp7layer_bn(gelu)(32 x 256)/forward/CPU/2 thread(s) 1492375 ns 1597458 ns 0.93
mlp7layer_bn(gelu)(32 x 256)/forward/CPU/4 thread(s) 1216291 ns 1041875 ns 1.17
mlp7layer_bn(gelu)(32 x 256)/forward/CPU/8 thread(s) 1240042 ns 1238521 ns 1.00
mlp7layer_bn(gelu)(32 x 256)/forward/CPU/1 thread(s) 2343521 ns 2311000 ns 1.01
mlp7layer_bn(gelu)(32 x 256)/forward/GPU/CUDA 589007.5 ns 591624 ns 1.00
mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/2 thread(s) 8784750 ns 8828125 ns 1.00
mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/4 thread(s) 12953958 ns 13456667 ns 0.96
mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/8 thread(s) 30669125 ns 30478124.5 ns 1.01
mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/1 thread(s) 9811041 ns 9827250 ns 1.00
mlp7layer_bn(gelu)(32 x 256)/zygote/GPU/CUDA 1439550 ns 1454764 ns 0.99
mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/2 thread(s) 17983625 ns 17855792 ns 1.01
mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/4 thread(s) 17039667 ns 17325084 ns 0.98
mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/8 thread(s) 28987562.5 ns 28978667 ns 1.00
mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/1 thread(s) 14313000 ns 14301167 ns 1.00
Dense(512 => 512, relu)(512 x 128)/forward/CPU/2 thread(s) 830958 ns 785166.5 ns 1.06
Dense(512 => 512, relu)(512 x 128)/forward/CPU/4 thread(s) 614354 ns 635083 ns 0.97
Dense(512 => 512, relu)(512 x 128)/forward/CPU/8 thread(s) 1037417 ns 1023416 ns 1.01
Dense(512 => 512, relu)(512 x 128)/forward/CPU/1 thread(s) 724750 ns 724437.5 ns 1.00
Dense(512 => 512, relu)(512 x 128)/forward/GPU/CUDA 47424 ns 48101 ns 0.99
Dense(512 => 512, relu)(512 x 128)/zygote/CPU/2 thread(s) 1559770.5 ns 1546042 ns 1.01
Dense(512 => 512, relu)(512 x 128)/zygote/CPU/4 thread(s) 1030250 ns 1039334 ns 0.99
Dense(512 => 512, relu)(512 x 128)/zygote/CPU/8 thread(s) 1374124.5 ns 1418437.5 ns 0.97
Dense(512 => 512, relu)(512 x 128)/zygote/CPU/1 thread(s) 2247584 ns 2186167 ns 1.03
Dense(512 => 512, relu)(512 x 128)/zygote/GPU/CUDA 233675.5 ns 237446 ns 0.98
Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/2 thread(s) 1763437.5 ns 1701854 ns 1.04
Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/4 thread(s) 1243999.5 ns 1239604.5 ns 1.00
Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/8 thread(s) 2409333.5 ns 1785437.5 ns 1.35
Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/1 thread(s) 2300000 ns 2312500 ns 0.99
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) 3364292 ns 3387542 ns 0.99
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) 2059396 ns 2038042 ns 1.01
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) 2509500 ns 2513500 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) 6005313 ns 6020041 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/GPU/CUDA 284303.5 ns 285597 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) 24092812.5 ns 24084791 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) 17288208 ns 17173834 ns 1.01
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) 17100625 ns 17124375 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) 37463583 ns 37508333 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/GPU/CUDA 2406179 ns 2411179 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) 52881166 ns 52430334 ns 1.01
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) 78782375 ns 80022625 ns 0.98
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) 169706375 ns 168792792 ns 1.01
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) 44653500 ns 44511146 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) 250671041.5 ns 248838833 ns 1.01
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) 147833583 ns 148459125 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) 115740916.5 ns 115539229 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) 447340625 ns 447599646 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/GPU/CUDA 5444407.5 ns 5455438 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) 1128204000 ns 1123889625 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) 881423937 ns 882004229 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) 810219792 ns 805342291 ns 1.01
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) 1745335417 ns 1746632750 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/GPU/CUDA 29318810.5 ns 29283460 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) 1070060208 ns 1005172333.5 ns 1.06
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) 976817459 ns 985652583 ns 0.99
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) 1251612583 ns 1246518292 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) 1744948041.5 ns 1720077833.5 ns 1.01
mlp7layer_bn(relu)(32 x 256)/forward/CPU/2 thread(s) 1242417 ns 1224042 ns 1.02
mlp7layer_bn(relu)(32 x 256)/forward/CPU/4 thread(s) 949750 ns 780375 ns 1.22
mlp7layer_bn(relu)(32 x 256)/forward/CPU/8 thread(s) 920542 ns 903792 ns 1.02
mlp7layer_bn(relu)(32 x 256)/forward/CPU/1 thread(s) 1941542 ns 1941500 ns 1.00
mlp7layer_bn(relu)(32 x 256)/forward/GPU/CUDA 571684.5 ns 574544.5 ns 1.00
mlp7layer_bn(relu)(32 x 256)/zygote/CPU/2 thread(s) 5838417 ns 5625979 ns 1.04
mlp7layer_bn(relu)(32 x 256)/zygote/CPU/4 thread(s) 6339271 ns 8687417 ns 0.73
mlp7layer_bn(relu)(32 x 256)/zygote/CPU/8 thread(s) 23581062 ns 23834542 ns 0.99
mlp7layer_bn(relu)(32 x 256)/zygote/CPU/1 thread(s) 7057292 ns 7099270.5 ns 0.99
mlp7layer_bn(relu)(32 x 256)/zygote/GPU/CUDA 1366685 ns 1400097 ns 0.98
mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/2 thread(s) 11446145.5 ns 10299146 ns 1.11
mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/4 thread(s) 9962895.5 ns 10509708.5 ns 0.95
mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/8 thread(s) 16217021 ns 16674333 ns 0.97
mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/1 thread(s) 8676875 ns 8726562.5 ns 0.99
Dense(128 => 128, gelu)(128 x 128)/forward/CPU/2 thread(s) 522749.5 ns 386792 ns 1.35
Dense(128 => 128, gelu)(128 x 128)/forward/CPU/4 thread(s) 478458 ns 494500 ns 0.97
Dense(128 => 128, gelu)(128 x 128)/forward/CPU/8 thread(s) 1640208.5 ns 2152000 ns 0.76
Dense(128 => 128, gelu)(128 x 128)/forward/CPU/1 thread(s) 88333.5 ns 88104.5 ns 1.00
Dense(128 => 128, gelu)(128 x 128)/forward/GPU/CUDA 27599 ns 27980 ns 0.99
Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/2 thread(s) 388750 ns 339145.5 ns 1.15
Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/4 thread(s) 429917 ns 436062.5 ns 0.99
Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/8 thread(s) 4518875 ns 4118937.5 ns 1.10
Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/1 thread(s) 261084 ns 261500 ns 1.00
Dense(128 => 128, gelu)(128 x 128)/zygote/GPU/CUDA 219602.5 ns 223966.5 ns 0.98
Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/2 thread(s) 715125 ns 643312.5 ns 1.11
Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/4 thread(s) 701020.5 ns 709125 ns 0.99
Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/8 thread(s) 710541.5 ns 884729 ns 0.80
Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/1 thread(s) 444417 ns 445958 ns 1.00
Dense(128 => 128, relu)(128 x 128)/forward/CPU/2 thread(s) 466562.5 ns 331291 ns 1.41
Dense(128 => 128, relu)(128 x 128)/forward/CPU/4 thread(s) 417542 ns 438416 ns 0.95
Dense(128 => 128, relu)(128 x 128)/forward/CPU/8 thread(s) 665709 ns 601249.5 ns 1.11
Dense(128 => 128, relu)(128 x 128)/forward/CPU/1 thread(s) 54833 ns 53958 ns 1.02
Dense(128 => 128, relu)(128 x 128)/forward/GPU/CUDA 27780 ns 28317 ns 0.98
Dense(128 => 128, relu)(128 x 128)/zygote/CPU/2 thread(s) 347791 ns 277271 ns 1.25
Dense(128 => 128, relu)(128 x 128)/zygote/CPU/4 thread(s) 320188 ns 319417 ns 1.00
Dense(128 => 128, relu)(128 x 128)/zygote/CPU/8 thread(s) 787520.5 ns 679875.5 ns 1.16
Dense(128 => 128, relu)(128 x 128)/zygote/CPU/1 thread(s) 153334 ns 153292 ns 1.00
Dense(128 => 128, relu)(128 x 128)/zygote/GPU/CUDA 204855 ns 208854.5 ns 0.98
Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/2 thread(s) 416125 ns 344375 ns 1.21
Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/4 thread(s) 384000 ns 389750 ns 0.99
Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/8 thread(s) 873187.5 ns 870042 ns 1.00
Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/1 thread(s) 174834 ns 174063 ns 1.00
vgg16(32, 32, 3, 64)/forward/CPU/2 thread(s) 601112041 ns 602013959 ns 1.00
vgg16(32, 32, 3, 64)/forward/CPU/4 thread(s) 422330834 ns 430551375 ns 0.98
vgg16(32, 32, 3, 64)/forward/CPU/8 thread(s) 380600646 ns 375744687.5 ns 1.01
vgg16(32, 32, 3, 64)/forward/CPU/1 thread(s) 875866000 ns 873334145.5 ns 1.00
vgg16(32, 32, 3, 64)/forward/GPU/CUDA 7025885.5 ns 7025763 ns 1.00
vgg16(32, 32, 3, 64)/zygote/CPU/2 thread(s) 2054908312.5 ns 2078756625 ns 0.99
vgg16(32, 32, 3, 64)/zygote/CPU/4 thread(s) 1616674312.5 ns 1607808875 ns 1.01
vgg16(32, 32, 3, 64)/zygote/CPU/8 thread(s) 1596823937.5 ns 1638666770.5 ns 0.97
vgg16(32, 32, 3, 64)/zygote/CPU/1 thread(s) 2751744541 ns 2782335083 ns 0.99
vgg16(32, 32, 3, 64)/zygote/GPU/CUDA 25842507.5 ns 25908154 ns 1.00
Dense(512 => 512, gelu)(512 x 128)/forward/CPU/2 thread(s) 524562.5 ns 518875 ns 1.01
Dense(512 => 512, gelu)(512 x 128)/forward/CPU/4 thread(s) 429042 ns 395396 ns 1.09
Dense(512 => 512, gelu)(512 x 128)/forward/CPU/8 thread(s) 1592709 ns 1924520.5 ns 0.83
Dense(512 => 512, gelu)(512 x 128)/forward/CPU/1 thread(s) 864916.5 ns 865667 ns 1.00
Dense(512 => 512, gelu)(512 x 128)/forward/GPU/CUDA 47515 ns 46907 ns 1.01
Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/2 thread(s) 1864437.5 ns 1851083 ns 1.01
Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/4 thread(s) 2338167 ns 1779896 ns 1.31
Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/8 thread(s) 14751917 ns 14384125 ns 1.03
Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/1 thread(s) 2765542 ns 2660187.5 ns 1.04
Dense(512 => 512, gelu)(512 x 128)/zygote/GPU/CUDA 246164.5 ns 247071 ns 1.00
Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/2 thread(s) 3189666.5 ns 2699458.5 ns 1.18
Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/4 thread(s) 2261709 ns 2245375 ns 1.01
Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/8 thread(s) 4310687.5 ns 3691833 ns 1.17
Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/1 thread(s) 3371271 ns 3398958 ns 0.99
mlp7layer_bn(tanh)(32 x 256)/forward/CPU/2 thread(s) 1467187.5 ns 1486833.5 ns 0.99
mlp7layer_bn(tanh)(32 x 256)/forward/CPU/4 thread(s) 1194250 ns 933395.5 ns 1.28
mlp7layer_bn(tanh)(32 x 256)/forward/CPU/8 thread(s) 1230250 ns 1185562.5 ns 1.04
mlp7layer_bn(tanh)(32 x 256)/forward/CPU/1 thread(s) 2300292 ns 2210083 ns 1.04
mlp7layer_bn(tanh)(32 x 256)/forward/GPU/CUDA 544421 ns 550416 ns 0.99
mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/2 thread(s) 5777145.5 ns 5783000 ns 1.00
mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/4 thread(s) 6879750 ns 7999604 ns 0.86
mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/8 thread(s) 24684625 ns 23905084 ns 1.03
mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/1 thread(s) 7281875 ns 7315812.5 ns 1.00
mlp7layer_bn(tanh)(32 x 256)/zygote/GPU/CUDA 1362898 ns 1359665.5 ns 1.00
mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/2 thread(s) 13183458.5 ns 12501603.5 ns 1.05
mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/4 thread(s) 12027104.5 ns 12176000 ns 0.99
mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/8 thread(s) 21058792 ns 20858687.5 ns 1.01
mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/1 thread(s) 10417209 ns 10743417 ns 0.97
Dense(16 => 16, relu)(16 x 128)/forward/CPU/2 thread(s) 2792 ns 3916.5 ns 0.71
Dense(16 => 16, relu)(16 x 128)/forward/CPU/4 thread(s) 2458.5 ns 2875 ns 0.86
Dense(16 => 16, relu)(16 x 128)/forward/CPU/8 thread(s) 3250 ns 5104.5 ns 0.64
Dense(16 => 16, relu)(16 x 128)/forward/CPU/1 thread(s) 3479 ns 2645.5 ns 1.32
Dense(16 => 16, relu)(16 x 128)/forward/GPU/CUDA 24741 ns 22876 ns 1.08
Dense(16 => 16, relu)(16 x 128)/zygote/CPU/2 thread(s) 8208 ns 8541 ns 0.96
Dense(16 => 16, relu)(16 x 128)/zygote/CPU/4 thread(s) 8583 ns 8500 ns 1.01
Dense(16 => 16, relu)(16 x 128)/zygote/CPU/8 thread(s) 8834 ns 8583.5 ns 1.03
Dense(16 => 16, relu)(16 x 128)/zygote/CPU/1 thread(s) 8916 ns 8625 ns 1.03
Dense(16 => 16, relu)(16 x 128)/zygote/GPU/CUDA 212597.5 ns 209808.5 ns 1.01
Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/2 thread(s) 16604.5 ns 16667 ns 1.00
Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/4 thread(s) 16459 ns 16875 ns 0.98
Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/8 thread(s) 16791.5 ns 16708 ns 1.00
Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/1 thread(s) 10750 ns 10792 ns 1.00
Dense(16 => 16, gelu)(16 x 128)/forward/CPU/2 thread(s) 10458 ns 10166.5 ns 1.03
Dense(16 => 16, gelu)(16 x 128)/forward/CPU/4 thread(s) 14583 ns 15625 ns 0.93
Dense(16 => 16, gelu)(16 x 128)/forward/CPU/8 thread(s) 10625 ns 11375 ns 0.93
Dense(16 => 16, gelu)(16 x 128)/forward/CPU/1 thread(s) 8000 ns 7625 ns 1.05
Dense(16 => 16, gelu)(16 x 128)/forward/GPU/CUDA 24785 ns 24722 ns 1.00
Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/2 thread(s) 22334 ns 22270.5 ns 1.00
Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/4 thread(s) 22250 ns 22333 ns 1.00
Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/8 thread(s) 22625 ns 22667 ns 1.00
Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/1 thread(s) 22666.5 ns 22500 ns 1.01
Dense(16 => 16, gelu)(16 x 128)/zygote/GPU/CUDA 233932.5 ns 230109 ns 1.02
Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/2 thread(s) 52541 ns 52292 ns 1.00
Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/4 thread(s) 52208 ns 52458 ns 1.00
Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/8 thread(s) 52687.5 ns 52250 ns 1.01
Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/1 thread(s) 44020.5 ns 43916 ns 1.00
Dense(128 => 128, identity)(128 x 128)/forward/CPU/2 thread(s) 29084 ns 28708 ns 1.01
Dense(128 => 128, identity)(128 x 128)/forward/CPU/4 thread(s) 28687.5 ns 29083 ns 0.99
Dense(128 => 128, identity)(128 x 128)/forward/CPU/8 thread(s) 29541 ns 29708 ns 0.99
Dense(128 => 128, identity)(128 x 128)/forward/CPU/1 thread(s) 46125 ns 46250 ns 1.00
Dense(128 => 128, identity)(128 x 128)/forward/GPU/CUDA 25993 ns 25756.5 ns 1.01
Dense(128 => 128, identity)(128 x 128)/zygote/CPU/2 thread(s) 226000 ns 207687.5 ns 1.09
Dense(128 => 128, identity)(128 x 128)/zygote/CPU/4 thread(s) 260354 ns 259000 ns 1.01
Dense(128 => 128, identity)(128 x 128)/zygote/CPU/8 thread(s) 4296459 ns 4070042 ns 1.06
Dense(128 => 128, identity)(128 x 128)/zygote/CPU/1 thread(s) 148208 ns 147583 ns 1.00
Dense(128 => 128, identity)(128 x 128)/zygote/GPU/CUDA 222796 ns 223667.5 ns 1.00
Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/2 thread(s) 327500 ns 309125 ns 1.06
Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/4 thread(s) 292812.5 ns 289833 ns 1.01
Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/8 thread(s) 816083 ns 766104 ns 1.07
Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/1 thread(s) 162250 ns 161834 ns 1.00
Dense(16 => 16, identity)(16 x 128)/forward/CPU/2 thread(s) 2042 ns 2000 ns 1.02
Dense(16 => 16, identity)(16 x 128)/forward/CPU/4 thread(s) 2000 ns 2292 ns 0.87
Dense(16 => 16, identity)(16 x 128)/forward/CPU/8 thread(s) 2584 ns 4416 ns 0.59
Dense(16 => 16, identity)(16 x 128)/forward/CPU/1 thread(s) 2167 ns 2083 ns 1.04
Dense(16 => 16, identity)(16 x 128)/forward/GPU/CUDA 23103 ns 22925 ns 1.01
Dense(16 => 16, identity)(16 x 128)/zygote/CPU/2 thread(s) 7125 ns 7604.5 ns 0.94
Dense(16 => 16, identity)(16 x 128)/zygote/CPU/4 thread(s) 7250 ns 7208 ns 1.01
Dense(16 => 16, identity)(16 x 128)/zygote/CPU/8 thread(s) 7750 ns 7542 ns 1.03
Dense(16 => 16, identity)(16 x 128)/zygote/CPU/1 thread(s) 7667 ns 7333 ns 1.05
Dense(16 => 16, identity)(16 x 128)/zygote/GPU/CUDA 292532.5 ns 270317 ns 1.08
Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/2 thread(s) 11375 ns 11541 ns 0.99
Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/4 thread(s) 11375 ns 11542 ns 0.99
Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/8 thread(s) 11666 ns 11542 ns 1.01
Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/1 thread(s) 7084 ns 7125 ns 0.99
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) 79948625 ns 79878271 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) 49022000 ns 47895875 ns 1.02
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) 44937249.5 ns 44952396.5 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) 151500083 ns 151396791 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/GPU/CUDA 2720047 ns 2712095.5 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) 604334625 ns 498390209 ns 1.21
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) 411604000 ns 410182750 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) 399438334 ns 398143833 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) 693648750 ns 683908500 ns 1.01
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/GPU/CUDA 14586033 ns 14599220 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) 712920291.5 ns 686490667 ns 1.04
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) 669190583 ns 660533166 ns 1.01
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) 983677708 ns 1012950542 ns 0.97
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) 998398000 ns 997113125 ns 1.00

This comment was automatically generated by workflow using github-action-benchmark.

codecov bot commented Aug 26, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 91.53%. Comparing base (e6dea49) to head (c5f0256).
Report is 6 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #856      +/-   ##
==========================================
- Coverage   93.77%   91.53%   -2.24%     
==========================================
  Files          59       59              
  Lines        2954     2964      +10     
==========================================
- Hits         2770     2713      -57     
- Misses        184      251      +67     


@avik-pal avik-pal marked this pull request as ready for review August 27, 2024 01:05
@avik-pal avik-pal merged commit b3d21e8 into main Aug 27, 2024
72 of 83 checks passed
@avik-pal avik-pal deleted the ap/new_deprecations branch August 27, 2024 19:08