chore: marking layers as deprecated #856
ecc2f37 to 5b3c184
Benchmark Results (ASV)
Benchmark Plots
A plot of the benchmark results has been uploaded as an artifact to the workflow run for this PR.
Lux Benchmarks
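For readers scanning the table that follows: the Ratio column matches the Current time divided by the Previous time, so values above 1 mean the current commit is slower. The sketch below recomputes that ratio for three rows copied from the table. The `classify` helper and its 1.2 threshold are illustrative assumptions, not something the benchmark bot is known to use.

```python
def ratio(current_ns: float, previous_ns: float) -> float:
    """Current time divided by previous time; > 1 means the current commit is slower."""
    return current_ns / previous_ns


def classify(current_ns: float, previous_ns: float, threshold: float = 1.2) -> str:
    """Label a row using a hypothetical noise threshold (assumed, not from the PR)."""
    r = ratio(current_ns, previous_ns)
    if r > threshold:
        return "slower"
    if r < 1.0 / threshold:
        return "faster"
    return "unchanged"


# (benchmark name, current ns, previous ns), copied from rows of the table
rows = [
    ("Dense(512 => 512, identity)(512 x 128)/forward/CPU/4 thread(s)", 323416.5, 243959.0),
    ("Dense(512 => 512, identity)(512 x 128)/zygote/CPU/4 thread(s)", 2458917.0, 1249333.0),
    ("lenet(28, 28, 1, 32)/zygote/CPU/8 thread(s)", 9203584.0, 11483792.0),
]

for name, cur, prev in rows:
    print(f"{ratio(cur, prev):.2f}  {classify(cur, prev):9s}  {name}")
```

Many of the large swings in the table occur in multi-threaded CPU rows of small workloads, so a single run may not be enough to tell a real regression from timing noise.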
Benchmark suite | Current: c5f0256 | Previous: e6dea49 | Ratio |
---|---|---|---|
Dense(512 => 512, identity)(512 x 128)/forward/CPU/2 thread(s) | 413958 ns | 411750 ns | 1.01 |
Dense(512 => 512, identity)(512 x 128)/forward/CPU/4 thread(s) | 323416.5 ns | 243959 ns | 1.33 |
Dense(512 => 512, identity)(512 x 128)/forward/CPU/8 thread(s) | 322291.5 ns | 323604 ns | 1.00 |
Dense(512 => 512, identity)(512 x 128)/forward/CPU/1 thread(s) | 740750 ns | 741209 ns | 1.00 |
Dense(512 => 512, identity)(512 x 128)/forward/GPU/CUDA | 43875 ns | 44008 ns | 1.00 |
Dense(512 => 512, identity)(512 x 128)/zygote/CPU/2 thread(s) | 1342958 ns | 1392458 ns | 0.96 |
Dense(512 => 512, identity)(512 x 128)/zygote/CPU/4 thread(s) | 2458917 ns | 1249333 ns | 1.97 |
Dense(512 => 512, identity)(512 x 128)/zygote/CPU/8 thread(s) | 13993208 ns | 14034875 ns | 1.00 |
Dense(512 => 512, identity)(512 x 128)/zygote/CPU/1 thread(s) | 2209625 ns | 2247000 ns | 0.98 |
Dense(512 => 512, identity)(512 x 128)/zygote/GPU/CUDA | 206353.5 ns | 206485 ns | 1.00 |
Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/2 thread(s) | 1417667 ns | 1411375 ns | 1.00 |
Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/4 thread(s) | 897333 ns | 949209 ns | 0.95 |
Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/8 thread(s) | 1693834 ns | 1539667 ns | 1.10 |
Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/1 thread(s) | 2244375 ns | 2262146 ns | 0.99 |
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) | 1712709 ns | 1751333.5 ns | 0.98 |
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) | 1090208 ns | 1096875 ns | 0.99 |
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) | 1547604.5 ns | 1541583 ns | 1.00 |
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) | 2820208.5 ns | 3026749.5 ns | 0.93 |
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/GPU/CUDA | 209139 ns | 209127 ns | 1.00 |
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) | 12169250.5 ns | 12111771 ns | 1.00 |
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) | 8816354.5 ns | 8833083 ns | 1.00 |
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) | 9245937.5 ns | 9198584 ns | 1.01 |
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) | 18567708 ns | 18601167 ns | 1.00 |
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/GPU/CUDA | 1486840 ns | 1480357.5 ns | 1.00 |
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) | 17317312.5 ns | 17231270.5 ns | 1.00 |
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) | 13996500 ns | 13987541.5 ns | 1.00 |
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) | 14568291 ns | 14519729 ns | 1.00 |
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) | 21838000 ns | 21836292 ns | 1.00 |
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) | 251187416.5 ns | 250395646 ns | 1.00 |
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) | 148580500 ns | 148855375 ns | 1.00 |
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) | 116013292 ns | 115834062 ns | 1.00 |
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) | 448755542 ns | 446839208 ns | 1.00 |
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/GPU/CUDA | 5477446 ns | 5444163 ns | 1.01 |
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) | 1140829041 ns | 1176608458 ns | 0.97 |
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) | 978530333 ns | 976012000 ns | 1.00 |
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) | 848602646 ns | 837397979.5 ns | 1.01 |
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) | 1776494083 ns | 1759902458 ns | 1.01 |
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/GPU/CUDA | 31494023 ns | 31490812.5 ns | 1.00 |
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) | 1044068833 ns | 1129305209 ns | 0.92 |
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) | 991054895.5 ns | 991324229.5 ns | 1.00 |
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) | 1313072395.5 ns | 1295080375.5 ns | 1.01 |
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) | 1738222791.5 ns | 1730828646 ns | 1.00 |
lenet(28, 28, 1, 32)/forward/CPU/2 thread(s) | 1127458.5 ns | 1075249.5 ns | 1.05 |
lenet(28, 28, 1, 32)/forward/CPU/4 thread(s) | 1634354.5 ns | 1662353.5 ns | 0.98 |
lenet(28, 28, 1, 32)/forward/CPU/8 thread(s) | 3595625 ns | 3521959 ns | 1.02 |
lenet(28, 28, 1, 32)/forward/CPU/1 thread(s) | 782958.5 ns | 782750 ns | 1.00 |
lenet(28, 28, 1, 32)/forward/GPU/CUDA | 272294 ns | 268581 ns | 1.01 |
lenet(28, 28, 1, 32)/zygote/CPU/2 thread(s) | 3037041.5 ns | 3020312.5 ns | 1.01 |
lenet(28, 28, 1, 32)/zygote/CPU/4 thread(s) | 4153667 ns | 4174708 ns | 0.99 |
lenet(28, 28, 1, 32)/zygote/CPU/8 thread(s) | 9203584 ns | 11483792 ns | 0.80 |
lenet(28, 28, 1, 32)/zygote/CPU/1 thread(s) | 3247042 ns | 3174584 ns | 1.02 |
lenet(28, 28, 1, 32)/zygote/GPU/CUDA | 1198821 ns | 1187325 ns | 1.01 |
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) | 2254208 ns | 2334458.5 ns | 0.97 |
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) | 1432542 ns | 1326625 ns | 1.08 |
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) | 1674750 ns | 1671667 ns | 1.00 |
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) | 4207458 ns | 4228083 ns | 1.00 |
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/GPU/CUDA | 209578 ns | 208877 ns | 1.00 |
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) | 19418625 ns | 19371042 ns | 1.00 |
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) | 16120875 ns | 16106687.5 ns | 1.00 |
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) | 17399667 ns | 17334333 ns | 1.00 |
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) | 25810458.5 ns | 25864812.5 ns | 1.00 |
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/GPU/CUDA | 1594538 ns | 1587675 ns | 1.00 |
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) | 33877854.5 ns | 33974334 ns | 1.00 |
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) | 30947292 ns | 30652312 ns | 1.01 |
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) | 31018375 ns | 30965958 ns | 1.00 |
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) | 36694917 ns | 36591917 ns | 1.00 |
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) | 4534479 ns | 4502750 ns | 1.01 |
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) | 2770187 ns | 2520667 ns | 1.10 |
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) | 2920334 ns | 2914750 ns | 1.00 |
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) | 8399916.5 ns | 8397770.5 ns | 1.00 |
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/GPU/CUDA | 425224 ns | 422071 ns | 1.01 |
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) | 38714000 ns | 38880875 ns | 1.00 |
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) | 32095750 ns | 32118083 ns | 1.00 |
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) | 32358750 ns | 32210354 ns | 1.00 |
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) | 51813250 ns | 51886833.5 ns | 1.00 |
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/GPU/CUDA | 2632544 ns | 2617174 ns | 1.01 |
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) | 88904417 ns | 88740458.5 ns | 1.00 |
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) | 113781750 ns | 114655499.5 ns | 0.99 |
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) | 222885375 ns | 222624292 ns | 1.00 |
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) | 74248875 ns | 74153520.5 ns | 1.00 |
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) | 269072833 ns | 267012709 ns | 1.01 |
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) | 159341750 ns | 156293291 ns | 1.02 |
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) | 126660708 ns | 126425563 ns | 1.00 |
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) | 484935416 ns | 484968208 ns | 1.00 |
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/GPU/CUDA | 7018614 ns | 7022844 ns | 1.00 |
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) | 1469565458.5 ns | 1472853458 ns | 1.00 |
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) | 1174323562 ns | 1171430875 ns | 1.00 |
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) | 1065769708 ns | 1066813500 ns | 1.00 |
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) | 2006501479 ns | 2007065229.5 ns | 1.00 |
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/GPU/CUDA | 34548410 ns | 34464520 ns | 1.00 |
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) | 1691647542 ns | 1687201334 ns | 1.00 |
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) | 1555581333.5 ns | 1531380729 ns | 1.02 |
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) | 1715862792 ns | 1779981833 ns | 0.96 |
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) | 2207662959 ns | 2205561250 ns | 1.00 |
lenet(28, 28, 1, 128)/forward/CPU/2 thread(s) | 2042375 ns | 2055417 ns | 0.99 |
lenet(28, 28, 1, 128)/forward/CPU/4 thread(s) | 2954917 ns | 3039333 ns | 0.97 |
lenet(28, 28, 1, 128)/forward/CPU/8 thread(s) | 8015209 ns | 6418334 ns | 1.25 |
lenet(28, 28, 1, 128)/forward/CPU/1 thread(s) | 2319979 ns | 2491084 ns | 0.93 |
lenet(28, 28, 1, 128)/forward/GPU/CUDA | 278639.5 ns | 270182.5 ns | 1.03 |
lenet(28, 28, 1, 128)/zygote/CPU/2 thread(s) | 9745000 ns | 9710917 ns | 1.00 |
lenet(28, 28, 1, 128)/zygote/CPU/4 thread(s) | 12103020.5 ns | 12102375 ns | 1.00 |
lenet(28, 28, 1, 128)/zygote/CPU/8 thread(s) | 23834834 ns | 24324021 ns | 0.98 |
lenet(28, 28, 1, 128)/zygote/CPU/1 thread(s) | 11772250 ns | 11813792 ns | 1.00 |
lenet(28, 28, 1, 128)/zygote/GPU/CUDA | 1277190 ns | 1260525.5 ns | 1.01 |
vgg16(32, 32, 3, 32)/forward/CPU/2 thread(s) | 380424041 ns | 379862541 ns | 1.00 |
vgg16(32, 32, 3, 32)/forward/CPU/4 thread(s) | 283990583 ns | 310947959 ns | 0.91 |
vgg16(32, 32, 3, 32)/forward/CPU/8 thread(s) | 240880125 ns | 239644500 ns | 1.01 |
vgg16(32, 32, 3, 32)/forward/CPU/1 thread(s) | 456341687.5 ns | 453270542 ns | 1.01 |
vgg16(32, 32, 3, 32)/forward/GPU/CUDA | 4854782 ns | 4854774.5 ns | 1.00 |
vgg16(32, 32, 3, 32)/zygote/CPU/2 thread(s) | 1456049250 ns | 1326926500 ns | 1.10 |
vgg16(32, 32, 3, 32)/zygote/CPU/4 thread(s) | 989980833 ns | 962218875 ns | 1.03 |
vgg16(32, 32, 3, 32)/zygote/CPU/8 thread(s) | 904220334 ns | 954450208 ns | 0.95 |
vgg16(32, 32, 3, 32)/zygote/CPU/1 thread(s) | 1524555125 ns | 1593232541 ns | 0.96 |
vgg16(32, 32, 3, 32)/zygote/GPU/CUDA | 17744221 ns | 19082921 ns | 0.93 |
lenet(28, 28, 1, 64)/forward/CPU/2 thread(s) | 1508542 ns | 1392292 ns | 1.08 |
lenet(28, 28, 1, 64)/forward/CPU/4 thread(s) | 2075209 ns | 1700416 ns | 1.22 |
lenet(28, 28, 1, 64)/forward/CPU/8 thread(s) | 7759292 ns | 5764584 ns | 1.35 |
lenet(28, 28, 1, 64)/forward/CPU/1 thread(s) | 1386458 ns | 1353979 ns | 1.02 |
lenet(28, 28, 1, 64)/forward/GPU/CUDA | 284088 ns | 270953.5 ns | 1.05 |
lenet(28, 28, 1, 64)/zygote/CPU/2 thread(s) | 6872666.5 ns | 6765209 ns | 1.02 |
lenet(28, 28, 1, 64)/zygote/CPU/4 thread(s) | 12476229 ns | 13257604.5 ns | 0.94 |
lenet(28, 28, 1, 64)/zygote/CPU/8 thread(s) | 19674500 ns | 19997334 ns | 0.98 |
lenet(28, 28, 1, 64)/zygote/CPU/1 thread(s) | 6112292 ns | 6085271 ns | 1.00 |
lenet(28, 28, 1, 64)/zygote/GPU/CUDA | 1328629 ns | 1315018 ns | 1.01 |
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) | 70588458 ns | 70450771.5 ns | 1.00 |
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) | 43527917 ns | 43794458 ns | 0.99 |
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) | 39548208 ns | 39565125 ns | 1.00 |
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) | 132831042 ns | 132519812 ns | 1.00 |
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/GPU/CUDA | 1868833 ns | 1877581 ns | 1.00 |
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) | 381802812.5 ns | 383421896 ns | 1.00 |
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) | 295420250 ns | 297391833.5 ns | 0.99 |
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) | 285141625 ns | 282075208 ns | 1.01 |
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) | 534057167 ns | 534360167 ns | 1.00 |
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/GPU/CUDA | 12293921 ns | 12294712.5 ns | 1.00 |
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) | 413538875 ns | 407452917 ns | 1.01 |
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) | 392708542 ns | 368882167 ns | 1.06 |
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) | 667775416 ns | 664901875 ns | 1.00 |
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) | 712637708 ns | 711106916 ns | 1.00 |
vgg16(32, 32, 3, 128)/forward/CPU/2 thread(s) | 1189256750 ns | 1188807000 ns | 1.00 |
vgg16(32, 32, 3, 128)/forward/CPU/4 thread(s) | 687269895.5 ns | 829881458 ns | 0.83 |
vgg16(32, 32, 3, 128)/forward/CPU/8 thread(s) | 637915292 ns | 629069625 ns | 1.01 |
vgg16(32, 32, 3, 128)/forward/CPU/1 thread(s) | 1878390292 ns | 1864484709 ns | 1.01 |
vgg16(32, 32, 3, 128)/forward/GPU/CUDA | 12530160.5 ns | 12531429 ns | 1.00 |
vgg16(32, 32, 3, 128)/zygote/CPU/2 thread(s) | 3548301438 ns | 3583219562.5 ns | 0.99 |
vgg16(32, 32, 3, 128)/zygote/CPU/4 thread(s) | 2747173416 ns | 2743701542 ns | 1.00 |
vgg16(32, 32, 3, 128)/zygote/CPU/8 thread(s) | 2829751417 ns | 2801027834 ns | 1.01 |
vgg16(32, 32, 3, 128)/zygote/CPU/1 thread(s) | 4995040500 ns | 5095250291 ns | 0.98 |
vgg16(32, 32, 3, 128)/zygote/GPU/CUDA | 49469809 ns | 49598783 ns | 1.00 |
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) | 3416896 ns | 3396375 ns | 1.01 |
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) | 2068542 ns | 2056770.5 ns | 1.01 |
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) | 2543625 ns | 2516292 ns | 1.01 |
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) | 6022208 ns | 6032625 ns | 1.00 |
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/GPU/CUDA | 289314 ns | 288270 ns | 1.00 |
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) | 25418875 ns | 25431417 ns | 1.00 |
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) | 18550687.5 ns | 18519687.5 ns | 1.00 |
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) | 18952542 ns | 18816834 ns | 1.01 |
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) | 39001458 ns | 38902042 ns | 1.00 |
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/GPU/CUDA | 2470580.5 ns | 2461496 ns | 1.00 |
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) | 54526646 ns | 53959750 ns | 1.01 |
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) | 78866000 ns | 80411625 ns | 0.98 |
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) | 171786292 ns | 170419166.5 ns | 1.01 |
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) | 45676500 ns | 45563604 ns | 1.00 |
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) | 1742916 ns | 1774500 ns | 0.98 |
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) | 1095625 ns | 1086541.5 ns | 1.01 |
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) | 1589625 ns | 1585833 ns | 1.00 |
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) | 3036062.5 ns | 3036958.5 ns | 1.00 |
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/GPU/CUDA | 212504.5 ns | 210199.5 ns | 1.01 |
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) | 12549437.5 ns | 12515229.5 ns | 1.00 |
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) | 9241625 ns | 9203541 ns | 1.00 |
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) | 9648666 ns | 9648916 ns | 1.00 |
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) | 18955146 ns | 18975125 ns | 1.00 |
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/GPU/CUDA | 1539516.5 ns | 1537145.5 ns | 1.00 |
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) | 17677812.5 ns | 17611666 ns | 1.00 |
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) | 14341395.5 ns | 14341209 ns | 1.00 |
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) | 14612542 ns | 14587625.5 ns | 1.00 |
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) | 22175250 ns | 22164499.5 ns | 1.00 |
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) | 70504729 ns | 70184479 ns | 1.00 |
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) | 43425979 ns | 43685354 ns | 0.99 |
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) | 39549458 ns | 39447354 ns | 1.00 |
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) | 132689042 ns | 132435229.5 ns | 1.00 |
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/GPU/CUDA | 1905038.5 ns | 1876651 ns | 1.02 |
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) | 358037958 ns | 363077416 ns | 0.99 |
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) | 290593291.5 ns | 286830229.5 ns | 1.01 |
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) | 289418292 ns | 287419458 ns | 1.01 |
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) | 625098708.5 ns | 619680250 ns | 1.01 |
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/GPU/CUDA | 13385298 ns | 13399859 ns | 1.00 |
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) | 419809562.5 ns | 417202479 ns | 1.01 |
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) | 420935208 ns | 427236167 ns | 0.99 |
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) | 705973458.5 ns | 702264812 ns | 1.01 |
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) | 718107000 ns | 716642625 ns | 1.00 |
mlp7layer_bn(gelu)(32 x 256)/forward/CPU/2 thread(s) | 1492375 ns | 1597458 ns | 0.93 |
mlp7layer_bn(gelu)(32 x 256)/forward/CPU/4 thread(s) | 1216291 ns | 1041875 ns | 1.17 |
mlp7layer_bn(gelu)(32 x 256)/forward/CPU/8 thread(s) | 1240042 ns | 1238521 ns | 1.00 |
mlp7layer_bn(gelu)(32 x 256)/forward/CPU/1 thread(s) | 2343521 ns | 2311000 ns | 1.01 |
mlp7layer_bn(gelu)(32 x 256)/forward/GPU/CUDA | 589007.5 ns | 591624 ns | 1.00 |
mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/2 thread(s) | 8784750 ns | 8828125 ns | 1.00 |
mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/4 thread(s) | 12953958 ns | 13456667 ns | 0.96 |
mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/8 thread(s) | 30669125 ns | 30478124.5 ns | 1.01 |
mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/1 thread(s) | 9811041 ns | 9827250 ns | 1.00 |
mlp7layer_bn(gelu)(32 x 256)/zygote/GPU/CUDA | 1439550 ns | 1454764 ns | 0.99 |
mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/2 thread(s) | 17983625 ns | 17855792 ns | 1.01 |
mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/4 thread(s) | 17039667 ns | 17325084 ns | 0.98 |
mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/8 thread(s) | 28987562.5 ns | 28978667 ns | 1.00 |
mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/1 thread(s) | 14313000 ns | 14301167 ns | 1.00 |
Dense(512 => 512, relu)(512 x 128)/forward/CPU/2 thread(s) | 830958 ns | 785166.5 ns | 1.06 |
Dense(512 => 512, relu)(512 x 128)/forward/CPU/4 thread(s) | 614354 ns | 635083 ns | 0.97 |
Dense(512 => 512, relu)(512 x 128)/forward/CPU/8 thread(s) | 1037417 ns | 1023416 ns | 1.01 |
Dense(512 => 512, relu)(512 x 128)/forward/CPU/1 thread(s) | 724750 ns | 724437.5 ns | 1.00 |
Dense(512 => 512, relu)(512 x 128)/forward/GPU/CUDA | 47424 ns | 48101 ns | 0.99 |
Dense(512 => 512, relu)(512 x 128)/zygote/CPU/2 thread(s) | 1559770.5 ns | 1546042 ns | 1.01 |
Dense(512 => 512, relu)(512 x 128)/zygote/CPU/4 thread(s) | 1030250 ns | 1039334 ns | 0.99 |
Dense(512 => 512, relu)(512 x 128)/zygote/CPU/8 thread(s) | 1374124.5 ns | 1418437.5 ns | 0.97 |
Dense(512 => 512, relu)(512 x 128)/zygote/CPU/1 thread(s) | 2247584 ns | 2186167 ns | 1.03 |
Dense(512 => 512, relu)(512 x 128)/zygote/GPU/CUDA | 233675.5 ns | 237446 ns | 0.98 |
Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/2 thread(s) | 1763437.5 ns | 1701854 ns | 1.04 |
Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/4 thread(s) | 1243999.5 ns | 1239604.5 ns | 1.00 |
Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/8 thread(s) | 2409333.5 ns | 1785437.5 ns | 1.35 |
Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/1 thread(s) | 2300000 ns | 2312500 ns | 0.99 |
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) | 3364292 ns | 3387542 ns | 0.99 |
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) | 2059396 ns | 2038042 ns | 1.01 |
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) | 2509500 ns | 2513500 ns | 1.00 |
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) | 6005313 ns | 6020041 ns | 1.00 |
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/GPU/CUDA | 284303.5 ns | 285597 ns | 1.00 |
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) | 24092812.5 ns | 24084791 ns | 1.00 |
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) | 17288208 ns | 17173834 ns | 1.01 |
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) | 17100625 ns | 17124375 ns | 1.00 |
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) | 37463583 ns | 37508333 ns | 1.00 |
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/GPU/CUDA | 2406179 ns | 2411179 ns | 1.00 |
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) | 52881166 ns | 52430334 ns | 1.01 |
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) | 78782375 ns | 80022625 ns | 0.98 |
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) | 169706375 ns | 168792792 ns | 1.01 |
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) | 44653500 ns | 44511146 ns | 1.00 |
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) | 250671041.5 ns | 248838833 ns | 1.01 |
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) | 147833583 ns | 148459125 ns | 1.00 |
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) | 115740916.5 ns | 115539229 ns | 1.00 |
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) | 447340625 ns | 447599646 ns | 1.00 |
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/GPU/CUDA | 5444407.5 ns | 5455438 ns | 1.00 |
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) | 1128204000 ns | 1123889625 ns | 1.00 |
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) | 881423937 ns | 882004229 ns | 1.00 |
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) | 810219792 ns | 805342291 ns | 1.01 |
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) | 1745335417 ns | 1746632750 ns | 1.00 |
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/GPU/CUDA | 29318810.5 ns | 29283460 ns | 1.00 |
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) | 1070060208 ns | 1005172333.5 ns | 1.06 |
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) | 976817459 ns | 985652583 ns | 0.99 |
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) | 1251612583 ns | 1246518292 ns | 1.00 |
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) | 1744948041.5 ns | 1720077833.5 ns | 1.01 |
mlp7layer_bn(relu)(32 x 256)/forward/CPU/2 thread(s) | 1242417 ns | 1224042 ns | 1.02 |
mlp7layer_bn(relu)(32 x 256)/forward/CPU/4 thread(s) | 949750 ns | 780375 ns | 1.22 |
mlp7layer_bn(relu)(32 x 256)/forward/CPU/8 thread(s) | 920542 ns | 903792 ns | 1.02 |
mlp7layer_bn(relu)(32 x 256)/forward/CPU/1 thread(s) | 1941542 ns | 1941500 ns | 1.00 |
mlp7layer_bn(relu)(32 x 256)/forward/GPU/CUDA | 571684.5 ns | 574544.5 ns | 1.00 |
mlp7layer_bn(relu)(32 x 256)/zygote/CPU/2 thread(s) | 5838417 ns | 5625979 ns | 1.04 |
mlp7layer_bn(relu)(32 x 256)/zygote/CPU/4 thread(s) | 6339271 ns | 8687417 ns | 0.73 |
mlp7layer_bn(relu)(32 x 256)/zygote/CPU/8 thread(s) | 23581062 ns | 23834542 ns | 0.99 |
mlp7layer_bn(relu)(32 x 256)/zygote/CPU/1 thread(s) | 7057292 ns | 7099270.5 ns | 0.99 |
mlp7layer_bn(relu)(32 x 256)/zygote/GPU/CUDA | 1366685 ns | 1400097 ns | 0.98 |
mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/2 thread(s) | 11446145.5 ns | 10299146 ns | 1.11 |
mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/4 thread(s) | 9962895.5 ns | 10509708.5 ns | 0.95 |
mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/8 thread(s) | 16217021 ns | 16674333 ns | 0.97 |
mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/1 thread(s) | 8676875 ns | 8726562.5 ns | 0.99 |
Dense(128 => 128, gelu)(128 x 128)/forward/CPU/2 thread(s) | 522749.5 ns | 386792 ns | 1.35 |
Dense(128 => 128, gelu)(128 x 128)/forward/CPU/4 thread(s) | 478458 ns | 494500 ns | 0.97 |
Dense(128 => 128, gelu)(128 x 128)/forward/CPU/8 thread(s) | 1640208.5 ns | 2152000 ns | 0.76 |
Dense(128 => 128, gelu)(128 x 128)/forward/CPU/1 thread(s) | 88333.5 ns | 88104.5 ns | 1.00 |
Dense(128 => 128, gelu)(128 x 128)/forward/GPU/CUDA | 27599 ns | 27980 ns | 0.99 |
Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/2 thread(s) | 388750 ns | 339145.5 ns | 1.15 |
Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/4 thread(s) | 429917 ns | 436062.5 ns | 0.99 |
Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/8 thread(s) | 4518875 ns | 4118937.5 ns | 1.10 |
Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/1 thread(s) | 261084 ns | 261500 ns | 1.00 |
Dense(128 => 128, gelu)(128 x 128)/zygote/GPU/CUDA | 219602.5 ns | 223966.5 ns | 0.98 |
Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/2 thread(s) | 715125 ns | 643312.5 ns | 1.11 |
Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/4 thread(s) | 701020.5 ns | 709125 ns | 0.99 |
Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/8 thread(s) | 710541.5 ns | 884729 ns | 0.80 |
Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/1 thread(s) | 444417 ns | 445958 ns | 1.00 |
Dense(128 => 128, relu)(128 x 128)/forward/CPU/2 thread(s) | 466562.5 ns | 331291 ns | 1.41 |
Dense(128 => 128, relu)(128 x 128)/forward/CPU/4 thread(s) | 417542 ns | 438416 ns | 0.95 |
Dense(128 => 128, relu)(128 x 128)/forward/CPU/8 thread(s) | 665709 ns | 601249.5 ns | 1.11 |
Dense(128 => 128, relu)(128 x 128)/forward/CPU/1 thread(s) | 54833 ns | 53958 ns | 1.02 |
Dense(128 => 128, relu)(128 x 128)/forward/GPU/CUDA | 27780 ns | 28317 ns | 0.98 |
Dense(128 => 128, relu)(128 x 128)/zygote/CPU/2 thread(s) | 347791 ns | 277271 ns | 1.25 |
Dense(128 => 128, relu)(128 x 128)/zygote/CPU/4 thread(s) | 320188 ns | 319417 ns | 1.00 |
Dense(128 => 128, relu)(128 x 128)/zygote/CPU/8 thread(s) | 787520.5 ns | 679875.5 ns | 1.16 |
Dense(128 => 128, relu)(128 x 128)/zygote/CPU/1 thread(s) | 153334 ns | 153292 ns | 1.00 |
Dense(128 => 128, relu)(128 x 128)/zygote/GPU/CUDA | 204855 ns | 208854.5 ns | 0.98 |
Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/2 thread(s) | 416125 ns | 344375 ns | 1.21 |
Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/4 thread(s) | 384000 ns | 389750 ns | 0.99 |
Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/8 thread(s) | 873187.5 ns | 870042 ns | 1.00 |
Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/1 thread(s) | 174834 ns | 174063 ns | 1.00 |
vgg16(32, 32, 3, 64)/forward/CPU/2 thread(s) | 601112041 ns | 602013959 ns | 1.00 |
vgg16(32, 32, 3, 64)/forward/CPU/4 thread(s) | 422330834 ns | 430551375 ns | 0.98 |
vgg16(32, 32, 3, 64)/forward/CPU/8 thread(s) | 380600646 ns | 375744687.5 ns | 1.01 |
vgg16(32, 32, 3, 64)/forward/CPU/1 thread(s) | 875866000 ns | 873334145.5 ns | 1.00 |
vgg16(32, 32, 3, 64)/forward/GPU/CUDA | 7025885.5 ns | 7025763 ns | 1.00 |
vgg16(32, 32, 3, 64)/zygote/CPU/2 thread(s) | 2054908312.5 ns | 2078756625 ns | 0.99 |
vgg16(32, 32, 3, 64)/zygote/CPU/4 thread(s) | 1616674312.5 ns | 1607808875 ns | 1.01 |
vgg16(32, 32, 3, 64)/zygote/CPU/8 thread(s) | 1596823937.5 ns | 1638666770.5 ns | 0.97 |
vgg16(32, 32, 3, 64)/zygote/CPU/1 thread(s) | 2751744541 ns | 2782335083 ns | 0.99 |
vgg16(32, 32, 3, 64)/zygote/GPU/CUDA | 25842507.5 ns | 25908154 ns | 1.00 |
Dense(512 => 512, gelu)(512 x 128)/forward/CPU/2 thread(s) | 524562.5 ns | 518875 ns | 1.01 |
Dense(512 => 512, gelu)(512 x 128)/forward/CPU/4 thread(s) | 429042 ns | 395396 ns | 1.09 |
Dense(512 => 512, gelu)(512 x 128)/forward/CPU/8 thread(s) | 1592709 ns | 1924520.5 ns | 0.83 |
Dense(512 => 512, gelu)(512 x 128)/forward/CPU/1 thread(s) | 864916.5 ns | 865667 ns | 1.00 |
Dense(512 => 512, gelu)(512 x 128)/forward/GPU/CUDA | 47515 ns | 46907 ns | 1.01 |
Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/2 thread(s) | 1864437.5 ns | 1851083 ns | 1.01 |
Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/4 thread(s) | 2338167 ns | 1779896 ns | 1.31 |
Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/8 thread(s) | 14751917 ns | 14384125 ns | 1.03 |
Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/1 thread(s) | 2765542 ns | 2660187.5 ns | 1.04 |
Dense(512 => 512, gelu)(512 x 128)/zygote/GPU/CUDA | 246164.5 ns | 247071 ns | 1.00 |
Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/2 thread(s) | 3189666.5 ns | 2699458.5 ns | 1.18 |
Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/4 thread(s) | 2261709 ns | 2245375 ns | 1.01 |
Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/8 thread(s) | 4310687.5 ns | 3691833 ns | 1.17 |
Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/1 thread(s) | 3371271 ns | 3398958 ns | 0.99 |
mlp7layer_bn(tanh)(32 x 256)/forward/CPU/2 thread(s) | 1467187.5 ns | 1486833.5 ns | 0.99 |
mlp7layer_bn(tanh)(32 x 256)/forward/CPU/4 thread(s) | 1194250 ns | 933395.5 ns | 1.28 |
mlp7layer_bn(tanh)(32 x 256)/forward/CPU/8 thread(s) | 1230250 ns | 1185562.5 ns | 1.04 |
mlp7layer_bn(tanh)(32 x 256)/forward/CPU/1 thread(s) | 2300292 ns | 2210083 ns | 1.04 |
mlp7layer_bn(tanh)(32 x 256)/forward/GPU/CUDA | 544421 ns | 550416 ns | 0.99 |
mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/2 thread(s) | 5777145.5 ns | 5783000 ns | 1.00 |
mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/4 thread(s) | 6879750 ns | 7999604 ns | 0.86 |
mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/8 thread(s) | 24684625 ns | 23905084 ns | 1.03 |
mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/1 thread(s) | 7281875 ns | 7315812.5 ns | 1.00 |
mlp7layer_bn(tanh)(32 x 256)/zygote/GPU/CUDA | 1362898 ns | 1359665.5 ns | 1.00 |
mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/2 thread(s) | 13183458.5 ns | 12501603.5 ns | 1.05 |
mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/4 thread(s) | 12027104.5 ns | 12176000 ns | 0.99 |
mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/8 thread(s) | 21058792 ns | 20858687.5 ns | 1.01 |
mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/1 thread(s) | 10417209 ns | 10743417 ns | 0.97 |
Dense(16 => 16, relu)(16 x 128)/forward/CPU/2 thread(s) | 2792 ns | 3916.5 ns | 0.71 |
Dense(16 => 16, relu)(16 x 128)/forward/CPU/4 thread(s) | 2458.5 ns | 2875 ns | 0.86 |
Dense(16 => 16, relu)(16 x 128)/forward/CPU/8 thread(s) | 3250 ns | 5104.5 ns | 0.64 |
Dense(16 => 16, relu)(16 x 128)/forward/CPU/1 thread(s) | 3479 ns | 2645.5 ns | 1.32 |
Dense(16 => 16, relu)(16 x 128)/forward/GPU/CUDA | 24741 ns | 22876 ns | 1.08 |
Dense(16 => 16, relu)(16 x 128)/zygote/CPU/2 thread(s) | 8208 ns | 8541 ns | 0.96 |
Dense(16 => 16, relu)(16 x 128)/zygote/CPU/4 thread(s) | 8583 ns | 8500 ns | 1.01 |
Dense(16 => 16, relu)(16 x 128)/zygote/CPU/8 thread(s) | 8834 ns | 8583.5 ns | 1.03 |
Dense(16 => 16, relu)(16 x 128)/zygote/CPU/1 thread(s) | 8916 ns | 8625 ns | 1.03 |
Dense(16 => 16, relu)(16 x 128)/zygote/GPU/CUDA | 212597.5 ns | 209808.5 ns | 1.01 |
Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/2 thread(s) | 16604.5 ns | 16667 ns | 1.00 |
Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/4 thread(s) | 16459 ns | 16875 ns | 0.98 |
Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/8 thread(s) | 16791.5 ns | 16708 ns | 1.00 |
Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/1 thread(s) | 10750 ns | 10792 ns | 1.00 |
Dense(16 => 16, gelu)(16 x 128)/forward/CPU/2 thread(s) | 10458 ns | 10166.5 ns | 1.03 |
Dense(16 => 16, gelu)(16 x 128)/forward/CPU/4 thread(s) | 14583 ns | 15625 ns | 0.93 |
Dense(16 => 16, gelu)(16 x 128)/forward/CPU/8 thread(s) | 10625 ns | 11375 ns | 0.93 |
Dense(16 => 16, gelu)(16 x 128)/forward/CPU/1 thread(s) | 8000 ns | 7625 ns | 1.05 |
Dense(16 => 16, gelu)(16 x 128)/forward/GPU/CUDA | 24785 ns | 24722 ns | 1.00 |
Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/2 thread(s) | 22334 ns | 22270.5 ns | 1.00 |
Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/4 thread(s) | 22250 ns | 22333 ns | 1.00 |
Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/8 thread(s) | 22625 ns | 22667 ns | 1.00 |
Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/1 thread(s) | 22666.5 ns | 22500 ns | 1.01 |
Dense(16 => 16, gelu)(16 x 128)/zygote/GPU/CUDA | 233932.5 ns | 230109 ns | 1.02 |
Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/2 thread(s) | 52541 ns | 52292 ns | 1.00 |
Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/4 thread(s) | 52208 ns | 52458 ns | 1.00 |
Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/8 thread(s) | 52687.5 ns | 52250 ns | 1.01 |
Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/1 thread(s) | 44020.5 ns | 43916 ns | 1.00 |
Dense(128 => 128, identity)(128 x 128)/forward/CPU/2 thread(s) | 29084 ns | 28708 ns | 1.01 |
Dense(128 => 128, identity)(128 x 128)/forward/CPU/4 thread(s) | 28687.5 ns | 29083 ns | 0.99 |
Dense(128 => 128, identity)(128 x 128)/forward/CPU/8 thread(s) | 29541 ns | 29708 ns | 0.99 |
Dense(128 => 128, identity)(128 x 128)/forward/CPU/1 thread(s) | 46125 ns | 46250 ns | 1.00 |
Dense(128 => 128, identity)(128 x 128)/forward/GPU/CUDA | 25993 ns | 25756.5 ns | 1.01 |
Dense(128 => 128, identity)(128 x 128)/zygote/CPU/2 thread(s) | 226000 ns | 207687.5 ns | 1.09 |
Dense(128 => 128, identity)(128 x 128)/zygote/CPU/4 thread(s) | 260354 ns | 259000 ns | 1.01 |
Dense(128 => 128, identity)(128 x 128)/zygote/CPU/8 thread(s) | 4296459 ns | 4070042 ns | 1.06 |
Dense(128 => 128, identity)(128 x 128)/zygote/CPU/1 thread(s) | 148208 ns | 147583 ns | 1.00 |
Dense(128 => 128, identity)(128 x 128)/zygote/GPU/CUDA | 222796 ns | 223667.5 ns | 1.00 |
Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/2 thread(s) | 327500 ns | 309125 ns | 1.06 |
Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/4 thread(s) | 292812.5 ns | 289833 ns | 1.01 |
Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/8 thread(s) | 816083 ns | 766104 ns | 1.07 |
Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/1 thread(s) | 162250 ns | 161834 ns | 1.00 |
Dense(16 => 16, identity)(16 x 128)/forward/CPU/2 thread(s) | 2042 ns | 2000 ns | 1.02 |
Dense(16 => 16, identity)(16 x 128)/forward/CPU/4 thread(s) | 2000 ns | 2292 ns | 0.87 |
Dense(16 => 16, identity)(16 x 128)/forward/CPU/8 thread(s) | 2584 ns | 4416 ns | 0.59 |
Dense(16 => 16, identity)(16 x 128)/forward/CPU/1 thread(s) | 2167 ns | 2083 ns | 1.04 |
Dense(16 => 16, identity)(16 x 128)/forward/GPU/CUDA | 23103 ns | 22925 ns | 1.01 |
Dense(16 => 16, identity)(16 x 128)/zygote/CPU/2 thread(s) | 7125 ns | 7604.5 ns | 0.94 |
Dense(16 => 16, identity)(16 x 128)/zygote/CPU/4 thread(s) | 7250 ns | 7208 ns | 1.01 |
Dense(16 => 16, identity)(16 x 128)/zygote/CPU/8 thread(s) | 7750 ns | 7542 ns | 1.03 |
Dense(16 => 16, identity)(16 x 128)/zygote/CPU/1 thread(s) | 7667 ns | 7333 ns | 1.05 |
Dense(16 => 16, identity)(16 x 128)/zygote/GPU/CUDA | 292532.5 ns | 270317 ns | 1.08 |
Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/2 thread(s) | 11375 ns | 11541 ns | 0.99 |
Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/4 thread(s) | 11375 ns | 11542 ns | 0.99 |
Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/8 thread(s) | 11666 ns | 11542 ns | 1.01 |
Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/1 thread(s) | 7084 ns | 7125 ns | 0.99 |
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) | 79948625 ns | 79878271 ns | 1.00 |
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) | 49022000 ns | 47895875 ns | 1.02 |
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) | 44937249.5 ns | 44952396.5 ns | 1.00 |
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) | 151500083 ns | 151396791 ns | 1.00 |
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/GPU/CUDA | 2720047 ns | 2712095.5 ns | 1.00 |
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) | 604334625 ns | 498390209 ns | 1.21 |
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) | 411604000 ns | 410182750 ns | 1.00 |
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) | 399438334 ns | 398143833 ns | 1.00 |
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) | 693648750 ns | 683908500 ns | 1.01 |
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/GPU/CUDA | 14586033 ns | 14599220 ns | 1.00 |
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) | 712920291.5 ns | 686490667 ns | 1.04 |
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) | 669190583 ns | 660533166 ns | 1.01 |
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) | 983677708 ns | 1012950542 ns | 0.97 |
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) | 998398000 ns | 997113125 ns | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #856      +/-   ##
==========================================
- Coverage   93.77%   91.53%   -2.24%
==========================================
  Files          59       59
  Lines        2954     2964      +10
==========================================
- Hits         2770     2713      -57
- Misses        184      251      +67

☔ View full report in Codecov by Sentry.
These layers are not being removed; we are simply moving them to Boltz.jl.
Needs LuxDL/Boltz.jl#48. The symbolic optimal control tutorial has been moved to Boltz.