use simd masking for amd64&arm64 #326

Merged
merged 26 commits into from
Feb 22, 2024

Conversation

wdvxdr1123
Contributor

goos: windows
goarch: amd64
pkg: nhooyr.io/websocket
cpu: Intel(R) Core(TM) i5-9300H CPU @ 2.40GHz
Benchmark_mask/2/basic-8 425339004 2.795 ns/op 715.66 MB/s
Benchmark_mask/2/nhooyr-8 379937766 3.186 ns/op 627.78 MB/s
Benchmark_mask/2/gorilla-8 392164167 3.071 ns/op 651.24 MB/s
Benchmark_mask/2/gobwas-8 310037222 3.880 ns/op 515.46 MB/s
Benchmark_mask/3/basic-8 321408024 3.806 ns/op 788.32 MB/s
Benchmark_mask/3/nhooyr-8 350726338 3.478 ns/op 862.58 MB/s
Benchmark_mask/3/gorilla-8 332217727 3.634 ns/op 825.43 MB/s
Benchmark_mask/3/gobwas-8 247376214 4.886 ns/op 614.01 MB/s
Benchmark_mask/4/basic-8 261182472 4.582 ns/op 872.91 MB/s
Benchmark_mask/4/nhooyr-8 381830712 3.262 ns/op 1226.05 MB/s
Benchmark_mask/4/gorilla-8 272616304 4.395 ns/op 910.04 MB/s
Benchmark_mask/4/gobwas-8 204574558 5.855 ns/op 683.19 MB/s
Benchmark_mask/8/basic-8 191330037 6.162 ns/op 1298.24 MB/s
Benchmark_mask/8/nhooyr-8 369694992 3.285 ns/op 2435.65 MB/s
Benchmark_mask/8/gorilla-8 175388466 6.743 ns/op 1186.48 MB/s
Benchmark_mask/8/gobwas-8 241719933 4.886 ns/op 1637.45 MB/s
Benchmark_mask/16/basic-8 100000000 10.92 ns/op 1464.83 MB/s
Benchmark_mask/16/nhooyr-8 272565096 4.436 ns/op 3606.98 MB/s
Benchmark_mask/16/gorilla-8 100000000 11.20 ns/op 1428.53 MB/s
Benchmark_mask/16/gobwas-8 221356798 5.405 ns/op 2960.45 MB/s
Benchmark_mask/32/basic-8 61476984 20.40 ns/op 1568.80 MB/s
Benchmark_mask/32/nhooyr-8 238665572 5.050 ns/op 6337.22 MB/s
Benchmark_mask/32/gorilla-8 100000000 12.09 ns/op 2647.28 MB/s
Benchmark_mask/32/gobwas-8 186077235 6.477 ns/op 4940.36 MB/s
Benchmark_mask/128/basic-8 14629720 80.90 ns/op 1582.19 MB/s
Benchmark_mask/128/nhooyr-8 181241968 6.565 ns/op 19497.98 MB/s
Benchmark_mask/128/gorilla-8 68308342 16.76 ns/op 7639.37 MB/s
Benchmark_mask/128/gobwas-8 94582026 12.97 ns/op 9872.11 MB/s
Benchmark_mask/512/basic-8 3921001 305.6 ns/op 1675.55 MB/s
Benchmark_mask/512/nhooyr-8 123102199 9.721 ns/op 52669.11 MB/s
Benchmark_mask/512/gorilla-8 32355914 38.18 ns/op 13411.43 MB/s
Benchmark_mask/512/gobwas-8 31528501 37.80 ns/op 13544.37 MB/s
Benchmark_mask/4096/basic-8 491804 2381 ns/op 1720.39 MB/s
Benchmark_mask/4096/nhooyr-8 26159691 46.98 ns/op 87187.73 MB/s
Benchmark_mask/4096/gorilla-8 4898440 243.6 ns/op 16817.89 MB/s
Benchmark_mask/4096/gobwas-8 4336398 277.2 ns/op 14776.40 MB/s
Benchmark_mask/16384/basic-8 113842 9623 ns/op 1702.66 MB/s
Benchmark_mask/16384/nhooyr-8 8088847 154.5 ns/op 106058.18 MB/s
Benchmark_mask/16384/gorilla-8 1282993 933.6 ns/op 17549.90 MB/s
Benchmark_mask/16384/gobwas-8 997347 1086 ns/op 15093.49 MB/s

@wdvxdr1123 wdvxdr1123 requested a review from nhooyr as a code owner January 24, 2022 11:26
@wdvxdr1123 wdvxdr1123 changed the title use simd mask for amd64&arm64 use simd masking for amd64&arm64 Jan 24, 2022
@nhooyr nhooyr changed the base branch from master to dev October 13, 2023 09:12
@nhooyr nhooyr added this to the v1.9.0 milestone Oct 13, 2023
@nhooyr nhooyr force-pushed the dev branch 8 times, most recently from e6fb843 to 0caa997 Compare October 19, 2023 11:01
@nhooyr
Contributor

nhooyr commented Oct 19, 2023

Finally got around to reviewing this. I'm not very familiar with writing assembly of any kind. Why use AVX2 instead of AVX-512?

@nhooyr
Contributor

nhooyr commented Oct 19, 2023

Also don't worry about the merge conflicts, I'll fix them myself.

@nhooyr
Contributor

nhooyr commented Oct 19, 2023

Benchmark_mask/2/basic-12           631384161            1.883 ns/op    1061.88 MB/s           0 B/op          0 allocs/op
Benchmark_mask/2/nhooyr-12          591894866            2.061 ns/op     970.52 MB/s           0 B/op          0 allocs/op
Benchmark_mask/2/gorilla-12         657205106            1.923 ns/op    1040.00 MB/s           0 B/op          0 allocs/op
Benchmark_mask/2/gobwas-12          496567813            2.496 ns/op     801.34 MB/s           0 B/op          0 allocs/op
Benchmark_mask/3/basic-12           592897168            1.992 ns/op    1506.14 MB/s           0 B/op          0 allocs/op
Benchmark_mask/3/nhooyr-12          507159836            2.197 ns/op    1365.80 MB/s           0 B/op          0 allocs/op
Benchmark_mask/3/gorilla-12         553840022            2.304 ns/op    1302.28 MB/s           0 B/op          0 allocs/op
Benchmark_mask/3/gobwas-12          397366413            2.800 ns/op    1071.31 MB/s           0 B/op          0 allocs/op
Benchmark_mask/4/basic-12           634193241            1.807 ns/op    2213.23 MB/s           0 B/op          0 allocs/op
Benchmark_mask/4/nhooyr-12          569515338            2.002 ns/op    1998.05 MB/s           0 B/op          0 allocs/op
Benchmark_mask/4/gorilla-12         451382727            2.599 ns/op    1538.81 MB/s           0 B/op          0 allocs/op
Benchmark_mask/4/gobwas-12          356507592            3.312 ns/op    1207.75 MB/s           0 B/op          0 allocs/op
Benchmark_mask/8/basic-12           405458120            2.981 ns/op    2683.23 MB/s           0 B/op          0 allocs/op
Benchmark_mask/8/nhooyr-12          586096395            2.124 ns/op    3765.62 MB/s           0 B/op          0 allocs/op
Benchmark_mask/8/gorilla-12         296482132            4.003 ns/op    1998.59 MB/s           0 B/op          0 allocs/op
Benchmark_mask/8/gobwas-12          358996738            3.317 ns/op    2411.46 MB/s           0 B/op          0 allocs/op
Benchmark_mask/16/basic-12          199646600            5.828 ns/op    2745.57 MB/s           0 B/op          0 allocs/op
Benchmark_mask/16/nhooyr-12         482739769            2.494 ns/op    6416.64 MB/s           0 B/op          0 allocs/op
Benchmark_mask/16/gorilla-12        166567765            7.225 ns/op    2214.41 MB/s           0 B/op          0 allocs/op
Benchmark_mask/16/gobwas-12         297547316            3.989 ns/op    4011.07 MB/s           0 B/op          0 allocs/op
Benchmark_mask/32/basic-12          66204484            18.72 ns/op 1709.47 MB/s           0 B/op          0 allocs/op
Benchmark_mask/32/nhooyr-12         444971588            2.557 ns/op    12516.90 MB/s          0 B/op          0 allocs/op
Benchmark_mask/32/gorilla-12        153725197            7.672 ns/op    4171.01 MB/s           0 B/op          0 allocs/op
Benchmark_mask/32/gobwas-12         221328512            5.407 ns/op    5918.17 MB/s           0 B/op          0 allocs/op
Benchmark_mask/128/basic-12         21106347            58.03 ns/op 2205.73 MB/s           0 B/op          0 allocs/op
Benchmark_mask/128/nhooyr-12        329196819            3.777 ns/op    33893.45 MB/s          0 B/op          0 allocs/op
Benchmark_mask/128/gorilla-12       100000000           11.08 ns/op 11552.46 MB/s          0 B/op          0 allocs/op
Benchmark_mask/128/gobwas-12        82296996            14.98 ns/op 8546.19 MB/s           0 B/op          0 allocs/op
Benchmark_mask/512/basic-12          5925668           208.8 ns/op  2451.84 MB/s           0 B/op          0 allocs/op
Benchmark_mask/512/nhooyr-12        11774136           101.9 ns/op  5023.62 MB/s           0 B/op          0 allocs/op
Benchmark_mask/512/gorilla-12       43038144            26.93 ns/op 19014.42 MB/s          0 B/op          0 allocs/op
Benchmark_mask/512/gobwas-12        23169214            55.74 ns/op 9184.92 MB/s           0 B/op          0 allocs/op
Benchmark_mask/4096/basic-12          795450          1445 ns/op    2835.39 MB/s           0 B/op          0 allocs/op
Benchmark_mask/4096/nhooyr-12        9641613           124.3 ns/op  32940.03 MB/s          0 B/op          0 allocs/op
Benchmark_mask/4096/gorilla-12       8906532           139.6 ns/op  29346.43 MB/s          0 B/op          0 allocs/op
Benchmark_mask/4096/gobwas-12        2789071           424.5 ns/op  9648.84 MB/s           0 B/op          0 allocs/op
Benchmark_mask/16384/basic-12         219685          5795 ns/op    2827.23 MB/s           0 B/op          0 allocs/op
Benchmark_mask/16384/nhooyr-12       6135582           196.3 ns/op  83454.70 MB/s          0 B/op          0 allocs/op
Benchmark_mask/16384/gorilla-12      2377486           516.0 ns/op  31752.39 MB/s          0 B/op          0 allocs/op
Benchmark_mask/16384/gobwas-12        723357          1557 ns/op    10523.07 MB/s          0 B/op          0 allocs/op
PASS
ok      nhooyr.io/websocket/internal/thirdparty 58.195s

For some reason the nhooyr implementation slows down at the 512-byte benchmark (101.9 ns/op, versus 3.777 ns/op at 128 bytes). Not sure what's going on there.

@nhooyr
Contributor

nhooyr commented Oct 19, 2023

More clearly:

Benchmark_mask/2/nhooyr-12          590403414            2.028 ns/op     986.19 MB/s           0 B/op          0 allocs/op
Benchmark_mask/3/nhooyr-12          584087539            2.063 ns/op    1453.96 MB/s           0 B/op          0 allocs/op
Benchmark_mask/4/nhooyr-12          655971961            1.839 ns/op    2175.33 MB/s           0 B/op          0 allocs/op
Benchmark_mask/8/nhooyr-12          642215430            1.905 ns/op    4199.37 MB/s           0 B/op          0 allocs/op
Benchmark_mask/16/nhooyr-12         485812323            2.301 ns/op    6954.78 MB/s           0 B/op          0 allocs/op
Benchmark_mask/32/nhooyr-12         501743362            2.351 ns/op    13608.66 MB/s          0 B/op          0 allocs/op
Benchmark_mask/128/nhooyr-12        334930033            3.648 ns/op    35090.20 MB/s          0 B/op          0 allocs/op
Benchmark_mask/512/nhooyr-12        51036463            99.33 ns/op 5154.74 MB/s           0 B/op          0 allocs/op
Benchmark_mask/4096/nhooyr-12       11011562           121.7 ns/op  33663.04 MB/s          0 B/op          0 allocs/op
Benchmark_mask/16384/nhooyr-12       6010369           197.6 ns/op  82904.02 MB/s          0 B/op          0 allocs/op

Super weird.

@nhooyr
Contributor

nhooyr commented Oct 19, 2023

Disabling AVX2 seems to have fixed it.

Benchmark_mask/2/nhooyr-12          542097008            2.197 ns/op     910.42 MB/s           0 B/op          0 allocs/op
Benchmark_mask/3/nhooyr-12          537046092            2.258 ns/op    1328.35 MB/s           0 B/op          0 allocs/op
Benchmark_mask/4/nhooyr-12          516057957            1.957 ns/op    2044.01 MB/s           0 B/op          0 allocs/op
Benchmark_mask/8/nhooyr-12          566813392            2.027 ns/op    3946.05 MB/s           0 B/op          0 allocs/op
Benchmark_mask/16/nhooyr-12         456252357            2.465 ns/op    6491.72 MB/s           0 B/op          0 allocs/op
Benchmark_mask/32/nhooyr-12         477971746            2.697 ns/op    11862.99 MB/s          0 B/op          0 allocs/op
Benchmark_mask/128/nhooyr-12        323935191            3.760 ns/op    34040.58 MB/s          0 B/op          0 allocs/op
Benchmark_mask/512/nhooyr-12        131543775            8.955 ns/op    57174.80 MB/s          0 B/op          0 allocs/op
Benchmark_mask/4096/nhooyr-12       23514272            46.50 ns/op 88092.14 MB/s          0 B/op          0 allocs/op
Benchmark_mask/16384/nhooyr-12       6336271           181.9 ns/op  90069.97 MB/s          0 B/op          0 allocs/op

@nhooyr nhooyr force-pushed the patch-simd-mask branch 6 times, most recently from 1e8bf28 to 32d0aa1 Compare October 19, 2023 23:40
@nhooyr
Contributor

nhooyr commented Oct 20, 2023

The amd64 code looks good to me so far, but the arm64 code doesn't seem to produce any speedup, at least when run through QEMU.

goos: linux
goarch: amd64
pkg: nhooyr.io/websocket
cpu: 12th Gen Intel(R) Core(TM) i5-1235U
BenchmarkFlateWriter-12    	    3722	    326920 ns/op	 1200024 B/op	      16 allocs/op
BenchmarkFlateReader-12    	  169479	      6926 ns/op	   41047 B/op	       6 allocs/op
BenchmarkConn/disabledCompress-12         	   84481	     12720 ns/op	  40.25 MB/s	       518.0 read/op	       520.0 written/op	       1 B/op	       0 allocs/op
BenchmarkConn/compressContextTakeover-12  	   32448	     33822 ns/op	  15.14 MB/s	        24.00 read/op	        36.00 written/op	      42 B/op	       0 allocs/op
BenchmarkConn/compressNoContext-12        	   38430	     29966 ns/op	  17.09 MB/s	        41.00 read/op	        29.00 written/op	      96 B/op	       0 allocs/op
PASS
ok  	nhooyr.io/websocket	6.819s
goos: linux
goarch: amd64
pkg: nhooyr.io/websocket/internal/thirdparty
cpu: 12th Gen Intel(R) Core(TM) i5-1235U
Benchmark_mask/amd64/basic/8-12 	425723130	         2.780 ns/op	2877.27 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/basic/16-12         	224227551	         5.293 ns/op	3022.94 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/basic/32-12         	100000000	        10.19 ns/op	3139.45 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/basic/128-12        	24135116	        46.41 ns/op	2757.74 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/basic/256-12        	12339093	        85.20 ns/op	3004.60 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/basic/512-12        	 7325516	       163.8 ns/op	3125.51 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/basic/1024-12       	 3657289	       320.5 ns/op	3194.87 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/basic/2048-12       	 1887517	       638.8 ns/op	3206.18 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/basic/4096-12       	  934762	      1264 ns/op	3241.70 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/basic/8192-12       	  395722	      2598 ns/op	3153.37 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/basic/16384-12      	  236943	      5162 ns/op	3173.86 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/nhooyr-go/8-12      	505864449	         2.316 ns/op	3454.92 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/nhooyr-go/16-12     	500031924	         2.375 ns/op	6737.54 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/nhooyr-go/32-12     	451944298	         2.574 ns/op	12429.91 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/nhooyr-go/128-12    	306800580	         3.938 ns/op	32506.67 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/nhooyr-go/256-12    	197035516	         6.612 ns/op	38717.64 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/nhooyr-go/512-12    	114783908	        10.59 ns/op	48332.85 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/nhooyr-go/1024-12   	59498761	        19.20 ns/op	53328.93 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/nhooyr-go/2048-12   	31537369	        36.59 ns/op	55970.07 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/nhooyr-go/4096-12   	15516426	        77.49 ns/op	52861.24 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/nhooyr-go/8192-12   	 8057901	       150.7 ns/op	54358.50 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/nhooyr-go/16384-12  	 4023576	       294.3 ns/op	55666.10 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/wdvxdr1123-asm/8-12 	498550161	         2.298 ns/op	3481.43 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/wdvxdr1123-asm/16-12         	508013607	         2.505 ns/op	6387.00 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/wdvxdr1123-asm/32-12         	475446944	         2.687 ns/op	11909.62 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/wdvxdr1123-asm/128-12        	347085175	         3.462 ns/op	36969.76 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/wdvxdr1123-asm/256-12        	239742297	         5.094 ns/op	50253.25 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/wdvxdr1123-asm/512-12        	132367032	         9.429 ns/op	54300.89 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/wdvxdr1123-asm/1024-12       	59876775	        17.24 ns/op	59387.88 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/wdvxdr1123-asm/2048-12       	43464296	        28.10 ns/op	72877.63 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/wdvxdr1123-asm/4096-12       	25988770	        51.22 ns/op	79973.77 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/wdvxdr1123-asm/8192-12       	11870416	        97.20 ns/op	84279.05 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/wdvxdr1123-asm/16384-12      	 6374655	       196.1 ns/op	83555.75 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/gorilla/8-12                 	307082148	         4.199 ns/op	1905.10 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/gorilla/16-12                	166534495	         7.258 ns/op	2204.54 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/gorilla/32-12                	157286900	         7.638 ns/op	4189.59 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/gorilla/128-12               	121178448	        10.14 ns/op	12620.60 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/gorilla/256-12               	88366356	        13.62 ns/op	18791.55 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/gorilla/512-12               	40303383	        26.69 ns/op	19181.52 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/gorilla/1024-12              	28564507	        41.38 ns/op	24744.85 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/gorilla/2048-12              	14325160	        72.32 ns/op	28317.53 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/gorilla/4096-12              	 8834644	       130.5 ns/op	31378.79 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/gorilla/8192-12              	 4661844	       249.3 ns/op	32856.93 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/gorilla/16384-12             	 2452156	       491.8 ns/op	33317.08 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/gobwas/8-12                  	372520472	         3.229 ns/op	2477.79 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/gobwas/16-12                 	303515722	         3.914 ns/op	4088.10 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/gobwas/32-12                 	215681712	         5.353 ns/op	5977.97 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/gobwas/128-12                	82971432	        15.39 ns/op	8319.67 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/gobwas/256-12                	43254800	        30.40 ns/op	8420.77 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/gobwas/512-12                	20618145	        58.86 ns/op	8698.44 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/gobwas/1024-12               	11872770	       108.3 ns/op	9453.73 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/gobwas/2048-12               	 6433407	       207.7 ns/op	9860.23 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/gobwas/4096-12               	 3156878	       403.0 ns/op	10162.75 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/gobwas/8192-12               	 1622864	       745.8 ns/op	10984.28 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/gobwas/16384-12              	  820447	      1490 ns/op	10997.96 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/nbio/8-12                    	585874134	         2.147 ns/op	3726.24 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/nbio/16-12                   	475160053	         2.394 ns/op	6684.32 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/nbio/32-12                   	356494118	         3.316 ns/op	9650.56 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/nbio/128-12                  	269125159	         4.106 ns/op	31177.06 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/nbio/256-12                  	150355809	         7.474 ns/op	34249.82 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/nbio/512-12                  	72345751	        14.25 ns/op	35929.65 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/nbio/1024-12                 	41781184	        24.17 ns/op	42371.22 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/nbio/2048-12                 	26343178	        45.28 ns/op	45225.24 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/nbio/4096-12                 	13897591	        94.29 ns/op	43440.45 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/nbio/8192-12                 	 6824702	       185.3 ns/op	44204.41 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/nbio/16384-12                	 3472126	       368.5 ns/op	44459.74 MB/s	       0 B/op	       0 allocs/op
PASS
ok  	nhooyr.io/websocket/internal/thirdparty	102.835s
goos: linux
goarch: arm64
pkg: nhooyr.io/websocket/internal/thirdparty
cpu: 12th Gen Intel(R) Core(TM) i5-1235U @ 1364.583MHz
Benchmark_mask/arm64/basic/8-12 	47771958	        26.59 ns/op	 300.86 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/arm64/basic/16-12         	24547660	        52.69 ns/op	 303.64 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/arm64/basic/32-12         	12533614	        92.10 ns/op	 347.46 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/arm64/basic/128-12        	 3555813	       346.9 ns/op	 368.94 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/arm64/basic/256-12        	 1811830	       673.4 ns/op	 380.14 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/arm64/basic/512-12        	  938022	      1335 ns/op	 383.53 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/arm64/basic/1024-12       	  484177	      2479 ns/op	 413.13 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/arm64/basic/2048-12       	  211894	      5014 ns/op	 408.45 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/arm64/basic/4096-12       	  112736	     10130 ns/op	 404.35 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/arm64/basic/8192-12       	   61010	     21183 ns/op	 386.72 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/arm64/basic/16384-12      	   31218	     39141 ns/op	 418.59 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/arm64/nhooyr-go/8-12      	39843982	        28.80 ns/op	 277.80 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/arm64/nhooyr-go/16-12     	34930447	        29.61 ns/op	 540.32 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/arm64/nhooyr-go/32-12     	32931360	        33.07 ns/op	 967.69 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/arm64/nhooyr-go/128-12    	32877277	        42.30 ns/op	3025.92 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/arm64/nhooyr-go/256-12    	21600469	        60.31 ns/op	4244.99 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/arm64/nhooyr-go/512-12    	14673056	        94.28 ns/op	5430.72 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/arm64/nhooyr-go/1024-12   	 8250734	       163.7 ns/op	6256.35 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/arm64/nhooyr-go/2048-12   	 3977023	       301.1 ns/op	6802.66 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/arm64/nhooyr-go/4096-12   	 2260831	       578.0 ns/op	7086.45 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/arm64/nhooyr-go/8192-12   	 1121847	      1079 ns/op	7594.77 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/arm64/nhooyr-go/16384-12  	  508933	      2095 ns/op	7819.85 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/arm64/wdvxdr1123-asm/8-12 	34301584	        36.89 ns/op	 216.87 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/arm64/wdvxdr1123-asm/16-12         	33929019	        37.52 ns/op	 426.46 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/arm64/wdvxdr1123-asm/32-12         	31671778	        41.70 ns/op	 767.31 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/arm64/wdvxdr1123-asm/128-12        	25115096	        53.61 ns/op	2387.78 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/arm64/wdvxdr1123-asm/256-12        	17948512	        63.43 ns/op	4036.25 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/arm64/wdvxdr1123-asm/512-12        	12472801	       104.4 ns/op	4902.55 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/arm64/wdvxdr1123-asm/1024-12       	 7425166	       161.7 ns/op	6334.35 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/arm64/wdvxdr1123-asm/2048-12       	 3981708	       292.6 ns/op	6998.52 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/arm64/wdvxdr1123-asm/4096-12       	 2086530	       563.9 ns/op	7264.25 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/arm64/wdvxdr1123-asm/8192-12       	 1070166	      1114 ns/op	7355.53 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/arm64/wdvxdr1123-asm/16384-12      	  504093	      2159 ns/op	7588.84 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/arm64/gorilla/8-12                 	27462318	        46.20 ns/op	 173.14 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/arm64/gorilla/16-12                	23176634	        49.10 ns/op	 325.85 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/arm64/gorilla/32-12                	22810416	        58.54 ns/op	 546.62 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/arm64/gorilla/128-12               	12784365	        87.69 ns/op	1459.75 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/arm64/gorilla/256-12               	 8819766	       142.4 ns/op	1797.13 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/arm64/gorilla/512-12               	 5834811	       225.9 ns/op	2266.71 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/arm64/gorilla/1024-12              	 3309975	       369.7 ns/op	2769.72 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/arm64/gorilla/2048-12              	 1758891	       763.6 ns/op	2682.02 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/arm64/gorilla/4096-12              	  742028	      1404 ns/op	2917.37 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/arm64/gorilla/8192-12              	  489636	      2739 ns/op	2990.70 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/arm64/gorilla/16384-12             	  236709	      5086 ns/op	3221.13 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/arm64/gobwas/8-12                  	31763971	        34.14 ns/op	 234.35 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/arm64/gobwas/16-12                 	28280493	        41.83 ns/op	 382.47 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/arm64/gobwas/32-12                 	23041581	        52.92 ns/op	 604.73 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/arm64/gobwas/128-12                	10903680	       115.3 ns/op	1110.58 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/arm64/gobwas/256-12                	 6139404	       202.1 ns/op	1266.59 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/arm64/gobwas/512-12                	 3639919	       339.2 ns/op	1509.60 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/arm64/gobwas/1024-12               	 1897648	       680.3 ns/op	1505.26 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/arm64/gobwas/2048-12               	  958771	      1223 ns/op	1674.76 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/arm64/gobwas/4096-12               	  520082	      2581 ns/op	1586.94 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/arm64/gobwas/8192-12               	  243410	      4994 ns/op	1640.52 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/arm64/gobwas/16384-12              	  129097	      9468 ns/op	1730.40 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/arm64/nbio/8-12                    	41615394	        27.52 ns/op	 290.75 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/arm64/nbio/16-12                   	38795175	        31.95 ns/op	 500.84 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/arm64/nbio/32-12                   	35392299	        36.75 ns/op	 870.68 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/arm64/nbio/128-12                  	31278990	        39.73 ns/op	3221.36 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/arm64/nbio/256-12                  	20779035	        59.31 ns/op	4316.11 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/arm64/nbio/512-12                  	12213514	        99.53 ns/op	5144.00 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/arm64/nbio/1024-12                 	 7523419	       161.6 ns/op	6335.00 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/arm64/nbio/2048-12                 	 3721555	       330.7 ns/op	6192.52 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/arm64/nbio/4096-12                 	 1884742	       612.8 ns/op	6683.56 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/arm64/nbio/8192-12                 	 1000591	      1199 ns/op	6834.55 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/arm64/nbio/16384-12                	  512989	      2263 ns/op	7238.41 MB/s	       0 B/op	       0 allocs/op
PASS

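In fact, the arm64 assembly is slower than the pure Go version at small sizes (36.89 ns/op vs. 28.80 ns/op at 8 bytes) and only roughly on par at larger ones. Not sure what's going on.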
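The `nhooyr-go` rows in these results are a pure Go implementation. A common way to speed up masking without assembly is to XOR a machine word at a time instead of a byte at a time. The sketch below illustrates that general idea only; it is not the code benchmarked here, and `maskWords` is a hypothetical name.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// maskWords XORs b with the repeating 4-byte key, 8 bytes per iteration,
// then finishes any tail one byte at a time. It always starts at key[0].
// (Hypothetical helper for illustration only.)
func maskWords(key [4]byte, b []byte) {
	// Repeat the 4-byte key twice to form a 64-bit word. Little-endian
	// loads and stores keep key[0] lined up with the first byte of each
	// 8-byte group on any host.
	k32 := binary.LittleEndian.Uint32(key[:])
	k64 := uint64(k32)<<32 | uint64(k32)

	i := 0
	for ; i+8 <= len(b); i += 8 {
		v := binary.LittleEndian.Uint64(b[i:])
		binary.LittleEndian.PutUint64(b[i:], v^k64)
	}
	// i is a multiple of 8 here, so i&3 still gives the right key byte.
	for ; i < len(b); i++ {
		b[i] ^= key[i&3]
	}
}

func main() {
	key := [4]byte{0x37, 0xfa, 0x21, 0x3d}
	payload := []byte("Hello, WebSocket!")

	maskWords(key, payload)
	fmt.Printf("masked:   % x\n", payload)

	// Applying the same key again restores the original payload.
	maskWords(key, payload)
	fmt.Printf("unmasked: %s\n", payload)
}
```

A SIMD version follows the same shape with wider registers, so whether it wins depends on how well the host (or emulator) executes those wide loads and stores.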

@nhooyr
Contributor

nhooyr commented Oct 20, 2023

Will test on a proper VM too.

@nhooyr nhooyr force-pushed the patch-simd-mask branch 2 times, most recently from 7d0c6f4 to 9f298ec Compare October 20, 2023 14:29
nhooyr added 12 commits October 25, 2023 17:41
json.Encoder is 42% faster than json.Marshal thanks to the memory reuse.

goos: linux
goarch: amd64
pkg: nhooyr.io/websocket/wsjson
cpu: 12th Gen Intel(R) Core(TM) i5-1235U
BenchmarkJSON/json.Encoder-12            3517579           340.2 ns/op        24 B/op          1 allocs/op
BenchmarkJSON/json.Marshal-12            2374086           484.3 ns/op       728 B/op          2 allocs/op

Closes coder#409
[qrvnl@dios ~/src/websocket] 130$ go test -bench=. ./wsjson/
goos: linux
goarch: amd64
pkg: nhooyr.io/websocket/wsjson
cpu: 12th Gen Intel(R) Core(TM) i5-1235U
BenchmarkJSON/json.Encoder/8-12         14041426            72.59 ns/op  110.21 MB/s          16 B/op          1 allocs/op
BenchmarkJSON/json.Encoder/16-12        13936426            86.99 ns/op  183.92 MB/s          16 B/op          1 allocs/op
BenchmarkJSON/json.Encoder/32-12        11416401           115.3 ns/op   277.59 MB/s          16 B/op          1 allocs/op
BenchmarkJSON/json.Encoder/128-12        4600574           264.7 ns/op   483.55 MB/s          16 B/op          1 allocs/op
BenchmarkJSON/json.Encoder/256-12        2710398           433.9 ns/op   590.06 MB/s          16 B/op          1 allocs/op
BenchmarkJSON/json.Encoder/512-12        1588930           717.3 ns/op   713.82 MB/s          16 B/op          1 allocs/op
BenchmarkJSON/json.Encoder/1024-12        823138          1484 ns/op     689.80 MB/s          16 B/op          1 allocs/op
BenchmarkJSON/json.Encoder/2048-12        402823          2875 ns/op     712.32 MB/s          16 B/op          1 allocs/op
BenchmarkJSON/json.Encoder/4096-12        213926          5602 ns/op     731.14 MB/s          16 B/op          1 allocs/op
BenchmarkJSON/json.Encoder/8192-12         92864         11281 ns/op     726.19 MB/s          16 B/op          1 allocs/op
BenchmarkJSON/json.Encoder/16384-12        39318         29203 ns/op     561.04 MB/s          19 B/op          1 allocs/op
BenchmarkJSON/json.Marshal/8-12         10768671           114.5 ns/op    69.89 MB/s          48 B/op          2 allocs/op
BenchmarkJSON/json.Marshal/16-12        10140996           113.9 ns/op   140.51 MB/s          64 B/op          2 allocs/op
BenchmarkJSON/json.Marshal/32-12         9211780           121.6 ns/op   263.06 MB/s          64 B/op          2 allocs/op
BenchmarkJSON/json.Marshal/128-12        4632796           264.2 ns/op   484.53 MB/s         224 B/op          2 allocs/op
BenchmarkJSON/json.Marshal/256-12        2441511           473.5 ns/op   540.65 MB/s         432 B/op          2 allocs/op
BenchmarkJSON/json.Marshal/512-12        1298788           896.2 ns/op   571.27 MB/s         912 B/op          2 allocs/op
BenchmarkJSON/json.Marshal/1024-12        602084          1866 ns/op     548.83 MB/s        1808 B/op          2 allocs/op
BenchmarkJSON/json.Marshal/2048-12        341151          3817 ns/op     536.61 MB/s        3474 B/op          2 allocs/op
BenchmarkJSON/json.Marshal/4096-12        175594          7034 ns/op     582.32 MB/s        6548 B/op          2 allocs/op
BenchmarkJSON/json.Marshal/8192-12         83222         15023 ns/op     545.30 MB/s       13591 B/op          2 allocs/op
BenchmarkJSON/json.Marshal/16384-12        33087         39348 ns/op     416.39 MB/s       27304 B/op          2 allocs/op
PASS
ok      nhooyr.io/websocket/wsjson  32.934s
@dixyes

dixyes commented Nov 20, 2023

I guess QEMU's SIMD emulation hurts performance.

On an Aliyun (Alibaba Cloud) Yitian 710 (arm64, ARMv8) 2-core, 4 GB machine:

root@iZbp1heu8m4uq7gguvddwaZ:~/websocket# cat /proc/cpuinfo
processor       : 0
BogoMIPS        : 100.00
Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm ssbs sb dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh bti
CPU implementer : 0x41
CPU architecture: 8
CPU variant     : 0x0
CPU part        : 0xd49
CPU revision    : 0

processor       : 1
BogoMIPS        : 100.00
Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm ssbs sb dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh bti
CPU implementer : 0x41
CPU architecture: 8
CPU variant     : 0x0
CPU part        : 0xd49
CPU revision    : 0

root@iZbp1heu8m4uq7gguvddwaZ:~/websocket# uname -a
Linux iZbp1heu8m4uq7gguvddwaZ 5.10.0-19-arm64 #1 SMP Debian 5.10.149-2 (2022-10-21) aarch64 GNU/Linux
goos: linux
goarch: arm64
pkg: nhooyr.io/websocket/internal/thirdparty
Benchmark_mask/arm64/basic/8-2  206792809                5.802 ns/op    1378.89 MB/s           0 B/op          0 allocs/op
Benchmark_mask/arm64/basic/16-2                 100000000               10.02 ns/op     1596.73 MB/s           0 B/op          0 allocs/op
Benchmark_mask/arm64/basic/32-2                 58691935                20.34 ns/op     1573.17 MB/s           0 B/op          0 allocs/op
Benchmark_mask/arm64/basic/128-2                14648796                81.91 ns/op     1562.64 MB/s           0 B/op          0 allocs/op
Benchmark_mask/arm64/basic/256-2                 7302968               164.3 ns/op      1558.27 MB/s           0 B/op          0 allocs/op
Benchmark_mask/arm64/basic/512-2                 3585920               334.4 ns/op      1530.96 MB/s           0 B/op          0 allocs/op
Benchmark_mask/arm64/basic/1024-2                1807688               663.8 ns/op      1542.68 MB/s           0 B/op          0 allocs/op
Benchmark_mask/arm64/basic/2048-2                 901452              1322 ns/op        1548.69 MB/s           0 B/op          0 allocs/op
Benchmark_mask/arm64/basic/4096-2                 453880              2641 ns/op        1550.79 MB/s           0 B/op          0 allocs/op
Benchmark_mask/arm64/basic/8192-2                 227306              5273 ns/op        1553.59 MB/s           0 B/op          0 allocs/op
Benchmark_mask/arm64/basic/16384-2                113630             10536 ns/op        1555.07 MB/s           0 B/op          0 allocs/op
Benchmark_mask/arm64/nhooyr-go/8-2              372385791                3.200 ns/op    2499.82 MB/s           0 B/op          0 allocs/op
Benchmark_mask/arm64/nhooyr-go/16-2             326266168                3.677 ns/op    4351.15 MB/s           0 B/op          0 allocs/op
Benchmark_mask/arm64/nhooyr-go/32-2             326263063                3.675 ns/op    8706.64 MB/s           0 B/op          0 allocs/op
Benchmark_mask/arm64/nhooyr-go/128-2            193277991                6.178 ns/op    20717.82 MB/s          0 B/op          0 allocs/op
Benchmark_mask/arm64/nhooyr-go/256-2            120835178                9.939 ns/op    25757.71 MB/s          0 B/op          0 allocs/op
Benchmark_mask/arm64/nhooyr-go/512-2            67891269                17.58 ns/op     29120.25 MB/s          0 B/op          0 allocs/op
Benchmark_mask/arm64/nhooyr-go/1024-2           36238434                33.05 ns/op     30981.53 MB/s          0 B/op          0 allocs/op
Benchmark_mask/arm64/nhooyr-go/2048-2           18876517                63.51 ns/op     32244.43 MB/s          0 B/op          0 allocs/op
Benchmark_mask/arm64/nhooyr-go/4096-2            9632865               124.4 ns/op      32913.56 MB/s          0 B/op          0 allocs/op
Benchmark_mask/arm64/nhooyr-go/8192-2            4862270               246.5 ns/op      33239.77 MB/s          0 B/op          0 allocs/op
Benchmark_mask/arm64/nhooyr-go/16384-2           2449879               490.7 ns/op      33386.93 MB/s          0 B/op          0 allocs/op
Benchmark_mask/arm64/wdvxdr1123-asm/8-2         320587507                3.747 ns/op    2134.84 MB/s           0 B/op          0 allocs/op
Benchmark_mask/arm64/wdvxdr1123-asm/16-2                298397137                4.016 ns/op    3984.46 MB/s           0 B/op          0 allocs/op
Benchmark_mask/arm64/wdvxdr1123-asm/32-2                295286755                4.051 ns/op    7899.26 MB/s           0 B/op          0 allocs/op
Benchmark_mask/arm64/wdvxdr1123-asm/128-2               198758401                6.010 ns/op    21299.05 MB/s          0 B/op          0 allocs/op
Benchmark_mask/arm64/wdvxdr1123-asm/256-2               148294503                8.101 ns/op    31599.58 MB/s          0 B/op          0 allocs/op
Benchmark_mask/arm64/wdvxdr1123-asm/512-2               99287224                12.21 ns/op     41941.45 MB/s          0 B/op          0 allocs/op
Benchmark_mask/arm64/wdvxdr1123-asm/1024-2              59101357                20.24 ns/op     50591.08 MB/s          0 B/op          0 allocs/op
Benchmark_mask/arm64/wdvxdr1123-asm/2048-2              32870538                36.43 ns/op     56215.26 MB/s          0 B/op          0 allocs/op
Benchmark_mask/arm64/wdvxdr1123-asm/4096-2              17392502                68.75 ns/op     59578.86 MB/s          0 B/op          0 allocs/op
Benchmark_mask/arm64/wdvxdr1123-asm/8192-2               8991554               133.3 ns/op      61432.88 MB/s          0 B/op          0 allocs/op
Benchmark_mask/arm64/wdvxdr1123-asm/16384-2              4537192               264.3 ns/op      61990.60 MB/s          0 B/op          0 allocs/op
Benchmark_mask/arm64/gorilla/8-2                        166697532                7.199 ns/op    1111.26 MB/s           0 B/op          0 allocs/op
Benchmark_mask/arm64/gorilla/16-2                       95416378                12.50 ns/op     1280.35 MB/s           0 B/op          0 allocs/op
Benchmark_mask/arm64/gorilla/32-2                       99859288                12.03 ns/op     2659.82 MB/s           0 B/op          0 allocs/op
Benchmark_mask/arm64/gorilla/128-2                      74788264                15.98 ns/op     8008.48 MB/s           0 B/op          0 allocs/op
Benchmark_mask/arm64/gorilla/256-2                      49521510                24.10 ns/op     10620.54 MB/s          0 B/op          0 allocs/op
Benchmark_mask/arm64/gorilla/512-2                      30854259                38.75 ns/op     13213.30 MB/s          0 B/op          0 allocs/op
Benchmark_mask/arm64/gorilla/1024-2                     17709324                67.75 ns/op     15114.36 MB/s          0 B/op          0 allocs/op
Benchmark_mask/arm64/gorilla/2048-2                      9540504               125.6 ns/op      16301.06 MB/s          0 B/op          0 allocs/op
Benchmark_mask/arm64/gorilla/4096-2                      4887254               245.4 ns/op      16689.60 MB/s          0 B/op          0 allocs/op
Benchmark_mask/arm64/gorilla/8192-2                      2506159               477.0 ns/op      17173.59 MB/s          0 B/op          0 allocs/op
Benchmark_mask/arm64/gorilla/16384-2                     1276844               939.9 ns/op      17431.75 MB/s          0 B/op          0 allocs/op
Benchmark_mask/arm64/gobwas/8-2                         239466345                5.011 ns/op    1596.61 MB/s           0 B/op          0 allocs/op
Benchmark_mask/arm64/gobwas/16-2                        198722446                6.030 ns/op    2653.50 MB/s           0 B/op          0 allocs/op
Benchmark_mask/arm64/gobwas/32-2                        149454994                8.028 ns/op    3986.12 MB/s           0 B/op          0 allocs/op
Benchmark_mask/arm64/gobwas/128-2                       58453107                20.45 ns/op     6259.12 MB/s           0 B/op          0 allocs/op
Benchmark_mask/arm64/gobwas/256-2                       32118558                37.26 ns/op     6870.96 MB/s           0 B/op          0 allocs/op
Benchmark_mask/arm64/gobwas/512-2                       16886425                70.98 ns/op     7213.33 MB/s           0 B/op          0 allocs/op
Benchmark_mask/arm64/gobwas/1024-2                       8660222               138.4 ns/op      7396.91 MB/s           0 B/op          0 allocs/op
Benchmark_mask/arm64/gobwas/2048-2                       4389014               273.8 ns/op      7478.89 MB/s           0 B/op          0 allocs/op
Benchmark_mask/arm64/gobwas/4096-2                       2220012               540.4 ns/op      7579.69 MB/s           0 B/op          0 allocs/op
Benchmark_mask/arm64/gobwas/8192-2                       1000000              1070 ns/op        7654.83 MB/s           0 B/op          0 allocs/op
Benchmark_mask/arm64/gobwas/16384-2                       561620              2130 ns/op        7691.23 MB/s           0 B/op          0 allocs/op
Benchmark_mask/arm64/nbio/8-2                           359732443                3.339 ns/op    2395.91 MB/s           0 B/op          0 allocs/op
Benchmark_mask/arm64/nbio/16-2                          295799040                4.060 ns/op    3941.20 MB/s           0 B/op          0 allocs/op
Benchmark_mask/arm64/nbio/32-2                          222655515                5.406 ns/op    5918.87 MB/s           0 B/op          0 allocs/op
Benchmark_mask/arm64/nbio/128-2                         175895174                6.824 ns/op    18757.64 MB/s          0 B/op          0 allocs/op
Benchmark_mask/arm64/nbio/256-2                         100000000               11.33 ns/op     22586.09 MB/s          0 B/op          0 allocs/op
Benchmark_mask/arm64/nbio/512-2                         59968189                19.72 ns/op     25968.88 MB/s          0 B/op          0 allocs/op
Benchmark_mask/arm64/nbio/1024-2                        33116636                36.16 ns/op     28320.44 MB/s          0 B/op          0 allocs/op
Benchmark_mask/arm64/nbio/2048-2                        17286394                69.43 ns/op     29496.87 MB/s          0 B/op          0 allocs/op
Benchmark_mask/arm64/nbio/4096-2                         8810706               136.0 ns/op      30118.04 MB/s          0 B/op          0 allocs/op
Benchmark_mask/arm64/nbio/8192-2                         4461346               268.9 ns/op      30466.70 MB/s          0 B/op          0 allocs/op
Benchmark_mask/arm64/nbio/16384-2                        2242198               534.8 ns/op      30633.09 MB/s          0 B/op          0 allocs/op
PASS

On an Aliyun (Alibaba Cloud) Ampere Altra (arm64, ARMv8) 2-core, 4 GB machine:

root@iZbp19nzrw6iywyjtl52srZ:~# cat /proc/cpuinfo 
processor       : 0
BogoMIPS        : 50.00
Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp ssbs
CPU implementer : 0x41
CPU architecture: 8
CPU variant     : 0x3
CPU part        : 0xd0c
CPU revision    : 1

processor       : 1
BogoMIPS        : 50.00
Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp ssbs
CPU implementer : 0x41
CPU architecture: 8
CPU variant     : 0x3
CPU part        : 0xd0c
CPU revision    : 1

root@iZbp19nzrw6iywyjtl52srZ:~# uname -a
Linux iZbp19nzrw6iywyjtl52srZ 5.10.0-19-arm64 #1 SMP Debian 5.10.149-2 (2022-10-21) aarch64 GNU/Linux
goos: linux
goarch: arm64
pkg: nhooyr.io/websocket/internal/thirdparty
Benchmark_mask/arm64/basic/8-2  156192206                7.680 ns/op    1041.61 MB/s           0 B/op          0 allocs/op
Benchmark_mask/arm64/basic/16-2                 87099630                13.69 ns/op     1168.31 MB/s           0 B/op          0 allocs/op
Benchmark_mask/arm64/basic/32-2                 43625746                27.15 ns/op     1178.65 MB/s           0 B/op          0 allocs/op
Benchmark_mask/arm64/basic/128-2                11600862               103.4 ns/op      1237.93 MB/s           0 B/op          0 allocs/op
Benchmark_mask/arm64/basic/256-2                 5790669               207.2 ns/op      1235.57 MB/s           0 B/op          0 allocs/op
Benchmark_mask/arm64/basic/512-2                 2849724               421.0 ns/op      1216.19 MB/s           0 B/op          0 allocs/op
Benchmark_mask/arm64/basic/1024-2                1443289               830.9 ns/op      1232.42 MB/s           0 B/op          0 allocs/op
Benchmark_mask/arm64/basic/2048-2                 723596              1652 ns/op        1239.84 MB/s           0 B/op          0 allocs/op
Benchmark_mask/arm64/basic/4096-2                 364108              3289 ns/op        1245.26 MB/s           0 B/op          0 allocs/op
Benchmark_mask/arm64/basic/8192-2                 182422              6565 ns/op        1247.79 MB/s           0 B/op          0 allocs/op
Benchmark_mask/arm64/basic/16384-2                 91266             13126 ns/op        1248.20 MB/s           0 B/op          0 allocs/op
Benchmark_mask/arm64/nhooyr-go/8-2              179696448                6.678 ns/op    1198.02 MB/s           0 B/op          0 allocs/op
Benchmark_mask/arm64/nhooyr-go/16-2             171135552                7.011 ns/op    2282.01 MB/s           0 B/op          0 allocs/op
Benchmark_mask/arm64/nhooyr-go/32-2             163356070                7.345 ns/op    4356.99 MB/s           0 B/op          0 allocs/op
Benchmark_mask/arm64/nhooyr-go/128-2            100000000               10.21 ns/op     12531.93 MB/s          0 B/op          0 allocs/op
Benchmark_mask/arm64/nhooyr-go/256-2            73170615                16.29 ns/op     15715.26 MB/s          0 B/op          0 allocs/op
Benchmark_mask/arm64/nhooyr-go/512-2            42342985                28.30 ns/op     18091.30 MB/s          0 B/op          0 allocs/op
Benchmark_mask/arm64/nhooyr-go/1024-2           22871635                52.36 ns/op     19557.29 MB/s          0 B/op          0 allocs/op
Benchmark_mask/arm64/nhooyr-go/2048-2           11953033               100.4 ns/op      20390.69 MB/s          0 B/op          0 allocs/op
Benchmark_mask/arm64/nhooyr-go/4096-2            6098042               196.7 ns/op      20824.14 MB/s          0 B/op          0 allocs/op
Benchmark_mask/arm64/nhooyr-go/8192-2            3083127               389.3 ns/op      21045.51 MB/s          0 B/op          0 allocs/op
Benchmark_mask/arm64/nhooyr-go/16384-2           1549681               773.8 ns/op      21172.69 MB/s          0 B/op          0 allocs/op
Benchmark_mask/arm64/wdvxdr1123-asm/8-2         239631874                5.007 ns/op    1597.77 MB/s           0 B/op          0 allocs/op
Benchmark_mask/arm64/wdvxdr1123-asm/16-2                246623696                4.874 ns/op    3282.49 MB/s           0 B/op          0 allocs/op
Benchmark_mask/arm64/wdvxdr1123-asm/32-2                224660503                5.343 ns/op    5989.62 MB/s           0 B/op          0 allocs/op
Benchmark_mask/arm64/wdvxdr1123-asm/128-2               146873190                8.150 ns/op    15705.46 MB/s          0 B/op          0 allocs/op
Benchmark_mask/arm64/wdvxdr1123-asm/256-2               100000000               11.35 ns/op     22548.91 MB/s          0 B/op          0 allocs/op
Benchmark_mask/arm64/wdvxdr1123-asm/512-2               66308772                18.03 ns/op     28401.24 MB/s          0 B/op          0 allocs/op
Benchmark_mask/arm64/wdvxdr1123-asm/1024-2              38059369                31.39 ns/op     32624.65 MB/s          0 B/op          0 allocs/op
Benchmark_mask/arm64/wdvxdr1123-asm/2048-2              20609492                58.09 ns/op     35258.25 MB/s          0 B/op          0 allocs/op
Benchmark_mask/arm64/wdvxdr1123-asm/4096-2              10760130               111.5 ns/op      36737.95 MB/s          0 B/op          0 allocs/op
Benchmark_mask/arm64/wdvxdr1123-asm/8192-2               5494204               218.4 ns/op      37501.18 MB/s          0 B/op          0 allocs/op
Benchmark_mask/arm64/wdvxdr1123-asm/16384-2              2776998               432.0 ns/op      37923.91 MB/s          0 B/op          0 allocs/op
Benchmark_mask/arm64/gorilla/8-2                        126019189                9.511 ns/op     841.13 MB/s           0 B/op          0 allocs/op
Benchmark_mask/arm64/gorilla/16-2                       87176002                13.69 ns/op     1168.45 MB/s           0 B/op          0 allocs/op
Benchmark_mask/arm64/gorilla/32-2                       79482931                15.03 ns/op     2129.34 MB/s           0 B/op          0 allocs/op
Benchmark_mask/arm64/gorilla/128-2                      51963406                23.05 ns/op     5552.79 MB/s           0 B/op          0 allocs/op
Benchmark_mask/arm64/gorilla/256-2                      35389480                33.79 ns/op     7576.26 MB/s           0 B/op          0 allocs/op
Benchmark_mask/arm64/gorilla/512-2                      21703480                55.26 ns/op     9265.94 MB/s           0 B/op          0 allocs/op
Benchmark_mask/arm64/gorilla/1024-2                     12215022                98.20 ns/op     10427.73 MB/s          0 B/op          0 allocs/op
Benchmark_mask/arm64/gorilla/2048-2                      6514315               184.3 ns/op      11113.19 MB/s          0 B/op          0 allocs/op
Benchmark_mask/arm64/gorilla/4096-2                      3286785               365.1 ns/op      11219.00 MB/s          0 B/op          0 allocs/op
Benchmark_mask/arm64/gorilla/8192-2                      1691893               709.5 ns/op      11545.83 MB/s          0 B/op          0 allocs/op
Benchmark_mask/arm64/gorilla/16384-2                      855566              1397 ns/op        11726.46 MB/s          0 B/op          0 allocs/op
Benchmark_mask/arm64/gobwas/8-2                         170618257                7.011 ns/op    1141.12 MB/s           0 B/op          0 allocs/op
Benchmark_mask/arm64/gobwas/16-2                        138242857                8.621 ns/op    1855.95 MB/s           0 B/op          0 allocs/op
Benchmark_mask/arm64/gobwas/32-2                        100000000               11.73 ns/op     2729.19 MB/s           0 B/op          0 allocs/op
Benchmark_mask/arm64/gobwas/128-2                       39364594                30.25 ns/op     4230.99 MB/s           0 B/op          0 allocs/op
Benchmark_mask/arm64/gobwas/256-2                       22373491                54.43 ns/op     4703.68 MB/s           0 B/op          0 allocs/op
Benchmark_mask/arm64/gobwas/512-2                       11536062               105.1 ns/op      4873.04 MB/s           0 B/op          0 allocs/op
Benchmark_mask/arm64/gobwas/1024-2                       5862846               201.1 ns/op      5091.32 MB/s           0 B/op          0 allocs/op
Benchmark_mask/arm64/gobwas/2048-2                       2996881               397.3 ns/op      5154.43 MB/s           0 B/op          0 allocs/op
Benchmark_mask/arm64/gobwas/4096-2                       1488253               820.2 ns/op      4993.60 MB/s           0 B/op          0 allocs/op
Benchmark_mask/arm64/gobwas/8192-2                        770410              1599 ns/op        5123.95 MB/s           0 B/op          0 allocs/op
Benchmark_mask/arm64/gobwas/16384-2                       373005              3122 ns/op        5248.40 MB/s           0 B/op          0 allocs/op
Benchmark_mask/arm64/nbio/8-2                           224579071                5.341 ns/op    1497.82 MB/s           0 B/op          0 allocs/op
Benchmark_mask/arm64/nbio/16-2                          189027944                6.344 ns/op    2521.98 MB/s           0 B/op          0 allocs/op
Benchmark_mask/arm64/nbio/32-2                          143714523                8.347 ns/op    3833.80 MB/s           0 B/op          0 allocs/op
Benchmark_mask/arm64/nbio/128-2                         100000000               10.35 ns/op     12369.09 MB/s          0 B/op          0 allocs/op
Benchmark_mask/arm64/nbio/256-2                         69961245                17.03 ns/op     15031.81 MB/s          0 B/op          0 allocs/op
Benchmark_mask/arm64/nbio/512-2                         39476787                30.39 ns/op     16845.45 MB/s          0 B/op          0 allocs/op
Benchmark_mask/arm64/nbio/1024-2                        20975644                57.11 ns/op     17930.66 MB/s          0 B/op          0 allocs/op
Benchmark_mask/arm64/nbio/2048-2                        10845069               110.5 ns/op      18530.47 MB/s          0 B/op          0 allocs/op
Benchmark_mask/arm64/nbio/4096-2                         5520278               217.4 ns/op      18844.48 MB/s          0 B/op          0 allocs/op
Benchmark_mask/arm64/nbio/8192-2                         2782627               431.4 ns/op      18989.59 MB/s          0 B/op          0 allocs/op
Benchmark_mask/arm64/nbio/16384-2                        1397580               858.8 ns/op      19077.29 MB/s          0 B/op          0 allocs/op
PASS

@nhooyr
Contributor

nhooyr commented Nov 20, 2023

Right on, thanks for testing @dixyes

@nightwolfz

Finally got around to reviewing this. I'm not very familiar with writing assembly of any kind. Why use AVX2 instead of AVX-512?

AVX-512 is not widely supported, while AVX2 is everywhere.

I'm just not good enough at assembly. I added tests to confirm that @wdvxdr1123's
implementation works correctly and matches the output of the basic masking loop.
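
The kind of check described above can be sketched as follows. This is a minimal illustration and not the code from this PR: maskBasic, maskWide and agree are hypothetical names, and maskWide is a plain-Go 8-bytes-at-a-time stand-in for an optimized routine, not the assembly itself. The idea is to run the reference byte loop and the fast path over the same payload for many lengths and require identical bytes and an identical rotated key.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"math/bits"
)

// maskBasic XORs b in place with the 4-byte key, one byte at a time.
// The key is held as a little-endian uint32 and rotated after every byte,
// so the returned value is the key to continue with on the next chunk.
func maskBasic(b []byte, key uint32) uint32 {
	for i := range b {
		b[i] ^= byte(key)
		key = bits.RotateLeft32(key, -8)
	}
	return key
}

// maskWide is a hypothetical faster variant: it XORs 8 bytes per step
// using the key repeated twice, then hands the tail to maskBasic.
// Eight is a multiple of four, so the key rotation is unchanged
// when the wide loop finishes.
func maskWide(b []byte, key uint32) uint32 {
	key64 := uint64(key)<<32 | uint64(key)
	for len(b) >= 8 {
		v := binary.LittleEndian.Uint64(b)
		binary.LittleEndian.PutUint64(b, v^key64)
		b = b[8:]
	}
	return maskBasic(b, key)
}

// agree reports whether both implementations produce the same bytes and
// the same returned key for every payload length from 0 to maxLen.
func agree(maxLen int, key uint32) bool {
	for n := 0; n <= maxLen; n++ {
		want := make([]byte, n)
		for i := range want {
			want[i] = byte(i*31 + 7) // deterministic filler data
		}
		got := append([]byte(nil), want...)
		k1 := maskBasic(want, key)
		k2 := maskWide(got, key)
		if k1 != k2 || !bytes.Equal(want, got) {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println("implementations agree:", agree(256, 0x12345678))
}
```

Sweeping every length rather than only powers of two exercises the tail handling, which is where a vectorized routine is most likely to go wrong.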
nhooyr added a commit to wdvxdr1123/websocket that referenced this pull request Feb 22, 2024
The standard library does this too. Unfortunately, it isn't exposed there; I wish
they had just exported it. Perhaps we can isolate the specific code we need later.
@nhooyr
Contributor

nhooyr commented Feb 22, 2024

Final results:

goos: linux
goarch: amd64
pkg: nhooyr.io/websocket/internal/thirdparty
cpu: 12th Gen Intel(R) Core(TM) i5-1235U
Benchmark_mask/amd64/basic/8-12 	423375534	         2.786 ns/op	2871.05 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/basic/16-12         	226554633	         5.359 ns/op	2985.68 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/basic/32-12         	117482640	        10.19 ns/op	3140.90 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/basic/128-12        	26246637	        45.81 ns/op	2794.00 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/basic/256-12        	14100849	        84.95 ns/op	3013.68 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/basic/512-12        	 7287253	       165.2 ns/op	3098.76 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/basic/1024-12       	 3688262	       320.3 ns/op	3197.24 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/basic/2048-12       	 1888688	       638.6 ns/op	3207.04 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/basic/4096-12       	  939709	      1275 ns/op	3212.55 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/basic/8192-12       	  416410	      2533 ns/op	3233.74 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/basic/16384-12      	  237880	      5075 ns/op	3228.53 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/nhooyr-go/8-12      	516842565	         2.323 ns/op	3443.66 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/nhooyr-go/16-12     	512148457	         2.321 ns/op	6895.02 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/nhooyr-go/32-12     	463799696	         2.554 ns/op	12531.05 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/nhooyr-go/128-12    	305272117	         3.889 ns/op	32909.16 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/nhooyr-go/256-12    	186344584	         6.533 ns/op	39186.37 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/nhooyr-go/512-12    	98735030	        10.37 ns/op	49364.30 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/nhooyr-go/1024-12   	60532092	        20.18 ns/op	50735.99 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/nhooyr-go/2048-12   	31890501	        36.09 ns/op	56745.07 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/nhooyr-go/4096-12   	15045230	        79.13 ns/op	51760.10 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/nhooyr-go/8192-12   	 7874872	       152.5 ns/op	53720.47 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/nhooyr-go/16384-12  	 3976707	       300.0 ns/op	54621.87 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/wdvxdr1123-asm/8-12 	565721422	         2.087 ns/op	3833.34 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/wdvxdr1123-asm/16-12         	490515590	         2.396 ns/op	6678.41 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/wdvxdr1123-asm/32-12         	499705630	         2.309 ns/op	13859.26 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/wdvxdr1123-asm/128-12        	349259366	         3.673 ns/op	34851.70 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/wdvxdr1123-asm/256-12        	121710386	        10.07 ns/op	25427.13 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/wdvxdr1123-asm/512-12        	100000000	        12.00 ns/op	42654.69 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/wdvxdr1123-asm/1024-12       	68401042	        17.57 ns/op	58296.87 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/wdvxdr1123-asm/2048-12       	38861618	        28.96 ns/op	70716.39 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/wdvxdr1123-asm/4096-12       	22134694	        53.55 ns/op	76483.15 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/wdvxdr1123-asm/8192-12       	12523645	        91.32 ns/op	89702.20 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/wdvxdr1123-asm/16384-12      	 6966129	       167.6 ns/op	97768.91 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/gorilla/8-12                 	306537969	         3.908 ns/op	2047.33 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/gorilla/16-12                	167440917	         7.127 ns/op	2245.06 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/gorilla/32-12                	157346451	         7.623 ns/op	4197.75 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/gorilla/128-12               	100000000	        10.17 ns/op	12590.73 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/gorilla/256-12               	91401891	        13.36 ns/op	19161.41 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/gorilla/512-12               	43890088	        26.60 ns/op	19246.01 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/gorilla/1024-12              	26414316	        41.59 ns/op	24621.32 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/gorilla/2048-12              	16049217	        71.19 ns/op	28766.12 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/gorilla/4096-12              	 9171207	       129.4 ns/op	31658.05 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/gorilla/8192-12              	 4856886	       250.7 ns/op	32674.27 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/gorilla/16384-12             	 2488569	       485.2 ns/op	33764.34 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/gobwas/8-12                  	366741759	         3.282 ns/op	2437.84 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/gobwas/16-12                 	303639134	         3.906 ns/op	4095.90 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/gobwas/32-12                 	223418820	         5.406 ns/op	5919.31 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/gobwas/128-12                	89532153	        13.94 ns/op	9180.17 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/gobwas/256-12                	39774794	        32.82 ns/op	7799.75 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/gobwas/512-12                	21657115	        53.08 ns/op	9646.12 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/gobwas/1024-12               	11203101	        97.40 ns/op	10513.88 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/gobwas/2048-12               	 6175005	       200.9 ns/op	10194.80 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/gobwas/4096-12               	 3083400	       390.6 ns/op	10487.27 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/gobwas/8192-12               	 1551018	       714.0 ns/op	11473.42 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/gobwas/16384-12              	  847084	      1428 ns/op	11476.19 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/nbio/8-12                    	640919714	         1.895 ns/op	4220.73 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/nbio/16-12                   	523854591	         2.453 ns/op	6522.16 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/nbio/32-12                   	344619900	         3.268 ns/op	9793.04 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/nbio/128-12                  	281670219	         4.072 ns/op	31433.68 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/nbio/256-12                  	164968168	         7.219 ns/op	35463.76 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/nbio/512-12                  	82934056	        13.82 ns/op	37060.27 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/nbio/1024-12                 	48002257	        22.96 ns/op	44599.52 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/nbio/2048-12                 	29191290	        41.93 ns/op	48845.44 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/nbio/4096-12                 	14418003	        84.95 ns/op	48215.55 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/nbio/8192-12                 	 7101901	       161.0 ns/op	50892.32 MB/s	       0 B/op	       0 allocs/op
Benchmark_mask/amd64/nbio/16384-12                	 3655984	       353.4 ns/op	46365.54 MB/s	       0 B/op	       0 allocs/op
PASS
ok  	nhooyr.io/websocket/internal/thirdparty	94.759s

Thanks again @wdvxdr1123 and sorry for the large delay.
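
For anyone reading the tables above, the MB/s column follows from the payload size and ns/op, with MB meaning 10^6 bytes. A small sketch of that arithmetic is below; mbPerSec and speedup are hypothetical helper names, and the inputs are the 16384-byte amd64 rows for wdvxdr1123-asm (167.6 ns/op) and nhooyr-go (300.0 ns/op). Because the printed ns/op is rounded, the recomputed throughput only approximately matches the reported column.

```go
package main

import "fmt"

// mbPerSec converts a payload size in bytes and a time per operation in
// nanoseconds into throughput in MB/s, where 1 MB = 1e6 bytes.
// bytes per nanosecond equals GB/s, so multiplying by 1e3 gives MB/s.
func mbPerSec(size int, nsPerOp float64) float64 {
	return float64(size) / nsPerOp * 1e3
}

// speedup returns how many times faster the candidate is than the baseline,
// given both times per operation in nanoseconds.
func speedup(baselineNs, candidateNs float64) float64 {
	return baselineNs / candidateNs
}

func main() {
	// 16384-byte payload, amd64 rows from the results above.
	fmt.Printf("asm throughput: %.0f MB/s\n", mbPerSec(16384, 167.6))
	fmt.Printf("asm vs pure Go: %.2fx\n", speedup(300.0, 167.6))
}
```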

@nhooyr nhooyr merged commit 8a54c1b into coder:dev Feb 22, 2024
4 checks passed
nhooyr added a commit to alixander/websocket that referenced this pull request Apr 7, 2024

4 participants