Fboemer/faster decrypt #363

fboemer · 2021-06-28T17:53:53Z

Optimize CKKS and (to a lesser degree) BFV Decrypt. Also fixes BGV compilation error.

On ICX with clang-12:

with HEXL=ON, CKKS decrypt shows:

| N | Time before (µs) | Time after (µs) | Speedup |
|---|---|---|---|
| 1024 | 1.70 | 1.42 | 1.20x |
| 2048 | 4.09 | 3.69 | 1.10x |
| 4096 | 12.5 | 8.56 | 1.46x |
| 8192 | 76.1 | 60.4 | 1.26x |
| 16384 | 381 | 216 | 1.76x |

with HEXL=OFF, CKKS decrypt shows:

| N | Time before (µs) | Time after (µs) | Speedup |
|---|---|---|---|
| 1024 | 6.16 | 5.29 | 1.16x |
| 2048 | 11.9 | 10.1 | 1.18x |
| 4096 | 48.1 | 39.9 | 1.20x |
| 8192 | 189 | 156 | 1.21x |
| 16384 | 805 | 635 | 1.26x |

with HEXL=ON, BFV decrypt shows:

| N | Time before (µs) | Time after (µs) | Speedup |
|---|---|---|---|
| 1024 | 27.8 | 26.6 | 1.04x |
| 2048 | 62.0 | 61.6 | 1.01x |
| 4096 | 135 | 136 | 0.99x |
| 8192 | 474 | 451 | 1.05x |
| 16384 | 1692 | 1561 | 1.08x |

fionser · 2021-06-29T01:36:42Z

A quick question: why would explicitly writing out the tile = 1024 give better performance than a simple for-loop?
Or why wouldn't the compiler optimize this for us? :(
Another question: did you try running this program in parallel on the same machine?
I mean, when multiple SEAL programs run at the same time, would data still be moved in and out of the L1 cache for these different programs?

native/src/seal/decryptor.cpp

WeiDaiWD · 2021-06-29T06:54:25Z

Thanks, Fabian! We are developing version 4.0.0 based on the BGV PR from Alibaba this summer. The progress is slow, like continental drift. :) For now, I'm refraining from releasing patches unless there is a serious bug. PRs will be merged with more delay than usual. Please be patient with me.

fboemer · 2021-08-03T19:19:25Z

> A quick question: why would explicitly writing out the tile = 1024 give better performance than a simple for-loop?
> Or why wouldn't the compiler optimize this for us? :(
>
> Another question: did you try running this program in parallel on the same machine?
> I mean, when multiple SEAL programs run at the same time, would data still be moved in and out of the L1 cache for these different programs?

@fionser, apologies for the delay on this.

Interestingly, I'm seeing some varying speedup/slowdown with the tiling approach, so I've removed the tiling for now.
I've only tested single-threaded. To my knowledge, the L1 cache is private to each core, so there should be no L1 cache interference between different SEAL threads running on different cores. In general, I would expect the L2 cache could create some slowdowns due to contention.
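For readers unfamiliar with the tiling discussed above, here is a minimal sketch of the idea. It is not the code from this PR or SEAL's API: the function names, the `tile` parameter, and the simplified modular arithmetic (plain `%`, with the assumption that q < 2^31 so products fit in 64 bits) are all hypothetical. It computes out[i] = (c0[i] + c1[i] * s[i]) mod q, once with two full passes over the polynomial and once in tiles, so that each chunk of the temporary product is reused while it is likely still in L1.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Poly = std::vector<std::uint64_t>;

// Two full passes: the whole product is written to memory, then read back.
// Hypothetical helper; assumes q < 2^31 so c1[i] * s[i] fits in 64 bits.
Poly decrypt_two_pass(const Poly &c0, const Poly &c1, const Poly &s, std::uint64_t q)
{
    assert(c0.size() == c1.size() && c1.size() == s.size());
    const std::size_t n = c0.size();
    Poly tmp(n), out(n);
    for (std::size_t i = 0; i < n; ++i)
        tmp[i] = (c1[i] * s[i]) % q;    // pass 1: multiply
    for (std::size_t i = 0; i < n; ++i)
        out[i] = (c0[i] + tmp[i]) % q;  // pass 2: add
    return out;
}

// Tiled variant: multiply and add one chunk of `tile` coefficients at a time,
// so the temporary buffer stays small enough to remain cache-resident.
Poly decrypt_tiled(const Poly &c0, const Poly &c1, const Poly &s, std::uint64_t q,
                   std::size_t tile = 1024)
{
    assert(c0.size() == c1.size() && c1.size() == s.size());
    assert(tile > 0);
    const std::size_t n = c0.size();
    Poly out(n);
    Poly tmp(tile);
    for (std::size_t start = 0; start < n; start += tile)
    {
        const std::size_t len = std::min(tile, n - start);
        for (std::size_t j = 0; j < len; ++j)
            tmp[j] = (c1[start + j] * s[start + j]) % q;
        for (std::size_t j = 0; j < len; ++j)
            out[start + j] = (c0[start + j] + tmp[j]) % q;
    }
    return out;
}
```

Both functions return identical results; any difference is in memory access pattern only. Whether the tiled form is faster depends on the compiler, the polynomial size, and the cache hierarchy, which is consistent with the varying speedup/slowdown reported above.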

WeiDaiWD · 2021-09-15T05:02:29Z

Since this PR was made against the contrib branch, which includes the BGV code, I cannot easily merge it into 3.6.6. I applied your changes and named the commit as a pull request merge. So your commits won't be in the commit history, but your ID and fork/branch/PR name will. I hope that is fine with you.

fboemer added 2 commits June 28, 2021 10:49

Optimize CKKS and (to a lesser degree) BFV Decrypt.

e34116b

Fix BGV compilation

bd0ca48

WeiDaiWD reviewed Jun 29, 2021

native/src/seal/decryptor.cpp

Remove tiling

75e0dda

fboemer force-pushed the fboemer/faster-decrypt branch from deef261 to 75e0dda August 3, 2021 19:13

WeiDaiWD merged commit 427c55f into microsoft:contrib Sep 15, 2021

WeiDaiWD pushed a commit that referenced this pull request Sep 15, 2021

Merge pull request #363 from fboemer/fboemer/faster-decrypt

e966ba1

fboemer deleted the fboemer/faster-decrypt branch November 3, 2021 18:24
