-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Chacha20-Poly1305 encryption #14249
base: master
Are you sure you want to change the base?
Chacha20-Poly1305 encryption #14249
Conversation
be24166
to
22644d4
Compare
Last push has support for FreeBSD, which is very straightforward because all the crypto code is already available in the kernel. Importantly, I've done a interop test: a pool with a |
I appreciate very much you made it part of ICP, it means nearly no work needs to be done for me. |
So, what's the next step for this? If its just waiting until volunteers have time to review, no problem, I get it! If there's something I could do or need to do though, I'd like to get on to that. (Should I be dropping in on the monthly call to talk about it?) |
Yes. There are a number of PRs waiting review and reviewers have limited bandwidth to look at them. Larger PRs like this take substantially longer to see review.
That might help attract reviewers. |
Hey there, I must admit I'm a little bit confused here. Linux has already support for Chacha20Poly1305 under lib/ and relevant arch/, why do we need another implementation? |
@evrim there's two ways to look at it. One is that Linux's crypto code is only available to GPL kernel modules, which OpenZFS is not. The other is that by doing it in the ICP, other ICP-using ports of OpenZFS (Windows and macOS) can use it. |
@jittygitty there's no code in ZFS right now that can call into the kernel crypto API, regardless of how you build it - you will always be using the ICP for crypto even for AES modes. You'd need a variant of zio_crypt for that, which is well outside the scope of this PR. |
We talked about this at the 31 Jan call. Feedback was generally favourable, thanks all! I was waiting to check my notes against the recording but that hasn't surfaced yet and I want to get moving on this a bit this week. So here's what I wrote down. If it turns out I got something wrong, I will of course adjust it.
Finally, and most importantly, it was agreed that this is obviously a nice feature, but if no one else wants it then perhaps it we shouldn't be taking on the maintenance burden. I will soon go and try to get a rough show of hands to see if anyone would else would use this. Probably, a carefully-worded query to the mailing list, discussion forum and reddit, and go from there. I'll return this to draft status for the moment, and add some additional todo items to the top post. |
I find this feature usefull and would help with this PR if you want. There is also a rfc7905 which update TLS / DTLS with ChaCha20-Poly1305 - so why should OpenZFS not go with the time and have this option. |
I did all the testing I wanted to do with a view to adding a feature flag. I created a new pool on Linux with master 620a977 + this PR, exported it, then rebuilt on master alone and imported the pool. All the encryption properties are returned except for
This appears to be because the underlying value is out bounds for this enum, so the default ( Attempting to load the key fails, in a slightly weird way:
and of course it can't be mounted:
I say it fails in a slightly weird way, because I think its just lucky this time. If we're compiled with
When asserts are disabled, it seems like its only doing ok because the crypto table entry does actually exist for I do think that behaviour is lucky. For the next new cipher suite So probably this alone is enough to suggest a feature flag is warranted. |
I did further send/receive testing as well. When sender and receiver both support chapoly, it works exactly as it does for AES - raw send doesn't require the dataset loaded on the sender, not-raw does. All fine. When there's no chapoly support, raw send still works. A stream is produced, and if that's later received to a chapoly-supporting pool, it goes in just fine, key can be loaded and it can't be mounted. Importing a chapoly stream into a pool with no support fails:
Its not totally clear to me without looking inside that that's indicating. Presumably, invalid property of some sort. I'll check more closely tomorrow. But again, likely the a feature flag is going to benefit. |
@mcmilk sorry I missed this! Thanks! At this point I don't think there's much else to do other than as much code review as we can. The main thing I want to feel good about is that I'm not misusing the crypto stuff in some important way. General bugs would be bad but survivable, but the crypto being no good is worse than not having it at all. I think its probably fine, but if you've got good ideas that'd be great. |
Ehh, I've got my brain in backwards this evening. A feature in the dataset isn't going to make an old implementation magically able to cope with it. That said, I'll see if I can do something now to help for the next new crypto algorithm. |
@ryao I’ve just ran Monocypher’s speed tests under @mcmilk before resorting to assembly you might want to take a look at compiler intrinsics instead. I would start with Chacha20. Some day I’ll write a high performance version of Monocypher for vector capable 64-bit processors, but if you don’t want to wait an unspecified amount of time I recommend you take a look at Dolbeau’s implementation (featured in libsodium). |
It is possible to implement high level vector operations in GNU C and it is much easier to use/review than compiler intrinsics. In addition, the same code can be used on multiple architectures, which is more difficult to do with compiler intrinsics. Clang and GCC also do not always agree on compiler intrinsics, but they do agree on GNU C’s vector support, so you have a choice of both compilers. You can see me use GNU C’s vector extensions here: The biggest downside is that GCC does not optimize it correctly when you have a larger vector width than the hardware due to a vector lowering bug (with the workaround I found only applying to fairly recent GCC versions), so my plan is to cache the assembly output from clang in the source tree and build with that when gcc is used. When the compiler is doing good optimization, the end result can outperform hand written assembly. The WIP code in that PR already outperforms the existing hard written assembly because the compiler was able to do loop optimizations that are not possible with inline assembly since people had kept the loop in C. I am not sure offhand if compiler intrinsics would be similarly optimized. The other downsides of GNU C’s vector extension are that the builtins intended to extend it are not supported in all of the compiler versions we support and getting the best results requires tweaking the code in ways that the existing optimization passes can handle in places where it is obvious that the compiler’s optimization passes are weak when looking at the assembly output. Intel has its own alternative that is likely similarly easy to use and according to Intel, is more performant than compiler intrinsics, but it presumably is limited to x86: |
Ah, I see. Caching generated assembly might be good too. Note that Chacha20 has 32 integers worth of state, which means as many vectors, plus any intermediate variable. I believe the typical CPU doesn’t have that many vectors, so you’d have to suffer manual register allocation. I wouldn’t look forward to that, so it’s good that you can steal generated code from a non-buggy compiler. |
I just realized that this conflicts with my earlier suggestion that Monocypher is a candidate to rewrite in WUFFS. If you were to adopt WUFFS, you would be forced to use compiler intrinsics to make a vectorized version with WUFFS since WUFFS does not support GNU C's vector extension. However, you could use GNU C for an initial version, do some tweaking to get good code generation out of clang based on how the code performs and then rewrite it in WUFFS using intrinsics that map the the instruction choices that the compiler made. That would likely give a better result than manually guessing which intrinsics are the most appropriate. Passing the WUFFS compiler would would prove certain properties of the code, although implementing vectorized code in WUFFS would have the downside that you would need to implement a different version for each major architecture unless you were to rely on SIMDe. Ironically, SIMDe has code to use the GNU C vector extension, which would bring things back to GNU C vector code, although it would likely be less optimal than the hypothetical original that I suggested. By the way, it occurs to me that there is an easier way to prove the absence of integer overflows in monocypher than a rewrite in WUFFS, which would be to run the code through NASA' IKOS: https://github.com/NASA-SW-VnV/ikos There is a presentation on it here, which includes a list of the various issues that it is designed to detect: https://ntrs.nasa.gov/api/citations/20220007431/downloads/SWS_TC3_IKOS_Tutorial.pdf If it complains, it does not necessarily mean that there is an issue, but if it does not complain, then it has proven the absence of the issues that it is designed to detect, which includes integer overflow issues. It is unfortunate that IKOS does not support concurrency, which has kept me from using it to find bugs in ZFS. |
It turns out that IKOS can prove a superset of what the WUFFS compiler proves, so if the monocypher code passes inspection by IKOS, it would be proven to have all of the correctness properties that the WUFFS compiler guarantees (minus hermeticity, which IKOS does not check, although it is fairly obvious that the monocypher code is hermetic). It occurs to me that we should look into using IKOS for the portions of ZFS that are non-concurrent, like compression/decompression, encryption/decryption, checksumming and parity calculations. Since it relies on the Clang frontend, it presuambly can check code using GNU C's vector extension too. :) |
@behlendorf thanks for the nice list, getting on it tomorrow!
I didn't really want to do a feature flag, because it seems like missing support means being unable to unlock and mount the dataset, but still allows the pool to be imported, scrubbed and otherwise manipulated, and not being able to unlock the dataset is actually a stated feature of ZFS encryption (remote backups without needing the keys). Blocking import entirely feels like a bit much. This is what I was hoping to head off with #14577 (which I have to get back into 2.1). But, receive doesn't work (even though it could), and obviously none of this helps older ZFS, which at least will check for feature flags, and tripping assertions is an indicator of some danger. So I'll do it, and I'm not really arguing, just a bit bummed about it.
Sorry, what are you asking for specifically? Right now I suppose we could do a feature flag check at that point as well, and not accept the stream if the admin has explicitly disabled the feature.
I've already added I didn't check them all, but a quick look suggests the other tests don't seem to be dependent on the type of crypto in use. I could change the ones that explicitly call for an AES suite to just say The receive tests will need to make sure they do the right thing when the feature is disabled, of course.
I am satisfied there's no problem per #14249 (comment). Please mark them as false positives. |
02498c9
to
5924253
Compare
Top commit adds |
aa3c336
to
5c73546
Compare
Hi Rob,
I am running acceleration at network level and have dabbled with zfs and
intel QAT with some good results.
At the moment we have a decent setup with xeon gold cpu 1.2tbps with
enterprise switches etc we're having some decent results with ipsec and
would look forward to testing your zfs updates, exiting ethernet to storage
is always an interesting challenge.
Let me know how we can test your updates and we can measure the end to end.
…On Fri, 2 Dec 2022, 12:29 Rob N ★, ***@***.***> wrote:
I’ve added a new encryption option to OpenZFS, chacha20-poly1305, which
implements the Chacha20-Poly1305 AEAD
<https://en.wikipedia.org/wiki/ChaCha20-Poly1305> described in RFC 8439
<https://datatracker.ietf.org/doc/html/rfc8439>. I’m interested in
feedback, code review, testing and benchmarking from anyone interested in
such things!
Important note: I have no particular skill in math, cryptography or secure
software. I’m as confident as a reasonably good programmer can be that this
is right, but I wouldn’t go trusting your secrets to this just yet!
Closes #8679 <#8679>.
Rationale
AES (and particularly the default AES-GCM mode used in OpenZFS) is known
to be very slow on systems without hardware assistance. There are many such
machines out there could make good use of OpenZFS, especially low-cost
machines and small boards that would otherwise make very nice storage
machines. The Raspberry Pi series of machines are a good example.
The best option for these systems is an encryption option that performs
well in software. Chacha20-Poly1305 is the current “standard” option for
this in many contexts, and is a good choice for OpenZFS.
More information about the rationale can be found in Google’s work on
Adiantum
<https://security.googleblog.com/2019/02/introducing-adiantum-encryption-for.html>
(though their solution is necessarily different because unlike OpenZFS it
operates on whole disk blocks).
Implementation Core algorithms
I’ve taken the “core” Chacha20 and Poly1305 implementations from Loup
Vaillant’s Monocypher <https://monocypher.org/>. They are small and easy
to read and understand, and the author has written extensively about its
development and in particular the mistakes made along the way, which (at
least to me) inspires confidence in the implementation.
I considered a number of other implementations, including those in Sodium
<https://doc.libsodium.org/> and OpenSSL <https://www.openssl.org/> as
well as a number of other smaller implementations. Despite the relative
simplicity of the algorithms, there is lot of variability in the overall
quality and usability of different implementations, so I opted for the most
straightforward ones I could find.
ICP/KCF module
I’ve written a new KCF-style module for the ICP, that implements the AEAD
(the “construction” that binds the cipher (ChaCha20) and authenticator
(Poly1305) together) and the glue to make it available to OpenZFS proper.
Illumos does not currently support these algorithms, so it wasn’t possible
to just take one from there. I’ve implemented a module, however it only
implements what OpenZFS needs and is not sufficiently general-purpose to
contribute back to Illumos. I have no plans to do this.
The rest of the module is as simple as I could figure out how to make,
especially in some of the buffer management which is quite complex in the
AES modes.
Tests
I’ve added a test program that runs the test vectors in RFC 8439. For
Chacha20 and Poly1305 this is mostly a basic sanity check to assist in the
future with possible alternate implementations (eg hardware-accelerated
versions), as there are many implementations out there that have surprising
quirks even when used correctly (most commonly endianness problems).
There’s also a full module test, checking that the AEAD is constructed
correctly (again using the test vectors) and also makng sure the ICP module
interface works.
The tests are only there to confirm basic operation; they have nothing to
say about security properties or correctness.
Performance
I have only done very light performance tests to gain a rough idea of how
much “better” or “worse” this is than aes-256-gcm on a handful of
machines I have here. I do not know how to drive fio very well and I did
not make much effort to control variables like other programs running on
the machine, so I will not defend these numbers very hard. I’ve run them
enough times to be reasonably confident they’re not wildly off the mark.
Method and fio output:
https://gist.github.com/robn/4b40251ba84728cf8e669649ce77ad56
- Broadcom BCM2711 (Raspberry Pi 4B): ~2.4x faster (no AES hardware
accel)
- Celeron J3455 (Fitlet2): ~1.3x faster (partial AES hardware accel
<#10846>)
- Core i7-8565U (Thinkpad X1 Gen7): ~1.2x slower (full AES hardware
accel)
(ChaPoly in software coming so close to AES in hardware makes me think
there’s probably still optimisation opportunities in the AES code, but I
have not really considered this even a little bit).
Other notes
- This is only for Linux right now. It should be straightforward to
implement for FreeBSD to, which has the necessary crypto primitives
available in the kernel. I will try to add that soon.
- Related to that, I have not fully updated the tests yet because I
don’t want to them to break any automatic PR test stuff that might run on
FreeBSD.
- XChacha20 (the “better” Chacha variant) isn’t an option because
OpenZFS cannot provide a large enough nonce to initialise it. In practice
it wouldn’t matter because OpenZFS already takes steps to mitigate
nonce reuse
<https://github.com/openzfs/zfs/blob/fe975048da29c4756bafd9f63a192db17e3acb7c/module/os/linux/zfs/zio_crypt.c#L166>
.
Future work
Just stuff I thought about along the way that I’m not committing to but
might like to look at in the future.
- I want to at least try accelerated versions (read: assembly
implementations of Chacha20 and/or Poly1305) of this in the future, to see
if there are significant gains to be had. That will likely make a runtime
selection mechanism necessary (like icp_aes_impl and icp_gcm_impl).
- The entire ICP/KCF system has a lot of unused stuff and overhead
that I’d like to see radically slimmed down. I think its probably possible
lift zio_crypt.c up to be common to all ports, with a much smaller
layer that calls down to OS-provided cryptographic functions or uses the
bundled ones where the system doesn’t have something available. That’s not
just for Linux, but would help in (for example) Illumos, which has AES in
the kernel but not ChaPoly. I’d probably wait to see what the Windows and
Mac ports look like when they’re finally merged before considering this
seriously.
Types of changes
- Bug fix (non-breaking change which fixes an issue)
- New feature (non-breaking change which adds functionality)
- Performance enhancement (non-breaking change which improves
efficiency)
- Code cleanup (non-breaking change which makes code smaller or more
readable)
- Breaking change (fix or feature that would cause existing
functionality to change)
- Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and
libzfsbootenv)
- Documentation (a change to man pages or other documentation)
Checklist:
- My code follows the OpenZFS code style requirements
<https://github.com/openzfs/zfs/blob/master/.github/CONTRIBUTING.md#coding-conventions>
.
- I have updated the documentation accordingly.
- I have read the *contributing* document
<https://github.com/openzfs/zfs/blob/master/.github/CONTRIBUTING.md>.
- I have added tests <https://github.com/openzfs/zfs/tree/master/tests>
to cover my changes.
- I have run the ZFS Test Suite with this change applied.
- All commit messages are properly formatted and contain Signed-off-by
<https://github.com/openzfs/zfs/blob/master/.github/CONTRIBUTING.md#signed-off-by>
.
------------------------------
You can view, comment on, or merge this pull request online at:
#14249
Commit Summary
- 64129b8
<64129b8>
monocypher: import 3.1.3
- 392c964
<392c964>
chapoly: initial plumbing
- 0132699
<0132699>
chapoly: stub KCF provider (no-op)
- a344738
<a344738>
chapoly: hook up all the crypto
- 9c5dcb5
<9c5dcb5>
chapoly: functional tests
File Changes
(15 files <https://github.com/openzfs/zfs/pull/14249/files>)
- *M* include/sys/crypto/common.h
<https://github.com/openzfs/zfs/pull/14249/files#diff-084479388b1e1175be381ba27660fc1377ef074cd5ad2299aaecae5b1e229737>
(1)
- *M* include/sys/crypto/icp.h
<https://github.com/openzfs/zfs/pull/14249/files#diff-617e85596d5234d965cef1ca9deca7df8db846633059695f8c8c7e1a21522be8>
(3)
- *M* include/sys/fs/zfs.h
<https://github.com/openzfs/zfs/pull/14249/files#diff-e8559afd366a1e820c394a67bf81d41785e818c966d6487758ede31d9094dc33>
(1)
- *M* include/sys/zio_crypt.h
<https://github.com/openzfs/zfs/pull/14249/files#diff-08cdd543943a873fa07a540f59e9878cd4126595be320d1ac5e9ff9d520d1ae8>
(5)
- *M* lib/libicp/Makefile.am
<https://github.com/openzfs/zfs/pull/14249/files#diff-3337a009199385386620ef01e5becd79c35e80181563aa4bbeeb12cc21a1424c>
(2)
- *M* man/man7/zfsprops.7
<https://github.com/openzfs/zfs/pull/14249/files#diff-359f1ade066146abed7dbb2bc034423d52b3d5c03d5ce54db39712839c4093c0>
(2)
- *M* module/Kbuild.in
<https://github.com/openzfs/zfs/pull/14249/files#diff-2a755b82e21c9b3e2c6b817d402bf00bea411f375331ea09c926130fad5287fb>
(2)
- *M* module/icp/illumos-crypto.c
<https://github.com/openzfs/zfs/pull/14249/files#diff-65b2c3fb05b1118c5b8d8c16566684931a3c64ecafa0ff2794c2824cba9908b7>
(2)
- *A* module/icp/include/monocypher.h
<https://github.com/openzfs/zfs/pull/14249/files#diff-8c5c304db996000cd41c8a012a3bdcb5c3cc943cefb90db7bf289cbe4d678880>
(127)
- *A* module/icp/io/chapoly.c
<https://github.com/openzfs/zfs/pull/14249/files#diff-8a534fc39708d7ac46d22aba2d70c671979410c913f24f47d34da4ed7d003a1e>
(479)
- *A* module/icp/monocypher.c
<https://github.com/openzfs/zfs/pull/14249/files#diff-b80243c609be3ac5e82270c8209d178e02d6b4cf4f79f167c7995c3aa041bad6>
(385)
- *M* module/os/linux/zfs/zio_crypt.c
<https://github.com/openzfs/zfs/pull/14249/files#diff-7ea6d8764e5e54494c248ffeb0ef512a6846d60643e608319fe5a594dec1c79b>
(19)
- *M* module/zcommon/zfs_prop.c
<https://github.com/openzfs/zfs/pull/14249/files#diff-819ca46019f34284194f31581feb138a06aae1fe0eec73643af704e3dd882c36>
(5)
- *M* tests/zfs-tests/Makefile.am
<https://github.com/openzfs/zfs/pull/14249/files#diff-73d6ed7f071c56b1c5ed506bf0fe485409f4d359afd722183b5d0ee32adaa7d3>
(5)
- *A* tests/zfs-tests/tests/functional/chapoly/chapoly_test.c
<https://github.com/openzfs/zfs/pull/14249/files#diff-47d76bf56d9948f94c977461864b5bb44524cfe0fe394f190a0e842813ce9c02>
(813)
Patch Links:
- https://github.com/openzfs/zfs/pull/14249.patch
- https://github.com/openzfs/zfs/pull/14249.diff
—
Reply to this email directly, view it on GitHub
<#14249>, or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ALGMUCILTE6DWOT7TW5T5XLWLHMQHANCNFSM6AAAAAASR2BTVQ>
.
You are receiving this because you are subscribed to this thread.Message
ID: ***@***.***>
|
c77d92a
to
245e9ac
Compare
const u32 s4 = h4 + end; | ||
|
||
// (h + c) * r, without carry propagation | ||
const u64 x0 = s0*r0+ s1*rr3+ s2*rr2+ s3*rr1+ s4*rr0; |
Check failure
Code scanning / CodeQL
Multiplication result converted to larger type High
|
||
// (h + c) * r, without carry propagation | ||
const u64 x0 = s0*r0+ s1*rr3+ s2*rr2+ s3*rr1+ s4*rr0; | ||
const u64 x1 = s0*r1+ s1*r0 + s2*rr3+ s3*rr2+ s4*rr1; |
Check failure
Code scanning / CodeQL
Multiplication result converted to larger type High
// (h + c) * r, without carry propagation | ||
const u64 x0 = s0*r0+ s1*rr3+ s2*rr2+ s3*rr1+ s4*rr0; | ||
const u64 x1 = s0*r1+ s1*r0 + s2*rr3+ s3*rr2+ s4*rr1; | ||
const u64 x2 = s0*r2+ s1*r1 + s2*r0 + s3*rr3+ s4*rr2; |
Check failure
Code scanning / CodeQL
Multiplication result converted to larger type High
const u64 x0 = s0*r0+ s1*rr3+ s2*rr2+ s3*rr1+ s4*rr0; | ||
const u64 x1 = s0*r1+ s1*r0 + s2*rr3+ s3*rr2+ s4*rr1; | ||
const u64 x2 = s0*r2+ s1*r1 + s2*r0 + s3*rr3+ s4*rr2; | ||
const u64 x3 = s0*r3+ s1*r2 + s2*r1 + s3*r0 + s4*rr3; |
Check failure
Code scanning / CodeQL
Multiplication result converted to larger type High
Last push updates Monocypher to 4.0.2, adds a test to ensure replication streams with encrypted datasets are accepted or not depending on whether or not the feature flag is set, and collapses it down into a single commit. The CodeQL failures at the same as the ones we saw right at the beginning, and are safe. @behlendorf how do you feel about this nearly a year on? :) I don't think there's any more to do. |
Did anyone thought about wireguard zinc implementation like in comment: #8679 (comment) Today we will only have implementation which don't use vector extensions targeted only at raspbery pi like devices :( |
Hi, thanks for working on this! I decided to try this out on a Raspberry Pi 4B (4 core, 4 GB RAM) and a 1 TB HDD connected via a USB enclosure. One issue though, I noticed I'm only getting about 30 MB/s read speeds and massive amounts of CPU usage. I can get 100 MB/s write speeds just fine (and this matches the speed of the chacha20 Rust crate without neon). Without any encryption, I get 150 MB/s write and 170 MB/s read. Upon further inspection, it seems to be because of the massive amounts of messages in dmesg. I don't get them with AES-256-GCM though.
|
@ErrorNoInternet thanks for the report! I've pushed a small change that should at least deal with the log spam. |
Nice! Getting stable 110+ MB/s read speeds now :) Thank you! |
Great to know, thanks :) |
a972fb9
to
e270db8
Compare
This commit implements the Chacha20-Poly1305 AEAD from RFC 8439 as a new algorithm option for encrypted datasets. AES (and particularly the default AES-GCM mode used in OpenZFS) is known to be very slow on systems without hardware assistance. There are many such machines out there could make good use of OpenZFS, especially low-cost machines and small boards that would otherwise make very nice storage machines. The Raspberry Pi series of machines are a good example. The best option for these systems is an encryption option that performs well in software. Chacha20-Poly1305 is the current "standard" option for this in many contexts, and is a good choice for OpenZFS. The core Chacha20 and Poly1305 implementations are taken from Loup Valliant's Monocypher. These were chosen because they are compact, easy to read, easy to use and the author has written extensively about its development, all of which give me confidence that there are unlikely to be any surprises. I've added a KCF-style module to the ICP to implement the AEAD. This implements just enough for OpenZFS, and is not suitable as a general-purpose KCF for Illumos (though it could be the starting point for one). For FreeBSD, which does not use the ICP, I've instead hooked it up to FreeBSD's builtin crypto stack. The rest is adding an enabling property value and a feature flag and and hooking it up to all the right touch points, and documentation updates. The existing tests that cycle through the possible encryption options have been extended to add one more. I've added a test to ensure that raw receives of chacha20-poly1305 datasets do the right thing based on the state of the feature flag on the receiving side. There's also a test unit that runs the test vectors in RFC 8439 against Chacha20, Poly1305 and the AEAD in the ICP that combines them. This is most useful as a sanity check during future work to add alternate (accelerated) implementations. Finally, manual interop testing has been done to confirm that pools and streams can be moved between Linux and FreeBSD correctly. Light and uncontrolled performance testing on a Raspberry Pi 4B (Broadcom BCM2711, no hardware AES) writing to a chacha20-poly1305 dataset was ~2.4x faster than aes-256-gcm on the same hardware. On a Fitlet2 (Celeron J3455, AES-NI but no AVX (openzfs#10846)) it was ~1.3x faster. Sponsored-by: https://despairlabs.com/sponsor/ Signed-off-by: Rob Norris <robn@despairlabs.com>
I’ve added a new encryption option to OpenZFS,
chacha20-poly1305
, which implements the Chacha20-Poly1305 AEAD described in RFC 8439. I’m interested in feedback, code review, testing and benchmarking from anyone interested in such things!Important note: I have no particular skill in math, cryptography or secure software. I’m as confident as a reasonably good programmer can be that this is right, but I wouldn’t go trusting your secrets to this just yet!
Closes #8679.
Rationale
AES (and particularly the default AES-GCM mode used in OpenZFS) is known to be very slow on systems without hardware assistance. There are many such machines out there could make good use of OpenZFS, especially low-cost machines and small boards that would otherwise make very nice storage machines. The Raspberry Pi series of machines are a good example.
The best option for these systems is an encryption option that performs well in software. Chacha20-Poly1305 is the current “standard” option for this in many contexts, and is a good choice for OpenZFS.
More information about the rationale can be found in Google’s work on Adiantum (though their solution is necessarily different because unlike OpenZFS it operates on whole disk blocks).
Implementation
Core algorithms
I’ve taken the “core” Chacha20 and Poly1305 implementations from Loup Vaillant’s Monocypher. They are small and easy to read and understand, and the author has written extensively about its development and in particular the mistakes made along the way, which (at least to me) inspires confidence in the implementation.
I considered a number of other implementations, including those in Sodium and OpenSSL as well as a number of other smaller implementations. Despite the relative simplicity of the algorithms, there is lot of variability in the overall quality and usability of different implementations, so I opted for the most straightforward ones I could find.
ICP/KCF module
I’ve written a new KCF-style module for the ICP, that implements the AEAD (the “construction” that binds the cipher (ChaCha20) and authenticator (Poly1305) together) and the glue to make it available to OpenZFS proper.
Illumos does not currently support these algorithms, so it wasn’t possible to just take one from there. I’ve implemented a module, however it only implements what OpenZFS needs and is not sufficiently general-purpose to contribute back to Illumos. I have no plans to do this.
The rest of the module is as simple as I could figure out how to make, especially in some of the buffer management which is quite complex in the AES modes.
Tests
I’ve added a test program that runs the test vectors in RFC 8439. For Chacha20 and Poly1305 this is mostly a basic sanity check to assist in the future with possible alternate implementations (eg hardware-accelerated versions), as there are many implementations out there that have surprising quirks even when used correctly (most commonly endianness problems).
There’s also a full module test, checking that the AEAD is constructed correctly (again using the test vectors) and also makng sure the ICP module interface works.
The tests are only there to confirm basic operation; they have nothing to say about security properties or correctness.
Performance
I have only done very light performance tests to gain a rough idea of how much “better” or “worse” this is than
aes-256-gcm
on a handful of machines I have here. I do not know how to drivefio
very well and I did not make much effort to control variables like other programs running on the machine, so I will not defend these numbers very hard. I’ve run them enough times to be reasonably confident they’re not wildly off the mark.Method and fio output: https://gist.github.com/robn/4b40251ba84728cf8e669649ce77ad56
(ChaPoly in software coming so close to AES in hardware makes me think there’s probably still optimisation opportunities in the AES code, but I have not really considered this even a little bit).
Other notes
This is only for Linux right now. It should be straightforward to implement for FreeBSD too, which has the necessary crypto primitives available in the kernel. I will try to add that soon.Now with FreeBSD support.Related to that, I have not fully updated the tests yet because I don’t want to them to break any automatic PR test stuff that might run on FreeBSD.Done.TODO
After discussion in monthly call:
encryption
propertyFuture work
Just stuff I thought about along the way that I’m not committing to but might like to look at in the future.
icp_aes_impl
andicp_gcm_impl
).zio_crypt.c
up to be common to all ports, with a much smaller layer that calls down to OS-provided cryptographic functions or uses the bundled ones where the system doesn’t have something available. That’s not just for Linux, but would help in (for example) Illumos, which has AES in the kernel but not ChaPoly. I’d probably wait to see what the Windows and Mac ports look like when they’re finally merged before considering this seriously.Types of changes
Checklist:
Signed-off-by
.