Gcm #1908

jenswi-linaro · 2017-10-31T11:07:33Z

Hardware optimized AES-GCM implementation. Here's some figures from testing on Hikey:

"default" means what's on master today.

command: xtest --aes-perf -m ECB
default:
min=99.166us max=153.334us mean=107.972us stddev=3.59459us (cv 3.32917%) (9.04455MiB/s)

command: xtest --sha-perf -a SHA256
default:
min=105.833us max=149.167us mean=113.857us stddev=3.38391us (cv 2.97209%) (8.57713MiB/s)

command: xtest --aes-perf -m GCM

default:
min=1445.83us max=1503.33us mean=1471.38us stddev=9.70065us (cv 0.659288%) (0.663704MiB/s)

LTC + GHASH (commit "LTC: add GHASH acceleration")
min=178.334us max=222.5us mean=188.389us stddev=4.36719us (cv 2.31818%) (5.18376MiB/s)

LTC + GHASH + PMULL (commit "hikey: CFG_HWSUPP_PMULL=y")
min=170.833us max=210.834us mean=172.781us stddev=3.58542us (cv 2.07512%) (5.65202MiB/s)

With this pull request:
min=98.333us max=127.5us mean=102.794us stddev=3.55137us (cv 3.45483%) (9.50017MiB/s)

There's obviously rooms for improvements for AES-ECB mode as AES-GCM is a bit faster. I suspect that our SHA-256 implementation can be improved too.

jenswi-linaro · 2017-10-31T11:45:52Z

Numbers for new AES-GCM implementation withouth the fused AES+PMULL

min=153.333us max=178.333us mean=155.11us stddev=3.2519us (cv 2.09652%) (6.29596MiB/s)

jenswi-linaro · 2017-10-31T13:28:09Z

Checkpatch fixes

etienne-lms · 2017-10-31T13:22:45Z

core/crypto/aes-gcm.c

+
+			if (!ctx->buf_pos && m == TEE_MODE_DECRYPT) {
+				aes_gcm_encrypt_block(ctx, ctx->ctr,
+						      ctx->buf_cryp);


aes_gcm_encrypt_block() or aes_gcm_decrypt_block() ?

Always encrypt, this is the counter in AES Counter mode.

etienne-lms · 2017-10-31T13:25:43Z

core/crypto/aes-gcm.c

+	if (tag_len != ctx->tag_len)
+		return TEE_ERROR_MAC_INVALID;
+
+	res = operation_final(ctx, TEE_MODE_ENCRYPT, src, len, dst);


TEE_MODE_ENCRYPT => TEE_MODE_DECRYPT

Thanks, I guess this reveals a blind spot in the test suite.

etienne-lms · 2017-10-31T13:30:16Z

core/include/crypto/aes-gcm.h

+ */
+
+#ifndef __KERNEL_AES_GCM_H
+#define __KERNEL_AES_GCM_H


__CRYPTO_AES_GCM_H ?

etienne-lms · 2017-10-31T13:33:58Z

core/include/crypto/aes-gcm.h

+	unsigned int tag_len;
+	unsigned int aad_bytes;
+	unsigned int payload_bytes;
+	unsigned int buf_pos;


Using a 64-bit value (on 64-bit platforms) seems a bit wasteful since we're only storing small numbers. It could be argued that aad_bytes or payload_bytes should be 64-bit values, but I doubt we'll need it in practice. I'll change them if you think it's important.

jenswi-linaro · 2017-11-01T07:39:45Z

There's even a small gain in "core: arm: crypto: fix AES-GCM counter increase"
now AES-GCM on Hikey is:
min=100us max=124.167us mean=100.995us stddev=2.56812us (cv 2.54281%) (9.66937MiB/s)

jforissier · 2017-11-02T16:47:05Z

Tested-by: Jerome Forissier <jerome.forissier@linaro.org> (HiKey960)

I noticed a strange thing: this PR seems to make ECB and SHA256 significantly slower on the a73 cores, but not on the a53 cores. See below. Any idea what might be happening?

Update: obviously not related to the code in this PR, maybe some caching issue? I noticed the same kind of difference when using this PR and changing CFG_TEE_CORE_LOG_LEVEL from 2 (slowest) to 3 (fastest).

# HiKey960 a73 cores only:
for i in 4 5 6 7; do echo 1 >/sys/devices/system/cpu/cpu$i/online; done
for i in 0 1 2 3; do echo 0 >/sys/devices/system/cpu/cpu$i/online; done
# HiKey960 a53 cores only:
for i in 0 1 2 3; do echo 1 >/sys/devices/system/cpu/cpu$i/online; done
for i in 4 5 6 7; do echo 0 >/sys/devices/system/cpu/cpu$i/online; done


### master, a73 cores:

xtest --aes-perf -m ECB
min=24.479us max=36.979us mean=24.8683us stddev=0.505735us (cv 2.03365%) (39.2693MiB/s)

xtest --sha-perf -a SHA256
min=26.562us max=38.542us mean=26.888us stddev=0.507852us (cv 1.88877%) (36.3196MiB/s)

xtest --aes-perf -m GCM
min=408.854us max=425.52us mean=413.601us stddev=1.9973us (cv 0.482905%) (2.36112MiB/s)


### master, a53 cores:

xtest --aes-perf -m ECB
min=59.895us max=94.271us mean=60.5282us stddev=1.28037us (cv 2.11532%) (16.134MiB/s)

xtest --sha-perf -a SHA256
min=64.583us max=100us mean=65.1378us stddev=1.4101us (cv 2.1648%) (14.9923MiB/s)

xtest --aes-perf -m GCM
min=934.375us max=982.812us mean=948.535us stddev=5.34257us (cv 0.563245%) (1.02955MiB/s)


### PR#1908 + CFG_HWSUPP_PMULL=y, a73 cores:

xtest --aes-perf -m ECB
min=32.291us max=45.834us mean=32.8579us stddev=0.521768us (cv 1.58795%) (29.7208MiB/s)

xtest --sha-perf -a SHA256
min=34.375us max=47.396us mean=34.9583us stddev=0.527139us (cv 1.50791%) (27.935MiB/s)

xtest --aes-perf -m GCM
min=32.291us max=48.437us mean=32.9785us stddev=0.540848us (cv 1.64%) (29.6121MiB/s)


### PR#1908 + CFG_HWSUPP_PMULL=y, a53 cores:

xtest --aes-perf -m ECB
min=59.895us max=103.646us mean=64.5185us stddev=1.96989us (cv 3.05322%) (15.1362MiB/s)

xtest --sha-perf -a SHA256
min=64.583us max=95.833us mean=68.9705us stddev=1.63784us (cv 2.3747%) (14.1591MiB/s)

xtest --aes-perf -m GCM
min=59.895us max=101.562us mean=64.2076us stddev=1.88134us (cv 2.93009%) (15.2095MiB/s)

jbech-linaro · 2017-11-06T12:27:20Z

FYI: OP-TEE Hikey auto builder - shows false positive, "OK" here but the logs shows errors (and the log from the script also says "failed", but still it will send an "OK" to GitHub. I thought I had this fixed, but apparently not).

jforissier

For "core: arm64: add mov_imm assembly macro"
I'm not sure a Signed-off-by: Ard is appropriate. I'd rather you mention him in the commit text, and hopefully we can get his Acked-by:.
Acked-by: Jerome Forissier <jerome.forissier@linaro.org>
For "core: io.h: add {get,put}_be{16,32,64}()"
Reviewed-by: Jerome Forissier <jerome.forissier@linaro.org>
For "core: import GHASH acceleration routines":
Acked-by: Jerome Forissier <jerome.forissier@linaro.org>
For "LTC: add GHASH acceleration"
Acked-by: Jerome Forissier <jerome.forissier@linaro.org>
For "hikey: CFG_HWSUPP_PMULL=y":
Acked-by: Jerome Forissier <jerome.forissier@linaro.org>
Could you please do the same for HiKey960?
About "tee_ltc_provider.c: use new AES-GCM implementation":
I don't think it should be done like that. The problem is, tee_ltc_provider is supposed to interface with LTC, not with multiple crypto implementations. Another problem is that you are replacing LTC GCM with something else with no way to chose at compile time. This might be bad for guys who may have their own modified/accelerated LTC (after all, it's how we have introduced crypto extensions: in LTC, not at the crypto_ops level).
I think we should take this opportunity to get rid of struct crypto_ops and replace it with a simple function interface instead, as we discussed a few times already.
If we do that, it will be quite easy to extract GCM from tee_ltc_provider.c and then have a CFG_ to select it.

jforissier · 2017-11-09T16:34:10Z

core/arch/arm/include/arm64_macros.S

+		movk    \_reg, :abs_g0_nc:\_val
+	.endm
+
+


Too many empty lines

jforissier · 2017-11-09T16:36:25Z

core/include/io.h

@@ -29,6 +29,8 @@

 #include <stdint.h>
 #include <types_ext.h>
+#include <string.h>


Not needed?

jenswi-linaro · 2017-11-09T21:17:50Z

For "hikey: CFG_HWSUPP_PMULL=y":
actually this seems to be unintentionally applied to 960 already. I'll update the commit message to make that clear.

We'd better ask @ardbiesheuvel about "core: arm64: add mov_imm assembly macro". @ardbiesheuvel are you OK with a sign-off or would you rather ack it or something?

I'll add a function based interface for the crypto provider in this PR.

jforissier · 2017-11-15T07:56:15Z

@jenswi-linaro about Ard's S-o-b: I'm citing a maintainer on the LKML [1]:

"The last Signed-off-by should match the person posting the patch. It's a chain of custody for GPL-signoff purposes, not a 'partially-written-by'. If you want to credit [someone] you can add CC and they can Ack/Review your patch."

I think the same reasoning should apply here.

[1] https://lkml.org/lkml/2017/11/14/390

ardbiesheuvel · 2017-11-15T08:00:50Z

On 15 November 2017 at 07:56, Jérôme Forissier ***@***.***> wrote: @jenswi-linaro <https://github.com/jenswi-linaro> about Ard's S-o-b: I'm citing a maintainer on the LKML [1]: "The last Signed-off-by should match the person posting the patch. It's a chain of custody for GPL-signoff purposes, not a 'partially-written-by'. If you want to credit [someone] you can add CC and they can Ack/Review your patch." I think the same reasoning should apply here.

Works for me Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>

…

[1] https://lkml.org/lkml/2017/11/14/390 — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#1908 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AIDoWEhP9jkT90totG4c-_tTtCLUahYqks5s2pkjgaJpZM4QMkTK> .

Implement a macro mov_imm that can be used to move an immediate constant into a 64-bit register, using between 2 and 4 movz/movk instructions (depending on the operand) Acked-by: Jerome Forissier <jerome.forissier@linaro.org> Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>

Reviewed-by: Jerome Forissier <jerome.forissier@linaro.org> Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>

Imports assembly code for accelerated GHASH computations and provides an interface for using these low level functions. The assembly code relies on features now available in all ARM cores. No assembly code is enabled at all unless CFG_WITH_VFP = y. Code using the PMULL/PMULL2 instruction is enabled with CFG_HWSUPP_PMULL = y. The assembly code is written by Ard Biesheuvel <ard.biesheuvel@linaro.org> and modified slightly here to fit better into OP-TEE. Acked-by: Jerome Forissier <jerome.forissier@linaro.org> Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>

Acked-by: Jerome Forissier <jerome.forissier@linaro.org> Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>

Takes full advantage of LTC GHASH acceleration by using the pmull instruction. Acked-by: Jerome Forissier <jerome.forissier@linaro.org> Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>

Takes full advantage of LTC GHASH acceleration by using the pmull instruction. Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>

Adds temporary CFG_LTC_CRYPTO_GCM which is a mirror of CFG_CRYPTO_GCM. Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>

Adds a new AES-GCM implementation optimized for hardware acceleration. Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>

Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>

In pmull_gcm_encrypt() and pmull_gcm_decrypt() it was assumed that it's enough to only increase the least significant 64-bits of the counter fed to the block cipher. This can hold for 96-bit IVs, but not for IVs of any other length as the number stored in the least significant 64-bits of the counter can't be easily predicted. In this patch pmull_gcm_encrypt() and pmull_gcm_decrypt() are updated to increase the entire counter, at the same time is the interface changed to accept the counter in little endian format instead. Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>

Merges aes-gcm.h and aes-gcm_ext.h and aligns types of function argument etc. Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>

jenswi-linaro · 2017-11-16T11:54:13Z

Rebased on master

jforissier · 2017-11-16T14:40:29Z

core/arch/arm/crypto/sub.mk

@@ -0,0 +1,4 @@
+ifeq ($(CFG_CRYPTO_WITH_CE),y)


In the commit description, you write: "No assembly code is enabled at all unless CFG_WITH_VFP = y". I suppose it should be "unless CFG_CRYPTO_WITH_CE = y"?

Yes, you're right. I'll update it when rebasing.

jforissier · 2017-11-16T14:56:04Z

core/lib/libtomcrypt/src/encauth/gcm/sub.mk

 srcs-y += gcm_gf_mult.c
+endif


Wrong place? (should be #else gcm_mult_h.c, not gcm_gf_mult.c)

jforissier · 2017-11-16T15:49:13Z

core/arch/arm/crypto/aes-gcm-ce.c

@@ -13,6 +13,22 @@
 #include <tomcrypt.h>
 #include <types_ext.h>

+static void get_be_block(void *dst, const void *src)


Shouldn't this whole patch be folded into "core: crypto: add new AES-GCM implementation"?

Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>

jenswi-linaro · 2017-11-17T08:26:21Z

Update

jforissier · 2017-11-17T09:38:55Z

I am still concerned by "core: ltc: remove redundant aes-gcm implementation". It means it is not possible to use LibTomCrypt anymore for AES GCM, and as I said already it might be a problem. Some people may be using their own modified version of LTC, or may have audited LTC and do not want to use anything else...

More generally, and this relates to #1931 actually, the crypto API in crypto.h is not clean enough IMO:

core/arch/crypto/crypto.c depends on _CFG_CRYPTO flags defined in LTC.
The crypto_authenc_*() functions, although supposedly part of the API, cannot be overridden.

So how to fix this? Well it shouldn't be too hard hopefully:

Move the _CFG_CRYPTO flags out of LTC, from core/lib/libtomcrypt/sub.mk to core/crypto/sub.mk. They do not depend on a particular implementation.
Introduce CFG_CRYPTO_GCM_CE to select this implementation of crypto_authenc_*(), otherwise use LTC.

What do you think?

jenswi-linaro · 2017-11-17T10:48:35Z

OK, I see your point.

This means that we'll drop b1538e9 ("core: crypto: merge aes-gcm.h and aes-gcm_ext.h") too.

The flags should probably go into core/core.mk instead since sub.mk shouldn't set flags because only the sub.mk parsed after that will see the changes.

How about instead of CFG_CRYPTO_GCM_CE have a CFG_CRYPTO_AES_GCM_FROM_CRYPTOLIB which would disable the internal AES-GCM implementation and expect the crypto library (LTC for now) to provide an AES-GCM implementation instead.

However, this doesn't sort it out completely, we have a third(!) AES-GCM implementation used by the pager too. I was planning to replace that implementation with the new AES-GCM implementation. The LTC implementation can't be used for this so we'll need a way to let the two implementations (LTC and new) coexist. One way would be to have a separate API for the new AES-GCM implementation with a good prefix instead of crypto_, and then just some wrappers to provide the required crypto_aes_gcm_*() functions when configured to do so. Do you have suggestion for the "good prefix"? I've run out of ideas for the moment.

jforissier · 2017-11-17T12:05:48Z

This means that we'll drop b1538e9 ("core: crypto: merge aes-gcm.h and aes-gcm_ext.h") too.

Yes.

The flags should probably go into core/core.mk instead since sub.mk shouldn't set flags because only the sub.mk parsed after that will see the changes.

Correct, my bad.

How about instead of CFG_CRYPTO_GCM_CE have a CFG_CRYPTO_AES_GCM_FROM_CRYPTOLIB which would disable the internal AES-GCM implementation and expect the crypto library (LTC for now) to provide an AES-GCM implementation instead.

Sounds good to me.

However, this doesn't sort it out completely, we have a third(!) AES-GCM implementation used by the pager too. I was planning to replace that implementation with the new AES-GCM implementation. The LTC implementation can't be used for this so we'll need a way to let the two implementations (LTC and new) coexist.

Doh! That's right. Sure, it a good idea to use the implementation in this PR for the pager, too.

One way would be to have a separate API for the new AES-GCM implementation with a good prefix instead of crypto_, and then just some wrappers to provide the required crypto_aes_gcm_*() functions when configured to do so.

OK

Do you have suggestion for the "good prefix"? I've run out of ideas for the moment.

internal_ maybe? Internal to the OP-TEE core code, as opposed to _FROM_CRYPTOLIB, that is.

jenswi-linaro · 2017-11-17T12:22:27Z

internal_ maybe? Internal to the OP-TEE core code, as opposed to _FROM_CRYPTOLIB, that is.

Thanks, that will work. I guess I'll shorten it to just "int". Then we'll have the .h file, <crypto/int_aes-gcm.h> and the prefix in that file would be crypto_int_aes_gcm_.

jforissier · 2017-11-17T12:28:08Z

I guess I'll shorten it to just "int". Then we'll have the .h file, <crypto/int_aes-gcm.h> and the prefix in that file would be crypto_int_aes_gcm_.

If you want to keep crypto_ then I'm fine with crypto_int_. To be clear, my proposal was to have only internal_ as a prefix, no more crypto_. That is: internal_aes_gcm_init() which would be called by crypto_aes_gcm_init() etc. But no big deal, pick the one you prefer ;-)

jenswi-linaro · 2017-11-17T13:45:24Z

To make it easier to see the difference I've created #1949 with the changes we discussed here.

jenswi-linaro · 2017-11-20T09:43:55Z

#1949 was merged instead of this.

etienne-lms reviewed Oct 31, 2017

View reviewed changes

jforissier reviewed Nov 9, 2017

View reviewed changes

jenswi-linaro mentioned this pull request Nov 10, 2017

Replace crypto ops #1931

Merged

jenswi-linaro added 11 commits November 16, 2017 12:44

core: io.h: add {get,put}_be{16,32,64}()

b367de3

Reviewed-by: Jerome Forissier <jerome.forissier@linaro.org> Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>

LTC: add GHASH acceleration

80427c4

Acked-by: Jerome Forissier <jerome.forissier@linaro.org> Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>

hikey{,960}: CFG_HWSUPP_PMULL=y

993a8c6

Takes full advantage of LTC GHASH acceleration by using the pmull instruction. Acked-by: Jerome Forissier <jerome.forissier@linaro.org> Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>

plat-vexpress: Juno: CFG_HWSUPP_PMULL=y

f73a51e

Takes full advantage of LTC GHASH acceleration by using the pmull instruction. Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>

core: add temporary CFG_LTC_CRYPTO_GCM

053918a

Adds temporary CFG_LTC_CRYPTO_GCM which is a mirror of CFG_CRYPTO_GCM. Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>

core: crypto: add new AES-GCM implementation

f3ddbfd

Adds a new AES-GCM implementation optimized for hardware acceleration. Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>

core: ltc: remove redundant aes-gcm implementation

56db818

Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>

core: crypto: merge aes-gcm.h and aes-gcm_ext.h

b1538e9

Merges aes-gcm.h and aes-gcm_ext.h and aligns types of function argument etc. Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>

jenswi-linaro force-pushed the gcm branch from abe09f9 to b1538e9 Compare November 16, 2017 11:51

jforissier reviewed Nov 16, 2017

View reviewed changes

[review] LTC: add GHASH acceleration

7d9fabe

Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>

jenswi-linaro mentioned this pull request Nov 17, 2017

Gcm2 #1949

Merged

jenswi-linaro closed this Nov 20, 2017

jenswi-linaro deleted the gcm branch November 20, 2017 09:44

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Gcm #1908

Gcm #1908

jenswi-linaro commented Oct 31, 2017

jenswi-linaro commented Oct 31, 2017 •

edited

Loading

jenswi-linaro commented Oct 31, 2017

etienne-lms Oct 31, 2017

jenswi-linaro Oct 31, 2017

etienne-lms Oct 31, 2017

jenswi-linaro Oct 31, 2017

etienne-lms Oct 31, 2017

etienne-lms Oct 31, 2017

jenswi-linaro Oct 31, 2017

jenswi-linaro commented Nov 1, 2017

jforissier commented Nov 2, 2017 •

edited

Loading

jbech-linaro commented Nov 6, 2017

jforissier left a comment •

edited

Loading

jforissier Nov 9, 2017

jforissier Nov 9, 2017

jenswi-linaro commented Nov 9, 2017

jforissier commented Nov 15, 2017

ardbiesheuvel commented Nov 15, 2017 via email

jenswi-linaro commented Nov 16, 2017

jforissier Nov 16, 2017

jenswi-linaro Nov 17, 2017

jforissier Nov 16, 2017

jforissier Nov 16, 2017

jenswi-linaro commented Nov 17, 2017

jforissier commented Nov 17, 2017

jenswi-linaro commented Nov 17, 2017

jforissier commented Nov 17, 2017

jenswi-linaro commented Nov 17, 2017

jforissier commented Nov 17, 2017

jenswi-linaro commented Nov 17, 2017

jenswi-linaro commented Nov 20, 2017

Gcm #1908

Gcm #1908

Conversation

jenswi-linaro commented Oct 31, 2017

jenswi-linaro commented Oct 31, 2017 • edited Loading

jenswi-linaro commented Oct 31, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jenswi-linaro commented Nov 1, 2017

jforissier commented Nov 2, 2017 • edited Loading

jbech-linaro commented Nov 6, 2017

jforissier left a comment • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jenswi-linaro commented Nov 9, 2017

jforissier commented Nov 15, 2017

ardbiesheuvel commented Nov 15, 2017 via email

jenswi-linaro commented Nov 16, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jenswi-linaro commented Nov 17, 2017

jforissier commented Nov 17, 2017

jenswi-linaro commented Nov 17, 2017

jforissier commented Nov 17, 2017

jenswi-linaro commented Nov 17, 2017

jforissier commented Nov 17, 2017

jenswi-linaro commented Nov 17, 2017

jenswi-linaro commented Nov 20, 2017

jenswi-linaro commented Oct 31, 2017 •

edited

Loading

jforissier commented Nov 2, 2017 •

edited

Loading

jforissier left a comment •

edited

Loading