Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Gcm #1908

Closed
wants to merge 12 commits into from
Closed

Gcm #1908

wants to merge 12 commits into from

Conversation

jenswi-linaro
Copy link
Contributor

Hardware optimized AES-GCM implementation. Here's some figures from testing on Hikey:

"default" means what's on master today.

command: xtest --aes-perf -m ECB
default:
min=99.166us max=153.334us mean=107.972us stddev=3.59459us (cv 3.32917%) (9.04455MiB/s)

command: xtest --sha-perf -a SHA256
default:
min=105.833us max=149.167us mean=113.857us stddev=3.38391us (cv 2.97209%) (8.57713MiB/s)

command: xtest --aes-perf -m GCM

default:
min=1445.83us max=1503.33us mean=1471.38us stddev=9.70065us (cv 0.659288%) (0.663704MiB/s)

LTC + GHASH (commit "LTC: add GHASH acceleration")
min=178.334us max=222.5us mean=188.389us stddev=4.36719us (cv 2.31818%) (5.18376MiB/s)

LTC + GHASH + PMULL (commit "hikey: CFG_HWSUPP_PMULL=y")
min=170.833us max=210.834us mean=172.781us stddev=3.58542us (cv 2.07512%) (5.65202MiB/s)

With this pull request:
min=98.333us max=127.5us mean=102.794us stddev=3.55137us (cv 3.45483%) (9.50017MiB/s)

There's obviously rooms for improvements for AES-ECB mode as AES-GCM is a bit faster. I suspect that our SHA-256 implementation can be improved too.

@jenswi-linaro
Copy link
Contributor Author

jenswi-linaro commented Oct 31, 2017

Numbers for new AES-GCM implementation withouth the fused AES+PMULL

min=153.333us max=178.333us mean=155.11us stddev=3.2519us (cv 2.09652%) (6.29596MiB/s)

@jenswi-linaro
Copy link
Contributor Author

Checkpatch fixes


if (!ctx->buf_pos && m == TEE_MODE_DECRYPT) {
aes_gcm_encrypt_block(ctx, ctx->ctr,
ctx->buf_cryp);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aes_gcm_encrypt_block() or aes_gcm_decrypt_block() ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Always encrypt, this is the counter in AES Counter mode.

if (tag_len != ctx->tag_len)
return TEE_ERROR_MAC_INVALID;

res = operation_final(ctx, TEE_MODE_ENCRYPT, src, len, dst);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TEE_MODE_ENCRYPT => TEE_MODE_DECRYPT

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I guess this reveals a blind spot in the test suite.

*/

#ifndef __KERNEL_AES_GCM_H
#define __KERNEL_AES_GCM_H
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

__CRYPTO_AES_GCM_H ?

unsigned int tag_len;
unsigned int aad_bytes;
unsigned int payload_bytes;
unsigned int buf_pos;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

size_t ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using a 64-bit value (on 64-bit platforms) seems a bit wasteful since we're only storing small numbers. It could be argued that aad_bytes or payload_bytes should be 64-bit values, but I doubt we'll need it in practice. I'll change them if you think it's important.

@jenswi-linaro
Copy link
Contributor Author

There's even a small gain in "core: arm: crypto: fix AES-GCM counter increase"
now AES-GCM on Hikey is:
min=100us max=124.167us mean=100.995us stddev=2.56812us (cv 2.54281%) (9.66937MiB/s)

@jforissier
Copy link
Contributor

jforissier commented Nov 2, 2017

Tested-by: Jerome Forissier <jerome.forissier@linaro.org> (HiKey960)

I noticed a strange thing: this PR seems to make ECB and SHA256 significantly slower on the a73 cores, but not on the a53 cores. See below. Any idea what might be happening?

Update: obviously not related to the code in this PR, maybe some caching issue? I noticed the same kind of difference when using this PR and changing CFG_TEE_CORE_LOG_LEVEL from 2 (slowest) to 3 (fastest).

# HiKey960 a73 cores only:
for i in 4 5 6 7; do echo 1 >/sys/devices/system/cpu/cpu$i/online; done
for i in 0 1 2 3; do echo 0 >/sys/devices/system/cpu/cpu$i/online; done
# HiKey960 a53 cores only:
for i in 0 1 2 3; do echo 1 >/sys/devices/system/cpu/cpu$i/online; done
for i in 4 5 6 7; do echo 0 >/sys/devices/system/cpu/cpu$i/online; done


### master, a73 cores:

xtest --aes-perf -m ECB
min=24.479us max=36.979us mean=24.8683us stddev=0.505735us (cv 2.03365%) (39.2693MiB/s)

xtest --sha-perf -a SHA256
min=26.562us max=38.542us mean=26.888us stddev=0.507852us (cv 1.88877%) (36.3196MiB/s)

xtest --aes-perf -m GCM
min=408.854us max=425.52us mean=413.601us stddev=1.9973us (cv 0.482905%) (2.36112MiB/s)


### master, a53 cores:

xtest --aes-perf -m ECB
min=59.895us max=94.271us mean=60.5282us stddev=1.28037us (cv 2.11532%) (16.134MiB/s)

xtest --sha-perf -a SHA256
min=64.583us max=100us mean=65.1378us stddev=1.4101us (cv 2.1648%) (14.9923MiB/s)

xtest --aes-perf -m GCM
min=934.375us max=982.812us mean=948.535us stddev=5.34257us (cv 0.563245%) (1.02955MiB/s)


### PR#1908 + CFG_HWSUPP_PMULL=y, a73 cores:

xtest --aes-perf -m ECB
min=32.291us max=45.834us mean=32.8579us stddev=0.521768us (cv 1.58795%) (29.7208MiB/s)

xtest --sha-perf -a SHA256
min=34.375us max=47.396us mean=34.9583us stddev=0.527139us (cv 1.50791%) (27.935MiB/s)

xtest --aes-perf -m GCM
min=32.291us max=48.437us mean=32.9785us stddev=0.540848us (cv 1.64%) (29.6121MiB/s)


### PR#1908 + CFG_HWSUPP_PMULL=y, a53 cores:

xtest --aes-perf -m ECB
min=59.895us max=103.646us mean=64.5185us stddev=1.96989us (cv 3.05322%) (15.1362MiB/s)

xtest --sha-perf -a SHA256
min=64.583us max=95.833us mean=68.9705us stddev=1.63784us (cv 2.3747%) (14.1591MiB/s)

xtest --aes-perf -m GCM
min=59.895us max=101.562us mean=64.2076us stddev=1.88134us (cv 2.93009%) (15.2095MiB/s)

@jbech-linaro
Copy link
Contributor

FYI: OP-TEE Hikey auto builder - shows false positive, "OK" here but the logs shows errors (and the log from the script also says "failed", but still it will send an "OK" to GitHub. I thought I had this fixed, but apparently not).

Copy link
Contributor

@jforissier jforissier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • For "core: arm64: add mov_imm assembly macro"
    I'm not sure a Signed-off-by: Ard is appropriate. I'd rather you mention him in the commit text, and hopefully we can get his Acked-by:.
    Acked-by: Jerome Forissier <jerome.forissier@linaro.org>
  • For "core: io.h: add {get,put}_be{16,32,64}()"
    Reviewed-by: Jerome Forissier <jerome.forissier@linaro.org>
  • For "core: import GHASH acceleration routines":
    Acked-by: Jerome Forissier <jerome.forissier@linaro.org>
  • For "LTC: add GHASH acceleration"
    Acked-by: Jerome Forissier <jerome.forissier@linaro.org>
  • For "hikey: CFG_HWSUPP_PMULL=y":
    Acked-by: Jerome Forissier <jerome.forissier@linaro.org>
    Could you please do the same for HiKey960?
  • About "tee_ltc_provider.c: use new AES-GCM implementation":
    I don't think it should be done like that. The problem is, tee_ltc_provider is supposed to interface with LTC, not with multiple crypto implementations. Another problem is that you are replacing LTC GCM with something else with no way to chose at compile time. This might be bad for guys who may have their own modified/accelerated LTC (after all, it's how we have introduced crypto extensions: in LTC, not at the crypto_ops level).
    I think we should take this opportunity to get rid of struct crypto_ops and replace it with a simple function interface instead, as we discussed a few times already.
    If we do that, it will be quite easy to extract GCM from tee_ltc_provider.c and then have a CFG_ to select it.

movk \_reg, :abs_g0_nc:\_val
.endm


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Too many empty lines

@@ -29,6 +29,8 @@

#include <stdint.h>
#include <types_ext.h>
#include <string.h>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not needed?

@jenswi-linaro
Copy link
Contributor Author

For "hikey: CFG_HWSUPP_PMULL=y":
actually this seems to be unintentionally applied to 960 already. I'll update the commit message to make that clear.

We'd better ask @ardbiesheuvel about "core: arm64: add mov_imm assembly macro". @ardbiesheuvel are you OK with a sign-off or would you rather ack it or something?

I'll add a function based interface for the crypto provider in this PR.

@jforissier
Copy link
Contributor

@jenswi-linaro about Ard's S-o-b: I'm citing a maintainer on the LKML [1]:

"The last Signed-off-by should match the person posting the patch. It's a chain of custody for GPL-signoff purposes, not a 'partially-written-by'. If you want to credit [someone] you can add CC and they can Ack/Review your patch."

I think the same reasoning should apply here.

[1] https://lkml.org/lkml/2017/11/14/390

@ardbiesheuvel
Copy link

ardbiesheuvel commented Nov 15, 2017 via email

Implement a macro mov_imm that can be used to move an immediate constant
into a 64-bit register, using between 2 and 4 movz/movk instructions
(depending on the operand)

Acked-by: Jerome Forissier <jerome.forissier@linaro.org>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>
Reviewed-by: Jerome Forissier <jerome.forissier@linaro.org>
Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>
Imports assembly code for accelerated GHASH computations and provides an
interface for using these low level functions.

The assembly code relies on features now available in all ARM cores. No
assembly code is enabled at all unless CFG_WITH_VFP = y. Code using the
PMULL/PMULL2 instruction is enabled with CFG_HWSUPP_PMULL = y.

The assembly code is written by Ard Biesheuvel <ard.biesheuvel@linaro.org>
and modified slightly here to fit better into OP-TEE.

Acked-by: Jerome Forissier <jerome.forissier@linaro.org>
Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>
Acked-by: Jerome Forissier <jerome.forissier@linaro.org>
Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>
Takes full advantage of LTC GHASH acceleration by using the pmull
instruction.

Acked-by: Jerome Forissier <jerome.forissier@linaro.org>
Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>
Takes full advantage of LTC GHASH acceleration by using the pmull
instruction.

Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>
Adds temporary CFG_LTC_CRYPTO_GCM which is a mirror of CFG_CRYPTO_GCM.

Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>
Adds a new AES-GCM implementation optimized for hardware acceleration.

Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>
Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>
In pmull_gcm_encrypt() and pmull_gcm_decrypt() it was assumed that it's
enough to only increase the least significant 64-bits of the counter fed
to the block cipher. This can hold for 96-bit IVs, but not for IVs of
any other length as the number stored in the least significant 64-bits
of the counter can't be easily predicted.

In this patch pmull_gcm_encrypt() and pmull_gcm_decrypt() are updated to
increase the entire counter, at the same time is the interface changed
to accept the counter in little endian format instead.

Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>
Merges aes-gcm.h and aes-gcm_ext.h and aligns types of function argument
etc.

Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>
@jenswi-linaro
Copy link
Contributor Author

Rebased on master

@@ -0,0 +1,4 @@
ifeq ($(CFG_CRYPTO_WITH_CE),y)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the commit description, you write: "No assembly code is enabled at all unless CFG_WITH_VFP = y". I suppose it should be "unless CFG_CRYPTO_WITH_CE = y"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you're right. I'll update it when rebasing.

srcs-y += gcm_gf_mult.c
endif
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrong place? (should be #else gcm_mult_h.c, not gcm_gf_mult.c)

@@ -13,6 +13,22 @@
#include <tomcrypt.h>
#include <types_ext.h>

static void get_be_block(void *dst, const void *src)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this whole patch be folded into "core: crypto: add new AES-GCM implementation"?

Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>
@jenswi-linaro
Copy link
Contributor Author

Update

@jforissier
Copy link
Contributor

I am still concerned by "core: ltc: remove redundant aes-gcm implementation". It means it is not possible to use LibTomCrypt anymore for AES GCM, and as I said already it might be a problem. Some people may be using their own modified version of LTC, or may have audited LTC and do not want to use anything else...

More generally, and this relates to #1931 actually, the crypto API in crypto.h is not clean enough IMO:

  • core/arch/crypto/crypto.c depends on _CFG_CRYPTO flags defined in LTC.
  • The crypto_authenc_*() functions, although supposedly part of the API, cannot be overridden.

So how to fix this? Well it shouldn't be too hard hopefully:

  1. Move the _CFG_CRYPTO flags out of LTC, from core/lib/libtomcrypt/sub.mk to core/crypto/sub.mk. They do not depend on a particular implementation.
  2. Introduce CFG_CRYPTO_GCM_CE to select this implementation of crypto_authenc_*(), otherwise use LTC.

What do you think?

@jenswi-linaro
Copy link
Contributor Author

OK, I see your point.

This means that we'll drop b1538e9 ("core: crypto: merge aes-gcm.h and aes-gcm_ext.h") too.

The flags should probably go into core/core.mk instead since sub.mk shouldn't set flags because only the sub.mk parsed after that will see the changes.

How about instead of CFG_CRYPTO_GCM_CE have a CFG_CRYPTO_AES_GCM_FROM_CRYPTOLIB which would disable the internal AES-GCM implementation and expect the crypto library (LTC for now) to provide an AES-GCM implementation instead.

However, this doesn't sort it out completely, we have a third(!) AES-GCM implementation used by the pager too. I was planning to replace that implementation with the new AES-GCM implementation. The LTC implementation can't be used for this so we'll need a way to let the two implementations (LTC and new) coexist. One way would be to have a separate API for the new AES-GCM implementation with a good prefix instead of crypto_, and then just some wrappers to provide the required crypto_aes_gcm_*() functions when configured to do so. Do you have suggestion for the "good prefix"? I've run out of ideas for the moment.

@jforissier
Copy link
Contributor

This means that we'll drop b1538e9 ("core: crypto: merge aes-gcm.h and aes-gcm_ext.h") too.

Yes.

The flags should probably go into core/core.mk instead since sub.mk shouldn't set flags because only the sub.mk parsed after that will see the changes.

Correct, my bad.

How about instead of CFG_CRYPTO_GCM_CE have a CFG_CRYPTO_AES_GCM_FROM_CRYPTOLIB which would disable the internal AES-GCM implementation and expect the crypto library (LTC for now) to provide an AES-GCM implementation instead.

Sounds good to me.

However, this doesn't sort it out completely, we have a third(!) AES-GCM implementation used by the pager too. I was planning to replace that implementation with the new AES-GCM implementation. The LTC implementation can't be used for this so we'll need a way to let the two implementations (LTC and new) coexist.

Doh! That's right. Sure, it a good idea to use the implementation in this PR for the pager, too.

One way would be to have a separate API for the new AES-GCM implementation with a good prefix instead of crypto_, and then just some wrappers to provide the required crypto_aes_gcm_*() functions when configured to do so.

OK

Do you have suggestion for the "good prefix"? I've run out of ideas for the moment.

internal_ maybe? Internal to the OP-TEE core code, as opposed to _FROM_CRYPTOLIB, that is.

@jenswi-linaro
Copy link
Contributor Author

internal_ maybe? Internal to the OP-TEE core code, as opposed to _FROM_CRYPTOLIB, that is.

Thanks, that will work. I guess I'll shorten it to just "int". Then we'll have the .h file, <crypto/int_aes-gcm.h> and the prefix in that file would be crypto_int_aes_gcm_.

@jforissier
Copy link
Contributor

I guess I'll shorten it to just "int". Then we'll have the .h file, <crypto/int_aes-gcm.h> and the prefix in that file would be crypto_int_aes_gcm_.

If you want to keep crypto_ then I'm fine with crypto_int_. To be clear, my proposal was to have only internal_ as a prefix, no more crypto_. That is: internal_aes_gcm_init() which would be called by crypto_aes_gcm_init() etc. But no big deal, pick the one you prefer ;-)

@jenswi-linaro jenswi-linaro mentioned this pull request Nov 17, 2017
@jenswi-linaro
Copy link
Contributor Author

To make it easier to see the difference I've created #1949 with the changes we discussed here.

@jenswi-linaro
Copy link
Contributor Author

#1949 was merged instead of this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants