Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

crypto api - update cipher operation returning a different length #3823

Closed
ldts opened this issue May 4, 2020 · 20 comments
Closed

crypto api - update cipher operation returning a different length #3823

ldts opened this issue May 4, 2020 · 20 comments

Comments

@ldts
Copy link
Contributor

ldts commented May 4, 2020

Use case:
Some crypto devices cache the input data and only process some of the requested number of bytes during the crypto update operations which are then placed on the destination buffer.

Problem:
the current crypto API assumes that all crypto updates will be performed up to the length of the source data and therefore assumes that the destination buffer contains the exact number of bytes requested

struct crypto_cipher_ops {
	TEE_Result (*init)(struct crypto_cipher_ctx *ctx,
			   TEE_OperationMode mode,
			   const uint8_t *key1, size_t key1_len,
			   const uint8_t *key2, size_t key2_len,
			   const uint8_t *iv, size_t iv_len);

	TEE_Result (*update)(struct crypto_cipher_ctx *ctx, bool last_block,  <<<<
			     const uint8_t *data, size_t len, uint8_t *dst);

	void (*final)(struct crypto_cipher_ctx *ctx);

	void (*free_ctx)(struct crypto_cipher_ctx *ctx);
	void (*copy_state)(struct crypto_cipher_ctx *dst_ctx,
			   struct crypto_cipher_ctx *src_ctx);
};

Solution:
the crypto API should should allow the device to provide the number of bytes places in the update buffer

Question: is there any ongoing work to address this situation?

@jenswi-linaro
Copy link
Contributor

Can you give an example with a specific algorithm where this is a problem?
Is this some kind of optimization?

@ldts
Copy link
Contributor Author

ldts commented May 4, 2020

This problem I believe is systemic and not algorithm dependent. Systemic in the sense that the op-tee stack expects that all crypto update operations complete the bytes provided on input when in fact, the underlying hardware might chose to do differently and wait for more data before start generating output.

As an example, when running regression_4000.c and execute aes_ctr, the regression software it will:

  • splits data transfer of 48 bytes into two update calls: update for a length of 13 and update for 35
  • on the first update request, the se050 on the i2c bus indicates that it has not enough data to generate an output and asks for more. the op-tee stack send the second update of 35 bytes and the processing completes.

the output will show:

Got
  00:00:00:00 00:00:00:00 00:00:00:00 00:D2:DD:11  ................
  A8:F7:B0:AE 55:BE:61:7A E6:A1:6C:79 F4:62:51:7B  ....U.az..ly.bQ{
  E9:7C:A0:31 0C:24:15:70 7F:47:37:69 E0:24:C3:29  .|.1.$.p.G7i.$.)
Expected
  D2:DD:11:A8 F7:B0:AE:55 BE:61:7A:E6 A1:6C:79:F4  .......U.az..ly.
  62:51:7B:E9 7C:A0:31:0C 24:15:70:7F 47:37:69:E0  bQ{.|.1.$.p.G7i.
  24:C3:29:CD F2:26:69:FF 72:0E:3C:D1 A1:2F:5D:33  $.)..&i.r.<../]3

you can see the first 13 bytes being accepted as valid when in fact the device just was hoping to receive more but since the API could only return success, the receiving end assumed that those 13 bytes were placed in the buffer when in fact they werent.

IMO the interface just needs a *dst_len field.

@jforissier
Copy link
Contributor

From a (GP) API perspective, TEE_CipherUpdate() gives no guarantee as to how much data it will return. The spec says: "Unless one or more calls of this function have supplied sufficient input data, no output is generated.". There are no additional details on what "sufficient" means, so in effect the caller should be prepared to receive anything between 0 and srcLen bytes (the value is returned in *dstLen).

IMO the interface just needs a *dst_len field.

Yes, maybe.

@ldts
Copy link
Contributor Author

ldts commented May 4, 2020

good, so shall I just wait (if there is a better alternative to dst_len)?

also AFAICS reporting back dst_len should have no impact on how the caller needs to continue with the transfer (so no need to resend bytes): it just informs the caller how to assemble back the receiving data.

@jenswi-linaro
Copy link
Contributor

Another option is to let the hardware produce the missing bytes too. From an algorithm point of view the interface is enough. AES-CTR is a stream cipher so for it is possible to match each input byte with an output byte. Holding back output can be inconvenient for the caller.

@jenswi-linaro
Copy link
Contributor

There was one issue a few years ago where someone was expecting AES-CTR to return as many bytes as supplied. Can't remember it right now though.

@jforissier
Copy link
Contributor

There was one issue a few years ago where someone was expecting AES-CTR to return as many bytes as supplied. Can't remember it right now though.

This one perhaps? #1580

@ldts
Copy link
Contributor Author

ldts commented May 4, 2020

From looking at the code and the comments above I believe -please do correct me if I am wrong- that you are saying that OP-TEE in tee_svc_cipher_update_helper must ignore the information provided by the hardware as to how many bytes it has placed in the destination buffer because it is always known in advance independently of the hardware used.
In particular -maybe this is a corollary- on AES-CTR, if the hardware needs more than 13 bytes to process an update operation, it is the hardware which must not claim to be able to provide such operation.
is this a fair summary or where we stand?

IMO and from what I can see, I am not sure the above statements will be applicable to hardware on slow buses applying their own performance optimizations to their algorithms. And anyways, the way I see it, it would be much cleaner that tee_do_cipher_update gets the dst_len and passes it up the caller, and that the caller does the necessary check with real data.

@jenswi-linaro
Copy link
Contributor

This one perhaps? #1580

Yes, thanks @jforissier.

@ldts, I'd rather summarize it as stream ciphers must not be buffered and block ciphers must only be supplied with full blocks. If those rules are followed dst_len is redundant as we know that it must be the same as src_len.

@ldts
Copy link
Contributor Author

ldts commented May 4, 2020

yes, that is a better the way you put it. thanks @jenswi-linaro .

the obvious question for me now is if there is a plan -or interest?- to support hardware that does not comply to those requirements (ie, use the data provided by the hardware instead of assuming what the hardware will provide).

@jenswi-linaro
Copy link
Contributor

One option is to handle it as in

int mbedtls_aes_crypt_ctr(mbedtls_aes_context *ctx, size_t length,

If the caller uses odd sized buffers it's clearly not performance critical so we can deal with it in software. I think we should avoid different behavior in the API depending on which implementation of the algorithm is used.

@ldts
Copy link
Contributor Author

ldts commented May 4, 2020

thanks. I'll add a new api to the NXP stack to bypass any buffering and always put the requested number of bytes on the bus (I was trying to avoid changing the NXP API for maintainability reasons but maybe this is indeed required).

@ldts
Copy link
Contributor Author

ldts commented May 4, 2020

@jenswi-linaro nah, it didnt make any difference. the se050 on AES-CTR did consume the 13 bytes we put on the wire but generated no output and no errors indicating that is waiting for more data to start processing. So buffering occurs internally in the chip, not under software control.

I am not sure I fully understand why you are pushing back on using the data that the hardware reports though. why is it such a big deal?

I'll look into mbedtls_aes_crypt_ctr as well...but it doesnt seem right having to do this when all we need to do is to listen to the hardware progress.

@jenswi-linaro
Copy link
Contributor

The NXP stack you're mentioning, is that open source so I can have a look?

I am not sure I fully understand why you are pushing back on using the data that the hardware reports though. why is it such a big deal?

It is an API change of sort you're proposing and there are use cases which may break because of this.

The problem is that we expect AES-CTR as a stream cipher. While hardware (and optimized software) usually implements it as a block cipher, that is, it only operates on complete blocks. This is good when optimizing throughput for larger chunks of data, but it's not very useful if you only want to encrypt 13 bytes right now and maybe more later.

Some software trick is needed to handle this. In the function I linked to above we handle it by encrypting a block of zeroes and do the xoring in software for the odd sized data. Since this has to be done somewhere it's better done here than in each caller which expects a stream cipher implementation but finds a block cipher implementation.

@ldts
Copy link
Contributor Author

ldts commented May 5, 2020

I see. OK will revisit that as soon as I am done with RSA.

yes (thanks for looking into this) you can download it from this link : it is the Edgelock SE050 Development kit (you might need to create an account but that is all).

I have a work in progress branch https://github.com/ldts/optee_os/commits/se050 which I am still working on an in need to clean/tidy up but passing the regression tests for those crypto operations (it is in flux so maybe some of the missing fixes are still local in my office server - for instance the block cipher I am using is 16 bytes but this branch still was tested with a commit changing it to 8...need to revert that). The original stack I used as a reference is there as well (so you could pick it up from https://github.com/ldts/optee_os/tree/se050/lib/libnxpse050/docs)

@jforissier
Copy link
Contributor

@jenswi-linaro +1

Note, I can't find anything in the GP spec mandating that some output shall always be generated whatever the input data size for stream ciphers (or block ciphers used in modes that effectively turn them into stream ciphers). The GP test suite (TEE_Initial_Configuration-Test_Suite_v1_1_0_4-2014_11_07) does not have meaningful test cases for this either.

But I agree we should stick to the current behavior (and maybe document it clearly somewhere).

ldts added a commit to ldts/optee_os that referenced this issue May 5, 2020
needs further work (see)
OP-TEE#3823

Got
  00:00:00:00 00:00:00:00 00:00:00:00 00:D2:DD:11  ................
  A8:F7:B0:AE 55:BE:61:7A E6:A1:6C:79 F4:62:51:7B  ....U.az..ly.bQ{
  E9:7C:A0:31 0C:24:15:70 7F:47:37:69 E0:24:C3:29  .|.1.$.p.G7i.$.)
Expected
  D2:DD:11:A8 F7:B0:AE:55 BE:61:7A:E6 A1:6C:79:F4  .......U.az..ly.
  62:51:7B:E9 7C:A0:31:0C 24:15:70:7F 47:37:69:E0  bQ{.|.1.$.p.G7i.
  24:C3:29:CD F2:26:69:FF 72:0E:3C:D1 A1:2F:5D:33  $.)..&i.r.<../]3
@ldts
Copy link
Contributor Author

ldts commented May 5, 2020

I just pushed an update to my working branch that does some minor cleaning and clarifies the AES CRT (as per this issue)...um, now it seems github notifies here every-time I force push to my tree (will probably remove the comment to the issue from the commit on the next update)

ldts added a commit to ldts/optee_os that referenced this issue May 5, 2020
needs further work (see)
OP-TEE#3823

Got
  00:00:00:00 00:00:00:00 00:00:00:00 00:D2:DD:11  ................
  A8:F7:B0:AE 55:BE:61:7A E6:A1:6C:79 F4:62:51:7B  ....U.az..ly.bQ{
  E9:7C:A0:31 0C:24:15:70 7F:47:37:69 E0:24:C3:29  .|.1.$.p.G7i.$.)
Expected
  D2:DD:11:A8 F7:B0:AE:55 BE:61:7A:E6 A1:6C:79:F4  .......U.az..ly.
  62:51:7B:E9 7C:A0:31:0C 24:15:70:7F 47:37:69:E0  bQ{.|.1.$.p.G7i.
  24:C3:29:CD F2:26:69:FF 72:0E:3C:D1 A1:2F:5D:33  $.)..&i.r.<../]3
ldts added a commit to ldts/optee_os that referenced this issue May 5, 2020
needs further work (see)
OP-TEE#3823

Got
  00:00:00:00 00:00:00:00 00:00:00:00 00:D2:DD:11  ................
  A8:F7:B0:AE 55:BE:61:7A E6:A1:6C:79 F4:62:51:7B  ....U.az..ly.bQ{
  E9:7C:A0:31 0C:24:15:70 7F:47:37:69 E0:24:C3:29  .|.1.$.p.G7i.$.)
Expected
  D2:DD:11:A8 F7:B0:AE:55 BE:61:7A:E6 A1:6C:79:F4  .......U.az..ly.
  62:51:7B:E9 7C:A0:31:0C 24:15:70:7F 47:37:69:E0  bQ{.|.1.$.p.G7i.
  24:C3:29:CD F2:26:69:FF 72:0E:3C:D1 A1:2F:5D:33  $.)..&i.r.<../]3
@ldts
Copy link
Contributor Author

ldts commented May 6, 2020

ok so RSA is now functional (interesting what we need to do for the nopad case). One issue is that the se050 will not provide the keypair back to the user but only a handle to it. so I might need to extend the keypair definition with a handler for the SE050 (not just the bignum struct)

Coming back to this issue, please could you elaborate on your suggestions for a fix?
in particular:

  1. caller sends 13 bytes.
  2. hardware reports 0 processed since it has not enough data.
  3. how do you recommend that that crypto driver responds back to the caller? what actions should it take?

@jenswi-linaro
Copy link
Contributor

how do you recommend that that crypto driver responds back to the caller? what actions should it take?

The function mbedtls_aes_crypt_ctr() linked above handles this case with help of a "stream block". The CAAM driver does it differently in do_update_streaming(), but I have to admit that I haven't been able to follow that in detail. You could let the capability of the hardware to decide with way to take. If it's possible to restart the counter that's yet another option.

@ldts
Copy link
Contributor Author

ldts commented May 17, 2020

thanks. I implemented this workaround to the current api and with the level of testing offered by the regression tests (4000) it seems good. so will close this issue.

@ldts ldts closed this as completed May 17, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants